  1. Feb 2025
    1. Author response:

      The following is the authors’ response to the previous reviews.

      Since multiple Reviewers requested that the results describing effects of TTX treatment on GluA2 receptor levels detected by immunofluorescence and confocal imaging be revised, we have made substantial changes, which are described below. We believe the changes have greatly improved the manuscript and thank the reviewers for their comments.

      Lack of significant increase in GluA2 receptor data is due to too few cultures sampled; anything could have happened [in one] particular dissociation. A concern that the TTX effect might vary greatly from culture to culture was why we felt it was important to match the receptor measurements on the same cultures that we recorded mEPSCs. We now present the culture means in Figure 5A (mEPSCs) and 5B (GluA2 receptor cluster size). These plots make it clear that the variability in the GluA2 receptor cluster size effect is not attributable to a failure of that culture to show a homeostatic effect. That is, the variability in GluA2 receptor effect is independent of the variability in mEPSC effect. To increase sample size, we examined 2 additional cultures for synaptic GluA2 receptor levels in control vs. TTX treatment. These cultures showed very modest increases (Figure 5C). When cell means from these experiments were pooled with those from the 3 matched cultures, the TTX effect was still not statistically significant (Figure 5G).

      Lack of significant increase in GluA2 receptor data is due to the choice to restrict our analysis to the primary dendrite, close to the cell body. We restricted our analysis to the primary dendrite because Figure 3 in Turrigiano et al, 1998, shows the increased response to exogenously applied glutamate after TTX treatment is greatest close to the cell body and wanes as the glutamate is applied further away (added to Results, new lines 388-389).

      Variability in GluA2 receptor data is due to the much smaller number of synapses sampled, compared to mEPSCs. We matched the sampling for mEPSC amplitude data to that of imaging data by taking only 20 samples from each electrophysiological recording. Each mEPSC represents one synapse; in a set of 20 mEPSCs some might come from the same synapse, so that we are sampling from ≤ 20 synapses. The effect of TTX on mEPSC amplitudes remained significant despite the reduced samples per cell (Figure 5A).
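
The matched-sampling step described above can be illustrated with a minimal sketch. It assumes the 20 events are drawn at random without replacement from one recording; the response does not state how the 20 samples were chosen, and the function name `subsample_cell_mean` and its parameters are hypothetical.

```python
import random
import statistics

def subsample_cell_mean(amplitudes, n=20, seed=0):
    """Return the mean of n amplitudes drawn without replacement from one
    recording, plus the drawn values. Random selection is an assumption."""
    if len(amplitudes) < n:
        raise ValueError("recording has fewer events than requested")
    rng = random.Random(seed)           # seeded so the draw is reproducible
    picked = rng.sample(amplitudes, n)  # no event is used twice
    return statistics.mean(picked), picked
```

Each recording then contributes one cell mean based on the same number of observations as an imaged cell, which is the comparison the paragraph above describes.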

      Why do we fail to show a significant increase in receptors when this has been shown in many studies?

      We have added to our discussion the point that several studies, including Wang et al. 2019, use the number of puncta, rather than the number of cells, as the sample number. We ran an analysis of GluA2 receptor cluster size where we sampled multiple synapses per cell, and used the number of clusters as the sample n. We found that even with as few as 6 synapses randomly selected from each cell, the effect of TTX on GluA2 receptor cluster size became highly significant (p = 0.001 for data from 3 cultures and p = 0.005 for data from 5 cultures) (see new lines 400-406 in Discussion). In sum, our data are not very different from those of some previous studies. We are not arguing that receptors do not increase. Instead, our point is that the increase is more variable than the increase in mEPSC amplitude and thus takes a much bigger sample size to detect. That is, the difference between the mEPSC data and the receptor data is that the mEPSC data consistently show a ~20-25% increase, whereas the receptor data do not always show an increase and sometimes the increase is only ~10%. Finally, we added two matched culture experiments examining synaptic GluA1 receptor cluster characteristics. GluA1 receptor cluster size decreased in one culture, and increased very modestly in the other (Supplemental Figure 1B), whereas mEPSC amplitude robustly increased (Supplemental Figure 1A; Results, new lines 265-268).
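
The two choices of sample unit contrasted above can be sketched as follows. This is an illustration only: the function `build_samples`, the input layout (one list of cluster sizes per cell), and the seed are hypothetical, and it is not the authors' analysis code.

```python
import random
import statistics

def build_samples(cells, k=6, seed=0):
    """cells: list of lists, each inner list holding cluster sizes for one cell.
    Returns (per_cell, per_cluster):
      per_cell    -- one mean per cell, so n = number of cells
      per_cluster -- k randomly chosen clusters per cell, so n = k * number of cells
    """
    rng = random.Random(seed)
    per_cell = [statistics.mean(c) for c in cells]
    per_cluster = [x for c in cells for x in rng.sample(c, k)]
    return per_cell, per_cluster
```

With the same cells, the cluster-level sample is k times larger than the cell-level sample, which is why a test run on clusters can reach significance when the same test on cell means does not.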

      We conclude that these data support the idea that there is another contributor to the TTX-induced increase in quantal size.

      Other changes in presentation of GluA2 receptor results: Since the effects on intensity and integral are of lesser magnitude than that on cluster size, we have removed these results from the graphs, although they are presented in Table 1. We have removed Figure 6, the presentation of individual culture results, since these results are now conveyed in Figure 5A-C. We have removed graphs depicting GluA2 receptor cluster size in response to TTX in Rab3A-/- cultures, but these data are still presented in Table 1.

      We address other detailed comments below.

      Public Reviews:

      Reviewer #1 (Public review):

      (2) The effects of Rab3A on TTX-induced mini frequency modulation remain unclear, because TTX does not induce a change in mini frequency in the Rab3A+/Ebd control (Fig. 2). The respective conclusions should be revised accordingly (l. 427).

      The effects on mini frequency were added for completeness, but given the lack of consistently significant changes with TTX treatment or changes in the KO or Rab3A<sup>Ebd/Ebd</sup> cultures, we have removed comment on these results from the Discussion.

      (3) The model is still not supported by the data. In particular, data supporting a negative regulation of Rab3A by APs, Rab3A-dependent release of a tropic factor, or a Rab3A-dependent increase in GluA2 abundance are not presented.

      We have removed the model from the manuscript.

      (4) Data points are not overlapping and appear "quantal" in most box plots. How were the data rounded?

      The appearance of quantal variation in cell amplitude means is due to the binning that is part of the creation of the box plot. We have not remade the figures without binning, because the binning provides a visual depiction of the distribution of the data points. We have added the bin sizes to the appropriate figure legends.

      Reviewer #2 (Public review):

      However, the authors still have not provided further investigation of the mechanisms behind the role of Rab3A in this form of plasticity, and the revision therefore has added little to the significance of the study. Moreover, the experimental design for the investigation of the mismatch between mEPSC amplitude and GluA2 cluster fluorescence remains questionable, making it difficult to draw any credible conclusions from groups of data that not only look similar to the eye but also show no significance statistically.

      To our knowledge, no other study has matched measurements of mEPSC amplitude in the same cultures where synaptic receptor levels were assessed. As stated above, we have revised the presentation of GluA2 receptor results, concluding from the lack of significant effects on receptor levels that the mEPSC amplitude increase cannot be fully explained by the receptor data (which is strengthened by addition of two more cultures analyzed for GluA2 immunofluorescence). This is an important addition to the significance of the study.

      In summary, this study establishes that neuronal Rab3A plays a role in homeostatic synaptic plasticity, but so do a number of other molecules that have been implicated in homeostatic synaptic plasticity in the past two decades (only will grow with the new techniques such as RNAseq). Without going beyond this finding and demonstrating how exactly Rab3A participates in the induction and/or expression of this form of plasticity, or maybe the potential Rab3A-mediated functional and behavioral defects in vivo, the contribution of the current study to the field is limited. However, given the presynaptic location of Rab3A, this finding could serve as a starting point for researchers interested in pre-postsynaptic cross-talk during homeostatic plasticity in general.

      We previously published a review in which we list 19 molecules known at that time to be important for homeostatic synaptic plasticity (see Table 2, Koesters et al., 2024), and they fall into two categories: molecules involved in glutamate receptor expression or trafficking, and signaling molecules. Rab3A is the first synaptic vesicle protein to be implicated in homeostatic plasticity of quantal size. We have added this point to the Discussion, new lines 473-476. By demonstrating that Rab3A is not acting in glia (which release TNF, which regulates receptor expression), and that GluA2 receptor levels do not explain the homeostatic mEPSC increase in our experimental conditions, we have ruled out two major mechanisms.

      Reviewer #3 (Public review):

      Other questions arise from the NASPM experiments, used to justify looking at GluA2 (and not GluA1) in the immunostaining. First, there is a frequency effect that is unclear in origin. One would expect NASPM to merely block some fraction of the post-synaptic current, and not affect pre-synaptic release or block whole synapses. However the change in frequency seems to argue (as the authors do) that some synapses only have CP-AMPARs, while the rest of the synapses have few or none. Another possibility is that there are pre-synaptic NASPM-sensitive receptors that influence release probability. Further, the amplitude data show a strong trend towards smaller amplitude following NASPM treatment (Fig 3B). The p value for both control and TTX neurons was 0.08 - it is very difficult to argue that there is no effect. The decrease on average is larger in the TTX neurons, and some cells show a strong effect. It is possible there is some heterogeneity between neurons on whether GluA1/A2 heteromers or GluA1 homomers are added during HSP. This would impact the weakly supported conclusions about the GluA2 imaging vs mEPSC amplitude data.

      We cannot rule out that the NASPM-induced decrease in mEPSC frequency is due to a loss of presynaptic glutamate receptor enhancement of release probability, and have added this statement to the Results, new lines 202-204. Regarding the p value of 0.08, we are not arguing that NASPM has no effect on mEPSC amplitude, only that it has no effect on the homeostatic increase in amplitude after TTX treatment. An increase in GluA1/A2 heteromers should have been detected in our imaging studies.

      Unaddressed issues that would greatly increase the impact of the paper:

      (1) Is Rab3A acting pre-synaptically, post-synaptically or both? The authors provide good evidence that Rab3A is acting within neurons and not astrocytes. But where it is acting (pre or post) would aid substantially in understanding its role. They could use sparse knockdown of Rab3A, or simply mix cultures from KO and WT mice (with appropriate tags/labels). The general view in the field has been that HSP is regulated post-synaptically via regulation of AMPAR trafficking, and considerable evidence supports this view. The more support for their suggestion of a pre-synaptic site of control, the better.

      We agree that doing co-cultures of Rab3A-/- and Rab3A+/+ neurons is the definitive experiment to determine the locus of action of Rab3A in homeostatic synaptic plasticity. We hope to examine this question in a future manuscript.

      (2) Rab3A is also found at inhibitory synapses. It would be very informative to know if HSP at inhibitory synapses is similarly affected. This is particularly relevant as at inhibitory synapses, one expects a removal of GABARs (ie the opposite of whatever is happening at excitatory synapses). If both processes are regulated by Rab3A, this might suggest a role for this protein more upstream in the signaling; an effect only at excitatory synapses would argue for a more specific role just at these synapses.

      We agree that it would be very interesting to determine if the homeostatic decrease in mIPSCs after activity blockade depends on Rab3A. We hope to address this question in the future.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Minor points:

      The abstract is a bit repetitive in places. Some editing would be advised.

      We did not identify anything repetitive in the abstract except the parallel construction referring to the previous findings at the NMJ and current findings in cortical neurons. However, we have eliminated a section in the introduction which went into detail about the receptor imaging results (previous lines 103-110).

      Line 77: 'shift toward early awakening' is unclear; do you mean shorter sleep/wake cycle? Other circadian issues? A more complete description is needed.

      We have moved the additional detail about the Earlybird mutation’s effect on circadian period from the Results to the Introduction, new lines 77 to 79.

      The results section has many passages that seem more like discussion, offering various interpretation and alternatives for the data. While some commentary is appropriate, to justify the next series of experiments and maintain a logical flow, this manuscript has rather a high amount of this. Some editing and shifting material to the discussion might be warranted.

      We have reduced the commentary in the Results section.

      Line 245: GluA2 homomers are really unlikely, as they won't pass current (unless unedited) and don't often if ever form. But GluA2/A3 heteromers are likely (and detected by their methods).

      GluA2 homomers do conduct current, albeit less than heteromers (Swanson et al., 1997; Oh and Derkach, 2005; Coombs et al., 2019). [The Oh and Derkach paper shows a GluA2 homomer current in Supplementary Figure 3]. We have modified the text to acknowledge that the GluA2 receptor imaging will detect heteromers and homomers (Results, new lines 214 to 215).

      Line 258: If the number of synaptic pairs analyzed was usually <20, what was the average and range of pairs? This gets into the sampling issue.

      We have added the average number of synaptic sites (20.4 ± 6.5) and range (11-38) to the text, Results, new line 229.

      Are the stats of the baseline mEPSC amplitude and frequency shifts (WT vs KO on WT feeder layer) given somewhere (lines 398-402)? If not, please add them.

      These stats have been added to the text: mEPSC amplitude (CON, WT on WT, 13.3 ± 0.5 pA; CON, KO on WT, 15.2 ± 1.1 pA, p = 0.23, Kruskal-Wallis test), new lines 325-326, and frequency (CON, WT on WT, 2.54 ± 0.57 sec<sup>-1</sup>; CON, KO on WT, 4.46 ± 1.21 sec<sup>-1</sup>, p = 0.23, Kruskal-Wallis test), new lines 329-330.
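
For readers unfamiliar with the test named above, here is a minimal sketch of a two-group Kruskal-Wallis comparison using only the Python standard library. It makes simplifying assumptions: tied values receive average ranks but no tie-correction factor is applied, and the p value uses the chi-square approximation with 1 degree of freedom. The software the authors actually used is not stated, and the function names are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_two_groups(a, b):
    """Return (H, p) for two groups; no tie correction (an assumption)."""
    pooled = list(a) + list(b)
    ranks = average_ranks(pooled)
    n = len(pooled)
    r_a = sum(ranks[:len(a)])
    r_b = sum(ranks[len(a):])
    h = 12.0 / (n * (n + 1)) * (r_a ** 2 / len(a) + r_b ** 2 / len(b)) - 3 * (n + 1)
    p = math.erfc(math.sqrt(max(h, 0.0) / 2.0))  # chi-square survival, 1 df
    return h, p
```

Calling `kruskal_two_groups(con_wt_means, con_ko_means)` on two lists of cell means (hypothetical variable names) would give the kind of statistic and p value reported in the response.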

      25mM K+ is going to be much more than 'mildly' depolarizing (line 697). Should just skip that word.

      ‘mildly’ has been removed.

      The section on MiniAnalysis seems overly argumentative, and there is no need to discuss flaws in the Wu paper. The important thing (a bit buried at the end of this section) is that the manual mini selection was done blind to condition, which is the normal way of dealing with potential bias. It would be better to limit the methods to describing what was done.

      The bulk of the justification of manual analysis has been removed from the text.

      The discussion of potential conductance changes (lines 534-6) seems somewhat unwarranted.

      Modification of GluA1 phosphorylation in the GluA1/A2 heteromer would not be detected by NASPM (and the NASPM data being a bit inconclusive anyway). Further, auxiliary subunits (like TARPs) can alter conductance of any of the AMPARs. So I don't think they have enough data to exclude such a possibility.

      The discussion of contributions of conductance has been removed from the text.

      Coombs ID, Soto D, McGee TP, Gold MG, Farrant M, Cull-Candy SG (2019) Homomeric GluA2(R) AMPA receptors can conduct when desensitized. Nat Commun 10:4312.

      Oh MC, Derkach VA (2005) Dominant role of the GluR2 subunit in regulation of AMPA receptors by CaMKII. Nat Neurosci 8:853-854.

      Swanson GT, Kamboj SK, Cull-Candy SG (1997) Single-channel properties of recombinant AMPA receptors depend on RNA editing, splice variation, and subunit composition. J Neurosci 17:5869.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors investigate the role of BEND2, a novel regulator of meiosis, in both male and female fertility. Huang et al have created a mouse model where the full-length BEND2 transcript is depleted but the truncated BEND2 version remains. This mouse model is fertile, and the authors used it to study the role of BEND2 on both male and female meiosis. Overall, the full-length BEND2 appears dispensable for male meiosis. The more interesting phenotype was observed in females. Females exhibit a lower ovarian reserve suggesting that full-length BEND2 is involved in the establishment of the primordial follicle pool.

      Strengths:

      The authors generated a mouse model that enabled them to study the role of BEND2 in meiosis. The role of BEND2 in female fertility is novel and enhances our knowledge of genes involved in the establishment of the primordial follicle pool.

      Weaknesses:

      The manuscript extensively explores the role of BEND2 in male meiosis; however, a more interesting result was obtained from the study of female mice. Only a few experiments were performed using female mice, therefore, more experiments should be performed to complete the story of the role of BEND2 on female fertility. In addition, the title and abstract of the manuscript do not align with the story, as female fertility is only a small portion of the data compared to the male fertility section.

      We appreciate the reviewer’s thoughtful summary, recognition of the strengths of our study, and constructive feedback. In the revised manuscript, we have performed additional experiments to enhance our understanding of the role of BEND2 in female gametogenesis. These new experiments provide further insights into the establishment of the ovarian reserve and the role of BEND2 in female fertility.

      Additionally, we have rewritten the title, abstract, and introduction to better align with the content of the manuscript and to reflect the balance between the male and female fertility results. We believe these changes address the reviewer’s concerns and improve the overall clarity and focus of the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      • I recommend that the authors re-organize their abstract and introduction to accurately reflect the manuscript's primary focus on male fertility. Right now, the title of the manuscript is misleading. The manuscript does not investigate reproductive aging; rather, it primarily describes the depletion of primordial follicle number. The mechanism behind this depletion and whether this phenotype accelerates reproductive aging, are not explored. Clarifying these points will help align the title and content of the manuscript more accurately.

      We thank the reviewer for this suggestion. We agree that the original title and abstract did not fully capture the focus of the study. In response, we have rewritten the title, abstract, and introduction to better align with the results presented, focusing more clearly on the implications of the effects of the full-length BEND2 depletion for spermatogenesis and oogenesis. These revisions ensure that the title, the abstract, and the manuscript's introduction are now more accurately reflective of the work performed.

      • Figure 1: I couldn't find the validation of the polyclonal antibody against BEND2 that the authors generated.

      Regarding this query about the validation of the polyclonal antibody against BEND2, we apologize for any confusion. We would like to clarify that this validation is indeed presented in Figure 2 of our manuscript. To ensure this information is easily accessible, we have revised the text to explicitly mention the validation in Figure 2.

      • Figure 2A: Could you provide the actual numbers for the weight of the mice testis?

      In response to this question regarding Figure 2A and the weights of the mouse testes, we have now included these data in a graph in Fig 2A and Table S1 and added this information in the results section.

      • Figure 2C and D: I am confused by the fact that in the WB we can appreciate a high expression of the p75 protein, but the signal is very low in the IF (Figure 2D).

      We thank the reviewer for raising this point. We acknowledge the apparent discrepancy between the strong p75 signal observed in the Western blot (Fig. 2C) and the weaker signal seen in the immunofluorescence (Fig. 2D). We think several factors could contribute to this difference, such as differences in sensitivity and detection methods, epitope accessibility, protein localization or differences in sample preparation, antibody affinity, and experimental conditions between Western blot and IF.

      • In the same figure, the authors also mention that the p75 protein is functional. On what basis do they reach this conclusion?

      We acknowledge that we cannot definitively confirm the functionality of the p75 protein. Our assumption was based on the observed fertility of the male mice and existing literature indicating that BEND2 is essential for completing meiosis (Ma et al., 2022). However, we understand the importance of clarity in our claims. To avoid any potential confusion, we have revised the sentence to read: "The p75 BEND2 protein—likely corresponding to an exon 11-skipped transcript—is present and might be functional in our mutant testis, based on the observed phenotype (see below)."

      • The phenotype in females is very interesting. The authors conclude that BEND2 influences primordial follicle formation, oocyte quality, fertility, and reproductive aging by (1) performing follicle counts, (2) analyzing the litter size, and (3) analyzing meiotic progression. Given that the authors build their story around these experiments, I strongly encourage them to expand the section on female fertility, or reorganize the manuscript, or be more cautious with some of their conclusions. They might consider performing additional experiments such as:

      - Oocyte quality: To determine whether BEND2 impacts oocyte quality, mice should be stimulated with hormones and oocyte quality should be analyzed (GV, MI, MII progression, spindle morphology and/or fertilization, and embryo development). Does the decrease in primordial follicles correlate with the number of ovulated oocytes, or is the impact only on oocyte quality?

      We appreciate the reviewer's suggestion to assess the impact of BEND2 on oocyte quality. Following the reviewer’s recommendation, we stimulated three control and three mutant mice. We analyzed the number of ovulated oocytes, their fertilization rate, and the percentage of embryos that developed to the blastocyst stage. These new results are included in the revised manuscript (see Results section and new Table 1). Our analyses indicate that for all parameters assessed, control and mutant oocytes behaved similarly. Specifically, there were no significant differences in the number of ovulated oocytes, fertilization rates, or the ability of embryos to progress to the blastocyst stage between the control and mutant groups. These findings suggest that mutant oocyte quality is comparable to control mice of a similar age. We have incorporated these new results into the manuscript.

      - Reproductive aging: A fertility trial would provide more information on whether BEND2 depletion triggers an acceleration of reproductive aging. In addition, the oldest mice used by the authors are 9 months old, and at this point, fertility has not declined yet.

      We appreciate the reviewer's suggestion regarding the assessment of reproductive aging. However, we respectfully disagree with the assertion that fertility has not declined by 9 months of age. In our colony, we have observed a significant decline in fertility around 10 months of age. Specifically, out of 18 10-month-old female mice placed in breeding cages, we observed only three pregnancies within the first 30 days (N.N. and I.R., data not published). Based on these observations, we determined that fertility begins to decline around this age in our colony, which informed our decision to use 9-month-old mice as the oldest age group for our analysis. Thus, this age is appropriate for evaluating the potential effects of BEND2 depletion on reproductive aging in our specific mouse population.

      - The observation that the primordial follicle pool is already diminished in mice that are 1 week old is very interesting. Some experiments that the authors could perform to figure out the mechanism are: (1) Analyzing apoptosis. Are the primordial follicles dying during the pool's establishment, or is this an ongoing apoptotic process throughout the mice's lifespan? (2) If the authors still have ovaries from mice younger than 1 week of age (when the primordial pool is forming), they could perform DDX4 staining and quantify the number of oocytes in follicles and the total number of oocytes. These experiments would provide mechanistic insights into whether BEND2 impacts the formation of the primordial follicle pool or if the pool forms but is then depleted.

      We appreciate the reviewer's suggestion to further explore the mechanism behind the reduced primordial follicle pool. In response, we have analyzed the number of DDX4-positive cells (DDX4 labels oocytes) in newborn mutant and wild-type animals. Our results show that mutant ovaries contain significantly fewer oocytes compared to controls (see new Fig. 5). This finding supports the hypothesis that BEND2 is critical for the establishment of a normal ovarian reserve. We are grateful for this suggestion, as these additional data reinforce our conclusion that BEND2 is required to determine a normal ovarian reserve in mice.

      • What is the red signal in Supplementary Figure 1C?

      This image depicts the BEND2 staining pattern in 16 days post-coitum (dpc) wild-type mouse ovaries. To clarify this and prevent any confusion, we have updated the figure legend to explicitly state that the sample shown is from a wild-type mouse.

      • Please spell out the full term of all the acronyms.

      We apologize for the oversight in not fully spelling out some acronyms in the original manuscript. We have carefully reviewed the entire manuscript and have ensured that all acronyms are now spelled out in full upon their first use in the revised version. We want to thank the reviewer for bringing this to our attention.

      • Is Line-1 also dysregulated in the ovary? This was one of the main findings from the male part. It would be interesting to perform the same analysis in the ovary since Line-1 has a role in establishing the ovarian reserve (PMID: 31949138).

      We thank the reviewer for this insightful suggestion. We have analyzed the number of LINE-1- and SYCP3-positive cells in wild-type and mutant newborn ovaries (new Fig. S4). Our results show no significant difference between the two genotypes, suggesting that LINE-1 is not dysregulated in newborn Bend2 mutant oocytes. These findings indicate that, at least in the context of the newborn ovary, LINE-1 does not appear to be affected by BEND2 depletion.

      Reviewer #2 (Public Review):

      In their manuscript entitled "BEND2 is a crucial player in oogenesis and reproductive aging", the authors present their findings that full-length BEND2 is important for repair of meiotic double-strand breaks in spermatocytes, regulation of LINE-1 elements in spermatocytes, and proper oocyte meiosis and folliculogenesis in females. The manuscript utilizes an elegant system to specifically ablate the full-length form of BEND2 which has been historically difficult to study due to its location on the X chromosome and male sterility of global knockout animals.

      While the manuscript is an overall excellent addition to the field, it would significantly benefit from a few additional experiments, as well as some additional clarification/elaboration.

      The claim that BEND2 is required for ovarian reserve establishment is not supported, as the authors only look at folliculogenesis and oocyte abundance starting at one week of age, after the reserve is formed. Analysis of earlier time points would be much more convincing and would parse the role of BEND2 in the establishment vs. maintenance of this cell population. In spermatocytes, the authors demonstrate a loss of nuclear BEND2 in their mutant but do not comment on the change in localization (which is now cytoplasmic) of the remaining protein in these animals. This may have true biological significance and a discussion of this should be more thoroughly explored.

      We thank the reviewer for their thoughtful feedback and constructive suggestions to improve our manuscript.

      In response to the comment regarding the establishment of the ovarian reserve, we have now analyzed Bend2 mutant and control newborn ovaries. Our results show a significant reduction in the number of DDX4-positive cells in mutant ovaries compared to controls. These findings demonstrate that BEND2 is required for the establishment of the ovarian reserve, as the reduction is evident at birth.

      Regarding the cytoplasmic staining of BEND2 in mutant spermatocytes, we did perform secondary-antibody-only controls using goat anti-rabbit Cy3 to address the specificity of the signal. The staining observed in the Bend2 mutants closely resembles background staining, suggesting that the cytoplasmic signal is nonspecific. Therefore, we do not believe this represents a meaningful change in the localization of BEND2 protein in the mutants. We have clarified this in the revised manuscript to address this point.

      We hope these additional experiments and clarifications strengthen the manuscript and address the reviewer’s concerns.

      Reviewer #2 (Recommendations For The Authors):

      Major points:

      (1) The title of the manuscript does not accurately capture the content of the work. The vast majority of the data presented here is from the male, which is not reflected at all in the title - perhaps considering revising it?

      Thank you for your valuable suggestion. We agree that the original title did not fully reflect the focus of the manuscript. In response, we have revised the title, along with the abstract and introduction, to more accurately capture the content of the study and the emphasis on the male data. These changes ensure that the manuscript more clearly aligns with the results presented.

      (2) In Figure 2D, the authors demonstrate that WT BEND2 expression and localization are lost in the mutant, but staining is still apparent, just in the cytoplasm. Did the authors perform secondary-antibody-only controls to determine if this was background staining or real staining? If real, can they comment on the change in localization of the protein?

      We thank the reviewer for this insightful question. We have indeed performed secondary antibody-only controls using goat anti-rabbit Cy3. The staining observed in the Bend2 mutants closely resembles background staining, suggesting that the signal in the cytoplasm is not specific. Therefore, we do not believe this staining represents any real or meaningful expression of the BEND2 protein in the mutants.

      (3) In Figure S2A, the authors show Ku70 staining and describe that it is similar between the genotypes, but - to my eye - it looks quite distinctly different. It appears to stain in patches in WT SYCP3+ spermatocytes, versus staining in patches in the more mature, SYCP3- germ cells closer to the lumen in the mutant. Can the authors please clarify, or provide arrows to point which foci they are referring to?

      We apologize for the confusion caused by the image provided in the original submission. Upon review, we realized that the mutant image was not fully representative of the staining pattern observed in the majority of mutant samples. We have replaced this image with a new one in the revised manuscript, which more accurately reflects the similarity in Ku70 staining between wild-type and mutant testis. In this updated Figure S2, we have also included arrowheads to indicate the relevant foci, making it clearer to the reader. We have updated the figure legend to correspond with these changes as well.

      (4) The authors state that BEND2 is "required to establish the ovarian reserve during oogenesis" but this has not been demonstrated. The authors do show a reduced density of primordial follicles at one week of age. While this is compelling data, the ovarian reserve is established earlier in the mouse, around postnatal days 0-1, so it is not clear from this manuscript whether BEND2 is required for the maintenance of this population after PND1, leading to reduced numbers by 1 week of age, OR if it is required for the establishment of this population, which would result in reduced numbers of oocytes around the time of birth. This is a critical experiment that should be performed in order to determine which of these possibilities is likely the case. Ideally, looking at embryonic through early postnatal time points during ovarian development would be very helpful.

      We thank the reviewer for raising this important point. As mentioned earlier in response to Reviewer 1, we have performed the experiment suggested by Reviewer 2 and analyzed the number of DDX4-positive cells in newborn ovaries. Our results show that Bend2 mutant ovaries have fewer oocytes at birth than wild-type controls (Fig. 5H). This finding reinforces our conclusion that BEND2 is indeed required to establish the ovarian reserve, as the reduction in oocyte number is evident at the time of birth. We agree that this additional data strengthens our original claim, so we have included these results in the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      Huang et al. investigated the phenotype of Bend2 mutant mice which expressed a truncated isoform. This mutant male showed increasing apoptosis due to unrepaired double-strand breaks. However, this mutant male has fertility, and this enabled them to analyze Bend2 function in females. They revealed that Bend2 mutation in females showed decreasing follicle numbers which leads to loss of ovarian reserve.

      Strengths:

      Since their Bend2 mutant males were fertile, they were able to analyze the function of Bend2 in females and they revealed that loss of Bend2 causes less follicle formation.

      Weaknesses:

      Why the phenotype of their mutant male is different from previous work (Ma et al.) is not clear enough although they discuss it.

      We appreciate the reviewer’s comment regarding the differences between our Bend2 mutant male phenotype and the previously reported phenotype by Ma et al., 2022. We believe this discrepancy is due to the fact that the Bend2 locus encodes two BEND2 isoforms: p140 and p80. In contrast to the previous study, where both proteins were ablated by the mutation employed (the deletion of exons 12 and 13), our exon 11 deletion specifically ablates p140 expression while allowing the expression of p80 in the testis.

      Based on the distinct phenotypes observed in the two Bend2 mutant mouse models, we hypothesize that p80 is sufficient to fulfill BEND2’s roles in meiosis, which could explain why our Bend2 mutant males remain fertile. We have rewritten the relevant sections in the results and discussion to better articulate this hypothesis and clarify the potential mechanisms behind the observed phenotypic differences.

      We hope these clarifications and additional details adequately address the reviewer’s concerns.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors showed that Bend2 mutant females had decreased fertility. This may be due to decreased ovarian reserve. Did the authors check if the mutant mice decreased or lost fertility faster than WT? If the authors have the data, please refer to it in the manuscript.

      We followed the breeding performance of a small number of control and Bend2 mutant females, and preliminary observations suggested no clear differences between the two groups. However, due to the limited sample size, we felt that these data were not conclusive enough to be included in the manuscript. We agree that a more thorough analysis of fertility decline over time would be valuable, and we plan to address this question in a future study.

      (2) In Figure 1 A, there is no exon1 in the upper figure.

      We thank the reviewer for pointing this out. We have revised Figure 1A to include exon 1 and ensure the schematic is accurate. The updated figure is included in the revised version of the manuscript.

      (3) Figure 3A, it would be nice to show several tubules of the testis section as well as an enlarged one.

      Following the reviewer's advice, we have revised Figure 3A to include new images showing several tubules and an enlarged view of one section of a tubule. These updates are included in the revised manuscript to better represent the testis sections.

      (4) Please be consistent with the format of the graph, especially Supplemental figures 2C and 4D.

      We have revised the figures, including Supplemental Figures 2C and 4D, to ensure consistency in the format throughout the manuscript. We have made modifications to the figures to align them more closely and improve the overall presentation.

    1. Author response:

      We are grateful to the reviewers and editors for their time and positive assessment of our manuscript. We will incorporate all their comments to further improve our work. In the revised version of the manuscript, we will provide a more detailed description of the quantification of the wrapping index and further explain the differential roles of Htl and Uif during cell growth versus the role of Notch during axon wrapping. In addition, we will perform further experiments using combinations of reporters and antibodies to further explore the relationship between Htl, Uif and Notch. The discussion will be expanded and possible mechanisms by which Uif 'stabilises' a specific membrane domain will be included.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewing Editor Comment: Please note that all three reviewers suggested this manuscript would best fit as a resource paper at eLife.

      Reviewer #1 (Public review):

      Summary:

      This impressive study presents a comprehensive scRNAseq atlas of the cranial region during neural induction, patterning, and morphogenesis. The authors collected a robust scRNAseq dataset covering six distinct developmental stages. The analysis focused on the neural tissue, resulting in a highly detailed temporal map of neural plate development. The findings demonstrate how different cell fates are organized in specific spatial patterns along the anterior-posterior and medial-lateral axes within the developing neural tissue. Additionally, the research utilized high-density single-cell RNA sequencing (scRNAseq) to reveal intricate spatial and temporal patterns independent of traditional spatial techniques.

      The investigation utilized diffusion component analysis to spatially order cells based on their positioning along the anterior-posterior axis, corresponding to the forebrain, midbrain, hindbrain, and medial-lateral axis. By cross-referencing with MGI expression data, the identification of cell types was validated, affirming the expression patterns of numerous known genes and implicating others as differentially expressed along these axes. These findings significantly advance our understanding of the spatially regulated genes in neural tissues during early developmental stages. The emphasis on transcription factors, cell surface, and secreted proteins provides valuable insights into the intricate gene regulatory networks underpinning neural tissue patterning. Analysis of a second scRNAseq dataset where Shh signaling was activated by culturing embryos in SAG identified known and previously unknown transcripts regulated by Shh, including the Wnt pathway.

      The data includes the neural plate and captures all major cell types in the head, including the mesoderm, endoderm, non-neural ectoderm, neural crest, notochord, and blood. With further analyses, this high-quality data promises to significantly advance our understanding of how these tissues develop in conjunction with the neural tissue, paving the way for future breakthroughs in developmental biology and genomics.

      Strengths:

      The data is well presented in the figures and thoroughly described in the text. The quality of the scRNAseq data and bioinformatic analysis is exceptional.

      Weaknesses:

      No weaknesses were identified by this reviewer.

      Reviewer #2 (Public review):

      Summary:

      Brooks et al. generate a gene expression atlas of the early embryonic cranial neural plate. They generate single-cell transcriptome data from early cranial neural plate cells at 6 consecutive stages between E7.5 to E9. Utilizing computational analysis they infer temporal gene expression dynamics and spatial gene expression patterns along the anterior-posterior and mediolateral axis of the neural plate. Subsequent comparison with known gene expression patterns revealed a good agreement with their inferred patterns, thus validating their approach. They then focus on Sonic Hedgehog (Shh) signalling, a key morphogen signal, whose activities partition the neural plate into distinct gene expression domains along the mediolateral axis. Single-cell transcriptome analysis of embryos in which the Shh pathway was pharmacologically activated throughout the neural plate revealed characteristic changes in gene expression along the mediolateral axis and the induction of distinct Shh-regulated gene expression programs in the developing fore-, mid-, and hindbrain.

      Strengths:

      This manuscript provides a comprehensive transcriptomic characterisation of the developing cranial neural plate, a part of the embryo that to my knowledge has not been extensively analysed by single-cell transcriptomic approaches. The single-cell sequencing data appears to be of high quality and will be a great resource for the wider scientific community. Moreover, the computational analysis is well executed and the validation of the sequencing data using published gene expression patterns is convincing. Taken together, this is a well-executed study that describes a relevant scientific resource for the wider scientific community.

      Weaknesses:

      Conceptually, the findings that gene expression patterns differ along the rostrocaudal, mediolateral, and temporal axes of the neural plate and that Shh signalling induces distinct target genes along the anterior-posterior axis of the nervous system are more expected than surprising. However, the strength of this manuscript is again the comprehensive characterization of the spatiotemporal gene expression patterns and how they change upon ectopic activation of the Shh pathway.

      Reviewer #3 (Public review):

      Summary:

      The authors performed a detailed single-cell analysis of the early embryonic cranial neural plate with unprecedented temporal resolution between embryonic days 7.5 and 8.75. They employed diffusion analysis to identify genes that correspond to different temporal and spatial locations within the embryo. Finally, they also examined the global response of cranial tissue to a Smoothened agonist.

      Strengths:

      Overall, this is an impressive resource, well-validated against sets of genes with known temporal and spatial patterns of expression. It will be of great value to investigators examining the early stages of neural plate patterning, neural progenitor diversity, and the roles of signaling molecules and gene regulatory networks controlling the regionalization and diversification of the neural plate.

      Weaknesses:

      The manuscript should be considered a resource. Experimental manipulation is limited to the analysis of neural plate cells that were cultured in vitro for 12 hours with SAG. Besides the identification of a significant set of previously unreported genes that are differentially expressed in the cranial neural plate, there is little new biological insight emerging from this study. Some additional analyses might help to highlight novel hypotheses arising from this remarkable resource.

      We thank all three reviewers for their thoughtful and constructive public reviews and believe they nicely capture the contributions of our study. We agree that this article represents a valuable resource for the community and agree with its designation as a Tools and Resources article.

      We also thank the reviewers for their useful suggestions for improving the manuscript. In addition to addressing most of their comments, described below, we note that we have changed midbrain-hindbrain boundary (MHB) to rhombomere 1 (r1) throughout the paper and in Tables S4, S7, S10, and S11, as this designation is more closely aligned with the literature on this region. In addition, we added the anterior-posterior and mediolateral cluster identities from our wild-type analysis for the genes that were differentially expressed in SAG-treated embryos in Table S11. Lastly, we have added a new figure (Figure 5—figure supplement 2), as suggested by Reviewer 2, in which we compare our results with the published expression of genes in neural progenitor domains along the dorsal-ventral axis of the spinal cord.

      Reviewer #1 (Recommendations for the authors):

      I have a few small suggestions for improving the presentation of the data.

      (1) It would be helpful to show illustrations and embryo images of all the stages utilized in the analysis in Figures 1A and B.

      (2) It was difficult to distinguish all the different colors in Figures 3B and 4B. Could you label, as in Figure 4, supplements 1D, F?

      (3) I was confused by the position of the color code key for Figure 7D-J, thinking it belonged to panels B and C. Could you put it under the figure/heatmap key so that it is clearly linked to panels D-J?

      Thank you for these suggestions. We have incorporated the third suggestion to improve readability, but were not able to make the first two changes due to space limitations.

      Reviewer #2 (Recommendations for the authors):

      I only have a couple of minor additional suggestions/questions for the authors:

      (1) The authors state that nearly half of the transcripts they found as differentially regulated in SAG-treated embryos were also characterized as spatially regulated in the wild-type embryos. It would be great if the authors could provide more detail here. How many of the transcripts that are differentially regulated along the mediolateral axis of the wild-type are characterized as differentially regulated in the SAG-treated embryos? How does this further break down into where these genes are expressed along the mediolateral and the anterior-posterior axes? I am aware that the authors answer some of these questions already by providing examples, but a more systematic characterisation would be appreciated here.

      We have updated Table S11 to include the anterior-posterior and mediolateral cluster identities of differentially expressed genes in SAG-treated embryos, where applicable. In addition, we have added more discussion of the genes from our SAG analysis that were also found to be spatially patterned in wild-type embryos to the fourth paragraph of the last results section.

      (2) Related to the previous question, the authors nicely demonstrate that SAG treatment of embryos causes many transcriptional changes, including the expression/repression of several transcription factors well-known to mediate spatial patterning, raising the question of which of these effects are directly due to gene regulation by the Shh pathway and which effects are secondary consequences of transcriptional changes of other transcription factors. Similarly, the authors' results also suggest that some genes are only induced in specific parts along the neuraxis, raising the question of why. The authors could attempt some type of regulon-interference approaches to identify further candidates that may mediate these effects.

      This is an excellent suggestion for a future extension of this work, as we agree that validation of the predicted SHH targets, including which targets are direct, indirect, or region-specific, would be required to evaluate the predictions of this scRNA-seq analysis.

      (3) The authors report that they observed 'a previously unreported inhibition of Scube2' upon SAG treatment of the embryos. At least in the spinal cord Scube2 is well-known to be expressed at a distance from the source of Shh secretion (e.g. Kawakami et al. Curr. Biol. 2005), thus the direct or indirect repression by Shh signalling is strongly expected. Moreover, a recent preprint (Collins et al. bioRxiv, https://doi.org/10.1101/469239) suggests that the interaction between Shh and Scube2 can mediate the scale-invariance of Shh patterning. Of note, the authors of this preprint also state that 'upregulation of Shh represses scube2 expression while Shh downregulation increases scube2 expression thus establishing a negative feedback loop.'

      Thank you for this suggestion. We have added these references.

      (4) The authors partition genes based on different diffusion components as being differentially expressed along the mediolateral axis. However, starting from ~e8.5, neural progenitors in the neural tube can be partitioned based on the expression of well-characterised combinatorial sets of transcription factors into molecularly defined progenitor domains that subsequently give rise to functionally distinct types of neurons. How much of this patterning process can the authors capture with their diffusion component analysis and does their data also allow them to capture these finer-grained differences in gene expression along the mediolateral and prospective dorsal-ventral axis of the neural tube that are known to exist?

      This is a very interesting point. We have added a new figure showing UMAPs of the E8.5-9.0 cranial neural plate for a subset of 29 genes (described in Delile et al., 2019) that define distinct neural progenitor domains along the dorsal-ventral axis of the spinal cord (Figure 5—figure supplement 2). We observed that 18 of 20 genes that were detected in the midbrain/r1 region in our dataset were expressed in broad domains along the mediolateral axis of the cranial neural plate that were roughly consistent with their expression domains along the dorsal-ventral axis of the spinal cord. Of these 18 genes, 14 were patterned along both anterior-posterior and mediolateral axes, 2 were patterned only along the mediolateral axis, and 2 were patterned only along the anterior-posterior axis. These results suggest a general correspondence between mediolateral patterning in the cranial neural plate and dorsal-ventral patterning in the spinal cord. However, less refinement of these domains along the mediolateral axis was observed in the cranial neural plate, possibly because the relatively early, pre-closure stages captured by our dataset may be before the establishment of secondary feedback systems that lead to fine-scale patterning of mutually exclusive neural precursor domains. These results are described in the last paragraph of the results section titled “An integrated framework for analyzing cell identity in multiscale space.”

      (5) The authors state that they will not only make the raw sequencing data but also the processed intermediate data files available. This is greatly appreciated as it strongly facilitates the re-use of the data. However, it would be also appreciated if the authors made the computational code publicly available that was used to analyze the data and generate the figure panels in the manuscript.

      We have deposited the processed h5ad files in the GEO database, accession number GSE273804. Additionally, we have made interactive Python notebooks available on our lab GitHub page (https://github.com/ZallenLab). These contain the code used to analyze gene expression and generate the figures in this study, as well as code used to automatically generate customizable links to gene expression images in the Mouse Genome Informatics Gene Expression database. We have updated the Data availability section to reflect these changes.

      Reviewer #3 (Recommendations for the authors):

      (1) Considering that individual progenitor domains in the developing neural tube are typically sharply delineated with few cells exhibiting mixed identities, it is interesting that clustering of single-cell data results in a largely continuous “cloud” of cells. Is this because the early neural plate cells have not yet crystallized their identity, or would clustering based on a smaller set of genes that exhibit high variance across only neural plate cells result in improved granularity, allowing for better characterization and quantification of distinct progenitor subtypes?

      Thank you for raising this interesting point. The apparent continuity of gene expression in the cranial neural plate could reflect a gene signature shared by cranial neural plate cells, and cells may not yet be extensively regionalized into unique populations at these early stages. We now discuss these possibilities in the third paragraph of the discussion.

      (2) Can the authors clarify how neural plate cells were identified and how they were distinguished from the anterior epiblast?

      Cell typing was performed by supervised clustering based on known markers of fate. Cranial neural plate cells were identified by their expression of pan-neural factors (Sox2 and Sox3), early or late neural plate markers (Cdh1 or Cdh2), and the lack of markers associated with non-neural ectodermal cell fates (Grhl2, Krt18, Tfap2a) or other cell types (Ets1, T, Tbx6). Full gene sets used to identify all cell types in our analysis are provided in Supplementary Table 13.
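      The marker logic in this response can be sketched as a per-cell rule. This is an illustration only: the response describes supervised clustering rather than a per-cell filter, and the function name, the expression threshold, and the choice to accept either Sox2 or Sox3 (and either Cdh1 or Cdh2) are assumptions introduced here, not details taken from the study.

```python
# Hypothetical sketch of marker-based cell typing; not the authors' code.
# Gene names come from the response; the threshold and the any/all logic are assumed.

PAN_NEURAL = ("Sox2", "Sox3")               # pan-neural factors
NEURAL_PLATE_STAGE = ("Cdh1", "Cdh2")       # early or late neural plate markers
EXCLUDED = ("Grhl2", "Krt18", "Tfap2a",     # non-neural ectoderm markers
            "Ets1", "T", "Tbx6")            # markers of other cell types


def is_cranial_neural_plate(expression, threshold=1.0):
    """Return True if a cell's expression profile matches the marker rule.

    `expression` maps gene name -> normalized expression value.
    `threshold` is an assumed cutoff for calling a gene "expressed".
    """
    def expressed(gene):
        return expression.get(gene, 0.0) >= threshold

    return (
        any(expressed(g) for g in PAN_NEURAL)
        and any(expressed(g) for g in NEURAL_PLATE_STAGE)
        and not any(expressed(g) for g in EXCLUDED)
    )


# Toy profiles (invented values, for illustration only).
toy_cells = {
    "neural_plate_like": {"Sox2": 3.0, "Cdh2": 2.0},
    "surface_ectoderm_like": {"Sox2": 1.5, "Cdh1": 2.0, "Grhl2": 4.0},
    "mesoderm_like": {"T": 5.0, "Tbx6": 3.0},
}
labels = {name: is_cranial_neural_plate(profile) for name, profile in toy_cells.items()}
```

      In practice such a rule would be applied to cluster-level averages rather than single cells, since dropout in scRNA-seq data makes per-cell marker calls noisy.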

      (3) Did the study identify cells with cranial placode identity? Cranial placodes emerge during the same period, and it would be useful to highlight them in Figure 1.

      Thank you for highlighting this point. Examination of the early placode markers Six1 and Eya1 indicates that cranial placode cells are a subset of the cells in PhenoGraph cluster 17 in our full dataset (Figure 1—figure supplement 1). We now mention this along with other cell types of interest in the last paragraph of the discussion.

      (4) It could be interesting to provide more information about the novel genes identified as differentially expressed along the AP or mediolateral axes. Do they belong to gene families that were not previously implicated in neural patterning, or do they point to novel biological mechanisms controlling neural patterning?

      Diverse gene families are represented by the genes that are patterned along the anterior-posterior and mediolateral axes of the cranial neural plate at these stages, likely due to the large number of genes that are spatially patterned in this tissue. Further investigation of the biological mechanisms suggested by these patterns is an important direction for future work, both in terms of molecularly classifying the genes identified as well as directly investigating their roles in neural patterning using genetic analysis.

      (5) It would be helpful to discuss how the data presented here compare to other relevant single-cell analyses, such as PMC10901739. This would help to highlight aspects that are unique to this study.

      We have added this reference as well as an earlier study from these authors and we discuss how our study complements this work in the introduction.

      (6) The inclusion of single-cell data from control embryos that were cultured for 12 hours is of great interest. The authors should identify the set of genes that are deregulated in cultured cells and, taking advantage of their detailed temporal series, examine whether the maturation of cultured embryos progresses normally or whether there are genes that fail to mature correctly in vitro.

      We agree that an analysis of the impact of ex vivo culture on gene expression would be useful. However, the large difference in the number of cells in our wild-type and cultured embryo datasets, as well as the lack of time-course data for the cultured embryos, could make a comparison between our current cultured and non-cultured embryo datasets difficult to interpret.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors studied how hippocampal connectivity gradients change across the lifespan, and how these relate to memory function and neurotransmitter distributions. They observed that older age was associated with less distinct transitions, and they observed an association between gradient de-differentiation and cognitive decline.

      This is overall an innovative and interesting study to assess gradient alterations across the lifespan and its associations to cognition.

      The paper is well-written, and the methods appear sound and thoughtful. There are several strengths, including the inclusion of two independent cohorts, the use of gradient mapping and alignment techniques, and an overall sound statistical and analysis framework. There are several areas for potential improvements in the paper, and these are listed below:

      We thank the Reviewer for their positive assessment and summary of our work. We address each of the Reviewer’s comments below, and outline the revisions we have made to the manuscript based on the Reviewer’s suggestions.

      (1) The reported D1 associations appear a bit post-hoc in the current work and I was unclear why the authors specifically focussed on dopamine here, as other transmitter systems are similarly present at the level of the hippocampus and implicated in aging.

      Other neurotransmitter systems may indeed be relevant in the context of hippocampal function in aging. In this study, however, we included a specific research question about the DA D1 receptor (D1DR) based on previous research 1) emphasizing the role of DA neuromodulation in maintaining functional network segregation in aging to support cognition (Pedersen et al., 2023), 2) reporting heterogeneous distribution of DA markers across the hippocampus, supporting efficient modulation of distinct behaviors (Dubovyk & Manahan-Vaughan, 2019; Edelmann & Lessmann, 2018; Gasbarri et al., 1994; Kempadoo et al., 2016), and 3) demonstrating the spatial distribution of D1DRs as varying across neocortex along a unimodal-transmodal gradient (Pedersen et al., 2024). To what degree this variation might be reflected in cortico-hippocampal connectivity, however, remained to be investigated. As such, one of the study’s specific aims was to evaluate the spatial distribution of D1DRs as a molecular correlate of the hippocampus’ functional organization. Importantly, we were interested in mapping associations between individual differences in the organization of connectivity and D1DRs. This was uniquely enabled by utilizing the DyNAMiC sample, as it includes structural and functional MRI data in combination with D1DR PET in the same individuals across the adult lifespan (n=180). However, after observing significant spatial correspondence between functional organization and D1DR expressed by the second hippocampal gradient (G2), we did indeed perform complementary analyses with group-averaged data of additional dopamine markers (D2DR from a subsample of our participants, as well as DAT and FDOPA from open sources) to test the generalizability of the original finding. Taken together, the original analyses based on subject-level data and complementary group-level analyses provided support for the interpretation of G2 as a dopaminergic mode.

      We have updated the manuscript to clarify the focus on the D1 receptor and the contribution of including additional DA markers.

      Updated paragraph in the Introduction, pages 5-6:

      “Dopamine (DA) is one of the most important modulators of hippocampus-dependent function(47,48), and influences the brain’s functional architecture through enhancing specificity of neuronal signaling(49). Consistently, there is a DA-dependent aspect of maintained functional network segregation in aging which supports cognition(50). Animal models suggest heterogeneous patterns of DA innervation(51,52) and postsynaptic DA receptors(53), across both transverse and longitudinal hippocampal axes, likely allowing for separation between DA modulation of distinct hippocampus-dependent behaviors(47). Moreover, the human hippocampus has been linked to distinct DA circuits on the basis of long-axis variation in functional connectivity with midbrain and striatal regions(54,55). Taken together with recent findings revealing a unimodal-transmodal organization of the most abundantly expressed DA receptor subtype, D1 (D1DR), across cortex(56), we tested the hypothesis that the organization of hippocampal-neocortical connectivity partly reflects the underlying distribution of hippocampal DA receptors, predicting predominant spatial correspondence for any hippocampal gradient conveying a unimodal-transmodal pattern across cortex.”

      Updated sections in the Results, pages 13-14:

      “Our next aim was to investigate to which extent the distribution of hippocampal DA D1 receptors (D1DRs), measured by [¹¹C]SCH23390 PET in the DyNAMiC(58) sample, may serve as a molecular correlate of the hippocampus’ functional organization.”

      “Complementary analyses were then conducted to further evaluate G2 as a dopaminergic hippocampal mode by utilizing additional DA markers at group-level.”

      Moreover, the authors may be aware that multiple PET tracers are somewhat challenged in the mesiotemporal region. Is this the case for the D1 receptor as well? The hippocampus is a small and complex structure, and PET more of a low res technique so one would want to highlight and discuss the limitations of the correlations with PET maps here and/or evaluate whether the analysis adds necessary findings to the study.

      We thank the Reviewer for raising this point. The lower resolution of PET is indeed a relevant aspect to consider when quantifying D1DR availability in the hippocampus, even though previous research indicates high test-retest reliability of [¹¹C]SCH23390 PET measurement in this region (Kaller et al., 2017). We have now elaborated on PET limitations in the Discussion of the revised manuscript.

      In our study, we made efforts to reduce potential partial volume effects (PVE) by correcting our PET data, and tested spatial associations between our functional gradients and D1DR maps using trend-surface modelling (TSM), rather than through voxel-wise comparisons. This allowed us to evaluate the spatial correspondence between functional connectivity and D1DRs at a level of spatial trends, estimated using TSM models computed at increasing levels of complexity. The results showed consistent spatial overlap between G2 and D1DRs across these models, that is, across spatial trends described at coarser-to-finer scales. Furthermore, this was replicated across several DA markers with PET and SPECT data from independent samples.

      Taken together, we agree with the Reviewer that the spatial correspondence observed between G2 and hippocampal D1DRs should be interpreted in the context of resolution-related limitations inherent to PET imaging. However, we strongly believe that our DA analyses offer valuable insight into the molecular underpinnings of hippocampal functional organization.

      Updated paragraph in the Discussion, pages 25-26:

      “We discovered that G2, specifically, manifested organizational principles shared among function, behavior, and neuromodulation. Meta-analytical decoding reproduced a unimodal-associative axis across G2 (Figure 3B), and analyses in relation to the distribution of D1DRs – which vary across cortex along a unimodal-transmodal axis(76,77) – demonstrated topographic correspondence both at the level of individual differences and across the group. It should, however, be acknowledged that PET imaging in the hippocampus is associated with resolution-related limitations, although previous research indicates high test-retest reliability of [¹¹C]SCH23390 PET to quantify D1DR availability in this region(78). As such, mapping the distribution of hippocampal D1DRs at a fine spatial scale remains challenging, and replication of our results in terms of overlap with G2 is needed in independent samples. Here, we evaluated the observed spatial overlap between G2 topography and D1DRs across multiple TSM model orders, showing correspondence between modalities from simple to more complex parameterizations of their spatial properties. Topographic correspondence was additionally observed between G2 and other DA markers from independent datasets (Figure 3B), suggesting that G2 may constitute a mode reflecting a dopaminergic phenotype, which contributes to the currently limited understanding of its biological underpinnings.”

      From my (perhaps somewhat biased) perspective, it might be valuable to instead or in addition look at measures of hippocampal microstructure and how these relate to the functional aging effects. This could be done, if available, using data from the same subjects (eg based on quantitative MRI contrasts and/or structural MRI) and/or using contextualization findings as implemented in eg hippomaps.readthedocs.io

      We thank the Reviewer for this suggestion. We performed additional analyses investigating the spatial overlap between our connectivity gradients and estimates of hippocampal microstructure, computed as the ratio of T1- over T2-weighted (T1w/T2w) images (Glasser & Van Essen, 2011; vos de Wael et al., 2018). Analyses of spatial correspondence then followed the TSM-based method used to test the spatial overlap between functional connectivity gradients and D1DR distribution. Applying TSM to the T1w/T2w image computed for each participant yielded subject-level model parameters describing microstructure topography, which were then entered as predictors of connectivity topography in multivariate GLMs (separate models for each gradient and hemisphere, 6 models in total).

      Analyses revealed that microstructure of the right hippocampus significantly predicted gradient topography of right-hemisphere G1 (F = 1.325, p = 0.034), while no other links between connectivity gradients and microstructure emerged as significant (F 0.930-1.184, ps 0.706-0.079).

      These results, suggesting an association along the anteroposterior axis, deviate from previous findings linking hippocampal microstructure to G3-like, medial-lateral, connectivity organization (vos de Wael et al., 2018). As we believe that comprehensive analyses of our gradients in relation to microstructure across the lifespan would be best addressed in future work, we have not included these analyses of microstructure in the revised manuscript.

      (2) Can the authors clarify why they did not replicate based on cohorts that are more widely used in the community and open access, such as CamCAN and/or HCP-Aging? It might connect their results with other studies if an attempt was made to also show that findings persist in either of these repositories.

      We agree with the Reviewer that replication in samples such as CamCAN and/or HCP-Aging would provide valuable opportunities to connect our findings with those of other studies using those datasets. Here, we included the Betula dataset (Nilsson et al., 2004) as our replication sample, as it was immediately available to us and included a large sample of adults of comparable age, as well as a word recall episodic memory task closely aligned with the one included in DyNAMiC. Importantly, leveraging the Betula dataset as our replication sample allows us to link our findings to a wide range of previous studies central to the understanding of neurocognitive aging in general, and hippocampal aging in particular (Nyberg, 2017; Nyberg et al., 2020). Betula is a large longitudinal project that has been tracking individuals since 1988, and is part of the National E-infrastructure for Aging Research (NEAR: www.near-aging.se), through which data from several Swedish studies are made available to both national and international researchers. While we acknowledge the value of extending replication efforts to datasets like CamCAN and HCP-Aging, we emphasize the significant contribution of having replicated our connectivity gradients in the Betula dataset.

      (3) The authors applied TSM and related these parameters to topographic changes in the gradients. I was wondering whether and how such an approach controls for autocorrelation present in both the PET map and gradients. Could the authors clarify?

      The Reviewer raises an important topic in spatial autocorrelation. The TSM approach used to parameterize the topography of the functional gradients and D1DR distribution, and to test the spatial correspondence between modalities, did not include any specific method to control for autocorrelation. Here, we highlight two aspects of our study in relation to this point. First, we demonstrated in the Supplementary information (S. Figure 4) that autocorrelation induced by spatial smoothing likely has limited effects on overall gradient topography and the ability of TSM parameters to capture meaningful inter-individual differences in terms of age. Second, in the case of spatial overlap effects being significantly impacted by autocorrelation, we would expect the association between right-hemisphere G2 and D1DR topography to similarly emerge for G2 in the left hemisphere. The absence of such an association may speak to a limited effect of spatial autocorrelation.

      (4) The TSM approach quantifies the gradients in terms of x/y/z direction in a cartesian coordinate system. Wouldn't a shape intrinsic coordinate system in the hippocampus also be interesting, and perhaps even be more efficient to look at here (see eg DeKraker 2022 eLife or Paquola et al 2020 eLife)?

      This is a very relevant question and we appreciate the Reviewer’s suggestion. We recognize that there may be several benefits associated with adopting a shape-intrinsic coordinate system when characterizing effects in the hippocampus, given its curved/folded anatomy. Approaches like the ones adopted in DeKraker et al., 2022 and Paquola et al., 2020, utilize geodesic coordinate frameworks to represent the hippocampus in surface space, enabling mapping of connectivity onto the hippocampal surface while respecting its inherent curvature and topology. We anticipate that quantifying gradients within such a framework would especially benefit identification of connectivity change across the hippocampal surface relative to reference points such as subfield boundaries, while minimizing effects of interindividual differences in hippocampal shape and folding. In our study, hippocampal gradients and their associated cortical patterns were computed in volumetric space, with TSM subsequently used to parameterize the change in connectivity along these gradients. This indeed yields a description of connectivity change within a coordinate system less specific to hippocampal anatomy, but may favor generalizability and integration with previous gradient findings within and beyond the hippocampus (e.g., Przeździk et al., 2019; Tian et al., 2020; Katsumi et al., 2023; Navarro Schröder et al., 2015), as well as connections with broader neuroimaging frameworks through techniques such as meta-analytical decoding. In our view, the different coordinate frameworks offer complementary insight into hippocampal organization, and while we have opted to not undertake novel analyses to explore our gradients within a geodesic coordinate system for the purposes of this paper, we recognize the importance of such evaluation of our gradients in future analyses. We have made updates to the Discussion in the revised manuscript on this topic (pages 23-24):

      “Greater anatomical specificity, with more precise characterization of connectivity in relation to subfield boundaries while minimizing effects of inter-individual differences in hippocampal shape and folding, might be achieved by adopting techniques implementing a geodesic coordinate system to represent effects within the hippocampus(68,69).”

      Reviewer #2 (Public Review):

      Summary:

      This paper derives the first three functional gradients in the left and right hippocampus across two datasets. These gradient maps are then compared to dopamine receptor maps obtained with PET, associated with age, and linked to memory. Results reveal links between dopamine maps and gradient 2, age with gradients 1 and 2, and memory performance.

      Strengths:

      This paper investigates how hippocampal gradients relate to aging, memory, and dopamine receptors, which are interesting and important questions. A strength of the paper is that some of the findings were replicated in a separate sample.

      Weaknesses:

      The paper would benefit from added clarification on the number of models/comparisons for each test. Furthermore, it would be helpful to clarify whether or not multiple comparison correction was performed and - if so - what type or - if not - to provide a justification. The manuscript would furthermore benefit from code sharing and clarifying which results did/did not replicate.

      We thank the Reviewer for their positive assessment and suggestions regarding further clarifications. We have addressed the Reviewer’s comments in a point-by-point manner under the “Recommendations for the authors” section.

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors analyzed the complex functional organization of the hippocampus using two separate adult lifespan datasets. They investigated how individual variations in the detailed connectivity patterns within the hippocampus relate to behavioral and molecular traits. The findings confirm three overlapping hippocampal gradients and reveal that each is linked to established functional patterns in the cortex, the arrangement of dopamine receptors within the hippocampus, and differences in memory abilities among individuals. By employing multivariate data analysis techniques, they identified older adults who display a hippocampal gradient pattern resembling that of younger individuals and exhibit better memory performance compared to their age-matched peers. This underscores the behavioral importance of maintaining a specific functional organization within the hippocampus as people age.

      Strengths:

      The evidence supporting the conclusions is overall compelling, based on a unique dataset, rich set of carefully unpacked results, and an in-depth data analysis. Possible confounds are carefully considered and ruled out.

      Weaknesses:

      No major weaknesses. The transparency of the statistical analyses could be improved by explicitly (1) stating what tests and corrections (if any) were performed, and (2) justifying the elected statistical approaches. Further, some of the findings related to the DA markers are borderline statistically significant and therefore perhaps less compelling but they line up nicely with results obtained using experimental animals and I expect the small effect sizes to be largely related to the quality and specificity of the PET data rather than the derived functional connectivity gradients.

      We thank the Reviewer for the thoughtful summary and positive assessment of our work. To increase transparency of the statistical analyses, we have in the revised manuscript added information regarding statistical tests and corrections for multiple comparisons. In the Results, p-values were reported at an uncorrected statistical threshold, and we have in the revised manuscript included the corresponding p-values adjusted for multiple comparisons using the Benjamini-Hochberg method to control the false discovery rate (FDR). Finally, in the revised manuscript, we have now elaborated on the potential limitations of our PET analyses and we include the updated paragraph below.

      Addition made to the Results section, page 13:

      “Individual maps of D1DR binding potential (BP) were also submitted to TSM, yielding a set of spatial model parameters describing the topographic characteristics of hippocampal D1DR distribution for each participant. D1DR parameters were subsequently used as predictors of gradient parameters in one multivariate GLM per gradient (in total 6 GLMs, controlled for age, sex, and mean FD). Results are reported with p-values at an uncorrected statistical threshold and p-values after adjustment for multiple comparisons using the Benjamini-Hochberg method to control the false discovery rate (FDR).”

      Addition made to the Results section, page 15:

      “Effects of age on gradient topography were assessed using multivariate GLMs including age as the predictor and gradient TSM parameters as dependent variables (controlling for sex and mean frame-wise displacement; FD). One model was fitted per gradient and hemisphere, each model including all TSM parameters belonging to a gradient (in total, 6 GLMs).”

      Addition made to the Results section, page 17:

      “Models were assessed separately for left and right hemispheres, across the full sample and within age groups, yielding eight hierarchical models in total. Results are reported with p-values at an uncorrected statistical threshold and p-values after FDR adjustment.”

      Updated paragraph in the Discussion, pages 25-26:

      “We discovered that G2, specifically, manifested organizational principles shared among function, behavior, and neuromodulation. Meta-analytical decoding reproduced a unimodal-associative axis across G2 (Figure 3B), and analyses in relation to the distribution of D1DRs – which vary across cortex along a unimodal-transmodal axis(76,77) – demonstrated topographic correspondence both at the level of individual differences and across the group. It should, however, be acknowledged that PET imaging in the hippocampus is associated with resolution-related limitations, although previous research indicates high test-retest reliability of [¹¹C]SCH23390 PET to quantify D1DR availability in this region(78). As such, mapping the distribution of hippocampal D1DRs at a fine spatial scale remains challenging, and replication of our results in terms of overlap with G2 is needed in independent samples. Here, we evaluated the observed spatial overlap between G2 topography and D1DRs across multiple TSM model orders, showing correspondence between modalities from simple to more complex parameterizations of their spatial properties. Topographic correspondence was additionally observed between G2 and other DA markers from independent datasets (Figure 3B), suggesting that G2 may constitute a mode reflecting a dopaminergic phenotype, which contributes to the currently limited understanding of its biological underpinnings.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Please see the comments in the public review.

      We thank the Reviewer for their comments and recommendations, and have addressed them in the “Public review” section.

      Reviewer #2 (Recommendations For The Authors):

      (1) All statistical analyses are based on linear regressions using trend surface modeling (TSM) parameters that parameterize gradients at the subject level. These models resulted in 9 parameters for gradient 1 and 12 parameters each for gradients 2 and 3. The text states that 'Effects of age on gradient topography was assessed using multivariate GLMs including age as the predictor and gradient TSM parameters as dependent variables (controlling for sex and mean frame-wise displacement; FD)'. Please clarify whether these GLMs were fitted separately for each TSM parameter (i.e., 9+12+12=33 models for both left and right = 66 total models) or on the overall model?

      We appreciate the Reviewer’s request for clarification on this matter. These GLMs were fitted on the overall TSM model, that is, through one GLM per gradient (3) and hemisphere (2), each one including all TSM parameters belonging to a gradient (in total, 6 GLMs).

      In the revised manuscript, we have added more details to the Results section, page 15: “Effects of age on gradient topography were assessed using multivariate GLMs including age as the predictor and gradient TSM parameters as dependent variables (controlling for sex and mean frame-wise displacement; FD). One model was fitted per gradient and hemisphere, each model including all TSM parameters belonging to a gradient (in total, 6 GLMs).”

      (2) Similarly, for memory it appears that multiple models were performed (left and right, young, middle-aged, old, whole groups). Please clarify whether and how multiple comparison correction was performed in this case.

      In the revised manuscript, we have now specified the number of analyses conducted in relation to memory performance. We have also clarified that p-values were reported at an uncorrected statistical threshold, and we have in the revised manuscript included the corresponding p-values adjusted for multiple comparisons using the Benjamini-Hochberg method to control the FDR.

      Updated section in the Results, page 17:

      “Models were assessed separately for left and right hemispheres, across the full sample and within age groups, yielding eight hierarchical models in total. Results are reported with p-values at an uncorrected statistical threshold and p-values after FDR adjustment.”
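      For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned above, the following is a minimal sketch of how uncorrected p-values from a family of models can be converted to FDR-adjusted p-values. The function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the manuscript; the authors' own analysis code may compute the adjustment differently.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical uncorrected p-values from four models.
p_uncorrected = [0.01, 0.04, 0.03, 0.20]
p_fdr = benjamini_hochberg(p_uncorrected)
print(p_fdr)
```

      An effect is then reported as surviving correction when its adjusted p-value falls below the chosen FDR level, which is why an association can be significant at the uncorrected threshold yet not after adjustment.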

      (3) Although I applaud the authors for their replication efforts, the results do not appear to replicate well. For example, memory was linked to gradient 2 in the whole group but to gradient 1 in the young group. Furthermore, dopamine was linked to gradient 2 in the right but not the left hemisphere. Although the overall group-level gradients were very stable between the two datasets, it is not clear whether the age findings replicated and the memory subgroup findings only replicated at trend level for memory and only partially replicated at the TSM parameter level.

      We thank the Reviewer for highlighting the inclusion of a replication dataset as a strength of our study, and we appreciate the recommendation to clarify to which extent results replicated. We provide a response to the Reviewer’s points below, and specify the revisions made to the manuscript in relation to this topic.

      The main aim of our study was to characterize the topographic organization of functional hippocampal-neocortical connectivity within the hippocampus across the adult lifespan, as previous studies have limited their focus to younger adults. Given the lack of previous studies for comparison, together with our identification of a novel secondary long-axis connectivity gradient (G2) taking precedence over the previously established medial-lateral G3, we included the Betula sample (Nilsson et al., 2004) for the purpose of replication. There was a high level of consistency between our main dataset and our replication dataset, with gradients 1-3 in left and right hemispheres identified in both samples.

      Further use of the replication dataset, beyond the identification of the connectivity gradients, was originally not planned. As such, not all subsequent analyses in the main dataset were conducted in the replication dataset. However, we found it critical to evaluate the observation that older individuals who maintained a youth-like gradient topography also exhibited higher levels of memory performance in an independent sample. This was possible given that the replication dataset included a comparable number of participants in similar ages and a word recall episodic memory task corresponding well to the one used in DyNAMiC. Overall, we conclude that these analyses replicated well across samples. Firstly, topography of left-hemisphere G1 informed the classification of older adults into youth-like and aged subgroups in both samples. Furthermore, in both samples, we observed that the older subgroups identified based on G1 topography also exhibited the youth-like vs. aged pattern in G2 topography. This pattern was, however, also evident in G3 only in the main sample, possibly suggesting a limited contribution of G3 topography in determining overall functional profiles in older age. In terms of the behavioral relevance of maintaining youth-like gradient topography in older age, we observed effects on word recall performance in both samples, although the Reviewer correctly points out that the difference between subgroups was significant at trend-level (p = 0.058) in the replication dataset. While this indeed underscores the importance of replication efforts in additional samples, we argue that the pattern observed in our replication dataset is overall consistent with, and conveys effects in the expected direction based on, the original observations in our main dataset.

      In revising the manuscript, we have performed additional analyses for replication purposes in terms of memory. Originally, we observed a significant association between G2 topography and episodic memory across the main sample. However, this effect did not remain significant after FDR adjustment for multiple comparisons. To evaluate this association further, we conducted a corresponding hierarchical multiple regression analysis in the replication dataset, which supported a role of G2 in memory (Adj. R² = 0.368, ΔR² = 0.081, F = 1.992, p = 0.028). Together, these analyses suggest that inter-individual differences in episodic memory performance may in part be explained by the spatial characteristics of G2 across the adult lifespan, although increased statistical power in relation to the large number of TSM parameters included in the hierarchical regression models may be needed to explore this association in smaller, age-stratified, groups. Relatedly, it is worth mentioning that higher levels of memory performance in older age were linked to the maintenance of youth-like G2 topography in both our main and replication datasets.

      In parallel, topographic parameters of G1 predicted memory performance in the younger adults, which successfully replicates TSM-based results previously reported in Przeździk et al., 2019. Although similar associations were not evident within the other age groups, a link between G1 topography and memory was demonstrated in older age based on a) the identification of individuals maintaining a youth-like G1 profile and higher levels of memory, within which b) memory performance was, as in young adults, significantly predicted by G1 topography.

      The spatial correspondence between G2 topography and distribution of hippocampal D1DRs was lateralized to the right, and as the Reviewer points out, as such did not replicate across hemispheres. To which extent replication across hemispheres should be expected in this case is, however, difficult to determine. Lateralization and/or hemispheric asymmetry is commonly observed in numerous hippocampal features, from the molecular level to its functional involvement in behavior (Nemati et al., 2023; Persson & Söderlund, 2015), including various dopaminergic markers tested in the animal literature (Afonso et al., 1993; Sadeghi et al., 2017). Yet, potential differences between hemispheres in D1DR availability and the spatial distribution of receptors along hippocampal axes remain less studied in humans. More data is therefore needed to determine the nature of this right-hemisphere lateralization.

      In sum, we argue that our results show a good level of replication across independent datasets and across analyses in our main dataset. Whereas this study did not attempt replication of all analyses conducted in the main dataset, it has through replication across independent samples provided support for its main findings – the organization of hippocampal-neocortical connectivity along three main hippocampal gradients across the adult lifespan, and the gradient topography-based identification of older individuals maintaining a youth-like hippocampal organization in older age.

      The revised manuscript includes edits made to incorporate the new analyses and clarifications of observations in relation to memory.

      In the Results, page 17:

      “Observing that the association between G2 and memory did not remain significant after FDR adjustment, we performed the same analysis in our replication dataset, which also included episodic memory testing. Consistent with the observation in our main dataset, G2 significantly predicted memory performance (Adj. R² = 0.368, ΔR² = 0.081, F = 1.992, p = 0.028) over and above covariates and topography of G1. Here, the analysis also showed that G1 topography predicted performance across the sample (Adj. R² = 0.325, ΔR² = 0.112, F = 3.431, p < 0.001).”
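      The ΔR² and F values reported above come from a hierarchical regression, in which a block of predictors is added to a baseline model and the gain in explained variance is tested. The sketch below shows the general computation under simple assumptions: ordinary least squares, simulated data, and hypothetical names (`r_squared`, `r2_change`, `covariates`, `g1`, `g2`, `memory`). It is meant to convey the logic, not to reproduce the authors' models or results.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def r2_change(X_base, X_added, y):
    """Delta R^2 and F-change when X_added is entered after X_base."""
    X_full = np.column_stack([X_base, X_added])
    r2_base = r_squared(X_base, y)
    r2_full = r_squared(X_full, y)
    q = X_added.shape[1]                      # number of added predictors
    df_resid = len(y) - X_full.shape[1] - 1   # residual df of the full model
    delta = r2_full - r2_base
    f_change = (delta / q) / ((1.0 - r2_full) / df_resid)
    return delta, f_change, (q, df_resid)

# Simulated example: 3 covariates, 9 "G1" and 12 "G2" parameters (hypothetical).
rng = np.random.default_rng(1)
n = 200
covariates = rng.normal(size=(n, 3))
g1 = rng.normal(size=(n, 9))
g2 = rng.normal(size=(n, 12))
memory = covariates @ np.array([1.0, 0.5, 0.0]) + g2[:, 0] + rng.normal(size=n)

base = np.column_stack([covariates, g1])      # step 1: covariates and G1
delta, f_change, dfs = r2_change(base, g2, memory)  # step 2: add G2
print(round(delta, 3), round(f_change, 3), dfs)
```

      The p-value for the change would be obtained from an F distribution with the returned degrees of freedom. Because the F-change divides ΔR² by the number of added parameters, a block with many predictors needs a sizeable ΔR² or a large sample to reach significance, which is consistent with the authors' remark about statistical power in smaller, age-stratified groups.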

      In the Discussion, page 26:

      “Results linked both G1 and G2 to episodic memory, suggesting complementary contributions of these two overlapping long-axis modes. Considered together, analyses in the main and replication datasets indicated a role of G2 topography in memory across the adult lifespan, independent of age. A similar association with G1 was only evident across the entire sample in the replication dataset, whereas results in the main sample seemed to emphasize a role of youth-like G1 topography in memory performance. In line with previous research, memory was successfully predicted by G1 topography in young adults(30), and similarly predicted by G1 in older adults exhibiting a youth-like functional profile.”

      (4) Please share the data and code and add a description of data and code availability in the manuscript.

      We have now made our code available, and added a statement on data and code availability in the revised manuscript.

      On page 37: “Data from the DyNAMiC study are not publicly available. Access to the original data may be shared upon request from the Principal investigator, Dr. Alireza Salami. The Matlab, R, and FSL codes used for analyses included in this study are openly available at https://github.com/kristinnordin/hcgradients. Computation of gradients was done using the freely available toolbox ConGrads: https://github.com/koenhaak/congrads.”

      Reviewer #3 (Recommendations For The Authors):

      Please see the comments in the public review.

      We thank the Reviewer for their comments and recommendations, and have addressed them in the “Public review” section.

      References

      Afonso, D., Santana, C., & Rodriguez, M. (1993). Neonatal lateralization of behavior and brain dopaminergic asymmetry. Brain Research Bulletin, 32(1), 11–16. https://doi.org/10.1016/0361-9230(93)90312-Y

      DeKraker, J., Haast, R. A., Yousif, M. D., Karat, B., Lau, J. C., Köhler, S., & Khan, A. R. (2022). Automated hippocampal unfolding for morphometry and subfield segmentation with HippUnfold. eLife, 11, e77945. https://doi.org/10.7554/eLife.77945

      Dubovyk, V., & Manahan-Vaughan, D. (2019). Gradient of expression of dopamine D2 receptors along the dorso-ventral axis of the hippocampus. Frontiers in Synaptic Neuroscience, 11. https://doi.org/10.3389/fnsyn.2019.00028

      Edelmann, E., & Lessmann, V. (2018). Dopaminergic innervation and modulation of hippocampal networks. Cell and Tissue Research, 373(3), 711–727. https://doi.org/10.1007/s00441-018-2800-7

      Gasbarri, A., Verney, C., Innocenzi, R., Campana, E., & Pacitti, C. (1994). Mesolimbic dopaminergic neurons innervating the hippocampal formation in the rat: A combined retrograde tracing and immunohistochemical study. Brain Research, 668(1), 71–79. https://doi.org/10.1016/0006-8993(94)90512-6

      Glasser, M. F., & Van Essen, D. C. (2011). Mapping Human Cortical Areas In Vivo Based on Myelin Content as Revealed by T1- and T2-Weighted MRI. Journal of Neuroscience, 31(32), 11597–11616. https://doi.org/10.1523/JNEUROSCI.2180-11.2011

      Kaller, S., Rullmann, M., Patt, M., Becker, G.-A., Luthardt, J., Girbardt, J., Meyer, P. M., Werner, P., Barthel, H., Bresch, A., Fritz, T. H., Hesse, S., & Sabri, O. (2017). Test–retest measurements of dopamine D1-type receptors using simultaneous PET/MRI imaging. European Journal of Nuclear Medicine and Molecular Imaging, 44(6), 1025–1032. https://doi.org/10.1007/s00259-017-3645-0

      Katsumi, Y., Zhang, J., Chen, D., Kamona, N., Bunce, J. G., Hutchinson, J. B., Yarossi, M., Tunik, E., Dickerson, B. C., Quigley, K. S., & Barrett, L. F. (2023). Correspondence of functional connectivity gradients across human isocortex, cerebellum, and hippocampus. Communications Biology, 6(1), Article 1. https://doi.org/10.1038/s42003-023-04796-0

      Kempadoo, K. A., Mosharov, E. V., Choi, S. J., Sulzer, D., & Kandel, E. R. (2016). Dopamine release from the locus coeruleus to the dorsal hippocampus promotes spatial learning and memory. Proceedings of the National Academy of Sciences, 113(51), 14835–14840. https://doi.org/10.1073/pnas.1616515114

      Navarro Schröder, T., Haak, K. V., Zaragoza Jimenez, N. I., Beckmann, C. F., & Doeller, C. F. (2015). Functional topography of the human entorhinal cortex. eLife, 4, e06738. https://doi.org/10.7554/eLife.06738

      Nemati, S. S., Sadeghi, L., Dehghan, G., & Sheibani, N. (2023). Lateralization of the hippocampus: A review of molecular, functional, and physiological properties in health and disease. Behavioural Brain Research, 454, 114657. https://doi.org/10.1016/j.bbr.2023.114657

      Nilsson, L.-G., Adolfsson, R., Bäckman, L., Frias, C. M. de, Molander, B., & Nyberg, L. (2004). Betula: A Prospective Cohort Study on Memory, Health and Aging. Aging, Neuropsychology, and Cognition, 11(2–3), 134–148. https://doi.org/10.1080/13825580490511026

      Nyberg, L. (2017). Functional brain imaging of episodic memory decline in ageing. Journal of Internal Medicine, 281(1), 65–74. https://doi.org/10.1111/joim.12533

      Nyberg, L., Boraxbekk, C.-J., Sörman, D. E., Hansson, P., Herlitz, A., Kauppi, K., Ljungberg, J. K., Lövheim, H., Lundquist, A., Adolfsson, A. N., Oudin, A., Pudas, S., Rönnlund, M., Stiernstedt, M., Sundström, A., & Adolfsson, R. (2020). Biological and environmental predictors of heterogeneity in neurocognitive ageing: Evidence from Betula and other longitudinal studies. Ageing Research Reviews, 64, 101184. https://doi.org/10.1016/j.arr.2020.101184

      Paquola, C., Benkarim, O., DeKraker, J., Larivière, S., Frässle, S., Royer, J., Tavakol, S., Valk, S., Bernasconi, A., Bernasconi, N., Khan, A., Evans, A. C., Razi, A., Smallwood, J., & Bernhardt, B. C. (2020). Convergence of cortical types and functional motifs in the human mesiotemporal lobe. eLife, 9, e60673. https://doi.org/10.7554/eLife.60673

      Pedersen, R., Johansson, J., Nordin, K., Rieckmann, A., Wåhlin, A., Nyberg, L., Bäckman, L., & Salami, A. (2024). Dopamine D1-Receptor Organization Contributes to Functional Brain Architecture. Journal of Neuroscience, 44(11). https://doi.org/10.1523/JNEUROSCI.0621-23.2024

      Pedersen, R., Johansson, J., & Salami, A. (2023). Dopamine D1-signaling modulates maintenance of functional network segregation in aging. Aging Brain, 3, 100079. https://doi.org/10.1016/j.nbas.2023.100079

      Persson, J., & Söderlund, H. (2015). Hippocampal hemispheric and long-axis differentiation of stimulus content during episodic memory encoding and retrieval: An activation likelihood estimation meta-analysis. Hippocampus, 25(12), 1614–1631. https://doi.org/10.1002/hipo.22482

      Przeździk, I., Faber, M., Fernández, G., Beckmann, C. F., & Haak, K. V. (2019). The functional organisation of the hippocampus along its long axis is gradual and predicts recollection. Cortex, 119, 324–335. https://doi.org/10.1016/j.cortex.2019.04.015

      Sadeghi, L., Rizvanov, A. A., Salafutdinov, I. I., Dabirmanesh, B., Sayyah, M., Fathollahi, Y., & Khajeh, K. (2017). Hippocampal asymmetry: Differences in the left and right hippocampus proteome in the rat model of temporal lobe epilepsy. Journal of Proteomics, 154, 22–29. https://doi.org/10.1016/j.jprot.2016.11.023

      Tian, Y., Margulies, D. S., Breakspear, M., & Zalesky, A. (2020). Topographic organization of the human subcortex unveiled with functional connectivity gradients. Nature Neuroscience, 1–12. https://doi.org/10.1038/s41593-020-00711-6

      vos de Wael, R., Larivière, S., Caldairou, B., Hong, S.-J., Margulies, D. S., Jefferies, E., Bernasconi, A., Smallwood, J., Bernasconi, N., & Bernhardt, B. C. (2018). Anatomical and microstructural determinants of hippocampal subfield functional connectome embedding. Proceedings of the National Academy of Sciences, 115(40), 10154–10159. https://doi.org/10.1073/pnas.1803667115

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment 

      This study presents a valuable finding regarding the role of life history differences in determining population size and demography. The evidence for the claims is still partially incomplete, with concerns about generation times and population structure. Nonetheless, the work will be of considerable interest to biologists thinking about the evolutionary consequences of life history changes.

      Thank you. We have addressed the generation time and population structure issues in detail in our revision and hope that you, like us, find them to be of sufficiently low concern (i.e., they are not driving the results) that they do not overshadow the main findings and conclusions.

      The opportunity to make in-depth revisions also helped the manuscript in two ways unanticipated by both us and the reviewers. First, KW made a mistake in the original analysis of phylogenetic signal, and catching that error simplifies that aspect of the study (there is none in our measured variables). Second, in June 2024 Hilgers et al. (2024; https://doi.org/10.1101/2024.06.17.599025) posted an important manuscript to bioRxiv noting the possibility of false population size peaks in PSMC analyses using the standard default settings. Our results had three of those, which we have eliminated. Neither of these issues affects the overall conclusions, but their resolution improves the work.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This interesting study applies the PSMC model to a set of new genome sequences for migratory and nonmigratory thrushes and seeks to describe differences in the population size history among these groups. The authors create a set of summary statistics describing the PSMC traces - mean and standard deviation of N<sub>e</sub>, plus a set of metrics describing the shape of the oldest N<sub>e</sub> peak - and use these to compare across migratory and resident species (taking single samples sequenced here as representative of the species). The analyses are framed as supporting or refuting aspects of a biogeographic model describing colonization dynamics from tropical to temperate North and South America. 

      Strengths: 

      At a technical level, the sequencing and analysis up through PSMC looks good and the paper is engaging and interesting to read as an introduction to some verbal biogeographic models of avian evolution in the Pleistocene.

      The core findings - higher and more variable N<sub>e</sub> in migratory species - seem robust, and the biogeographic explanation is plausible.  

      Thanks. We thought so as well. Our analyses go beyond being simply descriptive and test some simple hypotheses, including a biogeographic+ecological expansion opportunity gained in some lineages through the adoption of a seasonal migration life-history strategy.  

      Weaknesses: 

      I did not find the analyses particularly persuasive in linking specific aspects of clade-level PSMC patterns causally to evolutionary driving forces. To their credit, the authors have anticipated my main criticism in the discussion. This is that variation in population size inferred by methods like PSMC is in "effective" terms, and the link between effective and census population size is a morass of bias introduced by population structure and selection so robustly connecting specific aspects of PSMC traces to causal evolutionary forces is somewhere between extremely difficult and impossible.  

      As R1 notes, we do not attempt to link effective population sizes and census sizes (though we do discuss this), and we are also careful to discuss correlated rather than causative factors when going beyond the overarching hypotheses regarding life-history strategy.

      Population structure is the most obvious force that can generate large N<sub>e</sub> changes mimicking the census-size-focused patterns the authors discuss. The authors argue in the discussion that since they focus on relatively deep time (>50kya at least, with most analyses focusing on the 5mya - 500kya range) population structure is "likely to become less important", and the resident species are usually more structured today (true) which might bias the findings against the observed higher N<sub>e</sub> in migrants.

      To clarify, the patterns we discuss are entirely related to effective population size, not census size. But, yes, this is why we’ve given population structure its own section in the Discussion.

      But is structure really unimportant in driving PSMC results at these specific timescales? There is no numerical analysis presented to support the claim in this paper. The biogeographic model of increased temperate-latitude land area supporting higher populations could yield high N<sub>e</sub> via high census size, but shifts in population structure (for example, from one large panmictic population to a series of isolated refugial populations as a result of glaciation-linked climate changes) could plausibly create elevated and more variable N<sub>e</sub>. Is it more land area and ecological release leading to a bigger and faster initial N<sub>e</sub> bump, or is it changes in population connectivity over time at expanding range edges, or is the whole single-bump PSMC trace an artifact of the dataset size, or what? The authors have convinced me that the N<sub>e</sub> history of migratory thrushes is on average very different from nonmigrant thrushes, but beyond that it's unclear what exactly we've learned here about the underlying process.  

      We do not argue that population structure is unimportant, only that it is less important as one goes into deeper time. Further, we agree with the reviewer’s observation above that structure is more likely to bias nonmigrant estimates of N<sub>e</sub>. In other words, following Li & Durbin’s (2011) simulations, we interpret that an inflated N<sub>e</sub> due to structure should occur more often among residents. We have clarified this in the revision. We also agree that what we’ve learned about the underlying process is not entirely clear, but as we stated, population structure does not seem to be the main driver, and there is evidence that both biogeographic and ecological factors are involved. With this being the first time that these questions have been asked, we think we’ve made an important advance and that we’ve opened a number of avenues for future study.

      It is also important to consider the time scales involved and the sampling regime. Glacial-interglacial cycles averaged ~100 Kyr back to 0.74 Mya and then averaged ~41 Kyr from then back to 2.47 Mya; about 50-60 of these cycles occurred (Lisiecki & Raymo 2005: fig. 4). This probably caused a lot of population structuring and mixing in these lineages. In addition, in the PSMC output from one of our lineages, C. ustulatus swainsonii, we find that there are 54 time segments sampled for the Pleistocene, indicating the inadequacy of this method to reflect fine-scale changes and suggesting that each estimate is capturing a lot of both phenomena, structuring and mixing. We have added this to the revision.
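
      As a rough check of the figures quoted above, the arithmetic can be sketched as follows. This is only an illustration using the round numbers from the text; the variable names are ours, and dividing the interval evenly among the 54 segments is a simplifying assumption (PSMC time segments are uneven, so the mean span is only a coarse indicator of resolution).

```python
# Back-of-the-envelope count of glacial-interglacial cycles and the mean span
# of a PSMC time segment, using only the round figures quoted in the text.

LATE_CYCLE_KYR = 100.0   # ~100 kyr cycles from the present back to 0.74 Mya
EARLY_CYCLE_KYR = 41.0   # ~41 kyr cycles from 0.74 Mya back to 2.47 Mya
SWITCH_KYA = 740.0       # 0.74 Mya, expressed in kyr
START_KYA = 2470.0       # 2.47 Mya, expressed in kyr
PSMC_SEGMENTS = 54       # Pleistocene time segments reported for C. ustulatus swainsonii

late_cycles = SWITCH_KYA / LATE_CYCLE_KYR                   # cycles in the 100-kyr regime
early_cycles = (START_KYA - SWITCH_KYA) / EARLY_CYCLE_KYR   # cycles in the 41-kyr regime
total_cycles = late_cycles + early_cycles

# Assumption: segments treated as equal in length, which PSMC segments are not.
mean_segment_kyr = START_KYA / PSMC_SEGMENTS

print(f"approximate number of cycles: {total_cycles:.1f}")
print(f"mean span per PSMC segment: {mean_segment_kyr:.1f} kyr")
```

      Under these assumptions the count lands at the lower end of the 50-60 range, and an average segment spans roughly one 41-kyr cycle, which is consistent with the point that a single estimate cannot resolve structuring and mixing within a cycle.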

      I generally agree with the authors that "at present there is no way to fully disentangle the effects of population structure and geographic space on our results". But given that, I think there are two options - either we can fully acknowledge that oversimplified demographic models like PSMC cannot be interpreted as supporting evidence of any particular mechanistic or biogeographic hypothesis and stop trying to use them to do that, or we have to do our best to understand specifically which models can be distinguished by the analyses we're employing. 

      Short of developing some novel theory deep in the PSMC model, I think readers would need to see simulations showing that the analyses employed in this paper are capable of supporting or refuting their biogeographic hypothesis before viewing them as strongly supporting a specific biogeographic model. Tools like msprime and stdpopsim can be used to simulate genome-scale data with fairly complex biogeographic models. Running simulations of a thrush-like population under different biogeographic scenarios and then using PSMC to differentiate those patterns would be a more convincing argument for the biogeographic aspects of this paper. The other benefit of this approach would be to nail down a specific quantitative version of the taxon cycles model referenced in the abstract, and it would allow the authors to better study and explain the motivation behind the specific summary statistics they develop for PSMC posthoc analysis.  

      These could very well be fruitful pursuits for future work, but they are beyond the scope of this paper. The impossibility of reconstructing ranges through deep time makes anything other than the very general biogeographic hypothesis we’ve posed an uncertain pursuit. Also, a purely biogeographic approach neglects the likelihood of ecological expansion also being involved. We get at the importance of the latter in the “Geography and evolutionary ecology” section of the Discussion. Below, the editor states that discussions among reviewers indicate that simulations are not warranted at this time. We agree that the complexities involved are substantial, to the point of making direct relevance to this empirical study uncertain (especially in such an among-lineage context). Regarding taxon cycles, we merely point out that that conceptual framework seems relevant given our findings. This was not even remotely anticipated at the outset of the study, so we are reluctant to do anything more than point out its possible relevance in several aspects of the results. Finally, the motivation for the study’s summary statistics was entirely driven by the hypotheses, as given in Methods, and due to an earlier error (noted above), there are no post-hoc analyses in the revision. Sorry for the needless confusion.

      Reviewer #2 (Public Review): 

      Summary: 

      Winker and Delmore present a study on the demographic consequences of migratory versus resident behavior by contrasting the evolutionary history of lineages within the same songbird group (thrushes of the genus Catharus). 

      Strengths: 

      I appreciate the test-of-hypothesis design of the study and the explicit formulation of three main expectations to test. The data analysis has been done with appropriate available tools. 

      Weaknesses: 

      The current version of the paper, with the case study chosen, the results, and the relative discussion, is not satisfying enough to support or reject the hypotheses here considered.  

      Given the stated strengths, the weaknesses noted seem a little incongruous, but we understand from the comments below that the reviewer would like to see the study redesigned and expanded.  

      The authors hypothesized that the wider realized breeding and ecological range characterising migrants versus resident lineages could be a major drive for increased effective population size and population expansion in migrants versus residents. I understand that this pattern (wider range in migrants) is a common characteristic across bird lineages and that it is viewed as a result of adapting to migration. A problem that I see in their dataset is that the breeding grounds range of the two groups are located in very different geographic areas (mainly South versus North America). The authors could have expanded their dataset to include species whose breeding grounds are from the two areas, regardless of their migratory behaviour, as a comparison to disentangle whether ecological differences of these two areas can affect the population sizes or growth rates.

      Because the questions are about the migratory life history strategy and the best way to get at this is in a phylogenetic framework, we’re not sure how we could effectively add species “regardless of their migratory behavior.” Further, we know that migration causes lineages to experience variable ecological conditions that include breeding, migration, and wintering conditions. Obligate migrants are going to have different breeding ranges from their close relatives, and the more distantly related species are, the less likely it is that they respond to particular ecological conditions the same way. So we do not think that an approach that included miscellaneous species from northern and southern regions would strengthen this study. Here, the comparative framework of closely related lineages that possess or lack the trait of interest is a study design strength. We do agree, however, that future work is needed that does encompass more lineages (we would argue in a phylogenetic context), and that disentangling the effects of geography and ecology will also be an important future endeavor. 

      As I understand from previous literature, the time-scale to population growth and estimates of effective population sizes considered in the present paper for the resident versus migratory clades seem to widely predate the times to speciation for the same lineages, which were reported in previous work of the same authors (Everson et al 2019) and others (Termignoni-Garcia et al 2022). This piece of information makes the calculation of species-specific population size changes difficult to interpret in the light of lineages' comparison. It is unclear what the authors consider to be lineage-specific in these estimates, as the clades were likely undergoing substantial admixture during the time predating full isolation.  

      We do recognize that timing estimates vary among studies. Differences among studies in important variables like markers, methods, generation time, and mutation or substitution rates create much of this uncertainty. Also, we are not confident in prior dating efforts in this group, largely because of gene flow and its effects on bringing estimates closer to the present. As we point out (line 485), differences among studies on these issues do not detract from the strengths here for within-study, among-lineage contrasts. In short, the timing could be off in an among-study context (and likely is with prior work, given gene flow), but relative performance of among-lineage N<sub>e</sub> differences is less susceptible to these factors. This was shown fairly well in Li & Durbin’s initial use of the method among human populations. Regarding substantial admixture, PSMC curves often unite at their origins with sister lineages (when they were the same lineage). A good example is with the two C. guttatus E & W curves in Fig. S3, which still have substantial gene flow today (they are subspecies and in contact), yet they show remarkably different N<sub>e</sub> curves through their history. It is not possible to mark a cutoff point for each lineage that represents the cessation of admixture with another lineage (e.g., Everson et al. 2019 showed substantial admixture between three full species in this group); that period can be very long (Price et al. 2008), varies among lineages, and will not be available for deeper lineage divergences in the phylogeny. We therefore chose to use all of the time intervals retrievable from the genomic data in each lineage, considering that this uniform treatment is the best approach for our among-lineage comparison. And note that we were careful to label these as “the lineages’ PSMC inception” (line 190).  

      Regarding the methodological difficulties in interpreting the impact of population structure on the estimates of effective population sizes with the PSMC approach, I would think that performing simulations to compare different scenarios of different degrees of structured populations would have helped substantially understand some of the outcomes.  

      The complexities of such modeling in a system like this are daunting. The different degrees of structuring among all of these lineages across just a single glacial-interglacial cycle would necessitate a lot of guesswork; projecting that back across 50-60 such cycles just in the Pleistocene would probably end up being fiction. Disentangling the effects of structure versus changes in N<sub>e</sub> in a system like this would probably not be possible with that approach and these data. As noted above and below, there was agreement among reviewers and the editor that simulations in this case are not warranted for revision. We have added the nature of the glacial-interglacial cycles and the PSMC sampling time segments to help readers understand this better (see above in response to R1, and lines 272-278).

      Additionally, I have struggled to understand if migratory behaviour in birds is considered to be acquired to relieve species competition, or as a consequence of expanded range (i.e., birds expand their range but their feeding ground is kept where speciation occurred as to exploit a ground with higher quality and abundance of seasonal local resources).  

      The origins of migration have been a struggle for researchers since the subject was taken up. But how the trait was acquired among these species does not really matter for our study. Here, migratory lineages possess different biogeographic+ecological attributes than their close relatives that are sedentary. Our focus is on the presence and absence of this life-history trait.

      The points raised above could be considered to improve the current version of the paper. 

      Thank you. We appreciate the opportunity to guide our revision using your comments.  

      Reviewer #3 (Public Review): 

      Summary: 

      This paper applies PSMC and genomic data to test interesting questions about how life history changes impact long-term population sizes. 

      Strengths: 

      This is a creative use of PSMC to test explicit a priori hypotheses about seasonal migration and N<sub>e</sub>. The PSMC analyses seem well done and the authors acknowledge much of the complexity of interpretation in the discussion.

      Weaknesses: 

      The authors use an average generation time for all taxa, but the citations imply generation time is known for at least some of them. Are there differences in generation time associated with migration? I am not a bird biologist, but quick googling suggests maybe this is the case (https://doi.org/10.1111/1365-2656.13983). I think it important the authors address this, as differences in generation time I believe should affect estimates of N<sub>e</sub> and growth.  

      Good point. The study cited by the reviewer encompasses a much higher degree of variation in body size and thus generation time. Differences in generation time in similarly sized close relatives, as in our study, should be small, and our approach has been to average those that are known. Unfortunately, generation times are not known for all of these species, but given their similarity in size we can have reasonable confidence in their being similar. We used data from the life-history research available (as cited) to obtain our average; there are not appropriate data for the residents, though. However, there is thought to be a generation time cost to seasonal migration in birds, and Bird et al. (2020) included this in their estimates to provide modeled values for all of the lineages we studied. We’re leery of using modeled values where good data for the nonmigrants in this group don’t exist (and the basis for quantifying this cost is tiny), but we recognize that this second approach is available and could leave some doubt in our results if not pursued. So we re-did everything with the modeled generation times of Bird et al. (2020). As expected, most of the differences are time-related. Importantly, our overall results are not different. We present them as Table S2 and have added the details on this to the Methods.
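
      The statement that most differences are time-related can be illustrated with the standard PSMC rescaling, in which N0 = theta0 / (4 * mu * s), time in years = 2 * N0 * t_k * g, and N<sub>e</sub> = N0 * lambda_k. The sketch below is ours, not the authors' pipeline: the function name and all input values are hypothetical, and it assumes the mutation rate is expressed per generation and held fixed. If the rate were instead specified per year, N0 would change with generation time as well.

```python
def scale_psmc(theta0, t_k, lambda_k, mu_per_gen, gen_time_years, bin_size=100):
    """Convert scaled PSMC output to years and effective population sizes.

    theta0         : scaled mutation parameter reported by PSMC
    t_k, lambda_k  : scaled segment times and relative population sizes
    mu_per_gen     : per-site, per-generation mutation rate (assumed fixed)
    gen_time_years : generation time in years
    bin_size       : bases per input bin (100 is a commonly used value)
    """
    n0 = theta0 / (4.0 * mu_per_gen * bin_size)
    years = [2.0 * n0 * t * gen_time_years for t in t_k]
    ne = [n0 * lam for lam in lambda_k]
    return years, ne


# Hypothetical toy values, chosen only to show the scaling behaviour.
theta0 = 0.002
t_k = [0.01, 0.1, 1.0]
lambda_k = [1.5, 3.0, 0.8]
mu = 4.6e-9

years_short, ne_short = scale_psmc(theta0, t_k, lambda_k, mu, gen_time_years=2.0)
years_long, ne_long = scale_psmc(theta0, t_k, lambda_k, mu, gen_time_years=3.0)

# Ne values are unchanged; only the time axis stretches with generation time.
print(ne_short == ne_long)
print([round(b / a, 3) for a, b in zip(years_short, years_long)])
```

      Under this parameterisation, swapping in a different generation time moves each segment along the time axis while leaving the N<sub>e</sub> values untouched.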

      The writing could be improved, both in the introduction for readers not familiar with the system and in the clarity and focus of the discussion.  

      We have added a phylogeny (new Fig. 1) to help readers better understand the system, and we’ve re-worked the Discussion to make it clearer what is clarified by our results and what remains unclear.  

      Recommendations for the authors:

      Reviewing Editor comment: 

      I note that discussion among the reviewers made clear that simulations are probably not the right answer given the complexity of the modeling required.  

      We appreciate this conclusion, with which we agree.  

      Reviewer #2 (Recommendations For The Authors): 

      Apologies for the delay with the review, which came at a very busy time. I hope you will find my comments helpful.

      Thanks. Your comments are helpful, and we fully understand how reviews (and our revisions!) have to wait until more pressing needs are addressed.

      I enjoyed reading the manuscript but I believe that the discussion sections could be heavily rewritten for better clarity. The discussion is sometimes redundant and lacks some flow/clarity. In a nutshell, I had the feeling that a bit of everything is thrown in the discussion but clear conclusions are not made.  

      Yes, the Discussion has been difficult to write, because more issues arose in the Results than we anticipated at the outset. We feel that discussing them is relevant, but we agree that much remains unclear. This coupling of paleodemographics with geography and ecology is a new area, which opens some important new (and relevant) areas to consider. So clarity is not possible in some areas. We’ve revised to point out where we do have clarity (e.g., in migrant lineages having different paleodemographic attributes than nonmigrants) and where only further study can provide clarity (e.g., in the roles of geography versus ecology). The journal format does not seem to have secondary subheaders, but we’ve used bold in one place to highlight ‘ecological mechanisms’ to offset that section, one of the more complex. We’ve also added a paragraph in the conclusions to clarify where we have clear takeaways and where uncertainties remain. 

      Reviewer #3 (Recommendations For The Authors): 

      The introduction should engage the reader with biology, not the use of demographic methods or genomics (both of which have been around for more than a decade). I would drop the first paragraph and considerably expand the second. What has previous research on ecology/behavior/genetics found regarding the demographic effects of seasonal migration?

      There are two important aspects to our study: 1) using paleodemographic methods to test hypotheses about adoption of a major life-history trait—an important biological question regardless of system, and so far (surprisingly) unaddressed; and 2) using this novel approach to study the effects of one such trait, seasonal migration. At these timescales, nothing exists on this subject, so there is really nothing to expand with. If there is relevant literature that we’ve missed, we’d be happy to add it.

      What is the missing bit of information or angle the current study addresses (other than just doing it larger and fancier with genomics)?  

      The effects of major life-history traits on paleodemographics have not been addressed before, to our knowledge. The whole context is new, so we’re not doing something “larger and fancier” with genomics. We are doing something that has not been done before: testing hypotheses about the effects of a major life-history trait on population sizes in evolutionary time. We’re not sure how this can be made clearer. To us this seems like a very engaging biological question with wide applicability. We hope that this study is just the first of many to come, in a diversity of biological systems.

      A figure showing the phylogenetic relationships of these taxa which are migratory would help the reader immensely. Although this is shown in Fig S3 I think it might be nice to have a map of the species and their ranges alongside a phylogeny as a main figure early on.  

      Thank you. This is a good suggestion. We can’t fit a phylogeny and all the distribution maps (Fig. S1) onto a page, but we can include a phylogeny as one of the main figures with nonmigrants highlighted. We’ve inserted this as a new Fig. 1. 

      If I understand correctly, the authors' arguments for why migratory species should show more growth hinge on large range size and geographic expansion. Yet they argue in the discussion that these forces are unlikely to be important (L226). I found the discussion on this confusing (e.g. L231 then says maybe it does matter). I think more clarity here would be helpful.

      Our argument and predictions are based both on geographic and ecological expansion. This was clearly stated as our third prediction “3) early population growth would be higher as seasonal migration opens novel ecological and geographic space…” We have gone back through and reiterated the coupling of these two factors. The line mentioned concludes the first paragraph in the section ‘Geography and evolutionary ecology,’ which focuses on the difficulty of decoupling these in this system. As the paragraph relates, geography alone does not seem to be driving our results (we do not argue that it is unimportant). 

      I also would have liked more time in the discussion addressing why variation in N<sub>e</sub> may be higher in migratory lineages.

      In addition to re-clarifying this in the Introduction, we have touched back on this now at line 221: “We attribute the higher variation in N<sub>e</sub> among migrants to be the result of the relative instability of northern biomes compared with tropical ones through glacial-interglacial cycles (e.g., Colinvaux et al., 2000; Pielou, 1991).”

      Minor comments: 

      L 62: Presumably PSMC is limited by the coalescent depth of the genealogy, which may be younger or older than population "origins" depending on the history of colonization, lineage splitting, gene flow, etc.

      We were careful to phrase these as “the lineages’ PSMC inception” (line 190), and responded to this issue in more detail above in response to R2’s public review. 

      L 338: I think a few more details on PSMC would be helpful. Was no maskfile used?  

      We did not use a maskfile, choosing instead to generate data of decent coverage and aligning reads to a single closely related relative. 

      Did the consensus fasta include all species?  

      No, we used a single reference high-quality fasta of Catharus ustulatus, as reported (lines 434-37). We have added that “Identical treatment of all lineages in these respects should provide a strong foundation for a comparative study like this among close relatives.”

      L 361: Fair to assume the authors used a weighted average of N<sub>e</sub> from the output, rather than just averaging the N<sub>e</sub> values from each time segment?  

      No – we used all the values of N<sub>e</sub> produced by PSMC output. The PSMC method uses nonoverlapping portions of the genome in its analyses (which we’ve added to make that clear), and portions in juxtaposition will often provide data for very different periods in the time segments. Further, time segments are uneven within and among taxa, so it is not clear how a uniform and comparable weighting scheme could be implemented. We consider a uniform approach to be of primary importance, including for future comparisons among studies. 
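
      To make the distinction between the two averaging schemes concrete, here is a minimal sketch. The segment boundaries and N<sub>e</sub> values are hypothetical toy numbers, and the duration-weighted function is one possible weighting scheme that we introduce for illustration; it is not something the authors used. It also requires a closed upper boundary for the oldest segment, which is one of the practical difficulties with weighting.

```python
from statistics import mean, stdev


def unweighted_summary(ne_values):
    """Mean and sample SD across all segment estimates, each counted once."""
    return mean(ne_values), stdev(ne_values)


def duration_weighted_mean(boundaries, ne_values):
    """Mean Ne weighted by segment duration.

    boundaries has one more entry than ne_values: the start of each segment
    plus a closing boundary for the oldest segment (an assumption here).
    """
    durations = [b - a for a, b in zip(boundaries[:-1], boundaries[1:])]
    weighted_sum = sum(d * n for d, n in zip(durations, ne_values))
    return weighted_sum / sum(durations)


# Hypothetical example: segment boundaries in kyr and Ne in thousands.
boundaries_kyr = [0, 10, 30, 70, 150]
ne_thousands = [100, 200, 400, 300]

plain_mean, plain_sd = unweighted_summary(ne_thousands)
weighted = duration_weighted_mean(boundaries_kyr, ne_thousands)

print(plain_mean, weighted)
```

      In this toy case the two means differ because the older, longer segments dominate the weighted version. With uneven segments that also differ among taxa, the choice of weights would itself vary by lineage, which is the comparability concern raised in the response.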

      L 383 "delta" typo

      Thank you for catching this.

      L 93: I'd be tempted to present the questions (how does seasonal migration affect population size trajectory, means, and variation) and rationale before presenting the hypotheses. I found myself reading the hypotheses and wondering "why?"  

      We’ve tried this change in the revision. It makes the hypotheses a little harder to pull out (they are no longer numbered in a short sequence), but it is shorter and solves this concern.  

      L 337 read depth is usually expressed as X (e.g. "23X") rather than bp.

      Changed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study further validates DNAH12 as a causative gene for asthenoteratozoospermia and male infertility in humans and mice. The data supporting the notion that DNAH12 is required for proper axonemal development are generally convincing, although more experiments would solidify the conclusions. This work will interest reproductive biologists working on spermatogenesis and sperm biology, as well as andrologists working on male fertility.

      We thank the editor and the two reviewers for their time and careful evaluation of our manuscript. We sincerely appreciate their encouraging feedback and insightful guidance on improving our study. In the revised manuscript, we have performed additional experiments and provided quantitative data regarding the reviewers' comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Even though this is not the first report that the mutation in the DNAH12 gene causes asthenoteratozoospermia, the current study explores the sperm phenotype in-depth. The authors show experimentally that the said mutation disrupts the proper axonemal arrangement and recruitment of DNALI1 and DNAH1 - proteins of inner dynein arms. Based on these results, the authors propose a functional model of DNAH12 in proper axonemal development. Lastly, the authors demonstrate that the male infertility caused by the studied mutation can be rescued by ICSI treatment at least in the mouse. This study furthers our understanding of male infertility caused by a mutation of axonemal protein DNAH12, and how this type of infertility can be overcome using assisted reproductive therapy.

      Strengths:

      This is an in-depth functional study, employing multiple, complementary methodologies to support the proposed working model.

      Thank you for your recognition of the strength of this study. Your positive feedback motivates us to continue refining our research and methodological rigor in future studies.

      Weaknesses:

      The study strength could be increased by including more controls such as peptide blocking of the inhouse raised mouse and rat DNAH12 antibodies, and mass spectrometry of control IP with beads/IgG only to exclude non-specific binding. Objective quantifications of immunofluorescence images and WB seem to be missing. At least three technical replicates of western blotting of sperm and testis extracts could have been performed to demonstrate that the decrease of the signal intensity between WT and mutant was not caused by a methodological artifact.

      Thank you for your comments. To design the antibodies, we analyzed the sequence features of the DNAH12 protein and selected amino acids 1-200 as the antigen on the basis of four criteria: (1) high immunogenicity; (2) high hydrophilicity; (3) good surface leakage groups; and (4) sequence homology analysis to avoid nonspecific recognition of other proteins. The two different anti-DNAH12 antibodies were developed with the help of the Dia-An Biotech company in 2022. We tried to obtain the polypeptide fragments of the target protein for peptide blocking, but the material had been discarded after the service. Nevertheless, the target DNAH12 band was detected by western blotting and was absent in the knockout mice group; likewise, the DNAH12 immunofluorescence signals were strong but absent in the knockout mice group. In addition, we confirmed that the in-house-raised rabbit antibody is suitable for IP experiments: it immunoprecipitated the DNAH12 band from Dnah12<sup>+/+</sup> mice but not from Dnah12<sup>-/-</sup> mice. Collectively, these data support the specificity of the raised DNAH12 antibodies. For the IP assay, we included an IgG group in the IP-mass spectrometry to exclude non-specific binding; the experimental design is described in Figure 6B. The raw data were deposited in the iProX partner repository (accession number: PXD051681), and we have coordinated with the repository manager to make the data publicly accessible (https://www.iprox.cn/page/subproject.html?id=IPX0008674001).

      In addition, we repeated the western blotting of sperm and testis extracts at least three times and added objective quantifications of the immunofluorescence signals and WB images. The blot quantifications are shown in the figures to help readers interpret these results.

      Reviewer #2 (Public Review):

      Summary:

      The authors first conducted whole exome sequencing for infertile male patients and families where they co-segregated the biallelic mutations in the Dynein Axonemal Heavy Chain 12 (DNAH12) gene.

      Sperm from patients with biallelic DNAH12 mutations exhibited a wide range of morphological abnormalities in both tails and heads, reminiscent of a prevalent cause of male infertility, asthenoteratozoospermia. To deepen the mechanistic understanding of DNAH12 in axonemal assembly, the authors generated two distinct DNAH12 knockout mouse lines via CRISPR/Cas9, both of which showed more severe phenotypes than observed in patients. Ultrastructural observations and biochemical studies revealed the requirement of DNAH12 in recruiting other axonemal proteins and that the lack of DNAH12 leads to the aberrant stretching in the manchette structure as early as stage XI-XII. Finally, the authors proposed intracytoplasmic sperm injection as a potential measure to rescue patients with DNAH12 mutations, where the knockout sperm culminated in the blastocyst formation with a comparable ratio to that in WT.

      Strengths:

      The authors convincingly showed the importance of DNAH12 in assembling cilia and flagella in both human and mouse sperm. This study is not a mere enumeration of the phenotypes, but a strong substantiation of DNAH12's essentiality in spermiogenesis, especially in axonemal assembly.

      The analyses conducted include basic sperm characterizations (concentration, motility), detailed morphological observations in both testes and sperm (electron microscopy, immunostaining, histology), and biochemical studies (co-immunoprecipitation, mass-spec, computational prediction). Molecular characterizations employing knockout animals and recombinant proteins beautifully proved the interactions with other axonemal proteins.

      Many proteins participate in properly organizing flagella, but the exact understanding of the coordination is still far from conclusive. The present study gives the starting point to untangle the direct relationships and order of manifestation of those players underpinning spermatogenesis. Furthermore, comparing flagella and trachea provides a unique perspective that attracts evolutional perspectives.

      Thank you for your thoughtful and positive feedback. We are delighted that you found our study to be a strong substantiation of DNAH12's essential role in spermiogenesis, particularly in axonemal assembly. We believe that this study represents a meaningful step toward unraveling the intricate coordination of axonemal proteins during spermatogenesis, and your comments further inspire us to continue exploring these complex mechanisms in future work. Thank you once again for your valuable insights and summary of this work.

      Weaknesses:

      Seemingly minor, but the discrepancies found in patients and genetically modified animals were not fully explained. For example, both knockout mice vastly reduced the count of sperm in the epididymis and the motility, while phenotypes in patients were rather milder. Addressing the differences in the roles that the orthologs play in spermatogenesis would deepen the comprehensive understanding of axonemal assembly.

      This is an interesting question. Although humans and mice both show male infertility when dynein proteins essential for sperm flagellar development are deficient, the phenotypes differ in some respects. For instance, deficiency in DNAH17 (Clin Genet. 2021. PMID: 33070343) or DNAH8 (Am J Hum Genet. 2020. PMID: 32619401; PMCID: PMC7413861), two other members of the dynein axonemal heavy chain family, has also been reported to cause a more severe phenotype in mice than in human patients carrying bi-allelic DNAH17 or DNAH8 loss-of-function mutations. In knockout mice, sperm counts are lower and the proportion of sperm with abnormal morphology is higher, whereas the phenotypes in human patients tend to be milder. These observations suggest that orthologs may influence spermatogenesis to slightly different extents in humans and mice. We plan to investigate the mechanisms underlying these discrepancies in future studies, which will provide deeper insights into axonemal assembly and the evolutionary aspects of spermatogenesis. Thank you again for bringing up this important issue.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This reviewer is impressed by the study's depth and the extent of the methodology used in the study. The study is well-designed, and the results are very interesting. The reviewer's enthusiasm was reduced by the lack of some controls (provided that the reviewer did not miss them). Further are point-to-point suggestions that this reviewer believes will increase the merit of the present study.

      Title:

      (1) Why a "special" dynein? What makes it special when compared to other dyneins? I suggest removing the word special.

      Through phylogenetic and protein domain analyses of the DNAH family, we found that DNAH12 is the shortest member and the only one that lacks a typical microtubule-binding domain (MTBD), which is why we described it as a “special” dynein. We have carefully considered your valuable suggestion and decided to remove the word from the title.

      Abstract:

      (2) L23: same as above, why special?

      We identified DNAH12 as the shortest member of the DNAH family and the only one lacking the typical microtubule-binding domain (MTBD). This distinct characteristic prompted us to describe it as a 'special' dynein in the abstract.

      (3) L37: the reviewer did not find a figure (neither main nor supplementary) that would demonstrate the proper organization of microtubules in cilia. Figure S11 only shows the presence of cilia in DNAH12-/- mouse. A TEM image of cilia is required to confirm or reject the claim that DNAH12 does not play a crucial role in proper microtubule organization in cilia.

      We have now added TEM images of cilia in wild-type and Dnah12<sup>-/-</sup> mice. The ultrastructure of the ciliary axonemes was comparable between the wild-type and Dnah12<sup>-/-</sup> groups, suggesting that DNAH12 may not play a crucial role in proper microtubule organization in cilia. The results have now been added to Supplemental Figure 11F.

      (4) L122-6: Did the authors also confirm these structures by cryo-EM? If not, this needs to be pointed out as a shortcoming in the discussion, that the structures and interactions are predicted in silico only.

      Thank you for your comment. Owing to limited resources, we did not perform cryo-EM to confirm these structures. We will pursue the structural details at atomic resolution in future studies. We understand this point and have now acknowledged it as a limitation in the Discussion.

      (5) L134: Be more specific about what characteristics of DNAH12 were analyzed.

      Thank you for your comment. We have now updated this in the Methods section. The characteristics of DNAH12 that were analyzed include regional immunogenicity, hydrophilicity, surface leakage groups, and sequence homology.

      (6) L137: Be more specific about how the antibodies were validated. Were the antibodies validated for both immunofluorescence and western blotting? I suggest doing peptide blocking of the antibody, for instance for ICC, preincubation of ab with immunizing peptide followed by primary ab incubation with studied cells/tissues.

      Thank you for your comments and suggestions. We validated the antibodies for both immunofluorescence and western blotting to ensure their effectiveness in our experiments. The two different anti-DNAH12 antibodies were developed with the help of the Dia-An Biotech company in 2022. We attempted to obtain the polypeptide fragments of the target protein for peptide blocking, but the material had been disposed of after the service. Nevertheless, the target DNAH12 band showed a strong signal in western blotting and was not detected in the knockout mice group; likewise, the DNAH12 immunofluorescence signals were strong but absent in the knockout mice group. In addition, the IP experiment showed that the raised rabbit antibody immunoprecipitated the DNAH12 band from Dnah12<sup>+/+</sup> mice but not from Dnah12<sup>-/-</sup> mice. Collectively, these data support the specificity of the raised DNAH12 antibodies. We sincerely appreciate your suggestion and will request the peptide material if we develop new antibodies.

      (7) L142: This reviewer is unfamiliar with using TRIzol for sperm protein extraction. Is there a specific reason for not using PAGE loading buffer for human sperm protein extraction?

      Thanks for your suggestions. TRIzol reagent can be used for small samples (5×10<sup>6</sup> cells) as well as large samples (>10<sup>7</sup> cells), and it allows RNA and proteins to be extracted at the same time. Our lab adopted this method in our previous work (Hum Reprod Open. 2023; PMID: 37325547; PMCID: PMC10266965). It is very useful for processing small amounts of valuable samples. The human sperm protein extract was mixed with SDS sample buffer (PAGE loading buffer) before SDS-PAGE separation. We have added this detail to the Methods section and apologize for the confusion.

      (8) L144: Were these the final concentrations of the SDS loading buffer? 1 × Laemmli buffer contains 62.5 mM TRIS, 2% (w/w) SDS, 10 % (w/v) glycerol, and 5% 2-mercaptoethanol. Please, amend accordingly.

      Thanks for your suggestions. We apologize for the incorrect labelling of the concentrations (the previous values were for 3× SDS loading buffer). We have now amended the SDS loading buffer to 1 × Laemmli buffer as suggested.

      (9) L151: Table S2 contains other homemade antibodies than DNAH12. Please, include references to the studies where the generation and validation of these antibodies is described.

      Thank you for your suggestions. We have developed a DNAH1 antibody for use in Western blot assays, with its generation and validation detailed in Frontiers in Endocrinology (Lausanne), 2021 (PMID: 34867808; PMCID: PMC8635859). Additionally, we have produced a DNAH17 antibody for both immunofluorescence (IF) and Western blot, as described in Journal of Experimental Medicine, 2020 (PMID: 31658987; PMCID: PMC7041708). These references have now been included.

      (10) L167: Please, spell out ICR at its first appearance.

      Done as suggested. Thank you. The full name of ICR is Institute of Cancer Research.

      (11) L169: This reviewer is confused. It seems that the mouse encodes DNAH12 on exons 5 and 18 simultaneously. Each mouse model has only one exon targeted for a knockout. Would not this mean that the expression of DNAH12 in both models is not completely knocked down? Please, give more background in this paragraph for those less familiar with CRISPR/Cas9.

      Thank you for your insightful comment. We appreciate your attention to detail. To clarify, while the mouse model does indeed encode DNAH12 on exons 5 and 18 simultaneously, we specifically targeted the key exon 5 or exon 18 in each model to achieve different knockout strategies. This approach allows us to assess the functional implications of any remaining DNAH12 expression in both models. We checked DNAH12 expression in both models, and DNAH12 protein was undetectable in each, indicating that DNAH12 was completely knocked out in both. Additionally, we will revise the manuscript to include further details on the CRISPR/Cas9 methodology, ensuring accessibility for readers less familiar with this technique. Thank you again for your valuable feedback, which we believe will greatly enhance our manuscript.

      (12) L201: 50 % PBS? As in 0.5 x concentrated PBS? Please, rewrite for clarity.

      The term "50% PBS" refers to a 1:1 dilution of phosphate-buffered saline (PBS) with an appropriate diluent, resulting in a final concentration of 0.5x PBS. We will revise the text to explicitly clarify this, ensuring it is clear to all readers. Thank you for highlighting this point.

      (13) L224: Please, state what beads those were (magnetic/agarose, conjugated to protein A/G...) Include catalog # and manufacturer.

      Thank you for your suggestion. We have updated the manuscript to include this information. The beads used were Protein A/G Magnetic Beads (Catalog #B23202, Bimake, Texas, USA).

      (14) L227: What was the reason for adding a proteasomal inhibitor? What concentration was used? Please, add this information to the text.

      We added MG132 in the cell immunoprecipitation (IP) experiments to inhibit proteasomal activity, thereby preventing degradation of the target protein. This helps maintain the stability of the target protein during the experiment (Sci Adv. 2022. PMID: 35020426; PMCID: PMC8754306), enhancing its detectability in subsequent analyses. MG132 was added at 5 μM. We have added this information to the revised manuscript.

      (15) L233: in vivo IP of mouse testis lysate? This does not make sense. I suggest removing "in vivo".

      Thank you for your careful review and comments on our manuscript. We have modified as suggested.

      (16) L317: Supplemental Figure 6 precedes Supplemental Figure 5 in the text, which is neither logical nor orderly.

      Thank you for your suggestion. Since the N-terminal DNAH12 antibody is already described in the Methods section (L317), we propose removing Supplemental Figure 6 from the content to improve the logical flow and maintain an orderly presentation.

      (17) L345 and elsewhere: how did the authors quantify the decrement of the signal? This needs to be measured objectively.

      Thank you for your valuable suggestion. We quantified the signal intensity using Fiji (Nat Methods. 2012. PMID: 22743772; PMCID: PMC3855844), which allows for precise analysis of pixel intensity. The results are presented in the figures to effectively illustrate the decrement in signal intensity. We appreciate your suggestion, and we have provided a description of the method in our methodology section.
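A minimal sketch of the kind of post-processing such a quantification might involve, assuming that mean gray values for a region of interest and for a background region have already been exported from Fiji. The function names and all numbers below are hypothetical illustrations; they are not the authors' pipeline or data.

```python
from statistics import mean


def background_corrected(roi_mean, background_mean):
    """Subtract the background mean gray value from the ROI mean, clamped at zero."""
    return max(roi_mean - background_mean, 0.0)


def relative_to_control(values, control_values):
    """Express each value as a fraction of the control-group mean."""
    control_mean = mean(control_values)
    if control_mean == 0:
        raise ValueError("control mean is zero; cannot normalize")
    return [v / control_mean for v in values]


# Hypothetical (ROI mean, background mean) pairs in arbitrary units.
wild_type = [background_corrected(r, b) for r, b in [(120.0, 20.0), (110.0, 10.0)]]
knockout = [background_corrected(r, b) for r, b in [(30.0, 20.0), (25.0, 30.0)]]

wt_rel = relative_to_control(wild_type, wild_type)
ko_rel = relative_to_control(knockout, wild_type)
```

Normalizing both groups to the control mean puts the wild-type signal near 1, so a reduced signal in the mutant group reads directly as a fraction of control.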

      (18) L371: I recommend: ...and elongated spermatids; the abnormal...

      Done as suggested. Thank you.

      (19) L412-4: Cilia in both Dnah12<sup>mut/mut</sup> and Dnah12<sup>-/-</sup> are developed, but are they motile or immotile? This needs to be investigated. Is the DNAH12 in cilia truncated while still fulfilling its function?

      Thanks for your comment. We examined ciliary motility using an inverted microscope, and no significant difference in ciliary motility was observed between the knockout group and the control group. These results indicate that ciliary motility is not affected by DNAH12 deficiency. The N-terminal DNAH12 antibody was developed to detect whether a truncated protein is present in mouse tissues; however, we did not detect DNAH12 signals by immunofluorescence on trachea sections of the Dnah12<sup>-/-</sup> mice. These results indicate that DNAH12 may exert little influence on cilia, compared with its important function in flagella.

      (20) L414-6: The results do not support this claim as the authors do not show that cilia are motile.

      Thanks for your comment. Supplemental Videos 3-4, showing live trachea from Dnah12<sup>+/+</sup> and Dnah12<sup>-/-</sup> mice, have been uploaded to support this conclusion.

      (21) L421-3: Did the authors perform a negative test, where they let the testis lysate interact with beads/IgG only and performed the MS to identify non-specific binding? This is a crucial specificity test for this approach.

      We have performed this negative test. In the IP assay, we included an IgG group in the IP-mass spectrometry to exclude non-specific binding; the experimental design is described in Figure 6B. The raw data were deposited in the iProX partner repository (PXD051681), and we have asked the repository manager to update the status to public so that the data will be visible to readers.

      (22) L462: same as #18 the authors need to show that cilia are also motile. The mere presence of cilia in DNAH12-/- as shown in Fig S11C&D is not sufficient to conclude that the mice do not manifest PCD symptoms.

      Thanks for your comment. We did not observe obvious differences between the cilia of Dnah12<sup>+/+</sup> and Dnah12<sup>-/-</sup> mice. Supplemental Videos 3-4, showing live trachea from Dnah12<sup>+/+</sup> and Dnah12<sup>-/-</sup> mice, have been uploaded to show ciliary motility in the trachea.

      (23) L529: MTBD region instead of domain, as "domain" is already part of the abbreviation.

      Done as suggested.

      (24) L875: Sperm is both the singular and plural form. Spermatozoon vs spermatozoa can be used where the distinction between singular and plural needs to be made.

      Thanks for your suggestion. We have checked and changed this usage.

      (25) Figure 3H: Is there a specific reason why P11 is not shown?

      Because only a limited number of smear slides from P11 were available, the P11 sample was not previously stained with the DNAH17 antibody. We have now performed this experiment, which showed that DNAH17 expression was not affected in patient P11. This result has been added to Figure 3H.

      (26) Figure 8H: The authors in their MS do not describe what is happening to N-DRC proteins, yet they suggest in their model that it's unaffected in the mutant mouse/human. Please, address this in the MS and clearly state in the model that N-DRC needs further exploration in future studies.

      Thanks for your suggestion. We checked the MS data but did not observe enrichment of nexin-dynein regulatory complex (N-DRC) proteins; only one known N-DRC protein, DRC1, was present, with a single unique peptide. Instead, enrichment of inner dynein arm proteins and radial spoke proteins was observed. However, we cannot determine whether the N-DRC structures are affected. We have stated this in the Discussion and will pursue it with high-resolution technology such as cryo-EM in the future.

      (27) Figure 5F: Is it possible to choose a different Dnah12<sup>-/-</sup> spermatozoon to see a reduced level of DNALI1 so that it corresponds with the WB detection in Fig 5B?

      Thanks for your suggestion. We have chosen a Dnah12<sup>-/-</sup> spermatozoon with faint remnants of the DNALI1 signal as the representative picture.

      (28) Figure S2 and elsewhere: How were the authors able to resolve and calibrate 356 kDa protein using SDS PAGE? Agarose protein electrophoresis is more suitable for resolution of high molecular proteins. Most of the protein standards have as high molecular standard as 250 kDa.

      We have found that high-molecular-weight proteins (such as the 356 kDa DNAH12) can be resolved on 4-12% gradient polyacrylamide gels by using appropriate voltages and longer run times during electrophoresis. The DNAH12 protein was calibrated using a HiMark™ Pre-Stained High Molecular Weight Protein Standard (30-460 kDa). We have now updated the blot images to show the size of the DNAH12 protein (Fig S6B). The target band is clearly visible between 268 kDa and 460 kDa, which makes it easy to identify the DNAH12 band elsewhere. Thanks for your suggestion.

      (29) Figure S5: similar to #24: Why P10 and P11 are not shown?

      Because only a limited number of smear slides from P10 and P11 were available, these samples were not previously stained with the ODF2 antibody. We have now performed these experiments, which showed that ODF2 expression was not affected in patient P10 or P11. This result has been added to Figure S5.

      (30) Figure S6B: The specificity of the anti-DNAH12 antibody against mouse DNAH12 seems to be questionable since the authors detect multiple bands on WB. I recommend doing peptide blocking to show that these are non-specific binding as opposed to off-target binding.

      Thank you for your comments. To design the antibodies, we analyzed the sequence features of the DNAH12 protein and selected amino acids 1-200 as the antigen on the basis of four criteria: (1) high immunogenicity; (2) high hydrophilicity; (3) good surface leakage groups; and (4) sequence homology analysis to avoid nonspecific recognition of other proteins. The two different anti-DNAH12 antibodies were developed with the help of the Dia-An Biotech company in 2022. We attempted to obtain the polypeptide fragments of the target protein for peptide blocking, but the material had been disposed of after the service. Nevertheless, the target DNAH12 band showed a strong signal in western blotting and was not detected in the knockout mice group; likewise, the DNAH12 immunofluorescence signals were strong but absent in the knockout mice group. In addition, we confirmed that the in-house-raised rabbit antibody is suitable for IP experiments: it immunoprecipitated the DNAH12 band from Dnah12<sup>+/+</sup> mice but not from Dnah12<sup>-/-</sup> mice. Collectively, these data support the specificity of the raised DNAH12 antibodies. We appreciate your suggestion and will request the peptide material if we develop new antibodies.

      Reviewer #2 (Recommendations For The Authors):

      Recruitment of DNAH1 and DNALI1 to the flagella is dependent on DNAH12 expression, according to the data. What would be the mechanism that locates DNAH12 which lacks MTBD to the flagella?

      Thank you for your insightful question. We are currently investigating the mechanisms that facilitate the loading of DNAH12 to the flagella. Based on existing data, we hypothesize that CCDC39 and/or CCDC40 may play a critical role in the recruitment of DNAH12 to sperm flagella during spermiogenesis (Nat Genet. 2011, PMID: 21131972; PMCID: PMC3509786; Nat Genet. 2011, PMID: 21131974; PMCID: PMC3132183). Furthermore, a structural study by Walton et al. showed that DNAH12 associates with CCDC39/CCDC40 proteins (Nature. 2023, PMID: 37258679; PMCID: PMC10266980). These findings suggest that CCDC39 and/or CCDC40 may play a role in facilitating the localization of DNAH12 to the flagella. Additional studies are needed to identify other potential factors involved in this process and to further elucidate the mechanisms underlying this complex biological phenomenon.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Regarding the manuscript's clarity, the sentence on page 5, "We also stained VTA sections for Tyrosine hydroxylase (TH) to estimate the rate of ChR2 colocalization with DA neurons," reads awkwardly. Removing the word "rate" could improve clarity.

      We have made the recommended clarifying edit (page 5, lines 30-31).

      Additionally, the anatomical data and findings are largely non-quantitative in nature. However, solid microscopy images are presented to support each claim. Additional quantification would strengthen the paper, specifically the quantification of projection density for each population and the proportion of each subpopulation that projects to their regions of interest.

      To rigorously quantify the projection density of each subpopulation would require a level of exhaustivity our study was not designed for. This is because during microscopy we focused efforts on imaging regions containing dense signals but did not exhaustively image regions receiving apparently weak or no input. While we considered including a semi-quantitative table of projection density, based on the data available we could not discriminate with confidence between, e.g., regions recipient of minimal input versus no input from VTA populations. Thus, while we stand by our descriptive statements we do not expand on those further.

      The authors should consider discussing the possibility that subpopulations of these cells could still be true interneurons especially if cells were looked at the single neuron level of resolution.

      We agree that some of the VTA populations we studied could include subpopulations that are bona fide interneurons. The identification of alternate markers or combinations of markers, or use of single-cell imaging approaches may indeed support this possibility in future. This is discussed in the context of currently available evidence on page 5 lines 32-34, page 11 lines 2-4, page 12 lines 2-11, and page 12 lines 15-16.

      Overall, the paper is well-written and important for the field and beyond.

      Thank you!

      Reviewer #2:

      Weaknesses:

      While the authors use several Cre driver lines to identify GABAergic projection neurons, they then use wild-type mice to show that projection neurons synapse onto neighboring cells within the VTA. This does not seem to lend evidence to the idea that previously described "interneurons" are projection neurons that collateralize within the VTA.

      We think the use of WT mice is a strength because it allows us to measure both GABA and non-GABA synapses made by VTA projections onto the same cells within VTA. However, we have also done this experiment targeting NAc-projecting VTA VGAT-Cre neurons, and VP-projecting VTA MOR-Cre neurons. Consistent with the WT dataset, we find that these defined projection neurons also make intra-VTA synapses. These data are now included as Figure 7.

      More broadly, our review of the literature finds very little evidence to support the notion of a VTA interneuron as we define it: a VTA neuron that makes only local connections. But the absence of evidence need not imply evidence of absence, thus we do not claim that all VTA neurons previously presumed to be interneurons must be projection neurons. We do express confidence in our findings that VTA projection neurons (that include GABA-releasing neurons) make local synapses in VTA. We argue that in the absence of compelling positive evidence for the existence of VTA interneurons, such as a selective marker, we, the field, should not presume their existence.

      Other suggestions:

      (1) While the authors present evidence that some projection neurons also synapse locally, there is no quantification as to the proportion of each neuronal subtype that collateralizes within the VTA. This would be a useful analysis.

      We agree this would be useful information. But our experiments were not designed to answer this question. Indeed, we have not conceived of a feasible method to discriminate between collateralizing and non-collateralizing VTA projection neurons at the single-cell level, thus we do not know how we would calculate such proportions.

      (2) There is significant interest in the molecular heterogeneity and spatial topography of the VTA. Additional analyses of the spatial topography of labeled projectors would be useful. For example, knowing if Pvalb+ projection neurons are distributed throughout the VTA or located along the midline would be a useful analysis.

      Prior studies and public databases (e.g., Allen brain atlas, GENSAT) allow one to visualize the location of VTA neurons positive for Pvalb and the other markers we investigated (Olson & Nestler, 2007). However, these label the entire population of neurons and thereby include those that project to any of the various projection targets. There are also studies that have used retrograde labeling approaches to map the distribution of labeled VTA cells projecting to one or another target (Beier et al., 2015; Lammel et al., 2008; Margolis et al., 2006). For example, neurons projecting to the LHb (a major target of Pvalb+ VTA neurons) are enriched in medial VTA (Root et al., 2014). From this evidence we might infer that Pvalb+ VTA neurons that project to LHb are likely to be medially biased. Future studies may more carefully map the intersection of specific projection targets for each VTA subpopulation.

      Reviewer #3 (Recommendations For The Authors):

      Weaknesses:

      This study has a few modest shortcomings, of which the first is likely addressable with the authors' existing data, while the latter items will likely need to be deferred to future studies:

      (1) Some key anatomical details are difficult to discern from the images shown. In Figure 1, the low-magnification images of the VTA in the first column, while essential for seeing what overall section is being shown, are not of sufficient resolution to distinguish soma from processes. A supplemental figure with higher-resolution images could be helpful.

      We uploaded a higher resolution file for figure 1.

      Also, where are the insets shown in the second column obtained from? There is not a corresponding marked region on the low-magnification images. Is this an oversight, or are these insets obtained from other sections that are not shown?

      This was an oversight; we have added the corresponding marked region to the low-magnification images.

      Lastly, there is a supplemental figure showing the NAc injection sites corresponding to Figure 5, but not one showing VP or PFC injection sites in Figure 6. Why not?

      We added a figure with histology examples for the VP and the PFC injection sites as done for Figure 5, included as Supplemental Figure 3.

      (2) Because multiple ChR2 neurons are activated in the optogenetic experiments, it is not clear how common it is for any specific projection neuron to make local connections. Are the observed synaptic effects driven by just a few neurons making extensive local collateralizations (while other projection neurons do not), or do most VTA projection neurons have local collaterals? I realize this is a complex question that may not have an easy answer.

      This is a great question but, indeed, we don’t know the answer. As mentioned in response to Reviewer #2, we are not convinced there is a currently feasible way to discriminate between collateralizing and non-collateralizing cells at the single cell level.

      (3) There is something of a conceptual disconnect between the early and later portions of this paper. Whereas Figures 1-4 examine forebrain projections of genetic subtypes of VTA neurons, the optogenetic studies do not address genetic subtypes at all. I do realize that is outside of the scope of the author's intent, but it does give the impression of somewhat different (but related) studies being stitched together. For example, the MOR-expressing neurons seem to project strongly to the VP, but it is not addressed whether these are also the ones making local projections. Also, after showing that PV neurons project to the LHb, the opto experiments do not examine the LHb projection target at all.

      This too was raised by Reviewer #2. While addressing this question for all the populations we investigated feels redundant, we now include optogenetic data showing that NAc-projecting VTA VGAT-Cre and VP-projecting VTA MOR-Cre neurons also make local collaterals (Figure 7). We think this allows us to connect the two approaches to a greater degree. Based on our findings using a dual virus approach to express Syn:Ruby in each population of VTA projection neuron, we think it very likely that we’d continue to find similar results using optogenetics-assisted slice electrophysiology for each population.

      Other suggestions:

      (1) I appreciated the extensive and high-quality anatomical figures shown in Figures 2-4. However, the layout was sometimes left-to-right, and sometimes right-to-left, which felt distracting. At some point, the text refers to "Fig. 3KJ", i.e. with the letters being in backward alphabetical order, and Figures 3I and 3L do not appear mentioned anywhere in the main text, leading me to wonder if that text was intended to read "Fig. 3I-L".

      Thank you for noting this. We have harmonized the layout of Figures 2-4 and adjusted the in-text Figure call-outs.

      Also, the inset in Figure 3J appears to show local collaterals of NTS neurons in the VTA, since there is no soma in that inset. This is interesting, and worth reporting, but is not explained in either the main text or Figure legend.

      We added a more complete description in the Results section (page 6, lines 25-30).

      (2) Perhaps I missed it, but I could not find any mention of the intensity of the LED light delivered during the optogenetic experiments. While acknowledging that this can be variable, do the authors have at least a rough range?

      We have added this information to the Methods (page 17, line 8).

      Editor's Note:

      Should you choose to revise your manuscript, please double check that you have fully reported all statistics including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals.

      We confirm that we have fully reported all statistics including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals.

      Note to Editor and Readers

      While reanalyzing our data for resubmission, we discovered that some of the short-latency optogenetically evoked postsynaptic currents (oPSCs) we detected were erroneously categorized. Specifically, some VTA cells that showed large outward currents (oIPSCs) when held at 0 mV also had small inward currents when held at -60 mV. These small inward currents were initially categorized as oEPSCs, suggesting these VTA cells received input from populations of VTA projection neurons that released GABA and/or glutamate. However, the kinetics of these small inward currents were slow and aligned with the within-cell kinetics of the oIPSCs, indicating that they were very likely mediated by GABA-A receptors. In one case the opposite was apparent, with a small PSC initially miscategorized as an oIPSC. These miscategorized oEPSCs and oIPSC were presumably detected because our holding potentials were not precisely identical to the reversal potentials for GABA-A and AMPA receptors, respectively. For this reason, we removed these 14 oEPSCs and 1 oIPSC from our analyses in the revised version. The revised dataset suggests that VTA glutamate projection neurons may be less likely to collateralize widely within VTA compared to GABA projection neurons. Importantly, this correction does not affect any of our conclusions.
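
      The reasoning about holding potentials can be illustrated with a simple ohmic driving-force calculation. This is a minimal sketch, not the analysis used in the study: the conductance and the GABA-A reversal potential below are hypothetical values chosen only to show how a GABA-A current can appear as a large outward current at 0 mV and as a small inward current at -60 mV when the holding potential sits slightly negative to the reversal potential.

```python
def synaptic_current_pa(g_ns, v_hold_mv, e_rev_mv):
    """Ohmic synaptic current I = g * (V_hold - E_rev).

    Units: nS * mV = pA. By convention a negative value is an inward current.
    """
    return g_ns * (v_hold_mv - e_rev_mv)


# Hypothetical illustration values (assumptions, not measurements from the study).
G_GABA_NS = 2.0    # assumed GABA-A conductance
E_GABA_MV = -55.0  # assumed GABA-A reversal, slightly depolarized relative to -60 mV

for v_hold in (0.0, -60.0):
    current = synaptic_current_pa(G_GABA_NS, v_hold, E_GABA_MV)
    direction = "outward" if current > 0 else "inward" if current < 0 else "none"
    print(f"V_hold = {v_hold:6.1f} mV -> I_GABA = {current:7.1f} pA ({direction})")
```

      Under these assumed values, the same conductance yields a current eleven times smaller, and of opposite sign, at -60 mV than at 0 mV, which is the pattern that could be mistaken for a small oEPSC.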

      Citations:

      Beier, K. T., Steinberg, E. E., DeLoach, K. E., Xie, S., Miyamichi, K., Schwarz, L., Gao, X. J., Kremer, E. J., Malenka, R. C., & Luo, L. (2015). Circuit Architecture of VTA Dopamine Neurons Revealed by Systematic Input-Output Mapping. Cell, 162(3), 622-634. https://doi.org/10.1016/j.cell.2015.07.015

      Lammel, S., Hetzel, A., Hackel, O., Jones, I., Liss, B., & Roeper, J. (2008). Unique properties of mesoprefrontal neurons within a dual mesocorticolimbic dopamine system. Neuron, 57(5), 760-773. https://doi.org/10.1016/j.neuron.2008.01.022

      Margolis, E. B., Lock, H., Chefer, V. I., Shippenberg, T. S., Hjelmstad, G. O., & Fields, H. L. (2006). Kappa opioids selectively control dopaminergic neurons projecting to the prefrontal cortex. Proc Natl Acad Sci U S A, 103(8), 2938-2942. https://doi.org/10.1073/pnas.0511159103

      Olson, V. G., & Nestler, E. J. (2007). Topographical organization of GABAergic neurons within the ventral tegmental area of the rat. Synapse, 61(2), 87-95. https://doi.org/10.1002/syn.20345

      Root, D. H., Mejias-Aponte, C. A., Zhang, S., Wang, H. L., Hoffman, A. F., Lupica, C. R., & Morales, M. (2014). Single rodent mesohabenular axons release glutamate and GABA. Nat Neurosci, 17(11), 1543-1551. https://doi.org/10.1038/nn.3823

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This is by far the phylogenetic analysis with the most comprehensive coverage for the Nemacheilidae family in Cobitoidea. It is a much-lauded effort. The conclusions derived using phylogenetic tools coincide with geological events, though not without difficulties (Africa pathway).

      Strengths:

      Comprehensive use of genetic tools

      Weaknesses:

      Lack of more fossil records

      Thank you for appreciating the comprehensiveness of our study.

      We agree that additional nemacheilid fossils would have provided valuable support for reconstructing the evolutionary history of the family. However, the nemacheilid fossil used in our study is currently the only fossil species of the family, which precludes the possibility of including more. To address this limitation, we incorporated fossils from closely related fish families, as well as a geological event, to calibrate the time tree. We have added further details on this point in the “Divergence time estimations and ancestral range reconstruction” section of the Methods. The reconstruction of the pathway by which loaches reached northeast Africa is further complicated by the extensive aridification of the Arabian Peninsula and the Nile valley, which has left no fossil or extant species of Nemacheilidae to provide insights into the distribution of the family during the late Miocene.

      Reviewer #2 (Public review):

      Summary:

      The authors present the results of molecular phylogenetic analysis with very comprehensive samplings including 471 specimens belonging to 250 species, trying to give a holistic reconstruction of the evolutionary history of freshwater fishes (Nemacheilidae) across Eurasia since the early Eocene. This is of great interest to general readers.

      Strengths:

      They provide very vast data and conduct comprehensive analyses. They suggested that Nemacheilidae contain 6 major clades, and the earliest differentiation can be dated to the early Eocene.

      Weaknesses:

      The analysis is incomplete, and the manuscript discussion is not well organized. The authors did not discuss the systematic problems that widely exist. They also did not use the conventional way to discuss the evolutionary process of branches or clades, but just chronologically described the overall history.

      In the revised version, we address the systematic issues within Nemacheilidae in a new paragraph. The polyphyly of the genus Schistura and the polyphyly or paraphyly of many other nemacheilid genera are well-known challenges in ichthyology. However, the large size of the family Nemacheilidae and the absence of a clear basal classification system have made systematic work difficult.

      The chronological description follows the sequence in which the events occurred over time and corresponds with Figure 8. Additionally, a clade-by-clade description would make it challenging to capture the periods before all clades were formed. As a compromise, the revised version includes a new table where each clade is represented by a column, allowing readers to trace the history of each clade in a clear overview. With this table, we make both the chronological and clade-by-clade perspectives available to enhance reader understanding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I have no major comments, except for Figure 8, where the colour code for Sunda is not consistent, appearing as light purple and then dark purple. I was trying to locate the colour legend, maybe include this for all figures or refer to it.

      Figure 8 has been revised to improve matching of the colours.

      Reviewer #2 (Recommendations for the authors):

      (1) It is better to discuss the evolutionary history of the major inner groups. For example, why the Branch A and B differentiated? How are the 6 major clades differentiated?

      As mentioned above, the new table provides an overview of the evolutionary history of the major clades and, where known, the mechanism that led to their differentiation. For branches A and B, the underlying causes of differentiation remain unknown. Currently, the extensive morphological variability within each clade prevents a definitive morphological diagnosis, but such a study is planned for the future.

      (2) In this study, there are still some phylogenetic or systematic problems unresolved. For example, the Genus Schistura remains polyphyletic even in different major clades. The situation is similar for the Genus Triplophysa though not so serious. These need to be discussed or at least partially solved before discussing the evolutionary history.

      We discuss these topics now in a new paragraph ‘Taxonomic implications’.

      (3) In Table S1, what is the meaning of "-". Does this mean no data available? If so, how do the authors treat this in their phylogenetic analysis?

      Indeed, in Table S1, a ‘-’ indicates that no sequence was available for the given species and gene. In the phylogenetic analyses, these cases were treated as missing data.
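
      One common way to treat unavailable sequences as missing data in a concatenated analysis is to pad the absent gene with a missing-data symbol of the same length as that gene's alignment. The sketch below illustrates this general convention only; the response does not state how the missing entries were encoded, and the function name, taxon labels, gene labels, and the choice of "?" are all hypothetical.

```python
def concatenate_with_missing(gene_alignments, taxa, missing_char="?"):
    """Concatenate per-gene alignments into one sequence per taxon.

    gene_alignments: dict mapping gene name -> dict mapping taxon -> aligned sequence.
    A taxon with no sequence for a gene receives a run of `missing_char`
    as long as that gene's alignment.
    """
    parts = {taxon: [] for taxon in taxa}
    for gene, seqs in gene_alignments.items():
        lengths = {len(seq) for seq in seqs.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        (length,) = lengths
        for taxon in taxa:
            # Fall back to a block of missing-data symbols when no sequence exists.
            parts[taxon].append(seqs.get(taxon, missing_char * length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical toy input: species_B lacks a sequence for gene2 (a '-' in the table).
toy = {
    "gene1": {"species_A": "ACGT", "species_B": "ACGA"},
    "gene2": {"species_A": "GGC"},
}
matrix = concatenate_with_missing(toy, ["species_A", "species_B"])
for taxon, seq in matrix.items():
    print(taxon, seq)
```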

      (4) What is the source of Figure 8? There are different opinions on the geological events. The authors need to indicate the source of their information.

      The sources of Fig. 8 are now provided in the figure caption.

      (5) The Eastern Clade forms continuous distribution in Figure 6, but discontinuous in Figure 8. Is this correct?

      Figure 6 does not display the distribution areas for the clades, but illustrates the biogeographic regions used in the biogeographic analysis.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Recommendations for the authors):

      A good number of sentences in the introduction, page two, refer to a figure, 'Fig. 2a', which appears to be the copy-paste effect of these sentences from another location (please see below):

      "Notably, SPHK2 does not directly contribute to levels of secreted S1P (Thuy et al., 2022), nor is it annotated in the chick genome. S1P can be exported from cells by a transporter (MFSD2A and SPNS2) or converted to sphingosine by a phosphatase (SGPP1) (Fig. 2a). Levels of sphingosine are increased by ASAH1 by conversion of ceramide or decreased by CERS2/5/6 by conversion to ceramide (Fig. 2a). S1P is known to activate G-protein coupled receptors, S1PR1 through S1PR5 (Fig. 2a). S1PRs are known to activate different cell signaling pathways including MAPK and PI3K/mTor, and crosstalk with pro-inflammatory pathways such as NFκB (Fig. 2a) (Hu et al., 2020)."

      We have removed references to Fig. 2a, which was from a previous draft of this manuscript.

      Please correct the typo in the following sentence (Fid.)

      "S1PR1 was most prominently expressed by resting MG and MG returning to a resting state, whereas S1PR3 was detected in relatively few scattered cells in clusters of MG, ganglion cells, horizontal cells, bipolar cells, amacrine cells, photoreceptors, oligodendrocytes, microglia and NIRG cells (Fid. 1d).

      We have corrected this typo.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weaknesses: 

      It is not always clear what the novel findings are that this manuscript is presenting. It appears to be largely similar to the analysis done by McKey et al. (2022) but with more time points and molecular markers. The novelty of the present study's findings needs to be better articulated. 

      The previous study focused on placing the Rete Ovarii in the context of ovarian development. The current study focuses on the novel finding that the EOR is an active structure that sends fluid/information to the ovary. We show this by characterizing the presence of secretory proteins in the RO epithelial cells, by dye injections into the EOR and observing transport of the dye to the ovary, and by collection of EOR fluid followed by proteomic analysis. We also show that the RO is embedded in an elaborate vascular network and contacted by neurons. None of these data were discussed in the McKey 2022 paper.

      Reviewer #2 (Public Review):

      Clarifications: 

      (1) Is there any comparative data on the proteomics of RO and rete testis in early development? With some molecular markers also derived from rete testis, it would be better to provide the data or references.

      To the best of our knowledge, there are no available proteomic datasets of the embryonic or early postnatal mouse Rete Testis or Epididymis. The authors agree that having this information would be very useful. 

      (2) Although the size of RO and its components is quite small and difficult to operate, the researchers in this article had already been able to perform intracavitary injection of EOR and extract EOR or CR for mass spectrometry analysis. Therefore, can EOR, CR, or IOR be damaged or removed, providing further strong evidence of ovarian development function?

      We attempted to genetically ablate the RO by expressing the diphtheria toxin receptor (DTR) in RO cells and adding DT. This approach was not successful in ablating the RO. We also tried to use Pax2/8 homo- and heterozygous mutants for ablation (as used in the McKey 2022 paper), but so far, we cannot find a genetic combination that ablates the RO, but not the oviduct, uterus and/or kidneys. We have also embarked on a study to surgically remove the RO. This assay is taking some time to optimize. The goal of the current study was to characterize the cells along the length of the RO and to present evidence that it is a secretory appendage of the ovary.

      (3) Although IOR is shown on the schematic diagram, it cannot be observed in the immunohistochemistry pictures in Figure 1 and Figure 3. The authors should provide a detailed explanation.

      An annotation has been added to Figure 1 to indicate the IOR. As the images within the panels are of maximum intensity projections, it is often difficult to clearly see the IOR as it is deeper within the ovary. In Figure 3, the view of the ovary is from the ventral side:  this view does not allow for clear visualization of the IOR.

      Reviewer #3 (Public Review):

      Weaknesses: 

      There is a lack of conclusive data supporting many conclusions in the manuscript. Therefore, the paper's overall conclusions should be moderated until functional validations are conducted.

      We have moderated the conclusions where appropriate.

      Reviewer #1 (Recommendations For The Authors):

      (1) The introduction is relatively brief and does not mention some historical data/hypotheses on the role of the RO in ovarian function (e.g. regulation of meiotic entry) or development (e.g. Mayère et al., 2022).

      Mayère et al. (2022) was cited in line 57. Stein's hypothesis about entry into meiosis has been added at line 58.

      (2) L82-84: It is stated that KRT8 was first identified as a potential RO marker by sc/snRNAseq (Anbarci et al., 2023) and then validated in this manuscript. However, KRT8 was used by McKey et al. (2022) as a RO marker, and they noted there that KRT8 was enriched in the EOR. It is not clear why McKey et al. is not cited as the primary reference validating KRT8 as an EOR marker.

      The embryonic and neonatal time course of KRT8 expression is first described in this paper; McKey et al. (2022) highlight KRT8 only at E18.5. A reference has been added at line 85 to address this.

      (3) Figure 1: Can the IOR be seen in these images? If so, please label. 

      The label has been added.

      (4) L107: It is hypothesized that "the RO may respond to or interpret homeostatic cues." Can transcriptomics data shed light on what signals the RO may be capable of responding to? E.g. what receptors are expressed by cells of the RO (e.g. ER, LHCGR, FSHR)?

      The RO expresses ESR1, PGR, INSR, and IGF1R. The IOR exclusively expresses LHCGR and FSHR. This has been added to the manuscript at line 309.

      (5) L152: Mass spec was used to identify proteins secreted into the lumen of the RO. These proteins were then compared to the mammalian secretome to filter out possible nonsecreted protein contaminants. Finally, the candidates were compared to the RO scRNAseq data from Anbarci et al., (2023). This method gives a very conservative candidate list. However, it may also be informative to compare the sc/snRNA-seq gene list directly to the secretome to ID other possible candidate-secreted proteins that may not have been detected in the mass spec data set. 

      This is a good suggestion for future analysis. However, quite a number of proteins annotated as secreted are not necessarily being actively secreted. For the current study we wanted to take a more conservative approach, and chose to use proteomics to determine which proteins are actively secreted.

      (6) L195: It is not clear if IGFBP2 is expressed by both OR and granulosa cells or only granulosa cells. It would be informative to know what ovarian cell types express both IGFBP2 and IGF1R (e.g. from sc/snRNA-seq)? This information is referenced in the discussion (L285-287) but would be better to reference it in the results section for clarity.

      Both RO and granulosa cells express IGFBP2 and IGF1R. A sentence has been added to the Results for clarity (line 197).

      (7) L295: "...the RO participates in endocrine signaling..." might be more accurate to say "...the RO responds to endocrine signaling...".

      We agree that this statement is more accurate, and the change has been made.

      Reviewer #3 (Recommendations For The Authors): 

      Several issues significantly affect the paper's quality in the current version. Firstly, there is a lack of conclusive data supporting many conclusions in the manuscript. For instance, the assertion in line 105 that "EOR was directly innervated by neurons" lacks substantial evidence beyond basic immunofluorescent staining. 

      We agree that the term “innervated” might be a step too far since we rely on IF evidence.  We changed the wording of this sentence to say, “The EOR was directly contacted by neurons”.

      In another pivotal experiment illustrated in Figure 3, the provided images lack temporal continuity and quantitative analysis, suggesting the incorporation of time-lapse imaging for improved sequential presentation in Figure 3.

      The microscope on which we can perform injections cannot record movies. We have tried moving the rete to another microscope after injection, but so far, we have been unable to capture dextran moving through the RO. We therefore believe that transport is rapid, but future experiments will be needed to optimize this imaging.

      Moreover, relying solely on proteomics analysis, as seen in lines 188-189, makes it challenging to assert conclusions such as "EOR actively secretes proteins." Therefore, the paper's overall conclusions should be moderated until functional validations are conducted. 

      The findings that (1) the cells of the EOR express SNARE complex proteins at their apical surfaces and (2) luminal fluid expelled from the EOR contains abundant secreted proteins strongly suggest that the RO is involved in active secretion. We use the word “suggest” in this sentence (lines 188-189) because we realize that further experiments should be done to validate this conclusion.

      Furthermore, the predominant methods in this study involve immunostaining and imaging. However, the current images exhibit a notable inconsistency in color definitions for different markers by the authors. For instance, in Figure 2.A/C, PAX8 is portrayed as cyan, while in D, it is represented in yellow. Similarly, in Figure 4, E-CAD is depicted using both cyan and yellow. Utilizing different colors for the same protein within a figure can significantly confuse readers' interpretation of the experiments. Rectifying these inconsistencies is essential to enhance the clarity and comprehension of the experimental results.

      These colors were chosen to be visible to those with color vision impairments. We typically used cyan and magenta to emphasize the most important markers in the image. E-Cad and KRT8 were often used either to emphasize or to landmark a structure through the localization of these proteins. When KRT8 and E-Cad were highlighted, they were represented in cyan and magenta for visibility. When these proteins were used as a landmark to orient the reader and not as the main point, they were labeled in yellow.

      At last, many markers in this study are derived from bulk and single-cell sequencing of developing RO. However, it seems that these important data were separated into another paper as a preprint. If this data were incorporated into the current manuscript, the manuscript would become more comprehensive for guiding future research on the RO.

      Since we have single cell and single nuclei data from fetal and adult estrus and metestrus stages, we found that incorporating all this data into the present manuscript was overwhelming. Instead, we devoted another manuscript to presenting and validating that data. We believe a quick look at the sequencing manuscript will make this clear.

    1. Author response:

      We appreciate the reviewers’ thoughtful and constructive feedback, which has provided valuable insights to refine our manuscript. Below, we outline the planned revisions in response to the public reviews.

      Response to Reviewer #1

      We are grateful for the reviewer’s recognition of our methodological approach and the potential significance of CD47 as a novel MSC marker for cartilage repair. To address the concerns raised:

      (1) Clarifying the proteomics data supporting CD47 as an MSC marker

      · The manuscript will be revised to clearly indicate where the proteomics data demonstrate elevated CD47 expression in MSCs compared to non-MSCs.

      · Additional figure annotations or a supplemental figure may be included to enhance clarity.

      (2) Providing further details on CD47hi and CD47lo MSC populations

      · Information on the number of isolated CD47hi and CD47lo cells, along with any necessary expansion steps before in vivo use, will be explicitly detailed.

      (3) Expanding the characterization of CD47hi MSCs in vitro

      · A more comprehensive analysis of the chondrogenic differentiation capacity of CD47hi MSCs will be incorporated to strengthen the findings.

      (4) Clarifying experimental details of the in vivo rat OA model

      · The methodology section will be updated to specify the number of injected cells and their labeling strategies.

      · Representative histological images will be added to support the results.

      · To further substantiate the cartilage repair potential of CD47hi MSCs, additional staining for Collagen Type II will be included alongside Sox9 expression.

      Response to Reviewer #2

      We appreciate the reviewer’s enthusiasm for the study and recognition of its rigor and translational significance. The following revisions are planned to address the feedback:

      (1) Addressing additional assessments for OA phenotype in the rat model

      · While this study primarily relied on histology, the limitations of this approach will be acknowledged in the discussion.

      · The absence of microCT and behavioral assessments will be explained, with suggestions for incorporating these methods in future studies.

      (2) Justifying the focus on CD47

      · The rationale behind prioritizing CD47 over other proteomics-identified markers will be expanded to provide better context for this choice.

      (3) Clarifying MSC engraftment patterns

      · The manuscript will include a discussion on whether CD47hi MSCs specifically engraft in articular cartilage or contribute to ectopic cartilage formation (e.g., osteophytes).

      (4) Contextualizing findings within recent research on synovial progenitors

      · Additional discussion will highlight recent studies on DPP4+ PI16+ CD34+ stromal cells and how the identified MSC populations may relate to these universal fibroblasts.

      We are confident that these revisions will strengthen the manuscript and enhance its clarity and impact. The reviewers’ insights have been invaluable, and we look forward to refining the study accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      While CRISPR/Cas technology has greatly facilitated the ability to perform precise genome edits in Leishmania spp., the lack of a non-homologous DNA end-joining (NHEJ) pathway in Leishmania has prevented researchers from performing large-scale Cas-based perturbation screens. With the introduction of base editing technology to the Leishmania field, the Beneke lab has begun to address this challenge (Engstler and Beneke, 2023).

      In this study, the authors build on their previously published protocols and develop a strategy that:

      (1) allows for very high editing efficiency. The cell editing frequency of 1 edit per 70 cells reported in this study represents a 400-fold improvement over the previously published protocol,

      (2) reduces the negative effects of high sgRNA levels on parasite growth by using a weaker T7 promoter to drive sgRNA transcription.

      The combination of these two improvements should open the door to exciting large-scale screens and thus be of great interest to researchers working with Leishmania and beyond.

      We thank reviewer #1 for these encouraging comments.

      Reviewer #2 (Public Review):

      Summary:

      Previously, the authors published a Leishmania cytosine base editor (CBE) genetic tool that enables the generation of functionally null mutants. This works by utilising a CAS9-cytidine deaminase variant that is targeted to a genetic locus by a small guide RNA (sgRNA) and causes cytosine to thymine conversion. This has the potential to generate a premature stop codon and therefore a loss of function mutant.
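
      The stop-codon logic summarized above can be sketched in a few lines. This is an illustrative simplification, not the authors' sgRNA design pipeline: it only scans a coding sequence for codons that a C-to-T change could turn into a stop codon, either on the coding strand (CAA, CAG, CGA) or on the opposite strand (read as G-to-A on the coding strand, TGG). It deliberately ignores PAM requirements, the editing window, and sgRNA scoring, and all function names are hypothetical.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def editable_to_stop(codon, target, replacement):
    """Return True if replacing one or more `target` bases yields a stop codon."""
    positions = [i for i, base in enumerate(codon) if base == target]
    for mask in product((False, True), repeat=len(positions)):
        if not any(mask):
            continue  # skip the unedited codon
        edited = list(codon)
        for pos, flip in zip(positions, mask):
            if flip:
                edited[pos] = replacement
        if "".join(edited) in STOP_CODONS:
            return True
    return False


def find_stop_sites(cds):
    """List (codon_index, codon, strand) for in-frame codons convertible to a stop.

    'coding' means C-to-T on the coding strand; 'template' means C-to-T on the
    opposite strand, which appears as G-to-A on the coding strand.
    """
    cds = cds.upper()
    sites = []
    for start in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[start:start + 3]
        if codon in STOP_CODONS:
            continue
        if editable_to_stop(codon, "C", "T"):
            sites.append((start // 3, codon, "coding"))
        elif editable_to_stop(codon, "G", "A"):
            sites.append((start // 3, codon, "template"))
    return sites


# Hypothetical toy coding sequence: ATG CAG TGG AAA CGA TAA
for site in find_stop_sites("ATGCAGTGGAAACGATAA"):
    print(site)
```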

      CBE has advantages over existing CAS-based knockout tools because it allows the targeting of multicopy gene families and, potentially, the easier generation of pooled loss of function mutants in complex population experiments. Although successful, the first generation of this genetic tool had several limitations that may have prevented its wider adoption, especially in complex genome-wide screens. These include nonspecific toxicity of the sgRNAs, low transfection efficiencies, low editing efficiencies, a proportion of transfectants that express multiple different sgRNAs, and insufficient effectivity in some Leishmania species.

      Here, the authors set out to systematically solve each of these limitations. By trialling different transfection conditions and different CAS12a cut sites to promote sgRNA expression cassette integration, they increase the transfection efficiency 400-fold and ensure that only a single sgRNA expression cassette integrates that edits with high efficiencies. By trialling different T7 promoters, they significantly reduce the non-specific toxicity of sgRNA expression whilst retaining high editing efficiencies in several Leishmania species (Leishmania major, L. mexicana and L. donovani). By improving the sgRNA design, the authors predict that null mutants will be more efficiently produced after editing.

      This tool will find adoption for producing null mutants of single-copy genes, multicopy gene families, and potentially genome-wide mutational analyses.

      Strengths:

      This is an impressive and thorough study that significantly improves the previous iteration of the CBE. The approach is careful and systematic and reflects the authors' excellent experience developing CRISPR tools. The quality of data and analysis is high and data are clearly presented.

      Weaknesses:

      Figure 4 shows that editing of PF16 is 'reversed' between day 6 and day 16 in L. mexicana WTpTB107 cells. The authors reasonably conclude that in drug-selected cells there is a mixed population of edited and non-edited cells, possibly due to mis-integration of the sgRNA expression construct, and non-edited cells outcompete edited cells due to a growth defect in PF16 loss of function mutants. However, this suggests that the CBE tool will not work well for producing mutants with strong fitness phenotypes without incorporating a limiting dilution cloning step (at least in L. mexicana and quite possibly other Leishmania species). Furthermore, it suggests it will not be possible to incorporate genes associated with a growth defect into a pooled drop-out screen as described in the paper. This issue is not well explored in the paper and the authors have not validated their tool on a gene associated with a severe growth defect, or shown that their tool works in a mixed population setting.

      We would like to thank reviewer #2 for this helpful comment and valid point. We have now included a small-scale loss-of-function screen in L. mexicana, targeting nine known essential genes with 24 CBE sgRNAs and 15 non-targeting control sgRNAs. This approach successfully detected all of the included known growth-associated phenotypes in a pooled screening format. This experiment is now shown in Figure 5 and described in the section “Detection of fitness-associated phenotypes in a pooled loss-of-function screen”.

      In addition, we would like to re-iterate our initial public response to this comment. We believe that escapes or reversals of mutant phenotypes can also be observed with other genetic tools used for loss-of-function screening, including lentiviral CRISPR approaches in mammalian systems and RNAi in Trypanosoma brucei (e.g. Ariyanayagam et al., 2005 and Schlecker et al., 2005). Notably, in lentivirally delivered CRISPR screens, sgRNA expression cassettes are integrated at random places within the genome, and multiple cassettes can be integrated depending on the viral titre. In these types of screens, cells can escape phenotypes through various mechanisms, such as promoter silencing or selection of non-deleterious mutations. Additionally, not every CRISPR guide is efficient in generating a mutant phenotype, and RNAi constructs can also vary in their effectiveness. Despite these challenges, genome-wide loss-of-function screens have been successfully carried out in mammalian cells and Trypanosoma parasites. Therefore, we believe that the observed escape of one mutant phenotype does not preclude the detection of growth-associated or other phenotypes in pooled screens. Moreover, we did not observe a reversal of the mutant phenotype in L. mexicana, L. donovani, and L. major parasites expressing tdTomato from an expression cassette integrated into the 18S rRNA SSU locus (Figure 4). Our newly included small-scale fitness screen (Figure 5) confirms these assumptions and shows that we can detect “strong” growth-associated phenotypes. We would also like to point out that we have recently successfully conducted several genome-wide loss-of-function screens in vivo and in vitro, ultimately confirming the feasibility of this type of screen on a genome-wide scale (manuscript in preparation).

      We have included a discussion of these points under section “Integration of CBE sgRNA expression cassettes via AsCas12a ultra-introduced DSBs increase editing rates” and section “Detection of fitness-associated phenotypes in a pooled loss-of-function screen” in our revised manuscript.

      Although welcome, the improvements to the crRNA CBE design tool are hypothetical and untested.

      We agree that the improvements to the CBE sgRNA design are currently hypothetical. We plan to systematically test our guide design principles in future studies. Since this will require testing hundreds of guides to draw robust conclusions, we believe that this aspect is beyond the scope of the current study. In section “Improved CBE sgRNA design to prioritize edits resulting only in STOP codons” of our revised manuscript we now discuss these future plans.

      The Sanger and Oxford Nanopore Technology analyses of the integration sites of the sgRNA expression cassette will not detect the mis-integration of the sgRNA expression construct into an entirely different locus.

      We have now re-analysed our ONT data and have extracted all ONT contigs that match the CBE sgRNA expression cassette. All extracted contigs align to the 18S rRNA SSU locus, showing integration of the cassette into this locus. It is important to note that here a population was sequenced and not a clone. Despite this, no contigs could be found that would link the CBE sgRNA expression cassettes to another locus. This is now shown in Figure 4 S2 and described in section “Cas12a-mediated DSB ensures the integration of one CBE sgRNA per L. mexicana transfectant”.

      Reviewer #3 (Public Review):

      Genetic manipulation of Leishmania has some challenges, including some limitations in the DNA repair strategies that are present in the organism and the absence of RNA interference in many species. The senior author has contributed significantly to expanding the available routes towards Leishmania genetic manipulation by developing and adapting CRISPR-Cas9 tools to allow gene manipulation via DNA double-strand break repair and, more recently, base modification. This work seeks to improve on some limitations in the tools previously described for the latter approach of base modification leading to base change.

      The work in the paper is meticulously described, with solid evidence for most of the improvements that are claimed: Figure 1 clearly describes reduced impairment in the growth of parasites expressing sgRNAs via changes in promoters; Figures 2 and 3 compellingly document the usefulness of using AsCas12a for integration after transformation; and Figures 1 and 4 demonstrate the capacity of the combined modifications to efficiently edit a gene in three different Leishmania species. There is little doubt these new tools will be adopted by the Leishmania community, adding to the growing arsenal of approaches for genetic manipulation.

      There are two weaknesses the authors may wish to address, one smaller and one larger.

      (1) The main advance claimed here is in this section title: 'Integration of CBE sgRNA expression cassettes via AsCas12a ultra-introduced DSBs increase editing rates', with the evidence for this presented in Figure 4. It is hard work in the submission to discern what direct evidence there is for editing rates being improved relative to earlier, Cas9-based approaches. Did they directly compare the editing by the new and old approach? If not, can they more clearly explain how they are able to make this claim, either by adding text or a new figure? A side-by-side comparison would emphasise the advance of the new approach more clearly.

      We would like to thank reviewer #3 for this helpful comment. We have directly compared our improved method to our previous base editing method in Figures 1E and 4, demonstrating higher editing rates in a much shorter time. In particular, the L. major panel in Figure 4B shows that, in a direct comparison between the previously published system (Engstler and Beneke, eLife 2023) and the new system presented here, editing can only be observed with the new version. However, to clarify the improvements we made, we now compare data from our previous screen in Engstler and Beneke, eLife 2023 with a loss-of-function screen carried out with our updated method (see Figure 5 and section “Detection of fitness-associated phenotypes in a pooled loss-of-function screen”).

      In addition, we feel that our title might have been misleading in the sense that it implied that Cas12a editing is more efficient than other Cas9-based approaches, which is not something we want to state here. Given that we have now included a small-scale CRISPR screen and that we generally show improved base editing compared to our previous method (lower toxicity, more editing in a shorter time, higher transfection rates and less species-specific variation), we have rephrased our title to: “Improved base editing and functional screening in Leishmania via co-expression of the AsCas12a ultra, a T7 RNA Polymerase, and a cytosine base editor”.

      (2) The ultimate, stated goal of this work is (abstract) to 'enable a variety of loss-of-function screens', as the older approach had some limitations. This goal is not tested for the new tools that have been developed here; the experiment in Figure 5 merely shows that they can, not unexpectedly, make a gene mutant, which was already possible with available tools. Thus, to what extent is this paper describing a step forward? Why have the authors not run an experiment - even the same one that was described previously in Engstler and Beneke (2023) - to show that the new approach improves on previous tools in such a screen, either in scale or accuracy?

      We have now included a small-scale loss-of-function screen in L. mexicana, targeting nine known essential genes with 24 CBE sgRNAs and 15 non-targeting control sgRNAs. This approach successfully detected all of the known growth-associated phenotypes included, in a pooled screening format. This experiment is now shown in Figure 5 and described in section “Detection of fitness-associated phenotypes in a pooled loss-of-function screen”. We believe that this underscores the claims made here and that our updated toolbox will indeed enable a variety of loss-of-function screens.

      As pointed out in our response to reviewer #2, we have recently successfully conducted several genome-wide loss-of-function screens in vivo and in vitro, ultimately confirming the feasibility of this type of screen on a genome-wide scale (manuscript in preparation). Without the improvements presented here, such as the higher transfection and base editing rates, these genome-wide screens could not have been carried out.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I would like to compliment Tom Beneke and his lab on their continued efforts to develop tools to facilitate genome editing in Leishmania.

      I have no doubt that the toolkit presented in this study will be very useful for the community. The submitted paper is very well written and contains all the necessary controls to support the authors' claims. There is only one point that left me a bit concerned if this strategy is to be used for large-scale screens, and that is the potential for integration of multiple sgRNA expression cassettes in a single cell.

      We would like to thank reviewer #1 for the helpful comments. We have addressed the major concern raised by including a small-scale loss-of-function screen in our revised manuscript. By targeting nine known essential genes with 24 CBE sgRNAs and 15 non-targeting control sgRNAs, this approach successfully detected growth-associated phenotypes in a pooled format (see section “Detection of fitness-associated phenotypes in a pooled loss-of-function screen” and Figure 5). Regarding the point of multiple sgRNA expression cassette integration, please see our response to the next comment below.

      Major points:

      Integration of multiple sgRNA expression cassettes:

      While Illumina-based gDNA-seq is well suited to determine changes in ploidy, I don't think it is sensitive enough to draw conclusions about possible double integration in a small percentage of cells. In fact, the data shown in Figure 4 S1D show a normalized coverage >1.5 for sgRNA cassette and NeoR, suggesting that they may have integrated >1 times in some cells.

      To verify that the integration of the CBE sgRNA expression cassette is specific, we have re-analysed our ONT results and confirmed that the only ONT contigs detected are those that link the CBE sgRNA expression cassette to the 18S rRNA locus. No other integration sites can be found. We also do not detect any contigs containing multiple CBE sgRNA expression cassettes. This is now shown in Figure 4 S2 and described in section “Cas12a-mediated DSB ensures the integration of one CBE sgRNA per L. mexicana transfectant”.

      Nevertheless, it is a valid concern that the sequencing depth is not sufficient to detect a small percentage of cells that have integrated the CBE sgRNA expression cassette multiple times. However, we would also like to make the point that such a small percentage of cells within a screen is unlikely to be relevant, and we have therefore added a small-scale pooled loss-of-function screen, targeting essential genes, to the manuscript (see new Figure 5) to support our claim. If the integration of multiple sgRNAs into one cell had any measurable combinatorial effect, the non-targeting controls in our screen would have been depleted as well. However, there is no detectable difference between the 15 controls included in our small-scale screen.

      We have addressed all points in sections “Cas12a-mediated DSB ensures the integration of one CBE sgRNA per L. mexicana transfectant” and “Detection of fitness-associated phenotypes in a pooled loss-of-function screen”.

      To avoid double integration, wouldn't it be easiest to just create an allele-specific "landing pad" on one chromosome? I believe that a double integration rate of ~20% could severely complicate the analysis of any large-scale screen later on.

      We thank the reviewer for this suggestion. We did try an allele-specific "landing pad" and already described this in our first manuscript version (see section “DSBs introduced by AsCas12a ultra increase integration rates of donor DNA constructs”). Specifically, we integrated CBE sgRNA expression cassettes into the neomycin resistance marker contained in the tdTomato expression cassette (Figure 2 S1D, Cas12a crRNA-5 and 6), but this resulted in lower transfection rates (Figure 2F: crRNA-5 1 in ~47,000; crRNA-6 1 in ~32,000) than when using a Cas12a crRNA that targets the 18S rRNA locus directly (Figure 2F: crRNA-4 1 in ~2,000). As we believe a high transfection rate is key for pooled large-scale screens, we pursued further experiments with crRNA-4. However, since a different crRNA can easily be selected for our tool, simply by changing the Cas12a crRNA during transfection, users can choose a different integration site or other “landing pads” if they want to. We have updated section “Cas12a-mediated DSB ensures the integration of one CBE sgRNA per L. mexicana transfectant” to clarify these details.

      Also, it is not clear to me how the integration of tdTomato could affect the integration of the sgRNA expression cassette 400 bp downstream.

      As stated above, our ONT data clearly show that we can only see integration into one locus (Figure 4 S1 and S2). Given that the recognition site of crRNA-4 is contained in the homology flank used to integrate tdTomato into the 18S rRNA locus, this may contribute to the effect we observe. But since the homology sequences match the original sequences within the locus, the reasons why this affects integration of the CBE sgRNA expression cassettes also remain elusive to us. We now discuss this in more detail in the section “Cas12a-mediated DSB ensures the integration of one CBE sgRNA per L. mexicana transfectant”.

      Data accessibility:

      The Illumina and ONT data should be made publicly available.

      ONT and Illumina fastq reads are now available at the European Nucleotide Archive (ENA Accession Number: PRJEB83088).

      Minor point:

      Line 30: It would be easier for readers if the authors could briefly explain what bar-seq is.

      We have added more details: “[…] and bar-seq screens, which involve individually deleting, barcoding, and pooling mutants for analysis, have facilitated […]”.

      Lines 114, 120: I think the authors are referring to Figures 1E and F, not Figures 2E and F.

      Many thanks for picking this up, we have corrected the Figure reference.

      Reviewer #2 (Recommendations For The Authors):

      This has the potential to be a valuable tool for the community if it is efficiently distributed. If the authors have not yet done so they should make their plasmids available to the community via Addgene.

      We have started the deposit process with Addgene and plasmids will be available soon. In the meantime, all plasmid maps are available on our website www.leishbaseedit.net and can be requested for shipment from our lab.

      Lines 162-165, 400-401: The potential for using AsCas12a's intrinsic RNase activity for "multiplexing" would benefit from a little more explanation (i.e. how this would work, and what multiplexing means in this context).

      We have added further details on multiplexing with Cas12a and point out potential applications.

      “For example, Cas12a crRNA arrays with four or more guides can be assembled and transfected to introduce multiple DSBs within one gene. Since Cas12a generates sticky DNA ends that facilitate recombination via microhomology-mediated end joining and homologous recombination (Zhang et al., 2021), this approach could effectively disrupt target genes without requiring the addition of donor DNA and this may provide an alternative approach to our here presented base editing method in the future. Moreover, CBE sgRNAs could be multiplexed by interspacing them with Cas12a direct repeats (DRs), enabling simultaneous targeting of multiple genes in one cell.”

      Line 193-194: can the authors offer an explanation for the reduction in mNG editing observed with 30nt homology flanks?

      We assume this is caused by imprecise recombination events in some cells and have revised the original sentence.

      In several places in the manuscript, it is unclear if an analysis has been done on an individual clone or a population derived from multiple transfected cells. If on mixed population, clarify this and calculate the number of clones that the mixture represents. E.g. lines 195-196 and 221-223 (Sanger sequencing of integration site); Line 333-352 (ONT analysis of CBE expression cassette integration).

      We generated and analysed clones only when testing whether multiple CBE sgRNAs are integrated (Figure 4 S3). In all other experiments we analysed parasite populations. For clarity, we have indicated this in the revised manuscript wherever possible (e.g. at the lines requested).

      Line 259: "site by site" should presumably be "side by side".

      Many thanks for pointing this out. We have corrected this typo.

      Lines 315-317: Clarify why the mis-integration of the CBE sgRNA expression cassette might cause a lack of editing (e.g. lack of expression?).

      We have added: “This could potentially result in the silencing of the CBE sgRNA expression or even lead to the deletion of the guide cassette”

      Line 364 - 367: it is unlikely there is the statistical power to state that 2/10 represents lower than the previously observed 38% of double integrants.

      We agree that the statistical power is low and have therefore changed our phrasing to an overall estimation.
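      The reviewer's point about statistical power can be checked with a simple one-sided binomial calculation. The sketch below is illustrative only and is not the authors' analysis: it takes the 2/10 and 38% figures from the comment above and assumes that the ten clones are independent. The helper name `binom_cdf` is introduced here for illustration.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Figures quoted in the review: 2 double integrants among 10 clones,
# compared with a previously observed double-integration rate of 38%.
observed, clones, previous_rate = 2, 10, 0.38

# Probability of seeing 2 or fewer double integrants if the true rate
# were still 38% (assumption: clones are independent draws).
p_lower_tail = binom_cdf(observed, clones, previous_rate)
print(f"P(X <= {observed} | n={clones}, p={previous_rate}) = {p_lower_tail:.3f}")
```

      Under these assumptions the lower-tail probability is roughly 0.2, so an outcome of 2/10 would not be unusual even if the true rate were unchanged, which is consistent with rephrasing the result as an overall estimate.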

      Reviewer #3 (Recommendations For The Authors):

      I suggest that the authors make clearer to the reader the evidence for improved editing efficiency in the new CBE system described here relative to the system described in Engstler and Beneke, 2023. Such clarification could be as simple as an extra paragraph or figure, clearly comparing the editing rates with the two systems in, as far as possible, equivalent conditions.

      We have directly compared our improved method to our previous base editing method in Figures 1E and 4, demonstrating higher editing rates in a much shorter time. In particular, the L. major panel in Figure 4B shows that, in a direct comparison between the previously published system (Engstler and Beneke, eLife 2023) and the new system, editing can only be observed with the version presented here. However, to clarify the improvements we made, we now compare data from our previous screen in Engstler and Beneke, eLife 2023 with a loss-of-function screen carried out with our updated method (see Figure 5 and section “Detection of fitness-associated phenotypes in a pooled loss-of-function screen”).

      The significance of this work would be improved by running the type of loss of fitness screen described previously in Engstler and Beneke (2023), thereby showing that the new approach improves on previous tools. Without such data, questions remain about potential confounding effects that might not be anticipated from the targeted experiments provided in the current manuscript.

      We thank the reviewer for this suggestion. The requested experiment is now presented in Figure 5 and described in section “Detection of fitness-associated phenotypes in a pooled loss-of-function screen”.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment

      This important study provides empirical evidence of the effects of genetic diversity and species diversity on ecosystem functions across multi-trophic levels in an aquatic ecosystem. The support for these findings is solid, but a more nuanced interpretation of the results could make the conclusions more convincing. The work will be of interest to ecologists working on multi-trophic relationships and biodiversity.

      Thanks for this new assessment. Below we reply to the comments that you and the reviewer have made. We understand the criticisms related to the interpretation of causal relationships from observational data. We have now added an entire paragraph (the second paragraph of the Discussion) that explicitly calls for a cautious interpretation of our results. We have also tried to refrain from using certain words (e.g., “we demonstrate”) where we think it is hard to draw firm conclusions. This is a tricky exercise: on the one hand, we gathered a large and strong database (which had been underlined by the reviewers) that should strengthen statistical inferences, but on the other hand, the inferences we’ve made are based on observational data, which obviously come with biases (even if partially controlled statistically). We hope that you’ll find that our additions strike the right balance between a strong dataset and a fragile interpretation.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work used a comprehensive dataset to compare the effects of species diversity and genetic diversity within each trophic level and across three trophic levels. The results stated that species diversity had negative effects on ecosystem functions, while genetic diversity had positive effects. Additionally, these effects were observed only within each trophic level and not across the three trophic levels studied. Although the effects of biodiversity, especially genetic diversity across multi-trophic levels, have been shown to be important, there are still very few empirical studies on this topic due to the complex relationships and difficulty in obtaining data. This study collected an excellent dataset to address this question, enhancing our understanding of genetic diversity effects in aquatic ecosystems.

      Strengths:

      The study collected an extensive dataset that includes species diversity of primary producers (riparian trees), primary consumers (macroinvertebrate shredders), and secondary consumers (fish). It also includes genetic diversity of the dominant species in each trophic level, biomass production, decomposition rates, and environmental data. The writing is logical and easy to follow.

      Weaknesses:

      The two main conclusions are overly generalized: (1) species diversity had negative effects on ecosystem functions, while genetic diversity had positive effects, and (2) these effects were observed only within each trophic level, not across the three levels. Analysis of the raw data shows that species and genetic diversity have different effects depending on the ecosystem function. For example, neither affected invertebrate biomass, but species diversity positively influenced fish biomass, while genetic diversity had no effect. Furthermore, Table S2 reveals that only four effect sizes were significant (P < 0.05): one positive genetic effect, one negative genetic effect, and two negative species effects, with two effects within a trophic level and two across trophic levels. Additionally, using a P < 0.2 threshold to omit lines in the SEMs is uncommon and was not adequately justified. A more cautious interpretation of the results, with acknowledgment of the variability observed in the raw data, would strengthen the manuscript.

      There is actually no objective justification for having chosen p<0.20. This is a subjective threshold that was chosen to simplify the visual interpretation of causal graphs while highlighting the most biologically relevant links. We have now added a sentence stating explicitly the subjective nature of the threshold. We understand the point you raised regarding a cautious interpretation of the results. We have now added a paragraph (just before the detailed discussion) explicitly calling for a cautious interpretation of the results (see l. 414-424). We think this paragraph applies to the entire discussion. Our message in this paragraph is that the inferences we’ve made can arise from both a biological reality and statistical artefacts. We cannot really tease these apart at this stage, and our interpretation of the results therefore has to be taken with care. We hope you’ll find the statement adequate. We prefer to alert readers from the start rather than including cautionary notes throughout the discussion, which we felt was more logical. We have also modified the text in places to avoid strong statements such as “we demonstrated” where we think the demonstration cannot be considered solid.

      Recommendations for the authors:

      Reviewing Editor:

      In addition to the comments from the reviewer, we have the following comments on your paper:

      (1) It would be important to clarify that there could be different interpretations about one of the major findings: for within-trophic BEF relationships, genetic and species diversity have the opposite effects on ecosystem functions (i.e., positive and negative effects for genetic and species diversity, respectively). (1) One possibility is that for each specific ecosystem function, genetic and species diversity have the opposite effects. (2) The other possibility is that genetic diversity has positive effects on some functions, while species diversity has negative effects on other functions. These two possibilities can have quite different implications about the generalizability of the conclusion, mechanisms involved, and practices for ecosystem management. Therefore, it would be important to clarify that the findings from this paper are more about the second rather than the first possibility both in the discussion and conclusion sections.

      Yes, true, this is an important distinction and we agree with your conclusion. We have added a section in the Discussion (l. 537-545) and a note in the Conclusion (l. 625-627).

      (2) Please take special caution when comparing the findings from this observational study vs. previous experimental works. (1) The different ranges of diversity in the observational vs. experimental works, together with the nonlinear nature of the BEF relationship, challenge the direct comparison of their results. That is, even if their true BEF relationships are identical, focusing on different sections of a nonlinear curve can give us different estimates of the BEF relationship. This challenge is further aggravated when involving both genetic and species diversity because these two facets have different biological meanings, as the authors have already noted. Using standardized effect size or explained variance, as this paper did, may partially get around but not truly resolve this issue. It would be important to add clarifications to make the comparisons between genetic and species diversity effects more understandable in a biological or ecological context. One possibility could be to state that both genetic and species diversity measured in this study well represent their natural gradients in this aquatic ecosystem, so that the standardized effect sizes quantify how these natural diversity gradients associate with ecosystem functions. This further points to the issue about the representativeness of the genetic diversity sampled from up to 32 individuals for each species per site, which would also need clarification. We suggest that the authors identify these challenges in the discussion, so that future studies can be aware of these or even find alternative solutions. (2) The species diversity effects have quite different meanings between this study and previous observational and experimental studies. The negative effects are for the biomass of one target species from this study, while the species diversity effects are usually for the biomass of all species within a community. These two scenarios are not directly comparable. The negative relationship between species diversity and a target species' biomass can simply arise from a sampling process: for example, given the same community biomass, the more species occur in a community, the less biomass is allocated to a single species, without assuming any biological interactions or species differences. And this study cannot exclude this possibility. Note that this null, sampling process is not equal to a negative covariance between biomass of a focal species and biomass of the community involving the species as stated in lines 446-448. To avoid possible mis-interpretation, we suggest that the authors revise or remove the comparison appearing in the paragraph starting from line 515.
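      The sampling argument in point (2) can be illustrated with a minimal simulation sketch. Everything in it is a hypothetical assumption made for illustration and none of it comes from the study: 500 sites, a fixed community biomass of 100 units, richness between 2 and 20 species, and biomass split at random among species with no interactions or species differences.

```python
import random


def focal_biomass(n_species: int, community_biomass: float, rng: random.Random) -> float:
    """Biomass of one focal species when a fixed community biomass is split
    into random proportions among n_species (no biological interactions)."""
    weights = [rng.expovariate(1.0) for _ in range(n_species)]
    return community_biomass * weights[0] / sum(weights)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(1)  # fixed seed so the sketch is reproducible
# Hypothetical sites: richness varies, community biomass is held constant.
richness = [rng.randint(2, 20) for _ in range(500)]
biomass = [focal_biomass(s, 100.0, rng) for s in richness]

r = pearson(richness, biomass)
print(f"correlation between richness and focal-species biomass: {r:.2f}")
```

      Under these assumptions the expected share of the focal species is 1/S for a community of S species, so its biomass declines with richness even though no biological mechanism is modelled.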

      Thanks for these comments. Although we agree with the two points raised by the Editor, we must admit that we found them difficult to answer properly. See our detailed responses hereafter.

      Point (1): it is true that comparisons with previous studies are tricky, especially when these comparisons also include both genetic and species components. This is a problem (a limit) for almost all comparisons in biology. We added a few lines to warn readers that these comparisons are not without limits (see l. 414-424). Regarding the fact that « genetic and species diversity measured in this study well represent their natural gradients in this aquatic ecosystem »: it is all a matter of scale. The genetic and species diversity measured in this study are obviously representative of communities and populations of the upstream (piedmont) part of the Garonne River basin, as our sampling design covers the whole east-west gradient. On the other hand, these communities and populations are not representative of the entire Garonne River basin, as we lack the whole downstream part of the network. We added a sentence to specify that the sampled communities are specific to this ecosystem (rivers from the piedmont, see l. 224-226). Regarding « the issue about the representativeness of the genetic diversity sampled from up to 32 individuals », we must admit that we are surprised by this comment, as this is a very classical way of estimating genomic diversity. Although there is no clear rule, 30 individuals per site is generally assumed (and has been shown) to be an appropriate sample size (especially given that we used a genome-wide approach here). We added a reference to justify the sample size.

      Point (2): We understand the point raised by the Editors. Regarding your note “Note that this null, sampling process is not equal to a negative covariance between biomass of a focal species and biomass of the community involving the species as stated in lines 446-448.”: this is true, and we have rephrased this sentence to be more neutral. Regarding the paragraph starting l. 515 (now 550), we chose not to remove this paragraph as it provides some mechanistic explanation for the underlying patterns, which we think is important even if incomplete or speculative. The confusion probably arises because here we discuss all types of negative BEFs, including the effect of species diversity on the biomass of the community, on the biomass of focal species (including those from other trophic levels) and on litter degradation. Our discussion is very general, whereas you seem to focus on a specific case of negative species-BEFs. To highlight this further and warn readers about possible conclusions, we added the following sentence: “Given the empirical nature of our study and the fact that our meta-regressive approach includes several types of BEFs (e.g., species richness acting either on the biomass of a single focal species or on the biomass of an entire focal community), it is hard to tease apart specific and underlying mechanisms” (l. 573-576).

      (3) Please clarify how you derived the 95% CI in Fig. 5. For example, how did you involve the uncertainties of each raw effect size (e.g. each black triangle in Fig. 5a) when calculating their mean and 95% CI in each group (e.g., the red triangles and error bars in Fig. 5a)?

      Estimates and 95%-CI from Figure 5 are derived from the mixed-effect models described from l. 314. They are hence marginal effects derived from the models, and 95%-CI include all error terms (fixed and random). We now specify in the Figure caption that estimates and 95%-CI are marginal effects derived from the mixed-effect models.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors examined whether aberrantly projecting retinal ganglion cells in albino mice innervate a separate population of thalamocortical neurons, as would be predicted for Hebbian learning rules. The authors find support for this hypothesis in correlated light and electron microscopy (CLEM) reconstructions of retinal ganglion cell axons and thalamocortical neurons. In a second line of investigation, the authors ask the same question about retinal ganglion cell innervation of local inhibitory interneurons of the mouse LGN. The authors conclude that these connections are less specific.

      Strengths:

      The authors make good use of CLEM to test a circuit-level hypothesis, and they find an interesting difference in RGC synaptic innervation patterns for thalamocortical neurons vs. local interneurons.

      Weaknesses:

      The conclusions about the local interneuron innervation are a little more difficult to interpret. One would expect to only capture a small part of the local interneuron dendritic field, as compared to the smaller thalamocortical neurons, right? Doesn't that imply that finding some evidence of promiscuous connectivity means that other dendrites that were not observed probably connect to many different RGCs?

      We will try to clarify this point.

      Reviewer #2 (Public review):

      In this article, the authors examined the organization of misplaced retinal inputs in the visual thalamus of albino mice at electron-microscopic (EM) resolution to determine whether these synaptic inputs are segregated from the rest of the retinogeniculate circuitry.

      The study's major strengths include its high resolution, achieved through serial EM and confocal microscopy, which enabled the identification of all synaptic inputs onto neurons in the dorsolateral geniculate nucleus (dLGN).

      The experiments are very precise and demanding; thus, only the synaptic inputs of a few neurons were fully reconstructed in one animal. A few figures could be improved in their presentation.

      Despite this, the authors clearly demonstrate the synaptic segregation of misrouted retinal axons onto dLGN neurons, separate from the rest of the retinogeniculate circuitry.

      This finding is impactful because retinal inputs typically do not segregate within the mouse dLGN, and it was previously thought that this was due to the nucleus's small size, which might prevent proper segregation. The study shows that in cases where axons are misrouted and exhibit a different activity pattern than surrounding retinal inputs, segregation of inputs can indeed occur. This suggests that the normal system has the capacity to segregate inputs, despite the limited volume of the mouse dLGN.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Please include page numbers and line numbers in future submissions.

      Done

      (2) I am red-green colorblind, and I had a lot of trouble seeing the red channels when they were mixed with green. I recommend using magenta when possible.

      Thanks for the heads up. We have switched to green and magenta where possible. In the tinted EM where switching colors did not seem helpful, we added an asterisk to RGC boutons so that red and green would not be the only identifiers.

      (3) It would help if the figure captions also stated the conclusions that can be drawn from the figures. I recommend stating the main conclusion in the first sentence of the caption, rather than stating only what we are viewing. Similarly, the last sentence of the caption can help summarize what has been seen.

      We have included summary sentences at the beginning and end of figure legends.

      (4) In the text when discussing Figure 2J, do the authors mean to cite Supplementary Figure 2?

      Yes, thanks.

      (5) I don't think TC was ever defined (or I didn't find it).

      Corrected

      (6) In the subsection "An exclusive set..." cite Liang et al. as more evidence of non-specific innervation.

      We cite Liang et al in the discussion, but I don’t see a good place to cite it in the referenced results section. Please elaborate if we are missing something.

      (7) Supplementary Figure 3 is never cited.

      We have added the citation to Figure 3.

      (8) I found myself unsure of what to conclude after the results on LIN. A few more sentences of interpretation and restating what was found would help.

      We have added additional clarification in the Results:

      “The LIN results are consistent with our prediction that shaft dendrites would be indifferent to island/non-island boundaries while individual targeted dendrites would target either the island or non-island RGC boutons. However, the restriction of the targeted dendrites to one or the other RGC field does not appear to be an absolute rule. Rather, the scale of targeted dendrite exploration and the size of the exclusion zone are likely to reduce the chances that a targeted dendrite would find partners both in the island and outside of the island. This matching between the exploration of targeted LIN dendrites and the segregation of retinogeniculate connectivity means that targeted LIN dendrites will have an RGC input profile (island/non-island) that matches the TCs they innervate.”

      Reviewer #2 (Recommendations for the authors):

      (1) The abbreviation TC is used in the text without a definition.

      Corrected

      (2) The features that allow for labeling the different dendrites/cells (TC and LIN) in Serial EM data (Figure 1) are necessary. While the explanation is provided for RGC boutons, the labeling for thalamic cells is not discussed.

      We added the sentence:

      “Thalamocortical dendrites were distinguished from local inhibitory neuron dendrites by the presence of spines and the absence of synaptic outputs.”

      (3) Image 2C (EM) appears blurry or pixelated. Enhancing its resolution could improve clarity.

      Image 2C is a demonstration of how much we felt we could sacrifice image quality and still reconstruct TC arbors and RGC inputs.

      (4) The gray circles that show the innervation of TC17 in Figure 2E are barely visible, especially on-screen without high magnification. A more contrasting color and wider lines would enhance visibility. It would also be helpful to indicate TC17 in Figure 2H and 2G, as this cell is special and highlighted in the main text.

      We have made the requested changes

      (5) A TC with no RGC input is mentioned. Have you identified other synaptic inputs, potentially related to SC or the cortex?

      Both TC17 (a few exclusion zone RGC inputs) and TC5 (no RGC inputs) were innervated by some large, dark mitochondria boutons that could be SC inputs. However, we did not perform enough reconstruction of the axons to confidently describe their non-RGC input profile. I have previously observed occasional TCs in the same region of the dLGN where RGC inputs are almost entirely replaced by SC inputs, so finding two such cells was not surprising.

      (6) Two fully reconstructed TCs are mentioned. Please specify their exact number in the text, as citing Figure 2J or Supplementary Figure 1 alone is not sufficient for identification.

      Clarified as “(TC3, TC4, Figure 2J, Supplementary Figure 2,3).”

      (7) A correlation between the position of the dendrites and the location of RGC inputs would provide additional insights. This is somewhat reminiscent of the dendrite orientation of Layer IV spiny stellate neurons in the somatosensory cortex that receive inputs from the thalamocortical axons and could be mentioned in the discussion.

      We believe that the images provided are a strong argument for TC arbors being shaped by RGC bouton distributions. We agree that reporting the correlation between dendrites and RGC boutons would be useful, but we found this correlation difficult to quantify. One of the challenges is that we would need to perform several-fold more reconstruction of dendrites and RGC boutons to have an unbiased mapping of both. Currently, most of the reconstructions stop when the dendrites assume a distal morphology and stop interacting with RGC boutons. Likewise, the only RGC boutons reconstructed from the EM are those that innervate the reconstructed cells. We considered simply quantifying the asymmetry of the TC arbors relative to a symmetrical distribution and a random distribution, but we felt that quantification would be difficult to interpret without a similar analysis performed in the same region of dLGN on wild-type TCs.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper is an incremental follow-up to the authors' recent paper which showed that Purkinje cells make inhibitory synapses onto brainstem neurons in the parabrachial nucleus which project directly to the forebrain. In that precedent paper, the authors used a mouse line that expresses the presynaptic marker synaptophysin in Purkinje cells to identify Purkinje cell terminals in the brainstem and they observed labeled puncta not only in the vestibular and parabrachial nuclei, as expected, but also in neighboring dorsal brainstem nuclei, prominently the central pontine grey. The present study, motivated by the lack of thorough characterization of PC projections to the brainstem, uses the same mouse line to anatomically map the density and a PC-specific channelrhodopsin mouse line to electrophysiologically assess the strength of Purkinje cell synapses in dorsal brainstem nuclei. The main findings are (1) the density of Purkinje cell synapses is highest in vestibular and parabrachial nuclei and correlates with the magnitude of evoked inhibitory synaptic currents, and (2) Purkinje cells also synapse in the central pontine grey nucleus but not in the locus coeruleus or mesencephalic nucleus.

      Strengths:

      The complementary use of anatomical and electrophysiological methods to survey the distribution and efficacy of Purkinje cell synapses on brainstem neurons in mouse lines that express markers and light-sensitive opsins specifically in Purkinje cells is the major strength of this study. By systematically mapping presynaptic terminals and light-evoked inhibitory postsynaptic currents in the dorsal brainstem, the authors provide convincing evidence that Purkinje cells do synapse directly onto pontine central grey and nearby neurons but do not synapse onto trigeminal motor or locus coeruleus neurons. Their results also confirm previously documented heterogeneity of Purkinje cell inputs to the vestibular nucleus and parabrachial neurons.

      Weaknesses:

      Although the study provides strong evidence that Purkinje cells do not make extensive synapses onto LC neurons, which is a helpful caveat given previous reports to the contrary, it falls short of providing the comprehensive characterization of Purkinje cell brainstem synapses which seemed to be the primary motivation of the study. The main information provided is a regional assessment of PC density and efficacy, which seems of limited utility given that we are not informed about the different sources of PC inputs, variations in the sizes of PC terminals, the subcellular location of synaptic terminals, or the anatomical and physiological heterogeneity of postsynaptic cell types. The title of this paper would be more accurate if "characterization" were replaced by "survey".

      Several of the study's conclusions are quite general and have already been made for vestibular nuclei, including the suggestions in the Abstract, Results, and Discussion that PCs selectively influence brainstem subregions and that PCs target cell types with specific behavioral roles.

      We agree that we did not provide an in-depth characterization of PC synapses onto all identified types of brainstem neurons. With so many types of neurons in the brainstem, this would be a monumental task. Despite this limitation we prefer to keep our original title, since our study makes the following advances:

      • We provide a comprehensive map of all PC synaptic boutons across the brainstem, and corresponding maps of PC synaptic input sizes. The input sizes vary widely, but are often multiple nanoamps, indicating that the cerebellum is an important regulator of activity in these regions. These maps will be indispensable for future investigations of cerebellar outputs.

      • We find that PC projections and the synapses they make are spatially restricted within most target nuclei such as the vestibular and parabrachial nuclei. This suggests that the influence of the cerebellum is spatially segregated within these nuclei, and likely allows the cerebellum to regulate specific behaviors. While some aspects of these gradients have been described previously, our study is comprehensive, and has a higher degree of specificity than can be achieved with immunohistochemistry.

      • We discover that PCs form functional synapses in the pontine central grey and nearby nuclei. Much of this region’s function is unknown, but certain subregions are important for micturition and valence. PCs make large synapses onto a small fraction of cells in this region, which suggests that PCs may target specific cell types to control novel nonmotor behaviors.

      • We provide clarification regarding PC projections to the locus coeruleus. Multiple high-profile, highly influential studies using rabies tracing (Schwarz et al., Nature 2015; Breton-Provencher and Sur, Nature Neuroscience 2019; and others) described a prominent PC input to the locus coeruleus. We showed that this projection is essentially nonexistent, both anatomically and functionally. We previously addressed this issue, but the PC-specific optogenetic approach we used here provides the most compelling evidence against a prominent PC-LC connection. This is an important finding for the cerebellum and a cautionary tale for conclusions based solely on viral tracing methods. We will expand on this issue in response to the comments of reviewer #3.

      Reviewer #2 (Public review):

      Summary:

      While it is often assumed that the cerebellar cortex connects, via its sole output neuron, the Purkinje cell, exclusively to the cerebellar nuclei, axonal projections of the Purkinje cells to dorsal brainstem regions have been well documented. This paper provides comprehensive mapping and quantification of such extracerebellar projections of the Purkinje cells, most of which are confirmed with electrophysiology in slice preparation. A notable methodological strength of this work is the use of highly Purkinje cell-specific transgenic strategies, enabling selective and unbiased visualization of Purkinje terminals in the brainstem. By utilizing these selective mouse lines, the study offers compelling evidence challenging the general assumption that Purkinje cell targets are limited to the cerebellar nuclei. While the individual connections presented are not entirely novel, this paper provides a thorough and unambiguous demonstration of their collective significance. Regarding another major claim of this paper, "characterization of direct Purkinje cell outputs (Title)", however, the depth of electrophysiological analysis is limited to the presence/absence of physiological Purkinje input to postsynaptic brainstem neurons whose known cell types are mostly blinded. Overall, conceptual advance is largely limited to confirmatory or incremental, although it would be useful for the field to have the comprehensive landscape presented.

      Strengths:

      (1) Unsupervised comprehensive mapping and quantification of the Purkinje terminals in the dorsal brainstem are enabled, for the first time, by using the current state-of-the-art mouse lines, BAC-Pcp2-Cre and synaptophysin-tdTomato reporter (Ai34).

      (2) Combinatorial quantification with vGAT puncta and synaptophysin-tdTomato labeled Purkinje terminals clarifies the anatomical significance of the Purkinje terminals as an inhibitory source in each dorsal brainstem region.

      (3) Electrophysiological confirmation of the presence of physiological Purkinje synaptic input to 7 out of 9 dorsal brainstem regions identified.

      (4) Pan-Purkinje ChR2 reporter provides solid electrophysiological evidence to help understand the possible influence of the Purkinje cells onto LC.

      Weaknesses:

      (1) The present paper is largely confirmatory of what is presented in a previous paper published by the author's group (Chen et al., 2023, Nat Neurosci). In this preceding paper, the author's group used AAV1-mediated anterograde transsynaptic strategy to identify postsynaptic neurons of the Purkinje cells. The experiments performed in the present paper are, by nature, complementary to the AAV1 tracing which can also infect retrogradely and thus is not able to demonstrate the direction of synaptic connections between reciprocally connected regions. Anatomical findings are all consistent with the preceding paper. The likely absence of robust physiological connections from the Purkinje to LC has also been evidenced in the preceding paper by examining c-Fos response to Purkinje terminal photoinhibition at the PBN/LC region.

      We agree that we previously dealt with the issue of PC-LC synapses (Chen et al., 2023, Nat Neurosci), but our conclusions differed from several high-profile publications (Schwarz et al., Nature 2015; Breton-Provencher and Sur, Nature Neuroscience 2019), and still met considerable resistance. We felt that the optogenetic approach provided the most definitive means of evaluating the presence and strength of PC-LC synapses, and we hope that it will settle this issue. These experiments also set a standard for future studies assessing the presence of PC synapses onto other target neurons in the brainstem.

      (2) Although the authors appear to assume uniform cell type and postsynaptic response in each of the dorsal brainstem nuclei (as noted in the Discussion, "PCs likely function similarly to their inputs to the cerebellar nuclei, where a very brief pause in firing can lead to large and rapid elevations in target cell firing"), we know that the responses to the Purkinje cell input are cell type dependent, which vary in neurotransmitter, output targets, somata size, and distribution, in the cerebellar and vestibular nuclei (Shin et al., 2011, J Neurosci; Najac and Raman, 2015, J Neurosci; Özcan et al., 2020, J Neurosci). This consideration impacts the interpretation of two key findings: (a) "Large ... PC-IPSCs are preferentially observed in subregions with the highest densities of PC synapses (Abstract)". For example, we know that the terminal sparse regions reported in the present paper do contain Floccular Targeted Neurons that are sparse yet have dense somatic terminals with profound postinhibitory rebound (Shin et al.). Despite their sparsity, these postsynaptic neurons play a distinct and critical role in proper vestibuloocular reflex. Therefore, associating broad synaptic density with "PC preferential" targets, as written in the Abstract, may not fully capture the behavioral significance of Purkinje extracerebellar projections. (b) "We conclude ... only a small fraction of cell. This suggests that PCs target cell types with specific behavioral roles (Abstract, the last sentence)". Prior research has already established that "PCs target cell types with specific behavioral roles in brainstem regions". Also, whether 23 % (for PCG), for example, is "a small fraction" would be subjective: it might represent a numerically small but functionally important cell type population. 
      The physiological characterization provided in the present cell type-blind analysis could, from a functional perspective, even be decremental when compared to existing cell type-specific analyses of the Purkinje cell inputs in the literature.

      We now cite the papers suggested by the reviewer (Shin et al., 2011, J Neurosci; Najac and Raman, 2015, J Neurosci; Özcan et al., 2020, J Neurosci) and add to the discussion.

      (3) The quantification analyses used to draw conclusions about (a) the significance of PC terminals among all GABAergic terminals and the fractions of electrophysiologically responsive postsynaptic brainstem neurons may have potential sampling considerations:

      (a.i) this study appears to have selected subregions from each brainstem nucleus for quantification (Figure 2). However, the criteria for selecting these subregions are not explicitly detailed, which could affect the interpretation of the results.

      Additional explanation has been added to the Results in the section “Quantification of PC synapses in the brainstem.”

      (a.ii) the mapping of recorded cells (Figure 3) seems to show a higher concentration in terminal-rich regions of the vestibular nuclei.

      In Figure 3, we strived to record in an unbiased manner. However, there may have been a slight bias toward recording in areas of lower myelination, where patching is easier. We now clarify this issue in the text.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Chen and colleagues explores the connections from cerebellar Purkinje cells to various brainstem nuclei. They combine two methods - presynaptic puncta labeling as putative presynaptic markers, and optogenetics, to test the anatomical projections and functional connectivity from Purkinje cells onto a variety of brainstem nuclei. Overall, their study provides an atlas of sorts of Purkinje cell connectivity to the brainstem, which includes a critical analysis of some of their own data from another publication. Overall, the value of this work is to both provide neural substrates by which Purkinje cells may influence the brainstem and subsequent brain regions independent of the deep cerebellar nuclei and also, to provide a critical analysis of viral-based methods to explore neuronal connectivity.

      Strengths:

      The strengths lie in the simplicity of the study, the number of cells patched, and the relationship between the presence of putative presynaptic puncta and electrophysiological results. This type of study is important and should provide a foundation for future work exploring cerebellar inputs and outputs. Overall, I think that the critique of viral-based methods to define connectivity, and a more holistic assessment of what connectivity is and how it should be defined is timely and warranted, as I think this is under-appreciated by many groups and overall, there is a good deal of research being published that do not properly consider the issues that this manuscript raises about what viral-based connectivity maps do and do not tell us.

      We thank the reviewer for highlighting this important aspect of this work, and for agreeing with our thesis concerning viral-based connectivity maps.

      Weaknesses:

      While I overall liked the manuscript, I do have a few concerns that relate to interpretation of results, and discussion of technological limitations. The main concerns I have relate to the techniques that the authors use, and an insufficient discussion of their limitations. The authors use a Cre-dependent mouse line that expresses a synaptophysin-tomato marker, which the authors confidently state is a marker of synapses. This is misleading. Synaptophysin is a vesicle marker, and as such, labels axons, where vesicles are present in transit, and likely cell bodies where the protein is being produced. As such, the presence of tdtomato should not be interpreted definitively as the presence of a synapse. The use of vGAT as a marker, while this helps to constrain the selection of putative pre-synaptic sites, is also a vesicle marker and will likely suffer the same limitations (though in this case, the expression is endogenous and not driven by the ROSA locus). A more conservative interpretation of the data would be that the authors are assessing putative pre-synaptic sites with their analysis. This interpretation is wholly consistent with their findings showing the presence of tdtomato in some regions but only sparse connectivity - this would be expected in the event that axons are passing through. If the authors wish to strongly assert that they are specifically assessing synapses, a marker better restricted to synapses and not vesicles may be more appropriate.

      We agree that synaptophysin-tdTomato is an imperfect marker, although it is vastly superior to cytosolic tdTomato. We found that viral expression of synaptophysin-GFP gives much more punctate labelling, but an appropriate synaptophysin-GFP line is not available. We carefully point out this issue, and threshold the images to avoid faint labeling associated with fibers of passage. The intersection of VGAT labelling and of the synaptophysin-tdTomato labelling provides us with superior identification of PC boutons. We will add clarification to point out that these are putative presynaptic boutons, but that this alone does not establish the existence or the strength of functional synapses.

      Similarly, while optogenetics/slice electrophysiology remains the state of the art for assessing connectivity between cell populations, it is not without limitations. For example, connections that are not contained within the thickness of the slice (here, 200 µm, which is not particularly thick for slice ephys preps) will not be detected. As such, the absence of connections is harder to interpret than the presence of connections. Slices were only made in the coronal plane, which means that if there is a particular topology to certain connections that is orthogonal to that plane, those connections may be under-represented. As such, all connectivity analyses likely are under-representations of the actual connectivity that exists in the intact brain. Therefore, perhaps the authors should consider revising their assessments of connections, or lack thereof, of Purkinje cells to e.g., LC cells. While their data do make a compelling case that the connections between Purkinje cells and LC cells are not particularly strong or numerous, especially compared to other nearby brainstem nuclei, their analyses do indicate that at least some such connections do exist. Thus, rather than saying that the viral methods such as rabies virus are not accurate reflections of connectivity - perhaps a more circumspect argument would be that the quantitative connectivity maps reported by other groups using rabies virus do not always reflect connectivity defined by other means e.g., functional connections with optogenetics. In some cases, the authors do suggest this (e.g., "Together, these findings indicate that reliance on anatomical tracing experiments alone is insufficient to establish the presence and importance of a synaptic connection"), but in other cases, they are more dismissive of viral tracing results (e.g., "it further suggests that these neurons project to the cerebellum and were not retrogradely labeled").
      Furthermore, some statements are a bit misleading e.g., mentioning that rabies methods are critically dependent on starter cell identity immediately following the citation of studies mapping inputs onto LC cells. While in general, this claim has merit, the studies cited (19-21) use Dbh-Cre to define LC-NE cells which does have good fidelity to the cells of interest in the LC. Therefore, rewording this section in order to raise these issues generally without proximity to the citations in the previous sentence may maintain the authors' intention without suggesting that perhaps the rabies studies from LC-NE cells that identified inputs from Purkinje cells were inaccurate due to poor fidelity of the Cre line. Overall, this manuscript would certainly not be the first report indicating that the rabies virus does not provide a quantitative map of input connections. In my opinion, this is still under-appreciated by the broad community and should be explicitly discussed. Thus, an acknowledgment of previous literature on this topic and how their work contributes to that argument is warranted.

      We have a different take on connectivity and the use of optogenetics. In our years of experience studying synapses in brain slices, we have found that axons survive very well even when they are cut. It is not necessary to preserve intact axons that extend for long distances. It is also true that activation of these axons, with either extracellular electrical stimulation or with optogenetics, is sufficient to evoke synaptic inputs. Robust synaptic responses are evoked with optogenetic activation regardless of the slice orientation. We thank the reviewer for raising this issue, and we have added a couple of sentences to clarify this point under the section “Characterization of functional properties of PC synapses in the brainstem.”

      The discussion on starter cell specificity was not referring to the specificity of Cre in transgenic animals, but to the TVA/G helper proteins that are introduced by AAV and used in conjunction with the rabies virus. The issues related to this have recently been discussed in eLife (Beier, 2022) in addition to citations 58 and 59 in the manuscript. We have more explicitly highlighted this issue in the revised manuscript in the section “Lack of significant PC inputs to LC neurons.”

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      (1) Methods need detail to be replicable, particularly in how PC synapses were identified and automatically counted. It is not clear what was the variation within subregions across mice. How were neurons selected or rejected for recordings and analyses? Was each subregion sampled at equal spacing? Methods for anatomy should mention sagittal sections.

      The wording in the Methods section “Anatomy” was changed to better reflect how PC synapses were identified as colabeled segments of vGAT and tdTomato labeling.
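
      As a purely illustrative sketch of the colabeling idea described above (this is not the analysis code used in the study), putative PC boutons can be thought of as the pixels where both the tdTomato and vGAT channels exceed a threshold. The threshold values, the pixelwise comparison, and the names `colabeled_mask` and `fraction_of_vgat_colabeled` are assumptions made for this example only.

```python
# Illustrative sketch only; thresholds and function names are hypothetical.
TDTOMATO_THRESHOLD = 100  # assumed intensity cutoff, arbitrary units
VGAT_THRESHOLD = 80       # assumed intensity cutoff, arbitrary units


def colabeled_mask(tdtomato, vgat):
    """Return a 2D boolean mask that is True where both channels exceed threshold."""
    return [
        [t > TDTOMATO_THRESHOLD and v > VGAT_THRESHOLD for t, v in zip(t_row, v_row)]
        for t_row, v_row in zip(tdtomato, vgat)
    ]


def fraction_of_vgat_colabeled(tdtomato, vgat):
    """Fraction of vGAT-positive pixels that are also tdTomato-positive."""
    mask = colabeled_mask(tdtomato, vgat)
    vgat_positive = sum(v > VGAT_THRESHOLD for row in vgat for v in row)
    colabeled = sum(cell for row in mask for cell in row)
    return colabeled / vgat_positive if vgat_positive else 0.0


# Tiny hypothetical 2x2 image, one intensity value per pixel and channel.
tdtomato = [[150, 20], [120, 200]]
vgat = [[90, 95], [10, 85]]
mask = colabeled_mask(tdtomato, vgat)
fraction = fraction_of_vgat_colabeled(tdtomato, vgat)
```

      In this toy image, two of the three vGAT-positive pixels are also tdTomato-positive, which is the kind of ratio that would describe the PC share of inhibitory terminals in a region.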

      Each data point in Figure 2D-F is the quantification of one region in one section from one mouse. The color of the data point indicates the anterior-posterior location of the section. The violin plot shows the median and quartile values for all points across sections and mice. The variability captured by the violin plot reflects variability across the anterior-posterior axis.
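
      To make the summary statistics concrete, the following minimal sketch pools one value per section and mouse for a single region and computes the median and quartiles. The data values, the field names, and the choice of the "inclusive" quantile method are assumptions for illustration and do not come from the study.

```python
import statistics

# Hypothetical measurements: one density value per (mouse, section) for one region.
measurements = [
    {"mouse": "m1", "section": 1, "density": 2.0},
    {"mouse": "m1", "section": 2, "density": 4.0},
    {"mouse": "m2", "section": 1, "density": 6.0},
    {"mouse": "m2", "section": 2, "density": 8.0},
    {"mouse": "m3", "section": 1, "density": 10.0},
]

# Pool all points across sections and mice, as a violin plot summary would.
densities = [m["density"] for m in measurements]

median = statistics.median(densities)
# quantiles(n=4) returns the three cut points: Q1, Q2 (the median), Q3.
q1, q2, q3 = statistics.quantiles(densities, n=4, method="inclusive")
```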

      Neurons were mostly randomly selected in each slice, and rejected based on unstable holding current or access resistance. Cell locations were recorded and updated with each experiment so that we minimized oversampling of easier-to-patch regions.

      A mention of sagittal sections was added to the Methods.

      (2) Figure 2D-F what is the black line and grey region?

      Additional text was added to the caption for Figure 2D-F.

      (3) MEV is confusing given LAV stands for lateral vestibular - perhaps call it ME5?

      We will remain consistent with the abbreviations in the Allen Brain Reference Atlas.

      Reviewer #2 (Recommendations for the authors):

      (1) What are the criteria for distinguishing large, small, and non-responders?

      Large are in the nA range, small are in the hundreds of pA, and non-responders are effectively zero. Manual curation of these responses indicated that a current amplitude threshold of 45 pA clearly separated non-responders from responders. To be clear, the average response (as stated in text and displayed in Figure 3D) includes all cells.
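
      The criterion above can be illustrated with a short sketch. Only the 45 pA responder threshold comes from the response; the 1 nA boundary between "small" and "large", the handling of a value exactly at threshold, and the function names are assumptions made for this example, and this is not the analysis code used in the study.

```python
# Illustrative sketch only; not the authors' analysis code.
RESPONDER_THRESHOLD_PA = 45.0  # threshold stated in the response
LARGE_THRESHOLD_PA = 1000.0    # assumption: "nA range" taken as >= 1 nA


def classify_response(amplitude_pa):
    """Label a light-evoked PC-IPSC by the magnitude of its amplitude in pA."""
    amplitude = abs(amplitude_pa)
    if amplitude < RESPONDER_THRESHOLD_PA:
        return "non-responder"
    if amplitude >= LARGE_THRESHOLD_PA:
        return "large"
    return "small"


def mean_response(amplitudes_pa):
    """Average amplitude over all cells, with non-responders included."""
    return sum(abs(a) for a in amplitudes_pa) / len(amplitudes_pa)


amplitudes = [0.0, 12.0, 300.0, 2500.0]  # hypothetical values in pA
labels = [classify_response(a) for a in amplitudes]
average = mean_response(amplitudes)
```

      Including the non-responders in `mean_response` mirrors the statement that the reported average covers all recorded cells.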

      (2) p1. "Unexpectedly": it would not be unexpected, rather, expected, because it was reported in Chen et al., 2023, Nat Neurosci.

      The PCG was hinted at, but an actual functional, anatomical connection was not reported in our previous manuscript.

      (3) p1. "We combined electrophysiological recordings with immunohistochemistry to assess the molecular identities of these PC targets": please clarify "these" here. It could be read that it refers to "pontine central gray and nearby subnuclei" but it doesn't make sense. Immuno has only been performed for MeV and LC.

      Corrected

      (4) p1. "but only inhibit a small fraction of cells in many nuclei": as far as I read Fig.3, it seems that ~50% for PBN/VN and ~25% for PCG: would this be "a small fraction"?

      The small fraction of cells was in reference to subnuclei within the PCG, but we agree this statement is too broad to be useful and have eliminated it.

      (5) p2. "conventional tracer": viral tracer is becoming a standard, so dye tracer could be better here.

      Corrected

      (6) p3. "rostral/cauda": typo.

      Corrected.  

      (7) p3. Quantification of PC synapses in the brainstem: it would be helpful to introduce why synapto-tdT alone is not sufficient, and the purpose of adding vGAT immunostaining.

      We have added more on vGAT labeling putative presynaptic sites and quantifying only synaptic labeling instead of axonal tdTomato in the Results, “Quantification of PC synapses in the brainstem.” In addition, vGAT staining allows us to examine the PC contribution to total inhibition in each region.

      (8) p7. "PB and are": typo.

      Corrected, and all instances of PBN were changed to PB.

      (9) p7. "they are likely a mix of excitatory and inhibitory inputs 54,55": Bagnall et al., 2009, J Neurosci, would be critically relevant here.

      Added, thank you

      (10) Figures 2-3: Yellow/Blue color scheme is hard to distinguish, and having two colors could be read as implying two distinct regions.

      We are unsure what the reviewer is referring to exactly here, but the colors refer to the sections in 2C (see the color bar on the bottom right of each atlas schematic). The points represent an individual section that was quantified, and thus do represent distinct samples from distinct regions.

      (11) Figure 2D-F: what is indicated by each point?

      Each data point is the number of PC boutons (D), density of boutons (E), or percentage of synaptophysin/vGAT (F) quantified for each region per section. Each color represents a coronally distinct section of a region. Additional text was added to the captions to clarify this and point 10.

      (12) Figure 3E, right: what is the correlation coefficient?

      The correlation coefficient was found to be 0.74.

      Reviewer #3 (Recommendations for the authors):

      Some minor grammatical errors and typos need to be cleaned up (e.g. "To quantifying the densities...", "The medial-ventral region of the PBN...have extensive...").

      These errors have been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Abstract

      I don't think you need the first two sentences of the abstract. This is not a grant and your results are exciting enough to justify a full basic science-based approach.

      We fully understand this perspective.  However, we prefer to introduce the work in the broader context of sleep medicine.  This manuscript is part of our long-standing efforts to develop cavefish as a model for sleep disorders and we believe this provides important context.

      Last sentence of the abstract: the subject is missing. "That have developed..." who has developed?

      Thank you. We have corrected this error, the sentence now reads “...these findings suggest that cavefish have developed resilience to sleep loss...”

      Introduction

      First paragraph. Worth explaining in a sentence what is the link between DNA damage and ROS.

      We now state ‘Further, chronic sleep loss results in elevated reactive oxygen species (ROS), a known mediator of DNA damage, in the gut and/or brain that contribute to mortality in Drosophila and mice [11,16].’

      "A. mexicanus exists as blind cave populations and an extant surface population that are interfertile". This needs rephrasing. As it is, it sounds like the surface population is infertile.

      We have rephrased for clarity; the line now reads: “while the surface and cave populations are geographically isolated, they remain interfertile and capable of hybridization in nature as well as laboratory settings”.

      "Further, the evolved differences in DNA repair genes, including links between mechanisms regulating sleep, light responsiveness, and DNA repair across all three cave populations studied to date [27,29]" This sentence is incomplete.

      We have corrected the phrasing, which now reads “...evolved differences in DNA repair genes have been identified across all three cave populations studied to date, including links between mechanisms regulating sleep, light responsiveness, and DNA repair”.

      Figure 1

      I recommend improving the legibility of the figure copying some of the information provided in the legend directly within the figure itself.

      A, B: label in the panel itself what is blue and what is green.

      Thank you, we have made this change.

      C: Make it clear in the figure itself that you are measuring yH2AX. Also, probably you have enough room in the figure to avoid abbreviations for Rhomb, mes, and tele. It may also help if you could add a little cartoon that explains what those three brain regions are.

      We have added text to the y-axis indicating that yH2AX fluorescence is being measured, and replaced the abbreviations with the full names of the regions.

      G: again, explain that DHE is being measured here. And perhaps pick a different colour choice to highlight the difference from C?

      We have added clarification to the y-axis of the figure, but have retained the color scheme for consistency; in all surface-cave comparisons in the manuscript, gray is used for surface fish and red for cavefish.

      In the text: I would recommend adding some quantitative reminder of what is the difference in sleep amount between the two species (cave vs surface).

      We have added the following to highlight the magnitude of the difference in sleep: “Strikingly, cavefish sleep as little as 1-2 hours per day, in contrast to their surface counterparts, which sleep as much as 6-10 hours a day”

      "Together, these findings fortify the notion that cellular stress is elevated in the gut of cavefish relative to surface fish." Were the two populations fed the same diet and raised in the same lab conditions? If this is pinpointed to sleep amount, it's worth ruling out possible confounding factors.

      We have added a sentence to the results underlining this point: “Prior to imaging, both surface and cavefish had been reared in a temperature-controlled incubator, and relied solely on their yolk sac for nutrients; so, differences in gut ROS cannot be attributed to differences in rearing or feeding conditions.”

      Figure 2

      Spell out, somewhere in the figure itself, that the 30s and 60s refer to UV treatment protocols.

      We have added X-axis titles to clarify this in Fig 2 and supp. Fig 1.

      It would be worth providing a cartoon of the experimental setup that shows for instance what time of the day UV was given (it's only specified in the text) and which subsequent sleep period was selected for comparisons.

      We have added arrows to all sleep plots indicating the time of UV treatment, and brackets indicating the time period used for statistical comparisons, as well as text in the figure legends indicating this.

      Figure 3

      A. I don't think this is needed, to be honest, and if you want to keep it, it needs a better legend.

      We have edited the figure legend to increase clarity.

      B. I would make it clear in the figure that this refers to transcriptomics analysis. Perhaps you could change the order and show C, D, and then B.

      We have added text to the figure legend and the results text to more explicitly state that the PCA plot is of transcriptional response. We have, however, retained the original figure order, as we feel this figure is important to establish that both populations have strong but distinct responses to the UV treatment.

      Figure 4

      A. Spell it out in the figure itself that you're staining for CPD.

      Thank you, we have made this change.

      B. You are using the same colour combination you had in Figure 1 but for yet another pairing. This is a bit confusing.

      Thank you for bringing this to our attention.  We have added descriptions of the colors in the figure legend.

      Discussion

      "Beyond the Pachón cavefish population, all three other cavefish populations have been found to have reduced sleep (Cite)." Citation missing here.

      Thank you.  We have now clarified this sentence and included a citation.

      Reviewer #2 (Recommendations For The Authors):

      Consideration of Environmental Conditions:

      Evaluate whether the lab conditions, which may more closely resemble surface environments, could influence the observed increase in neuronal DNA damage and gut ROS levels in cavefish. Adjusting these conditions or discussing their potential impact in the manuscript would strengthen the findings.

      We are very excited about these experiments. We have a paper that will be submitted to bioRxiv this week in which we record wild-caught fish, as well as fish in caves. The conclusion is that sleep loss is present in both populations. This field work took over 10 years to come together and still lacks the power of the lab-based assays. Nevertheless, we can conclusively say that the phenotypes we have observed for the last ~15 years in the lab are present in a natural setting. We have included a statement about the need for future work to test these findings in a natural setting.

      Alternative Stressors:

      Given that cavefish are albino and blind (to my knowledge), consider using alternative sources of genotoxic stress beyond UV-induced damage. This could include chemical agents or other forms of environmental stress to provide a more comprehensive assessment of DDR.

      We agree and are enthusiastic about looking more generally at stress.  We note that we have previously found that cavefish rebound following sleep deprivation (McGaugh et al, 2020) suggesting that they are responsive to sleep disruption.  This will be a major research focus area moving forward.

      Broader Stress Responses:

      Investigate whether other forms of stress, such as dietary changes or temperature fluctuations, elicit similar differences in sleep patterns and DDR responses. This could provide additional insights into the robustness of the observed phenomena.

      We fully agree. This will be the primary focus of this research area moving forward. We hypothesize that cavefish are generally less responsive to their environment. Unpublished data reveal that temperature stress, circadian changes, and aging (presented here) do little to impact gene expression in surface fish. We would like to test the hypothesis that transcriptional stability of cavefish contributes to their longevity.

      Potential Protective Mechanisms:

      Discuss the possibility that lower levels of gamma-H2AX in cavefish might be protective, as DDR can lead to cellular senescence or cancer. This perspective could add depth to the interpretation of the results.

      This was the hypothesis underlying this manuscript.  However, we found elevated levels of gamma-H2AX.  We believe there may be additional protective mechanisms that have evolved in cavefish, but cannot identify them to date.  Our hope is future functional studies by our group, as well as other groups’ access to this published work, may help address these questions.

      Strengthening the Sleep-DNA Damage Link:

      Further experiments are needed to directly link sleep differences to the observed variations in DNA damage and DDR. This could involve manipulating sleep patterns in surface fish and cavefish to observe corresponding changes in DNA repair mechanisms.

      We agree. We have referenced work that conclusively showed this relationship in zebrafish. Our current method for limiting sleep involves shaking, and this has too many confounds. We are working on developing genetic tools, and on applying the gentle rocking methods used previously in zebrafish, to address these questions.

      Clarification of Causal Directionality:

      Address the potential that sleep patterns and DDR responses may both be downstream effects of a common cause or independent adaptations to the cave environment. Clarifying this in the manuscript would provide a more nuanced understanding of the evolutionary adaptations.

      Thank you for this suggestion.  We have now added a paragraph describing how these experiments (and the ones described above) are necessary for understanding the relationship between sleep and DDR.

      Clarification and Presentation:

      Fix the many typos, and improve the clarity of the figures and their legends to ensure they are easily interpretable. Additional context in the discussion section would help readers understand the significance and potential implications of the findings.

      Thank you, we have now included this.

      Reviewer #3 (Recommendations For The Authors):

      There are a number of suggestions that I have made in the public review, but there are a few things that I would like to add here.

      The methods section is missing many important details, for instance, the intensity of the illumination used in the UV exposure in larvae is not reported but is vital for the interpretation/replication of these experiments. In general, this section should be redone with a greater effort to include all important information. Similarly, the figure legends could be greatly improved, with important details like n-number and definition of significance thresholds defined (e.g. see Figures 1, C, and G.)

      We have added greater detail to the methods section to specify the spectral peak and power output of the bulbs used.

      There are a number of passages in the manuscript that do not make sense, which suggests that a future version of record should be carefully proofread. I know that this can be a case of reading multiple versions of a manuscript so many times that one doesn't really see it anymore, but, for example, phrases like "To differentiate between these two possibilities" are confusing to the reader when there has been no introduction of alternate possibilities.

      Thank you for this comment.  We have fixed this mistake and proofread the manuscript.

      Additionally, there are multiple examples of errors in citations/references. A few examples are below:

      "Further, chronic sleep loss results in elevated reactive oxygen species (ROS) in the gut and/or brain that contribute to mortality in Drosophila and mice [11, 16]". Reference 16 does not include mice at all, and reference 11 is Vaccaro et al. 2020, where Drosophila mortality is assessed, but mouse mortality is not.

      We have added the appropriate citations and revised this sentence.

      References 13 and 15 are the same.

      Thank you, we have fixed this.

      References 24 and 26 are the same.

      Thank you, we have fixed this.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Lloyd et al employ an evolutionary comparative approach to study how sleep deprivation affects DNA damage repair in Astyanax mexicanus, using the cave vs surface species evolution as a playground. The work shows, convincingly, that the cavefish population has evolved an impaired DNA damage response following both sleep deprivation and a classical paradigm of DNA damage (UV).

      Strengths:

      The study employs a thorough multidisciplinary approach. The experiments are well conducted and generally well presented.

      Weaknesses:

      Having a second experimental means to induce DNA damage would strengthen and generalise the findings.

      Overall, the study represents a very important addition to the field. The model employed underlines once more the importance of using an evolutionary approach to study sleep and provides context and caveats to statements that perhaps were taken a bit too much for granted before. At the same time, the paper manages to have an extremely constructive approach, presenting the platform as a clear useful tool to explore the molecular aspects behind sleep and cellular damage in general. The discussion is fair, highlighting the strengths and weaknesses of the work and its implications.

      We fully agree with this assessment.  We are currently performing experiments to test the effects of additional DNA damaging agents.  We hope to extend these studies beyond DNA-damage agents to look more generally at how animals respond to stress including ROS, sleep deprivation, and high temperature.  This will be a major direction of the laboratory moving forward.

      Reviewer #2 (Public Review):

      The manuscript investigates the relationship between sleep, DNA damage, and aging in the Mexican cavefish (Astyanax mexicanus), a species that exhibits significant differences in sleep patterns between surface-dwelling and cave-dwelling populations. The authors aim to understand whether these evolved sleep differences influence the DNA damage response (DDR) and oxidative stress levels in the brain and gut of the fish.

      Summary of the Study:

      The primary objective of the study is to determine if the reduced sleep observed in cave-dwelling populations is associated with increased DNA damage and altered DDR. The authors compared levels of DNA damage markers and oxidative stress in the brains and guts of surface and cavefish. They also analyzed the transcriptional response to UV-induced DNA damage and evaluated the DDR in embryonic fibroblast cell lines derived from both populations.

      Strengths of the Study:

      Comparative Approach:

      The study leverages the unique evolutionary divergence between surface and cave populations of A. mexicanus to explore fundamental biological questions about sleep and DNA repair.

      Multifaceted Methodology:

      The authors employ a variety of methods, including immunohistochemistry, RNA sequencing, and in vitro cell line experiments, providing a comprehensive examination of DDR and oxidative stress.

      Interesting Findings:

      The study presents intriguing results showing elevated DNA damage markers in cavefish brains and increased oxidative stress in cavefish guts, alongside a reduced transcriptional response to UV-induced DNA damage.

      Weaknesses of the Study:

      Link to Sleep Physiology:

      The evidence connecting the observed differences in DNA damage and DDR directly to sleep physiology is not convincingly established. While the study shows distinct DDR patterns, it does not robustly demonstrate that these are a direct result of sleep differences.

      We agree with this assessment. We are currently working to apply tools developed in zebrafish to examine the physiology of sleep. While this is important, and our results are promising, we will note that functional analysis of sleep physiology in fish has been limited to zebrafish. We hope future studies will allow us to integrate approaches that examine the physiology of sleep.

      Causal Directionality:

      The study fails to establish a clear causal relationship between sleep and DNA damage. It is possible that both sleep patterns and DDR responses are downstream effects of a common cause or independent adaptations to the cave environment.

      We agree; however, we note that this could be the case for all animals in which sleep has been linked to DNA damage. We believe the most likely explanation, for Astyanax and the other animals studied, is that sleep and DDR are downstream of, or interface with, the sleep homeostat.

      Environmental Considerations:

      The lab conditions may not fully replicate the natural environments of the cavefish, potentially influencing the results. The impact of these conditions on the study's findings needs further consideration.

      This is correct. We have considered this carefully. After nearly a decade of effort, we have completed an analysis of sleep in the wild. These data will be uploaded to bioRxiv within the next week.

      Photoreactivity in Albino Fish:

      The use of UV-induced DNA damage as a primary stressor may not be entirely appropriate for albino, blind cavefish. Alternative sources of genotoxic stress should be explored to validate the findings.

      We have addressed this above. Future work will examine additional stressors. Both fish are transparent at 6 dpf, so it is unlikely that albinism impacts the amount of UV that reaches the brain.

      Assessment of the Study's Achievements:

      The authors partially achieve their aims by demonstrating differences in DNA damage and DDR between surface and cavefish. However, the results do not conclusively support the claim that these differences are driven by or directly related to the evolved sleep patterns in cavefish. The study's primary claims are only partially supported by the data.

      Impact and Utility:

      The findings contribute valuable insights into the relationship between sleep and DNA repair mechanisms, highlighting potential areas of resilience to DNA damage in cavefish. While the direct link to sleep physiology remains unsubstantiated, the study's data and methods will be useful to researchers investigating evolutionary biology, stress resilience, and the molecular basis of sleep.

      Reviewer #3 (Public Review):

      Lloyd, Xia, et al. utilised the existence of surface-dwelling and cave-dwelling morphs of Astyanax mexicanus to explore a proposed link between DNA damage, aging, and the evolution of sleep. Key to this exploration is the behavioural and physiological differences between cavefish and surface fish, with cavefish having been previously shown to have low levels of sleep behaviour, along with metabolic alterations (for example chronically elevated blood glucose levels) in comparison to fish from surface populations. Sleep deprivation, metabolic dysfunction, and DNA damage are thought to be linked and to contribute to aging processes. Given that cavefish seem to show no apparent health consequences of low sleep levels, the authors suggest that they have evolved resilience to sleep loss. Furthermore, as extended wake and loss of sleep are associated with increased rates of damage to DNA (mainly double-strand breaks) and sleep is linked to repair of damaged DNA, the authors propose that changes in DNA damage and repair might underlie the reduced need for sleep in the cavefish morphs relative to their surface-dwelling conspecifics.

      To fulfill their aim of exploring links between DNA damage, aging, and the evolution of sleep, the authors employ methods that are largely appropriate, and comparison of cavefish and surface fish morphs from the same species certainly provides a lens by which cellular, physiological and behavioural adaptations can be interrogated. Fluorescence and immunofluorescence are used to measure gut reactive oxygen species and markers of DNA damage and repair processes in the different fish morphs, and measurements of gene expression and protein levels are appropriately used. However, although the sleep tracking and quantification employed are quite well established, issues with the experimental design relate to attempts to link induced DNA damage to sleep regulation (outlined below). Moreover, although the methods used are appropriate for the study of the questions at hand, there are issues with the interpretation of the data and with these results being over-interpreted as evidence to support the paper's conclusions.

      This study shows that a marker of DNA repair molecular machinery that is recruited to DNA double-strand breaks (γH2AX) is elevated in brain cells of the cavefish relative to the surface fish and that reactive oxygen species are higher in most areas of the digestive tract of the cavefish than in that of the surface fish. As sleep deprivation has been previously linked to increases in both these parameters in other organisms (both vertebrates and invertebrates), their elevation in the cavefish morph is taken to indicate that the cavefish show signs of the physiological effects of chronic sleep deprivation.

      It has been suggested that induction of DNA damage can directly drive sleep behaviour, with a notable study describing both the induction of DNA damage and an increase in sleep/immobility in zebrafish (Danio rerio) larvae by exposure to UV radiation (Zada et al. 2021 doi:10.1016/j.molcel.2021.10.026). In the present study, an increase in sleep/immobility is induced in surface fish larvae by exposure to UV light, but there is no effect on behaviour in cavefish larvae. This finding is interpreted as representing a loss of a sleep-promoting response to DNA damage in the cavefish morph. However, induction of DNA damage is not measured in this experiment, so it is not certain if similar levels of DNA damage are induced in each group of intact larvae, nor how the amount of damage induced compares to the pre-existing levels of DNA damage in the cavefish versus the surface fish larvae. In both this study with A. mexicanus surface morphs and the previous experiments from Zada et al. in zebrafish, observed increases in immobility following UV radiation exposure are interpreted as following from UV-induced DNA damage. However, in interpreting these experiments it is important to note that the cavefish morphs are eyeless and blind. Intense UV radiation is aversive to fish, and it has previously been shown in zebrafish larvae that (at least some) behavioural responses to UV exposure depend on the presence of an intact retina and UV-sensitive cone photoreceptors (Guggiana-Nilo and Engert, 2016, doi:10.3389/fnbeh.2016.00160). It is premature to conclude that the lack of behavioural response to UV exposure in the cavefish is due to a different response to DNA damage, as their lack of eyes will likely inhibit a response to the UV stimulus.

      We believe that in A. mexicanus, like in zebrafish, it is highly unlikely that the effects of UV are mediated through visual processing. Even if this were the case, the timeframe of UV activation is very short compared to the time-scale of sleep measurements so this is unlikely to be a confound.

      Indeed, were the equivalent zebrafish experiment from Zada et al. to be repeated with mutant larvae fish lacking the retinal basis for UV detection it might be found that in this case too, the effects of UV on behaviour are dependent on visual function. Such a finding should prompt a reappraisal of the interpretation that UV exposure's effects on fish sleep/locomotor behaviour are mediated by DNA damage.

      We prefer not to comment on Zada et al, as that is a separate manuscript.

      An additional note, relating to both Lloyd, Xia, et al., and Zada et al., is that though increases in immobility are induced following UV exposure, in neither study have assays of sensory responsiveness been performed during this period. As a decrease in sensory responsiveness is a key behavioural criterion for defining sleep, it is, therefore, unclear that this post-UV behaviour is genuinely increased sleep as opposed to a stress-linked suppression of locomotion due to the intensely aversive UV stimulus.

      We understand this concern and are working on improved methodology for measuring sleep.  However, behavioral measurements are the standard for almost every manuscript that has studied sleep in zebrafish, flies, and worms to date. 

      The effects of UV exposure, in terms of causing damage to DNA, inducing DNA damage response and repair mechanisms, and in causing broader changes in gene expression are assessed in both surface and cavefish larvae, as well as in cell lines derived from these different morphs. Differences in the suite of DNA damage response mechanisms that are upregulated are shown to exist between surface fish and cavefish larvae, though at least some of this difference is likely to be due to differences in gene expression that may exist even without UV exposure (this is discussed further below).

      UV exposure induced DNA damage (as measured by levels of cyclobutane pyrimidine dimers) to a similar degree in cell lines derived from both surface fish and cave fish. However, γH2AX shows increased expression only in cells from the surface fish, suggesting induction of an increased DNA repair response in these surface morphs, corroborated by their cells' increased ability to repair damaged DNA constructs experimentally introduced to the cells in a subsequent experiment. This "host cell reactivation assay" is a very interesting assay for measuring DNA repair in cell lines, but the power of this approach might be enhanced by introducing these DNA constructs into larval neurons in vivo (perhaps by electroporation) and by tracking DNA repair in living animals. Indeed, in such a preparation, the relationship between DNA repair and sleep/wake state could be assayed.

      Comparing gene expression in tissues from young (here 1 year) and older (here 7-8 years) fish from both cavefish and surface fish morphs, the authors found that there are significant differences in the transcriptional profiles in brain and gut between young and old surface fish, but that for cavefish being 1 year old versus being 7-8 years old did not have a major effect on transcriptional profile. The authors take this as suggesting that there is a reduced transcriptional change occurring during aging and that the transcriptome of the cavefish is resistant to age-linked changes. This seems to be only one of the equally plausible interpretations of the results; it could also be the case that alterations in metabolic cellular and molecular mechanisms, and particularly in responses to DNA damage, in the cavefish mean that these fish adopt their "aged" transcriptome within the first year of life.

      This is indeed true.  However, one could also interpret this as a lack of aging.  If the profile does not change over time, the difference seems largely semantic.

      A major weakness of the study in its current form is the absence of sleep deprivation experiments to assay the effects of sleep loss on the cellular and molecular parameters in question. Without such experiments, the supposed link of sleep to the molecular, cellular, and "aging" phenotypes remains tenuous. Although the argument might be made that the cavefish represent a naturally "sleep-deprived" population, the cavefish in this study are not sleep-deprived, rather they are adapted to a condition of reduced sleep relative to fish from surface populations. Comparing the effects of depriving fish from each morph on markers of DNA damage and repair, gut reactive oxygen species, and gene expression will be necessary to solidify any proposed link of these phenotypes to sleep.

      We agree this would be beneficial. We note that relatively few papers have sleep-deprived fish. While we have done this before in A. mexicanus, the assay is less than ideal and likely induces generalized stress. We are working on adapting more recently developed methods in zebrafish.

      A second important aspect that limits the interpretability and impact of this study is the absence of information about circadian variations in the parameters measured. A relationship between circadian phase, light exposure, and DNA damage/repair mechanisms is known to exist in A. mexicanus and other teleosts, and differences exist between the cave and surface morphs in their phenomena (Beale et al. 2013, doi: 10.1038/ncomms3769). Although the present study mentions that their experiments do not align with these previous findings, they do not perform the appropriate experiments to determine if such a misalignment is genuine. Specifically, Beale et al. 2013 showed that white light exposure drove enhanced expression of DNA repair genes (including cpdp which is prominent in the current study) in both surface fish and cavefish morphs, but that the magnitude of this change was less in the cave fish because they maintained an elevated expression of these genes in the dark, whereas the darkness suppressed the expression of these genes in the surface fish. If such a phenomenon is present in the setting of the current study, this would likely be a significant confound for the UV-induced gene expression experiments in intact larvae, and undermine the interpretation of the results derived from these experiments: as samples are collected 90 minutes after the dark-light transition (ZT 1.5) it would be expected that both cavefish and surface fish larvae should have a clear induction of DNA repair genes (including cpdp) regardless of 90s of UV exposure. The data in Supplementary Figure 3 is not sufficient to discount this potentially serious confound, as for larvae there is only gene expression data for time points from ZT2 to ZT 14, with all of these time points being in the light phase and not capturing any dynamics that would occur at the most important timepoints from ZT0-ZT1.5, in the relevant period after dark-light transition. Indeed, an appropriate control for this experiment would involve frequent sampling at least across 48 hours to assess light-linked and developmentally-related changes in gene expression that would occur in 5-6dpf larvae of each morph independently of the exposure to UV.

      We agree that this would be useful; however, frequent sampling is not feasible given the experiments presented here and the challenges of working with an emerging model.

      On a broader point, given the effects of both circadian rhythm and lighting conditions that are thought to exist in A. mexicanus (e.g. Beale et al. 2013) experiments involving measurements of DNA damage and repair, gene expression, and reactive oxygen species, etc. at multiple times across >1 24 hour cycle, in both light-dark and constant illumination conditions (e.g. constant dark) would be needed to substantiate the authors' interpretation that their findings indicate consistently altered levels of these parameters in the cavefish relative to the surface fish. Most of the data in this study is taken at only single time points.

      Again, see the comment above. The goal was to identify whether there are differences in the DNA damage response between A. mexicanus morphs. Extending this work to examine interactions with the circadian system could be a useful path to pursue in the future.

      In summary, the authors show that there are differences in gene expression, activity of DNA damage response and repair pathways, response to UV radiation, and gut reactive oxygen species between the Pachón cavefish morph and the surface morph of Astyanax mexicanus. However, the data presented does not make the precise nature of these differences very clear, and the interpretation of the results appears to be overly strong. Furthermore, the evidence of a link between these morph-specific differences and sleep is unconvincing.

    1. Author response:

      Reviewer #1 (Public review):

      Point 1. The authors postulate a synergistic role for Itgb1 and Itgb3 in the intravasation phenotype, because the single KOs did not replicate the phenotype of the DKO. However, this is not a correct interpretation in the opinion of this reviewer. The roles appear rather to be redundant. Synergistic roles would rather demonstrate a modest effect in the single KO with potentiation in the DKO.

      We agree that the interaction between Itgb1 and Itgb3 appears redundant, and we will correct this point in the revised manuscript.

      Point 2. The experiment does not explain how these integrins influence the interaction of the MK with their microenvironment. It is not surprising that attachment will be impacted by the presence or absence of integrins. However, it is unclear how activation of integrins allows the MK to become "architects for their ECM microenvironment" as the authors posit. A transcriptomic analysis of control and DKO MKs may help elucidate these effects.

      We do not currently understand how activation of α5β1 or αvβ3 integrins would contribute to ECM remodelling by megakaryocytes. Integrins are well-known key regulators of ECM remodelling (https://doi.org/10.1016/j.ceb.2006.08.009). They can transmit traction forces that provoke ECM remodelling (https://doi.org/10.1016/j.bpj.2008.10.009). We will discuss our previous study on the observed reduction in RhoA activation in double knockout (DKO) mice (Guinard et al., 2023, PMID: 37171626), which likely impacts the organization of the ECM microenvironment. Alternatively, integrin signalling may contribute to the regulation of genes involved in ECM remodelling (ECM proteins, proteases, etc.). We do agree with the reviewer that the transcriptomic analysis could provide strong evidence; however, it is challenging to perform this analysis in vivo. Isolating native megakaryocytes (MKs) from DKO mice is difficult because of their reduced numbers: too many mice would be required to obtain sufficient RNA, and there is a risk of cell contamination. An alternative approach will be to analyze platelets, which are more abundant and easier to isolate, while still mimicking the characteristics of bone marrow MKs. We will use PCR array technology for selected ECM panels and adhesion molecules (from all players currently known to contribute to ECM remodelling), providing a practical way to address the reviewer's suggestions and provide valuable insights.

      Point 3. Integrin DKO mice have a 50% reduction in platelet counts as reported previously, however laminin α4 deficiency only leads to a 20% reduction in counts. This suggests a more nuanced and subtle role of the ECM in platelet growth. To this end, functional assays of the platelets in the KO and wildtype mice may provide more information.

      The difference in platelet counts between integrin DKO and laminin α4 KO mice is not fully understood. Although our study specifically focuses on MK-ECM interactions in the bone marrow, we recognize the importance of providing additional information on platelet functionality. To address this, we will use flow cytometry to examine the levels of P-selectin surface expression and fibrinogen binding under basal conditions and after stimulation with collagen-related peptide and TRAP.

      Point 4. There is insufficient information in the Methods Section to understand the BM isolation approach. Did the authors flush the bone marrow and then image residual bone, or the extruded bone marrow itself as described in PMID: 29104956?

      Additional information on the methodology will be provided to clarify the BM isolation.

      Point 5. The references in the Methods section were very frustrating. The authors reference Eckly et al 2020 (PMID: 32702204) which provides no more detail but references a previous publication (PMID: 24152908), which also offers no information and references a further paper (PMID: 22008103), which, as far as this reviewer can tell, did not describe the methodology of in situ bone marrow imaging.

      To address this confusion, we will add the reference "In Situ Exploration of the Major Steps of Megakaryopoiesis Using Transmission Electron Microscopy" by C. Scandola et al. (PMID: 34570102), which provides a standardized protocol for bone marrow isolation.

      Therefore, this reviewer cannot tell how the preparation was performed and, importantly, how can we be sure that the microarchitecture of the tissue did not get distorted in the process?

      Thank you for pointing this out. While we cannot completely rule out the possibility of distortion, we will clarify the precautions taken to minimize it. We used a double fixation process immediately after extruding the bone marrow, followed by embedding it in agarose to preserve its integrity as much as possible. We will address this point in greater detail in the Methods section of the revised version.

      Reviewer #2 (Public review):

      Point 1. ECM cage imaging

      a) The value or additional information provided by the staining on nano-sections (A) is not clear, especially considering that the thick vibratome sections already display the entirety of the laminin γ1 cage structure effectively. Further clarification on the unique insights gained from each approach would help justify its inclusion.

      Ultrathin cryosections allow high-resolution imaging (a 10-fold increase in Z resolution), facilitating the analysis of signal superposition. This study explores the interactions between MKs and their immediate ECM microenvironment, located at a distance of less than one micrometer, making nano-sections optimal for precise analysis of ECM distribution both within and surrounding MKs. This high-resolution approach has revealed the presence of collagen IV, laminin, fibronectin, and fibrinogen near MKs. More importantly, ultrathin cryosections allow us to show clearly, at high resolution, the presence of activated integrin in contact with laminin and collagen IV fibers (see Fig. 3).

      We employed large-volume whole-mount imaging to clarify the overall three-dimensional architecture of the ECM interface, allowing us to identify the cages. Our findings emphasize the role of specific ECM components in facilitating proplatelet passage through the sinusoid barrier, an essential step for platelet production. Further details will be addressed in the revised manuscript.

      b) The sMK shown in Supplementary Figure 1C appears to be linked to two sinusoids, releasing proplatelets to the more distant vessels. Is this observation representative, and if so, can further discussion be provided?

      This observation is not representative; MKs can also be associated with just one sinusoid.

      c) Freshly isolated BM-derived MKs are reported to maintain their laminin γ1 cage. Are the proportions of MKs with/without cages consistent with those observed in microscopy?   

      In the revised manuscript, we will include the quantification of the proportion of BM-derived MKs with/without cages.

      Point 2.  ECM cage formation

      a) The statement "the full assembly of the 3D ECM cage required megakaryocyte interaction with the sinusoidal basement membrane" on page 7 is too strong given the data presented at this stage of the study. Supplemental Figure 1C shows that approximately 10% of pMKs form cages without direct vessel contact, indicating that other factors may also play a role in cage formation.

      The reviewer is correct. We will modify the text to reflect a more cautious interpretation of our results.

      b) The data supporting the statement that "pMK represent a small fraction of the total MK population" (cell number or density) could be shown to help contextualize the 10% of them with a cage.

      New bar graphs will be provided to show the density of MKs in the parenchyma relative to the total MKs in the bone marrow.

      c) How "the full assembly of the 3D ECM cage" is defined at this stage of the study should be clarified, specifically regarding the ECM components and structural features that characterize its completion.

      We recognize that the term "full assembly" of the 3D ECM cage can be misleading, as it might suggest different stages of cage formation, such as a completed cage, one that is in the process of formation, or an incomplete cage. Since we have not yet studied this concept, we will eliminate the term "full assembly" from the manuscript to avoid any confusion. Instead, we will simply mention the presence of a cage.

      Point 3. Data on MK Circulation and Cage Integrity: Does the cage require full component integrity to prevent MK release in circulation? Are circulating MKs found in Lama4-/- mice? Is the intravasation affected in these mice? Are the ~50% sinusoid associated MK functional?  

      These are very valid points. We will answer all these questions by performing a detailed analysis of MK localization, vessel association and intravascular MK detection using IF and high-resolution EM imaging of Lamα4-/- mice. Additionally, we will analyze data from Lamα4-/- bone marrow explants to assess the capacity of MKs to extend proplatelets.

      Point 4. Methodology

      a) Details on fixation time are not provided, which is critical as it can impact antibody binding and staining. Including this information would improve reproducibility and feasibility for other researchers.

      We will add this information to the Methods section.

      b) The description of 'random length measuring' is unclear, and the rationale behind choosing random quantification should be explained. Additionally, in the shown image, it appears that only the branching ends were measured, which makes it difficult to discern the randomness in the measurements.

      The random length measurement method uses random sampling to provide unbiased data on laminin/collagen fibers in a 3D cage. Contrary to what the initial image might have suggested, measurements go beyond just the branching ends; they include intervals between various branching points throughout the cage.

      To clarify this process, we will outline these steps: 1) acquire 3D images, 2) project onto 2D planar sections, 3) select random intersection points for measurement, 4) measure intervals using ImageJ software, and 5) repeat the process for a representative dataset. This will better illustrate the randomness of our measurements.
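      As an illustration only, the random selection and measurement steps listed above could be sketched as follows. This is a minimal Python sketch and not the ImageJ workflow used in the study: the function name `sample_fiber_lengths`, the example coordinates, the segment list, and the pixel size are all hypothetical.

```python
import math
import random


def sample_fiber_lengths(branch_points, segments, n_samples, pixel_size_um=1.0, seed=0):
    """Randomly pick fiber segments and return their lengths in micrometers.

    branch_points: list of (x, y) pixel coordinates on a 2D projection.
    segments: list of (i, j) index pairs, each joining two branch points.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    chosen = rng.sample(segments, k=min(n_samples, len(segments)))
    lengths = []
    for i, j in chosen:
        (x1, y1), (x2, y2) = branch_points[i], branch_points[j]
        # Euclidean distance in pixels, converted to micrometers
        lengths.append(math.hypot(x2 - x1, y2 - y1) * pixel_size_um)
    return lengths


# Hypothetical branch points (in pixels) and the fiber segments joining them
points = [(0, 0), (3, 4), (6, 8), (6, 0)]
segments = [(0, 1), (1, 2), (2, 3), (0, 3)]
lengths = sample_fiber_lengths(points, segments, n_samples=3, pixel_size_um=0.5)
print(lengths)
```

      Sampling segments without replacement, rather than picking only branching ends, is one simple way to keep the selection unbiased. Repeating the call over several projections would build up the representative dataset described in step 5.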

      Point 5.  Figures

      a) Overall, the figures and their corresponding legends would benefit from greater clarity if some panels were split, such as separating images from graph quantifications.

      Following the reviewer’s suggestion, we will fully update all the Figures and separate images from graph quantifications.

      Reviewer #3 (Public review):

      Point 1. The data linking ECM cage formation to MK maturation raises several interesting questions. As the authors mention, MKs have been suggested to mature rapidly at the sinusoids, and both integrin KO and laminin KO MKs appear mislocalized away from the sinusoids. Additionally, average MK distances from the sinusoid may also help separate whether the maturation defects could be in part due to impaired migration towards CXCL12 at the sinusoid. Presumably, MKs could appear mislocalized away from the sinusoid given the data presented suggesting they are leaving the BM and entering circulation. Additional data or commentary on intrinsic (ex-vivo) MK maturation phenotypes may help strengthen the authors' conclusions and shed light on whether an essential function of the ECM cage is integrin activation at the sinusoid.

      The hypothesis of MK migration towards CXCL12 is interesting, although it has recently been challenged by Stegner et al. (2017), who found that MKs are primarily sessile. However, we cannot exclude this possibility. To address the reviewer's concerns, we will quantify the distance of MKs from the sinusoids. This could help to determine whether the maturation defects are due to impaired migration towards CXCL12 at the sinusoids or other factors, such as the ECM cage.

      We would appreciate some clarification regarding the second point raised by the reviewer. Is the question specifically whether the ECM cage has an effect on the activation of integrins in the sinusoids? If so, we will use immunofluorescence (IF) to investigate the relationship between the presence of an ECM cage and the activation of integrins on the surface of endothelial cells within the sinusoids. Thank you for your guidance on this matter.

      Point 2. The data demonstrating that intact MKs enter circulation is intriguing - can the authors comment or provide evidence as to whether MKs are detectable in blood? A quantitative metric may strengthen these observations.

      We will conduct flow cytometry experiments and prepare blood smears to determine whether intact MKs are detectable in blood.

      Point 3. Supplementary Figure 6 - shows no effect on in vitro MK maturation and proplt, or MK area - But Figures 6B/6C demonstrate an increase in total MK number in MMP-inhibitor treated mice compared to control. Some additional clarification in the text may substantiate the author's conclusions as to either the source of the MMPs or the in vitro environment not fully reflecting the complex and dynamic niche of the BM ECM in vivo.

      This is a valid point. We will revise the text to include further clarification.

      Point 4.  Similarly, one function of the ECM discussed relates to MK maturation but in the B1/3 integrin KO mice, the presence of the ECM cage is reduced but there appears to be no significant impact upon maturation (Supplementary Figure 4). By contrast, MMP inhibition in vivo (but not in vitro) reduces MK maturation. These data could be better clarified in the text, or by the addition of experiments addressing whether the composition and quantity of ECM cage components directly inhibit maturation versus whether effects of MMP-inhibitors perhaps lead to over-activation of the integrins (as with the B4galt KO in the discussion) are responsible for the differences in maturation.

      These are very good questions, but they are difficult to assess in situ. To approach this, we will perform in vitro experiments:

      (1) We will vary collagen IV and laminin 411 concentrations in the culture conditions to determine how this affects MK maturation; and

      (2) We will assess the integrin activation states on cultured MKs treated with MMP inhibitors to determine if MMP inhibitors could influence MK maturation through over-activation of integrins.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This paper contains what could be described as a "classic" approach towards evaluating a novel taste stimulus in an animal model, including standard behavioral tests (some with nerve transections), taste nerve physiology, and immunocytochemistry of the tongue. The stimulus being tested is ornithine, from a class of stimuli called "kokumi", which are stimuli that enhance other canonical tastes, increasing essentially the hedonic attributes of these other stimuli; the mechanism for ornithine detection is thought to be GPRC6A receptors expressed in taste cells. The authors showed evidence for this in an earlier paper with mice; this paper evaluates ornithine taste in a rat model.

      Strengths:

      The data show the effects of ornithine on taste: in two-bottle and briefer intake tests, adding ornithine results in a higher intake of most, but not all, stimuli tested. Bilateral nerve cuts or the addition of GPRC6A antagonists decrease this effect. Small effects of ornithine are shown in whole-nerve recordings.

      Weaknesses:

      The conclusion seems to be that the authors have found evidence for ornithine acting as a taste modifier through the GPRC6A receptor expressed on the anterior tongue. It is hard to separate their conclusions from the possibility that any effects are additive rather than modulatory. Animals did prefer ornithine to water when presented by itself. Additionally, the authors refer to evidence that ornithine is activating the T1R1-T1R3 amino acid taste receptor, possibly at higher concentrations than they use for most of the study, although this seems speculative. It is striking that the largest effects on taste are found with the other amino acid (umami) stimuli, leading to the possibility that these are largely synergistic effects taking place at the tas1r receptor heterodimer.

      We would like to thank Reviewer #1 for the valuable comments. Our basis for considering ornithine as a taste modifier stems from our observation that a low concentration of ornithine (1 mM), which does not elicit a preference on its own, enhances the preference for umami substances, sucrose, and soybean oil through the activation of the GPRC6A receptor. Notably, this receptor is not typically considered a taste receptor. The reviewer suggested that the enhancement of umami taste might be due to potentiation occurring at the TAS1R receptor heterodimer. However, we propose that a different mechanism may be at play, as an antagonist of GPRC6A almost completely abolished this enhancement. In the revised manuscript, we will endeavor to provide additional information on the role of ornithine as a taste modifier acting through the GPRC6A receptor.

      Reviewer #2 (Public review):

      Summary:

      The authors used rats to determine the receptor for a food-related perception (kokumi) that has been characterized in humans. They employ a combination of behavioral, electrophysiological, and immunohistochemical results to support their conclusion that ornithine-mediated kokumi effects are mediated by the GPRC6A receptor. They complemented the rat data with some human psychophysical data. I find the results intriguing, but believe that the authors overinterpret their data.

      Strengths:

      The authors examined a new and exciting taste enhancer (ornithine). They used a variety of experimental approaches in rats to document the impact of ornithine on taste preference and peripheral taste nerve recordings. Further, they provided evidence pointing to a potential receptor for ornithine.

      Weaknesses:

      The authors have not established that the rat is an appropriate model system for studying kokumi. Their measurements do not provide insight into any of the established effects of kokumi on human flavor perception. The small study on humans is difficult to compare to the rat study because the authors made completely different types of measurements. Thus, I think that the authors need to substantially scale back the scope of their interpretations. These weaknesses diminish the likely impact of the work on the field of flavor perception.

      We would like to thank Reviewer #2 for the valuable comments and suggestions. Regarding the question of whether the rat is an appropriate model system for studying kokumi, we have chosen this species for several reasons: it is readily available as a conventional experimental model for gustatory research; the calcium-sensing receptor (CaSR), known as the kokumi receptor, is expressed in taste bud cells; and prior research has demonstrated the use of rats in kokumi studies involving gamma Glu-Val-Gly (Yamamoto and Mizuta, Chem. Senses, 2022).

      We acknowledge that fundamentally different types of measurements were conducted in the human psychophysical study and the rat study. Kokumi can indeed be assessed and expressed in humans; however, we do not currently have the means to confirm that animals experience kokumi in the same way that humans do. Therefore, human studies are necessary to evaluate kokumi, a conceptual term denoting enhanced flavor, while animal studies are needed to explore the potential underlying mechanisms of kokumi. We believe that a combination of both human and animal studies is essential, as is the case with research on sugars. While sugars are known to elicit sweetness, it is unclear whether animals perceive sweetness identically to humans, even though they exhibit a strong preference for sugars. In the revised manuscript, we will incorporate additional information to address the comments raised by the reviewer. We will also carefully review and revise our previous statements to ensure accuracy and clarity.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors set out to investigate whether GPRC6A mediates kokumi taste initiated by the amino acid L-ornithine. They used Wistar rats, a standard laboratory strain, as the primary model and also performed an informative taste test in humans, in which miso soup was supplemented with various concentrations of L-ornithine. The findings are valuable and overall the evidence is solid. L-Ornithine should be considered to be a useful test substance in future studies of kokumi taste and the class C G protein-coupled receptor known as GPRC6A (C6A) along with its homolog, the calcium-sensing receptor (CaSR) should be considered candidate mediators of kokumi taste.

      Strengths:

      The overall experimental design is solid based on two-bottle preference tests in rats. After determining the optimal concentration for L-Ornithine (1 mM) in the presence of MSG, it was added to various tastants, including inosine 5'-monophosphate (IMP); monosodium glutamate (MSG); mono-potassium glutamate (MPG); intralipos (a soybean oil emulsion); sucrose; sodium chloride (NaCl); citric acid and quinine hydrochloride. Robust effects of ornithine were observed in the cases of IMP, MSG, MPG, and sucrose, and little or no effects were observed in the cases of sodium chloride, citric acid, and quinine HCl. The researchers then focused on the preference for Ornithine-containing MSG solutions. The inclusion of the C6A inhibitors Calindol (0.3 mM but not 0.06 mM) or the gallate derivative EGCG (0.1 mM but not 0.03 mM) eliminated the preference for solutions that contained Ornithine in addition to MSG. The researchers next performed transections of the chorda tympani nerves (with sham operation controls) in anesthetized rats to identify the role of the chorda tympani branches of the facial nerves (cranial nerve VII) in the preference for Ornithine-containing MSG solutions. This finding implicates the anterior half to two-thirds of the tongue in ornithine-induced kokumi taste. They then used electrical recordings from intact chorda tympani nerves in anesthetized rats to demonstrate that ornithine enhanced MSG-induced responses following the application of tastants to the anterior surface of the tongue. They went on to show that this enhanced response was insensitive to amiloride, selected to inhibit 'salt tastant' responses mediated by the epithelial Na+ channel, but eliminated by Calindol. Finally, they performed immunohistochemistry on sections of rat tongue demonstrating C6A-positive spindle-shaped cells in fungiform papillae that partially overlapped in their distribution with the IP3 type-3 receptor, used as a marker of Type-II cells, but not with (i) gustducin, the G protein partner of Tas1 receptors (T1Rs), used as a marker of a subset of type-II cells; or (ii) 5-HT (serotonin) and Synaptosome-associated protein 25 kDa (SNAP-25) used as markers of Type-III cells.

      Weaknesses:

      The researchers undertook what turned out to be largely confirmatory studies in rats with respect to their previously published work on Ornithine and C6A in mice (Mizuta et al Nutrients 2021).

      The authors point out that animal models pose some difficulties of interpretation in studies of taste and raise the possibility in the Discussion that umami substances may enhance the taste response to ornithine (Line 271, Page 9). They miss an opportunity to outline the experimental results from the study that favor their preferred interpretation that ornithine is a taste enhancer rather than a tastant.

      At least two other receptors in addition to C6A might mediate taste responses to ornithine: (i) the CaSR, which binds and responds to multiple L-amino acids (Conigrave et al, PNAS 2000), and which has been previously reported to mediate kokumi taste (Ohsu et al., JBC 2010) as well as responses to Ornithine (Shin et al., Cell Signaling 2020); and (ii) T1R1/T1R3 heterodimers which also respond to L-amino acids and exhibit enhanced responses to IMP (Nelson et al., Nature 2001). While the experimental results as a whole favor the authors' interpretation that C6A mediates the Ornithine responses, they do not make clear either the nature of the 'receptor identification problem' in the Introduction or the way in which they approached that problem in the Results and Discussion sections. It would be helpful to show that a specific inhibitor of the CaSR failed to block the ornithine response. In addition, while they showed that C6A-positive cells were clearly distinct from gustducin-positive, and thus T1R-positive cells, they missed an opportunity to clearly differentiate C6A-expressing taste cells and CaSR-expressing taste cells in the rat tongue sections.

      It would have been helpful to include a positive control kokumi substance in the two-bottle preference experiment (e.g., one of the known gamma-glutamyl peptides such as gamma-glu-Val-Gly or glutathione), to compare the relative potencies of the control kokumi compound and Ornithine, and to compare the sensitivities of the two responses to C6A and CaSR inhibitors.

      The results demonstrate that enhancement of the chorda tympani nerve response to MSG occurs at substantially greater Ornithine concentrations (10 and 30 mM) than were required to observe differences in the two bottle preference experiments (1.0 mM; Figure 2). The discrepancy requires careful discussion and if necessary further experiments using the two-bottle preference format.

      We would like to thank Reviewer #3 for the valuable comments and helpful suggestions. We propose that ornithine has two stimulatory actions: one acting on GPRC6A, particularly at lower concentrations, and another on amino acid receptors such as T1R1/T1R3 at higher concentrations. Consequently, ornithine is not preferable at lower concentrations but becomes preferable at higher concentrations. For our study on kokumi, we used a low concentration (1 mM) of ornithine. The possibility mentioned in the Discussion that 'the umami substances may enhance the taste response to ornithine' is entirely speculative. We will reconsider including this description in the revised version. As the reviewer suggested, in addition to GPRC6A, ornithine may bind to CaSR and/or T1R1/T1R3 heterodimers. However, we believe that ornithine mainly binds to GPRC6A, as a specific inhibitor of this receptor almost completely abolished the enhanced response to umami substances, and our immunohistochemical study indicated that GPRC6A-expressing taste cells are distinct from CaSR-expressing taste cells (see Supplemental Fig. 3). We conducted essentially the same experiments using gamma-Glu-Val-Gly in Wistar rats (Yamamoto and Mizuta, Chem. Senses, 2022) and compared the results in the Discussion. The reviewer may have misunderstood the chorda tympani results: we added the same concentration (1 mM) used in the two-bottle preference test to MSG (Fig. 5-B). Fig. 5-A shows nerve responses to five concentrations of plain ornithine. In the revised manuscript, we will strive to provide more precise information reflecting the reviewer’s comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The behavioral effects found with the GPRC6A antagonists are not entirely convincing, as the antagonist is seemingly just mixed up in the solution with the stimuli. There are no control experiments demonstrating that the antagonists do not have a taste themselves.

      We mixed the antagonists into both liquids used in the two-bottle preference test to eliminate any potential taste effects of the antagonists themselves. In the electrophysiological experiments, the antagonist was incorporated into the solution after confirming that it did not elicit any appreciable response in the taste nerve.

      (2) The effects of ornithine found with quinine did not have a satisfying explanation - if there is some taste cell-taste cell modulation that accounts for the taste enhancement, why is the quinine less aversive? Why is it not enhanced like the other compounds?

      The effects of ornithine on quinine responses remain difficult to explain. A previous study (Tokuyama et al., Chem Pharm Bull, 2006) proposed that ornithine prevents bitter substances from binding to bitter receptors, although this hypothesis lacks definitive evidence. In the present study, our findings suggest that the binding of quinine to bitter receptors is essential, as another agonist, gallate, also enhanced the preference for quinine, but this effect was abolished by EGCG, a GPRC6A antagonist (see Supplemental Fig. 2).

      (3) Unless I am missing something, there appears to be no quantitative analysis of the immunocytochemical data, just assertions.

      We have made quantitative analyses in the revised text, and the following sentences have been added: “Approximately 11% of GPRC6A-positive cells overlapped with IP3R3 (9 double-positive cells/80 GPRC6A-positive cells), while approximately 8.3% of IP3R3-positive cells expressed GPRC6A (9 double-positive cells/109 IP3R3-positive cells). In addition, GPRC6A-positive cells were unlikely to colocalize with α-gustducin, another marker for a subset of type II cells, in single taste cells (0 double-positive cells/93 GPRC6A-positive cells). Regarding type III cell markers, GPRC6A-positive cells were unlikely to colocalize with 5-HT in single taste cells (0 double-positive cells/75 GPRC6A-positive cells).”
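
      As a quick check of the arithmetic in the quoted sentences, the overlap percentages can be recomputed from the cell counts given there. This is only an illustrative sketch: the dictionary labels and the helper name `percent_overlap` are assumptions made for this example, not part of the authors' analysis.

```python
# Cell counts as quoted in the response: (double-positive cells, marker-positive cells).
counts = {
    "GPRC6A+ cells that are IP3R3+": (9, 80),
    "IP3R3+ cells that are GPRC6A+": (9, 109),
    "GPRC6A+ cells that are alpha-gustducin+": (0, 93),
    "GPRC6A+ cells that are 5-HT+": (0, 75),
}


def percent_overlap(double_positive, total):
    """Return the percentage of `total` cells that are double-positive."""
    return 100.0 * double_positive / total


# Hypothetical summary table: label -> percentage of overlapping cells.
percentages = {
    label: percent_overlap(dp, total) for label, (dp, total) in counts.items()
}

for label, pct in percentages.items():
    print(f"{label}: {pct:.1f}%")
```

      The two non-zero values come out at roughly 11% and 8.3%, in line with the figures stated in the revised text.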

      (4) The hallmarks of Kokumi taste include descriptors such as "thickness", and "mouthfeel", which sound like potential somatosensory attributes. Perhaps the authors should consider this possibility for at least some of the effects found.

      The term kokumi, a Japanese word, refers to a phenomenon in which the flavor of complexly composed foods is enhanced through certain processes, making them more delicious. To date, kokumi has been described using the representative terms thickness, mouthfulness, and continuity, originally introduced in the first paper on kokumi by Ueda et al. (1990). However, these terms are derived from Japanese and may not fully convey the nuances of the original language when translated into these simple English words. In particular, thickness is often interpreted as referring to physical properties such as viscosity or somatosensory sensations. Since kokumi inherently lacks somatosensory elements, this revised paper adopts alternative terms and explanations for the three components of kokumi to prevent misunderstanding and confusion.

      Therefore, to clarify that kokumi attributes are inherently gustatory, thickness is replaced with intensity of whole complex tastes (rich flavor with complex tastes), emphasizing the synergistic effects of a variety of tastes rather than the mere enhancement of a single flavor. Mouthfulness is clarified as not referring to mouthfeel (the tactile sensation a food gives in the mouth) but rather as spread of taste and flavor throughout the oral cavity, describing how the flavor fills the mouth. Continuity is replaced with persistence of taste (lingering flavor).

      (5) I don't think the human experiment (S1) belongs to the paper, even as a supplementary bit of data. It's only 17 subjects, they are all female, and we don't know anything about how they were selected, even though it states they are all students/staff at Kio. Were any of them lab members? Were they aware of the goals of the experiment? Could simply increasing the amount of solute in the soup make it seem thicker? This (sparse) data seems to have been shoehorned into the paper without enough detail/justification.

      Despite the reviewer’s suggestion, we would like to include the human experiment because the rationale of the present study is to confirm, through a human sensory test, that the kokumi of a complex solution (in this case, miso soup) is enhanced by the addition of ornithine. This is followed by basic animal experiments to investigate the underlying mechanisms. Therefore, this human study serves an important role.

      The total number of participants increased to 22 (19 women and three men) following an additional experiment with 5 new participants. The new results are shown in Supplemental Figure 1 with statistical analyses. The rewritten parts are as follows:

      We recruited 22 participants (19 women and three men, aged 21-28 years) from Kio University who were not affiliated with our laboratory, including students and staff members. All participants passed a screening test based on taste sensitivity. According to the responses obtained from a pre-experimental questionnaire, we confirmed that none of the participants had any sensory abnormalities, eating disorders, or mental disorders, or were taking any medications that may potentially affect their sense of taste. All participants were instructed not to eat or drink anything for 1 hour prior to the start of the experiment. We provided them with a detailed explanation of the experimental procedures, including safety measures and personal data protection, without revealing the specific goals of the study.

      (6) The introduction could be more concise - for example, when describing Kokumi stimuli such as ornithine and its possible receptors, the authors do not need to add the detail about how this stimulus was deduced from adding clams to the soup. Details like this can be reserved for the discussion.

      Thank you for this comment. We have tried to shorten the Introduction.

      (7) Line 86: awkward phrasing - this doesn't need to be a rhetorical question.

      We have deleted the sentence.

      (8) Supplementary Figure 1: The labels on the figure say "Miso soup in 1 mM Orn" when the Orn is dissolved into the soup.

      Thank you for pointing out our mistake. We have changed the label to “1 mM Orn in miso soup”.

      Reviewer #2 (Recommendations for the authors):

      Major concerns

      (1) The impact of "kokumi" taste ligands on food perception appears to be profound in humans. This observation is fascinating because it implies that molecules like ornithine impact a variety of flavor perceptions, some of which are non-gustatory in nature (e.g., spread, mouthfulness and harmony). What remains unclear is whether "kokumi" ligands produce analogous sensations in rodents. If they don't, then rodents are an inappropriate model system for studying the impact of kokumi on flavor perceptions. The authors fail to address this key issue, and uncritically assume that kokumi ligands produce sensations like thickness, mouthfulness, and continuity in rodents. For this reason, the authors' reference to GPRC6A as a kokumi receptor is inappropriate.

      Thank you very much for the valuable comments. The term kokumi refers to a phenomenon in which the flavor of complexly composed foods is enhanced through certain processes, making them more delicious. It is an important concept in the field of food science, which studies how to make prepared dishes more enjoyable. Kokumi is also considered a higher-order, profound cognitive function evaluated by humans who experience a wide variety of foods. However, it is unclear whether animals, particularly experimental animals, can perceive kokumi in the same way humans do.

      To date, kokumi has been described using the representative terms thickness, mouthfulness, and continuity, originally introduced in the first paper on kokumi by Ueda et al. (1990). However, these terms are derived from Japanese and may not fully convey the nuances of the original language when translated into these simple English words. In particular, thickness is often interpreted as referring to physical properties such as viscosity or somatosensory sensations. Since kokumi inherently lacks somatosensory elements, this revised paper adopts alternative terms and explanations for the three components of kokumi to prevent misunderstanding and confusion.

      Therefore, to clarify that kokumi attributes are inherently gustatory, thickness is replaced with intensity of whole complex tastes (rich flavor with complex tastes), emphasizing the synergistic effects of a variety of tastes rather than the mere enhancement of a single flavor. Mouthfulness is clarified as not referring to mouthfeel (the tactile sensation a food gives in the mouth) but rather as spread of taste and flavor throughout the oral cavity, describing how the flavor fills the mouth. Continuity is replaced with persistence of taste (lingering flavor).

      Rodents are thought to possess basic taste functions similar to humans, such as the expression of taste receptors, including kokumi receptors, in taste cells. Regardless of whether rodents can perceive kokumi, findings from studies on rodents may provide insights into aspects of the kokumi concept as experienced by humans.

      Indeed, the results of this study indicate that ornithine enhances umami, sweetness, fat taste, and saltiness, leading to the enhancement of complex flavors—referred to as intensity of whole taste. The activation of various taste cells, resulting in the enhancement of multiple tastes, may contribute to the sensation of flavors spreading throughout the oral cavity. Furthermore, the strong enhancement of MSG and MPG suggests that glutamate contributes to the mouthfulness and persistence of taste characteristic of kokumi.

      (2) A related concern is that the authors did not make any measurements that model kokumi sensations documented in the literature. For example, they would need to develop behavioral/electrophysiological measurements that reflect the known effects of kokumi ligands on flavor perception (i.e., increases in intensity, spread, continuity, richness, harmony, and punch). For example, ornithine is thought to produce more "punch" (i.e., a more rapid rise in intensity). This could be manifested as a more rapid rise in peripheral taste response or a more rapid fMRI response in the taste cortex. Alternatively, ornithine is thought to increase "continuity" (i.e., make the taste response more persistent). This response would presumably be manifested as a peripheral taste response that adapts more slowly or a more persistent fMRI response. As it stands, the authors have documented that ornithine increases (i) the preference of rats for some chemical stimuli, but not others; and (ii) the response of the CT nerve to some but not all taste stimuli.

      In animal experiments, it is challenging to examine each attribute of kokumi. The increase of complex tastes can be investigated through behavioral experiments and neural activity recordings. However, phenomena such as spread or harmony, which arise from profound human judgments, are difficult to validate in animal studies.

      While it was possible to examine persistence through neural responses to tastants, all stimuli were rinsed at 30 seconds after onset of stimulation, so the exact duration of persistence was not investigated. However, since the MSG response was enhanced approximately 1.5 times with the addition of ornithine, it is strongly suggested that the duration might also have been prolonged.

      Regarding punch, no differences were observed in the neural responses when ornithine was added, likely because the phasic response already had a rapid onset.

      In the context of fMRI studies, there has been a report that adding glutathione to mixtures of umami and salt solutions increases responses (Goto et al. Chem Senses, 2016). However, research specifically examining the attributes of kokumi has not yet been reported.

      (3) The quality of the SNAP-25 immunohistochemistry is poor (see Figure 7D), with lots of seemingly nonspecific staining in and outside the taste bud.

      The quality of the SNAP-25 immunostaining is not poor. SNAP-25 is known to label not only type III cells but also the dense network of intragemmal nerve fibers (Tizzano et al., Immunohistochemical Analysis of Human Vallate Taste Buds. Chem Senses 40:655-60, 2015). Therefore, the seemingly nonspecific staining is due to intense SNAP-25 immunoreactivity of the nerve fibers.

      (4) The authors need to drastically scale back the scope of their conclusions. What they can say is that ornithine appears to enhance the taste responses of rats to a variety of taste stimuli and that this effect appears to be mediated by the GPRC6A receptor. They cannot use their data to address kokumi effects in humans, as they have not attempted to model any of these effects. Given the known problems with pharmacological blocking agents (e.g., nonspecificity), the authors would significantly strengthen their case if they could generate similar results in a GPRC6A knockout mouse.

      Our research approach begins with confirming in humans that the addition of ornithine to complex foods (such as miso soup) induces kokumi. Based on this confirmation, we conduct fundamental studies using animal models to investigate the peripheral taste mechanisms underlying the expression of kokumi.

      It is possible that the key to kokumi expression lies in the enhancement of desirable tastes (particularly umami) and the suppression of unpleasant tastes. Moving forward, we will deepen our fundamental research on the action of ornithine mediated through GPRC6A, including studies using knockout mice.

      (5) The introduction is too long. Much of the discussion of kokumi perception in humans should either be removed or shortened considerably.

      Following the reviewer’s suggestion, the introduction has been shortened.

      (6) I recommend that the authors break up the Methods and Results sections into different experiments. This would enable the authors to provide separate rationales for each procedure. For instance, the authors conducted a variety of different behavioral procedures (e.g., long- and short-term preference tests, and preference tests with and without GPRC6A receptor antagonists).

      Rather than following the reviewer’s suggestion, we have added subheadings to describe the purpose of each experiment. This approach would help readers better understand the experimental flow, as each experiment is relatively straightforward.

      (7) The inclusion of the human data is odd for two reasons. First, the measurements used to assess the impact of ornithine on flavor perception in humans were totally different than those used in rats. This makes it impossible to compare the human and rat datasets. Second, the human study was rather limited in scope, had small effect sizes, and had a lot of individual variation. For these reasons, the human data are not terribly helpful. I recommend that the authors remove the human data from this paper, and publish them as part of a more extensive study on humans.

      Despite the reviewer’s suggestion, we would like to include the human experiment because the rationale of the present study is to confirm, through a human sensory test, that the kokumi of a complex solution (in this case, miso soup) is enhanced by the addition of ornithine. This is followed by basic animal experiments to investigate the underlying mechanisms. Therefore, this human study serves an important role. The considerable variation in the scores suggests that evaluating the three kokumi attributes is challenging and likely influenced by differences in judgment criteria among participants.

      The total number of participants increased to 22 (19 women and three men) following an additional experiment with 5 new participants. The new results are shown in Supplemental Figure 1 with statistical analyses. The rewritten parts are as follows:

      We recruited 22 participants (19 women and three men, aged 21-28 years) from Kio University who were not affiliated with our laboratory, including students and staff members. All participants passed a screening test based on taste sensitivity. According to the responses obtained from a pre-experimental questionnaire, we confirmed that none of the participants had any sensory abnormalities, eating disorders, or mental disorders, or were taking any medications that may potentially affect their sense of taste. All participants were instructed not to eat or drink anything for 1 hour prior to the start of the experiment. We provided them with a detailed explanation of the experimental procedures, including safety measures and personal data protection, without revealing the specific goals of the study.

      (8) While the use of English is generally good, there are many instances where the English is a bit awkward. I recommend that the authors ask a native English speaker to edit the text.

      Thank you for this comment. The text has been edited by a native English speaker.

      Minor concerns

      (1) Lines 13-14: The authors state that "the concept of 'kokumi' has garnered significant attention in gustatory physiology and food science." This is an exaggeration. Kokumi has generated considerable interest in food science but has yet to generate much interest in gustatory physiology.

      We have rewritten this part: “The concept of “kokumi” has generated considerable interest in food science but kokumi has not been well studied in gustatory physiology.”

      (2) Line 20: The use of "specific taste" is unclear in this context. The authors indicate (in Figure 5A) that 1 mM ornithine generates a CT nerve response. They also reveal (in Figure 1A) that rats do not prefer 1 mM ornithine over water. The results from a preference test do not provide insight into whether a solution can be tasted; they merely demonstrate a lack of preference for that solution. Based on these data, the authors cannot infer that 1 mM ornithine cannot be tasted.

      We agree with the reviewer’s comment. Ornithine at 1 mM concentration may have a weak taste because this solution elicited a small neural response (Fig. 5-A). We have rewritten the text: “… at a concentration without preference for this solution.”

      (3) Line 44: Sensory information from foods enters the oral and the nasal cavity.

      The nasal cavity has been added.

      (5) Line 59: The terms "thickness", "mouthfulness" and "continuity" are not intuitive in English, and may reflect, at least in part, a failure in translation. The word thickness implies a tactile sensation (e.g., owing to high viscosity), but the authors use it to indicate a flavor that is more intense and onsets more quickly. The word mouthfulness is supposed to indicate that a flavor is experienced throughout the oral cavity. The problem here is that this happens with all tastants, independent of the presence of substances like ornithine. Indeed, taste buds occur in a limited portion of the oral epithelium, but we nevertheless experience tastes throughout the oral cavity, owing to a phenomenon called tactile referral (see the following reference: Todrank and Bartoshuk, 1991, "A taste illusion: taste sensation localized by touch," Physiology & Behavior 50:1027-1031). The word continuity does not imply that the taste is long-lasting or persistent.

      These three attributes were originally introduced by Ueda et al. (1990), who translated Japanese terms describing the profound characteristics of kokumi, which are deeply rooted in Japanese culinary culture. However, these simply translated terms have caused global misunderstanding and confusion, because they sound like somatosensory rather than gustatory descriptions. Therefore, to clarify that kokumi attributes are inherently gustatory, in the revised version we use the terms “intensity of whole complex tastes (rich flavor with complex tastes)” instead of thickness, “mouthfulness (spread of taste and flavor throughout the oral cavity),” and “persistence of taste (lingering flavor)” instead of continuity.

      The results of this study indicate that ornithine enhances umami, sweetness, fat taste, and saltiness, leading to the enhancement of complex flavors—referred to as intensity of whole taste. The activation of various taste cells, resulting in the enhancement of multiple tastes, may contribute to the sensation of flavors spreading throughout the oral cavity. Furthermore, the strong enhancement of MSG and MPG suggests that glutamate contributes to the mouthfulness and persistence of taste characteristic of kokumi.

      (6) Figure legends: The authors provide results of statistical comparisons in several of the figures. They need to explain what statistical procedures were performed. As it stands, it is impossible to interpret the asterisks provided.

      We have explained statistical procedures in each Figure legend.

      (7) I did not see any reference to the sources of funding or any mention of potential conflicts of interest.

      We have added the following information:

      Funding: JSPS KAKENHI Grant Numbers JP17K00935 (to TY) and JP22K11803 (to KU).

      Declaration of interests: The authors declare that they have no competing interests.

      Reviewer #3 (Recommendations for the authors):

      (1) I suggest that the authors increase their level of interest in glutathione and gamma-glutamyl peptides. This might include an appropriate gamma-glutamyl control substance in the two-bottle preference study (see Public Review). It might also include more careful attention to the work that identified glutathione as an activator of the CaSR (Wang et al., JBC 2006) and the nature of its binding site on the CaSR which overlaps with its site for L-amino acids (Broadhead et al., JBC 2011). This latter article also identified S-methyl glutathione, in which the free-SH group is blocked, as a high-potency activator of the CaSR. It would be expected to show comparable potency to gamma-glu-Val-Gly in assays of kokumi taste.

      We have appropriately referenced glutathione and gamma-Glu-Val-Gly, potent agonists of CaSR, where necessary. In our previous study (Yamamoto and Mizuta, Chem Senses, 2022), we examined the additive effects of these substances on basic taste stimuli in rodents, and the results were compared in greater detail with those obtained from the addition of ornithine in the present study. We have also discussed the potential binding of ornithine to other receptors, including CaSR and T1R1/T1R3 heterodimers.

      (2) Figures:

      -None of the figures were labelled with their Figure numbers. I have inferred the Figure numbers from the legends and their positions in the pdf.

      We are sorry for this inconvenience.

      - The labelling of Figure 1 and Figure 2 are problematic. In Figure 1 it should be made clear that the horizontal axes refer to the Ornithine concentration. In Figure 2 it should be made clear that the horizontal axes refer to the tastant concentrations (MSG, IMP, etc) and that the Ornithine concentrations were fixed at either zero or 1.0 mM.

      We are sorry for the lack of information about the horizontal axes. We have explained the horizontal axes in figure legends in Figs. 1 and 2. The labelling of both figures has also been modified to make this clear.

      - Figure 3B: 'Control' should appear at the top of this panel since the panels that follow all refer to it.

      Following the reviewer’s suggestion, we have added ‘Control’ at the top of Figure 3B.

      - Figure 5A. Provide a label for the test substance, presumably Ornithine.

      Yes, we have added ‘Ornithine’.

      - Figure 7 would be strengthened by the inclusion of immunohistochemistry analyses of the CaSR.

      We are sorry that we did not perform immunohistochemistry for the CaSR alone; a previous study had already analyzed CaSR expression in rat taste cells in detail. Instead, we have analyzed the co-expression of GPRC6A and CaSR (see Supplemental Figure 3).

      (3) Other Matters:

      - Line 38: list the five basic taste modalities here.

      Yes, we have included the five basic taste modalities here.

      - Line 107: 'even if ... kokumi ... is less developed in rodents' - if there is evidence that kokumi is less developed in rodents it should be cited here.

      We cannot cite any references here because no studies have compared the perception of kokumi between humans and rodents.

      - Line 308: 'recently we conducted experiments in rats using gallate ...' - the authors appear to imply that they performed the research in Reference 43, however, I was unable to find an overlap between the two lists of authors.

      We did not perform the research in Reference 43 (Reference 40 in the revised paper). Because that study showed that gallate is an agonist of GPRC6A, we were interested in performing similar behavioral experiments using gallate instead of ornithine.

      The sentences have been rewritten to avoid misunderstanding.

      - Line 506: the sections are said to be 20 mm thick - should this read 20 micrometers?

      Thank you. We have changed this to 20 micrometers.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      The Bagnat and Rawls groups' previous published work (Park et al., 2019) described the kinetics and genetic basis of protein absorption in a specialized cell population of young vertebrates termed lysosome-rich enterocytes (LREs). In this study they seek to understand how the presence and composition of the microbiota impacts the protein absorption function of these cells and reciprocally, how diet and intestinal protein absorption function impact the microbiome. 

      Strengths of the study include the functional assays for protein absorption performed in live larval zebrafish, which provides detailed kinetics on protein uptake and degradation with anatomic precision, and the gnotobiotic manipulations. The authors clearly show that the presence of the microbiota or of certain individual bacterial members slows the uptake and degradation of multiple different tester fluorescent proteins. 

      To understand the mechanistic basis for these differences, the authors also provide detailed single-cell transcriptomic analyses of cells isolated based on both an intestinal epithelial cell identity (based on a transgenic marker) and their protein uptake activity. The data generated from these analyses, presented in Figures 3-5, are valuable for expanding knowledge about zebrafish intestinal epithelial cell identities, but of more limited interest to a broader readership. Some of the descriptive analysis in this section is circular because the authors define subsets of LREs (termed anterior and posterior) based on their fabp2 expression levels, but then go on to note transcriptional differences between these cells (for example in fabp2) that are a consequence of this initial subsetting. 

      Inspired by their single-cell profiling and by previous characterization of the genes required for protein uptake and degradation in the LREs, the authors use quantitative hybridization chain reaction RNA-fluorescent in situ hybridization to examine transcript levels of several of these genes along the length of the LRE intestinal region of germ-free versus mono-associated larvae. They provide good evidence for reduced transcript levels of these genes that correlate with the reduced protein uptake in the mono-associated larval groups. 

      The final part of the study (shown in Figure 7) characterized the microbiomes of 30-day-old zebrafish reared from 6-30 days on defined diets of low and high protein and with or without homozygous loss of the cubn gene required for protein uptake. The analysis of these microbiomes notes some significant differences between fish genotypes by diet treatments, but the discussion of these data does not provide strong support for the hypothesis that "LRE activity has reciprocal effects on the gut microbiome". The most striking feature of the MDS plot of Bray Curtis distance between zebrafish samples shown in Figure 7B is the separation by diet independent of host genotype, which is not discussed in the associated text. Additionally, the high protein diet microbiomes have a greater spread than those of the low protein treatment groups, with the high protein diet cubn mutant samples being the most dispersed. This pattern is consistent with the intestinal microbiota under a high protein diet regimen and in the absence of protein absorption machinery being more perturbed in stochastic ways than in hosts competent for protein uptake, consistent with greater beta dispersal associated with more dysbiotic microbiomes (described as the Anna Karenina principle here: https://pubmed.ncbi.nlm.nih.gov/28836573/). It would be useful for the authors to provide statistics on the beta dispersal of each treatment group. 

      Overall, this study provides strong evidence that specific members of the microbiota differentially impact gene expression and cellular activities of enterocyte protein uptake and degradation, findings that have a significant impact on the field of gastrointestinal physiology. The work refines our understanding of intestinal cell types that contribute to protein uptake and their respective transcriptomes. The work also provides some evidence that microbiomes are modulated by enterocyte protein uptake capacity in a diet-dependent manner. These latter findings provide valuable datasets for future related studies. 

      We thank the reviewer for their thorough and kind assessment. We appreciate the suggestion for edits and for pointing out areas that need further clarification.

      One point that clearly needs further explanation is the use of fabp6 (referred to as fabp2 by the reviewer) to define anterior LREs and their gene expression pattern, which includes high levels of fabp6. This was deemed by the reviewer to be a “circular argument”. We would like to clarify that the rationale for using fabp6 as an anchor is that we had previously reported overlap between fabp6 and LREs (see Fig. 6C-E in Wen et al. PMID: 34301599) and thus were able here to define fabp6’s spatial pattern in relation to other LRE markers and the neighboring ileocyte population using transgenic markers and HCR. Thus, far from being a circular argument, using fabp6 allowed us to identify other markers that are differentially expressed between anterior and posterior LREs, which share a core program that we highlight in our study. In the revised manuscript we will clarify this point.

      We will also add the analysis suggested for the 16S rRNA gene sequencing data, include statistics on beta dispersal, and expand the discussion of these data as suggested.
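
      To illustrate what a per-group beta dispersal summary might look like, the following minimal sketch computes Bray-Curtis dissimilarities and uses the mean pairwise within-group dissimilarity as a simple spread measure. It is not the authors' analysis: the group names and taxon counts are hypothetical toy values, and a formal comparison would more typically test distances to group centroids in ordination space (a PERMDISP-style test) rather than use this simplified proxy.

```python
from itertools import combinations
from statistics import mean


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0


def mean_pairwise_dissimilarity(samples):
    """Simple dispersion proxy: mean Bray-Curtis dissimilarity over all sample pairs."""
    return mean(bray_curtis(a, b) for a, b in combinations(samples, 2))


# Hypothetical toy data: each inner list is one fish, each position one taxon count.
groups = {
    "low_protein_wt": [[10, 20, 30], [12, 18, 30], [11, 19, 30]],
    "high_protein_cubn": [[30, 5, 25], [5, 40, 15], [20, 20, 20]],
}

# Per-group spread; a larger value means the samples in that group differ more.
dispersion = {
    name: mean_pairwise_dissimilarity(samples) for name, samples in groups.items()
}

for name, value in dispersion.items():
    print(f"{name}: mean within-group Bray-Curtis = {value:.3f}")
```

      In this toy example the more heterogeneous group yields the larger value, which is the kind of pattern the reviewer describes for the high protein diet cubn mutant samples.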

      Reviewer #2 (Public review): 

      Summary: 

      The authors set out to determine how the microbiome and host genotype impact host protein-based nutrition. 

      Strengths: 

      The quantification of protein uptake dynamics is a major strength of this work and the sensitivity of this assay shows that the microbiome and even mono-associated bacterial strains dampen protein uptake in the host by causing down-regulation of genes involved in this process rather than a change in cell type. 

      The use of fluorescent proteins in combination with transcript clustering in the single cell seq analysis deepens our understanding of the cells that participate in protein uptake along the intestine. In addition to the lysosome-rich enterocytes (LRE), subsets of enteroendocrine cells, acinar, and goblet cells also take up protein. Intriguingly, these non-LRE cells did not show lysosomal-based protein degradation; but importantly, the transcripts upregulated in these cells include dab2 and cubn, genes shown previously as being essential to protein uptake. 

      The derivation of zebrafish mono-associated with single strains of microbes paired with HCR to localize and quantify the expression of host protein absorption genes shows that different bacterial strains suppress these genes to variable extents. 

      The analysis of microbiome composition, when host protein absorption is compromised in cubn-/- larvae or by reducing protein in the food, demonstrates that changes to host uptake can alter the abundance of specific microbial taxa like Aeromonas. 

      Weaknesses: 

      The finding that neurons are positive for protein uptake in the single-cell data set is not adequately discussed. It is curious because the cldn:GFP line used for sorting does not mark neurons and if the neurons are taking up mCherry via trans-synaptic uptake from EECs, those neurons should be mCherry+/GFP-; yet methods indicate GFP+ and GFP+/mCherry+ cells were the ones collected and analyzed. 

      We thank the Reviewer for the kind and positive assessment of our work, for suggestions to improve the accessibility and clarity of the manuscript, and for pointing out an issue related to a neuronal population that needs further clarification.

      We confirm that there is a population of neurons that express cldn15la (and cldn15la:GFP). They are not easily visualized by microscopy because IECs express this gene at a much higher level. However, the endogenous cldn15la transcript can be found in a recently published dataset (PMID: 35108531) as well as in ours. We will add a Discussion point to clarify this issue.

      Reviewer #3 (Public review): 

      Summary: 

      Childers et al. address a fundamental question about the complex relationship within the gut: the link between nutrient absorption, microbial presence, and intestinal physiology. They focus on the role of lysosome-rich enterocytes (LREs) and the microbiota in protein absorption within the intestinal epithelium. By using germ-free and conventional zebrafish, they demonstrate that microbial association leads to a reduction in protein uptake by LREs. Through impressive in vivo imaging of gavaged fluorescent proteins, they detail the degradation rate within the LRE region, positioning these cells as key players in the process. Additionally, the authors map protein absorption in the gut using single-cell sequencing analysis, extensively describing LRE subpopulations in terms of clustering and transcriptomic patterns. They further explore the monoassociation of ex-germ-free animals with specific bacterial strains, revealing that the reduction in protein absorption in the LRE region is strain-specific. 

      Strengths: 

      The authors employ state-of-the-art imaging to provide clear evidence of the protein absorption rate phenotype, focusing on a specific intestinal region. This innovative method of fluorescent protein tracing expands the field of in vivo gut physiology. 

      Using both conventional and germ-free animals for single-cell sequencing analysis, they offer valuable epithelial datasets for researchers studying host-microbe interactions. By capitalizing on fluorescently labelled proteins in vivo, they create a new and specific atlas of cells involved in protein absorption, along with a detailed LRE single-cell transcriptomic dataset. 

      Weaknesses: 

      While the authors present tangible hypotheses, the data are primarily correlative, and the statistical methods are inadequate. They examine protein absorption in a specific, normalized intestinal region but do not address confounding factors between germ-free and conventional animals, such as size differences, transit time, and oral gavage, which may impact their in vivo observations. This oversight can lead to bold conclusions, where the data appear valuable but require more nuance. 

      The sections of the study describing the microbiota or attempting functional analysis are elusive, with related data being overinterpreted. The microbiome field has long used 16S sequencing to characterize the microbiota, but its variability due to experimental parameters limits the ability to draw causative conclusions about the link between LRE activity, dietary protein, and microbial composition. Additionally, the complex networks involved in dopamine synthesis and signalling cannot be fully represented by RNA levels alone. The authors' conclusions on this biological phenomenon based on single-cell data need support from functional and in vivo experiments. 

      We thank the reviewer for their assessment and for pointing out some areas that need to be explained better and/or discussed further.

      The reviewer mentions some potential confounding factors (i.e., size differences, transit time, oral gavage) in the gnotobiotic experiments. These aspects were addressed in our experimental design and will be clarified in the revised manuscript by adding information to the Methods or by adding data statements. Briefly: (1) larval sizes were recorded and found to be similar between GF and monoassociated larvae; a statement will be added to the text. (2) While intestinal transit time has been reported to be affected by microbes in larval zebrafish (PMIDs: 16781702, 28207737, 33352109) and is a topic of interest, it does not represent a confounding factor for our experiments. In our assay, luminal cargo is present at high concentrations throughout the gut and is not limiting at any point during the assay. (3) Gavage, which is necessary for quantitative assays, is indeed an experimental manipulation that may somehow alter the subjects (the same is true for microscopy and virtually any research method). However, any potential effects of gavage manipulation would not explain differences between GF and CV animals or alter our conclusions about microbial or dietary effects. We will elaborate on this in the revised Discussion.

      We acknowledge that microbiota composition is prone to relatively high degrees of interindividual and interexperimental variation, and that measuring microbiota composition using 16S rRNA gene sequencing is accompanied by inherent technical limitations such as limited taxonomic resolution, primer bias, etc. It is important to note that comparable assays such as shotgun metagenomic DNA sequencing are not currently suitable for samples such as larval zebrafish or their dissected digestive tracts, where the relative superabundance of host DNA prevents adequate coverage of microbial DNA. However, 16S rRNA gene sequencing remains a mainstream assay in the larger microbial ecology field and has proven effective at revealing important impacts of environmental factors on the gut microbiota (PMIDs: 21346791, 31409661, 31324413). Our results here also illustrate how 16S rRNA gene sequencing can be a useful method to detect perturbations to the zebrafish gut microbiome. Reproducing previous findings, we detected in our samples many of the core zebrafish microbiota taxa that have been identified by other studies (PMIDs: 26339860, 21472014, 17055441). To increase the robustness of our results, we included several biological replicates for each condition, co-housed genotypes, and included large sample sizes to minimize environmental variation between groups. Importantly, replicates housed in different tanks showed similar results. We will emphasize these points in the revised Discussion. To further underscore this in the revised manuscript, we will add a beta diversity plot and statistical analysis showing that the microbiome was not significantly affected by our experimental replicates.

      Regarding dopamine pathways, we thank the reviewer for pointing out that the language we used in our interpretation of this and other pathways enriched in our scRNAseq data was too strong. In the revised manuscript, we will soften those conclusions, and instead indicate that these may be areas worthy of future dedicated investigation.

      Finally, the reviewer mentions the use of inadequate statistical methods for some analyses but without specifying or indicating alternative analyses. Only the need to justify the use of two-way ANOVA was made explicit. On this point, we respectfully disagree and would like to emphasize that we use statistical methods that are standard in the field. We will nevertheless add a justification for the use of two-way ANOVA where appropriate. Briefly, the two-way ANOVA test was used to compare fluorescence profiles of gavaged cargoes or HCR probes at each level along the length of the LRE region. This test accounts for differences in fluorescence between experimental conditions at each level (binned 30 μm areas) along the LRE region (~300 μm). This test allows us to capture differences in fluorescence between experimental conditions while accounting for heterogeneity in the LRE region.
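
      As an illustration of the two-way ANOVA layout described above (experimental condition as one factor, binned position along the LRE region as the other), the following is a minimal sketch and not the authors' analysis. It assumes a balanced design with the same number of larvae in every condition and bin, and it treats observations as independent, which ignores the fact that one larva contributes a value to every bin. The function name `two_way_anova` and all numbers are hypothetical.

```python
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA with replication (illustrative sketch only).

    `cells` maps (condition, bin_index) -> list of replicate fluorescence values.
    Returns {effect: (sum_of_squares, degrees_of_freedom, F_statistic)}.
    """
    conditions = sorted({c for c, _ in cells})
    bins = sorted({p for _, p in cells})
    a, b = len(conditions), len(bins)
    n = len(next(iter(cells.values())))
    # The formulas below are only valid for a complete, balanced design.
    for c in conditions:
        for p in bins:
            if (c, p) not in cells or len(cells[(c, p)]) != n:
                raise ValueError("design must be complete and balanced")
    if n < 2:
        raise ValueError("need at least two replicates per cell")

    grand = mean(x for values in cells.values() for x in values)
    cond_mean = {c: mean(x for p in bins for x in cells[(c, p)]) for c in conditions}
    bin_mean = {p: mean(x for c in conditions for x in cells[(c, p)]) for p in bins}
    cell_mean = {key: mean(values) for key, values in cells.items()}

    # Partition the total variation into condition, position, interaction, error.
    ss_cond = b * n * sum((m - grand) ** 2 for m in cond_mean.values())
    ss_bin = a * n * sum((m - grand) ** 2 for m in bin_mean.values())
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_inter = ss_cells - ss_cond - ss_bin
    ss_error = sum((x - cell_mean[key]) ** 2
                   for key, values in cells.items() for x in values)

    df_cond, df_bin = a - 1, b - 1
    df_inter, df_error = df_cond * df_bin, a * b * (n - 1)
    ms_error = ss_error / df_error

    def f_stat(ss, df):
        return (ss / df) / ms_error

    return {
        "condition": (ss_cond, df_cond, f_stat(ss_cond, df_cond)),
        "position": (ss_bin, df_bin, f_stat(ss_bin, df_bin)),
        "interaction": (ss_inter, df_inter, f_stat(ss_inter, df_inter)),
        "error": (ss_error, df_error, None),
    }


if __name__ == "__main__":
    # Hypothetical values: two conditions, two 30 um bins, two larvae per cell.
    example = {
        ("GF", 0): [10.0, 12.0], ("GF", 1): [8.0, 10.0],
        ("CV", 0): [6.0, 8.0], ("CV", 1): [4.0, 6.0],
    }
    for effect, (ss, df, f) in two_way_anova(example).items():
        print(effect, ss, df, f)
```

      The sketch reports F statistics only; converting them to p-values requires the F distribution from a statistics package. Because each larva is measured at every bin, a repeated-measures or mixed-effects model may be a closer match to the actual data structure than this independent-observations layout.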

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, Harpring et al. investigated divisome assembly in Chlamydia trachomatis serovar L2 (Ct), an obligate intracellular bacterium that lacks FtsZ, the canonical master regulator of bacterial cell division. They find that divisome assembly is initiated by the protein FtsK in Ct by showing that it forms discrete foci at the septum and future division sites. Additionally, knocking down ftsK prevents divisome assembly and inhibits cell division, further supporting their hypothesis that FtsK regulates divisome assembly. Finally, they show that MreB is one of the last chlamydial divisome proteins to arrive at the site of division and is necessary for the formation of septal peptidoglycan rings but does not act as a scaffold for division assembly as previously proposed.

      Strengths:

      The authors use microscopy to clearly show that FtsK forms foci both at the septum as well as at the base of the progenitor cell where the next septum will form. They also show that the Ct proteins PBP2, PBP3, MreC, and MreB localize to these same sites suggesting they are involved in the divisome complex.

      Using CRISPRi, the authors knock down ftsK and find that most cells are no longer able to divide and that PBP2 and PBP3 no longer localize to sites of division, suggesting that FtsK is responsible for initiating divisome assembly. They also performed a knockdown of pbp2 using the same approach and found that this also mostly inhibited cell division. Additionally, FtsK was still able to localize in this strain; however, PBP3 did not, suggesting that FtsK acts upstream of PBP2 in the divisome assembly process while PBP2 is responsible for the localization of PBP3.

      The authors also find that performing a knockdown of ftsK also prevents new PG synthesis further supporting the idea that FtsK regulates divisome assembly. They also find that inhibiting MreB filament formation using A22 results in diffuse PG, suggesting that MreB filament formation is necessary for proper PG synthesis to drive cell division.

      Overall the authors propose a new hypothesis for divisome assembly in an organism that lacks FtsZ and use a combination of microscopy and genetics to support their model that is rigorous and convincing. The finding that FtsK, rather than a cytoskeletal or "scaffolding" protein is the first division protein to localize to the incipient division site is unexpected and opens up a host of questions about its regulation. The findings will progress our understanding of how cell division is accomplished in bacteria with non-canonical cell wall structure and/or that lack FtsZ.

      Weaknesses:

      No major weaknesses were noted in the data supporting the main conclusions. However, there was a claim of novelty in showing that multiple divisome complexes can drive cell wall synthesis simultaneously that was not well-supported (i.e. this has been shown previously in other organisms). In addition, there were minor weaknesses in data presentation that do not substantially impact interpretation (e.g. presenting the number of cells rather than the percentage of the population when quantifying phenotypes and showing partial western blots instead of total western blots).

      We agree with the weaknesses identified by the reviewer. We removed the statements in the Results and Discussion that multiple independent divisome complexes can simultaneously direct PG synthesis. We presented the data in Figs. 3-5 as % of the cells in the population, and complete western blots are shown in Supp. Fig. S1.

      Reviewer #2 (Public review):

      Summary:

      Chlamydial cell division is a peculiar event, whose mechanism was mysterious for many years. C. trachomatis division was shown to be polar and involve a minimal divisome machinery composed of both homologues of divisome and elongasome components, in the absence of a homologue of the classical division organizer FtsZ. In this paper, Harpring et al. show that FtsK is required at an early stage of the chlamydial divisome formation.

      Strengths:

      The manuscript is well-written and the results are convincing. Quantification of divisome component localization is well performed, number of replicas and number of cells assessed are sufficient to get convincing data. The use of a CRISPRi approach to knock down some divisome components is an asset and allows a mechanistic understanding of the hierarchy of divisome components.

      Weaknesses:

      The authors did not analyse the role of all potential chlamydial divisome components and did not show how FtsK may initiate the positioning of the divisome. Their conclusion that FtsK initiates the assembly of the divisome is an overinterpretation and is not backed by the data. However, data show convincingly that FtsK, if perhaps not the initiator of chlamydial division, is definitely an early and essential component of the chlamydial divisome.

      The following statement has been included in the Discussion (pg. 16 of the revised manuscript): “Although we focused our study on a subset of the divisome and elongasome proteins that Chlamydia expresses (bolded in Fig. 6G), our results support our conclusion that chlamydial budding is dependent upon a hybrid divisome complex and that FtsK is required for the assembly of this hybrid divisome. At this time, we cannot rule out that other proteins act upstream of FtsK to initiate divisome assembly in this obligate intracellular bacterial pathogen.”

      We will soon be submitting another manuscript that addresses how FtsK specifies the site of divisome assembly. This work is too extensive to be included in this manuscript.

      Reviewer #3 (Public review):

      Summary:

      The obligate intracellular bacterium Chlamydia trachomatis (Ct) divides by binary fission. It lacks FtsZ, but still has many other proteins that regulate the synthesis of septal peptidoglycan, including FtsW and FtsI (PBP3) as well as divisome proteins that recruit and activate them, such as FtsK and FtsQLB. Interestingly, MreB is also required for the division of Ct cells, perhaps by polymerizing to form an FtsZ-like scaffold. Here, Harpring et al. show that MreB does not act early in division and instead is recruited to a protein complex that includes FtsK and PBP2/PBP3. This indicates that Ct cell division is organized by a chimera between conserved divisome and elongasome proteins. Their work also shows convincingly that FtsK is the earliest known step of divisome activity, potentially nucleating the divisome as a single protein complex at the future division site. This is reminiscent of the activity of FtsZ, yet fundamentally different.

      Strengths:

      The study is very well written and presented, and the data are convincing and rigorous. The data underlying the proposed localization dependency order of the various proteins for cell division is well justified by several different approaches using small molecule inhibitors, knockdowns, and fluorescent protein fusions. The proposed dependency pathway of divisome assembly is consistent with the data and with a novel mechanism for MreB in septum synthesis in Ct.

      Weaknesses:

      The paper could be improved by including more information about FtsK, the "focus" of this study. For example, if FtsK really is the FtsZ-like nucleator of the Ct divisome, how is the Ct FtsK different sequence-wise or structurally from FtsK of, e.g. E. coli? Is the N-terminal part of FtsK sufficient for cell division in Ct like it is in E. coli, or is the DNA translocase also involved in focus formation or localization? Addressing those questions would put the proposed initiator role of FtsK in Ct in a better context and make the conclusions more attractive to a wider readership.

      We will be submitting another manuscript soon that details the conserved domain organization of FtsK from different bacteria, and the role of the various domains of chlamydial FtsK (including the N-terminus and the C-terminal translocase domain) in directing its localization in dividing Chlamydia. We have added text to the discussion (pg. 16 of the revised manuscript) that describes the sequence homology of chlamydial FtsK to FtsK from E. coli.

      Another weakness is that the title of the paper implies that FtsK alone initiates divisome assembly. However, the data indicate only that FtsK is important at an early stage of divisome assembly, not that it is THE initiator. I suggest modifying the title to account for this--perhaps "FtsK is required to initiate....".

      We agree with the reviewer and modified the title to “FtsK is Critical for the Assembly of the Unique Divisome Complex of the FtsZ-less Chlamydia trachomatis”. We have also modified the text throughout to indicate that FtsK is required for the assembly of the hybrid divisome of Chlamydia.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Suggestions for improvement (mostly minor):

      (1) For several of the graphs, the authors plot the number of cells with a given phenotype on the y-axis, but then describe percentages of cells in the text. It would make it clearer if all the graphs had the percentage of cells on the y-axis instead.

      We have modified the figures to indicate the percentage of cells on the y-axis with a given phenotype.

      (2) In Figures 3, 4, and 5 the authors show separate graphs for plus/minus drug or inducer. These should be on the same graph as they are directly comparing these two different conditions. Having them on separate graphs makes it less clear whether these differences are significant between the two conditions.

      We modified Fig. 4 to show +/- inducer in ftsk and pbp2 knockdown strains in the same graph. Regarding Figures 3 and 5, we believe the figures in the original submission effectively demonstrate the +/- drug conditions, so these figures remain unchanged in the revised manuscript.

      (3) In Figure 2 the authors show microscopy of the colocalization of FtsK with several other divisome proteins from Ct. Quantification of the colocalization of FtsK with these other proteins would provide a more holistic understanding of their colocalization and help further support their argument that FtsK initiates the assembly of the divisome.

      Supp. Fig. S4A of the revised manuscript contains images showing the colocalization of FtsK with the fusions at the septum and the base of dividing cells, and the colocalization of FtsK with the fusions that are only at the base of dividing cells. Supp. Fig. S4B quantified the percentage of dividing cells where FtsK overlaps the localization of each of the fusions at the septum, at the septum and the base, and at the base alone.

      (4) In Figure 6 the authors mention that the PG ring was at a slight angle relative to the MOMP-stained septum. What is the significance of this? The authors mention it several times but do not explain its relevance to divisome assembly. It is not really evident in the images presented.

      We mention in the Discussion (pgs. 17-18 of the revised manuscript) that “The relevance of the angled orientation of PG and MreC rings relative to the MOMP-stained septum in division intermediates is unclear. However, it appears to be a conserved feature of the cell division process and may arise because the divisome proteins are often positioned slightly above or below the plane of the MOMP-stained septum. The positioning of divisome proteins above or below the septum is indicated in Figs. 1 and 2.”

      We included cartoons in Fig. 6C of the revised manuscript to assist the reader in visualizing the angled orientation of the PG ring relative to the MOMP-stained septum.

      (5) In line 270 the authors claim that "these are the first data in any system to suggest that septal PG synthesis/modification is simultaneously directed by multiple independent divisome complexes." However, their experiments do not demonstrate that multiple divisome complexes are active at the same time. They show that multiple foci of FtsK etc. are present at sites where PG synthesis has occurred, but that does not necessarily mean that each focus/complex was actively synthesizing PG at the same time. Moreover, similar approaches were used to support a claim that septal PG synthesis is directed by multiple discrete divisome complexes previously (e.g. in Figure 1 of Bisson-Filho et al. 2017 (PMID: 28209898) in Bacillus subtilis and in Perez et al 2021 (PMID: 33269494) in Streptococcus pneumoniae). This claim is not central to the main conclusions of the study and could just be removed.

      This statement has been removed from the Results and the Discussion.

      (6) In Figure 6B the authors see three distinct FtsK foci. Why is this the only place in the manuscript where they see three foci? They mentioned previously that they saw foci at the septum and at the base of the progenitor mother cell, but why are there three foci here?

      The vast majority of dividing cells displayed one focus at the septum and/or the base. Representative images were chosen that reflected the localization profiles observed in the majority of cells. While we observed cells with multiple foci, as shown in Figure 6C, these cells were relatively rare (~2% of cells for all the divisome proteins in 3 independent experiments). Because cells with multiple foci were rare, we chose to group these cells with the cells that had single foci in the septum, septum and base, or base alone categories in the quantification shown in Fig. 2C. This is stated in the legend of Fig. 2 of the revised manuscript.

      (7) The Discussion section is lacking a couple of things that would put the data in a broader context. Can the authors speculate on how FtsK knows how to find the division site? I.e. what might be upstream of FtsK localization? Additionally, the authors do not talk about the FtsK sequence or domains at any point in the paper. Does Ct FtsK have a similar sequence/structure to FtsKs from other bacteria? Are there any differences in sequence/structure that might tell us about its function in Ct?

      We will be submitting another manuscript soon that examines how the site of assembly of the divisome is defined in dividing Chlamydia. This manuscript will also define the localization of the different sub-domains of chlamydial FtsK during cell division.  For this manuscript, we added a paragraph in the Discussion (pg. 16 of the revised manuscript) that states the domain organization is conserved in FtsK proteins from different bacteria. This paragraph includes information regarding the % sequence identity of the C-terminus and the N-terminus of chlamydial FtsK when compared to E. coli FtsK.

      (8) For Supplementary Figure S1B-C. The authors should show the full blots rather than just the single band of the protein of interest to show that the antibodies are specific. Additionally, the authors should include a loading control to show that they loaded the same amount of protein for each sample.

      We have included the full blots in Supp. Fig. S1 of the revised manuscript. We do not see the need for including a loading control for these blots because we are not making arguments about the relative level of the proteins that were assayed. We only use the blots to show that the fusion proteins are primarily a single species of the predicted molecular mass.

      (9) In Supplementary Figure S4A the authors use RT-qPCR to measure ftsK and pbp2 transcript levels. Since they have antibodies against these proteins, they should also include Western blots to show that the proteins are not being produced when targeted using CRISPRi.

      We have included data in Supp. Fig. S5E of the resubmission that indicates foci of FtsK and PBP2 could not be detected following the knockdown of ftsk and pbp2. We feel that these data support our conclusion that the induced expression of dCas12 in the ftsk and pbp2 knockdown strains results in the downregulation of the endogenous FtsK and PBP2 polypeptides.

      (10) In lines 261-262 the authors say that "PG organization was the same or differed at the septum." What is the PG organization being compared to? Same or different from what?

      We agree with the reviewer that the text in lines 261-262 in the original submission was confusing. The text has been modified.

      (11) Lines 201-215 the authors refer to Supplementary Figure S3 throughout this section, but they should refer to Supplementary Figure S4.

      This has been corrected.

      Reviewer #2 (Recommendations for the authors):

      I am not convinced that this paper shows that FtsK initiates the assembly of the divisome since the authors did not analyse the role and localization of all other chlamydial divisome components. Out of the ten homologues of divisome and elongasome components encoded by C. trachomatis genome, only five are investigated in this study. There is no explanation about how these five were chosen.

      We state on pg. 16 of the revised manuscript that “Although we focused our study on a subset of the divisome and elongasome proteins that Chlamydia expresses (bolded in Fig. 6G), our results support our conclusion that chlamydial budding is dependent upon a hybrid divisome complex and that FtsK is required for the assembly of this hybrid divisome. At this time, we cannot rule out that other proteins act upstream of FtsK to initiate divisome assembly in this obligate intracellular bacterial pathogen.”

      Results convincingly indicate that FtsK is an early divisome component, but evidence is lacking that it initiates divisome formation. Indeed, the authors do not show how FtsK would be the first protein to selectively accumulate at a given location to initiate divisome formation. For this reason, the model they propose at the end of their study is not backed by sufficient data, in my opinion.

      We agree with the reviewer that our data do not show that FtsK initiates divisome assembly. The title of the manuscript has been modified to “FtsK is Critical for the Assembly of the Unique Divisome Complex of the FtsZ-less Chlamydia trachomatis” and the text throughout has been modified to indicate that FtsK is the first protein we assayed that associates with nascent divisomes at the base of dividing cells. We will soon be submitting another manuscript that details how FtsK is recruited to a specific site to initiate nascent divisome assembly. This work is too extensive to be included in this manuscript.

      There are also discrepancies in the number of cells analysed to quantify the localization of divisome components, ranging from 50 to 250 cells. The authors could better explain why there are such variations.

      There were differences in the number of cells analyzed in the various experiments, but in every instance the effect of inhibitors (A22 and mecillinam) or ftsk and pbp2 knockdown on divisome assembly was statistically significant.

      There are a few mistakes in the text regarding figure numbering (Figure S4 is mentioned as S3 in the text). Figures 5B and D are not specifically cited.

      These mistakes have been corrected in the revised manuscript.

      Line 261-262: the sentence starting "Our imaging analysis.." is not clear to me.

      We agree with the reviewer that the text in lines 261-262 was confusing. The text has been modified (pg. 14 of the revised manuscript).

      Line 270-271: there are insufficient proofs to say that there are multiple independent divisome complexes. This is in my opinion an overinterpretation of the data, since there is no proof that these complexes are independent.

      This statement has been removed from the text.

      A few details are lacking in the figure legends:

      Figure 2C: when was the expression of the different mCherry and 6xHis constructs induced?

      The onset and length of the induction of the fusions have been included in the legend of Fig. 2.

      Bars are sometimes mentioned as uM and should be um. Bar sizes, number of replicates, and/or meaning of the error bars are lacking in legends of Figures S2, S3, and S4.

      This has been corrected in the revised manuscript.

      The consistency of Figures could be improved between Figures 3A, 4A, B, and 5A. The results of treated cells could be always shown as dark grey. It would help the reader.

      We have used consistent coloring in Figs. 3-5 to indicate the treated cells.

      Reviewer #3 (Recommendations for the authors):

      (1) Lines 113-118: do Ct cells increase in size as they get closer to starting division? If so, could a pseudo-time course (demograph) be done to bolster the evidence that the base foci formed mainly in predivisional cells and not newborn cells? This evidence might be more convincing than the data in Figures 1F and G.

      Chlamydial cells in the population were heterogeneous in size at the timepoint we are studying. This observation is consistent with previous reports in the literature (Liechti et al., 2021). While we agree that a pseudo-time course could potentially bolster the evidence about when FtsK foci appear, we believe our current analysis sufficiently demonstrates that basal foci of FtsK appear prior to the appearance of new buds at the base of dividing cells.

      (2) Figure 3E: It looks like MreC localization to foci doesn't strictly require MreB polymerization. Is this known for E. coli or other species?

      To our knowledge, MreC assembly into a filament has not been shown to be dependent upon MreB in other bacteria. In Caulobacter crescentus, MreC forms a helical structure that is not dependent upon MreB or MreB filament formation (Dye et al., 2005. PNAS; Divakaruni et al., 2005. PNAS).

      (3) Figure 5E: why is nearly half of PBP2 and PBP3 still localized to foci at the membrane even after treatment with mecillinam? This suggests, as the authors mention, that mecillinam reduces the efficiency of localization to the divisome but does not eliminate it. Any ideas why?

      At this time, we do not know why inhibiting the catalytic activity of PBP2 with mecillinam does not fully prevent the association of PBP2 with the chlamydial divisome. We have included a statement in the Results (pg. 13 of the revised manuscript) that inhibiting the catalytic activity of PBP2 prevents it from efficiently associating with or maintaining its association with polarized divisome complexes.

      (4) Line 262-263: This sentence is confusing-please rephrase. The same as what? Differed from what?

      We agree with the reviewer. The wording in lines 262-263 of the original submission has been modified.

      (5) Lines 265-267 and Figure 6: Adding cartoon schematics might help readers visualize cell orientations in Fig. 6 (especially 6B).

      Cartoons have been added to Fig. 6C (Fig. 6B in the original submission) to orient the reader.

      (6) Line 294-298: Do the authors think that the residual 5-10% of PG foci after FtsK knockdown is due to the ability of residual FtsK to organize divisomes?

      We show that knockdown of FtsK is not complete, and while we cannot be certain, it is likely that the PG foci detected in FtsK knockdown cells are due to the ability of the residual FtsK to organize divisomes that direct PG synthesis.

      (7) Do the authors have any evidence that FtsK foci are mobile like treadmilling FtsZ?

      We have not performed real-time imaging studies, and we currently have no evidence that FtsK foci are mobile.

      (8) FtsK foci here are reminiscent of mobile foci formed by the FtsK-like SpoIIIE at the Bacillus subtilis sporulation septum. This might be a good idea to mention in the Discussion. Is it possible that Ct FtsK is also involved in coordinating chromosome partitioning through the developing septum? (That is another reason why it would be useful to know if the translocase domain was dispensable for localization/activity).

      We are currently preparing another manuscript that documents the contribution of the various domains of FtsK to its localization profile and whether the division defect in ftsk knockdown cells can be suppressed by specific subdomains of FtsK. That manuscript will not only include these data but will also include experiments that address how the site of polarized budding is defined. In the revised manuscript, we have included a description of how the domain organization of chlamydial FtsK is similar to that of E. coli FtsK (pg. 16 of revision). Chlamydial FtsK also has a domain organization similar to that of SpoIIIE from B. subtilis. The C-terminal catalytic domain of SpoIIIE is 45% identical to chlamydial FtsK. The N-terminus of SpoIIIE is predicted to encode 4 transmembrane spanning helices, like chlamydial FtsK. However, the N-terminus of SpoIIIE shares no sequence homology with the N-terminus of chlamydial FtsK. We have not included the similar domain organization of SpoIIIE and chlamydial FtsK in the revised manuscript.

      (9) It seems that FtsK foci localize to a particular spot opposite from the active septum, although how this spot is specified is not clear. Is there any geometric clue for FtsK's localization like there is for Min-specified FtsZ localization?

      As mentioned above, we are currently preparing another manuscript that documents our efforts to understand how the site of polarized budding is defined.  This analysis is too extensive to include in this study.

      (10) As mentioned in the Summary, do the authors know whether the N-terminal membrane binding part of FtsK (FtsKn) is sufficient for localization/divisome assembly in Ct as it is in other species? Ouellette et al. 2012 showed that FtsKn could interact with MreB in BACTH.

      We are currently preparing another manuscript that documents the contribution of the various domains of FtsK to its localization profile.

      (11) The previous BACTH result with MreB and FtsKn implies that this interaction is direct, yet the current data suggest that this is not the case. Can the authors comment on this? Is this due to bridging effects inherent in the BACTH system?

      We have not presented any data to indicate that FtsK and MreB do not interact. We have only shown that FtsK localization is not dependent upon MreB filament formation (Fig. 3).

      (12) The FtsZ-independent role of FtsK in nucleating the divisome suggests that Ct FtsK may differ from other FtsKs structurally - can this be explored, perhaps with AlphaFold 3?

      As mentioned above, we have included a paragraph in the discussion of the revised manuscript (pg. 16 of the revised manuscript) that states the domain organization of chlamydial FtsK is similar to that of E. coli FtsK. This conserved domain organization is evident when we view the structures of the proteins using AlphaFold.

      (13) Typo on line 559: should be HeLa.

      This has been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a comprehensive exploration of the role of liver-specific Survival Motor Neuron (SMN) depletion in peripheral and central nervous system tissue pathology through a well-constructed mouse model. This study is pioneering in its approach, focusing on the broader physiological implications of SMN, which has traditionally been associated predominantly with spinal muscular atrophy (SMA).

      Strengths:

      (1) Novelty and Relevance: The study addresses a significant gap in understanding the role of liver-specific SMN depletion in the context of SMA. This is a novel approach that adds valuable insights into the multi-organ impact of SMN deficiency.

      (2) Comprehensive Methodology: The use of a well-characterized mouse model with liver-specific SMN depletion is a strength. The study employs a robust set of techniques, including genetic engineering, histological analysis, and various biochemical assays.

      (3) Detailed Analysis: The manuscript provides a thorough analysis of liver pathology and its potential systemic effects, particularly on the pancreas and glucose metabolism.

      (4) Clear Presentation: The manuscript is well written. The results are presented clearly with well-designed figures and detailed legends.

      We thank the reviewer for their positive comments. They had some concerns for us to consider (see below). We provide a point-by-point response to their comments.

      Weaknesses:

      (1) Limited Time Points: The study primarily focuses on a single time point (P19). This limits the understanding of the temporal progression of liver and pancreatic pathology in the context of SMN depletion. Longitudinal studies would provide a better understanding of disease progression.

      We thank the reviewer for the suggestion. We extended our analysis to include P60 mice and performed both liver and pancreatic analyses at this time point to address this suggestion.

      (2) Incomplete Recombination: The mosaic pattern of Cre-mediated excision leads to variability in SMN depletion, which complicates the interpretation of some results. Ensuring more consistent recombination across samples would strengthen the conclusions.

      The variability in Cre-mediated excision is inherently stochastic, influenced by factors such as Cre expression levels, timing of recombination, and the accessibility of the target locus in individual cells. Achieving complete consistency across samples is particularly challenging, especially given the complexity of our breeding scheme, which occasionally results in litters without any animals of the desired genotype. Importantly, our study not only demonstrates that liver-specific SMN depletion results in liver alterations and pancreatic dysfunction but also highlights the limitations and challenges associated with this mouse model. By doing so, we aim to provide valuable insights for other researchers considering similar approaches in future studies.

      Reviewer #2 (Public review):

      Summary:

      Marylin Alves de Almeida et al. developed a novel mouse cross via conditionally depleting functional SMN protein in the liver (AlbCre/+;Smn2B/F7). This mouse model retains a proportion of SMN in the liver, which better recapitulates SMN deficiency observed in SMA patients and allows further investigation into liver-specific SMN deficiency and its systemic impact. They show that AlbCre/+;Smn2B/F7 mice do not develop an apparent SMA phenotype as mice did not develop motor neuron death, neuromuscular pathology or muscle atrophy, which is observed in the Smn2B/- controls. Nonetheless, at P19, these mice develop mild liver steatosis, and interestingly, this conditional depletion of SMN in the liver impacts cells in the pancreas.

      Strengths:

      The current model has clearly delineated the apparent metabolic perturbations which involve a significantly increased lipid accumulation in the liver and pancreatic cell defects in AlbCre/+;Smn2B/F7 mice at P19. Standard methods like H&E and Oil Red-O staining show that in AlbCre/+;Smn2B/F7 mice, their livers closely mimic the livers of Smn2B/- mice, which have the full body knockout of SMN protein. Unlike previous work, this liver-specific conditional depletion of SMN is superior in that it is not lethal to the mouse, which allows an opportunity to investigate the long-term effects of liver-specific SMN on the pathology of SMA.

      We thank the reviewer for their positive comments. They had some concerns for us to consider (see below). We provide a point-by-point response to their comments (review comments in black, our response in red).

      Weaknesses:

      Given that SMA often involves fatty liver, dyslipidemia and insulin resistance, using the current mouse model, the authors could have explored the long-term effects of liver-specific depletion of SMN on metabolic phenotypes beyond P19, as well as systemic effects like glucose homeostasis. Given that the authors also report pancreatic cell defects, the long-term effect on insulin secretion and resistance could be further explored. The mechanistic link between a liver-specific SMN depletion and apparent pancreatic cell defects is also unclear.

      We extended our analysis to include P60 mice and performed both liver and pancreatic analyses at this time point to address this suggestion. In addition, we discussed the liver-pancreas axis in the Discussion.

      Discussion:

      This current work explores a novel mouse cross in order to specifically deplete liver SMN using an Albumin-Cre driver line. This provides insight into the contribution of liver-specific SMN protein to the pathology of SMA, which is relevant for understanding metabolic perturbations in SMA patients. Nonetheless, given that SMA in patients involves a systemic deletion or mutation of the SMN gene, the authors could emphasize the utility of this liver-specific mouse model, as opposed to using in vitro models, which have been recently reported (Leow et al, 2024, JCI). Authors should also discuss why a mild metabolic phenotype is observed in this current mouse model, as opposed to other SMA mouse models described in literature.

      We appreciate the reviewer’s insightful comment. We have thoroughly addressed this suggestion in the Discussion section, particularly in lines 284-298; 309-322 and 334-359.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Longitudinal Studies: Conduct studies at one or more additional postnatal time points to provide a clearer picture of how liver-specific SMN depletion affects tissue pathology over time.

      We extended our analysis to include P60 mice and performed both liver and pancreatic analyses at this time point to address this suggestion.

      (2) Functional Assays: Incorporate glucose tolerance tests, insulin sensitivity tests, and more detailed metabolic profiling to better understand the physiological consequences of liver-specific SMN depletion on glucose metabolism and pancreatic function.

      We sincerely thank the reviewer for this suggestion. We have included a full panel of metabolic hormones associated with glucose metabolism from animals at P19 and P60. These new data, along with additional figures, have now been provided in our revised manuscript.

      (3) Mechanism: Discuss the molecular pathways affected by SMN depletion in the liver and pancreas. Mechanistic studies including transcriptomic or proteomic analyses to identify dysregulated pathways will help.

      We appreciate the reviewer’s insightful comment. We have thoroughly addressed this suggestion in the Discussion section, particularly in lines 284-298 and 334-359.

      (4) Typos in the abstract: beta cells secrete insulin and alpha cells produce glucagon.

      Thank you for catching this error. It has been corrected to reflect that insulin is produced by beta cells and glucagon by alpha cells.

      (5) Efficiency and specificity of the Alb-Cre: if possible, cross the Alb-Cre with the Rosa26 reporter line to test the efficiency and specificity of the Alb-Cre.

      We agree that this would provide valuable insights. However, initiating a new breeding program to generate the required genotypes would take over a year and is beyond the scope of this study. To address this in part, we performed Cre immunostaining of the liver, pancreas, and spinal cord at P19, as well as the liver at P60. These results, now included in Supplemental Figure 1, demonstrate liver-specific expression and variability across hepatocytes.

      Reviewer #2 (Recommendations for the authors):

      The title of this manuscript is potentially misleading. The manuscript largely investigates the involvement of SMN protein on peripheral organs such as the liver, muscles, neuromuscular junction, and the pancreas. Yet, the title could be interpreted that the peripheral nervous system or central nervous system is the main focus. The title should be edited to indicate key terms such as "motor neuron and peripheral tissue pathology".

      Thank you for pointing this out. We have revised the title to better represent the study’s focus. It is now “Impact of liver-specific survival motor neuron (SMN) depletion on central nervous system and peripheral tissue pathology”.

      Suggestions:

      Please clarify and explain clearly the various mouse lines (Smn2B/+, Smn2B/- and +/+; Smn2B/F7 ) used as controls as the nomenclature used is confusing. In addition, authors could consider the use of a wild-type mouse line to be used as a control to validate changes in AlbCre/+;Smn2B/F7 mice.

      We have now provided clarification on mouse line nomenclature in the Results section (lines 104–124). Full-body heterozygous mice (Smn2B/+) are used as controls due to their slightly reduced SMN protein levels and absence of phenotypic changes compared to wild-type mice.

      Given that the main phenotype implicated by the liver-specific depletion of SMN protein in AlbCre/+;Smn2B/F7 mice is pancreatic abnormalities (changes in alpha- and beta- cell numbers and blood glucose levels), authors should expand further on the pancreatic phenotype.  

      We added a full panel of metabolic hormones related to glucose metabolism in animals at P19 and P60. Furthermore, this has been discussed in detail in lines 284-298 and 334-344 of the Discussion.

      A pancreas-specific depletion of SMN would provide this current manuscript with a better understanding of the role of SMN in regulating SMA pathology and provide more definitive conclusions on the contribution of liver-specific SMN depletion on normal pancreatic function.

      We agree that this would be very informative. However, to do this would require initiation of a new breeding program that will take more than a year to arrive at the right genotypes. Although valuable, it is beyond the scope of the present study.

      The authors should also delineate the role of hepatic SMN in pancreatic function, and how the intrinsic liver-specific loss of SMN directly impacts the pancreas. Currently, literature demonstrates that the fatty liver phenotype in SMA patients is a primary SMN-dependent hepatocyte-intrinsic liver defect associated with mitochondrial and other hepatic metabolism implications (see Leow et al, 2024 J Clin Invest). Given that the authors describe that SMN protein levels are not altered in the pancreas of AlbCre/+;Smn2B/F7 mice at P19, the authors ought to clarify how pancreas development and function is impacted in this mouse model, whether in-utero or postnatally. This could potentially underscore the cross-talk between liver SMN and pancreas function.

      We have discussed the relationship between hepatic SMN and pancreatic function in the Discussion at lines 284-298 and 334-359.

      Authors should also perform some metabolic tolerance tests to both oral glucose and insulin at an older age (e.g. P60) to study their homeostasis in these mice. These would help to substantiate the authors' conclusion and provide the paper with a greater level of novelty.

      We thank the reviewer for this suggestion. A full panel of metabolic hormones related to glucose metabolism at P19 and P60 has been included, supported by additional figures that enhance the manuscript's novelty and depth.

      Authors mentioned in the Discussion in lines 238 to 240: "Altogether, our findings underscore the necessity of conducting further investigations at later time points to unveil potential modifications in other pathways and their repercussions on liver physiology". Please elucidate the effects of longer term liver-specific depletion of SMN beyond P19, such as the onset of NAFLD or a diabetic phenotype due to pancreatic dysfunctions.

      We extended our data to include P60 mice and performed liver and pancreatic analyses at this time point. The observed effects were transient, possibly due to the stochastic nature of Cre expression.

      In addition, while AlbCre/+;Smn2B/F7 mice had similar weight gain trends as controls, it does appear that AlbCre/+;Smn2B/F7 mice weigh more than their controls by P60 (Figure 9C). This data would provide more convincing evidence of the metabolic defects observed in these mice.

      As per the reviewer’s suggestion, we included new data (Figure 9D) showing % weight gain at P60 normalized to basal weight at P7. However, no statistically significant differences were detected.
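
      The normalization described in this response can be illustrated with a minimal sketch. It assumes that "% weight gain normalized to basal weight" means the percent change relative to each animal's P7 weight; the function name and the example weights are hypothetical and are not taken from the manuscript.

```python
def percent_weight_gain(weight_p7, weight_p60):
    """Percent weight gain at P60 relative to the same animal's P7 weight.

    Assumption: normalization is (P60 - P7) / P7, expressed in percent.
    """
    if weight_p7 <= 0:
        raise ValueError("baseline weight must be positive")
    return 100.0 * (weight_p60 - weight_p7) / weight_p7


# Hypothetical animal: 4 g at P7 and 20 g at P60.
example_gain = percent_weight_gain(4.0, 20.0)
```

      Normalizing each animal to its own baseline removes differences in starting weight, but it does not by itself account for sex differences in growth rate.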

      Other than protein quantification, authors should perform immunohistochemistry or in-situ hybridization of SMN and imaging of AlbCre/+;Smn2B/F7 organs to validate the loss of liver-specific SMN. It is unclear from western blots that the expression of SMN is only in hepatocytes.

      We thank the reviewer for the suggestion. Unfortunately, SMN antibodies have not produced reliable tissue immunostaining. To address this, we performed Cre immunostaining of the liver, pancreas, and spinal cord at P19, and the liver at P60, which demonstrated liver-specific expression. These results are now included in Supplemental Figure 1.

      Authors should consider re-wording lines 228 through 231: "While our current analysis did not reveal significant differences in AlbCre/+;Smn2B/F7 mice, the observed upward trend in transferrin and HO levels suggests ongoing changes in iron metabolism, which may not be fully manifested at P19". Alternatively, a higher number of mouse samples would allow them to qualify this statement. Authors should also consider comparing levels of liver biomarkers such as ALT and AST, to check for liver homeostatic function.

      We have removed speculative statements to avoid unsupported claims.

      Recommendations:

      The methods and additional details to generate the AlbCre/+;Smn2B/F7 should be explained better in section 2.1 of the Results. It is potentially confusing as to why these mice had to carry both 2B and F7 alleles. Additionally, the role of the F7 allele is not deliberately clear in the Introduction.

      Additional details are now included in the Introduction (lines 87-90) and the Results section (lines 104-124).

      Authors should refer to Leow et al 2024 (J Clin Invest) and discuss how their current findings compare with their hepatocyte-intrinsic SMN deficiency IPSCs model.

      We note a previous publication (Deguise et al 2021 Cell Mol Gastroenterol Hepatol) by the authors which characterized the Smn2B/- mouse model and its NAFLD/NASH features. From our understanding, the Smn2B/- mouse model appears to recapitulate SMA phenotype well, such as the early onset of hepatic steatosis and neurological conditions. As a follow-up to this publication, authors should discuss why this current study of a liver-specific SMN depletion is important and relevant to the study of SMA pathology.

      We thank the reviewer for the insightful suggestions. We have included a discussion of these findings and their relevance to the study of SMA pathology in lines 284-298 and 309-322.

      Minor corrections:

      Abstract (line 32) reads: "a decrease in insulin producing alpha-cells and an increase in glucagon producing beta-cells". The authors should clarify and correct as insulin producing beta-cells and glucagon producing alpha-cells.

      Thank you for catching the error. We corrected the description of insulin- and glucagon-producing cells.

      Please clarify the number and gender of mice used for weight tracking and motor function experiments up to P60 (Figure 9C). It would be inappropriate if male and female mice were plotted together. If so, authors should stratify data by gender.

      We thank the reviewer for the suggestion. Unfortunately, we did not stratify the animals by sex due to the unequal and insufficient number of males and females in our study. To address this, we normalized weight gain to each animal’s starting weight, and no significant differences were observed (now shown in Figure 9D).

      The number of figures should be reduced. We recommend merging Figures 1 and 2 (generation of AlbCre/+;Smn2B/F7 mouse line and validation) and Figures 3 and 4 (liver function). Figures 5 through 9 may be supplemental figures instead.

      We thank the reviewer for the suggestions. We merged Figures 1 and 2, and Figures 3 and 4, as requested. However, we would prefer to keep the other figures within the main results as they assess the impact of liver-specific depletion of SMN on other pathologies within the mouse model.

      Standardize the use of asterisks and reporting p-values in Figure 2. All other figures in the manuscript utilize asterisks, but Figures 2C', 2D' and 2E' use p-values across comparisons.

      P-values were included only when they approached statistical significance, providing additional clarity to the results.

      It is unclear what the white arrow in Figure 7A indicates.

      It is meant to point out the absence of an innervating axon. Please see Figure 5 legend, lines 801-802.

      Note spelling errors in Figures 8B and 8C: 'Muscle flber'.

      Thank you for catching this. We have corrected the typo to indicate muscle fiber instead.

      Please clarify if muscle fiber size should be indicated as µm2 instead of µ2 in Figures 8B and 8C, as written in Materials and Methods under line 394.

      Thank you for catching this. We corrected the typo to indicate µm2 instead.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      (1) The overall conclusion, as summarized in the abstract as "Together, our study documents the diversification of locomotor and oculomotor adaptations among hunting teleost larvae" is not that compelling. What would be much more interesting would be to directly relate these differences to different ecological niches (e.g. different types of natural prey, visual scene conditions, height in water column etc), and/or differences in neural circuit mechanisms. While I appreciate that this paper provides a first step on this path, by itself it seems on the verge of stamp collecting, i.e. collecting and cataloging observations without a clear, overarching hypothesis or theoretical framework.

      There are limited studies on the prey capture behaviors of larval fishes, and ours is the first to compare multiple species systematically using a common analysis framework. Our analysis approach could have uncovered a common set of swim kinematics and capture strategies shared by all species; but instead, we found that medaka used a monocular strategy rather than the binocular strategy of cichlids and zebrafish. Our analysis similarly could have revealed first-feeding larvae of all species go through a “bout” stage, which was previously proposed as important for sensorimotor decision making (Bahl et al., 2019), but instead we found that medaka and some cichlids have more continuous swimming from an early life stage. Finally, the rate at which prey capture kinematics evolves is not known. Our approach could have revealed rapid diversification of feeding strategies in cichlids (similarly to how adult feeding behavior evolves), but instead we found smaller differences within cichlids than between cichlids and medaka.

      (2) The data to support some of the claims is either weak or lacking entirely.

      Highlighted timestamps in videos, new statistics in Fig. 1H and Fig. 2, and updated supplementary figures now provide additional support for these claims.

      - It would be helpful to include previously published data from zebrafish for comparison.

      We appreciate the suggestion. Mearns et al. (2020) provided a comprehensive account of prey capture in zebrafish larvae in an almost identical setup with similar analyses. We do not feel it is necessary to recount all the findings in that paper here. There are many studies on prey capture in zebrafish from the past 20 years, and reproducing these here would not add anything to that extensive pre-existing literature.

      - Justification is required for why it is meaningful to compare hunting strategies when both fish species and prey species are being varied. For instance, artemia and paramecia are different sizes and have different movement statistics.

      We added text explaining why different food was chosen for medaka/cichlids. There is no easy way to stage match fishes as evolutionarily diverged as cichlids, medaka, and zebrafish. Size is a reasonable metric within a species, but there is no guarantee that size-matched larvae of two different species are at the same level of maturity. Therefore, we thought the most appropriate stage to address is when larvae first start feeding, as this enables us to study innate prey capture behavior before any learning or experience-dependent changes have taken place. Given that zebrafish, medaka and cichlid larvae are different sizes when they first start feeding, it was necessary to study their hunting behavior with different prey items.

      - It would be helpful in Figure 1A to add the abbreviations used elsewhere in the paper. I found it slightly distracting that the authors switch back and forth in the paper between using "OL" and "medaka" to refer to the same species: please pick one and then remain consistent.

      Medaka is the common name for the Japanese rice fish, O. latipes. Cichlids do not have common names and are only referred to by their scientific names. Since readers are more likely to be familiar with the common name, medaka, we now use medaka (OL) throughout the manuscript, which we hope makes the text clearer.

      - The conceptual meaning of behavioral segmentation is somewhat unclear. For zebrafish, the bouts already come temporally segmented. However in medaka for instance, swimming is more continuous, and the segmentation is presumably more in terms of "behavioral syllables" as have been discussed for example for mouse or Drosophila behavior (in the last row of Figure S1 it is not at all obvious why some of the boundaries were placed at their specific locations). It's not clear whether it's meaningful to make an equivalence between syllables and bouts, and so whether for instance Figure 1H is making an apples-to-apples comparison.

      We clarified the text to say we are comparing syllables, rather than bouts.

      - The interpretation of 1H is that "medaka exhibited significantly longer swims than cichlids"; however this is not supported by the appropriate statistical test. The KS test only says that two probability distributions are different; to say that one quantity is larger than another requires a comparison of means.

      Updated Fig. 1H: we now use a bootstrap test (difference of medians) and have replotted the data as violin plots.
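
      The distinction raised here (a KS test shows that two distributions differ, whereas a bootstrap on the difference of medians supports a directional claim) can be sketched as follows. This is only an illustration under stated assumptions: the response does not specify the exact bootstrap procedure, so the percentile confidence interval, the function name and the example durations are hypothetical.

```python
import random
import statistics


def bootstrap_median_difference(a, b, n_boot=2000, seed=0):
    """Bootstrap the difference of medians, median(a) - median(b).

    Returns the observed difference and a 95% percentile confidence
    interval. An interval that excludes zero supports a directional
    statement (one group is longer), which a KS test alone does not.
    """
    rng = random.Random(seed)
    observed = statistics.median(a) - statistics.median(b)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(statistics.median(resampled_a) - statistics.median(resampled_b))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return observed, (lower, upper)


# Hypothetical swim durations in ms for two species (not real data).
durations_a = list(range(200, 400, 5))
durations_b = list(range(100, 200, 5))
observed_diff, (ci_low, ci_high) = bootstrap_median_difference(durations_a, durations_b)
```

      The median is used rather than the mean because swim durations are typically skewed; the same resampling scheme works for any other summary statistic.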

      - I think the evidence that there are qualitatively different patterns of eye convergence between species is weak. In Figure 2A I admire the authors addressing this using BIC, and the distributions are clearly separated in LA (the Hartigan dip test could be a useful additional test here). However for LO, NM, and AB the distributions only have one peak, and it's therefore unclear why it's better to fit them with two Gaussians rather than e.g. a gamma distribution. Indeed the latter has fewer parameters than a two-gaussian model, so it would be worthwhile to use BIC to make that comparison. The positions of the two Gaussians for LO, NM, and AB are separated by only a handful of degrees (cf LA, where the separation is ~20 degrees), which further supports the idea that there aren't really two qualitatively different convergence states here.

      Added explanation to text.
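
      The model comparison suggested by the reviewer can be sketched with the standard BIC formula, BIC = k ln(n) - 2 ln(L), where lower values are preferred. The parameter counts below are assumptions for a one-dimensional fit (a gamma distribution with its location fixed at zero), and the log-likelihood values in the example are hypothetical, not results from the manuscript.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values are preferred."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Free parameters of each candidate model for a 1-D distribution.
N_PARAMS_TWO_GAUSSIANS = 5  # two means, two variances, one mixing weight
N_PARAMS_GAMMA = 2          # shape and scale (assumes location fixed at zero)


def preferred_model(loglik_two_gaussians, loglik_gamma, n_obs):
    """Return the name of the model with the lower BIC."""
    bic_mix = bic(loglik_two_gaussians, N_PARAMS_TWO_GAUSSIANS, n_obs)
    bic_gamma = bic(loglik_gamma, N_PARAMS_GAMMA, n_obs)
    return "two_gaussians" if bic_mix < bic_gamma else "gamma"


# With equal fit quality, the model with fewer parameters wins.
tie_winner = preferred_model(-1000.0, -1000.0, 500)
```

      The two-Gaussian model is favored only when its log-likelihood exceeds that of the gamma fit by more than 1.5 ln(n), which is the penalty for its three extra parameters.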

      - Figure S2 is unfortunately misleading in this regard. I don't claim the authors aimed to mislead, but they have made the well-known error of using colors with very different luminances in a plot where size matters (see e.g. https://www.r-project.org/conferences/DSC2003/Proceedings/Ihaka.pdf).

      Thus, to the eye, it appears there's a big valley between the red and blue regions, but actually, that valley is full of points: it's really just one big continuous blob.

      Kernel density estimates of eye convergence angles were added to Figure S2. The point we wish to make is that there is higher density when both eyes are rotated inwards (converged) in cichlids, but not medaka (O. latipes). The valley between converged and unconverged states being full of points is due to (1) slight variation in the placement of key points in SLEAP, which blurs the boundary between states and (2) the eye convergence angle must pass through the valley in order to become converged, so necessarily there are points in between the two extremes of eye convergence.
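
      A kernel density estimate makes the relative density of the two states visible even when individual points fill the space between them. The following is a minimal one-dimensional sketch with a Gaussian kernel; the bandwidth, the function name and the example angles are hypothetical, and the estimate shown in Figure S2 may be computed differently (for example over both eye angles jointly).

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian bumps centered on each sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical convergence angles in degrees: an unconverged cluster
# near 20 and a converged cluster near 45 (not real data).
angles = [18.0, 20.0, 22.0] * 5 + [43.0, 45.0, 47.0] * 5
density = gaussian_kde(angles, bandwidth=3.0)
```

      Evaluating `density` on a grid of angles gives a curve whose peaks mark the preferred states and whose trough between them shows how rarely intermediate angles occur.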

      - In Figure 2D please could the authors double-check the significance of the difference between LO and NM: they certainly don't look different in the plot.

      Thank you for flagging this. We realize the way we previously reported the stats was open to misinterpretation. We have updated Figure 2C, D and F to use letters to indicate statistical groupings, which hopefully makes it clearer which species are statistically different from each other.

      - In Figure 2G it's not clear why AB is not included. It is mentioned that the artemia was hard to track in the AB videos, but the supplementary videos provided do not support this.

      The contrast of the artemia in the AB videos is sufficiently different from the other cichlid videos that our pre-trained YOLO model fails. Retraining the model would be a lot of extra work and we feel like a comparison of three species is sufficient to address the sensorimotor transformations that occur over the course of prey capture in cichlids.

      - The statement "Zebrafish larvae have a unique swim repertoire during prey capture, which is distinct from exploratory swim bouts" is not supported by the work of others or indeed the authors' own work. In Figure 4F all types of bouts can occur at any time, it's just the probability at which they occur that varies during prey capture versus other times (see also Mearns et al (2020) Figure S4B).

      The point is well taken that there probably is not a hard separation between spontaneous and prey capture swims based on tail kinematics alone, which is also shown in Marques et al. (2018). However, we think that figure 2I of Mearns et al., which plots the probability of swims being drawn from different parts of the behavior space during prey capture (eyes converged) or not (eyes unconverged), shows that the repertoire of swims during the two states is substantially different. Points are blue or red; there are very few pale blue/pale red points in that figure panel. Figure S4B is showing clustered data, and clustering is a notoriously challenging problem for which there exists no perfect solution (Kleinberg, 2002). The clusters in Mearns et al. incorporated information about transition structure, as this was necessary for obtaining interpretable clusters for subsequent analyses. However, a different clustering approach could have yielded different boundaries, which may have shown more (or less) separation of bout types during prey capture/exploratory swimming. Therefore, we have updated the text to say that zebrafish preferentially perform different swim types during prey capture and exploration, and re-interpreted the behavior of cichlids similarly.

      - More discussion is warranted of the large variation in the number of behavioral clusters found between species (11-32). First, how much is this variation really to be trusted? I appreciate the affinity propagation parameters were the same in all cases, but what parameters "make sense" is somewhat dependent on the particular data set. Second, if one does believe this represents real variation, then why? This is really the key question, and it's unsatisfying to merely document it without trying to interpret it.

      Extended paragraph with more interpretation.

      - What is the purpose of "hovers"? Why not stay motionless? Could it be a way of reducing the latency of a subsequent movement? Is this an example of the scallop theorem?

      Added a couple of sentences speculating on function.

      - I'm not sure "spring-loaded" is a good term here: the tension force of a coiled tail is fairly negligible since there's little internal force actively trying to straighten it.

      Rewrote this part to highlight that fish spring toward the prey, without the implication that tension forces in the tail are responsible for the movement. However, we are not aware of any literature measuring passive forces within the tail of fishes. Presumably the notochord is relatively stiff and may provide an internal force trying to straighten the tail.

      - There are now several statements for which no direct evidence is presented. We shouldn't have to rely on the author's qualitative impressions of what they observed: show us quantitative analysis.

      * "often hover"

      * "cichlids often alternate between approaches and hover swims"

      * "over many hundreds of milliseconds"

      * "we have also observed suction captures and ram-like attacks"

      * "may swim backwards"

      * "may expel prey from their mouth"

      * "cichlid captures often occur in two phases"

      Added references to supplementary videos with timestamps to highlight these behaviors.

      - I don't find it plausible that sated fish continue hunting prey that they know they're not going to eat just for the practice.

      Removed the speculation.

      - In Figure 3 is it not possible to include medaka, based on the hand-tracked paramecia?

      The videos are recorded at high frame rate, so it would be a lot of additional work to track these manually. Furthermore, earlier in prey capture it is very difficult to tell by watching videos which prey the medaka are tracking, especially as single paramecia can drift in and out of focus in the videos. Since there is no eye convergence, it is very difficult to determine with certainty when tracking of a given prey begins. In Fig 4, it was only possible to track paramecia by hand since it is immediately prior to the strike and from the video it is possible to see which paramecium the fish targeted. Our analysis of heading changes was performed over the 200 ms prior to a strike, which we think is a conservative enough cutoff to say that fish were probably pursuing prey in this window (it is shorter than the average behavioral syllable duration in medaka).

      - Figure 3 (particularly 3D) suggests the interesting finding that LA essentially only hunt prey that is directly in front of them (unlike LO and NM, the distribution of prey azimuth actually seems to broaden slightly over the duration of hunting events).

      This is worthy of discussion.

      We offer a suggestion for the many instances of prey capture being initiated in the central visual field in LA later in the manuscript when we discuss spitting behavior. We have added text to make this point earlier in the manuscript. The increase in azimuthal range at the end of prey capture may be due to abort swims (e.g. supp. vid. 1, 00:21). The widening of azimuthal angles is present in LO and NM also and is not unique to LA.

      - The reference Ding et al (2016) is not in the reference list.

      Wrong paper was referenced. Should be Ding 2019, which has been added to bibliography.

      - I am not convinced that medaka exhibit a unique side-swing behavior. I agree there is this tendency in the example movie, however, the results of the quantification (Figure 4) are underwhelming. First, cluster 5 in 4K appears to include a proportion of cases from LA and AB. These proportions may be small, but anything above zero means this is not unique to medaka. Second, the heading angle (4N) starts at 4 degrees for LA and 8 degrees for medaka. This difference is genuine but very small, much smaller than what's drawn in the schematic (4M). I'm not sure it's justifiable to call a difference of 4 degrees a qualitatively different strategy.

      We have changed the text to highlight that side swing is highly enriched in medaka. Comparing 4J to 3B we would argue that there is a qualitative difference in the strategy used to capture prey in the cichlid larvae we study here and medaka. We agree that further work is required to understand distance estimation behaviors in different species. In this manuscript, we use heading angle as a proxy for how prey position might change on the retina over a hunting sequence. But as the heading and distance are changing over time, the actual change in angle on the retina for prey may be much larger than the ~8 degree shift reported here. The actual position of the prey is also important here, which, for reasons mentioned above, we could not track. Given the final location of prey in the visual field prior to the strike (Fig 4J), the most parsimonious explanation of the data is that the prey is always in the monocular visual field. In cichlids, the prey is more-or-less centered in the 200 ms preceding the strike. While it is true that the absolute difference in heading is 4 degrees, when converted to an angular velocity (4N, right), the medaka (OL) effectively rotate twice as fast as LA (40 deg/s vs 20 deg/s), which we think is a substantial difference and evidence of a different targeting strategy.
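
      As a minimal sketch of the conversion mentioned above, the snippet below turns a heading change into a mean angular velocity. It assumes the velocities are simply the approximate heading changes (about 4 degrees for LA, about 8 degrees for medaka) divided by the 200 ms analysis window; the function name is hypothetical and this is not the analysis code used for Figure 4N.

```python
# Hypothetical helper: mean angular velocity from a heading change over a fixed window.
def mean_angular_velocity(delta_heading_deg: float, window_ms: float) -> float:
    """Return the mean angular velocity in deg/s."""
    return delta_heading_deg * 1000.0 / window_ms

WINDOW_MS = 200.0  # window before the strike, as stated in the response

# Approximate heading changes quoted in the review exchange (assumed inputs)
la_velocity = mean_angular_velocity(4.0, WINDOW_MS)  # L. attenuatus (LA)
ol_velocity = mean_angular_velocity(8.0, WINDOW_MS)  # medaka (OL)

print(f"LA: {la_velocity:.0f} deg/s, OL: {ol_velocity:.0f} deg/s")
```

      Under these assumptions, a twofold difference in heading change over the same window gives a twofold difference in mean angular velocity.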

      - 4K: This is referred to in the caption as a confusion matrix, which it's not.

      Fixed.

      - 4N right panel: how many fish contributed to the points shown?

      Added to figure legend (n=113, LA; n=36, OL). Same data in left and right panels.

      - In the Discussion it is hypothesized that medaka use their lateral line in hunting more than in other species. Testing this hypothesis (even just compared to one other species) would be fairly straightforward, and would add significant interest to the paper overall.

      We agree that this is an interesting experiment for follow up studies, but it is beyond the scope of the current manuscript as we do not have the appropriate animal license for this experiment.

      Reviewer 2:

      The paper is rather descriptive in nature, although more context is provided in the discussion. Most figures are great, but I think the authors could add a couple of visual aids in certain places to explain how certain components were measured.

      Added new supplemental figure (Supp Fig 2)

      Figure 1B- it could be useful to add zebrafish and medaka to the scientific names (I realize it's already in Figure A but I found myself going back and forth a couple of times, mostly trying to confirm that O. latipes is medaka).

      Added common names to 1B, sprinkled reminders of OL/medaka throughout text.

      Figure 1G. I wasn't sure how to interpret the eye angle relative to the midline. Can they rotate their eyes or is this due to curvature in the 'upper' body of the fish? Adding a schematic figure or something like that could help a reader who is not familiar with these methods. Related to this, I was a bit confused by Figure 2A. After reading the methods section, I think I understand - but a little cartoon to describe this would help. It also reminds the reader (especially if they don't work with fish) that fish eyes can rotate. I also wanted to note that initially, I thought convergence was a measure of how the two eyes were positioned relative to the prey given the emphasis given on binocular vision, and only after reading certain sections again did I realize convergence was a measure of eye rotation/movement.

      New supplemental figure explaining how eye tracking is performed

      Figure 3. It was not immediately clear to me what onset, middle, and end represented - although it is explained in the caption. I think what tripped me up is the 'eye convergence' title in the top right corner of Figure 3A.

      Updated figure with schematic illustrating that time is measured relative to eye convergence onset and end.

      The result section about attack swim, S-strike, capture spring, etc. was a bit confusing to read and could benefit from a couple of concise descriptions of these behaviors. For example, I am not familiar with the S strike but a couple of paragraphs into this section, the reader learns more about the difference between S strike vs. attack swim. This can be mentioned in the first paragraph when these distinct behaviors are mentioned.

      Added description of behavior earlier in text.

      Figure 4. Presents lots of interesting data! I wonder if using Figure 1E could help the reader better understand how these measurements were taken.

      New supplemental figure added, explaining how tail tracking is performed.

      I probably overlooked this, but I wonder why so many panels are just focused on one species.

      Added explanation to the text.

      Is the S-shaped capture strategy the same as an S strike?

      Clarified in text to say "S-strike-like". This is a description of prey capture from adult largemouth bass in New et al. (2002). From the still frames shown in that paper, the kinematics look similar to an S-strike or capture spring. The important point we wish to make is that the tail is coiled in an S-shape prior to a strike, which indicates that a kinematically similar behavior exists in fishes beyond just larval cichlids and zebrafish.

      At the end of the page, when continuous swimming versus interrupted swimming is discussed, please remind the reader that medaka shows more continuous swimming (longer bouts).

      Added "while medaka swim continuously with longer bouts ("gliding")".

      After reading the discussion, it looks like many findings are unique. For example, given that medaka is such a popular model species in biology, it strikes me that nobody has ever looked into their hunting movements before. If their findings are novel, perhaps they should state so it is clear that the authors are not ignoring the literature.

      We have highlighted what we believe to be the novelty of our findings (first description of prey capture in larval cichlids and medaka). To our knowledge, we are the first to describe hunting in medaka; but there is an extensive literature on medaka dating back to the early 20th century, some of which is only published in Japanese. We have done our best to review the literature, but we cannot rule out that there are papers that we missed. No English language article or review we found mentions literature on hunting behavior in medaka larvae.

      Reviewer 3:

      More evidence is needed to assess the types of visual monocular depth cues used by medaka fish to estimate prey location, but that is beyond the scope of this compelling paper. For example, medaka may estimate depth through knowledge of expected prey size, accommodation, defocus blur, ocular parallax, and/or other possible algorithms to complement cues from motion parallax.

      Added sentence to discussion highlighting that other cues may also contribute to distance estimation in cichlids and medakas. Follow-up studies will require new animal license.

      None. It's quite nice, timely, and thorough work! For future work, one could use 3D pose estimation of eye and prey kinematics to assess the dynamics of the 2D image (prey and background) cast onto the retina. This sort of representation could be useful to infer which monocular depth cues may be used by medaka during hunting.

      Great suggestion for follow up studies. Bolton et al. and Mearns et al. both find changes in z associated with prey capture, and it would be interesting to see how other fish species use the full 3-dimensional water column during prey capture, especially considering the diversity of hunting strategies in adult cichlids (ranging from piscivorous species, like LA, to algal grazers).

      In Figure 4N, you use "change in heading leading up to a strike as a proxy for the change in visual angle of the prey for cichlids and medaka." This proxy makes sense, but you also have the eye angles and (in some cases) the prey positions. One could estimate the actual change in visual angle from this information, which would also allow one to measure whether the fish are trying to stabilize the position of the prey on a high-acuity patch of the retina during the final moments of the hunt. This information may also shed light on which monocular depth cues are used.

      As addressed in comment to reviewer 1, this would require actually manually tracking individual paramecia over hundreds of frames. It is not possible to determine exactly when hunting begins in medaka, and it is prone to errors if medaka switch between targets over the course of a hunting episode. This question is better addressed with psychophysics experiments in embedded animals where it is possible to precisely control the stimulus, but this requires new animal licenses and is beyond the scope of this paper.

      In Figure 5, you could place the prey object a little farther from the D. rerio fish for the S-strike diagram.

      Fixed.

      Figure 4F legend should read "...at the peak of each bout."

      Fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Thank you for your constructive feedback and recognition of our work. We followed your suggestion and improved the accuracy of the language used to interpret some of our findings. 

      Summary:

      The present study by Mikati et al demonstrates an improved method for in-vivo detection of enkephalin release and studies the impact of stress on the activation of enkephalin neurons and enkephalin release in the nucleus accumbens (NAc). The authors refine their pipeline to measure met and leu enkephalin using liquid chromatography and mass spectrometry. The authors subsequently measured met and leu enkephalin in the NAc during stress induced by handling, and fox urine, in addition to calcium activity of enkephalinergic cells using fiber photometry. The authors conclude that this improved tool for measuring enkephalin reveals experimenter handling stress-induced enkephalin release in the NAc that habituates and is dissociable from the calcium activity of these cells, whose activity doesn't habituate. The authors subsequently show that NAc enkephalin neuron calcium activity does habituate to fox urine exposure, is activated by a novel weigh boat, and that fox urine acutely causes increases in met-enk levels, in some animals, as assessed by microdialysis.

      Strengths:

      A new approach to monitoring two distinct enkephalins and a more robust analytical approach for more sensitive detection of neuropeptides. A pipeline that potentially could help for the detection of other neuropeptides.

      Weaknesses:

      Some of the interpretations are not fully supported by the existing data or would require further testing to draw those conclusions. This can be addressed by appropriately tempering interpretations and acknowledging other limitations the authors did not cover brought by procedural differences between experiments.

      We have taken time to go through the manuscript ensuring we are more detailed and precise with our interpretations as well as appropriately acknowledging limitations. 

      Reviewer #2 (Public Review):

      Thank you for your constructive and thorough assessment of our work. In our revised manuscript, we adjusted the text to reflect the references you mentioned regarding the methionine oxidation procedure. Additionally, we expanded the methods section to include the key details of the statistical tests and procedures that you outlined. 

      Summary:

      The authors aimed to improve the detection of enkephalins, opioid peptides involved in pain modulation, reward, and stress. They used optogenetics, microdialysis, and mass spectrometry to measure enkephalin release during acute stress in freely moving rodents. Their study provided better detection of enkephalins due to the implementation of previously reported derivatization reaction combined with improved sample collection and offered insights into the dynamics and relationship between Met- and Leu-Enkephalin in the Nucleus Accumbens shell during stress.

      Strengths:

      A strength of this work is the enhanced opioid peptide detection resulting from an improved microdialysis technique coupled with an established derivatization approach and sensitive and quantitative nLC-MS measurements. These improvements allowed basal and stimulated peptide release with higher temporal resolution, lower detection thresholds, and native-state endogenous peptide measurement.

      Weaknesses:

      The draft incorrectly credits itself for the development of an oxidation method for the stabilization of Met- and Leu-Enk peptides. The use of hydrogen peroxide reaction for the oxidation of Met-Enk in various biological samples, including brain regions, has been reported previously, although the protocols may slightly vary. Specifically, the manuscript writes about "a critical discovery in the stabilization of enkephalin detection" and that they have "developed a method of methionine stabilization." Those statements are incorrect and the preceding papers that relied on hydrogen peroxide reaction for oxidation of Met-Enk and HPLC for quantification of oxidized Enk forms should be cited. One suggested example is Finn A, Agren G, Bjellerup P, Vedin I, Lundeberg T. Production and characterization of antibodies for the specific determination of the opioid peptide Met5-Enkephalin-Arg6-Phe7. Scand J Clin Lab Invest. 2004;64(1):49-56. doi: 10.1080/00365510410004119. PMID: 15025428.

      Thank you for highlighting this. It was not our intention to imply that we developed the oxidation method, rather that we were able to improve the detection of met-enkephalin by oxidation of the methionine without compromising the detection resolution of leu-enkephalin, enabling the simultaneous detection of both peptides. We have addressed this in the manuscript and included the suggested citation.

      Another suggestion for this draft is to make the method section more comprehensive by adding information on specific tools and parameters used for statistical analysis:

      (1) Need to define "proteomics data" and explain whether calculations were performed on EIC for each m/z corresponding to specific peptides or as a batch processing for all detected peptides, from which only select findings are reported here. What type of data normalization was used, and other relevant details of data handling? Explain how Met- and Leu-Enk were identified from DIA data, and what tools were used.

      Thank you for pointing out this source of confusion. We believe it is because we use a different DIA method than is typically used in other literature. Briefly, we use a DIA method with the targeted inclusion list to ensure MS2 triggering as opposed to using large isolation widths to capture all precursors for fragmentation, as is typically done with MS1 features. For our method, MS2 is triggered based on the 4 selected m/z values (heavy and light versions of Leu and Met-Enkephalin peptides) at specific retention time windows with isolation width of 2 Da; regardless of the intensity of MS1 of the peptides. 

      (2) Simple Linear Regression Analysis: The text mentions that simple linear regression analysis was performed on forward and reverse curves, and line equations were reported, but it lacks details such as the specific variables being regressed (although figures have labels) and any associated statistical parameters (e.g., R-squared values). 

      Additional detail about the linear regression process was added to the methods section, please see lines 614-618. The R squared values are also now shown on the figure. 

      ‘For the forward curves, the regression was applied to the measured concentration of the light standard as the theoretical concentration was increased. For plotting purposes, we show the measured peak area ratios for the light standards in the forward curves. For the reverse curves, the regression was applied to the measured concentration of the heavy standard, as the theoretical concentration was varied.’
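
      The regression described above can be sketched as follows. This is only an illustration of an ordinary least-squares fit with its R squared value, written in plain Python; the data points and the function name `fit_line` are hypothetical and are not taken from the manuscript or from the software actually used.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept; also returns R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Hypothetical forward-curve values: theoretical vs. measured concentration
theoretical = [1.0, 2.0, 4.0, 8.0, 16.0]
measured = [1.1, 1.9, 4.2, 7.8, 16.1]

slope, intercept, r_squared = fit_line(theoretical, measured)
print(f"y = {slope:.3f}x + {intercept:.3f}, R^2 = {r_squared:.4f}")
```

      For a reverse curve, the same function would be applied with the heavy standard's measured concentration as `y`.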

      (3) Violin Plots: The proteomics data is represented as violin plots with quartiles and median lines. This visual representation is mentioned, but there is no detail regarding the software/tools used for creating these plots.

      We used GraphPad Prism to create these plots. This detail has been added to the statistical analysis section. See line 630.

      (4) Log Transformation: The text states that the data was log-transformed to reduce skewness, which is a common data preprocessing step. However, it does not specify the base of the logarithm used or any information about the distribution before and after transformation.

      We have added the requested details about the log transformation, and how the data looked before and after, into the statistical analysis section. We followed the convention that log denotes base 10 unless otherwise specified as natural log (base e) or a different base. See lines 622-625.

      ‘The data was log10 transformed to reduce the skewness of the dataset caused by the variable range of concentrations measured across experiments/animals. Prior to log transformation, the measurements failed normality testing for a Gaussian distribution. After the log transformation, the data passed normality testing, which provided the rationale for the use of statistical analyses that assume normality.’
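
      To illustrate why a log10 transform reduces skewness in data spanning a wide concentration range, here is a small sketch. The values are hypothetical, and the skewness measure used (the moment-based coefficient m3 / m2^1.5) is an assumption chosen for simplicity; it does not reproduce the normality tests reported in the manuscript.

```python
import math

def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical concentrations with one very large measurement
raw = [1.0, 2.0, 3.0, 5.0, 8.0, 20.0, 150.0]
logged = [math.log10(v) for v in raw]

print(f"skewness before: {skewness(raw):.2f}, after log10: {skewness(logged):.2f}")
```

      A formal normality test would still be needed before and after the transform, as the quoted methods text describes.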

      (5) Two-Way ANOVA: Two-way ANOVA was conducted with peptide and treatment as independent variables. This analysis is described, but there is no information regarding the software or statistical tests used, p-values, post-hoc tests, or any results of this analysis.

      Information about the two-way ANOVA analysis has been added to the statistical analysis section. Additionally, more detailed information has been added to the figure legends about the statistical results. Please see lines 625-628.

      ‘Two-way ANOVA testing with peptide (Met-Enk or Leu-Enk) and treatment (buffer or stress for example) as the two independent variables. Post-hoc testing was done using Šídák's multiple comparisons test and the p values for each of these analyses are shown in the figures (Figs. 1F, 2A).’ 
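
      For readers unfamiliar with the Šídák correction, the adjustment step alone can be sketched as below. This shows only the standard formula, adjusted p = 1 - (1 - p)^m for m comparisons; it is not a reimplementation of the full two-way ANOVA or of the post-hoc procedure as run in the statistics software, and the input p-values are hypothetical.

```python
def sidak_adjust(p_values):
    """Šídák-adjusted p-values for a family of m comparisons."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]

# Hypothetical unadjusted p-values for two comparisons
unadjusted = [0.01, 0.04]
adjusted = sidak_adjust(unadjusted)

for p, p_adj in zip(unadjusted, adjusted):
    print(f"unadjusted p = {p:.3f}, Šídák-adjusted p = {p_adj:.4f}")
```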

      (6) Paired T-Test: A paired t-test was performed on predator odor proteomic data before and after treatment. This step is mentioned, but specific details like sample sizes, and the hypothesis being tested are not provided.

      The sample size is included in the figure legend to which we have included a reference. We have also included the following text to highlight the purpose of this test. See lines 628-630

      A paired t-test was performed on the predator odor proteomic data before and after odor exposure to test the hypothesis that Met-Enk increases following exposure to predator odor (Fig. 3F). These analyses were conducted using GraphPad Prism.
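
      The test statistic behind a paired t-test can be sketched as follows. The before/after values are hypothetical, the function name is illustrative, and the p-value lookup (which requires the t distribution with n - 1 degrees of freedom) is left out.

```python
import math
import statistics

def paired_t(before, after):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    # t = mean difference / standard error of the differences
    t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical log-transformed Met-Enk levels for the same animals
before_odor = [1.2, 0.9, 1.5, 1.1, 1.3]
after_odor = [1.6, 1.0, 1.9, 1.4, 1.5]

t_stat, dof = paired_t(before_odor, after_odor)
print(f"t = {t_stat:.3f}, df = {dof}")
```

      Because the stated hypothesis is directional (an increase after odor exposure), a positive t statistic is the direction of interest.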

      (7) Correlation Analysis: The text mentions a simple linear regression analysis to correlate the levels of Met-Enk and Leu-Enk and reports the slopes. However, details such as correlation coefficients, and p-values are missing.

      We apologize for the use of the word correlation as we think it may have caused some confusion and have adjusted the language accordingly. Since this was a linear regression analysis, there is no correlation coefficient. The slope of the fitted line is reported on the figures to show the fitted values of Met-Enk to Leu-Enk. 

      (8) Fiber Photometry Data: Z-scores were calculated for fiber photometry data, and a reference to a cited source is provided. This section lacks details about the calculation of z-scores, and their use in the analysis. 

      These details have been added to the statistical analysis section. See lines 634-637

      ‘For the fiber photometry data, the z-scores were calculated as described in using GuPPy which is an open-source Python toolbox for fiber photometry analysis. The z-score equation used in GuPPy is z = (DF/F - mean of DF/F) / (standard deviation of DF/F), where F refers to fluorescence of the GCaMP6s signal.’
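
      The z-score formula quoted above can be written out as a short sketch. This is not GuPPy code: the function name and the trace values are hypothetical, and the use of the population standard deviation over the whole trace is an assumption.

```python
import statistics

def zscore_trace(dff):
    """z = (dF/F - mean(dF/F)) / sd(dF/F), computed over the whole trace."""
    mean = statistics.fmean(dff)
    sd = statistics.pstdev(dff)  # population SD: an assumption, not confirmed by the source
    return [(value - mean) / sd for value in dff]

# Hypothetical dF/F samples from one recording
dff_trace = [0.01, 0.03, 0.02, 0.10, 0.04, 0.02]
z_trace = zscore_trace(dff_trace)

print([round(z, 2) for z in z_trace])
```

      By construction, the resulting trace has a mean of zero and a standard deviation of one.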

      (9) Averaged Plots: Z-scores from individual animals were averaged and represented with SEM. It is briefly described, but more details about the number of animals, the purpose of averaging, and the significance of SEM are needed.

      We have added additional information about the averaging process in the statistical analysis section. See lines 639-643.

      ‘The purpose of the averaged traces is to show the extent of concordance of the response to experimenter handling and predator odor stress among animals with the SEM demonstrating that variability. The heatmaps depict the individual responses of each animal. The heatmaps were plotted using Seaborn in Python and mean traces were plotted using Matplotlib in Python.’
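
      The averaging described above can be sketched as a per-timepoint mean and standard error of the mean (SEM) across animals. The traces are hypothetical and the function name is illustrative; the manuscript's plots were made with Matplotlib and Seaborn, which this stdlib-only sketch does not reproduce.

```python
import math
import statistics

def mean_and_sem(traces):
    """Per-timepoint mean and SEM across animals (one equal-length trace per animal)."""
    n_animals = len(traces)
    mean_trace, sem_trace = [], []
    for samples in zip(*traces):  # values from all animals at one timepoint
        mean_trace.append(statistics.fmean(samples))
        sem_trace.append(statistics.stdev(samples) / math.sqrt(n_animals))
    return mean_trace, sem_trace

# Hypothetical z-scored traces from three animals, four timepoints each
animal_traces = [
    [0.0, 0.5, 1.5, 0.8],
    [0.1, 0.7, 1.1, 0.6],
    [-0.1, 0.3, 1.9, 1.0],
]
mean_trace, sem_trace = mean_and_sem(animal_traces)
print(mean_trace, sem_trace)
```

      A small SEM at a given timepoint indicates that the animals responded similarly there, which is the concordance the averaged traces are meant to convey.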

      A more comprehensive and objective interpretation of results could enhance the overall quality of the paper.

      We have taken this opportunity to improve our manuscript following comments from all the reviewers that we hope has resulted in a manuscript with a more objective interpretation of results. 

      Reviewer #3 (Public Review):

      Thank you for your thoughtful review of our work. To clarify some of the points you raised, we revised the manuscript to include more detail on how we distinguish between the oxidized endogenous and standard signal, as well as refine the language concerning the spatial resolution. We also edited the manuscript regarding the concentration measurements. We conducted technical replicates, so we appreciate you raising this point and clarify that in the main text. 

      Summary:

      This important paper describes improvements to the measurement of enkephalins in vivo using microdialysis and LC-MS. The key improvement is the oxidation of met- to prevent having a mix of reduced and oxidized methionine in the sample which makes quantification more difficult. It then shows measurements of enkephalins in the nucleus accumbens in two different stress situations - handling and exposure to predator odor. It also reports the ratio of released met- and leu-enkephalin matching what is expected from the digestion of proenkephalin. Measurements are also made by photometry of Ca2+ changes for the fox odor stressor. Some key takeaways are the reliable measurement of met-enkephalin, the significance of directly measuring peptides as opposed to proxy measurements, and the opening of a new avenue into the research of enkephalins due to stress based on these direct measurements.

      Strengths:

      -Improved methods for measurement of enkephalins in vivo.

      -Compelling examples of using this method.

      -Opening a new area of looking at stress responses through the lens of enkephalin concentrations.

      Weaknesses:

      (1) It is not clear if oxidized met-enk is endogenous or not and this method eliminates being able to discern that.

      We clarified our wording in the text copied below to provide an explanation on how we distinguish between the two. Even after oxidation, the standard signal has a higher m/z ratio due to the presence of the Carbon and Nitrogen isotopes as described in the Chemicals section of the methods ‘For Met Enkephalin, a fully labeled L-Phenylalanine (¹³C₉, ¹⁵N) was added (YGGFM). The resulting mass shift between the endogenous (light) and heavy isotope-labeled peptide are 7Da and 10Da, respectively.’, so they can still be differentiated from the endogenous signal. We have clarified the language in the results section. See lines 82-87. 

      ‘After each sample collection, we add a consistent known concentration of isotopically labeled internal standard of Met-Enk and Leu-Enk of 40 amol/sample to the collected ISF for the accurate identification and quantification of endogenous peptide. These internal standards have a different mass/charge (m/z) ratio than endogenous Met- and Leu-Enk. Thus, we can identify true endogenous signal for Met-Enk and Leu-Enk (Suppl Fig. 1A,C) versus noise, interfering signals, and standard signal (Suppl. Fig. 1B,D).’

      (2) It is not clear if the spatial resolution is really better as claimed since other probes of similar dimensions have been used.

      Apologies for any confusion here. To clarify, we primarily state that our approach improves temporal resolution and in a few cases refer to improved spatiotemporal resolution, which we believe we show. The dimensions of the microdialysis probe used in these experiments allow us to target the nucleus accumbens shell, and the probe is smaller, especially at the membrane level, than a fiber photometry probe. 

      (3) Claims of having the first concentration measurement are not quite accurate.

      Thank you for your feedback. To clarify, we do not claim that we have the first concentration measurements, rather we are the first to quantify the ratio of Met-Enk to Leu-Enk in vivo in freely behaving animals in the NAcSh. 

      (4) Without a report of technical replicates, the reliability of the method is not as well-evaluated as might be expected.

      We have added these details in the methods section, please see lines 521-530. 

      ‘Each sample was run in two technical replicates and the peak area ratio was averaged before concentration calculations of the peptides were conducted. Several quality control steps were conducted prior to running the in vivo samples. 1) Two technical replicates of a known concentration were injected and analyzed – an example table from 4 random experiments included in this manuscript is shown below. 2) The buffers used on the day of the experiment (aCSF and high K+ buffer) were also tested for any contaminating Met-Enk or Leu-Enk signals by injecting two technical replicates for each buffer. Once these two criteria were met, the experiment was analyzed through the system. If either step failed, which happened a few times, the samples were frozen and the machines were cleaned and restarted until the quality control measures were met.’

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      • The authors should provide appropriate citations of a study that has validated the Enkephalin-Cre mouse line in the nucleus accumbens or provide verification experiments if they have any available.

      Thank you for your comment. We have added a reference validating the Enk-Cre mouse line in the nucleus accumbens to the methods section, and it is copied here. 

      D.C. Castro, C.S. Oswell, E.T. Zhang, C.E. Pedersen, S.C. Piantadosi, M.A. Rossi, A.C. Hunker, A. Guglin, J.A. Morón, L.S. Zweifel, G.D. Stuber, M.R. Bruchas, An endogenous opioid circuit determines state-dependent reward consumption, Nature 598 (2021) 646–651. https://doi.org/10.1038/s41586-021-04013-0.

      • Better definition of the labels y1,y2,b3 in Figures 1 and S1 would be useful. I may have missed it but it wasn't described in methods, results, or legends.

      Thank you for this comment. We have added this information to Fig.1 legend ‘Y1, y2, b3 refer to the different elution fragments resulting from Met-Enk during LC-MS.’

      • It is interesting that the ratio of KCl-evoked release is what changes differentially for Met- vs Leu. Leu enk increases to the range of met-enk. There is non-detectable or approaching being non-detectable leu-enk (below the 40 amol / sample limit of quantification) in most of the subjects that become apparent and approach basal levels of met-enkephalin. This suggests that the K+ evoked response may be more pronounced for leu-enk. This is something that should be considered for further analysis and should be discussed.

      Thank you for this astute observation, and you make a great point. We have added some discussion of this finding in the results and discussion sections; see lines 111-112 and lines 253-257. 

      ‘Interestingly, Leu-Enk showed a greater fold change compared to baseline than did Met-Enk with the fold changes being 28 and 7 respectively based on the data in Fig.1F.’

      ‘We also noted that Leu-Enk showed a greater fold increase relative to baseline after depolarization with high K+ buffer as compared to Met-Enk. This may be due to increased Leu-Enk packaging in dense core vesicles compared to Met-Enk or due to the fact that there are two distinct precursor sources for Leu-Enk, namely both proenkephalin and prodynorphin while Met-Enk is mostly cleaved from proenkephalin (see Table 1 [48]).’

      • For example in 2E, it would be helpful to label in the graph axis what samples correspond to the manipulation and also in the text provide the reader with the sample numbers. The authors interpret the relationship between the last two samples of baseline and posthandling stress as the following in the figure legend "the concentration released in later samples is affected; such influence suggests that there is regulation of the maximum amount of peptide to be released in NAcSh. E. The negative correlation in panel d is reversed by using a high K+ buffer to evoke Met-Enk release, suggesting that the limited release observed in D is due to modulation of peptide release rather than depletion of reserves." However, the correlations are similar between 2D and E and it appears that two mice are mediating the difference between the two groups. The appropriate statistical analysis would be to compare the regressions of the two groups. Statistics for the high K+ (and all other graphs where appropriate) need to be reported, including the r2 and p-value.

      Thank you for your constructive critique. To elucidate the effect of high K+, we have plotted the regression line and reported the slope for Fig. 2E. Notably, the slope is reduced by a factor of 2 and appears to be driven by a large subset of the animals. The statistics for the high K+ graph are shown on the figure (Fig. 1F); they test whether high K+ leads to the release of Leu-Enk and Met-Enk, respectively, compared to baseline with aCSF. We have added the test statistics to the figure legend for additional clarity. Fig. 1G has no statistics because it is only there to show the ratio between Met-Enk and Leu-Enk in the same samples. We did not test any hypotheses about differences between their levels, as that is not relevant to our question. The correlation on the same data is depicted in Fig. 1H, and we have added the R² value per your request. 

      • The interpretation that handling stress induces enkephalin release from microdialysis experiments is also confounded by other factors. For instance, from the methods, it appears that mice were connected and sample collection started 30 min after surgery, therefore recovery from anesthesia is also a confounding variable, among other technical aspects, such as equilibration of the interstitial fluid to the aCSF running through the probe that is acting as a transmitter and extracellular molecule "sink". Did the authors try to handle the mice post hookup similar to what was done with photometry to have a more direct comparison to photometry experiments? This procedural difference, recording from recently surgerized animals (microdialysis) vs well-recovered animals with photometry should be mentioned in addition to the other caveats the authors mention.

      Thank you for your comment. We are aware of this technical limitation, and it is largely why we sought to conduct the fiber photometry experiments to get at the same question. As you requested, we have included additional language in the discussion to acknowledge this limitation and how we chose to address it by measuring calcium activity in the enkephalinergic neurons, which would presumably be the same cell population whose release we are quantifying using microdialysis. See lines 262-273.  

      ‘Our findings showed a robust increase in peptide release at the beginning of experiments, which we interpreted as due to experimenter handling stress that directly precedes microdialysis collections. However, there are other technical limitations to consider such as the fact that we were collecting samples from mice that were recently operated on. Another consideration is that the circulation of aCSF through the probe may cause a sudden shift in oncotic and hydrostatic forces, leading to increased peptide release to the extracellular space. As such, we wanted to examine our findings using a different technique, so we chose to record calcium activity from enkephalinergic neurons - the same cell population leading to peptide release. Using fiber photometry, we showed that enkephalinergic neurons are activated by stress exposure, both experimenter handling and fox odor, thereby adding more evidence to suggest that enkephalinergic neurons are activated by stress exposure which could explain the heightened peptide levels at the beginning of microdialysis experiments.’

      • The authors should provide more details on handling stress manipulation during photometry. For photometry what was the duration of the handling bout, what was the interval between handling events, and can the authors provide a description of what handling entailed? Were mice habituated to handling days before doing photometry recording experiments?

      Thank you for your suggestion. We have addressed all of your points in the methods section. See lines 564-570. 

      ‘The handling bout which mimicked traditional scruffing lasted about 3-5 seconds. The mouse was then let go and the handling was repeated another two times in a single session with a minimum of 1-2 minutes between handling bouts. Mice were habituated to this manipulation by being attached to the fiber photometry rig, for 3-5 consecutive days prior to the experimental recording. Additionally, the same maneuver was employed when attaching/detaching the fiber photometry cord, so the mice were subjected to the same process several times.’

      • For the novel weigh boat experiments, the authors should explicitly state when these experiments were done in relation to the fox urine, was it a different session or the same session? Were they the same animals? Statements like the following (line 251) imply it was done in the same animals in the same session but it should be clarified in the methods "We also showed using fiber photometry that the novelty of the introduction of a foreign object to the cage, before adding fox odor, was sufficient to activate enkephalinergic neurons."

      As shown in supplementary figure 4, individual animal data is shown for both water and fox urine exposure (overlaid) to depict whether there were differences in their responses to each manipulation – in the same animal. And yes, you are correct, the animals were first exposed to water 3 times in the recording session and then exposed to fox urine 3 times in the same session. We have added that to the methods section describing in vivo fiber photometry. See lines 575-576.  

      • Statistical testing would be needed to affirm the conclusions the authors draw from the fox urine and novel weigh boat experiments. For example, it shows stats that the response attenuates, that it is not different between fox urine and novel (it looks like the response is stronger to the fox urine when looking at the individual animals), etc. These data look clear but stats are formally needed. Formal statistics are also missing in other parts of the manuscript where conclusions are drawn from the data but direct statistical comparisons are not included (e.g. Fig 2.G-I).

      The photometry data is shown as z-scores which is a formal statistical analysis. ANOVA would be inappropriate to run to compare z-scores. We understand that this is erroneously done in fiber photometry literature, however, it remains incorrect. The z-scores alone provide all the information needed about the deviation from baseline. We understand that this is not immediately clear to readers, and we thank you for allowing us to explain why this is the case. We have added test statistics to figure legends where hypothesis testing was done and p-values were reported. 
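      To make concrete what a z-score expresses here, the following is a minimal sketch of baseline z-scoring. It assumes that each value is referenced to the mean and sample standard deviation of a pre-event baseline window; the function name, the window length, and the example numbers are illustrative assumptions and are not taken from the manuscript's analysis pipeline.

```python
from statistics import mean, stdev


def zscore_to_baseline(trace, baseline_len):
    """Express every sample as deviations from a baseline window.

    trace        -- sequence of fluorescence values (hypothetical units)
    baseline_len -- number of leading samples treated as baseline (assumed)
    """
    baseline = trace[:baseline_len]
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of the baseline
    return [(x - mu) / sigma for x in trace]


# Hypothetical trace: three baseline samples followed by one post-event sample.
example = zscore_to_baseline([1.0, 2.0, 3.0, 5.0], baseline_len=3)
print(example)
```

      In this sketch a value of 3 means the signal sits three baseline standard deviations above the baseline mean, which is the sense in which a z-score summarizes deviation from baseline.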

      • Did the authors try to present the animals with repeated fox urine exposure to see if this habituates like the photometry?

      No, we did not do that experiment due to the constrained timing within which we had to run our microdialysis/LC-MS timeline, but it is a great point for future exploration. 

      • It would be useful to present the time course of the odor experiment for the microdialysis experiment.

      The timeline is shown in Fig.1a and Fig.3e. To reiterate, each sample is 13 minutes long.

      • Can the authors determine if differences in behavior (e.g. excessive avoidance in animals with one type of response) or microdialysis probe location dictate whether animals fall into categories of increased release, no release, or no-detection? From the breakdown, it looks like it is almost equally split into three parts but the authors' descriptions of this split are somewhat misleading (line 210). "The response to predator odor varies appreciably: although most animals show increased Met-Enk release after fox odor exposure, some show continued release with no elevation in Met-Enk levels, and a minority show no detectable release".

      Thank you for your constructive feedback. We do not believe the difference in behavior is correlated with probe placement. The hit map can be found in suppl. Fig. 3 and shows that all mice included in the manuscript had probes in the NAcSh. We purposely did not distinguish between dorsal and ventral because our 1 mm membrane would make it hard to presume exclusive sampling from one subregion. That is a great point though, and we have thought about it extensively for future studies. We have edited the language to reflect the almost even split of responses for Met-Enk and appreciate you pointing that out. 

      • Overall, given the inconsistencies in experimental design and the associated caveats, I think the authors are unable to draw reasonable conclusions from the repeated stressor experiments; they should either avoid drawing strong conclusions from these observations or perform additional experiments that provide the grounds to derive those conclusions.

      We have included additional language on the caveats of our study, and our use of a dual approach using fiber photometry and microdialysis was largely driven by a desire to offer additional support of our conclusions. We expected pushback about our conclusions, so we wanted to offer a secondary analysis using a different technique to test our hypothesis. To be honest, the tone and content of this comment are not particularly constructive (especially for trainees), nor do they offer a space to realistically address anything. This work took multiple years to optimize, it was led by a graduate student, and required a multidisciplinary team. As highlighted, we believe it offers an important contribution to the literature and pushes the field of peptide detection forward.  

      Reviewer #2 (Recommendations For The Authors):

      A more comprehensive and objective interpretation of results could enhance the overall quality of the paper. The manuscript contains statements like "we are the first to confirm," which can be challenging to substantiate and may not significantly enhance the paper. It's essential to ensure that novelty statements are well-founded. For example, the release of enkephalins from other brain regions after stress exposure is well-documented but not addressed in the paper. Similarly, the role of the NA shell in stress has been extensively studied but lacks coverage in this manuscript.

      We have edited the language to reflect your feedback. We have also included relevant literature expanding on the demonstrated roles of enkephalins in the literature. We would like to note that most studies have focused on chronic stress, and we were particularly interested in acute stress. See lines 129-134.

      ‘These studies have included regions such as the locus coeruleus, the ventral medulla, the basolateral nucleus of the amygdala, and the nucleus accumbens core and shell. Studies using global knockout of enkephalins have shown varying responses to chronic stress interventions where male knockout mice showed resistance to chronic mild stress in one study, while another study showed that enkephalin-knockout mice showed delayed termination of corticosteroid release. [33,34]’ 

      Finally, not a weakness but a clarification suggestion: the method description mentions the use of 1% FA in the sample reconstitution solution and LC solvents, which is an unusually high concentration of acid. If this concentration is intentional for maintaining the peptides' oxidation state, it would be beneficial to mention this in the text to assist readers who might want to replicate the method.

      This is correct and has been clarified in the methods section.

      Reviewer #3 (Recommendations For The Authors):

      -The Abstract should state the critical improvements that are made. Also, quantify the improvements in spatiotemporal resolution.

      Thank you for your comment. We have edited the abstract to reflect this. 

      - The use of "amol/sample" as concentration is less informative than SI units (e.g., pM concentration) and should be changed, especially since the volume used was the same for in vivo sampling experiments.

      Thank you for your comment. We chose to report amol/sample because we are measuring such a small concentration and wanted to account for any slight errors in volume that can make drastic differences on reported concentrations especially since samples are dried and resuspended.  

      -Please check this sentence: "After each collection, the samples were spiked with 2 µL of 12.5 fM isotopically labeled Met-Enkephalin and Leu-Enkephalin" This dilution would yield a concentration of ~2 fM. In a 12 µL sample, that would be ~0.02 amol, well below the detection limit. (Note that fM would be femtomolar concentration and fmol would be femtomoles added.)

      -"liquid chromatography/mass spectrometry (LC-MS) [9-12]"... Reference 9 is a RIA analysis paper, not LC-MS as stated.

      Thank you for catching these. We have corrected the unit and citation. 
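      The reviewer's unit check above can be reproduced with a short calculation. This is a minimal sketch that assumes simple volumetric mixing and takes the 12 µL sample volume from the reviewer's comment; the function name is illustrative. It shows why a spike specified in fM lands far below the 40 amol/sample limit of quantification, whereas the same number read as fmol would not.

```python
def spike_amounts(spike_volume_ul, spike_conc_fM, sample_volume_ul):
    """Return (amol added, final concentration in fM) for a spiked sample.

    Unit bookkeeping: 1 uL x 1 fM = 1e-6 L x 1e-15 mol/L = 1e-21 mol = 1e-3 amol.
    """
    amol_added = spike_volume_ul * spike_conc_fM * 1e-3
    total_volume_ul = spike_volume_ul + sample_volume_ul
    final_conc_fM = spike_conc_fM * spike_volume_ul / total_volume_ul
    return amol_added, final_conc_fM


# Values quoted in the reviewer comment: 2 uL of 12.5 fM into a 12 uL sample.
amol, conc = spike_amounts(2.0, 12.5, 12.0)
print(f"added: {amol:.3f} amol, final concentration: {conc:.2f} fM")

# If the intended quantity were 12.5 fmol added (an assumption used only for
# comparison), the amount would be 12.5 fmol x 1000 amol/fmol = 12500 amol.
print(f"12.5 fmol expressed in amol: {12.5 * 1000:.0f}")
```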

      -Given that improvements in temporal resolution are claimed, the lack of time course data with a time axis is surprising. Rather, data for baseline and during treatment appear to be combined in different plots. Time course plots of individuals and group averages would be informative.

      Due to the expected variability between individual animal time course data, where, for example, we measure detectable levels in one sample followed by no detection, it was very difficult to combine data across time. Therefore, to maximize data inclusion from all animals that showed baseline measurements and responses to individual manipulations, we opted to report snapshot data. Our improvement in temporal resolution refers to the duration of each sample rather than continuous sampling, so those two are unrelated. Thank you for your feedback and for allowing us to clarify this.

      - I do not understand this claim "We use custom-made microdialysis probes, intentionally modified so they are similar in size to commonly used fiber photometry probes to avoid extensive tissue damage caused by traditional microdialysis probes (Fig. 1B)." The probes used are 320 um OD and 1 mm long. This is not an uncommon size of microdialysis probes and indeed many are smaller, so is their probe really causing less damage than traditional probes?

      Thank you for your comment. We are only trying to make the point that the tissue damage from these probes is comparable to commonly used fiber photometry probes. We only point that out because tissue damage is used as a point to dissuade the usage of microdialysis in some literature, and we just wanted to disambiguate that. We have clarified the statement you pointed out.  

      -The oxidation procedure is a good idea, as mentioned above. It would be interesting to compare met-enk with and without the oxidation procedure to see how much it affects the result (I would not say this is necessary though). It is not uncommon to add antioxidants to avoid losses like this. Also, it should be acknowledged that the treatment does prevent the detection of any in vivo oxidation, perhaps that is important in met-enk metabolism?

      The comparison between oxidized and unoxidized Met-Enk detection is in figure 1C. 

      -It would be a best practice to report the standard deviation of signal for technical replicates (say near in vivo concentrations) of standards and repeated analysis of a dialysate sample to be able to understand the variability associated with this method. Similarly, an averaged basal concentration from all rats.

      Thank you for your comment. We have included a table showing example quality control standard injections from 4 randomly selected experiments included in the manuscript that were run before and after each experiment and descriptive statistics associated with these technical replicates. We also added some detail to the methods section to describe how quality control is done. See lines 521-530. 

      ‘Each sample was run in two technical replicates and the peak area ratio was averaged before concentration calculations of the peptides were conducted. Several quality control steps were conducted prior to running the in vivo samples. 1) Two technical replicates of a known concentration were injected and analyzed – an example table from 4 random experiments included in this manuscript is shown below. 2) The buffers used on the day of the experiment (aCSF and high K+ buffer) were also tested for any contaminating Met-Enk or Leu-Enk signals by injecting two technical replicates for each buffer. Once these two criteria were met, the experiment was analyzed through the system. If either step failed, which happened a few times, the samples were frozen and the machines were cleaned and restarted until the quality control measures were met.’

      EDITORS NOTE

      Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.

      Thank you for your suggestion. We have included more detail about statistical analysis in the figure legends per this comment and reviewer comments.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Responses to Reviewer #1:

      We thank the reviewer for these additional comments, and more generally for their extensive engagement with our work, which is greatly appreciated. Here, we respond to the three points in their latest review in turn.

      The results of these experiments support a modest but important conclusion: If sub-optimal methods are used to collect retrospective reports, such as simple yes/no questions, inattentional blindness (IB) rates may be overestimated by up to ~8%.

      It is true, of course, that we think the field has overstated the extent of IB, and we appreciate the reviewer characterizing our results as important along these lines. Nevertheless, we respectfully disagree with the framing and interpretation the reviewer attaches to them. As explained in our previous response, we think this interpretation — and the associated calculations of IB overestimation ‘rates’ — perpetuates a binary approach to perception and awareness which we regard as mistaken.

      A graded approach to IB and visual awareness 

      Our sense is that many theorists interested in IB have conceived of perception and awareness as ‘all or nothing’: You either see a perfectly clear gorilla right in front of you, or you see nothing at all. This is implicit in the reviewer’s characterization of our results as simply indicating that fewer subjects fail to see the critical stimulus than previously assumed. To think that way is precisely to assume the orthodox binary position about perception, i.e., that any given subject can neatly be categorized into one of two boxes, saw or didn’t see.

      Our perspective is different. We think there can be degraded forms of perception and awareness that fall neatly into neither of the categories “saw the stimulus perfectly clearly” or “saw nothing at all”. On this graded conception, the question is not: “What proportion of subjects saw the stimulus?” but: “What is the sensitivity of subjects to the stimulus?” This is why we prefer signal detection measures like d′ over % noticing and % correct. This powerful framework has been successful in essentially every domain to which it has been applied, and we think perception and visual awareness are no exception. We understand that the reviewer may not think the same way about this foundational issue, but since part of our goal is to promote a graded approach to perception, we are keen to highlight our disagreement here and so resist the reviewer’s interpretation of our results (even to the extent that it is a positive one!).

      Finally, we note that given this perspective, we are correspondingly inclined to reject many of the summary figures following below in Point (1) by the reviewer. These calculations (given in terms of % noticing and not noticing) make sense on the binary conception of awareness, but not on the SDT-based approach we favor. We say more about this below. 

      (1) In experiment 1, data from 374 subjects were included in the analysis. As shown in figure 2b, 267 subjects reported noticing the critical stimulus and 107 subjects reported not noticing it. This translates to a 29% IB rate if we were to only consider the "did you notice anything unusual Y/N" question. As reported in the results text (and figure 2c), when asked to report the location of the critical stimulus (left/right), 63.6% of the "non-noticer" group answered correctly. In other words, 68 subjects were correct about the location while 39 subjects were incorrect. Importantly, because the location judgment was a 2-alternative-forced-choice, the assumption was that if 50% (or at least not statistically different than 50%) of the subjects answered the location question correctly, everyone was purely guessing. Therefore, we can estimate that ~39 of the subjects who answered correctly were simply guessing (because 39 guessed incorrectly), leaving 29 subjects from the non-noticer group who were correct on the 2AFC above and beyond the pure guess rate. If these 29 subjects are moved from the non-noticer to the noticer group, the corrected rate of IB for Experiment 1 is 20.86% instead of the original 28.61% rate that would have been obtained if only the Y/N question was used. In other words, relying only on the "Y/N did you notice anything" question led to an overestimate of IB rates by 7.75% in Experiment 1.
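      For readers who wish to check the arithmetic in point (1), the following minimal sketch reproduces the reviewer's guess-corrected calculation for Experiment 1 from the counts quoted above. The function name is illustrative, and the correction itself embodies the reviewer's assumption that every incorrect 2AFC response is matched by one lucky correct guess.

```python
def corrected_ib_rate(n_total, n_non_noticers, n_correct_2afc):
    """Return (original IB rate, guess-corrected IB rate) as proportions.

    The correction assumes each incorrect 2AFC answer implies one correct
    answer that was a pure guess; only the surplus of correct answers is
    reclassified from non-noticer to noticer.
    """
    n_incorrect = n_non_noticers - n_correct_2afc
    above_chance = max(n_correct_2afc - n_incorrect, 0)
    original = n_non_noticers / n_total
    corrected = (n_non_noticers - above_chance) / n_total
    return original, corrected


# Experiment 1 counts quoted in the comment: 374 subjects, 107 non-noticers,
# 68 of whom answered the left/right question correctly.
original, corrected = corrected_ib_rate(374, 107, 68)
print(f"original: {original:.2%}")
print(f"corrected: {corrected:.2%}")
print(f"difference: {original - corrected:.2%}")
```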

      In the revised version of their manuscript, the authors provided the data that was missing from the original submission, which allows this same exercise to be carried out on the other 4 experiments.  

      (To briefly interject: All of these data were provided in our public archive since our original submission and remain available at https://osf.io/fcrhu. The difference now is only that they are included in the manuscript itself.)

      Using the same logic as above, i.e., calculating the pure-guess rate on the 2AFC, moving the number of subjects above this pure-guess rate to the noticer group, and then re-calculating a "corrected IB rate", the other experiments demonstrate the following:

      Experiment 2: IB rates were overestimated by 4.74% (original IB rate based only on Y/N question = 27.73%; corrected IB rate that includes the 2AFC = 22.99%)

      Experiment 3: IB rates were overestimated by 3.58% (original IB rate = 30.85%; corrected IB rate = 27.27%)

      Experiment 4: IB rates were overestimated by ~8.19% (original IB rate = 57.32%; corrected IB rate for color* = 39.71%, corrected IB rate for shape = 52.61%, corrected IB rate for location = 55.07%)

      Experiment 5: IB rates were overestimated by ~1.44% (original IB rate = 28.99%; corrected IB rate for color = 27.56%, corrected IB rate for shape = 26.43%, corrected IB rate for location = 28.65%)

      *note: the highest overestimate of IB rates was from Experiment 4, color condition, but the authors admitted that there was a problem with 2AFC color guessing bias in this version of the experiment which was a main motivation for running experiment 5 which corrected for this bias.

      Taken as a whole, this data clearly demonstrates that even with a conservative approach to analyzing the combination of Y/N and 2AFC data, inattentional blindness was evident in a sizeable portion of the subject populations. An important (albeit modest) overestimate of IB rates was demonstrated by incorporating these improved methods.

      We appreciate the work the reviewer has put into making these calculations. However, as noted above, such calculations implicitly reflect the binary approach to perception and awareness that we reject. 

      Consider how we’d think about the single subject case where the task is 2afc detection of a low contrast stimulus in noise. Suppose that this subject achieves 70% correct. One way of thinking about this is that the subject fully and clearly sees the stimulus on 40% of trials (achieving 100% correct on those) and guesses completely blindly on the other 60% (achieving 50% correct on those) for a total of 40% + 30% = 70% overall. However, this is essentially a ‘high threshold’ approach to the problem, in contrast to an SDT approach. On an SDT approach — an approach with tremendous evidential support — on every trial the subject receives samples from probabilistic distributions corresponding to each interval (one noise and one signal + noise) and determines which is higher according to the 2afc decision rule. Thus, across trials, they have access to differentially graded information about the stimulus. Moreover, on some trials they may have significant information from the stimulus (perhaps, well above their single interval detection criterion) but still decide incorrectly because of high noise from the other spatial interval. From this perspective, there is no nonarbitrary way of saying whether the subject saw/did not see on a given trial. Instead, we must characterize the subject’s overall sensitivity to the stimulus/its visibility to them in terms of a parameter such as d′ (here, ~ 0.7).
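      The two readings of the 70%-correct example can be made concrete with a short calculation. This is a minimal sketch, assuming the standard equal-variance Gaussian model, under which 2afc proportion correct PC relates to sensitivity by d′ = √2 · z(PC); the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def high_threshold_p_see(pc):
    """High-threshold reading: PC = p_see * 1.0 + (1 - p_see) * 0.5,
    so the proportion of 'clearly seen' trials is p_see = 2 * PC - 1."""
    return 2.0 * pc - 1.0


def d_prime_2afc(pc):
    """SDT reading under the equal-variance Gaussian model:
    d' = sqrt(2) * z(PC), where z is the inverse standard normal CDF."""
    return sqrt(2.0) * NormalDist().inv_cdf(pc)


pc = 0.70
print(f"high-threshold 'seen' proportion: {high_threshold_p_see(pc):.2f}")
print(f"2afc d-prime: {d_prime_2afc(pc):.2f}")
```

      The same 70% correct is summarized either as a split into "seen" and "guessed" trials or as a single sensitivity parameter of roughly 0.7; only the first summary requires sorting individual trials into two categories.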

      We take the same attitude to the subjects in our experiments (and specifically to our ‘super subject’). Instead of calculating the proportion of subjects who saw or failed to see the stimulus (with some characterized as aware and some as unaware), we think the best way to characterize our results is that, across subjects (and so trials also), there was differential graded access to information from the stimulus, and this is best represented in terms of the group-level sensitivity parameter d′. This is why we frame our results as demonstrating that subjects traditionally considered inattentionally blind exhibit significant residual visual sensitivity to the critical stimulus.

      (2) One of the strongest pieces of evidence presented in this paper was the single data point in Figure 3e showing that in Experiment 3, even the super subject group that rated their non-noticing as "highly confident" had a d' score significantly above zero. Asking for confidence ratings is certainly an improvement over simple Y/N questions about noticing, and if this result were to hold, it could provide a key challenge to IB. However, this result can most likely be explained by measurement error.

      In their revised paper, the authors reported data that was missing from their original submission: the confidence ratings on the 2AFC judgments that followed the initial Y/N question. The most striking indication that this data is likely due to measurement error comes from the number of subjects who indicated that they were highly confident that they didn't notice anything on the critical trial, but then when asked to guess the location of the stimulus, indicated that they were highly confident that the stimulus was on the left (or right). There were 18 subjects (8.82% of the high-confidence non-noticer group) who responded this way. To most readers, this combination of responses (high confidence in correctly judging a stimulus feature that one is highly confident in having not seen at all) indicates that a portion of subjects misunderstood the confidence scales (or just didn't read the questions carefully or made mistakes in their responses, which is common for experiments conducted online).

      In the authors' rebuttal to the first round of peer review, they wrote, "it is perfectly rationally coherent to be very confident that one didn't see anything but also very confident that if there was anything to be seen, it was on the left." I respectfully disagree that such a combination of responses is rationally coherent. The more parsimonious interpretation is that a measurement error occurred, and it's questionable whether we should trust any responses from these 18 subjects.

      In their rebuttal, the authors go on to note that 14 of the 18 subjects who rated their 2AFC with high confidence were correct in their location judgment. If these 14 subjects were removed from analysis (which seems like a reasonable analysis choice, given their contradictory responses), d' for the high-confidence non-noticer group would most likely fall to chance levels. In other words, we would see a data pattern similar to that plotted in Figure 3e, but with the first data point on the left moving down to zero d'. This corrected Figure 3e would then provide a very nice evidence-based justification for including confidence ratings along with Y/N questions in future inattentional blindness studies.

      We appreciate the reviewer’s highlighting of this particular piece of evidence as amongst our strongest. (At the same time, we must resist its characterization as a “single data point”: it derives from a large pre-registered experiment involving some 7,000 subjects total, with over 200 subjects in the relevant bin — both figures being far larger than a typical IB experiment.) We also appreciate their raising the issue of measurement error.

      Specifically, the reviewer contends that our finding that even highly confident non-noticers exhibit significant sensitivity is “most likely … explained by measurement error” due to subjects mistakenly inverting our confidence scale in giving their response. In our original reply, we gave two reasons for thinking this quite unlikely; the reviewer has not addressed these in this revised review. First, we explicitly labeled our confidence scale (with 0 labeled as ‘Not at all confident’ and 3 as ‘Highly confident’) so that subjects would be very unlikely simply to invert the scale. This is especially so as it is very counterintuitive to treat “0” as reflecting high confidence. More importantly, however, we reasoned that any measurement error due to inverting or misconstruing the confidence scale should be symmetric. That is: If subjects are liable to invert the confidence scale, they should do so just as often when they answer “yes” as when they answer “no” – after all the very same scale is being used in both cases. This allows us to explore evidence of measurement error in relation to the large number of high-confidence “yes” subjects (N = 2677), thus providing a robust indicator as to whether subjects are generally liable to misconstrue the confidence scale. Looking at the number of such high confidence noticers who subsequently respond to the 2afc question with low confidence (a pattern which might, though need not, suggest measurement error), we found that the number was tiny. Only 28/2677 (1.05%) of high-confidence noticers subsequently gave the lowest level of confidence on the 2afc question, and only 63/2677 (2.35%) subjects gave either of the two lower levels of confidence. For these reasons, we consider any measurement error due to misunderstanding the confidence scale to be extremely minimal.

      The reviewer is correct to note that 18/204 (9%) subjects reported both being highly confident that they didn't notice anything and highly confident in their 2afc judgment, although only 14/18 were correct in this judgment. Should we exclude these 14? Perhaps if we agree with the reviewer that such a pattern of responses is not “rationally coherent” and so must reflect a misconstrual of the scale. But such a pattern is in fact perfectly and straightforwardly intelligible. Specifically, in a 2afc task, two stimuli can individually fall well below a subject’s single interval detection criterion, leading to a high confidence judgment that nothing was presented in either interval. Quite consistent with this, the left-hand stimulus may produce a signal that is much higher than the right-hand stimulus, leading to a high confidence forced-choice judgment that, if something was presented, it was on the left. (By analogy, consider how a radiologist could look at a scan and say the following: “We’re 95% confident there’s no tumor. But even on the 5% chance that there is, our tests completely rule out that it’s a malignant one, so don’t worry.”)

      (3) In most (if not all) IB experiments in the literature, a partial attention and/or full attention trial is administered after the critical trial. These control trials are very important for validating IB on the critical trial, as they must show that, when attended, the critical stimuli are very easy to see. If a subject cannot detect the critical stimulus on the control trial, one cannot conclude that they were inattentionally blind on the critical trial, e.g., perhaps the stimulus was just too difficult to see (e.g., too weak, too brief, too far in the periphery, too crowded by distractor stimuli, etc.), or perhaps they weren't paying enough attention overall or failed to follow instructions. In the aggregate data, rates of noticing the stimuli should increase substantially from the critical trial to the control trials. If noticing rates are equivalent on the critical and control trials, one cannot conclude that attention was manipulated in the first place.

      In their rebuttal to the first round of peer review, the authors provided weak justification for not including such a control condition. They cite one paper that argues such control conditions are often used to exclude subjects from analysis (those who fail to notice the stimulus on the control trial are either removed from analysis or replaced with new subjects) and such exclusions/replacements can lead to underestimations of inattentional blindness rates. However, the inclusion of a partial or full attention condition as a control does not necessitate the extra step of excluding or replacing subjects. In the broadest sense, such a control condition simply validates the attention manipulation, i.e., one can easily compare the percent of subjects who answered "yes" or who got the 2AFC judgment correct during the critical trial versus the control trial. The subsequent choice about exclusion/replacement is separate, and researchers can always report the data with and without such exclusions/replacements to remain more neutral on this practice.

      If anyone were to follow up on this study, I highly recommend including a partial or full attention control condition, especially given the online nature of data collection. It's important to know the percent of online subjects who answer yes and who get the 2AFC question correct when the critical stimulus is attended, because that is the baseline (in this case, the "ceiling level" of performance) to which the IB rates on the critical trial can be compared.

      We agree with the reviewer that future studies could benefit from including a partial or full attention condition. They are surely right that we might learn something additional from such conditions. 

      Where we differ from the reviewer is in thinking of these conditions as “controls” appropriate to our research question. This is why we offered the justification we did in our earlier response. When these conditions are used as controls, they are used to exclude subjects in ways that serve to inflate the biases we are concerned with in our work. For our question, the absence of these conditions does not impact the significance of the findings, since such conditions are designed to answer a question which is not the one at the heart of our paper. Our key claim is that subjects who deny noticing an unexpected stimulus in a standard inattentional blindness paradigm nonetheless exhibit significant residual sensitivity (as well as a conservative bias in their response to the noticing question); the presence or absence of partial- or full-attention conditions is orthogonal to that question.

      Moreover, we note, first, that our tasks were precisely chosen to be classic tasks widely used in the literature to manipulate attention. Thus, by common consensus in the field, they are effective means to soak up attention, and have in effect been tested in partial- and full-attention control settings in a huge number of studies. Second, we think it very doubtful that subjects in a full-attention trial would not overwhelmingly have detected our critical stimuli. The reviewer worries that they might have been “too weak, too brief, too far in the periphery, too crowded by distractor stimuli, etc.” But consider E5 where the stimulus was a highly salient orange or green shape, present on the screen for 5 seconds. The reviewer also suggests that subjects in the full-attention control might not have detected the stimulus because they “weren't paying enough attention overall”. But evidently if they weren’t paying attention even in the full-attention trial this would be reason for thinking that there was inattentional blindness even in this condition (a point made by White et al. 2018) and certainly not a reason for thinking there was not an attentional effect in the critical trial. Lastly, the reviewer suggests that a full-attention condition would have helped ensure that subjects were following instructions. But we ensured this already by (as per our pre-registration) excluding subjects who performed poorly in the relevant primary tasks.

      Thus, both in principle and in practice, we do not see the absence of such conditions as impacting the interpretation of our findings, even as we agree that future work posing a different research question could certainly learn something from including such conditions.

      Responses to Reviewer #2:

      We note that this report is unchanged from an earlier round of review, and not a response to our significantly revised manuscript. We believe our latest version fully addresses all the issues which the reviewer originally raised. The interested reader can see our original response below. We again thank the reviewer for their previous report which was extremely helpful.

      ---

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study presents valuable findings to the field interested in inattentional blindness (IB), reporting that participants indicating no awareness of unexpected stimuli through yes/no questions, still show above-chance sensitivity to specific properties of these stimuli through follow-up forced-choice questions (e.g., its color). The results suggest that this is because participants are conservative and biased to report not noticing in IB. The authors conclude that these results provide evidence for residual perceptual awareness of inattentionally blind stimuli and that therefore these findings cast doubt on the claim that awareness requires attention. Although the samples are large and the analysis protocol novel, the evidence supporting this interpretation is still incomplete, because effect sizes are rather small, the experimental design could be improved and alternative explanations have not been ruled out.

      We are encouraged to hear that eLife found our work “valuable”. We also understand, having closely looked at the reviews, why the assessment also includes an evaluation of “incomplete”. We gave considerable attention to this latter aspect of the assessment in our revision. In addition to providing additional data and analyses that we believe strengthen our case, we also include a much more substantial review and critique of existing methods in the IB literature to make clear exactly the gap our work fills and the advance it makes. (Indeed, if it is appropriate to say this here, we believe one key aspect of our work that is missing from the assessment is our inclusion of ‘absent’ trials, which is what allows us to make the crucial claims about conservative reporting of awareness in IB for the first time.) Moreover, we refocus our discussion on only our most central claims, and weaken several of our secondary claims so that the data we’ve collected are better aligned with the conclusions we draw, to ensure that the case we now make is in fact complete. Specifically, our two core claims are (1) that there is residual sensitivity to visual features for subjects who would ordinarily be classified as inattentionally blind (whether this sensitivity is conscious or not), and (2) that there is a tendency to respond conservatively on yes/no questions in the context of IB. We believe we have very compelling support for these two core claims, as we explain in detail below and also through revisions to our manuscript.

      Given the combination of strengthened and clarified case, as well as the weakening of any conclusions that may not have been fully supported, we believe and hope that these efforts make our contribution “solid”, “convincing”, or even “compelling” (especially because the “compelling” assessment characterizes contributions that are “more rigorous than the current state-of-the-art”, which we believe to be the case given the issues that have plagued this literature and that we make progress on).

      Reviewer #1 (Public review):

      Summary:

      In the abstract and throughout the paper, the authors boldly claim that their evidence, from the largest set of data ever collected on inattentional blindness, supports the views that "inattentionally blind participants can successfully report the location, color, and shape of stimuli they deny noticing", "subjects retain awareness of stimuli they fail to report", and "these data...cast doubt on claims that awareness requires attention." If their results were to support these claims, this study would overturn 25+ years of research on inattentional blindness, resolve the rich vs. sparse debate in consciousness research, and critically challenge the current majority view in cognitive science that attention is necessary for awareness.

      Unfortunately, these extraordinary claims are not supported by extraordinary (or even moderately convincing) evidence. At best, the results support the more modest conclusion: If sub-optimal methods are used to collect retrospective reports, inattentional blindness rates will be overestimated by up to ~8% (details provided below in comment #1). This evidence-based conclusion means that the phenomenon of inattentional blindness is alive and well as it is even robust to experiments that were specifically aimed at falsifying it. Thankfully, improved methods already exist for correcting the ~8% overestimation of IB rates that this study successfully identified.

      We appreciate here the reviewer’s recognition of the importance of work on inattentional blindness, and the centrality of inattentional blindness to a range of major questions. We also recognize their concerns with what they see as a gap between our data and the claims made on their basis. We address this in detail below (as well as, of course, in our revised manuscript). However, from the outset we are keen to clarify that our central claim is only the first one the reviewer mentions — and the one which appears in our title — namely that, as a group, participants can successfully report the location, color, and shape of stimuli they deny noticing, and thus that there is “Sensitivity to visual features in inattentional blindness”. This is the claim that we believe is strongly supported by our data, and all the more so after revising the manuscript in light of the helpful comments we’ve received.

      By contrast, the other claims the reviewer mentions, concerning awareness (as opposed to residual sensitivity, which might be conscious or unconscious), were intended as both secondary and tentative. We agree with the referee that these are not as strongly supported by our data (and indeed we say so in our manuscript), whereas we do think our data strongly support the more modest (and, to us, central) claim that, as a group, inattentionally blind participants can successfully report the location, color, and shape of stimuli they deny noticing.

      We also feel compelled to resist somewhat the reviewer’s summary of our claims. For example, the reviewer attributes to us the claim that “subjects retain awareness of stimuli they fail to report”; but while that phrase does appear in our abstract, what we in fact say is that our data are “consistent with an alternative hypothesis about IB, namely that subjects retain awareness of stimuli they fail to report”. We do in fact believe that our data are consistent with that hypothesis, whereas earlier investigations seemed not to be. We mention this only because we had used that careful phrasing precisely for this sort of reason, so that we wouldn’t be read as saying that our results unequivocally support that alternative.

      Still, looking back, we see how we may have given more emphasis than we intended to some of these more secondary claims. So, we’ve now gone through and revised our manuscript throughout to emphasize that our main claim is about residual sensitivity, and to make clear that our claims about awareness are secondary and tentative. Indeed, we now say precisely this, that although we favor an interpretation of “our results in terms of residual conscious vision in IB … this claim is tentative and secondary to our primary finding”. We also weaken the statements in the abstract that the reviewer mentions, to better reflect our key claims.

      Finally, we note one further point: Dialectically, inattentional blindness has been used to argue (e.g.) that attention is required for awareness. We think that our data concerning residual sensitivity at least push back on the use of IB to make this claim, even if (as we agree) they do not provide decisive evidence that awareness survives inattention. In other words, we think our data call that claim into question, such that it’s now genuinely unclear whether awareness does or does not survive inattention. We have adjusted our claims on this point accordingly as well.

      Comments:

      (1) In experiment 1, data from 374 subjects were included in the analysis. As shown in figure 2b, 267 subjects reported noticing the critical stimulus and 107 subjects reported not noticing it. This translates to a 29% IB rate, if we were to only consider the "did you notice anything unusual Y/N" question. As reported in the results text (and figure 2c), when asked to report the location of the critical stimulus (left/right), 63.6% of the "non-noticer" group answered correctly. In other words, 68 subjects were correct about the location while 39 subjects were incorrect. Importantly, because the location judgment was a 2-alternative-forced-choice, the assumption was that if 50% (or at least not statistically different than 50%) of the subjects answered the location question correctly, everyone was purely guessing. Therefore, we can estimate that ~39 of the subjects who answered correctly were simply guessing (because 39 guessed incorrectly), leaving 29 subjects from the non-noticer group who may have indeed actually seen the location of the stimulus. If these 29 subjects are moved to the noticer group, the corrected rate of IB for experiment 1 is 21% instead of 29%. In other words, relying only on the "Y/N did you notice anything" question leads to an overestimate of IB rates by 8%. This modest level of inaccuracy in estimating IB rates is insufficient for concluding that "subjects retain awareness of stimuli they fail to report", i.e. that inattentional blindness does not exist.

      In addition, this 8% inaccuracy in IB rates only considers one side of the story. Given the data reported for experiment 1, one can also calculate the number of subjects who answered "yes, I did notice something unusual" but then reported the incorrect location of the critical stimulus. This turned out to be 8 subjects (or 3% of the "noticer" group). Some would argue that it's reasonable to consider these subjects as inattentionally blind, since they couldn't even report where the critical stimulus they apparently noticed was located. If we move these 8 subjects to the non-noticer group, the 8% overestimation of IB rates is reduced to 6%.

      The same exercise can and should be carried out on the other 4 experiments, however, the authors do not report the subject numbers for any of the other experiments, i.e., how many subjects answered Y/N to the noticing question and how many in each group correctly answered the stimulus feature question. From the limited data reported (only total subject numbers and d' values), the effect sizes in experiments 2-5 were all smaller than in experiment 1 (d' for the non-noticer group was lower in all of these follow-up experiments), so it can be safely assumed that the ~6-8% overestimation of IB rates was smaller in these other four experiments. In a revision, the authors should consider reporting these subject numbers for all 5 experiments.
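      For readers who want to retrace the reviewer's arithmetic for experiment 1, the following is a minimal sketch of the guessing correction described in comment (1). The function name and its parameters are illustrative assumptions, not anything taken from the manuscript or its analysis code; the counts (267, 107, 68, 8) are the ones quoted in the comment. The sketch implements the reviewer's "either saw it or guessed" assumption, which the authors' reply goes on to question.

```python
def guessing_corrected_ib(noticers, non_noticers, non_noticers_correct,
                          noticers_incorrect=0):
    """Return (raw IB rate, guessing-corrected IB rate).

    Hypothetical helper illustrating the reviewer's reasoning:
    every incorrect 2AFC answer is treated as a guess, and an equal
    number of correct answers are treated as lucky guesses.
    """
    total = noticers + non_noticers
    raw_rate = non_noticers / total

    non_noticers_incorrect = non_noticers - non_noticers_correct
    # Correct answers in excess of the matched "lucky guesses" are
    # credited as subjects who may actually have seen the stimulus.
    credited_as_seeing = non_noticers_correct - non_noticers_incorrect

    # Optionally move "yes" responders with a wrong location answer
    # into the inattentionally blind group, as the reviewer suggests.
    corrected_blind = non_noticers - credited_as_seeing + noticers_incorrect
    return raw_rate, corrected_blind / total


raw, corrected = guessing_corrected_ib(267, 107, 68)
_, corrected_both = guessing_corrected_ib(267, 107, 68, noticers_incorrect=8)

print(f"raw IB rate:               {raw:.1%}")
print(f"corrected (non-noticers):  {corrected:.1%}")
print(f"corrected (both groups):   {corrected_both:.1%}")
```

      Under these assumptions the three rates round to 29%, 21%, and 23%, which reproduces the reviewer's 8-point and 6-point overestimation figures.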

      We now report, as requested, all these subject numbers in our supplementary data (see Supplementary Tables 1 and 2 in our Supplementary Materials).

      However, we wish to address the larger question the reviewer has raised: Do our data only support a relatively modest reduction in IB rates? Even if they did, we still believe that this would be a consequential result, suggesting a significant overestimation of IB rates in classic paradigms. However, part of our purpose in writing this paper is to push back against a certain binary way of thinking about seeing/awareness. Our sense is that the field has conceived of awareness as “all or nothing”: You either see a perfectly clear gorilla right in front of you, or you see nothing at all. Our perspective is different: We think there can be degraded forms of awareness that fall into neither of those categories. For that reason, we are disinclined to see our results in the way that the reviewer suggests, namely as simply indicating that fewer subjects fail to see the stimulus than previously assumed. To think that way is, in our view, to assume the orthodox binary position about awareness. If, instead, one conceives of awareness as we do (and as we believe the framework of signal detection theory should compel us to), then it isn’t quite right to think of the proportion of subjects who were aware, but rather (e.g.) the sensitivity of subjects to the relevant stimulus. This is why we prefer measures like d′ over % noticing and % correct. We understand that the reviewer may not think the same way about this issue as we do, but part of our goal is to promote that way of thinking in general, and so some of our comments below reflect that perspective and approach.

      For example, consider how we’d think about the single subject case where the task is 2afc detection of a low contrast stimulus in noise. Suppose that this subject achieves 70% correct. One way of thinking about that is that the subject sees the stimulus on 40% of trials (achieving 100% correct on those) and guesses blindly on the other 60% (achieving 50% correct on those) for a total of 40% + 30% = 70% overall. However, this is essentially a “high threshold” approach to the problem, in contrast to an SDT approach. On an SDT approach (an approach with tremendous evidential support), on every trial the subject receives samples from probabilistic distributions corresponding to each interval (one noise and one signal + noise) and determines which is higher according to the 2afc decision rule. Thus, across trials they have access to differentially graded information about the stimulus. Moreover, on some trials they may have significant information from the stimulus (perhaps, well above their single interval detection criterion) but still decide incorrectly because of high noise from the other spatial interval. From this perspective, there is no non-arbitrary way of saying whether the subject saw/did not see on a given trial. Instead, we must characterize the subject’s overall sensitivity to the stimulus/its visibility to them in terms of a parameter such as d′ (here, ~ 0.7).
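      The contrast between the two readings of "70% correct" can be made concrete with a short sketch. It assumes the standard equal-variance Gaussian model with an unbiased observer, under which 2afc proportion correct equals Φ(d′/√2); the function names are illustrative and are not taken from the manuscript.

```python
from math import sqrt
from statistics import NormalDist


def high_threshold_seen_fraction(p_correct, n_alternatives=2):
    """High-threshold reading: the fraction of trials on which the
    stimulus was 'seen', with pure guessing on all other trials."""
    chance = 1 / n_alternatives
    return (p_correct - chance) / (1 - chance)


def dprime_2afc(p_correct):
    """SDT reading: sensitivity implied by 2afc proportion correct,
    assuming equal-variance Gaussian noise and no interval bias."""
    return sqrt(2) * NormalDist().inv_cdf(p_correct)


p = 0.70
seen = high_threshold_seen_fraction(p)
d_prime = dprime_2afc(p)

print(f"high-threshold 'seen' fraction: {seen:.2f}")
print(f"SDT d-prime:                    {d_prime:.2f}")
```

      The same 70% therefore corresponds either to "seen on 40% of trials" or to a graded sensitivity of roughly 0.7, depending on which model is assumed.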

      We take the same attitude to our super subject. Instead of saying that some subjects saw/failed to see the stimuli, instead we suggest that the best way to characterize our results is that across subjects (and so trials also) there was differential graded access to information from the stimulus best represented in terms of the group-level sensitivity parameter d′.

      We acknowledge that (despite ourselves) we occasionally fell into an all-too-natural binary/high threshold way of thinking, as when we suggested that our data show that “inattentionally blind subjects consciously perceive these stimuli after all” and “the inattentionally blind can see after all.” (p.17) We have removed this and other problematic phrasing, as noted below.

      (2) Because classic IB paradigms involve only one critical trial per subject, the authors used a "super subject" approach to estimate sensitivity (d') and response criterion (c) according to signal detection theory (SDT). Some readers may have issues with this super subject approach, but my main concern is with the lack of precision used by the authors when interpreting the results from this super subject analysis.

      Only the super subject had above-chance sensitivity (and it was quite modest, with d' values between 0.07 and 0.51), but the authors over-interpret these results as applying to every subject. The methods and analyses cannot determine if any individual subject could report the features above-chance. Therefore, the following list of quotes should be revised for accuracy or removed from the paper as they are misleading and are not supported by the super subject analysis:

      "Altogether this approach reveals that subjects can report above-chance the features of stimuli (color, shape, and location) that they had claimed not to notice under traditional yes/no questioning" (p.6)

      "In other words, nearly two-thirds of subjects who had just claimed not to have noticed any additional stimulus were then able to correctly report its location." (p.6)

      "Even subjects who answer "no" under traditional questioning can still correctly report various features of the stimulus they just reported not having noticed, suggesting that they were at least partially aware of it after all." (p.8)

      "Why, if subjects could succeed at our forced-response questions, did they claim not to have noticed anything?" (p.8)

      "we found that observers could successfully report a variety of features of unattended stimuli, even when they claimed not to have noticed these stimuli." (p.14)

      "our results point to an alternative (and perhaps more straightforward) explanation: that inattentionally blind subjects consciously perceive these stimuli after all... they show sensitivity to IB stimuli because they can see them." (p.16)

      "In other words, the inattentionally blind can see after all." (p.17)

      We thank the reviewer for pointing out how these quotations may be misleading as regards our central claim. We intended them all to be read generically as concerning the group, and not universally as claiming that all subjects could report above-chance/see the stimuli etc. We agree entirely that the latter universal claim would not be supported by our data. In contrast, we do contend that our super-subject analysis shows that, as a group, subjects traditionally considered inattentionally blind exhibit residual sensitivity to features of stimuli (color, shape, and location) that they had all claimed not to notice, and likewise that as a group they could succeed at our forced-choice questions.

      To ensure this claim is clear throughout the paper, and that we are not interpreted as making an unsupported universal claim, we have revised the language in all of the quotations above as follows, and in numerous other places in the paper.

      “Altogether this approach reveals that subjects can report above-chance the features of stimuli (color, shape, and location) that they had claimed not to notice under traditional yes/no questioning” (p.6) => “Altogether this approach reveals that as a group subjects can report above-chance the features of stimuli (color, shape, and location) that they had all claimed not to notice under traditional yes/no questioning” (p.6)

      “Even subjects who answer “no” under traditional questioning can still correctly report various features of the stimulus they just reported not having noticed, suggesting that they were at least partially aware of it after all.” (p.8) => “... even subjects who answer “no” under traditional questioning can, as a group, still correctly report various features of the stimuli they just reported not having noticed, indicating significant group-level sensitivity to visual features. Moreover, these results are even consistent with an alternative hypothesis about IB, that as a group, subjects who would traditionally be classified as inattentionally blind are in fact at least partially aware of the stimuli they deny noticing.” (p.8)

      “Why, if subjects could succeed at our forced-response questions, did they claim not to have noticed anything?” (p.8) => “Why, if subjects could succeed at our forced-response questions as a group, did they all individually claim not to have noticed anything?” (p.8)

      “we found that observers could successfully report a variety of features of unattended stimuli, even when they claimed not to have noticed these stimuli.” (p.14) => “we found that groups of observers could successfully report a variety of features of unattended stimuli, even when they all individually claimed not to have noticed those stimuli.” (p.14)

      “our results point to an alternative (and perhaps more straightforward) explanation: that inattentionally blind subjects consciously perceive these stimuli after all... they show sensitivity to IB stimuli because they can see them.” (p.16) => “our results just as easily raise an alternative (and perhaps more straightforward) explanation: that inattentionally blind subjects may retain a degree of awareness of these stimuli after all.” (p.16) Here deleting: “they show sensitivity to IB stimuli because they can see them.”

      “In other words, the inattentionally blind can see after all.” (p.17) => “In other words, as a group, the inattentionally blind enjoy at least some degraded or partial sensitivity to the location, color and shape of stimuli which they report not noticing.” (p.17)

      In one case, we felt the sentence was correct as it stood, since it simply reported a fact about our data:

      “In other words, nearly two-thirds of subjects who had just claimed not to have noticed any additional stimulus were then able to correctly report its location.” (p.6)

      After all, if subjects were entirely blind and simply guessed, it would be true to say that 50% of subjects would be able to correctly report the stimulus location (by guessing).

      In addition to these and numerous other changes, we also added the following explicit statement early in the paper to head off any confusion on this point: “Note that all analyses reported here relate to this super subject as opposed to individual subjects”.

      (3) In addition to the d' values for the super subject being slightly above zero, the authors attempted an analysis of response bias to further question the existence of IB. By including in some of their experiments critical trials in which no critical stimulus was presented, but asking subjects the standard Y/N IB question anyway, the authors obtained false alarm and correct rejection rates. When these FA/CR rates are taken into account along with hit/miss rates when critical stimuli were presented, the authors could calculate c (response criterion) for the super subject. Here, the authors report that response criteria are biased towards saying "no, I didn't notice anything". However, the validity of applying SDT to classic Y/N IB questioning is questionable.

      For example, with the subject numbers provided in Box 1 (the 2x2 table of hits/misses/FA/CR), one can ask, 'how many subjects would have needed to answer "yes, I noticed something unusual" when nothing was presented on the screen in order to obtain a non-biased criterion estimate, i.e., c = 0?' The answer turns out to be 800 subjects (out of the 2761 total subjects in the stimulus-absent condition), or 29% of subjects in this condition.

      In the context of these IB paradigms, it is difficult to imagine 29% of subjects claiming to have seen something unusual when nothing was presented. Here, it seems that we may have reached the limits of extending SDT to IB paradigms, which are very different than what SDT was designed for. For example, in classic psychophysical paradigms, the subject is asked to report Y/N as to whether they think a threshold-level stimulus was presented on the screen, i.e., to detect a faint signal in the noise. Subjects complete many trials and know in advance that there will often be stimuli presented and the stimuli will be very difficult to see. In those cases, it seems more reasonable to incorrectly answer "yes" 29% of the time, as you are trying to detect something very subtle that is out there in the world of noise. In IB paradigms, the stimuli are intentionally designed to be highly salient (and unusual), such that with a tiny bit of attention they can be easily seen. When no stimulus is presented and subjects are asked about their own noticing (especially of something unusual), it seems highly unlikely that 29% of them would answer "yes", which is the rate of FAs that would be needed to support the null hypothesis here, i.e., of a non-biased criterion. For these reasons, the analysis of response bias in the current context is questionable and the results claiming to demonstrate a biased criterion do not provide convincing evidence against IB.
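      The criterion calculation at issue in comment (3) can be sketched as follows, under the standard equal-variance yes/no model in which d′ = z(H) − z(FA) and c = −[z(H) + z(FA)]/2. Because c = 0 exactly when FA = 1 − H, the reviewer's figure of 800/2761 implies a hit rate of about 71%; that hit rate is back-derived here for illustration and is an assumption, since the Box 1 counts are not reproduced in this exchange. The observed figure of 200 "yes" responses out of 2761 stimulus-absent subjects is the E5 count given in the authors' reply below. Function and variable names are hypothetical.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def sdt_yes_no(hit_rate, fa_rate):
    """Equal-variance SDT estimates for a yes/no task.

    Returns (d_prime, c). A positive c indicates a conservative
    criterion, i.e. a bias toward answering "no".
    """
    d_prime = _z(hit_rate) - _z(fa_rate)
    c = -0.5 * (_z(hit_rate) + _z(fa_rate))
    return d_prime, c


n_absent = 2761

# False alarm rate the reviewer says would yield c = 0.
fa_needed = 800 / n_absent
# ASSUMPTION: hit rate back-derived from FA = 1 - H at c = 0.
implied_hit = 1 - fa_needed

_, c_unbiased = sdt_yes_no(implied_hit, fa_needed)

# False alarm rate the authors report for E5 (200 of 2761).
fa_observed = 200 / n_absent
d_obs, c_obs = sdt_yes_no(implied_hit, fa_observed)

print(f"FA rate needed for c = 0: {fa_needed:.1%}")
print(f"c at that FA rate:        {c_unbiased:.3f}")
print(f"observed FA rate:         {fa_observed:.1%}")
print(f"c at observed FA rate:    {c_obs:.3f}")
```

      With the back-derived hit rate, the observed false alarm rate gives a positive c, which is the conservative ("no"-leaning) bias under discussion; the exact value depends on the true hit rate from Box 1.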

      We are grateful to the reviewer for highlighting this aspect of our data. We agree with several of these points. For example, it is indeed striking that — given the corresponding hit rate — a false alarm rate of 29% would be needed to obtain an unbiased criterion. At the same time, we would respectfully push back on other points above. In our first experiment that uses the super-subject analysis, for example, d′ is 0.51 and highly significant; to describe that figure, as the reviewer does, as “slightly above zero” seemed not quite right to us (and all the more so given that these experiments involve very large samples and preregistered analysis plans). 

      We also respectfully disagree that our data call into question the validity of applying SDT to classic yes/no IB questioning. The mathematical foundations of SDT are rock solid, and have been applied far more broadly than we have applied them here. In fact, in a way we would suggest that exactly the opposite attitude is appropriate: rather than thinking that IB challenges an immensely well-supported, rigorously tested and broadly applicable mathematical model of perception, we think that the conflict between our SDT-based model of IB and the standard interpretation constitutes strong reason to disfavor the standard interpretation. Several points are worth making here.

      First, it is already surprising that 11.03% of our subjects in E2 (46/417) and 7.24% of our subjects in E5 (200/2761) reported noticing a stimulus when no stimulus was present. But while this may have seemed unlikely in advance of inquiry, this is in fact what the data show and forms the basis of our criterion calculations. Thus, our criterion calculations already factor in a surprising but empirically verified high false alarm rate of subjects answering “yes” when no stimulus was presented and they were asked about their noticing. (We also note that the only paper we know of to report a false alarm rate in an IB paradigm, though not one used to calculate a response criterion, found a very consistent false alarm rate of 10.4%. See Devue et al. 2009.)
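
      The criterion arithmetic discussed in this exchange can be sketched as follows. This is only an illustration under the standard equal-variance Gaussian SDT model, not the authors' analysis code. The false-alarm counts are the ones quoted above; the hit rate of 0.71 is a hypothetical value, chosen solely because an unbiased criterion requires the false-alarm rate to equal one minus the hit rate, which reproduces the 29% figure. The function names are likewise invented for this sketch.

```python
from statistics import NormalDist

# Inverse of the standard normal CDF (the "z-transform" used in SDT).
z = NormalDist().inv_cdf


def dprime_and_criterion(hit_rate: float, fa_rate: float) -> tuple[float, float]:
    """Equal-variance Gaussian SDT: d' = z(H) - z(F), c = -(z(H) + z(F)) / 2."""
    z_hit, z_fa = z(hit_rate), z(fa_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)


def unbiased_fa_rate(hit_rate: float) -> float:
    """c = 0 exactly when z(F) = -z(H), i.e. when F = 1 - H."""
    return 1.0 - hit_rate


# False-alarm rates quoted in the response above.
fa_e2 = 46 / 417      # Experiment 2
fa_e5 = 200 / 2761    # Experiment 5

# HYPOTHETICAL hit rate, chosen only so that the unbiased false-alarm rate is 29%.
hit_rate = 0.71

d_unbiased, c_unbiased = dprime_and_criterion(hit_rate, unbiased_fa_rate(hit_rate))
d_e2, c_e2 = dprime_and_criterion(hit_rate, fa_e2)

print(f"FA rate E2: {fa_e2:.2%}, E5: {fa_e5:.2%}")
print(f"unbiased case: d' = {d_unbiased:.2f}, c = {c_unbiased:.2f}")
print(f"hypothetical H with E2 FA rate: d' = {d_e2:.2f}, c = {c_e2:.2f}")
```

      Under this convention a positive c indicates a conservative criterion, that is, a tendency to answer “no”. With the hypothetical hit rate above, any false-alarm rate below 29% yields c > 0.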

      Second, while the reviewer is of course correct that a common psychophysical paradigm involves detection of a “threshold-level”/faint stimulus in noise, it is widely recognized that SDT has an extremely broad application, being applicable to any situation in which two kinds of event are to be discriminated (Pastore & Scheirer 1975) and being “almost universally accepted as a theoretical account of decision making in research on perceptual detection and recognition and in numerous extensions to applied domains” quite generally (Estes 2002, see also: Wixted 2020). Indeed, cases abound in which SDT has been successfully applied to situations which do not involve near threshold stimuli in noise. To pick two examples at random, SDT has been used in studying acceptability judgments in linguistics (Huang and Ferreira 2020) and the assessment of physical aggression in child-student interactions (Lerman et al. 2010; for more general discussion of practical applications, see Swets et al. 2000). Given that the framework of SDT is so widely applied and well supported, and that we see no special reason to make an exception, we believe it can be relied on in the present context.

      Finally, we note that inattentional blindness can in many ways be considered analogous to “near threshold” detection since inattention is precisely thought to degrade or even abolish awareness of stimuli, meaning that our stimuli can be construed as near threshold in the relevant sense. Indeed, our relatively modest d′ values suggest that under inattention stimuli are indeed hard to detect. Thus, even were SDT more limited in its application, we think it still would be appropriate to apply to the case of IB.

      (4) One of the strongest pieces of evidence presented in the entire paper is the single data point in Figure 3e showing that in Experiment 3, even the super subject group that rated their non-noticing as "highly confident" had a d' score significantly above zero. Asking for confidence ratings is certainly an improvement over simple Y/N questions about noticing, and if this result were to hold, it could provide a key challenge to IB. However, this result hinges on a single data point, it was not replicated in any of the other 4 experiments, and it can be explained by methodological limitations. I strongly encourage the authors (and other readers) to follow up on this result, in an in-person experiment, with improved questioning procedures.

      We agree that our finding that even the super-subject group that rated their non-noticing as “highly confident” had a d' score significantly above zero is an especially strong piece of evidence, and we thank the reviewer for highlighting that here. At the same time, we note that while the finding is represented by a single marker in Figure 3e, it seemed not quite right to call this a “single data point”, as the reviewer does, given that it derives from a large pre-registered experiment involving some 7,000 subjects total, with over 200 subjects in the relevant bin — both figures being far larger than a typical IB experiment. It would of course be tremendous to follow up on this result – and we certainly hope our work inspires various follow-up studies. That said, we note that recruiting the necessary numbers of in person subjects would be an absolutely enormous, career-level undertaking – it would involve bringing more than the entire undergraduate population at our own institution, Johns Hopkins, into our laboratory! While those results would obviously be extremely valuable, we wouldn’t want to read the reviewer’s comments as implying that only an experiment of that magnitude — requiring thousands upon thousands of in-person subjects — could make progress on these issues. Indeed, because every subject can only contribute one critical trial in IB, it has long been recognized as an extremely challenging paradigm to study in a sufficiently well-powered and psychophysically rigorous way. We believe that our large preregistered online approach represents a major leap forward here, even if it involves certain trade-offs.

      In the current Experiment 3, the authors asked the standard Y/N IB question, and then asked how confident subjects were in their answer. Asking back-to-back questions, the second one with a scale that pertains to the first one (including a tricky inversion, e.g., "yes, I am confident in my answer of no"), may be asking too much of some subjects, especially subjects paying half-attention in online experiments. This procedure is likely to introduce a sizeable degree of measurement error.

      An easy fix in a follow-up study would be to ask subjects to rate their confidence in having noticed something with a single question using an unambiguous scale:

      On the last trial, did you notice anything besides the cross?

      (1): I am highly confident I didn't notice anything else

      (2): I am confident I didn't notice anything else

      (3): I am somewhat confident I didn't notice anything else

      (4): I am unsure whether I noticed anything else

      (5): I am somewhat confident I noticed something else

      (6): I am confident I noticed something else

      (7): I am highly confident I noticed something else

      If we were to re-run this same experiment, in the lab where we can better control the stimuli and the questioning procedure, we would most likely find a d' of zero for subjects who were confident or highly confident (1-2 on the improved scale above) that they didn't notice anything. From there on, the d' values would gradually increase, tracking along with the confidence scale (from 3-7 on the scale). In other words, we would likely find a data pattern similar to that plotted in Figure 3e, but with the first data point on the left moving down to zero d'. In the current online study with the successive (and potentially confusing) retrospective questioning, a handful of subjects could have easily misinterpreted the confidence scale (e.g., inverting the scale) which would lead to a mixture of genuine high-confidence ratings and mistaken ratings, which would result in a super subject d' that falls between zero and the other extreme of the scale (which is exactly what the data in Fig 3e shows).

      One way to check on this potential measurement error using the existing dataset would be to conduct additional analyses that incorporate the confidence ratings from the 2AFC location judgment task. For example, were there any subjects who reported being confident or highly confident that they didn't see anything, but then reported being confident or highly confident in judging the location of the thing they didn't see? If so, how many? In other words, how internally (in)consistent were subjects' confidence ratings across the IB and location questions? Such an analysis could help screen-out subjects who made a mistake on the first question and corrected themselves on the second, as well as subjects who weren't reading the questions carefully enough.

      As far as I could tell, the confidence rating data from the 2AFC location task were not reported anywhere in the main paper or supplement.

      We are grateful to the reviewer for raising this issue and for requesting that we report the confidence rating data from our 2afc location task in Experiment 3. We now report all this data in our Supplementary Materials (see Supplementary Table 3).

      We of course agree with the reviewer’s concern about measurement error, which is a concern in all experiments. What, then, of the particular concern that some subjects might have misunderstood our confidence question? It is surely impossible in principle to rule out this possibility; however, several factors bear on the plausibility of this interpretation. First, we explicitly labeled our confidence scale (with 0 labeled as ‘Not at all confident’ and 3 as ‘Highly confident’) so that subjects would be very unlikely simply to invert the scale. This is especially so as it is very counterintuitive to treat “0” as reflecting high confidence. However, we accept that it is a possibility that certain subjects might nonetheless have been confused in some other way.

      So, we also took a second approach. We examined the confidence ratings on the 2afc question of subjects who reported being highly confident that they didn't notice anything.

      Reassuringly, the large majority of these high confidence “no” subjects (~80%) reported low confidence of 0 or 1 on the 2afc question, and the majority (51%) reported the lowest confidence of 0. Only 18/204 (9%) subjects reported high confidence on both questions. 

      Still, the numbers of subjects here are small and so may not be reliable. This led us to take a third approach. We reasoned that any measurement error due to inverting or misconstruing the confidence scale should be symmetric. That is: If subjects are liable to invert the confidence scale, they should do so just as often when they answer “yes” as when they answer “no” – after all the very same scale is being used in both cases. This allows us to explore evidence of measurement error in relation to the much larger number of high-confidence “yes” subjects (N = 2677), thus providing a much more robust indicator as to whether subjects are generally liable to misconstrue the confidence scale. Looking at the number of such high-confidence noticers who subsequently respond to the 2afc question with low confidence, we found that the number was tiny. Only 28/2677 (1.05%) of high-confidence noticers subsequently gave the lowest level of confidence on the 2afc question, and only 63/2677 (2.35%) subjects gave either of the two lower levels of confidence. In this light, we consider any measurement error due to misunderstanding the confidence scale to be extremely minimal.
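
      The proportions quoted in this and the preceding paragraph can be recomputed directly from the reported counts. The short sketch below does so and also attaches a 95% Wilson score interval to each proportion. The interval is an addition for illustration only and is not an analysis reported by the authors; the helper name `wilson_interval` is likewise an assumption of this sketch.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Counts as quoted in the response: (subjects in cell, subjects in group).
counts = {
    "high-confidence 'yes', lowest 2afc confidence": (28, 2677),
    "high-confidence 'yes', either lower 2afc level": (63, 2677),
    "high-confidence 'no', high 2afc confidence": (18, 204),
}

for label, (k, n) in counts.items():
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.2%} (95% Wilson: {low:.2%} to {high:.2%})")
```

      The interval for the 18/204 cell is considerably wider than those for the cells based on 2677 subjects, which is one way of seeing why the larger group gives a more robust indication of how often the scale is misconstrued.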

      What should we make of the 18 subjects who were highly confident non-noticers but then also highly confident on the 2afc question? Importantly, we do not think that these 18 subjects necessarily made a mistake on the first question and so should be excluded. There is no a priori reason why one’s confidence criterion in a yes/no question should carry over to a 2afc question. After all, it is perfectly rationally coherent to be very confident that one didn’t see anything but also very confident that if there was anything to be seen, it was on the left. Moreover, these 18 subjects were not all correct on the 2afc question despite their high confidence (4/18 or 22% getting the wrong answer). 

      Nonetheless, and again reassuringly, we found that the above-chance patterns in our data remained the same even excluding these 18 subjects. We did observe a slight reduction in percent correct and d′ but this is absolutely what one should expect since excluding the most confident performers in any task will almost inevitably reduce performance.

      In this light, we consider it unlikely that measurement error fully explains the residual sensitivity found even amongst highly confident non-noticers. That said, we appreciate this concern. We now raise the issue and the analysis of high confidence noticers which addresses it in our revised manuscript. We also thank the reviewer for pressing us to think harder about this issue, which led directly to these new analyses that we believe have strengthened the paper.

      (5) In most (if not all) IB experiments in the literature, a partial attention and/or full attention trial (or set of trials) is administered after the critical trial. These control trials are very important for validating IB on the critical trial, as they must show that, when attended, the critical stimuli are very easy to see. If a subject cannot detect the critical stimulus on the control trial, one cannot conclude that they were inattentionally blind on the critical trial, e.g., perhaps the stimulus was just too difficult to see (e.g., too weak, too brief, too far in the periphery, too crowded by distractor stimuli, etc.), or perhaps they weren't paying enough attention overall or failed to follow instructions. In the aggregate data, rates of noticing the stimuli should increase substantially from the critical trial to the control trials. If noticing rates are equivalent on the critical and control trials one cannot conclude that attention was manipulated.

      It is puzzling why the authors decided not to include any control trials with partial or full attention in their five experiments, especially given their online data collection procedures where stimulus size, intensity, eccentricity, etc. were uncontrolled and variable across subjects. Including such trials could have actually helped them achieve their goal of challenging the IB hypothesis, e.g., excluding subjects who failed to see the stimulus on the control trials might have reduced the inattentional blindness rates further. This design decision should at least be acknowledged and justified (or noted as a limitation) in a revision of this paper.

      We acknowledge that other studies in the literature include divided and full attention trials, and that they could have been included in our work as well. However, we deliberately decided not to include such control trials for an important reason. As the referee comments, the main role of such trials in previous work has been to exclude from analysis subjects who failed to report the unexpected stimulus on the divided and/or full attention control trials.

      (For example, as Most et al. 2001 write: “Because observers should have seen the object in the full-attention trial (Mack & Rock, 1998), we used this trial as a control … Accordingly, 3 observers who failed to see the cross on this trial were replaced, and their data were excluded from the analyses.") As the reviewer points out, excluding such subjects would very likely have ‘helped' us. However, the practice is controversial. Indeed, in a review of 128 experiments, White et al. 2018 argue that the practice has “problematic consequences” and “may lead researchers to understate the pervasiveness of inattentional blindness". Since we wanted to offer as simple and demanding a test of residual sensitivity in IB as possible, we thus decided not to use any such exclusions, and for that reason decided not to include divided/full attention trials. 

      As recommended, we discuss this decision not to include divided/full attention trials and our logic for not doing so in the manuscript. As we explain, not having those conditions makes it more impressive, not less impressive, that we observed the results we in fact did — it makes our results more interpretable, not less interpretable, and so absence of such conditions from our manuscript should not (in our view) be considered any kind of weakness.

      (6) In the discussion section, the authors devote a short paragraph to considering an alternative explanation of their non-zero d' results in their super subject analyses: perhaps the critical stimuli were processed unconsciously and left a trace such that when later forced to guess a feature of the stimuli, subjects were able to draw upon this unconscious trace to guide their 2AFC decision. In the subsequent paragraph, the authors relate these results to above-chance forced-choice guessing in blindsight subjects, but reject the analogy based on claims of parsimony.

      First, the authors dismiss the comparison of IB and blindsight too quickly. In particular, the results from experiment 3, in which some subjects adamantly (confidently) deny seeing the critical stimulus but guess a feature at above-chance levels (at least at the super subject level and assuming the online subjects interpreted and used the confidence scale correctly), seem highly analogous to blindsight. Importantly, the analogy is strengthened if the subjects who were confident in not seeing anything also reported not being confident in their forced-choice judgments, but as mentioned above this data was not reported.

      Second, the authors fail to mention an even more straightforward explanation of these results, which is that ~8% of subjects misinterpreted the "unusual" part of the standard IB question used in experiments 1-3. After all, colored lines and shapes are pretty "usual" for psychology experiments and were present in the distractor stimuli everyone attended to. It seems quite reasonable that some subjects answered this first question, "no, I didn't see anything unusual", but then when told that there was a critical stimulus and asked to judge one of its features, adjusted their response by reconsidering, "oh, ok, if that's the unusual thing you were asking about, of course I saw that extra line flash on the left of the screen". This seems like a more parsimonious alternative compared to either of the two interpretations considered by the authors: (1) IB does not exist, (2) super-subject d' is driven by unconscious processing. Why not also consider: (3) a small percentage of subjects misinterpreted the Y/N question about noticing something unusual. In experiments 4-5, they dropped the term "unusual" but do not analyze whether this made a difference nor do they report enough of the data (subject numbers for the Y/N question and 2AFC) for readers to determine if this helped reduce the ~8% overestimate of IB rates.

      Our primary ambition in the paper was to establish, as our title suggests, residual sensitivity in IB. The ambition is quite neutral as to whether the sensitivity reflects conscious or unconscious processing (i.e. is akin to blindsight as traditionally conceived). We were evidently not clear about this, however, leading to two referees coming away with an impression of our claims that is different than we intended. We have revised our manuscript throughout to address this. But we also want to emphasize here that we take our data primarily to support the more modest claim that there is residual sensitivity (conscious or unconscious) in the group of subjects who are traditionally classified as inattentionally blind. We believe that this claim has solid support in our data.

      We do in the discussion section offer one reason for believing that there is residual awareness in the group of subjects who are traditionally classified as inattentionally blind. However, we acknowledge that this is controversial and now emphasize in the manuscript that this claim “is tentative and secondary to our primary finding”. We also emphasize that part of our point is dialectical: Inattentional blindness has been used to argue (e.g.) that attention is required for awareness. We think that our data concerning residual sensitivity at least push back on the use of IB to make this claim, even if they do not provide decisive evidence (as we agree) that awareness survives inattention. (Cf. here, Hirschhorn et al. 2024 who take up a common suggestion in the field that awareness is best assessed by using both subjective and objective measures, with claims about lack of awareness ideally being supported by both; our data suggest at a minimum that in IB objective measures do not neatly line up with subjective measures.)

      We hope this addresses the referee’s concern that we dismiss “the comparison of IB and blindsight too quickly”. We do not intend to dismiss that comparison at all; indeed, we raise it because we consider it a serious hypothesis. Our aim is simply to raise one possible consideration against it. But, again, our main claim is quite consistent with sensitivity in IB being akin to “blindsight”.

      We also agree with the referee that a possible explanation of why some subjects say they do not notice something unusual in IB paradigms is not that they didn’t notice anything but that they didn’t consider the unexpected stimulus sufficiently unusual. However, the reviewer is incorrect that we did not mention this interpretation; to the contrary, it was precisely the kind of concern which led us to be dissatisfied with standard IB methods and so motivated our approach. As we wrote in our main text: “However, yes/no questions of this sort are inherently and notoriously subject to bias… For example, observers might be under-confident whether they saw anything (or whether what they saw counted as unusual); this might lead them to respond “no” out of an excess of caution.” On our view, this is exactly the kind of reason (among other reasons) that one cannot rely on yes/no reports of noticing unusual stimuli, even though the field has relied on just these sorts of questions in just this way.

      We do not, however, think that this explanation accounts for why all subjects fail to report noticing, nor do we think that it accounts for our finding of above-chance sensitivity amongst non-noticers. This is for two critical reasons. First, whereas the word “unusual” did appear in the yes/no question in our Experiments 1-3, it did not appear in our Experiments 4 and 5 on dynamic IB. (In both cases, we used the exact wording of such questions in the experiments we were basing our work on.) And, of course, we still found significant residual sensitivity amongst non-noticers in Experiments 4 and 5. Second, in relation to our confidence experiment, we think it unlikely that subjects who were highly confident that they did not notice anything unusual only said that because they thought what they had seen was insufficiently unusual. Yet even in this group of subjects who were maximally confident that they did not notice anything unusual, we still found residual sensitivity.

      (7) The authors use sub-optimal questioning procedures to challenge the existence of the phenomenon this questioning is intended to demonstrate. A more neutral interpretation of this study is that it is a critique on methods in IB research, not a critique on IB as a manipulation or phenomenon. The authors neglect to mention the dozens of modern IB experiments that have improved upon the simple Y/N IB questioning methods. For example, in Michael Cohen's IB experiments (e.g., Cohen et al., 2011; Cohen et al., 2020; Cohen et al., 2021), he uses a carefully crafted set of probing questions to conservatively ensure that subjects who happened to notice the critical stimuli have every possible opportunity to report seeing them. In other experiments (e.g., Hirschhorn et al., 2024; Pitts et al., 2012), researchers not only ask the Y/N question but then follow this up by presenting examples of the critical stimuli so subjects can see exactly what they are being asked about (recognition-style instead of free recall, which is more sensitive). These follow-up questions include foil stimuli that were never presented (similar to the stimulus-absent trials here), and ask for confidence ratings of all stimuli. Conservative, pre-defined exclusion criteria are employed to improve the accuracy of their IB-rate estimates. In these and other studies, researchers are very cautious about trusting what subjects report seeing, and in all cases, still find substantial IB rates, even to highly salient stimuli. The authors should consider at least mentioning these improved methods, and perhaps consider using some of them in their future experiments.

      The concern that we do not sufficiently discuss the range of “improved” methods in IB studies is well-taken. A similar concern is raised by Reviewer #2 (Dr. Cohen). To address the concern, we have added to our manuscript a substantial new discussion of such improved methods. However, although we do agree that these methods can be helpful and may well address some of the methodological concerns which our paper raises, we do not think that they are a panacea. Thus, our discussion of these methods also includes a substantial discussion of the problems and pitfalls with such methods which led us to favor our own simple forced-response and 2afc questions, combined with SDT analysis. We think this approach is superior both to the classic approach in IB studies and to the approach raised by the reviewers.

      In particular, we have four main concerns about the follow up questions now commonly used in the field:

      First, many follow up questions are used not to exclude people from the IB group but to include people in the IB group. Thus, Most et al. 2001 asked follow up questions but used these to increase their IB group, only excluding subjects from the IB group if they both reported seeing and answered their follow ups correctly: “Observers were regarded as having seen the unexpected object if they answered 'yes' when asked if they had seen anything on the critical trial that had not been present before and if they were able to describe its color, motion, or shape." This means that subjects who saw the object but failed to see its color, say, would be treated as inattentionally blind. This has the effect of inflating IB rates, in exactly the way our paper is intended to critique. So, in our view this isn’t an improvement but rather part of the approach we take issue with.

      Second, many follow up questions remain yes/no questions or nearby variants, all of which are subject to response bias. For example, in Cohen’s studies which the reviewer mentions, it is certainly true that “he uses a carefully crafted set of probing questions to conservatively ensure that subjects who happened to notice the critical stimuli have every possible opportunity to report seeing them.” We agree that this improves over a simple yes/no question in some ways. However, such follow up probes nonetheless remain yes/no questions, subject to response bias, e.g.:

      (1) “Did you notice anything strange or different about that last trial?”

      (2) “If I were to tell you that we did something odd on the last trial, would you have a guess as to what we did?”

      (3) “If I were to tell you we did something different in the second half of the last trial, would you have a guess as to what we did?”

      (4) “Did you notice anything different about the colors in the last scene?”

      Indeed, follow up questions of this kind can be especially susceptible to bias, since subjects may be reluctant to “take back” their earlier answers and so be conservative in responding positively to avoid inconsistency or acknowledgement of earlier error. This may explain why such follow up questions produce remarkable consistency despite their rather different wording. Thus, Simons and Chabris (1999) report: “Although we asked a series of questions escalating in specificity to determine whether observers had noticed the unexpected event, only one observer who failed to report the event in response to the first question (“did you notice anything unusual?'') reported the event in response to any of the next three questions (which culminated in “did you see a ... walk across the screen?''). Thus, since the responses were nearly always consistent across all four questions, we will present the results in terms of overall rates of noticing.” Thus, while there are undoubtedly merits to these follow ups, they do not resolve problems of bias.

      This same basic issue affects the follow up question used in Pitts et al. 2012 which the reviewer mentions. Pitts et al. write: “If a participant reported not seeing any patterns and rated their confidence in seeing the square pattern (once shown the sample) as a 3 or less (1 = least confident, 5 = most confident), she or he was placed in Group 1 and was considered to be inattentionally blind to the square patterns.” The confidence rating follow-up question here remains subject to bias. Moreover, and strikingly, the inclusion criterion used means that subjects who were moderately confident that they saw the square pattern when shown (i.e. answered 3) were counted as inattentionally blind (!). We do not think this is an appropriate inclusion criterion.

      The third problem is that follow up questions are often free/open-response. For instance, Most et al. (2005) ask the follow up question: "If you did see something on the last trial that had not been present during the first two trials, what color was it? If you did not see something, please guess." This is a much more difficult and to that extent less sensitive question than our binary forced-response/2afc questions. For this reason, we believe our follow up questions are more suitable for ascertaining low levels of sensitivity.

      The fourth and final issue is that whereas 2afc questions are criterion free (in that they naturally have an unbiased decision rule), this is in fact not true of n-afc questions in general, nor is it true in general of delayed n-alternative match to sample designs. Thus, even when limited response options are given, they are not immune to response biases and so require SDT analysis. Moreover, some such tasks can involve decision spaces which are often poorly understood or difficult to analyze without making substantial assumptions about observer strategy. 

      This last point (as well as the first) is relevant to Hirschhorn et al. 2024. Hirschhorn et al. write that they “used two awareness measures. Firstly, participants were asked to rate stimulus visibility on the Perceptual Awareness Scale (PAS, a subjective measure of awareness: Ramsøy & Overgaard, 2004), and then they were asked to select the stimulus image from an array of four images (an objective measure: Jakel & Wichmann, 2006).”

      While certainly an improvement on simple yes/no questioning, the PAS remains subject to response bias. On the other hand, we applaud Hirschhorn et al.’s use of objective measures in the context of IB which of course our design implements. However, while Hirschhorn et al. 2024 suggest that their task is a spatial 4afc following the recommendation of this design by Jakel & Wichmann (2006), it is strictly a 4-alternative delayed match to sample task, so it is doubtful whether it can be considered a preferred psychophysical task for the reasons Jakel & Wichmann offer. Regardless, the more crucial point is that observers in such a task might be biased towards one alternative as opposed to another. Thus, use of d′ (as opposed to percent correct as in Hirschhorn et al. 2024) is crucial in assessing performance in such tasks.
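
      The point that percent correct can understate sensitivity when an observer favours one alternative can be illustrated with a deliberately simplified sketch. It assumes a two-alternative task and an equal-variance Gaussian observer, not the four-alternative match-to-sample design discussed above, and every numerical value in it is hypothetical. The function names are invented for the illustration.

```python
from math import sqrt
from statistics import NormalDist

norm = NormalDist()


def response_rates(separation: float, bias: float) -> tuple[float, float]:
    """Return (P(say 'left' | left), P(say 'left' | right)).

    `separation` is z(H) - z(F); `bias` shifts the criterion away from the
    unbiased midpoint (bias > 0 means a reluctance to say 'left').
    """
    hit = norm.cdf(separation / 2 - bias)
    false_alarm = norm.cdf(-separation / 2 - bias)
    return hit, false_alarm


def percent_correct(hit: float, false_alarm: float) -> float:
    """Proportion correct with 'left' and 'right' trials equally frequent."""
    return 0.5 * (hit + (1 - false_alarm))


def dprime_2afc(hit: float, false_alarm: float) -> float:
    """Conventional 2afc sensitivity: (z(H) - z(F)) / sqrt(2)."""
    return (norm.inv_cdf(hit) - norm.inv_cdf(false_alarm)) / sqrt(2)


# Two hypothetical observers with identical sensitivity but different bias.
unbiased = response_rates(separation=1.0, bias=0.0)
biased = response_rates(separation=1.0, bias=0.8)

pc_unbiased, pc_biased = percent_correct(*unbiased), percent_correct(*biased)
d_unbiased, d_biased = dprime_2afc(*unbiased), dprime_2afc(*biased)

print(f"unbiased: percent correct = {pc_unbiased:.3f}, d' = {d_unbiased:.3f}")
print(f"biased:   percent correct = {pc_biased:.3f}, d' = {d_biased:.3f}")
```

      In this sketch the biased observer scores a lower percent correct than the unbiased one even though both have the same d′, which is the sense in which d′ separates sensitivity from a preference for one response.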

      For all these reasons, then, while we agree that the field has taken significant steps to move beyond the simple yes/no question traditionally used in IB studies (and we have revised our manuscript to make this clear), we do not think it has resolved the methodological issues which our paper seeks to highlight and address, and we believe that our approach contributes something additional that is not yet present in the literature. We have now revised our manuscript to make these points much more clearly, and we thank the reviewer for prompting these improvements.

      Reviewer #2 (Public review):

      In this study, Nartker et al. examine how much observers are conscious of using variations of classic inattentional blindness studies. The key idea is that rather than simply asking observers if they noticed a critical object with one yes/no question, the authors also ask follow-up questions to determine if observers are aware of more than the yes/no questions suggest. Specifically, by having observers make forced choice guesses about the critical object, the authors find that many observers who initially said "no" they did not see the object can still "guess" above chance about the critical object's location, color, etc. Thus, the authors claim that prior claims of inattentional blindness are mistaken and that using such simple methods has led numerous researchers to overestimate how little observers see in the world. To quote the authors themselves, these results imply that "inattentionally blind subjects consciously perceive these stimuli after all... they show sensitivity to IB stimuli because they can see them."

      Before getting to a few issues I have with the paper, I do want to make sure to explicitly compliment the researchers for many aspects of their work. Getting massive amounts of data, using signal detection measures, and the novel use of a "super subject" are all important contributions to the literature that I hope are employed more in the future.

      We really appreciate this comment and that the reviewer found our work to make these important contributions to the literature. We wrote this paper expecting not everyone to accept our conclusions, but hoping that readers would see the work as making a valuable contribution to the literature promoting an underexplored alternative in a compelling way. Given that this reviewer goes on to express some skepticism about our claims, it is especially encouraging to see this positive feedback up top!

      Main point 1: My primary issue with this work is that I believe the authors are misrepresenting the way people often perform inattentional blindness studies. In effect, the authors are saying, "People do the studies 'incorrectly' and report that people see very little. We perform the studies 'correctly' and report that people see much more than previously thought." But the way previous studies are conducted is not accurately described in this paper. The authors describe previous studies as follows on page 3:

      "Crucially, however, this interpretation of IB and the many implications that follow from it rest on a measure that psychophysics has long recognized to be problematic: simply asking participants whether they noticed anything unusual. In IB studies, awareness of the unexpected stimulus (the novel shape, the parading gorilla, etc.) is retroactively probed with a yes/no question, standardly, "Did you notice anything unusual on the last trial which wasn't there on previous trials?". Any subject who answers "no" is assumed not to have any awareness of the unexpected stimulus.

      If this quote were true, the authors would have a point. Unfortunately, I do not believe it is true. This is simply not how many inattentional blindness studies are run. Some of the most famous studies in the inattentional blindness literature do not simply ask observers a yes/no question (e.g., the invisible gorilla (Simons et al. 1999), the classic door study where the person changes (Simons and Levin, 1998), the study where observers do not notice a fight happening a few feet from them (Chabris et al., 2011)). Instead, these papers consistently ask a series of follow-up questions and even tell the observers what just occurred to confirm that observers did not notice that critical event (e.g., "If I were to tell you we just did XYZ, did you notice that?"). In fact, after a brief search on Google Scholar, I was able to relatively quickly find over a dozen papers that do not just use a yes/no procedure, and instead ask a series of multiple questions to determine if someone is inattentionally blind. In no particular order some papers (full disclosure: including my own):

      (1) Most et al. (2005) Psych Review

      (2) Drew et al. (2013) Psych Science

      (3) Drew et al. (2016) Journal of Vision

      (4) Simons et al. (1999) Perception

      (5) Simons and Levin (1998) Perception

      (6) Chabris et al. (2011) iPerception

      (7) Ward & Scholl (2015) Psych Bulletin and Review

      (8) Most et al. (2001) Psych Science

      (9) Todd & Marois (2005) Psych Science

      (10) Fougnie & Marois (2007) Psych Bulletin and Review

      (11) New and German (2015) Evolution and Human Behaviour

      (12) Jackson-Nielsen (2017) Consciousness and cognition

      (13) Mack et al. (2016) Consciousness and cognition

      (14) Devue et al. (2009) Perception

      (15) Memmert (2014) Cognitive Development

      (16) Moore & Egeth (1997) JEP:HPP

      (17) Cohen et al. (2020) Proc Natl Acad Sci

      (18) Cohen et al. (2011) Psych Science

      This is a critical point. The authors' key idea is that when you ask more than just a simple yes/no question, you find that other studies have overestimated the effects of inattentional blindness. But none of the studies listed above only asked simple yes/no questions. Thus, I believe the authors are mis-representing the field. Moreover, many of the studies that do much more than ask a simple yes/no question are cited by the authors themselves! Furthermore, as far as I can tell, the authors believe that if researchers do these extra steps and ask more follow-ups, then the results are valid. But since so many of these prior studies do those extra steps, I am not exactly sure what is being criticized.

      To make sure this point is clear, I'd like to use a paper of mine as an example. In this study (Cohen et al., 2020, Proc Natl Acad Sci USA) we used gaze-contingent virtual reality to examine how much color people see in the world. On the critical trial, the part of the scene they fixated on was in color, but the periphery was entirely in black and white. As soon as the trial ended, we asked participants a series of questions to determine what they noticed. The list of questions included:

      (1) "Did you notice anything strange or different about that last trial?"

      (2) "If I were to tell you that we did something odd on the last trial, would you have a guess as to what we did?"

      (3) "If I were to tell you we did something different in the second half of the last trial, would you have a guess as to what we did?"

      (4) "Did you notice anything different about the colors in the last scene?"

      (5) We then showed observers the previous trial again and drew their attention to the effect and confirmed that they did not notice that previously.

      In a situation like this, when the observers are asked so many questions, do the authors believe that "the inattentionally blind can see after all?" I believe they would not say that and the reason they would not say that is because of the follow-up questions after the initial yes/no question. But since so many previous studies use similar follow-up questions, I do not think you can state that the field is broadly overestimating inattentional blindness. This is why it seems to me to be a bit of a strawman: most people do not just use the yes/no method.

      We appreciate this reviewer raising this issue. As he (Dr. Cohen) states, his “primary issue” concerns our discussion of the broader literature (which he worries understates recent improvements made to the IB methodology), rather than, e.g., the experiments we’ve run. We take this concern very seriously and address it comprehensively here.

      A very similar issue is identified by Reviewer #1, comment (7). To review some of what we say in reply to them: To address the concern we have added to our manuscript a substantial new discussion of such improved methods. However, although we do agree that these methods can be helpful and may well address some of the methodological concerns which our paper raises, we do not think that they are a panacea. Thus, our discussion of these methods also includes a substantial discussion of the problems and pitfalls with such methods which led us to favor our own simple forced-response and 2afc questions, combined with SDT analysis. We think this approach is superior both to the classic approach in IB studies and to the approach raised by the reviewers.

      In particular, we have three main concerns about the follow up questions now commonly used in the field:

      First, many follow up questions are used not to exclude subjects from the IB group but to include subjects in the IB group. Thus, Most et al. (2001) asked follow up questions but used these to increase their IB group, only excluding subjects from the IB group if they both reported seeing and failed to answer their follow ups correctly: “Observers were regarded as having seen the unexpected object if they answered 'yes' when asked if they had seen anything on the critical trial that had not been present before and if they were able to describe its color, motion, or shape." This means that subjects who saw the object but failed to describe it in these respects would be treated as inattentionally blind. This is problematic since failure to describe a feature (e.g., color, shape) does not imply a complete lack of information concerning that feature; and even if a subject did lack all information concerning these features of an object, this would not imply a complete failure to see the object. Similarly, Pitts et al. (2012) asked subjects to rate their confidence in their initial yes/no response from 1 = least confident to 5 = most confident, and used these ratings to include in the IB group those who rated their confidence in seeing at 3 or less. This is evidently problematic, since there is a large gap between being underconfident that one saw something and being completely blind to it. More generally, using follow-ups to inflate IB rates in such ways raises precisely the kinds of issues our paper is intended to critique. So in our view this isn’t an improvement but rather part of the approach we take issue with.

      Second, many follow up questions remain yes/no questions or nearby variants, all of which are subject to response bias. For example, in the reviewer’s own studies (Cohen et al. 2020, 2011; see also: Simons et al., 1999; Most et al., 2001, 2005; Drew et al., 2013; Memmert, 2014) a series of follow up questions are used to try and ensure that subjects who noticed the critical stimuli are given the maximum opportunity to report doing so, e.g.:

      (1) “Did you notice anything strange or different about that last trial?”

      (2) “If I were to tell you that we did something odd on the last trial, would you have a guess as to what we did?”

      (3) “If I were to tell you we did something different in the second half of the last trial, would you have a guess as to what we did?”

      (4) “Did you notice anything different about the colors in the last scene?”

      We certainly agree that such follow up questions improve over a simple yes/no question in some ways. However, such follow up probes nonetheless remain yes/no questions, intrinsically subject to response bias. Indeed, follow up questions of this kind can be especially susceptible to bias, since subjects may be reluctant to “take back” their earlier answers and so be conservative in responding positively to avoid inconsistency or acknowledgement of earlier error. This may explain why such follow up questions produce remarkable consistency despite their rather different wording. Thus, Simons and Chabris (1999) report: “Although we asked a series of questions escalating in specificity to determine whether observers had noticed the unexpected event, only one observer who failed to report the event in response to the first question (‘did you notice anything unusual?’) reported the event in response to any of the next three questions (which culminated in ‘did you see a ... walk across the screen?’). Thus, since the responses were nearly always consistent across all four questions, we will present the results in terms of overall rates of noticing.” Thus, while there are undoubtedly merits to these follow ups, they do not resolve problems of bias.
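
      The response-bias point can be illustrated with a minimal sketch, assuming the textbook equal-variance Gaussian signal detection model. This is an illustration only, not an analysis taken from any of the studies discussed; the function names and parameter values are hypothetical. It shows two simulated observers with identical sensitivity (d′ = 1) whose rates of answering "yes" differ only because one adopts a more conservative criterion.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def yes_no_rates(d_prime, criterion):
    """Return (hit_rate, false_alarm_rate) for a yes/no observer.

    Assumes equal-variance Gaussian SDT: noise ~ N(-d'/2, 1) and
    signal ~ N(+d'/2, 1). The observer says "yes" when the internal
    response exceeds `criterion` (0 = unbiased, > 0 = conservative).
    """
    hit_rate = 1.0 - _STD_NORMAL.cdf(criterion - d_prime / 2.0)
    false_alarm_rate = 1.0 - _STD_NORMAL.cdf(criterion + d_prime / 2.0)
    return hit_rate, false_alarm_rate


def d_prime_from_rates(hit_rate, false_alarm_rate):
    """Recover d' = z(H) - z(FA); both rates must lie strictly in (0, 1)."""
    return _STD_NORMAL.inv_cdf(hit_rate) - _STD_NORMAL.inv_cdf(false_alarm_rate)


# Two hypothetical observers: same sensitivity, different criteria.
neutral = yes_no_rates(d_prime=1.0, criterion=0.0)
conservative = yes_no_rates(d_prime=1.0, criterion=1.0)

for label, (hit, fa) in (("neutral", neutral), ("conservative", conservative)):
    print(f"{label:>12}: P(yes | present) = {hit:.3f}, "
          f"P(yes | absent) = {fa:.3f}, d' = {d_prime_from_rates(hit, fa):.3f}")
```

      Under these assumptions the neutral observer says "yes" on roughly 69% of stimulus-present trials and the conservative observer on roughly 31%, although the recovered d′ is 1 in both cases. A "noticing rate" taken from yes/no answers therefore mixes sensitivity with criterion placement.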

      It is also important to recognize that whereas 2afc questions are criterion free (in that they naturally have an unbiased decision rule), this is not true of n-afc or delayed n-alternative match-to-sample designs in general. Performance in such tasks thus requires SDT analysis – which itself may be problematic if the decision space is not properly understood or requires making substantial assumptions about observer strategy.

      Third, and finally, many follow up questions are insufficiently sensitive (especially with small sample sizes). For instance, Todd, Fougnie & Marois (2005) used a 12-alternative match-to-sample task (see similarly: Fougnie & Marois, 2007; Devue et al., 2009). And Most et al. (2005) asked an open-response follow-up: “If you did see something on the last trial that had not been present during the first two trials, what color was it? If you did not see something, please guess.” These questions are more difficult and to that extent less sensitive than binary forced-response/2afc questions of the sort we use in our own studies – a difference which may be critical in uncovering degraded perceptual sensitivity.
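
      The difficulty of many-alternative tasks can be sketched numerically. The following is a minimal illustration, assuming an unbiased observer and the standard equal-variance Gaussian model with independent alternatives, in which expected accuracy is the integral of pdf(x − d′) · cdf(x)^(m − 1). The function name, integration limits, and step size are choices made for this sketch and do not come from any of the cited studies.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def proportion_correct_mafc(d_prime, n_alternatives, step=0.01):
    """Expected accuracy of an unbiased observer in an m-alternative task.

    Assumed model: the target's internal response is N(d', 1), each foil's
    is N(0, 1), and the observer picks the alternative with the largest
    response. The integral of pdf(x - d') * cdf(x) ** (m - 1) is evaluated
    with the trapezoid rule over a range wide enough to cover both tails.
    """
    lower, upper = -8.0, 8.0 + d_prime
    n_steps = int(round((upper - lower) / step))
    total = 0.0
    for i in range(n_steps + 1):
        x = lower + i * step
        value = (_STD_NORMAL.pdf(x - d_prime)
                 * _STD_NORMAL.cdf(x) ** (n_alternatives - 1))
        weight = 0.5 if i in (0, n_steps) else 1.0  # trapezoid end points
        total += weight * value
    return total * step


# Same hypothetical sensitivity, different numbers of alternatives.
for m in (2, 4, 12):
    accuracy = proportion_correct_mafc(d_prime=1.0, n_alternatives=m)
    print(f"m = {m:>2}: chance = {1 / m:.3f}, expected accuracy = {accuracy:.3f}")
```

      For m = 2 the result matches the closed form Φ(d′/√2), about 0.76 when d′ = 1, and expected accuracy falls as alternatives are added. The sketch says nothing about statistical power, which also depends on sample size and on how responses are analyzed, and it assumes away the response biases that the preceding paragraph warns about.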

      For all these reasons, then, while we agree that the field has taken significant steps to move beyond the simple yes/no question traditionally used in IB studies (and we have revised our manuscript to make this clear), we do not think it has resolved the methodological issues which our paper seeks to highlight and address, and we believe that our approach of using 2afc or forced-response questions combined with signal detection analysis is an important improvement on prior methods and contributes something additional that is not yet present in the literature. We have now revised our manuscript to make these points much clearer.

      Other studies that improve on the standard methodology

      This reviewer adds something else, however: A very helpful list of 18 papers which include follow ups and that he believes overcome many of the issues we raise in our paper. To just state our reaction bluntly: We are familiar with every one of these papers (indeed, one of them is a paper by one of us!), and while we think these are all very valuable contributions to the literature, it is our view that none of these 18 papers resolves the worries that led us to conduct our work.  

      Here we briefly comment on the relevant pitfalls in each case. We hope this serves to underscore the importance of our methodological approach.

      (1) Most et al. (2005) Psych Review

      Either a 2-item or 5-item questionnaire was used. The 2-item questionnaire ran as follows:

      (1) On the last trial, did you see anything other than the 4 circles and the 4 squares (anything that had not been present on the original two trials)? Yes No 

      (2) If you did see something on the last trial that had not been present during the original two trials, please describe it in as much detail as possible.

      This clearly does not substantially improve on the traditional simple yes/no question. Moreover, the second question (as well as being open-ended) was used to include additional subjects in the IB group, in that participants were counted as having seen the object only if they responded “yes” to Q1 and in addition “were able to report at least one accurate detail” in response to Q2. In other words, either a subject says “no” (and is treated as unaware), or says “yes” and then is asked to prove their awareness, as it were. If anything, this intensifies the concerns we raise, by inflating IB rates. 

      The 5-item questionnaire looked like this: 

      (1) On the last trial, did you see anything other than the black and white L’s and T’s (anything that had not been present on the first two trials)?

      (2) If you did see something on the last trial that had not been present during the first two trials, please describe it.

      (3) If you did see something on the last trial that had not been present during the first two trials, what color was it? If you did not see something, please guess. (Please indicate whether you did see something or are guessing)

      (4) If you did see something during the last trial that had not been present in the first two trials, please draw an arrow on the “screen” below showing the direction in which it was moving. If you did not see something, please guess. (Please indicate whether you did see something or are guessing)

      (5) If you did see something during the last trial that had not been present during the first two trials, please circle the shape of the object below [4 shapes are presented to choose from]. If you did not see anything, please guess. (Please indicate whether you did see something or are guessing)

      Q5 was not used for analysis purposes. (It suffers from the second issue raised above.) Q1 is the traditional y/n question. Qs 2&3 are open ended. It is unclear how responses to Q4 were analyzed (at the limit it could be considered a helpful, forced-choice question – though it again would suffer from the second issue raised above). However, as noted with respect to the 2-item questionnaire, these responses were not used to exclude people from the IB group but to include people in it. So again, this approach does not in any way address the issues we are concerned about, and if anything, only makes them worse. 

      (2)  Drew et al. (2013) Psych Science

      All follow ups were yes/no: “we asked a series of questions to determine whether they noticed the gorilla: ‘Did the final trial seem any different than any of the other trials?’, ‘Did you notice anything unusual on the final trial?’, and, finally, ‘Did you see a gorilla on the final trial?’”. So, this paper essentially implements the standard methodology we mention (and criticize). 

      (3)  Drew et al. (2016) Journal of Vision

      Follow up questions were used, but the reported procedure does not provide sufficient details to evaluate them (we are only told: “After the final trial, they were asked: ‘On that last trial of the task, did you notice anything that was not there on previous trials?’ They then answered questions about the features of the unexpected stimulus on a separate screen (color, shape, movement, and direction of movement).”). It is not clear that these follow ups were used to exclude any subjects from the analysis. Finally, given that the unexpected object could be the same color as the targets/distractors, it is clear that biases would have been introduced which would need to be considered (but which were not).

      (4)  Simons & Chabris (1999) Perception

      All follow ups were yes/no: “observers were … asked to provide answers to a surprise series of additional questions. (i) While you were doing the counting, did you notice anything unusual on the video? (ii) Did you notice anything other than the six players? (iii) Did you see anyone else (besides the six players) appear on the video? (iv) Did you see a gorilla [woman carrying an umbrella] walk across the screen? After any ‘yes’ response, observers were asked to provide details of what they noticed. If at any point an observer mentioned the unexpected event, the remaining questions were skipped.” As noted previously, the analyses in fact did not use these questions to exclude subjects since answers were so consistent.

      (5)  Simons and Levin (1998) Perception

      This is a change detection paradigm, not a study of inattentional blindness. And in any case, one yes/no follow up was used: “Did you notice that I'm not the same person who approached you to ask for directions?”

      (6)  Chabris et al. (2011) iPerception

      Two yes/no questions were asked: “we asked whether the subjects had seen anything unusual along the route, and then whether they had seen anyone fighting.” It seems that follow up questions (a request to describe the fight) were asked only of those who said yes.

      This is in fact a common procedure – follow up questions only being asked of the “yes” group. As discussed, it is sometimes used to increase rates of IB, compounding the problem we identify in our paper. So this is another example of a follow-up question that makes the problem we identify worse, not better.

      (7) Ward & Scholl (2015) Psych Bulletin and Review

      Two yes/no questions were used: “...observers were asked whether they noticed ‘anything … that was different from the first three trials’ — and if so, to describe what was different. They were then shown the gray cross and asked if they had noticed it—and if so, to describe where it was and how it moved. Only observers who explicitly reported not noticing the cross were counted as ‘nonnoticers’ to be included in the final sample (N = 100).” In each case, combining the traditional noticing question with a request to describe and identify may have induced conservative response biases in the noticing question, since a subject might consider being able to describe or identify the unexpected stimulus a precondition of giving a positive answer to the noticing question.

      (8) Most et al. (2001) Psych Science

      The same 5-item questionnaire discussed above in relation to Most et al. (2005) was used: 

      (1) On the last trial, did you see anything other than the black and white L’s and T’s (anything that had not been present on the first two trials)?

      (2) If you did see something on the last trial that had not been present during the first two trials, please describe it.

      (3) If you did see something on the last trial that had not been present during the first two trials, what color was it? If you did not see something, please guess. (Please indicate whether you did see something or are guessing)

      (4) If you did see something during the last trial that had not been present in the first two trials, please draw an arrow on the “screen” below showing the direction in which it was moving. If you did not see something, please guess. (Please indicate whether you did see something or are guessing)

      (5) If you did see something during the last trial that had not been present during the first two trials, please circle the shape of the object below [4 shapes are presented to choose from]. If you did not see anything, please guess. (Please indicate whether you did see something or are guessing)

      Q5 was not used for analysis purposes. (It suffers from the second issue raised above.) Q1 is the traditional yes/no question. Qs 2&3 are open ended. It is unclear how responses to Q4 were analyzed (at the limit it could be considered a helpful, forced-choice question – though it again would suffer from the second issue raised above). However, as noted with respect to the two item questionnaire in Most et al. 2005, these responses were not used to exclude people from the IB group but to include people in it. So again this approach does not in any way address the issues we are concerned about, and if anything only makes them worse.

      (9) Todd, Fougnie & Marois (2005) Psych Science

      “participants were probed with three questions to determine whether they had detected the critical stimulus ... The first question assessed whether subjects had seen anything unusual during the trial; they responded ‘yes’ or ‘no’ by pressing the appropriate key on the keyboard. The second question asked participants to select which stimulus they might have seen among 12 possible objects and symbols selected from MacIntosh font databases. The third question asked participants to select the quadrant in which the critical stimulus may have appeared by pressing one of four keys, each of which corresponded to one of the quadrants.”

      These follow ups were used to include people in the IB group: “In keeping with previous studies (Most et al., 2001), participants were considered to have detected the critical stimulus successfully if they (a) reported seeing an unexpected stimulus and (b) correctly selected its quadrant location.” In line with our third point about sensitivity, the object identity test transpired to be “too difficult even under full-attention conditions … Thus, performance with this question was not analyzed further.”

      (10) Fougnie & Marois (2007) Psych Bulletin and Review

      Same exact methods and problems as with Todd, Fougnie & Marois (2005) Psych Science, just discussed.

      (11) New and German (2015) Evolution and Human Behaviour

      “After the fourth trial containing the additional experimental stimulus, the participant was asked, “Did you see anything in addition to the cross on that trial?” and which quadrant the additional stimulus appeared in. They were then asked to identify the stimulus in an array which in Experiment 1 included two variants chosen randomly from the spider stimuli and the two needle stimuli. Participants in Experiment 2 picked from all eight stimuli used in that experiment.”

      Our second concern about response biases and the need for appropriate SDT analysis of the 4/8 alternative tasks applies to all these questions. We also note that analyses were only performed on groups separately (those who detected/failed to detect, those who located/failed to locate, and those who identified/failed to identify) and on the group which did all three/failed to do any one of the three. Especially in light of the fact that some subjects could clearly detect the stimulus without being able to identify it (for example), the most stringent test given our concerns (which were not obviously New and German’s comparative concerns) would be to consider the group which could not detect, identify or localize.

      (12) Jackson-Nielsen (2017) Consciousness and cognition

      This is a very interesting example of a follow-up which used a 3-AFC recognition test:

      “participants were immediately asked, ‘which display looks most like what you just saw?’ from 3 alternatives”. However, though such an objective test is definitely to be preferred in our view to an open-ended series of probes, the 3-AFC test administered clearly had issues with response biases, as discussed, and actually yielded significantly below chance performance in one of the experiments.

      (13) Mack et al. (2016) Consciousness and cognition

      The follow ups here were essentially yes/no combined with an assessment of surprise. Participants were asked to enter letters into a box, and if they did so “were immediately asked by the experimenter whether they had noticed anything different about the array on this last trial and if they did not, they were told that there had been no letters and their responses to that news were recorded. Clearly, if they expressed surprise, this would be compelling evidence that they were unaware of the absence of the letters. Those observers who did not enter letters and realized there were no letters present were considered aware of the absence.” So, this again has all of the same problems we identify, considering subjects unaware because they expressed surprise.

      (14) Devue et al. (2009) Perception

      An 8-alternative task was used. The authors were primarily interested in a comparative analysis and so did not use this task to exclude subjects. We note that an 8 alternative task is very demanding – compare the 12-alternative task used in Todd, Fougnie & Marois (2005). There was an attempt to investigate biases in a separate bias trial, however SDT measures were not used.

      (15) Memmert (2014) Cognitive Development

      “After watching the video and stating the number of passes, participants answered four questions (following Simons & Chabris, 1999): (1) While you were counting, did you perceive anything unusual on the video? (2) Did you perceive anything other than the six players? (3) Did you see anyone else (besides the six players) appear on the video? (4) Did you notice a gorilla walk across the screen? After any “yes” reply, children were asked to provide details of what they noticed. If at any point a child mentioned the unexpected event, the remaining questions were omitted.” All of these follow-up questions are yes/no judgments, used to determine awareness in exactly the way we critique as problematic.

      (16) Moore & Egeth (1997) JEP:HPP

      This study (which includes one of us, Egeth, as author) did use forced choice questions. In one case, the question was 2-alternative, in the other it was 4-alternative. In the latter case, SDT would have been appropriate but was not used. In the former case, it may have been that a larger sample would have revealed evidence of sensitivity to the background pattern (as it stood 55% answered the 2-alternative question correctly). Although these results have been replicated, unfortunately the replication in Wood and Simons 2019 used a 6-alternative recognition task and this was not analyzed using SDT. We also note that the task is rather difficult in this study. Wood and Simons report: “Exclusion rates were much higher than anticipated, primarily due to exclusions when subjects failed to correctly report the pattern on the full-attention trial; we excluded 361 subjects, or 58% of our sample.”

      (17) Cohen et al. (2020) Proc Natl Acad Sci

      While this paper improves over a simple yes/no question in some ways, especially in that it used the follow up questions to exclude subjects from the unaware (IB) group, the follow up probes nonetheless remain yes/no questions, subject to response bias, e.g.:

      (1) “Did you notice anything strange or different about that last trial?”

      (2) “If I were to tell you that we did something odd on the last trial, would you have a guess as to what we did?”

      (3) “If I were to tell you we did something different in the second half of the last trial, would you have a guess as to what we did?”

      (4) “Did you notice anything different about the colors in the last scene?”

      Follow up questions of this kind can be especially susceptible to bias, since subjects may be reluctant to “take back” their earlier answers and so be conservative in responding positively to avoid inconsistency or acknowledgement of earlier error. This may explain why such follow up questions can produce remarkable consistency despite their rather different wording. 

      (18) Cohen et al. (2011) Psych Science

      Here are the probes used in this study:

      (1) Did you notice anything different on that trial?

      (2) Did you notice something different about the background stream of images?

      (3) Did you notice that a different type of image was presented in the background that was unique in some particular way?

      (4) Did you see an actual photograph of a natural scene in that stream?

      (5) If I were to tell you that there was a photograph in that stream, can you tell me what it was a photograph of?

      Qs 1-4 are yes/no. Q5 is yes/no with an open-ended response. After this, a 5 or 6-alternative recognition test was administered. So again, this faces the same issues, since y/n questions are subject to bias in the way we have described, and many-alternative tests are more problematic than 2afc tests.

      In summary

      We really appreciate the care that went into compiling this list, and we agree that these papers and the improved methods they contain are relevant. But as hopefully made clear above, the approaches in each of these papers simply don’t solve the foundational issues our critique is aimed at (though they may address other issues). This is why we felt our new approach was necessary. And we continue to feel this way even after reading and incorporating these comments from Dr. Cohen.

      Nevertheless, there is clearly lots for us to do in light of these comments. And so as noted earlier we have now added a very substantial new section to our discussion section to more fairly and completely portray the state of the art in this literature. This is really to our benefit in the end, since we now not only better acknowledge the diverse approaches present, but also set ourselves up to make our novel contribution exceedingly clear.

      Main point 2: Let's imagine for a second that every study did just ask a yes/no question and then would stop. So, the criticism the authors are bringing up is valid (even though I believe it is not). I am not entirely sure that above chance performance on a forced choice task proves that the inattentionally blind can see after all. Could it just be a form of subliminal priming? Could there be a significant number of participants who basically would say something like, "No I did not see anything, and I feel like I am just guessing, but if you want me to say whether the thing was to the left or right, I will just 100% guess"? I know the literature on priming from things like change and inattentional blindness is a bit unclear, but this seems like maybe what is going on. In fact, maybe the authors are getting some of the best priming from inattentional blindness because of their large sample size, which previous studies do not use.

      I'm curious how the authors would relate their studies to masked priming. In masked priming studies, observers say they did not see the target (like in this study) but still are above chance when forced to guess (like in this study). Do the researchers here think that that is evidence of "masked stimuli are truly seen" even if a participant openly says they are guessing?

      We’re grateful to the reviewer for raising this question. As we say in response to Reviewer #1, our primary ambition in the paper is to establish, as our title suggests, residual sensitivity in IB. The ambition is quite neutral as to whether the sensitivity reflects conscious or unconscious processing (i.e. is akin to blindsight as traditionally conceived, or what the reviewer here suggests may be happening in masked priming). Since we were evidently insufficiently clear about this we have revised our manuscript in several places to clarify that we take our data primarily to support the more modest claim that there is residual sensitivity (conscious or unconscious) in the group of subjects who are traditionally classified as inattentionally blind. We believe that this claim has much more solid support in our data than our secondary and tentative suggestion about awareness.

      This said, we do consider masked priming studies to be susceptible to the critique that performance may reflect degraded conscious awareness which is unreported because of conservative response criteria. There is good evidence that response criteria tend to be conservative near threshold (Björkman et al. 1993; see also: Railo et al. 2020), including specifically in masked priming studies (Sand 2016, cited in Phillips 2021). So, we consider it a perfectly reasonable hypothesis that subjects who say they feel they are guessing in fact have conscious access to a degraded signal which is insufficient to reach a conservative response criterion but nonetheless sufficient to perform above chance in 2afc detection. Of course, we appreciate that this hypothesis is controversial, so it is not one we argue for in our paper (though we are happy to share our feelings about it here).
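
      The notion of a conservative response criterion used above can be made concrete with a minimal sketch of the textbook equal-variance signal detection model for a yes/no question. The function name and the hit and false-alarm rates below are hypothetical, chosen only for illustration; they are not data from the manuscript.

```python
from statistics import NormalDist

# Inverse of the standard normal CDF (the "z-transform" of a proportion).
_z = NormalDist().inv_cdf


def yes_no_sdt(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion_c) under the equal-variance Gaussian model.

    d' = z(H) - z(F) measures sensitivity; c = -(z(H) + z(F)) / 2 measures
    bias, with c > 0 meaning a conservative tendency to answer "no".
    """
    d_prime = _z(hit_rate) - _z(false_alarm_rate)
    criterion_c = -0.5 * (_z(hit_rate) + _z(false_alarm_rate))
    return d_prime, criterion_c


# Hypothetical rates (illustration only): few "yes" responses overall.
d_prime, criterion_c = yes_no_sdt(hit_rate=0.30, false_alarm_rate=0.05)
print(f"d' = {d_prime:.2f}, c = {criterion_c:.2f}")
```

      In this illustrative case the observer says "no" on most stimulus-present trials even though sensitivity is positive, which is the pattern a conservative criterion would produce.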

      Main point 3: My last question is about how the authors interpret a variety of inattentional blindness findings. Previous work has found that observers fail to notice a gorilla in a CT scan (Drew et al., 2013), a fight occurring right in front of them (Chabris et al., 2011), a plane on a runway that pilots crash into (Haines, 1991), and so forth. In a situation like this, do the authors believe that many participants are truly aware of these items but simply failed to answer a yes/no question correctly? For example, imagine the researchers made participants choose if the gorilla was in the left or right lung and some participants who initially said they did not notice the gorilla were still able to correctly say if it was in the left or right lung. Would the authors claim "that participant actually did see the gorilla in the lung"? I ask because it is difficult to understand what it means to be aware of something as salient as a gorilla in a CT scan, but say "no" you didn't notice it when asked a yes/no question. What does it mean to be aware of such important, ecologically relevant stimuli, but not act in response to them and openly say "no" you did not notice them?

      Our view is that in such cases, observers may well have a “degraded” percept of the relevant feature (gorilla, plane, fight etc.). But crucially we do not suggest that this percept is sufficient for observers to recognize the object/event as a gorilla, plane, fight etc. Our claim is only that, in our studies at least, observers (as a group) do have enough information about the unexpected stimuli to locate them, and discriminate certain low level features better than chance. Crudely, it may be that subjects see the gorilla simply as a smudge or the plane as a shadowy patch etc. (One of us who is familiar with the gorilla CT scan stimuli notes that the gorilla is in fact rather hard to see even when you know which slide it is on, suggesting that they are not as “salient” as the reviewer suggests!) 

      More precisely, in the paper we write that in our view perhaps “...unattended stimuli are encoded in a partial or degraded way. Here we see a variety of promising options for future work to investigate. One is that unattended stimuli are only encoded as part of ensemble representations or summary scene statistics (Rosenholtz, 2011; Cohen et al., 2016). Another is that only certain basic “low-level” or “preattentive” features (see Wolfe & Utochkin, 2019 for discussion) can enter awareness without attention. A final possibility consistent with the present data is that observers can in principle be aware of individual objects and higher-level features under inattention but that the precision of the corresponding representations is severely reduced. Our central aim here is to provide evidence that awareness in inattentional blindness is not abolished. Further work is needed to characterize the exact nature of that awareness.” We hope this sheds light on our perspective while still being appropriately cautious not to go too far beyond our data.

      Overall: I believe there are many aspects of this set of studies that are innovative and I hope the methods will be used more broadly in the literature. However, I believe the authors misrepresent the field and overstate what can be interpreted from their results. While I am sure there are cases where more nuanced questions might reveal inattentional blindness is somewhat overestimated, claims like "the inattentionally blind can see after all" or "Inattentionally blind subjects consciously perceive these stimuli after all" seem to be incorrect (or at least not at all proven by this data).

      Once again, we would like to thank this reviewer for his feedback, which obviously comes from a place of tremendous expertise on these issues. We appreciate his assessment that our studies are innovative and that our methodological advances will be of use more broadly. We also hear the reviewer loud and clear about the passages in question, which on reflection we agree are not as central to our case as the other claims we make (regarding residual sensitivity and conservative responding), and so we have now edited them accordingly to refocus our discussion on only those claims that are central and supported. Thank you for making our paper stronger!

      Reviewer #3 (Public review):

      Summary:

      Authors try to challenge the mainstream scientific as well as popularly held view that Inattentional Blindness (IB) signifies subjects having no conscious awareness of what they report not seeing (after being exposed to unexpected stimuli). They show that even when subjects indicate NOT having seen the unexpected stimulus, they are at above chance level for reporting features such as location, color or movement of these stimuli. Also, they show that 'not seen' responses are in part due to a conservative bias of subjects, i.e. they tend to say no more than yes, regardless of actual visibility. Their conclusion is that IB may not (always) be blindness, but possibly amnesia, uncertainty etc.

      We just wanted to say that we felt this was a very accurate summary of our claims, and in some ways underscores the modesty we had hoped to convey. This is especially true of the reviewer’s final sentence: “Their conclusion is that IB may not (always) be blindness, but possibly amnesia, uncertainty etc.”; as we noted in response to other reviewers, our claim is not that IB doesn’t exist, that subjects are always conscious of the stimulus, etc.; it is only that the cohort of IB subjects show sensitivity to the unattended stimulus in ways that suggest they are not as blind as traditionally conceived. Thank you for reading us as intended!

      Strengths:

      A huge pool of (25,000) subjects is used. They perform several versions of the IB experiments, both with briefly presented stimuli (as the classic Mack and Rock paradigm), as well as with prolonged stimuli moving over the screen for 5 seconds (a bit like the famous gorilla version), and all these versions show similar results, pointing in the same direction: above chance detection of unseen features, as well as conservative bias towards saying not seen.

      We’re delighted that the reviewer appreciated these strengths in our manuscript!

      Weaknesses:

      Results are all significant but effects are not very strong, typically a bit above chance. Also, it is unclear what to compare these effects to, as there are no control experiments showing what performance would have been in a dual task version where subjects have to also report features etc. for stimuli that they know will appear in some trials.

      The backdrop to the experiments reported here is the “consensus view” (Noah & Mangun, 2020) according to which inattention completely abolishes perception, such that subjects undergoing IB “have no awareness at all of the stimulus object” (Rock et al., 1992) and that “one can have one’s eyes focused on an object or event … without seeing it at all” (Carruthers, 2015). In this context, we think our findings of significant above-chance sensitivity (e.g., d′ = 0.51 for location in Experiment 1; chance, of course, would be d′ = 0 here) are striking and constitute strong evidence against the consensus view. We of course agree that the residual sensitivity is far lower than amongst subjects who noticed the stimulus. For this reason, we certainly believe that inattention has a dramatic impact on perception. To that extent, our data speak in favor of a “middle ground” view on which inattention substantially degrades but crucially does not abolish perception/explicit encoding. We see this as an importantly neglected option in a literature which has overly focused on seen/not seen binaries (see our section ‘Visual awareness as graded’).
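
      To give a sense of scale for a d′ of this size, here is a minimal sketch under the standard equal-variance Gaussian model for an unbiased two-alternative forced-choice task, where d′ = √2 · z(proportion correct). Whether the manuscript computed d′ in exactly this way is an assumption of the sketch, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_std_normal = NormalDist()


def d_prime_2afc(proportion_correct):
    """d' implied by a 2AFC proportion correct (unbiased observer assumed)."""
    return sqrt(2) * _std_normal.inv_cdf(proportion_correct)


def proportion_correct_2afc(d_prime):
    """2AFC proportion correct implied by a given d' (inverse of the above)."""
    return _std_normal.cdf(d_prime / sqrt(2))


# Chance performance (50% correct) corresponds to d' = 0.
chance_d_prime = d_prime_2afc(0.5)

# Proportion correct implied by d' = 0.51 under this model.
pc = proportion_correct_2afc(0.51)
print(f"d' at chance = {chance_d_prime:.2f}, pc at d' = 0.51: {pc:.3f}")
```

      Under these assumptions, d′ = 0.51 corresponds to roughly 64% correct against a chance level of 50%.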

      Regarding the absence of a control condition, we think those conditions wouldn’t have played the same role in our experiments as they typically play in other experiments. As Reviewer #1 comments, the main role of such trials in previous work has been to exclude from analysis subjects who failed to report the unexpected stimulus on the divided and/or full attention control trials. As Reviewer #1 points out, excluding such subjects would very likely have ‘helped’ us. However, the practice is controversial. Indeed, in a review of 128 experiments, White et al. 2018 argue that the practice has “problematic consequences” and “may lead researchers to understate the pervasiveness of inattentional blindness". Since we wanted to offer as simple and demanding a test of residual sensitivity in IB as possible, we thus decided not to use any exclusions, and for that reason decided not to include divided/full attention trials.

      As recommended, we discuss this decision not to include divided/full attention trials and our logic for not doing so in the manuscript. As we explain, not having those conditions makes it more impressive, not less impressive, that we observed the results we in fact did — it makes our results more interpretable, not less interpretable, and so absence of such conditions from our manuscript should not (in our view) be considered any kind of weakness.

      There are quite some studies showing that during IB, neural processing of visual stimuli continues up to high visual levels, for example, Vandenbroucke et al 2014 doi:10.1162/jocn_a_00530 showed preserved processing of perceptual inference (i.e. seeing a kanizsa illusion) during IB. Scholte et al 2006 doi: 10.1016/j.brainres.2005.10.051 showed preserved scene segmentation signals during IB. Compared to the strength of these neural signatures, the reported effects may be considered not all that surprising, or even weak.

      We agree that such evidence of neural processing in IB is relevant to — and perhaps indeed consistent with — our picture, and we’re grateful to the reviewer for pointing out further studies along those lines. Previously, we mentioned a study from Pitts et al., 2012 in which, as we wrote, “unexpected line patterns have been found to elicit the same Nd1 ERP component in both noticers and inattentionally blind subjects (Pitts et al., 2012).” We have added references to both the studies which the reviewer mentions – as well as an additional relevant study – to our manuscript in this context. Thank you for the helpful addition.

      We do however think that our studies are importantly different to this previous work. Our question is whether processing under IB yields representations which are available for explicit report and so would constitute clear evidence of seeing, and perhaps even conscious experience. As we discuss, evidence for this kind of processing remains wanting: “A handful of prior studies have explored the possibility that inattentionally blind subjects may retain some visual sensitivity to features of IB stimuli (e.g., Schnuerch et al., 2016; see also Kreitz et al., 2020, Nobre et al., 2020). However, a recent meta-analysis of this literature (Nobre et al., 2022) argues that such work is problematic along a number of dimensions, including underpowered samples and evidence of publication bias that, when corrected for, eliminates effects revealed by earlier approaches, concluding “that more evidence, particularly from well-powered pre-registered experiments, is needed before solid conclusions can be drawn regarding implicit processing during inattentional blindness” (Nobre et al., 2022).” Our paper is aimed at addressing this question which evidence of neural processing can only speak to indirectly.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      (1) Please report all of the data, especially the number of subjects in each experiment that answered Y/N and the numbers of subjects in each of the Y and N groups that guessed a feature correctly/incorrectly on the 2AFC tasks. And also the confidence ratings for the 2AFC task (for comparison with the confidence ratings on the Y/N questions).

      We now report all this data in our (revised) Supplementary Materials. We agree that this information will be helpful to readers.

      (2) Consider adding a control condition with partial attention (dual task) or full attention (single task) to estimate the rates of seeing the critical stimulus when it's expected.

      This is the only recommendation we have chosen not to implement. The reason, as we explain in detail above (especially in response to Reviewer #1 comment 5), is that this would not in fact be a “control condition” in our studies, and indeed would only inflate the biases we are concerned with in our work. As the referee comments, the main role of such trials in previous work has been to exclude from analysis subjects who failed to report the unexpected stimulus on the divided and/or full attention control trials. And the practice is controversial: Indeed, in a review of 128 experiments, White et al. 2018 argue that the practice has “problematic consequences” and “may lead researchers to understate the pervasiveness of inattentional blindness" (emphasis added). So, our choice not to have such conditions ensures an especially stringent test of our central claim. Not having those conditions (and their accompanying exclusions) makes our results more interpretable, not less interpretable, and so the absence of such conditions from our manuscript should not (in our view) be considered any kind of weakness.

      We have added a paragraph to our “Design and analytical approach” section explaining the logic behind our deliberate decision not to include divided or full attention trials in our experiments. (For even fuller discussion, see our response to Reviewer #1’s comment 5 above.)

      (3) Consider revising the interpretations to be more precise about the distinction between the super subject being above chance versus each individual subject who cannot be at chance or above chance because there was only a single trial per subject.

      We have now done this throughout the manuscript, as discussed above. We have also added a substantive additional discussion to our “Design and analytical approach” section discussing what should be said about individual subjects in light of our group level data.

      This was a very helpful point, and greatly clarifies the claims we wish to make in the paper. Thank you for this comment, which has certainly made our paper stronger.

      Reviewer #2 (Recommendations for the authors):

      I would be curious to hear the authors' response to two points:

      (1) What do they have to say about prior studies that do more than just ask yes/no questions (and ask several follow-ups)? Are those studies "valid"?

      A very substantial new discussion of this important point has been added. As you will see above, we comment on every one of the 18 papers this reviewer raised (as well as the general argument made); we contend that while many of these papers improve on past methodology in various ways, most in fact do “just ask yes/no questions”, and none of them makes the methodological advance we offer in our manuscript. However, this discussion has helped us clarify that very advance, and so working through this issue has really helped us improve our paper and make its relation to existing literature that much clearer. Thank you for raising this crucial point.

      (2) Do the authors think it is possible that in many cases, people are just guessing about a critical item's location or color and this is at least in part a form of priming?

      We have clarified our discussion in numerous places to further emphasize that our main point concerns above-chance sensitivity, not awareness. Given this, we take very seriously the hypothesis that something like priming of a kind sometimes proposed to occur in cases of blindsight or other putative cases of unconscious perception could be what is driving the responses in non-noticers.

      Reviewer #3 (Recommendations for the authors):

      (1) Control dual task version with expected stimuli would be nice

      We have added a paragraph to our “Design and analytical approach” section explaining the logic behind our deliberate decision not to include divided or full attention trials, which would not in fact be a “control” task in our experiments. For full discussion, see our response to Reviewer 3 above, as well as our summary here in the Recommendations for Authors section in responding to Reviewer 1, recommendation (2).

      (2) Please do a better job in discussing and introducing experiments about neural signatures during IB.

      A discussion of Vandenbroucke et al. 2014 and Scholte et al. 2006 has been added to our discussion of neural signatures in IB, as well as an additional reference to an important early study of semantic processing in IB (Rees et al., 1999). Thank you for these very helpful suggestions!

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Dong et al. study the directed cell migration of tracheal stem cells in Drosophila pupae. The migration of these cells which are found in two nearby groups of cells normally happens unidirectionally along the dorsal trunk towards the posterior. Here, the authors study how this directionality is regulated. They show that inter-organ communication between the tracheal stem cells and the nearby fat body plays a role. They provide compelling evidence that Upd2 production in the fat body and JAK/STAT activation in the tracheal stem cells play a role. Moreover, they show that JAK/STAT signalling might induce the expression of apicobasal and planar cell polarity genes in the tracheal stem cells which appear to be needed to ensure unidirectional migration. Finally, the authors suggest that trafficking and vesicular transport of Upd2 from the fat body towards the tracheal cells might be important.

      Strengths:

      The manuscript is well written. This novel work demonstrates a likely link between Upd2-JAK/STAT signalling in the fat body and tracheal stem cells and the control of unidirectional cell migration of tracheal stem cells. The authors show that hid+rpr or Upd2 RNAi expression in the fat body or Dome RNAi, Hop RNAi, or STAT92E RNAi expression in tracheal stem cells results in aberrant migration of some of the tracheal stem cells towards the anterior. Using ChIP-seq as well as analysis of GFP-protein trap lines of planar cell polarity genes in combination with RNAi experiments, the authors show that STAT92E likely regulates the transcription of planar cell polarity genes and some apicobasal cell polarity genes in tracheal stem cells which appear to be needed for unidirectional migration. Moreover, the authors hypothesise that extracellular vesicle transport of Upd2 might be involved in this Upd2-JAK/STAT signalling in the fat body and tracheal stem cells, which, if true, would be quite interesting and novel.

      Overall, the work presented here provides some novel insights into the mechanism that ensures unidirectional migration of tracheal stem cells that prevents bidirectional migration. This might have important implications for other types of directed cell migration in invertebrates or vertebrates including cancer cell migration.

      Weaknesses:

      It remains unclear to what extent Upd2-JAK/STAT signalling regulates unidirectional migration. While there seems to be a consistent phenotype upon genetic manipulation of Upd2-JAK/STAT signalling and planar cell polarity genes, as in the aberrant anterior migration of a fraction of the cells, the phenotype seems to be rather mild, with the majority of cells migrating towards the posterior.

      We agree that the phenotype is mild, as perturbing JAK/STAT signaling in the progenitors specifically affects the coordinated migration of the cells rather than altering their direction or completely blocking migration. Our data indicate that inter-organ communication ensures coordinated behavior of the progenitor cells, although the differential responses exhibited by individual cells represent an interesting unresolved issue that awaits future in-depth investigation.

      While I am not an expert on extracellular vesicle transport, the data presented here regarding Upd2 being transported in extracellular vesicles do not appear to be very convincing.

      We performed additional PLA experiments which support the interaction between Upd2 and the core components of extracellular vesicles (revised Figure 8). Furthermore, we performed electron microscopy to visualize the Lbm-containing vesicles in fat body (Figure 8-figure supplement 1D).

      These data are now provided in the revised manuscript.

      Major comments:

      (1) The graphs showing the quantification of anterior (and in some cases also posterior migration) are quite confusing. E.g. Figure 1F (and 5E and all others): These graphs are difficult to read because the quantification for the different conditions is not shown separately. E.g. what is the migration distance for Fj RNAi anterior at 3h in Fig5E? Around -205micron (green plus all the other colors) or around -70micron (just green, even though the green bar goes to -205micron). If it's -205micron, then the images in C' or D' do not seem to show this strong phenotype. If it's around -70, then the way the graph shows it is misleading, because some readers will interpret the result as -205. Moreover, it's also not clear what exactly was quantified and how it was quantified. The details are also not described in the methods. It would be useful, to mark with two arrowheads in the image (e.g. 5 A' -D') where the migration distance is measured (anterior margin and point zero).

      Overall, it would be better, if the graph showed the different conditions separately. Also, n numbers should be shown in the figure legend for all graphs.

      We apologize for the inappropriate presentation and insufficient description, and thank you for kindly pointing them out. We used different colors to represent different genotypes, and the columns were superimposed. We now show the quantification for the different conditions separately in the revised Figures. The anterior migration distance for Fj RNAi is around 70 µm.

      We now provide a detailed description in the revised methods. For migration distance measurement, we took snapshots at 0 hr, 1 hr, 2 hr and 3 hr, and measured the distance from the starting point (the junction of TC and DT) to the leading edge of progenitor clusters. Velocity was calculated as v = d/t, with d in micrometers and t in minutes. As you kindly suggested, we indicated the anterior margin and point zero in the corresponding panels. We have added n numbers in the legends.
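
      The distance and velocity calculation described above can be sketched as follows. This is an illustration only: the function name and the intermediate distances are hypothetical, and only the roughly 70 µm anterior distance at 3 hr is taken from this response.

```python
def migration_velocity(distance_um, elapsed_min):
    """Mean velocity (micrometers per minute) from distance and elapsed time."""
    if elapsed_min <= 0:
        raise ValueError("elapsed time must be positive")
    return distance_um / elapsed_min


# Snapshot times in hours and hypothetical leading-edge distances (µm)
# measured from the starting point (the TC/DT junction) at each snapshot.
snapshot_hr = [0, 1, 2, 3]
distance_um = [0.0, 20.0, 45.0, 70.0]

# Mean velocity from the starting point to each later snapshot.
velocities = [
    migration_velocity(d, t * 60)
    for t, d in zip(snapshot_hr, distance_um)
    if t > 0
]
print([round(v, 3) for v in velocities])
```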

      (2) Figure 2-figure supplement 1: C-L and M: From these images and graph it appears that Upd2 RNAi results in no aberrant anterior migration. Why is this result different from Figures 2D-F where it does?

      The fat body-expressing lsp2-Gal4 was used in Figure 2-figure supplement 1C-L and Figure 2D-F, while trachea-specific btl-Gal4 was used in Figure 2-figure supplement 1K-L. The lsp2-Gal4-driven but not btl-Gal4-driven upd2 RNAi causes aberrant anterior migration, suggesting that fat body-derived Upd2 plays a role. We have further clarified this in the text.

      (3) Figure 5F: The data on the localisation of planar cell polarity proteins in the tracheal stem cell group is rather weak. Figure 5G and J should at least be quantified for several animals of the same age for each genotype. Is there overall more Ft-GFP in the cells on the posterior end of the cell group than on the opposite side? Or is there a more classic planar cell polarity in each cell with Ft-GFP facing to the posterior side of the cell in each cell? Maybe it would be more convincing if the authors assessed what the subcellular localisation of Ft is through the expression of Ft-GFP in clones to figure out whether it localises posteriorly or anteriorly in individual cells.

      We staged the animals, measured several animals for each genotype and provided the quantifications in the revised manuscript. The level of Ft-GFP is higher in the cells at the frontal edge. We tried to examine the expression of Ft-GFP at the single-cell level. However, this turned out to be technically difficult because the tracheal stem cells are not as regularly arranged as epithelial cells and the proximal-distal axis of the tracheal stem cells remains unclear. We thus decided to measure the fluorescence signal of groups of stem cells along the DT regardless of the polarity within individual cells.

      (4) Regarding the trafficking of Upd2 in the fat body, is it known, whether Grasp65, Lbm, Rab5, and 7 are specifically needed for extracellular vesicle trafficking rather than general intracellular trafficking? What is the evidence for this?

      In our experiments, knocking down rab5, rab7, grasp65 or lbm in trachea using btl-Gal4 did not cause abnormality in the disciplined migration, which excludes their intracellular contribution in the trachea (Figure 7-figure supplement 1). Perturbation of Grasp65 or Lbm in fat body increased intracellular upd2-containing vesicles, indicating that intracellular production is functional (Figure 6J). Grasp65 is specifically required for Upd2 production. Lbm, Rab5 and Rab7 are important for vesicle trafficking. Our conclusion does not pertain to the extracellular or intracellular compartment.

      (5) Figure 8A-B: The data on the proximity of Rab5 and 7 to the Upd2 blobs are not very convincing.

      The confocal images indicate the proximity of Rab5 and Rab7 to the Upd2 vesicles. We interpret the proximity together with the results from Co-IP and PLA data (Figure 8E-K).

      (6) The authors should clarify whether or not their work has shown that "vesicle-mediated transport of ligands is essential for JAK/STAT signaling". In its current form, this manuscript does not appear to provide enough evidence for extracellular vesicle transport of Upd2.

      Lbm belongs to the tetraspanin protein family that contains four transmembrane domains, which are the principal components of extracellular vesicles. We show that Lbm interacts with Upd2. The JAK/STAT signaling depends on the Upd2 in the fat body as well as vesicle trafficking machinery. Furthermore, we performed electron microscopy and show the presence of Lbm-containing vesicles in fat body (Figure 8-figure supplement 1D).

      (7) What is the long-term effect of the various genetic manipulations on migration? The authors don't show what the phenotype at later time points would be, regarding the longer-term migration behaviour (e.g. at 10h APF when the cells should normally reach the posterior end of the pupa). And what is the overall effect of the aberrant bidirectional migration phenotype on tracheal remodelling?

      We observed that the integrity of the tracheal network, especially the dorsal trunk, was impaired, which may be due to incomplete regeneration (Figure 3-figure supplement 1E-I).

      (8) The RNAi experiments in this manuscript are generally done using a single RNAi line. To rule out off-target effects, it would be important to use two non-overlapping RNAi lines for each gene.

      We validated the phenotype using several independent RNAi alleles.

      Reviewer #2 (Public review):

      Summary:

      This work by Dong and colleagues investigates the directed migration of tracheal stem cells in Drosophila pupae, essential for tissue homeostasis. These cells, found in two nearby groups, migrate unidirectionally along the dorsal trunk towards the posterior to replenish degenerating branches that disperse the FGF mitogen. The authors show that inter-organ communication between tracheal stem cells and the neighboring fat body controls this directionality. They propose that the fat body-derived cytokine Upd2 induces JAK/STAT signaling in tracheal progenitors, maintaining their directional migration. Disruption of Upd2 production or JAK/STAT signaling results in erratic, bidirectional migration. Additionally, JAK/STAT signaling promotes the expression of planar cell polarity genes, leading to asymmetric localization of Fat in progenitor cells. The study also indicates that Upd2 transport depends on Rab5- and Rab7-mediated endocytic sorting and Lbm-dependent vesicle trafficking. This research addresses inter-organ communication and vesicular transport in the disciplined migration of tracheal progenitors.

      Strengths:

      This manuscript presents extensive and varied experimental data to show a link between Upd2-JAK/STAT signaling and tracheal progenitor cell migration. The authors provide convincing evidence that the fat body, located near the trachea, secretes vesicles containing the Upd2 cytokine. These vesicles reach tracheal progenitors and activate the JAK-STAT pathway, which is necessary for their polarized migration. Using ChIP-seq, GFP-protein trap lines of planar cell polarity genes, and RNAi experiments, the authors demonstrate that STAT92E likely regulates the transcription of planar cell polarity genes and some apicobasal cell polarity genes in tracheal stem cells, which seem to be necessary for unidirectional migration.

      Weaknesses:

      Directional migration of tracheal progenitors is only partially compromised, with some cells migrating anteriorly and others maintaining their posterior migration.

      Our results suggest that Upd2-JAK/STAT signaling is required for the consistency of disciplined migration. Although only a few tracheal progenitors display anterior migration, these cells lose their commitment to directional movement. We acknowledge that the phenotype is moderate.

      Additionally, the authors do not examine the potential phenotypic consequences of this defective migration.

      We examined the long-term effects of the aberrant migration and observed an impairment of tracheal integrity and melanized tracheal branches (Figure 3-figure supplement 1E-I).

      It is not clear whether the number of tracheal progenitors remains unchanged in the different genetic conditions. If there are more cells, this could affect their localization rather than migration and may change the proposed interpretation of the data.

      We examined the progenitor cell number in bidirectional movement samples and control group. The results show that cell number does not exhibit a significant difference between control and bidirectional movement groups (Figure 3-figure supplement 1).

      Upd2 transport by vesicles is not convincingly shown.

      We performed additional PLA experiments to further support the interaction between Upd2 and the core components of extracellular vesicles. Furthermore, we performed electron microscopy and show the presence of Lbm-containing vesicles in fat body (Figure 8-supplement 1D). Additional experiments such as colocalization and Co-IP assay and better quantification are provided in the revised manuscript (see revised Figure 8).

      Data presentation is confusing and incomplete.

      In the original graphs, we used different colors to represent different genotypes, and the columns were superimposed. We have changed the graphs to show the quantification for each condition separately and revised the data presentation to avoid confusion.

      Reviewer #3 (Public review):

      Summary:

      Dong et al tackle the mechanism leading to polarized migration of tracheal progenitors during Drosophila metamorphosis. This work fits in the stem cell research field and its crucial role in growth and regeneration. While it has been previously reported by others that tracheal progenitors migrate in response to FGF and Insulin signals emanating from the fat body in order to regenerate tracheal branches, the authors identified an additional mechanism involved in the communication of the fat body and tracheal progenitors.

      Strengths:

      The data presented were obtained using a wide range of complementary techniques combining genetics, molecular biology, quantitative, and live imaging techniques. The authors provide convincing evidence that the fat body, found in close proximity to the trachea, secretes vesicles containing the Upd2 cytokine that reach tracheal progenitors leading to JAK-STAT pathway activation, which is required for their polarized migration. In addition, the authors show that genes regulating planar cell polarity are also involved in this inter-organ communication.

      Weaknesses:

      (1) Affecting this inter-organ communication leads to a quite discrete phenotype where polarized migration of tracheal progenitors is partially compromised. The study lacks data showing the consequences of this phenotype on the final trachea morphology, function, and/or regeneration capacities at later pupal and adult stages. This could potentially increase the significance of the findings.

      Regarding your kind suggestion, we examined the long-term effects of the aberrant migration and observed the impairment of tracheal integrity and melanized tracheal branches (Figure 3-figure supplement 1E-I).

      (2) The conclusions of this paper are mostly well supported by data, but some aspects of data acquisition and analysis need to be clarified and corrected, such as recurrent errors in plotting of tracheal progenitor migration distance that mislead the reader regarding the severity of the phenotype.

      In the original graphs, we used different colors to represent different genotypes, and the columns were superimposed. We have changed the graphs to show the quantification for each condition separately. We thank you for kindly pointing it out.

      (3) The number of tracheal progenitors should be assessed since they seem to be found in excess in some genetic conditions that affect their behavior. A change in progenitor number could lead to crowding, thus affecting their localization rather than migration capacities, thereby changing the proposed interpretation. In addition, the authors show data suggesting a reduced progenitor migration speed when the fat body is affected, which would also be consistent with a crowding of progenitors.

      We examined cell number and cell proliferation in the bidirectional movement samples and the control group, and observed no significant difference between the two (Figure 3-figure supplement 2).

      (4) The authors claim that tracheal progenitors display a polarized distribution of PCP proteins that is controlled by JAK-STAT signaling. However, this conclusion is made from a single experiment that is not quantified and for which there is no explanation of how the plot profile measurements were performed. It also seems that this experiment was done only once. Altogether, this is insufficient to support the claim. Finally, a quantification of the number of posterior edges presenting filopodia rather than the number of filopodia at the anterior and posterior leading edges would be more appropriate.

      We staged the animals, measured several animals for each genotype and provided the quantifications in the revised manuscript. The level of Ft-GFP is higher in the cells at the frontal edge. We tried to examine the expression of Ft-GFP at the single-cell level. However, this turned out to be difficult because the tracheal stem cells are not as regularly patterned as epithelial cells and the proximal-distal axis of tracheal stem cells is not well defined. We thus decided to measure the fluorescence signal of groups of stem cells along the DT regardless of their individual polarity.

      (5) The authors demonstrate that Upd2 is transported through vesicles from the fat body to the tracheal progenitors where they propose they are internalized. Since the Upd2 receptor Dome ligand binding sites are exposed to the extracellular environment, it is difficult to envision in the proposed model how Upd2 would be released from vesicles to bind Dome extracellularly and activate the JAK-STAT pathway. Moreover, data regarding the mechanism of the vesicular transport of Upd2 are not fully convincing since the PLA experiments between Upd2 and Rab5, Rab7, and Lbm are not supported by proper positive and negative controls and co-immunoprecipitation data in the main figure do not always correlate to the raw data.

      We used molecular modeling to show that Upd2 and Lbm intermingle, and that Upd2 is not entirely encapsulated in vesicles (Figure 8-supplement 1E). We performed PLA experiments using animals not expressing upd2-Cherry as a negative control (Figure 8 E-J). We corrected the Co-IP panel and apologize for this error.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor comments:

      (1) Figure 1-figure supplement 1: E: How was the migration velocity assessed? By live imaging individual cells or following the cell front of the group? Over what time period? Do the data points in the graph correspond to individual cells or the cell group? It would be important to show confocal images that go along with this quantification.

      We took snapshots of pupae at 0 hr, 1 hr, 2 hr and 3 hr, and measured the distance covered by the migrating progenitor cells from the starting point (the junction of TC and DT) to the leading edge of the progenitor groups. We then calculated the migration rate as v = d (micrometer)/t (min). As the progenitor cells revolve around and migrate along the DT, tracking a single tracheoblast through the intact cuticle is technically challenging. We have therefore measured the leading edge as a proxy for the whole cell group. We agree with you that time-lapse imaging is favorable for analysis of migration.
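      The rate calculation described above can be sketched as follows. This is only an illustration of v = d/t applied to leading-edge distances; the distances, function names and the per-interval breakdown are hypothetical and are not taken from the manuscript.

```python
# Hypothetical leading-edge distances (micrometers) from the TC-DT junction,
# one value per snapshot. These numbers are illustrative, not measured data.
snapshot_times_hr = [0, 1, 2, 3]
leading_edge_um = [0.0, 30.0, 66.0, 90.0]


def migration_rate_um_per_min(times_hr, distances_um):
    """Mean rate v = d / t between the first and last snapshot, in um/min."""
    elapsed_min = (times_hr[-1] - times_hr[0]) * 60
    if elapsed_min <= 0:
        raise ValueError("snapshots must span a positive time interval")
    return (distances_um[-1] - distances_um[0]) / elapsed_min


def interval_rates_um_per_min(times_hr, distances_um):
    """Rate for each consecutive pair of snapshots, in um/min."""
    points = list(zip(times_hr, distances_um))
    return [
        (d1 - d0) / ((t1 - t0) * 60)
        for (t0, d0), (t1, d1) in zip(points, points[1:])
    ]


overall = migration_rate_um_per_min(snapshot_times_hr, leading_edge_um)
per_interval = interval_rates_um_per_min(snapshot_times_hr, leading_edge_um)
```

      Because only the leading edge is measured, the value describes the advance of the group front rather than the speed of any individual cell.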

      (2) Figure 1-figure supplement 1: F: Why is there Gal80ts in the genotype? (and in Figure 1H). Also, what pupal age was used for this quantification?

      Expression of hid and rpr at the L3 stage impaired fat body integrity and adipocyte abundance, and caused lethality. Gal80ts was used to control the expression of rpr.hid. Pupae at 0 hr APF were used in the EdU experiment.

      (3) Figure 2C: what is shown in the 6 columns (why 3 each for control and rpr/hid)?

      We conducted 3 replicates of each group for control and rpr.hid.

      (4) In the methods, several Drosophila stocks are listed as 'source:" from a particular person (e.g. Dr Ma). Please list the real source of this stock, e.g. Bloomington stock number, or the lab and publication in which the stock was originally made.

      We provide the information on these stocks in the revised methods.

      (5) The SKOV3 carcinoma cell and S2 cell work is not described in the methods.

      We added a detailed description of this experiment to the revised Methods section "Cell culture and transfection".

      (6) Figure 6 (F) 'Bar graph plots the abundance of Upd2-mCherry-containing vesicles in progenitors.' What does abundance mean? What was quantified, the number of vesicles, or the mean intensity? This is also not mentioned in the methods.

      We counted the number of Upd2-mCherry-containing vesicles in fat body cells and tracheal progenitors and added a description of the measurement to the Methods.

      (7) There are a few language mistakes throughout the manuscript. E.g.

      (a) Line 117 and other places: Language: 'fat body' should be 'the fat body'.

      We thank you for pointing out these errors and have corrected them accordingly.

      (b) Line 1276 Language mistakes: 'Video 1 3D-view of confocal image stacks of tracheal progenitors and fat body. Scale bar: 100 μm. Genotypes: UAS-mCD8-GFP/+;lsp2-Gal4,P[B123]-RFP-moe/+.' :stacks and genotypes should be singular.

      We fixed these errors and thank you for kindly pointing them out. We also proofread the entire manuscript to ensure accuracy.

      (8) In general, it is hard to figure out the exact genotypes used in experiments. This is mostly not written very clearly in the figure legends. E.g. Figure 2: genotype for A-C missing in figure legend (is B from control animals?)

      We added genotypes to the figure legends. For Figure 2A and C, the genotypes are lsp2-Gal4,P[B123]-RFP-moe/+ for control and UAS-rpr-hid/+;Gal80ts/+;lsp2-Gal4,P[B123]-RFP-moe/+ for rpr.hid; Figure 2B is from control animals.

      Reviewer #2 (Recommendations for the authors):

      Major comments:

      (1) The phenotype resulting from Upd2 downregulation by RNAi is subtle and shown by unconvincing images. In addition, these phenotypes are analyzed using only one RNAi line.

      We used two independent alleles of upd2RNAi from THFC (THU1288 and THU1331), and observed a similar phenotype. For RNAi experiments, we always use multiple independent alleles.

      (2) The authors should analyze the phenotypic consequences of directional migration changes. Is there an effect on tracheal remodeling?

      We observed that the integrity of the tracheal network, especially the dorsal trunk, was impaired and that melanized tracheal branches were present, which may be due to incomplete regeneration (Figure 3-figure supplement 1E-I).

      (3) The number of tracheal progenitors should be quantified, as some genetic conditions may affect cell numbers, as is apparent in some panels.

      We examined cell number and cell proliferation and observed no significant difference between the control and bidirectional movement groups (Figure 3-figure supplement 1).

      (4) The data on PCP protein distribution are unconvincing, unquantified, and insufficient to support one of the main conclusions of the study, which is stated in the abstract: "JAK/STAT signaling promotes the expression of genes involved in planar cell polarity, leading to asymmetric localization of Fat in progenitor cells."

      We staged the animals, measured several animals for each genotype and provided the quantifications in the revised manuscript. The level of Ft-GFP is higher in the cells at the frontal edge. We tried to examine the expression of Ft-GFP at the single-cell level. However, this turned out to be difficult because the tracheal stem cells are not as regularly patterned as epithelial cells and the proximal-distal axis of tracheal stem cells is not well defined. We thus decided to measure the fluorescence signal of groups of stem cells along the DT regardless of their individual polarity.

      Minor comments:

      (1) Language should be revised. In many places in the manuscript, starting in line 113, "fat body" should be "the fat body".

      Thank you for pointing out this error. We corrected it accordingly.

      (2) Genotypes used in experiments should be described.

      We added all the genotypes. We proofread the entire manuscript to complete the figure legends for genotypes.

      (3) Line 67, the reference to "The progenitor cells reside in Tr4 and Tr5 metameres and start to move along the tracheal branch" should include (Chen and Krasnow, Science 2014).

      We added the reference in the manuscript.

      (4) Line 1081, Figure 7 Legend. "Bar graph plots the abundance of Upd2-mCherry-containing vesicles" Abundance is the number of vesicles? The graph displays the average number of vesicles? Please explain and describe the quantification.

      The bar graph represents the number of Upd2-mCherry-containing vesicles in different conditions. We quantified the number of vesicles per area.

      (5) Figure 1 (I-J) What is shown on the panels? Progenitors marked with? This information is not present in the figure or figure legend. Same for Figure 2 (D-E).

      Figure 1I-J show the vector of migrating progenitors. We added the information in the legends. The tracheal cells were labeled by nls-mCherry in Figure 1I-J. In Figure 2D-E, the progenitors were marked with P[B123]-RFP-moe.

      (6) Figure 3 Q, Stat92E-GFP values in the graph are not well-explained. What do the numbers in the y-axis refer to?

      The y-axis represents the intensity of Stat92E-GFP normalized to control. We have changed the y-axis label to ‘normalized Stat92E-GFP intensity’ in the legends.

      (7) In general, figures and figure legends must be revised. Sometimes stainings are not well-defined, some scale bars are missing and plots do not say what the values are.

      We apologize for the inadequate information and have revised the figures and legends accordingly.

      Reviewer #3 (Recommendations for the authors):

      Several points should be addressed by the authors in order to improve their manuscript.

      Major points:

      (1) The phenotype obtained from decreasing the inter-organ signaling is quite discrete. It is further weakened by the fact that the images chosen to illustrate the measures are not really convincing. No image at 1h APF shows any clear anterior migration. Based on the scale, most of the images at 3h APF do not show a striking difference compared to the control, and in any case, stronger phenotypes would be missed anteriorly since they would thus be out of frame. In addition, at 3h APF, progenitors migrating anteriorly from Tr5 position get mixed with those migrating posteriorly from Tr4 so it is not clear how measurements were made. Given that most phenotypes are observed upon the use of RNAis, it is possible that phenotypes are weak due to persistent gene expression. Using null clones for dome, hop, or stat in progenitors could therefore aggravate the phenotypes and support further the significance of the study. Finally, assessing the consequences of compromised fat body-tracheal communication on trachea morphology, function, and regeneration later in pupal development and on adult flies would also help strengthen the importance of the findings.

      We agree with you that anteriorly migrated Tr5 progenitors adjoining Tr4 progenitors hinder measurements and that mutants may give a stronger phenotype than RNAi lines. We measured only Tr4 progenitors (not Tr5) when assessing anterior migration. In addition, we performed experiments using mutant alleles, which gave aberrant migration of tracheal progenitors (Figure 3-figure supplement 1A-D). We can now show that the integrity of the tracheal network, especially the dorsal trunk, was impaired, which may be due to incomplete regeneration (Figure 3-figure supplement 1E-I).

      (2) Although the authors did not observe defects in tracheal progenitor proliferation, progenitors seem to be present in excess in some key genetic background (e.g, upon expression of rpr.hid, statRNAi, Rab-RNAi or in the presence of BFA). This excess could be the result of another mechanism than proliferation (recruitment of extra progenitors since it is not clear how they originate, defect in apoptosis...) and could impact the localization of progenitors, those being pushed anteriorly as a consequence of crowding. A proper characterization of tracheal progenitor number would thus help to discriminate between defects in migration or crowding. This point could also be addressed by performing individual tracking of tracheal progenitors, to find out whether each progenitor is indeed migrating in the wrong direction or if the movement assessed by the global tracking method that is used is just a consequence of progenitor excess.

      We examined the cell number in the bidirectional movement samples and the control group. The results show no significant difference between the control and bidirectional movement groups (Figure 3-figure supplement 1). We also tried to follow every progenitor, but were unable to obtain convincing results with P[B123]-RFP-moe, as tracking a single tracheoblast through the intact cuticle is technically challenging.

      (3) Regarding the ChIP-seq experiment, an explanation of why choosing the "establishment of planar polarity" family should be provided since data indicate a quite low GeneRatio. Indeed, the "cell adhesion" family seems a more obvious candidate, which would be further supported by the fact that the JAK-STAT pathway has been shown to affect cell adhesion components such as ECadherin and FAK (Silver and Montell 2001, Mallart et al 2024). Also, have these known targets of JAK-STAT signaling been found in the ChIP-seq data? Since filopodia polarization is affected in tracheal progenitors when JAK-STAT signaling is decreased, the same question also applies to enabled, which is involved in filopodia formation and has been recently identified as a target of JAK-STAT signaling.

      As you kindly suggested, we tested a number of cell adhesion-related genes such as E-Cadherin (shg), fak, robo2 and enabled (ena). We did not observe an apparent aberrancy in the migration of tracheal progenitors (Figure 5-supplement 1J).

      (4) Data investigating PCP protein distribution is not convincing, not quantified, and not sufficient to draw one of the main conclusions of the study, which is even written in the abstract "JAK/STAT signaling promotes the expression of genes involved in planar cell polarity leading to asymmetric localization of Fat in progenitor cells."

      We better quantified the abundance of Ft in the progenitors at the frontal edge and in those lagging behind. The traces in the figures plot multiple replicates. The level of Ft-GFP is higher in the cells at the frontal edge.

      (5) Overall, the figures together with their caption and/or the material and methods section lack some important information for the reader to fully understand the data. In addition, some errors are found in multiple plots throughout the article and must be corrected. Here are some examples:

      According to your suggestion, we revised legends and methods section to include sufficient information.

      (a) Migration distance plots from Figure 3E do not match the data presented in the source data file. It seems that, when creating the plot, instead of superimposing the bars, bars were stacked. This should be corrected for all migration distance plots from Figure 3E onward, including in supplementary figures.

      We apologize for the misleading representation. We have revised the plots accordingly and now show the quantification for each condition separately.

      (b) The number of analyzed flies and/or clusters of tracheal progenitors from different flies should be stated for all quantification or observations made on images. This information is lacking for all migration distance plots, for progenitor migration tracking (Figure 1 I, J), for DIPF reporter in Figure 2J, for plot profiles (Figure 5G, J), for Upd2-Rab5/Rab7/Lbm co-detections, PLA, CoIP, and lbm-pHluorin experiments. This also applies to RNA seq, ChIP seq, and surface proteomics, for which the number of pupae and number of replicates is not indicated.

      We changed the graphs to show the quantification and n number in different conditions separately.

      We also added the n number of replicates in methods.

      (c) How quantifications were performed is not sufficiently explained. For example, the reference point for migration distance measurement is not defined, and neither is whether the measures were made on fixed or live imaging samples. In fluorescence intensity measurements and Upd2 vesicle counting, information on whether measures were made on a single z slice or on a projection of several z slices should be stated together with what ROI and which FIJI tool for quantification were used. For plot profiles, the same information regarding z slices misses together with how the orientation, the thickness, and the length of the line were chosen, and again the number of times the experiment was conducted should be mentioned and error bars should appear on graphs.

      We thank this reviewer for the suggestions which help clarify the methodology of our experiments and improve presentation of our data. We have made the changes according to the suggestions and modified our methods section and the related figures to incorporate these changes.

      To measure the migration distance of tracheal progenitors, we took snapshots of living pupae at 0 hr, 1 hr, 2 hr and 3 hr APF, and measured the distance from the starting point (the junction of TC and DT) to the leading edge of the progenitor groups.

      For the measurements of fluorescent intensity of Stat92E-GFP and DIPF, we took z-stack confocal images of samples and quantified the fluorescent intensity using FIJI. Specifically, intensity was quantified for regions of interest, using the Analysis and Measurement tools. To quantify Upd2-mCherry vesicles, z-stack confocal images of the fat body were taken and the cell counting function of FIJI was used to measure the vesicle number.

      To quantify the fluorescent intensity of in vivo tagged Ds, Ft and Fj proteins, a single z slice was used. The expression level of the protein was assessed as the integrated fluorescent intensity normalized to area.
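      As a rough illustration of "integrated fluorescent intensity normalized to area" on a single z slice, the following sketch sums pixel values inside a region of interest and divides by the ROI area. The pixel values, the mask, the pixel size and the function name are all hypothetical; the authors performed their measurements in FIJI, and this is not their pipeline.

```python
def intensity_per_area(pixels, roi_mask, pixel_area_um2=1.0):
    """Integrated intensity inside the ROI divided by the ROI area.

    pixels   -- 2D list of intensity values from a single z slice
    roi_mask -- 2D list of the same shape; truthy entries mark ROI pixels
    pixel_area_um2 -- area of one pixel (assumed calibration value)
    """
    integrated = 0.0
    n_pixels = 0
    for pixel_row, mask_row in zip(pixels, roi_mask):
        for value, inside in zip(pixel_row, mask_row):
            if inside:
                integrated += value
                n_pixels += 1
    if n_pixels == 0:
        raise ValueError("ROI mask selects no pixels")
    return integrated / (n_pixels * pixel_area_um2)


# Illustrative 2 x 3 slice; the ROI covers the left two columns.
slice_pixels = [[10, 20, 0], [30, 40, 0]]
slice_mask = [[1, 1, 0], [1, 1, 0]]

per_pixel = intensity_per_area(slice_pixels, slice_mask)
per_um2 = intensity_per_area(slice_pixels, slice_mask, pixel_area_um2=0.25)
```

      Normalizing to area makes ROIs of different sizes comparable, which matters when progenitor clusters differ in extent between genotypes.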

      For the measurement of Ft-GFP distribution, a single z slice of the progenitors immediately proximal to the DT was imaged. A line was drawn along the migration direction from the starting TC-DT junction to the leading front (the length of the line corresponds to the distribution range of the tracheal stem cell clusters). Fluorescent intensity along the line was then calculated automatically with the built-in measurement function of the Zeiss confocal software.

      Minor points:

      (1) In several instances, the authors generalize that stem cells migrate to leave their niche, but this is not the case for all stem cells.

      The phenomenon that stem cells leave their niche when they are activated is commonly observed. We interpreted the general mechanism from our system of tracheal stem cells. We fully agree with you that it may not be the case for all stem cells. We modified the text accordingly.

      (2) Line 122 -a reference paper or an image showing the expression pattern of the lsp2-Gal4 driver is missing.

      We added the reference in the manuscript.

      (3) Line 136 - The term "traces of individual progenitors" is overstated and should be reformulated as the method used does not seem to be individual cell tracking.

      We rephrased accordingly in the revised manuscript.

      (4) Line 146 - Fat body and tracheal progenitors are qualified as interdependent organs, in which aspect do tracheal progenitors affect the fat body?

      Current knowledge suggests close inter-organ crosstalk between the trachea and the fat body. The fly trachea provides oxygen to the body and influences oxidation and metabolism throughout the body. When the trachea is perturbed, the body becomes hypoxic, which causes an inflammatory response in adipose tissue, an important immune organ (Shin et al., 2024).

      (5) Line 163 - Not all the genes tested are cytokines, so the sentence should be reformulated. In addition, in supplementary Fig2-1 C-J, the KD of hh seems to abolish completely tracheal progenitor migration, which is not commented on.

      According to your suggestion, we revised the description of the genes tested. We added comments in the revised manuscript regarding the phenotypes of hh knockdown.

      (6) Line 180 - Conclusion is made on Dome expression while using a dome-Gal4 construct, which does not necessarily recapitulate the endogenous pattern of dome expression, so it should be reformulated. Ideally, dome expression should be assessed in another way. Also, it is not clear whether GFP is present only in progenitors since images are zoomed.

      We revised the statement and provided a larger view of dome>GFP that shows enriched expression in the tracheal progenitors (Figure 2-figure supplement 2E), an expression pattern that is consistent with FlyBase.

      (7) Line 199 - Is it upd-Gal4 or upd2-Gal4 that is used? Since the conclusion of the experiment is made on upd2, the use of upd-gal4 would not be relevant. If upd2-gal4 is used, it should be corrected. In general, the provenance of the Gal4 lines should be provided. In addition, a strong GFP signal in the trachea is visible on the image in Supplementary Figure 2-2F but not commented on and seems contradictory with the conclusion mentioning that fat body and gut are the main source of Upd2 production.

      We removed data obtained from the use of this irrelevant upd-Gal4 line.

      (8) Figures:

      -  Figure 1 G, H - Scale bar is missing.

      We added it accordingly.

      -  Figure 1 I, J - The information on the staining is missing.

      We added it in the revised manuscript.

      -  Figure 2A - Providing explanations of the terms "Count" and "Gene ratio" in the caption would be helpful for readers who are not used to this kind of data. In addition, the color code is confusing since the same color is used for the selected gene family and for high p-values (the same applies to other similar graphs).

      Gene ratio refers to the proportion of genes in a dataset that are associated with a particular biological process, function, or pathway. Count indicates the number of genes from the input gene list that are associated with a specific GO term. Red indicates a smaller p-value and a higher significance.
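      The two quantities defined above can be illustrated with a small sketch. The input gene list and the term-to-gene mapping below are hypothetical and are not real GO annotations or data from the study; note also that some enrichment tools use only the annotated subset of the input list as the denominator, which is an assumption to check for the tool actually used.

```python
def enrichment_summary(input_genes, term_to_genes):
    """Return {term: (count, gene_ratio)} for each term.

    count      -- number of input genes associated with the term
    gene_ratio -- count divided by the size of the input gene list
    """
    input_set = set(input_genes)
    if not input_set:
        raise ValueError("input gene list is empty")
    summary = {}
    for term, members in term_to_genes.items():
        count = len(input_set & set(members))
        summary[term] = (count, count / len(input_set))
    return summary


# Hypothetical input list and illustrative term membership.
input_genes = ["ds", "fj", "fz", "stan", "shg", "Fak", "hid", "rpr"]
term_to_genes = {
    "establishment of planar polarity": ["ds", "fj", "fz", "stan", "Vang", "fat2"],
    "cell adhesion": ["shg", "Fak"],
}

summary = enrichment_summary(input_genes, term_to_genes)
```

      A low gene ratio therefore means that only a small fraction of the input genes map to the term, which is a separate question from whether the term is statistically enriched.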

      -  Figure 2 B, C - What does the color scale represent? What do the columns in C correspond to, different time points, different replicates?

      The color scale represents the normalized expression. The columns in C correspond to different replicates of control and rpr.hid.

      -  Figure 2 F - The error bars on the 3h APF posterior bars are missing.

      We added error bars accordingly.

      -  Figure 2 G - The legend "Down-Stable-Up" is in comparison to what?

      The control group was generated from the reaction without H2O2. The comparison was relative to the control group.

      -  Figure 2 J - The specificity of the DIPF tool that has been created should be validated in other tissues displaying known JAK-STAT activity and/or in conditions of decreased JAK-STAT signaling. In addition, the added value of the tool as compared to the JAK-STAT activity reporter used later, which has been well characterized, is not obvious.

      We added the signal of DIPF in fat body and salivary gland, both of which harbor active JAK/STAT signaling (Figure 2-figure supplement 2F-H). As opposed to the well characterized Stat92E-GFP reporter that assays the downstream transcription activity, the DIPF reporter measures the upstream event of receptor dimerization.

      -  Figure 3 I-P - Reporter tool validation in Images I-L could be moved to supplementary data. In images M-P, staining of nuclei and/or membranes would be useful to assess cell integrity.

      We revised the figures accordingly.

      -  Figure 3Q and similar plots in the following figures do not explain the normalization performed and how it can be higher than 1 in control conditions.

      In these figures, we normalized the signal relative to control groups; e.g., the value of Stat92E-GFP in the btl-GFP control group was set to 1 in the previous Figure 3Q (revised Figure 3-supplementary Figure B-J).
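      One common way to implement this kind of normalization is to divide every measurement by the mean of the control group. The sketch below assumes that scheme, which the response does not spell out; the values and function name are hypothetical. Under this assumption the control mean is 1 by construction, while individual control samples scatter above and below 1.

```python
from statistics import mean


def normalize_to_control(control_values, treated_values):
    """Divide all values by the control mean so the control mean becomes 1."""
    reference = mean(control_values)
    if reference == 0:
        raise ValueError("control mean is zero; cannot normalize")
    normalized_control = [value / reference for value in control_values]
    normalized_treated = [value / reference for value in treated_values]
    return normalized_control, normalized_treated


# Illustrative raw intensities (arbitrary units), not measured data.
control_raw = [80.0, 100.0, 120.0]
treated_raw = [40.0, 50.0, 60.0]

control_norm, treated_norm = normalize_to_control(control_raw, treated_raw)
```

      With these illustrative numbers the highest control sample normalizes to 1.2 even though the group mean is exactly 1.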

      -  Figure 4C - These representations lack explanations to be fully understood by a broad audience.

      The figure shows that Stat92E binding was detected in the promoters and intronic regions (the orange peaks) of genes functioning in distal-to-proximal signaling, such as ds, fj, fz, stan, Vang and fat2. We added this information to the figure legend according to your suggestion.

      -  Figure 5 K,L - What is the x-axis missing, together with the method of tracking used?

      The x-axis refers to the time of recording from a t-stack series with a time interval of 5 min. We revised the Methods section and provided a detailed procedure for this experiment.

      -  Figures 6 and 8- The overall figures lack a wider view of the cells/tissues/organs and/or additional staining to understand what is presented.

      The images show preparations of the fat body; we used high magnification in order to resolve the vesicles. We have now added wider views of the tissues under investigation (e.g. Figure 6-figure supplement 1).

      -  Figure 6 D,E - The scale bar is missing.

      We added it accordingly.

      -  Figure 8 O-S - What is the blue staining?

      The blue staining shows DAPI-stained nuclei. We have added the information in the legend.

      -  PLA experiments can give a lot of non-specific background. What kind of controls have been used in Figure 8 F-J? Negative controls should be done on cells that do not express upd2-mCherry using both antibodies to detect non-specific background, which does not usually appear completely black.

      If possible, a positive control using a known protein interacting with Rab5-GFP should be included.

      In the previous Figure 8, we used control samples lacking one of the primary antibodies. In the revised Figure 8, we conducted the experiment as you suggested, with controls that do not express upd2-mCherry (Figure 8 E-J).

      -  Co-IP experiments - The raw data file for blots is quite hard to read through. Some legends are not facing the right lane and some blots presented in the main figure are difficult to track since several blots are presented in the raw data file. e.g.

      (a)  Raw blot for Figure 8 K: the band for mCherry in the IP anti-GFP blot (lane one in K) is not convincing, it is not distinguishable from other aspecific bands. On the reverse IP presented only in raw data, on the input from blot IB anti-mCherry, both lanes present exactly the same bands at 72kb when one of the lanes corresponds to extract from flies not expressing upd2-mCherry.

      We thank you for pointing out the incorrect labels. We apologize for the errors and have corrected them accordingly.

      (b)  Raw blot for Figure 8 L: on the input blot IB anti-GFP, there is a band corresponding to Rab7-GFP in the lane of the extract from flies not expressing Rab7-GFP.

      We corrected it.

      (c)  Raw data for Figure 8 M: on the last blot, legends are missing above the input IB anti-GFP blot.

      We added the missing legends in the figure.

      Shin, M., Chang, E., Lee, D., Kim, N., Cho, B., Cha, N., Koranteng, F., Song, J.J., and Shim, J. (2024). Drosophila immune cells transport oxygen through PPO2 protein phase transition. Nature 631, 350-359.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) Summary:

      In this manuscript, the model's capacity to capture epistatic interactions through multi-point mutations and its success in finding the global optimum within the protein fitness landscape highlights the strength of deep learning methods over traditional approaches.

      We thank the reviewer for his/her recognition of our model’s potential and advantages.

      (2) Strengths:

      It is impressive that the authors used AI combined with limited experimental validation to achieve such significant enhancements in protein performance. Besides, the successful application of the designed antibody in industrial settings demonstrates the practical and economic relevance of the study. Overall, this work has broad implications for future AI-guided protein engineering efforts.

      We are thankful for the editor’s appreciation of our work, and especially for acknowledging the practical application of our model.

      (3) Weaknesses:

      However, the authors should conduct a more thorough computational analysis to complement their manuscript. While the identification of improved multi-point mutants is commendable, the manuscript lacks a detailed investigation into the mechanisms by which these mutations enhance protein properties. The authors briefly mention that some physicochemical characteristics of the mutants are unusual, but they do not delve into why these mutations result in improved performance. Could computational techniques, such as molecular dynamics simulations, be employed to explore the effects of these mutations?

      We thank the reviewer for this question, which allows us to investigate more deeply the mechanisms by which the mutations enhance the alkali resistance of the protein. Following the reviewer’s suggestion, we have expanded our analysis with molecular dynamics (MD) simulations to understand the impact of the mutations. As an example, we focused on the representative alkali-resistant mutant, A57D;P29T. As shown in Figure S4A, the two-point mutant A57D;P29T has a Tm increase of around 8 ℃ and a much stronger binding affinity than the WT. Our analysis of the MD trajectories indicates that the A57D;P29T mutant has a more rigid structure than the WT, as reflected in its lower root mean squared deviation (RMSD) (Figure S4B). Furthermore, we calculated the root mean squared fluctuation (RMSF) for each residue and found that the mutant displayed less fluctuation at residue 29 but similar flexibility at residue 57. Interestingly, residues at positions 10, 108 and 118, which are spatially distant from residues 29 and 57, exhibited markedly weaker fluctuations in the mutant than in the WT (Figure S4C), implying that a more rigid structure contributes to the mutant’s improved resistance to high temperature and strong alkalinity. However, Figure S4D shows that the AlphaFold3-predicted structures of the WT and the mutant are quite similar.

      To uncover the origin of the change in structural flexibility, we computed the intramolecular interactions, such as salt bridges and hydrogen bonds, for both the WT and the mutant. We observed that the mutations increased the number of hydrogen bonds between the mutation sites and the rest of the protein (Figure S4E). However, the overall structure of the mutant did not change significantly, which is also evident from the solvent-accessible surface area (SASA) analysis (Figure S4F). We also analyzed changes in salt bridges and found that although residue 57 was mutated to aspartate (A57D), no new salt bridges were formed. Additionally, the RMSF results showed that residues 10, 108, and 118 became more rigid, but further analysis revealed no significant change in hydrogen bonds or other interactions in these regions. Overall, the MD results suggest that the additional hydrogen bonds introduced by the A57D;P29T mutations stabilize the protein, leading to the enhanced alkali resistance observed in the mutant. These results are now presented in Figure S4 and discussed in detail in the revised manuscript.
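
      As an illustration of the two flexibility measures used above, the following minimal sketch computes per-frame RMSD and per-residue RMSF with NumPy. It is not the authors' analysis pipeline: the trajectories are synthetic random data, the frames are assumed to be already aligned to the reference, and the names `rmsd_per_frame`, `rmsf_per_residue`, `wt`, and `mut` are hypothetical.

```python
import numpy as np

def rmsd_per_frame(traj, ref):
    """RMSD of each frame against a reference structure.

    traj: array of shape (n_frames, n_atoms, 3)
    ref:  array of shape (n_atoms, 3)
    Frames are assumed to be superimposed on ref beforehand.
    """
    diff = traj - ref
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=1))

def rmsf_per_residue(traj):
    """RMSF of each atom (e.g. one C-alpha per residue) around its mean position."""
    mean_pos = traj.mean(axis=0)
    diff = traj - mean_pos
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=0))

# Synthetic stand-ins for aligned C-alpha trajectories (assumption, not real data):
# the "mutant" is generated with smaller positional noise than the "wild type".
rng = np.random.default_rng(0)
n_frames, n_res = 500, 120
ref = rng.normal(size=(n_res, 3)) * 10.0
wt = ref + rng.normal(scale=1.0, size=(n_frames, n_res, 3))
mut = ref + rng.normal(scale=0.6, size=(n_frames, n_res, 3))

print("mean RMSD, WT vs mutant:",
      rmsd_per_frame(wt, ref).mean(), rmsd_per_frame(mut, ref).mean())

# Negative values mark residues that fluctuate less in the mutant.
delta_rmsf = rmsf_per_residue(mut) - rmsf_per_residue(wt)
print("residues with lower RMSF in mutant:", int((delta_rmsf < 0).sum()), "of", n_res)
```

      In a real analysis the arrays would come from the simulation trajectories after fitting to a reference; a lower mean RMSD and negative per-residue RMSF differences are the kind of signal described for Figures S4B and S4C.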

      Specifically, we have added the following discussion in the main text:

      “In order to gain deeper insights into the mechanisms by which the identified mutations enhance protein properties, we performed molecular dynamics (MD) simulations on the best alkali-resistant mutant. The simulation results revealed several key observations that help explain the observed improvements in protein stability and alkali resistance. As shown in Figure S4A, the two-point mutant A57D;P29T has a Tm increase of around 8 ℃ and a much stronger binding affinity than the WT. Our analysis of the MD trajectories indicates that the A57D;P29T mutant has a more rigid structure than the WT, as reflected in its lower root mean squared deviation (RMSD) (Figure S4B). Furthermore, we calculated the root mean squared fluctuation (RMSF) for each residue and found that the mutant displayed less fluctuation at residue 29 but similar flexibility at residue 57. Interestingly, residues at positions 10, 108 and 118, which are spatially distant from residues 29 and 57, exhibited markedly weaker fluctuations in the mutant than in the WT (Figure S4C), implying that a more rigid structure contributes to the mutant’s improved resistance to high temperature and strong alkalinity. However, Figure S4D shows that the AlphaFold3-predicted structures of the WT and the mutant are quite similar. To uncover the origin of the change in structural flexibility, we computed the intramolecular interactions, such as salt bridges and hydrogen bonds, for both the WT and the mutant. We observed that the mutations increased the number of hydrogen bonds between the mutation sites and the rest of the protein (Figure S4E). However, the overall structure of the mutant did not change significantly, which is also evident from the solvent-accessible surface area (SASA) analysis (Figure S4F). We also analyzed changes in salt bridges and found that although residue 57 was mutated to aspartate (A57D), no new salt bridges were formed. Additionally, the RMSF results showed that residues 10, 108, and 118 became more rigid, but further analysis revealed no significant changes in hydrogen bonds or other interactions in these regions. Taken together, these findings suggest that the enhanced alkali resistance of the mutant is likely due to an overall increase in protein stability, rather than a dramatic change in its structural conformation. The MD simulation results, which are detailed in Figure S4, provide a deeper understanding of how specific mutations can improve protein properties and offer valuable insights for future protein engineering applications.”

      And we also included the following content in the SI:

      “Molecular Dynamics (MD) simulations

      The initial structures for the molecular dynamics (MD) simulations of both the wild type and the mutant were predicted using AlphaFold3. To simulate experimental conditions, each protein was placed in a cubic water box containing 0.1 M NaCl. The CHARMM27 force field and the TIP4P water model were applied throughout the simulations. After an initial energy minimization of 50,000 steps, the systems were heated and equilibrated for 1 ns in the NVT ensemble at 300 K, followed by an additional 1 ns in the NPT ensemble at 1 atm. The production phase then involved 200-ns simulations with periodic boundary conditions, using a 2 fs integration time step. The LINCS algorithm was used to constrain covalent bonds involving hydrogen atoms, while Lennard-Jones interactions were cut off at 10 Å. Electrostatic interactions were computed with the particle mesh Ewald method, using a 10 Å cutoff and a grid spacing of approximately 1.6 Å with a fourth-order spline. Temperature and pressure were regulated by the velocity-rescaling thermostat and the Parrinello-Rahman algorithm, respectively. All simulations were performed using the GROMACS 2020.4 software package. Both systems reached equilibrium according to the root mean squared deviation (RMSD) analysis.”

      (4) Additionally, the authors claim that their method is efficient. However, the selected VHH is relatively short (<150 AA), resulting in lower computational costs. It remains unclear whether the computational cost of this approach would still be acceptable when designing larger proteins (>1000 AA). Besides, the design process involves a large number of prediction tasks, including the properties of both single-site saturation and multi-point mutants. The computational load is closely tied to the protein length and the number of mutation sites. Could the authors analyze the model's capability boundaries in this regard and discuss how scalable their approach is when dealing with larger proteins or more complex mutation tasks?

      In our prior work, we have demonstrated that our method is applicable to larger proteins as well [Jiang et al., Sci. Adv. 10, eadr2641 (2024)]. For instance, when engineering a protein with 1000 amino acids, inferring the fitness of one million mutants using the model on a single 4090 GPU takes approximately 20 hours. However, it remains infeasible to explore all possible mutations when designing multi-point mutants due to the vast space. To address this challenge, we propose the design of a reliable mutant library. In the first round of experiments, we used the model to score all single-point mutations, and then constructed the multi-point mutant library by combining experimentally tested single-point mutations. In this way, even when designing five-point mutants, we only need to score on the order of millions of mutants, making the inference process time-efficient and fully acceptable. As a result, the number of single-point mutations selected for combination into the multi-point mutant library becomes a crucial parameter that affects both inference time and scope. We limited the number of single-point mutations to between 30 and 50 to strike a balance between efficiency and accuracy.
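
      The library-size argument above can be checked with a short calculation. This is a rough sketch under stated assumptions: each selected single-point mutation is taken to sit at a distinct position (so every subset is a valid mutant and the count is an upper bound), and the throughput figure is the one quoted above for a 1000-residue protein on a single GPU. The function name `library_size` is hypothetical.

```python
from math import comb

def library_size(n_singles, max_order, min_order=2):
    """Number of multi-point mutants obtainable by combining n_singles
    single-point mutations, for orders min_order..max_order.
    Assumes every selected mutation is at a different position."""
    return sum(comb(n_singles, k) for k in range(min_order, max_order + 1))

for n in (30, 50):
    print(f"{n} singles: five-point only = {comb(n, 5):,}, "
          f"two- to five-point = {library_size(n, 5):,}")

# Throughput quoted in the response: one million mutants in about 20 hours.
seconds_per_mutant = 20 * 3600 / 1_000_000
est_hours = library_size(50, 5) * seconds_per_mutant / 3600
print(f"about {seconds_per_mutant * 1000:.0f} ms per mutant; "
      f"scoring the largest library at that rate takes roughly {est_hours:.0f} h")
```

      With 50 single-point mutations the five-point combinations alone number 2,118,760, which is consistent with the "order of millions" stated above; with 30 the library shrinks to roughly 10^5 mutants.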

      These results are discussed in the revised manuscript. Specifically, we have added the following discussion to Section 2.2 of the main text:

      “Although the model inference is fast, it is not feasible to explore all possible mutations when designing multi-point mutants due to the exponential increase in the number of potential combinations. To manage this challenge, we constructed a mutant library based on a two-stage design process. In the first stage, we scored all single-point mutations using the model, and in the second stage, we combined experimentally validated single-point mutations to create the multi-point mutant library. This approach ensures that even when designing multi-point mutants (e.g., five-point mutants), the number of mutants to score remains in the millions, which is computationally efficient and practical. The number of single-point mutations selected for the multi-point mutant library is a key factor influencing both the computational load and the scope of the design space. To maintain a balance between efficiency and accuracy, we limited the number of single-point mutations to between 30 and 50. This strategic approach allows us to achieve both scalability and precision in our protein engineering tasks.”

      Reviewer #2 (Public review):

      In this paper, the authors aim to explore whether an AI model trained on natural protein data can aid in designing proteins that are resistant to extreme environments. While this is an interesting attempt, the study's computational contributions are weak, and the design of the computational experiments appears arbitrary.

      The reviewer’s comments give us an opportunity to further state the novelty of this study. Although the AI model has been reported in our previous work [Sci. Adv. 10, eadr2641 (2024)], the unnatural physicochemical properties of proteins have, to the best of our knowledge, never been predicted using AI models. Our preceding work [Sci. Adv. 10, eadr2641 (2024)] demonstrated that the large language model can predict the performance of mutants in terms of thermostability, catalytic activity, binding affinity, etc. However, whether AI models are able to evaluate unnatural properties of mutants remained unexplored. Our work shows that AI models trained on natural proteins can be used to design mutants that resist extreme conditions, such as strong alkalinity, substantially expanding the application of AI in bioengineering. Moreover, our design of the computational experiments was driven by the nature of the task and the availability of experimental data. We employed different strategies for designing single-point and multi-point mutants: a zero-shot approach for single-point mutations, to overcome the scarcity of data, and fine-tuning of the model for multi-point mutations, to leverage the experimental data on single-point mutations.

      (1) The writing throughout the paper is poor. This leaves the reader confused.

      The manuscript has been revised accordingly, and we would be glad to address any points that remain unclear.

      (2) The main technical issue the authors address is whether AI can identify protein mutations that adapt to extreme environments based solely on natural protein data. However, the introduction could be more concise and focused on the key points to better clarify the significance of this question.

      We thank the reviewer for this comment. We have revised the manuscript, particularly the introduction, where we focused on the research questions, methods, and main findings, while removing excessive background information to improve the manuscript’s conciseness and clarity.

      “Protein engineering, situated at the nexus of molecular biology, bioinformatics, and biotechnology, focuses on the design of proteins to introduce novel functionalities or enhance existing attributes[1-3]. With the exponential growth of biological data and computational power, protein engineering has experienced a significant shift towards advanced computational methodologies, particularly deep learning, to expedite the design process and unravel complex protein-function relationships[4-9]. However, a significant challenge in industrial protein engineering is designing proteins with inherent resistance to extreme conditions, such as high temperature and extreme pH environments (acidic or alkaline)[17, 18]. Unlike proteins in natural ecosystems, those used in industrial processes often encounter harsh physical and chemical conditions, necessitating exceptional resilience to maintain functionality[19, 20]. Previous efforts to enhance protein resistance have often relied on rational design and mutant library screening. These methods are typically labor-intensive and inefficient, and they yield limited improvements[23-26]. Moreover, proteins resilient to harsh environments, although in industrial demand, are notably absent from the training datasets of Artificial Intelligence (AI) models. Exploring whether AI can achieve the evolution of protein resistance to extreme environments is crucial for broadening protein applications and improving modification efficiency.

      Recent advances in large-scale protein language models (LLMs) have enabled zero-shot predictions of protein mutants based on self-supervised learning from natural protein sequences. Although AI-guided protein design has been applied to predict mutants with greater thermostability and higher activity[34-36], it remains unexplored whether these models, which are based on natural protein information, can find mutants that adapt to unnatural extreme environments, such as an alkaline solution with a pH above 13.

      Here, we employed an LLM (large language model) developed by our group, the Pro-PRIME model[27], to predict dozens of mutants of a nano-antibody against growth hormone (a VHH antibody), and examined their fitness, including alkali resistance and thermostability, to evaluate their performance under extreme environments.

      We utilized the Pro-PRIME model to score saturated single-point mutations of the VHH in a zero-shot setting, and selected the top 45 mutants for experimental testing. Some mutants exhibited improved alkali resistance, while others demonstrated higher thermal stability or affinity. Subsequently, we fine-tuned the Pro-PRIME model to predict dozens of multi-point mutations. As a result, we obtained three multi-point mutants with enhanced alkali resistance, higher thermostability, and strong affinity to the target protein. In addition, the dynamic binding capacity of the selected mutant did not decline significantly after more than 100 cycles, making it suitable for practical application in industrial production. The selected mutant has been used in practical production and has lowered costs by over one million dollars per year. To the best of our knowledge, this is the first protein product developed by an LLM that has been successfully applied in mass production. Because the Pro-PRIME model achieves precise predictions of multi-point mutations from a small amount of experimental data, our two-round design process involved experimental validation of only 65 mutants in two months, demonstrating remarkably high efficiency. Furthermore, we performed a systematic analysis of these findings and determined that the model can yield more valuable predictive outcomes while remaining consistent with rational design principles. Specifically, within the framework of multi-point combinations, the model's incorporation of negative single-point mutations into the combinatorial space led to exceptional results, showcasing its capacity to capture epistatic interactions. Notably, in striving for the global optimum, deep learning methods offer distinct advantages over traditional rational design approaches.”

      (3) The authors did not develop a new model but instead used their previously developed Pro-PRIME model. This significantly weakens the novelty and contribution of this work.

      While it is true that the Pro-PRIME model was previously developed, the novelty and contribution of this work lie in its application to designing proteins with properties that are absent or rare in nature. In our original work, the Pro-PRIME model was used to optimize proteins for existing, well-established properties, such as thermal stability, enzymatic activity, and affinity. In this study, however, we extended the model’s capabilities to design proteins that are resilient to extreme environments, such as high pH, a property that most natural proteins do not inherently possess. To our knowledge, no existing model has addressed the challenge of engineering alkali-resistant proteins, nor is a relevant dataset available for training such models.

      This shift from optimizing existing characteristics to engineering entirely new properties represents a significant step forward in the field of protein design. By focusing on the design of proteins that can survive and function in harsh, unnatural environments, we have demonstrated the broader applicability of the Pro-PRIME model beyond its initial scope. This expansion of the model's application is a novel contribution that has the potential to accelerate the development of proteins for industrial, agricultural, and biotechnological applications.

      Thus, while the Pro-PRIME model itself is not new, its application to the new challenge of engineering proteins with alkali resistance and other novel properties significantly enhances the impact and novelty of this work. Moreover, this work is groundbreaking not only in terms of the model’s novel application but also because no previous studies have specifically targeted alkali resistance or provided data for training models on such extreme properties. Therefore, our approach is unique, marking a new direction in protein engineering.

      We have made the following revisions to the conclusions section of the manuscript:

      “Through two rounds of evolution, we successfully designed a VHH antibody with strong resistance to extreme environments and enhanced affinity using the Pro-PRIME model. Although few proteins in our pre-training dataset can tolerate extreme pH and saline conditions, the Pro-PRIME model showed impressive performance after supervised learning with limited data, especially in capturing epistatic effects. The analysis of these 65 mutants revealed that the Pro-PRIME model is adept at exploring the large space of protein fitness, being less susceptible to local optima and having greater potential to find the global optimum. Our efficient method of designing mutants that improve multiple properties at once holds promise for the industrial application of proteins. Specifically, the designed VHH antibody has been deployed in practical production, where it significantly enhances the efficiency of the entire production line. While the Pro-PRIME model itself has been reported, this work demonstrates its first application to the challenge of designing proteins with alkali resistance and other extreme properties that are not found in natural proteins; previous studies have neither addressed such applications nor provided data for them. This shift from optimizing existing protein properties to engineering entirely new, unnatural traits is a significant advance in the field. This study shows that AI models such as Pro-PRIME can not only guide the evolution of protein thermal stability, enzymatic activity, ligand affinity, etc., but also enable the development of mutants adapted to harsh unnatural environments, such as extreme pH and concentrated salt, greatly expanding their application. The novelty of this work lies in the ability to design and engineer proteins with novel properties, specifically alkali resistance, which is an unprecedented achievement in AI-assisted protein engineering. The great potential of AI models is expected to significantly accelerate the development of proteins for diverse applications in medicine, agriculture, bioengineering, etc.”

      (4) The computational experiments are not well-justified. For instance, the authors used a zero-shot setting for single-point mutation experiments but opted for fine-tuning in multiple-point mutation experiments. There is no clear explanation for this discrepancy. How does the model perform in zero-shot settings for multiple-point mutations? How would fine-tuning affect single-point mutation results? The choice of these strategies seems arbitrary and lacks sufficient discussion.

      We appreciate the reviewer’s comment regarding the use of zero-shot and fine-tuning settings for single-point and multi-point mutation experiments, and we are grateful for the opportunity to further clarify this aspect of our work.

      In the first round of design, we used the zero-shot approach for single-point mutations because the number of possible single-point mutations is limited, and no prior experimental data was available. In the absence of relevant data, the zero-shot approach allows the model to make predictions based on the learned sequence patterns from the pre-trained protein language model. Given that single-point mutations are relatively fewer in number and computationally feasible to evaluate, the zero-shot approach was deemed appropriate for this task.

      However, when it comes to designing multi-point mutants, the number of potential combinations increases exponentially, making it computationally impractical to explore all possible mutations in a reasonable timeframe. Furthermore, since we had already obtained some experimental data for single-point mutations in the first round, we fine-tuned the model with this data in the second round to improve the accuracy of predictions for multi-point mutants. Fine-tuning helps the model better capture the specific features that contribute to protein functionality, which are critical when dealing with multi-point mutations where multiple residues interact. This allows the model to produce more reliable and targeted predictions for multi-point mutants, ultimately leading to better design outcomes.

      Regarding the model's performance in zero-shot settings for multi-point mutations, we tested this approach, and the results did not align with the experimental data for multi-point mutants. Specifically, the Spearman correlation coefficient between the zero-shot predictions and the experimental results was -0.71; that is, the zero-shot ranking was anticorrelated with the measurements, so zero-shot predictions for multi-point mutations were less accurate than those from the fine-tuned model.
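
      For readers unfamiliar with the metric, the sketch below shows how a Spearman coefficient is obtained: both lists are converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is taken. The scores and measurements are made-up illustrative values, not data from this study, and the helper names `average_ranks` and `spearman` are hypothetical.

```python
import numpy as np

def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        mask = values == v
        ranks[mask] = ranks[mask].mean()
    return ranks

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

# Illustrative values only: a model whose scores mostly invert the measured order.
predicted_score = [0.91, 0.80, 0.75, 0.60, 0.42, 0.30]
measured_fitness = [1.1, 0.9, 1.6, 1.4, 2.0, 1.8]
print("Spearman rho:", round(spearman(predicted_score, measured_fitness), 2))
```

      A coefficient near +1 means the model ranks mutants in the same order as the experiment, near 0 means no relationship, and a negative value such as the one reported above means the ranking is largely inverted.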

      In summary, the choice of using zero-shot for single-point mutations and fine-tuning for multi-point mutations was driven by the nature of the task and the availability of experimental data. Fine-tuning the model improves its predictive performance, especially for more complex multi-point mutation tasks. We have now clarified these choices in the manuscript and have added further discussion on the trade-offs between zero-shot and fine-tuning approaches.

      Specifically, we have added the following discussion to Section 2.2 of the main text:

      “Note that we employed different strategies for designing single-point and multi-point mutants, specifically using a zero-shot approach for single-point mutations and fine-tuning the model for multi-point mutations. These choices were made based on the distinct characteristics of the two tasks and the availability of experimental data. For single-point mutations, the number of possible mutations is relatively limited, and at the outset, there were no experimental data available. In such cases, the zero-shot setting was chosen because it allows the model to predict the fitness of mutants based solely on the information learned during pre-training on a large protein sequence dataset. Since single-point mutations are computationally manageable, this approach was deemed appropriate to generate initial predictions for protein engineering. However, when designing multi-point mutants, the situation changes significantly. The potential combinations of mutations increase exponentially, and without prior data, it becomes computationally infeasible to evaluate every possible combination within a reasonable timeframe. Moreover, by the time we reached the multi-point mutation design stage, experimental data for several single-point mutations had already been obtained. This data enabled us to fine-tune the model to better capture the specific structural and functional features that contribute to protein stability and resistance, especially in the context of multiple interacting mutations. Fine-tuning improves the model’s accuracy by adjusting its parameters to align more closely with the experimental data, ensuring that the predicted multi-point mutants are more likely to meet the desired engineering goals. After the second round of design, the fitness of the mutants was further improved. 
In improving alkali resistance, experimental results showed that 15 of the 45 designed mutants exhibited positive responses, yielding a success rate of 30%, close to the 35% success rate achieved in the second round. Compared to the wild type, the best single-point mutant improved alkali resistance by approximately 44.7%, while the best multi-point mutant achieved a 67.7% increase. For thermal stability enhancement, the success rate in the first round was 77.8%, rising to 100% in the second round. The top single-point mutant exhibited a Tm increase of 6.37°C over the wild type, while the best multi-point mutant had a Tm increase of 10.02°C. We also tested the performance of the zero-shot approach for multi-point mutants, and the results showed that this method did not yield satisfactory predictions. The Spearman correlation coefficient between the zero-shot predictions and experimental results for multi-point mutants was -0.71, indicating a significant discrepancy. This further highlights the importance of fine-tuning the model for multi-point mutations, as the fine-tuned model provided more accurate and reliable results. In summary, the choice of zero-shot for single-point mutations and fine-tuning for multi-point mutations was driven by practical considerations regarding computational feasibility and the availability of experimental data. Fine-tuning the model significantly enhances its predictive performance, particularly for complex multi-point mutations where multiple residues interact. We believe this strategy strikes an optimal balance between computational efficiency and predictive accuracy, making it well-suited for practical protein engineering applications.”

    1. Author response:

      We would like to thank the reviewers and the editors for carefully reading and commenting on our manuscript, and we plan to prepare a revised manuscript. In particular, we want to thank reviewer 2 for spotting a major oversight regarding the use of the TKO (TRiP-CRISPR knockout) and TOE (TRiP-CRISPR Over Expression) systems and the MiMIC alleles. As the reviewer pointed out, these lines were not used as intended; therefore, our results and conclusions regarding the genetic interactions between Pink1 and several of the genes in the paper that we attempted to target (PIG-A, Rab7, Ccz1, CG10646, Mon1, FASN2, CG17712) are incorrect and based on a technical mistake. These results need to be removed from the manuscript.

    1. Author response:

      Reviewer 1:

      Summary: This work presents an Interpretable protein-DNA Energy Associative (IDEA) model for predicting binding sites and affinities of DNA-binding proteins. Experimental results demonstrate that such an energy model can predict DNA recognition sites and their binding strengths across various protein families and can capture the absolute protein-DNA binding free energies.

      We appreciate the reviewer’s careful assessment of the paper, and we thank the reviewer for the insightful suggestions and comments.

      Strengths:

      (1) The IDEA model integrates both structural and sequence information, although such an integration is not completely original. (2) The IDEA predictions seem to have agreement with experimental data such as ChIP-seq measurements.

      We appreciate the reviewer’s comments on the strength of the paper.

      Weaknesses:

      (1) The authors claim that the binding free energy calculated by IDEA, trained using one MAX-DNA complex, correlates well with experimentally measured MAX-DNA binding free energy (Figure 2) based on the reported Pearson Correlation of 0.67. However, the scatter plot in Figure 2A exhibits distinct clustering of the points and thus the linear fit to the data (red line) may not be ideal. As such, the use of the Pearson correlation coefficient that measures linear correlation between two sets of data may not be appropriate and may provide misleading results for non-linear relationships.

      We thank the reviewer for the insightful comments and agree that the linear fit between our predictions and the experimental data may not be ideal. The primary utility of the IDEA model is for assessing the relative binding affinities of different DNA sequences. To further support this, we plan to conduct additional statistical analyses that are independent of the linear correlation assumption but instead focus on the ranked order of DNA sequence binding affinities.
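
      To illustrate why a rank-based statistic sidesteps the linearity concern, the sketch below compares the Pearson coefficient with Kendall's tau on a perfectly monotone but non-linear relationship. The response does not state which rank statistic will be used, so Kendall's tau (in its simple tau-a form, without tie correction) is only one possible choice, and the data are synthetic.

```python
from itertools import combinations
from math import exp, sqrt

def pearson(x, y):
    """Pearson correlation: sensitive to how linear the relationship is."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / all pairs.
    Depends only on the ordering of the values, not their spacing."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Synthetic example: predictions rank the sequences perfectly,
# but the relationship to the measured value is exponential, not linear.
predicted = [1, 2, 3, 4, 5, 6]
measured = [exp(v) for v in predicted]

print("Pearson r  :", round(pearson(predicted, measured), 3))
print("Kendall tau:", kendall_tau(predicted, measured))
```

      Here the ordering is reproduced exactly, so the rank statistic is 1, while the Pearson coefficient is noticeably lower because the points do not fall on a straight line.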

      (2) In the same vein, the linear Pearson Correlation analysis performed in Figure 5A and the conclusion drawn may be misleading.

      We thank the reviewer for the insightful comments. We will perform the same analysis for Figure 5A as detailed in our response to the previous comments.

      (3) The authors included the sequences of the protein and DNA residues that form close contacts in the structure in the training dataset, whereas a series of synthetic decoy sequences were generated by randomizing the contacting residues in both the protein and DNA sequences. In particular, synthetic decoy binders were generated by randomizing either the DNA (1000 sequences) or protein sequences (10,000 sequences) from the strong binders. However, the justification for such randomization and how it might impact the model’s generalizability and transferability remain unclear.

      We thank the reviewer for the insightful comments. We will perform additional analyses to assess the robustness of our model predictions with respect to the number of randomized decoys. Additionally, we will examine how randomization would potentially affect the model’s generalizability and transferability.
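
      As a rough illustration of what a randomized DNA decoy set could look like, the sketch below draws each position uniformly from the four bases. This is only one possible reading of "randomizing"; the actual IDEA procedure may differ (for example, by shuffling rather than resampling). The function name, the seed, and the example binder sequence are hypothetical; only the count of 1000 DNA decoys comes from the reviewer's description.

```python
import random

def random_dna_decoys(binder, n_decoys, seed=0):
    """Generate n_decoys random DNA sequences with the same length as binder.

    Assumption: each base is drawn uniformly from A, C, G, T, and any
    candidate identical to the binder itself is rejected.
    """
    rng = random.Random(seed)  # fixed seed so the decoy set is reproducible
    decoys = []
    while len(decoys) < n_decoys:
        candidate = "".join(rng.choice("ACGT") for _ in binder)
        if candidate != binder:
            decoys.append(candidate)
    return decoys

# Hypothetical strong-binder sequence; 1000 decoys as in the reviewer's summary.
decoys = random_dna_decoys("CACGTG", 1000)
```

      Varying `n_decoys` and the seed in such a setup is one simple way to probe how sensitive a trained model is to the size and composition of the decoy set.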

      (4) The authors performed Receiver Operating Characteristic (ROC) analysis and reported the Area Under the Curve (AUC) scores in order to quantitate the successful identification of the strong binders by IDEA. It would be beneficial to analyze the precision-recall (PR) curve and report the PRAUC metric which could be more robust.

      We agree with Reviewer 1 that more statistical metrics should be used to evaluate our model’s performance. We will include a more robust approach, such as PRAUC, to evaluate our model.
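
      The contrast between ROC AUC and the area under the precision-recall curve can be sketched as follows. The labels and scores are hypothetical (2 strong binders among 10 sequences), and the PR area is computed as average precision, which is one common estimator and not necessarily the one the authors will adopt.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Tied positive/negative pairs count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Assumes no tied scores: averages the precision observed at the rank
    of each positive example.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    total_pos = sum(labels)
    hits, ap = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank
    return ap / total_pos

# Hypothetical ranking: the two true binders sit at ranks 2 and 4 of 10.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
labels = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]

auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
```

      With few positives, the ROC AUC stays high because most negatives are ranked below both binders, whereas the average precision is pulled down by the false positives ranked near the top, which is why the PR-based metric is often preferred for imbalanced sets.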

      Reviewer 2:

      Summary:

      Zhang et al. present a methodology to model protein-DNA interactions via learning an optimizable energy model, taking into account a representative bound structure for the system and binding data. The methodology is sound and interesting. They apply this model for predicting binding affinity data and binding sites in vivo. However, the manuscript lacks discussion of/comparison with state-of-the-art and evidence of broad applicability. The interpretability aspect is weak, yet over-emphasized.

      We appreciate the reviewer’s excellent summary of the paper, and we thank the reviewer for the insightful suggestions and comments.

      Strengths:

      The manuscript is well organized with good visualizations and is easy to follow. The methodology is discussed in detail. The IDEA energy model seems like an interesting way to study a protein-DNA system in the context of a given structure and binding data. The authors show that an IDEA model trained on one system can be transferred to other structurally similar systems. The authors show good performance in discriminating between binding-vs-decoy sequences for various systems, and binding affinity prediction. The authors also show evidence of the ability to predict genome-wide binding sites.

      We appreciate the reviewer’s strong assessment of the strengths of this paper.

      Weaknesses:

      An energy-based model that needs to be optimized for specific systems is inherently an uncomfortable idea. Is this kind of energy model superior to something like Rosetta-based energy models, which are generally applicable? Or is it superior to family-specific knowledge-based models? It is not clear.

      We thank the reviewer for the insightful comments. We will include predictions by generic protein-DNA energy models, such as the Rosetta-based energy model or family-specific knowledge-based model, to compare with our model performance.

      Prediction of binding affinity is a well-studied domain and many competitors exist, some of which are well-used. However, no quantitative comparison to such methods is presented. To understand the scope of the presented method, IDEA, the authors should discuss/compare with such methods (e.g. PMID 35606422).

      We thank the reviewer for the insightful comments. In our initial submission, Figure S5 presents a comparison between our model’s prediction and those of an existing method using 10-fold cross-validation. We agree a more comprehensive comparison with other methods is needed and will include a discussion and comparison of the IDEA model’s performance with additional state-of-the-art models.

      The term “interpretable” has been used lavishly in the manuscript while providing little evidence on the matter. The only evidence shown is the family-specific residue-nucleotide interaction/energy matrix and speculations on how these values are biologically sensible. Recent works already present more biophysical, fine-grained, and sometimes family-independent interpretability (e.g. PMID 39103447, 36656856, 38352411, etc.). The authors should put into context the scope of the interpretability of IDEA among such works.

      We agree that “interpretability” should be discussed in a relevant context. We will discuss the scope of IDEA interpretability within the context of recent works, including those suggested by the reviewers.

      The manuscript disregards subtle yet important differences in commonly used terminology in the field. For example, the authors use the term “specificity” and “affinity” almost interchangeably (for example, the caption for Figure 3A uses “specificity” although the Methods text describes the prediction as about “affinity”). If the authors are looking to predict specificity, IDEA needs to be put in the context of the corresponding state-of-the-art (PMID 36123148, 39103447, 38867914, 36124796, etc).

      We really appreciate the reviewer for pointing out our conflation of “specificity” and “affinity” in the manuscript. To clarify, IDEA’s primary function is to predict the binding affinities of protein-DNA pairs in a sequence-specific manner. The acquired binding affinities of target DNA sequences can then be used to assess the specific binding motifs. We will revise our text to clarify this point.

      It is not clear how much the learned energy model is dependent on the structural model used for a specific system/family. It would be interesting to see the differences in learned model based on different representative PDB structures used. Similarly, the supplementary figures show a lack of discriminative power for proteins like PDX1 (homeodomain family), POU, etc. Can the authors shed some light on why such different performances?

      We thank the reviewer for the insightful comments and agree that the family-specific energy model could provide insight into the model predictions. We will examine different energy models based on the protein family, and especially investigate whether they can explain the lack of discriminative power for certain proteins.

      It is also not clear if IDEA’s prediction for reverse complement sequences is the same for a given sequence. If so, how is this property being modelled? Either this description is lacking or I missed it.

      We thank the reviewer for the insightful comments. The IDEA model treats reverse complementary sequences separately. We will provide additional details on how these sequences are modeled.
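
      To make the reviewer's question concrete, the sketch below computes reverse complements. It does not show how IDEA scores sequences; it only illustrates that a model treating the two strands separately receives two different inputs unless the site is palindromic. The helper name and the non-palindromic example sequence are hypothetical.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

# The canonical E-box CACGTG is palindromic: it equals its reverse complement.
ebox = "CACGTG"
ebox_rc = reverse_complement(ebox)

# A single-base variant is not palindromic, so the two strands differ.
variant = "AACGTG"
variant_rc = reverse_complement(variant)
```

      For a non-palindromic site such as the variant above, a strand-aware model would evaluate the sequence and its reverse complement as two separate inputs.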

      Reviewer 3:

      Summary:

      Protein-DNA interactions and sequence readout represent a challenging and rapidly evolving field of study. Recognizing the complexity of this task, the authors have developed a compact and elegant model. They have applied well-established approaches to address a difficult problem, effectively enhancing the information extracted from sparse contact maps by integrating an artificial decoy sequence set and available experimental data. This has resulted in the creation of a practical tool that can be adapted for use with other proteins.

      We appreciate the reviewer’s excellent summary of the paper, and we thank the reviewer for the insightful suggestions and comments.

      Strengths:

      (1) The authors integrate sparse information with available experimental data to construct a model whose utility extends beyond the limited set of structures used for training. (2) A comprehensive methods section is included, ensuring that the work can be reproduced. Additionally, the authors have shared their model as a GitHub project, reflecting their commitment to transparency of research.

      We appreciate the reviewer’s strong assessment of the strengths of this paper.

      Weaknesses:

      (1) The coarse-graining procedure appears artificial, if not confusing, given that full-atom crystal structures provide more detailed information about residue-residue contacts. While the selection procedure for distance threshold values is explained, the overall motivation for adopting this approach remains unclear. Furthermore, since this model is later employed as an empirical potential for molecular modeling, the use of P and C5 atoms raises concerns, as the interactions in 3SPN are modeled between C<sub>α</sub> and the nucleic base, represented by its center of mass rather than P or C5 atoms.

      We appreciate the reviewer’s insightful comments. The selection of P and C5 atoms will augment our model prediction, but the prediction is robust without this selection scheme. We will provide more details on the motivation behind this selection.

      Regarding the simulation model, we acknowledge a potential disconnection between the coarse-grained level of the 3SPN model (3 coarse-grained sites per nucleotide) and the data-driven model (1 coarse-grained site per nucleotide). The selection of nucleic bases for molecular interactions in the 3SPN model follows the PI’s previous work [PMID: 34057467] and its code implementation. We will test the simulation model by incorporating interactions between C<sub>α</sub> and P atoms. In the future, we will work on implementing IDEA model output for 1-bead-per-nucleotide DNA simulation models.

      (2) Although the authors use a standard set of metrics to assess model quality and predictive power, some ∆∆G predictions compared to MITOMI-derived ∆∆G values appear nonlinear, which casts doubt on the interpretation of the correlation coefficient.

      We thank the reviewer for the insightful comments and agree that the linear fit between our model’s prediction and the experimental data may not be ideal. The primary utility of the IDEA model is for assessing the relative binding affinities of different DNA sequences. To this end, we plan to perform additional statistical analyses that are independent of the linear correlation assumption but instead focus on the ranked order of DNA sequence binding affinities.

      (3) The discussion section lacks information about the model’s limitations and a comprehensive comparison with other models. Additionally, differences in model performance across various proteins and their respective predictive powers are not addressed.

      We thank the reviewer for the insightful comments and will compare the performance of the IDEA model with state-of-the-art methods. We will also perform detailed analyses of the learned energy models across different proteins and examine their correlation with the model’s predictive powers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Hahn et al use bystander BRET, NanoBiT assays, and APEX2 proteomics to investigate endosomal signaling of CCR7 by two agonists, CCL19 and CCL21. The authors suggest that CCR7 signals from early endosomes following internalisation. They use spatial proteomics to try to identify novel interacting partners that may facilitate this signaling and use this data to specifically enhance a Rac1 signaling pathway. Many of the results in the first few figures showing simultaneous recruitment of Barr and G proteins by CCR7 have been shown previously (Laufer et al, 2019, Cell Reports), as has signaling from endomembranes, and Rac1 activation at intracellular sites. The new findings are the APEX2 proteomics studies, which could be useful to the scientific community. Unfortunately, the authors only follow up on a single finding, and the expansion of this section would improve the manuscript.

      First of all, we would like to thank the reviewer for helping with the manuscript. The summary is mostly accurate except for the statement that simultaneous recruitment of barr and G protein to CCR7 has been shown before. It should also be noted that it has not previously been demonstrated that CCR7 activates G proteins from endosomes, nor has the functional role of this signaling mechanism been established. However, as the reviewer correctly points out, the Laufer et al. study did demonstrate that CCR7 activity at endomembranes is associated with Rac1 signaling.

      Strengths:

      (1) The APEX2 resource will be valuable to the GPCR and immunology community. It offers many opportunities to follow up on findings and discover new biology. The resource could also be used to validate earlier findings in the current manuscript and in previous manuscripts. Was there enrichment of early endosomal markers, Barr and Gi as this would provide further evidence for their earlier claims regarding endosomal signaling? Previous studies have suggested signaling from the TGN, so it is possible that the different ligands also direct to different sites. This could easily be investigated using the APEX2 data.

      Thank you for your comment. We do in fact observe enrichment of TGN/Golgi markers in response to chemokine stimulation, which we now have highlighted in the manuscript (fourth paragraph on page 7).

      (2) The results section is well written and can be followed very easily by the reader.

      We are glad that the reviewer found the results section very readable.

      (3) Some findings verify previous studies (e.g. endomembrane signalling). This should be acknowledged as this shows the validity of the findings of both studies.

      This is correct. We have now included more discussion of previous work related to CCR7 signaling at endomembranes (third paragraph on page 10).

      Weaknesses:

      (1) The findings are interesting although the studies are almost all performed in HEK293 cells. I understand that these are commonly used in GPCR biology and are easy to transfect and don't express many GPCRs at high concentrations, but their use is still odd when there are many cell-lines available that express CCR7 and are more reflective of the endogenous state (e.g. they are polarised, they can perform chemotaxis/ migration). Some of the findings within the study should also be verified in more physiologically relevant cells. At the moment only the final figure looks at this, but findings need to be verified elsewhere.

      We thank the reviewer for raising this point and giving us an opportunity to elaborate in further detail. The major goal of our study was to investigate whether CCR7 activates G protein from endosomes, the underlying mechanism, and functions of this potential signaling mechanism. The reason we chose CCR7 as our model receptor was that it belongs to a group of GPCRs, the chemokine receptors, that most often have features associated with the ability to promote endosomal G protein activation (phosphorylation site clusters in the C-terminal region).

      Specific detection of G protein activation at distinct subcellular compartments is currently very challenging in truly endogenous systems despite new innovative biosensors that are available (not just related to CCR7, but GPCRs in general). To our knowledge, most if not all studies that detect direct activation of G protein at a specific compartment, whether at the plasma membrane, endosome, Golgi, or other compartments, have overexpressed either the receptor, G protein, or both. This is why we chose the HEK293 cell system, which is easy to manipulate, for most of our experiments. That being said, we did confirm major findings in an indirect manner using Jurkat T-cells, which express CCR7 endogenously and are physiologically relevant. Our hope is that in the future we will be able to use highly sensitive biosensors to directly confirm our findings in such a cell system as the reviewer wisely suggests.

      (2) The authors acknowledge that the kinetic patterns of the signals at the early endosome are not consistent with the rates of internalisation. They mention that this could be due to trafficking elsewhere. This could be easily looked at in their APEX2 data. Is there evidence of proximity to markers of other membranes? Perhaps this could be added to the discussion. Similarly, previous studies have shown that CCR7 signaling may involve the TGN. Was there enrichment of these markers? If not, this could also be an interesting finding and should be discussed. It is also possible that the Rab5 reporter is just not as efficient as the trafficking one, especially as in later figures the very convincing differences in the two ligands are not as robust as the differences in trafficking.

      Excellent point. We have now highlighted the possibility of CCR7 being further trafficked to the trans-Golgi network (TGN) as a possible explanation for the transient translocation of activated CCR7 to the early endosome in Fig. 1G-H (second paragraph on page 3).

      Furthermore, in the APEX2 experiment we observe enrichment of proteins involved in lysosomal trafficking (LAMP1, VPS16, VAMP7, WDR91, and PP4P1) by CCL19 stimulation at 25 min, and recycling endosomes/TGN markers (SNX6, RAB7L, and GGA) by CCL21 stimulation at 25 min. In addition to this, several markers of TGN/Golgi (SNX3, COG5, YIF1A, SC22B, and AP3S1) were enriched as well in response to both CCL19 and CCL21 stimulation. We have now included a statement in the manuscript, which describes the likely trafficking of CCR7 to the TGN/Golgi in response to CCL19 and CCL21 stimulation (fourth paragraph on page 7).

      (3) In the final sentence of paragraph 2 of the results the authors state that the internalisation is specific to CCR7 as there isn't recruitment to V2R. I'm not sure this is the best control. The authors can only really say it doesn't recruit to unrelated receptors. The authors could have used a different chemokine receptor which does not respond to these ligands to show this.

      The point with this control experiment was to demonstrate that the loss of NanoBiT signal in response to CCL19 in CCR7-SmBiT/LgBiT-CAAX expressing cells, but not in V2R-SmBiT/LgBiT-CAAX expressing cells, was a result of bona fide CCR7 internalization rather than potential artifactual effects of CCL19 on the NanoBiT system. Our intent was not to demonstrate specificity of CCL19 among chemokine receptors, which already has been thoroughly tested in previous studies. We have now modified the sentence (second paragraph on page 3) “Moreover, CCL19/CCL21-stimulation of receptor internalization to endosomes is specific to CCR7 as none of the chemokines promote internalization or trafficking to endosomes of the vasopressin type 2 receptor (V<sub>2</sub>R)-SmBiT construct (Fig. S1E-F)” to “Moreover, CCL19/CCL21-stimulation did not promote internalization or trafficking to endosomes of the vasopressin type 2 receptor (V<sub>2</sub>R)-SmBiT construct, which validates that these chemokines act specifically via the CCR7-SmBiT system (Fig. S1E-F).”

      (4) The miniGi-Barr1 and imaging showing co-localisation could be more convincing if it was also repeated in a more physiological cell line as in the final figure. Imaging of CCR7, miniGi, and Barr1 would also provide further evidence that the receptor is also present within the complex.

      We agree with the reviewer’s assessment. However, as mentioned above, it is currently extremely challenging to detect endogenous G protein coupling/activation to endogenous receptors. In addition, we are not sure if overexpressing fluorophore-tagged receptor, miniG, and barr1 in a physiologically relevant cell line would provide truly physiological conditions, as the expression of these proteins would still be artificially high. This is why we chose to conduct these mechanistic experiments in HEK293 cells and then indirectly verify key findings in an endogenous and physiologically relevant cell line.

      (5) The findings regarding Rac1 are interesting, although an earlier paper found similar results (Laufer et al, 2019, Cell Reports), so perhaps following up on another APEX2-identified protein pathway would have been more interesting. The authors' statement that Rac1 is specifically activated, and RhoA and Cdc42 are not, is unconvincing from the current data. Only a single NanoBiT assay was used, and as raw values are not reported it is difficult for the reader to glean some essential information. The authors should show evidence that these reporters work well for other receptors (or cite previous studies) and also need evidence from an independent (i.e. non-NanoBiT or BRET) assay.

      The major focus of the study was to investigate whether CCR7 can activate G protein after having been internalized into endosomes via formation of CCR7-Gi/o-barr megaplexes, and to dissect out potential functions of said endosomal G protein signaling. To do this, we used CCL19 and CCL21, which stimulate G protein to the same extent but differ in their ability to promote barr recruitment and receptor internalization, with CCL19 being superior to CCL21. We found that CCL19 also promotes endosomal G protein activation to a greater extent than CCL21, and therefore, we specifically looked for proteins enriched by CCL19 in our APEX experiment. This led us to some Rho GTPase regulators that were differentially enriched by CCL19 and CCL21. We agree that there were other interesting effectors related to CCR7 biology identified in the APEX experiment such as EYA2, GRIP2, and EI24. However, those proteins were enriched similarly by CCL19 and CCL21 challenge, and thus do not seem to be activated specifically at endosomes. Following the same argument, we also did not observe any difference in the activity of RhoA or Cdc42 when stimulated with CCL19 or CCL21, so we cannot conclude that these signaling proteins are activated specifically in endosomes. On the other hand, Rac1 was stimulated to a larger degree by CCL19 than CCL21, and its activity was inhibited by the Gi/o inhibitor PTX and the endocytosis inhibitors Dyngo-4a and PitStop2. CCR7-mediated Rac1 signaling was also inhibited by expression of a dominant negative dynamin mutant that inhibits receptor internalization, and Rac1 was not activated by an internalization-deficient CCR7-DS/T mutant. Finally, the involvement of Rac1 in CCR7-mediated chemotaxis of Jurkat T cells was also demonstrated. We believe that these findings together provide a strong basis for the claim that endosomal Gi/o protein signaling by CCR7 activates Rac1.

      Following the reviewer’s suggestion, we have now included experiments to show that the activation of RhoA, Rac1, and Cdc42 by CXCR4 also can be detected by the NanoBiT biosensors (Fig. S7D-F). We have also added the appropriate references to the original studies where these biosensors were developed in the results section (first paragraph on page 8).

      (6) At present, the studies in Figure 7 do not go beyond those in the previous Laufer et al study in which they showed blocking endocytosis affected Rac1 signalling. The authors could show that Rac1 signalling is from early endosomes to improve this, otherwise, it could be from the TGN as previously reported.

      The major purpose of Figure 7 was to indirectly confirm findings from the HEK293 cell experiments and to tie them to physiological functions. Our experiments using Jurkat T-cells show that CCL19 promotes a stronger chemotactic response than CCL21 despite a similar Gi/o response. In addition, we showed that CCR7-mediated Gi/o activation, receptor endocytosis, and Rac1 activity are all required to drive chemotaxis. The Laufer et al. study did not investigate whether CCR7 activates G protein after having been internalized into endosomes via formation of CCR7-Gi/o-barr megaplexes, and thus did not focus on functional outcomes of this signaling mechanism. Based on this, we believe our work provides new and valuable knowledge to the field.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript describes a comprehensive analysis of signalling downstream of the chemokine receptor CCR7. A comprehensive dataset supports the authors' hypothesis that G protein and beta-arrestin signalling can occur simultaneously at CCR7 with implications for continued signalling following receptor endocytosis.

      We would like to thank the reviewer for helping with the manuscript. We agree on all points made and have now updated the manuscript accordingly.

      Strengths:

      The experiments are well controlled and executed, employing a wide range of assays using - in the main - CCR7 transfectants. Data are well presented, with the authors' claims supported by the data. The paper also has an excellent narrative which makes it relatively easy to follow. I think this would certainly be of interest to the readership of the journal.

      We appreciate the positive assessment of strengths.

      Weaknesses:

      Since the authors show a differential enrichment of RhoGTPases by CCR7 stimulation with CCL19 versus CCL21, I think that they also need to show that the Gi/o coupling of HEK-293-CCR7-APEX2 cells to both CCL19 and CCL21 is not perturbed by the modification. Currently, the authors only show data for CCL19 signalling, which leaves the potential for a false negative finding in terms of CCL21 signalling being selectively impaired. This should be relatively easy to do and should strengthen the authors' conclusions.

      We agree with the reviewer and have now included experiments to show that both CCL19- and CCL21-mediated CCR7-APEX2 stimulation leads to Gi/o activation (Fig. S4C). In addition, our proteomics experiments show strong effects of both CCL19 and CCL21 stimulation, which suggest that the receptor is activated by both ligands.

      The authors conclude the discussion by suggesting that their findings highlight endosomal signalling as a general mechanism for chemokine receptors in cell migration. I think this is an overreach. The authors chose several studies of CXC chemokine receptors to support their argument that C-terminal truncation or mutation of the C-terminal phosphorylation sites impairs endocytosis and chemotaxis (refs 40-42). However, in some instances, e.g. at the related chemokine receptor CCR4, C-terminal removal of these sites impairs endocytosis but promotes chemotaxis (Nakagawa et al, 2014; Anderson et al, 2020). I therefore think that either the final statement needs to be tempered down or the counterargument discussed a little.

      We appreciate the reviewer highlighting this point. We have now modified the concluding sentence from “Thus, the findings from our study highlight endosomal G protein signaling by chemokine receptors as a potential general mechanism that regulates key aspects of cell migration” to “Thus, the findings from our study highlight endosomal G protein signaling by some chemokine receptors as a potential mechanism that regulates key aspects of cell migration.” We hope that this more tempered wording is appropriate.

      References:

      Anderson, C. A. et al. A degradatory fate for CCR4 suggests a primary role in Th2 inflammation. J Leukocyte Biol 107, 455-466 (2020).

      Nakagawa, M. et al. Gain-of-function CCR4 mutations in adult T cell leukaemia/lymphoma. Journal of Experimental Medicine 211, 2497-2505 (2014).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The results section is well written, although the introduction needs more information on what is known about CCR7 trafficking and endomembrane signaling. I understand this is because the authors wanted to focus on GPCR signaling, but the study will equally be of interest to researchers in the immunology and chemokine fields, and therefore more CCR7-focussed discussion in the introduction would be useful. Similarly, the discussion would benefit from more discussion of previous studies of CCR7 trafficking and endomembrane signaling (in particular the Laufer et al paper) to acknowledge that many of the findings within this paper verify previous studies.

      We have now included additional immunology/endomembrane background information about CCR7 at the place where the receptor is introduced (first paragraph on page 3). We have also expanded our discussion of our work in relation to the Laufer et al. study (third paragraph on page 10).

      (2) On page 5, the authors state that 'The response to chemokine stimulation was not observed in mock transfected HEK293 cells'. Figure S4D does not have a legend so it is difficult to see what they mean by mock transfected. Do they mean not transfecting with anything or not with the receptor? The better control would be transfecting the reporters but not the receptor. This may have been done, but the wording needs clarifying and S4D needs a legend.

      Thanks for pointing this out. We believe the reviewer refers to Figure S2D and we have now highlighted/clarified the legend better. Mock transfected conditions refer to HEK293 cells transfected with the reporter, but not the receptor. This is written in the legend as “(D) Change in luminescence signal generated between SmBiT-barr1 and LgBiT-miniGi in response to 100 nM CCL19 or 100 nM CCL21 in mock transfected HEK293 cells (no CCR7)”, which we believe should be clear to the audience.

      (3) The validation of the APEX2 receptor construct relies on a single assay with one ligand. The authors should show that the receptor expresses at the cell surface, is internalised normally, and that both ligands activate the receptor.

      We have now included additional data to show that (1) the receptor is expressed at the cell surface, (2) CCR7-APEX2 recruits barr1 to the plasma membrane, (3) this association leads to barr1 translocation to the early endosomes as an indirect measurement of receptor internalization, and (4) both CCL19- and CCL21-stimulation inhibit forskolin-induced cAMP production (Fig. S4A-C, and described in fifth paragraph on page 6).

      (4) The APEX2 section is very short, especially as this is novel data. It lacks some important information, e.g. when the authors state that 'we identified a total of 579 proteins', is this in total for both ligands, separately or were some shared? More information on each ligand separately and combined would make this clearer.

      We have now specified that the total number of proteins identified by our APEX2 approach refers to proteins enriched when the cells are stimulated with either CCL19 or CCL21 (third paragraph on page 7). Furthermore, we have included a Venn diagram in Fig. S5C to show how many proteins were enriched by CCL19 or CCL21 stimulation and how many of those were shared at different time points.

      (5) The discussion would benefit from some further work. The current first two paragraphs just reiterate the introduction and don't discuss the current paper so could be removed completely. The Laufer et al study needs much more discussion as they report many of the findings of the current paper (signaling following endocytosis, Rac1 endomembrane signaling) five years ago. The APEX2 findings that are discussed, though interesting, are not followed up by further experimental evidence and there is little discussion of why the two ligands have different responses or what the physiological effects could be.

      We appreciate the reviewer’s effort in helping with the discussion. To this end, we have now expanded our discussion of the mentioned paper further as suggested (third paragraph on page 10). We agree that the findings from our APEX experiment are interesting, but the focus of this study relates to proteins enriched specifically at endosomes. Several of the most enriched proteins did not show this localization bias, which is why these proteins were not further investigated.

      Minor changes:

      (1) The authors should remove the word 'recent' at the start of the first sentence of the third paragraph. Endosomal signaling by GPCRs was described 15 years ago so cannot really be seen as recent anymore.

      We have now adjusted the manuscript accordingly.

      (2) Tukey defaulted to Turkey in some places.

      We thank the reviewer for pointing out these typos, which now have been corrected.

      Reviewer #2 (Recommendations For The Authors):

      Minor Points:

      (1) ACKRs do not couple to G proteins so it is peculiar to see them in this table. I would limit the table to the conventional CCR1-10, CXCR1-6 and XCR1. The ligand for XCR1 is XCL1 which is absent from the table.

      We have now modified the table accordingly.

      (2) CCL19 (formerly known as ELC) has been long known to be a more efficacious and potent ligand in chemotaxis assays (Bardi et al, 2001). This earlier reference should be added to the citations in the preceding statement on page 10.

      This is an important study showing that CCL19 is more efficacious than CCL21 in promoting chemotaxis and that this has been known for decades. We have now included the reference accordingly (reference 59 in second paragraph on page 11).

      (3) Figure 6, Panel Q. I think the legends for CCR7 and CCR7 delta ST might be flipped.

      We thank the reviewer for pointing out this error. We have now corrected the figure panel.

      (4) Figure S5 (or 5) might benefit from simple Venn diagrams showing the numbers of differentially enriched proteins following treatment with the two ligands at different time points.

      We have included a Venn diagram in Fig. S5C to show how many proteins were enriched by CCL19 or CCL21 stimulation and how many of those were shared.

      Reference:

      Bardi, G., Lipp, M., Baggiolini, M. & Loetscher, P. The T cell chemokine receptor CCR7 is internalized on stimulation with ELC, but not with SLC. European Journal of Immunology 31, 3291-3297 (2001).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Understanding the mechanisms of how organisms respond to environmental stresses is a key goal of biological research. Assessment of transcriptional responses to stress can provide some insights into those underlying mechanisms. The researchers quantified traits, fitness, and gene expression (transcriptional) response to salinity stress (control vs stress treatments) for 130 accessions of rice (three replicates for each accession), which were grown in the field in the Philippines. This experimental design allowed for many different types of downstream analyses to better understand the biology of the system. These analyses included estimating the strength of selection imposed on transcription in each environment, evaluating possible trade-offs in gene expression, testing whether salinity induces transcriptional decoherence, and conducting various eQTL-type analyses.

      Strengths:

      The study provides an extensive analysis of gene expression responses to stress in rice and offers some insights into underlying mechanisms of salinity responses in this important crop system. The fact that the study was conducted under field conditions is a major plus, as the gene expression responses to soil salinity are more realistic than if the study was conducted in a greenhouse or growth chamber. The preprint is generally well-written and the methods and results are mostly well-described.

      Weaknesses:

      While the study makes good use of analyzing the dataset, it is not clear how the current work advances our understanding of gene regulatory evolution or plant responses to soil salinity generally. Overall, the results are consistent with other prior studies of gene expression and studies of selection across environmental conditions. Some of the framing of the paper suggests that there is more novelty to this study than there is in reality. That said, the results will certainly be useful for those working in rice and should be interesting to scientists interested in how gene expression responses to stress occur under field conditions. I detail other concerns I had about the preprint below:

      The abstract on lines 33-35 illustrates some of my concerns about the overstatement of the novelty of the current study. For example, is it really true that the role of gene expression in mediating stress response and adaptation is largely unexplored? There have been numerous studies that have evaluated gene expression responses to stresses in a wide range of organisms. Perhaps, I am missing something critically different about this study. If so, I would recommend that the authors reword this sentence to clarify what gap is being filled by this study. Further, is it really the case that none of them have evaluated how the correlational structure of gene expression changes in response to stresses in plants, as implied in lines 263-265? Don't the various modules and PC analyses of gene expression get at this question?

      We have re-worded these sentences, and highlighted the novelty of our work.

      There were some places in the methods of the preprint that required more information to properly evaluate. For example, more information should be provided on lines 664-668 about how G, E, and GxE effects were established, especially since this is so central to this study. What programs/software (R? SAS? Other?) were used for these analyses? If R, how were the ANOVAs/models fit? What type of ANOVA was used? How exactly was significance determined for each term? Which effects were considered fixed and which were random? If the goal was to fit mixed models, why not use an approach like voom-limma (Law et al. 2014 Genome Biology)? More details should also be added to lines 688-709 about these analyses, including what software/programs were used for these analyses.

      We have added more details in the methods. Also, although we could in principle use voom-limma to fit our mixed model, to be able to partition variance into G, E and G×E, we need to use the function fitExtractVarPartModel (from the package variancePartition), which requires all categorical variables to be modeled as random effects. Therefore, we couldn’t model environment as a fixed effect.

      One thing that I found a bit confusing throughout was the intermixing of different terms and types of selection. In particular, there seemed to be some inconsistencies with the usage of quantitative genetics terms for selection (e.g. directional, stabilizing) vs molecular evolution terms for selection (e.g. positive, purifying). I would encourage the authors to think carefully about what they mean by each of these terms and make sure that those definitions are consistently applied here.

      We have defined the selection terms used in the study and used these terms consistently throughout the manuscript.

      It would be useful to clarify the reasons for the inherent bias in the detection of conditional neutrality (CN) and antagonistic pleiotropy (AP; Lines 187-196). It is also not clear to me what the authors did to deal with the bias in terms of adjusting P-value thresholds for CN and AP the way it is currently written. Further, I found the discussion of antagonistic pleiotropy and conditional neutrality to be a bit confusing for a couple of reasons, especially around lines 489-491. First of all, does it really make sense to contrast gene expression versus local adaptation, when lots of local adaptation likely involves changes in gene expression? Second, the implication that antagonistic pleiotropy is more common for local adaptation than the results found in this study seems questionable. Conditional neutrality appears to be more common for local adaptation as well: see Table 2 of Wadgymar et al. 2017 Methods in Ecology and Evolution. That all said, it is always difficult to conclude that there are no trade-offs (antagonistic pleiotropy) for a particular locus, as the detecting trade-offs may only manifest in some years and not others and can require large sample sizes if they are subtle in effect.

      We have now explained the cause of the inherent bias in the detection of CN, and also elaborated on how we deal with this bias. Also, we have edited our discussion and added relevant citations to indicate both conditional neutrality and antagonistic pleiotropy can lead to local adaptations and added the caveat regarding detecting antagonistic pleiotropy.

      Reviewer #2 (Public Review):

      The authors investigate the gene expression variation in a rice diversity panel under normal and saline growth conditions to gain insight into the underlying molecular adaptive response to salinity. They present a convincing case to demonstrate that environmental stress can induce selective pressure on gene expression, which is in agreement to their earlier study (Groen et al, 2020). The data seems to be a good fit for their study and overall the analytic approach is robust.

      (1) The work started by investigating the effect of genotype and their interaction at each transcript level using 3'-end-biased mRNA sequencing, and detecting a wide-spread GXE effect. Later, using the total filled grain number as a proxy of fitness, they estimated the strength of selection on each transcript and reported stronger selective pressure in a saline environment. However, this current framework relies on precise estimation of fitness and, therefore can be sensitive to the choice of fitness proxy.

      We now acknowledge this caveat in the discussion.

      (2) Furthermore, the authors decomposed the genetic architecture of expression variation into cis- and trans-eQTL in each environment separately and reported more unique environment-specific trans-eQTLs than cis-. The relative contribution of cis- and trans-eQTL depends on both the abundance and effect size. I wonder why the latter was not reported while comparing these two different genetic architectures. If the authors were to compare the variation explained by these two categories of eQTL instead of their frequency, would the inference that trans-eQTLs are primarily associated with expression variation still hold?

      We have now also reported the effect sizes for both cis- and trans-eQTLs in the two environments and showed that the trans-eQTLs have higher effect sizes than cis-eQTLs, indicating that they are able to explain a higher proportion of variation in transcript abundances in the two environments.

      (3) Next, the authors investigated the relationship between cis- and trans-eQTLs at the transcript level and revealed an excess of reinforcement over the compensation pattern. Here, I struggle to understand the motivation for testing the relationship by comparing the effect of cis-QTL with the mean effect of all trans-eQTLs of a given transcript. My concern is that taking the mean can diminish the effect of small trans-eQTLs potentially biasing the relationship towards the large-effect eQTLs.

      We wanted to estimate compensating vs reinforcing effects, which essentially entails identifying genes that have opposing directionality of cis- and trans-effects. To get the total trans-effect, we decided to take the mean effect of trans-eQTLs. This mean was only used to identify the compensating/reinforcing genes, and although taking the mean diminishes the effect of small trans-eQTLs, this mean was not used in downstream analyses.
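
      The classification described in this response can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' actual script: the function name, the input layout, and the "undetermined" label for zero or missing effects are all hypothetical. The convention that same-direction effects are reinforcing and opposite-direction effects are compensating follows the usage in the response above.

```python
from statistics import mean


def classify_cis_trans(cis_effect, trans_effects):
    """Compare the sign of a transcript's cis effect with the sign of the
    mean effect of its trans-eQTLs (hypothetical helper, for illustration).

    Returns "reinforcing" when both act in the same direction,
    "compensating" when they act in opposite directions, and
    "undetermined" when either effect is zero or no trans-eQTL exists.
    """
    if not trans_effects:
        return "undetermined"
    mean_trans = mean(trans_effects)  # total trans-effect summarized as a mean
    if cis_effect == 0 or mean_trans == 0:
        return "undetermined"
    same_direction = (cis_effect > 0) == (mean_trans > 0)
    return "reinforcing" if same_direction else "compensating"
```

      Because only the sign of the mean is used, small trans-eQTLs can still be outweighed by a single large one, which is the concern the reviewer raises.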

      Reviewer #3 (Public Review):

      In this work, the authors conducted a large-scale field trial of 130 indica accessions in normal vs. moderate salt stress conditions. The experiment consists of 3 replicates for each accession in each treatment, making it 780 plants in total. Leaf transcriptome, plant traits, and final yield were collected. Starting from a quantitative genetics framework, the authors first dissected the heritability and selection forces acting on gene expression. After summarizing the selection force acting on gene expression (or plant traits) in each environment, the authors described the difference in gene expression correlation between environments. The final part consists of eQTL investigation and categorizing cis- and trans-effects acting on gene expression.

      Building on the group's previous study and using a similar methodology (Groen et al. 2020, 2021), the unique aspect of this study is in incorporating large-scale empirical field works and combining gene expression data with plant traits. Unlike many systems biology studies, this study strongly emphasizes the quantitative genetics perspective and investigates the empirical fitness effects of gene expression data. The large amounts of RNAseq data (one sample for each plant individual) also allow heritability calculation. This study also utilizes the population genetics perspective to test for traces of selection around eQTL. As there are too many genes to fit in multiple regression (for selection analysis) and to construct the G-matrix (for breeder's equation), grouping genes into PCs is a very good idea.

      Building on large amounts of data, this study conducted many analyses and described some patterns, but a central message or hypothesis would still be necessary. Currently, the selection analysis, transcript correlation structure change, and eQTL parts seem to be independent. The manuscript currently looks like a combination of several parallel works, and this is reflected in the Results, where each part has its own short introduction (e.g., 185-187, 261-266, 349-353). It would be great to discuss how these patterns observed could be translated to larger biological insights. On a related note, since this and the previous studies (focusing on dry-wet environments) use a similar methodology, one would also wonder what the conclusions from these studies would be. How do they agree or disagree with each other?

      We acknowledge that the manuscript currently presents some analyses in a somewhat independent manner. Although it would be ideal to have a central hypothesis/message, our study is meant to broadly outline the various responses and fitness effects of salinity stress in rice. Throughout the manuscript, we have also included comparisons between our findings and that of our previous studies on drought stress to highlight any consistent themes or novel insights.

      Many analyses were done separately for each environment, and results from these two environments are listed together for comparison. Especially for the eQTL part, no specific comparison was discussed between the two environments. It would be interesting to consider whether one could fit the data in more coherent models specifically modeling the X-by-environment effects, where X might be transcripts, PCs, traits, transcript-transcript correlation, or eQTLs.

      We do plan to consider fitting models that explicitly incorporate X-by-environment interactions to provide a more detailed understanding of the genetics of plasticity between the two environments, but it is beyond the scope of this paper. This will be explored in a separate report.

      As stated, grouping genes into PCs is a good idea, but although in theory, the PCs are orthogonal, each gene still has some loadings on each PC (ie. each PC is not controlled by a completely different set of genes). Another possibility is to use any gene grouping method, such as WGCNA, to group genes into modules and use the PC1 of each module. There, each module would consist of completely different sets of genes, and one would be more likely to separate the biological functions of each module. I wonder whether the authors could discuss the pros and cons of these methods.

      We recognize that individual genes can contribute to multiple PCs, and this is precisely why we chose PCA clustering over WGCNA, where one gene can belong to only one module. Our aim was to recognize all biological processes that could be under selection in either environment, and since one gene can be involved in several different processes, we wanted to identify the contribution of these genes to different processes, which can be done effectively by a PCA analysis.

      Reviewer #4 (Public Review):

      The manuscript examines how patterns of selection on gene expression differ between a normal field environment and a field environment with elevated salinity based on transcript abundances obtained from leaves of a diverse panel of rice germplasm. In addition, the manuscript also maps expression QTL (eQTL) that explains variation in each environment. One highlight from the mapping is that a small group of trans-mapping regulators explains some gene expression variation for large sets of transcripts in each environment. The overall scope of the datasets is impressive, combining large field studies that capture information about fecundity, gene expression, and trait variation at multiple sites. The finding related to patterns indicating increased LD among eQTLs that have cis-trans compensatory or reinforcing effects is interesting in the context of other recent work finding patterns of epistatic selection. However, other analyses in the manuscript are less compelling or do not make the most of the value of collected data. Revisions are also warranted to improve the precision with which field-specific terminology is applied and the language chosen when interpreting analytical findings.

      Selection of gene expression:

      One strength of the dataset is that gene expression and fecundity were measured for the same genotypes in multiple environments. However, the selection analyses are largely conducted within environments. The addition of phenotypic selection analyses that jointly analyze gene expression across environments and or selection on reaction norms would be worthwhile.

      We do plan to consider fitting models that explicitly incorporate G×E interactions to provide a more detailed understanding of the genetics of plasticity between the two environments, but it is beyond the scope of this paper. This will be explored in a separate report.

      Gene expression trade-offs:

      The terminology and possibly methods involved in the section on gene expression trade-offs need amendment. I specifically recommend discontinuing reference to the analysis presented as an analysis of antagonistic pleiotropy (rather than more general trade-offs) because pleiotropy is defined as a property of a genotype, not a phenotype. Gene expression levels are a molecular phenotype, influenced by both genotype and the environment. By conducting analyses of selection within environments as reported, the analysis does not account for the fact that the distribution of phenotypic values, the fitness surface, or both may differ across environments. Thus, this presents a very different situation than asking whether the genotypic effect of a QTL on fitness differs across environments, which is the context in which the contrasting terms antagonistic pleiotropy and conditional neutrality have been traditionally applied. A more interesting analysis would be to examine whether the covariance of phenotype with fitness has truly changed between environments or whether the phenotypic distribution has just shifted to a different area of a static fitness surface.

      We recognize that pleiotropy is a property of a genotype, and not of a phenotype, but since our phenotype (gene expression) is strongly coupled with the genotype, we chose to refer to trade-offs as antagonistic pleiotropy. That being said, we did test whether the covariance of gene expression with phenotype significantly varies between environments, and found that to indeed be the case.

      Biological processes under selection / Decoherence: PCs are likely not the most ideal way to cluster genes to generate consolidated metrics for a selection gradient analysis. Because individual genes will contribute to multiple PCs, the current fractional majority-rule method applied to determine whether a PC is under direct or indirect selection for increased or decreased expression comes across as arbitrary and with the potential for double-counting genes. A gene co-expression network analysis could be more appropriate, as genes only belong to one module and one can examine how selection is acting on the eigengene of a co-expression module. Building gene co-expression modules would also provide a complementary and more concrete framework for evaluating whether salinity stress induces "decoherence" and which functional groups of genes are most impacted.

      We recognize that individual genes can contribute to multiple PCs, and this is precisely why we chose PCA clustering over WGCNA, where one gene can belong to only one module. Our aim was to recognize all biological processes that could be under selection in either environment, and since one gene can be involved in several different processes, we wanted to identify the contribution of these genes to different processes, which can be done effectively by a PCA analysis. But again, as pointed out by the reviewer, our PCs did contain a contribution (even if negligible) from each gene, so to identify the ‘primary’ biological processes represented by the PCs, we chose the majority rule. As for testing decoherence, we agree that a co-expression module analysis would have provided additional support to the specific test performed in our manuscript, but since it would just be additional support, we chose not to add it to the manuscript.

      But based on the recommendation of the reviewer(s), we did perform a WGCNA analysis and found a total of 14 and 13 modules in normal and saline conditions, respectively, of which 0 and 2 modules (with no significant GO enrichment) were under directional selection. This supports our reasoning that a module-based approach could miss the identification of processes under selection.

      Selection of traits:

      Having paired organismal and molecular trait data is a strength of the manuscript, but the organismal trait data are underutilized. The manuscript as written only makes weak indirect inferences based on GO categories or assumed gene functions to connect selection at the organismal and molecular levels. Stronger connections could be made for instance by showing a selection of co-expression module eigengene values that are also correlated with traits that show similar patterns of selection, or by demonstrating that GWAS hits for trait variation co-localize to cis-mapping eQTL.

      We did perform a GWAS for all the traits collected in both normal and saline environments, and only found significant hits for fecundity (in both normal and saline environments) and chlorophyll_a content (in the saline environment). But these regions did not overlap with any candidate genes or cis-mapping eQTL. Hence we choose to mention it in the manuscript. Additionally, using the WGCNA modules, we found that the only two modules under selection in the saline environment were not significantly correlated with any of the traits measured.

      Genetic architecture of gene expression variation:

      The descriptive statistics of the eQTL analysis summarize counts of eQTLs observed in each environment, but these numbers are not broken down to the molecular trait level (e.g., what are the median and range of cis- and trans-eQTLs per gene). In addition, genetic architecture is a combination of the numbers and relative effect sizes of the QTLs. It would be useful to provide information about the relative distributions of phenotypic variance explained by the cis- vs. trans- eQTLs and whether those distributions vary by environment. The motivation for examining patterns of cis-trans compensation specifically for the results obtained under high salinity conditions is unclear to me. If the lines sampled have predominantly evolved under low salinity conditions and the hypothesis being evaluated relates to historical experience of stabilizing selection, then my intuition is that evaluating the eQTL patterns under normal conditions provides the more relevant test of the hypothesis.

      We have added the median number of eQTLs per gene in each environment. Additionally, we recognize that genetic architecture is a combination of numbers and effect sizes, and we have added information regarding the effect sizes of eQTLs by type and by environment, as recommended by another reviewer. We did explore the distributions of phenotypic variance explained by the cis- vs. trans-eQTLs as recommended here, and found that trans-eQTLs explain more phenotypic variance than cis-eQTLs in both environments and that the distribution of either type of eQTL does not vary by environment. We have chosen not to add this to the main text due to space limitations. Lastly, we examined the patterns of cis-trans compensation/reinforcement under both normal and salinity conditions and have compared and contrasted the results from both in the main text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Lines 126: I would recommend citing those who originally developed the 3' end targeted RNA sequencing methods (e.g. Meyer et al 2011 Molecular Ecology).

      We have cited the recommended paper.

      Lines 128-130: It would be useful to include a description here of what models were fit to the data to partition out G, E, and GxE effects.

      Due to space limitations, we have added only a brief sentence to this effect.

      Line 139: I would suggest changing "found little" to "no" since the test was not significant.

      The sentence has been modified to say no evidence.

      Line 313: I think you mean directional selection instead of positive selection.

      We have corrected the text.

      Lines 362-363: Would the authors also expect an enrichment of reinforcing genes for most scenarios where that has been divergent selection, such as local adaptation among populations?

      Based on our hypothesis, we would indeed expect an enrichment of reinforcing genes for scenarios of local adaptation where different alleles are maintained in different populations due to local adaptation.

      Reviewer #3 (Recommendations For The Authors):

      Figures 1d-e are not mentioned in the Results.

      The figures have been referenced in appropriate places.

      Lines 41-45: Terms such as reinforcement and compensation need to be explained in this specific context. Also "different selection regimes" is a bit broad and vague.

      Due to the word-count limitation, we have chosen not to elaborate on the terms reinforcement and compensation in the abstract (since these are commonly used in the literature, and we have also defined them in the main text). Additionally, we now explicitly state the selection pressures associated with cis- and trans-eQTLs.

      Table 1: Please explain S and C in the footnote.

      We have added the recommended footnote.

      Figures: Some panel labels (a, b, c...) are mingled with the graphs.

      We have re-made our figures such that the panel labels do not mingle with the plots.

      Lines 588-591: font.

      Modified

      Lines 620-633: Please describe how these RNAseq libraries were allocated/pooled into different sequencing lanes to avoid potential batch effects among sequencing lanes.

      The sequencing was performed on the same Illumina NextSeq 500 machine, and we have added the sequencing library pooling plan to the methods (lines 688-689).

      Lines 690-692: At the beginning of this paragraph, it was mentioned that the un-standardized coefficients were estimated. But here, it seems like the transcript data were already standardized in the data preparation step. What do lines 687-688 refer to? Further standardizing those estimated coefficients so that the whole distribution has mean=0 and sd=1?

      Thank you for pointing out our oversight. We checked our scripts, and data preparation did not include transcript standardization; we have removed the above line from the manuscript.

      Lines 705-711: Please explain why assigning the positive/negative selection status for each gene is important. "Positive selection" here is defined as genes whose increased expression also increases fitness, but traditionally positive selection was defined as "the derived state is favored over the ancestral state". For a gene whose ancestral expression is high but lower expression increases fitness in this experiment, could we also say this gene is under positive selection? Given that we don't know the ancestral state here, maybe the authors could explain whether this definition is necessary. Also, given that many genes positively or negatively regulate each other in a pathway, it is also unclear whether it is necessary to assign the positive/negative status for a PC using the majority rule (lines 710-711).

      We have now defined the different selection terms with respect to our study and use them consistently throughout the manuscript.

      Lines 711-715: If I understand correctly, PCs were used as traits, and by definition PCs should all be orthogonal. Is this section saying only retaining PCs whose correlation < 0.6 with each other? What is the rationale?

      PCA was performed on transcript abundance, and the resulting orthogonal PCs explaining over 0.5% of the variance were all retained for selection analyses.

      We also performed selection analyses on the functional traits measured in the field, but since these functional traits are correlated (and as such would not satisfy the independent variable requirement of regression analyses), we retained only those functional traits which had a Pearson correlation coefficient < 0.6.
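
      The trait-filtering step described above could look roughly like the following sketch. It is not the authors' code: the greedy keep-in-order strategy, the use of the absolute value of the correlation, and all names are assumptions made for illustration. Only the 0.6 threshold comes from the response.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant trait")
    return sxy / sqrt(sxx * syy)


def filter_correlated_traits(traits, threshold=0.6):
    """Greedily keep traits whose |r| with every already-kept trait is
    below the threshold (assumed strategy; hypothetical helper).

    traits: dict mapping trait name -> list of per-plant values.
    """
    kept = []
    for name, values in traits.items():
        if all(abs(pearson(values, traits[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

      With this greedy strategy the result depends on the order in which traits are considered, so in practice one would fix that order deliberately (for example, by biological priority).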

      Line 729: Please briefly describe what CLIP is doing.

      We have added the required description.

      Lines 736-741: The accession numbers do not add up to 125.

      Thank you for catching our oversight. We have edited the text, and now the numbers add up to 125.

      Line 796: Please remind readers where these 247k SNPs come from. Supposedly all accessions have been whole-genome sequenced, so the total number of SNPs should be larger than this.

      We have described how the SNPs were obtained and processed in the lines preceding this. Indeed, the number of SNPs would have been much larger, but the stringent cutoffs and linkage disequilibrium pruning reduced our dataset to about 247k SNPs.

      Lines 154-160: This is a bit confusing. The authors first mentioned, for the raw selection differentials, the mean and variance differ between environments, meaning they are misleading (why?). The next sentence then says non-standardized selection differentials will be used.

      The mean and variance for transcript abundances vary between the two environments. Because traits are usually measured in different scales, it is recommended to standardize trait values using variance or mean before estimating selection coefficients. Multiplying this variance (or mean) standardized selection differential with heritability gives the expected response to selection in standard deviation (or mean) units. But if the trait variance (or mean) varies between traits or environments, it leads to a conflation between the standardized selection differential and trait variance (or mean), which can be misleading. So to avoid this, and given that our traits (transcript abundance in this case) were all measured on the same scale, we chose to not standardize our trait values and estimated raw selection differentials.
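
      The distinction drawn above between raw and standardized selection differentials can be illustrated with a short sketch. It uses the textbook definition of the selection differential as the covariance between relative fitness and the trait, which may differ from the estimator the authors actually used. The function name and the returned keys are hypothetical.

```python
from statistics import mean, pstdev


def selection_differentials(trait, fitness):
    """Raw, variance-standardized and mean-standardized selection
    differentials for one trait (illustrative sketch only).

    The raw differential is S = Cov(w, z), where w is relative fitness
    (absolute fitness divided by its mean) and z is the trait value.
    """
    w_bar = mean(fitness)
    rel_w = [w / w_bar for w in fitness]  # relative fitness, mean = 1
    z_bar = mean(trait)
    raw = mean([(w - 1.0) * (z - z_bar) for w, z in zip(rel_w, trait)])
    return {
        "raw": raw,                                  # in trait units
        "variance_standardized": raw / pstdev(trait),  # in SD units
        "mean_standardized": raw / z_bar,            # in units of the mean
    }
```

      Rescaling the trait changes the raw differential but leaves both standardized versions unchanged, which shows that the standardized values absorb differences in trait variance or mean between environments. That is the conflation the response describes.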

      Figure 1 c-e: Please explain how the horizontal axis values were obtained. Is it assuming these selection differentials have a normal distribution of mean=0 & sd=1?

      Yes, the horizontal axis represents the theoretical quantiles for the selection differentials, assuming they have a normal distribution with mean=0 and sd=1. This has been added to the figure legend.
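
      For readers unfamiliar with how such an axis is built, the following sketch computes standard-normal theoretical quantiles for a quantile-quantile plot. The plotting positions (i - 0.5)/n are an assumption; the response does not say which convention was used, and plotting software may use slightly different positions. The function names are hypothetical.

```python
from statistics import NormalDist


def normal_theoretical_quantiles(n):
    """Quantiles of a standard normal (mean=0, sd=1) at the plotting
    positions (i - 0.5) / n for i = 1..n (assumed convention)."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def qq_points(values):
    """Pair each sorted observed value (e.g. a selection differential)
    with its theoretical quantile, giving (x, y) points for a Q-Q plot."""
    return list(zip(normal_theoretical_quantiles(len(values)), sorted(values)))
```

      Points that fall away from a straight line in such a plot indicate that the observed selection differentials deviate from the normal distribution assumed for the horizontal axis.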

      Line 162-168: Please clarify this part. What does “general trend towards stronger positive compared to negative selection on gene expression” mean? Does it mean the whole distribution of S is significantly different from 0, the difference in the number of genes in the S>0 vs S<0 category, or the slightly higher median |S| in the S>0 vs S<0 category? If it is the last one, are the small differences biologically meaningful (0.053 vs. 0.047 for control & 0.051 vs. 0.050 for salt conditions), given that the authors defined |S|<0.1 as neutral?

      By “general trend towards stronger positive compared to negative selection on gene expression”, we mean that more transcripts were under positive directional selection as compared to negative directional selection. We have also clarified this in the text now.

      Line 177-178: This sentence implies disruptive selection is more important than stabilizing selection in the saline environment, but the test was not significant (line 176).

      Although there was no significant difference in the magnitude of stabilizing vs disruptive selection within the saline environment, the number of transcripts experiencing stronger disruptive selection in the saline condition was greater than the number of transcripts experiencing disruptive selection in the normal conditions. And so comparing between conditions, disruptive selection plays an important role in the saline conditions.

      Line 188-190: How CN vs. AP was statistically defined was not mentioned in the Methods section.

      We have added this to the main text within the Results section.

      Line 203-214: How do these results fit with the previous observations that almost all transcripts have significant heritability?

      Although we do find that all but three transcripts have a significant genetic effect (and thus have significant heritability), the median broad-sense heritability for the 51 antagonistically pleiotropic genes is 0.23. Given that, we would only be able to detect SNPs regulating gene expression with a high effect size, since our sample size is n=130. Additionally, we used a very stringent criterion (FDR < 0.001) to define eQTLs. These two factors in combination could lead to us not being able to detect significant eQTLs for AP genes.

      Line 246-250: Please explain why the current conclusion would be opposite from the previous study. Supposedly the PCA, G matrix, and breeder’s equation were done for each environment separately. It makes sense that the G matrix and response to selection could be different between saline and drought treatments, but for the control treatments in the two studies, do they still differ? Why? Also in Table S7, it would be nice to show the % variation explained by each PC.

      Although the two studies had largely overlapping samples, about 20% of the samples were unique to each study. Additionally, although the site where the study was performed was the same across the two studies, we found significant temporal differences in gene expression due to micro-environmental differences. Both of these factors can lead to changes in direct and indirect selection and the response to it, and we are examining these differences as part of a separate study. We also highlight these caveats in our discussion.

      Information on the percent variation explained by each PC is given in Table S5.

      Figure 2b: The vertical axis was labeled as “selection gradient”, but I think the responses to selection (D, I, T) have different units.

      We have re-labeled the vertical axis as “selection”.

      Reviewer #4 (Recommendations For The Authors):

      The manuscript mixes terminology for selection from quantitative genetics with that from population genetics. This is problematic, and the adjectives positive and negative should be replaced as descriptors of selection by instead rewording, for example, positive directional selection as directional selection for higher transcript abundance.

      Lines 193-196: The phrasing here reads as if the selection is solely acting on the presence/absence of expression rather than on quantitative variation in expression. During revision, it would be worth considering including an analysis of genes that parses genes that show the presence/absence of variation of expression within or across environments separately from genes that are expressed to non-trivial levels in both environments.

      We have modified the sentence in question now. Also, we pre-processed RNA-seq data to remove all transcripts with low expression signals (sigma signal < 20), and further retained only transcripts that had non-trivial expression in at least 10% of the population, which we believe represents presence/absence of variation of expression within or across environments.

      Lines 216-231: Is this analysis solely for directional selection? Not clear since previous sections examined both directional and stabilizing selection.

      Yes, we performed this analysis for only directional selection, and have clarified this in the text too.

      Lines 224-226: The meaning of this sentence is unclear and should be written more concretely.

      We have rephrased the sentence to be more clear.

      Lines 232-241: The description of the scientific logic here could be read as implying that genes interacting in networks are the sole source of indirect selection. I recommend revising the language to indicate this cause is one of several potential causes.

      We have reworded the sentence such that we indicate selection acting on interacting genes is just one of the causes of indirect selection.

      The strength of the conclusions of the decoherence analysis should be evaluated in light of caveats with such analyses (see Cai and Des Marais New Phytologist 2023).

      We have added the caveat with relevant citation in the manuscript.

      Rename this section as "Selection on Organismal Traits", as the previous sections have also been investigating selection on traits, just molecular traits.

      We have renamed the section as recommended

      Lines 314-318: Rewrite for clarity. Most environments select for an optimal phenotype; it is just the case here that the phenotypic distribution in the high salinity environment overlaps with the optimum.

      We have rephrased and clarified the statement.

      Lines 343-345: Rephrase to "These results indicate that natural variation in gene regulation under..."

      Rephrased.

      Line 354: "most" reads as too strong a descriptor here if the majority is ~60%.

      We have reworded the sentence to read “more than half”.

      Lines 359-361: It is unclear to me how this interpretation follows from the above analysis.

      We have reworded the sentence so that the claim follows our analysis.

      Line 372: Is the expectation here more specifically one of epistatic selection? Other processes could stochastically lead to the genetic fixation of compensatory/reinforcing variants, but I think only epistasis for fitness would cause the interesting patterns of LD observed.

      The expectation here is that certain cis and trans variants exist only to compensate/reinforce, potentially through epistasis. We have clarified this in the text.

      Line 405: Change "adaptive organismal responses of organisms" to "organismal responses." As written, the sentence reads as being about plasticity rather than evolutionary responses, which are by populations, not organisms. None of the analyses included in the manuscript specifically test for adaptive plasticity.

      Rephrased.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      The conserved AAA-ATPase PCH-2 has been shown in several organisms including C. elegans to remodel classes of HORMAD proteins that act in meiotic pairing and recombination. In some organisms the impact of PCH-2 mutations is subtle but becomes more apparent when other aspects of recombination are perturbed. Patel et al. performed a set of elegant experiments in C. elegans aimed at identifying conserved functions of PCH-2. Their work provides such an opportunity because in C. elegans meiotically expressed HORMADs localize to meiotic chromosomes independently of PCH-2. Work in C. elegans also allows the authors to focus on nuclear PCH-2 functions as opposed to cytoplasmic functions also seen for PCH-2 in other organisms. 

      The authors performed the following experiments: 

      (1) They constructed C. elegans animals with SNPs that enabled them to measure crossing over in intervals that cover most of four of the six chromosomes. They then showed that double crossovers, which were common on most of the four chromosomes in wild-type, were absent in pch-2. They also noted shifts in crossover distribution in the four chromosomes.

      (2) Based on the crossover analysis and previous studies they hypothesized that PCH-2 plays a role at an early stage in meiotic prophase to regulate how SPO-11 induced double-strand breaks are utilized to form crossovers. They tested their hypothesis by performing ionizing irradiation and depleting SPO-11 at different stages in meiotic prophase in wild-type and pch-2 mutant animals. The authors observed that irradiation of meiotic nuclei in zygotene resulted in pch-2 nuclei having a larger number of nuclei with 6 or greater crossovers (as measured by COSA-1 foci) compared to wild type. Consistent with this observation, SPO-11 depletion, starting roughly in zygotene, also resulted in pch-2 nuclei having an increase in 6 or more COSA-1 foci compared to wild type. The increased number at this time point appeared beneficial because a significant decrease in univalents was observed.

      (3) They then asked if the above phenotypes correlated with the localization of MSH-5, a factor that stabilizes crossover-specific DNA recombination intermediates. They observed that pch-2 mutants displayed an increase in MSH-5 foci at early times in meiotic prophase and an unexpectedly higher number at later times. They conclude based on the differences in early MSH-5 localization and the SPO-11 and irradiation studies that PCH-2 prevents early DSBs from becoming crossovers and early loading of MSH-5. By analyzing different HORMAD proteins that are defective in forming the closed conformation acted upon by PCH-2, they present evidence that MSH-5 loading was regulated by the HIM-3 HORMAD. 

      (4) They performed a crossover homeostasis experiment in which DSB levels were reduced. The goal of this experiment was to test if PCH-2 acts in crossover assurance. Interestingly, in this background PCH-2 negative nuclei displayed higher levels of COSA-1 foci compared to PCH-2 positive nuclei. This observation and a further test of the model suggested that "PCH-2's presence on the SC prevents crossover designation." 

      (5) Based on their observations indicating that early DSBs are prevented from becoming crossovers by PCH-2, the authors hypothesized that the DNA damage kinase CHK-2 and PCH-2 act to control how DSBs enter the crossover pathway. This hypothesis was developed based on their finding that PCH-2 prevents early DSBs from becoming crossovers and previous work showing that CHK-2 activity is modulated during meiotic recombination progression. They tested their hypothesis using a mutant synaptonemal complex component that maintains high CHK-2 activity that cannot be turned off to enable crossover designation. Their finding that the pch-2 mutation suppressed the crossover defect (as measured by COSA-1 foci) supports their hypothesis.

      Based on these studies the authors provide convincing evidence that PCH-2 prevents early DSBs from becoming crossovers and controls the number and distribution of crossovers to promote a regulated mechanism that ensures the formation of obligate crossovers and crossover homeostasis. As the authors note, such a mechanism is consistent with earlier studies suggesting that early DSBs could serve as "scouts" to facilitate homolog pairing or to coordinate the DNA damage response with repair events that lead to crossing over. The detailed mechanistic insights provided in this work will certainly be used to better understand functions for PCH-2 in meiosis in other organisms. My comments below are aimed at improving the clarity of the manuscript. 

      We thank the reviewer for their concise summary of our manuscript and their assessment of our work as “convincing” and providing “detailed mechanistic insight.”

      Comments 

      (1) It appears from reading the Materials and Methods that the SNPs used to measure crossing over were obtained by mating Hawaiian and Bristol strains. It is not clear to this reviewer how the SNPs were introduced into the animals. Was crossing over measured in a single animal line? Were the wild-type and pch-2 mutations made in backgrounds that were isogenic with respect to each other? This is a concern because it is not clear, at least to this reviewer, how much of an impact crossing different ecotypes will have on the frequency and distribution of recombination events (and possibly the recombination intermediates that were studied). 

      We have clarified these issues in the Materials and Methods of our updated preprint. The control and pch-2 mutants were isogenic in either the Bristol or Hawaiian backgrounds. Control lines were the original Bristol and Hawaiian lines and pch-2 mutants were originally made in the Bristol line and backcrossed at least 3 times before analysis. Hawaiian pch-2 mutants were made by backcrossing pch-2 mutants at least 8 times to the Hawaiian background and verifying the presence of Hawaiian SNPs on all chromosomes tested in the recombination assay. To perform the recombination assays, these lines were crossed to generate the relevant F1s.

      (2) The authors state that in pch-2 mutants there was a striking shift of crossovers (line 135) to the PC end for all of the four chromosomes that were tested. I looked at Figure 1 for some time and felt that the results were more ambiguous. Map distances seemed similar at the PC end for wildtype and pch-2 on Chrom. I. While the decrease in crossing over in pch-2 appeared significant for Chrom. I and III, the results for Chrom. IV, and Chrom. X. seemed less clear. Were map distances compared statistically? At least for this reviewer the effects on specific intervals appear less clear and without a bit more detail on how the animals were constructed it's hard for me to follow these conclusions. 

      We hope that the added details above make the results of these assays clearer. Map distances were compared and did not reach statistical significance, except where indicated. While we agree that the comparisons between control animals and pch-2 mutants may seem less clear with individual chromosomes, we argue that more general, consistent patterns become clear when analyzing multiple chromosomes. Indeed, this is why we expanded our recombination analysis beyond Chromosome III and the X Chromosome, as reported in Deshong, 2014. We have edited this sentence to: “Moreover, there was a striking and consistent shift of crossovers to the PC end of all four chromosomes tested.”

      (3) Figure 2. I'm curious why non-irradiated controls were not tested side-by-side for COSA-1 staining. It just seems like a nice control that would strengthen the authors' arguments. 

      We have added these controls in the updated preprint as Figure 2B.

      (4) Figure 3. It took me a while to follow the connection between the COSA-1 staining and DAPI staining panels (12 hrs later). Perhaps an arrow that connects each set of time points between the panels or just a single title on the X-axis that links the two would make things clearer. 

      To make this figure more clear, we have generated two different cartoons for the assay that scores GFP::COSA-1 foci and the assay that scores bivalents. We have also edited this section of the results to make it more clear.

      Reviewer #2 (Public review): 

      Summary: 

      This paper has some intriguing data regarding the different potential roles of Pch-2 in ensuring crossing over. In particular, the alterations in crossover distribution and Msh-5 foci are compelling. My main issue is that some of the models are confusingly presented and would benefit from some reframing. The role of Pch-2 across organisms has been difficult to determine, the ability to separate pairing and synapsis roles in worms provides a great advantage for this paper. 

      Strengths: 

      Beautiful genetic data, clearly made figures. Great system for studying the role of Pch-2 in crossing over. 

      We thank the reviewers for their constructive and useful summary of our manuscript and the analysis of its strengths. 

      Weaknesses: 

      (1) For a general audience, definitions of crossover assurance, crossover eligible intermediates, and crossover designation would be helpful. This applies to both the proposed molecular model and the cytological manifestation that is being scored specifically in C. elegans. 

      We have made these changes in an updated preprint.

      (2) Line 62: Is there evidence that DSBs are introduced gradually throughout the early prophase? Please provide references. 

      We have referenced Woglar and Villeneuve 2018 and Joshi et al. 2015 to support this statement in the updated preprint.

      (3) Do double crossovers show strong interference in worms? Given that the PC is at the ends of chromosomes don't you expect double crossovers to be near the chromosome ends and thus the PC? 

      Despite their rarity, double crossovers do show interference in worms. However, the PC is limited to one end of the chromosome. Therefore, even if interference ensures the spacing of these double crossovers, the preponderance of one of these crossovers toward one end (and not both ends) suggests something functionally unique about the PC end.

      (4) Line 155 - if the previous data in Deshong et al is helpful it would be useful to briefly describe it and how the experimental caveats led to misinterpretation (or state that further investigation suggests a different model etc.). Many readers are unlikely to look up the paper to find out what this means. 

      We have added this to the updated preprint: “We had previously observed that meiotic nuclei in early prophase were more likely to produce crossovers when DSBs were induced by the Mos transposon in pch-2 mutants than in control animals but experimental caveats limited our ability to properly interpret this experiment.”

      (5) Line 248: I am confused by the meaning of crossover assurance here - you see no difference in the average number of COSA-1 foci in Pch-2 vs. wt at any time point. Is it the increase in cells with >6 COSA-1 foci that shows a loss of crossover assurance? That is the only thing that shows a significant difference (at the one time point) in COSA-1 foci. The number of DAPI bodies shows the loss of Pch-2 increases crossover assurance (fewer cells with unattached homologs). So this part is confusing to me. How does reliably detecting foci vs. DAPI bodies explain this?

      We have removed this section to avoid confusion.

      (6) Line 384: I am confused. I understand that in the dsb-2/pch2 mutant there are fewer COSA-1 foci. So fewer crossovers are designated when DSBs are reduced in the absence of PCH-2.

      How then does this suggest that PCH-2's presence on the SC prevents crossover designation? Its absence is preventing crossover designation at least in the dsb-2 mutant. 

      We have tried to make this clearer in the updated preprint. In this experiment, we had identified three possible explanations for why PCH-2 persists on some nuclei that do not have GFP::COSA-1 foci: 1) PCH-2 removal is coincident with crossover designation; 2) PCH-2 removal depends on crossover designation; and 3) PCH-2 removal facilitates crossover designation. The decrease in the number of GFP::COSA-1 foci in dsb-2::AID;pch-2 mutants argues against the first two possibilities, suggesting that the third might be correct. We have edited the sentence to read: “These data argue against the possibility that PCH-2’s removal from the SC is simply in response to or coincident with crossover designation and instead, suggest that PCH-2’s removal from the SC somehow facilitates crossover designation and assurance.”

      (7) Discussion Line 535: How do you know that the crossovers that form near the PCs are Class II and not the other way around? Perhaps early forming Class I crossovers give time for a second Class II crossover to form. In budding yeast, it is thought that synapsis initiation sites are likely sites of crossover designation and class I crossing over. Also, the precursors that form class I and II crossovers may be the same or highly similar to each other, such that Pch-2's actions could equally affect both pathways. 

      We do not know that the crossovers that form near the PC are Class II but hypothesize that they are, based on the close, functional relationship that exists between Class I crossovers and synapsis and the apparent antagonistic relationship that exists between Class II crossovers and synapsis. We agree that Class I and Class II crossover precursors are likely to be the same or highly similar, that they exhibit extensive crosstalk that may complicate straightforward analysis, and that PCH-2 is likely to affect both, as strongly suggested by our GFP::MSH-5 analysis. We present this hypothesis based on the apparent relationship between PCH-2 and synapsis in several systems but agree that it needs to be formally tested. We have tried to make this argument clearer in the updated preprint.

      Reviewer #3 (Public review): 

      Summary: 

      This manuscript describes an in-depth analysis of the effect of the AAA+ ATPase PCH-2 on meiotic crossover formation in C. elegans. The authors reach several conclusions, and attempt to synthesize a 'universal' framework for the role of this factor in eukaryotic meiosis.

      Strengths: 

      The manuscript makes use of the advantages of the 'conveyor belt' system within the C. elegans reproductive tract, to enable a series of elegant genetic experiments.

      We thank this reviewer for the useful assessment of our manuscript and the articulation of its strengths.

      Weaknesses: 

      A weakness of this manuscript is that it heavily relies on certain genetic/cell biological assays that can report on distinct crossover outcomes, without clear and directed control over other aspects and variables that might also impact the final repair outcome. Such assays are currently out of reach in this model system. 

      In general, this manuscript could be more generally accessible to non-C. elegans readers. Currently, the manuscript is hard to digest for non-experts (even if meiosis researchers). In addition, the authors should be careful to consider alternative explanations for certain results. At several steps in the manuscript, results could ostensibly be caused by underlying defects that are currently unknown (for example, can we know for sure that pch-2 mutants do not suffer from altered DSB patterning, and how can we know what the exact functional and genetic interactions between pch-2 and HORMAD mutants tell us?). Alternative explanations are possible and it would serve the reader well to explicitly name and explain these options throughout the manuscript.

      We have made the manuscript more accessible to non-C. elegans readers and discuss alternate explanations for specific results in the updated preprint. 

      Recommendations for the authors:  

      Reviewing Editor Comments: 

      (1) Please provide 'n' values for each experiment. 

      n values are now included in the Figure legends for each experiment.

      (2) Line 129: Please represent the DCOs as percent or fraction (1%-9.8%, instead of 1-13). 

      We have made this change.

      (3) Figure 3A legend: the grey bar should read 20hr. COSA-1/ 32 hr DAPI. In Figure 3E, it is not clear why 36hr Auxin and 34hr Auxin show a significant difference in DAPI bodies between control and pch-2, but 32hr Auxin treatment does not. Here again 'n' values will help. 

      We have made this change. We also are not sure why the 32 hour auxin treatment did not show a significant difference in DAPI stained bodies. We have included the n values, which are not very different between timepoints and therefore are unlikely to explain the difference. The difference may reflect the time that it takes for SPO-11 function to be completely abrogated.

      (4) Line 360: Please provide the fraction of PCH-2 positive nuclei in dsb-2.

      We have made this change. 

      Please also address all reviewer comments. 

      Reviewer #1 (Recommendations for the authors): 

      (1) Page 3, line 52. While I agree that crossing over is important to generate new haplotypes, work has suggested that the contribution by an independent assortment of homologs to generate new haplotypes is likely to be significantly greater. One reference for this is: Veller et al. PNAS 116:1659. 

      We deeply appreciate this reviewer pointing us to this paper, especially since it argues that controlling crossover distribution contributes to gene shuffling, and we now cite it in our introduction! While we agree that this paper concludes that independent assortment likely explains the generation of new haplotypes to a greater degree than crossovers, the authors performed this analysis with human chromosomes and explicitly include the caveat that their modeling assumes uniform gene density across chromosomes. We know this assumption does not hold in C. elegans. It would be interesting to perform the same analysis with C. elegans chromosomes in control and pch-2 mutants, taking into account this important difference.

      (2) Figure 2. It would really help the reader if an arrow and text were shown below each irradiation sign to indicate the stage in meiosis in which the irradiation was done as well as another arrow in the late pachytene box to show when the COSA-1 foci were analyzed. In general, having text in the figures that helps stage the timing in meiosis would help the non-C. elegans reader. This is also an issue where staging of C. elegans is shown (Figure 4).

      We have made these changes to Figure 2. To help readers interpret Figure 4, we have added TZ and LP to the graphs in Figure 4B and 4D and indicated what these acronyms (transition zone and late pachytene, respectively) are in the Figure legend.

      (3) Page 12, line 288. It would be valuable to first outline why the him-3R93Y and htp-3H96Y alleles were chosen. This was eventually done on Page 13, but introducing this earlier would help the reader.

      We have introduced these mutations earlier in the manuscript.

      (4) Page 13, line 323. A one sentence description of the OLLAS tagging system would be useful. 

      We have added this sentence: “we generated wildtype animals and pch-2 mutants with both GFP::MSH-5 and a version of COSA-1 that has been endogenously tagged at the N-terminus with the epitope tag, OLLAS, a fusion of the E. coli OmpF protein and the mouse Langerin extracellular domain”

      Reviewer #2 (Recommendations for the authors): 

      (1) The title is a little awkward. Consider: PCH-2 controls the number and distribution of crossovers in C. elegans by antagonizing their formation 

      We have made this change.

      (2) Abstract: 

      Consider removing "that is observed" from line 20. 

      We have made this change.

      I'm confused by the meaning of "reinforcement of crossover-eligible intermediates" from line 27. 

      We have removed this phrase from the abstract.

      A definition of crossover assurance would be helpful in the abstract. 

      We have added this to the abstract: “This requirement is known as crossover assurance and is one example of crossover control.”

      (3) Line 36: I know I am being a stickler, but many meioses only produce one haploid gamete (mammalian oocytes, for example).

      Thanks for the reminder! We have removed the “four” from this sentence.

      (4) Line 284 - are you defining MSH-5 foci as crossover-eligible intermediates? If so, please state this earlier. 

      We have added this to the introduction to this section of the results: “In C. elegans, these crossover-eligible intermediates can be visualized by the loading of the pro-crossover factor MSH-5, a component of the meiosis-specific MutSγ complex that stabilizes crossover-specific DNA repair intermediates called joint molecules”

      (5) Can the control be included in Figure S1? 

      We have made this change.

      (6) Can you define that crossover designation is the formation of a COSA-1 focus? 

      We did this in the section introducing GFP::MSH-5: “In the spatiotemporally organized meiotic nuclei of the germline, a functional GFP tagged version of MSH-5, GFP::MSH-5, begins to form a few foci in leptotene/zygotene (the transition zone), becoming more numerous in early pachytene before decreasing in number in mid pachytene to ultimately colocalize with COSA-1 marked sites in late pachytene in a process called designation” 

      (7) Would it be easier to see the effect of DSB to crossover eligible intermediates in Spo-11, Pch-2 vs. Spo-11 mutant with irradiation using your genetic maps? At least for early vs. late breaks? 

      Unfortunately, irradiation does not show the same bias towards genomic location that endogenous double strand breaks do so it is unlikely to recapitulate the effects on the genetic map.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weaknesses:

      In my estimation, the following would improve this manuscript:

      (1) The physiological relevance of these data could be better highlighted. For instance, future work could revolve around incubating oocytes with oviduct fluid (or OVGP1) to reduce polyspermy in porcine IVF, and naturally improve sperm selection in human IVF.

      Thank you for the suggestions. We have added these points on physiological relevance at the end of the discussion.

      (2) Biological and technical replicate values for each experiment are unclear - for semen, oocytes, and oviduct fluid pools. I suggest providing in the Materials and Methods and/or Figure legends.

      Biological and technical replicates are now indicated in the Materials and Methods. The numbers of oocytes or ZPs used were already indicated in every Supplementary Table.

      (3) Although differences presented in the bar charts seem obvious, providing statistical analyses would strengthen the manuscript.

      Statistical analyses are now indicated in each bar chart.

      (4) Results are presented as ± SEM (line 677); however, I believe standard deviation is more appropriate.

      This was a mistake; all the results are indicated as standard deviation.

      (5) Given the many independent experimental variables and combinations, a schematic depiction of the experimental design may benefit readers.

      A schematic depiction of the experimental design is now included as Figure 1. This new figure shifts the numbering of the remaining figures.

      (6) Attention to detail can be improved in parts, as delineated in the "author recommendation" review section.

      Done

      Reviewer #2 (Public review):

      Weaknesses:

      The authors postulate a role for oviductal fluid in species-specific fertilization, but in my opinion, they cannot rule out hormonal effects or differences in the method of oocyte maturation employed.

      As we indicate below, the effect of hormones has been analyzed, and we have demonstrated that it is not the cause of zona pellucida specificity.

      They also cannot unequivocally prove that OVGP1 is the oviductal protein involved in the effect. Additional experiments are necessary to rule out these alternative explanations.

      Our work does not rule out the involvement of other proteins, but it does show that OVGP1 is involved in the process.

      When performing the EZPT assay on mouse oocytes obtained either from the ovary or from the oviduct, the oocytes obtained from the ovary came from mice primed with eCG, whereas the ones collected from the oviduct were obtained from superovulated mice (eCG plus hCG). This difference in the hormonal environment may make a difference in the properties of the ZP. Additionally, the ones obtained from the ovary were in vitro matured, which is also different from the freshly ovulated eggs and, again, may change the properties of the ZP. I suggest doing this experiment superovulating both groups of mice but collecting the fully matured MII eggs from the ovary before they get ovulated. In that way the hormonal environment will be the same in both groups and in both groups, oocytes will be matured in vivo. Hence, the only difference will be the exposure to oviductal fluids.

      In Figure 2, we compare ZPs from murine oocytes obtained from the ovary using only PMSG with ZPs from oviductal oocytes treated with both hCG and PMSG. In Figure 7, however, we compared ZPs from murine oocytes exposed only to PMSG, with the only difference being whether or not they had been in contact with OVGP1. This shows that it is not the effect of the hormone but rather the contact with OVGP1 that determines their specificity.

      Mice with OVGP1 deletion are viable and fertile. It would be quite interesting to investigate the species-specificity of sperm-ZP binding in this model. That would indicate whether OVGP1 is the only glycoprotein involved in determining species-specificity. Alternatively, the authors could immunodeplete OVGP1 from oviductal fluid and then ascertain whether this depleted fluid retains the ability to impede cross-species fertilization.

      We agree with the reviewer that it would be interesting to investigate sperm-ZP binding in this model. Unfortunately, we do not have the OVGP1 knockout mouse strain. We also believe that immunodepletion of OVGP1 would not completely remove the protein, so its effect would likely not be entirely eliminated.

      What is the concentration of OVGP1 in the oviduct? How did the authors decide what concentration of protein to use in the experiments where they exposed ZPs to purified OVGP1? Why did they use this experimental design to check the structure of the ZP by SEM? Why not do it on oocytes exposed to oviductal fluid, which would be more physiological?

      We have included in the manuscript that the concentration of OVGP1 in the oviductal fluid was quantified using ImageJ software by comparing the mean gray value of the band in the oviductal fluid lane with that of the band in the recombinant protein lane. From this ratio, together with the known amount of recombinant protein loaded and the total protein amount of oviductal fluid loaded, the concentration of OVGP1 in the oviductal fluid was determined as the average of three western blots. The concentration of OVGP1 in oviductal fluid was in the range of 100-150 ng/µL in mice and 150-200 ng/µL in cows. We have also included in the manuscript the concentration that we used for the EZPTs: 30 ng/µL of recombinant OVGP1 (bovine, murine and human) for 30 minutes in 20 µL drops. With this concentration, we observed a clear effect on zona specificity with no negative impact on the gametes.
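      The densitometric estimate described in this response can be sketched roughly as follows. This is only an illustration under stated assumptions: band intensity is taken to scale linearly with protein mass, the result is normalized by the loaded volume rather than by total protein, and the function name `ovgp1_concentration` and all numbers are hypothetical rather than values from the study.

```python
from statistics import mean

def ovgp1_concentration(band_sample, band_reference, reference_ng, sample_volume_ul):
    """Estimate concentration (ng/uL) from relative band intensity.

    Assumes intensity is proportional to mass (a simplification).
    """
    if band_reference <= 0 or sample_volume_ul <= 0:
        raise ValueError("reference intensity and volume must be positive")
    # Mass of target protein in the sample lane, scaled from the reference lane
    mass_in_lane_ng = (band_sample / band_reference) * reference_ng
    return mass_in_lane_ng / sample_volume_ul

# Three hypothetical blots:
# (sample gray value, reference gray value, reference ng loaded, sample uL loaded)
blots = [
    (120.0, 100.0, 100.0, 1.0),
    (130.0, 100.0, 100.0, 1.0),
    (110.0, 100.0, 100.0, 1.0),
]
estimates = [ovgp1_concentration(*blot) for blot in blots]
print(f"mean OVGP1 estimate: {mean(estimates):.1f} ng/uL")
```

      Averaging across independent blots, as the response describes, reduces the influence of lane-to-lane variation in transfer and exposure.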

      As you can see in Supplementary Fig. S8B, we have already performed SEM of oocytes exposed to oviductal fluid.

      None of the figures show any statistical analysis. Please perform analysis for all the data presented, include p values, and indicate which statistical tests were performed. The Statistical analysis section in the Methods indicating that repeated measures ANOVA was used must refer to the tables. Was normality tested? I doubt all the data are normally distributed, in which case using ANOVA is not appropriate.

      Statistical results are now included in each Figure and Table. All the statistical analyses are included. All the data passed tests of normality, homogeneity of variance and independence; for this reason, the data were analyzed using a one-way ANOVA followed by Tukey's post hoc test. The significance level was set at p < 0.05.
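      For readers unfamiliar with the test named in this response, the F statistic of a one-way ANOVA can be computed as in the sketch below. It uses only the Python standard library, the data are hypothetical, and it does not cover the normality and variance checks or Tukey's post hoc test mentioned above; the function name `one_way_anova_f` is introduced here for illustration.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical measurements for three treatment groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

      A p value would then be obtained from the F distribution with the reported degrees of freedom, which in practice is usually delegated to a statistics package.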

      Why was OVGP1 selected as the probable culprit of the species specificity? In the Results section entitled "Homology of bovine, human and murine OVGP1 proteins..." the authors delve into the possible role of this protein without any rationale for investigating it. What about other oviductal proteins?

      A sentence indicating this rationale for investigating OVGP1 has been introduced in this paragraph.

      Reviewer #3 (Public review):

      Weaknesses:

      The manuscript began with a well-written introduction, but problems started to surface in the Results section, in the Discussion, as well as in the Materials and Methods. Major concerns include inconsistencies, misinterpretation of results, lacking up-to-date literature search, numerous errors found in the figure legends, misleading and incorrect information given in the Materials and Methods, missing information regarding statistical analysis, and inadequate discussion. These concerns raise questions regarding the authenticity of the study, reliability of the findings, and interpretation of the results. The manuscript does not provide solid and convincing findings to support the conclusion.

      We have modified and clarified all the issues, some of which were misunderstandings. We have also performed the suggested experiment of putting sperm in contact with OVGP1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Ensure consistency in (past) tense, for example, "decondensed" (line 102), "induced" (line 103), and elsewhere.

      Done

      (2) Replace "table" with "Table" throughout.

      Done

      (3) The authors often refer to "co-incubation". I believe this should read "incubation". My understanding is that oocytes were incubated with oviduct fluid or sperm but never both simultaneously as "co-incubation" implies.

      Done

      (4) Synonymous terms "OVGP1" and "oviductin" are used interchangeably. Consider using one or the other for consistency.

      We believe that by using both terms, reading is more fluid.

      (5) Delete "around" on line 256 and "approximately" on line 263 and provide actual percentages.

      Done

      (6) The point of the sentence on lines 311-313 is unclear to me.

      Rewritten

      (7) Suggest specifying "wildtype" on line 419.

      All the mice used in this work are wildtype

      (8) Do the authors have details regarding cattle oocyte donor breeds?

      Done

      (9) What do the authors mean by "strengthen" on line 500?

      The word "strengthen" has been changed to "carefully isolated".

      (10) Ponceau and vinculin (Figure 3) details are not provided in the manuscript.

      Ponceau and vinculin details are now included in the manuscript

      (11) Address formatting issues (e.g. citation 26 among others).

      Done

      (12) Primary and secondary antibody controls for immunofluorescent imaging (to fully exclude autofluorescence) are lacking.

      Controls for immunofluorescent imaging are indicated in Supplementary Figure S7.

      (13) The corresponding author on the manuscript and in the eLife submission system are different

      It was a problem during submission, now it is corrected.

      Reviewer #2 (Recommendations for the authors):

      (1) For the experiment depicted in Figures 3C and D, the authors need to perform a negative control to demonstrate that this fluorescent signal is specific. What happens if they express a different FLAG-tagged protein instead of bOVGP1 and mOVGP1? FLAG antibodies give quite strong non-specific binding. Or if they expressed untagged bovine and mouse OVGP1?

      The negative controls are in Supplementary Figure S7. A rabbit polyclonal antibody to human OVGP1 was used for murine and bovine IVM ZPs from ovaries and for murine superovulated ZPs recovered from mouse oviducts. There is a remarkable difference between the ZPs not incubated with any OVGP1 and those carrying the endogenous protein, given the specificity of the antibody.

      Also, IVM mouse and bovine oocytes, incubated or not with OF, were immunoblotted with an anti-Flag tag antibody. Since none of them carry Flag-tagged OVGP1, there is no signal in the immunofluorescence.

      (2) For the Western blots of recombinant proteins, why are the authors not showing the blots using His and FLAG tag antibodies? Is the 50-kDa band observed for the mouse OVGP1 detected with His-Tag antibody?

      We have included a Supplementary Figure S6 showing the western blot with anti-His and anti-Flag antibodies. The band at around 50 kDa is not specific (there is no signal with anti-Flag). This new figure shifts the numbering of the remaining supplementary figures (S6-S8).

      (3) How was the estrous cycle stage determined in mice? It is not described in the Methods.

      Estrous cycle stage was determined in mice by visual examination of the vaginal opening and cytological examination of the vaginal smear. This is now included in the M&M.

      (4) For sperm binding, what does the percentage mean?

      It was a mistake; the percentages referred to pronuclear formation and cleavage, not to sperm binding. This is now corrected.

      (5) In Figure 3A, the labels for regions C, D, and E are mixed up. It is regions A and C that are conserved (or orange and blue, if the letters are incorrect). The purple region is only present in the mouse (E?), and the red region (D?) is only in the human form. Also, the legend for this panel is repeated verbatim in the Results section. Please remove one of them.

      Errors in Figure 3A have been corrected. The repeated legend has been removed.

      (6) In the title of Figure 1B and in different places in the text, it should be mouse (not mice) oocytes.

      Done

      (7) In line 140, I would change the part indicating "We extracted the cytoplasmic contents from the oocytes". It is not only the cytoplasm, but all the oocyte, including the nucleus and membranes, that are being removed.

      Done

      (8) Please rephrase the sentence in lines 245-247, as it is quite confusing.

      Done

      (9) In line 236, the authors indicate that "During in vitro maturation (IVM), oocytes displayed a porous ZP structure...". Do they mean after IVM? When were those oocytes collected for SEM?

      The sentence has been modified to read “after IVF”. Bovine oocytes were collected from slaughterhouse ovaries and were similar to those used in the rest of the experiments in the manuscript.

      (10) In the legend of Figure 1, please indicate what the parthenogenic group is.

      Done

      (11) In the legend to Figure 1G, the text indicates "Note sperm only appear outside the zona". However, I cannot see any sperm in that image.

      The phrase has been removed because, when the image was enlarged to better show the sperm inside the area, those outside were no longer visible.

      (12) In the legend to Figure 2 describing the different zona pictures, the letters of the panels are not correct.

      Done

      (13) In line 999, please provide the right concentration for NMase (it indicates 10 μ/mL).

      Done

      (14) Where does the model depicted at the end of the manuscript go? Is it a Figure? A graphical abstract? In that model, please correct some typos: it should be "ZP obtained from ovarian oocytes"; and change specie for species in all three panels.

      Done. It is a model (Fig. 10)

      (15) The FITC-PNA staining to visualize acrosomes is not described in the Methods section.

      Done

      Reviewer #3 (Recommendations for the authors):

      The present study reports findings from a series of experiments suggesting that bovine oviductal fluid and species-specific oviductal glycoprotein (OVGP1 or oviductin) from bovine, murine, or human sources modulate the species specificity of bovine and murine oocytes. The manuscript began with a well-written introduction, but problems started to surface in the Results section, Discussion as well as in the Materials and Methods. Major concerns include inconsistencies, misinterpretation of results, lacking up-to-date literature search, numerous errors found in the figure legends, misleading and incorrect information given in the Materials and Methods, missing information regarding statistical analysis, and inadequate discussion.

      We have modified and clarified all the issues, some of which are misunderstandings, we have also performed the suggested experiment of putting sperm in contact with OVGP1.

      Specific comments:

      (1) Lines 142 to 143 on page 5: It is stated that "Because this experiment was done on empty ZPs, we called this test "empty zona penetration test" (EZPT)". In fact, the experiment was not actually done on empty ZPs, but on oocytes with the ooplasm extracted. Therefore, the zona pellucidae used in the experiment were not empty but contained an intact zona matrix of glycoproteins. The term "EZPT" used by the authors in the manuscript is a misnomer. A better term should be used to reflect the ZPs which were intact and not empty.

      We extracted the cytoplasmic contents, including all the organelles, nucleus and membranes, and the polar body. This has been clarified in the text.

      (2) The authors need to distinguish between sperm penetration and sperm binding in the manuscript. In lines 169 to 177 on page 6, the authors mixed up the terms "penetration" and "binding" in the text. In writing about events leading to fertilization in reproductive biology, the term "sperm binding" refers to the interaction between the sperm plasma membrane and the oocyte zona pellucida (ZP), whereas the term "sperm penetration" refers to the passage of the sperm through the ZP. Therefore, the statements in lines 169 to 177 describing the binding of bovine, murine, and human sperm to bovine oocytes with and without prior treatment with oviductal fluid are misleading and not correct. In fact, Figure 2 and Table 6 show sperm penetration and not sperm binding.

      Figures 2A and 2B (now 3A and 3B) and Table S6 show both sperm penetration (% penetration rate and average sperm in penetrated ZPs) and sperm binding (average sperm bound to ZPs). Throughout the manuscript, a clear distinction is made between sperm attached to the ZP and sperm that have penetrated it.

      (3) Lines 182 to 187 on page 6: What is being described in the text here does not match what is being shown in Figure 3A. As a result, the information provided in lines 182 to 187 is not correct and misleading. For example, it is stated in lines 182 to 183 that "As depicted in Fig. 3A, the sequences of these three OVGP1 have five distinct regions (A, B, C, D and E)." However, Figure 3A shows that hOVGP1 and mOVGP1 both have only 4 regions and bOVGP1 has only 3 regions. None of the three has 5 regions. In lines 183 to 184, the authors continued to state that "Regions A and D are conserved in the different mammals." This statement is also not true because Figure 3A shows that only region A is conserved in all three species but not region D which is found only in the human. What is stated in lines 186 to 187 is also not correct based on the information provided in Figure 3A. It is stated here that "Region C is an insertion present only in the mouse (Mus) and region E is typical of human oviductin." However, based on the color codes provided in Figure 3A, region C is present in all three species while region E is present only in the mouse.

      Errors with naming regions in Figure 3A (now 4A) have been corrected.

      (4) In lines 195 to 197 on page 6, the authors stated that "Western blots of the three OVGP1 recombinants indicated expected sizes based on those of the proteins: 75 kDa for human and murine OVGP1 and around 60 kDa for bovine OVGP1 (Fig. 3B)." However, the expected size of the recombinant human OVGP1 is not in agreement with what has been published in literature regarding the molecular weight of recombinant human OVGP1. It has been previously reported that a single protein band of approximately 110-150 kDa was detected for recombinant human OVGP1 using an antibody against human OVGP1. The authors provided Western blots of murine oviductal fluid and bovine oviductal fluid in Figure 3B but not a Western blot of native human oviductal fluid. The latter should have been included for a comparison with the recombinant human OVGP1.

      We do not have human oviductal fluid, but we have now included a supplementary figure 6S with a western blot using antibodies against His and Flag (both present in the recombinant OVGP1), which shows that the size of the recombinant protein is as indicated in Figure 3B (now 4B).

      (5) Lines 220 to 229 on page 7: In this experiment, the authors conducted the EZPT using ZPs from bovine oocytes that were either treated with or without bOVGP1 followed by incubation, respectively, with homologous sperm (bovine) and heterologous sperm (human and murine). This is a logical experiment to determine if OVGP1 plays a species-specific role in setting the specificity of the zona pellucida. However, in the in vivo situation, sperm that reach the lumen of the ampulla region of the oviduct where fertilization takes place are also exposed to oviductal fluid of which OVGP1 is a major constituent. Therefore, an additional experiment in which sperm are treated with OVGP1 prior to incubation with ZP should be carried out for a comparison.

      The additional experiment in which sperm are treated with OVGP1 prior to incubation with ZP has been done (Table S9). No effects were observed. This is now included in the manuscript.

      (6) Regarding the results obtained with the use of neuraminidase (lines 278 to 293 on pages 8 to 9), if neuraminidase treatment of bovine ZP prevented bovine sperm penetration regardless of whether ZPs had been or had not been in contact with OVGP1, that means OVGP1 is not responsible for penetration despite the description of earlier findings in the manuscript. Sialic acid is likely associated with the sugar side chains of ZP glycoproteins and not sugar side chains of OVGP1. To attribute the species-specific property of sialic acid to OVGP1 for sperm binding, an experiment in which OVGP1 will be treated with neuraminidase prior to performing the EZPT is needed.

      We conducted the experiment by treating only OVGP1 with neuraminidase and then separating OVGP1 from the enzyme before incubating the treated OVGP1 with ZPs. The results agree with our previous findings, indicating the importance of sialic acid on OVGP1 for sperm binding and penetration, and confirming that OVGP1 is responsible for species-specific penetration. Results are shown in Fig. 9 and Table S14.

      (7) The Discussion appears superficial and a more in-depth discussion regarding the results obtained in the present study in relation to other reports about OVGP1 published in literature is needed (e.g. a recent paper published by Kenji Yamatoya et al. (2023) Biology of Reproduction https://doi.org/10.1093/biolre/ioad159). Lines 317 to 342 of the Discussion on pages 10 to 11 should belong to the Introduction.

      The results of Yamatoya et al. are now included in the Discussion. Part of the Discussion from lines 317 to 342 is now in the Introduction.

      (8) It is not clear what the authors exactly want to say in lines 343 to 344 of the Discussion on page 11. It is stated here that "The empty zona penetration test (EZPT) enables heterologous sperm to overcome the oocyte's second barrier, the plasma membrane or oolemma." Do the authors mean that the sperm can now enter the empty space encircled by the ZP without having to go through the plasma membrane or oolemma? In Figure S4 which depicts the method used to empty the ooplasm in the bovine oocyte, does the method extract only the ooplasm (or cytoplasmic contents) leaving behind the intact plasma membrane or oolemma? This needs to be clearly shown and clearly explained. High magnifications of the zona pellucida are also needed to show whether the plasma membrane (or oolemma) is still present and intact after extraction of the ooplasm.

      This is clearly explained in the text. To obtain empty ZP, everything except ZP (nucleus, organelles, membranes and cytoplasmic contents of the oocytes) was removed using a micromanipulator, following the procedure outlined in Figure S4.

      (9) The authors stated in the Discussion in lines 383 to 383 on page 12 that "After ovulation, the changes reported in the carbohydrate composition of the ZP (3, 25) are likely induced by the addition of glycoproteins of oviductal origin, as we have seen here with OVGP1." There is no evidence in the present study to suggest that OVGP1 or glycoproteins of oviductal origin have changed or can change the carbohydrate composition of the ZP. At present, it is not known if OVGP1 or glycoproteins of oviductal origin directly interact with ZP glycoproteins (including ZP1, ZP2, ZP3 and/or ZP4) that make up the zona matrix.

      There is scientific evidence suggesting that oviductal glycoproteins, including OVGP1, interact with the zona pellucida (ZP) glycoproteins of the oocyte. Studies have shown that OVGP1 binds to the ZP of the oocyte. Specifically, OVGP1 is thought to interact with ZP glycoproteins, such as ZP2 and ZP3, in a way that may help stabilize the oocyte or modify the ZP structure during its passage through the oviduct. This interaction is believed to influence processes like sperm binding, oocyte maturation, and potentially the prevention of polyspermy during fertilization. For example, in several studies, the absence of OVGP1 in knockout animals (such as in Ovgp1-KO hamsters) has been associated with impaired fertilization and embryonic development, which indicates the importance of this interaction. However, the detailed molecular mechanisms and functional significance of these interactions require further exploration. We have used the word “likely” to soften this statement.

      Velásquez, J. G., Canovas, S., Barajas, P., Marcos, J., Jiménez‐Movilla, M., Gallego, R. G., ... & Coy, P. (2007). Role of sialic acid in bovine sperm–zona pellucida binding. Molecular reproduction and development, 74(5), 617-628.

      Kunz, P., et al. (2013). "The role of oviductal glycoprotein 1 in sperm–egg interaction and early embryonic development." Reproduction, 145(3), 225-233. DOI: 10.1530/REP-12-0300

      Yamatoya, K., Kurosawa, M., Hirose, M., Miura, Y., Taka, H., Nakano, T., ... & Araki, Y. (2024). The fluid factor OVGP1 provides a significant oviductal microenvironment for the reproductive process in golden hamster. Biology of reproduction, 110(3), 465-475.

      (10) Lines 390 to 391 page 12: The statement "This determines that OVGP1 modifications are critical to define the barrier among the different species of mammals." needs to be rephrased because there is no evidence in the present study showing that OVGP1 has been modified. There are many concerns with errors, important information that is missing, and inconsistencies as well as wrong and misleading information in the Materials and Methods which are troublesome. These concerns raise questions regarding the authenticity and reliability of the study. Some of the major concerns are listed below:

      All concerns have been fixed

      (11) It says in line 399 on page 13 that "Human semen samples were obtained from a normozoospermic donor...". Do the authors really mean that the semen samples were obtained from only one donor?

      Samples were obtained from three normozoospermic donors; this is now indicated in M&M.

      (12) In lines 409 to 411 on page 13, what do the authors mean by "...the samples were frozen into pellets..."? Was centrifugation of the samples carried out prior to freezing the samples? Secondly, what do the authors mean by "....and stored in liquid nitrogen at -196°C or lower.", particularly what do the authors mean by "or lower"? The temperature of liquid nitrogen is -196°C. What is the "lower" temperature?

      Centrifugation of the samples was not carried out at this time. A more detailed protocol is now included. The word "lower" has been removed.

      (13) Line 424 on page 13: Provide the full name of "M2" when it is first used in the text then followed by the abbreviation.

      Done

      (14) Is there a reason why different counting chambers were used to determine sperm concentrations? In line 432 on page 13, a Thomas cell counting chamber was used to determine the sperm count of epididymal mouse sperm whereas it is mentioned in line 441 on page 14 that a Neubauer cell counting chamber was used to determine epididymal cat sperm. Furthermore, where did the cat's sperm come from?

      The cat sperm was obtained and processed at the Faculty of Veterinary Medicine and the rest of the samples were processed in the INIA-CSIC lab, and different chambers were used in both places.

      (15) The mention of the use of cat spermatozoa in line 439 on page 14 is a worrisome problem of the manuscript. The present study used bovine, mouse, and human sperm and not cat. Therefore, the sudden mentioning of the use of cat spermatozoa in the Materials and Methods is troublesome and worrisome. It appears that the paragraph from lines 439 to 450 was directly copied and pasted from previously published work. Furthermore, lines 441 to 445 do not flow and do not make sense. In fact, what is described in this paragraph (lines 439 to 450) does not appear to correspond to the method(s) used to obtain the results presented in the Results section of the manuscript.

      I don't understand why the reviewer says we don't use cat sperm. This study uses cat sperm. Results for cat sperm are shown in Figure 1A (now 2A). We have modified the M&M to clarify the description of freezing.

      (16) Similarly, several problems are also found in the paragraphs (lines 453-478 on page 14) describing the methods and procedures to obtain homologous and heterologous IVF of bovine oocytes. Firstly, it is mentioned here (in line 460) that COCs were co-incubated with selected sperm without removing the cumulus cells. However, the results of the sperm penetration experiment indicated otherwise. Figures 2 and 3 show that the oocytes were denuded of cumulus cells. Secondly, it is very worrisome and troublesome to read what is written in line 468 on page 14 that "...from other species (cat, human, mouse, and rabbit)." One wonders where the cat and rabbit came from. Again, it appears that this paragraph was directly copied and pasted from previously published work.

      Cat sperm was used in this manuscript and is correctly indicated in every section and figure. Regarding the IVF and EZPT protocols: in the IVF protocol for bovine oocytes, COCs were used without removing the cumulus cells. For the EZPT, cumulus cells were removed; this is described in the following sections of the Materials and Methods. The word "rabbit" was a mistake and has been removed.

      (17) In lines 468 to 469 on page 14, it is mentioned that "Sperm-egg interactions were assessed through a sperm-ZP binding assay...". The authors only examined sperm penetration in their study. Therefore, this needs to be specified in the Materials and Methods. Secondly, the authors did not use the conventional sperm-ZP binding assay in their study. Instead, they used the EZPT in their study. There appear to be many inconsistencies throughout the manuscript.

      When the IVF experiments using bovine COCs were done (Fig 2A and C, Fig 1S to 3S, and Tables 1S to 4S), conventional sperm-egg interaction was assessed at 2.5 hours after IVF. EZPT was used in the rest of the experiments. IVF with COCs and EZPT with ZPs are different experiments.

      (18) Lines 480 to 489 on page 15 under the sub-heading of "In vitro culture of presumptive zygotes to first cleavage embryos on Day 2" do not provide the correct methodology used for obtaining the results presented in the manuscript. In line 482, it is not clear where the "synthetic oviductal fluid" came from. In fact, in the Results section, none of the results came from the use of synthetic oviductal fluid. In line 487, humans and rabbits are mentioned here. However, human and rabbit oocytes were not used in the present study. It is very strange indeed to read human and rabbit in the sentence.

      The SOF reference is now included. Human results are in Fig 1A; the sentence refers to the cultures of bovine oocytes inseminated with sperm of bull, human, mouse or cat. The word "rabbit" was a mistake and has now been removed from the manuscript.

      (19) In line 500 on page 15, what do the authors mean by "Each oviduct was strengthen by removing the adjacent tissue..."?

      The sentence has been modified.

      (20) On page 15 in the Materials and Methods, the authors described the collection of bovine and mouse oviductal fluid. However, there is no mention of human oviductal fluid and how it was collected. This important information is missing.

      We have not used human oviductal fluid in this manuscript.

      (21) Line 510 on page 15: The sub-heading of "Preparation of empty zonae pellucidae from bovine ovarian oocytes" should be rephrased. As pointed out earlier in my review, the ZPs prepared by the authors were intact and not "empty". It was the oocyte which was empty after extraction of the ooplasm.

      Everything except the ZP was removed from the oocyte; this is now clarified in the manuscript.

      (22) Line 518 on page 16 and line 553 on page 17: "Figure S5" should be "Figure 4S".

      Done

      (23) Line 538 and line 547 on page 16: "mice oocytes" should be "mouse oocytes".

      Done

      (24) On page 17, the procedures for in vitro fertilization, sperm penetration, and binding assessment in mice were described here in lines 560 to 574. Several problems are noted in this paragraph as listed below:

      a. As mentioned earlier the authors in the present manuscript mixed up sperm penetration and sperm binding which are two separate events. Based on the results presented in the manuscript, they represent sperm penetration and not sperm binding. Therefore, the authors need to precisely explain in the manuscript whether the results presented refer to sperm penetration or sperm binding.

      Both sperm penetration and binding have been analyzed in this work.

      b. In line 570 on page 17, the term "insemination" is wrongly used here. Insemination is the introduction of semen into the female reproductive tract either through sexual intercourse or through an instrument. The procedure used in the present study was carried out in vitro in a co-incubation manner and not by transferring sperm into the female reproductive tract.

      The word insemination has been changed to incubation

      c. Information regarding procedures for treatment with various oviductal fluid and OVGP1s are all missing in the Materials and Methods.

      This information is now in M&M

      d. The concentrations of various oviductal fluids and OVGP1s used and the number of ZPs used in each incubation are also missing.

      Concentrations are now indicated in the manuscript. All the numbers and ZPs used are indicated in supplementary figures.

      (25) Lines 577 to 603 on pages 17 to 18: Were recombinant bovine and murine glycoproteins prepared using the same methodology? In line 595 on page 18, it is stated that "Supernatant was saved in subsequent experiments." It is not clear exactly what experiments the supernatant was subsequently used in.

      Details about how the bovine and murine glycoproteins were prepared are now included. The sentence about subsequent experiments has been deleted; the supernatant was used for the next steps of protein purification.

      (26) What is being described in lines 604 to 609 on page 18 is problematic. The paragraph starts by saying that "Human recombinant oviductin was obtained from Origene Technologies....". Strangely, the paragraph continues by saying that the recombinant proteins were produced by transfection in HEK293T...". If recombinant human OVGP1 had already been obtained from Origene Technologies, why did the authors want to produce it again? It does not make sense.

      We briefly described the method that Origene used for the production of the human recombinant OVGP1

      (27) In lines 626 to 627 on page 18, it is stated that "Zonae pellucidae previously incubated with OVGP1 proteins from several species and murine oviductal fluid...". Were the zonae pellucidae previously incubated with only murine oviductal fluid or also with others?

      It was only incubated with OVGP1 or with oviductal fluid, this is now clarified in the text.

      (28) In lines 638 and 639 on page 19, can the authors please explain the difference between "endogenous OVGP1 and bOVGP1" and "exogenous recombinant hOVGP1 and mOVGP1"?

      This is now clarified

      (29) As stated in lines 676 to 679 on page 20, statistical analysis was performed in the study. Strangely, no "n" numbers and p values were provided in any of the figures that require statistical analysis. This is problematic.

      Statistical analysis and significant differences are now included in the figures; all the numbers used are included in the supplementary tables related to the figures.

      There are also many errors noted in the Figure Legends. These concerns raise questions regarding the reliability of the findings and interpretation of the results. Some major ones that require attention are listed below:

      (30) Figure legend 1 on page 27: In line 912, where did the "cat sperm" come from? In line 913, where did the "feline sperm" come from? In line 918, as pointed out earlier, the term "empty zona penetration test (EZPT)" is a misnomer and should be replaced with a better term. In line 924, it is stated that "Note sperm only appear outside the zona." However, no sperm can be seen outside the zona pellucida shown in Figure 1.

      Cat sperm is used in this manuscript. The term EZPT is now clarified. The sentence about sperm outside of the ZP has been removed.

      (31) Figure legend 2 on page 27 (lines 928 to 940) needs to be rewritten. Some of the sentences are not clearly written. Authors, please check all the capital labeling letters some of which appear to be wrong.

      Done

      (32) As is written, Figure legend 3 on pages 28 and 29 (lines 943 to 959) presents many problems:

      a. Contrary to what is stated in the figure legend, not all five regions are present in the hOVGP1, mOVGP1, and bOVGP1.

      Done

      b. Contrary to what is stated in line 946, region D is not conserved in the mouse and bull as shown in Figure 3A, and region C is not present only in the mouse.

      Done

      c. Based on what is shown in Figure 3A, region E is present only in the mouse and not in the human.

      Done

      d. What is stated in line 951 that "Proteins were expressed in mammalian cells..." is not correct. Based on the information provided in the manuscript, recombinant human OVGP1 was obtained from Origene Technologies and was not expressed in mammalian cells as claimed.

      All the recombinant proteins were produced in mammalian cells.

      (33) Figure legend 6 on page 28: In lines 985 to 986, what do the authors mean by "...and combinations of the three oviductins with sperm of the three species."? As is written, it appears that the bovine ZPs were pretreated with a combination of all three oviductins and then co-incubated with sperm from the bull, mouse and human together.

      We have clarified this sentence

      (34) What is described in the figure legend for the supplemental figure (Figure S7) does not make sense.

      Legend of Fig S7 (now S8) is related to pictures A to E, the legend is now clarified.

      (35) In addition to the figures and supplemental figures provided in the manuscript, there is also an additional figure labeled with "Model" showing three diagrams. Strangely, there is no mention of this additional figure in the manuscript. There is no figure legend for or description of this figure. It is not clear what is being shown in this figure, and it is not clear about the purpose of the use of this figure.

      We have included a legend to the model that is now Figure 10.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, a chromosome-level genome of the rose-grain aphid M. dirhodum was assembled with high quality, and A-to-I RNA-editing sites were systematically identified. The authors then demonstrated that: 1) Wing dimorphism induced by crowding in M. dirhodum is regulated by 20E (ecdysone signaling pathway); 2) an A-to-I RNA editing prevents the binding of miR-3036-5p to CYP18A1 (the enzyme required for 20E degradation), thus elevating CYP18A1 expression, decreasing 20E titer, and finally regulating the wing dimorphism of offspring.

      Strengths:

      The authors present both genome and A-to-I RNA editing data. An interesting finding is that an A-to-I RNA editing site in CYP18A1 ruins the miRNA binding site of miR-3036-5p, and that loss of miR-3036-5p regulation leads to less 20E and winged offspring.

      Weaknesses:

      How crowding represses the miR-3036-5p is still unclear.

      Reviewer #2 (Public Review):

      Summary:

      Environmental influences on development are ubiquitous, affecting many phenotypes in organisms. However molecular genetic and cellular mechanisms transducing environmental signals are still only barely understood. This study examines part of one such intracellular mechanism in a polyphenic (or dimorphic) aphid.

      Strengths:

      While other published reports have linked phenotypic plasticity to RNA editing before, this study reports such an interaction in insects. The study uses a wide array of molecular tools to identify connections upstream and downstream of the RNA editing to elucidate the regulatory mechanism, which is illuminating.

      Weaknesses:

      While this system is intriguing, this report does not foster confidence in its conclusions. Many of the analyses seem based on very small sample sizes. It is itself problematic that sample sizes are not obvious in most figures, although based on Methods section covering RNAseq, they seem to be either 3, 6 or 9, depending on whether stages were pooled, but that point is not made clear. With such small sample sizes, statistical tests of any kind are unreliable. Besides the ambiguity on sample sizes, it's unclear what error bars or whiskers show in plots throughout this study. When sample sizes are small estimates of variance are not reliable. Student's t-test is not appropriate for comparisons with such small sample sizes. Presently, it is not possible to replicate the tests shown in Figures 3, 4 and 6. (Besides the HT-seq reads, other data should also be made publicly available, following the journal's recommendations.) Regardless, effect sizes in some comparisons (Fig 3J, 4A-C, 6E, H) are clearly not large, making confidence in conclusions low. The authors should be cautious about over-interpreting these data.

      We very much appreciate the time the reviewers spent on our manuscript and their valuable suggestions and comments.

      To Reviewer #1:

      At present, research on miRNAs mainly focuses on their role in gene regulation through binding to the mRNA of target genes; how miRNAs themselves are regulated has received less attention.

      Recent research indicates that miRNA expression is itself regulated at the transcriptional or post-transcriptional level. Transcriptional regulation, including changes in the promoters of miRNA genes, and post-transcriptional mechanisms, such as changes in miRNA processing and stability, can both affect the final expression level of miRNAs.

      This article did not address how crowding treatment regulates miRNA expression. But this will be a very interesting issue, and we will pay attention to it in our future research.

      Thank you for this suggestion.

      To Reviewer #2:

      (1) “Transgenerational wing dimorphism was observed in M. dirhodum in which crowding of the parent (100 mother aphids in a 10 cm³ tube) increased the winged offspring (Fig 3E).” In this experiment, more than 250 offspring per group were used to calculate the proportion of winged and wingless individuals: 277 in the normal group, 255 in the crowding group, and 272 in the crowding+20E group.

      “The RNAi-mediated knockdown of CYP18A1 and ADAR2 can significantly increase the titer of 20E (Fig. 4E) and reduce the number of winged offspring by 29.6% and 24.4% (Fig. 4F), respectively.” In this experiment, more than 245 offspring per group were used to calculate the proportion of winged and wingless individuals: 273 in the dsEGFP group, 248 in the dsCYP18A1 group, and 250 in the dsADAR2 group.

      “miR-3036-5p agomir and antagomir treatments could affect the proportion of winged offspring under normal conditions (Fig. 6F), but have no effect on the wing dimorphism of offspring under crowded conditions (Fig. 6L).” In this experiment, more than 235 offspring were used to calculate the proportion of winged and wingless individuals in each group.

      We therefore consider our conclusion that crowding treatment, A-to-I RNA editing, and miRNAs could affect the wing dimorphism of offspring in M. dirhodum to be reliable, because the number of aphids counted is sufficient.

      (2) Quantitative PCR was used to detect changes in the expression levels of CYP18A1 and ADAR2 after treatment with crowding, 20E, dsRNA, miRNA agomir and antagomir, and the results are shown in Fig. 3J, 4A-C, 5B, 6B, H, respectively. Five biological replicates (each containing more than 100 aphids) were used for each sample, which should be sufficient for qPCR experiments. Among these biological replicates, the differences in gene expression levels were relatively small.

      (3) The titer of 20E was measured after treatment with crowding, 20E, dsRNA, miRNA agomir and antagomir, and the results are shown in Fig. 3I, 4E, 6E, K, respectively. Eight biological replicates (each containing more than 100 aphids) were used for each sample.

      The number of biological replicates used in each analysis and the number of aphids included in each biological replicate have been added in the Materials and Methods section. Thank you very much for pointing out this important issue.

      Reviewer #1 (Recommendations For The Authors):

      Several questions:

      (1) This study was conducted on the rose-grain aphid M. dirhodum. However, pea aphid Acyrthosiphon pisum seems to be a better object in wing dimorphism and development studies. Have the authors also identified the A-to-I RNA editing on pea aphids or other aphids?

      Wheat is one of the main grain crops in China as well as in the world. Metopolophium dirhodum is one of the most important wheat aphids around China, and has posed a significant threat to grain production. The current study was conducted to determine the regulatory mechanism of wing dimorphism on M. dirhodum, which might be of great significance to better control this pest in wheat production.

      Surely the pea aphid offers more established experimental tools and genomic resources. However, with the development of high-throughput sequencing technology, the chromosome level genomes of many insect species have been assembled. That means any of various insects might be studied as a model species, and not limited to Drosophila melanogaster, Acyrthosiphon pisum, etc.

      We did not identify A-to-I RNA editing in pea aphids or other aphids. A recent study has shown that editing events are poorly conserved across different Xenopus species. Even sites that are detected in both X. laevis and X. tropicalis show largely divergent editing levels or developmental profiles. In protein-coding regions, only a small subset of sites that are found mostly in the brain are well conserved between frogs and mammals. The conservation of RNA editing in aphids is still unknown, and we will continue to pay attention to this issue in our future research.

      Reference: Nguyen TA, Heng JWJ, Ng YT, Sun R, Fisher S, Oguz G, Kaewsapsak P, Xue S, Reversade B, Ramasamy A, Eisenberg E, Tan MH. Deep transcriptome profiling reveals limited conservation of A-to-I RNA editing in Xenopus. BMC Biology. 2023, 21(1):251.

      (2) "Two miRNA-target prediction software programs, miRanda and RNAhybrid, were used to identify the miRNAs that potentially act on CYP18A1. The results showed that miR-3036-5p could bind to the sequence containing edited position (editing site 528) of CYP18A1 in M. dirhodum." Is there any other miRNA that can also act on CYP18A1, thereby regulating its expression?

      The prediction results indicate that several other miRNAs can act on CYP18A1, but none of them can bind to this editing site (editing site 528). Therefore, we did not pursue the other miRNAs.

      (3) 11678 A-to-I RNA-editing sites were systematically identified in M. dirhodum. Does that mean RNAi-mediated knockdown of ADAR2 may affect the RNA-editing and expression of a large number of genes? Please clarify.

      It is of course possible that RNAi-mediated knockdown of ADAR2 affects the RNA editing and expression of a large number of genes. A-to-I RNA editing was also observed in five other genes involved in the 20E biosynthesis and signaling pathways, but no evident difference in the RNA editing or expression levels of these five genes was identified after crowding treatment (Fig. S2, Table S5). This suggests that the A-to-I RNA editing of CYP18A1 might be crucial in 20E-mediated wing dimorphism in M. dirhodum.

      (4) It is interesting that "the transcriptional level of ADAR2 was 2.19 fold higher in the crowding+20E treatment parent than that in the normal group, but no significant difference was identified between the crowding and normal groups". ADAR2 can be induced by 20E, rather than crowding. How should the author explain? It seems that 20E induction can also cause many RNA editing events.

      20-hydroxyecdysone (20E) can affect the growth and development, molting, metamorphosis, and reproductive processes of insects. According to this result, 20E induction can also cause RNA editing events by regulating the expression of ADAR2, which may provide a valuable reference for future studies on 20E. We will continue to pay attention to this issue in our future research.

      (5) Authors provided a lot of text to describe the genome assembly. I don't think it's necessary, authors can make appropriate deletions.

      Thank you for this suggestion. This is the first high-quality chromosome-level genome of M. dirhodum, which will be very helpful for the cloning, functional verification, and evolutionary analysis of genes in this important species or even other Hemiptera insects. Therefore, I think it is necessary to provide a detailed description. We will also make appropriate deletions in the “Result and Discussion” sections.

      Reviewer #2 (Recommendations For The Authors):

      Additional concerns

      - With an existing genome sequence available for the pea aphid *Acyrthosiphon pisum*, why have these authors chosen to use the rose-grain aphid for this study? It would be helpful to address any limitations in *Acyrthosiphon pisum* or advantages in *Metopolophium dirhodum* that explain that decision.

      Wheat is one of the main grain crops in China as well as in the world. Metopolophium dirhodum is one of the most important wheat aphids around China, and has posed a significant threat to grain production. The current study was conducted to determine the regulatory mechanism of wing dimorphism on M. dirhodum, which might be of great significance to better control this pest in wheat production.

      Surely the pea aphid offers more established experimental tools and genomic resources. However, with the development of high-throughput sequencing technology, the chromosome level genomes of many insect species have been assembled. That means any of various insects might be studied as a model species, and not limited to Drosophila melanogaster, Acyrthosiphon pisum, etc.

      - In Figure 5E, what anatomy is being shown in FISH? Moreover, this represents a single sample. It would be preferable to include a supplemental figure with comparable images from at least 3 additional specimens.

      It is the whole aphid body, and we have uploaded two additional FISH images to the supplementary material (Fig. S5). Thank you for this suggestion.

      - L190: Conservation alone seems inadequate to conclude that a chromosome functions as a sex chromosome. It would be fine to note the homology between Chr1 and the X of other Aphidini, but there are other explanations for that. Inference that Chr 1 is a sex chromosome might come from observations in karyotypes (by relative size comparisons or ideally from FISH) or from comparison of reads mapped to the chromosomes, suggesting Chr1 is hemizygous in males.

      Karyotype analysis experiment was not conducted in this research, so here the sex chromosome was determined based on chromosome homology between M. dirhodum and A. pisum genome. We have made appropriate modifications to the description in the article. Thank you for this suggestion.

      - L205: It's unclear to me how to interpret RNA editing results, based on RNAseq data, that map to "intergenic regions", especially when this is such a large fraction (37.3%) of the total result. Does this suggest a fundamental problem with the analysis, that so much RNAseq data maps to parts of the genome that are not annotated as genes?

      Non-coding RNA regions often account for a large proportion of the genome, and this RNA-seq data (37.3%) maps to non-coding RNA transcription regions located between protein-coding genes (intergenic regions).

      - L288-290: What degrees of confidence are attached to the predictions of these miRNA targets?

      There is no clear research indicating the accuracy of miRNA target prediction software. However, by comprehensively utilizing multiple prediction tools and experimental verification, the accuracy and reliability of prediction can be significantly improved.

      Actually, the prediction of miRNA targets is only a preliminary identification step, and we have subsequently demonstrated that miR-3036-5p can act on CYP18A1 through dual-luciferase reporter assay, RNA immunoprecipitation and FISH, etc.

      - L296-298: The mechanism proposed in this study seems to imply that miR-3036-5p should be absent (not expressed) in aphids under crowded conditions. Therefore, relative realtime PCR is not particularly useful here. Finding that the miR relative expression is reduced by 48.8% is meaningless, because in *relative* expression, zero has no special meaning. In this case, absolute quantitative PCR, measuring actual transcript numbers, would be far more informative.

      miR-3036-5p is not absent in aphids under crowded conditions. Only a significant decrease of miR-3036-5p in expression level under crowded conditions was identified compared to normal feeding conditions (Fig. 5B). So it should be reasonable to use relative quantitative methods for expression level analysis.
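
      A minimal sketch of what a relative measurement reports, assuming the common 2^-ΔΔCt calculation (the response does not state which formula was used). The function name, the reference-gene normalization, and the Ct values in the usage note are hypothetical illustrations, not data from the study.

```python
def relative_expression(ct_target_treated: float,
                        ct_ref_treated: float,
                        ct_target_control: float,
                        ct_ref_control: float) -> float:
    """Fold change of a target in treated vs. control samples (2^-ddCt).

    Each Ct is normalized to a reference gene measured in the same sample.
    All argument names are illustrative assumptions.
    """
    delta_treated = ct_target_treated - ct_ref_treated    # dCt, treated
    delta_control = ct_target_control - ct_ref_control    # dCt, control
    delta_delta = delta_treated - delta_control           # ddCt
    return 2.0 ** (-delta_delta)
```

      For example, `relative_expression(25.0, 18.0, 24.0, 18.0)` gives 0.5, a 50% reduction relative to control. A 48.8% reduction corresponds to a fold change of 0.512, which is well above zero; a relative measurement can show such a reduction but does not by itself give transcript copy numbers.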

      - L361: Isn't alternative mRNA splicing a more common post-transcriptional modification?

      I'm very sorry, this sentence has been modified to “A-to-I RNA editing is one of the most prevalent forms of posttranscriptional modification in animals, plants, and other organisms.” Thank you for this suggestion.

      - L372: "Functional wing polymorphism is commonly observed in insects as a form of adaptation and a source of variation for natural selection (14)." The relationship between plastic phenotypic variation and natural selection is complex, and there is a large theoretical literature in evolutionary biology and evo-devo on this topic, but it is not a focus in the cited review by Zhang et al.. It would be helpful if the authors could expand on this idea with reference to some of this literature (e.g. Levins 1968; Harrison 1980; Moran 1992; Roff 1996; West-Eberhard 2003; Zera 2009).

      I have changed the citation and expanded on this idea. “Wing polymorphism is commonly observed in insects, resulting from variation in both genetic factors and environmental factors (Zera 2009).”

      - L404: Use of the word "accurate" seems inappropriate in this context. Both morphs are equally "accurate".

      This sentence has been modified to “resulting in the alteration of CYP18A1 expression and wing dimorphism of offspring regulated by miR-3036-5p”. Thank you for this suggestion.

      - L412: Reference 67 seems irrelevant to this point.

      References have been changed and added.

      67. E.J. Duncan, C.B. Cunningham, P.K. Dearden. Phenotypic plasticity: what has DNA methylation got to do with it? Insects. 13(2):110 (2022).

      68. K.J. Rangan, S.L. Reck-Peterson, RNA recoding in cephalopods tailors microtubule motor protein function. Cell 186, 2531-2543 (2023).

      - L443: Is this referring to "mixed stage" aphids?

      Yes. To make it clearer, this sentence has been modified to “Approximately 200 mg of fresh M. dirhodum with mixed stages (including first- to fourth-instar nymphs and winged and wingless adults)”.

      - L483: What mass or number of individual aphids was used? I assume multiple individuals were pooled?

      Each sample contains approximately 200 aphids.

      - L499: Why was k = 17 used? The default is k = 21.

      The selection of k is usually an odd number between 15 and 21, which ensures that the types of k-mers can cover the genome while being small enough to avoid erroneous effects. Therefore, using 17 is very reasonable.
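
      One way to make this trade-off concrete: there are 4^k possible DNA k-mers, and k is typically chosen so that this number far exceeds the genome size, which keeps most genomic k-mers unique. The sketch below is illustrative only; the function names are hypothetical, only the forward strand is counted, and it is not the k-mer counter used in the study.

```python
from collections import Counter


def kmer_space(k: int) -> int:
    """Number of distinct DNA k-mers of length k (four bases per position)."""
    return 4 ** k


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (forward strand only)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
```

      Here `kmer_space(17)` is 17,179,869,184 (about 1.7 × 10^10), and `count_kmers("ACGTACGTAC", 3)` yields eight k-mers in total.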

      - L574: what does it mean "multiple editing types"? What different types are possible? Are you referring to things other than A-to-I editing?

      This means that, besides A-to-I, a locus may also show other editing types, such as A-to-C. Loci where this occurs are discarded.
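
      The filter described here can be sketched as follows. Because inosine is read as guanosine during sequencing, A-to-I editing appears as an A-to-G mismatch. The tuple layout, function name, and example calls are assumptions for illustration; strand handling (T-to-C on the reverse strand) and the authors' actual thresholds are omitted.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# Hypothetical record layout: (chromosome, position, reference base, observed base)
Call = Tuple[str, int, str, str]


def keep_single_type_a_to_g(calls: Iterable[Call]) -> List[Tuple[str, int]]:
    """Return sites whose only observed mismatch type is A>G.

    A site that also shows another mismatch (for example A>C) is discarded.
    """
    observed = defaultdict(set)
    for chrom, pos, ref, alt in calls:
        observed[(chrom, pos, ref)].add(alt)
    return sorted(
        (chrom, pos)
        for (chrom, pos, ref), alts in observed.items()
        if ref == "A" and alts == {"G"}
    )
```

      With the made-up input `[("chr1", 100, "A", "G"), ("chr1", 200, "A", "G"), ("chr1", 200, "A", "C")]`, only `("chr1", 100)` is kept, because position 200 shows two mismatch types.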

      - L635: Which luciferase construct or plasmid has been used in this experiment? Citation to that source is necessary.

      PmirGLO vector (Promega, Leiden, Netherlands) was used in this experiment, and a reference has been added.

      B. Zhu, L. Li, R. Wei, P. Liang, X. Gao. Regulation of GSTu1-mediated insecticide resistance in Plutella xylostella by miRNA and lncRNA. PLoS Genetics. 17(10), e1009888 (2021).

      - L644: Did cDNA synthesis employ random primers or a poly-dT primer?

      This kit provides mixed primers, including random and poly-dT primers. (PrimeScript™ RT reagent Kit with gDNA Eraser (Perfect Real Time), Takara Biotechnology, Dalian, China).

      - Fig 4D: Seems like this panel should be divided to cover the two sites, as in Fig 3F. Right now the x-axis labels seem redundant.

      Done. Thank you for this suggestion.

      - Fig 7: Consider adding ADAR2 to this figure.

      Done. Thank you for this suggestion.

      - Table 1: It would be helpful to represent this data in a figure where the phylogenetic relationships among the species can be shown.

      The phylogenetic relationships among the species were shown in Fig. 1D, and the table here may present genome information in more detail.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review)

      This paper focuses on secondary structure and homodimers in the HIV genome. The authors introduce a new method called HiCapR which reveals secondary structure, homodimer, and long-range interactions in the HIV genome. The experimental design and data analysis are well-documented and statistically sound. However, the manuscript could be further improved in the following aspects.

      Major comments:

      (1) Please give the full name of an abbreviation the first time it appears in the paper, for example, in L37, "5' UTR" "RRE".

      Thank you for your suggestion. We have added the full name of these abbreviations.

      (2) The introduction could be strengthened by discussing the limitations of existing methods for studying HIV RNA structures and interactions and highlighting the specific advantages of the HiCapR method.

      Thank you for your insightful suggestion. We have modified sentences in the introduction section (lines 66-71 and 80-81 in the revised manuscript).

      (3) Please reorganize Results Part 1.

      Thank you for your advice. We have reorganized Results Part 1. We hope the revision provides a logical flow and clarity to the results, making it easier for readers to follow the progression of the study and the significance of the findings regarding the HiCapR method.

      (4) Is there any reason that the authors mention "genome structure of SARS-CoV-2" in L95?

      Thank you for your insightful question. We have deleted this sentence in the revised paper.

      Initially, the mention of our previous work on SARS-CoV-2 serves two purposes: firstly, to demonstrate our capability to perform proximity ligation assays on viral samples; and secondly, to underscore the necessity of the hybridization step, which is particularly relevant for the study of HIV.

      Unlike SARS-CoV-2, which is highly abundant in infected cells and does not require post-library hybridization, HIV-1 presents a unique challenge due to its typically low viral RNA input within cells. The simplified SPLASH protocol, while effective for more abundant viral RNAs, does not provide the necessary coverage for high-resolution analysis when applied directly to HIV samples.

      Now, we have deleted this sentence according to your comments, and discuss the technical difference elsewhere.

      (5) L102: Please clarify the purpose of comparing "NL4-3" and "GX2005002." Additionally, could you explain what NL4-3 and GX2005002 are? The connection between NL4-3, GX2005002, and HIV appears to be missing.

      Thank you for your question, and we apologize for the confusion. "NL4-3" and "GX2005002" are two distinct HIV-1 strains that exhibit different prevalence patterns in various geographical regions. The NL4-3 strain is a well-characterized laboratory strain that is widely used in HIV research and is representative of the HIV-1 subtype B, which is highly prevalent in Europe and the Americas. On the other hand, GX2005002 is a primary isolate of the CRF01_AE subtype, which is one of the most prevalent strains in Southeast Asia, particularly in China.

      The reason for comparing these two strains in our study is twofold. Firstly, it allows us to assess the applicability and versatility of our HiCapR method across different HIV-1 strains that may have distinct genetic and structural features. This is crucial for understanding the potential broad utility of our method in studying various HIV-1 strains globally. Secondly, by comparing these strains, we can begin to elucidate any strain-specific differences in RNA structure, homodimer formation, and long-range interactions, which may have implications for viral pathogenesis, transmission, and response to therapeutic interventions.

      The connection between NL4-3, GX2005002, and HIV lies in their representation of different subtypes of the HIV-1 virus, which exhibit genetic diversity and are associated with different geographical distributions. This diversity is epidemiologically and clinically relevant, as it may be associated with different pathogenesis and resistance mechanisms, and might have implications for vaccine development and treatment strategies.

      (6) Figure 1A is not able to clearly present the innovation point of HiCapR.

      Thank you for your comment. We have revised this figure to more clearly illustrate the steps and principles of the post-library capture process using HIV pooled probes hybridization and streptavidin pull down to enrich HIV RNA-derived chimeras.

      (7) Please compare the contact metrics detected by HiCapR and current techniques like SHAPE on the local interactions to assess the accuracy of HiCapR in capturing local RNA interactions relative to established methods.

      Thank you for your request to compare the contact metrics detected by HiCapR and current techniques like SHAPE on local interactions to assess the accuracy of HiCapR in capturing local RNA interactions relative to established methods.

      In this study, HiCapR has demonstrated its ability to identify key structural elements within the HIV genome, including TAR, polyA, SL1, SL2, and SL3, as well as the polyA-SL1 in the monomeric conformation. These elements are crucial for understanding the local RNA structures involved in HIV replication and pathogenesis. By visualizing the base pairing probability as a heatmap, we have identified the most stable base pairs in the 5’ UTR of HIV, which is consistent across both NL4-3 and GX2005002 strains (Figure 2D). This consistency suggests robustness in the overall structure despite sequence variations and alternative RNA conformations, indicating a high level of agreement between HiCapR and SHAPE methods in detecting local interactions.

      Furthermore, HiCapR not only confirms the presence of known structural elements but also reveals alternative conformations of the 5'UTR that support the alternative conformations found in SHAPE analysis. This additional layer of information provides a more comprehensive view of the RNA structures, highlighting HiCapR's ability to capture local RNA interactions with a high degree of accuracy comparable to established methods like SHAPE.

      (8) The paper needs further language editing.

      We have thoroughly revised the paper. We hope it’s improved significantly.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript "Mapping HIV-1 RNA Structure, Homodimers, Long-Range Interactions and persistent domains by HiCapR" Zhang et al report results from an omics-type approach to mapping RNA crosslinks within the HIV RNA genome under different conditions i.e. in infected cells and in virions. Reportedly, they used a previously published method which, in the present case, was improved for application to RNAs of low abundance.

      Their claims include the detection of numerous long-range interactions, some of which differ between cellular and virion RNA. Further claims concern the detection and analysis of homodimers.

      Strengths:

      (1) The method developed here works with extremely little viral RNA input and allows for the comparison of RNA from infected cells versus virions.

      (2) The findings, if validated properly, are certainly interesting to the community.

      Thank you for your comprehensive review and insightful comments on our manuscript. We appreciate your recognition of the strengths of our HiCapR method and the potential interest of our findings to the scientific community.

      Weaknesses:

      (1) On the communication level, the present version of the manuscript suffers from a number of shortcomings. I may be insufficiently familiar with habits in this community, but for RNA aficionados just a little bit outside of the viral-RNA-X-link community, the original method (reference 22) and the presumed improvement here are far too little explained, namely in something like three lines (98-100). This is not at all conducive to further reading.

      Thank you for your feedback on the clarity of our manuscript, particularly regarding the explanation of the HiCapR method and its improvements over the original method mentioned in reference 22.

      In response to your feedback, we expand on the description of the HiCapR method in the revised manuscript to ensure that it is accessible to a broader audience. We will provide a more thorough comparison between HiCapR and the original method, detailing the specific improvements and how they enable the analysis of low-abundance viral RNAs like HIV. This will include:

      Post-library Hybridization: Unlike the original method, HiCapR incorporates a post-library hybridization step. This innovation allows for the capture of target RNA involved in interactions after library construction, offering additional flexibility and enhancing the resolution of the analysis.

      Enhanced Sensitivity: HiCapR has been optimized to work with extremely low viral RNA input, which is a significant advancement over the original method. This is crucial for studying viruses like HIV, where obtaining high quantities of viral RNA can be challenging. As a matter of fact,

      (2) Experimentally, the manuscript seems to be based on a single biological replicate, so there is strong concern about reproducibility.

      Thank you for raising the issue of reproducibility in our study. We understand the importance of experimental replication in ensuring the reliability of our findings. In response to your concern, we would like to provide the following clarification and additional details regarding the reproducibility of our HiCapR experiments:

      Replicates in HiCapR Experiments: All ligation and control samples in our HiCapR experiments were performed in three biological replicates. This was done to ensure the high reproducibility of our results. The high degree of correlation (r > 0.99) between these replicates underscores the reliability of our findings.
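
      As an illustration of the replicate comparison described above, the sketch below computes pairwise correlations between replicates. It assumes that r denotes a Pearson coefficient calculated on per-interaction read counts (the response does not state which statistic or which features were used), and the counts shown are hypothetical placeholders, not data from the study.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def pairwise_replicate_correlations(replicates):
    """Return {(i, j): r} for every pair of replicates (i < j)."""
    return {
        (i, j): pearson(replicates[i], replicates[j])
        for i, j in combinations(range(len(replicates)), 2)
    }


# Hypothetical read counts for the same five interactions in three replicates.
example_replicates = [
    [120, 45, 300, 8, 77],
    [118, 50, 310, 9, 70],
    [125, 43, 295, 7, 80],
]
for pair, r in pairwise_replicate_correlations(example_replicates).items():
    print(pair, round(r, 4))
```

      Reporting all pairwise coefficients, rather than a single summary value, shows whether any one replicate deviates from the others.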

      Dimer Validation Experiments: To validate the dimer formation of RRE and 5’-UTR, we employed multiple independent methods, including Native agarose gel electrophoresis, Agilent 4200 TapeStation Capillary electrophoresis, and Biomolecular Binding Kinetics Assays. These methods provide complementary perspectives on the dimer formation, enhancing the robustness of our validation process. The data presented in Figure 3C and Supplementary figure S12 are representative results from these experiments, which consistently support our findings on dimer formation.

      Agreement Between Cellular and Virion RNA: Our study also demonstrates a significant similarity between virions in the supernatant and infected cells from the same viral strain, as shown in Supplementary Figure S3. This consistency further validates the reproducibility and reliability of our HiCapR method in capturing RNA structures and interactions under different conditions.

      Consistency across two strains: Our study includes a comprehensive analysis of two distinct HIV-1 strains, NL4-3 and GX2005002, which are prevalent in Europe and Southeast Asia, respectively. The consistency in our findings across these strains serves as a strong indicator of the reproducibility and general applicability of our HiCapR method. Specifically, the presence of key structural elements such as TAR, polyA, SL1, SL2, and SL3 in both the NL4-3 and GX2005002 strains suggests a robust structural framework that is conserved across different strains, despite sequence variations. Additionally, our study reveals approximately 20 candidate dimer peaks along the genome that are conserved between the NL4-3 and GX2005002 strains. The conservation of these dimer peaks across strains indicates a reproducible pattern of dimerization.

      (3) The authors perform an extensive computational analysis from a limited number of datasets, which are in thorough need of experimental validation.

      Thank you for your comment.

      In response to your concern, we would like to clarify that while our manuscript does present an extensive computational analysis, we have also conducted a series of experiments. Specifically, we have validated dimer formation using multiple independent methods (as discussed above).

      Given the time-consuming nature of additional experiments, we have chosen to share the HiCapR data with the community in a timely manner. This approach allows for more immediate communication and evaluation of the data on HIV structure, which we believe is valuable for advancing the field.

      We are committed to further investigating the functional implications of our structural findings. We plan to conduct more experiments to explore the functional linking between the structural insights of HIV, which will help to deepen our understanding of the virus's replication and potential antiviral strategies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I suggest a major revision of the manuscript.

      Minor comments:

      (1) The article lacks consistency in its presentation. The expression of the proper noun is wrong in the paper. For example, (a) L89, "RNA:RNA interaction" →RNA-RNA interaction; (b) L431, "SARS-COV-2" → SARS-CoV-2;

      We are sorry for the inconsistency. We have corrected the mistakes.

      (2) "We identified dimers based on the methodology described in23." is not a complete sentence.

      Thank you for your insightful comment. We have revised the sentence to provide a complete and clear description of our methodology. The revised sentence is as follows: "Homodimers were identified in accordance with the methods previously reported in the literature."

      Reviewer #2 (Recommendations for the authors):

      (1) The authors perform an extensive computational analysis from a limited number of datasets, which are in thorough need of experimental validation. There is a single series of in vitro validation experiments on the interaction of a homodimerization site, described in five lines (278-283) plus the Figure panel 3c with a very brief legend, and an extremely minimalist Figure S12. The panel to Figure 3c contains Kd values which have not been assessed for significant digits.

      Thank you for your constructive feedback on our manuscript.

      We acknowledge that our computational analysis is based on a limited number of datasets. Due to the initial exploratory nature of our study and the logistical challenges of generating additional datasets, we have focused on in-depth analysis of the available data. We are currently working on further validating our findings and are committed to publishing these results in a follow-up study.

      Regarding Experimental Validation:

      We agree that the initial description of our in vitro validation of the homodimerization site was concise. To address this, we have expanded the description of our experimental procedures. Specifically, we have detailed the methods used for the in vitro transcription, the preparation of RNA samples, and the use of the Octet R8 platform for biomolecular binding kinetics assays.

      For the Kd values presented in Figure 3c, we have now included the standard error of the mean and have defined the significant digits in the figure legend. This revision provides a more accurate representation of the binding affinities.
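
      One common convention for choosing significant digits, shown here only as a sketch, is to round a fitted value to the decimal place of the leading digit of its standard error. The response does not say which convention was applied in the figure legend, and the numbers below are hypothetical, not the Kd values from Figure 3c.

```python
import math


def round_to_uncertainty(value, sem, sig=1):
    """Round value and its standard error to `sig` significant digits of the error."""
    if sem <= 0:
        raise ValueError("standard error must be positive")
    # Decimal exponent of the leading digit of the standard error.
    exponent = math.floor(math.log10(sem))
    decimals = sig - 1 - exponent
    return round(value, decimals), round(sem, decimals)


# Hypothetical fitted affinity and standard error (arbitrary units).
value, sem = round_to_uncertainty(12.3456, 0.27)
print(f"Kd = {value} +/- {sem}")
```

      Digits beyond the precision of the standard error carry no information, so they are dropped from the reported value.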

      (2) As a further example to be experimentally validated, splice sites are discussed after lines 354, for which unsophisticated validation techniques such as targeted RT-PCR are widely accepted.

      In response to your comment, we would like to clarify that the splice sites mentioned in our study are well-established and widely recognized in the literature. They have been previously characterized and are considered canonical within the HIV research community. Given their established nature, we have relied on this foundational knowledge in our analysis.

      However, we concur with the importance of validating the regulatory role of homodimers in splicing, which is a significant aspect of HIV biology. While we have provided evidence for the presence of these homodimers and their potential implications for splicing, we acknowledge the need for further functional studies to elucidate their mechanistic role.

      Due to the scope and length constraints of the current manuscript, we have chosen to focus on the structural and interaction analyses provided by HiCapR. The functional validation of these homodimers and their impact on splicing will be pursued in subsequent studies, which we plan to initiate promptly. We believe that a dedicated follow-up study will allow for a more in-depth exploration of this complex and important aspect of HIV gene regulation.

      We are committed to advancing our understanding of the functional significance of these homodimers in the context of HIV splicing and will ensure that this line of investigation is thoroughly addressed in our future work.

      Thank you again for your valuable feedback. We look forward to contributing further to the field with our ongoing research.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      “This work presents valuable data demonstrating that a camelid single-domain antibody can selectively inhibit a key glycolytic enzyme in trypanosomes via an allosteric mechanism. The claim that this information can be exploited for the design of novel chemotherapeutics is incomplete and limited by the modest effects on parasite growth, as well as the lack of evidence for cellular target engagement in vivo.”

      We agree with this assessment. In this re-worked version, we implemented the textual changes suggested by the reviewers and performed additional in silico work. The reviewers also presented valuable suggestions for additional experiments. However, we currently don’t have dedicated hands and funding for this project, which renders it impossible for us to perform additional “wet lab” experiments at this stage. We have thus not included new experimental “wet lab” data. Finally, the claim that our results may be exploited for the design of novel chemotherapeutics perhaps came across more strongly than we intended. We still believe our findings indicate a potential for such an endeavor, but this clearly requires further investigation and experimental evidence. We have softened this statement by removing it from the abstract and have edited the discussion to end as follows.

      “Based on the presented results, we propose that sdAb42 may pinpoint a site of vulnerability on trypanosomatid PYKs that could potentially be exploited for the design of novel chemotherapeutics. Indeed, antibodies (or fragments thereof) are valuable drug discovery tools. Antibodies (and camelid sdAbs especially) are known for their ability to "freeze out" specific conformations of highly dynamic antigens, thereby exposing target sites of interest, which could be exploited for rational drug design (the development of so-called "chemo-superiors", (Lawson, 2012; Khamrui et al., 2013; van Dongen et al., 2019)). While the design of a "chemo-superior" inspired on the sdAb42-mediated allosteric inhibition mechanism will require further investigation, the results presented here provide a foundation to fuel such an endeavour.”

      REVIEWER 1:

      Summary:

      The authors identified nanobodies that were specific for the trypanosomal enzyme pyruvate kinase in previous work seeking diagnostic tools. They have shown that a site involved in the allosteric regulation of the enzyme is targeted by the nanobody and have used elegant structural approaches to pinpoint where binding occurs, opening the way to the design of small molecules that could also target this site.

      Strengths:

      The structural work shows the binding of a nanobody to a specific site on Trypanosoma congolense pyruvate kinase and provides a good explanation as to how binding inhibits enzyme activity. The authors go on to show that expressing the nanobodies within the parasites gives some inhibition of growth; although this effect is rather weak, they make a case for how it could point to targeting the same site with small molecules as potential trypanocidal drugs.

      Weaknesses:

      The impact on growth is rather marginal. Although explanations are offered on the reasons for that, including the high turnover rate of the expressed nanobody and the difficulty in achieving the high levels of inhibition of pyruvate kinase required to impact energy production sufficiently to kill parasites, this aspect of the work doesn't offer great support to developing small molecule inhibitors of the same site.

      Recommendations for authors:

      Generally, the paper is very well written and the figures and their legends are clear.

      Comment 1.1: I thought the Introduction could give more focus to the need for new drugs for veterinary trypanosomiasis. The reality is that with fexinidazole now available and acoziborole soon to be available, with <1,000 cases of human African trypanosomiasis in each of the last five years, the case for needing new drugs is difficult to make. For Animal trypanosomiasis, however, the need for novel drugs is much more pressing.

      We agree with this comment and have included an additional section in the Introduction’s second paragraph, which reads as follows.

      “Hence, there is a need for alternative compounds, preferably with novel modes of action and/or designed based on mechanistic insights of the target’s structure-function relationship (Field et al., 2017; De Rycker et al., 2018). This need is especially pressing for AAT, which strongly impedes sustainable livestock rearing in Sub-Saharan Africa. AAT results in drastic reductions of draft power, meat, and milk production by the infected animals (small and large ruminants), and its control relies mainly on vector control and chemotherapy, with only few drugs currently available. The lack of routine field diagnosis has resulted in the misuse of trypanocidal drugs, thereby accelerating the rise of parasite resistance and further exacerbating the problem (Richards et al., 2021). As such, AAT-inflicted annual losses are estimated at around $5 billion (and the necessity to invest another $30 million each year to control AAT through chemotherapy), thereby having a devastating impact on the socio-economic development of Sub-Saharan Africa (Fetene et al., 2021). In contrast, HAT is perceived as a minor threat as it has reached a post-elimination phase as a public health problem with less than 1,000 yearly documented cases (Franco et al., 2022). In addition, new and effective drugs for HAT treatment have recently become available (De Rycker et al., 2023). HAT control currently relies on case detection and treatment, and vector control (Büscher et al., 2017).”

      Comment 1.2: A few pedantic things can be tidied up too. For example, on line 61 it is stated that tsetse flies are part of the life cycle for all trypanosomes, while some veterinary species, e.g. T. evansi and some T. vivax strains, use other biting flies for transmission. I'd also add in the Introduction that pyruvate kinase is not a glycosomal enzyme (it is covered in the legend to figure 1, but I think it is quite important to clarify in the Introduction too, so as to ensure readers aren't wondering if "intrabodies" can get targeted there).

      We agree with this comment and have included an additional section in the Introduction’s third paragraph to expand on the life cycles of African trypanosomes, which reads as follows.

      “African trypanosomes are extracellular parasites that have a bipartite life cycle involving insect vectors and mammals as hosts (Radwanska et al., 2018). Most HAT (T. brucei gambiense and T. b. rhodesiense) and AAT (T. b. brucei and T. congolense) causing trypanosomes are uniquely vectored by tsetse flies (Glossina spp.) and are confined to Sub-Saharan Africa. T. b. evansi and T. vivax (both causative agents of AAT) have expanded beyond the tsetse belt due to their ability to be mechanically transmitted by a variety of biting flies (Glossina, Stomoxys, and Tabanus spp.). Finally, T. b. equiperdum infects equids and represents an exception as it is transmitted directly from animal to animal through sexual contact.”

      The introduction now also explicitly mentions that pyruvate kinase is not a glycosomal enzyme.

      Comment 1.3: The introduction would also be a good place to include some more information on what is known about the allosteric effectors of pyruvate kinase in trypanosomes, and emphasize where gaps in knowledge exist too.

      We agree with this comment and have included an additional section in the Introduction’s third paragraph, which reads as follows.

      “Pyruvate kinase (PYK) represents another attractive glycolytic target. This non-glycosomal enzyme catalyses the last step of glycolysis (the irreversible conversion of phosphoenolpyruvate (PEP) to pyruvate; Figure 1A). The importance of this reaction is two-fold: i) the generation of ATP through the transfer of a phosphoryl group from PEP to ADP and ii) the formation of pyruvate, a crucial metabolite of the central metabolism. Like most PYKs, trypanosomatid PYKs are homotetramers. The PYK monomer is a ∼55 kDa protein organized into four domains termed ‘N’, ‘A’, ‘B’, and ‘C’ (Figure 1B). The A domain constitutes the largest part of the PYK monomer and is characterized by an (α/β)₈-TIM barrel fold that contains the active site. Together with the N-terminal domain, it is also involved in the formation of the PYK tetramer AA’ dimer interfaces. The B domain is known as the flexible ‘lid’ domain that shields the active site during enzyme-mediated phosphotransfer. Finally, the C domain harbors the binding pocket for allosteric effectors and stabilizes the PYK tetramer by formation of CC’ dimer interfaces. Because of their role in ATP production and distribution of fluxes into different metabolic branches, the activity of trypanosomatid PYKs is tightly regulated through an allosteric mechanism known as the "rock and lock" model (Morgan et al., 2010, 2014; Pinto Torres et al., 2020). In this model (which is detailed in Figure 1C), the binding of substrates and/or effectors (and analogs thereof) to the active and effector sites, respectively, triggers a conformational change from the enzymatically inactive T state to the catalytically active R state. Known effector molecules for trypanosomatid PYKs are fructose 2,6-bisphosphate (F26BP), fructose 1,6-bisphosphate (F16BP) and sulfate (SO₄²⁻), with F26BP being the most potent one (van Schaftingen et al., 1985; Callens and Opperdoes, 1992; Ernest et al., 1994; Tulloch et al., 2008). Interestingly, trypanosomatid PYKs seem to be largely unresponsive to the allosteric regulation of enzyme activity by free amino acids (Callens et al., 1991), which contrasts with human PYKs (Chaneton et al., 2012; Yuan et al., 2018). Known trypanosomatid PYK inhibitors impair enzymatic activity through occupation of the PYK active site (Morgan et al., 2011).”

      In the Results, although I am not qualified to analyse the structural data in detail I am confident in the ability of the authors to do so.

      Comment 1.4: Differences in nanobody binding kinetics to the T. congolense enzyme when compared to T. brucei and Leishmania enzymes are attributed to the relatively few amino acid differences in those sites. It is desirable to test site-directed mutagenesis of those residues.

      This is a highly valuable suggestion from the reviewer. However, we currently don’t have dedicated hands and funding for this project, which renders it impossible for us to perform additional experiments at this stage.

      Comment 1.5: In the section on slow-binding inhibition kinetics (lines 194-220) I found it difficult to follow whether it was just the R<>T transition that slowed nanobody inhibition, or whether competition with effectors at the site would also impact on those inhibition kinetics. Can this be clarified?

      Since the sdAb42 epitope is located relatively far away from both active and effector sites (~20 and ~40 Å, respectively), it seems highly unlikely the observed “slow-binding inhibition” kinetics are the result of a competition between sdAb42 on one hand and substrates and/or effectors on the other for enzyme binding. Instead, given that sdAb42 selectively binds and locks the enzyme’s inactive T state, these data can be explained by the idea that sdAb42 can only bind to trypanosomatid PYKs after having undergone an R- to T-state transition. To clarify this matter, we slightly reformulated the discussion as indicated below. We also included a small discussion on the observation that there is a 400-fold difference between the Kd and the IC50.

      “Since the sdAb42 epitope is located relatively far away from both active and effector sites (~20 and ~40 Å, respectively), it seems highly unlikely that the observed “slow-binding inhibition” kinetics are the result of a direct competition between sdAb42 and substrates and/or effectors. Instead, given that sdAb42 selectively binds and locks the enzyme’s inactive T state, these data can be explained by the idea that sdAb42 can only bind to trypanosomatid PYKs after having undergone an R- to T-state transition. An additional observation in this context is the 400-fold difference between the KD and IC50 values. Although we currently do not have a mechanistic explanation, similar differences have been observed for the sdAb-mediated allosteric inhibition of other kinases (Singh et al., 2022).”

      For the intrabody expression work, the reference cited on line 230 actually points to a growing ability to genetically modify T. congolense. However, it is justifiable to work on T. brucei given the much wider availability and advanced status of the genetic tools.

      The growth inhibition data shown in Figure 7 is weak, albeit significant and the case is made as to why that might be.

      Comment 1.6: The authors do point to the fact that inhibiting other parts of the glycolytic pathway might be helpful in getting a better growth inhibitory effect. It would be useful, in this regard, to test the ability of the PFK inhibitors in the Macnae et al. paper in the intrabody expressing line, and possibly other inhibitors e.g. 2-deoxy-D-glucose to see if these combinations do have the desired impacts. Also, looking at the metabolome of the intrabody expressors under induction could also give some further insights into changes in flux (although perhaps not on its own given the weak effects on the growth seen).

      This is a highly valuable suggestion from the reviewer. However, we currently don’t have dedicated hands and funding for this project, which renders it impossible for us to perform additional experiments at this stage. We would like to point out that, in our experience, studying the effect of enzyme inhibition on the metabolome is usually only useful shortly after the onset of inhibition. The system adapts to the lowered flux and relevant changes are mostly transient. Since the induced expression of sdAb42 is almost certainly slow, we expect the metabolic changes will be minimal.

      REVIEWER 2:

      Summary:

      In this work, the authors show that the camelid single-chain antibody sdAb42 selectively inhibits Trypanosome pyruvate kinase (PYK) but not human PYK. Through the determination of the crystal structure and biophysical experiments, the authors show that the nanobody binds to the inactive T-state of the enzyme, and in silico analysis shows that the binding site coincides with an allosteric hotspot, suggesting that nanobody binding may affect the enzyme active site. Binding to the T-state of the enzyme is further supported by non-linear inhibition kinetics. PYK is an important enzyme in the glycolytic pathway, and inhibition is likely to have an impact on organisms such as trypanosomes, which rely heavily on glycolysis for their energy production. The nanobody was generated against Trypanosoma congolense PYK, but for technical reasons the authors progressed to testing its impact on cell viability in Trypanosoma brucei brucei. First, they show that sdA42 is able to inhibit Tbb PYK, albeit with lower potency. Cell-based experiments next show that expression of sdA42 has a modest and dose-dependent effect on the growth rate of Tbb. The authors conclude that their data indicate that targeting this allosteric site affects cell growth and is a valuable new option for the development of new chemotherapeutics for trypanosomatid diseases.

      Strengths:

      The work clearly shows that sdA42A inhibits Trypanosome and Leishmania PYK selectively, with no inhibition of the human orthologue. The crystal structure clearly identifies the binding site of the nanobody, and the accompanying analysis supports that the antibody acts as an allosteric inhibitor of PYK, by locking the enzyme in its apo state (T-state).

      Weaknesses:

      (1) The most impactful claim of this work is that sdAb42-mediated inhibition of PYK negatively affects parasite growth and that this presents an opportunity to develop novel chemotherapeutics for trypanosomatid diseases. For the following reasons I think this claim is not sufficiently supported:

      Comment 2.1: The authors do not provide evidence of target-engagement in cells, i.e. they do not show that sdA42A binds to, or inhibits, Tbb PYK in cells and/or do not provide a functional output consistent with PYK inhibition (e.g. effect on ATP production). Measuring the extent of target engagement and inhibition is important to draw conclusions from the modest effect on growth.

      The authors do not explore the selectivity of sdA42A in cells. Potentially sdA42A may cross-react with other proteins in cells, which would confound interpretation of the results.

      We understand the reviewer’s concern. While it is theoretically possible that sdAb42 may non-specifically (cross-)react with other proteins in the cell, this would be highly unlikely based on the following arguments. First, many studies have employed sdAbs as intrabodies and reported on specific sdAb-mediated effects (outstanding reviews on the topic are Cheloha et al. (PMID 32868455) and Soetens et al. (PMID 33322697)). Second, it has been demonstrated that selecting sdAbs from an immune library through phage display or “bacteriomatch” (a bacterial system similar to yeast two hybrid) yields highly similar results (Pellis et al., PMID 22583807), thereby indicating that sdAbs interact specifically with their target antigens in an intracellular environment. Third, we identified TcoPYK as the target for sdAb42 by employing sdAb42 as bait in a pull-down from a parasite whole cell lysate (Pinto Torres et al., PMID 29899344). The pull-down fractions were analysed by SDS-PAGE and we observed a clear prominent band, which was further analysed by mass spectrometry and revealed TcoPYK as the target with great certainty. Even though the affinity of sdAb42 for TbrPYK is lower, it still remains high (nM affinity) and we expect it to bind TbrPYK with high specificity.

      Regarding measuring the effect on ATP production, we would like to state that such experiments are not obvious. Instead of measuring ATP levels, one should measure ATP turnover as ATP levels may not necessarily be decreased. The latter was observed to be the case for the specific inhibition of trypanosomal PFK (Nare et al. PMID 36864883). The specific trypanosomal PFK inhibitor inhibits motility (and growth) of T. congolense IL3000 at concentrations that only slightly affect ATP levels. One could think of repeating the sdAb42 experiments in a T. congolense model. However, T. congolense BSF metabolism is more complicated than that of T. brucei BSF. First, the T. congolense glucose metabolic network is more expanded, allowing a lower glucose consumption rate to produce ATP and metabolites for growth. Second, pyruvate is not excreted but further metabolised, in part in the mitochondrion. Steketee et al. (PMID 34310651) have shown that T. congolense also takes up pyruvate from the medium. One can thus check if (increased) external pyruvate (partially) rescues the growth inhibition by sdAb42. It will not provide proof, but maybe an indication. As mentioned above, we are currently unable to perform such additional experiments due to lack of dedicated hands and funding.

      Comment 2.2: sdA42A causes only minor growth inhibition in Tbb. The growth defect is used as the main evidence to support targeting this site with chemotherapeutics; however, based on the very modest effect on the parasites, one could reasonably claim that PYK is actually not a good drug target. The strongest effect on growth is seen for the high expressor clone in Figure 4a; however, here the uninduced cells show an unusual profile, with a sudden increase in growth rate after 4 days, something that is not seen for any of the other control plots. This unexplained observation accentuates the growth difference between induced and uninduced cells, and the growth differences seen in all other experiments, including those with the highest expressors (clones 54 and 55), are much more modest. The loss of expression of sdA42A over time is presented as a reason for the limited effect, and used to further support the hypothesis that targeting the allosteric site is a suitable avenue for the development of new drugs. However, strong evidence for this is missing.

      We agree that the growth effect of sdAb42 expression is modest, and we have provided several explanations as to why this could be the case. In addition, as mentioned at the start of this rebuttal, the claim that our results may be exploited for the design of novel chemotherapeutics was perhaps expressed more strongly than we intended. We still believe our findings indicate a potential for such an endeavor, but this clearly requires further investigation and experimental evidence, as mentioned by the reviewer.

      We, however, disagree that PYK would not be a good drug target. Its potential to serve as a drug target is related to its fundamentally important role in trypanosomal glycolysis and not to the features of sdAb42. Steketee et al. (PMID 34310651) have shown that glycolysis is essential for T. congolense BSF, despite a lower glycolytic flux than in T. brucei BSF. The T. congolense glucose metabolic network is more expanded, allowing a lower glucose consumption rate to produce ATP and metabolites for growth. Here too, PYK is thus almost certainly essential and, from that perspective, a good drug target.

      Comment 2.3: For chemotherapeutic interventions to be possible, a ligandable site is required. There is no analysis provided of the antibody binding site to indicate that small molecule binding is indeed feasible.

      We agree with the reviewer’s comment and have included APOP analysis on the TcoPYK T state crystal structure (see also reply to Comment 3.1). Briefly, APOP works by detecting pockets and then perturbing each pocket in the protein's elastic network (GNM) by adding stiffer springs between the surrounding residues. The pockets are scored and ranked based on the calculated shifts in the eigenvalues of the global GNM modes and their local hydrophobic densities, thereby also considering the pocket’s surface accessibility, which renders it suitable for the identification of allosteric (and druggable) pockets. The APOP analysis identifies pockets overlapping with the sdAb42 epitope as highly ranking allosteric ligand binding pockets. The data have been summarized in an additional supplementary figure (Figure 4 – figure supplement 1). The manuscript also contains details on the performed APOP analysis in the Materials and Methods section.
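
      The perturbation idea described above can be sketched on a toy elastic network. This is not the APOP implementation: it omits pocket detection and the hydrophobic-density term, and the network, the pocket residue indices, and the `extra` stiffness are hypothetical. A hand-written Jacobi eigenvalue routine is used only to keep the sketch free of dependencies; in practice a linear algebra library would be the natural choice.

```python
import math


def kirchhoff(n, springs):
    """Build the GNM Kirchhoff (graph Laplacian) matrix from {(i, j): stiffness}."""
    g = [[0.0] * n for _ in range(n)]
    for (i, j), k in springs.items():
        g[i][j] -= k
        g[j][i] -= k
        g[i][i] += k
        g[j][j] += k
    return g


def symmetric_eigenvalues(m, sweeps=100, tol=1e-18):
    """Sorted eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(m)
    a = [row[:] for row in m]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) element.
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # apply the rotation to columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # apply the rotation to rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def pocket_score(n, springs, pocket, extra=5.0, n_modes=3):
    """Sum of relative eigenvalue shifts of the slowest modes after stiffening a pocket."""
    base = symmetric_eigenvalues(kirchhoff(n, springs))
    stiffened = dict(springs)
    for idx, i in enumerate(pocket):
        for j in pocket[idx + 1:]:
            key = (min(i, j), max(i, j))
            stiffened[key] = stiffened.get(key, 0.0) + extra
    pert = symmetric_eigenvalues(kirchhoff(n, stiffened))
    # Index 0 is the trivial zero mode of a connected network, so it is skipped.
    return sum((pert[k] - base[k]) / base[k] for k in range(1, n_modes + 1))


# Toy 10-residue chain with nearest and next-nearest neighbour springs.
N = 10
toy_springs = {(i, i + 1): 1.0 for i in range(N - 1)}
toy_springs.update({(i, i + 2): 1.0 for i in range(N - 2)})
for pocket in ([0, 1, 2], [3, 5, 7]):
    print(pocket, round(pocket_score(N, toy_springs, pocket), 4))
```

      Pockets whose stiffening shifts the slow, global modes most strongly would rank highest under this simplified score.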

      Comment 2.4: The authors comment on the modest growth inhibition, and refer to the need to achieve over 88% reduction in Vmax of PYK to see a strong effect, something that may or may not be achieved in the cell-based model (no target-engagement or functional readout provided). The slow binding model and switch of species are also raised as potential explanations. While these may be plausible explanations, they are not tested which leaves us with limited evidence to support targeting the allosteric site on PYK.

      In our understanding, this remark is related to Comments 2.1 and 2.2, and we thus refer to our answers formulated above.

      Comment 2.5: The evidence to support an allosteric mechanism is derived from structural studies, including the in silico allosteric network predictions. Unfortunately, standard enzyme kinetics mode of inhibition studies are missing. Such studies could distinguish uncompetitive from non-competitive behaviour and strengthen the claim that sdAb42 locks the enzyme complex in the apo form.

      We agree with the referee that a thorough kinetic analysis could distinguish between uncompetitive (i.e., sdAb only binds to the enzyme if substrate is bound) or non-competitive (i.e., sdAb can bind to apo enzyme and substrate-bound enzyme) inhibition. In both cases, however, this would correspond to an allosteric mechanism of inhibition. Although such a thorough kinetic analysis would be interesting in its own right, we would like to argue that this type of very detailed kinetics is outside the scope of this paper. This is especially the case taking into account that this analysis could be complicated by the slow-onset inhibition behavior.
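
      To illustrate the distinction drawn above, the following is a minimal sketch that assumes simple Michaelis-Menten rate laws. It deliberately ignores the R/T-state allostery and the slow-onset behavior mentioned in this response, and it treats non-competitive inhibition as the pure case (equal affinity for free and substrate-bound enzyme). The function names and all parameter values are hypothetical and are not taken from the manuscript.

```python
def rate_uninhibited(s, vmax, km):
    """Michaelis-Menten rate without inhibitor."""
    return vmax * s / (km + s)


def rate_uncompetitive(s, i, vmax, km, ki):
    """Inhibitor binds only the enzyme-substrate complex.

    Apparent Vmax and apparent Km are both divided by (1 + [I]/Ki).
    """
    alpha = 1.0 + i / ki
    return vmax * s / (km + alpha * s)


def rate_noncompetitive(s, i, vmax, km, ki):
    """Inhibitor binds free and substrate-bound enzyme equally well.

    Apparent Vmax is divided by (1 + [I]/Ki); apparent Km is unchanged.
    """
    alpha = 1.0 + i / ki
    return vmax * s / (alpha * (km + s))


if __name__ == "__main__":
    # Hypothetical, unitless parameters chosen only for illustration.
    vmax, km, ki, inhibitor = 1.0, 1.0, 1.0, 1.0
    for s in (0.01, 1.0, 1000.0):
        v = rate_uninhibited(s, vmax, km)
        unc = rate_uncompetitive(s, inhibitor, vmax, km, ki) / v
        non = rate_noncompetitive(s, inhibitor, vmax, km, ki) / v
        print(f"[S]={s:8.2f}  uncompetitive={unc:.3f}  non-competitive={non:.3f}")
```

      Under these assumptions, the two modes are told apart by how the residual activity depends on substrate concentration: uncompetitive inhibition nearly vanishes at low substrate, whereas pure non-competitive inhibition is independent of substrate.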

      Comment 2.6: As general comment, the graphical representation of the data could be improved in line with recent recommendations: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002128, https://elifesciences.org/inside-elife/5114d8e9/webinar-report-transforming-data-visualisation-to-improve-transparency-and-reproducibility.

      - Bar-charts for potency are ideally presented as dot plots, showing the individual data points, or box plots with datapoints shown.

      - Images in Figure 7 show significant heterogeneity of nanobody expression, but the extent of this can not be gleaned from Figure 7B. It would be much better to use box plots or violin plots for each cell line on this figure panel. The same applies to Figure 10.

      We thank the reviewer for these suggestions but have taken the decision not to act upon these as the other reviewers explicitly mentioned that our figures are very clear.

      Recommendations for authors:

      Please find below some minor comments:

      Comment 2.7: Line 24: "increasing number of drug failures": This does not really reflect the current situation for human African trypanosomiasis, with NECT treatment retaining efficacy, fexinidazole now being registered, and acoziborole progressing towards registration. It may be worth considering focusing the introduction more on Nagana, as all Trypanosome species used in the paper are animal infective, and the nanobody was discovered for T. congolense.

      We refer to our answer formulated in response to Comment 1.1.

      Comment 2.8: Line 55: "alarming number of reports describing ..." While resistance is a big problem, this mainly applies to malaria, bacterial and fungal diseases. For kinetoplastids, the number of reports describing resistance in the clinic is pretty limited. However, the drug discovery pipeline for these diseases is sparse, so I definitely agree there is a need to develop new compounds with differentiated mechanisms.

      We agree with the reviewer and have slightly adapted our wording here as follows.

      “Unfortunately, a number of reports describe treatment failure or parasite resistance to the currently available drugs (De Rycker et al., 2018).”

      Comment 2.9: This manuscript is about pyruvate kinase, but the enzyme is not properly introduced. I suggest a short paragraph introducing PYK at line 77 (without duplicating Figure 1), covering its role in glycolysis, the importance of pyruvate, any essentiality data from the literature, and any known inhibitors.

      We refer to our answer formulated in response to Comment 1.3.

      Comment 2.10: Figure 6: For the top insets it would be useful to somehow show the increasing antibody concentration, either by using a changing intensity or size for each line.

      We thank the reviewer for this suggestion but decided not to act upon it, as we found that including this information in the figure made it “too crowded”. We therefore opted to provide this information in the figure legend.

      “Only a subset of the traces is shown for the sake of clarity. The following curves are shown (from bottom to top): TcoPYK (0.15 nM sdAb42, 500 nM sdAb42, 750 nM sdAb42, 1000 nM sdAb42, 1500 nM sdAb42, 2000 nM sdAb42, no enzyme control), LmePYK (5 nM sdAb42, 750 nM sdAb42, 1250 nM sdAb42, 1500 nM sdAb42, 2500 nM sdAb42, 3000 nM sdAb42, no enzyme control), and TbrPYK (1 nM sdAb42, 1000 nM sdAb42, 1750 nM sdAb42, 2000 nM sdAb42, 3500 nM sdAb42, 4000 nM sdAb42, no enzyme control).”

      Comment 2.11: You refer to the curves as biphasic, but they look like 1st order kinetics, and there are no clear 1st and 2nd phases (or at least they are not marked). It may be more appropriate to label these as non-linear.

      We agree that the term “biphasic” is potentially an over-simplification of the actual situation. What we mean is that the formation of product as a function of time ([P] versus [t] curve) is not linear at short time ranges but evolves from an initial “weakly inhibited” rate (v<sub>0</sub>) to a “strongly inhibited” steady-state rate (v<sub>ss</sub>). This conversion from v<sub>0</sub> to v<sub>ss</sub> indeed occurs in a fashion following single exponential behavior. With the term “biphasic” we thus meant a non-linear phase (before v<sub>ss</sub> is reached) followed by a linear phase (after v<sub>ss</sub> is reached). To avoid any confusion, we replaced the term “biphasic” by “non-linear”.
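
      As an illustration only, a single-exponential transition from an initial rate to a steady-state rate is commonly written as the integrated slow-binding equation [P](t) = v_ss·t + (v_0 − v_ss)·(1 − exp(−k_obs·t))/k_obs. The sketch below evaluates this textbook form; the rate constant name k_obs and all numerical values are assumptions introduced here, and this is not necessarily the fitting procedure used in the manuscript.

```python
import math


def product_formed(t, v0, vss, k_obs):
    """Product concentration at time t for a slow-onset progress curve.

    v0    : initial ("weakly inhibited") rate
    vss   : steady-state ("strongly inhibited") rate
    k_obs : apparent first-order rate constant for the v0 -> vss transition
    """
    if k_obs <= 0:
        raise ValueError("k_obs must be positive")
    return vss * t + (v0 - vss) * (1.0 - math.exp(-k_obs * t)) / k_obs


def instantaneous_rate(t, v0, vss, k_obs):
    """Time derivative of product_formed: decays exponentially from v0 to vss."""
    return vss + (v0 - vss) * math.exp(-k_obs * t)


if __name__ == "__main__":
    # Hypothetical values in arbitrary units, for illustration only.
    v0, vss, k_obs = 1.0, 0.1, 0.05
    for t in (0.0, 10.0, 50.0, 200.0):
        p = product_formed(t, v0, vss, k_obs)
        v = instantaneous_rate(t, v0, vss, k_obs)
        print(f"t={t:6.1f}  [P]={p:8.3f}  rate={v:.3f}")
```

      The curve is non-linear while the exponential term is still decaying and becomes linear, with slope v_ss, once t is large compared with 1/k_obs.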

      Comment 2.12: IC50s - would be useful to provide a comparison with IC50s generated in the pre-incubation experiments - is the antibody less potent without pre-incubation? I could not find IC50s for the pre-incubation experiments shown in Figure 2.

      We determined an IC50 value for sdAb42 against TcoPYK under pre-incubation conditions, but initially decided not to include this into the manuscript. We agree with the reviewer that a comparison between IC50 values determined under pre- and post-incubation conditions would be of interest, and have therefore included the pre-incubation IC50 data for TcoPYK in Figure 2 (panel B). The data indeed show that sdAb42 far more efficiently inhibits an enzyme that is not continuously cycling between R and T states (IC50 values of 15 nM and 359 nM under pre- and post-incubation conditions, respectively). This is now discussed in the results section of the manuscript. We did not determine IC50 values for sdAb42 against TbrPYK and LmePYK under pre-incubation conditions, but suspect that a similar observation will be made upon comparing these values to IC50 under post-incubation conditions.

      REVIEWER 3:

      Summary:

      Out of the 20 Neglected Tropical Diseases (NTD) highlighted by the WHO, three are caused by members of the trypanosomatids, namely Leishmaniasis, Trypanosomiasis, and Chagas disease. Trypanosomal glycolytic enzymes including pyruvate kinase (PyK) have long been recognised as potential targets. In this important study, single-chain camelid antibodies have been developed as novel and potent inhibitors of PyK from T. congolense. To gain structural insight into the mode of action, binding was further characterised by biophysical and structural methods, including crystal structure determination of the enzyme-nanobody complex. The results revealed a novel allosteric mechanism/pathway with significant potential for the future development of novel drugs targeting allosteric and/or cryptic binding sites.

      Strengths:

      This paper covers an important area of science towards the development of novel therapies for three of the Neglected Tropical Diseases. The manuscript is very clearly written with excellent graphics making it accessible to a wide readership beyond experts. Particular strengths are the wide range of experimental and computational techniques applied to an important biological problem. The use of nanobodies in all areas from biophysical binding experiments and X-ray crystallography to in-vivo studies is particularly impressive. This is likely to inspire researchers from many areas to consider the use of nanobodies in their fields.

      Weaknesses:

      There is no particular weakness, but I think the computational analysis of allostery, which basically relies on a single server could have been more detailed.

      Recommendations for authors:

      Overall an excellent paper, there are just a couple of points that the authors could consider, if time allows.

      Comment 3.1: As mentioned above the computational analysis of allostery appears to be based on a single server based on coordinates alone with no in-depth analysis. It would be extremely interesting to see if more sophisticated methods based on elastic network model and/or molecular dynamics simulation gave similar results. I realize that this would require quite a lot of work though.

      We agree with the reviewer’s comment and have complemented the perturbation analysis (previously presented in the manuscript) with dGNM and APOP analyses to identify allosteric communication pathways and allosteric binding pockets, respectively. dGNM, which is based on transfer entropy, allows for a detailed characterization of the dynamic coupling and information transfer between residues. Meanwhile, APOP employs a perturbation-based approach to detect and rank allosteric pockets. The findings are in good agreement with the previously presented perturbation data and have been summarized in an additional supplementary figure (Figure 4 – figure supplement 1). The manuscript also contains details on the performed transfer entropy and APOP analyses in the Materials and Methods section.

      Comment 3.2: The figures are excellent and really help the reader - with the exception of the screenshots (Figure 8). Using pymol or chimera (or any other more expensive commercial package) would really help the reader and will not take much time.

      We agree with the referee that this is not the most beautiful figure. However, we find the quality and clarity of the figure to be adequate for its purpose (i.e., a supplemental figure).

      Comment 3.3: Finally, I would have liked to see at least the PDB validation files. This is a highly regarded and experienced team; nevertheless, the resolution is rather mediocre. As the crystal coordinates were used as input for the computational analysis, any experimental inaccuracies will affect the computational results.

      We agree with the reviewer that we could have provided the validation report together with the submitted manuscript and we apologise for the inconvenience. The validation reports will be released together with the structures following final manuscript publication. Regarding the resolution of the crystal structures, we agree with the reviewer’s comment, but we obviously employed data sets from our best diffracting crystals and could not obtain a higher resolution despite our best efforts.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Recommendations for the authors):

      While the authors have responded to most of the comments, a number of issues remain, most of which pertain to imprecise writing, as previously mentioned.

      In the second revision of our manuscript, we tried our best to make our writing more precise.

      For example, at high concentrations of PRG-GEF, the authors repeatedly state that RhoA is inhibited (including in the summary). While this may be functionally valid, it is imprecise. RhoA is activated (not inhibited), but its ability to promote contractility is impaired, presumably as a consequence of sequestration of the active GTPase by the PH domain of PRG-GEF. To put a finer point on this, the activity of RhoA•GTP is to bind to proteins that selectively bind active RhoA. One such protein is the PH domain of PRG. In the case where PRG is overexpressed, RhoA•GTP binds to PRG. Due to the high concentrations of PRG in some cells, this outcompetes the ability of RhoA•GTP to bind other effectors such as formins or ROCK. However, there is no strong evidence that RhoA is inhibited. The only hint of such evidence is a reduction in the biosensor for active RhoA, but this too is likely outcompeted by the overexpressed active GEF. There does not appear to be any disagreement about the mechanism, but rather a semantic difference.

      We thank Reviewer #2 for emphasizing this semantic concern, which indeed requires clarification. We agree that RhoA is not chemically inactivated; rather, the protein remains active but is functionally sequestered. Our use of the term “inhibition” was intended to describe functional inhibition, consistent with the definition of inhibition as the act of reducing, preventing, or blocking a process, activity, or function. However, we recognize that this terminology could be interpreted as imprecise. To address this, we have clarified the text by explicitly referring to "functional inhibition of RhoA signaling" where appropriate, or by rewording to terms such as "competitive inhibition of RhoA effector binding" to more accurately reflect the mechanism.

      Overall, the manuscript is written in a conversational style, not with the precision expected of a scientific manuscript.

      We acknowledge Reviewer #2’s comment regarding the style of our manuscript. While our manuscript adopts a somewhat conversational tone, this was a deliberate choice. We believe this style helps engage the reader and facilitates understanding of our reasoning, guided by the philosophy that science is conducted by humans and should be communicated in a way that resonates with them. That said, we fully agree that this approach should not compromise scientific precision. In response to this feedback, we have revised the manuscript to ensure greater clarity and precision while maintaining the approachable style we have chosen.

      To exemplify this, I provide an alternative phrasing of one such paragraph.

      Lines 51-62:

      Here, contrarily to previous optogenetic approaches, we report a serendipitous discovery where the optogenetic recruitment at the plasma membrane of GEFs of RhoA triggers both protrusion and retraction in the same cell type, polarizing the cell in opposite directions. In particular, one GEF of RhoA, PDZ-RhoGEF (PRG), also known as ARHGEF11, was most efficient in eliciting both phenotypes. We show that the outcome of the optogenetic perturbation can be predicted by the basal GEF concentration prior to activation. At high concentration, we demonstrate that Cdc42 is activated together with an inhibition of RhoA by the GEF leading to a cell protrusion. Thanks to the prediction of a minimal mathematical model, we can induce both protrusion and retraction in the same cell by modulating the frequency of light pulses. Our ability to control both phenotypes with a single protein on timescales of second provides a clear and causal demonstration of the multiplexing capacity of signaling circuits.

      Here, we report that the phenotypic consequences of plasma membrane recruitment of a guanine nucleotide exchange factor (GEF), PDZ-RhoGEF (PRG, aka ARHGEF11), depend on the level of expression and degree of recruitment of the GEF. At low concentrations, recruitment of PRG induces cell retraction, consistent with the expected function of a GEF for RhoA. However, at high concentrations, Cdc42 is activated, leading to cell protrusion. A minimal mathematical model predicts, and experimental observations confirm, that the extent of recruitment determines the consequences of GEF recruitment. The ability of a single GEF to induce disparate outcomes demonstrates the multiplexing capacity of signaling circuits.

      We thank Reviewer #2 for providing an alternative phrasing for lines 51–62. We appreciate the effort to enhance clarity and precision in this key section of the manuscript. While we agree with many aspects of the suggested revision and have incorporated several elements to improve the text, we have also retained aspects of our original phrasing that align with the overall tone and structure of the manuscript. Specifically, we have ensured that the balance between precision and accessibility is maintained while integrating the reviewer's suggestions. We hope that the revised text now addresses the concerns raised.

      Key points to correct throughout the manuscript are:

      -  overexpression of PRG does not "inhibit" RhoA.

      -  retraction and protrusion are distinct phenotypes, they are not opposite phenotypes. One results from RhoA activation, the other results from Cdc42 activation.

      Regarding the term “inhibition,” we agree with the reviewer’s point and have addressed this in our earlier comment.

      Regarding the terminology of "opposite phenotypes," we believe this description is valid. While protrusion and retraction arise from distinct signaling pathways (Cdc42 activation and RhoA activation, respectively), we describe them as opposite phenotypes because they represent mutually exclusive cellular behaviors. A cell cannot protrude and retract at the same location simultaneously; instead, these behaviors represent opposing ends of the dynamic spectrum of cell morphology.

      Here are some other places where editing would improve the manuscript (a noncomprehensive list).

      We went through the whole manuscript to improve its scientific precision in line with Reviewer #2’s comment on the term “inhibition”.

      line 15 "inhibition of RhoA by the PH domain of the GEF at high concentrations."

      We modified the wording: “sequestration of active RhoA by the GEF PH domain at high concentrations”

      line 51 "Here, contrarily to previous optogenetic approaches"

      We removed “contrarily to previous optogenetic approaches"

      line 141 "We next wonder what could differ in the activated cells that lead to the two opposite phenotypes." (the state of mind of the authors is not relevant)

      As explained earlier, we made the choice to keep our writing style.

      line 185 "Very surprised by this ability of one protein to trigger opposite phenotypes"

      As explained earlier, we made the choice to keep our writing style.

      lines 206 ff "As our optogenetic tool prevented us from using FRET biosensors because of spectral overlap, we turned to a relocation biosensor that binds RhoA in its GTP form. This highly sensitive biosensor is based on the multimeric TdTomato, whose spectrum overlaps with the RFPt fluorescent protein used for quantifying optoPRG recruitment. We thus designed a new optoPRG with iRFP, which could trigger both phenotypes *but was harder to transiently express* (?? what does this have to do with the spectral overlap), giving rise to a majority of retracting phenotype. *Looking at the RhoA biosensor*, we saw very different responses for both phenotypes (Figure 3G-I). "

      We have clarified.

      lines 231ff "RhoA activity shows a very different behavior: it first decays, and then rises. It seems that, adding to the well-known activation of RhoA, PRG DH-PH can also negatively regulate RhoA activity." again, RhoA activity may appear to decay, but this is a limitation of the measurements. RhoA is likely activated to the GTP-bound form. PRG is not negatively regulating RhoA activity. An activity that prevents nucleotide exchange by RhoA or accelerates its hydrolysis would constitute negative regulation of RhoA.

      We modified the wording to clarify the sentence.

      The attempts to quantify the degree of overexpression, though rough, should be included in the version of record. It is not clear how that estimate was generated.

      The estimate of absolute concentration (switch at 200 nM) was obtained by comparing the fluorescence intensities of purified RFPt and of cells under a spinning disk microscope while keeping exactly the same acquisition settings. The whole procedure will be described in a manuscript in preparation, focused on Rac1 GEFs.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ghone et al show that HIV-1 Vif causes a pseudo-metaphase arrest rather than a G2 arrest. The metaphase arrest correlates with misregulation of the kinetochore which could be explained by the loss of phosphatase functions that determine chromosome-microtubule interactions.

      Strengths:

      The single-cell imaging using different reporters of cell cycle progression is very elegant and the quantitation is convincing. The authors clearly show that what others have characterized as a G2 arrest by flow cytometry is somewhat later in metaphase and correlates with kinetochore misregulation.

      We sincerely appreciate the reviewer recognizing the quality and precision of our study, particularly our use of long-term live cell imaging combined with single-cell resolution analysis.

      Weaknesses:

      (1) The major problem with the paper is trying to connect what is observed in tumor cell lines with actual infections in primary T cells. While all of the descriptive work in cell lines is convincing, none of these cells are relevant targets and tumor cells have different cell death and cell cycle regulation than primary T cells. Thus, while Vif might well do all of the things described in the manuscript, it is a stretch to connect any of it to what happens in vivo.

      We fully agree with this point. It is indeed technically challenging to perform 48-120 hours of live-cell imaging at high magnification at short intervals using primary T cells because of their non-adherent nature. We also agree that Vif’s functions in pseudo-metaphase arrest and the consequent induction of cell death, observed in cancer cells (e.g., Cal51, HeLa, and MDA-MB-231 cell lines) or normal non-transformed epithelial cells (e.g., the RPE1 cell line), may differ in T cells. Further studies and refined approaches will be required to address this important question. We have revised the manuscript to include a discussion of this issue in the section on the limitations of this study.

      (2) Line 109 and elsewhere. The ability of Vif to cause cell cycle arrest and bind PP2A subunits is not a completely conserved feature. Rather, it is quite variable in different HIV-1 strains. (e.g. https://doi.org/10.1016/j.bbrc.2020.04.123 and https://elifesciences.org/articles/53036). Therefore, it is necessary for the authors to quite clearly use strain designations in the manuscript rather than a generic "Vif", and to more clearly describe the viruses being used.

      Thank you for raising this important point. We utilized the NL4-3 strain in our study and have revised the manuscript to specify this detail. While this study uncovered part of the mechanism by which Vif modulates phosphatase regulation during mitosis, further research is required to elucidate the full mechanism, particularly how this degradation induces a robust pseudo-metaphase arrest.

      (3) Figure 5: This figure shows disruption of PP2A-B56 at the kinetochores. However, is this specific to the kinetochores? Since Vif has been described to more broadly degrade PP2A-B56, could this not be a result of a more general decrease in PP2A activity throughout the cell?

      Thank you for highlighting this critical point. PP2A is a major serine/threonine phosphatase that regulates numerous essential cell cycle processes. To the best of our knowledge, Vif selectively targets the B56 family of PP2A regulatory subunits for degradation, without affecting the other three B-type subunits or the catalytic core of PP2A itself. During early mitosis, all five members of the B56 family (B56α, B56β, B56γ, B56δ, and B56ε) accumulate at kinetochores and centromeres, where they play critical roles in chromosome alignment. Many PP2A-B56 substrates are also localized to kinetochores and chromosomes during mitosis. Depletion of specific B56 isoforms or introduction of phosphorylation-deficient mutants of PP2A-B56 substrates at kinetochores has been shown to result in mitotic defects, underscoring the crucial roles of PP2A-B56 in regulating kinetochore, centromere, and chromosomal functions during mitosis. Interestingly, we observed no significant cell cycle arrest during G1, S, or G2 phases in Vif-expressing cells. While PP2A-B56 likely has important roles outside of mitosis, Vif-mediated degradation of PP2A-B56 appears to selectively disrupt its mitotic functions, particularly at the kinetochore. This finding highlights a targeted mechanism by which Vif interferes with PP2A-B56-mediated regulation of mitotic processes. However, further experiments are required to elucidate the precise mechanisms underlying Vif's inhibition of the specific mitotic roles of PP2A-B56.

      Reviewer #2 (Public review):

      Summary

      The authors characterize the cell-cycle arrest induced by HIV-1 Vif in infected cells. They show this arrest is not at G2/M as previously thought but during metaphase. They show that the metaphase plate forms normally but progression to anaphase is massively delayed, and chromosome segregation is dysregulated in a manner consistent with impaired assembly of microtubules at the kinetochore. This correlates with the lack of recruitment of B56-subunits of PP2 phosphatase which are known degradation targets of Vif, suggesting that this weakens and unbalances the microtubule-mediated forces on the separating chromosomes.

      Strengths

      The authors present a very well-performed set of quantitative live cell imaging experiments that convincingly show a difference between Vif and Vpr-mediated cell cycle arrests. Through an in-depth characterization of the Vif-mediated block in metaphase, they make a strong case for this phenotype being tied to the degradation of PP2-B56 by Vif. Furthermore, it is important that they have performed most of these experiments with virally infected cells, meaning that their observations are observable at relevant viral expression levels of Vif.

      We appreciate the reviewer’s recognition of the importance and significance of our study.

      Weaknesses

      Experimentally there is very little to criticize with respect to the cellular systems used. Data from 10.1016/j.bbrc.2020.04.123 has identified selective mutants that fail to degrade B56 while maintaining A3G degradation by Cul5, and it would be nice to confirm that such a mutant behaves like the delta-Vif virus when examining metaphase, but selective ablation of B56 during mitosis to mimic Vif is something I would expect to be very challenging and beyond the scope.

      Thank you for your valuable suggestion. As also highlighted by Reviewer #1, it is true that certain variants of Vif, as discussed in 10.1016/j.bbrc.2020.04.123, differentially impact B56 degradation. Notably, some variants degrade A3G without inducing cell cycle arrest. We agree that investigating whether Vif's effects on B56 are directly linked to the mitotic arrest phenotype is an important direction for future research. Equipped with our advanced imaging tools, we are now preparing to extend our studies to include Vif variants from additional HIV-1 subtypes, including primary isolates. As you rightly pointed out, depletion of B56 is expected to be challenging as the B56 family comprises multiple isoforms, each with distinct and partially redundant roles in mitosis, particularly in microtubule assembly and spindle assembly checkpoint regulation. The functions of PP2A-B56 in mitosis are well-documented compared to the relatively new studies on Vif’s role in PP2A-B56 degradation. In human cells, the B56 family comprises 5 isoforms (B56α, B56β, B56γ, B56δ, and B56ε). While all B56 isoforms localize to kinetochores or centromeres during early mitosis, the reasons for their slightly different localization patterns (to either kinetochores or centromeres) remain unclear (Vallardi et al., eLife, 2019). Notably, these isoforms exhibit functional redundancy; thus, the depletion of any single isoform does not result in severe mitotic defects (Foley et al., Nature Cell Biology, 2011; Neumann et al., Nature, 2010). Supporting this redundancy, the overexpression of a single isoform (tested only B56α and B56γ) can rescue kinetochore function when all other isoforms are depleted (Foley et al., Nature Cell Biology, 2011; Vallardi et al., eLife, 2019). This complexity poses significant challenges to modulating the relative levels of individual B56 isoforms experimentally. 
      While these specific experiments are beyond the current scope of our study, we remain committed to advancing our understanding of the mechanisms driving Vif-induced pseudo-metaphase arrest. Your suggestion aligns with our ongoing efforts, and we will consider these experiments as we further explore this fascinating area.

      Where I would raise some criticism is in the relevance of these observations to the replication and pathogenesis of the virus itself, which the authors do not address or discuss. Firstly, despite clear data that both Vpr and Vif can lead to a cell cycle arrest in cycling cells, it has never been particularly clear why the virus does this. While I would agree with the authors that Vif results in the metaphase arrest through targeting B56-PP2A, this may not be the reason WHY the virus targets one of the cell's major phosphatases, but rather a knock-on effect of doing so. I appreciate that this is beyond the scope of the study, but it is something I feel should be discussed rather than the narrow mechanistic points made in the discussion. Secondly, the authors suggest that this activity of Vif is a major cause of apoptosis in infected cells and perhaps CD4+ T cell depletion in vivo. It would be good to quantify how much apoptosis is Vif-dependent in infected primary human CD4+ T cells rather than transformed tumor cells, and whether this correlates with the Vif-mediated induction of a pseudometaphase.

      Thank you for highlighting this important point. We completely agree that the full scope of Vif’s bi-functional roles, in both degrading the APOBEC3 family, which is essential for HIV-1 infection, and inducing cell cycle arrest, is not yet fully understood. The connection between Vif’s role in cell cycle arrest and the HIV-1 life cycle remains unclear. One possible explanation, as discussed in our study, is that Vif-induced pseudo-metaphase arrest may contribute to cell death, suggesting that Vif could play a role in the reduction of CD4+ T cells. Alternatively, Vif’s impact on cell cycle arrest, or its disruption of phosphatase activity, could facilitate HIV-1 virus production. However, further experiments, especially using primary human CD4+ T cells with similar approaches as in this study, are essential to gain deeper insights. This discussion has been included in the Limitations section of our study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The first paragraph of the Introduction is not necessary and anyway is quite outdated about the current state of HIV pathogenesis. Likewise, the discussion implies that HIV pathogenesis is due to virally-induced cell death, which is also outdated by more than a decade of work demonstrating that chronic immune activation is the driver of CD4 cell decline rather than direct cytotoxicity due to viral proteins.

      We have revised the first paragraph of the Introduction.

      (2) Line 134. I do not know what are Cal51 cells, and why they are being used for an HIV study here. Some rationale for being the cell of choice for this study should be included.

      Thank you for this suggestion. We have revised the text to clearly articulate the rationale for selecting the Cal51 cell line in this study. Briefly, this study focuses on the robust mitotic arrest induced by Vif. To capture this phenomenon, long-term live-cell imaging was required with a range of 48–120 hours, with imaging intervals of 6–12 minutes and 3–4 z-stacks per time point. These parameters presented considerable technical challenges. The Cal51 cell line was chosen as it has been genetically engineered by the CRISPR-Cas9 method to express mScarlet-tagged Histone H2B and mNeonGreen-tagged Tubulin, enabling extended live-cell imaging. Furthermore, the Cal51 cell line exhibits wild-type p53 expression and maintains a stable near-diploid karyotype, making it an ideal model for studying cell cycle progression.

      (3) A description of the viruses being used is necessary. Although the authors cite a previous paper, the names in that paper do not exactly match the names used here. I presume that is the NL4.3 strain?

      Thank you for raising this important point. We utilized the B type HIV-1 NL4-3 strain in our study and have revised the manuscript to specify this detail.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Reviews):

      Summary:

      This study examines to what extent this phenomenon varies based on the visibility of the saccade target. Visibility is defined as the contrast level of the target with respect to the noise background, and it is related to the signal-to-noise ratio of the target. A more visible target facilitates oculomotor planning and execution; however, as speculated by the authors, it can also benefit foveal prediction even if the visibility of the foveal stimulus is held constant. Remarkably, the authors show that presenting a highly visible saccade target is beneficial for foveal vision, as the detection of stimuli with an orientation similar to that of the saccade target is improved; the lower the saccade target visibility, the less prominent the effect.

      Strengths:

      The results are convincing and the research methodology is technically sound.

      Weaknesses:

      Discussion on how this phenomenon may unfold in natural viewing conditions when the foveal and saccade target stimuli are complex and are constituted by different visual properties is lacking. Some speculations regarding feedforward vs feedback neural processing involved in the phenomenon and the speed of the feedforward signal in relation to the visibility of the target, are not well justified and not clearly supported by the data.

      We thank the reviewer for their comment. In general, we tried to address conceptual points only briefly in this Research Advance if we had discussed them in depth in our main article, to which this advance will be linked (Kroell & Rolfs, 2022: https://elifesciences.org/articles/78106). However, the reviews showed us that this made our theoretical reasoning in the current manuscript appear incomplete. In the revised Discussion section, we have elaborated on several conceptual questions. In particular, we expand on the transferability of our findings to natural viewing conditions:

      “Foveal prediction in natural visual environments

      As noted above, human observers typically move their eyes towards the most conspicuous objects in their environment (‘t Hart, Schmidt, Roth, & Einhäuser, 2013). Foveal prediction seems to benefit from this strategy as the strength of the predicted signal increases with the conspicuity of the eye movement target. Nonetheless, natural visual environments as well as naturalistic viewing behavior pose several challenges for the foveal prediction mechanism (see Kroell & Rolfs, 2022, for an initial discussion). 

      First, naturalistic saccade target stimuli will likely exhibit complex shapes and, more often than not, will include feature conjunctions rather than isolated features. Previous findings suggest that the foveal feedback mechanism is capable of operating at this level of complexity: High-level peripheral information such as the category of novel, rendered objects (Williams et al., 2008) has been successfully decoded from activation in foveal retinotopic cortex. If, indeed, temporal object-specific areas such as area TE send feedback, the foveal prediction mechanism may even be specialized for the transfer of complex visual properties.

      Second, foveal input will often be of high contrast in natural visual environments. Whether fed-back predictive signals can influence foveal perception in the presence of high-contrast feedforward input remains to be established. In our main investigation (Kroell & Rolfs, 2022; Figure 2B) as well as in previous studies (Hanning & Deubel, 2022b), pre-saccadic foveal detection performance decreased markedly in the course of saccade preparation, presumably because visuospatial attention gradually shifted towards the saccade target and away from the foveal location. This presaccadic decrease in foveal sensitivity may boost the relative weight of fed-back signals by attenuating the conspicuity of high-contrast feedforward input. In other words, the strength of feedforward input to the fovea is reduced gradually across saccade preparation. At the same time, the strength of the fed-back predictive signal should profit from the high contrast of naturalistic saccade targets.

      Third, while foveal and peripheral information was congruent on 50% of all ‘probe present’ trials in our investigation, peripheral and foveal features will often be weakly correlated or even uncorrelated in natural environments (see Samonds, Geisler, & Priebe, 2018). Again, the presaccadic attenuation of foveal feedforward processing may allow fed-back peripheral signals to influence perception even if they are uncorrelated with foveal information. Moreover, in piloting variations of our paradigm, we observed that the subjective impression of perceiving the saccade target at the pre-saccadic foveal location is most pronounced if the foveal noise region is replaced with a black Gaussian blob at certain time points before saccade onset (unpublished phenomenological accounts). In consequence, fed-back signals do not seem to require correlated feedforward input to influence perception. Quantitative evidence, however, remains to be established.

      Lastly, pre-saccadic foveal input is likely less relevant during natural viewing behavior than it is in our task. It is possible that this task-induced prioritization of the foveal location facilitated the emergence of congruency effects. In a previous experiment (Kroell & Rolfs, 2022; Figure 1D), however, the perceptual probe could appear anywhere on a horizontal axis of 9 dva length around the fixation location. Despite this spatial unpredictability, congruency effects peaked at the presaccadic foveal location, even after peripheral baseline performances had been raised to a foveal level through an adaptive increase in probe opacity. On a similar note, the orientation of the saccade target is irrelevant to the behavioral task in our design, mirroring naturalistic situations: The eye movement can be planned and executed based on local contrast variations alone, and observers are never required to report on the orientation of the peripheral target stimulus. Ultimately, however, an influence of task demands on visual processing can only be fully excluded through techniques that provide a direct readout of perceptual contents without requiring overt responses. In psychophysical investigations, a prediction of saccade target motion may be read out from observers’ eye velocities (Kroell, Mitchell, & Rolfs, 2023; Kwon, Rolfs, & Mitchell, 2019). In electroencephalographic (EEG) and electrophysiological studies, foveal predictions should manifest in early visually evoked potentials (e.g., Creel, 2019) and increased firing rates of feature-selective foveal neurons in early visual areas, respectively. In conclusion, previous findings (Williams et al., 2008), the assumed properties of the neuronal feedback mechanism (Williams et al., 2008; Bullier, 2001) and characteristics of our current and previous experimental paradigms collectively suggest that foveal feature predictions are likely to transfer to naturalistic environments and viewing situations. Experimental evidence remains to be established.”

      We have furthermore modified the Abstract to emphasize the connection of the current manuscript to the main article.

      With respect to the reviewer’s point that “speculations regarding feedforward vs feedback neural processing involved in the phenomenon and the speed of the feedforward signal in relation to the visibility of the target, are not well justified”: 

      Again, we understand that we should have elaborated on our theoretical reasoning in this Research Advance. The assumption that our initial findings rely on neuronal feedback to foveal retinotopic cortex is derived from Williams et al.’s (2008) seminal findings: In an fMRI study, the category of peripherally presented objects could be decoded from voxels in foveal retinotopic cortex, suggesting that peripheral visual information was available to neurons with strictly foveal receptive fields. We extended these findings to saccade preparation, suggesting that feedback from higher-order, non-retinotopically organized visual areas may transmit information without the requirement of efference copies (see Kroell, 2023; Dissertation; https://doi.org/10.18452/27204, pp. 54-59): Irrespective of the vector of the upcoming saccade, the features of the attended saccade target would invariably be relayed to foveal retinotopic cortex. Ultimately, only anatomical and functional studies in non-human primates can conclusively establish the role of feedback connections in the observed foveal prediction effects. At present, however, this parsimonious model could account for all of our current and previous findings, that is, a temporally, spatially and feature-specific anticipation of saccade target properties in the presaccadic center of gaze. Nonetheless, we are open to considering any other mechanism that may account for our findings, and have integrated the explanation provided by the reviewer into the paragraph on potential thalamic mechanisms (see the reviewer’s Major Point 1).

      Concerning the point that “some speculations regarding feedforward vs feedback neural processing […] and the speed of the feedforward signal in relation to the visibility of the target are not well justified and not clearly supported by the data”: 

      Theoretical considerations on the impact of peripheral target contrast on feedforward processing speed were a main motivation for the current study. We apologize if our theoretical reasoning was incomplete and have added additional references and elaborations to the Introduction: 

      “In particular, neuronal response latencies decrease systematically as the contrast of visual input increases. While this phenomenon is reliably observed at varying stages of the visual processing hierarchy—such as the lateral geniculate nucleus (Lee, Elepfandt, & Virsu, 1981b), primary visual cortex (e.g., Albrecht, 1995; Carandini & Heeger, 1994; Carandini, Heeger, & Movshon, 1997; Carandini, Heeger, & Senn, 2002), and anterior superior temporal sulcus (STSa; Oram, Xiao, Dritschel, & Payne, 2002; van Rossum, van der Meer, Xiao, & Oram, 2008)—influences of contrast on neuronal response latency are particularly pronounced in higher-order visual areas: A doubling of stimulus contrast has been shown to decrease the latency of V1 neurons by 8 ms, compared to a reduction of 33 ms in area STSa (Oram et al., 2002; van Rossum et al., 2008). Assuming that the peripheral target is processed in a bottom-up fashion until it reaches higher-order object processing areas, the time point at which peripheral signals are available for feedback should be dictated by the temporal dynamics of visual feedforward processing.”

      Concerning the interpretation of the observed time courses, and regarding the reviewer’s Major points 3 & 6, we substantially revised the Results and Discussion section. In brief, we deemphasized the claim/interpretation of faster enhancement with increasing target opacity and instead focus on describing the oscillatory pattern mentioned by the reviewer. We provide a more temporally resolved pre-saccadic time course using a moving-window analysis and discuss all suggested and further alternative explanations (i.e., saccade-locked perceptual or attentional oscillations, longer signal accumulation intervals for low-contrast information, oscillatory nature of feedback signaling). Details and full revised paragraphs are provided in the response to this reviewer’s Major points 3 & 6.

      Unfortunately, there is no line numbering in the manuscript version I downloaded so I cannot refer to the specific lines of text here.

      We apologize for the inconvenience and have added line numbers.

      Major:

      (1) The authors speculate that the phenomenon of pre-saccadic foveal prediction arises from feedback connections from higher-order visual areas, which relay relevant saccade target features to the foveal retinotopic cortex. These feedback signals are then presumably combined with feedforward foveal input to the early visual cortex and facilitate the detection of target-congruent features at the center of gaze. This interpretation is sensible, however, it may not be the only plausible scenario. The thalamus receives copies of feedforward and feedback connections between all visual areas and is a likely candidate hub for combining information across visual space. In this latter case, the phenomenon of pre-saccadic foveal prediction may not arise from feedback from higher-order visual areas, but rather from a combination of signals occurring at the level of the thalamus. The authors should either acknowledge this possibility and the fact that this phenomenon is not necessarily the result of a feedback loop, or they should explain their rationale for excluding this scenario.

      We thank the reviewer for their highly thoughtful suggestion, and for alerting us to relevant literature. We have added the following paragraph to the Discussion section. In brief, we discuss the thalamic pulvinar as either an intermediate modulatory region or as the final receiver of the fed-back signal. Yet, we assume that, to solve the combinatorial issue associated with a transfer of feature information before saccades with any possible direction and amplitude, the contribution of non-retinotopic, higher-order object processing areas is likely required. 

      “Neural implementation of foveal prediction

      Based on the body of our findings as well as previous literature, we suggested a parsimonious feedback mechanism to underlie the observed effects: the preparation of a saccadic eye movement, and the concomitant shift of pre-saccadic attention (e.g., Kowler, Anderson, Dosher, & Blaser, 1995; Deubel & Schneider, 1996), selects the peripheral target stimulus among competing information. Higher-order visual areas feed selected feature input back to early retinotopic areas, specifically to neurons with foveal receptive fields. Fed-back feature information combines with congruent, foveal feedforward input, resulting in the enhancement effects we observe. Especially in the context of active vision, this feedback mechanism is appealing as it resolves a combinatorial issue associated with feature-specific information transfer before saccades. Consider a simplified case in which, right before a saccadic eye movement, the activation of a feature-selective neuron that encodes a certain retinal location is transferred to a neuron within the same brain area that will encode said retinal location after saccade landing. For this mechanism to function for any possible saccade direction and amplitude, most neurons would need to be connected to most other neurons (or, in a simplified version, to neurons with foveal receptive fields) in a given brain area. Assuming an information transmission via feedback rather than horizontal connections significantly reduces this dimensionality: Higher-order visual areas that encode object properties (largely) detached from retinotopic or spatiotopic reference frames selectively transfer feature information to neurons with foveal receptive fields, irrespective of the vector of the upcoming saccade. This parsimonious mechanism would have shortcomings. In particular, foveal feedback should become less effective during saccade sequences where several peripheral targets are simultaneously attended. Feature information at both attended target locations may be fed back in temporal succession or weighted and erroneously combined into a single fed-back signal. In most cases, however, foveal feedback may reasonably achieve what established transsaccadic mechanisms struggle to explain: An anticipation of the features of a single saccade target, which typically constitutes the currently most relevant object in the visual field, in foveal vision. 

      While direct feedback connections from higher-order to early visual areas would constitute the most straightforward implementation, it is conceivable that feedback signals are relayed through and modulated by subcortical areas. In particular, the thalamic pulvinar has been identified as a connection hub for visual processing that receives copies of feedforward and feedback connections from different visual areas and may even combine information across visual space (Cortes, Ladret, Abbas-Farishta, & Casanova, 2024). In the case of foveal prediction, thalamic neurons may receive fed-back signals from higher-order areas and enhance those signals before passing them on to cortical neurons with foveal receptive fields. Perhaps, a modification of foveal activation within the thalamic pulvinar itself is sufficient to influence perception. To the best of our understanding, however, the fed-back signal must originate in non-retinotopic, higher-order object processing areas to reduce the number of necessary neuronal connections.”

      (2) The results presented are very compelling. I wonder to which extent they generalize to situations in which the foveal input and the peripheral input are more heterogenous (e.g., faces or complex objects composed of many different features, orientations, and other visual properties). I think the current research raises a number of interesting questions. In general, it would be important for the readers to elaborate more on how the mechanism of pre-saccadic foveal prediction may play out in normal viewing conditions or in conditions in which the foveal input is completely irrelevant to the task.

      We agree and have reiterated this point in the current manuscript (see our first reply to “Weaknesses”). We also explicitly refer to Kroell & Rolfs (2022) for an extensive initial discussion of this question.

      (3) On page 10 the authors state that their data suggest that foveal enhancement emerges in earlier stages of saccade preparation as target opacity increases. However, this is not clear from the figures, when performance is locked to saccade onset (Fig 3 C), for the highest opacity targets performance seems to oscillate, however, the authors do not comment on that. There is literature showing how saccades can reset perceptual oscillations, and maybe what is observed here is just a stronger performance oscillation when the saccade target is more visible. Why would performance drop systematically 75 ms before saccade onset and then increase again 25 ms before the onset? Can the authors elaborate more on this?

      In response to this comment, we inspected the pre-saccadic time course of enhancement effects in a more temporally resolved fashion and, indeed, observed pronounced oscillations for the two higher target opacity conditions (see Results): 

      “Especially at higher target opacities, the temporal development of foveal enhancement appears to exhibit an oscillatory pattern. To inspect this incidental observation in a more temporally resolved fashion, we determined mean enhancement values in a boxcar window of 50 ms duration sliding along all saccade-locked probe offset time points (step size = 10 ms; x-axis values in Figure 4 indicate the latest time point in a certain window). We then fitted 6th order polynomials (with no constraints on parameters) to the resulting time courses and compared the fitted values against zero using bootstrapping (see Methods). The average foveal enhancement across target opacities reached significance starting 115 ms before saccade onset (gray curve in Figure 4; all ps < .046). For every individual target opacity condition, we observed significant enhancement immediately before saccade onset, although only very briefly for the lowest opacity (-2–0 ms for 25%; -39–0 ms for 39%, -106–0 ms for 59% &  -13–0 ms for 90%; all ps < .050; yellow to dark red curves in Figure 4). Especially for the higher two target opacities, we observed a local maximum preceding eye movement onset by approximately 80 ms. Interestingly, assuming a peak in enhancement in approximately 80 ms intervals (i.e., at x-axis values of -80 and 0 ms in Figure 4) would correspond to an oscillation frequency of 12.5 Hz. In contrast to rapid feedforward processing, feedback signaling is associated with neural oscillations in the alpha and beta range (i.e., between 7 and 30 Hz; Bastos et al., 2015; Jensen, Bonnefond, Marshall, & Tiesinga, 2015; van Kerkoerle et al., 2015).”
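      The moving-window step and the frequency conversion described in this passage can be sketched as follows. This is only an illustrative sketch, not the authors' analysis code: the function names, the default time range, and the choice of a half-open window (excluding the earliest edge, including the latest) are assumptions, and the polynomial fit and bootstrapping are omitted.

```python
from statistics import mean


def sliding_boxcar(times_ms, values, width_ms=50, step_ms=10,
                   start_ms=-200, stop_ms=0):
    """Average `values` in a boxcar window slid along saccade-locked time.

    Each result is labelled with the latest time point of its window,
    mirroring the x-axis convention described for Figure 4.
    The window edges (start_ms, stop_ms) are hypothetical defaults.
    """
    results = []
    window_end = start_ms + width_ms
    while window_end <= stop_ms:
        window_start = window_end - width_ms
        # Assumed convention: window covers (window_start, window_end].
        in_window = [v for t, v in zip(times_ms, values)
                     if window_start < t <= window_end]
        if in_window:
            results.append((window_end, mean(in_window)))
        window_end += step_ms
    return results


def peak_interval_to_hz(interval_ms):
    """Convert the spacing between successive peaks (ms) to a frequency (Hz)."""
    return 1000.0 / interval_ms


# Peaks spaced ~80 ms apart correspond to 1000 / 80 = 12.5 Hz.
frequency_hz = peak_interval_to_hz(80)
```

      With peaks at roughly -80 ms and 0 ms, the 80 ms spacing yields the 12.5 Hz value quoted above, which lies within the 7 to 30 Hz range cited for feedback signaling.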

      We had observed an oscillatory pattern in multiple previous investigations, and in both Hit Rates to foveal orientation content and reflexive gaze velocities in response to peripheral motion information. So far, we have been unsure how to explain it. The literature on thalamic visual processing mentioned by the reviewer alerted us to the oscillatory nature of feedback signaling itself. Interestingly, the temporal frequency range of feedback oscillations includes the frequency of ~12.5 Hz observed in our data. We have included this and alternative explanations in the Discussion section (see below). Throughout, we highlight that we are aware that our analysis approach is purely descriptive and that the potential explanations we give are speculative.

      “Moreover, foveal congruency effects appear to exhibit an oscillatory pattern, with peaks in a medium saccade preparation stage (~80 ms before the eye movement) and immediately before saccade onset. We have noticed this pattern in several investigations with substantially different visual stimuli and behavioral readouts. For instance, using a full-screen dot motion paradigm, we observed a pre-saccadic, small-gain ocular following response to coherent motion in the saccade target region (Kroell, Rolfs, & Mitchell, 2023, conference abstract; Kroell, 2023, dissertation). Predictive ocular following first reached significance ~125 ms before the eye movement, then decreased and subsequently ramped up again ~25 ms before saccade onset. Several explanatory mechanisms appear conceivable. Unlike rapid feedforward processing, feedback propagation has been shown to follow an oscillatory rhythm in the alpha and beta range, that is, between 7 and 30 Hz (Bastos et al., 2015; Jensen, Bonnefond, Marshall, & Tiesinga, 2015; van Kerkoerle et al., 2015). In our case, it is possible that the object-processing areas that send feedback to retinotopic visual cortex do so at a temporal frequency of ~12.5 Hz. At higher stimulus contrasts, feedforward signals may be fed back instantaneously and without the need for signal accumulation in feedback-generating areas. The resulting perceptual time courses may reflect innate temporal feedback properties most veridically. Alternatively, the initial enhancement peak may be related to the sudden onset of the saccade target stimulus and not to movement preparation itself. In this case, the initial peak should become particularly apparent if enhancement is aligned to the onset of the target stimulus. Yet, Figure 3 and Figure 4 suggest more prominent oscillations in saccade-locked time courses. In accordance with this, perceptual and attentional processes have been shown to exhibit oscillatory modulations that are phase-locked to action onset (e.g., Tomassini, Spinelli, Jacono, Sandini, & Morrone, 2015; Hogendoorn, 2016; Wutz, Muschter, van Koningsbruggen, Weisz, & Melcher, 2016; Benedetto & Morrone, 2017; Tomassini, Ambrogioni, Medendorp, & Maris, 2017; Benedetto, Morrone & Tomassini, 2019). Whether the oscillatory pattern of foveal enhancement, as well as its increased prominence at higher target contrasts, relies on innate temporal properties of feedback signaling, signal accumulation, saccade-locked oscillatory modulations of feedforward processing or attention, or a combination of these factors, one conclusion remains: task-induced cognitive influences suggested to underlie the considerable variability in temporal characteristics of foveal feedback during passive fixation (e.g., Fan et al., 2016; Weldon et al., 2016; 2020) are not the only possible explanation. Low-level target properties such as its luminance contrast modulate the resulting time course and should be equally considered, at least in our paradigm.”

      In the revised Abstract, we removed our claim on an earlier emergence of enhancement at higher opacities and have added this summary instead:

      “Second, the time course of foveal enhancement appeared to show an oscillatory pattern that was particularly pronounced at higher target opacities. Interestingly, the temporal frequency of these oscillations corresponded to the frequency range typically associated with neural feedback signaling.”

      (4) What was the average difference in latency between short and long latencies? It would be good to report it in the main text.

      We apologize for the oversight. The difference was 61 ms, with latencies of md = 247±18 ms for short- and md = 308±18 ms for long-latency saccades. We have added this information to the main text.

      (5) From the saccade latency graphs in Figure S1 it seems there is some variability in the latency of saccades across subjects, I wonder if there is a correlation between saccade latency and the magnitude of the foveal prediction effect across subjects.

      We had inspected a connection between saccade latency and congruency in our first investigation (Kroell & Rolfs, 2022; not reported) and observed that participants with lower latencies tended to show more enhancement, albeit non-significantly. Likewise, we observed a non-significant negative correlation between the median saccade latency and the mean foveal prediction effect (across opacities and time points) in the current investigation, r = -0.22, p = .572. While our study involved a small number of observers (n = 9), the analysis approach illustrated in Figure 2 A-C instead makes use of the large number of trials collected per participant (mean n = 2841 trials per observer) and demonstrates a reliable influence of saccade latency on an individual-observer level.
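      To make the link between the reported correlation and its p-value explicit, the following sketch converts r = -0.22 with n = 9 observers into a t-statistic and a two-tailed p-value. It assumes a standard Pearson correlation test with n - 2 degrees of freedom, which the response does not state explicitly; the helper names are hypothetical and the integral is evaluated numerically using only the standard library.

```python
import math


def t_from_r(r, n):
    """t-statistic for a Pearson correlation r from n paired observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1.0 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = total * h / 3.0  # P(0 <= T <= |t|)
    return 1.0 - 2.0 * central_mass


# Values taken from the response: r = -0.22, n = 9 observers.
t_stat = t_from_r(-0.22, 9)
p_value = two_tailed_p(t_stat, df=9 - 2)
```

      Under these assumptions the result is a t-statistic of about -0.6 and a p-value of roughly 0.57, in line with the reported p = .572; the small remaining difference is expected because r is rounded to two decimals.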

      (6) Page 14, the authors state that their findings suggest that the feedforward processing of the peripheral saccade target is accelerated when it is presented at high contrast. I find this a bit too speculative, both in terms of assuming that there is a feedforward vs a feedback process (see my point 1) and in terms of speculating that the feedforward process is accelerated as I do not see a clear hint of this in the data (see my point 3) and it is a bit of a stretch to speculate on delays or accelerations of neural processing. It is possible that the feedforward signal is always delivered at the same speed but it is weaker in one case and the effect needs more time to build up.

      We fully agree and hope to have addressed the reviewer’s arguments in the sections preceding this point. We included the reviewer’s last sentence in the Discussion section as well: 

      “Alternatively, or in addition, it is conceivable that weaker feedforward signals require a longer accumulation interval before the feedback process can be initiated.”

      Minor:

      (1) I think the description of the linear mixed-effects model can go in the supplemental methods, if possible, and its results can be briefly mentioned in the text.

      In previous work, we have been asked to move linear mixed-effects model descriptions from supplemental to main method (or even results) sections for clarity. We have followed this suggestion ever since and, due to the relevance of the models for the interpretation of the presented results, would like to keep their description in the methods section.

      (2) This is just a minor point, but I would suggest using a different word instead of opacity (maybe visibility?).

      We had gone back and forth on this. We decided to use the term ‘conspicuity’ when we discuss our findings conceptually and the term ‘opacity’ when we refer to the experimental manipulation (since we directly manipulate the transparency, i.e., 1-opacity, of the target patch against the background). To compute the slopes in Figures 2 and 5, we ordered observers’ performances by the linearly spaced opacity conditions. Since the term ‘opacity’ is closest to both the experimental manipulation and the variable entered into analysis, we would like to adhere to this terminology. However, we have added an explicit note to the end of our introduction to avoid confusion: 

      “Throughout the paper, we use the term ‘opacity’ when we refer to the experimental manipulation (that is, a variation of the transparency, i.e., 1-opacity of the target patch against the background noise) and the term ‘conspicuity’ when we discuss our findings conceptually.”

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors ran a dual task. Subjects monitored a peripheral location for a target onset (to generate a saccade to), and they also monitored a foveal location for a foveal probe. The foveal probe could be congruent or incongruent with the orientation of the peripheral target. In this study, the authors manipulated the conspicuity of the peripheral target, and they saw changes in performance in the foveal task. However, the changes were somewhat counterintuitive.

      Strengths:

      The authors use solid analysis methods and careful experimental design.

      Weaknesses:

      I have some issues with the interpretation of the results, as explained below. In general, I feel that a lot of effects are being explained by attention and target-probe onset asynchrony etc, but this seems to be against the idea put forth by the authors of "foveal prediction for visual continuity across saccades". Why would foveal prediction be so dependent on such other processes? This needs to be better clarified and justified.

      We address the described weaknesses in the respective sections below. In general, as we point out in response to Reviewer 1 as well, the current submission is a Research Advance article meant to supplement our main article (Kroell & Rolfs, 2022, https://doi.org/10.7554/eLife.78106). To comply with the eLife recommendations for Research Advance submissions, we addressed conceptual points only briefly, especially if they had been explained in detail in our main article. To make the nature and format of the current submission as explicit as possible, and to emphasize its connection to our previous work, we refer to the submission format in our abstract and introduction now.

      Specifics:

      The explanation of decreased hit rates with increased peripheral target opacity is not convincing. The authors suggest that higher contrast stimuli in the periphery attract attention. But, then, why are the foveal results occurring earlier (as per the later descriptions in the manuscript)? And, more importantly, why would foveal prediction need to be weaker with stronger pre-saccadic attention to the periphery? What is the function of foveal prediction? What of the other interpretation that could be invoked in general for this type of task used by the authors: that the dual task is challenging and that subjects somehow misattribute what they saw in the peripheral task when planning the saccade. i.e. foveal hit rates are misperceptions of the peripheral target. When the peripheral target is easier to see, then the foveal hit rate drops.

      We will address these comments one by one:

      The authors suggest that higher contrast stimuli in the periphery attract attention. But, then, why are the foveal results occurring earlier (as per the later descriptions in the manuscript)?

      We consider these observations to rely on separate processes. Already in the main publication (Kroell & Rolfs, 2022), we had observed a continuous decrease of target-congruent and target-incongruent foveal Hit Rates (HRs) during saccade preparation, and suggested that this decrease (similarly observed in Hanning & Deubel, 2022b) is likely caused by the pre-saccadic shift of visuospatial attention to the target. In other words, as attentional resources shift towards the periphery, foveal detection performance is hampered, irrespective of peripheral and foveal feature (in-)congruency. In the current investigation, we again observed a pronounced pre-saccadic decrease of foveal HRs, irrespective of foveal probe orientation. Our argument that high-contrast peripheral saccade targets attract more attention relies on the clear observation that this decrease becomes more pronounced as the contrast of the saccade target increases. To the best of our judgment and experience with doing the task ourselves, this interpretation appears highly plausible. We explain this rationale in the Abstract and the Results sections of the manuscript (see below).

      Our hypotheses and interpretations concerning the time course of foveal prediction refer to the difference between target-congruent and target-incongruent foveal HRs (i.e., to predictive foveal feature enhancement). Irrespective of the general, feature-unspecific decrease of foveal detection performances, we had hypothesized that the peripheral target is processed faster if it exhibits a high contrast. This assumption is based on temporal processing properties of many visual neurons that we have expanded on in our revision: 

      “In particular, neuronal response latencies decrease systematically as the contrast of visual input increases. While this phenomenon is reliably observed at varying stages of the visual processing hierarchy—such as the lateral geniculate nucleus (Lee et al., 1981b), primary visual cortex (e.g., Albrecht, 1995; Carandini et al., 1997, 2002; Carandini and Heeger, 1994), and anterior superior temporal sulcus (STSa; Oram et al., 2002; van Rossum et al., 2008)— influences of contrast on neuronal response latency are particularly pronounced in higher-order visual areas: A doubling of stimulus contrast has been shown to decrease the latency of V1 neurons by 8 ms, compared to a reduction of 33 ms in area STSa (Oram et al., 2002; van Rossum et al., 2008). Assuming that the peripheral target is processed in a bottom-up fashion until it reaches higher-order object processing areas, the time point at which peripheral signals are available for feedback should be dictated by the temporal dynamics of visual feedforward processing.”

      Of note, both reviewers asked us to explore the oscillatory nature of the difference between target-congruent and target-incongruent HRs. We will post our changes in response to the reviewer’s remark below.

      And, more importantly, why would foveal prediction need to be weaker with stronger pre-saccadic attention to the periphery?

      We hope that our previous reply has cleared up that the opposite is true: In general, and irrespective of the feature congruency of target and foveal probe, foveal HRs decrease as target contrast increases. As we have stated in our Abstract and Results, “foveal Hit Rates for target-congruent and incongruent probes decreased as target opacity increased, presumably since attention was increasingly drawn to the target the more salient it became. Crucially, foveal enhancement defined as the difference between congruent and incongruent Hit Rates increased with opacity”. This finding did not appear counterintuitive to us and was, in fact, pre-registered as a main hypothesis (see https://osf.io/wceba).

      We are unsure if this goes beyond the reviewer’s concern but we, in fact, speculate in the revised Discussion section as well as in our original eLife article that the overall, feature-unspecific decrease in foveal detection performances may aid feature-specific foveal prediction: 

      “This pre-saccadic decrease in foveal sensitivity may boost the relative weight of fed-back signals by attenuating the conspicuity of high-contrast feedforward input. In other words, the strength of feedforward input to the fovea is reduced gradually across saccade preparation. At the same time, the strength of the fed-back predictive signal should profit from the high contrast of naturalistic saccade targets.”

      What is the function of foveal prediction?

      Please refer to the section ‘What is the function of foveal prediction?’ in our main article. We have pasted this paragraph below for the reviewer’s convenience. 

      “What is the function of foveal prediction?

      As stated above, previous investigations on foveal feedback required observers to make peripheral discrimination judgments. We, in contrast, did not ask observers to generate a perceptual judgment on the orientation of the saccade target. Instead, detecting the target was necessary to perform the oculomotor task. While the identification of local contrast changes would have sufficed to direct the eye movement, the orientation of the target enhanced foveal processing of congruent orientations. The automatic nature of foveal enhancement showcases that perceptual and oculomotor processing are tightly intertwined in active visual settings: planning an eye movement appears to prioritize the features of its target; commencing the processing of these features before the eye movement is executed may accelerate post-saccadic target identification and ultimately provide a head start for corrective gaze behavior (Deubel et al., 1982; Ohl and Kliegl, 2016; Tian et al., 2013).”

      What of the other interpretation that could be invoked in general for this type of task used by the authors: that the dual task is challenging and that subjects somehow misattribute what they saw in the peripheral task when planning the saccade. i.e. foveal hit rates are misperceptions of the peripheral target. When the peripheral target is easier to see, then the foveal hit rate drops.

      Alternative explanations in general: In our main article, we ruled out—either through direct experimentation or by considering relevant properties of our findings—the following alternative explanations: i) spatially global feature-based attention to the target orientation, ii) a multiplicative combination of spatial and feature-based attention, and iii) shifts of decision criterion. While dual tasks (i.e., simultaneous oculomotor planning and perceptual detection) are standard in psychophysical investigations of active vision, we acknowledge the potential influence of an explicit foveal task in the revised manuscript, and in response to both reviewers: 

      “Lastly, pre-saccadic foveal input is likely less relevant during natural viewing behavior than it is in our task. It is possible that this task-induced prioritization of the foveal location facilitated the emergence of congruency effects. In a previous experiment (Kroell & Rolfs, 2022; Figure 2D), the perceptual probe could appear anywhere on a horizontal axis of 9 dva length around the screen center. Despite this spatial unpredictability, however, congruency effects peaked at the pre-saccadic foveal location, even after peripheral baseline performances had been raised to a foveal level through an adaptive increase in probe opacity. Ultimately, an influence of task demands on visual processing can only be fully excluded through techniques that provide a direct readout of perceptual contents without requiring keyboard responses. In psychophysical investigations, a prediction of saccade target motion may be read out from observers’ eye velocities (Kroell, Mitchell, & Rolfs, 2023; Kwon, Rolfs, & Mitchell, 2019). In electroencephalographic (EEG) and neurophysiological studies, foveal predictions should manifest in early visual evoked potentials (e.g., Creel, 2019) and increased firing rates of feature-selective foveal neurons in early visual areas, respectively.”

      Difficulty of the task: Concerning the perceptual detection task, every experimental session was preceded by an adaptive staircase procedure that adjusted the transparency of the foveal probe (and, thus, task difficulty) depending on the respective observer’s performance (see Methods for details). Concerning the oculomotor task, observers were able to perform accurate saccades with typical movement latencies for all target opacity conditions (see Results, Supplements & Figure S1). In general, we are unsure how high task difficulty could produce a feature-, temporally and spatially specific enhancement of both filtered and incidental target-congruent foveal orientation information. In fact, a main finding of our current submission is that foveal HRs decrease as the target becomes easier to see and the oculomotor task thus becomes easier to perform.

      Perceptual confusion of target and probe stimulus: We observe a specific increase in HRs for foveal probes that exhibit the same orientation as the peripheral saccade target. Just like in our main article, a response is defined as a ‘Hit’ if a foveal probe is presented and the observer generates a ‘present’ judgment. To our understanding, the suggestion that a confusion of target and probe stimuli may account for these effects necessarily implies that this confusion hinges on the congruency between peripheral and foveal feature inputs. In other words, peripheral and foveal signals should be more readily “confused” if they exhibit similar features. We assume that peripheral feature information is fed back to neurons with foveal receptive fields and combines with feature-congruent feedforward input. Whether this combination of signals can be described as low-level perceptual “confusion” likely depends on individual linguistic judgments (it would certainly be a novel description of feedback-feedforward interactions). Perhaps a defining difference between the reviewer’s concern and our assumed mechanism is the spatial specificity of the resulting congruency effects. We suggest that only neurons with foveal receptive fields receive feature information via feedback. And indeed, we demonstrate a clear spatial specificity of congruency effects around the pre-saccadic foveal location, even after parafoveal performances had been raised to a foveal level by an adaptive increase in probe opacity (see Kroell & Rolfs, 2022; Figure 2C & Figure 3). In other words, observers’ perception is altered in their pre-saccadic center of gaze while the target is presented peripherally. We struggle to conceive a scenario in which a confusion of signals should be feature-specific as well as specific to an interaction between peripheral and foveal signals without being meaningful at the same time. If the reviewer is referring to confusions on the response or decision level, we would like to point them towards the Discussion section ‘Can our findings be explained by established mechanisms other than foveal prediction?’ in our main article. In this paragraph, we provide detailed arguments for a dissociation between our findings and shifts in decision criterion that would exceed the scope of a Research Advance.

      When the peripheral target is easier to see, then the foveal hit rate drops.

      We agree. Target-congruent and incongruent foveal HRs decreased as the contrast of the peripheral target increased. However, and as we stated in response to the reviewer’s first comment, the difference between target-congruent and target-incongruent foveal HRs (and, thus, foveal enhancement of the target orientation) increased with peripheral target contrast.

      The analyses of Fig. 3C appear to be overly convoluted. They also imply an acknowledgment by the authors that target-probe temporal difference matters. Doesn't this already negate the idea that the foveal effects are associated with the saccade generation process itself? If the effect is related to target onset, how is it interpreted as related to a foveal prediction that is associated with the saccade itself? 

      We indeed conducted analyses that can reveal an influence of target presentation duration at probe onset, the saccade preparation stage at probe offset, as well as a combination of both factors. The fact that target presentation duration may have an influence on foveal prediction would not negate a simultaneous influence of saccade preparation and vice versa. In the main article, we directly investigated the influence of saccade preparation on foveal enhancement by introducing a passive fixation condition (Kroell & Rolfs, 2022; Figure 5). At identical target-probe offset durations, pre-saccadic foveal enhancement was significantly more pronounced and accelerated compared to enhancement during passive fixation. We have added a purely saccade-locked time course (uncorrected by target-probe interval) to our Results section and to Figure 3 (second row). We still believe that the target-locked, saccade-locked and combined analyses are informative for future investigations and would like to present them all for completeness.

      Also, the oscillatory nature of the effect in Fig. 3C for 59% and 90% opacity is quite confusing and not addressed. The authors simply state that enhancement occurs earlier before the saccade for higher contrasts. But, this is not entirely true. The enhancement emerges then disappears and then emerges again leading up to the saccade. Why would foveal prediction do that?

      In response to this comment and a suggestion by Reviewer 1, we inspected the pre-saccadic time course of enhancement effects in a more temporally resolved fashion and, indeed, observed pronounced oscillations for the two higher target opacity conditions (see Results): 

      “Especially at higher target opacities, the temporal development of foveal enhancement appears to exhibit an oscillatory pattern. To inspect this incidental observation in a more temporally resolved fashion, we determined mean enhancement values in a boxcar window of 50 ms duration sliding along all saccade-locked probe offset time points (step size = 10 ms; x-axis values in Figure 4 indicate the latest time point in a certain window). We then fitted 6th order polynomials to the resulting time courses and compared the fitted values against zero using bootstrapping (see Methods). The average foveal enhancement across target opacities reached significance starting 115 ms before saccade onset (gray curve in Figure 4; all ps < .046). For every individual target opacity condition, we observed significant enhancement immediately before saccade onset, although only very briefly for the lowest opacity (-2–0 ms for 25%; -39–0 ms for 39%, -106–0 ms for 59% &  -13–0 ms for 90%; all ps < .050; yellow to dark red curves in Figure 4). Especially for the higher two target opacities, we observed a local maximum preceding eye movement onset by approximately 80 ms. Interestingly, assuming a peak in enhancement in approximately 80 ms intervals (i.e., at x-axis values of -80 and 0 ms in Figure 4) would correspond to an oscillation frequency of 12.5 Hz. In contrast to rapid feedforward processing, feedback signaling is associated with neural oscillations in the alpha and beta range (i.e., between 7 and 30 Hz; Bastos et al., 2015; Jensen, Bonnefond, Marshall, & Tiesinga, 2015; van Kerkoerle et al., 2015).”
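
      The sliding-window step quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' analysis code: the trial layout (probe offset time relative to saccade onset, a congruency flag, a hit flag) and the names `sliding_enhancement` and `peak_interval_to_hz` are hypothetical. Enhancement is taken as the congruent minus the incongruent Hit Rate within each 50 ms window, and each window is labeled by its latest time point. The 6th order polynomial fit and the bootstrapping are omitted.

```python
def hit_rate(hits):
    """Proportion of 'present' responses among probe-present trials."""
    return sum(hits) / len(hits)


def sliding_enhancement(trials, width_ms=50.0, step_ms=10.0,
                        t_start=-200.0, t_end=0.0):
    """Congruent-minus-incongruent Hit Rate in trailing boxcar windows.

    `trials` is a list of (probe_offset_ms, is_congruent, is_hit) tuples;
    this layout is an assumption made for the sketch. Each window covers
    (latest - width_ms, latest] and is reported at `latest`. Windows that
    lack congruent or incongruent trials are skipped.
    """
    out = []
    n_steps = int(round((t_end - t_start) / step_ms))
    for i in range(n_steps + 1):
        latest = t_start + i * step_ms
        window = [(c, h) for t, c, h in trials
                  if latest - width_ms < t <= latest]
        congruent = [h for c, h in window if c]
        incongruent = [h for c, h in window if not c]
        if congruent and incongruent:
            out.append((latest, hit_rate(congruent) - hit_rate(incongruent)))
    return out


def peak_interval_to_hz(interval_ms):
    """Convert a peak-to-peak interval in milliseconds to a frequency in Hz."""
    return 1000.0 / interval_ms


# Toy data, invented purely for illustration.
toy_trials = [
    (-30.0, True, True),
    (-20.0, True, False),
    (-25.0, False, False),
    (-10.0, False, False),
]
toy_result = sliding_enhancement(toy_trials, t_start=-100.0, t_end=0.0)
```

      Because consecutive windows overlap by 40 ms, neighboring values share most of their trials, which is the source of the smoothing the authors acknowledge further below. The frequency conversion reproduces the arithmetic in the quoted passage: peaks spaced 80 ms apart correspond to 1000 / 80 = 12.5 Hz.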

      We had observed an oscillatory pattern in multiple previous investigations, and in both Hit Rates to foveal orientation content and reflexive gaze velocities in response to peripheral motion information. So far, we have been unsure how to explain it. The literature on thalamic visual processing mentioned by the reviewer alerted us to the oscillatory nature of feedback signaling itself. Interestingly, the temporal frequency range of feedback oscillations includes the frequency of ~12.5 Hz observed in our data. We have included this and alternative explanations in the Discussion section (see below). We are aware, and acknowledge in the manuscript, that our analysis approach is purely descriptive, and that the potential explanations we give are speculative. 

      “Moreover, foveal congruency effects appeared to exhibit an oscillatory pattern, with peaks in a medium saccade preparation stage (~80 ms before the eye movement) and immediately before saccade onset. We have noticed this pattern in several investigations with substantially different visual stimuli and behavioral readouts. For instance, using a full-screen dot motion paradigm, we observed a pre-saccadic, small-gain ocular following response to coherent motion in the saccade target region (Kroell, Rolfs, & Mitchell, 2023, conference abstract; Kroell, 2023, dissertation). Predictive ocular following first reached significance ~125 ms before the eye movement, then decreased and subsequently ramped up again ~25 ms before saccade onset. Several explanatory mechanisms appear conceivable. Unlike rapid feedforward processing, feedback propagation has been shown to follow an oscillatory rhythm in the alpha and beta range, that is, between 7 and 30 Hz (Bastos et al., 2015; Jensen, Bonnefond, Marshall, & Tiesinga, 2015; van Kerkoerle et al., 2015). In our case, it is possible that the object-processing areas that send feedback to retinotopic visual cortex do so at a temporal frequency of ~12.5 Hz. At higher stimulus contrasts, feedforward signals may be fed back instantaneously and without the need for signal accumulation in feedback-generating areas. The resulting perceptual time courses may reflect innate temporal feedback properties most veridically. Alternatively, the initial enhancement peak may be related to the sudden onset of the saccade target stimulus and not to movement preparation itself. In this case, the initial peak should become particularly apparent if enhancement is aligned to the onset of the target stimulus. Yet, Figure 3 and Figure 4 suggest more prominent oscillations in saccade-locked time courses. In accordance with this, perceptual and attention processes have been shown to exhibit oscillatory modulations that are phase-locked to action onset (e.g., Tomassini, Spinelli, Jacono, Sandini, & Morrone, 2015; Hogendoorn, 2016; Wutz, Muschter, van Koningsbruggen, Weisz, & Melcher, 2016; Benedetto & Morrone, 2017; Tomassini, Ambrogioni, Medendorp, & Maris, 2017; Benedetto, Morrone & Tomassini, 2019). Whether the oscillatory pattern of foveal enhancement, as well as its increased prominence at higher target contrasts, relies on innate temporal properties of feedback signaling, signal accumulation, saccade-locked oscillatory modulations of feedforward processing or attention, or a combination of these factors, one conclusion remains: task-induced cognitive influences suggested to underlie the considerable variability in temporal characteristics of foveal feedback during passive fixation (e.g., Fan et al., 2016; Weldon et al., 2016; 2020) are not the only possible explanation. Low-level target properties such as its luminance contrast modulate the resulting time course and should be equally considered, at least in our paradigm.”

      The interpretation of Fig. 4 is also confusing. Doesn't the longer latency already account for the lapse in attention, such that visual continuity can proceed normally now that the saccade is actually eventually made? In all results, it seems that the effects are all related to the dual nature of the task and/or attention, rather than to the act of making the saccade itself. Why should visual continuity (when a saccade is actually made, whether with short or long latency) have different "fidelity"? And, isn't this disruptive to the whole idea of visual continuity in the first place?

      We are unsure if we grasp the unifying concern behind these remarks. For the reviewer’s point on the dual-task nature of our paradigm, please consider our answer above. Perhaps it is important to note that we do not (and would never) claim that foveal prediction is the only mechanism underlying visual continuity. We believe that multiple mechanisms, including but not limited to pre-saccadic shifts of attention, predictive remapping of attention pointers and the perception of intra-saccadic signals interact and jointly contribute to visual continuity. It appears highly conceivable that, like most processes in biological systems, motor and perceptual performances are subject to fluctuations. We argue that saccade latencies as well as the magnitude of foveal prediction constitute read-outs of these variations. We also suggest that those read-outs are innately correlated beyond their common moderator of, perhaps, attentional state; we have previously presented clear evidence for a link between eye movement preparation and foveal prediction (Kroell & Rolfs, 2022; Figure 2). To the best of our judgment, we consider it reasonable that the effectiveness of movement-contingent perceptual processes varies with the effectiveness (in programming or execution) of the very movement motivating them. We present evidence for this assumption in our submission. We would also like to make clear that we do not assume our vision to fail entirely, even if every single well-known mechanism of visual continuity were to break down at once. Upon saccade landing, the visual system receives reliable visual input. Nonetheless, the visual system has undeniably developed mechanisms to optimize this process. We believe foveal prediction to rank among them.

      Small question: is it just me or does the data in general seem to be too excessively smoothed?

      We did not apply any smoothing to either the analysis or visualization of our data in the initial manuscript.

      Every observer completed a large number of trials (mean n = 2841 trials per observer; total trial number > 25,500), which likely contributes to the clarity of our data. To inspect the oscillatory pattern of enhancement in a more temporally resolved fashion (in response to the reviewer’s point above), we applied a moving window analysis in this revision. Due to overlapping window borders, this analysis introduces a certain degree of smoothing. Nonetheless, data patterns are comparable to the time course with only few non-overlapping time bins (Figure 3B; second row). In general, we have described all steps of our analysis routine extensively in the Methods section and will make our data publicly available upon publication of the Reviewed Preprint. 

      General comment: it is important to include line numbers in manuscripts, to help reviewers point to specific parts of the text when writing their comments. Otherwise, the peer review process is rendered unnecessarily complicated for the reviewers.

      We apologize and have added line numbers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The study presents important findings on inositol-requiring enzyme (IRE1α) inhibition on diet-induced obesity (overnutrition) and insulin resistance where IRE1α inhibition enhances thermogenesis and reduces the metabolically active and M1-like macrophages in adipose tissue. The evidence supporting the conclusions is convincing but can be enhanced with information/data on the validity, specificity, selectivity, and toxicity of the IRE1α inhibitor and supported with more detail on the mechanisms by which adipose tissue macrophages influence adipocyte metabolism. The work will be of interest to cell biologists and biochemists working in metabolism, insulin resistance, and inflammation.

      We thank the editors for the assessment and appreciation of our findings in this study. In the revision, we have added information on the validity, selectivity, and toxicity of the IRE1α inhibitor. In addition, we discuss the likely contribution of suppressing the metabolically activated proinflammatory macrophage population in adipose tissue to the reversal of adipose remodeling and to thermogenesis. We have also improved the text and figures throughout the manuscript following the reviewers' recommendations.

      Public Reviews:

      Reviewer #1 (Public review):

      First, the authors confirm the up-regulation of the main genes involved in the three branches of the Unfolded Protein Response (UPR) system in diet-induced obese mice in AT, observations that have been extensively reported before. Not surprisingly, IRE1a inhibition with STF led to an amelioration of the obesity and insulin resistance of the animals. Moreover, non-alcoholic fatty liver disease was also improved by the treatment. More novel are their results in terms of thermogenesis and energy expenditure, where IRE1a seems to act via activation of brown AT. Finally, mice treated with STF exhibited significantly fewer metabolically active and M1-like macrophages in the AT compared to those under vehicle conditions. Overall, the authors conclude that targeting IRE1a has therapeutical potential for treating obesity and insulin resistance.

      The study has some strengths, such as the detailed characterization of the effect of STF in different fat depots and a thorough analysis of macrophage populations. However, the lack of novelty in the findings somewhat limits the study's impact on the field.

      We thank the reviewer for the appreciation of our findings and the comments about the novelty. Regarding novelty, we would like to emphasize several novel findings presented in this manuscript. First, as the reviewer correctly pointed out, we discovered, for the first time, that IRE1 inhibition by STF activates brown AT and promotes thermogenesis and that IRE1 inhibition not only significantly attenuated the newly discovered CD9+ ATMs and the “M1-like” CD11c+ ATMs but also diminished the M2 ATMs. These discoveries are very important and novel. In obesity, it was originally proposed that ATMs undergo M1/M2 polarization from an anti-inflammatory M2 to a classical pro-inflammatory M1 state. It was further reported that IRE1 deletion improves thermogenesis by boosting the M2 population, which then synthesizes and secretes catecholamines to promote thermogenesis. It is now known that M2 macrophages do not synthesize catecholamines or promote thermogenesis. In this study, we discovered that IRE1 inhibition does not increase (but instead decreases) the M2 population and that IRE1 inhibition promotes thermogenesis likely by suppressing pro-inflammatory macrophage populations, including the M1-like ATMs and, most importantly, the newly identified metabolically active macrophages, given that ATM inflammation has been reported to suppress thermogenesis. Second, this study presented the first characterization of the relationship between the more classical M1-like ATMs and the newly discovered metabolically active ATMs, showing that the CD11c+ M1-like ATMs largely overlap with, yet are not identical to, CD9+ ATMs in the eWAT under HFD. Third, although upregulation of ER stress response genes in the adipose tissues of diet-induced obese mice has been extensively reported, this does not necessarily mean that targeting IRE1a or ER stress can reverse existing insulin resistance and obesity. It is not uncommon that a therapy fails to yield the expected effect. For instance, amyloid plaques are a hallmark of Alzheimer's disease (AD), and interventions that prevent or reverse beta amyloid deposition have been expected to prevent progression or even reverse cognitive impairment in AD patients. However, clinical trials on such therapies have been disappointing. In essence, experimental demonstration of effectiveness or feasibility for any potential therapeutic target is a first step for any future clinical implementation.

      Reviewer #2 (Public review):

      The manuscript by Wu et al demonstrated that IRE1a inhibition mitigated insulin resistance and other comorbidities through increased energy expenditure in DIO mice. In this reviewer's opinion, this timely study has high significance in the field of metabolism research for the following reasons.

      (1) The authors' findings are significant and may offer a new therapeutic target to treat metabolic diseases, including diabetes, obesity, NAFLD, etc.

      (2) The authors carefully profiled the ATMs and examined the changes in gene expression after STF treatment.

      (3) The authors presented evidence collected from both systemic indirect calorimetry and individual tissue gene expression to support the notion of increased energy expenditure.

      Overall, the authors have presented sufficient background in a clear and logically organized structure, clearly stated the key question to be addressed, used the appropriate methodology, produced significant and innovative main findings, and made a justified conclusion.

      We thank the reviewer for the appreciation of our work.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Wu D. et al. explores an innovative approach to immunometabolism and obesity by investigating the potential of targeting macrophage Inositol-requiring enzyme 1α (IRE1α) in cases of overnutrition. Their findings suggest that pharmacological inhibition of IRE1α could influence key aspects such as adipose tissue inflammation, insulin resistance, and thermogenesis. Notable discoveries include the identification of High-Fat Diet (HFD)-induced CD9+ Trem2+ macrophages and the reversal of metabolically active macrophages' activity with IRE1α inhibition using STF. These insights could significantly impact future obesity treatments.

      Strengths:

      The study's key strengths lie in its identification of specific macrophage subsets and the demonstration that inhibiting IRE1α can reverse the activity of these macrophages. This provides a potential new avenue for developing obesity treatments and contributes valuable knowledge to the field.

      Weaknesses:

      The research lacks an in-depth exploration of the broader metabolic mechanisms involved in controlling diet-induced obesity (DIO). Addressing this gap would strengthen the understanding of how targeting IRE1α might fit into the larger metabolic landscape.

      Impact and Utility:

      The findings have the potential to advance the field of obesity treatment by offering a novel target for intervention. However, further research is needed to fully elucidate the metabolic pathways involved and to confirm the long-term efficacy and safety of this approach. The methods and data presented are useful, but additional context and exploration are required for broader application and understanding.

      We thank the reviewer for the appreciation of strengths in our manuscript. In particular, we appreciate the reviewer’s recommendation on the exploration of broader metabolic landscape, such as the effect of IRE1 inhibition on non-adipose tissue macrophages and metabolism. We agree that achieving these will certainly broaden the therapeutic potential of IRE1 inhibition to larger metabolic disorders and we will pursue these explorations in future studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      A list of recommendations for the authors is presented below:

      (1) Please, update the literature review to include more recent studies relevant to the topic.

      We thank the reviewer for the suggestion. We have added more references from recent studies.

      (2) Please, provide a detailed explanation of how STF functions, including potential off-target effects or issues related to specificity.

      We thank the reviewer for the suggestion. STF is a small-molecule inhibitor designed to selectively inhibit the RNase activity of IRE1a. Once IRE1a is activated (e.g., in obesity), its RNase domain initiates the unconventional splicing of the transcription factor X-box binding protein 1 (XBP1) mRNA and the Regulated IRE1-Dependent Decay (RIDD) of microRNAs, which is detrimental if prolonged. IRE1a RNase inhibitors, including STF, engage the RNase active site of IRE1a with high affinity and specificity by exploiting a shallow complementary pocket through pi-stacking interactions with His910 and Phe889 and an essential Schiff base interaction between the aldehyde moiety of the inhibitor and the side-chain amino group of Lys907 (Sanches et al., NComm 2014, PMID: 25164867). This specific, high-affinity binding blocks IRE1a RNase activity, preventing the splicing of XBP1 mRNA and RIDD. As IRE1a has been shown to be activated in multiple tissues under various pathological conditions and to be responsible for their progression, inhibition of IRE1a by pharmacological agents, including STF, has great potential for the treatment of various pathological disorders. Several studies have reported that STF shows no overt toxicity when administered systemically (Madhavan et al., 2022, PMID 35105890; Herlea-Pana et al., 2021, PMID 34675883; Papandreou et al., 2011, PMID 21081713; Tufanli et al., 2017, PMID 28137856).

      (3) Lines 263-266 require a reference.

      We thank the reviewer for the suggestion. A reference has been added.

      (4) Stromal vascular fraction (SVF) also contains a significant amount of preadipocytes and stem cells, not only macrophages, which might affect the conclusions reached by the authors.

      We thank the reviewer for the comments. It is true that SVF consists of multiple cell types, including endothelial cells, macrophages, preadipocytes, and various stem cell populations. In HFD-induced obesity, adipose tissue undergoes significant remodeling, and the percentage of macrophages in the SVF of obese adipose tissue increases significantly relative to other cell types. In our studies, SVFs from adipose tissues of obese mice were isolated, cultured, and treated with STF overnight. We observed that IRE1 RNase activity in SVFs was inhibited by STF treatment, and that the ATM population and the expression of pro-inflammatory genes were downregulated by STF. Given the short-term treatment, the most parsimonious interpretation of the data is that STF acts directly on ATMs. However, we note that the possibility that effects of STF on other cell types influence ATMs and inflammatory gene expression cannot be totally ruled out. As such, we have modified our conclusion from “these results indicate that STF acts directly on ATMs to regulate inflammation” to “these results indicate that STF likely acts directly on ATMs to regulate inflammation”.

      (5) Figures 1A and G: It is common practice to present the XBP1s/XBP1u ratio; consider using this standard measure.

      We thank the reviewer for the comments. Regarding XBP1 mRNA splicing, both ways of presentation appear in publications. A number of papers (for instance, PMID 25018104, 2014, Cell; PMID 23086298, 2012, NCB) used the XBP1s/(XBP1s+XBP1u) ratio. We prefer this presentation because it shows spliced XBP1 (XBP1s) relative to total XBP1 mRNA (XBP1s+XBP1u).
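
      As a minimal sketch of how the two presentations relate, the snippet below computes both the spliced fraction XBP1s/(XBP1s+XBP1u) and the XBP1s/XBP1u ratio from the same pair of signals. The function names and the band intensities are hypothetical and are not taken from the manuscript.

```python
def spliced_fraction(xbp1s: float, xbp1u: float) -> float:
    """Spliced XBP1 as a fraction of total XBP1 mRNA: s / (s + u)."""
    total = xbp1s + xbp1u
    if total <= 0:
        raise ValueError("total XBP1 signal must be positive")
    return xbp1s / total


def spliced_to_unspliced(xbp1s: float, xbp1u: float) -> float:
    """Spliced-to-unspliced ratio: s / u."""
    if xbp1u <= 0:
        raise ValueError("unspliced XBP1 signal must be positive")
    return xbp1s / xbp1u


# Hypothetical band intensities (arbitrary units), for illustration only.
s, u = 30.0, 90.0
fraction = spliced_fraction(s, u)
ratio = spliced_to_unspliced(s, u)

# The two measures are linked by fraction = ratio / (1 + ratio),
# so one increases whenever the other does.
assert abs(fraction - ratio / (1 + ratio)) < 1e-12
```

      Because one measure is a monotonic transform of the other, both rank samples identically; the fraction is bounded between 0 and 1, whereas the ratio grows without bound as the unspliced signal approaches zero.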

      (6) Figure 1F: please indicate the type of AKT phosphorylation assessed.

      We thank the reviewer for the comment. We have added Ser473 as the phosphorylation site in both the figure legend and the figure.

      (7) Figures 2E-H: please clearly indicate the specific fat depots analyzed in each figure.

      We thank the reviewer for the comment. We have added the information to the figure legends and figures.

      (8) Figures 1I and 3A, and Supplementary Figures 6D-E: please include a quantification analysis of the images presented.

      We thank the reviewer for the suggestion. We have added quantifications of the images.

      (9) In Figure 3D the image corresponding to the merge for the STF condition is a duplication of the control, please correct this.

      We thank the reviewer for pointing this out. We have replaced it with the correct image.

      (10) Figures 4B-F: please provide individual data points in the graphs to show variability and sample distribution.

      We thank the reviewer for the suggestion. We have re-plotted the graphs in Fig. 4B-F with the individual data points.

      (11) Figure 4I: it is rather unusual to have such a strong signal of UCP1 in ND conditions, please explain.

      We thank the reviewer for the comment. We wish to point out that the images were taken from BAT slides. UCP1 is expected to show strong staining in BAT under the ND condition, and, as expected, this staining is weakened under the HFD condition. STF treatment was able to correct the HFD-induced weakening of UCP1 staining in BAT.

      (12) Supplementary Figures 2C-D: please provide representative images for better clarity and interpretation.

      We thank the reviewer for the comment. The representative images for Supplementary Figures 2C-D are shown in Figures 2C and F. Supplementary Figures 2C-D are simply the quantification of adipocyte areas for Figures 2C and F.

      (13) Supplementary Table 3 is repeated, please remove.

      We thank the reviewer for the comment. We have deleted this repetition.

      Reviewer #2 (Recommendations for the authors):

      The manuscript can be further strengthened with more clarification on the following points.

      (1) The use of IRE1a pharmacological inhibitor STF-083010 (STF) needs to be validated. How was the dose determined? Were there any dose-dependent studies? Under the current dosing regimen, what are the specificity, selectivity, and toxicity of STF? Also, were the serine/threonine kinase and RNase activities measured in the adipocytes and ATMs of the animals dosed with the compound? What's the PK data?

      We thank the reviewer for the comments. In the animal study, we administered STF at 10 mg/kg by intraperitoneal injection. This dose was adopted from several recent studies (Madhavan et al., 2022, PMID 35105890; Herlea-Pana et al., 2021, PMID 34675883; Papandreou et al., 2011, PMID 21081713; Tufanli et al., 2017, PMID 28137856), in which STF treatment showed beneficial effects in the respective disease models. STF did not compromise cell viability or induce any other toxicity at the doses or concentrations used in these studies (Papandreou I, et al., 2011; Upton JP, et al., 2012; Lerner AG, et al., 2012; Kemp KL, et al., 2013; Cross BC, et al., 2012). In our study, we did not observe any apparent toxicity in mice at this dose. Importantly, we did observe that STF inhibited IRE1 RNase activity in adipose tissues (F1G, S1D) and ATMs (F6Q, S8C, G, I) of the animals at this dose. As IRE1 inhibitors, including STF, have been extensively examined and shown to have no effect on the kinase function of IRE1 (Cross et al., 2012, PMID: 22315414; Tufanli et al., 2017, PMID 28137856), we did not assay IRE1 kinase activity. Additionally, as the compound has been administered in several animal models with significant beneficial effects, one would assume that adequate pharmacokinetic parameters are achieved with the current dose. Systematic PK studies will be important and necessary in the future if clinical trials are to be considered.

      (2) The statistical method for individual panels in each figure needs to be specified.

      We thank the reviewer for the suggestion. We have specified the statistical method in the figure legends.

      (3) In Figure 1E, there's no difference in fasting insulin levels, though a difference was detected after the glucose load. This suggests an effect on insulin secretion but not insulin sensitivity.

      We thank the reviewer for the comments. Fasting insulin levels still differ between the Veh and STF groups, although the difference does not reach statistical significance. Under glucose stimulation, insulin levels showed the same trend at every time point: lower in the STF group than in the Veh group. Even if insulin levels had shown no difference between the two groups regardless of glucose stimulation, the fact that blood glucose levels are lower in the STF group than in the Veh group at all time points (Fig. 1C) indicates that insulin sensitivity is improved. In our study, insulin levels were lower in the STF group, yet blood glucose levels were still lowered by STF, further strengthening the notion that STF treatment improves insulin sensitivity. This is further corroborated by the ITT results (Fig. 1D).

      (4) Figure 2 and S2A did not show a decrease in BW but rather BW gain. The statement (line 308) needs to be edited. As a result of this, the relative fat mass measurement (% of BW) needs to be presented in addition to Figure 2B.

      We thank the reviewer for the comments and suggestions. As shown in Figs. 2A and S2A, we observed a slight decrease in body weight (~2g reduction) in STF-treated mice, whereas the Veh group gained ~3.5g by the end of 4 weeks of treatment. As shown in Fig. 2B, this difference in body weight between the Veh and STF groups was primarily due to a reduction in fat tissue. In the revision, we also added the percentages of fat and lean mass relative to total body weight in Supplemental Fig. 2B, which show a similar trend.

      (5) The measurement of blood lipid levels in Figure 3F-H is informative. More importantly, hepatic lipid content needs to be measured.

      We thank the reviewer for the comments and agree. As this study focuses on insulin resistance and adipose tissue remodeling, we did not go deeply into the comorbidities beyond the reported observations. It will be interesting to explore the effects of IRE1 inhibition on the comorbidities of obesity and insulin resistance, including hepatic lipid content, in future studies.

      Minor corrections:

      (1) Line 261: "(spliced".

      Done. We have corrected it.

      (2) Line 334: spell out "PEPCK".

      We have added the full name “Phosphoenolpyruvate carboxykinase”. Thanks!

      (3) Line 478: please rephrase.

      We thank the reviewer for the comment. We have rephrased the sentence as follows: “These results reveal that STF treatment suppresses the adipose tissue inflammation and the accumulation of pro-inflammatory ATM with augmenting (suppressing instead) M2-like ATMs.”

      (4) Figure 4L: "pGC1-a".

      We thank the reviewer for pointing this out. We have corrected the name.

      (5) Figure 4O: missing Y-axis label.

      We have added the label. Thanks!

      Reviewer #3 (Recommendations for the authors):

      The observations presented by Wu D. et al. in the manuscript are potentially interesting and relevant. The current study seeks to build upon previous findings, specifically from the work titled, "Silencing IRE1α using myeloid-specific cre suppresses alternative activation of macrophages and impairs energy expenditure in obesity." By using a pharmacological inhibitor to modulate IRE1α activity in adipose tissue macrophages (ATMs), the authors aim to develop therapeutics that could significantly impact the treatment of obesity and metabolic disease.

      The authors have performed some satisfactory experiments related to liver steatosis. However, the manuscript would benefit from a more comprehensive exploration of the mechanisms by which ATMs influence adipocyte metabolism, particularly in epididymal white adipose tissue (eWAT). In particular, the study should investigate how adiposity and lipid droplet size change in response to alterations in lipolysis and adipogenesis, as this could provide insights into how these processes contribute to the amelioration of the obesity phenotype.

      Several issues should be addressed to strengthen the manuscript and make the study more convincing. Below are specific comments and recommendations:

      Major:

      (1) The indirect calorimetric data should be normalized for dependent variables such as body weight, lean mass, and fat mass+ lean mass to accurately interpret the results. The results for 24-hour energy expenditure should be included in Figure 4B-F to provide a more comprehensive analysis. It is recommended to plot bar graphs with all individual data points for the energy expenditure (EE) results shown in Figure 4B-F, to offer a clearer and more detailed presentation of the data (Figure 4B-F).

      We thank the reviewer for the comments. Data analysis for indirect calorimetry studies has evolved over the years. One common practice has been to normalize the data by body weight. However, this approach was deemed improper some years ago (Tschop et al Nature Methods 2012, PMID: 22205519). The Tschop paper also pointed out the shortcomings of normalization by lean mass. Instead, it concludes that “generalized linear model is the most appropriate statistical approach to accommodate discrete (genotype) and continuous (body mass) traits, rather than using a simple division by BW or lean BW”. In our study, we used CalR, an improved generalized linear model (which includes ANOVA and ANCOVA) (Mina et al Cell Metabolism 2018, PMID: 30017358), for all our energy expenditure data analysis (shown in Fig. 4A-E). In the revision, we also included data analysis normalized by BW (Fig. S2F-H’), which shows an even wider difference between the Veh and STF groups than the data shown in Fig. 4A-F. As STF decreased fat mass and had little effect on lean mass, the difference would be more pronounced for normalization by fat mass or by fat mass + lean mass than in the data shown in Fig. 4A-E, and would be similar to the data shown in Fig. 4A-E for normalization by lean mass. In addition, we replotted the graphs in Fig. 4B, D, F-H with the individual data points.
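
      To illustrate the difference between dividing by body weight and treating body mass as a covariate, the following is a minimal sketch of a one-covariate ANCOVA with a common (pooled within-group) slope. It is not CalR's implementation, and the function name and the data are hypothetical values chosen only to make the arithmetic easy to follow.

```python
from statistics import mean


def ancova_adjusted_means(groups):
    """Return (pooled_slope, adjusted_means) for a one-covariate ANCOVA.

    `groups` maps a group name to a list of
    (body_mass, energy_expenditure) pairs.
    """
    sxx = 0.0  # pooled within-group sum of squares of the covariate
    sxy = 0.0  # pooled within-group sum of cross-products
    centers = {}
    all_mass = []
    for name, pairs in groups.items():
        mass = [m for m, _ in pairs]
        ee = [e for _, e in pairs]
        m_bar, e_bar = mean(mass), mean(ee)
        centers[name] = (m_bar, e_bar)
        all_mass.extend(mass)
        sxx += sum((m - m_bar) ** 2 for m in mass)
        sxy += sum((m - m_bar) * (e - e_bar) for m, e in pairs)
    if sxx == 0:
        raise ValueError("body mass must vary within at least one group")
    slope = sxy / sxx
    grand_mass = mean(all_mass)
    # Shift each group mean to the value expected at the grand-mean body mass.
    adjusted = {
        name: e_bar - slope * (m_bar - grand_mass)
        for name, (m_bar, e_bar) in centers.items()
    }
    return slope, adjusted


# Hypothetical data: (body mass in g, energy expenditure in arbitrary units).
data = {
    "Veh": [(40.0, 22.0), (42.0, 23.0), (44.0, 24.0)],
    "STF": [(30.0, 18.0), (32.0, 19.0), (34.0, 20.0)],
}
slope, adjusted = ancova_adjusted_means(data)
print(slope, adjusted)
```

      In this toy data set the raw mean is higher in the heavier group, but once both groups are compared at the same body mass the ordering reverses. This is the kind of confounding that covariate adjustment is meant to handle and that a simple ratio handles only if energy expenditure is strictly proportional to mass.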

      (2) At the thermoneutral point (30 °C), the study could benefit from testing the indirect calorimetric models of human energy physiology. Future studies could also explore this to evaluate the implications for drug development.

      We agree with the reviewer. In future studies, it will be very informative to investigate the effects of STF under thermoneutral conditions, which could provide more consistent data on how drugs affect metabolic processes in humans and thereby improve translational research.

      (3) The current study missed the opportunity to investigate the effects of STF on non-adipose tissue (non-AT) resident macrophage populations, such as those in bone marrow or lymph-node macrophages. Understanding how STF modulates macrophage metabolism in these contexts would be valuable.

      We thank the reviewer for the comments and agree. As this study focuses on insulin resistance and adipose tissue remodeling, we mostly restricted our analysis to adipose tissue macrophage populations. In the future, it would be interesting to investigate the effect of STF on macrophages in non-adipose tissues; this would provide a more comprehensive understanding of STF's effects on immune cell metabolism and could inform its application in various therapeutic areas.

      (4) The study should explore how STF influences the expression of CD9, Trem2, (positive subpopulations), and the secretion of pro-inflammatory cytokines by macrophages, particularly in response to LPS and IFNγ activation in stromal vascular fraction (SVF) cells and bone marrow-derived macrophages (BM-Macrophages).

      We appreciate the reviewer’s comments. In obesity, ATMs do not undergo classical M1/M2 polarization; instead, both M1-like/pro-inflammatory macrophages and M2 macrophages increase drastically. It will be interesting in the future to investigate the effects of STF on the newly identified CD9- and Trem2-positive macrophage subpopulations in SVF and bone marrow macrophages in response to LPS and IFNγ stimulation, although such studies might not faithfully reflect the changes in adipose tissue in obesity, as these stressors typically induce classical M1/M2 polarization.

      (5) Additional macrophage gating is necessary to better understand adipose tissue macrophage (ATM) inflammation. Specifically, CD11c−MHC2 low macrophages represent a newly identified inflammatory and dynamic subset in murine adipose tissue. These ATMs accumulate rapidly after ten days of a high-fat diet (HFD) and should increase further with prolonged HFD. For this study, CD11c−MHC2 low ATMs could be subdivided for flow cytometry analysis based on their MHC2 expression, distinguishing them from CD11c−MHC2 high ATMs. All macrophage subtypes categorized here can be studied for metabolic health using Seahorse analysis as well.

      We appreciate the reviewer’s comments. It will be interesting to investigate the effects of STF on the newly identified CD11c−MHC2 low macrophage subpopulation in the future. Future studies can certainly include metabolic analysis with Seahorse, which can relate energy metabolism at the cellular level to organismal thermogenesis.

      (6) All flow cytometry histograms - are they showing mean fluorescence intensity or cell# per population? Please specify. All flow cytometry dot plots - It would be helpful for readers to see populations plotted as bar graphs next to respective flow plots, as opposed to being shown as supplemental tables. Additionally, labeling dot plots with the parent population from which cells were gated on would also help readers understand faster what we're looking at.

      We appreciate the reviewer’s comments. The flow cytometry histograms are plotted as “normalized to mode”. Mode normalization is often used to compare the distribution of fluorescence intensity between samples. It focuses on the shape of the distribution (with a maximum of 100%) rather than on absolute cell counts, which removes variation caused by different cell numbers or sample sizes and makes it easier to compare populations by fluorescence intensity. When normalizing to the mode, the highest peak in the histogram is scaled to 100%, and all other values are scaled relative to that peak. This allows multiple histograms to be compared easily, even if the total number of cells (or events) differs between samples.
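
      The scaling described above can be sketched in a few lines. This is a minimal illustration that assumes the histogram is already available as a list of per-bin event counts; the function name and the counts are hypothetical, and this is not FlowJo's implementation.

```python
def normalize_to_mode(counts):
    """Scale histogram bin counts so that the tallest bin equals 100."""
    peak = max(counts)
    if peak <= 0:
        raise ValueError("histogram must contain at least one event")
    return [100.0 * c / peak for c in counts]


# Two hypothetical samples with the same distribution shape but a
# tenfold difference in the number of acquired events.
sample_a = [10, 50, 25, 5]
sample_b = [100, 500, 250, 50]

# After normalization the two histograms overlay exactly.
assert normalize_to_mode(sample_a) == normalize_to_mode(sample_b)
print(normalize_to_mode(sample_a))
```

      The y-axis of such a plot is therefore neither mean fluorescence intensity nor an absolute cell count; it is the event count in each bin expressed as a percentage of the tallest bin.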

      (7) The results appear to confuse the actual sample size and p-value. Please carefully review the statistical analyses to ensure that biological replicates are accurately represented. Additionally, include p-values alongside fold change data in the text for clarity.

      We appreciate the reviewer’s comments. We have rechecked the statistical analyses and confirmed that the biological replicates are now properly represented. The exact number of biological replicates for each experiment is now clearly specified in both the methods section and the figure legends.

      (8) To further validate the findings, consider using Seahorse analysis at the cellular level in future experiments. This could confirm indirect calorimetric data and thermogenesis responses to cold stimulation.

      We appreciate the reviewer’s comments. Yes, Seahorse analysis at the cellular level will be conducted in future experiments.

      (9) Please ensure the use of person-first language, avoiding labels or adjectives that define individuals based on a condition or characteristic.

      We appreciate the reviewer’s comments. We have revised the descriptions to use person-first language.

      (10) The manuscript does not demonstrate how STF inhibition of IRE1α in ATM, specifically through CD9 and Trem2, controls diet-induced obesity. This aspect should be further elucidated.

      We appreciate the reviewer’s comment. In this study, we observed that STF inhibits IRE1α RNase activity in SVF and in sorted ATMs as well as in adipose tissue. The improvement in diet-induced obesity can be attributed to IRE1α inhibition in both adipocytes and macrophages, as shown previously by myeloid- and adipocyte-specific knockouts of IRE1α. To determine whether IRE1α in CD9- and/or Trem2-positive ATMs controls diet-induced obesity, genetic approaches would be needed to delete IRE1α specifically in CD9- and/or Trem2-positive ATMs, which is technically challenging at present because no CD9- or Trem2-specific Cre lines are available.

      Minor:

      (1) Line 43-44: Update terminology to "MASLD" instead of "NAFLD."

      We thank the reviewer for pointing these out. We have changed the terminology in the revision.

      (2) Line 58-59: Add a reference for the mentioned text.

      We thank the reviewer for the comment. Added a reference in the text in the revision.

      (3) Was the antibody used to detect CD9 and Trem2 validated for FACS and other analyses?

      We thank the reviewer for the comment. In our studies, we determined CD9 and Trem2 expression by flow cytometry and immunostaining. For flow cytometry, we used PE/Dazzle™ 594 anti-mouse CD9 (BioLegend Cat# 124821, RRID:AB_2800601) and APC-conjugated Trem2 (R&D Systems Cat# FAB17291N, RRID:AB_3646995), which were validated for FACS. For immunostaining, we used CD9 (Abcam Cat# ab223052, RRID:AB_2922392) and Trem2 (R&D Systems Cat# MAB17291, RRID:AB_2208679).

      (4) Studies were limited to male mice; this should be noted in the title and discussed as a limitation.

      We thank the reviewer for the comment. We have modified the wording in the revision.

      (5) Ensure all reagents are fully described with preparation details and identifiable numbers for reproducibility and/or submit the FACS protocol to any protocol archives.

      We thank the reviewer for the suggestions. Yes, we have modified the wording in the revision.

      (6) Provide the correct version numbers for all software used (FlowJo, Prism, etc.).

      We thank the reviewer for the suggestions. We have provided the correct version numbers for FlowJo and Prism.

      (7) Specify section size (µm) and blocking agent used for eWAT immunofluorescence (Line 207).

      We thank the reviewer for the suggestions. We have added this information.

      (8) Add gene accession numbers to Supplementary Table 3.

      We thank the reviewer for the suggestions. We have added this information.

      (9) Figure 2: Clarify HFD and treatment timelines with a schematic diagram.

      We thank the reviewer for the suggestions. We have added a schematic diagram in Supplemental Figure 1C.

      (10) For histology analysis, the minimum combined data from triplicate images is shown in Figure 2C-2H. For Figures 2E and H, provide complete methods for histology analysis.

      We thank the reviewer for the comments. For the histology analysis shown in Figures 2C–2H, we used a minimum of three mice per treatment group. For each mouse, 3–5 images were taken for analysis. All histology images were quantified using ImageJ, and the data were processed and organized using Excel and GraphPad.

      (11) Figure 3D Macrophage markers F4/80 stained differently in Figure 5B; to avoid false positive staining, show isotype control to confirm actual staining. For eWAT immunofluorescence (Figures 3D, 5B, 6E)., counterstaining is needed in addition to macrophages, such as for adipocytes-perilipin, and phalloidin for total cells.

      We thank the reviewer for the comments. Yes, the macrophage marker F4/80 stains differently in Figure 3D than in Figure 5B because the tissues differ: Figure 3D shows liver samples, whereas Figure 5B shows adipose tissue. In the liver, subsets of macrophages are known as Kupffer cells, which have distinct morphology and behavior compared to other tissue-resident macrophages. When liver is stained for F4/80, the pattern may reflect the specialized role of Kupffer cells, typically showing a more diffuse or localized staining around blood vessels and sinusoids. In adipose tissue, macrophages tend to accumulate around dead or dying adipocytes, forming what are known as "crown-like structures" (CLS). F4/80 staining in adipose tissue therefore shows a more clustered pattern, particularly around areas of fat tissue undergoing remodeling or inflammation. In adipose tissue, clearly defined cells are still visible even without a counterstain such as perilipin and, importantly, adipocytes are generally much larger than macrophages. We agree that counterstaining would enhance accuracy. In future studies, we will use perilipin staining to make it easier to differentiate adipocytes from other structures and provide stronger data.

      (12) Insert scale bars in the original images for Figures 3D, 4I, 4M, 5B, 6E, S3B, S6D-E, and S7A-B. All images added a scale bar not inserted while acquiring the image or using imaging software.

      We thank the reviewer for the suggestions. The scale bars embedded in the images during acquisition are not clearly visible unless the images are enlarged. In the revision, we have manually added scale bars for clarity.

      (13) Figure 5E: Please label X-axis as F4/80.

      We thank the reviewer for pointing this out. The label has been added in the revision.

      (14) Figure 5F: It is specified in the legend that cells were gated on F4/80+CD11b+CD11c+, but there is a CD11c- population shown in the histogram...How is this population appearing if all cells should be CD11c+?

      We thank the reviewer for pointing this out. We gated against CD11c in F4/80+CD11b+ population. As such, we have corrected the description in the legend.

      (15) Figure 5G: What is the F4/80+CD11b+CD11c-CD206- population gated in quadrants?

      We thank the reviewer for the comment. The F4/80+CD11b+CD11c-CD206- population was shown in Figure 5G on the lower left side, with the percentages being 15.7% for ND, 5.54% for Veh-HFD, and 26% for STF-HFD.

      (16) Figure 6J: Flow cytometry gates seem slightly misplaced and the sample appears to be overcompensated - were FMOs included in this experiment to establish proper gates? If so, please include.

      We thank the reviewer for the comment. In the study, we did include Fluorescence Minus One (FMO) control in the experiment to establish proper gating. We have included this information in the methods section.

      (17) Table 1-3: Indicate the number of replicates (n=) used in all tables.

      We thank the reviewer for the suggestion. We have provided the specific number of mice used in the study within the figure legends.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      The analysis of the dormancy rates is interesting and offers some intriguing questions related to the higher dormancy rate found for the L2 isolates and lower for the L3 ones. It will be interesting in the future to expand the data generated in this advanced in vitro platform to in vivo studies.

      Indeed, an increased dormancy propensity of L2 isolates was previously reported in broth culture and associated with specific genetic polymorphisms. The opposite phenotype observed in the L3 isolates is particularly intriguing and has not been described to date. Hence, we fully agree that it would be very interesting to find out whether these phenotypes are also observed in vivo.

      The authors propose that ‘strains exhibiting greater proliferative capacity are more prone to induce macrophage apoptosis, thereby contributing to the extent of the granulomatous response.’ It would be interesting to know what happens if the macrophage apoptotic response is blocked.

      This is an interesting suggestion that would deserve a dedicated comprehensive investigation covering other cell death pathways. Even though the trend is significant, the correlation coefficient is rather low in this interaction, which looks a fortiori due to substantial inter-host variability in the apoptotic propensity of macrophages from individual donors to a given strain. In addition, such blocking experiments may require performing isolated macrophage infections that would fall outside of the scope of this study, or considering the extent and the contribution of the apoptosis of other cell subsets. 

      In contrast to macrophage apoptosis, T cell activation correlated with less replicative bacteria. Are these two findings related, ie, are the granulomas showing more (apoptotic) macrophages the ones with a lower percentage of activated T cells? This would shed light on what distinguishes granulomas that are protective from those that support bacterial growth. 

      Indeed, a significant negative correlation between macrophage apoptosis induction and T cell activation can be observed, specifically with activated CD4 T cells expressing CD38 (rS = -0.36, p < 0.05) or CD69 (rS = -0.40, p < 0.01). We have added this additional result in the manuscript text (line 217).

      It would also be interesting to know the functional impact of blocking early CXCL9 or IL1b on the outcome of granulomatous response/bacteria growth.

      We have performed the suggested early blocking experiments and added the expected negative effect on granuloma formation upon neutralization of IL-1b (current Fig. 6E) in the revised version of the manuscript, and furthermore discussed the null effect on bacterial growth of the treatment with an anti-CXCL-9 specific antibody (current Fig. 6H).

      The authors acknowledge the absence of neutrophils in this model. However, this could be discussed in more detail, as neutrophils play an important part in TB pathogenesis as shown in different models of infection and human TB. 

      We concur and have expanded on the importance of neutrophils in TB pathogenesis (including references) in the discussion section (line 260).

      Related to neutrophils and TB pathogenesis, another important player is type I IFN. The multiplex assay used included IFN-alpha, was this molecule detected? If so, was there any difference in the levels of type I IFN detected among the different infections?

      We agree, and that is why we originally included IFN-α in our screen. However, this cytokine remained below the limit of quantification at both time points studied, preventing us from drawing conclusions on the effect of Mtb strain diversity on the secretion of type I IFNs in in vitro granulomas.

      Reviewer #2:

      In Figure 1b/c, it is not clear what comparisons are being made to give the p-value annotations.

      In Figure 2a/b, it is not clear what comparisons are being made to give the p-value annotations.

      In Figure 3a, again it is not clear what comparisons are being made to give the p-value annotation.

      The p-values formerly shown in the upper left corner of the panels resulted from either Friedman (Figures 1C, 2A and 3A) or Kruskal-Wallis (Figures 1B and 2B) tests and indicated whether there was an overall significant difference between the analyzed groups. To avoid confusion, those values have been removed, leaving only the post-test comparisons between specific groups.

      In the results narrative related to Figure 1 (lines 93-103), the authors refer to lineage heterogeneity without providing any objective quantification of this - I suggest they do so, by providing variance or standard deviations. 

      Thank you very much for this relevant suggestion. We have now included the coefficients of variation as a quantitative measure of the within-lineage heterogeneity in the manuscript (line 97).
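      As an illustration of the measure mentioned above, the coefficient of variation (sample standard deviation divided by the mean) could be computed per lineage along the following lines. This is only a minimal sketch: the function name, the lineage labels and the growth-rate values are hypothetical and are not taken from the manuscript.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample coefficient of variation (SD / mean) as a percentage."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate the SD")
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; the CV is undefined")
    return 100.0 * stdev(values) / m


# Hypothetical per-strain growth rates grouped by lineage (illustrative only).
growth_by_lineage = {
    "L1": [1.0, 1.2, 1.1],
    "L2": [1.5, 2.5, 4.0],
}

# One CV per lineage: a larger value means more within-lineage heterogeneity.
cv_by_lineage = {
    lineage: coefficient_of_variation(rates)
    for lineage, rates in growth_by_lineage.items()
}
```

      Because the CV is normalized to the mean, it allows heterogeneity to be compared between lineages whose average growth differs.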

      I also suggest the authors explain what the data points actually represent in this figure - do I assume each data point = cfu from a well of 'granuloma'? Are they all from the same donor PBMC? What is the sample N for each lineage? If the data are not from the same donor PBMC, I think more informative to present the results of paired statistical analyses, stratified by donor cells. In addition, the authors should include a summary table of the demographic characteristics of the donors (at least sex, ethnicity, and age). If the data are derived from a single donor, I'd advocate providing data from at least one further donor.

      In the new supplementary figure requested by Reviewer 3 (Figure 1—figure supplement 1, showing the actual CFU data on days 1 and 8 p.i. used to calculate the growth rate), it is now indicated that bacterial load was quantified as CFU per well.

      Regarding the number of donors used, as stated in the Material and Methods section (current line 418) and depicted by the four different shapes used when data are grouped by individual infecting strain, all figures in our manuscript have been generated using PBMCs from 4 independent donors. For greater clarity, “n = 4” has now been included in the figure legends. Regarding the statistical analyses, paired statistical analyses stratified by donor were already performed in the original version of the manuscript whenever appropriate. 

      As stated in the methods section, the buffy coats used for PBMC isolation are anonymized so demographic data are unavailable.

      The premise of the analysis in Figure 2c and the results narrative ("This finding suggests that an increased ability to enter dormancy is not necessarily associated with a more pronounced growth phenotype", line 132) is not clear to me. Why would increased dormancy relate to increased growth in the same context? I suggest this analysis be removed.

      We apologize for the confusion in our original statement. We have now rephrased it as “This finding suggests that an increased tendency to remain in a metabolically active state is not necessarily associated with a more pronounced growth phenotype”.

      In Figure 3b, I think it may be more informative if the data points from the same donor were linked. Likewise in Figure 3c, I'd like to see a donor-paired statistical analysis.

      For all figures, the choice of using individual symbols to identify data points from the same donor, rather than connecting lines, was made to provide a neater image. Nevertheless, we have now modified the figure to link the data points from the same donor. The statistical analysis performed is always donor-paired whenever appropriate.

      The causal inference suggested in the results narrative between ‘macrophage apoptosis’ and granulomatous response (lines 173-175) is not tested directly by the experiment – I suggest the authors exclude this statement.

      Fair point, the statement has been removed.

      To what extent have the authors considered whether variation in T cell responses between lineages may be confounded by variation in Mtb reactive T cell frequencies in donor PBMC? Can this be disentangled at all? This should be acknowledged as a potential limitation of the study.

      We did characterize the presence of mycobacterial antigen-specific reactive T cells in the PBMCs from the investigated donors. To do so, we performed in vitro stimulations with purified protein derivative (PPD) or an ESAT-6/CFP-10 peptide pool and quantified the frequency of IFN-γ-positive CD4 T cells by flow cytometry. The percentage of IFN-γ-positive CD4 T cells recalled by PPD stimulation ranged from 0.02% to 0.13%, while no ESAT-6/CFP-10 reactive T cells were detected. As such, we can attest that the PBMC donors never encountered Mtb, even though some levels of memory recalled by PPD may be due to cross-reactivity with BCG or pre-exposure to non-tuberculous mycobacteria. We have now added a panel in Figure 5—figure supplement 2 representing the frequency of mycobacteria-specific CD4 T cells and, as suggested, discussed the impact on the extent of the T cell responses observed in granulomas in the revised version of the manuscript. Nevertheless, the observed MTBC strain-specific trends are consistent across the donors, as depicted in Figure 5B and Figure 5—figure supplement 2A-B.

      Moreover, the experimental design does not really test cause and effect for the relationship between T cell proliferation/activation and bacterial growth. What is the impact of T-cell depletion from PBMC on bacterial growth?

      The increased TB susceptibility of HIV patients demonstrated that T cells play a critical part in the control of Mtb infection. We agree and did envisage such a depletion experiment. However, depleting T cells from PBMCs would imply removing up to 70% of the cells present in the specimen, which would lead to a situation from which results cannot be compared to the original sample and therefore would not be interpretable. 

      Reviewer #3:

      Data presentation:

      - In Figure 1 (replication rate), actual cumulative CFU means from each strain for both days 1 and 8 with statistical analysis should be presented as panels in this figure.

      Agreed. We are providing the requested representation of the data and the corresponding paired statistical analysis as supplementary material Figure 1—figure supplement 1.

      - In Figure 2 (dormancy), a panel comparing the mean number of bacteria that are single positive for either Auramine-O, Nile Red, or are double positive should be included for each strain, with statistical analysis. Representative photomicrographs of phenotypes from the staining should also be included. Electron microscopy could be conducted to compare the presence of intermediate lipid inclusions within organoid-bound mycobacteria.

      As requested, percentages of single-stained as well as double-positive bacilli in each sample are now represented in Figure 2—figure supplement 1. In addition, we have now also followed the request and included a photomicrograph picturing representative Mtb staining phenotypes. Lastly, it would certainly be very elegant to visualize the presence of Mtb lipid inclusions within cellular aggregates by electron microscopy. However, we do not currently have the means for such investigations, and the implementation of such a protocol under BSL3 conditions appears unrealistic in the context of this study.

      - In Figure 3 (granulomatous response), the number, circularity, and size of immune aggregates are presented as "granuloma score" in which the mean ratio of size to circularity is divided by the number of inclusions. To their credit, in Supplementary Figure 2, the authors provide the data in a straightforward manner. However, the granuloma score metric is reduced as the number of observed "granulomas" increases, which is counterintuitive. Additionally, circularity is not a definitive aspect of human granulomas (Wells et al., Am J Respir Crit Care Med, 2021, PMID: 34015247). I am skeptical that the "granuloma score" is an accurate predictor of granulomatous inflammation. Is there precedent for this metric in the literature? If so, a reference should be provided. A high magnification inset of 1 representative granuloma from each strain should be included in Figure 3A.

      As requested, insets of a representative average granuloma for each strain have been included in Figure 3A. The formulation of the “granuloma score” has no precedent and cannot be referenced. With this score, we meant to integrate within one single parameter the visual differences represented in the current Figure 3—figure supplement 2. We intentionally sought to assign the highest score to the massive aggregation that some strains may promote, unlike others that trigger several small, dispersed and diffuse aggregates.
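      To make the behaviour of such a score concrete, the following is a minimal sketch based solely on the reviewer's verbal description (mean ratio of size to circularity, divided by the number of aggregates). The function name, the input layout and all numerical values are assumptions for illustration; the exact formula used in the manuscript may differ.

```python
from statistics import mean


def granuloma_score(aggregates):
    """Score one well from a list of (area, circularity) tuples.

    Assumed form: mean(area / circularity) divided by the number of aggregates.
    Circularity is expected to lie in (0, 1].
    """
    if not aggregates:
        return 0.0
    for _, circularity in aggregates:
        if circularity <= 0:
            raise ValueError("circularity must be positive")
    ratios = [area / circularity for area, circularity in aggregates]
    return mean(ratios) / len(aggregates)


# Hypothetical wells: one massive aggregate versus nine small ones.
one_large = [(9000.0, 0.9)]
many_small = [(1000.0, 0.8)] * 9

score_large = granuloma_score(one_large)
score_small = granuloma_score(many_small)
```

      Under these assumptions the well with a single large aggregate scores far higher than the well with many small ones, which matches the stated intent of rewarding massive aggregation and also reproduces the reviewer's observation that the score falls as the number of aggregates rises.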

      - In Figure 4 (macrophage apoptosis), a panel showing the percentage of dual Annexin V and 7-AAD positive cells should be included to provide the reader with the relative scope of ongoing apoptotic vs necrotic/secondary necrotic death in the model. If the data is readily available, including a control of uninfected PBMCs would also allow the reader to evaluate donor-dependent differences of in vitro cell death at baseline.

      No significant differences were observed in the percentage of dual Annexin V- and 7-AAD-positive macrophages (necrosis/secondary necrosis) between the MTBC strains at this time-point. Nevertheless, we have disclosed this result in the revised manuscript as Figure 4—figure supplement 2.

      - In Figures 5 and 6 (lymphocyte activation and soluble mediator secretion), panels showing unscaled data should be included. Panels depicting the unscaled immunoassay protein readings (pg/mL) by strain for CXCL9, granzyme B, and TNF with statistical analysis should be included in Figure 6.

      As requested, unscaled lymphocyte activation and soluble mediator data have been included as Figure 5—figure supplement 2 and Figure 6—figure supplement 1, respectively (replacing former supplementary figures 5 and 7). In addition, the updated Figure 6G panel now depicts correlation analysis with the unscaled cytokine concentrations.

      The DosR-regulon:

      The authors hypothesize that differences in the prevalence of the dormancy metrics (acid-fastness or lipid inclusion prevalence) are due to strain-specific increases in expression of the DosR regulon within the model's hypoxic conditions (lines 107-114, 126-127). The claim that their model is equipped to evaluate dosR-dependent mycobacterial phenotypes was also previously proposed (Arbués et al., 2021) and should be tested. A comparison of the dosR-dependent gene expression of each strain in PBMC aggregates and broth culture by qRT-PCR would test this idea at a very basic level.

      We agree. Actually, a similar request was made during the revision of our first in vitro granuloma study, for which such qPCR data were generated and presented in Fig. 1D (PMID: 32069329). In addition, the work of Kapoor et al., who originally developed the in vitro granuloma model, also demonstrated the induction of most of the DosR-regulated genes by qPCR (PMID: 23308269). We trust that the reviewer will agree that this does not need to be repeated.

      The modern Beijing lineage strain L2C:

      The authors claim (Line 101-102) that the results of Figure 1 "confirm the higher virulence propensities of strains from modern lineages". From the data presented, it appears that strain L2C (Modern-Beijing) dominates the modern vs ancestral and inter/intra-lineage phenotypes of replication, dormancy, and apoptosis. Are significant differences between modern and ancestral lineages or between strains simply a facet of the distinct profile of L2C? Do the statistical differences disappear when the L2C group is excluded?

      Indeed, among the modern lineages’ isolates, L2C exhibits a hypervirulent profile in terms of bacterial replication. However, the difference between modern and ancestral strains remains statistically significant when L2C is excluded from the analysis (p = 0.002). That is also the case when we analyze the proportion of dormant bacteria. Exclusion of L2C strain results in a Kruskal-Wallis overall p = 0.005, and p = 0.0002 when we compare L2 vs. L3. Lastly, regarding the percentage of apoptotic macrophages, if we use L2B (instead of L2C) to compare, the difference is still significant vs. L1A (p = 0.008) although there is no longer a trend for L2A (p = 0.1).

      "Dormancy":

      Dormancy is definitively a non-replicative state, where bacterial growth is absent. The authors' findings and claims appear to be incompatible with that definition, which they acknowledge (Lines 130-135). The lack of correlation between growth and dormancy in their model is supported with reference to Figure 2C, a Spearman's analysis of dormancy ratio with growth rate (inclusive of all strains under consideration). The figure supports a model where "dormancy" and "growth rate" are disjunct but also appears to show high "dormancy" accompanying increasing "growth" in the L2C group. How are strains able to grow if they are in a non-replicative state? Are the "growth rate" assays actually measures of survival? Are there different rates of infectivity? Are the bacteria growing cellularly in the serum-rich ECM, etc. etc? We need to see the hard CFU and Nile Red, and Auramine-O data to contextualize these findings. Alternatively, could the accumulation of inclusions in the model not be a reliable dormancy metric (Fines et al., BioRxiv [Preprint], 2023, PMID: 37609245)?

      We fully agree. The Nile red profiles are always relative and only depict the proportion of the population that has entered a dormant state. Nevertheless, dormancy can be dynamic and bacteria may swiftly resuscitate in that model. Furthermore, and as depicted in Figure 2—figure supplement 1, despite showing an increased tendency to enter a dormant-like state, a considerable population of lineage 2 bacilli still remains metabolically active and in a replicative state. The referred preprint is very interesting and we will follow it up closely.

      Specificity of responses to PBMC aggregation:

      The authors claim that their results "reveal a broad spectrum of granulomatous responses" (Line 73) but do not show any aggregation specificity of PBMC responses beyond the model's intrinsic metrics of area and circularity. To establish that their phenotypes, such as lymphocyte activation, cytokine release, cell death, or mycobacterial acid-fastness/lipid inclusion prevalence, are aspects of the granulomatous response, the authors could infect PBMCs from the same donors with the same strains and perform the same assays using established Mtb-PBMC models in which the cells do not aggregate. This would answer many important questions, for example, does the rate of macrophage infection account for variability in apoptosis percentage? A phagocytosis assay and quantification of stained intracellular mycobacteria within recently infected PBMCs could be conducted to determine if phenotypes are an aspect of granulomatous aggregation or due to strain-specific differences in cell-intrinsic macrophage immunity. It would also be very informative to know what percentage of PBMCs and mycobacteria are granuloma-bound in the ECM.

      We are not aware of Mtb-PBMC models in which the cells do not aggregate. We previously compared PBMC infection models in the presence or absence of the collagen matrix, and cells also spontaneously coalesced around infection foci (PMID: 34603299). Regarding the last point, the melting step of the collagen matrix requires enzymatic digestion and pipetting that dislocate the aggregates. Accordingly, we cannot distinguish the bacteria that would remain within the matrix from those replicating within cellular aggregates. However, we did resolve this question by demonstrating that the bacteria were not able to grow in the absence of cells in this culture condition (Supplementary material, PMID: 34603299).

      Minor recommendations

      - The term TNF-α should be replaced with TNF throughout the manuscript.

      We acknowledge that the term TNF-α can be interchangeable with TNF. However, we chose to use the TNF-α terminology to differentiate it from lymphotoxin α, which is also referred to as TNF-β.

      - The authors cite studies conducted in murine and NHP models to support the claim that "understanding of immune protective traits in TB remains insufficient and yet dominated by data from mouse and non-human primate studies" (Lines 63-64) but ignore an abundance of data from other in vivo and in vitro models that have provided numerous valuable insights in the field of TB immunology. This line should be revised or omitted.

      For us, the term “dominate” implies that these models are widely used, not that they are the only ones. Other models have indeed provided additional relevant data. We cite the lung-on-chip model from McKinney’s lab and the in vitro granuloma model from Elkington’s lab (line 66). We would be very happy to include more references upon further specification, even though we cannot build an extensive review here.

      - The authors claim that their model "encompasses, with the exception of neutrophils, all immune cell types involved in TB" (Lines 67-68). To support this claim, they should provide additional references or data demonstrating that the PBMC aggregates include eosinophils, mast cells, dendritic cells, yolk-sac-derived alveolar macrophages, and Langhans giant cells.

      With the aim of providing more accurate and detailed information regarding the cell types present in the model, the sentence has been reformulated as: “The model encompasses all PBMC-derived cell types involved in TB immune responses, but lacks granulocytes (i.e. neutrophils, eosinophils, basophils and mast cells)” (line 260). Notably, the presence of multinucleated giant cells was reported in Kapoor’s paper describing the in vitro granuloma model for the first time (PMID: 23308269).

      -  As an additional note, the title can be improved and made more broadly accessible by revising the use of the acronyms CXCL9, granzyme B, and TNF-α.

      To render the title more broadly accessible, we propose to replace the listed acronyms with “soluble immune mediators”, but we remain open to more appropriate and specific suggestions.

      Answers to the reviewers’ public comments

      Reviewer #1:

      First of all, we would like to thank the reviewers for their feedback and suggestions to improve our manuscript. To strengthen the findings of our study, we have performed and added results from IL-1β and CXCL9 blocking experiments evaluating the impact on the granulomatous response and bacterial load, respectively. In the revised version of the manuscript, while we discuss the null effect on bacterial growth of the treatment with an anti-CXCL9 antibody and the potential reason behind it, we are now reporting a negative effect on the magnitude of granuloma formation upon neutralization of IL-1β that the correlation analysis had initially suggested.

      Reviewer #2:

      The revised version of our manuscript now incorporates all the points detailed in the private answers to the reviewer, including clarifications on the statistical tests performed, additional supplementary materials to transparently disclose the raw data behind the normalization approach, as well as flow cytometry data on the immune memory status of the blood donors. In addition, and as stated in the answer to reviewer #1, to test the causal relationship between some host and pathogen traits, we have now performed and provided data and interpretation of IL-1β and CXCL9 blocking experiments.

      Reviewer #3:

      We are thankful for, and concur with, these constructive comments and insights. We have now consistently revisited the statistics in the figures to improve clarity and included new supplementary figures reporting the raw data that were missing in the initial version of the manuscript. In addition, and as mentioned in the answers to reviewers #1 and #2, we have now performed and added IL-1β and CXCL9 blocking experiments to test the causal relationship between specific host and pathogen traits. In particular, we are now reporting a negative effect on the magnitude of granuloma formation upon neutralization of IL-1β that the correlation analysis had initially suggested.

      More specifically, regarding the point that our method for bacterial collection calls into question whether all Mtb plated for CFU assay resided within granulomatous aggregates, we previously reported that Mtb growth strictly required the presence of human cells in our culture conditions (Supplementary material, Arbués et al, 2021, PMID: 34603299). In the presence of cells, our microscopy read-out does allow us to observe extra-cellular growth if infections are carried on beyond an 8-day limit, which we applied in the current study to exclude this particular caveat. 

      Concerning the apparently conflicting observation that those strains displaying an increased tendency to enter a dormant-like state are the ones exhibiting the highest replication rates, we would like to point out that a considerable population of bacilli still remains metabolically active and in a replicative state. For instance, and as depicted in Figure 2—figure supplement 1, despite showing an increased tendency to enter a dormant-like state, a considerable population of lineage 2 bacilli does remain metabolically active. Moreover, dormancy can be dynamic and bacteria may swiftly resuscitate.

      Regarding the mentioned limitations of our study that we have discussed in the revised version of our manuscript, we fully concur that PBMC-based in vitro granuloma models lack tissue structure as well as some important stromal and immune cellular players. Nevertheless, we and others demonstrated the particular relevance of the 3-dimensional infection approach within a matrix of collagen and fibronectin by providing mechanistic insights into Mtb resuscitation previously associated with treatment with various immunomodulatory drugs (Arbués et al., 2020, PMID: 32069329; Tezera et al., 2020, PMID: 32091388).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This manuscript describes the impact of modulating signaling by a key regulatory enzyme, Dual Leucine Zipper Kinase (DLK), on hippocampal neurons. The results are interesting and will be important for scientists interested in synapse formation, axon specification, and cell death. The methods and interpretation of the data are solid, but the study can be further strengthened with some additional studies and controls.

      We greatly appreciate the thorough review and thoughtful suggestions from the reviewers and editors on our original manuscript. We provide a point-by-point response below. We added new studies on P10 mice and controls as suggested, and revised the figures and text for clarification. The revised manuscript includes three new supplemental figures; major text revisions are copied under the relevant responses.

      Reviewer #1 (Public Review):

      Summary:

      In this work, Ritchie and colleagues explore functional consequences of neuronal over-expression or deletion of the MAP3K DLK that their labs and others have strongly implicated in axon degeneration, neuronal cell death, and axon regeneration. Their recent work in eLife (Li, 2021) showed that inducible over-expression of DLK (or the related LZK) induces neuronal death in the cerebellum. Here, they extend this work to show that inducible over-expression in Vglut1+ neurons also kills excitatory neurons in hippocampal CA1, but not CA3. They complement this very interesting finding with translatomics to quantify genes whose mRNAs are differentially translated in the context of DLK over-expression or knockout, the latter manipulation having little to no effect on the phenotypes measured. The authors note that several genes and pathways are differentially regulated according to whether DLK is over-expressed or knocked out. They note DLK-dependent changes in genes related to synaptic function and the cytoskeleton and ultimately relate this in cultured neurons to findings that DLK over-expression negatively impacts synapse number and changes microtubules and neurites, though with a less obvious correlation.

      Strengths:

      This work represents a conceptual advance in defining DLK-dependent changes in translation. Moreover, the finding that DLK may differentially impact neuronal death will become the basis for future studies exploring whether DLK contributes to differential neuronal susceptibility to death, which is a broadly important topic.

      We thank the reviewer for the comments on the value of our work.

      Weaknesses:

      This seems like two works in parallel that the authors have not yet connected. First is that DLK affects the translation of an interesting set of genes, and second, that DLK(OE) kills some neurons, disrupts their synapses, and affects neurite growth in culture.

      Specific questions:

      (1) Is DLK effectively knocked out? The authors reference the floxed allele in their 2016 work (PMID: 27511108), however, the methods of this paper say that the mouse will be characterized in a future publication. Has this ever been published? The major concern is that here the authors show that Cre-mediated deletion results in a smaller molecular weight protein and the maintenance of mRNA levels.

      We apologize for the out-of-date citation of the DLK(cKO)<sup>fl/fl</sup> mice. The DLK(cKO)<sup>fl/fl</sup> mice have been published in (Li et al., 2021; Saikia et al., 2022); excision of the floxed exon was verified using several Cre drivers (Pv-Cre, AAV-Cre, and VGlut1-Cre in this study). The floxed exon contains the initiation ATG and 148 amino acids. By western blot analysis using antibodies against C-terminal peptides of DLK on cerebellar extracts (in Li et al., 2021) and hippocampal extracts (this study), the full-length DLK protein was significantly reduced (Fig 1A-B); DLK is expressed in other hippocampal cells in addition to glutamatergic neurons, which explains the remaining full-length DLK detected.

      Our Ribo-seq of VGlut1-Cre; DLK(cKO)<sup>fl/fl</sup> detected remaining Dlk mRNAs lacking the floxed exon (Fig.S1C), which have several candidate ATGs at amino acid 223 and after (Fig.S1C1). We detected a very faint band for smaller molecular weight proteins on western blots only when the membrane was exposed 5X longer using Pico PLUS Chemiluminescent Substrate (Thermo Scientific, 34580) and a Licor Odyssey XF Imager (revised Fig. S1B). This smaller molecular weight protein might be produced using any of the candidate ATGs, but would represent an N-terminal truncated DLK protein lacking the ATP binding site and ~1/4 of the kinase domain, i.e. not a functional kinase.

      The revised manuscript has an updated citation for DLK(cKO)<sup>fl/fl</sup>. Revised Fig.S1B includes images of western blots probed with anti-DLK antibodies under normal vs longer exposure. New Fig.S1C1 shows the effects of the floxed exon on DLK.

      (2) Why does DLK(OE) not kill CA3 neurons? The phenomenon is clear but there is no link to gene expression changes. In fact, the highlighted transcript in this work, Stmn4, changes in a DLK-dependent manner in CA3.

      We agree that this is a very interesting question not answered by our gene expression analysis. While we verified that Stmn4 expression levels correlate with the levels of DLK, we do not think that increased Stmn4 per se in DLK(iOE) is a major factor accounting for CA1 death vs CA3 survival. Several published studies have also reported regulation of Stmn4 mRNAs in other cell types, in the contexts of cell death (Watkins et al., 2013; Le Pichon et al., 2017) and axon regeneration and cytoskeleton disruption (Asghari Adib et al., 2024; DeVault et al., 2024; Hu et al., 2019; Shin et al., 2019). As Stmns show significant redundancy in expression and function, conventional knockdown or overexpression of an individual Stmn generally does not lead to detectable effects on cellular function. As CA3 neurons are widely known for their dense connections and show resilience to NMDA-mediated neurotoxicity (Sammons et al., 2024; Vornov et al., 1991), we speculate that the differential vulnerability of CA1 and CA3 under DLK(iOE) is a reflection of both their intrinsic properties, such as gene expression, and their circuit connections.

      In the revised manuscript, we have included the following statement on pg 18:

      ‘While our data does not pinpoint the molecular changes explaining why CA3 would show less vulnerability to increased DLK, we may speculate that DLK(iOE) induced signal transduction amplification may differ in CA1 vs CA3. CA1 genes appear to be more strongly regulated than CA3 genes, consistent with our observation that increased c-Jun expression in CA1 is greater than that in CA3. Other parallel molecular factors may also contribute to resilience of CA3 neurons to DLK(iOE), such as HSP70 chaperones, different JNK isoforms, and phosphatases, some of which showed differential expression in our RiboTag analysis of DLK(iOE) vs WT (shown in File S2. WT vs DLK(iOE) DEGs). Together with other genes that show dependency on DLK, the DLK and Jun regulatory network contributes to the regional differences in hippocampal neuronal vulnerability under pathological conditions.’

      Further we state in ‘Limitation of our study’ on pg 20:

      ‘Our analysis also does not directly address why CA3 neurons are less vulnerable to increased DLK expression. Future studies using cell-type specific RiboTag profiling and other methods at a refined time window will be required to address how DLK dependent signaling interacts with other networks underlying hippocampal regional neuron vulnerability to pathological insults.’

      We hope our data will stimulate continued interest in testable hypotheses in future studies.

      (3) Why are whole hippocampi analyzed to IP ribosome-associated mRNAs? The authors nicely show a differential effect of DLK on CA1 vs CA3, but then, at least according to their methods, lyse whole hippocampi to perform IP/sequencing. Their data are therefore a mix of cells where DLK does and does not change cell death. The key issue is whether DLK does/does not have an effect based on the expression changes it drives.

      At the time of planning the Ribo-Tag experiment several years ago, we focused on the hippocampal glutamatergic neurons. Due to technical difficulty in micro-dissecting individual hippocampal regions from this early timepoint, we opted to use whole hippocampi to isolate ribosome-associated mRNAs. We agree with the reviewer that it is important to sort out DLK-dependent general gene expression changes vs those specific to a particular cell type where DLK impacts its survival. With emerging CA1, CA3 and other cell-type specific Cre drivers and advanced RNAseq technology, we hope that our work will stimulate broad interest in these questions in future studies. 

      In the revised manuscript, we have included a new analysis comparing our Vglut1-RiboTag profiling (P15) with CamK2-RiboTag (for CA1) and Grik4-RiboTag (for CA3) (P42) published in Traunmüller et al., 2023 (GSE209870). We find that >80% of the top ranked genes in their CamK2-RiboTag (for CA1) and Grik4-RiboTag (for CA3) were detected in our VGlut1-RiboTag (revised methods and Supplemental Excel File S3). CA1-enriched genes tended to be expressed higher in DLK(cKO), compared to control, whereas CA3-enriched genes showed less significant correlation to DLK expression levels. Additionally, many genes known to specify CA1 fate do not show significant downregulation in DLK(iOE). This analysis, along with other data in our manuscript, is consistent with the idea that DLK does not regulate neuronal fate.

      In the revised manuscript, we present this additional analysis in Fig. S6K-L, and have expanded the text description on page 9:

      ‘Additionally, we compared our Vglut1-RiboTag datasets with CamK2-RiboTag and Grik4-RiboTag datasets from 6-week-old wild type mice reported by (Traunmüller et al., 2023; GSE209870). We defined a list of genes enriched in CamK2-expressing CA1 neurons relative to Grik4-expressing CA3 neurons (CA1 genes), and those enriched in Grik4-expressing CA3 neurons (CA3 genes) (File S3). When compared with the entire list of Vglut1-RiboTag profiling in our control and DLK(cKO), we found CA1 genes tended to be expressed more in DLK(cKO) mice, compared to control (Fig.S6K), while CA3 genes showed a slight enrichment in control though the trend was less significant, and were less clustered towards one genotype (Fig.S6L). Moreover, many CA1 genes related to cell-type specification, such as FoxP1, Satb2, Wfs1, Gpr161, Adcy8, Ndst3, Chrna5, Ldb2, Ptpru, and Ntm, did not show significant downregulation when DLK was overexpressed. These observations imply that DLK likely specifically down-regulates CA1 genes both under normal conditions and when overexpressed, with a stronger effect on CA1 genes, compared to CA3 genes. Overall, the informatic analysis suggests that decreased expression of CA1 enriched genes may contribute to CA1 neuron vulnerability to elevated DLK, although it is also possible that the observed down-regulation of these genes is a secondary effect associated with CA1 neuron degeneration’.

      (4) Is the subtle decrease in synapse number (Bassoon/Homer co-loc.) in the DLK (OE) simply a function of neurons (and their synapses, presumably) having died? At the P15 time point that the authors choose because cell death is minimal, there is still a ~25% reduction in CA1 thickness (Figure 2B), which is larger than the ~15% change in synapses (Figure 5H) they describe.

      We thank the reviewer for the question. To address this, we have analyzed synapses in the CA1 region at P10 in DLK(iOE) mice, when there was no detectable loss of neurons. At P10, we did not detect significant changes in Bassoon, Homer1, or colocalized puncta in CA1 (Fig.S11A-F). In P15 DLK(iOE) mice, Homer1 puncta were slightly smaller (Fig.5L) and showed a significant decrease in CA1 SR (Fig.5I).

      In the revised manuscript we have also redone our statistical analysis of synapses, using mice rather than ROIs as the unit of analysis (revised Fig. 5), as recommended by R3. We also analyzed synapses in CA3, and found no significant differences at P10 or P15 (Fig.S12). We interpret the data to mean that the effects of DLK(OE) on synapses in CA1 may represent an early step in neuronal death. We hope that future studies will clarify this question.

      Reviewer #2 (Public Review):

      This manuscript describes the impact of deleting or enhancing the expression of the neuronal-specific kinase DLK in glutamatergic hippocampal neurons using clever genetic strategies, which demonstrates that DLK deletion had minimal effects while overexpression resulted in neurodegeneration in vivo. To determine the molecular mechanisms underlying this effect, ribotag mice were used to determine changes in active translation which identified Jun and STMN4 as DLK-dependent genes that may contribute to this effect. Finally, experiments in cultured neurons were conducted to better understand the in vivo effects. These experiments demonstrated that DLK overexpression resulted in morphological and synaptic abnormalities.

      Strengths:

      This study provides interesting new insights into the role of DLK in the normal function of hippocampal neurons. Specifically, the study identifies:

      (1) CA1 vs CA3 hippocampal neurons have differing sensitivity to increased DLK signaling.

      (2) DLK-dependent signaling in these neurons is similar to but distinct from the downstream factors identified in other cell types, highlighted by the identification of STMN4 as a downstream signal.

      (3) DLK overexpression in hippocampal neurons results in signaling that is similar to that induced by neuronal injury.

      The study also provides confirmatory evidence that supports previously published work through orthogonal methods, which adds additional confidence to our understanding of DLK signaling in neurons. Taken together, this is a useful addition to our understanding of DLK function.

      We thank the reviewer for careful reading and positive comments.

      Weaknesses:

      There are a few weaknesses that limit the impact of this manuscript, most of which are pointed out by the authors in the discussion. Namely:

      (1) It is difficult to distinguish whether the changes in the translatome identified by the authors are DLK-dependent transcriptional changes, DLK-dependent post-transcriptional changes or secondary gene expression changes that occur as a result of the neurodegeneration that occurs in vivo. Additional expression analysis at earlier time points could be one method to address this concern.

      We appreciate the reviewer’s comment, and have performed new analysis on c-Jun and p-c-Jun levels in CA1, CA3, and DG in P10 DLK(OE) mice. Our data suggest that in CA3 elevations in p-c-Jun and c-Jun occur separately from cell death in a DLK-dependent manner, though the high elevation of both p-c-Jun and c-Jun in CA1 correlates with cell death.

      The data are presented in revised Fig.S7A,B, and described in the revised text on pg 9-10:

      ‘In control mice, glutamatergic neurons in CA1 had low but detectable c-Jun immunostaining at P10 and P15, but reduced intensity at P60; those in CA3 showed an overall low level of c-Jun immunostaining at P10, P15 and P60; and those in DG showed a low level of c-Jun immunostaining at P10 and P15, and an increased intensity at P60 (Fig.S7A,C,E). In Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice at P10 when no discernable neuron degeneration was seen in any regions of hippocampus, only CA3 neurons showed a significant increase of immunostaining intensity of c-Jun, compared to control (Fig.S7A). In P15 mice, we observed further increased immunostaining intensity of c-Jun in CA1, CA3, and DG, with the strongest increase (~4-fold) in CA1, compared to age-matched control mice (Fig.S7C). The overall increased c-Jun staining is consistent with RiboTag analysis.’

      Also, on pg.10:

      ‘In Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice, we observed increased p-c-Jun positive nuclei in CA1 at P10, and strong increase in CA1 (~10-fold), CA3 (~6-fold), and DG (~8-fold) at P15 (Fig.S7B,D).’

      (2) Related to the above, it is difficult to conclusively determine from the current data whether the changes in synaptic proteins observed in vivo are a secondary result of neuronal degeneration or a primary impact on synapse formation. The in vitro studies suggest this has the potential to be a primary effect, though the difference in experimental paradigm makes it impossible to determine whether the same mechanisms are present in vitro and in vivo.

      We appreciate the comment, which is related to R1 point 4. We have performed further analysis and revised the text on pg.12 with the following text:

      ‘To assess effects of DLK overexpression on synapses, we immunostained hippocampal sections from both P10 and P15, with age-matched littermate controls. Quantification of Bassoon and Homer1 immunostaining revealed no significant differences in CA1 SR and CA3 SR and SL in P10 mice of Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> and control (Fig.S11A-F, S12A-J). In P15, Bassoon density and size in CA1 SR were comparable in both mice (Fig 5G, H, K), while Homer1 density and size were reduced in DLK(iOE) (Fig.5G,I, L). Overall synapse number in CA1 SR was similar in DLK(iOE) and control mice (Fig.5J). Similar analysis on CA3 SR and SL detected no significant difference from control (Fig.S12M-V).’

      We would interpret the data to mean that the effects of DLK(OE) on synapses in CA1 may represent an early step in neuronal death. We hope that future studies will shed clarity on this question.

      Additionally, to address whether the same mechanisms are present in vitro, we have performed further analysis on cultured hippocampal neurons. As described in the Methods, we made hippocampal neuron cultures from P1 pups of the following crosses:

      For control: Vglut1<sup>Cre/+</sup> X Rosa26<sup>tdT/+</sup> 

      For DLKcKO: Vglut1<sup>Cre/+</sup>;DLK(cKO)<sup>fl/fl</sup>  X Vglut1<sup>Cre/+</sup>;DLK(cKO)<sup>fl/fl</sup>;Rosa26<sup>tdT/+</sup> 

      For DLKiOE: H11-DLK<sup>iOE/iOE</sup> X Vglut1<sup>Cre/+</sup>;Rosa26<sup>tdT/+</sup> 

      Dissociated cells from a given litter were pooled into the same culture. Because there were different proportions of neurons with our genotype of interest in each culture, it is not simple to know whether DLK was causing significant cell death.

      On pg 13, we stated our observation:

      ‘We did not notice an obvious effect of DLK(iOE) or DLK(cKO) on neuron density in cultures at DIV2. To assess neuronal type distribution in our cultures, we immunostained DIV14 neurons with antibodies for Satb2, as a CA1 marker (Nielsen et al., 2010), and Prox1, as a marker of DG neurons (Iwano et al., 2012). We did not observe significant differences in the proportion of cells labeled with each marker in DLK(cKO) or DLK(iOE) cultures (Fig.S13E). These data are consistent with the idea that DLK signaling does not have a strong role in neuron-type specification both in vivo and in vitro’.

      (3) The phenotype of DLK cKO mice is very subtle (consistent with previous reports) and while the outcome of increased DLK levels is interesting, the relevance to physiological DLK signaling is less clear. What does seem possible is that increased DLK may phenocopy other neuronal injuries but there are no real comparisons to directly address this in the manuscript. It would be helpful for the authors to provide this analysis as well as a table with all of the translational changes along with fold changes.

      Thank you for the suggestion. The fold changes of genes showing significantly altered expression in DLK(cKO) and DLK(iOE) are provided in the Excel files (Supplementary Excel File S1, WT vs DLK(cKO) DEGs, and File S2, WT vs DLK(iOE) DEGs; highlighted columns B and F).

      On pg 6, we revised the text as follows to include a comparison of DLK levels under other physiological conditions with those in our mice:

      ‘Several studies have reported that DLK protein levels increase under a variety of conditions, including optic nerve crush (Watkins et al., 2013), NGF withdrawal (~2 fold) (Huntwork-Rodriguez et al., 2013; Larhammar et al., 2017), and sciatic nerve injury (Larhammar et al., 2017). Induced human neurons show increased DLK abundance about ~4 fold in response to ApoE4 treatment (Huang et al., 2019). Increased expression of DLK can lead to its activation through dimerization and autophosphorylation (Nihalani et al., 2000)’.

      And,

      ‘Additional analysis at the mRNA level (supplemental excel, File S2. WT vs DLK(iOE) DEGs) and at the protein level (Fig.S8E) suggest that the increase in DLK abundance was around 3 times the control level. The localization patterns of DLK protein appeared to vary depending on region of hippocampus and age of animals in both control and Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice (Fig.S3C).’

      In Discussion, we state (pg. 16): ‘The levels of DLK in our DLK(iOE) mice model appear comparable to those reported under traumatic injury and chronic stress.’

      (4) For the in vivo experiments, it is unclear whether multiple sections from each animal were quantified for each condition. More information here would be helpful and it is important that any quantification takes multiple sections from each animal into account to account for natural variability.

      We apologize that this was unclear in the original manuscript.

      In the revised methods, under Confocal imaging and quantification (pg 33), we stated: “For brain tissue, three sections per mouse were imaged with a minimum of three mice per genotype for data analysis.”

      In revised figure legends, we made it clear that multiple sections from each animal have been used for quantification in all instances, i.e. “Each dot represents averaged thickness from 3 sections per mouse, N≥4 mice/genotype per timepoint.” 

      In Fig.1F-H: “Each dot represents averaged intensity from 3 sections per mouse”

      In Fig.S3B “Data points represent individual mice, averages taken across 3 sections per mouse”

      Reviewer #3 (Public Review):

      Dr Jin and colleagues revisit DLK and its established multifactorial roles in neuronal development, axonal injury, and neurodegeneration. The ambitious aim here is to understand the DLK-dependent gene network in the brain and, to pursue this, they explore the role of DLK in hippocampal glutamatergic neurons using conditional knockout and induced overexpression mice. They produce evidence that dorsal CA1 and dentate gyrus neurons are vulnerable to elevated expression of DLK, while CA3 neurons appear unaffected. Then they identify the DLK-dependent translatome featured by conserved molecular signatures and cell-type specificity. Their evidence suggests that increased DLK signaling is associated with possible STMN4 disruptions to microtubules, among else. They also produce evidence on cultured hippocampal neurons showing that expression levels of DLK are associated with changes in neurite outgrowth, axon specification, and synapse formation. They posit that downstream translational events related to DLK signaling in hippocampal glutamatergic neurons are a generalizable paradigm for understanding neurodegenerative diseases.

      Strengths

      This is an interesting paper based on a lot of work and a high number of diverse experiments that point to the pervasive roles of DLK in the development of select glutamatergic hippocampal neurons. One should applaud the authors for their work in constructing sophisticated molecular cre-lox tools and their expert Ribotag analysis, as well as technical skill and scholarly treatment of the literature. I am somewhat more skeptical of interpretations and conclusions on spatial anatomical selectivity without stereological approaches and also going directly from (extremely complex) Ribotag profiling patterns to relevance based on immunohistochemistry and no additional interventions to manipulate (e.g. by knocking down or blocking) their top Ribotag profile hits. Also, it seems to this reviewer that major developmental claims in the paper are based on gene translational profiling dependent on DLK expression, not DLK activation, despite some evidence in the paper that there is a correlation between the two. Therefore, observed patterns and correlations may or may not be physiologically or pathologically relevant. Generalizability to neurodegenerative diseases is an overreach not justified by the scope, approach, and findings of the paper.

      We thank the reviewer for the encouraging and constructive comments on the manuscript.

      Weaknesses and Suggestions:

      The authors state that the rationale for the translatomic studies is "to gain molecular understanding of gene expression associated with DLK in glutamatergic neurons" and to characterize the "DLK-dependent molecular and cellular network". However, a problem with the experimental design is the selection of an anatomical region at a time point characterized by active neurodegeneration. Therefore, it cannot be excluded that the differentially expressed genes or pathways attributed to DLK overexpression are instead due to processes related to neurodegeneration. Indeed, the authors find enrichment of signals related to pathways involved in extracellular matrix organization, apoptosis, unfolded protein responses, the complement cascade, DNA damage responses, and depletion of signals related to mitochondrial electron transport, etc., all of which could be the consequence of neurodegeneration regardless of cause. A more appropriate design to discover DLK-dependent pathways might be to look at a region and/or a time point that is not confounded by neurodegeneration.

      We appreciate the reviewer’s comment. We included our thoughts in ‘Limitation of the study’ (pg 20):

      ‘Future studies using cell-type specific RiboTag profiling and other methods at a refined time window will be required to address how DLK dependent signaling interacts with other networks underlying hippocampal regional neuron vulnerability to pathological insults.’

      In a related vein, the authors ask "if the differentially expressed genes associated with DLK(iOE) might show correlation to neuronal vulnerability" and, to answer this question, they select the set of differentially expressed genes after DLK overexpression and assess their expression patterns in various regions under normal conditions. It looks to me that this selection is already confounded by neurodegeneration which could be the cause for their downregulation. Therefore, such gene profiles may not be directly linked to neuronal vulnerability. A similar issue also relates to the conclusion that "...the enrichment of DLK-dependent translation of genes in CA1 suggests that the decreased expression of these genes may contribute to CA1 neuron vulnerability to elevated DLK".

      We agree with the reviewer’s concern that it is difficult to separate neurodegenerative consequences from changes caused by DLK based solely on our translatomic studies of P15 DLK(iOE) mice. As in our responses to Reviewer 1 (point 4) and Reviewer 2 (point 1), we have included a new analysis of P10 mice (Fig.S7A,B), when neurons did not show detectable signs of degeneration.

      Several lines of evidence support the view that some differentially expressed genes in DLK(iOE) vs control are likely specific to increased DLK signaling.

      First, the genes identified in DLK(iOE) vs control represent a small set of genes (260), which is comparable to other DLK dependent datasets (Asghari Adib et al., 2024) but shows cell-type specificity.

      Second, our analysis using rank-rank hypergeometric overlap (RRHO) detects a significant correlation between upregulated genes from DLK(iOE) vs downregulated genes in DLK(cKO), and vice versa, suggesting that expression of a similar set of genes depends on DLK (Fig.3C, S6C-E). Consistently, GO term analysis using the list of genes coordinately regulated by DLK, derived from our RRHO analysis, identifies GO terms for up- and downregulated genes similar to those obtained using DLK(iOE)-RiboTag data alone. SynGO analysis of DLK(iOE)-regulated genes and DLK(cKO)-regulated genes also identified similar synaptic processes (Fig.3F and S6J).
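
      For readers unfamiliar with RRHO, the core idea can be sketched as a hypergeometric test on the overlap between the top of two ranked gene lists. The example below is a minimal, hedged illustration of a single threshold pair only; full RRHO implementations scan many threshold pairs and correct for multiple testing, and the software and settings actually used are those given in the manuscript's Methods. The function names and gene identifiers here are hypothetical.

```python
from math import comb


def hypergeom_sf(k, n_universe, n_marked, n_drawn):
    """P(overlap >= k) when drawing n_drawn genes from n_universe genes,
    n_marked of which belong to the other list's top set."""
    upper = min(n_marked, n_drawn)
    total = comb(n_universe, n_drawn)
    favourable = sum(
        comb(n_marked, i) * comb(n_universe - n_marked, n_drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def rrho_cell(ranked_x, ranked_y, i, j):
    """Overlap count and one-sided p-value for top-i of x vs top-j of y.

    Both lists are restricted to their shared genes first, so that the
    hypergeometric universe is the same for both rankings.
    """
    universe = set(ranked_x) & set(ranked_y)
    x = [g for g in ranked_x if g in universe]
    y = [g for g in ranked_y if g in universe]
    top_x, top_y = set(x[:i]), set(y[:j])
    k = len(top_x & top_y)
    return k, hypergeom_sf(k, len(universe), len(top_x), len(top_y))


# Toy rankings with placeholder gene names (not data from the study):
# x ranked by upregulation in one condition, y by downregulation in another.
ranked_x = [f"g{n}" for n in range(10)]
ranked_y = [f"g{n}" for n in range(10)]
overlap, p_value = rrho_cell(ranked_x, ranked_y, i=3, j=3)
print(overlap, p_value)
```

      A small p-value for a given threshold pair indicates that the two rankings share more top genes than expected by chance, which is the sense in which the two genotypes show coordinated, opposite-direction changes.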

      Third, we performed additional analysis comparing our Vglut1-RiboTag dataset with CamK2-RiboTag and Grik4-RiboTag datasets from 6-week-old wild type mice reported by (Traunmüller et al., 2023; GSE209870). We observed >80% overlap among the top ranked genes (revised Methods). We described this analysis on pg 9 and Fig. S6K-L (and Supplemental Excel File S3):

      ‘Additionally, we compared our Vglut1-RiboTag datasets with CamK2-RiboTag and Grik4-RiboTag datasets from 6-week-old wild type mice reported by (Traunmüller et al., 2023; GSE209870). We defined a list of genes enriched in CamK2-expressing CA1 neurons relative to Grik4-expressing CA3 neurons (CA1 genes), and those enriched in Grik4-expressing CA3 neurons (CA3 genes) (File S3). When compared with the entire list of Vglut1-RiboTag profiling in our control and DLK(cKO), we found CA1 genes tended to be expressed more in DLK(cKO) mice, compared to control (Fig.S6K), while CA3 genes showed a slight enrichment in control though the trend was less significant, and were less clustered towards one genotype (Fig.S6L). Moreover, many CA1 genes related to cell-type specification, such as FoxP1, Satb2, Wfs1, Gpr161, Adcy8, Ndst3, Chrna5, Ldb2, Ptpru, and Ntm, did not show significant downregulation when DLK was overexpressed. These observations imply that DLK likely specifically down-regulates CA1 genes both under normal conditions and when overexpressed, with a stronger effect on CA1 genes, compared to CA3 genes. Overall, the informatic analysis suggests that decreased expression of CA1 enriched genes may contribute to CA1 neuron vulnerability to elevated DLK, although it is also possible that the observed down-regulation of these genes is a secondary effect associated with CA1 neuron degeneration.’

      To understand the role and relevance of the DLK overexpression model, there should be a discussion of to what extent it corresponds to endogenous levels of DLK expression or DLK-MAPK pathway activation under baseline or pathological conditions.

      We appreciate the suggestion, which is similar to R2 point 3. We have revised the text and discussion to include how DLK levels may be altered in other physiological conditions vs our mice.

      Pg. 6: ‘Several studies have reported that DLK protein levels increase under a variety of conditions, including optic nerve crush (Watkins et al., 2013), NGF withdrawal (~2 fold) (Huntwork-Rodriguez et al., 2013; Larhammar et al., 2017), and sciatic nerve injury (Larhammar et al., 2017). Induced human neurons show increased DLK abundance about ~4 fold in response to ApoE4 treatment (Huang et al., 2019). Increased expression of DLK can lead to its activation through dimerization and autophosphorylation (Nihalani et al., 2000)’.

      And,

      ‘Additional analysis at the mRNA level (supplemental excel, File S2. WT vs DLK(iOE) DEGs) and at the protein level (Fig.S8E) suggest that the increase in DLK abundance was around 3 times the control level. The localization patterns of DLK protein appeared to vary depending on region of hippocampus and age of animals in both control and Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice (Fig.S3C).’

      In Discussion (pg. 16): ‘The levels of DLK in our DLK(iOE) mice model appear comparable to those reported under traumatic injury and chronic stress.’

      The authors posit that "dorsal CA1 neurons are vulnerable to elevated DLK expression, while neurons in CA3 appear largely resistant to DLK overexpression". This statement assumes that DLK expression levels start at a similar baseline among regions. Do the authors have any such data? Ideally, they should show whether DLK expression and p-c-Jun (as a marker of downstream DLK signaling) are the same or different across regions in both WT and overexpression mice. For example, what are the DLK/p-c-Jun expression levels in regions other than CA1 in Supplementary Figures 2-3 and how do they compare with each other? Normalization to baseline for each region does not allow such a comparison. Also, in Supplementary Figure 6, analyses and comparisons between regions are done at a time point when degeneration has already started. Ideally, these should be done at P10.

      We thank the reviewer for raising these points. In the revised manuscript we have included protein expression analysis of DLK (Fig S3), c-Jun, and p-c-Jun at P10 (Fig. S7).

      We provided a quantification of DLK immunostaining intensity in CA1 and CA3 in Fig.S3D,E and found roughly comparable levels between regions.

      Pg. 6: ‘Additional analysis at the mRNA level (supplemental excel, File S2. WT vs DLK(iOE) DEGs) and at the protein level (Fig.S8E) suggest that the increase in DLK abundance was around 3 times the control level. The localization patterns of DLK protein appeared to vary depending on region of hippocampus and age of animals in both control and Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice (Fig.S3C).’

      We provided our quantifications without normalization to baseline in each region for c-Jun and p-c-Jun, and revised the text accordingly:

      Pg. 9-10: ‘In control mice, glutamatergic neurons in CA1 had low but detectable c-Jun immunostaining at P10 and P15, but reduced intensity at P60; those in CA3 showed an overall low level of c-Jun immunostaining at P10, P15 and P60; and those in DG showed a low level of c-Jun immunostaining at P10 and P15, and an increased intensity at P60 (Fig.S7A,C,E). In Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice at P10 when no discernable neuron degeneration was seen in any regions of hippocampus, only CA3 neurons showed a significant increase of immunostaining intensity of c-Jun, compared to control (Fig.S7A). In P15 mice, we observed further increased immunostaining intensity of c-Jun in CA1, CA3, and DG, with the strongest increase (~4-fold) in CA1, compared to age-matched control mice (Fig.S7C). The overall increased c-Jun staining is consistent with RiboTag analysis’.

      Pg. 10: ‘In Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice, we observed increased p-c-Jun positive nuclei in CA1 at P10, and strong increase in CA1 (~10-fold), CA3 (~6-fold), and DG (~8-fold) at P15 (Fig.S7B,D).’

      Illustration of proposed selective changes in hippocampal sector volume needs to be very carefully prepared in view of the substantial claims on selective vulnerability. In 2A under P15 and especially P60, it is difficult to see the difference - this needs lower magnification and a lot of care that anteroposterior levels are identical because hippocampal sector anatomy and volumes of sectors vary from level to level. One wonders if the cortex shrinks, too. This is important.

      Thank you for raising the point. We have provided images showing the anteroposterior level in Fig.S2A-C. We noticed that the cortex in DLK(OE) mice becomes thinner, along with expansion of the ventricles in some animals at later timepoints (Fig.S2C).

      One cannot be sure that there is selective death of hippocampal sectors with DLK overexpression versus, say, rearrangement of hippocampal architecture. One may need stereological analysis, otherwise this substantial claim appears overinterpreted.

      We appreciate the comment.

      In the revised manuscript, we included a new supplemental figure (Fig. S2) showing lower magnification images of coronal sections, and used cautionary wording, such as ‘CA3 is less vulnerable, compared to CA1’, to minimize the impression of over-interpretation. By NeuN staining at P10, P15, and P60, we did not observe a detectable difference in overall hippocampal architecture, apart from the noted cell death in CA1 and DG and the associated thinning of each of the layers. At 46 weeks, some animals showed differences in the overall shape of dorsal hippocampus, though this appeared to reflect a disproportionately large CA3 region compared to other regions (Fig S2). Increased GFAP staining (Fig.S5A-C) was detected in CA1 but not in CA3, and microglia by IBA1 staining (Fig.S5E) also displayed less reactivity in CA3, compared to CA1. Thus, based on NeuN staining, GFAP staining, IBA1 staining and analysis of the differentially regulated genes, we infer that the effect of DLK(iOE) on CA1 differs from its effect on CA3.

      Is the GFAP excess reflective of neuroinflammation? What do microglial markers show? The presence of neuroinflammation does not bode well with apoptosis. Speaking of which, TUNEL in one cell in Supplementary Figure 4E is not strong evidence of a more widespread apoptotic event in CA1.

      We have included staining data for the microglia marker IBA1. Both GFAP and IBA1 showed evidence of reactivity, particularly in the CA1 region (Fig.S5A-E), supporting differential vulnerability across regions, though whether cell death is primarily due to apoptosis is unclear.

      We agree that our data showing sparse TUNEL staining at P15 (Fig S5F,G) do not rule out other mechanisms of cell death. We have included this in our limitations (pg.20): “While we find evidence for apoptosis, other forms of cell death may also occur.”

      In several places in the paper (as illustrated in Figure 4B, Supplementary Figure 2B, etc.): the unit of biological observation in animal models is typically not a cell, but an organism, in which averaged measures are generated. This is a significant methodological problem because it is not easy to sample neurons without involving stereological methods. With the approach taken here, there is a risk that significance may be overblown.

      We appreciate the reviewer’s point. We used the same region for RNAscope quantification, and quantification was performed blind to genotype when possible. We revised the graphs to show mean values for individual mice in Fig.4B, 4C, and Fig.S3B (previously Fig.S2B).

      Other Comments and Questions:

      Supplementary Figure 9: The authors state that data points are shown for individual ROIs - ideally, they should also show averages for biological replicates. Can the authors confirm that statistical analyses are based on biological replicates (mice) and not ROIs?

      We have revised the graphs to show averages from individual mice in Fig.5B-D, Fig.5E-F (previously Fig.S9G-I), Fig.5H-J, Fig.5K-L (previously Fig.S9J-L), and Fig.S10B,C,E,F (previously Fig.S9B,C,E,F). The statistical analyses are based on biological replicates (mice), not ROIs.
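
      To make the distinction between ROIs and biological replicates concrete, the following minimal sketch collapses ROI-level measurements to one mean per mouse before any group comparison, so that each mouse contributes a single data point. The function name, mouse identifiers, and values are hypothetical and for illustration only; they are not measurements from the manuscript.

```python
from collections import defaultdict
from statistics import mean


def per_mouse_means(measurements):
    """Collapse (mouse_id, genotype, value) ROI rows to one mean per mouse.

    Returns a dict mapping genotype -> list of per-mouse means, which is the
    appropriate input for a test that treats mice as biological replicates.
    """
    values_by_mouse = defaultdict(list)
    genotype_of = {}
    for mouse_id, genotype, value in measurements:
        values_by_mouse[mouse_id].append(value)
        genotype_of[mouse_id] = genotype

    grouped = defaultdict(list)
    for mouse_id, values in values_by_mouse.items():
        grouped[genotype_of[mouse_id]].append(mean(values))
    return dict(grouped)


# Made-up ROI measurements: three ROIs (or sections) per mouse.
rois = [
    ("m1", "control", 10.0), ("m1", "control", 12.0), ("m1", "control", 11.0),
    ("m2", "control", 9.0), ("m2", "control", 9.0), ("m2", "control", 12.0),
    ("m3", "iOE", 7.0), ("m3", "iOE", 8.0), ("m3", "iOE", 9.0),
]
means = per_mouse_means(rois)
print(means)
```

      With this layout the sample size for each genotype is the number of mice, not the number of ROIs, which avoids inflating significance through pseudoreplication.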

      For in vitro experiments, what is the effect of DLK overexpression on neuronal viability and density? Could these variables confound effects on synaptogenesis/synapse maturation?

      As described in the Methods, we made hippocampal neuron cultures from P1 pups of the following crosses:

      For control: Vglut1<sup>Cre/+</sup> X Rosa26<sup>tdT/+</sup> 

      For DLKcKO: Vglut1<sup>Cre/+</sup>;DLK(cKO)<sup>fl/fl</sup>  X Vglut1<sup>Cre/+</sup>;DLK(cKO)<sup>fl/fl</sup>;Rosa26<sup>tdT/+</sup> 

      For DLKiOE: H11-DLK<sup>iOE/iOE</sup> X Vglut1<sup>Cre/+</sup>;Rosa26<sup>tdT/+</sup> 

      Dissociated cells from a given litter were pooled into the same culture. Because there were different proportions of neurons with our genotype of interest in each culture, it is not simple to know whether DLK was causing significant cell death.

      On pg 13, we stated our observation:

      ‘We did not notice an obvious effect of DLK(iOE) or DLK(cKO) on neuron density in cultures at DIV2. To assess neuronal type distribution in our cultures, we immunostained DIV14 neurons with antibodies for Satb2, as a CA1 marker (Nielsen et al., 2010), and Prox1, as a marker of DG neurons (Iwano et al., 2012). We did not observe significant differences in the proportion of cells labeled with each marker in DLK(cKO) or DLK(iOE) cultures (Fig.S13E). These data are consistent with the idea that DLK signaling does not have a strong role in neuron-type specification both in vivo and in vitro’.

      We cannot rule out that variable factors in our cultures may confound effects on synaptogenesis/synapse maturation, and hope that future studies will clarify this.

      Correlations between c-jun expression and phosphorylation are extremely important and need to be carefully and convincingly documented. I am a bit concerned about Supplementary Figure 6 images, especially 6B-CA1 (no difference between control and KO, too small images) and 6D (no p-c-Jun expression at all anywhere in the hippocampus at P15?).

      At P10, P15, and P60 we stained for p-c-Jun using the Rabbit monoclonal p-c-Jun (Ser73) (D47G9) antibody from Cell Signaling (cat# 3270) at a 1:200 dilution and imaged using an LSM800 confocal microscope with a 20x objective. We observed p-c-Jun to be quite low generally in control animals. We have replaced the images in Fig.S7F (previously S6D), and adjusted the brightness/contrast to enable better visualization of the low signal in Fig.S7B,D,F (previously Fig.S6B,D).

      We revised our text to present the data carefully as stated above:

      Pg. 9-10: ‘In control mice, glutamatergic neurons in CA1 had low but detectable c-Jun immunostaining at P10 and P15, but reduced intensity at P60; those in CA3 showed an overall low level of c-Jun immunostaining at P10, P15 and P60; and those in DG showed a low level of c-Jun immunostaining at P10 and P15, and an increased intensity at P60 (Fig.S7A,C,E). In Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice at P10 when no discernable neuron degeneration was seen in any regions of hippocampus, only CA3 neurons showed a significant increase of immunostaining intensity of c-Jun, compared to control (Fig.S7A). In P15 mice, we observed further increased immunostaining intensity of c-Jun in CA1, CA3, and DG, with the strongest increase (~4-fold) in CA1, compared to age-matched control mice (Fig.S7C). The overall increased c-Jun staining is consistent with RiboTag analysis’.

      Pg. 10: ‘In Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice, we observed increased p-c-Jun positive nuclei in CA1 at P10, and strong increase in CA1 (~10-fold), CA3 (~6-fold), and DG (~8-fold) at P15 (Fig.S7B,D).’

      Recommendations for the authors:

      Several major and minor reservations were raised. The major issues are the need for more information about the over-expression of DLK and a need to extrapolate to an in vivo condition with DLK. A considerable amount of useful information is presented with some very nicely done experiments but it is not yet a coherent or integrated story. The lack of impact of DLK overexpression in some neurons is perhaps the most impactful observation of the study and would be great to have more information around the differential transcriptional/signaling response in these cell types. There is also a need for more experimental details and to address several questions about the mouse genetic and translatome analysis. They are valid concerns that require attention by the authors.

      We thank the editors and reviewers for their thoughtful evaluation and suggestions. We hope that the editors and reviewers find that the new data and text changes in our revised manuscript, along with the point-by-point response above, have addressed the concerns and strengthened our findings.

      Minor points:

      (1) The authors state that deletion of DLK has no effect on CA1 at 1yr, however, the image of CA1 in Figure S1D shows substantially fewer NeuN+ neurons. Is this a representative field of view?

      We have re-examined the images and observed no effect on hippocampal morphology at 1 yr. We have now included representative images in the revised Fig S1D.

      (2) Is the DLK protein section staining in Figure 2C a real signal? The staining looks like speckles and is purely somatic. Axonal staining is widely expected based on the literature and the authors' own work. There should be a specificity control.

      To our knowledge, axonal staining of DLK reported in the literature is mostly based on cultured DRG neurons. In addition to the reported axonal localization, DLK is present in the cell soma, near the Golgi (Hirai et al., 2002), and in the post-synaptic density (Pozniak et al., 2013).

      In the revised manuscript, we addressed this point by including controls with no primary antibody, and using an antibody against the closely related kinase, LZK. These additional data are shown in Fig.S3C,D (previously Fig.S2C), supporting that DLK protein staining represents real signal. At P10 and P15, DLK immunostaining around CA3 showed axonal staining of the mossy fibers, as well as in the soma and dendritic layers (Fig.S3C,D). A similar pattern was also seen in primary cultured neurons (Fig 6A).

      (3) The protein expression of DLK in the transgenic overexpressor (Figure S7C) looks, to the resolution of this blot, to be at least 50kD heavier than 'WT' DLK. Can the authors explain this discrepancy?

      The Cre-induced DLK(iOE) transgene has T2A and tdTomato fused in-frame to the C-terminus of DLK. It is known that T2A ‘self-cleavage’ is often incomplete. DLK-T2A-tdTomato would be about 50 kD bigger than WT DLK. We now include the transgene design in revised Fig S1D, and also state in the figure legend of Fig.S8C (previously S7C) that ‘Larger molecular weight band of DLK in Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> would match the predicted molecular weight of DLK-T2A-tdTomato if T2A-peptide induced ‘self-cleavage’ due to ribosomal skipping is ineffective (Fig.S1D).’

      (4) Expression changes in DLK affect various aspects of neurites in CA1 cultures (Figure 6), and changes in DLK also modestly affect STMN4 (and 2, perhaps indirectly) levels (Figure S7C), but there is no indication that DLK acts via STMN4 to cause these changes. It is not clear what to make of these data. Of note, Stmn4 levels change in response to DLK in CA3, without DLK affecting cell death in this region.

      We appreciate and agree with the comment. Other studies (Asghari Adib et al., 2024; DeVault et al., 2024; Hu et al., 2019; Larhammar et al., 2017; Le Pichon et al., 2017; Shin et al., 2019; Watkins et al., 2013) reported expression changes in Stmn4 mRNAs in other cell types and cellular contexts, which appeared to depend on DLK. Hippocampal neurons express multiple Stmns (Fig.S8A). While we present our analysis on the effects of DLK dosage on Stmn4, and also Stmn2, we do not think that DLK-induced changes of Stmn4 expression per se are a major factor underlying CA1 cell death vs CA3 survival.

      In the revised manuscript, we addressed this point in ‘Limitation of our study’ (pg 20):

      ‘Additional experiments will be needed to elucidate in vivo roles of STMN4 and its interaction with other STMNs’.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The article by Piersma et al. aims to reduce the complex process of NK cell licensing to the action of a single inhibitory receptor for MHC class I. This is achieved using a mouse strain lacking all of the Ly49 receptors expressed by NK cells and inserting the Ly49a gene into the Ncr1 locus, leading to expression on the majority of NK cells.

      Strengths:

      The mouse model used represents a precise deletion of all NK-expressed genes within the Ly49 cluster. The re-introduction of the Ly49a gene into the Ncr1 locus allows expression by most NK cells. Convincing effects of Ly49a expression on in vitro activation and in vivo killing assay are shown.

      Weaknesses:

      The choice of Ly49a provides a clear picture of H-2D<sup>d</sup> recognition by this Ly49. It would be valuable to perform additional studies investigating Ly49c and Ly49i receptors for H-2b. This is of interest because there are reports indicating that Ly49c may not be a functional receptor in B6 mice due to strong cis interactions.

      We agree with the reviewer that it will be important to extend our findings to H-2b haplotypes with individual cognate Ly49 receptors (Ly49C and Ly49I). While these experiments are subject of our ongoing studies, they are beyond the scope of the current manuscript considering the significant time, effort and cost to generate these new Ly49C and Ly49I knockin mice.

      This work generates an excellent mouse model for the study of NK cell licensing by inhibitory Ly49s that will be useful for the community. It provides a platform whereby the functional activity of a single Ly49 can be assessed.

      Reviewer #2 (Public review):

      Piersma et al. continue to work on deciphering the role and function of Ly49 NK cell receptors. This manuscript shows that a single inhibitory Ly49 receptor is sufficient to license NK cells and eliminate MHC-I-deficient target cells in mice. In short, they refined the mouse model ∆Ly49-1 (Parikh et al., 2020) into the Ly49KO model in which all Ly49 genes are disrupted. Using this model, they confirmed that NK cells from Ly49KO mice cannot be licensed, produce lower levels of IFN-gamma, and cannot reject MHC-I-deficient cells. To study the effect of a single Ly49 receptor in the function of NK cells, the authors backcrossed Ly49KO mice to H-2D<sup>d</sup> transgenic KODO (D8-KODO) Ly49A knock-in mice in which a single inhibitory Ly49A receptor that recognizes H-2D<sup>d</sup> ligands is expressed. By doing so, they demonstrate that a single inhibitory Ly49 receptor expressed by all NK cells is sufficient for licensing and missing-self killing.

      While the results of the study are largely consistent with the conclusions, it is important to address some discrepancies. For instance, in the title of Figure 1, the authors state that NK cells in Ly49KO mice compared to WT mice have a less mature phenotype, which is not consistent with the corresponding text in the Results section (lines 170-171) that states there is no difference in maturation. These differences are not evident in Figure 1, panel D. It is crucial to acknowledge these inconsistencies to ensure a comprehensive understanding of the research findings.

      We thank the reviewer for pointing this out. We have corrected the figure legend title to: “Mice generated to lack all NK-related Ly49 molecules using CRISPR have NK cells that display alterations in select surface molecules.”

      In the legend of Figure 2, the text related to panel C indicates the use of dyes to label the splenocytes, and CFSE, CTV, and CTFR were mentioned. However, only CTV and CTFR are shown on the plots and mentioned in the corresponding text in the Results section. Similarly, in the legend of Figure 4, which is related to panel C, the authors write that splenocytes were differentially labeled with CFSE and CTV as indicated; however, in Figure 4C and the Results section text, there is no mention of CFSE.

      We thank the reviewer for pointing out these inconsistencies. We did label target cells with CFSE to distinguish them from host cells. To clarify, we have done the following:

      We have removed CFSE from figure legends of Figure 2 and 4.

      We included the following on CFSE labeling in the Materials and Methods section: “Target splenocytes were additionally labeled with CFSE to identify transferred target splenocytes from host cells.”

      The authors should clarify why they assume that KLRG1 expression is influenced by the expression of inhibitory Ly49 receptors and not by manipulations on chromosome 6, where the genes for both KLRG1 and Ly49 receptors are located.

      The effect on KLRG1 expression is phenocopied in the Ly49A KI mice (on a Ly49 KO background). The Ly49A KI allele is encoded by the Ncr1 locus, which is located on chromosome 7 and not on chromosome 6 where KLRG1 is located, thus excluding involvement of cis-regulatory elements encoded by the Ly49 locus on chromosome 6.

      We have clarified this in the discussion section (lines 350-358):

      “The Ly49 gene family as well as Klrg1 is located within the NKC on chromosome 6 (Yokoyama and Plougastel, 2003) ….  expression of only Ly49A, encoded in the Ncr1 locus located on chromosome 7, in Ly49KO mice on a H-2D<sup>d</sup> background restored KLRG1 expression”

      However, a better explanation for the possible influence of other inhibitory NK cell receptors still needs to be included. In the study by Zhang et al. (doi: 10.1038/s41467-019-13032-5), the authors showed the synergized regulation of NK cell education by the NKG2A receptor and the specific Ly49 family members. Although in this study, Piersma and colleagues show the control of MHC-I deficient cells by Ly49A+ NKG2A- NK cells in Figure 4, this receptor is not mentioned in the Results or in the Discussion section, so its role in this story needs to be clarified. Therefore, the reader would benefit from more information regarding the NKG2A receptor and NKG2A+/- populations in their results.

      We agree with the reviewer that it is important to describe our results in the context of other inhibitory receptors. To clarify the role of NKG2A and potentially other inhibitory receptors we have made the following improvements to our manuscript:

      We discuss the role of NKG2A in the discussion section, which now includes (lines 259-266):

      “While our results did not interrogate licensing by inhibitory receptors outside of the Ly49 receptor family, such as has been reported for NKG2A (Anfossi et al., 2006; Zhang et al., 2019), they do demonstrate that expression of Ly49A without other Ly49 family members can mediate NK cell licensing. Moreover, we found that Ly49 receptors are required and sufficient for missing-self rejection under steady-state conditions. However, these observations do not rule out involvement of other inhibitory receptors under specific inflammatory conditions. For example, NKG2A contributes to rejection of missing-self targets in poly(I:C)-treated mice (Zhang et al., 2019).”

      We also added the following to the result section (lines 179-182):

      “NKG2A has been implicated in NK cell licensing by the non-classical MHC-I molecule Qa1 (Anfossi et al., 2006), to eliminate potential confounding effects by this interaction, effector functions of NKG2A- NK cells were evaluated as described before (Bern et al., 2017).”

      Reviewer #3 (Public review):

      Summary:

      In this study, Piersma et al. successfully generated a mouse model with all Ly49 genes knocked out, resulting in the complete absence of Ly49 receptor expression on the cell surface. The absence of Ly49 expression led to the loss of NK cell education/licensing and consequently, a failure in responsiveness against missing-self target cells. The experimental work and findings are partially overlapping with the previous work by Zhang et al. (2019), who also performed knockout of the entire Ly49 locus in mice and demonstrated that loss of NK responsiveness was due to the removal of inhibitory, and not activating Ly49 genes. The authors demonstrate the restoration of NK cell licensing by knocking in a single Ly49 gene, Ly49A, in a mouse expressing the H-2D<sup>d</sup> ligand for this receptor, which is a novel and important finding.

      Strengths:

      The authors established a novel mouse model enabling them to have a clean and thorough study on the function of Ly49 on NK cell licensing. Also, by knocking in a single Ly49, they were able to investigate the function of a given Ly49 receptor excluding the "contamination" of co-expression of any other Ly49 genes. Their idea and method were novel though the mouse model was somehow genetically similar to a previous study. The experiment design and data interpretation were logically clear and the evidence was solid.

      Weaknesses:

      The paper is very poorly written and confusing. The authors should be more accurate in the usage of terminology, provide more details on experimental procedures, and revise much of the text to improve clarity and coherence. A thorough revision aiming to clarify the paper would be helpful.

      We regret that the manuscript was confusing to the reviewer. We have made thorough revisions to the different sections, which we hope will improve the clarity of the manuscript.

      We have made changes to all sections of the manuscript, including the title. These revisions include improved clarity on description of NK cell licensing and consistent usage throughout the manuscript, per the reviewer recommendations. We hope that all our improvements help the clarity of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I was confused by lines 262-270 in the discussion. The data from Hanke et al. is presented as contradictory to the observation that Ly49s bind more efficiently to H2-Kb than -Db, but they showed that Ly49c/i did not bind Kb-deficient cells, supporting the preferred binding to Kb.

      We have clarified this issue and the paragraph now reads: “This is further supported by early studies using Ly49 transfectants binding to Con A blasts showing that Ly49C and Ly49I can bind to H-2D<sup>b</sup>-deficient but not H-2K<sup>b</sup>-deficient cells (Hanke et al., 1999), despite the caveat of testing binding to cells overexpressing Ly49s in these studies.”

      Reviewer #2 (Recommendations for the authors):

      The authors' conclusion that one type of inhibitory Ly49 receptor expressed on NK cells is sufficient for successful licensing and rejection of missing self-cells is a significant step forward. However, it would be beneficial to complement this with additional data. For instance, exploring the role of a single inhibitory Ly49 receptor responsible for licensing in a mouse model with a different haplotype (e.g. Ly49C or Ly49I on H-2b MHC I haplotype in C57BL/6J mice) could provide valuable insights and open new avenues for research in the field.

      We agree with the reviewer that it will be important to extend our findings to additional MHC-I haplotypes with single cognate Ly49 receptors. While these experiments are subject of our ongoing studies, they are beyond the scope of the current manuscript considering the significant effort, time, and cost to generate these new Ly49C and Ly49I knockin mice.

      Reviewer #3 (Recommendations for the authors):

      Specific issues that should be addressed are as follows:

      (1) The title of the paper: "Expression of a single inhibitory Ly49 receptor is sufficient to license NK cells for effector functions" is ambiguous. When I first read the title, I thought the authors meant that only a single Ly49 molecule on the NK cell surface was necessary to induce licensing. It might be better to replace "single inhibitory receptor" with "single member of Ly49 receptor family".

      We have changed the title to: “Expression of a single inhibitory member of the Ly49 receptor family is sufficient to license NK cells for effector functions”

      (2) In the abstract, introduction, and results, the authors distinguish "licensing" and "rejection of missing-self targets" as two distinct phenomena. An example includes Abstract, lines 51-53: "Herein, we showed mice lacking expression of all Ly49s were unable to reject missing-self target cells in vivo, were defective in NK cell licensing, and displayed lower KLRG1 on the surface of NK cells". Similarly, the title of the second subsection of the Results states: "Ly49-deficient NK cells are defective in licensing and rejection of cognate MHC-I deficient target cells" (line 176). In these instances, it seems that by "licensing", they mean only response to plate-bound anti-NK1.1 stimulation and not a response to missing-self targets. Alternatively, in the first paragraph of the Discussion, it sounds as if licensing includes both anti-NK1.1 and missing-self responses (lines 258-260): "...NK cells were fully licensed in terms of their functional phenotype, including the capacity to be activated by an activation receptor in vitro and efficient rejection of MHC-I deficient target cells in vivo". Please define the terms and use the terms consistently throughout the paper.

      We were the first to describe the term licensing and have defined this as acquisition of NK cell functional competence by self-MHC molecules (Kim et al., 2005), which is characterized by increased NK cell effector functions to activating signals. Thus, licensed NK cells are prevented from attacking normal MHC-I<sup>+</sup> cells by the same self-MHC-I-specific receptor that conferred licensing, while unlicensed NK cells without appropriate Ly49 receptors are functionally incompetent.

      To clarify we made changes throughout the manuscript including the following:

      Lines 91-101:

      “In addition to effector function in missing-self, Ly49 receptors that recognize their cognate MHC-I ligands are involved in licensing or education of NK cells to acquire functional competence. NK cell licensing is characterized by potent effector functions including IFNγ production and degranulation in response to activation receptor stimulation (Elliott et al., 2010; Kim et al., 2005). Like missing-self recognition, inhibitory Ly49s require SHP-1 for NK cell licensing which interacts with the ITIM-motif encoded in the cytosolic tail of inhibitory Ly49s (Bern et al., 2017; Kim et al., 2005; Viant et al., 2014). Moreover, lower expression of SHP-1, particularly within the immunological synapse, is associated with licensed NK cells (Schmied et al., 2023; Wu et al., 2021). Thus, inhibitory Ly49s have a second function that licenses NK cells to self-MHC-I thereby generating functionally competent NK cells but it has not been possible to exclude contributions from other co-expressed Ly49s.”

      Lines 268-271 (previously 258-260):

      “Yet the NK cells were fully licensed in terms of IFNγ production and degranulation in vitro and efficiently rejected MHC-I deficient target cells in vivo. Thus, a single Ly49 receptor is capable to confer the licensed phenotype and missing-self rejection in vitro and in vivo.”

      Lines 309-312:

      “In conclusion, these data show that expression of a single inhibitory Ly49 receptor is necessary and sufficient to license NK cells and mediate missing self-rejection under steady state conditions in vivo.”

      (3) Introduction, lines 76-79. Please provide the C57BL/6 MHC-I genotype. It is difficult to follow the text here without this information. In general, please provide information to help the reader who may not be working in this precise field.

      We thank the reviewer for pointing this out. We have included this and the lines now read: “For example, in the C57BL/6 background, Ly49C and Ly49I can recognize H-2<sup>b</sup> MHC-I molecules that include H-2K<sup>b</sup> and H-2D<sup>b</sup>, while Ly49A and Ly49G cannot recognize H-2<sup>b</sup> molecules and instead they recognize H-2<sup>d</sup> alleles.”

      (4) Introduction, lines 85-97. Please use commas: "...the MHC-I specificities of other Ly49s have been primarily studied with MHC tetramers containing human b2m, which is not recognized by Ly49A, on cells overexpressing Ly49s" in order to clarify the sentence.

      Commas have been added as suggested by the reviewer.

      (5) Introduction, lines 91-101. The whole paragraph starting with the following sentence does not make sense and should be re-written. "In addition to effector function in missing-self, when inhibitory Ly49 receptors recognize their cognate MHC-I ligands in vivo, they license or educate NK cells for potent effector functions including IFNγ production and degranulation in response to activation receptor stimulation".

      We regret that this paragraph was not clear to the reviewer. We have changed this paragraph to:

      “In addition to effector function in missing-self, Ly49 receptors that recognize their cognate MHC-I ligands are involved in licensing or education of NK cells to acquire functional competence. NK cell licensing is characterized by potent effector functions including IFNγ production and degranulation in response to activation receptor stimulation (Elliott et al., 2010; Kim et al., 2005). Like missing-self recognition, inhibitory Ly49s require SHP-1 for NK cell licensing which interacts with the ITIM-motif encoded in the cytosolic tail of inhibitory Ly49s (Bern et al., 2017; Kim et al., 2005; Viant et al., 2014). Moreover, lower expression of SHP-1, particularly within the immunological synapse, is associated with licensed NK cells (Schmied et al., 2023; Wu et al., 2021). Thus, inhibitory Ly49s have a second function that licenses NK cells to self-MHC-I thereby generating functionally competent NK cells but it has not been possible to exclude contributions from other co-expressed Ly49s.”

      (6) Results, line 181. Please edit: "...MHC-I-deficient H-2K<sup>b</sup> x H-2D<sup>b</sup> deficient (KODO) mice".

      This sentence now reads “... NK cells from H-2K<sup>b</sup> and H-2D<sup>b</sup> double deficient (KODO) mice”

      (7) Results, line 192. Please re-word the following phrase: "missing-self is dominated by H-2K<sup>b</sup> in the C57BL/6 background", as it is unclear. Do you mean that H-2K<sup>b</sup> is protected from lysis as opposed to H-2D<sup>b</sup>?

      We thank the reviewer for pointing this out, line 192 now reads: “..missing-self recognition in the C57BL/6 background depends on the absence of H-2K<sup>b</sup> rather than H-2D<sup>b</sup>.”

      (8) Please briefly describe the Ncr1-Ly49A knockin procedure so that the reader understands the link between NKp46 and Ly49A expression without going to the earlier paper. Also, it needs to be mentioned that Ncr1 is the gene encoding NKp46.

      Lines 201-205 now read: “To investigate the potential of a single inhibitory Ly49 receptor on mediating NK cell licensing and missing-self rejection, the Ly49KO mice were backcrossed to H-2D<sup>d</sup> transgenic KODO (D8-KODO) Ly49A KI mice that express Klra1 cDNA encoding the inhibitory Ly49A receptor in the Ncr1 locus encoding NKp46 and its cognate ligand H-2D<sup>d</sup> but not any other classical MHC-I molecules (Parikh et al., 2020).”

      In the Materials and Methods section, the following has been added (lines 324-326):

      “In Ly49A KI mice the stop codon of Ncr1 encoding NKp46 is replaced with a P2A peptide-cleavage site upstream of the Ly49A cDNA, while maintaining the 3’ untranslated region.”

      (9) Figure 4C, legend. There is no CFSE staining in this experiment. Please correct.

      We did label target cells with CFSE to distinguish them from host cells. To clarify, we have done the following:

      We have removed CFSE from figure legends of Figure 2 and 4.

      We included the following on CFSE labeling in the Materials and Methods section (lines 377-379): “Target splenocytes were additionally labeled with CFSE to identify transferred target splenocytes from host cells.”

      (10) Discussion, lines 262-270. This paragraph sounds as if data by Hanke et al. does not agree with the data presented in the paper. On the contrary, Hanke et al. demonstrate that Ly49C and Ly49I detectably bind to H-2K<sup>b</sup>, but poorly to H-2D<sup>b</sup>, supporting observations shown in Figure 2C.

      We have clarified this issue and the paragraph now reads: “This is further supported by early studies using Ly49 transfectants binding to Con A blasts showing that Ly49C and Ly49I can bind to H-2D<sup>b</sup>-deficient but not H-2K<sup>b</sup>-deficient cells (Hanke et al., 1999), despite the caveat of testing binding to cells overexpressing Ly49s in these studies.”

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Hua et al show how targeting amino acid metabolism can overcome Trastuzumab resistance in HER2+ breast cancer.

      Strengths:

      The authors used metabolomics, transcriptomics and epigenomics approaches in vitro and in preclinical models to demonstrate how trastuzumab-resistant cells utilize cysteine metabolism.

      Thank you for your valuable comments. We would like to extend our appreciation for your efforts. Your constructive suggestions will help improve our research.

      Weaknesses:

      However, there are some key aspects that need to be addressed.

      Major:

      (1) Patient Samples for Transcriptomic Analysis: It is unclear from the text whether tumor tissues or blood samples were used for the transcriptomic analysis. This distinction is crucial, as these two sample types would yield vastly different inferences. The authors should clarify the source of these samples.

      Thank you for your valuable comments. In the transcriptomic analysis, we included the data of HER2 positive breast cancer patients who received trastuzumab in I-SPY2 trial (GSE181574). Tumor tissues were used in this dataset.

      (2) The study only tested one trastuzumab-resistant and one trastuzumab-sensitive cell line. It is unclear whether these findings are applicable to other HER2-positive tumor cell lines, such as HCC1954. The authors should validate their results in additional cell lines to strengthen their conclusions.

      Thank you for your valuable comments. We agree with your opinion that the exploration of multiple cell lines would make our research findings more comprehensive. This is a limitation of our study, and we will continue to improve our design and methods in future experiments.

      (3) Relevance to Metastatic Disease: Trastuzumab resistance often arises in patients during disease recurrence, which is frequently associated with metastasis. However, the mouse experiments described in this paper were conducted only in the primary tumors. This article would have more impact if the authors could demonstrate that the combination of Erastin or cysteine starvation with trastuzumab can also improve outcomes in metastasis models.

      Thank you for your valuable comments. We agree with your suggestions. The exploration of metastatic disease would make our research more meaningful and help better address clinical key issues. In our future studies, we will continue to investigate the association between the invasive and metastatic capabilities of trastuzumab resistant HER2 positive breast cancer and cysteine metabolism.

      Minor:

      (1) The figures lack information about the specific statistical tests used. Including this information is essential to show the robustness of the results.

      Thank you for your valuable comments. We will include the statistical information in our figure legends.

      (2) Figure 3K Interpretation: The significance asterisks in Figure 3K do not specify the comparison being made. Are they relative to the DMSO control? This should be clarified.

      Thank you for your valuable comments. We will clarify the comparison information in our figure legends.

      Reviewer #2 (Public review):

      In this manuscript, Hua et al. proposed SLC7A11, a protein facilitating cellular cystine uptake, as a potential target for the treatment of trastuzumab-resistant HER2-positive breast cancer. If this claim holds true, the finding would be of significance and might be translated to clinical practice. Nevertheless, this reviewer finds that the conclusion was poorly supported by the data.

      Notably, most of the data (Figures 2-6) were based on two cell lines - JIMT1 as a representative of trastuzumab-resistant cell line, and SKBR3 as a representative of trastuzumab sensitive cell line. As such, these findings could be cell-line specific while irrelevant to trastuzumab sensitivity at all. Furthermore, the authors claimed ferroptosis simply based on lipid peroxidation (Figure 3). Cell viability was not determined, and the rescuing effects of ferroptosis inhibitors were missing. The xenograft experiments were also suspicious (Figure 4). The description of how cysteine starvation was performed on xenograft tumors was lacking, and the compound (i.e., erastin) used by the authors is not suitable for in vivo experiments due to low solubility and low metabolic stability. Finally, it is confusing why the authors focused on epigenetic regulations (Figures 5 & 6), without measuring major transcription factors (e.g., NRF2, ATF4) which are known to regulate SLC7A11.

      To sum up, this reviewer finds that the most valuable data in this manuscript is perhaps Figure 1, which provides unbiased information concerning the metabolic patterns in trastuzumab-sensitive and primary resistant HER2-positive breast cancer patients.

      Thank you for your valuable comments. We agree with your suggestions. Your feedback will help enhance the quality of our research.

      (1) Our research was mainly conducted in JIMT1 (trastuzumab resistant) and SKBR3 (trastuzumab sensitive), and this is a limitation of our study. The experimental validation using different cell lines will make our research findings more persuasive. In our future research, we will continuously optimize experimental design and methods to make our findings more comprehensive.

      (2) The detection of ferroptosis in our research was mainly performed by evaluating lipid peroxidation. Experiments measuring cell viability and the rescuing effects of ferroptosis inhibitors would help provide more evidence.

      (3) In xenograft experiments, cysteine starvation was performed by feeding a cysteine-free diet. The drug dissolution and other conditions were optimized by referring to previous relevant literature. We will clarify more details in our article.

      (4) Epigenetic modifications have been recognized as crucial factors in drug resistance formation. An increasing number of studies have emphasized the importance of epigenetic changes in regulating the abnormal expression of oncogenes and tumor suppressor genes related to drug resistance. Currently, the role of epigenetic changes in the development of trastuzumab resistance in HER2 positive breast cancer is still being explored. We tried to investigate the dysregulation of histone modifications and DNA methylation in trastuzumab resistant HER2 positive breast cancer. Our findings indicated that targeting H3K4me3 and DNA methylation could decrease SLC7A11 expression and induce ferroptosis. This provides additional evidence on the mechanisms of trastuzumab resistance. We will provide a more detailed discussion in the article.

      We would like to extend our appreciation for your constructive suggestions and continue to improve our research in future experiments.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, the authors report that GPR55 activation in presynaptic terminals of Purkinje cells decreases GABA release at the PC-DCN synapse. The authors use an impressive array of techniques (including highly challenging presynaptic recordings) to show that GPR55 activation reduces the readily releasable pool of vesicles without affecting presynaptic AP waveform and presynaptic Ca2+ influx. This is an interesting study, which is seemingly well-executed and proposes a novel mechanism for the control of neurotransmitter release. However, the authors' main conclusions are heavily, if not solely, based on pharmacological agents that more often than not demonstrate affinity at multiple targets. Below are points that the authors should consider in a revised version.

      We thank the reviewer for the encouraging comments, and will fully address the reviewer’s concerns as detailed below.

      Major points:

      (1) There is no clear evidence that GPR55 is specifically expressed in presynaptic terminals at the PC-DCN synapse. The authors cited Ryberg 2007 and Wu 2013 in the introduction, mentioning that GPR55 is potentially expressed in PCs. Ryberg (2007) offers no such evidence, and the expression in PC suggested by Wu (2013) does not necessarily correlate with presynaptic expression. The authors should perform additional experiments to demonstrate the presynaptic expression of GPR55 at PC-DCN synapse.

      We agree with the reviewer’s concern that the present manuscript lacks evidence for the localization of GPR55 at PC axon terminals. Our previous attempt to immunolabel GPR55 did not work well. We have since found that other antibodies are commercially available and will test them. In the revised manuscript, we hope to present immunocytochemical images showing GPR55 at PC terminals.

      (2) The authors' conclusions rest heavily on pharmacological experiments, with compounds that are sometimes not selective for single targets. Genetic deletion of GPR55 would be a more appropriate control. The authors should also expand their experiments with occlusion experiments, showing if the effects of LPI are absent after AM251 or O-1602 treatment. In addition, the authors may want to consider AM281 as a CB1R antagonist without reported effects at GPR55.

      We thank the reviewer for raising the essential issue of the specificity of GPR55 activation in our study. Regarding direct manipulation of GPR55, such as genetic deletion, we will try acute knock-down of its expression, given the compensation that sometimes occurs after a complete knock-out. In addition, following the reviewer’s suggestion, we will examine whether the effects of LPI and AM251 occlude each other, and will perform control experiments to show the lack of CB1R involvement.

      (3) It is not clear how long the different drugs were applied, and at what time the recordings were performed during or following drug application. It appears that GPR55 agonists can have transient effects (Sylantyev, 2013; Rosenberg, 2023), possibly due to receptor internalization. The timeline of drug application should be reported, where IPSC amplitude is shown as a function of time and drug application windows are illustrated.

      As suggested, the timing and duration of drug application will be indicated together with the time course of changes of IPSC amplitudes. This change will make things much clearer. Thank you for the suggestion.

      (4) A previous investigation of the role of GPR55 in the control of neurotransmitter release is neither cited nor discussed: Sylantyev et al. (2013, PNAS, Cannabinoid- and lysophosphatidylinositol-sensitive receptor GPR55 boosts neurotransmitter release at central synapses). Similarities and differences should be discussed.

      We apologize for omitting this important study. In the revised version, we will cite it and discuss its findings in relation to our data.

      Minor point:

      (1) What is the source of LPI? What isoform was used? The multiple isoforms of LPI have different affinities for GPR55.

      We apologize for the insufficient description of the LPI used in our study. We used LPI derived from soy (Merck, catalog #L7635), which was estimated to contain 58% C16:0 and 42% C18:0 or C18:2 LPI. This information will be added to the Materials and Methods in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This paper investigates the mode of action of GPR55, a relatively understudied type of cannabinoid receptor, in presynaptic terminals of Purkinje cells. The authors use demanding techniques of patch clamp recording of the terminals, sometimes coupled with another recording of the postsynaptic cell. They find a lower release probability of synaptic vesicles after activation of GPR55 receptors, while presynaptic voltage-dependent calcium currents are unaffected. They propose that the size of a specific pool of synaptic vesicles supplying release sites is decreased upon activation of GPR55 receptors.

      Strengths:

      The paper uses cutting-edge techniques to shed light on a little-studied, potentially important type of cannabinoid receptor. The results are clearly presented, and the conclusions are for the most part sound.

      We are really happy to hear the encouraging comments from the reviewer.

      Weaknesses:

      The nature of the vesicular pool that is modified following activation of GPR55 is not definitively characterized.

      During revision, we will perform further analysis and additional experiments to characterize, as far as possible, the vesicle pools affected by GPR55.

      Reviewer #3 (Public review):

      Summary:

      Inoshita and Kawaguchi investigated the effects of GPR55 activation on synaptic transmission in vitro. To address this question, they performed direct patch-clamp recordings from axon terminals of cerebellar Purkinje cells and fluorescent imaging of vesicular exocytosis utilizing synapto-pHluorin. They found that exogenous activation of GPR55 suppresses GABA release at Purkinje cell to deep cerebellar nuclei (PC-DCN) synapses by reducing the readily releasable pool (RRP) of vesicles. This mechanism may also operate at other synapses.

      Strengths:

      The main strength of this study lies in combining patch-clamp recordings from axon terminals with imaging of presynaptic vesicular exocytosis to reveal a novel mechanism by which activation of GPR55 suppresses inhibitory synaptic strength. The results strongly suggest that GPR55 activation reduces the RRP size without altering presynaptic calcium influx.

      We thank the reviewer for the positive evaluation of our conclusions.

      Weaknesses:

      The study relies on the exogenous application of GPR55 agonists. It remains unclear whether endogenous ligands released due to physiological or pathological activities would have similar effects. There is no information regarding the time course of the agonist-induced suppression. There is also little evidence that GPR55 is expressed in Purkinje cells. This study would benefit from using GPR55 knockout (KO) mice. The downstream mechanism by which GPR55 mediates the suppression of GABA release remains unknown.

      We agree with the reviewer on all the points raised as weaknesses. Most of them will be clarified by the additional experiments and analyses described above in response to the other reviewers. Whether endogenous GPR55 ligands cause this synaptic depression, and through which downstream mechanism, are very important questions; we will discuss them in the revised manuscript and would like to address them in future studies.

    1. Author response:

      Reviewer #1:

      In their paper entitled "Combined transcriptomic, connectivity, and activity profiling of the medial amygdala using highly amplified multiplexed in situ hybridization (hamFISH)" Edwards et al. present a new method designated as hamFISH (highly amplified multiplexed in situ hybridization) that enables sequential detection of ≤32 genes using multiplexed branched DNA amplification. As proof-of-principle, the authors apply the new technique - in conjunction with connectivity, and activity profiling - to the medial amygdala (MeA) of the mouse, which is a critical nucleus for innate social and defensive behaviors.

      As mentioned by Edwards et al., hamFISH could prove beneficial as an affordable alternative to other in situ transcriptomic methods, including commercial platforms, that are resource-intensive and require complex analysis pipelines. Thus, the authors envision that the method they present could democratize in situ cell-type identification in individual laboratories.

      The data presented by Edwards et al. is convincing. The authors use the appropriate and validated methodology in line with the current state-of-the-art. The paper makes a strong case for the benefits of hamFISH when combining transcriptomics studies with connectivity tracing and immediate early gene-based activity profiling. Notably, the authors also discuss the caveats and limitations of their study/approach in an open and transparent manner.

      In its current state, the manuscript touches upon a number of most intriguing, yet rather preliminary findings. For example, the roles of inhibitory neuron cluster i3 or of the selective and apparently MeA neuron-specific projections (Figure 3 - Figure Supplement 2D) remain elusive. As it is the authors' prime intent to provide "a proof-of-principle example of overlaying transcriptomic types, projection, and activity in a behaviorally relevant manner and demonstrates the usefulness of hamFISH in multiplexed in situ gene expression profiling", such studies might be beyond the scope of the present manuscript. The absence of such more in-depth hypothesis-based analysis, however, prevents an even more enthusiastic overall assessment.

      We thank the reviewer for their positive assessment and agree that further studies are needed to explore and understand the MeA circuit further.

      Reviewer #2:

      The authors describe the development and implementation of hamFISH, a sensitive multiplexed ISH method. They leverage a pre-existing scRNA-seq dataset for the MeA to design 32 probes that combinatorically represent MeA neuronal populations - ~80% of MeA neurons express three of these markers. Using these markers to assess the spatial organization of the MeA, the authors identify a novel population of Ndnf+ projection neurons and characterize their connectivity with anterograde and retrograde labeling. They additionally combine hamFISH with CTB labeling of three principal MeA projection sites to show that 75% of MeA neurons have only a single projection target. Finally, they engage adult male mice in encounters with other adult males (aggression), females (mating), and pups (infanticide), followed by hamFISH and c-fos labeling to relate cell identity to behavior. Their overall conclusion is that hamFISH-defined cell types are broadly active to multiple sensory stimuli. However, the data presented are not sufficient to conclude that no selectivity exists within the MeA. A weakness of the study is that the selected hamFISH genes contain only Lhx6 as a lineage-marking transcription factor. Instead, the authors predominately use neuropeptides as markers. Genes such as Tac1, Cartpt, Adcyap1, Calb1, and Gal are expressed throughout the MeA, and many other brain regions; they are not restricted to a single transcriptomic cell type and they do not denote any developmental origins. By design, the panel has low cell type specificity as all MeA neurons express at least three of the genes. Therefore, the authors' conclusions may not hold with a more stringent classification of cell type or cell identity.

      We agree with the reviewer that a deeper level of cell type classification may reveal the selectivity of cell types that may have been missed. The design of our hamFISH bridge-readout probes allows modification to be compatible with a barcoded readout system such as MERFISH, which would substantially increase the number of genes that can be included in the gene panel. This would, however, increase the complexity of the analysis pipeline and reduce throughput, but would be a potential avenue to explore to define MeA cell types at a deeper level. An advantage of hamFISH is the ease of including and reading out alternative gene panels. For example, one panel could examine developmental-lineage-specific genes. Overall, our panel captures the highest hierarchical level (similar to the subclass level of the Allen taxonomy) of MeA transcriptomic types, based on published data available at the time of our gene panel design. Genes including Tac1, Cartpt, Adcyap1, Calb1, and Gal are expressed in specific patterns within the MeA and are useful for classification. In the original manuscript, we also included our rationale for dropping Foxp2, a lineage-specific marker gene in the MeA.

      Reviewer #3:

      In this manuscript, Edwards et al. describe hamFISH, a customizable and cost-efficient method for performing targeted spatial transcriptomics. hamFISH utilizes highly amplified multiplexed branched DNA amplification, and the authors extensively describe hamFISH development and its advantages over prior variants of this approach.

      The authors then used hamFISH to investigate an important circuit in the mouse brain for social behavior, the medial amygdala (MeA). To develop a hamFISH probe set capable of distinguishing MeA neurons, the authors mined published single-cell RNA-sequencing datasets of the MeA, ultimately creating a panel of 32 hamFISH probes that mostly cover the identified MeA cell types. They evaluated over 600,000 MeA cells and classified neurons into 16 inhibitory and 10 excitatory types, many of which are spatially clustered. The authors combined hamFISH with viral and other circuit tracer injections to determine whether the identified MeA cell populations sent and/or received unique inputs from connected brain regions, finding evidence that several cell types had unique patterns of input and output. Finally, the authors performed hamFISH on the brains of male mice that were placed in behavioral conditions that elicit aggressive, infanticidal, or mating behaviors, finding that some cell populations are selectively activated (as assessed by c-fos mRNA expression) in specific social contexts.

      Strengths:

      (1) The authors developed an optimized tissue preparation protocol for hamFISH and implemented oligopools instead of individually synthesized oligonucleotides to reduce costs. The branched DNA amplification scheme improved smFISH signal compared to previous methods, and multiple variants provide additional improvements in signal intensity and specificity. Compared to other spatial transcriptomics methods, the pipeline for imaging and analysis is streamlined and is compatible with other techniques like fluorescence-based circuit tracing. This approach is cost-effective and has several advantages that make it a valuable addition to the list of spatial transcriptomics toolkits.

      (2) Using 31 probes, hamFISH was able to detect 16 inhibitory and 10 excitatory neuron types in the MeA subregions, including the vast majority of cell types identified by other transcriptomics approaches. The authors quantified the distributions of these cell types along the anterior-posterior, dorsal-ventral, and medial-lateral axes, finding spatial segregation among some, but not all, MeA excitatory and inhibitory cell types. The authors additionally identified a class of inhibitory neurons expressing Ndnf (and a subset of these that express Chrna7) that project to multiple social chemosensory circuits.

      (3) The authors combined hamFISH with MeA input and output mapping, finding cell-type biases in the projections to the MPOA, BNST, and VMHvl, and inputs from multiple regions.

      (4) The authors identified excitatory and inhibitory cell types, and patterns of activity across cell types, that were selectively activated during various social behaviors, including aggression, mating, and infanticide, providing new insights and avenues for future research into MeA circuit function.

      Weaknesses:

      (1) Gene selection for hamFISH is likely to still be a limiting factor, even with the expanded (32-probe) capacity. This may have contributed to the lack of ability to identify sexually dimorphic cell types (Figure S2B). This is an expected tradeoff for a method that has major advantages in terms of cost and adaptability.

      We recognise that 32-plex gene detection might not be sufficient to address key questions in the transcriptomic organization of innate social behavior circuits, and that the study fell short of addressing more quantitative gene expression differences between sexes. Detecting sexually dimorphic gene expression likely requires a more targeted approach, because the dimorphism consists of differences in expression level rather than binary expression of marker genes, and the gene panel needs to be specifically configured for this purpose.

      (2) Adapting hamFISH, for example to other brain regions or tissues, may require extensive optimization.

      We have successfully performed hamFISH on at least two other mouse brain regions without needing to optimize further, suggesting that compatibility with other mouse brain regions is not an issue. We recognise, however, that optimization of hamFISH may be required for its application in other types of tissue or species. Human brain tissue, for example, typically suffers from high autofluorescence and different tissue preparation methods may need to be employed. We note that the signal boost provided by the v2 amplifiers in hamFISH may be useful to this end.

      (3) Pairing this method with behavioral experiments is likely to require further optimization, as c-fos mRNA expression is an indirect and incomplete survey of neuronal activity (e.g. not all cell types upregulate c-fos when electrically active). As such, there is a risk of false negative results that limit its utility for understanding circuit function.

      We acknowledge that c-fos is not the only readout of neuronal activity and that a panel of immediate early genes would allow a more comprehensive readout of activity-dependent gene expression. We fully agree that immediate early gene induction is an indirect readout of neural activity, and alternative methods such as in vivo physiology would provide a complementary insight into the selectivity of MeA neuron responses.

      (4) The limited compatibility of hamFISH with thicker tissue samples and lack of optical sectioning introduce additional technical limitations. For example, it would be difficult to densely sample larger neural circuits using serial 20 micron sections. Also, because the imaging modality is not clear from the methods, it is difficult to know whether the analysis methods introduce the risk of misattributing gene expression to overlapping cells.

      We agree that the use of hamFISH as described here is restricted to thin (<20 µm) sections. We have shown, however, that our encoding probe and bridge-readout probe design are compatible with HCR-based mRNA detection, which is compatible with thicker sections. Regarding the misattribution of gene expression to overlapping cells in the z-axis, we used epifluorescence microscopy with 14 × 500 nm z-steps to collect our raw data and generate maximum intensity projections for further analysis. Because of the thin sections (10 µm) used for the imaging, the overlap between cells in z is expected to be minimal. Regarding throughput, we agree that hamFISH is likely not suitable for brain-wide questions that require large volume coverage, but its major advantage is that it allows routine use of low-level multiplexing for targeted brain areas.
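
      For readers unfamiliar with the projection step mentioned above, the following is a minimal sketch of a maximum intensity projection over a z-stack. It is an illustration only, not the authors' analysis code: the function name, the NumPy-based implementation, the array layout (z, y, x) and the random test stack are all assumptions made for this example. The only values taken from the response are the 14 z-steps and the 500 nm step size.

```python
import numpy as np

def max_intensity_projection(stack):
    """Collapse a (z, y, x) image stack to a single (y, x) image.

    Each output pixel is the brightest value found at that (y, x)
    position across all z-planes. Hypothetical helper for illustration.
    """
    stack = np.asarray(stack)
    if stack.ndim != 3:
        raise ValueError("expected a 3D stack with axes (z, y, x)")
    return stack.max(axis=0)

# Hypothetical raw data: 14 z-planes of 4 x 4 pixels (values are random).
rng = np.random.default_rng(0)
stack = rng.integers(0, 1000, size=(14, 4, 4))
mip = max_intensity_projection(stack)

# Nominal axial range covered by 14 steps of 500 nm, in micrometres.
n_steps, z_step_nm = 14, 500
z_range_um = n_steps * z_step_nm / 1000
```

      Under these assumptions the nominal axial range is 14 × 0.5 µm = 7 µm, which lies within the 10 µm section thickness stated above. Because a projection discards depth information, signal from cells stacked in z would be merged, which is why section thickness matters for the misattribution question.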

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:  

      Reviewer #1 (Public Review):  

      Summary:  

      The study by Pudlowski et al. investigates how the intricate structure of centrioles is formed by studying the role of a complex formed by delta- and epsilon-tubulin and the TEDC1 and TEDC2 proteins. For this, they employ knockout cell lines, EM, and ultrastructure expansion microscopy as well as pull-downs. Previous work has indicated a role of delta- and epsilon-tubulin in triplet microtubule formation. Without triplet microtubules centriolar cylinders can still form, but are unstable, resulting in futile rounds of de novo centriole assembly during S phase and disassembly during mitosis. Here the authors show that all four proteins function as a complex and knockout of any of the four proteins results in the same phenotype. They further find that mutant centrioles lack inner scaffold proteins and contain an extended proximal end including markers such as SAS6 and CEP135, suggesting that triplet microtubule formation is linked to limiting proximal end extension and formation of the central region that contains the inner scaffold. Finally, they show that mutant centrioles seem to undergo elongation during early mitosis before disassembly, although it is not clear if this may also be due to prolonged mitotic duration in mutants.  

      Strengths:  

      Overall this is a well-performed study, well presented, with conclusions mostly supported by the data. The use of knockout cell lines and rescue experiments is convincing.  

      Weaknesses:  

      In some cases, additional controls and quantification would be needed, in particular regarding cell cycle and centriole elongation stages, to make the data and conclusions more robust. 

      We thank the reviewer for these comments and have improved our analyses of these as detailed below.

      Reviewer #2 (Public Review):  

      Summary:  

      In this article, the authors study the function of TEDC1 and TEDC2, two proteins previously reported to interact with TUBD1 and TUBE1. Previous work by the same group had shown that TUBD1 and TUBE1 are required for centriole assembly and that human cells lacking these proteins form abnormal centrioles that only have singlet microtubules that disintegrate in mitosis. In this new work, the authors demonstrate that TEDC1 and TEDC2 depletion results in the same phenotype with abnormal centrioles that also disintegrate into mitosis. In addition, they were able to localize these proteins to the proximal end of the centriole, a result not previously achieved with TUBD1 and TUBE1, providing a better understanding of where and when the complex is involved in centriole growth.  

      Strengths:  

      The results are very convincing, particularly the phenotype, which is the same as previously observed for TUBD1 and TUBE1. The U-ExM localization is also convincing: despite a signal that's not very homogeneous, it's clear that the complex is in the proximal region of the centriole and procentriole. The phenotype observed in U-ExM on the elongation of the cartwheel is also spectacular and opens the question of the regulation of the size of this structure. The authors also report convincing results on direct interactions between TUBD1, TUBE1, TEDC1, and TEDC2, and an intriguing structural prediction suggesting that TEDC1 and TEDC2 form a heterodimer that interacts with the TUBD1-TUBE1 heterodimer.

      Weaknesses:  

      The phenotypes observed in U-ExM on cartwheel elongation merit further quantification, enabling the field to appreciate better what is happening at the level of this structure.  

      We thank the reviewer for these comments and have improved our analyses of cartwheel elongation as detailed below.

      Reviewer #3 (Public Review):  

      Summary:  

      Human cells deficient in delta-tubulin or epsilon-tubulin form unstable centrioles, which lack triplet microtubules and undergo a futile formation and disintegration cycle. In this study, the authors show that human cells lacking the associated proteins TEDC1 or TEDC2 have these identical phenotypes. They use genetics to knockout TEDC1 or TEDC2 in p53-negative RPE-1 cells and expansion microscopy to structurally characterize mutant centrioles. Biochemical methods and AlphaFold-multimer prediction software are used to investigate interactions between tubulins and TEDC1 and TEDC2.

      The study shows that mutant centrioles are built only of A tubules, which elongate and extend their proximal region, fail to incorporate structural components, and finally disintegrate in mitosis. In addition, they demonstrate that delta-tubulin or epsilon-tubulin and TEDC1 and TEDC2 form one complex and that TEDC1 and TEDC2 can interact independently of tubulins. Finally, they show that the localization of the four proteins is mutually dependent.

      Strengths:  

      The results presented here are mostly convincing, the study is exciting and important, and the manuscript is well-written. The study shows that delta-tubulin, epsilon-tubulin, TEDC1, and TEDC2 function together to build a stable and functional centriole, significantly contributing to the field and our understanding of the centriole assembly process.  

      Weaknesses:  

      The ultrastructural characterization of TEDC1 and TEDC2 obtained by U-ExM is inconclusive. Improving the quality of the signals is paramount for this manuscript.  

      We thank the reviewer for these comments and have improved our imaging of TEDC1 and TEDC2 localization, as detailed below.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):  

      The reviewers agreed that the conclusions are largely supported by solid evidence, but felt that improving the following aspects would make some of the data more convincing:  

      (1) The UExM localizations of TEDC1/2 are not very convincing and the reviewers suggest complementing these with alternative super-resolution approaches (e.g. SIM) and/or different labeling techniques such as pre-expansion labeling using STAR red/orange secondaries (also robust for SIM and STED), use of the Halo tag, different tag antibodies, etc.

      We thank the reviewers for these recommendations and have adapted two of these strategies to improve our imaging of TEDC1 and TEDC2 localization. First, we used an alternative super-resolution approach, a Yokogawa CSU-W1 SoRA confocal scanner (resolution = 120 nm) and imaged cells grown on coverslips (not expanded). We found that TEDC1 and TEDC2 localize to procentrioles and the proximal end of parental centrioles (Fig 2 – Supplementary Figure 1a, b). Second, we used a recently described expansion gel chemistry (Kong et al., Methods Mol Biol 2024) combined with Abberior Star red and orange secondary antibodies. This technique resulted in robust signal at centrosomes and in the cytoplasm and indicated that TEDC1 and TEDC2 localize near the centriole walls of procentrioles and the proximal region of parental centrioles, near CEP44 (Fig 2 – Supplementary Figure 1c, d). These results complement and support our initial observations (Fig 2C, D) and we have edited the text to reflect this (lines 157-163). We also note that these Flag tag and V5 tag primary antibodies are specific and have little background signal in all applications (Fig 2 – Supplementary Fig 1E-J), while other commercially available antibodies against these tags did exhibit non-specific signal. 

      (2) The cell cycle classifications of centrioles would strongly benefit, apart from a better description, from adding quantifications of average centriole length at a given stage based on tubulin staining (not acTub). 

      We thank the reviewers for these recommendations. We have added an improved description of our cell cycle analyses (lines 234-237). We have also added new analyses for centriole length as measured by staining with alpha-tubulin (Fig 4 – Supp 3 and Fig 4 – Supp 4). We find that in all mutants, acetylated tubulin elongates along with alpha-tubulin, as it does in control centrioles.

      Reviewer #1 (Recommendations For The Authors):  

      Specific points:  

      (1) The introduction is a bit oddly structured. About halfway through it summarizes what is going to be presented in the study, giving the impression that it is about to conclude, but then continues with additional, detailed introduction paragraphs. Overall, the authors may also want to consider making it more concise.

      We thank the reviewer for these suggestions and have shortened and restructured the introduction for clarity and conciseness.

      (2) The text should explain to the non-expert reader why endogenous proteins are not detected and why exogenously expressed, tagged versions are used. Related to this, the authors state overexpression, but what is this assessment based on? Does expression at the endogenous level also rescue? At least by western blot, these questions should be addressed. 

      In the text, we have added clarification about why endogenous proteins were not detected for immunofluorescence (lines 149-151). To quantify the overexpression, we have added Western blots of TEDC1 and TEDC2 to Fig 1 – Supplementary Figure 1E,F. We note that endogenous levels of both proteins are very low, and the rescue constructs are overexpressed 20- to 70-fold above endogenous levels.

      (3) The figures should clearly indicate when tagged proteins are used and detected. Currently, this info is only found in the legends but should be in the figure panels as well.

      We have made these changes to the figure panels in Fig 2, Fig 2 – Supp 1, and Fig 3.

      (4) I could not find a description and reference to Figure 2 Supplement 2 and 3.

      We have replaced these supplements with new supplementary figures for TEDC1 and TEDC2 localization (Fig 2 – Supp 1).

      (5) The multiple bands including unspecific (?) bands should be labeled to guide the reader in the western blots. 

      We have labeled nonspecific bands in our Western blots with asterisks (Fig 1 – Supp 1, Fig 3).

      (6) The AlphaFold prediction suggests that TUBD1 can bind to the TED complex in the absence of TUBE1. Can this be shown? This would be a nice validation of the predicted architecture of the complex. I also missed a bit of a discussion of the predicted architecture. How could it be linked to triplet microtubule formation? Is the latest AlphaFold version 3 adding anything to this analysis?

      In our pulldown experiments, we found that TUBD1 cannot bind to TEDC1 or TEDC2 in the absence of TUBE1 (Fig 3C, D, IB: TUBD1). We performed this experiment with three biological replicates and found the same result. It is possible that TUBD1 and TUBE1 form an intact heterodimer, similar to alpha-tubulin and beta-tubulin, and this will be an exciting area of future research.

      We have added new analysis from AlphaFold3 (Fig 3 – Supp 1B). AlphaFold3 predicts a structure similar to that predicted by AlphaFold Multimer.

      We have also added additional discussion about the AlphaFold prediction to the text (lines 220-222, 365-367). Thanks to the reviewer for pointing out this oversight.

      (7) I suggest briefly explaining in the text how cells and centrioles at different cell cycle stages were identified. I found some info in the legend of Figure 1, but no info for other figures or in the text. Related to this, how are procentrioles defined in de novo formation? There is no parental centriole to serve as a reference. 

      We have added a brief explanation of the synchronization and identification in lines 234-237. We have also clarified the text regarding de novo centrioles, and now term these “de novo centrioles in the first cell cycle after their formation” (lines 271-272).

      (8) Related to point 7: using acetylated tubulin as a universal length and width marker seems unreliable since it is a PTM. The authors should use general tubulin staining to estimate centriole dimensions, or at least establish that acetylated tubulin correlates well with the overall tubulin signal in all mutants. 

      We have added two supplementary data figures (Fig 4 – supp 3 and Fig 4 – supp 4) in which we co-stain control and mutant centrioles with alpha-tubulin. We found that acetylated tubulin marked mutant centrioles well and as alpha-tubulin length increased, acetylated tubulin length also increased. 

      (9) Presence and absence of various centriolar proteins. These analyses lack a clear reference for the precise centriole elongation stage. This is particularly problematic for proteins that are recruited at specific later stages (such as inner scaffold proteins). The staining should be correlated with centriole length measurements, ideally using general tubulin staining.  

      As described for point 8, we have added two supplementary data figures in which we co-stain control and mutant centrioles with alpha-tubulin and found that acetylated tubulin also increases as overall tubulin length increases in all mutants. We note that inner scaffold proteins are absent in all our mutant centrioles at all stages of the cell and centriole cycle, as also previously reported for POC5 in Wang et al., 2017.

      Reviewer #2 (Recommendations For The Authors):  

      Here's a list of points I think could be improved:  

      -  As the authors previously published, the centriole appears to have a smaller internal diameter than mature centrioles. Could the authors measure to see if the phenotype is identical? Is the centriole blocked in the bloom phase (Laporte et al. 2024)? 

      We have added an additional supplementary figure (Fig 4 – supp 5) to show that mutant centrioles have smaller diameters than mature centrioles, as we previously reported for the delta-tubulin and epsilon-tubulin mutant centrioles by EM. We thank the reviewers for the additional question of the bloom phase. Given the comparatively smaller number of centrioles we analyzed in this paper compared to Laporte et al (50 to 80 centrioles per condition here, versus 800 centrioles in Laporte et al), it is difficult to definitively conclude whether there is a block in bloom phase. This would be an interesting area for future research.  

      -  The images of the centrioles in EM are beautiful. Would it be possible to apply a symmetrisation on it to better see the centriolar structures? For example, is the A-C linker present? 

      We thank the reviewer for this excellent suggestion. Using CentrioleJ, we find that the A-C linker is absent from mutant centrioles. The symmetrized images have been added to Fig 1 – Supplementary Fig 2, and additional discussion has been added to the text (lines 143-144, lines 368-374).

      -  How many EM images were taken? Did the centrioles have 100% A-microtubule only or sometimes with B-MT? 

      For TEM, we focused on centrioles that were positioned to give perfect cross-section images of the centriolar microtubules, and thus did not take images of off-angle or rotated centrioles. Given the difficulty of this experiment (centrioles are small structures within the cell, centrosomes are single-copy organelles, and off-angle centrioles were not imaged), we were lucky to image 3 centrioles that were in perfect cross-section – 2 for Tedc1<sup>-/-</sup> and 1 for Tedc2<sup>-/-</sup>. Our images indicate that these centrioles only have A-tubules (Fig 1 – Supp Fig 2).

      -  In Figure 2 - it would be preferable to write TEDC2-flag or TEDC1-flag and not TEDC2/1. 

      We have made this change.

      -  It seems that Figures 2C and D aren't cited, and some of the data in the supplemental data are not described in the main text. 

      We have replaced these supplements with new supplementary figures for TEDC1 and TEDC2 localization (Fig 2 – Supp 1).

      -  The signal in U-ExM with the anti-Flag antibody is heterogeneous. Did the authors test several anti-FLAG antibodies in U-ExM? 

      We tested several anti-Flag and anti-V5 antibodies for our analyses, and chose these because they have little background signal in all applications (Fig 2 – Supplementary Fig 1E-J). Other commercially available antibodies against these tags did exhibit non-specific signal.

      -  The AlphaFold prediction is difficult to interpret, the authors should provide more views and the PDB file. 

      We have added 2 additional views of the AlphaFold prediction in Fig 3 – Supp 1A.

      -  In general, but particularly for Figure 4: the length doesn't seem to be divided by the expansion factor, it is therefore difficult to compare with known EM dimensions. Can the authors correct the scale bars? 

      We have corrected the scale bars for all figures to account for the expansion factor.
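      The scale-bar correction described above amounts to dividing each post-expansion measurement by the gel expansion factor. The following is a minimal sketch of that arithmetic; the function name and the example values (a 2000 nm measurement and an expansion factor of 4.0) are hypothetical and are not taken from the manuscript.

```python
def to_pre_expansion_nm(measured_nm: float, expansion_factor: float) -> float:
    """Convert a length measured in an expanded gel to its pre-expansion size.

    Both arguments are illustrative assumptions: measured_nm is the length
    read off the expanded sample, expansion_factor is the linear expansion
    of the gel (dimensionless, must be positive).
    """
    if expansion_factor <= 0:
        raise ValueError("expansion_factor must be positive")
    return measured_nm / expansion_factor


# Hypothetical example: a 2000 nm post-expansion length with a 4.0x gel
corrected = to_pre_expansion_nm(2000.0, 4.0)
print(f"pre-expansion length: {corrected:.0f} nm")
```

      Lengths corrected this way can be compared directly with dimensions reported from EM, which is the comparison the reviewer asked for.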

      -  Concerning Gamma-tubulin that is "recruited to the lumen of centrioles by the inner scaffold, had localization defects in mutant centrioles. However, we were unable to reliably detect gamma-tubulin within the lumen of control or de novo-formed centrioles in S or G2-phase (Figure 4 - Supplement 1E), and thus were unable to test this hypothesis". In Laporte et al 2024, Gamma-tubulin arrives later than the inner scaffold and only on mature centrioles, so this result appears to be in line with previous observation. However, the authors should be able to detect a proximal signal under the microtubules of the procentriole, is this the case? 

      We agree that this is an exciting question. However, in our expansion microscopy staining, we frequently observe that gamma-tubulin surrounds centrioles, corresponding to its role in the pericentriolar material (PCM). In our hands, we find it difficult to distinguish between centriolar gamma-tubulin at the base of the A-tubule from gamma-tubulin within the PCM.  

      -  In the signal elongation of SAS-6, STIL, CEP135, CPAP, and CEP44, would it be possible to quantify the length of these signals (with dimensions divided by the expansion factor for comparison with known TEM distances)? 

      We have quantified the lengths of SAS-6 and CEP135 in new Fig 4 – Supp 3 and Fig 4 – Supp 4.  

      -  The authors observe that centrin is present, but only as a SFI1 dot-like localization (which is another protein that would be interesting to look at), and not an inner scaffold localization. Can the authors elaborate? These results suggest that the distal part is correctly formed with only a microtubule singlet. 

      We agree with the reviewer’s interpretation that the centriole distal tip is likely correctly formed with only singlet microtubules, as both distal centrin and CP110 are present. We have added this point to the discussion (line 415).

      -The authors observe that CPAP is elongated, but CPAP has two locations, proximal and distal. Is it distal or proximal elongation? Is the proximal signal of CPAP longer than that of CEP44 in the mutants? The authors discuss that the elongation could come from overexpression of CPAP, but here it seems that the centriole is not overlong, just the structures around the cartwheel. 

      We thank the reviewer for this point. It is difficult for us to conclude whether the proximal or distal region is extended in the mutants, as our mutant centrioles lack a visible separation between these two regions. It would be interesting to probe this question in the future by testing whether subdomains of CPAP may be differentially regulated in our mutants.

      Reviewer #3 (Recommendations For The Authors):  

      It isn't apparent to me what was counted in Figure 1C. Were all centrioles (mother centrioles and procentrioles) counted? Where is the 40% in control cells coming from? Can this set of data be presented differently? 

      We apologize for the confusion. In this figure, all centrioles were counted. We have updated the figure legend for clarity. We performed this analysis in a similar way as in Wang et al., 2017 to better compare phenotypes.  

      Figure 2C. and the text lines 182-187: The ultrastructural characterization of TEDC1 and TEDC2 suffers from the low quality of the TEDC1 and TEDC2 signals obtained post-expansion. In comparison with robust low-resolution immunosignal, it appears that most of the signal cannot be recovered after expansion. Another sub-resolution imaging method to re-analyze TEDC1 and TEDC2 localization would be essential. The same concern applies to Figures 2 - Supplement 2 and 3. Also, Figure 2 - Supplement 2 and Supplement 3 do not seem to be cited.

      We thank the reviewer for these recommendations. As also mentioned above, we used an alternative super-resolution approach, a Yokogawa CSU-W1 SoRA confocal scanner (resolution = 120 nm), and found that TEDC1 and TEDC2 localize to procentrioles and the proximal end of parental centrioles (Fig 2 – Supplementary Figure 1a, b). Second, we used a recently described expansion gel chemistry (Kong et al., Methods Mol Biol 2024) combined with Abberior Star red and orange secondary antibodies. This technique resulted in robust signal at centrosomes and in the cytoplasm and indicated that TEDC1 and TEDC2 localize near the centriole walls of procentrioles and the proximal region of parental centrioles, near CEP44 (Fig 2 – Supplementary Figure 1c, d). These stainings complement and support our initial observations (Fig 2C, D) and we have edited the text to reflect this (lines 157-163). We have also removed the supplementary figures that were uncited in the text.

      TUBD1 and TUBE1 form a dimer and TEDC2 and TEDC1 can interact. Any speculation as to why TEDC2 does not pull down both TUBE1 and TUBD1? 

      We apologize for the confusion. TEDC2 does pull down both TUBE1 and TUBD1 (Fig 3D, pull-down, second column, Tedc2-V5-APEX2 rescuing the Tedc2<sup>-/-</sup> cells pulls down TUBD1, TUBE1, and TEDC1).  

      Figure 4A and B. The authors use acetylated tubulin to determine the length of procentrioles in the S and G2 phases. However, procentrioles are not acetylated on their distal ends in these cell cycle phases (as the authors also mention further in the text). Why has alpha tubulin not been used since it works well in U-ExM? The average size of the control, G2 procentrioles, seems too small in Figure 4A and not consistent with other imaging data (for instance, in Figure 4 - Supplement 1 C, Cp110, and CPAP staining). There is no statistical analysis in F4A.

      We have added two supplementary data figures (Fig 4 – supp 3 and Fig 4 – supp 4) in which we co-stain control and mutant centrioles with alpha-tubulin. We found that acetylated tubulin correlates well with overall tubulin signal in all mutants. We have added statistical analysis to the figure legend of Fig 4A.

      Lines 260 - 262: "These results indicate that centrioles with singlet microtubules can elongate to the same length as controls, and therefore that triplet microtubules are not essential for regulating centriole length." It is hard to agree with this statement. Mutant procentrioles show aberrantly elongated proximal signals of several tested proteins. In addition, in lines 326 - 328, the authors state that "Together, these results indicate that centrioles lacking compound microtubules are unable to properly regulate the length of the proximal end."  

      We thank the reviewer and have clarified the statement to state that these results indicate that centrioles with singlet microtubules can elongate to the same overall length as control centrioles in G2 phase.  

      Line 353: The authors suggest that elongated procentriole structure in mitosis may represent intermediates in centriole disassembly. Another interpretation, more in line with the EM data from Wang et al., 2017, would be that these mutant procentrioles first additionally elongate before they disassemble in late mitosis. The aberrant intermediate structure concept would need further exploration. For instance, anti-alpha/beta-tubulin antibodies could be used to investigate centriole microtubules.  

      We apologize for the confusion and have edited this section for clarity (lines 341-343): “We conclude that in our mutant cells, centrioles elongate in early mitosis to form an aberrant intermediate structure, followed by fragmentation in late mitosis.”

      References need to be included in lines 122, 277, 279. 

      We have added these references.

      Line 281: Add references PMID: 30559430 and PMID: 32526902.  

      We have added these references (lines 265-266).

      Line 289: "Moreover, our results suggest that centriole glutamylation is a multistep process, in which long glutamate side chains are added later during centriole maturation." This does not seem like an original observation. For instance, see PMID: 32526902.  

      We have added this reference (lines 273-274).

    1. Author response:

      Reviewer 1:

      (1) Provide RMSD and DALI scores to show how similar the AlphaFold-predicted structures of BrrG are to other anti-termination factors. This should be done for Fig1B and also for Suppl. Fig 1 to support the claim that BrrG, GafA, GafZ, Q21 share structural features.

      In the revised manuscript we will provide RMSD and DALI scores.

      (2) Throughout the manuscript, flow cytometry data of gfp expression was used and shown as single replicate. Korotaev et al wrote in the legends that error bars are shown (that is not true for e.g. Figs. 3, 4, and 5). It is difficult for reviewers/readers to gauge how reliable are their experiments.

      As stated in the manuscript all flow cytometry data were performed in triplicate. In the revised manuscript we will include the two replicates not presented in the main figures as supplementary information.

      (3) I am unsure how ChIP-seq in Fig. 2A was performed (with anti-FLAG or anti-HA antibodies? I cannot tell from the Materials & Methods). More importantly, I did not see the control for this ChIP-seq experiment. If a FLAG-tagged BrrG was used for ChIP-seq, then a WT non-tagged version should be used as a negative control (not sequencing INPUT DNA), this is especially important for anti-terminator that can co-travel with RNA polymerase. Please also report the number of replicates for ChIP-seq experiments.

      Fig. 2A presents a coverage plot from the ChIP-Seq of ∆brrG +pTet:brrG-3xFLAG (N’). A replicate of this N-terminally tagged construct will be added as supplementary data in the revised version. As anticipated by the referee, we had used ∆brrG +pTet:brrG (untagged) as control.

      (4) Korotaev et al mentioned that BrrG binds to DNA (as well as to RNA polymerase). With the availability of existing ChIP-seq data, the authors should be able to locate the DNA-binding element of BrrG, this additional information will be useful to the community.

      We will mine the ChIP-Seq data to define the BrrG binding site as closely as possible and include the analysis in the revised version of the manuscript.

      (5) Mutational experiments to break the potential hairpin structure are required to strengthen the claim that this putative hairpin is the potential transcriptional terminator.

      We did not claim that the identified hairpin is a terminator but rather suggested it as a candidate terminator. We agree with the referee that the proposed experiment would be necessary to definitively prove its terminator function. However, our primary aim was to demonstrate that BrrG acts as a processive antiterminator, which we have shown by replacing the putative terminator with a well-characterized synthetic terminator that BrrG successfully overcame. Therefore, we prefer not to conduct the proposed experiment and will instead further tone down our conclusions regarding the putative terminator function of the identified hairpin structure.

      Reviewer 2:

      (1) The authors wrote "GTAs are not self-transmitting because the DNA packaging capacity of a GTA particle is too small to package the entire gene cluster encoding it" (page 3). I thought that at least the Bartonella capsid gene cluster should be self-transmissible within the 14 kb packaged DNA (https://doi.org/10.1371/journal.pgen.1003393, https://doi.org/10.1371/journal.pgen.1000546). This was also concluded by Lang et al (https://doi.org/10.1146/annurev-virology-101416-041624). In this case the presented results would have important implications. As the gene cluster and the anti-terminator required for its expression are separated on the chromosome, it would not be possible to transfer an active GTA gene cluster, although the DNA coding for the genes required for making the packaging agent itself, theoretically fits into a BaGTA particle. Could the authors comment on that? I think it would be helpful to add the sizes of the different gene clusters and the distance between them in Fig. 2A. The ROR amplified region spans 500kb, is the capsid gene cluster within this region?

      We thank the reviewer for bringing up this interesting point. The bgt cluster (capsid cluster) is approximately 9.2 kb in size and could feasibly be packaged in its entirety into a GTA particle. In contrast, the ror gene cluster, which encodes the antiterminator BrrG, is approximately 20 kb in size—exceeding the packaging limit of GTA particles—and is separated from the bgt cluster by approximately 35 kb. Consequently, if the bgt cluster is transferred via a GTA particle into a recipient host that does not encode the ror gene cluster, the bgt cluster would not be expressed.

      (2) Another side-note regarding the introduction: On page three the authors write: "GTAs encode bacteriophage-like particles and in contrast to phages transfer random pieces of host bacterial DNA". While packaging is not specific, certain biases in the packaging frequency are observed in both studied GTA families. For Bartonella this is ROR. In the two GTA-producing strains D. shibae and C. crescentus origin and terminus of replication are not packaged and certain regions are overrepresented (https://doi.org/10.1093/gbe/evy005, https://doi.org/10.1371/journal.pbio.3001790). Furthermore, D. shibae plasmids are not packaged but chromids are. I think the term "random" does not properly describe these observations. I would suggest using "not specific" instead.

      We thank the reviewer for this suggestion and will adjust the wording accordingly.

      (3) Page 5: Remove "To address this". It is not needed as you already state "To test this hypothesis" in the previous sentence.

      We will adjust the wording accordingly.

      (4) I think the manuscript would greatly benefit from a summary figure to visualize the Q-like antiterminator-dependent regulatory circuit for GTA control and its four components described on pages 15 and 16.

      We thank the reviewer for this valuable suggestion and will include a summary figure illustrating the deduced regulatory mechanism in the revised manuscript.

      (5) Page 17: It might be worth noting that GafA is highly conserved along GTAs in Rhodobacterales (https://doi.org/10.3389/fmicb.2021.662907) and so is probably regulatory integration into the ctrA network (https://doi.org/10.3389/fmicb.2019.00803). It's an old mechanism. It would be also interesting to know if it is a common feature of the two archetypical GTAs that the regulator is not part of the cluster itself.

      We agree with the points raised by the reviewer and will address them in the revised manuscript. Specifically, we will highlight the high conservation of GafA in GTAs across Rhodobacterales and its regulatory integration within the ctrA network. Additionally, we will analyze whether the GafA-like antitermination regulator is typically located outside the regulated gene cluster, as we have demonstrated for BrrG of BaGTA in the Bartonellae.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, Huang et al used SMRT sequencing to identify methylated nucleotides (6mA, 4mC, and 5mC) in Pseudomonas syringae genome. They show that the most abundant modification is 6mA and they identify the enzymes required for this modification as when they mutate HsdMSR they observe a decrease of 6mA. Interestingly, the mutant also displays phenotypes of change in pathogenicity, biofilm formation, and translation activity due to a change in gene expression likely linked to the loss of 6mA. Overall, the paper represents an interesting set of new data that can bring forward the field of DNA modification in bacteria.

      Thank you for your valuable feedback on our paper exploring the impact of 6mA modification in P. syringae.

      Major Concerns:

      Most of the authors' data concern Psph pathovar. I am not sure that the authors' conclusions are supported by the two other pathovars they used in the initial 2 figures. If the authors want to broaden their conclusions to Pseudomonas syringae and not restrict it to Psph, the authors should have stronger methylation data using replicates. Additionally, they should discuss why Pss is so different than Pst and Psph. Could they do a blot to confirm it is really the case and not a sequencing artifact? Is the change of methylation during bacterial growth conserved between the pathovars? The authors should obtain mutants in the other pathovars to see if they have the same phenotype. The authors have a nice set of data concerning Psph but the broadening of the results to other pathovars requires further investigation.

      We appreciate the reviewer’s insightful comments. While the majority of our data focuses on Psph, we recognize the importance of validating these findings in Pss and Pst. To this end, we have performed additional experiments using dot blot and mutant construction to strengthen our conclusions in other pathovars.

      We agree that we should discuss why Pss is different from Psph and Pst. We performed a dot blot assay using genomic DNA from Pss and Pst, presented in Figure S5A. We also compared the 6mA modification levels of Pss and Pst in different growth phases. As shown in Figure S5A, the change of methylation during bacterial growth is conserved in Pst. The change was not obvious in Pss, which might be due to the lack of a type I R-M system.

      “In accordance with previous studies showing that growth conditions affect the bacterial methylation status, we applied dot blot experiments using the same amount of DNA (1 μg) from these three P. syringae strains to detect the 6mA levels during both logarithmic and stationary phases. The results revealed that 6mA levels in the stationary phase were much higher compared to the logarithmic phase in Psph and Pst, but no significant change in Pss. Additionally, we found that during the stationary phase, 6mA methylation levels in Psph and Pst were higher than those in Pss. These findings were consistent with the MTases prediction on these three strains, since Pss does not harbor any type I R-M systems, which are important for 6mA modification in bacteria.”

      Please see Figure S5A and Lines 220-228 in the revised manuscript.

      We also tried to construct an HsdM mutant in Pst to explore whether the influence of 6mA methylation was conserved in P. syringae, but it failed after multiple attempts. We did not construct a Pss mutant because no type I R-M system was predicted, and few methylation sites were identified via SMRT-seq in this strain. Therefore, we overexpressed HsdM in Pst instead. We have performed additional experiments in WT and the HsdM overexpression strains, including dot blot and growth curve assays.

      Please see Figures S5B-C and Lines 228-232 in the revised manuscript.

      The authors should include proper statistical analysis of their data. A lot of terms are descriptive but not supported by a deeper analysis to sustain the conclusions. For example, in Figure 4E, we do not know if the overlap is significant or not. Are DEGs more overlapping to 6mA sites than non-DEGs? Here is a non-exhaustive list of terms that need to be supported by statistics: different level (L145), greater conservation (L162), significant conservation (L165), considerable similarity (L175), credible motifs (L189), Less strong (L277) and several "lower" and "higher" throughout the text.

      Thank you for the insightful feedback. We have made the following revisions in the manuscript to ensure that the terms are more precise and do not require statistical significance testing.

      (1) Statistical analysis: We have added statistical tests for the overlap between DEGs and 6mA sites in Figure 4E. We performed the Fisher test and found that the overlap was not significant (p > 0.05). Neither DEGs nor non-DEGs overlapped significantly with 6mA sites. Please see Figure 4E and Lines 261-262.

      “Less strong” was used to indicate the influence of HsdM on biofilm in Figure 5D. All figures with “*” labels were analyzed using two-tailed Student's t-tests, with “*” marking a significant change (p < 0.05).

      (2) Revised wording: For terms used to describe comparisons, we have revised the wording to be clearer and ensure that the terminology used did not imply the need for statistical significance testing when not required. For example:

      “Different level” has been removed. Please see Lines 143-144.

      “Greater conservation” has been revised to “more conserved functional terms”. Please see Lines 161-162.

      “Significant conservation” has been revised to “notable conservation”. Please see Line 165.

      “Credible motifs” has been revised to “identified motifs”. Please see Line 186.
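      The overlap test mentioned in point (1) above can be illustrated with a small sketch. This is not the authors' analysis code: it assumes the test was a one-sided Fisher's exact test on a 2x2 table of genes (DEG or non-DEG versus with or without a 6mA site), and the counts used below are hypothetical placeholders.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, i.e. the probability
    of seeing at least `a` genes in the top-left cell by chance.
    Assumed layout (illustrative only):
        a = DEGs with a 6mA site      b = DEGs without a 6mA site
        c = non-DEGs with a 6mA site  d = non-DEGs without a 6mA site
    """
    row1 = a + b          # total DEGs
    col1 = a + c          # total genes with a 6mA site
    n = a + b + c + d     # all genes
    k_max = min(row1, col1)
    # Sum hypergeometric probabilities for every table at least as extreme
    numerator = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, k_max + 1)
    )
    return numerator / comb(n, row1)


# Hypothetical counts, chosen only to show the call
p_value = fisher_exact_greater(40, 160, 900, 3900)
print(f"one-sided p-value: {p_value:.3f}")
```

      A p-value above the chosen threshold (0.05 in the response) would indicate no significant enrichment of 6mA sites among DEGs. A two-sided variant, or a library routine such as SciPy's `fisher_exact`, would be equally reasonable choices.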

      The authors performed SMRT sequencing of the delta hsdMSR showing a reduction of 6mA. Could they include a description of their results similar to Figures 1-2. How reduced is the 6mA level? Is it everywhere in the genome? Does it affect other methylation marks? This analysis would strengthen their conclusions.

      Yes, we agree. We have provided additional analysis and descriptions to strengthen the conclusions in response to these valuable comments. We determined the sites of the three methylation types in the HsdMSR mutant strain and compared the overlapping genes among these modification patterns. Specifically, we focused on the 6mA sites in Psph WT, the HsdMSR mutant, and the HsdM motif CAGCN<sub>(6)</sub>CTC. As expected, we found that almost all of the 6mA sites lost in the ΔhsdMSR strain were from the motif CAGCN<sub>(6)</sub>CTC. We also noticed that 5mC and 4mC sites in the mutant were relatively similar to those in WT, and the slight difference might be caused by sequencing errors. Overall, we propose that HsdMSR only catalyzes 6mA located on the motif CAGCN<sub>(6)</sub>CTC, and does not affect other 6mA sites or other modification types.

      Please see Figures S4D-E and Lines 212-218 in the revised manuscript.

      In Figure 6E to conclude that methylation is required on both strands, the authors are missing the control CAGCN6CGC construct otherwise the effect could be linked to the A on the complementary strand.

      Thank you for your valuable suggestions. We have provided the control result on the complementary strand. Please see Figure 6C. The new result supports the conclusion that 6mA methylation regulates gene transcription based on methylation on both strands.

      Please see Figure 6C and Lines 329-330 in the revised manuscript.

      Reviewer #2 (Public Review):

      In the present manuscript, Huang et al. revealed the significant roles of the DNA methylome in regulating virulence and metabolism within Pseudomonas syringae, with a particular focus on the HsdMSR system in this model strain. The authors used SMRT-seq to profile the DNA methylation patterns (6mA, 5mC, and 4mC) in three P. syringae strains (Psph, Pss, and Psa) and displayed the conservation among them. They further identified the type I restriction-modification system (HsdMSR) in P. syringae, including its specific motif sequence. The HsdMSR participated in the process of metabolism and virulence (T3SS & Biofilm formation), as demonstrated through RNA-seq analyses. Additionally, the authors revealed the mechanisms of the transcriptional regulation by 6mA. Strictly from the point of view of the interest of the question and the work carried out, this is a worthy and timely study that uses third-generation sequencing technology to characterize the DNA methylation in P. syringae. The experimental approaches were solid, and the results obtained were interesting and provided new information on how epigenetics influences the transcription in P. syringae. The conclusions of this paper are mostly well supported by data, but some aspects of data analysis and discussion need to be clarified and extended.

      Thank you for your positive feedback and recognition of the importance of our study. We appreciate the suggestions for further clarification and extension of some aspects of data analysis and discussion. We added further analysis of the SMRT-seq result of the ΔhsdMSR and overexpressed HsdM in Pst to provide more information on conservation. We added these contents to the discussion in the revised manuscript. Please see Figure 6C and  Figure S5.

      Reviewer #3 (Public Review):

      Summary:

      The article by Huang et al. presents an in-depth study on the role of DNA methylation in regulating virulence and metabolism in Pseudomonas syringae, a model phytopathogenic bacterium. This comprehensive research utilized single-molecule real-time (SMRT) sequencing to profile the DNA methylation landscape across three model pathovars of P. syringae, identifying significant epigenetic mechanisms through the Type-I restriction-modification system (HsdMSR), which includes a conserved sequence motif associated with N6-methyladenine (6mA). The study provides novel insights into the epigenetic mechanisms of P. syringae, expanding the understanding of bacterial pathogenicity and adaptation. The use of SMRT sequencing for methylome profiling, coupled with transcriptomic analysis and in vivo validation, establishes a robust evidence base for the findings.

      Strengths:

      The results are presented clearly, with well-organized figures and tables that effectively illustrate the study's findings.

      Weaknesses:

      It would be helpful to add more details, especially in the methods, which make it easy to evaluate and enhance the manuscript's reproducibility.

      Thank you for the positive evaluation of our study, as well as the constructive feedback provided. We have added more details in methods for RNA-seq analysis and Ribo-seq analysis. Please see Lines 484-515.

      “Briefly, bacteria were cultured to an OD<sub>600</sub> of 0.4, at which point chloramphenicol was added to a final concentration of 100 µg/mL for 2 minutes. Cells were then pelleted and washed with pre-chilled lysis buffer [25 mM Tris-HCl, pH 8.0; 25 mM NH<sub>4</sub>Cl; 10 mM MgOAc; 0.8% Triton X-100; 100 U/mL RNase-free DNase I; 0.3 U/mL Superase-In; 1.55 mM chloramphenicol; and 17 mM GMPPNP]. The pellet was resuspended in lysis buffer, followed by three freeze-thaw cycles using liquid nitrogen. Sodium deoxycholate was then added to a final concentration of 0.3% before centrifugation. The resulting supernatant was adjusted to 25 A260 units and mixed with 2 µL of 500 mM CaCl<sub>2</sub> and 12 µL MNase, making up a total volume of 200 µL. After the digestion, the reaction was quenched with 2.5 µL of 500 mM EGTA. Monosomes were isolated using Sephacryl S400 MicroSpin columns, and RNA was purified using the miRNeasy Mini Kit (Qiagen). rRNA was removed using the NEBNext rRNA Depletion Kit, and the final library was constructed with the NEBNext Small RNA Library Prep Kit. For each sample, ribosome footprint reads were mapped to the Psph 1448A reference genome, and the translational efficiency was calculated by dividing the normalized Ribo-seq counts by the normalized RNA counts. Two biological replicates were performed for all experiments.”
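      The translational efficiency calculation at the end of the quoted methods can be sketched as a per-gene ratio. This is an illustration rather than the authors' pipeline: the quoted text does not specify the normalization, so the inputs are assumed to be already normalized, and the gene names and values are hypothetical.

```python
from typing import Dict, Optional


def translational_efficiency(
    ribo_counts: Dict[str, float], rna_counts: Dict[str, float]
) -> Dict[str, Optional[float]]:
    """Per-gene TE = normalized Ribo-seq count / normalized RNA-seq count.

    Inputs are assumed to be normalized already (the normalization method
    is not stated in the quoted methods). Genes with no RNA signal get
    None instead of a division by zero.
    """
    te: Dict[str, Optional[float]] = {}
    for gene, ribo in ribo_counts.items():
        rna = rna_counts.get(gene, 0.0)
        te[gene] = ribo / rna if rna > 0 else None
    return te


# Hypothetical normalized counts for two placeholder genes
example_te = translational_efficiency(
    {"geneA": 10.0, "geneB": 5.0},
    {"geneA": 5.0, "geneB": 0.0},
)
print(example_te)
```

      Returning None for genes without RNA-seq signal is a design choice of this sketch; filtering such genes out before the division would work equally well.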

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      I would recommend the authors limit their manuscript to Psph pathovar and include statistical analysis supporting their conclusions.

      Thank you for your suggestion.

      Minor

      • L104: "significantly" please add a p-value and explain the analysis.

      Sorry for the confusion. We have added the p-value and explained the analysis in the method section. The p-value used for SMRT-seq was the modification quality value (QV) score, which was used to call the modified bases A (QV=50) and C (QV=100). Please see Lines 452-454.
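      For readers who want to relate these QV thresholds to p-values, the sketch below assumes the usual Phred-style definition, QV = -10 · log10(p). That definition is an assumption here, since the response does not state the formula, and the function names are illustrative.

```python
import math


def qv_to_pvalue(qv: float) -> float:
    """Convert a Phred-style quality value to a p-value (assumed definition)."""
    return 10 ** (-qv / 10)


def pvalue_to_qv(p: float) -> float:
    """Inverse conversion; p must lie in (0, 1]."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


# Thresholds quoted in the response: QV=50 for A, QV=100 for C
for base, qv in (("A", 50), ("C", 100)):
    print(f"{base}: QV={qv} -> p = {qv_to_pvalue(qv):.0e}")
```

      Under this assumption, the stricter QV threshold for cytosine corresponds to a much smaller p-value cutoff than the one applied to adenine.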

      • Figures 1B, D, F, and Figure 2A: make the Venn diagram to scale

      Yes, we have revised.

      • L110-111: missing p-value to say that the authors observe a bigger overlap in Pst than Psph as they observed more modified sites in Pst

      Sorry for the confusion. We described the overlap as bigger in Pst because its value of 17.7 was larger than the value of 15 in Psph. To avoid misunderstanding, we revised the wording to “more genes equipped with all three modification types were detected in Pst than Psph”. Please see Lines 110-111.

      • L112: missing description of their Pss analysis (IDP, sites...)

      We have added the information for Pss in the revised manuscript.

      “Additionally, the methylome atlas of Pss revealed a lower incidence of methylation than those of Psph and Pst, particularly in terms of 6mA modifications, which were only seen in 457 significant 6mA occurrences under the same threshold (IPD > 1.5) and a total of 2,853 and 1,438 methylation sites were detected as 5mC and 4mC, respectively”. Please see Lines 114-116.

      • L118: "modification" to "modified "

      We have revised. Please see Line 119.

      • L120: "modification sites" to "modified nucleotides"

      We have revised. Please see Line 121.

      • L142: correct the title "Methylated genes revealed highly functional conservation among three P. syringae strains" maybe to "Methylated genes are functionally conserved among ..."

      We have revised. Please see Line 142.

      • Figure 2C: not easy to read and interpret

      Sorry for the confusion. Figure 2C shows the significantly enriched functional pathways in the GO and KEGG databases among the three P. syringae strains. The pathway names are listed on the left, and each column of dots indicates the number of genes carrying one type of methylation in one of the three P. syringae strains. The larger the dot, the greater the number of genes.

      We have revised the legend of Figure 2C. Please see Lines 575-579.

      “The dot plot revealed the significantly enriched functional pathways in GO and KEGG databases among three P. syringae strains. The specific names of each pathway were listed on the left, and each column with dots indicated the number of genes within one kind of methylation in one of three P. syringae strains. The size of the dots indicates the number of related genes.”

      • Figure 6B-C: what is the difference between B 24h and C?

      Figure 6B shows the expression difference between the WT and the mutant over 24 hours. Figure 6C showed only the 24-hour time point. To avoid repetition, we have removed Figure 6C.

      • Figure 6C-D: if the same maybe remove Figure 2C

      We have removed Figure 6D.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript could be improved by addressing the following concerns:

      (1) In line 146: How to understand the percentage conserved in "more than two of the strains"?

      Sorry for the confusion. We intended to indicate the genes that were conserved in two or in three strains. We have revised it to: “Notable, about 25% to 45% of methylated genes were conserved in two and three strains”. Please see Line 145.

      (2) In line 178: Five conserved sequence motifs should be replaced by "Six conserved sequence motifs".

      We have revised. Please see Line 176.

      (3) In Figure 2B, specify the C1, C2 and C3. "m6A" should be replaced by "6mA".

      Yes, we have revised.

      (4) In Figure S2, "m6A" should be replaced by "6mA".

      Yes, we have revised.

      (5) In line 212, please add references for the previous studies showing that growth conditions affect bacterial methylation status.

      Thank you for your suggestion. We have added the relevant references (Gonzalez and Collier, 2013), (Krebes et al., 2014), (Sanchez-Romero and Casadesus, 2020).

      (6) In line 217, "illustrate" should be "illustrated".

      Yes, we have revised. Please see Line 210.

      (7) There are some genes colored in grey, revealing bigger differences between the two strains than those related to ribosomal protein, T3SS, and alginate synthesis in Fig. 4A. Do they have important functional roles as well?

      Thank you for your suggestion. A total of 116 genes showed bigger differences (|Log<sub>2</sub>FC| > 2), excluding the genes related to ribosomal protein, T3SS, and alginate synthesis. Among these genes, 31 were annotated as hypothetical proteins and 4 as transcription factors with unknown functions, and the remaining genes mostly encoded metabolism-related enzymes. These enzymes might contribute to the growth defects in ΔhsdMSR. We added this information to the revised manuscript. Please see Lines 249-254.
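The selection described in this response (genes with |Log2FC| > 2, leaving out the ribosomal protein, T3SS, and alginate synthesis genes) can be illustrated with a short filter. This is a sketch under stated assumptions: the function name, the dictionary input, and the gene identifiers in the usage note are hypothetical placeholders, not data from the study.

```python
def strongly_changed(log2fc_by_gene, threshold=2.0, exclude=()):
    """Return sorted gene names whose absolute log2 fold change exceeds threshold.

    `exclude` lists genes that are already discussed separately and should be
    left out of the result (a hypothetical convenience, not the authors' code).
    """
    excluded = set(exclude)
    return sorted(
        gene
        for gene, log2fc in log2fc_by_gene.items()
        if abs(log2fc) > threshold and gene not in excluded
    )
```

For example, `strongly_changed({"geneA": 2.5, "geneB": -3.1, "geneC": 0.4}, exclude=["geneB"])` keeps only `geneA`, because `geneC` falls below the threshold and `geneB` is excluded by name.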

      (8) The authors should discuss what will be the potential signals or factors that can regulate the activity of HsdMSR. In other words, what can decide the extent of methylation through activating or suppressing the expression of HsdMSR?

      Thank you for your valuable suggestion. We have added this point to the Discussion section. Please see Lines 404-415.

      “Apart from the established roles of 6mA and HsdMSR in P. syringae, certain signals or factors may influence HsdMSR expression. For instance, we confirmed that the growth phase affects methylation levels in P. syringae. Previous studies have shown that increased temperatures can reduce methylation levels, as observed in PAO1(Doberenz et al., 2017). These findings suggest that HsdMSR expression may be responsive to both intrinsic cellular states and extrinsic environmental conditions. To further explore potential upstream TFs regulating the expression of HsdMSR, we searched for TF-binding sites in the HsdMSR promoter using our own databases (Fan et al., 2020; Shao et al., 2021; Sun et al., 2024). As a result, we found three candidate TFs (PSPPH_0061, PSPPH_3268, and PSPPH_3504) that might directly bind and regulate HsdMSR expression. Future studies on these TFs and their interactions with the HsdMSR promoter would help clarify the regulatory network governing HsdMSR activity.”

      Reviewer #3 (Recommendations For The Authors):

      (1) Some figures contain dense information, which may be overwhelming for readers. Streamlining the legend for Figure 1 and resizing the Venn diagrams within it could enhance clarity and visual appeal.

      Thank you for your suggestion. We have scaled all the Venn plots in the revised version.

      (2) Incorporating a discussion about the role of the restriction-modification (RM) system in bacterial defense against phage infection into the discussion section could enrich the manuscript's context and relevance.

      Thank you for your valuable suggestion. We have added this point to the Discussion section. Please see Lines 416-427.

      “RM systems are known for their intrinsic role as innate immune systems in anti-phage infection, and present in around 90% of bacterial genomes (Oliveira et al., 2014). RM systems protect bacteria self by recognizing and degrading foreign phage DNA via methylation-specific site and restriction endonucleases (REases) (Loenen et al., 2014). In addition, other phage-resistance systems are similar to RM systems but carry extra genes. One is called the phage growth limitation (Pgl) system, which modifies and cleaves phage DNA. However, the Pgl only modifies the phage DNA in the first infection cycle, and cleaves phage DNA in the subsequent infections, which gives a warn to the neighboring cells (Hampton et al., 2020; Hoskisson et al., 2015). To counteract RM and RM-like systems, phages have evolved strategies, including unusual modifications such as hydroxymethylation, glycosylation, and glucosylation. They can also encode their own MTases to protect their DNA or employ strategies to evade restriction systems and other anti-RM defenses (Iida et al., 1987; Murphy et al., 2013; Vasu and Nagaraja, 2013).”

      (3) In line 152: What is the importance of the mentioned example of Cro/CI family TF?

      Thank you for your comments. Cro and CI are important TFs present in phages. The interaction between Cro and CI affects the immunity status of bacteria in Enterohemorrhagic Escherichia coli (EHEC) strains (Jin et al., 2022). RM systems are known as a kind of phage-defense system, and hence we mentioned this example here. We have added this description to the revised manuscript. Please see Lines 152-154.

      Reference:

      (1) Doberenz, S., Eckweiler, D., Reichert, O., Jensen, V., Bunk, B., Sproer, C., Kordes, A., Frangipani, E., Luong, K., Korlach, J., et al. (2017). Identification of a Pseudomonas aeruginosa PAO1 DNA Methyltransferase, Its Targets, and Physiological Roles. mBio 8. 10.1128/mBio.02312-16.

      (2) Fan, L., Wang, T., Hua, C., Sun, W., Li, X., Grunwald, L., Liu, J., Wu, N., Shao, X., Yin, Y., et al. (2020). A compendium of DNA-binding specificities of transcription factors in Pseudomonas syringae. Nat Commun 11, 4947. 10.1038/s41467-020-18744-7.

      (3) Gonzalez, D., and Collier, J. (2013). DNA methylation by CcrM activates the transcription of two genes required for the division of Caulobacter crescentus. Mol Microbiol 88, 203-218. 10.1111/mmi.12180.

      (4) Hampton, H.G., Watson, B.N., and Fineran, P.C. (2020). The arms race between bacteria and their phage foes. Nature 577, 327-336.

      (5) Hoskisson, P.A., Sumby, P., and Smith, M.C. (2015). The phage growth limitation system in Streptomyces coelicolor A3(2) is a toxin/antitoxin system, comprising enzymes with DNA methyltransferase, protein kinase and ATPase activity. Virology 477, 100-109.

      (6) Iida, S., Streiff, M.B., Bickle, T.A., and Arber, W. (1987). Two DNA antirestriction systems of bacteriophage P1, darA, and darB: characterization of darA− phages. Virology 157, 156-166.

      (7) Jin, M., Chen, J., Zhao, X., Hu, G., Wang, H., Liu, Z., and Chen, W.-H. (2022). An engineered λ phage enables enhanced and strain-specific killing of enterohemorrhagic Escherichia coli. Microbiology Spectrum 10, e01271-01222.

      (8) Krebes, J., Morgan, R.D., Bunk, B., Sproer, C., Luong, K., Parusel, R., Anton, B.P., Konig, C., Josenhans, C., Overmann, J., et al. (2014). The complex methylome of the human gastric pathogen Helicobacter pylori. Nucleic Acids Res 42, 2415-2432. 10.1093/nar/gkt1201.

      (9) Loenen, W.A., Dryden, D.T., Raleigh, E.A., Wilson, G.G., and Murray, N.E. (2014). Highlights of the DNA cutters: a short history of the restriction enzymes. Nucleic Acids Res 42, 3-19.

      (10) Murphy, J., Mahony, J., Ainsworth, S., Nauta, A., and van Sinderen, D. (2013). Bacteriophage orphan DNA methyltransferases: insights from their bacterial origin, function, and occurrence. Appl Environ Microb 79, 7547-7555.

      (11) Oliveira, P.H., Touchon, M., and Rocha, E.P. (2014). The interplay of restriction-modification systems with mobile genetic elements and their prokaryotic hosts. Nucleic Acids Res 42, 10618-10631.

      (12) Sanchez-Romero, M.A., and Casadesus, J. (2020). The bacterial epigenome. Nature reviews. Microbiology 18, 7-20. 10.1038/s41579-019-0286-2.

      (13) Shao, X., Tan, M., Xie, Y., Yao, C., Wang, T., Huang, H., Zhang, Y., Ding, Y., Liu, J., Han, L., et al. (2021). Integrated regulatory network in Pseudomonas syringae reveals dynamics of virulence. Cell Rep 34, 108920. 10.1016/j.celrep.2021.108920.

      (14) Sun, Y., Li, J., Huang, J., Li, S., Li, Y., Lu, B., and Deng, X. (2024). Architecture of genome-wide transcriptional regulatory network reveals dynamic functions and evolutionary trajectories in Pseudomonas syringae. bioRxiv, 2024.01.18.576191.

      (15) Vasu, K., and Nagaraja, V. (2013). Diverse functions of restriction-modification systems in addition to cellular defense. Microbiol Mol Biol Rev 77, 53-72. 10.1128/MMBR.00044-12.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the editors and reviewers for their comments and suggestions on our manuscript. The main point that we wished to convey in this paper was the concept and the kinetic model that enabled the estimation of the nuclear export rate from an image of single mRNAs localised in single cells. By studying the influenza viral transcripts with this model, we report the variation in the mRNA nuclear export rate of the eight viral segments. Of note, the hemagglutinin and neuraminidase mRNAs were the slowest among the eight segments in exiting the nucleus. We agree that the potential mechanism and the biological impact of this observation require further validation, as the reviewers pointed out. We revised our manuscript to describe these points separately (Lines 21-25, Abstract; Lines 86-91, Introduction; Lines 316-320, Results; Lines 372-381, Discussion). We also highlight below the revisions that we made to address the specific points raised by the reviewers.

      Influenza viral transcription

      The authors used specific settings for their virology experiments and several assumptions regarding their mathematical modelling, so it's extremely important that the reader has the viral life cycle clearly understood before immersing themselves in the results. Thus, a detailed explanation of the viral life cycle, including the kinetics of each step, would be extremely helpful if included in the introduction section.  Reviewer #1

      We have included the molecular composition of influenza vRNP and the mechanism of viral transcription in the revised manuscript (Lines 46-53).  

      Line 45: "Eight viral RNA segments are transcribed by the same set of molecular machinery" (Ref. 7). What's known about the arrival of the viral RNA segments in the nucleus? Is it synchronized? The authors will understand that my concern is related to the fact that a differential arrival would indeed impact the transcription and export processes.  Reviewer #1

      The arrival of the eight vRNPs in the nucleus is not synchronised, with each of the eight vRNPs arriving independently (Chou et al. PLOS Pathogens 2013; Lakadamyali et al. PNAS 2003). This does not compromise our model, as our model estimates the export rate of each mRNA species individually (please also see our response under Model assumption below). This is included in the second paragraph of the Discussion section (Lines 390-400).

      Model assumption

      Even though I do not have the expertise to assess the authors' mathematical model, I do not doubt its robustness. Even so, I find some virological concerns related to the set-up of their experiments. According to what I understand, the authors performed non-synchronized 2 h-long infections with the WSN strain of influenza A virus. They did this to avoid cRNA production (and cross-reaction of the probes), which they claim to occur "much later than mRNA synthesis". Then they omit the degradation of the mRNAs for their model without giving an explanation for having done so. So, taking all these into account, it seems to me that too many assumptions are made without a strong argument. I understand that they are made in order to simplify their model, but I strongly consider that the model would gain strength if some of these events were experimentally considered. Thus, would it be possible to perform synchronized infections? Would it be possible to empirically demonstrate that cRNA production does not occur within the first 2 hours of infection and/or separate transcription and replication? Would it be possible to incorporate a degradation inhibitor of the mRNAs into their infections? If all these could be achieved, then the results coming out of the mathematical model would be enormously reinforced.  Reviewer #1

      * The study lacks experimental data that would help support the conclusions. For instance, perturbations are many times used to prove a point related to gene expression. An example for Fig. 2 for such an experiment could be to treat the cells with transcription inhibitors (e.g. DRB, 5,6-dichloro-1-beta-D-ribofuranosylbenzimidazole). Preventing transcription leaves only mature RNAs in the nucleus, and then using this system one can compare the export rate of different RNAs.  Reviewer #2

      We agree that the primary concern in our model was the assumption that mRNA degradation could be omitted. Synchronised infection is not necessary; in fact, non-synchronised infection is preferred, as we explain later in our response. Additionally, the dominance of mRNA production over cRNA production has been documented elsewhere. To address mRNA degradation and validate our model estimation, we performed a time-course measurement using baloxavir. Baloxavir efficiently blocks viral transcription by inhibiting the nuclease activity of PA. DRB, suggested by the reviewer, allows influenza viral transcription and causes viral transcripts to accumulate in the nucleus through unknown mechanisms (Amorim et al. Traffic 2007 and our observation using smFISH, not shown). The additional experiment, now presented in Fig. 5 in the revised manuscript, indicated that mRNA degradation is minimal, and the export rates estimated by our model and by the time-course experiment agreed well for the HA segment. The experiment raised the possibility that the time-course measurement underestimates the export rate of transcripts that exit the nucleus rapidly, such as NP. Real-time imaging of single transcripts would be necessary to measure the true nuclear export rate directly; however, this is beyond the scope of our paper. The new result is now presented in Fig. 5, Supplementary figures 3 and 4, and in the main text (Lines 322-360). Line 286 was also altered to point the reader to Fig. 5. The Materials and Methods section was updated (Lines 478-482).

      We note that our model does not require synchronised infection. Even under synchronised infection, such as incubating cells with the virus at 4°C to facilitate attachment and subsequently shifting to 37°C to allow viral entry, the inherent heterogeneity in vRNP migration to the nucleus remains. This randomness does not compromise our model; rather, our model exploits the random arrival of each vRNP in each cell. This variation, in turn, generates cells carrying varying amounts of transcripts, enabling the estimation of the nuclear export rate. Importantly, more variation ensures a broader distribution of transcript levels, enabling more precise parameter fitting in our model. It is also important to note that our model does not require correlation between segments; it estimates the export rate of each mRNA species individually. These points are explained in the Discussion section (Lines 390-400).

      * There is no concrete value given for the export rates and what they might mean biologically (e.g. time present/stuck in the nucleus) - Fig. 4D. This leaves the reader in the dark.  Reviewer #2

      The export rate lambda (previously denoted as k) in our model (Fig. 4) and the decay constant k in the time-course measurement (Fig. 5) represent the proportion of mRNAs exported from the nucleus in an infinitesimal time interval, defining the nuclear export rate. This has been clarified in the revised manuscript (Lines 314-316), with some alterations to make the use of the parameters easier to follow.

      -  The symbol k previously used in Fig. 4 and the associated equations was consistently replaced with lambda to avoid confusion with the parameter k that is subsequently used for the exponential decay in Fig. 5 in the revised manuscript.

      -  The Greek letter epsilon (previously used to represent export) was replaced with mu, slightly more common for representing the rate of transport.  

      -  The term “velocity” was consistently replaced with “rate” in the context of the nuclear export (Lines 163, 215, 320, 441).  

      -  The phrase “molar concentrations of mRNAs” was corrected to “molecules of mRNAs” (Line 282).

      Also, we have now described our model in two sections of the Results: “Conceiving the model” and “Implementing a kinetic model to estimate the nuclear export rate”. The first section outlines the conceptual framework of the model, and the second focuses on its implementation and the parameter extraction (Lines 227 and 277).
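To make the meaning of a decay constant k concrete, the sketch below fits a single-exponential decline of nuclear transcript counts after transcription is blocked. It is an illustration only: the single-exponential form N(t) = N0·exp(-k·t), the log-linear least-squares fit, and the function names are assumptions made here, and the response does not state which fitting procedure the authors used.

```python
import numpy as np

def fit_decay_constant(times, nuclear_counts):
    """Fit ln N(t) = ln N0 - k*t by least squares and return (k, N0).

    Assumes nuclear counts fall as a single exponential once transcription
    is blocked and that mRNA degradation is negligible.
    """
    t = np.asarray(times, dtype=float)
    log_counts = np.log(np.asarray(nuclear_counts, dtype=float))
    slope, intercept = np.polyfit(t, log_counts, 1)
    return -slope, float(np.exp(intercept))

def half_time(k):
    """Time for the nuclear pool to halve under the exponential assumption."""
    return np.log(2) / k
```

Under this assumption, k has units of inverse time and ln(2)/k gives the time for half of the nuclear transcripts to leave the nucleus, which is one way to translate the rate into a residence time.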

      Applicability of the model

      Lines 27-29. "Our framework presented in this study can be widely used for investigating the nuclear retention of nascent transcripts produced in a transcription burst." In my opinion, this is the strongest point of the manuscript: developing a mathematical model to analyze nuclear export retention as a mechanism of protein expression control, which could lay the foundation for further biological processes. The authors revisit this idea in the Discussion section. However, which would be those processes for which the model could be helpful? I consider that a more conspicuous discussion on this topic would broaden the readers scope, a crucial point under the eLife scope.  Reviewer #1

      * Could this framework be used to quantify the nuclear export rate of cellular RNAs? According to the explanation in the Discussion, it would seem that this approach is limited to quantifying the export rate of influenza RNAs.  Reviewer #2

      Our model is not limited to influenza virus infection. It is applicable to systems where transcription is initiated concurrently, such as when stimuli trigger the activation of a certain set of genes for transcription. This makes it particularly valuable for quantifying the nuclear retention of mRNAs in a transcription burst. This point is reiterated in Lines 383-390.

      Potential mechanisms for differential nuclear export rate of viral segments

      * There is no mechanistic insight in the study. The idea driven by this study is that gene expression is regulated by the RNA export rate. But how is that explained? Is there any molecular pathway or explanation for this model? If the transcripts are ready for export, why do the mRNAs stay inside the nucleus? One option to consider are the export factors. Viral RNAs are exported by different pathways as mentioned (line 362), or by TREX2 (Bhat P et al Nat Comm 2023). The data shows that there is no difference observed in the export rate of different pathways. How about knocking down an important export factor to show how this affects the export rates. Or the opposite, overexpress a certain factor, would this change the nucleus/cytoplasm distribution of the retained RNAs.  Reviewer #2

      As we discussed in the paper, we are beginning to consider that each viral segment has an intrinsic sequence that determines its nuclear export rate, because previous studies on the export factors do not fully explain the variation in the nuclear export rate observed in our study. As the reviewer suggested, a recent study (Bhat et al. Nature Communications 2023) pointed out an internal sequence in the HA segment, which is consistent with our working hypothesis. This point is discussed, and their work (Bhat et al. 2023) has been cited, in the Discussion section of the revised manuscript (Lines 446-449).

      Biological impact of the nuclear retention

      The authors mention several times throughout the manuscript that the virus might use the nuclear retention of mRNA for HA and NA to postpone the expression of these antigenic molecules. At this point, I need to admit that a great question mark appeared in my mind, maybe related to the fact that some knowledge is lacking in my analysis. Lines 328-330: "On the other hand, pushing back the expression of viral antigens HA and NA would be beneficial for the virus to delay the host immune response against the infected cells in which the virus is being replicated." As I tend to understand, the host immune response recognizes HA and NA within the viral particle, if so and independently of the time that HA and Na arrive at the virus assembly step, the progeny' viral particles that are complete and extruded from the cells would be those awakening the host immunity response. If this is right, how would the delayed export of those proteins from the nucleus (and their late expression) be beneficial for delaying the immune response? I would appreciate an explanation for this point, and if I am wrong, then there could exist a relationship between nuclear export rate and the pathogenicity of different strains of influenza A virus. If so, could the authors challenge their model with additional viral strains showing a differential immune response pattern? A deeper analysis in this direction would greatly strengthen the message in their manuscript.  Reviewer #1

      * Is the timing of viral protein appearance in accordance with the time the mRNA is exported to the cytoplasm. It is logical that the first mRNA to go to the cytoplasm would be the first to become a protein. Can the authors show that nuclear retention of mRNA would push back the expression of the viral antigens HA and NA.  Reviewer #2

      Three types of immune reactions are being studied extensively. The first is the humoral immune response, where antibodies target the viral antigens HA and NA on the viral envelope, coating and inactivating the viral particles. The second is the cytotoxic T cell response. There is growing evidence that cytotoxic T cells react against NP, eliciting cross-reaction to a broader range of influenza viral strains. This reaction is not specific to HA and NA, and antigens are processed in the cytoplasm and presented on the MHC. The third is antibody-dependent cellular cytotoxicity (ADCC), where antibodies recognise the viral proteins (HA and NA) on the surface of infected cells, facilitating their elimination by NK cells. Although protein translation may begin as soon as the first mRNA exits the nucleus, the virus may delay the peak of antigen production and therefore postpone the NK-mediated ADCC. This specific point, along with references to ADCC in influenza virus infection, has been clarified in the Discussion section (Lines 377-381).

      Data analysis and presentation

      Lines 99-101. "Viral mRNAs were detected as single diffraction-limited spots in the three-dimensional image stacks, allowing for absolute mRNA quantification (Fig. 1B)". What do the authors mean to say by "absolute mRNA quantification"? Do they refer to the total spots or the total mRNAs? Is it assumed that one spot corresponds to a single mRNA transcript? This is not clear at all for this reviewer, which could be the situation for a potential reader. Since it's the beginning of the story, this should be clearly stated in the manuscript.  Reviewer #1

      Each spot of fluorescent signal corresponds to a single molecule of viral mRNA.  We quantified the absolute number of transcripts in each cell.  This is clarified in the revised manuscript (Lines 104-106).  

      * Line 151: does the baseline change according to the RNA in question? The authors say that the "baseline is defined by the median of the Z distribution of peripheral mRNAs" - it seems that the number 0.731 refers only to one type of RNA (which is not mentioned at all not in the text and not in the legend). Reviewer #2

      The baseline was set using the NP mRNAs in the cytoplasm because the NP mRNA showed the widest distribution across the cytoplasm (Line 157).  

      * Also, what is all the signal that is seen outside the marked cells in Fig. 2B? There seems to be significant background in the field, does this mean much false-positive in the multiplex FISH? If so, then how do the authors know that the staining inside the cells isn't to some degree non-specific? It would be necessary to back this up with some other type of quantitative assay like qRT-PCR.  Reviewer #2

      The cells were removed from the analysis if the cytoplasmic boundary touched any edge of the field-of-view, while the signals were recovered across the entire field-of-view.  This is clarified in the figure legend (Lines 194-195).  
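The exclusion rule described here (dropping any cell whose boundary touches an edge of the field of view) can be sketched on a labelled segmentation mask. This is a minimal sketch under assumptions made for illustration: a 2D integer label image in which 0 is background and each cell has a unique positive label, and hypothetical function names; the authors' actual segmentation pipeline is not described in this response.

```python
import numpy as np

def labels_touching_border(label_image):
    """Return the set of cell labels present on any edge of a 2D label image."""
    lab = np.asarray(label_image)
    border = np.concatenate([lab[0, :], lab[-1, :], lab[:, 0], lab[:, -1]])
    return set(np.unique(border).tolist()) - {0}  # 0 is assumed to be background

def keep_interior_cells(label_image):
    """Return a copy of the label image with border-touching cells set to 0."""
    lab = np.asarray(label_image).copy()
    for cell_id in labels_touching_border(lab):
        lab[lab == cell_id] = 0
    return lab
```

Spots detected outside the retained cell masks would then simply not be assigned to any analysed cell, which fits the explanation that signal was recovered across the whole field while only complete cells were quantified.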

      Others

      * The meaning and explanation for Figure 1H -are unclear. Rephrase and make the legend more reader friendly.  Reviewer #2

      We made alterations to the legend (Lines 132-134) and the relevant lines in the main text (Lines 148-151).  

      * Fig. 2E: Is this the total transcript count or only in the nucleus? Would it be possible to find some correlation between the segments if a pair-wise analysis is performed according to nuclear-cytoplasm distribution?  Reviewer #2

      The total counts are presented.  This is clarified in the legend (Lines 199-200).  

      * Abstract -"A mathematical modelling indicated that the relationship between the nuclear ratio and the total count of mRNAs in single cells is dictated by a proxy for the nuclear export rate." - this sentence is very unclear.  Reviewer #2

      The sentence was removed in the revised manuscript (Line 21).  This removal did not affect the overall meaning in the abstract.  We also made an alteration to Line 279 that contained a similar phrase.  

      * The use of the word "acutely" (lines 16 and 35) is strange.  Reviewer #2

      They have been removed (now Lines 15, 33).  

      * Line 157 - "This result indicates that the velocity of viral mRNA export from the nucleus varies according to the viral segments." - not velocity, maybe timing.  Reviewer #2

      We consistently replaced “velocity” with “rate” (Lines 163, 215, 320, 441).

      * Reference for line 41.  Reviewer #2

      A reference (Waker et al. Trends Microbiol. 2019) has been cited (Line 39).  

      * Reference for lines 105-106.  Reviewer #2

      The gene length of each segment was indicated in the sentence (Line 137).  

      * Line 264- why here is 0.02 M.O.I used compared to line 97 where 2 is used?  Reviewer #2

      We used M.O.I. of 0.02 to allow for spot quantification over longer periods of observation (Lines 269-270).  

      * NS1 is expressed at late infection times and might alter the nuclear export of viral mRNAs (line 352). Need to show that indeed it is not expressed in the experiments done here.  Reviewer #2

      It is not possible to prove definitively that NS1 is not expressed, owing to sensitivity limitations. However, we minimised its impact by investigating at the early time point (Lines 415-416).

      * Line 459- 30% formamide? Is this correct or should it be 10%?  Reviewer #2

      This is correct. The probes used were longer than the others used for smFISH. Therefore, we washed away unbound probes under more stringent conditions.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Giménez-Orenga et al. investigate the origin and pathophysiology of myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) and fibromyalgia (FM). Using RNA microarrays, the authors compare the expression profiles and evaluate the biomarker potential of human endogenous retroviruses (HERV) in these two conditions. Altogether, the authors show that HERV expression is distinct between ME/CFS and FM patients, and HERV dysregulation is associated with higher symptom intensity in ME/CFS. HERV expression in ME/CFS patients is associated with impaired immune function and higher estimated levels of plasma cells and resting CD4 memory T cells. This work provides interesting insights into the pathophysiology of ME/CFS and FM, creating opportunities for several follow-up studies.

      Strengths:

      (1) Overall, the data is convincing and supports the authors' claims. The manuscript is clear and easy to understand, and the methods are generally well-detailed. It was quite enjoyable to read.

      (2) The authors combined several unbiased approaches to analyse HERV expression in ME/CFS and FM. The tools, thresholds, and statistical models used all seem appropriate to answer their biological questions.

      (3) The authors propose an interesting alternative to diagnosing these two conditions. Transcriptomic analysis of blood samples using an RNA microarray could allow a minimally invasive and reproducible way of diagnosing ME/CFS and FM.

      Weaknesses:

      (1) The cohort analysed in this study was phenotyped by a single clinician. As ME/CFS and FM are diagnosed based on unspecific symptoms and are frequently misdiagnosed, this raises the question of whether the results can be generalised to external cohorts.

      Thank you for your comment. Studies of larger cohorts will certainly be needed to determine the external validity of these results in a clinical scenario. However, this pilot study, the first of its kind, was designed to maximize homogeneity across participants, which seemed best ensured by including only female participants, all diagnosed by a single experienced observer.

      (2) The analyses performed to unravel the causes and effects of HERV expression in ME/CFS and FM are solely based on sequencing data. Experimental approaches could be used to validate some of the transcriptomic observations.

      Certainly, experimental approaches may add robustness to our findings. We are in fact considering this avenue to explore the observations presented here in greater depth. However, the limited knowledge of HERV-mediated physiological functions may hinder the task of revealing the causes and effects of HERV expression in ME/CFS and FM in the short term.

      Reviewer #2 (Public review):

      Summary:

      Giménez-Orenga et al. carried out this study to assess whether human endogenous retroviruses (HERVs) could be used to improve the diagnosis of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) and Fibromyalgia (FM). To this end, they used the HERV-V3 array developed previously, to characterize the genome-wide changes in the expression of HERVs in patients suffering from ME/CFS, FM, or both, compared to controls. In turn, they present a useful repertoire of HERVs that might characterize ME/CFS and FM. For the most part, the paper is written in a manner that allows a natural understanding of the workflow and analyses carried out, making it compelling. The figures and additional tables present solid support for the findings. However, some statements made by the authors seem incomplete and would benefit from a more thorough literature review. Overall, this work will be of interest to the medical community seeking a better understanding of the co-occurrence of these pathologies, hinting at a novel angle by integrating HERVs, which are often overlooked, into their assessment.

      Strengths:

      (1) The work is well-presented, allowing the reader to understand the overall workflow and how the specific aims contribute to filling the knowledge gap in the field.

      (2) The analyses carried out to understand the potential impact on gene expression mediated by HERVs are in line with previous works, making it solid and robust in the context of this study.

      Weaknesses:

      (1) The authors claim to obtain genome-wide HERV expression profiles. However, the array used was developed using hg19, while the genomic analyses of this work are carried out using a liftover to hg38. It would improve the statement and findings to include a comparison of the differences in HERVs available in hg38, and how this could impact the "genome-wide" findings.

      This is an important point. However, fewer than 100 of the 1,290,800 probesets were excluded from our analysis for lack of correspondence with hg38, a number we interpreted as insignificant for "genome-wide" claims. This aspect will be detailed in the revised version of this manuscript.

      (2) The authors in some points are not thorough with the cited literature. Two examples are:

      a) Lines 396-397 the authors say "the MLT1, usually found enriched near DE genes (Bogdan et al., 2020)". I checked the work by Bogdan, and they studied bacterial infection. A single work in a specific topic is not sufficient to support the statement that MLT1 is "usually" in close vicinity to differentially expressed genes. More works are needed to support this.

      b) After the previous statement, the authors go on to mention "contributing to the coding of conserved lncRNAs (Ramsay et al., 2017)". First, lnc = long non-coding, so this doesn't make sense. Second, in the work by Ramsay they mention "that contributed a significant amount of sequence to primate lncRNAs whose expression was conserved", which is different from what the authors in this study are trying to convey. Again, additional work and a rephrasing might help to support this idea.

      Certainly, these two sentences need rephrasing to better adjust statements to current evidence and will be replaced in the revised version of this manuscript.

      (3) When presenting the clusters, the authors overlook the fact that cluster 4 is clearly control-specific, and fail to discuss what this means. Could this subset of HERV be used as bona fide markers of healthy individuals in the context of these diseases? Are they associated with DE genes? What could be the impact of such associations?

      Using control DE HERV as bona fide markers of healthy individuals seems like an interesting possibility worth exploring. Control DE HERVs (cluster 4) are indeed associated with DE genes involved in apoptosis, T cell activation and cell-cell adhesion (modules 1 and 6) (Figure 3A). The impact of which deserves further study.

      Appraisals on aims:

      The authors set specific questions and presented the results to successfully answer them. The evidence is solid, with some weaknesses discussed above that will methodologically strengthen the work.

      Likely impact of work on the field:

      This work will be of interest to the medical community looking for novel ways to improve clinical diagnosis. Although future works with a greater population size, and more robust techniques such as RNA-Seq, are needed, this is the first step in presenting a novel way to distinguish these pathologies.

      It would be of great benefit to the community to provide a table/spreadsheet indicating the specific genomic locations of the HERVs specific to each condition. This will allow proper provenance for future researchers interested in expanding on this knowledge, as these genomic coordinates will be independent of the technique used (as was the array used here).

      We agree with the reviewer that sharing the genomic locations of DE HERVs in these pathologies would contribute to further development of our findings. Unfortunately, we do not hold the rights to share probe coordinates from this custom HERV-V3 microarray, which we used under an MTA with its developer.

      Reviewer #3 (Public review):

      The authors find that HERV expression patterns can be used as new criteria for differential diagnosis of FM and ME/CFS and patient subtyping. The data are based on transcriptome analysis by microarray for HERVs using patient blood samples, followed by differential expression of ERVs and bioinformatic analyses. This is a standard and solid data processing pipeline, and the results are well presented and support the authors' claim.

    1. Author response:

      Thank you to the reviewers and editors for their positive and constructive comments. Based on this feedback, we can see that we need to clarify that the primary goal of this paper is a test of potential changes in public health policy rather than a test of technical improvements to forecasting models. We briefly summarize the primary goal below to address these public reviews and list our proposed revisions to the manuscript based on reviewer feedback.

      All real-time forecasting models contend with 2 major constraints:

      (1) How far into the future they have to predict

      (2) How rapidly the data used for predictions become available in real time

      In the case of evolutionary influenza forecasts, the current values of these constraints are 1) 12 months into the future and 2) an average lag of ~3 months for hemagglutinin (HA) sequences to become available after sample collection. Regardless of the predictors we use in these models (genetic or phenotypic), our units of prediction always depend on HA: the HA protein is the primary target of our immunity, HA is the only gene whose composition is determined by the vaccine selection process, and influenza diversity is historically defined by clades in HA phylogenies.

      Our primary goal of this study was to understand the relative effect sizes of these two common constraints on forecasting while holding all other variables as constant as possible. With this understanding, we hoped to better inform public health priorities and set realistic expectations for current and future forecasting efforts regardless of the technical specifications of each forecasting model. In other words, the goal of this study was not to optimize prediction methods but to estimate the effects of potential policy changes on forecast accuracy.

      We found that reducing how far into the future we need to predict consistently reduced our forecasting error in simulated populations (where we knew the true fitness of each virus) and in natural populations (where we either estimated fitness from genetic predictors or we knew the true fitness of each virus based on its future success). Figure 6 and its first supplemental figure show these effect sizes for natural and simulated populations, respectively, when the future fitness of each virus is known at the time of prediction. By definition, we cannot hope to improve our estimates of viral fitness for these forecasts by using other genetic or phenotypic information.

      Figure 6 shows that reducing how far into the future we need to predict from 12 to 6 months improves our forecasting accuracy 3 times as much as reducing the lag between sample collection and HA sequence submission to public databases. The impact of this finding is the confirmation that a faster vaccine development process would improve our forecast accuracy substantially more than faster turnaround between sample collection and sequence submission. If our public health goal is to make better predictions of future influenza populations, then this result indicates that our main priority is to speed up the vaccine development process.

      If our public health goal is to better understand the composition of currently circulating influenza populations (the units of our forecasts), then Figure 3 shows that reducing the lag between sample collection and HA sequence submission from ~3 months on average to 1 month on average reduces our uncertainty in current clade frequency estimates by half. This impact is also independent of the predictors we use in our forecasting models and is not lessened by the lack of other genetic or phenotypic information in our analyses.

      We realize that neither a 6-month vaccine development process nor a 1-month average sequence submission lag exist yet, but we believe that these are realistic and achievable goals for scientific and public health communities. We also realize that these public health goals are not mutually exclusive. By measuring the effects of these realistic changes to current policy through our forecasting experiments, we hope to inspire and motivate researchers and decision-makers who are empowered to make both of these goals a reality.

      Finally, we want to emphasize that the use of phenotypic data in forecasts introduces additional delays caused by the lag between when genetic sequences become available and when serological experiments can be performed. Most WHO influenza collaborating centers use a "sequence-first" approach where they characterize the genetic sequence and use available sequences to prioritize phenotypic experiments with serology. This additional lag in availability of phenotypic data means that a forecasting model based on genetic and phenotypic data will necessarily have a greater lag in data availability than a model based on genetic data only. This lag is important for practical forecasts, too, but because the lag reflects specific characteristics of each collaborating center and not a global policy change, we believe this topic falls outside of the scope of this study.

      Based on these public reviews and the private recommendations from reviewers, we plan to make the following revisions to this manuscript.

      ● Clarify the introduction, discussion, and abstract to emphasize the primary goal for this study to test effects of realistic changes to public health policy and note that this study does not cover improvements to forecasting models. As part of these changes, we will include a rationale for our choice of a genetic-information-only approach rather than a model that integrates phenotypic data. We will also refine Figure 1 to more clearly communicate the two factors we tested in this study.

      ● Provide a clearer explanation for the subsampling approach we use, include supplemental materials to communicate the geographic and temporal biases that exist in available HA sequence data, and discuss potential effects of different subsampling strategies.

      ● Evaluate the robustness of our results to different randomly subsampled data. We will perform additional technical replicates of our analysis workflow for natural populations, and summarize the effects of realistic interventions across replicates in a supplemental figure and the main text of the results.

      ● Investigate time-dependent effects of forecast horizons and submission lags on model accuracy to identify any potential biases in accuracy during specific historical epochs or any seasonal trends in accuracy associated with predicting future populations for the Northern or Southern Hemispheres.

      ● In the discussion, clarify how reducing submission lags would practically improve the WHO's ability to select vaccine candidate viruses and minimize jargon that currently makes the discussion less accessible to the average reader.

      ● Investigate how changes in forecast horizons and submission lags change the distance between predicted and observed future populations at antigenic positions (i.e., "epitope" positions) to understand whether we see the same effects with that subset of positions as we see across all HA positions.

    1. Author Response:

      We greatly appreciate the feedback provided by reviewers on this manuscript. One of our key objectives was to provide a comprehensive, detailed resource for researchers using single-cell transcriptomics to study arthritis, especially immune cells like macrophages. We strived to perform thorough, wide-ranging analyses that are both accessible and useful to other scientists in the field, and that we hope will serve as the basis for many future avenues of study. As such, we acknowledge that this work is a “first step”, providing a strong descriptive foundation with some mechanistic insight that we and others will continue pursuing. Preliminary studies in our laboratory seeking to dissect signaling mechanisms associated with the M-CSF pathway have illuminated how complex and context-dependent this signaling is, which is an important consideration for future in vivo investigations. Further, it is indeed true that attempting to harmonize transcriptomic data across studies, models, laboratories, and dissection/processing methods is fraught with difficulty and prone to misinterpretation – and we made an effort to highlight this in our manuscript, particularly with respect to where synovial immune cells were recovered from, and how. We encourage healthy discussion within the field for developing shared, unified protocols for harvests and processing upstream of transcriptomic experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review)

      Summary:

      The authors wanted to use AlphaFold-multimer (AFm) predictions to reduce the challenge of physics-based protein-protein docking.

      Strengths:

      They found that two features of AFm predictions are very useful. 1) pLDDT is predictive of flexible residues, which they could target for conformational sampling during docking; 2) the interface-pLDDT score is predictive of the quality of AFm predictions, which allows the authors to decide whether to do local or global docking.

      Weaknesses:

      (1) As admitted by the authors, the AFm predictions for the main dataset are undoubtedly biased because these structures were used for AFm training. Could the authors find a way to assess the extent of this bias?

      Indeed, the AFm training set included most of the structures in the DB5 benchmark, as many structures (either unbound or bound) were deposited before the training cut-off date. One of the challenges of estimating this bias is the limited availability of new structures, both bound and unbound, deposited after the training cut-off. Estimating the extent of training bias is therefore conditional on these factors and difficult. A few studies have attempted to address this bias (Yin et al, 2022, https://doi.org/10.1002/pro.4379).

      In our study, we assess this bias by comparing the AFm structures to the bound and unbound forms and calculating their Cα RMSDs and TM-scores (new addition). We now elaborate on this in the Results: Dataset curation section, and we have added a figure comparing the TM-scores in the supplement.

      We added a clarifying text and a note about the TM-score calculation in the manuscript as follows:

      “Since most of the benchmark targets in DB5.5 were included in AlphaFold training, there would be training bias associated with their predictions (i.e. our measured success rates are an upper bound).”

      “We also calculated the TM-scores of the AFm predicted complex structures with respect to the bound and the unbound crystal structures (Supplementary Figure S2). As TM-scores reflect a global comparison between structures and are less sensitive to local structural deviations, no strong conclusions could be derived. This is in agreement with our intuition that since both unbound and bound states of proteins will share a similar fold, and AlphaFold can predict structures with high TM-scores in most cases, gauging the conformational deviations with TM-scores would be inconclusive.”
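
      The insensitivity of TM-score to local deviations can be illustrated with a minimal sketch. It assumes the standard TM-score term, 1/(1 + (d/d0)^2) averaged over the chain with d0 = 1.24(L - 15)^(1/3) - 1.8, evaluated on a fixed superposition (the full TM-score additionally maximizes over superpositions). The 300-residue chain, the 15-residue loop, and the per-residue distances are hypothetical values chosen for illustration, not data from the manuscript.

```python
import math


def tm_d0(length: int) -> float:
    """Length-dependent distance scale d0 of the TM-score (meaningful for length > ~21)."""
    return 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8


def tm_score_fixed_alignment(distances):
    """TM-score term for residues that are already aligned and superimposed.

    Simplification: no search over superpositions is performed, so this is a
    lower bound on the true TM-score for the same pair of structures.
    """
    length = len(distances)
    d0 = tm_d0(length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / length


def ca_rmsd(distances):
    """Root-mean-square deviation over the same per-residue distances (in Å)."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Hypothetical chain: every residue deviates by 0.5 Å ...
rigid = [0.5] * 300
# ... versus the same chain in which a 15-residue loop has moved by 8 Å.
loop_moved = [0.5] * 285 + [8.0] * 15

for name, dists in (("rigid", rigid), ("loop moved", loop_moved)):
    print(f"{name:10s} TM-score={tm_score_fixed_alignment(dists):.3f} "
          f"RMSD={ca_rmsd(dists):.2f} Å")
```

      In this toy case the RMSD grows several-fold while the TM-score stays above 0.95, which is the behaviour the quoted passage describes: a shared fold dominates the score, so binding-induced loop motions barely register.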

      (2) For the CASP15 targets where this bias is absent, the presentation was very brief. In particular, it would be interesting to see how AFm helped with the docking. The authors may even want to do a direct comparison with docking results without the help of AFm.

      Unfortunately, since this was a CASP-CAPRI round, the structures of the unbound antigens and nanobodies were unavailable. Thus, we cannot perform a comparison without using AF2 at all, since we need a structure prediction tool to produce the unbound nanobody and the nanobody-antigen complex template structure to dock. This has been clarified in the main text for the benefit of readers.

      “Since the nanobody-antigen complexes were CASP targets, we did not have unbound structures, rather only the sequences of individual chains. Therefore, for each target, we employed the AlphaRED strategy as described in Fig 7.”

      Reviewer #1 (Recommendations For The Authors):

      For suggestions for major improvements, see comments under weaknesses. One additional suggestion: the authors found that pLDDT is predictive of flexible residues. Can they try to find AFm features that are predictive of the interface site? Such information may guide their docking to a local site.

      This is a great idea that we and others have been thinking about considerably. Prior work by Burke et al. (Towards a structurally resolved human protein interaction network) examines AlphaFold’s ability to predict PPIs. For high-confidence predicted models of interacting protein complexes, the authors showed that pDockQ correlated reasonably well with correct protein interactions.

      That being said, binding site identification, particularly in a partner-agnostic fashion, i.e. determining binding patches on a given protein, is an area of ongoing research. We hope a future study examines AlphaFold3 or ESM3 specifically for this task.

      “Further, we tested multiple thresholds to estimate the optimum cut-off for distinguishing near-native structures (defined as an interface-RMSD < 4 Å) from the predictions. Figure 3.B summarizes the performance with a confusion matrix for the chosen interface-pLDDT cutoff of 85. 79 % of the targets are classified accurately with a precision of 75%, thereby validating the utility of interface-pLDDT as a discriminating metric to rank the docking quality of the AFm complex structure predictions. With AlphaFold3 and ESM3 being released, investigating features that could predict flexible residues or interface site would be valuable, as this information may guide local docking.”

      Minor:

      Page 3, lines 73-77, state how many targets were curated from DB5.5.

      We have now clarified this in the manuscript. All 254 targets were curated from DB5.5 at the time of this benchmark study.

      “For each protein target, we extracted the amino acid sequences from the bound structure and predicted a corresponding three-dimensional complex structure with the ColabFold implementation of the AlphaFold multimer v2.3.0 (released in March 2023) for the 254 benchmark targets from DB5.5.”

      In Figure 1, the color used for medium is too difficult to distinguish from the grey color used for rigid.

      We thank you for this suggestion. We have updated the color to olive. Further, based on Reviewer 2’s suggestions, we have moved this plot to the Supplementary.

      Reviewer #2 (Public Review):

      Summary:

      In short, this paper uses a previously published method, ReplicaDock, to improve predictions from AlphaFold-multimer. The method generated about 25% more acceptable predictions than AFm, but more important is improving an Antibody-antigen set, where more than 50% of the models become improved.

      When looking at the results in more detail, it is clear that for the models where the AFm models are good, the improvement is modest (or not at all). See, for instance, the blue dots in Figure 6. However, in the cases where AFm fails, the improvement is substantial (red dots in Figure 6), but no models reach a very high accuracy (Fnat ~0.5 compared to 0.8 for the good AFm models). So the paper could be summarized by claiming, "We apply ReplicaDock when AFm fails", instead of trying to sell the paper as an utterly novel pipeline. I must also say that I am surprised by the excellent performance of ReplicaDock - it seems to be a significant step ahead of other (not AlphaFold) docking methods, and from reading the original paper, that was unclear. Having a better benchmark of it alone (without AFm) would be very interesting.

      We thank the reviewer for highlighting the performance of ReplicaDock. ReplicaDock alone is benchmarked in the original paper (10.1371/journal.pcbi.1010124), with full details on the 2022 version of DB5.5 in the supplement. Indeed, ReplicaDock2 achieves the highest success rates on flexible docking targets reported in the literature (until this AlphaRED paper!).

      Regarding this statement about “the paper could be summarized…” it might be helpful to give more context. ReplicaDock is a replica exchange Monte Carlo sampling approach for protein docking that incorporates flexibility in an induced-fit fashion. However, the choice of which backbone residues to move is solely dependent on contacts made during each docking trajectory. In the last section of the ReplicaDock paper, we introduced “Directed Induced-fit” where we biased the backbone sampling only towards those residues where we knew the backbone is flexible (this information is obtained because for the benchmark set, we had both unbound and bound structures and hence could cherry-pick the specific residues which are mobile). We agree with the reviewers that AlphaRED is essentially a derivative of ReplicaDock, however, the two major claims that we make in this paper are:

      (1) AlphaFold pLDDT is an effective predictor of backbone flexibility for practical use in docking.

      (2) We can automate the Directed Induced-fit approach within ReplicaDock by utilizing this pLDDT information per residue for conformational sampling in protein docking; and in doing so, create a pipeline that would allow us to go from sequence-to-structure-to-complex, specifically capturing conformational changes.
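
      As a rough illustration of how per-residue pLDDT could be turned into a set of residues for directed backbone sampling, the sketch below reads pLDDT from the B-factor column of a PDB file, which is where AlphaFold writes it. The cutoff of 80, the helper names, and the three-residue example are assumptions made for illustration; this is not the AlphaRED implementation, and the manuscript's actual flexibility criterion is not stated in this response.

```python
def low_plddt_residues(pdb_text, cutoff=80.0):
    """Return (chain, residue number, pLDDT) for CA atoms with pLDDT below `cutoff`.

    Assumes an AlphaFold-style PDB file, in which the B-factor field
    (columns 61-66) holds the per-residue pLDDT. The default cutoff is a
    hypothetical value, not one taken from the manuscript.
    """
    flexible = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resseq = int(line[22:26])
            plddt = float(line[60:66])
            if plddt < cutoff:
                flexible.append((chain, resseq, plddt))
    return flexible


def make_ca_line(serial, chain, resseq, plddt):
    """Build a fixed-column PDB ATOM record for a CA atom (demo data only)."""
    return ("ATOM  {:>5d}  CA  ALA {}{:>4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
            .format(serial, chain, resseq, 0.0, 0.0, 0.0, 1.0, plddt))


# Hypothetical three-residue chain with one confident and two uncertain residues.
example_pdb = "\n".join([
    make_ca_line(1, "A", 1, 95.1),
    make_ca_line(2, "A", 2, 62.3),
    make_ca_line(3, "A", 3, 78.9),
])

for chain, resseq, plddt in low_plddt_residues(example_pdb):
    print(f"candidate flexible residue {chain}{resseq} (pLDDT {plddt})")
```

      Selecting residues from the prediction itself is what removes the manual step described above, where mobile residues had to be picked by comparing bound and unbound crystal structures.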

      To conclude these claims, we pose the following questions in the Introduction:

      “(1) Do the residue-specific estimates from AF/AFm relate to potential metrics demonstrating conformational flexibility?

      (2) Can AF/AFm metrics deduce information about docking accuracy?

      (3) Can we create a docking pipeline for in-silico complex structure prediction incorporating AFm to convert sequence-to-structure-to-docked complexes?”

      This work requires a pipeline, the center of which lies in ReplicaDock as a docking method, but has functionalities that were absent in prior work. The goal is also to develop a one-stop shop without manual intervention (a prerequisite for biasing backbone sampling in ReplicaDock) that could be utilized by structural biologists efficiently.

      We clarify these points in the abstract and main text as follows:

      Abstract: “In this work, we combine AlphaFold as a structural template generator with a physics-based replica exchange docking algorithm to better sample conformational changes.”

      Introduction:

      “The overarching goal is to create a one-stop, fully-automated pipeline for simple, reproducible, and accurate modeling of protein complexes. We investigate the aforementioned questions and create a protocol to resolve AFm failures and capture binding-induced conformational changes. We first assess the utility of AFm confidence metrics to detect conformational flexibility and binding site confidence.”

      These results also highlight several questions I try to describe in the weakness section below. In short, they boil down to the fact that the authors must show how good/bad ReplicaDock is at all targets (not only the ones where AFm fails). In addition, I have several more technical comments.

      Strengths:

      Impressive increase in performance on AB-AG set (although a small set and no proteins).

      We thank the reviewer for their comments.

      Weaknesses:

      The presentation is a bit hard to follow. The authors mix several measures (Fnat, iRMS, RMSDbound, etc). In addition, it is not always clear what is shown. For instance, in Figure 1, is the RMSD calculated for a single chain or the entire protein? I would suggest that the author replace all these measures with two: TM-score when evaluating the quality of a single chain and DockQ when evaluating the results for docking. This would provide a clearer picture of the performance. This applies to most figures and tables.

      We apologize for the lack of clarity owing to the different metrics. iRMSD and fnat are standard performance metrics in the docking field, but we agree that DockQ would be simpler when the details of the other metrics are not required. We have updated Figure 5 and Figure 8 to also show DockQ comparisons.

      Regarding Figure 1, as highlighted in Line 90 of the main text, “Figure 1 shows the Cα-RMSD of all protein partners of the AFm predicted complex structures with respect to the bound and the unbound.” As suggested by the reviewer in their further comments, we have moved this figure to the Supplementary. We have also included a TM-score comparison in the Supplementary (Supplementary Figure S2) and included clarifying statements in the main text:

      “We also tested TM-scores to measure the structural deviations of the AFm predicted complex structures with respect to the bound and unbound structures (Supplementary Figure S2). However, this metric is not sensitive enough to detect the subtle, local conformational changes upon binding.”

      For instance, Figure 9 could be shown as a distribution of DockQ scores.

      We have now updated Figure 5 to include DockQ scores in Panel D. Since DockQ is a function of iRMSD, fnat and L-RMSD, it shows the cumulative improvement in performance. However, it hides some of the nuanced details: for example, that the protocol improves iRMSD considerably while the improvement in fnat lags, which can indicate whether the remaining challenge is backbone sampling or side-chain refinement. Therefore, we retain the iRMSD and fnat metrics in panels A-C. We have incorporated DockQ in the main text as follows:

      “Finally, to evaluate docking success rates, we calculate DockQ for top predictions from AFm and AlphaRED respectively (Figure 5D). AlphaRED demonstrates a success rate (DockQ>0.23) for 63% of the benchmark targets. Particularly for Ab-Ag complexes, AFm predicted acceptable or better quality docked structures in only 20% of the 67 targets. In contrast, the AlphaRED pipeline succeeds in 43% of the targets, a significant improvement.”
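
      For readers less familiar with DockQ, the following minimal sketch shows how the score combines the three metrics and how the 0.23 success threshold arises. It assumes the published DockQ definition (the mean of fnat and two scaled RMSD terms, with scaling constants 1.5 Å for iRMSD and 8.5 Å for L-RMSD) and the usual CAPRI-style bands at 0.23, 0.49 and 0.80. The function names and input values are hypothetical, not results from the manuscript.

```python
def scaled_rmsd(rmsd, d):
    """Map an RMSD in Å onto (0, 1]; 1.0 means no deviation."""
    return 1.0 / (1.0 + (rmsd / d) ** 2)


def dockq(fnat, irmsd, lrmsd, d_interface=1.5, d_ligand=8.5):
    """DockQ as the mean of fnat, scaled iRMSD and scaled L-RMSD."""
    return (fnat + scaled_rmsd(irmsd, d_interface) + scaled_rmsd(lrmsd, d_ligand)) / 3.0


def quality_class(score):
    """CAPRI-style quality band for a DockQ score."""
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"


# Hypothetical models: (fnat, iRMSD in Å, L-RMSD in Å).
models = {
    "good backbone, few native contacts": (0.10, 2.0, 6.0),
    "good backbone, many native contacts": (0.50, 2.0, 5.0),
    "far from native": (0.02, 12.0, 30.0),
}

for label, (fnat, irmsd, lrmsd) in models.items():
    score = dockq(fnat, irmsd, lrmsd)
    print(f"{label}: DockQ={score:.2f} ({quality_class(score)})")
```

      The first two hypothetical models share the same iRMSD but differ in fnat, which is the kind of distinction a single DockQ value compresses and the separate iRMSD and fnat panels keep visible.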

      Further, we have reevaluated success rates in Figure 8 (previously Figure 9) and have updated the manuscript to report these updated success rates.

      “By utilizing the AlphaRED strategy, we show that failure cases in AFm predicted models are improved for all targets (lower Irms for 97 of 254 failed targets) with CAPRI acceptable-quality or better models generated for 62% of targets overall (Fig 8)”.

      The improvements on the models where AFm is good are minimal (if at all), and it is unclear how global docking would perform on these targets, nor exactly why the plDDT<0.85 cutoff was chosen.

      We agree with the reviewers that the improvement on the models with good AFm predictions is minimal. We acknowledge this in the text now as follows:

      “Most of the improvements in the success rates are for cases where AFm predictions are worse. For targets with good AFm predictions, AlphaRED refinement results in minimal improvements in docking accuracy.”

      The choice of pLDDT cutoff = 85 is elaborated in the “Interface-pLDDT correlates with DockQ and discriminates poorly docked structures” section, paragraph 3. Briefly, we tested multiple metrics and the interface pLDDT had the highest AUC, indicating that it is the best metric for this task. For interface-pLDDT we tested multiple thresholds, and the cutoff of 85 resulted in the highest percentage of true-positive and true-negative rates. This is illustrated with the confusion matrix in Figure 3.B with the precision scores. We now clarify this in the text as follows:

      “With interface-pLDDT as a discriminating metric, we tested multiple thresholds to estimate the optimum cut-off for distinguishing near-native structures (defined as an interface-RMSD < 4 Å) from the predictions. Figure 3B summarizes the performance with a confusion matrix for the chosen interface-pLDDT cutoff of 85. 79% of the targets are classified accurately with a precision of 75%, thereby validating the utility of interface-pLDDT as a discriminating metric to rank the docking quality of the AFm complex structure predictions.”
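
      The threshold scan described above can be sketched as follows. The near-native definition (interface-RMSD < 4 Å) comes from the quoted text; the function names, the eight (interface-pLDDT, interface-RMSD) pairs, and the list of candidate cutoffs are hypothetical and serve only to show how accuracy and precision are read from the confusion matrix at each cutoff.

```python
def confusion(samples, cutoff, native_irmsd=4.0):
    """Count TP, FP, TN, FN when 'interface-pLDDT >= cutoff' predicts near-native.

    `samples` is a list of (interface_plddt, interface_rmsd) pairs; a model is
    truly near-native when its interface-RMSD is below `native_irmsd` Å.
    """
    tp = fp = tn = fn = 0
    for plddt, irmsd in samples:
        predicted = plddt >= cutoff
        actual = irmsd < native_irmsd
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy_precision(samples, cutoff):
    """Fraction classified correctly, and precision of the near-native calls."""
    tp, fp, tn, fn = confusion(samples, cutoff)
    accuracy = (tp + tn) / len(samples)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return accuracy, precision


def best_cutoff(samples, cutoffs):
    """Cutoff with the highest accuracy (first one wins on ties)."""
    return max(cutoffs, key=lambda c: accuracy_precision(samples, c)[0])


# Hypothetical (interface-pLDDT, interface-RMSD in Å) pairs.
toy_samples = [
    (92, 1.5), (88, 2.0), (86, 3.5), (87, 6.0),
    (80, 5.0), (70, 9.0), (83, 5.0), (60, 12.0),
]
candidate_cutoffs = [70, 75, 80, 85, 90]

for c in candidate_cutoffs:
    acc, prec = accuracy_precision(toy_samples, c)
    print(f"cutoff {c}: accuracy={acc:.3f}, precision={prec:.3f}")
print("selected cutoff:", best_cutoff(toy_samples, candidate_cutoffs))
```

      Maximizing accuracy is one reasonable reading of "highest percentage of true-positive and true-negative rates"; other selection criteria, such as the Youden index, could be substituted in `best_cutoff`.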

      To better understand the performance of ReplicaDock, the authors should therefore (i) run global and local docking on all targets and report the results, (ii) report the results if AlphaFold (not multimer) models of the chains were used as input to ReplicaDock (I would assume it is similar). These models can be downloaded from AlphaFoldDB.

      The performance of ReplicaDock on DB5.5 is tabulated in our prior work (https://doi.org/10.1371/journal.pcbi.1010124) and we direct the reviewers there for the detailed performance and results. In our opinion, the benchmark suggested by the reviewer would be redundant and not worth the computational expense.

      The scope of this paper is to highlight a structure prediction + physics-based modeling pipeline for docking to adapt to the accuracy of up-and-coming structure prediction tools.

      Using AlphaFold monomer chains as input and benchmarking on that, albeit interesting scientifically, will not be useful for either the pipeline or biologists who would want a complex structure prediction. We thank the reviewers for their comments but want to reemphasize that the end goal of this work is to increase the accuracy of complex structure predictions and PPIs obtained from computational tools.

      Further, it would be interesting to see if ReplicaDock could be combined with AFsample (or any other model to generate structural diversity) to improve performance further.

      We would like to highlight that ReplicaDock is a stand-alone tool for protein docking, and here we demonstrate that it can be adapted to use metrics such as pLDDT, derived from AlphaFold or other structure prediction tools (say ESMFold), for conformational sampling and improving docking accuracy. We definitely agree that adapting it for use with tools such as AFsample will be interesting, but it is outside the scope of this work.

      The estimates of computing costs for the AFsample are incorrect (check what is presented in their paper). What are the computational costs for ReplicaDock global docking?

      The authors of the AFsample paper report that “AFsample requires more computational time than AF2, as it generates 240 models, and including the extra recycles, the overall timing is 1000 more costly than the baseline.” We have reported these exact numbers in our manuscript.

      The computational costs of ReplicaDock are 8-72 CPU hours on a single node with 24 processors as reported in our prior work.

      For AlphaRED, the costs are slightly higher owing to the structure prediction module in the beginning and are up to 100 CPU hrs for our largest (max Nres) target.

      It is unclear strictly what sequences were used as input to the modelling. The authors should use full-length UniProt sequences if they were not done.

      We report this in the methods section of the manuscript as well as in Figure 5. Full-length complex sequences were used for the models that we extracted from DB5.5.

      “As illustrated in Fig. 5, given a sequence of a protein complex, we use the ColabFold implementation of AF2-multimer to obtain a predictive template.”

      We clarify this in the methods section as:

      “For each target in the DB5.5 dataset, we first extracted the corresponding FASTA sequence for the bound complex and then obtained AlphaFold predicted models with the ColabFold v1.5.2 implementation of AlphaFold and AlphaFold-multimer (v.2.3.0).”

      The antibody-antigen dataset is small. It could easily be expanded to thousands of proteins. It would be interesting to know the performance of ReplicaDock on a more extensive set of Antibodies and nanobodies.

      This work demonstrates performance on the docking benchmark, i.e., given the unbound structures, can one predict the bound complex? In this regard, our analysis has focused on targets for which both the unbound and bound structures are available, so that we could evaluate the ability of AlphaRED to model protein flexibility and its docking accuracy. For antibody-antigen complexes, there are only 67 structures with both unbound and bound complexes available, and these constituted our dataset. Benchmarking AlphaRED on all antibody-antigen targets can give biased results, as most Ab-Ag complexes are in the AlphaFold training set. Further, our work is aimed at predicting conformational flexibility in docking rather than rigid-body docked complexes, so benchmarking on existing bound Ab-Ag structures is out of scope for this work.

      Using pLDDT on the interface region to identify good/bad models is likely suboptimal. It was acceptable (as a part of the score) for AlphaFold-2.0 (monomer), but AFm behaves differently. Here, AFm provides a direct score to evaluate the quality of the interaction (ipTM or Ranking Confidence). The authors should use these to separate good/bad models (for global/local docking), or at least show that these scores are less good than the one they used.

      We thank the reviewers for this suggestion.

      Reviewer #2 (Recommendations For The Authors):

      Some Figures could be skipped/improved

      Fig 1: Use TM-score instead, a much better measure (and the figure is not necessary).

      Figure 1 compares the bias of AlphaFold towards unbound or bound forms of the proteins. We believe that this figure highlights the slight inherent bias of AlphaFold towards bound structures over unbound.

      As the reviewers have suggested we have included a plot comparing the TM-scores for the structures. Further, we have moved this figure to the Supplementary.

      Fig 2. Skip B (why compare RMSD with pLDDT?). Add a figure to see how this correlates over all targets not just two.

      RMSD and LDDT are both metrics for evaluating conformational variability between two structures, such as the bound and unbound forms of the same protein. Whereas RMSD measures the overall deviation of residues, LDDT allows the estimation of relative domain orientations and concerted proteins. We have elaborated on this in Methods as well as in the Results section titled “AlphaFold pLDDT provides a predictive confidence measure for backbone flexibility”.

      The data for the benchmark targets is now included in the Supplementary (Supplementary Figures S3-S4).

      Fig 3. Color the different chains of a protein differently. Thereby the Receptor/Ligand/Bound labels can be omitted.

      We thank the reviewers for this suggestion. However, the color scheme is chosen to highlight (1) the relative orientation of protein partners relative to each other. We have ensured that the alignment is over one partner (Receptor) so that you could see the relative orientation of the other partner (Ligand) in the modeled protein over the bound structure (in one color). (2) The coloring of the receptor and ligand chain is by pLDDT (from red to blue) to highlight that for decoys with incorrectly predicted interfaces, the pLDDT scores of the interface residues are indeed lower and can be a discriminating metric. We elaborate this in the caption of Figure 3 as well as in the section “Interface-pLDDT correlates with DockQ and discriminates poorly docked structures”. Coloring the chains of a protein differently will obfuscate the point that we are aiming to make and will be inconclusive for the readers as they would need to rely only on quantitative metrics (Irms and DockQ) reported but won’t be able to visualize the interface pLDDT of the incorrectly bound structures. We hope that this justifies the choice of our color scheme.

      Fig 4. Include RankConf, ipTM, pDockQ, and other measures in the plots (they are likely better). Include DockQ for the top targets. It is difficult to estimate for multi chain complexes.

      We thank the reviewer for this suggestion. We have now included the DockQ performances for all targets in Figure 5 (previously Figure 6) as well as re-evaluated our final success rates based on the DockQ calculations in Figure 8 (previously Figure 9).

      Fig 5. use a better measure to split (see above).

      We have elaborated on the choice of the split for the comments above and the interface pLDDT threshold of 85 is a decision made post observation on the docking benchmark. We do want to highlight that the cut-off is arbitrary and in our online server (ROSIE) as well as in custom scripts, this cut-off can be tuned by the user as required. We would suggest a cut-off of 85 based on our observations but the users are welcome to tune this as per their needs.

      Fig 6. Replace lrms/fnat with DockQ.

      We have now included DockQ scores in our manuscript.

      Fig 7. Color the different chains of a protein differently.

      We have colored the protein chains differently. AlphaFold models are in Orange, Bound complexes are in Gray, and predicted proteins from AlphaRED are in Blue-Green indicating the two partners. All models are aligned over the receptor so relative orientations of the ligand protein can be observed.

      Fig 8 Color the different chains of a protein differently.

      The chains are colored differently. We would like the reviewer to elaborate more on what they would like to observe as we believe our color scheme makes intuitive sense for readers.

      Fig 9. Use DockQ instead of CAPRI criteria.

      The figure has been updated based on DockQ. To elaborate, the CAPRI criteria are set based on DockQ scores, as explained in the figure caption.
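      For readers unfamiliar with how DockQ maps onto CAPRI-style quality classes, the sketch below uses the thresholds commonly cited from the original DockQ publication (0.23, 0.49, 0.80). These are an assumption here: the figure caption in the manuscript is the authoritative source for the cut-offs actually used, and the function names are illustrative only.

```python
from typing import Sequence


def capri_class_from_dockq(dockq: float) -> str:
    """Map a DockQ score in [0, 1] to a CAPRI-style quality class.

    Thresholds are the commonly cited DockQ defaults (assumed, not taken
    from the manuscript): >= 0.80 high, >= 0.49 medium, >= 0.23 acceptable.
    """
    if not 0.0 <= dockq <= 1.0:
        raise ValueError("DockQ must lie in [0, 1]")
    if dockq >= 0.80:
        return "high"
    if dockq >= 0.49:
        return "medium"
    if dockq >= 0.23:
        return "acceptable"
    return "incorrect"


def success_rate(dockq_scores: Sequence[float]) -> float:
    """Fraction of targets with at least acceptable quality."""
    if not dockq_scores:
        raise ValueError("need at least one DockQ score")
    successes = sum(capri_class_from_dockq(s) != "incorrect" for s in dockq_scores)
    return successes / len(dockq_scores)


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    scores = [0.10, 0.30, 0.55, 0.85]
    print([capri_class_from_dockq(s) for s in scores])
    print("success rate:", success_rate(scores))
```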

    1. Author response:

      eLife Assessment

      This manuscript reports important findings that the methyltransferase METTL3 is involved in the repair of abasic sites and uracil in DNA, mediating resistance to floxuridine-driven cytotoxicity. The presented evidence for the involvement of m6A in DNA is incomplete and requires further validation with orthogonal approaches to conclusively show the presence of 6mA in the DNA and exclude that the source is RNA or bacterial contamination.

      We thank the editors for recognizing the importance of our work and the relevance of METTL3 in DNA repair. However, we wholly disagree with the second sentence in the eLife assessment, and we want to clarify why our evidence for the involvement of 6mA in DNA is complete.  

      The identification of 6mA in DNA, upon DNA damage, is based first on immunofluorescence observations using an anti-m6A antibody. In this setting, removal of RNA with RNase treatment fails to reduce the 6mA signal, excluding the possibility that the source of signal is RNA. In contrast, removal of DNA with DNase treatment removes all 6mA signal, strongly suggesting that the species carrying the N6-methyladenosine modification is DNA (Figure 3D, E). Importantly, in Figure 3F, we provide orthogonal, quantitative mass spectrometry data that independently confirm this finding. Mass spectrometry-liquid chromatography of DNA analytes conclusively shows the presence of 6mA in DNA upon treatment with DNA damaging agents and excludes RNA as the source, based on exact mass. Reviewer #2 recognized the strengths of this approach to generate solid evidence for 6mA in DNA.

      Cells only show the 6mA signal when treated with DNA damaging agents, and the 6mA is absent from untreated cells (Figure 3D, E, F). This provides strong evidence that the 6mA signal is not a result of bacterial contamination in our cell lines. Moreover, our cell lines are routinely tested for mycoplasma contamination. It could be possible that stock solutions of DNA damaging agents may be contaminated, but this would need to be true for all individual drugs and stocks tested. The data showing 6mA signal is not significantly different from untreated cells when a DNA damaging agent is combined with a METTL3 inhibitor (Figure 3G, H) provides strong evidence against bacterial contamination in our stocks.  

      In summary, we provide conclusive evidence, based on orthogonal methods, that the METTL3-dependent N6-methyladenosine modification is deposited in DNA, not RNA, in response to DNA damage. 

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors sought to identify unknown factors involved in the repair of uracil in DNA through a CRISPER knockout screen. 

      Typo above: “CRISPER” should be “CRISPR”.

      Strengths: 

      The screen identified both known and unknown proteins involved in DNA repair resulting from uracil or modified uracil base incorporation into DNA. The conclusion is that the protein activity of METTL3, which converts A nucleotides to 5mA nucleotides, plays a role in the DNA damage/repair response. The importance of METTL3 in DNA repair, and its colocalization with a known DNA repair enzyme, UNG2, is well characterized. 

      Typo above: “5mA” should be “6mA”.

      Weaknesses:

      This reviewer identified no major weaknesses in this study. The manuscript could be improved by tightening the text throughout, and more accurate and consistent word choice around the origin of U and 6mA in DNA. The dUTP nucleotide is misincorporated into DNA, and 6mA is formed by methylation of the A base present in DNA. Using words like 6mA "deposition in DNA" seems to imply it results from incorporation of a methylated dATP nucleotide during DNA synthesis.

      The increased presence of 6mA during DNA damage could result from methylation at the A base itself (within DNA) or from incorporation of pre-modified 6mA during DNA synthesis. Our data do not directly discriminate between these two mechanisms, and we will clarify this point in the discussion.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors performed a CRISPR knockout screen in the presence of floxuridine, a chemotherapeutic agent that incorporates uracil and fluoro-uracil into DNA, and identified unexpected factors, such as the RNA m6A methyltransferase METTL3, as required to overcome floxuridine-driven cytotoxicity in mammalian cells. Interestingly, the observed N6-methyladenosine was embedded in DNA, which has been reported as DNA 6mA in mammalian genomes and is currently confirmed with mass spectrometry in this model. Therefore, this work consolidated the functional role of mammalian genomic DNA 6mA, and supported with solid evidence to uncover the METTL3-6mA-UNG2 axis in response to DNA base damage.

      Strengths:

      In this work, the authors took an unbiased, genome-wide CRISPR approach to identify novel factors involved in uracil repair with potential clinical interest.

      The authors designed elegant experiments to confirm the METTL3 works through genomic DNA, adding the methylation into DNA (6mA) but not the RNA (m6A), in this base damage repair context. The authors employ different enzymes, such as RNase A, RNase H, DNase, and liquid chromatography coupled to tandem mass spectrometry to validate that METTL3 deposits 6mA in DNA in response to agents that increase genomic uracil.

      They also have the Mettl3-KO and the METTL3 inhibition results to support their conclusion.

      Weaknesses:

      Although this study demonstrates that METTL3-dependent 6mA deposition in DNA is functionally relevant to DNA damage repair in mammalian cells, there are still several concerns and issues that need to be improved to strengthen this research.

      First, in the whole paper, the authors never claim or mention the mammalian cell lines contamination testing result, which is the fundamental assay that has to be done for the mammalian cell lines DNA 6mA study.

      Our cell lines are routinely tested for bacterial contamination, specifically mycoplasma, and we plan to state this information in a revised version of the manuscript.

      Importantly, we do not observe 6mA in untreated cells, strongly suggesting that the 6mA signal observed is dependent on the presence of DNA damage and not caused by contamination in the cell lines (Figure 3D, E, F). While it could be possible that stock solutions of DNA damaging agents may be contaminated, this would need to be the case for all individual drugs and stocks tested that induce 6mA, which seems very unlikely. Finally, the data showing 6mA signal is not significantly different from untreated cells when a DNA damaging agent is combined with a METTL3 inhibitor (Figure 3 G, H) provides strong evidence against bacterial contamination in our drug stocks.

      Second, in the whole work, the authors have not supplied any genomic sequencing data to support their conclusions. Although the sequencing of DNA 6mA in mammalian models is challenging, recent breakthroughs in sequencing techniques, such as DR-Seq or NT/NAME-seq, have lowered the bar and improved a lot in the 6mA sequencing assay. Therefore, the authors should consider employing the sequencing methods to further confirm the functional role of 6mA in base repair.

      While we agree that it could be important to understand the precise genomic location of 6mA in relation to DNA damage, this is outside the scope of the current study. Moreover, this exercise may prove unproductive. If 6mA is enriched in DNA at damage sites or as DNA is replicated, the genomic mapping of 6mA is likely to be stochastic. If stochastic, it would be impossible to obtain the read depth necessary to map 6mA accurately.

      Third, the authors used the METTL3 inhibitor and Mettl3-KO to validate the METTL3-6mA-UNG2 functional roles. However, the catalytic mutant and rescue of Mettl3 may be the further experiments to confirm the conclusion. 

      We believe this to be an excellent suggestion from Reviewer #2 but we are unable to perform the proposed experiment at this time. We encourage future studies to explore the rescue experiment.

      Reviewer #3 (Public review):

      Summary:

      The authors are showing evidence that they claim establishes the controversial epigenetic mark, DNA 6mA, as promoting genome stability.

      Strengths:

      The identification of a poorly understood protein, METTL3, and its subsequent characterization in DDR is of high quality and interesting.

      Weaknesses:

      (1) The very presence of 6mA (DNA) in mammalian DNA is still highly controversial and numerous studies have been conclusively shown to have reported the presence of 6mA due to technical artifacts and bacterial contamination. Thus, to my knowledge there is no clear evidence for 6mA as an epigenetic mark in mammals, and consequently, no evidence of writers and readers of 6mA. None of this is mentioned in the introduction. Much of the introduction can be reduced, but a paragraph clearly stating the controversy and lack of evidence for 6mA in mammals needs to be added, otherwise, the reader is given an entirely distorted view of the field.

      These concerns must also be clearly in the limitations section and even in the results section which fails to nuance the authors' findings.

      We agree with the reviewer that the presence and potential function of 6mA in mammalian DNA has been debated. Importantly, the debate regarding the presence and quantity of 6mA in DNA has previously been restricted to undamaged, baseline conditions. In complete agreement with this notion, we do not detect appreciable levels of 6mA in untreated cells. We will revise the introduction to introduce the debate about 6mA in DNA. We do, however, want to highlight that our study provides, for the first time, convincing evidence (based on orthogonal methods) that 6mA is present in DNA in response to a stimulus, DNA damage.

      (2) What is the motivation for using HT-29 cells? Moreover, the materials and methods do not state how the authors controlled for bacterial contamination, which has been the most common cause of erroneous 6mA signals to date. Did the authors routinely check for mycoplasma?

      HT-29 is a cell line of colorectal origin, and chemotherapeutic agents that introduce uracil and uracil derivatives into DNA, such as those used in this study, are relevant for the treatment of colorectal cancer. As indicated above, we do not observe 6mA in untreated cells, strongly suggesting that the 6mA signal observed is dependent on DNA damage and not caused by potential bacterial contamination (Figure 3D, E, F). Additionally, our cell lines are routinely tested for bacterial contamination, specifically mycoplasma.

      (3) The single-cell imaging of 6mA in various cells is nice but must be confirmed by orthogonal approaches. PacBio would provide an alternative and quantitative approach to assessing 6mA levels. Similarly, it is unclear why the authors have not performed dot-blots of 6mA for genomic DNA from the given cell lines.

      We are confused by this point since an orthogonal approach to detect 6mA, mass spectrometry-liquid chromatography, was employed. This method does not use an antibody and confirms the increase of 6mA in DNA when cells were treated with DNA damaging agents. This data is presented in Figure 3F.

      It is sensible to hypothesize that the localization of 6mA is consistent with DNA replication (like uracil deposition). In this event, the genomic mapping of 6mA is likely to be stochastic. This would make quantification with PacBio sequencing difficult because it would be very challenging to achieve the appropriate read depth to call a modified base.

      Dot blots rely on an antibody and thus are not truly orthogonal to our immunofluorescence-based measurements. We preferred the mass spectrometry-liquid chromatography approach we took as a true orthogonal approach.

      (4) The results of Figure 3 need further investigation and validation. If the results are correct the authors are suggesting that the majority of 6mA in their cell lines is present in the DNA, and not the RNA, which is completely contrary to every other study of 6mA in mammalian cells that I am aware of. This could suggest that the antibody is not, in fact, binding to 6mA, but to unmodified adenine, which would explain why the signal disappears after DNAse treatment. Indeed, binding of 6mA to unmethylated DNA is a commonly known problem with most 6mA antibodies and is well described elsewhere.

      Based on this and the following comment, we are convinced that Reviewer #3 has overlooked two critical elements of our study:

      First, the immunofluorescence work presented in Figure 3, showing 6mA signal in response to DNA damage, uses cells that were pre-extracted to remove excess cytoplasmic RNA. This method is often used in immunofluorescence experiments of this kind. The pre-extraction method removes most of the cytoplasmic content, and the majority of the cytoplasmic m6A RNA signal. Supplementary Figure 3D shows cells that have not been pre-extracted prior to staining. These images show the cytoplasmic m6A signal is abundant if we do not perform the pre-extraction step.

      If the antibody used to label 6mA significantly reacted with unmodified adenine, we would expect a large signal in untreated or untreated and denatured conditions. In contrast, an increase in 6mA is not observed in either case.

      Second, the orthogonal approach we employed, mass spectrometry coupled with liquid chromatography, measures 6mA DNA analytes specifically by exact mass. This approach does not depend on an antibody and yields results consistent with those from the immunofluorescence experiments.

      (5) Given the lack of orthologous validation of the observed DNA 6mA and the lack of evidence supporting the presence of 6mA in mammalian DNA and consequently any functional role for 6mA in mammalian biology, the manuscript's conclusions need to be toned down significantly, and the inherent difficultly in assessing 6mA accurately in mammals acknowledged throughout.

      Typo above: “difficultly” should be “difficulty”.

      As discussed in response to prior comments, Figure 3 does provide two independent and orthogonal methods that demonstrate 6mA presence in DNA specifically, and not RNA, in response to DNA damage. Complementary and orthogonal datasets are presented using either immunofluorescence microscopy or mass spectrometry-liquid chromatography of extracted DNA. The latter method does not rely on an antibody and can discriminate 6mA DNA versus RNA based on exact mass. We will revise the text to clarify that Figure 3F is a completely orthogonal approach.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review): 

      Hotinger et al. explore the population dynamics of Salmonella enterica serovar Typhimurium in mice using genetically tagged bacteria. In addition to physiological observations, pathology assessments, and CFU measurements, the study emphasizes quantifying host bottleneck sizes that limit Salmonella colonization and dissemination. The authors also investigate the genetic distances between bacterial populations at various infection sites within the host.

      Initially, the study confirms that pretreatment with the antibiotic streptomycin before inoculation via orogastric gavage increases the bacterial burden in the gastrointestinal (GI) tract, leading to more severe symptoms and heightened fecal shedding of bacteria. This pretreatment also significantly reduces between-animal variation in bacterial burden and fecal shedding. The authors then calculate founding population sizes across different organs, discovering a severe bottleneck in the intestine, with founding populations reduced by approximately 10^6-fold compared to the inoculum size. Streptomycin pretreatment increases the founding population size and bacterial replication in the GI tract. Moreover, by calculating genetic distances between populations, the authors demonstrate that, in untreated mice, Salmonella populations within the GI tract are genetically dissimilar, suggesting limited exchange between colonization sites. In contrast, streptomycin pretreatment reduces genetic distances, indicating increased exchange.

      In extraintestinal organs, the bacterial burden is generally not substantially increased by streptomycin pretreatment, with significant differences observed only in the mesenteric lymph nodes and bile. However, the founding population sizes in these organs are increased. By comparing genetic distances between organs, the authors provide evidence that subpopulations colonizing extraintestinal organs diverge early after infection from those in the GI tract. This hypothesis is further tested by measuring bacterial burden and founding population sizes in the liver and GI tract at 5 and 120 hours post-infection. Additionally, they compare orogastric gavage infection with the less injurious method of infection via drinking, finding similar results for CFUs, founding populations, and genetic distances. These results argue against injuries during gavage as a route of direct infection. 

      To bypass bottlenecks associated with the GI tract, the authors compare intravenous (IV) and intraperitoneal (IP) routes of infection. They find approximately a 10-fold increase in bacterial burden and founding population size in immune-rich organs with IV/IP routes compared to orogastric gavage in streptomycin-pretreated animals. This difference is interpreted as a result of "extra steps required to reach systemic organs."

      While IP and IV routes yield similar results in immune-rich organs, IP infections lead to higher bacterial burdens in nearby sites, such as the pancreas, adipose tissue, and intraperitoneal wash, as well as somewhat increased founding population sizes. The authors correlate these findings with the presence of white lesions in adipose tissue. Genetic distance comparisons reveal that, apart from the spleen and liver, IP infections lead to genetically distinct populations in infected organs, whereas IV infections generally result in higher genetic similarity. 

      Finally, the authors investigate GI tract reseeding, identifying two distinct routes. They observe that the GI tracts of IP/IV-infected mice are colonized either by a clonal or a diversely tagged bacterial population. In clonally reseeded animals, the genetic distance within the GI tract is very low (often zero) compared to the bile population, which is predominantly clonal or pauciclonal. These animals also display pathological signs, such as cloudy/hardened bile and increased bacterial burden, leading the authors to conclude that the GI tract was reseeded by bacteria from the gallbladder bile. In contrast, animals reseeded by more complex bacterial populations show that bile contributes only a minor fraction of the tags. Given the large founding population size in these animals' GI tracts, which is larger than in orogastrically infected animals, the authors suggest a highly permissive second reseeding route, largely independent of bile. They speculate that this route may involve a reversal of known mechanisms that the pathogen uses to escape from the intestine. 

      The manuscript presents a substantial body of work that offers a meticulously detailed understanding of the population dynamics of S. Typhimurium in mice. It quantifies the processes shaping the within-host dynamics of this pathogen and provides new insights into its spread, including previously unrecognized dissemination routes. The methodology is appropriate and carefully executed, and the manuscript is well-written, clearly presented, and concise. The authors' conclusions are well-supported by experimental results and thoroughly discussed. This work underscores the power of using highly diverse barcoded pathogens to uncover the within-host population dynamics of infections and will likely inspire further investigations into the molecular mechanisms underlying the bottlenecks and dissemination routes described here.

      Major point:

      Substantial conclusions in the manuscript rely on genetic distance measurements using the Cavalli-Sforza chord distance. However, it is unclear whether these genetic distance measurements are independent of the founding population size. I would anticipate that in populations with larger founding population sizes, where the relative tag frequencies are closer to those in the inoculum, the genetic distances would appear smaller compared to populations with smaller founding sizes independent of their actual relatedness. This potential dependency could have implications for the interpretation of findings, such as those in Figures 2B and 2D, where antibiotic-pretreated animals consistently exhibit higher founding population sizes and smaller genetic distances compared to untreated animals.

      Thank you for raising this important point regarding reliance on chord distances for gauging genetic distance in barcoded populations. The reviewer is correct that samples with more founders will be more similar to the inoculum and thus inherently more similar to other samples that also have more founders. However, creation of libraries containing very large numbers of unique barcodes can often circumvent this issue. In this case, the effect size of chance-based similarity is not large enough to change the interpretation of the data in Figures 2B and 2D. Specifically, the library has ~6x10^4 barcodes, and the founding populations in Figure 2B are ~10^3. Randomly resampling to create two populations of 10^3 cells from an initial population with 6x10^4 barcodes is expected to yield largely distinct populations with very little similarity. Thus, the similarity between streptomycin-treated populations in Figure 2D is likely the result of biology rather than chance.
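      The resampling argument can be checked with a small simulation. The sketch below is illustrative only and rests on stated assumptions: barcodes are taken to be equally frequent in the inoculum (the real library will be uneven), the function names are hypothetical, and the chord distance uses one common normalization of the Cavalli-Sforza formula, (2√2/π)·sqrt(1 − Σ√(p_i q_i)), which may differ in detail from the authors' pipeline. Under the uniform assumption, two independent draws of 10^3 founders from 6x10^4 barcodes are expected to share on the order of (10^3)^2 / (6x10^4) ≈ 17 barcodes.

```python
import math
import random
from collections import Counter
from typing import Dict


def sample_founders(n_barcodes: int, n_founders: int, rng: random.Random) -> Dict[int, float]:
    """Draw founders from a uniform barcode library; return barcode frequencies."""
    counts = Counter(rng.choices(range(n_barcodes), k=n_founders))
    return {barcode: count / n_founders for barcode, count in counts.items()}


def chord_distance(p: Dict[int, float], q: Dict[int, float]) -> float:
    """Cavalli-Sforza chord distance (assumed normalization, max ~0.90)."""
    shared = p.keys() & q.keys()  # barcodes absent from either sample contribute 0
    overlap = sum(math.sqrt(p[b] * q[b]) for b in shared)
    # max() guards against tiny negative values from floating-point rounding
    return (2.0 * math.sqrt(2.0) / math.pi) * math.sqrt(max(0.0, 1.0 - overlap))


def shared_fraction(p: Dict[int, float], q: Dict[int, float]) -> float:
    """Fraction of barcodes in the smaller sample that also occur in the other."""
    return len(p.keys() & q.keys()) / min(len(p), len(q))


if __name__ == "__main__":
    rng = random.Random(0)                 # fixed seed for reproducibility
    n_barcodes, n_founders = 60_000, 1_000  # values quoted in the response
    a = sample_founders(n_barcodes, n_founders, rng)
    b = sample_founders(n_barcodes, n_founders, rng)
    print("shared barcode fraction:", shared_fraction(a, b))
    print("chord distance:", chord_distance(a, b))
```

      With these settings the two samples share only a few percent of their barcodes and the distance sits close to its maximum, which is the behaviour the response relies on. Repeating the draw with founding populations closer to the library size would show the chance-based similarity the reviewer is concerned about.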

      Reviewer #2 (Public review):

      In this paper, Hotinger et al. propose an improved barcoded library system, called STAMPR, to study Salmonella population dynamics during infection. Using this system, the authors demonstrate significant diversity in the colonization of different Salmonella clones (defined by the presence of different barcodes) not only across different organs (liver, spleen, adipose tissues, pancreas, and gall bladder) but also within different compartments of the same gastrointestinal tissue. Additionally, this system revealed that microbiota competition is the major bottleneck in Salmonella intestinal colonization, which can be mitigated by streptomycin treatment. However, this has been demonstrated previously in numerous publications. They also show that there was minimal sharing between populations found in the intestine and those in the other organs. Upon IV and IP infection to bypass the intestinal bottleneck, they were able to demonstrate, using this library, that Salmonella can re-enter the intestine through two possible routes. One route is essentially the reverse path used to escape the gut, leading to a diverse intestinal population; while the other, through the bile, typically results in a clonal population. Although the authors showed that the STAMPR pipeline improved the ability to identify founder populations and their diversity within the same animal during infections, some of the conclusions appear speculative and not fully supported.

      (1) It's particularly interesting how the authors, using this system, demonstrate the dominant role of the microbiota bottleneck in Salmonella colonization and how it is widened by antibiotic treatment (Figure 1). Additionally, the ability to track Salmonella reseeding of the gut from other organs starting with IV and IP injections of the pathogen provides a new tool to study population dynamics (Figure 5). However, I don't think it is possible to argue that the proximal and distal small intestine, Peyer's patches (PPs), cecum, colon, and feces have different founder populations for reasons other than stochastic variations. All the barcoded Salmonella clones have the same fitness and the fact that some are found or expanded in one region of the gastrointestinal tract rather than another likely results from random chance (such as being forced into a specific region of the gut for physical or spatial reasons) and subsequent expansion, rather than any inherent biological cause. For example, some bacteria may randomly adhere to the mucus, some may swim toward the epithelial layer, while others remain in the lumen; all will proliferate in those respective sites. In this way, different founder populations arise based on random localization during movement through the gastrointestinal tract, which is an observation, but it doesn't significantly contribute to understanding pathogen colonization dynamics or pathogenesis. Therefore, I would suggest placing less emphasis on describing these differences or better discussing this aspect, especially in the context of the gastrointestinal tract.

      Thank you for helping us identify this area for further clarification. We agree with the reviewer’s interpretation that seeding of proximal and distal small intestine, Peyer's patches (PPs), cecum, colon, and feces with different founder populations is likely caused by stochastic variations, consistent with separate stochastic bottlenecks to establishing these separate niches. To clarify this point we have modified the text in the results section, “Streptomycin treatment decreases compartmentalization of S. Typhimurium populations within the intestine”.

      Change to text:

      “Except for the cecum and colon, in untreated animals the S. Typhimurium populations in different regions of the intestine were dissimilar (Avg. GD ranged from 0.369 to 0.729, Figure 2D left); i.e., there is little sharing between populations in the intestine. These data suggest that there are separate bottlenecks in different regions of the intestine that cause stochastic differences in the identity of the founders. Interestingly, when these founders replicate, they do not mix, remaining compartmentalized with little sharing between populations throughout the intestinal tract (i.e., barcodes found in one region are not in other regions, Figure S3). This was surprising as the luminal contents, an environment presumably conducive to bacterial movement, were not removed from these samples.”

      In this section we are interested in the underlying biology that occurs after the initial bottleneck to preserve this compartmentalization during outgrowth of the intestinal population. In other words, what prevents these separate populations from merging (e.g., what prevents the bacteria replicating in the proximal small intestine from traveling through the intestine and establishing a niche in the distal small intestine)? While we do not explore the mechanisms of compartmentalization, we observe that it is disrupted by streptomycin pretreatment, suggesting a microbiota-dependent biological cause. 

      (2) I do think that STAMPR is useful for studying the dynamics of pathogen spread to organs where Salmonella likely resides intracellularly (Figure 3). The observation that the liver is colonized by an early intestinal population, which continues to proliferate at a steady rate throughout the infection, is very interesting and may be due to the unique nature of the organ compared to the mucosal environment. What is the biological relevance during infection? Do the authors observe the same pattern (Figures 3C and G) when normalizing the population data for the spleen and mesenteric lymph nodes (mLN)? If not, what do the authors think is driving this different distribution?

      Thank you for raising this interesting point. These data indicate that the liver is seeded from the intestine early during infection. The timing and source of dissemination have relevance for understanding how host and pathogen variables control the spread of bacteria to systemic sites. For example, our conclusion (early dissemination) indicates that the immune state of a host at the time of exposure to a pathogen, and for a short period thereafter, is what primarily influences the process of dissemination, not the later response to an active infection.

      We observe that the liver and mucosal environments within the intestine have similar colonization behaviors. Both niches are seeded early during infection, followed by steady pathogen proliferation and compartmentalization that apparently inhibits further seeding. This results in the identity of barcodes in the liver population remaining distinct from the intestinal populations, and the intestinal populations remaining distinct from each other.

      We observe a similar pattern to the liver in the spleen and MLN (the barcodes in the spleen and MLN are dissimilar to the population in the intestine). To clarify this point, we have modified the text (below) and added this analysis as a supplemental figure (S4).

      Change to text:

      Genetic distance comparison of liver samples to other sites revealed that, regardless of streptomycin treatment, there was very little sharing of barcodes between the intestine and extraintestinal sites (Avg. GD >0.75, Figure 3C). Furthermore, the MLN and spleen populations also lacked similarity with the intestine (Figure S4). These analyses strongly support the idea that S. Typhimurium disseminates to extraintestinal organs relatively early following inoculation, before it establishes a replicative niche in the intestine.

      (3) Figure 6: Could the bile pathology be due to increased general bacterial translocation rather than Salmonella colonization specifically? Did the authors check for the presence of other bacteria (potentially also proliferating) in the bile? Do the authors know whether Salmonella's metabolic activity in the bile could be responsible for gallbladder pathology?

      The reviewer raises interesting points for future work. We did not check whether other bacterial species are translocating during S. Typhimurium infection. The relevance of Salmonella’s metabolic activity is also very interesting, and we hope these questions will be answered by future studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points:

      (1) P. 9/10 "... the marked delay in shedding after IP and IV relative to orogastric inoculation suggest that the S. Typhimurium population encounters substantial bottleneck(s) on the route(s) from extraintestinal sites back to the intestine.": Can you conclude that from the data? It could also be possible that there is a biological mechanism (other than chance events) that delays the re-entry to the intestine.

      We propose that the delay in shedding indicates additional obstacles that bacteria face when re-entering the intestine, and that there are likely biological mechanisms that cause this delay. However, these unknown mechanisms effectively act as additional bottlenecks by causing a stochastic loss of population diversity. 

      (2) P. 11 "...both organs would likely contain all 10 barcodes. In contrast, a library with 10,000 barcodes can be used to distinguish between a bottleneck resulting in Ns = 1,000 and Ns = 10,000, since these bottlenecks result in a different number of barcodes in output samples. Furthermore, high diversity libraries reduce the likelihood that two tissue samples share the same barcode(s) due to random chance, enabling more accurate quantification of bacterial dissemination.": I agree with the general analysis, but I find it misleading to talk about the presence of barcodes when the analyses in this manuscript are based on the much more powerful comparison of relative abundance of individual tags instead of their presence or absence.

      The reviewer raises an excellent point, and the distinction between relative abundance versus presence/absence is discussed extensively in the original STAMPR manuscript. Although relative abundance is powerful, the primary metric used in this study (Ns) is calculated principally from the number of barcodes, corrected (via simulations) for the probability of observing the same barcode across distinct founders. Although this correction procedure does rely on barcode abundance, the primary driver of founding population quantification is the number of barcodes.
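
      To illustrate why the number of barcodes carries information about founding population size only when the library is diverse, the following sketch compares a 10-barcode and a 10,000-barcode library. It is a deliberately simplified model, not the STAMPR Ns algorithm: it assumes equally abundant barcodes and ignores the simulation-based correction described above, and the function names are hypothetical.

```python
import math


def expected_distinct_barcodes(library_size: int, n_founders: int) -> float:
    """Expected distinct barcodes after a bottleneck of n_founders cells.

    Simplifying assumption: every barcode is equally abundant in the inoculum.
    """
    return library_size * (1.0 - (1.0 - 1.0 / library_size) ** n_founders)


def founders_from_barcodes(library_size: int, observed_barcodes: float) -> float:
    """Invert the relation above to estimate founders from a barcode count.

    Returns infinity when every barcode is observed, because a saturated
    library cannot distinguish between large founding populations.
    """
    if observed_barcodes >= library_size:
        return math.inf
    return math.log1p(-observed_barcodes / library_size) / math.log1p(-1.0 / library_size)


if __name__ == "__main__":
    for library in (10, 10_000):
        for founders in (1_000, 10_000):
            distinct = expected_distinct_barcodes(library, founders)
            print(library, founders, round(distinct, 1))
```

      With 10 barcodes, both bottleneck sizes yield essentially all 10 barcodes, so the inverse estimate is unbounded. With 10,000 barcodes, the two bottlenecks yield clearly different barcode counts, which is what makes the founders distinguishable.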

      (3) P.14 "the library in LB supplemented with SM was not significantly different than the parent strain" and Figure 2C: How was significance tested? How many times were the growth curves recorded? On my print-out, the red color has different shades for different growth curves.

      Significance was tested with a Mann-Whitney test, and growth curves were performed 5 times. Growth curves are displayed with 50% opacity, and as a result multiple curves directly on top of each other appear darker. The legend to Figure S2 has been modified accordingly.

      (4) P.16: close bracket in the equation for FRD calculation.

      Done

      (5) Figure 2C "Average CFU per founder": I found the wording confusing at first as I thought you divided the average bacterial burden per organ by Ns, instead of averaging the CFU/Ns calculated for each mouse.

      The wording has been clarified. 

      (6) Figure 3B: It would be helpful to include expected genetic distances in the schematic as it is difficult to infer the genetic distance when only two of three, respectively, different "barcode colors" are used. While I find the explanation in the main text intuitive, a graphical representation would have helped me.

      Thank you for the suggestion. Unfortunately, using colors to represent barcodes is imperfect and limits the diversity that can be depicted. We have modified Figure 3B to further clarify. 

      (7) Figure 3C: Why do you compare the genetic distance to the liver, when you discuss the genetic distance of the intestinal population? Is it not possible that the intestinal populations are similar to the extraintestinal organs except the liver?

      For clarity, we chose to highlight exclusively the liver. However, we observed a similar pattern to the liver in other extraintestinal organs. To clarify the generalizability of this point we have added a supplemental figure with comparisons to MLN and Spleen (Supplemental figure S4) as well as further text.

      (8) Figure 3C & S5A: I found "+SM" and "+SM, Drinking" confusing and would have preferred "+SM, Gavage" and "+SM, Drinking" for clarity.

      Done, thank you for the suggestion.

      (9) Figure 3G&H: I find it worthy of discussion that the bacterial burden increases over time, while the founding population decreases. Does that not indicate that replication only occurs at specific sites leading to the amplification of only a few barcodes and thereby a larger change of the relative barcode abundance compared to the inoculum?

      From 5h to 120h the size of the founding population decreases in multiple intestinal sites. This likely indicates that the impact of the initial bottleneck is still ongoing at 5h, although further temporal analysis would be required to define the exact timing of the bottleneck. Notably, the passage time through the mouse intestine is ~5h. Many of the founders observed at 5h could be a population that will never establish a replicative niche and, failing to colonize, will be shed in the feces, bottlenecking the population between 5h and 120h. To clarify this point we have added the following text:

      Section “S. Typhimurium disseminates out of the intestine before establishing an intestinal replicative niche”.

      “In contrast to the liver, there were more founders present in samples from the intestine (particularly in the colon) at 5 hours versus 120 hours (Figure 3H). These data likely indicate that many of the founders observed in the intestine at 5 hours are shed in the feces prior to establishing a replicative niche, and demonstrate that the forces restricting the S. Typhimurium population in the intestine act over a period of > 5 hours.”

      (10) Figure S2A: I do not understand this figure. Why are there more than 70.000 tags listed? I was under the impression the barcode library in S. Typhimurium had 55.000 tags while only the plasmid pSM1 had more than 70.000 (but the plasmid should not be relevant here). Why are there distinct lines at approximately 10^-5 and a bit lower? I would have expected continuously distributed barcode frequencies.

      During barcode analysis, each library is mapped to the total barcode list in the barcode donor pSM1, which contains ~70,000 barcodes. This enables consistent analysis across different bacterial libraries. The designation “barcode number” refers to the barcode number in pSM1, meaning many of the barcodes in the Salmonella library are at zero reads. This graph type was chosen to show there was no bias toward a particular barcode; however, there is significant overlap of the points, making individual barcode frequencies difficult to see. We have changed the x-axis to state “pSM1 Barcode Number” and clarified in the figure legend.

      Since the y-axes on these graphs are on a log10 scale, the lines represent barcodes with 1 read, 2 reads, 3 reads, etc. As the number of reads per barcode increases linearly, the space between them decreases on logarithmic axes.
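
      The spacing of these lines can be reproduced numerically. The sketch below is illustrative only: the total read depth of 100,000 is a hypothetical value chosen so that a single read corresponds to a frequency of 10<sup>-5</sup>, and the function name is an assumption.

```python
import math


def log10_gaps(total_reads: int, max_reads: int = 5) -> list[float]:
    """Vertical gaps, in log10 units, between barcodes with 1, 2, 3, ... reads."""
    # Frequency of a barcode observed k times out of total_reads.
    log_freqs = [math.log10(k / total_reads) for k in range(1, max_reads + 1)]
    # Distance on a log10 axis between consecutive read counts.
    return [upper - lower for lower, upper in zip(log_freqs, log_freqs[1:])]


if __name__ == "__main__":
    for k, gap in enumerate(log10_gaps(100_000), start=1):
        print(f"{k} -> {k + 1} reads: gap of {gap:.3f} log10 units")
```

      The gap between k and k + 1 reads equals log10((k + 1) / k), which shrinks as k grows and does not depend on sequencing depth. Low read counts therefore appear as distinct, progressively closer lines rather than as a continuous distribution.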

      (11) There are a few typos in the figure legends of the supplementary material. For example Figure S2: S. Typhimurium not italicized, ~7x105 no superscript. Fig. S4&5 ", Open circles" is "O" is capitalized.

      Typos have been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This is an interesting manuscript where the authors systematically measure rG4 levels in brain samples at different ages of patients affected by AD. To the best of my knowledge this is the first time that BG4 staining is used in this context and the authors provide compelling evidence to show an association with BG4 staining and age or AD progression, which interestingly indicates that such RNA structure might play a role in regulating protein homeostasis as previously speculated. The methods used and the results reported seem robust and reproducible. There were two main things that needed addressing:

      (1) Usually in BG4 staining experiments to ensure that the signal detected is genuinely due to rG4 an RNase treatment experiment is performed. This does not have to be extended to all the samples presented but having a couple of controls where the authors observe loss of staining upon RNase treatment will be key to ensure with confidence that rG4s are detected under the experimental conditions. This is particularly relevant for these brain tissue samples where BG4 staining has never been performed before.

      (2) The authors have an association between rG4-formation and age/disease progression. They also observe distribution dependency of this, which is great. However, this is still an association which does not allow the model to be supported. This is not something that can be fixed with an easy experiment and it is what it is, but my point is that the narrative of the manuscript should be more fair and reflect the fact that, although interesting, what the authors are observing is a simple correlation. They should still go ahead and propose a model for it, but they should be more balanced in the conclusion and not imply that this evidence is sufficient to demonstrate the proposed model. It is absolutely fine to refer to the literature and comment on the fact that similar observations have been reported and this is in line with those, but still this is not an ultimate demonstration.

      Comments on current version:

      The authors have now addressed my concerns.

      We thank the reviewer for their support!

      Reviewer #2 (Public review):

      RNA guanine-rich G-quadruplexes (rG4s) are non-canonical higher order nucleic acid structures that can form under physiological conditions. Interestingly, cellular stress is positively correlated with rG4 induction.

      In this study, the authors examined human hippocampal postmortem tissue for the formation of rG4s in aging and Alzheimer Disease (AD). rG4 immunostaining strongly increased in the hippocampus with both age and with AD severity. 21 cases were used in this study (age range 30-92).

      This immunostaining co-localized with hyper-phosphorylated tau immunostaining in neurons. The BG4 staining levels were also impacted by APOE status. rG4 structure was previously found to drive tau aggregation. Based on these observations, the authors propose a model of neurodegeneration in which chronic rG4 formation drives proteostasis collapse.

      This model is interesting, and would explain different observations (e.g., RNA is present in AD aggregates and rG4s can enhance protein oligomerization and tau aggregation).

      Main issue from the previous round of review:

      There is indeed a positive correlation between Braak stage severity and BG4 staining, but this correlation is relatively weak and borderline significant (R = 0.52, p value = 0.028). This is probably the main limitation of this study, which should be clearly acknowledged (together with a reminder that "correlation is not causality"). Related to this, there is no clear justification to exclude the four individuals in Fig 1d (without them R increases to 0.78). Please remove this statement. On the other hand, the difference based on APOE status is more striking.

      Comments on current version:

      The authors have made laudable efforts to address the criticisms I made in my evaluation of the original manuscript.

      We thank the reviewer for their support!

      Recommendations for the authors:

      Reviewing Editor:

      I would suggest two minor edits:

      - The findings are correlative and descriptive, but the title implies functionality (A New Role for RNA G-quadruplexes in Aging and Alzheimer's Disease). I would suggest toning down this title.

      - While I understand the limitations in performing additional biochemical experiments to validate the immunofluorescence study, I think this is worth mentioning as a limitation in the text.

      We have made these two changes as requested, altering the title to remove the word “Role”, which may imply more meaning than intended, and adding a line to the discussion on the need for future additional biochemical experiments.

      Reviewer #1 (Recommendations for the authors):

      Thanks for addressing the concerns raised.

      We thank the reviewer for their support!

      Reviewer #2 (Recommendations for the authors):

      Minor point:

      Related to the "correlation is not causality" remark I made in my evaluation of the original manuscript: the authors' answer is reasonable. Still, I would suggest to modify the abstract: "we propose a model of neurodegeneration in which chronic rG4 formation drives proteostasis collapse" => "we propose a model of neurodegeneration in which chronic rG4 formation is linked to proteostasis collapse"

      All other remarks I made have been answered properly.

      We thank the reviewer for their support! We have made the change exactly as requested by the reviewer.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript investigates lipid scrambling mechanisms across TMEM16 family members using coarse-grained molecular dynamics (MD) simulations. While the study presents a statistically rigorous analysis of lipid scrambling events across multiple structures and conformations, several critical issues undermine its novelty, impact, and alignment with experimental observations.

      Critical issues:

      (1) Lack of Novelty:

      The phenomenon of lipid scrambling via an open hydrophilic groove is already well-established in the literature, including through atomistic MD simulations. The authors themselves acknowledge this fact in their introduction and discussion. By employing coarse-grained simulations, the study essentially reiterates previously known findings with limited additional mechanistic insight. The repeated observation of scrambling occurring predominantly via the groove does not offer significant advancement beyond prior work.

      We agree with the reviewer’s statement regarding the lack of novelty when it comes to our observations of scrambling in the groove of open Ca<sup>2+</sup>-bound TMEM16 structures. However, we feel that the inclusion of closed structures in this study, which attempts to address the as-yet-unanswered question of how scrambling by TMEM16s occurs in the absence of Ca<sup>2+</sup>, offers new observations for the field. In our study we specifically address to what extent the induced membrane deformation, which has been theorized to help lipids cross the bilayer especially in the absence of Ca<sup>2+</sup>, contributes to the rate of scrambling (see references 36, 59, and 66). There are also several TMEM16F structures solved under activating conditions (bound to Ca<sup>2+</sup> and in the presence of PIP2) which feature structural rearrangements to TM6 that may be indicative of an open state (PDB 6P48) and had not been tested in simulations. We show that these structures do not scramble and thereby present evidence against an out-of-the-groove scrambling mechanism for these states. Although we find a handful of examples of lipids being scrambled by Ca<sup>2+</sup>-free structures of TMEM16 scramblases, none of our simulations suggest that these events are related to the degree of deformation.

      (2) Redundancy Across Systems:

      The manuscript explores multiple TMEM16 family members in activating and non-activating conformations, but the conclusions remain largely confirmatory. The extensive dataset generated through coarse-grained MD simulations primarily reinforces established mechanistic models rather than uncovering fundamentally new insights. The effort, while statistically robust, feels excessive given the incremental nature of the findings.

      Again, we agree with the reviewer’s statement that our results largely confirm those published by other groups and our own. We think, however, that there is value in comparing the scrambling competence of these TMEM16 structures in a consistent manner in a single study, which reduces inconsistencies that may be introduced by the different simulation methods, parameters, and environmental variables (such as lipid composition) used in other published works on single family members. The consistency across our simulations and high number of observed scrambling events have allowed us to confirm that the mechanism of scrambling is shared by multiple family members and relies most obviously on groove dilation.

      (3) Discrepancy with Experimental Observations:

      The use of coarse-grained simulations introduces inherent limitations in accurately representing lipid scrambling dynamics at the atomistic level. Experimental studies have highlighted nuances in lipid permeation that are not fully captured by coarse-grained models. This discrepancy raises questions about the biological relevance of the reported scrambling events, especially those occurring outside the canonical groove.

      We thank the reviewer for bringing up the possible inaccuracies introduced by coarse graining our simulations. This is also a concern for us, and we address this issue extensively in our discussion. As the reviewer pointed out above, our CG simulations have largely confirmed existing evidence in the field which we think speaks well to the transferability of observations from atomistic simulations to the coarse-grained level of detail. We have made both qualitative and quantitative comparisons between atomistic and coarse-grained simulations of nhTMEM16 and TMEM16F (Figure 1, Figure 4-figure supplement 1, Figure 4-figure supplement 5) showing the two methods give similar answers for where lipids interact with the protein, including outside of the canonical groove. We do not dispute the possible discrepancy between our simulations and experiment, but our goal is to share new nuanced ideas for the predicted TMEM16 scrambling mechanism that we hope will be tested by future experimental studies.

      (4) Alternative Scrambling Sites:

      The manuscript reports scrambling events at the dimer-dimer interface as a novel mechanism. While this observation is intriguing, it is not explored in sufficient detail to establish its functional significance. Furthermore, the low frequency of these events (relative to groove-mediated scrambling) suggests they may be artifacts of the simulation model rather than biologically meaningful pathways.

      We agree with the reviewer that our observed number of scrambling events in the dimer interface is too low to present it as strong evidence for it being the alternative mechanism for Ca<sup>2+</sup>-independent scrambling. This will require additional experiments and computational studies which we plan to do in future research. However, we are less certain that these are artifacts of the coarse-grained simulation system as we observed a similar event in an atomistic simulation of TMEM16F.

      Conclusion:

      Overall, while the study is technically sound and presents a large dataset of lipid scrambling events across multiple TMEM16 structures, it falls short in terms of novelty and mechanistic advancement. The findings are largely confirmatory and do not bridge the gap between coarse-grained simulations and experimental observations. Future efforts should focus on resolving these limitations, possibly through atomistic simulations or experimental validation of the alternative scrambling pathways.

      Reviewer #2 (Public review):

      Summary:

      Stephens et al. present a comprehensive study of TMEM16-members via coarse-grained MD simulations (CGMD). They particularly focus on the scramblase ability of these proteins and aim to characterize the "energetics of scrambling". Through their simulations, the authors interestingly relate protein conformational states to the membrane's thickness and link those to the scrambling ability of TMEM members, measured as the trespassing tendency of lipids across leaflets. They validate their simulation with a direct qualitative comparison with Cryo-EM maps.

      Strengths:

      The study demonstrates an efficient use of CGMD simulations to explore lipid scrambling across various TMEM16 family members. By leveraging this approach, the authors are able to bypass some of the sampling limitations inherent in all-atom simulations, providing a more comprehensive and high-throughput analysis of lipid scrambling. Their comparison of different protein conformations, including open and closed groove states, presents a detailed exploration of how structural features influence scrambling activity, adding significant value to the field. A key contribution of this study is the finding that groove dilation plays a central role in lipid scrambling. The authors observe that for scrambling-competent TMEM16 structures, there is substantial membrane thinning and groove widening. The open Ca<sup>2+</sup>-bound nhTMEM16 structure (PDB ID 4WIS) was identified as the fastest scrambler in their simulations, with scrambling rates as high as 24.4 ± 5.2 events per μs. This structure also shows significant membrane thinning (up to 18 Å), which supports the hypothesis that groove dilation lowers the energetic barrier for lipid translocation, facilitating scrambling.

      The study also establishes a correlation between structural features and scrambling competence, though analyses often lack statistical robustness and quantitative comparisons. The simulations differentiate between open and closed conformations of TMEM16 structures, with open-groove structures exhibiting increased scrambling activity, while closed-groove structures do not. This finding aligns with previous research suggesting that the structural dynamics of the groove are critical for scrambling. Furthermore, the authors explore how the physical dimensions of the groove qualitatively correlate with observed scrambling rates. For example, TMEM16K induces increased membrane thinning in its open form, suggesting that membrane properties, along with structural features, play a role in modulating scrambling activity.

      Another significant finding is the concept of "out-of-the-groove" scrambling, where lipid translocation occurs outside the protein's groove. This observation introduces the possibility of alternate scrambling mechanisms that do not follow the traditional "credit-card model" of groove-mediated lipid scrambling. In their simulations, the authors note that these out-of-the-groove events predominantly occur at the dimer interface between TM3 and TM10, especially in mammalian TMEM16 structures. While these events were not observed in fungal TMEM16s, they may provide insight into Ca<sup>2+</sup>-independent scrambling mechanisms, as they do not require groove opening.

      Weaknesses:

      A significant challenge of the study is the discrepancy between the scrambling rates observed in CGMD simulations and those reported experimentally. Despite the authors' claim that the rates are in line with experiment, the observed differences can mean large energetic discrepancies in describing scrambling (larger than 1 kT barrier in reality). For instance, the authors report scrambling rates of 10.7 events per μs for TMEM16F and 24.4 events per μs for nhTMEM16, which are several orders of magnitude faster than experimental rates. While the authors suggest that this discrepancy could be due to the Martini 3 force field's faster diffusion dynamics, this explanation does not fully account for the large difference in rates. A more thorough discussion on how the choice of force field and simulation parameters influence the results, and how these discrepancies can be reconciled with experimental data, would strengthen the conclusions. Likewise, rate calculations in the study are based on 10 μs simulations, while experimental scrambling rates occur over seconds. This timescale discrepancy limits the study's accuracy, as the simulations may not capture rare or slow scrambling events that are observed experimentally and therefore might underestimate the kinetics of scrambling. It is, however, important to recognize that it's hard (borderline unachievable) to pinpoint reasonable kinetics for systems like this using the currently available computational power and force field accuracy. The faster diffusion in simulations may lead to overestimated scrambling rates, making the simulation results less comparable to real-world observations. I would therefore read the findings qualitatively rather than quantitatively. An interesting observation is the asymmetry observed in the scrambling rates of the two monomers. Since MARTINI is known to be limited in correctly sampling protein dynamics, the authors - in order to preserve the fold - have applied a strong (500 kJ mol<sup>-1</sup> nm<sup>-2</sup>) elastic network. However, I am wondering how the ENM applies across the dimer and if any asymmetry can be noticed in the application of restraints for each monomer and at the dimer interface. How can this have potentially biased the asymmetry in the scrambling rates observed between the monomers? Is this artificially obtained from restraining the initial structure, or is the asymmetry somehow gatekeeping the scrambling mechanism to occur majorly across a single monomer? Answering this question would have far-reaching implications to better describe the mechanism of scrambling.

      The main aim of our computational survey was to directly compare all relevant published TMEM16 structures in both open and closed states using the Martini 3 CGMD force field. Our standardized simulation and analysis protocol allowed us to quantitatively compare scrambling rates across the TMEM16 family, something that has never been done before. We do acknowledge that direct comparison between simulated and experimental scrambling rates is complicated and that the simulated rates are best interpreted qualitatively. In line with other reports (e.g., Li et al, PNAS 2024), lipid scrambling in CGMD is 2-3 orders of magnitude faster than typical experimental findings. In the CG simulation field, these increased dynamics due to the smoother energy landscape are a well-known phenomenon. In our view, this is a valuable trade-off for being able to capture statistically robust scrambling dynamics and gain mechanistic understanding in the first place, since these are currently challenging to obtain otherwise. For example, with all-atom MD it would have been near-impossible to conclude that groove openness and high scrambling rates are closely related, simply because one would only measure a handful of scrambling events in (at most) a handful of structures.

      Considering the elastic network: the reviewer is correct in that the elastic network restrains the overall structure to the experimental conformation. This is necessary because the Martini 3 force field does not accurately model changes in secondary (and tertiary) structure. In fact, by retaining the structural information from the experimental structures, we argue that the elastic network helped us arrive at the conclusion that groove openness is the major contributing factor in determining a protein’s scrambling rate. This is best exemplified by the asymmetric X-ray structure of TMEM16K (5OC9), in which the groove of one subunit is more dilated than the other. In our simulation, this information was stored in the elastic network, yielding a 4x higher rate in the open groove than in the closed groove, within the same trajectory.

      Notably, the manuscript does not explore the impact of membrane composition on scrambling rates. While the authors use a specific lipid composition (DOPC) in their simulations, they acknowledge that membrane composition can influence scrambling activity. However, the study does not explore how different lipids or membrane environments or varying membrane curvature and tension, could alter scrambling behaviour. I appreciate that this might have been beyond the scope of this particular paper and the authors plan to further chase these questions, as this work sets a strong protocol for this study. Contextualizing scrambling in the context of membrane composition is particularly relevant since the authors note that TMEM16K's scrambling rate increases tenfold in thinner membranes, suggesting that lipid-specific or membrane-thickness-dependent effects could play a role.

      Considering different membrane compositions: for this study, we chose to keep the membranes as simple as possible. We opted for pure DOPC membranes because DOPC (1) has negligible intrinsic curvature, (2) forms fluid membranes, and (3) was used previously by others (Li et al, PNAS 2024). As mentioned by the reviewer, we believe our current study defines a good standardized protocol and solid baseline for future efforts looking into the additional effects of membrane composition, tension, and curvature that could all affect TMEM16-mediated lipid scrambling.

      Reviewer #3 (Public review):

      Strengths:

      The strength of this study emerges from a comparative analysis of multiple structural starting points and understanding global/local motions of the protein with respect to lipid movement. Although the protein is well-studied, both experimentally and computationally, the understanding of conformational events in different family members, especially membrane thickness less compared to fungal scramblases offers good insights.

      We appreciate the reviewer recognizing the value of the comparative study. In addition to valuable insights from previous experimental and computational work, we hope to put forward a unifying framework that highlights various TMEM16 structural features and membrane properties that underlie scrambling function.

      Weaknesses:

      The weakness of the work is to fully reconcile with experimental evidence of Ca²⁺-independent scrambling rates observed in prior studies, but this part is also challenging using coarse-grain molecular simulations. Previous reports have identified lipid crossing, packing defects, and other associated events, so it is difficult to place this paper in that context. However, the absence of validation leaves certain claims, like alternative scrambling pathways, speculative.

      It is generally difficult to quantitatively compare bulk measurements of scrambling phenomena with simulation results. The advantage of simulations is to directly observe the transient scrambling events at a spatial and temporal resolution that is currently unattainable for experiments. The current experimental evidence for the precise mechanism of Ca²⁺-independent scrambling is still under debate. We therefore hope to leverage the strength of MD and statistical rigor of coarse-grained simulations to generate testable hypotheses for further structural, biochemical, and computational studies.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public review):

      This experiment sought to determine what effect congenital/early-onset hearing loss (and associated delay in language onset) has on the degree of inter-individual variability in functional connectivity to the auditory cortex. Looking at differences in variability rather than group differences in mean connectivity itself represents an interesting addition to the existing literature. The sample of deaf individuals was large, and quite homogeneous in terms of age of hearing loss onset, which are considerable strengths of the work. The experiment appears well conducted and the results are certainly of interest.

      R: Thank you for your positive and thoughtful feedback.

      Reviewer #3 (Public review):

      Summary:

      This study focuses on changes in brain organization associated with congenital deafness. The authors investigate differences in functional connectivity (FC) and differences in the variability of FC. By comparing congenitally deaf individuals to individuals with normal hearing, and by further separating congenitally deaf individuals into groups of early and late signers, the authors can distinguish between changes in FC due to auditory deprivation and changes in FC due to late language acquisition. They find larger FC variability in deaf than normal-hearing individuals in temporal, frontal, parietal, and midline brain structures, and that FC variability is largely driven by auditory deprivation. They suggest that the regions that show a greater FC difference between groups also show greater FC variability.

      Strengths:

      The manuscript is well-written, and the methods are clearly described and appropriate. Including the three different groups enables the critical contrasts distinguishing between different causes of FC variability changes. The results are interesting and novel.

      Weaknesses:

      Analyses were conducted for task-based data rather than resting-state data. The authors report behavioral differences between groups and include behavioral performance as a nuisance regressor in their analysis. This is a good approach to account for behavioral task differences, given the data. Nevertheless, additional work using resting-state functional connectivity could remove the potential confound fully.

      The authors have addressed my concerns well.

      Thank you for your thoughtful feedback. We appreciate your acknowledgment of the strengths of our study and the approaches taken to address potential confounds. As noted, we discuss the limitation of not including resting-state data in the manuscript, and we agree that this represents an important avenue for future research. We hope to address this question in future studies.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The paper proposes that the placement of criteria for determining whether a stimulus is 'seen' or 'unseen' can significantly impact the validity of neural measures of consciousness. The authors found that conservative criteria, which require stronger evidence to classify a stimulus as 'seen,' tend to inflate effect sizes in neural measures, making conscious processing appear more pronounced than it is. Conversely, liberal criteria, which require less evidence, reduce these effect sizes, potentially underestimating conscious processing. This variability in effect sizes due to criterion placement can lead to misleading conclusions about the nature of conscious and unconscious processing.

      Furthermore, the study highlights that the Perceptual Awareness Scale (PAS), a commonly used tool in consciousness research, does not effectively mitigate these criterion-related confounds. This means that even with PAS, the validity of neural measures can still be compromised by how criteria are set. The authors emphasize the need for careful consideration and standardization of criterion placement in experimental designs to ensure that neural measures accurately reflect the underlying cognitive processes. By addressing this issue, the paper aims to improve the reliability and validity of findings in the field of consciousness research.

      Strengths:

      (1) This research provides a fresh perspective on how criterion placement can significantly impact the validity of neural measures in consciousness research.

      (2) The study employs robust simulations and EEG experiments to demonstrate the effects of criterion placement, ensuring that the findings are well-supported by empirical evidence.

      (3) By highlighting the limitations of the PAS and the impact of criterion placement, the study offers practical recommendations for improving experimental designs in consciousness research.

      Weaknesses:

      The primary focused criterion of PAS is a commonly used tool, but there are other measures of consciousness that were not evaluated, which might also be subject to similar or different criterion limitations. A simulation could be applied to these metrics to show how generalizable the conclusion of the study is.

      We would like to thank reviewer 1 for their positive words and for taking the time to evaluate our manuscript. We agree that it would be important to gauge generalization to other metrics of consciousness. Note however, that the most commonly used alternative methods are postdecision wagering and confidence, both of which are known to behave quite similarly to the PAS (Sandberg, Timmermans, Overgaard & Cleeremans, 2010). Indeed, we have confirmed in other work that confidence is also sensitive to criterion shifts (see https://osf.io/preprints/psyarxiv/xa4fj). Although it has been claimed that confidence-derived aggregate metrics like meta-d’ or metacognitive efficiency may overcome criterion shifts, it would require empirical data rather than simulation to settle whether this is true or not (also see the discussion in https://osf.io/preprints/psyarxiv/xa4fj). Furthermore, out of these metrics, the PAS seems to be the preferred one amongst consciousness researchers (see figure 4 in Francken, Beerendonk, Molenaar, Fahrenfort, Kiverstein, Seth & van Gaal, 2022; as well as https://osf.io/preprints/psyarxiv/bkxzh). Thus, given that other metrics are expected to behave in similar ways and/or that it would require more empirical work to determine along which dimension(s) criterion shifts would operate in alternative metrics, we see no clear path to implement the suggested simulations. We anticipate that aiming to do this would require a considerable amount of additional work, figuring out many things which we believe would better suit a future project. We would of course be open to doing this if the reviewer would have more specific suggestions for how to go about the proposed simulations.

      Reviewer #2 (Public review):

      Summary:

      The study investigates the potential influence of the response criterion on neural decoding accuracy in consciousness and unconsciousness, utilizing either simulated data or reanalyzing experimental data with post-hoc sorting data.

      Strengths:

      When comparing the neural decoding performance of Target versus NonTarget with or without post-hoc sorting based on subject reports, it is evident that response criterion can influence the results. This was observed in simulated data as well as in two experiments that manipulated the subject response criterion to be either more liberal or more conservative. One experiment involved a two-level response (seen vs unseen), while the other included a more detailed four-level response (ranging from 0 for no experience to 3 for a clear experience). The findings consistently indicated that adopting a more conservative response criterion could enhance neural decoding performance, whether in conscious or unconscious states, depending on the sensitivity or overall response threshold.

      Weaknesses:

      (1) The response criterion plays a crucial role in influencing neural decoding because a subject's report may not always align with the actual stimulus presented. This discrepancy can occur in cases of false alarms, where a subject reports seeing a target that was not actually there, or in cases where a target is present but not reported. Some may argue that only using data from consistent trials (those with correct responses) would not be affected by the response criterion. However, the authors' analysis suggests that a conservative response criterion not only reduces false alarms but also impacts hit rates. It is important for the authors to further investigate how the response criterion affects neural decoding even when considering only correct trials.

      We would like to thank reviewer 2 for taking the time to evaluate our manuscript. We appreciate the suggestion to investigate neural decoding on only correct trials. What we in fact did is consider target trials that are 'correct' (hits = seen target present trials) and 'incorrect' (misses = unseen target present trials) separately, see figure 4A and figure 4B. This shows that the response criterion also affects the neural measure of consciousness when only considering correct target present trials. Note however, that one cannot decode 'unseen' (target present) trials if one only aims to decode 'correct' trials, because those are all incorrect by definition. We did not analyze false alarms (these would be the 'seen' trials on the noise distribution of Figure 1A), as there were not enough trials of those, especially in the conservative condition (see Figure 2C and 2D), making comparisons between conservative and liberal impossible. However, the predictions for false alarms are pretty straightforward, and follow directly from the framework in Figure 1.
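
The claim that criterion placement matters even when only target-present trials are considered can be illustrated with a minimal sketch. This is not the authors' simulation code; it assumes an equal-variance signal detection model in which the internal evidence on target trials follows a unit-variance normal distribution centered on d', and the function name and parameter values are hypothetical. Under these assumptions, the mean evidence on hits and on misses is the mean of the upper and lower truncated tail of that distribution, respectively.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: unit-variance internal noise (assumed)


def mean_evidence_by_report(d_prime: float, criterion: float) -> tuple[float, float]:
    """Mean internal evidence on target-present trials, split by report.

    Evidence on target trials is assumed to follow N(d_prime, 1); a trial is
    reported 'seen' when the evidence exceeds `criterion`.
    Returns (mean for hits, mean for misses).
    """
    z = criterion - d_prime
    density = STD_NORMAL.pdf(z)
    p_miss = STD_NORMAL.cdf(z)
    p_hit = 1.0 - p_miss
    mean_hit = d_prime + density / p_hit    # mean of the upper truncated tail
    mean_miss = d_prime - density / p_miss  # mean of the lower truncated tail
    return mean_hit, mean_miss


# Hypothetical values: identical sensitivity, two criterion placements.
liberal = mean_evidence_by_report(d_prime=1.5, criterion=0.25)
conservative = mean_evidence_by_report(d_prime=1.5, criterion=1.25)
print("liberal      (hits, misses):", liberal)
print("conservative (hits, misses):", conservative)
```

In this toy model, moving the criterion in the conservative direction raises the mean evidence of both the 'seen' and the 'unseen' target trials, even though sensitivity is unchanged. Any measure that tracks internal evidence would be expected to shift accordingly.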

      (2) The author has utilized decoding target vs. nontarget as the neural measures of unconscious and/or conscious processing. However, it is important to note that this is just one of the many neural measures used in the field. There are an increasing number of studies that focus on decoding the conscious content, such as target location or target category. If the author were to include results on decoding target orientation and how it may be influenced by response criterion, the field would greatly benefit from this paper.

      We thank the reviewer for the suggestion to decode orientation of the target. In our experiments, the target itself does not have an orientation, but the texture of which it is composed does. We used four orientations, which were balanced out within and across conditions such that presence-absence decoding is never driven by orientation, but rather by texture based figure-ground segregation (for similar logic, see for example Fahrenfort et al, 2007; 2008 etc). There are a couple of things to consider when wanting to apply a decoding analysis on the orientation of these textures:

      (1) Our behavioral task was only on the presence or absence of the target, not on the orientation of the textures. This makes it impossible to draw any conclusions about the visibility of the orientation of the textures. Put differently: based on behavior there is no way of identifying seen or unseen orientations, correctly or incorrectly identified orientations, etc. For example, it is easy to envision that an observer detects a target without knowing the orientation that defines it, or vice versa a situation in which an observer does not detect the target while still being aware of the orientation of a texture in the image (either of the figure, or of the background). The fact that we have no behavioral response to the orientation of the textures severely limits the usefulness of a hypothetical decoding effect on these orientations, as such results would be uninterpretable with respect to the relevant dimension in this experiment, which is visibility.

      (2) This problem is further exacerbated by the fact that the orientation of the background is always orthogonal to the orientation of the target. Therefore, one would not only be decoding the orientation of the texture that constitutes the target itself, but also the texture that constitutes the background. Given that we also have no behavioral metric of how/whether the orientation of the background is perceived, it is similarly unclear how one would interpret any observed effect.

      (3) Finally, it is important to note that – even when categorization/content is sometimes used as an auxiliary measure in consciousness research (often as a way to assay objective performance) - consciousness is most commonly conceptualized on the presence-absence dimension. A clear illustration of this is the concept of blindsight. Blindsight is the ability of observers to discriminate stimuli (i.e. identify content) without being able to detect them. Blindsight is often considered the bedrock of the cognitive neuroscience of consciousness as it acts as proof that one can dissociate between unconscious processing (the categorization of a stimulus, i.e. the content) and conscious processing of that stimulus (i.e. the ability to detect it).

      Given the above, we do not see how the suggested analysis could contribute to the conclusions that the manuscript already establishes. We hope that – given the above - the reviewer agrees with this assessment.

      Reviewer #3 (Public review):

      Summary:

      Fahrenfort et al. investigate how liberal or conservative criterion placement in a detection task affects the construct validity of neural measures of unconscious cognition and conscious processing. Participants identified instances of "seen" or "unseen" in a detection task, a method known as post hoc sorting. Simulation data convincingly demonstrate that, counterintuitively, a conservative criterion inflates effect sizes of neural measures compared to a liberal criterion. While the impact of criterion shifts on effect size is suggested by signal detection theory, this study is the first to address this explicitly within the consciousness literature. Decoding analysis of data from two EEG experiments further shows that different criteria lead to differential effects on classifier performance in post hoc sorting. The findings underscore the pervasive influence of experimental design and participants report on neural measures of consciousness, revealing that criterion placement poses a critical challenge for researchers.

      Strengths and Weaknesses:

      One of the strengths of this study is the inclusion of the Perceptual Awareness Scale (PAS), which allows participants to provide more nuanced responses regarding their perceptual experiences. This approach ensures that responses at the lowest awareness level (selection 0) are made only when trials are genuinely unseen. This methodological choice is important as it helps prevent the overestimation of unconscious processing, enhancing the validity of the findings.

      A potential area for improvement in this study is the use of single time-points from peak decoding accuracy to generate current source density topography maps. While we recognize that the decoding analysis employed here differs from traditional ERP approaches, the robustness of the findings could be enhanced by exploring current source density over relevant time windows. Event-related peaks, both in terms of timing and amplitude, can sometimes be influenced by noise or variability in trial-averaged EEG data, and a time-window analysis might provide a more comprehensive and stable representation of the underlying neural dynamics.

      We thank reviewer 3 for their positive words and for taking the time to evaluate our manuscript. If we understand the reviewer correctly, he/she suggests that the signal-to-noise ratio could be improved by averaging over time windows rather than taking the values at singular peaks in time. Before addressing this suggestion, we would like to point out that we plotted the relevant effects across time in Supplementary Figure S1A and S1B. These show that the observed effects were not somehow limited in time, i.e. only occurring around the peaks, but that they consistently occurred throughout the time course of the trial. In line with this observation one might argue that the results could be improved further by averaging across windows of interest rather than taking the peak moments alone, as the reviewer suggests. Although this might be true, there are many analysis choices that one can make, each of which could have a positive (or negative) effect on the signal-to-noise ratio. For example, when taking a window of interest, one is faced with a new choice to make, this time regarding the number of consecutive samples to average across (i.e. the size of the window), etc. More generally there is a long list of choices that may affect the precise outcome of analyses, either positively or negatively. Having analyzed the data in one way, the problem with adding new analysis approaches is that there is no objective criterion for deciding which analysis would be ‘best’, other than looking at the outcome of the statistical analyses themselves. Doing this would constitute an explorative double-dipping-like approach to analyzing the results, which – aside from potentially increasing the signal-to-noise ratio – is likely to also result in an increase of the type I error rate. 
In the past, when the first author of this manuscript has attempted to minimize the number of statistical tests, he has lowered the number of EEG time points by simply taking the peaks (for example see https://doi.org/10.1073/pnas.1617268114), and that is the approach that was taken here as well. Given the above, we prefer not to further ‘try out’ additional analytical approaches on this dataset, simply to improve the results. We hope the reviewer sympathizes with our position that it is methodologically most sound to stick to the analyses we have already performed and reported, without further exploration.

      It is helpful that the authors show the standard error of the mean for the classifier performance over time. A similar indication of a measure of variance in other figures could improve clarity and transparency.

      That said, the paper appears solid regarding technical issues overall. The authors also do a commendable job in the discussion by addressing alternative paradigms, such as wagering paradigms, as a possible remedy to the criterion problem (Peters & Lau, 2015; Dienes & Seth, 2010). Their consideration of these alternatives provides a balanced view and strengthens the overall discussion.

      We thank the reviewer for this suggestion. Note that we already have a measure of variance in the other figures too, namely showing the connected data points of individual participants. Individual data points as a visualization of variance are preferred by many journals (e.g., see https://www.nature.com/documents/cr-gta.pdf), and also show the spread of relevant differences when paired points are connected. For example, in Figures 2, 3 and 4, the relevant difference is between the liberal and conservative condition. When wanting to show the spread of the differences between these conditions, one option would be to first subtract the two measures in a pairwise fashion (e.g., liberal-conservative), and then plot the spread of those differences using some metric (e.g. standard error/CI of the mean difference). However, this has the disadvantage of no longer separately showing the raw scores on the conditions that are being compared. Showing conditions separately provides clarity to the reader about what is being compared to what. The most common approach to visualizing the variance of the relevant difference in such cases is to plot the connected individual data points of all participants in the same plot. The uniformity of the slope of these lines in such a visualization provides direct insight into the spread of the relevant difference. Plotting the standard error of the mean on the raw scores of the conditions in these plots would not help, because this would not visualize the spread of the relevant difference (liberal-conservative). We therefore opted in the manuscript to show the mean scores on the conditions that we compare, while also showing the connected raw data points of individual participants in the same plot. One might argue that we should then use that same visualization in figure 3A, but note that this figure is merely intended to identify the peaks, i.e. it does not compare liberal to conservative. 
Furthermore, plotting the decoding time lines of individual participants would greatly diminish the clarity of this figure. Given our explanation, we hope the reviewer agrees with the approach that we chose, although we are of course open to modifying the figures if the reviewer has a suggestion for doing so while taking into account the points we raise here in our response.
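
The point that per-condition standard errors do not convey the spread of a paired difference can be shown with a small sketch. The numbers are synthetic scores for five hypothetical participants, chosen only for illustration; they are not data from the study, and the helper `sem` is not taken from the authors' analysis code.

```python
import math
from statistics import mean, stdev


def sem(values: list[float]) -> float:
    """Standard error of the mean (sample standard deviation / sqrt(n))."""
    return stdev(values) / math.sqrt(len(values))


# Synthetic scores for five hypothetical participants (not study data).
# Participants differ a lot from each other, but each shows a similar shift.
liberal = [0.55, 0.62, 0.70, 0.78, 0.85]
conservative = [0.60, 0.66, 0.75, 0.82, 0.90]

# Pairwise differences within each participant.
differences = [c - l for c, l in zip(conservative, liberal)]

print("SEM liberal:       ", sem(liberal))
print("SEM conservative:  ", sem(conservative))
print("Mean difference:   ", mean(differences))
print("SEM of differences:", sem(differences))
```

With data of this shape, the per-condition standard errors are dominated by between-participant variability, while the standard error of the paired differences is much smaller. Connected individual data points expose this directly through the near-uniform slopes of the lines.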

      Impact of the Work:

      This study effectively demonstrates a phenomenon that has been largely unexplored within the consciousness literature. Subjective measures may not reliably capture the construct they aim to measure due to criterion confounds. Future research on neural measures of consciousness should account for this issue, and no-report measures may be necessary until the criterion problem is resolved.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors could further elaborate on the results of the PAS to provide a clearer insight into the impact of response criteria, which is notably more complex than in other experiments. Specifically, the results demonstrate that conservative response criterion condition displays a considerably higher sensitivity compared to those with a liberal response criterion. It would be interesting to explore whether this shift in sensitivity suggests a correlation between changes in response criteria and conscious experiences, and how the interaction between sensitivity and response criteria can affect the neural measure of consciousness.

      We thank the reviewer for this suggestion. Note that the change in sensitivity that we observed is minor compared to the change we observed in response criterion (Hedges’ g criterion in exp 2 = 2.02, compared to Hedges’ g sensitivity/d’ in exp 2 = 0.42). However, we do investigate the effect of sensitivity (disregarding response criterion) on decoding accuracy. To this end we devised Figure 3C (for the full decoding time course see Supplementary Figure S1B). These figures show that the small behavioral sensitivity effects observed in both experiments (Hedges’ g sensitivity in exp 1 = 0.30, exp 2 = 0.42) did not translate into significant decoding differences between conservative and liberal in either experiment. This comes as no surprise given the small corresponding behavioral effects. Note that small sensitivity differences between liberal and conservative conditions are commonplace, plausibly driven by the fact that being liberal also involves being more noisy in one’s response tendencies (i.e. sometimes randomly indicating presence). Further, the reviewer suggests that we might correlate changes in response criteria to changes in conscious experience. The only relevant metric of conscious experience for which we have data in this manuscript is the Perceptual Awareness Scale (PAS), so we assume the reviewer asks for a correlation between experimentally induced changes in response criterion with the equivalent changes in d’. To this end we computed the difference in the PAS-based d’ metric between conservative and liberal, as well as the difference in the PAS-based criterion metric between conservative and liberal, and correlated these across subjects (N=26) using a Spearman rank correlation. The result shows that these metrics do not correlate, r(24)=0.04, p=0.85. Note however that small-N correlations like these are only somewhat reliable for large effect sizes. 
With an N of 26, even a power of just 80% requires an effect size of at least r=0.5 to be detectable, so even if a correlation were to exist we may not have had enough power to detect it. Due to these caveats we opted to not report this null-correlation in the manuscript, but we are of course willing to do so if the reviewer and/or editor disagrees with this assessment.
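
The power statement above can be checked approximately with a short sketch. It assumes a two-sided test at alpha = 0.05 and uses the Fisher z approximation for a Pearson correlation; for a Spearman rank correlation this is only a rough guide. The function name `min_detectable_r` is hypothetical and the calculation is not taken from the authors' analysis.

```python
import math
from statistics import NormalDist


def min_detectable_r(n: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest |r| detectable with the given power in a two-sided test.

    Uses the Fisher z approximation, in which atanh(r) has a standard
    error of 1 / sqrt(n - 3).
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)              # quantile for the desired power
    return math.tanh((z_alpha + z_power) / math.sqrt(n - 3))


# Minimum detectable correlation for the sample size mentioned in the response.
print(round(min_detectable_r(26), 2))
```

Under these assumptions the minimum detectable correlation for N = 26 comes out slightly above 0.5, in line with the figure quoted in the response.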

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Tubert C. et al. investigated the role of dopamine D5 receptors (D5R) and their downstream potassium channel, Kv1, in the striatal cholinergic neuron pause response induced by thalamic excitatory input. Using slice electrophysiological analysis combined with pharmacological approaches, the authors tested which receptors and channels contribute to the cholinergic interneuron pause response in both control and dyskinetic mice (in the L-DOPA off-state). They found that activation of Kv1 was necessary for the pause response, while activation of D5R blocked the pause response in control mice. Furthermore, in the L-DOPA off-state of dyskinetic mice, the absence of the pause response was restored by the application of clozapine. The authors claimed that (1) the D5R-Kv1 pathway contributes to the cholinergic interneuron pause response in a phasic dopamine concentration-dependent manner, and (2) clozapine inhibits D5R in the L-DOPA off-state, which restores the pause response.

      Strengths:

      The electrophysiological and pharmacological approaches used in this study are powerful tools for testing channel properties and functions. The authors' group has well-established these methodologies and analysis pipelines. Indeed, the data presented were robust and reliable.

      Thank you for your comments.

      Weaknesses:

      Although the paper has strengths in its methodological approaches, there is a significant gap between the presented data and the authors' claims.

      There was no direct demonstration that the D5R-Kv1 pathway is dominant when dopamine levels are high. The term 'high' is ambiguous, and it raises the question of whether the authors believe that dopamine levels do not reach the threshold required to activate D5R under physiological conditions.

      We acknowledge that further work is necessary to clarify the role of the D5R in physiological conditions. While we haven’t found effects of the D1/D5 receptor antagonist SCH23390 on the pause response in control animals (Fig. 3), it is still possible that dopamine levels reach the threshold to stimulate D5R when burst firing of dopaminergic neurons contributes to dopamine release. We believe the pause response depends, among other factors, on the relative stimulation levels of SCIN D2 and D5 receptors, which is likely not an all-or-nothing phenomenon. To reduce ambiguity, we have eliminated the labels referring to dopamine levels in Figure 6F.

      Furthermore, the data presented in Figure 6 are confusing. If clozapine inhibits active D5R and restores the pause response, the D5R antagonist SCH23390 should have the same effect. The data suggest that clozapine-induced restoration of the pause response might be mediated by other receptors, rather than D5R alone.

      Thank you for letting us clarify this issue. Please note that the levels of endogenous dopamine 24 h after the last L-DOPA challenge in severe parkinsonian mice are expected to be very low. In the absence of an agonist, a pure D1/D5 antagonist would not exert an effect, as demonstrated with SCH23390 alone, which did not have an impact on the SCIN response to thalamic stimulation (Fig. 6). While clozapine can also act as a D1/D5 receptor antagonist, its D1/D5 effects in the absence of an agonist are attributed to its inverse agonist properties (PMID: 24931197). Notably, SCH23390 prevented the effect of clozapine, allowing us to conclude that ligand-independent D1/D5 receptor-mediated mechanisms are involved in suppressing the pause response in dyskinetic mice. We have now made this clearer in the third paragraph of the Discussion.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Tubert et al. presents the role of the D5 receptor in modulating the striatal cholinergic interneuron (CIN) pause response through D5R-cAMP-Kv1 inhibitory signaling. Their model elucidates the on/off switch of the CIN pause, likely due to the different DA affinity between D2R and D5R. This machinery may be crucial in modulating synaptic plasticity in cortical-striatal circuits during motor learning and execution. Furthermore, the study bridges their previous finding of CIN hyperexcitability (Paz et al., Movement Disorders 2022) with the loss of pause response in LID mice.

      Strengths:

      The study had solid findings, and the writing was logically structured and easy to follow. The experiments are well-designed, and they properly combined electrophysiology recording, optogenetics, and pharmacological treatment to dissect/rule out most, if not all, possible mechanisms in their model.

      Thank you for your comments.

      Weaknesses:

      The manuscript is overall satisfying with only some minor concerns that need to be addressed. Manipulation of intracellular cAMP (e.g. using pharmacological analogs or inhibitors) can add additional evidence to strengthen the conclusion.

      Thank you for the suggestion. While we acknowledge that we are not providing direct evidence of the role of cAMP, we chose not to conduct these experiments because cAMP levels influence several intrinsic and synaptic currents beyond Kv1, significantly affecting membrane oscillations and spontaneous firing, as shown in Paz et al. 2021. However, we are modifying the fourth paragraph of the Discussion so there is no misinterpretation about our findings in the current work.

      Reviewer #3 (Public review):

      Summary:

      Tubert et al. investigate the mechanisms underlying the pause response in striatal cholinergic interneurons (SCINs). The authors demonstrate that optogenetic activation of thalamic axons in the striatum induces burst activity in SCINs, followed by a brief pause in firing. They show that the duration of this pause correlates with the number of elicited action potentials, suggesting a burst-dependent pause mechanism. The authors demonstrate that this burst-dependent pause relies on Kv1 channels. The pause is blocked by SKF81297 and partially by sulpiride and mecamylamine, implicating D1/D5 receptor involvement. The study also shows that ZD7288 does not reduce the duration of the pause and that lesioning dopamine neurons abolishes this response, which can be restored by clozapine.

      Weaknesses:

      While this study presents an interesting mechanism for SCIN pausing after burst activity, there are several major concerns that should be addressed:

      (1) Scope of the Mechanism:

      It is important to clarify that the proposed mechanism may apply specifically to the pause in SCINs following burst activity. The manuscript does not provide clear evidence that this mechanism contributes to the pause response observed in behavioral animals. While the thalamus is crucial for SCIN pauses in behavioral contexts, the exact mechanism remains unclear. Activating thalamic input triggers burst activity in SCINs, leading to a subsequent pause, but this mechanism may not be generalizable across different scenarios. For instance, approximately half of TANs do not exhibit initial excitation but still pause during behavior, suggesting that the burst-dependent pause mechanism is unlikely to explain this phenomenon. Furthermore, in behavioral animals, the duration of the pause seems consistent, whereas the proposed mechanism suggests it depends on the prior burst, which is not aligned with in vivo observations. Additionally, many in vivo recordings show that the pause response is a reduction in firing rate, not complete silence, which the mechanism described here does not explain. Please address these in the manuscript.

      Thank you for your valuable feedback. While the absence of an initial burst in some TANs in vivo may suggest the involvement of alternative or additional mechanisms, this does not exclude the participation of Kv1 currents. We have seen that subthreshold depolarizations induced by thalamic inputs are sufficient to produce an afterhyperpolarization (AHP) mediated by Kv1 channels (see Tubert et al., 2016, PMID: 27568555). Although such subthreshold depolarizations are not captured in current recordings from behaving animals, intracellular in vivo recordings have demonstrated an intrinsically generated AHP after subthreshold depolarization of SCIN caused by stimulation of excitatory afferents (PMID: 15525771). Additionally, when pause duration is plotted against the number of spikes elicited by thalamic input (Fig. 1G), we found that one elicited spike is followed by an interspike interval 1.4 times longer than the average spontaneous interspike interval. We acknowledge the potential involvement of additional factors, including a decrease of excitatory thalamic input coinciding with the pause, followed by a second volley of thalamic inputs (Fig. 1J-K, after observations by Matsumoto et al., 2001; PMID: 11160526), as well as the timing of elicited spikes relative to ongoing spontaneous firing (Fig. 1D-E). Dopaminergic modulation (Fig. 3) and regional differences among striatal regions (PMID: 24559678) may also contribute to the complexity of these dynamics.

      (2) Terminology:

      The use of "pause response" throughout the manuscript is misleading. The pause induced by thalamic input in brain slices is distinct from the pause observed in behavioral animals. Given the lack of a clear link between these two phenomena in the manuscript, it is essential to use more precise terminology throughout, including in the title, bullet points, and body of the manuscript.

      While we acknowledge that our study does not include in vivo evidence, we believe ex vivo preparations have been instrumental in elucidating the mechanisms underlying the responses observed in vivo. We also agree with previous ex vivo studies in using consistent terminology. However, we will clarify the ex vivo nature of our work in the abstract and bullet points for greater transparency.

      (3) Kv1 Blocker Specificity:

      It is unclear how the authors ruled out the possibility that the Kv1 blocker did not act directly on SCINs. Could there be an indirect effect contributing to the burst-dependent pause? Clarification on this point would strengthen the interpretation of the results.

      Thank you for letting us clarify this issue. In our previous work (Tubert et al., 2016) we showed that the Kv1.3 and Kv1.1 subunits are selectively expressed in SCIN throughout the striatum. Moreover, GABAergic transmission is blocked in our preparations. We are including a phrase to make this clearer in the manuscript (Results section, subheading “The pause response to thalamic stimulation requires activation of Kv1 channels”).

      (4) Role of D1 Receptors:

      While it is well-established that activating thalamic input to SCINs triggers dopamine release, contributing to SCIN pausing (as shown in Figure 3), it would be helpful to assess the extent to which D1 receptors contribute to this burst-dependent pause. This could be achieved by applying the D1 agonist SKF81297 after blocking nAChRs and D2 receptors.

      Thank you for letting us clarify this point. We show that blocking D2R or nAChR reduces the pause only for strong thalamic stimulation eliciting 4 SCIN spikes (Figure 3G), whereas the D1/D5 agonist SKF81297 is able to reduce the pause induced by weaker stimulation as well (Figure 3C). In addition, the D1/D5 receptor antagonist SCH23390 does not modify the pause response (Figure 3C). This may indicate that nAChR-mediated dopamine release induced by thalamic-induced bursts more efficiently activates D2R compared to D5R. We speculate that, in this context, lack of D5R activation may be necessary to keep normal levels of Kv1.3 currents necessary for SCIN pauses.

      (5) Clozapine's Mechanism of Action:

      The restoration of the burst-dependent pause by clozapine following dopamine neuron lesioning is interesting, but clozapine acts on multiple receptors beyond D1 and D5.

      Although it may be challenging to find a specific D5 antagonist or inverse agonist, it would be more accurate to state that clozapine restores the burst-dependent pause without conclusively attributing this effect to D5 receptors.

      Thank you for your insightful observation. We acknowledge the difficulty of targeting dopamine receptors pharmacologically due to the lack of highly selective D1/D5 inverse agonists. We used SCH23390, which is a highly selective D1/D5 receptor antagonist devoid of inverse agonist effects, to block clozapine’s ability to restore SCIN pauses (Figure 6C). This indicates that the restoration of SCIN pauses by clozapine depends on D1/D5 receptors. Furthermore, in a previous study, we demonstrated that clozapine’s effect on restoring SCIN excitability in dyskinetic mice (a phenomenon mediated by Kv1 channels in SCIN; Tubert et al., 2016) was not due to its action on serotonin receptors (Paz, Stahl et al., 2022). While our data do not rule out the potential contribution of other receptors, such as muscarinic acetylcholine receptors, we believe they strongly support the role of D1/D5 receptors. To reflect this, we added a statement discussing the potential contribution of receptors beyond D1/D5 in the last paragraph of the Discussion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The effect of MgTx was not consistent with the previous study (Tubert, 2016). I expected MgTx to increase the basal firing rate of cholinergic interneurons.

      Thank you for highlighting this. In our previous study we used ACSF in the recording pipette, instead of the intracellular solution (higher in potassium) used in the present study. This is likely related to the higher spontaneous firing rates of SCIN observed in the present study, which made the SCIN response stand out. In addition, our previous study analyzed the effect of MgTx on the spontaneous firing frequency of SCIN isolated from major circuit regulation by adding CNQX and picrotoxin to the bath, while in this study we needed to preserve the thalamic input and only picrotoxin was used in the bath. Given these differences, the two conditions are not strictly comparable but rather give complementary information.

      (2) In the text, the authors claim that "SCINs recorded in the parkinsonian OFF-L-DOPA condition show an increase in membrane excitability that mimics changes acutely induced by SKF81297 in SCINs from control mice." However, the data for SKF81297 do not support this claim.

      We modified the text to make it clearer that the cited phrase refers to a previous publication (PMID: 35535012) in which SCIN intrinsic excitability was characterized via analysis of responses to somatic current injection in whole-cell recordings. In the present study Fig. 3D shows SKF81297 effects on interspike intervals during spontaneous activity with a trend towards increased firing, and Fig. 4E a lack of effect on “burst duration” for responses with different numbers of spikes elicited by thalamic afferent stimulation. 

      (3) I recommend testing whether other receptors, such as D2R, contribute to the clozapine-induced pause response in the L-DOPA off state.

      Thank you for your suggestion. We acknowledge that studying the role of D2R is important. However, our preliminary data suggest that a comprehensive follow-up study, which is beyond the scope of this manuscript, is necessary to understand their role.

      Reviewer #2 (Recommendations for the authors):

      (1) For Figure 1D-E, I understand that the authors are trying to state that the previous spontaneous spike contributes to a hyperpolarized window that induces a delay in the evoked spikes. However, it is almost impossible to discriminate between spontaneous and evoked spikes in this experiment. Furthermore, considering the tonic firing property, I highly suspect that even a sham control design (no optogenetic light) will give you a similar distribution as in Figure 1E (the longer in X1, the shorter in X2).

      We agree that some spikes following stimulus onset may have occurred independently of the light stimulus, as is also possible during behavioral tasks. We used the baseline recordings to estimate the effects of a sham stimulus as requested and included the data in Fig. 1E-F. As expected, the sham stimulation data showed a similar inverse relationship with the time elapsed from the preceding spike, but latencies were longer than with the stimulus (except for times close to the average ISI), suggesting that the optical stimulation increased the probability of evoking a spike (Fig. 1F). Remarkably, the pause following this threshold stimulation was significantly longer than the baseline ISI, as reported in the main text (Results section, last sentence of first paragraph).

      (2) The authors used optogenetics to activate thalamic inputs and induce the pause after bursts. Considering that CINs also receive inputs from different brain regions, e.g. cortex, does this phenomenon/pause after bursts also exist following cortical inputs?

      We did not study the SCIN response to cortical inputs, but both thalamic and cortical inputs seem to drive SCIN pause-responses as observed by others (PMID: 24553950).  

      (3) The effects of D5R inverse agonism, alone and in combination with a D5R agonist and antagonist, faithfully reveal/confirm the increase in ligand-independent activity of D5R in LID reported previously. It would be ideal to also directly modulate intracellular cAMP (as in the 2022 paper) to confirm the rescue effects on the CIN pause response.

      Please, see our response in the public review.

      (4) In healthy conditions, the balance between D2R and D5R signaling (shown in Figure 6F left) switches the pause and no pause modes which potentially contributes to cortical-striatal plasticity. How about in LID off L-DOPA condition? Is it possible to rescue/modulate the pause on/off mode by D2R agonism in LID?

      We haven’t tested the effect of D2 agonists yet, but this is scheduled for follow-up studies.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors use the ratio of pause duration to baseline ISI to describe the pause, which is useful for detecting significant differences. However, it would be beneficial to also report the actual duration of the burst-dependent pause to provide readers with a clearer understanding of the variation in pauses.

      In all figures we report the average baseline ISI duration for each experiment / experimental condition, allowing readers to estimate actual pause durations. We added in the main text actual average pause durations corresponding to Fig. 1H, which are representative of those observed along the study.

      (2) In Figure 3D, a more detailed comparison would be helpful, as there appears to be a significant difference between the SKF81297 group and others.

      We acknowledge that there might be a trend towards reduced ISIs; however, it was not statistically significant (see legend of Figure 3). In addition, the effect of SKF81297 seems unrelated to this trend, as its effect is also seen under the effect of ZD7288, which substantially prolongs the baseline ISI (Fig. 4E-F).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment  

      This manuscript reports important findings that the methyltransferase METTL3 is involved in the repair of abasic sites and uracil in DNA, mediating resistance to floxuridine-driven cytotoxicity. Convincing evidence shows the involvement of m6A in DNA based on single cell imaging and mass spec data. The authors present evidence that the m6A signal does not result from bacterial contamination or RNA, but the text does not make this overly clear.

      We thank the editors for recognizing the importance of our work and the relevance of METTL3 and 6mA in DNA repair. We agree the evidence presented can be regarded as convincing, in that it includes validation with orthogonal approaches and excludes the source of 6mA being RNA or bacterial contamination.

      To clarify, the identification of 6mA in DNA, upon DNA damage, is based first on immunofluorescence observations using an anti-m6A antibody. In this setting, removal of RNA with RNase treatment fails to reduce the 6mA signal, excluding the possibility that the source of signal is RNA. In contrast, removal of DNA with DNase treatment removes all 6mA signal, strongly suggesting that the species carrying the N6-methyladenosine modification is DNA (Figure 3D, E). Importantly, in Figure 3F, G, we provide orthogonal, quantitative mass spectrometry data that independently confirm this finding. Mass spectrometry-liquid chromatography of DNA analytes conclusively shows the presence of 6mA in DNA upon treatment with DNA damaging agents and excludes RNA as the source, based on exact mass.

      Cells only show the 6mA signal when treated with DNA damaging agents, and the 6mA is absent from untreated cells (Figure 3D, E, H, I). This provides strong evidence that the 6mA signal is not a result of bacterial contamination in our cell lines. Furthermore, our cell lines are routinely tested for mycoplasma contamination. It could be possible that stock solutions of DNA damaging agents may be contaminated, but this would need to be true for all individual drugs and stocks tested, which is highly unlikely. Moreover, the data showing 6mA signal is not significantly different from untreated cells when a DNA damaging agent is combined with a METTL3 inhibitor (Figure 3H, I) provides strong evidence against bacterial contamination in our stocks.  

      In summary, we provide conclusive evidence, based on orthogonal methods, that the METTL3-dependent N6-methyladenosine modification is deposited in DNA, not RNA, in response to DNA damage and have now clarified these points in the results and discussion. 

      Public Reviews:  

      Reviewer #1 (Public review):  

      Summary:  

      The authors sought to identify unknown factors involved in the repair of uracil in DNA through a CRISPR knockout screen.  

      Strengths:  

      The screen identified both known and unknown proteins involved in DNA repair resulting from uracil or modified uracil base incorporation into DNA. The conclusion is that the protein activity of METTL3, which converts A nucleotides to 6mA nucleotides, plays a role in the DNA damage/repair response. The importance of METTL3 in DNA repair, and its colocalization with a known DNA repair enzyme, UNG2, is well characterized.  

      Weaknesses:  

      This reviewer identified no major weaknesses in this study. The manuscript could be improved by tightening the text throughout, and more accurate and consistent word choice around the origin of U and 6mA in DNA. The dUTP nucleotide is misincorporated into DNA, and 6mA is formed by methylation of the A base present in DNA. Using words like 6mA "deposition in DNA" seems to imply it results from incorporation of a methylated dATP nucleotide during DNA synthesis.  

      The increased presence of 6mA during DNA damage could result from methylation at the A base itself (within DNA) or from incorporation of pre-modified 6mA during DNA synthesis. Our data do not directly discriminate between these two mechanisms, and we clarified this point in the discussion.  

      Reviewer #2 (Public review):  

      Summary:  

      In this work, the authors performed a CRISPR knockout screen in the presence of floxuridine, a chemotherapeutic agent that incorporates uracil and fluoro-uracil into DNA, and identified unexpected factors, such as the RNA m6A methyltransferase METTL3, as required to overcome floxuridine-driven cytotoxicity in mammalian cells. Interestingly, the observed N6-methyladenosine was embedded in DNA, which has been reported as DNA 6mA in mammalian genomes and is currently confirmed with mass spectrometry in this model. Therefore, this work consolidated the functional role of mammalian genomic DNA 6mA, and supported with solid evidence to uncover the METTL3-6mA-UNG2 axis in response to DNA base damage.  

      Strengths:  

      In this work, the authors took an unbiased, genome-wide CRISPR approach to identify novel factors involved in uracil repair with potential clinical interest.  

      The authors designed elegant experiments to confirm that METTL3 works through genomic DNA, adding the methylation to DNA (6mA) but not RNA (m6A), in this base damage repair context. The authors employ different enzymes, such as RNase A, RNase H, and DNase, as well as liquid chromatography coupled to tandem mass spectrometry, to validate that METTL3 deposits 6mA in DNA in response to agents that increase genomic uracil.

      They also have the Mettl3-KO and the METTL3 inhibition results to support their conclusion.  

      Weaknesses:  

      Although this study demonstrates that METTL3-dependent 6mA deposition in DNA is functionally relevant to DNA damage repair in mammalian cells, there are still several concerns and issues that need to be improved to strengthen this research.  

      First, in the whole paper, the authors never claim or mention the mammalian cell lines contamination testing result, which is the fundamental assay that has to be done for the mammalian cell lines DNA 6mA study.  

      Our cell lines are routinely tested for bacterial contamination, specifically mycoplasma, and we state this information in the revised manuscript. 

      Importantly, we do not observe 6mA in untreated cells, strongly suggesting that the 6mA signal observed is dependent on the presence of DNA damage and not caused by contamination in the cell lines (Figure 3D, E, H, I). While it could be possible that stock solutions of DNA damaging agents may be contaminated, this would need to be the case for all individual drugs and stocks tested that induce 6mA, which is very unlikely. Finally, the data showing 6mA signal is not significantly different from untreated cells when a DNA damaging agent is combined with a METTL3 inhibitor (Figure 3 H, I) provides strong evidence against bacterial contamination in our drug stocks.  

      Second, in the whole work, the authors have not supplied any genomic sequencing data to support their conclusions. Although the sequencing of DNA 6mA in mammalian models is challenging, recent breakthroughs in sequencing techniques, such as DR-Seq or NT/NAME-seq, have lowered the bar and improved a lot in the 6mA sequencing assay. Therefore, the authors should consider employing the sequencing methods to further confirm the functional role of 6mA in base repair.  

      While we agree that it could be important to understand the precise genomic location of 6mA in relation to DNA damage, this is outside the scope of the current study. Moreover, this exercise may prove unproductive. If 6mA is enriched in DNA at damage sites or as DNA is replicated, the genomic mapping of 6mA is likely to be stochastic. If stochastic, it would be impossible to obtain the read depth necessary to map 6mA accurately. 

      Third, the authors used the METTL3 inhibitor and Mettl3-KO to validate the METTL3-6mA-UNG2 functional roles. However, a catalytic mutant and rescue of Mettl3 would be further experiments to confirm the conclusion.

      We believe this to be an excellent suggestion from Reviewer #2 but we are unable to perform the proposed experiment at this time. We encourage future studies to explore the rescue experiment.  

      Reviewer #3 (Public review):  

      Summary:  

      The authors are showing evidence that they claim establishes the controversial epigenetic mark, DNA 6mA, as promoting genome stability.  

      Strengths:  

      The identification of a poorly understood protein, METTL3, and its subsequent characterization in DDR is of high quality and interesting.  

      Weaknesses:  

      (1) The very presence of 6mA (DNA) in mammalian DNA is still highly controversial and numerous studies have been conclusively shown to have reported the presence of 6mA due to technical artifacts and bacterial contamination. Thus, to my knowledge there is no clear evidence for 6mA as an epigenetic mark in mammals, and consequently, no evidence of writers and readers of 6mA. None of this is mentioned in the introduction. Much of the introduction can be reduced, but a paragraph clearly stating the controversy and lack of evidence for 6mA in mammals needs to be added, otherwise, the reader is given an entirely distorted view of the field.  

      These concerns must also be clearly stated in the limitations section and even in the results section, which fails to nuance the authors' findings.

      We agree with the reviewer that the presence and potential function of 6mA in mammalian DNA has been debated. Importantly, the debate regarding the presence and quantity of 6mA in DNA has been previously restricted to undamaged, baseline conditions. In complete agreement with this notion, we do not detect appreciable levels of 6mA in untreated cells. We revised the introduction section to present the debate about 6mA in DNA. We, however, want to highlight that our study provides, for the first time, convincing evidence (based on two orthogonal methods) that 6mA is present in DNA in response to a stimulus, DNA damage. We do not claim or provide any data that suggest 6mA is a baseline epigenetic mark.  

      (2) What is the motivation for using HT-29 cells? Moreover, the materials and methods do not state how the authors controlled for bacterial contamination, which has been the most common cause of erroneous 6mA signals to date. Did the authors routinely check for mycoplasma? 

      HT-29 is a cell line of colorectal origin, and chemotherapeutic agents that introduce uracil and uracil derivatives in DNA, such as those used in this study, are relevant for the treatment of colorectal cancer. As indicated above, we do not observe 6mA in untreated cells, strongly suggesting that the 6mA signal observed is dependent on DNA damage and not caused by a potential bacterial contamination (Figure 3D, E, H, I). Additionally, our cell lines are routinely tested for bacterial contamination, specifically mycoplasma.

      (3) The single cell imaging of 6mA in various cells is nice. The results are confirmed by mass spec as an orthogonal approach. Another orthogonal and quantitative approach to assessing 6mA levels would be PacBio. Similarly, it is unclear why the authors have not performed dot-blots of 6mA for genomic DNA from the given cell lines.

      We are confused by this point since an orthogonal approach to detect 6mA, mass spectrometry-liquid chromatography, was employed. This method does not use an antibody and confirms the increase of 6mA in DNA when cells were treated with DNA damaging agents. These data are presented in Figure 3F, G.

      It is sensible to hypothesize that the localization of 6mA is consistent with DNA replication (like uracil deposition). In this event, the genomic mapping of 6mA is likely to be stochastic. This would make quantification with PacBio sequencing difficult because it would be very challenging to achieve the appropriate read depth to call a modified base. 

      Dot blots rely on an antibody and thus are not truly orthogonal to our immunofluorescence-based measurements. We preferred the mass spectrometry-liquid chromatography approach we took as a true orthogonal approach.  

      (4) The results of Figure 3 need further investigation and validation. If the results are correct, the authors are suggesting that the majority of 6mA in their cell lines is present in the DNA, and not the RNA, which is completely contrary to every other study of 6mA in mammalian cells that I am aware of. This could suggest that the antibody is not, in fact, binding to 6mA, but to unmodified adenine, which would explain why the signal disappears after DNase treatment. Indeed, binding to unmethylated DNA is a commonly known problem with most 6mA antibodies and is well described elsewhere.

      Based on this and the following comment, we are convinced that Reviewer #3 has overlooked two critical elements of our study:

      First, the immunofluorescence work presented in Figure 3, showing 6mA signal in response to DNA damage, uses cells that were pre-extracted to remove excess cytoplasmic RNA. This method is often used in immunofluorescence experiments of this kind. The pre-extraction method removes most of the cytoplasmic content, and the majority of the cytoplasmic m6A RNA signal. Supplementary Figure 3D shows cells that have not been pre-extracted prior to staining. These images show the cytoplasmic m6A signal is abundant if we do not perform the pre-extraction step. 

      If the antibody used to label 6mA significantly reacted with unmodified adenine, we would expect a large signal in untreated or untreated and denatured conditions. In contrast, an increase in 6mA is not observed in either case.

      Second, the orthogonal approach we employed, mass spectrometry coupled with liquid chromatography, measures 6mA DNA analytes specifically by exact mass. This approach does not depend on an antibody and yields results consistent with those from the immunofluorescence experiments. 

      (5) Given the lack of orthogonal validation of the observed DNA 6mA and the lack of evidence supporting the presence of 6mA in mammalian DNA and consequently any functional role for 6mA in mammalian biology, the manuscript's conclusions need to be toned down significantly, and the inherent difficulty in assessing 6mA accurately in mammals acknowledged throughout.

      As discussed in response to prior comments, Figure 3 does provide two independent and orthogonal methods that demonstrate 6mA presence in DNA specifically, and not RNA, in response to DNA damage. Complementary and orthogonal datasets are presented using either immunofluorescence microscopy or mass spectrometry-liquid chromatography of extracted DNA. The latter method does not rely on an antibody and can discriminate 6mA DNA versus RNA based on exact mass. We revised the text to clarify that Figure 3F, G is a completely orthogonal approach.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):  

      The authors cited most of the related publications; however, the reviewer suggests that three 2015 papers in Cell (Dahua Chen's, Yang Shi's, and Chuan He's) and the 2016 Nature article (Andrew Xiao's) are worth citing here because they are the milestone works that reported genomic DNA 6mA, in the first wave of such studies, in eukaryotic and mammalian genomes.

      Furthermore, Tao P. Wu and Andrew Z. Xiao's 2016 Nature article already emphasized that genomic DNA 6mA is enriched at H2A.X sites; that work therefore indicated a link between DNA damage and repair and 6mA's functional role. The authors may add some comments or discussion on this point.

      Last but not least, the authors may also need to discuss the reported evidence of DNA 6mA's function in mitochondria.  

      We thank the reviewer for these suggestions. We revised our introduction and included additional references and discussion points, as suggested by the reviewer.

      Reviewer #3 (Recommendations for the authors):  

      Minor points:  

      (1) In general, the manuscript is too verbose, and the amount of text can be dramatically reduced/sharpened. The introduction in particular is too long. 

      We revised the manuscript and reduced text when appropriate.

      (2) Each results section can also be condensed to improve clarity significantly. Indeed the results section reads like a 'Result & Discussion' section, which is then followed by a Discussion. Maybe the discussion section can be shortened to a 'conclusion'.

      We revised the results section when appropriate and reworked the discussion.

      Importantly, we revised the text related to Figure 3, as it appears that Reviewer #3 did not appreciate key results presented in this figure, specifically the orthogonal mass spectrometry approach validating the discovery of 6mA DNA species (Figure 3F, G). We also added a schematic as Figure 3F to further clarify this point.

      (3) The accession number for sequencing data in GEO data should be provided.  

      The accession number is now provided in the manuscript: GSE282260.

      (4) All figures are unnecessarily small and in some cases, supporting figures from the supplementary data should be moved into the main figure to improve clarity. 

      The figures are of high image quality and can be enlarged easily. If there are specific figures that the reviewer believes will improve clarity, we would be happy to move them.

  2. Jan 2025
    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important work proposes a neural network model of interactions between the prefrontal cortex and basal ganglia to implement adaptive resource allocation in working memory, where the gating strategies for storage are adjusted by reinforcement learning. Numerical simulations provide convincing evidence for the superiority of the model in improving effective capacity, optimizing resource management, and reducing error rates, as well as solid evidence for its human-like performance. The paper could be strengthened further by a more thorough comparison of model predictions with human behavior and by improved clarity in presentation. This work will be of broad interest to computational and cognitive neuroscientists, and may also interest machine-learning researchers who seek to develop brain-inspired machine-learning algorithms for memory.

      We thank the reviewers for their thorough and constructive comments, which have helped us clarify, augment and solidify our work. Regarding the suggestion to include a “more thorough comparison with human behavior”, we believe this comment reflects one of the reviewers’ suggestions to compare with sequential order effects. We now include a new section with simulations showing that the network exhibits clear recency effects in accordance with the literature, where such recency effects are known to be related to WM interference rather than passive decay. Overall, our work makes substantial contact with behavioral patterns documented in the human literature (which, as far as we know, have not been jointly captured by any one model), such as the shape of the error distributions, including probability of recall and variable precision; attraction to recently presented items; sensitivity to reinforcement history; set-size dependent chunking; recency effects; and dopamine manipulation effects, as well as a range of human data linking capacity limitations to frontostriatal function. It also provides a theoretical proposal for the well-established phenomenon of capacity limitations in humans, suggesting that they arise from difficulty in WM management.

      Below we address each reviewer individually, responding to each comment and providing the relevant location in the paper that the changes and additions were made. Reviewer responses are included in blue/bold for clarity.  

      Public Reviews:

      Reviewer 1:

      Thank you for your comments. We appreciate your statements of the strengths of this paper and your suggestions to improve this paper.

      First, the method section appears somewhat challenging to follow. To enhance clarity, it might be beneficial to include a figure illustrating the overall model architecture. This visual aid could provide readers with a clearer understanding of the overall network model.

      Additionally, the structure depicted in Figure 2 could be potentially confusing. Notably, the absence of an arrow pointing from the thalamus to the PFC and the apparent presence of two separate pathways, one from sensory input to the PFC and another from sensory input to the BG and then to the thalamus, may lead to confusion. While I recognize that Figure 2 aims to explain network gating, there is room for improvement in presenting the content accurately.

      As suggested, we added a figure (new figure 2) illustrating the overall model architecture before expanding it to show the chunking circuitry. This figure also shows the projections from thalamus to PFC (we preserve the previous figure 2, now figure 3, as an example sequence of network gating decisions, in more abstract form to help facilitate a functional understanding of the sequence of events without too much clutter). We also made several other general clarifications to the methods sections to make it more transparent and easier to follow, as per your suggestions.   

      Still, for the method part, it would enhance clarity to explicitly differentiate between predesigned (fixed) components and trainable components. Specifically, does the supplementary material state that synaptic connection weights in striatal units (Go&NoGo) are trained using XCAL, while other components, such as those in the PFC and lateral inhibition, are not trained (I found some sentences in 'Limitations and Future Directions')?

      We have now explicitly specified learned and fixed components. We have further explained the role of XCAL and how striatal Go/NoGo weights are trained. We have also added clarification on how gating policies are learned via eligibility traces and synaptic tags.

      I'm not sure about the training process shown in Figure 8. It appears that the training may not have been completed, given that the blue line representing the chunk stripe is still ascending at the endpoint. The weights depicted in panel d) seem to correspond with those shown in panels b) and c), no? Then, how is the optimization process determined to be finished? Alternatively, could it be stated that these weight differences approach a certain value asymptotically? It would be better to clarify the convergence criteria of the optimization process.

      The training process has been clarified and we specify (in the last paragraph of the Base PBWM Model) how we determine when training is complete. We also can confirm that the network behavior has stabilized in learning even if the Go/NoGo weights continue to grow over time for the chunked layer (due to imperfect performance and reinforcement of the chunk gating strategy).

      Reviewer 2:

      Thank you for your comments. We appreciate your notes on the strengths of the paper and your suggestions to help improve the paper.

      The model employs a spiking neural network, which is relatively complex. Additionally, while this paper validates the effectiveness of chunking strategies used by the brain to enhance working memory efficiency through computational simulations, further comparison with related phenomena observed in cognitive neuroscience experiments on limited working memory capacity, such as the recency effect, is necessary to verify its generalizability.

      Thank you for proposing that we add more connections with human WM. Based on your specific recommendation, we have included the section “Network recapitulates human sequential effects in working memory”, where we discuss recency effects in human working memory and how our model recapitulates this effect. We have also made the connections to human data and human work more explicit throughout the manuscript (Figure 4c). As noted in response to the assessment, we believe our model does make contact with a wide variety of cognitive neuroscience data in human WM, such as the shape of the error distributions, including probability of recall and variable precision; attraction to recently presented items; sensitivity to reinforcement history; set-size dependent chunking; recency effects; and dopamine manipulation effects, as well as a range of human data linking capacity limitations to frontostriatal function. It also provides a theoretical proposal for the well-established phenomenon of capacity limitations in humans, suggesting that they arise from difficulty in WM management.

      Recommendations For The Authors:

      Reviewer 1:

      I appreciate the authors' clear discussion of the limitations of this work in the section "Limitations and Future Directions". The development of a comprehensive model framework to overcome these constraints should require a separate paper, though, I am curious if the authors have attempted any experiments, such as using two identically designed chunking layers, that could partially support the assumptions presented in the paper.

      Expanding the number of chunking layers is a great future direction. We felt that it was most effective for this paper to begin with a minimal setup as a proof of concept. We hypothesize that, given our results, a reinforcement learning algorithm would be able to learn to select the best level of abstraction (degree of chunking) in more continuous form, but would require more experience across a range of tasks to do so.

      I'm not sure whether it's appropriate that "Frontostriatal Chunking Gating..." precedes "Dopamine Balance is...", maybe it would be better to reverse the order thus avoiding the need to mention the role of dopamine before delving into the details. Additionally, including a summary at the end of the Introduction, outlining how the paper is organized, could provide readers with a clear roadmap of the forthcoming content.

      We appreciate this suggestion. After careful thought, we wanted to preserve the order because we felt it was important to make the direct connection between set size and stripe usage following the discussion on performance based on increasing stripes.  

      The authors could improve the overall polish of the paper. The equations in the Method section are somewhat confusing: Eq. (2) appears incorrect, as it lacks a weight w_i and n should presumably be in the denominator. For Eq. (3), the comma should be replaced with ']'... It would be advisable to cross-reference these equations with the original O'Reilly and Frank paper for consistency.

      Thank you for pointing out the errors in the method equations; those equations were indeed rendering incorrectly. We have fixed this problem.

      Additionally, there are frequent instances of missing figure and reference citations (many '?'s), and it would be beneficial to maintain consistent citation formatting throughout the paper: sometimes citations are presented as "key/query coding (Traylor, Merullo, Frank, and Pavlick, 2024; see also Swan and Wyble, 2014)", while other times they are written as "function (O'Reilly & Frank, 2006)"...

      Lastly, there is an empty '3.1' section in the supplementary material that should be addressed.

      The citation issues were fixed. The supplementary information was cleaned and the missing section was removed. Thank you for mentioning these errors.  

      Reviewer 2:

      Thank you for the following recommendations and suggestions. We respond to each individual point based on the numbering system used in your review.  

      (1) This paper utilizes the experimental paradigm of visual working memory, in which different visual stimuli are sequentially loaded into the working memory system, and the accuracy of memory for these stimuli is calculated.

      The authors could further plot the memory accuracy curve as the number of items (N) increases, under both chunking and non-chunking strategies. This would allow for the examination of whether memory accuracy suddenly declines at a specific value of N (denoted as Nc), thereby determining the limited capacity of working memory within this experimental framework, which is about 4 different items or chunks. Additionally, it could be investigated whether the value of Nc is larger when the chunking strategy is applied.

      We have included an additional plot (Probability of Recall) as a supplemental figure to Figure 5 to explore the probability of recall as a function of set size for both chunking and no chunking models.  This plot shows that the chunking model increases probability of recall when set size exceeds allocated capacity (but that nevertheless both models show decreases in recall with set size, consistent with the literature).

      (2) The primacy effect or recency effect observed in the experiments and traditional working memory models, including the slot model and the limited resource model, should be examined to see if it also appears in this model.

      The literature on human working memory shows a prevalent recency effect (but not a primacy effect, which is thought to be due to episodic memory, and which is not included in our model). We have added a section showing that our model demonstrates clear recency effects.

      (3) The construction of the model and the single neuron dynamics involved need further refinement and optimization:

      Model Description: The details of the model construction in the paper need to be further elaborated to help other researchers better understand and apply the model in reproducing or extending research. Specifically:

      a) The construction details of different modules in the model (such as Input signal, BG, striatum, superficial PFC, deep PFC) and the projection relationships between different modules. Adding a diagram to illustrate the network construction would be beneficial.

      To aid in the understanding of the model construction and model components, we have included an additional figure (Figure 1: Base Model) that explains the key layers and components of the model.  We have also altered the overall model figures to show more clearly that the inputs project to both PFC and striatum, to highlight that information is temporarily represented in superficial PFC layers even before striatal gating, which is needed for storage after the input decays.

      We have expanded the methods and equations, and we also provide a link to the model's GitHub repository for purposes of reproducibility and sharing.

      A base model figure was added to specify key connections.  

      a) The numbers of excitatory and inhibitory neurons within different modules and the connections between neurons.

      We added clarification on the type of connections between layers (specifying which are fixed and which are learned). We have also added the sizes of the layers in a new appendix section, “Layer Sizes and Inner Mechanics”.

      b) The dynamics of neurons in different modules need to be elaborated, including the description of the dynamic equations of variables (such as x) involved in single neuron equations.

      Single neuron dynamics are explained in equations 1-4. Equations 5-6 explain how activation travels between layers. The specific inhibitory dynamics in the chunking layer are elaborated in Figure 4 (PBWM Model and Chunking Layer Details). The Appendix section “Neural model implementational details” states the key equations, neural information and connectivity. Since there is a large corpus of background information underlying these models, we have linked the Emergent GitHub repository and specifically the Computational Cognitive Neuroscience textbook, which has a detailed description of all equations. For the sake of paper length and readability, we chose the most relevant equations that distinguish our model.

      c) The selection of parameters in the model, especially those that significantly affect the model's performance.

      The appendix section on the hyperparameter search details some of the key parameters and why those values were chosen.

      d) The model employs a sequential working memory paradigm, the forms of external stimuli involved in the encoding and recalling phases (including their mathematical expressions, durations, strengths, and other parameters) need to be elaborated further.

      We appreciate this comment. We have expanded the Appendix section “Continuous Stimuli” to include the details of stimulus presentation (including durations, etc.).

      (4) The figures in the paper need optimization. For example, the size of the schematic diagram in Figure 2 needs to be enlarged, while the size of text such as "present stimulus 1, 2, recall stimulus 1" needs to be reduced. Additionally, the citation of figures in the main text needs to be standardized. For example, Figure 1b, Figure 1c, etc., are not cited in the main text.

      The task sequence figure (original Figure 2) has been modified and, following your suggestions, the text sizes have been adjusted.

      (5) Section 3.1 in the appendix is missing.

      Supplemental section 3.1 has been removed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      MacDonald et al. investigated the consequence of double knockout of substance P and CGRPα on pain behaviors using a newly created mouse model. The investigators used two methods to confirm knockout of these neuropeptides: traditional immunolabeling and a neat in vitro assay where sensory neurons from either wildtype or double knockout mice are co-cultured with substance P "sniffer cells", HEK cells stably expressing NKR1 (a substance P receptor), GCaMP6s and Gα15. It should be noted that functional assays confirming CGRPα knockout were not performed. Subsequently, the authors assayed double knockout (DKO) and wildtype (WT) mice in numerous behavioral assays using different pain models, including acute pain and itch stimuli, intraplantar injection of Complete Freund's Adjuvant, prostaglandin E2, capsaicin, AITC, oxaliplatin, as well as the spared nerve injury model. Surprisingly, the authors found that pain behaviors did not differ between DKO and WT mice in any of the behavioral assays or pain paradigms. Importantly, female and male mice were included in all analyses. These data are important and significant, as both substance P and CGRPα have been implicated in pain signaling, though the magnitude of the effect of a single knockout of either gene has been variable and/or small between studies.

      The conclusions of the study are largely supported by the data; however, additional experimental controls and analyses would strengthen the authors' claims.

      We thank the reviewer for their insightful comments and have answered them below.

      (1) The authors note that single knockout models of either substance P or CGRPα have produced variable effects on pain behaviors that are study-dependent. Therefore, it would have strengthened the study if the authors included these single knockout strains in a side-by-side analysis (in at least some of the behavioral assays), as has been done in prior studies in the field when using double- or triple-knockout mouse models (for example, see PMID: 33771873). If, in the authors' hands, single knockouts of either peptide also show no significant differences in pain behaviors, then the finding that double knockouts also do not show significant differences would be less surprising.

      In our study, we found no phenotypic differences between WT and DKO mice, suggesting Substance P and CGRPα are largely dispensable for pain behavior. We agree that if we had observed significant changes in behavior, it would have been interesting to examine the effects of knocking out each gene individually to determine which peptide is responsible for the phenotype. However, given that the double deletion had no effect, we can predict that loss of each alone would have no or minor effects. In line with this, a more recent study that comprehensively phenotyped the Calca KO mouse found no deficits in a range of danger-related behaviors (PMID: 34376756). Overall, as we are reporting negative data about the double KO, we do not believe extensive studies of the single KOs are necessary to support the findings of our paper.

      (2) It is unclear why the authors only show functional validation of substance P knockout using "sniffer" cells, but not CGRPα. Inclusion of this experiment would have added an additional layer of rigor to the study.

      Imaging of CGRPα release is more challenging using the ‘sniffer’ approach because functional CGRP receptors require the expression of two genes: Calcrl (or Calcr) along with Ramp1. We have now succeeded in generating a new stable cell line expressing Calcrl and Ramp1, along with GCaMPs and human Gα15, and include new data in the revised Figure 1F-H and Figure Supplement 1B. These cells respond robustly to CGRPα, but not to SP. In contrast, the existing SP cell line responds to SP but not CGRPα. Capsaicin evokes a strong response in these cells in co-culture with DRGs. This response is dramatically reduced in the DKO. These data therefore confirm that our mice have a loss of CGRPα signaling, as indicated by IHC.

      (3) The authors should be a bit more reserved in the claims made in the manuscript. The main claim of the study is that "CGRPα and substance P are not required for pain transmission." However, the authors also note that neuropeptides can have opposing effects that may produce a net effect of no change. In my view, the data presented show that double knockout of substance P and CGRPα do not affect somatic pain behaviors, but do not preclude a role for either of these molecules in pain signaling more generally. Indeed, the authors also note that these neuropeptides could be involved in nociceptor crosstalk with the immune or vascular systems to promote headache. The authors only assayed pain responses to glabrous skin stimulation. How the DKO mice would behave in orofacial pain assays, migraine assays, visceral pain assays, or bone/joint pain assays, for example, was not tested. I do not suggest the authors include these experiments, only that they address the limitations/weaknesses of their study more thoroughly.

      The reviewer makes an important point that we agree with. Our study assesses acute and chronic pain in peptide DKO mice lacking Substance P and CGRPα. Most of our data focus on the hindpaw, as pain in the paw is the gold-standard approach for phenotyping pain targets and numerous well-validated chronic pain models have been developed for this body site. However, to extend the conclusions to other tissues, we also looked at visceral pain and GI distress using acetic acid and LiCl models (Figure 2J and Figure 2 supplement). We agree with the reviewer that, given the utility of CGRP monoclonal antibodies, migraine experiments would be interesting for future studies using these mice, a point we highlight in the discussion. Bone/joint pain is also clearly important from a translational perspective, but is outside the scope of the current study.

      (4) A more minor but important point: the authors do not describe the nature of the WT animals used. Are they littermates or a separately maintained colony of WT animals? The WT strain background should be included in the methods section.

      The WT mice are C57BL/6J from The Jackson Laboratory. This has been added to the methods.

      Reviewer #2 (Public Review):

      Summary:

      The paper aimed to examine the effect of co-ablating Substance P and CGRPα peptides on pain using Tac1 and Calca double knockout (DKO) mice. The authors observed no significant changes in acute, inflammatory, and neuropathic pain. These results suggest that Substance P and CGRPα peptides do not play a major role in mediating pain in mice. Moreover, they reveal that the lack of behavioral phenotype cannot be explained by the redundancy between the two peptides, which are often co-expressed in the same neuron.

      Strengths:

      The paper uses a straightforward approach to address a significant question in the field. The authors confirm the absence of Substance P and CGRPα peptides at the levels of DRG, spinal cord, and midbrain. Subsequently, they employ a comprehensive battery of behavioral tests to examine pain phenotypes, including acute, inflammatory, and neuropathic pain. Additionally, they evaluate neurogenic inflammation by measuring edema and extravasation, revealing no changes in DKO mice. The data are compelling, and the study's conclusions are well-supported by the results. The manuscript is succinct and well-presented.

      We thank the reviewer for their enthusiasm for the importance of our work.

      Reviewer #3 (Public Review):

      In this study, the authors assessed the role of double global knockout of substance P and CGRPα in the transmission of acute and chronic pain. The authors first generated the double knockout (DKO) mice and validated their animal model. This was followed by a series of acute and chronic pain assessments to evaluate whether the global DKO of these neuropeptides is important in modulating acute and chronic pain behaviors. The authors found in these DKO mice that Substance P and CGRPα are not required for the transmission of acute and chronic pain, although both neuropeptides are strongly implicated in chronic pain. This study does provide more insight into the role of these neuropeptides in chronic pain processing; however, more work still needs to be done (see the comments below).

      We thank the reviewer for their detailed and constructive feedback, and below outline the steps we have taken to answer their concerns.

      (1) In assessing the double KO (result #1), why are different regions of the brains shown for substance P and CGRPα (for example, midbrain for substance P and amygdala for CGRPα)? Since the authors mentioned that these peptides are co-expressed in the brain (as in the introduction), shouldn't the same brain regions be shown for both IHC? It would be ideal if the authors could show both regions (midbrain and amygdala) in addition to the DRG and spinal cord for both peptides in their findings. In addition, since this is double KO, the authors should show more representative IHC-stained brain regions (spanning from the anterior to posterior).

      We could not co-stain both SP and CGRP in the same sections as the DKO mouse has endogenous GFP and RFP fluorescence, limiting us to one channel (far red). Specifically, we use a Calca KO that is a Cre:GFP knock-in/knockout (Chen et al 2018, PMID: 30344042) and a Tac1 KO that is a tagRFP knock-in/knockout (Wu et al 2018, PMID: 29485996). This is why we show different brain sections.

      (2) It is also unclear as to why the authors only assessed the loss of substance P signaling in the double KO mice. Shouldn't the same be done for CGRPα signaling? Either the authors assess this, or the authors have to provide clear explanations as to why only substance P signaling was assessed.

      As noted in our response to Reviewer 1, imaging of CGRP release is more challenging using the ‘sniffer’ approach because functional CGRP receptors require the expression of two genes: Calcrl (or Calcr) along with Ramp1. We have now generated this cell line and performed the experiment (see revised Figure 1 and Figure 1 Supplement).

      (3) Has the naturalistic behavior of these animals been assessed after the double KO (food intake, sleep, locomotion, for example)? I think this is important as changes to these naturalistic behaviors can affect pain processes or outcomes.

      We agree that assessment of naturalistic behavior including food intake, sleep and locomotion would be interesting to look at in DKO mice. However, our study is focused on acute and chronic pain behavior of these animals, and therefore a comprehensive phenotypic assessment of naturalistic home-cage behavior is outside the scope of our study.

      (4) Figure 2H: The authors acknowledge that there is a trend to decrease with capsaicin-evoked coping-like responses. However, a close look at the graph suggests that the lack of significance could be driven by 1 mouse. Have the authors run an outlier test? Alternatively, the authors should consider adding more n to these experiments to verify their conclusions.

      We were reluctant to add more animals in search of significance. Instead, we investigated the potential phenotype further by looking at cFos staining in the cord and found no differences (Figure 2, supplement 1). This result suggests that loss of the two peptides does not grossly disrupt capsaicin-evoked pain signal transmission between the nociceptor and post-synaptic dorsal horn neurons in the spinal cord.

      (5) Similarly, the values for WT in the evoked cFos activity (Figure 2- Suppl Figure 1) are pretty variable. Considering that the n number is low (n = 5), authors should consider adding more n. Also, since the n number is low in this experiment (eg. 5 vs 4), does this pass the normality test to run a parametric unpaired t-test? Either the authors increase their n numbers or run the appropriate statistical test.

      As described in the statistical tables, the Shapiro-Wilk test indicates these data do pass the normality test. Therefore, we retain the use of the unpaired t test, which demonstrates no significant difference between the groups.

      (6) In most of the results, authors ran a parametric test despite the low n number. Authors have to ensure that they are carrying out the appropriate statistical test for their dataset and n number.

      We now provide a table of the statistical results, which provides detailed information about all statistical tests performed in this study. For experiments where we make a single comparison between the two distributions (WT vs DKO), we have run a Shapiro-Wilk test. Where the data from both groups pass the normality test, we retain the use of the unpaired t test. Where the Shapiro-Wilk test indicates data from either group are unlikely to be normally distributed, we now use a Mann-Whitney U test to compare the groups, as this non-parametric test makes no assumptions about the underlying distribution.
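
      The decision rule described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the 0.05 threshold and the function names are assumptions, and the Shapiro-Wilk p values are taken as inputs computed elsewhere.

```python
def choose_two_group_test(p_shapiro_wt, p_shapiro_dko, alpha=0.05):
    """Pick the two-group test implied by the normality results.

    Both groups must pass the Shapiro-Wilk test (p > alpha, an assumed
    threshold) for the parametric test to be retained.
    """
    both_normal = p_shapiro_wt > alpha and p_shapiro_dko > alpha
    return "unpaired t test" if both_normal else "Mann-Whitney U test"


def mann_whitney_u(x, y):
    """U statistic for sample x: pairs with x_i > y_j count 1, ties count 0.5."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )


# Hypothetical values, for illustration only.
print(choose_two_group_test(0.40, 0.62))   # both groups pass normality
print(choose_two_group_test(0.40, 0.01))   # one group fails normality
print(mann_whitney_u([4, 5, 6], [1, 2, 3]))
```

      Because the U statistic depends only on the ordering of values across the two groups, it makes no assumption about the shape of the underlying distribution.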

      Many experiments involved two factors (genotype and, e.g., temperature, drug, or time point). These data were analyzed in the original submission using two-way ANOVA or repeated-measures two-way ANOVA, followed by post-hoc Sidak’s tests to compute p values adjusted for multiple comparisons. Because there is no widely agreed non-parametric alternative to two-way ANOVA that handles two factors and also allows us to account for multiple comparisons, we used two-way ANOVA, as is typical in the field for these kinds of experiments. We reasoned that staying with two-way ANOVA was the best course of action based on information provided by the developer of the statistical software used for this study: https://www.graphpad.com/support/faq/with-two-way-anova-why-doesnt-prism-offer-a-nonparametric-alternative-test-for-normality-test-for-homogeneity-of-variances-test-for-outliers/
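
      For readers unfamiliar with the Sidak adjustment mentioned above, a minimal sketch of the standard formula is given below. It assumes that all comparisons passed in belong to one family; how the statistical software defines the family in a given analysis is not specified here, and the function name is illustrative.

```python
def sidak_adjust(p_values):
    """Sidak-adjusted p values for a family of m comparisons.

    Each raw p value is mapped to 1 - (1 - p)^m, where m is the number
    of comparisons in the family (assumed here to be len(p_values)).
    """
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


# Hypothetical raw p values from three post-hoc comparisons.
raw = [0.01, 0.04, 0.50]
for p, adj in zip(raw, sidak_adjust(raw)):
    print(f"raw p = {p:.3f} -> adjusted p = {adj:.4f}")
```

      The adjusted value is never smaller than the raw value, so a comparison that is not significant before adjustment cannot become significant afterwards.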

      We note that, regardless of the test, our conclusion that there are no major changes in acute or chronic pain behaviors is clear and strongly supported.

      (7) Along the same line of comment with the previous, authors should increase the n number for DKO for staining (Figure 4) as n number is only 3 and there is variability in the cFos quantification in the ipsilateral side.

      We believe this is not necessary as the finding is clear that there is no difference.

      (8) Authors should provide references for statement made in Line 319-321 as authors mentioned that there are accumulating evidence indicating that secretion of these neuropeptides from nociceptor peripheral terminals modulates immune cells and the vasculature in diverse tissues.

      We now provide several references to primary papers and reviews supporting this statement.

      (9) Authors state that the sample size used was similar to those from previous studies, but no references were provided. Also, even though the sample sizes used were similar, I believe that the right statistic test should be used to analyze the data.

      We have now cited several classic studies phenotyping mouse KOs in pain in the methods that used similar sample sizes. As detailed above, we have taken the reviewer’s feedback on board and performed normality testing to ensure the correct statistical test is used for each experiment.

      (10) In the discussion, the authors noted that knocking out of a gene remains the strongest test of whether the molecule is essential for a biological phenomenon. At the same time, it was acknowledged that Substance P infusion into the spinal cord elicits pain, but it is analgesic in the brain. The authors might want to expand more on this discussion, including how we can selectively assess the role of these neuropeptides in areas of interest. For example, knocking out both Substance P and CGRPα in selected areas instead of the global KO since there are reported compensatory effects.

      This is highlighted in the closing paragraph: “Emerging approaches to image and manipulate these molecules (Girven et al., 2022; Kim et al., 2023), as well as advances in quantitating pain behaviors (Bohic et al., 2023; MacDonald and Chesler, 2023), may ultimately reveal the fundamental roles of neuropeptides in generating our experience of pain.” The Kim preprint (now published, and so the citation has been updated in the text) describes a method of inactivating neuropeptide transmission in select brain regions in a cell-type specific manner.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I do not have any major comments. My minor comments are as follows:

      (1) What was the control group for all behavioral studies? Was it WT from an independent colony or one of the littermates was used for generating controls?

      We used C57/Bl6 mice from Jax. This is now mentioned in methods.

      (2) In Fig. 2H, it seems that the effect will become significant if several mice are added.

      We are reluctant to add mice searching for significance. Sample sizes were determined before we collected the data blind.

      (3) There is no figure 3, but two figures 4.

      Thank you. This has been corrected.

      (4) Multiple typos in the legend for figure 4 (lines 234-254). Line 242 (& n=8 (3M, 3F)), line 243 (swelling and plasma), line 252 ((n=8 for) & n=6 for DKO (4M, 4F)).

      Thank you. This has been corrected.

      (5) In Figure 4 (lines 273-285), the contralateral side is mentioned in B but no images are shown.

      Thank you. We removed the mention.

      (6) Although ligand knockouts cannot be compared directly with receptor inhibition, the readers could benefit from discussing studies of receptor ablation and/or pharmacological inhibition.

      We do discuss the classic studies of receptor KO, and the clinical data on receptor blockers here –

      “However, selective antagonists of the Substance P receptor NKR1 failed to relieve chronic pain in human clinical trials (Hill, 2000). Although CGRP monoclonal antibodies and receptor blockers have proven effective for subsets of migraine patients, their usefulness for other types of pain in humans is unclear (De Matteis et al., 2020; Jin et al., 2018). In line with this, knockout mice deficient in Substance P, CGRPα or their receptors have been reported to display some pain deficits, but the analgesic effects are neither large nor consistent between studies (Cao et al., 1998; De Felipe et al., 1998; Guo et al., 2012; Salmon et al., 2001, 1999; Zimmer et al., 1998).” 

      Reviewer #3 (Recommendations For The Authors):

      Minor comments:

      (1) Figure 1E: What does chambers mean? Additionally, are the 12 chambers equally from the male and female samples (6 from male and 6 from female)?

      We have changed this to well. Each replicate is an individual well from 8 well chamber slide. In all these experiments, the wells are approximately evenly distributed by mouse, because from each mouse we cultured around 8 wells’ worth of DRGs.

      (2) Figure 1D: What does low and high mean in the Hargreaves test?

      These refer to low and high active intensities of the radiant heat stimulus: 40 and 55, respectively, in the intensity units used by the instrument. The values are now given in the methods.

      (3) Figure 2-Suppl Figure 1: Authors should provide a bigger image of the image so that it is clearer to the readers.

      We think the image is of a reasonable size and comparable to the images used elsewhere in the paper.

      (4) Authors should consider labeling their supplementary figures in running numbers or combining supplementary figures together to avoid confusion. For example, Figure 2-Supplementary Figure 1 and Figure 2- Supplementary Figure 2 can be combined as just Supplementary Figure 2.

      We agree with the reviewer this would be clearer, but we have followed eLife’s convention for labelling and numbering supplements.

      (5) Figure 3 is mislabeled as Figure 4.

      Thank you. We have corrected this.

      (6) Only female mice were used in the CFA experiment, which does not go in line with the rest of the results which consist of both sexes.

      We have repeated the experiment with additional male mice. To be consistent with the von Frey data, these were followed for 7 days, and so the figure now shows a 7-day time course.

      (7) Typo in line 243. The word "and" is subscript.

      Thank you. We have corrected this.

      (8) There is a typo in the legend for Figure 4 where E is labeled I, G is labeled as F, and J is labeled as J.

      Thank you. We have corrected this.

      (9) Authors should specify what "several weeks" means (Line 263).

      It means three weeks. We tested to 21 days. We will replace with three.

      (10) Authors should specify what "one day" means (Line 267). For example, how many days after the intraplantar oxaliplatin treatment? Also, authors should justify why that specific time point was selected or have a reference for it.

      This means one day after, i.e. 24 hours. Please see PMID: 33693512. Two references are provided in the methods.

      (11) Figure 4 legend: authors should again be specific on what "prolonged" entails (Line 277).

      We have replaced prolonged with 30 minutes brushing. Specifically, 3 x 10 min stim period, with 1 min rest between stim. It is in the methods.

      (12) In the methods section, authors state that both male and female mice were used for all experiments. However, only female mice were used in the CFA experiment (see minor comment #6). Authors should verify and correct this.

      This is correct. We only used female mice for one of the groups. We have since repeated with males, now included in the data.

      (13) Authors should be more specific in the methods section on how long the habituation per day, how many days and what were the mice habituation to (experimenter, room, chamber, etc)?

      As noted in the methods, mice are habituated for at least an hour to the chambers, and thus implicitly to the room. We do not perform explicit habituation to the investigator such as repeated handling.

      (14) Authors need to provide more information on the semi-automated procedure they are referring to in Line 397. Also, authors should also provide the criteria for cFos quantification (eg. Intensity, etc). If this has been published before, they should provide the reference.

      We have added this. We used the ‘Find maxima’ and ‘Analyze particles’ functions in FIJI, followed by a manual curation step.

      (15) How much acetone was applied and how was it applied to the paw? (Line 495)

      We used the same applicator (1ml syringe with a well at the top) to generate a droplet of acetone that was used for all mice. This has been added to methods.

      (16) Authors should specify the amount of capsaicin injected in μl (Line 500).

      20 μl. We have added this.

      (17) Authors should explain or reference why they are analyzing the 15 min interval between 5 and 20 minutes for injection (Line507-508).

      Acetic acid behaviour lasts around 30 mins in our hands. We chose the 15 minute interval because it reduces burdensome hand scoring time by 50% versus doing the whole 30 mins. We reasoned that in the first 5 mins post injection the animal behaviour may be contaminated by stress related to handling, injection and return to chamber. Thus, 5 and 20 minutes provided a sensible time-frame for scoring the behavior when it is at its peak.

      (18) Authors have to provide more information/explanation on how they decide on the conditioned taste aversion protocol. Like why they do 30 mins exposure to a single water-containing bottle followed 90 mins exposure to both bottles. If this has been published before, they should provide the reference.

      We read dozens of different published protocols in the literature, and piloted one that was something of an amalgam of some of them with various adaptations of convenience. Because it worked on our first attempt, we stuck to it. The advantage of the CTA assay is it is incredibly robust to changes in the specificities of the paradigm, evincing the clear survival value of learning to avoid tastes that make you sick.

      (19) Authors again should provide more detail in their methods section.

      a. Specify the time frame that they are assessing here (Line 533).

      This can be seen in the Figure. 0 to 120 mins. We have added it to the methods.

      b. How long were the mice allowed to recover post-SNI before mechanical allodynia was assessed (Line 545)?

      This is apparent in the figures. 2 days to 21 days. We have added it to the methods.

      c. How much of the oxaliplatin was injected into the mice?

      40 μg / 40 μl (see PMID: 33693512)

      Editor's note: Reviewers agreed that addressing the concerns about power, outliers, and statistics, as well as functional validation of CGRPα would raise the strength of evidence to compelling, and inclusion of comparison to single KO would raise it to exceptional.

      Should you choose to revise your manuscript, please check to ensure full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.

    1. Author response:

      Regarding a future revised version, we plan to:

      • refer to the "MoMac-VERSE" study according to the original report.

      • modify incorrectly formatted references.

      • modify the text to acknowledge the heterogeneity and variability in the response of primary cells to the GSK3 inhibitor.

      • improve the explanation of the reanalysis of single cell RNAseq data in Figure 7 (ref. 47, GSE120833), and re-adapt the graphs of the scRNA-Seq data using different plot parameters (e.g., reduction = "umap.scvi") to provide a more user-friendly visualization including bona fide macrophage markers for each subpopulation.

      • include statistical analyses in each of the figure legends.

      • perform additional analyses (e.g., dose-response and kinetics of CHIR-99021 effects) and mechanistic studies (e.g., role of proteasome) to further dissect the re-programming ability of the GSK3/MAFB axis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Insects and their relatives are commonly infected with microbes that are transmitted from mothers to their offspring. A number of these microbes have independently evolved the ability to kill the sons of infected females very early in their development; this male killing strategy has evolved because males are transmission dead-ends for the microbe. A major question in the field has been to identify the genes that cause male killing and to understand how they work. This has been especially challenging because most male-killing microbes cannot be genetically manipulated. This study focuses on a male-killing bacterium called Wolbachia. Different Wolbachia strains kill male embryos in beetles, flies, moths, and other arthropods. This is remarkable because how sex is determined differs widely in these hosts. Two Wolbachia genes have been previously implicated in male-killing by Wolbachia: oscar (in moth male-killing) and wmk (in fly male-killing). The genomes of some male-killing Wolbachia contain both of these genes, so it is a challenge to disentangle the two.

      This paper provides strong evidence that oscar is responsible for male-killing in moths. Here, the authors study a strain of Wolbachia that kills males in a pest of tea, Homona magnanima. Overexpressing oscar, but not wmk, kills male moth embryos. This is because oscar interferes with masculinizer, the master gene that controls sex determination in moths and butterflies. Interfering with the masculinizer gene in this way leads the (male) embryo down a path of female development, which causes problems in regulating the expression of genes that are found on the sex chromosomes.

      We would like to thank you for evaluating our manuscript.

      Strengths:

      The authors use a broad number of approaches to implicate oscar, and to dissect its mechanism of male lethality. These approaches include:

      (1) Overexpressing oscar (and wmk) by injecting RNA into moth eggs.

      (2) Determining the sex of embryos by staining female sex chromosomes.

      (3) Determining the consequences of oscar expression by assaying sex-specific splice variants of doublesex, a key sex determination gene, and by quantifying gene expression and dosage of sex chromosomes, using RNASeq.

      (4) Expressing oscar along with masculinizer from various moth and butterfly species, in a silkmoth cell line.

      This extends recently published studies implicating oscar in male-killing by Wolbachia in Ostrinia corn borer moths, although the Homona and Ostrinia oscar proteins are quite divergent. Combined with other studies, there is now broad support for oscar as the male-killing gene in moths and butterflies (i.e. order Lepidoptera). So an outstanding question is to understand the role of wmk. Is it the master male-killing gene in insects other than Lepidoptera and if so, how does it operate?

      Thank you for your comments. Wolbachia strains often carry wmk genes, but as observed in this study, the homologs in Homona showed no apparent MK ability. These showed strong male lethality in D. melanogaster, but it is still unclear whether the genes are the master male-killing genes in Diptera. It is also possible that the genes show toxicities in other lepidopteran insects as well as in other insect taxa. Further functional validation assays in different insects are warranted to clarify whether wmk shows toxicity in different insect taxa. We have also discussed the functions of wmk in the Discussion section (lines 301-306).

      Weaknesses:

      I found the transfection assays of oscar and masculinizer in the silkworm cell line (Figure 4) to be difficult to follow. There are also places in the text where more explanation would be helpful for non-experts (see recommendations).

      Thank you for your suggestion. We have thoroughly revised the manuscript to address all the questions, comments and suggestions you raised in “recommendations”. In particular, we have revised the section on the transfection assays of Oscar and Masc in BmN-4 cells (result section “Hm-oscar suppresses the masculinizing functions of lepidopteran masc genes” starts on line 214 and Fig. 4; materials and methods section “Transfection assays and quantification of BmIMP<sup>M</sup>” starts on line 483). We have also provided more detailed explanations for non-experts in some contexts (in response to your recommendation). We believe that the resulting revisions have significantly improved the quality and comprehensiveness of our manuscript.

      Reviewer #2 (Public review):

      Summary:

      Wolbachia are maternally transmitted bacteria that can manipulate host reproduction in various ways. Some Wolbachia induce male killing (MK), where the sons of infected mothers are killed during development. Several MK-associated genes have been identified in Homona magnanima, including Hm-oscar and wmk-1-4, but the mechanistic links between these Wolbachia genes and MK in the native host are still unclear.

      In this manuscript, Arai et al. show that Hm-oscar is the gene responsible for Wolbachia-induced MK in Homona magnanima. They provide evidence that Hm-Oscar functions through interactions with the sex determination system. They also found that Hm-Oscar disrupts sex determination in male embryos by inducing female-type dsx splicing and impairing dosage compensation. Additionally, Hm-Oscar suppresses the function of Masc. The manuscript is well-written and presents intriguing findings. The results support their conclusions regarding the diversity and commonality of MK mechanisms, contributing to our understanding of the mechanisms and evolutionary aspects of Wolbachia-induced MK.

      We would like to thank you for evaluating our manuscript.

      Strengths/weaknesses:

      (1) The authors found that transient overexpression of Hm-oscar, but not wmk-1-4, in Wolbachia-free H. magnanima embryos induces female-biased sex ratios. These results are striking and mirror the phenotype of the wHm-t infected line (WT12). However, Table 1 lists the "male ratio," while the text presents the "female ratio" with standard deviation. For consistency, the calculation term should be uniform, and the "ratio" should be listed for each replicate.

      We have revised the first results section (Hm-oscar induces female-biased sex ratios, starting from line 147) accordingly to maintain the consistency in the calculation term. In the revised manuscript, the 'male ratio' is now consistently used, in alignment with Fig. 1. In addition, we have included all sex ratio information (number of males and females) in the supplementary data file for transparency and clarity.

      (2) The error bars in Figure 3 are quite large, and the figure lacks statistical significance labels. The authors should perform statistical analysis to demonstrate that Hm-oscar-overexpressed male embryos have higher levels of Z-linked gene expression.

      The large error bars on each chromosome (Fig. 3a-d) likely reflect the overall variation in expression levels across different transcripts. Accordingly, we have included statistical data for Figure 3 based on the Steel-Dwass test for expression levels. However, displaying statistical significance directly on the whisker plots would make the figure too cluttered due to the numerous combinations. Instead, we have provided all the statistical data in the supplementary data file. To further support the claim that Z-linked genes are more highly expressed in wHm-t-infected/Hm-Oscar-injected embryos, we have included the expression data for a Z-linked gene tpi, along with its statistical data in the revised manuscript (Fig. 3e, lines 210-212).

      (3) The authors demonstrated that Hm-Oscar suppresses the masculinizing functions of lepidopteran Masc in BmN-4 cells derived from the female ovaries of Bombyx mori. They should clarify why this cell line was chosen and its biological relevance. Additionally, they should explain the rationale for evaluating the expression levels of the male-specific BmIMP variant and whether it is equivalent to dsx.

      Thank you for your suggestion. We selected BmN-4 cell line because previous studies have established it as a reliable model for investigating the functions of lepidopteran masc genes and the interactions between masc and Oscar genes (Katsuma et al., 2019; 2022). In addition, BmIMP<sup>M</sup> is a male-specific regulator of the male-type dsx, making it an ideal target for assessing the 'maleness' induced by transfection of the masc gene in female-derived BmN-4 cells (Suzuki et al., 2010; Katsuma et al., 2015). We have included more detailed background information in the revised manuscript and have thoroughly revised this section (Hm-oscar suppresses the masculinizing functions of lepidopteran masc genes, starting at line 214) and Figure 4 for better clarity.

      (4) Although the authors show that Hm-oscar is involved in Wolbachia-induced MK in Homona magnanima and interacts with the sex determination system in lepidopteran insects, the precise molecular mechanism of Hm-oscar-induced MK remains unclear. Further studies are needed to elucidate how Hm-oscar regulates Homona magnanima genes to induce MK, though this may be beyond the scope of the current manuscript.

      Based on our findings and previous studies in Homona, Ostrinia and Bombyx (Arai et al., 2023a; Katsuma et al., 2023; Kiuchi et al., 2014), we hypothesize that the molecular mechanisms underlying wHm-induced MK are likely linked to impaired dosage compensation caused by the inhibition of Masc function by the Hm-Oscar protein. While the precise mechanisms remain unclear, unbalanced Z-linked gene expression due to the impaired dosage compensation (i.e., 2-fold higher Z-linked gene expression compared to normal males) is known to be lethal for lepidopteran males (Kiuchi et al., 2014; Fukui et al., 2015; Visser et al., 2021). We have outlined this hypothesis in the Discussion section (lines 245-254).

      Reviewer #3 (Public review):

      Summary:

      Overall, this is a clearly written manuscript with nice hypothesis testing in a non-model organism that addresses the mechanism of Wolbachia-mediated male killing. The authors aim to determine how five previously identified male-killing genes (encoded in the prophage region of the wHm Wolbachia strain) impact the native host, Homona magnanima moths. This work builds on the authors' previous studies in which:

      (1) They tested the impact of these same wHm genes via heterologous expression in Drosophila melanogaster.

      (2) They examined the activity of other male-killing genes (e.g., from the wFur Wolbachia strain in its native host: Ostrinia furnacalis moths).

      Advances here include identifying which wHm gene most strongly recapitulates the male-killing phenotype in the native host (rather than in Drosophila), and the finding that the Hm-Oscar protein has the potential for male-killing in a diverse set of lepidopterans, as inferred by the cell-culture assays.

      Strengths:

      Strengths of the manuscript include the reverse genetics approaches to dissect the impact of specific male-killing loci, and the use of a "masculinization" assay in Lepidopteran cell lines to determine the impact of interactions between specific masc and oscar homologs.

      We would like to thank you for evaluating our manuscript.

      Weaknesses:

      My major comments are related to the lack of statistics for several experiments (and the data normalization process), and opportunities to make the manuscript more broadly accessible.

      Thank you for your suggestions. We have thoroughly revised the manuscript to provide clearer explanations for non-experts. In addition, we have included more detailed statistical data for Figure 3 and Figure 4 based on the Steel-Dwass tests. For Figure 3a-d, displaying statistical significance directly on the whisker plots would make the figure too cluttered due to the numerous combinations. Therefore, we have provided all the statistical data in the supplementary data file. To further support the claim that Z-linked genes are more highly expressed in wHm-t-infected/Hm-Oscar-injected embryos, we have included the expression data for a Z-linked gene tpi, along with its statistical data in the revised manuscript (Fig. 3e, lines 210-212). Regarding Figure 4, we have revised the Figure based on the reviewer’s suggestions, and provided more detailed information on how the expression data were analyzed (Transfection assays and quantification of BmIMP<sup>M</sup>, lines 495-520). We have also included more detailed background information on the assay system (Hm-oscar suppresses the masculinizing functions of lepidopteran masc genes, lines 215-237). Although we did not observe statistical significance based on the Steel-Dwass test, likely due to limited replicates, the observed changes in the IMP gene expression remain clearly evident.

      The manuscript I think would be much improved by providing more details regarding some of the genes and cross-lineage comparisons. I know some of this is reported in previous publications, but some summary and/or additional analysis would make this current manuscript much more approachable for a broader audience, and help guide readers to specific important findings. For example, a graphic and/or more detail on how the wmk/oscar homologs (within and across Wolbachia strains) differ (e.g., domains, percent divergence, etc) would be helpful for contextualizing some of the results. I recognize the authors discuss this in parts (e.g., lines 223-227), but it does require some bouncing between sections to follow. Similarly, the experiments presented in Figure 4 indicate that Hm-oscar has broad spectrum activity: how similar are the masc proteins from these various lepidopterans? Are they highly conserved? Rapidly evolving? Do the patterns of masc protein evolution provide any hints at how Oscar might be interacting with masc?

      Thank you for your valuable suggestion. To address this, we have included a visualization of the structural differences between the Oscar and wmk homologs in Figure 1a of the revised manuscript. In addition, we have included more detailed information for these genes and revised the introduction (lines 110-114; 124-137) and discussion (lines 255-266) to provide a clearer and more comprehensive overview. We have also described the similarity of the Masc proteins and Oscar proteins that we used, which is now reflected in the revised Figure 4b and 4d. More detailed information on these proteins is available in the supplementary data. Notably, Masc proteins exhibit high sequence variability with conserved domains (Figure 4d). A previous study identified the N-terminal region of Masc as crucial for the Oscar function (Katsuma et al., 2022). The wide spectrum of the actions of Hm-Oscar likely stems from these conserved structures of Masc, but the effects might have undergone evolutionary tuning through interactions with the native host, as discussed in lines 293-294.

      It is clear from Figure 1 that the combinations of wmk homologs do not cause male killing on their own. Did the authors test if any of the wmk homologs impact the MK phenotype of oscar? It looks like a previous study tested this in wFur (noted in lines 250-252), but given that the authors also highlight the differences between the wFur-oscar and Hm-oscar proteins, this may be worth testing in this system. Related to this, what is the explanation for why there would be 4 copies of wmk in Hm?

      Thank you for your valuable suggestion. Unfortunately, we have not yet tested the effects of co-expression of wmk and Oscar. Due to a technical issue, the mixing of multiple constructs results in a reduced amount of mRNA (i.e. mixing wmk-3 and Hm-Oscar at the same concentration results in a 2-fold lower concentration in mRNA for both genes compared to mono-injected groups). In addition, we have previously tested injecting mRNA at a twofold higher concentration (i.e. 2 μg/μl mRNA), which resulted in very low hatchability regardless of the genes. Katsuma et al. (2022) tested the effect of wmk on the sex determination system, but did not test the effect of co-injection/transfection of wmk and Oscar. Considering the results of this and previous studies (Katsuma et al., 2022; Arai et al., 2023), it is likely that the targets of the wmk and oscar genes are different (as discussed in lines 267-289). Co-injection of wmk and oscar may not produce additive effects. Nevertheless, we would like to test the results in future studies using the Drosophila system as well.

      As you point out, it is interesting that the moth-derived MK Wolbachia wHm-t encodes four wmk genes, although they have no apparent effect on host survival. The exact functional relevance of these wmk homologs remains unclear. However, they may play a role in Wolbachia biology as transcriptional regulators, given that they encode HTH domains. Wolbachia generally encode several wmk homologs in their genome, regardless of whether they induce MK. This suggests that the functions of the wmk genes may be 'suppressed' in certain Wolbachia-host systems. The wmk and Hm-oscar genes are located within a prophage region, and some wmk genes are tandemly arrayed (as described in Arai et al., 2023). These wmk homologs may have increased in number by horizontal phage transfer, and the region containing wmk and adjacent sequences may act as a genomic island for virulence. So far, the function of wmk homologs has only been tested in D. melanogaster and H. magnanima, and further studies in other Wolbachia-host systems are highly warranted to test whether wmk exerts MK effects in other insect models. These points have been briefly discussed in the revised manuscript (lines 301-306; 318-320).

      Why are some of the broods male-biased (2/3) rather than ~50:50? (Lines 170-175, Figure 2a). For example, there is a strong male bias in un-hatched oscar-injected and naturally infected embryos, whereas the control uninfected embryos have normal 50:50 sex ratios. It is difficult to interpret the rate of male-killing given that the sex ratios of different sets of zygotes are quite variable.

      The observed male-biased sex ratios in unhatched embryos are due to the occurrence of MK during embryogenesis. In the unhatched groups, the skew towards males reflects the fact that the male embryos were targeted and killed by Wolbachia/Oscar, leading to a surplus of unhatched male embryos. Conversely, hatched individuals show a higher proportion of females because many of the males were eliminated during embryogenesis. Thus, the unhatched embryos are more male-biased, while the hatched individuals are more female-biased in the Hm-oscar/wHm-t treated groups. We have revised the relevant section (Males are killed mainly at the embryonic stage, lines 179-186) and provided more detailed information to clarify this explanation.
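      The reasoning above can be made concrete with a small worked sketch. It assumes a simplified model that is not taken from the manuscript: every embryo has the same baseline chance of failing to hatch, and male embryos that survive this baseline are additionally killed with some probability. The function name, parameters, and numbers are all hypothetical.

```python
def male_ratios(n_male, n_female, baseline_death, male_killing):
    """Return the male fraction among unhatched and hatched individuals.

    baseline_death: sex-independent probability of failing to hatch.
    male_killing: extra death probability applied only to surviving males.
    Assumes at least one unhatched and one hatched individual, otherwise
    the corresponding ratio is undefined (division by zero).
    """
    unhatched_m = n_male * (baseline_death + (1 - baseline_death) * male_killing)
    unhatched_f = n_female * baseline_death
    hatched_m = n_male - unhatched_m
    hatched_f = n_female - unhatched_f
    return {
        "unhatched": unhatched_m / (unhatched_m + unhatched_f),
        "hatched": hatched_m / (hatched_m + hatched_f),
    }


# Hypothetical brood: 50 male and 50 female zygotes (a 50:50 start).
treated = male_ratios(50, 50, baseline_death=0.2, male_killing=0.75)
control = male_ratios(50, 50, baseline_death=0.2, male_killing=0.0)
```

      With these illustrative values the treated brood is about 80% male among unhatched embryos and about 20% male among hatched individuals, whereas the control stays at 50% in both classes, matching the direction of the skew described in the response.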

      Figure 2b - it appears there are both male and female bands in the HmOsc male lane. I think this makes sense (likely a partial phenotype due to the nature of the overexpression approach), but this is worth highlighting, especially in the context of trying to understand how much of the MK phenotype might be recapitulated through these methods. Related, there is no negative control for this PCR.

      Thank you for your suggestion. As you noted, a faint dsx-M band is visible in the Hm-oscar treated group in Figure 2b. This is consistent with previous findings by Arai et al. (2023), which reported that male embryos with low-density wHm-t showed double bands of dsx-M and dsx-F, similar to what we observed in this study. This information has been included in the revised manuscript in lines 196-198, as follows:

      “Notably, male embryos expressing Hm-oscar also exhibited weak male-type dsx splicing in addition to the female-type splicing, resembling the previously observed pattern in male embryos infected with low-titer wHm-t (Arai et al., 2023a).”

      Also, we appreciate your comment regarding the missing negative control. The figure has now been revised, as we realised that the negative control lane had been lost during preparation of the figure. We also included the relevant molecular marker information in both the figure legends and Figure 2b.

      It appears the RNA-seq analysis (Figure 3) is based on a single biological replicate for each condition. And, there are no statistical comparisons that support the conclusions of a shift in dosage compensation. Finally, it is unclear what exactly is new data here: the authors note "The expression data of the wHm-t-infected and non-infected groups were also calculated based on the transcriptome data included in Arai et al. (2023a)" - So, are the data in Figure 3c and 3d a re-print of previous data? The level of dosage compensation inferred by visually comparing the control conditions in 3b and 3d does not appear consistent. With only one biological replicate library per condition, what looks like a re-print of previous data, and no statistical comparisons, this is a weakly supported conclusion.

      Thank you for your suggestion. In this study, we generated the RNA-seq data for the Hm-oscar/GFP-injected groups, but did not sequence the wHm-t-infected/NSR lines. Instead, the previously generated RNA-seq data of wHm-t-infected/NSR (Arai et al., 2023) were re-analyzed (rather than simply reprinted) to evaluate whether the expression patterns of Hm-oscar-injected and wHm-t-infected groups are similar. We have revised the Results section (Hm-oscar impairs dosage compensation in male embryos, lines 200-212), the Materials and methods section (Quantification of Z chromosome-linked genes, lines 454-456), and the figure legends to provide more precise information about this analysis.

      Although we did not perform replicates for the RNA-seq comparisons, it is important to note that each RNA-seq sample contains 50-60 male/female individuals. We believe the results are still robust and clearly indicative of the trends we observe. This was further supported by the quantification of Hmtpi gene expression, which we have visualized in Figure 3e (and lines 210-212). As you noted, the expression patterns in Figure 3b (GFP injected) and Figure 3d (NSR) are not completely identical. This discrepancy may be due to the differences between injection treatments and natural infections. Nevertheless, both treatments are consistent in showing that gene expressions on the Z chromosome (Chr01 and Chr15) are not upregulated.

      We have also added more detailed statistical data for Figure 3 based on the Steel-Dwass tests. For Figure 3a-d, however, showing the statistical significance directly on the whisker plots would create excessive clutter due to the numerous combinations of chromosomes. Instead, we have provided the full statistical data in the supplementary data file. Furthermore, to support/strengthen our conclusion that Z-linked genes are highly expressed in wHm-t-infected/Hm-Oscar-injected embryos, we have included expression data for the Z-linked gene tpi, along with statistical data, in the revised manuscript (Fig. 3e, lines 210-212).
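
      As an illustration of the chromosome-level comparison discussed above, the following minimal Python sketch computes a median log2 male-to-female expression ratio per chromosome. It assumes the comparison is a per-gene ratio between sexes, which the response does not spell out; the function name, the pseudocount, and all expression values are hypothetical and are not taken from the study.

```python
import math
from collections import defaultdict
from statistics import median


def median_log2_ratio_by_chromosome(genes, pseudocount=1.0):
    """Return {chromosome: median log2(male/female)} for a list of gene records.

    Each record is (chromosome, male_expression, female_expression).
    The pseudocount (an assumption) avoids division by zero for unexpressed genes.
    """
    ratios = defaultdict(list)
    for chrom, male, female in genes:
        ratios[chrom].append(math.log2((male + pseudocount) / (female + pseudocount)))
    return {chrom: median(values) for chrom, values in ratios.items()}


# Made-up expression values for illustration only (not data from the study).
example_genes = [
    ("Chr01", 41.0, 20.0), ("Chr01", 17.0, 8.0), ("Chr15", 63.0, 31.0),
    ("Chr02", 10.0, 10.0), ("Chr02", 25.0, 24.0), ("Chr03", 7.0, 7.5),
]
result = median_log2_ratio_by_chromosome(example_genes)
for chrom in sorted(result):
    print(chrom, round(result[chrom], 2))
```

      In such a summary, a median near 0 would indicate similar expression in both sexes, whereas a clearly positive median on Chr01 and Chr15 would correspond to higher Z-linked expression in males.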

      In Figure 4: There are no statistics to support the conclusions presented here. Additionally, the data have gone through a normalization process, but it is difficult to follow exactly how this was done. The control conditions appear to always be normalized to 100 ("The expression levels of BmImpM in the Masc and Hm-Oscar/Oscar co-transfected cells were normalized by setting each Masc-transfected cell as 100"). I see two problems with this approach:

      (1) This has eliminated all of the natural variation in BmImpM expression, which is likely not always identical across cells/replicates.

      (2) How then was the percentage of BmImpM calculated for each of the experimental conditions? Was each replicate sample arbitrarily paired with a control sample? This can lead to very different outcomes depending on which samples are paired with each other. The most appropriate way to calculate the change between experimental and control would be to take the difference between every single sample (6 total, 3 control, 3 experimental) and the mean of the control group. The mean of the control can then be set at 100 as the authors like, but this also maintains the variability in the dataset and then eliminates the issue of arbitrary pairings. This approach would also then facilitate statistical comparisons which is currently missing.

      Thank you for your suggestion. As you pointed out in (1), the previous analysis did indeed eliminate the natural variation in BmIMP-M expression. In the revised manuscript and Figure 4, we have reanalyzed the data following your suggestion and have described the variation across replicates.

      For (2), the data shown in the previous manuscript were normalized to 100 for each Masc-treated group. In doing so, each replicate sample was arbitrarily paired with a control sample from the same cell lot to account for variations that might occur due to differences in cell lots. However, following your recommendation, we have revised the figure to set the average of the Hm-masc treated group to 100, rather than using arbitrary pairings. More detailed normalization procedures have been provided in the section 'Transfection assays and quantification of BmIMP' (lines 483-520). Additionally, we have provided more detailed background information on the assay system in lines 218-223. Although we did not observe statistical significance based on the Steel-Dwass test, likely due to the limited number of replicates, the differences in IMP gene expression between the Masc-treated and Masc&Hm-oscar-treated groups remain evident.
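
      The normalization recommended by the reviewer (scaling all replicates by the control-group mean rather than pairing samples) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name and the replicate values are hypothetical, and the authors' actual procedure is the one described in their Materials and methods.

```python
from statistics import mean, stdev


def normalize_to_control_mean(control, treated):
    """Scale every replicate so that the control-group mean equals 100."""
    control_mean = mean(control)
    if control_mean == 0:
        raise ValueError("control mean is zero; cannot normalize")
    scale = 100.0 / control_mean
    return [v * scale for v in control], [v * scale for v in treated]


# Made-up relative expression values, three replicates per group.
masc_only = [0.8, 1.0, 1.2]
masc_plus_oscar = [0.2, 0.3, 0.25]

control_pct, treated_pct = normalize_to_control_mean(masc_only, masc_plus_oscar)
print([round(v, 1) for v in control_pct])   # control replicates keep their spread
print(round(stdev(control_pct), 1))         # variability is preserved, not forced to 0
print([round(v, 1) for v in treated_pct])
```

      Because every sample is divided by the same control mean, no arbitrary pairing is needed and the control replicates retain their variance, which is what makes a subsequent statistical comparison possible.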

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Line 38: change to: 'Wolbachia are maternally transmitted'.

      Revised accordingly (line 38).

      Line 69: remove 'seemingly'.

      Revised accordingly (line 69).

      Paragraph starting line 123: I don't think this is so clear to a reader who is not familiar with the work and system. It would be helpful to more clearly explain that candidate male-killing genes from Wolbachia that infect Homona were inserted into Drosophila melanogaster, and that their expression was then induced, with interesting patterns (and that it can be a bit difficult to interpret the transgenic expression of genes from a moth male-killer that are inserted into a fly). Also, the sentence about the combined action of cifA and cifB in Drosophila cytoplasmic incompatibility is also confusing to a non-expert. I would suggest removing it.

      Thank you for your suggestion. We have revised the paragraph (lines 124-139) to provide clearer background information, making it easier for non-experts to follow. We have also removed the sentence regarding the combined effect of cifA and cifB to improve the flow and overall clarity.

      Line 170: what is the explanation for the male-biased sex ratio instead of 50-50?

      The male-biased sex ratio occurs because MK happens during embryogenesis. Unhatched embryos include males that were killed by Wolbachia/Oscar, resulting in a higher proportion of unhatched male embryos. Conversely, the hatched individuals display a female bias, as most of the males were eliminated during embryogenesis. Thus, the unhatched embryos are more male-biased, while the hatched individuals are more female-biased in the Hm-oscar/wHm-t treated groups. We have revised the section “Males are killed mainly at the embryonic stage” (lines 170-186) to include more detailed information explaining this phenomenon.

      Line 190: please explain what are the Z chromosomes in Bombyx and Homona and Lepidoptera in general (chromosomes 1 and 15?), as this is not so clear for a non-expert.

      Thank you for your suggestion. I have revised the section (lines 200-212) to include more precise background information about the chromosome constitutions in lines 202-204 as follows:

      “Unlike other lepidopteran species, Tortricidae, including H. magnanima, generally possess a large Z chromosome that is homologous to B. mori chromosomes 1 (Z) and 15 (autosome).”

      Line 222: please explain oscar diversity and classification in more detail, as this is not so clear for a non-expert.

      Thank you for your suggestion. We have revised the sentences to provide clearer background information on the diversity of oscar genes (lines 255-264).

      Figure 4: I found this difficult to follow. Why are there 2 rows (HmOscar and Oscar)? Does oscar here refer to oscar from Ostrinia? I am also a bit confused about the baseline control of Masc in these cell lines. If I understand Lepidoptera sex determination, then these cell lines are expressing high levels of female-specific piRNAs that suppress Masc. How specific are these piRNAs (i.e. do Bombyx piRNAs suppress Mascs from other Lepidoptera)? How much extra Masc will override endogenous piRNA? Information is lost by setting Masc expression to 100% in each separate comparison.

      Yes, Oscar indicates the wFur-encoded oscar (Oscar from Ostrinia), which was tested to compare its function with that of the Homona-derived Hm-oscar gene. In addition, following the reviewer's suggestions, we have revised the figure and included more detailed information on how we adjusted the expressions in the M&M section.

      A previous study (Shoji et al., 2017, RNA 23:86–97) demonstrated that the Fem piRNA (29 bp) in Bombyx mori requires a 17 bp complementary sequence from its 5' region for its function. However, in species other than B. mori, no significant homology (i.e., over 17 bp matches) was found between the B. mori Fem piRNA and the masc genes analyzed in this study. Therefore, it is likely that the Fem piRNA expressed in BmN-4 cells is unable to suppress the masculinizing function driven by masc genes in other lepidopteran species. In addition, we did not quantify the levels of piRNA in this system, but the expression levels of masc are probably too high to be suppressed.
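
      The homology check described above can be illustrated with a short Python sketch that asks whether a target sequence contains a site complementary to the 5' portion of a piRNA. It rests on simplifying assumptions that are not from the response: the required region is taken to be the first 17 nucleotides, only perfect complementarity is considered, and the sequences are placeholders rather than the real Fem piRNA or any masc gene.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_seed_match(pirna, target, seed_length=17):
    """True if the target contains a site perfectly complementary to the
    first `seed_length` nucleotides (5' end) of the piRNA."""
    pirna = pirna.upper().replace("U", "T")
    target = target.upper().replace("U", "T")
    if len(pirna) < seed_length:
        raise ValueError("piRNA is shorter than the requested seed length")
    site = reverse_complement(pirna[:seed_length])
    return site in target


# Placeholder sequences, NOT the real Fem piRNA or any masc gene.
toy_pirna = "ACGTTGCAAGGCTTAACCGGTATCGATCG"  # 29 nt
matching_target = "GGG" + reverse_complement(toy_pirna[:17]) + "CCC"
unrelated_target = "ATATATATATATATATATATATATATATAT"

print(has_seed_match(toy_pirna, matching_target))
print(has_seed_match(toy_pirna, unrelated_target))
```

      A real analysis would likely tolerate mismatches and use an alignment tool; the exact-substring test here only conveys the idea of a 17-nt complementarity threshold.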

      Figure 4 legend: spelling of Spodoptera.

      Revised accordingly.

      Reviewer #2 (Recommendations for the authors):

      In Figure 2, what is the dsx splicing type for the hatched male in the Hm-oscar-injected group and the wHm-t infected line? Dsx-F or dsx-M?

      Thank you for your suggestion. Unfortunately, we have not tested splicing in the hatched male neonates (1st instar larvae), partly due to difficulties in obtaining sufficient material for RNA extraction. Based on the previous publication in the Ostrinia system, where Oscar-bearing wSca induces MK, the hatched males (ZZ) exhibit female-type dsx as observed in the male embryos (Herran et al., 2022). The hatched Homona males may show double bands for dsx-M and dsx-F as observed in this study.

      The size of the markers (in kilobase pairs) should be indicated in Figure 2.

      We have accordingly included the marker information in the revised Figure 2b and the figure legends.

      In Figure 3, could the authors identify which genes exhibit higher expression levels in the Hm-oscar-injected group and the wHm-t infected line? Could they provide hints for the possible mechanism of male-killing?

      In the RNA-seq data shown in Figure 3a-d, we observed that both the Hm-oscar-injected and wHm-infected groups generally exhibited upregulated expression of Z-linked genes. Rather than the upregulation or downregulation of a specific gene, we consider that global upregulation of Z-linked genes, caused by improper dosage compensation, is lethal for males. The Z chromosome contains various genes involved in key biological processes such as endocrine function and detoxification, and disruption of these processes may contribute to male lethality. Additionally, in this revised manuscript, we have provided more detailed information on the expression level of the Z-linked gene tpi. We have also discussed the potential mechanisms of MK in the Discussion section (lines 245-254).

      The format of the references should be consistent. Gene and species names should be italicized.

      We have formatted the references and names accordingly.

      Reviewer #3 (Recommendations for the authors):

      The authors use the term "upstream" (e.g., Oscar suppressed the function of masculinizer, the upstream male sex determinant...), which was sometimes confusing. In many cases, it reads as though the masculinizer was upstream of oscar, but what I think the authors are trying to convey is that masculinizer is a primary sex-determining factor.

      Thank you for your suggestion. We have accordingly revised the term.

      Line 101: which insect is wFur from?

      It is from Ostrinia furnacalis - line 104 has been revised.

      Figure 1: it would be helpful to indicate the statistical results on the figure.

      Accordingly, we have added statistical data (binomial test) for Figure 1. The data for the Steel-Dwass test have been included in the supplementary data.
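
      For readers unfamiliar with the binomial test mentioned above, the following minimal Python sketch computes an exact two-sided p-value for a sex ratio against the 50:50 expectation. The counts are made up for illustration, the function name is hypothetical, and the authors' own software and settings are not stated in the response.

```python
from math import comb


def binomial_test_two_sided(successes, trials, p=0.5):
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed one under the null."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    probs = [comb(trials, k) * p**k * (1 - p) ** (trials - k) for k in range(trials + 1)]
    observed = probs[successes]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


# Made-up counts: 30 females among 40 hatched larvae.
p_value = binomial_test_two_sided(30, 40)
print(f"p = {p_value:.4f}")
```

      A small p-value here would indicate that the observed sex ratio is unlikely under an even 1:1 expectation.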

      Figure 2b: please label the ladder on the gel.

      Thank you for your suggestion. We have accordingly labeled the DNA ladder on the gel.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      The paper by Auer et al. makes several contributions: (1) The study developed a novel approach to map the microstructural organization of the human amygdala by applying radiomics and dimensionality reduction techniques to high-resolution histological data from the BigBrain dataset. (2) The method identified two main axes of microstructural variation in the amygdala, which could be translated to in vivo 7 Tesla MRI data in individual subjects. (3) Functional connectivity analysis using resting-state fMRI suggests that microstructurally defined amygdala subregions had distinct patterns of functional connectivity to cortical networks, particularly the limbic, frontoparietal, and default mode networks. (4) Meta-analytic decoding was used to suggest that the superior amygdala subregion's connectivity is associated with autobiographical memory, while the inferior subregion was linked to emotional face processing. (5) Overall, the data-driven, multimodal approach provides an account of amygdala microstructure and possibly function that can be applied at the individual subject level, potentially advancing research on amygdala organization.

      We thank the Reviewer for the positive comments and insightful evaluation of the work.

      (1.1) Although these are meritorious contributions there are some concerns that I will summarize below. The paper makes little-to-no contact with the monkey literature regarding the anatomy of amygdala subregions, their functionality, and their patterns of anatomical connectivity. This is surprising because such literature on non-human primates is a very important starting point for understanding the human amygdala. I recommend taking a careful look at the work by Helen Barbas, among others. There are too many papers to cite but a notable example is: Ghashghaei, H. T., Hilgetag, C. C., & Barbas, H. (2007). Sequence of information processing for emotions based on the anatomic dialogue between prefrontal cortex and amygdala. Neuroimage, 34(3), 905-923. The work of Amaral is also highly relevant.

      As suggested, we included the important work of Amaral et al. as well as Ghashghaei et al. highlighting its contribution to mapping the intricate anatomy and function of the amygdala in non-human primates. We comment on this in the Introduction of the manuscript. Please see P.3.

      “Early research on the amygdala in non-human primates has been instrumental in understanding its intricate structure, function and patterns of anatomical connectivity (Amaral and Price 1984; Ghashghaei et al. 2007). This foundational study highlights the amygdala’s different subdivisions, most notably the basomedial nucleus (BM), basolateral nucleus (BL), and central nucleus (Ce) (Amaral et al. 1992). Furthermore, this work describes a dense network between these subdivisions and the prefrontal cortex, most strongly found in the posterior orbitofrontal and anterior cingulate areas.”

      (1.2) Furthermore, the authors subscribe to a model with LB, CM, and SF sectors. How does the SF sector relate to monkey anatomy?

      The overall organization of these subregions is largely conserved between humans and monkeys, reflecting their evolutionary relationship. While the basic subregional organization is conserved, there are still some important structural and functional differences between human and monkey amygdalae. For example, the SF subregion, often described in humans includes parts of the cortical nuclei (VCo), anterior amygdaloid area (AAA), amygdalohippocampal transition area (AHi), amygdalopiriform transition area (APir) as well as the lateral olfactory tract (LOT). This remark was added in the Discussion, on P.12:

      “However, this region has been previously described as consisting of three main subdivisions: LB, CM, and SF, each composed of smaller subnuclei with distinct connectivity patterns and functions (Amunts et al. 2005; Ball et al. 2007; Bzdok et al. 2013; de Olmos and Heimer 1999). These subregions are largely conserved between humans and monkeys, reflecting their evolutionary relationship. However, there are still some considerable differences such as in the SF subregion, where its description in monkeys additionally contains the lateral olfactory tract (LOT) (De Olmos 1990).”

      (1.3) The authors use meta-analytical decoding via NeuroSynth. If the authors like those results of course they should keep them but the quality of coordinate reporting in the literature is insufficient to conclude much in the context of amygdala subregion function in my opinion. I believe the results reported are at most "somewhat suggestive".

      We agree with the Reviewer that use of data from NeuroSynth poses unique challenges, particularly relating to investigations of a small structure such as the amygdala. However, to clarify, these analyses decode the cortex-wide functional connectivity patterns of amygdala subregions and not activations within subregions defined by our microanatomical analyses. Additionally, comments from Reviewer 2 suggested expanding the NeuroSynth decoding to the contralateral hemisphere. As such, we decided to keep this analysis in the main manuscript but rephrase the interpretation of these findings in the Discussion to emphasize their exploratory nature on P.13:

      “Functional decoding of subregional functional connectivity patterns indicated possible dissociations in cognitive (e.g., memory) and affective (e.g., emotional face processing) functions of the amygdala, echoing previous accounts of this region’s involvement in associative processing of emotional stimuli. Notably, these findings link the functional connectivity profile of a subregion partially co-localizing with LB to emotional face processing. The LB subregion has been previously linked to associative processing related to the integration of sensory information (Bzdok et al. 2013; Ghods-Sharifi, St Onge, and Floresco 2009; Pessoa 2010; Winstanley et al. 2004; Boyer 2008), which is consistent with the association with visual emotional information processing identified in the present work.”

      (1.4) Another significant concern has to do with the results in Figure 3. The red and yellow clusters identified are quite distinct but the differences in functional connectivity are very modest. Figure 3C reveals very similar functional connectivity with the networks investigated. This is very surprising, and the authors should include a careful comparison with related findings in the literature. Overall, there is limited comparison between the observed results and those obtained via other methods. On a more pessimistic note, the results of Figure 3 seem to question the validity of the general approach.

      We agree with the Reviewer that we can indeed observe considerable overlap between functional connectivity profiles of amygdala subregions. The amygdala is a relatively small structure, leading to likely interconnectivity between its subregions (Bzdok et al. 2013) in addition to considering BOLD signal autocorrelation within this region. In addition, functional signals in the amygdala are affected by relatively lower signal-to-noise ratio (SNR), a limitation extending to temporobasal and mesiotemporal regions. Despite these challenges, our technique remained sensitive to detect subtle differences in connectivity patterns even in this small group of subjects in this restricted subcortical territory.

      In the revised manuscript, we further highlight these caveats in the Discussion (P.13):

      “Although these findings are promising, we also observe considerable overlap between functional connectivity networks of both our defined subregions. Indeed, the amygdala is a relatively small structure, leading to likely interconnectivity between its subregions and locally high signal autocorrelation. Functional connectivity and microstructure in the amygdala are certainly related, however previous work suggests they do not perfectly overlap (Bzdok et al. 2013). In addition, this region is affected by relatively low signal-to-noise ratio (SNR), as is observed in broader temporobasal and mesiotemporal territories.”

      (1.5) Some statements in the Discussion feel unwarranted. For example, "significant dissociation in functional connectivity to prefrontal structures that support self-referential, reward-related, and socio-affective processes." This feels way beyond what can be stated based on the analyses performed.

      We agree that this interpretation may reach beyond the analyses performed and reported findings. We have adjusted this portion of the text accordingly in our Discussion on functional connectivity findings (P.13):

      “Qualitatively, we found that the subregion defined by the highest 25% of U1 values mainly overlapped with what is commonly defined as the superficial and centromedial subregions, whereas the lowest 25% U1 values subregion overlapped mostly with the laterobasal division. Interestingly, CM and SF characterized subregions showed significantly stronger functional connectivity to prefrontal structures. This finding aligns with previous work demonstrating unique affiliations between the CM subregion and anterior cingulate and frontal cortices (Kapp, Supple, and Whalen 1994; Barbour et al. 2010), as well as between the SF subregion and the orbitofrontal cortex (Goossens et al. 2009; Caparelli et al. 2017; Pessoa 2010; Klein-Flügge et al. 2022).”
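
      The quartile-based definition of subregions mentioned in this passage can be sketched in Python as follows. This is only an illustration: the quantile method, the function name, and the U1 values are assumptions, since the response states just that the highest and lowest 25% of U1 values were used.

```python
from statistics import quantiles


def split_by_quartiles(u1_values):
    """Return indices of voxels in the lowest and highest 25% of U1 values."""
    q1, _, q3 = quantiles(u1_values, n=4, method="inclusive")
    low = [i for i, v in enumerate(u1_values) if v <= q1]
    high = [i for i, v in enumerate(u1_values) if v >= q3]
    return low, high


# Made-up U1 values for eight voxels.
toy_u1 = [0.1, 0.9, 0.4, 0.7, 0.2, 0.8, 0.5, 0.3]
low_idx, high_idx = split_by_quartiles(toy_u1)
print(low_idx, high_idx)
```

      Voxels falling between the two thresholds are left unassigned in this sketch, mirroring the idea of contrasting only the extremes of the U1 axis.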

      Additionally, we have also edited our Discussion to ensure that our interpretations are grounded in the analyses conducted, while framing the findings as potential avenues for future work. Please see P.13.

      “Functional decoding of functional connectivity results indicated possible dissociations in cognitive (e.g., memory) and affective (e.g., emotional face processing) functions of the amygdala, echoing previous accounts of this region’s functional specialization and subregional segregation of associative processing of emotional stimuli.”

      Recommendations for the authors:

      (1.6) Figure 1 has panels A-I but only A-D are discussed in the caption. The orientation of the slices is not indicated which makes it very hard to follow for most readers.

      The subpanels are now referred to in the revised Results. We also added a notation on the orientation of the slices and described them accordingly in our Figure 1 description. (P.5-6):

      “(A) The amygdala was segmented from the 100-micron resolution BigBrain dataset using an existing subcortical parcellation (Xiao et al. 2019). Slice orientation is consistent across all panels in this figure.”

      (1.7) Some figure references in the text seem to be incorrect; please check that the text refers to the correct figure number and panel.

      We thank the Reviewer for pointing this out. We thoroughly revised the correspondence between figure panel labels and their referencing in the text.

      Reviewer #2:

      This study bridges a micro- to macroscale understanding of the organization of the amygdala. First, using a data-driven approach, the authors identify structural clusters in the human amygdala from high-resolution post-mortem histological data. Next, multimodal imaging data to identify structural subunits of the amygdala and the functional networks in which they are involved. This approach is exciting because it permits the identification of both structural amygdalar subunits, and their functional implications, in individual subjects. There are, however, some differences in the macro and microscale levels of organization that should be addressed.

      Strengths:

      The use of data-driven parcellation on a structure that is important for human emotion and cognition, and the combination of this with high-resolution individual imaging-based parcellation, is a powerful and exciting approach, addressing both the need for a template-level understanding of organization as well as a parcellation that is valid for individuals. The functional decoding of rsfMRI permits valuable insight into the functional role of structural subunits. Overall, the combination of micro to macro, structure, and function, and general organization to individual relevance is an impressive holistic approach to brain mapping.

      We thank the Reviewer for their constructive and helpful feedback on our work.

      Weaknesses:

      (2.1) UMAP 1, as calculated from the histological data, appears to correlate well across individuals, and decently with the MRI data, although the medial-lateral coordinate axis is an outlier. UMAP 2, on the other hand, does not appear to correlate well with imaging data or across individuals. This does pose a problem with the claim that this paper bridges micro- and macroscale parcellations. One might certainly expect, however, that different levels of organization might parcellate differently, but the authors should address this in the discussion and offer ways forward.

      Data-driven methods hold several advantages for the quantitative extraction of signal from the underlying data in an observer-independent manner. However, these techniques are also sensitive to potential idiosyncrasies in the data. In the present work, our main analyses rely on the processing of a histological dataset (BigBrain), providing a unique opportunity for high-resolution analysis of amygdala histology and in vivo translation of findings leveraging ultra-high field MRI. However, both datasets are limited by their small sample size (n=1 for BigBrain and n=10 for MICA-PNI). As a result, we speculate that signal variations captured by U2 may be sensitive to artifacts or subject-specific sources of variance. Moving forward, this hypothesis could be assessed in future work via the analysis of larger histological and neuroimaging datasets to better track recurring features picked up by U2 or the association of these unique topographies with behavioural markers.

      As suggested, we included a section in our Discussion highlighting this shortcoming and the importance for larger datasets moving forward. Please see P.11-12.

      “However, it is important to note that both datasets analyzed in this work are limited by their small sample size (n=1 for BigBrain and n=10 for MICA-PNI). We speculate that the signal variations captured by U2 may be sensitive to artifacts or subject-specific sources of variance, potentially explaining why it was not consistent between subjects and modalities. Moving forward, this hypothesis could be assessed in future work via the analysis of larger histological and neuroimaging datasets to better track recurring features picked up by U2 or the association of these unique topographies with behavioural markers.”

      (2.2) It would be interesting to see functional decoding for the right amygdala. This could be included in the supplementary material. A discussion of differences in the results in the two hemispheres could be illuminating.

      In accordance with the Reviewer’s suggestion, we added Supplementary figure S2 exploring the decoding of connectivity profiles of the right amygdala stratified by its cytoarchitectural embedding with UMAP.

      Upon analysis, dissociations in functional connectivity patterns over the right amygdala were less evident, leading to overall similar functional decoding across the two clusters. We refer to this Supplementary Figure in our Discussion on P.13.

      “For the right amygdala, dissociation in functional connectivity patterns were more subtle, leading to overall similar functional decoding across the two clusters. (Figure S2)”

      (2.3) The authors acknowledge that this mapping matches some but not all subunits that have been previously described in the amygdala. It would be helpful to neuroanatomists if the authors could discuss these differences in more detail in the discussion, to identify how this mapping differs and what the implications of this are.

      In our work, we focus on mapping the three well characterized amygdala subregions, specifically the superficial (SF), centromedial (CM) and laterobasal (LB) subdivisions. Qualitative histological accounts have indeed delineated multiple subunits within these subregions which we now describe in the revised manuscript. Due to the lower resolution of in vivo MRI data used in this work relative to post mortem histology, we focused our analyses on larger subregions that could be more reliably mapped to native quantitative T1 spaces of each participant. We now overview this issue in the Discussion. Please see P.12.

      “Although qualitative histological accounts have indeed delineated multiple subunits within these general regions, the present work focuses on three subdivisions (Amunts et al. 2005) to account for resolution disparities when translating our findings to in vivo MRI data. The LB subdivision includes the basomedial nucleus (Bm), basolateral nucleus (BL), lateral nucleus (LA) and paralaminar nucleus (PL). Moving medially, the CM subdivision includes the central (Ce) and medial nuclei (Me), while the SF subdivision includes the anterior amygdaloid area (AAA), amygdalohippocampal transition area (AHi), amygdalopiriform transition area (APir), and ventral cortical nucleus (VCo) (Heimer et al. 1999). However, disagreement on the precise attribution of nuclei to broader subdivisions motivated our investigations of probabilistic subunits of the amygdala (Kedo et al. 2018). The development of new tools to segment amygdala subnuclei in vivo opens opportunities for future work to further validate our framework at the precision of these nuclei within subjects (Saygin et al. 2017).”

      (2.4) The acronym UMAP is not explained. A brief explanation and description would be useful to the reader.

      We moved the expanded acronym from the Methods to the first instance of the term UMAP in our paper, found in the Introduction. As suggested, we also added a sentence describing the technique. Please see P.6.

      “We then applied Uniform Manifold Approximation and Projection (UMAP), a non-linear dimensionality reduction technique that preserves the local and global structure of high-dimensional data by projecting it into a lower-dimensional space (Becht et al. 2018), to the resulting 20-feature matrix to derive a 2-dimensional embedding of amygdala cytoarchitecture (Figure 1D).”

    1. Author response:

      Reviewer 1:

      (1) Reward Interpretation and Skin Conductance Responses (SCR):

      The reviewer raises a valid point, as the model from which we derive prediction errors describes predictive learning—specifically, the occurrence of shock—without incorporating additional reward learning effects. SCRs are used to fit the model’s hyperparameters but do not directly measure reward; rather, they serve as a marker of arousal.

      In our paradigm, SCRs are measured during CS presentation and primarily reflect predictive learning, as they are closely linked to contingency awareness. The association between estimated prediction errors during unexpected US omissions and reward remains reliant on existing literature.

      In the revised manuscript, we will further elaborate on these points to clarify the distinction between predictive learning and direct reward processing, while contextualizing our findings within the broader literature on reward signaling and fear extinction.

      (2) Reinforcement Agent and SCR Modeling:

      Notably, we do not use SCR as a personalized expectation measure due to its limited reliability at the individual level; instead, the model's hyperparameters are fitted on the entire SCR dataset, yielding per-trial prediction and prediction error estimates for each CS sequence rather than for individual participants.
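
      To illustrate how a single set of hyperparameters can yield per-trial prediction and prediction error estimates for a CS sequence, here is a minimal sketch. It assumes a generic Rescorla-Wagner-style delta rule; the response does not specify the agent's update rule, so the function name, the learning rate of 0.3, and the example sequence are all hypothetical. In this reading, the learning rate stands in for a hyperparameter fitted once on the entire SCR dataset rather than per participant.

```python
def delta_rule(outcomes, learning_rate=0.3, initial_expectation=0.0):
    """Return per-trial (expectation, prediction_error) pairs for one CS sequence.

    outcomes: 1 if the US (shock) occurred on a trial, 0 if it was omitted.
    learning_rate: hypothetical value; assumed to be shared across participants.
    """
    expectation = initial_expectation
    trace = []
    for outcome in outcomes:
        # Prediction error: what happened minus what was expected.
        prediction_error = outcome - expectation
        trace.append((expectation, prediction_error))
        # Move the expectation a fraction of the way toward the outcome.
        expectation += learning_rate * prediction_error
    return trace


# Hypothetical CS sequence: three reinforced trials, then an unexpected US omission.
sequence = [1, 1, 1, 0]
trace = delta_rule(sequence)
for trial, (expected, error) in enumerate(trace, start=1):
    print(f"trial {trial}: expectation={expected:.3f}, prediction error={error:+.3f}")
```

      Under these assumptions, the omission on the final trial produces a negative prediction error whose size equals the shock expectation built up over the preceding trials. Because the estimates depend only on the trial sequence and the shared hyperparameters, they are the same for every participant who saw that sequence.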

      (3) Clarity and Visualization of Results:

      We recognize that the presentation of our results can be improved and will take steps to enhance figure clarity, also ensuring that trend-level results are clearly distinguished.

      (4) Theoretical Context for Paradigm Phases:

      Regarding the differences across experimental phases, we recognize the theoretical significance of these distinctions. However, our primary focus is on identifying commonalities in unexpected US omission responses across phases rather than emphasizing phase-specific differences. Nevertheless, we will provide a brief clarification on phase differences to enhance the manuscript’s interpretability.

      (5) Cerebellum-VTA Connectivity Analysis:

      Furthermore, we acknowledge that our conclusion regarding the modulation of the dopaminergic system by the cerebellum should be framed more cautiously. We will temper our claims to better reflect the bidirectional and potentially indirect nature of cerebellum-VTA interactions. Additionally, we plan to include PPI results using a cerebellar seed showing the VTA, potentially in the supplementary material.

      Reviewer 2:

      (1) Success of extinction learning based on Self-reports and SCRs?

      The reviewer points to a problem, which is inherent to extinction learning: The initial fear association is not erased, but merely inhibited, and is prone to return. Although the recall phase follows the extinction phase, we did not expect a complete inhibition of the conditioned response; instead, spontaneous recovery is expected. In fact, the spontaneous recovery observed in the recall phase provided us with an additional opportunity to investigate unexpected US omissions, which was our primary focus.

      (2) Concerns on reliability of event-based contrasts using three events:

      Regarding concerns about the reliability of analyses based on three events, we believe that the consistency between our parametric modulation analysis, which incorporates all events, and the three-event analysis results provides further support for the observed patterns. We are currently discussing additional analyses to further verify the reliability of using three events.

      (3) Deviations from preregistration:

      Finally, we will carefully review all deviations from our preregistration to ensure transparency. Any methodological or analytical changes will be explicitly addressed in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In the present work the authors explore the molecular driving events involved in the establishment of constitutive heterochromatin during embryo development. The experiments have been carried out in a very accurate manner and clearly fulfill the proposed hypotheses.

      Regarding the methodology, the use of: i) an efficient system for conversion of ESCs to 2C-like cells by Dux overexpression; ii) a global approach through IPOTD that reveals the chromatome at each stage of development and iii) the STORM technology that allows visualization of DNA decompaction at high resolution, helps to provide clear and comprehensive answers to the conclusion raised.

      The contribution of the present work to the field is very important as it provides valuable information on chromatin-bound proteins at key stages of embryonic development that may help to understand other relevant processes beyond heterochromatin maintenance.

      The study could be improved through a more mechanistic approach that focuses on how SMARCAD1 and TOPBP1 cooperate and how they functionally connect with H3K9me3, HP1b and heterochromatin regulation during embryonic development. For example, addressing why topoisomerase activity is required or whether it connects (or not) to SWI/SNF function and the latter to heterochromatin establishment, are questions that would help to understand more deeply how SMARCAD1 and TOPBP1 operate in embryonic development.

      We would like to thank the reviewer for the positive evaluation of our work and the methodology we employed. We greatly appreciated the reviewer’s recognition of our study to “provide valuable information on chromatin-bound proteins at key stages of embryonic development that may help to understand other relevant processes beyond heterochromatin maintenance”. While we acknowledge the value of including mechanistic studies, such an addition would require a substantial amount of experimental work that exceeds our current resources.

      Reviewer #1 (Recommendations For The Authors):

      In my opinion, the authors could improve the study by deciphering -to a certain extent- the possible mechanism by which SMARCAD1 and TOPBP1 are cooperating in their system to establish H3K9me3 and consequently heterochromatin; and whether it is different (or not) from that already reported in yeast (ref 27). In fact, is it only SMARCAD1 that participates in this process or the whole SWI/SNF complex? Could the lack of SMARCAD1 compromise the proper assembly of the SWI/SNF complex? In this regard, a model describing the main findings of the study and the discussion of the possible mechanisms involved -based on the current bibliography- would be appreciated. This, although speculative, would illustrate the range of possibilities that could be operating in the maintenance of heterochromatin during embryonic development. In conclusion, it would be great if the authors could link -mechanistically- the dots connecting SMARCAD1, TOPBP1, H3K9me3/HP1/heterochromatin.

      As suggested by the reviewer and to enrich the discussion, we have included some additional sentences and references in the revised discussion section.

      As a minor point, in Figure 3A (left panel), it appears that the protein precipitating with H3K9me3 reacts with TOPBP1, but its molecular weight does not exactly match that of the TOPBP1 band found in the input. The authors should clarify this point, and it is also recommended that IPs and inputs be run in the same gel. Please replace Figure 3A right panel.

      Following the reviewer’s suggestion and to improve the reading flow, we have restructured the order of the figures and removed the original Figure 3A. The revised Figure 3A-C panel illustrates the SMARCAD1 association with H3K9me3 in ESCs and 2C- cells, while capturing the reduced SMARCAD1-H3K9me3 association in 2C<sup>+</sup> cells.

      Reviewer #2 (Public Review):

      The manuscript by Sebastian-Perez describes determinants of heterochromatin domain formation (chromocenters) at the 2-cell stage of mouse embryonic development. They implement an inducible system for transition from ESC to 2C-like cells (referred to as 2C<sup>+</sup>) together with proteomic approaches to identify temporal changes in associated proteins. The conversion of ESCs to 2C<sup>+</sup> is accompanied by dissolution of chromocenter domains marked by HP1b and H3K9me3, which reform upon transition back to the 2C-like state. The innovation in this study is the incorporation of proteomic analysis to identify chromatin-associated proteins, which revealed SMARCAD1 and TOPBP1 as key regulators of chromocenter formation.

      In the model system used, doxycycline induction of DUX leads to activation of EGFP reporter regulated by the MERVL-LTR in 2C<sup>+</sup> cells that can be sorted for further analysis. A doxycycline-inducible luciferase cell line is used as a control and does not activate the MERVL-LTR GFP reporter. The authors do see groups of proteins anticipated for each developmental stage that suggest the overall strategy is effective.

      The major strengths of the paper involve the proteomic screen and initial validation. From there, however, the focus on TOPBP1 and SMARCAD1 is not well justified. In addition, how data is presented in the results section does not follow a logical flow. Overall, my suggestion is that these structural issues need to be resolved before engaging in comprehensive review of the submission. This may be best achieved by separating the proteomic/morphological analyses from the characterization of TOPBP1 and SMARCAD1.

      We appreciate the reviewer’s positive evaluation of our inducible system to trigger the transition from ESCs to 2C-like cells, and the strength of the chromatin proteomics we conducted. In response to the reviewer’s suggestion, we have reorganized the order of the figures, particularly Figure 1 and Figure 2, and revised the text to improve readability and flow.

      Reviewer #2 (Recommendations For The Authors):

      There are some very interesting components to the study but, as noted, the narrative requires changes and the rationale for focusing on TOPBP1 and SMARCAD1 is not strong at present. Specific comments are noted below.

      (1) Inclusion of authentic 2C cells for comparative chromocenter analysis (or at least a more fulsome discussion of how the system has been benchmarked in previous studies).

      We have included more detail in the revised methods section, in the “Cell lines and culture conditions” paragraph. We have added: “The Dux overexpression system was benchmarked according to previously reported features. Dux overexpression resulted in the loss of DAPI-dense chromocenters and the loss of the pluripotency transcription factor OCT4 (fig. S1E) (6, 7), upregulation of specific genes of the 2-cell transcriptional program such as endogenous Dux, MERVL, and major satellites (MajSat) (fig. S1F) (6, 7, 11, 26, 58), and accumulation in the G2/M cell cycle phase (fig. S1G), with a reduced S phase consistent in several clonal lines (fig. S1H) (15).”

      (2) In Figure 1A, the text indicates a loss of chromocenters, but it may be better described as decompaction because the DAPI/H3K9me3 staining shows diffuse/expanded structures (this is in fact how it is described in relation to Figure 2).

      We have changed the text accordingly, now describing it as “decompaction”.

      (3) Table S1 has 6 separate tabs but these are not specified in the text. It would be useful to separate the 397 proteins unique to Luc and 2C- cells since they form much of the basis for the remaining analysis. This approach also assumes it is the absence of a protein in the 2C<sup>+</sup> that accounts for the lack of chromocenters (noting there are 510 proteins unique to the 2C<sup>+</sup> state that are not discussed).

      We have referenced the supplementary table as Table S1 in the text for simplicity. It includes: Table S1A - List of Protein Groups identified by mass spectrometry in -EdU, Luc, 2C- and 2C<sup>+</sup> cells; Table S1B - Input data for SAINT analysis; Table S1C - SAINT results of the comparison 2C- vs Luc and 2C<sup>+</sup> vs Luc; Table S1D - SAINT results of the comparison Luc vs 2C- and 2C<sup>+</sup> vs 2C-; Table S1E - SAINT results of the comparison Luc vs 2C<sup>+</sup> and 2C- vs 2C<sup>+</sup>; and Table S1F - Total number of PSM per protein in the different cells and conditions tested.

      (4) Since there is no change in H3K9me3 levels, loss of SUV420H2 from 2C<sup>+</sup> chromatin (figure 1G) coupled with potential changes in H4K20me3 could contribute to the morphological differences. SUV420H2 is known to regulate chromocenter clustering in a way that requires H4K20me3 but this is not addressed or cited (PUBMED: 23599346).

      As suggested by the reviewer, we have added additional sentences and references in the revised manuscript.

      (5) In Figure 1C, there does appear to be overlap between the 2C<sup>+</sup> and 2C- populations (while the Luc population is distinct) even though they are morphologically distinct when imaged in Figure 2A. The 2C- cells are thought to be an intermediate, low Dux expressing population.

      Chromatome profiling through genome capture provides a snapshot of the chromatin-bound proteome in the analyzed samples (shown in revised Fig. 2B). As indicated by the reviewer and previously reported in the literature, 2C- cells are an intermediate population before reaching 2C<sup>+</sup> cells. For this study, we have focused on H3K9me3 morphological changes. Even though 2C- and 2C<sup>+</sup> cells are distinct with respect to H3K9me3 morphology (shown in revised Fig. 1B), analysis of the chromatome data from hundreds of chromatin-bound proteins revealed some overlap between these two populations. However, replicates from the same population tend to cluster together, for example, 2C<sup>+</sup> rep1 and 2C<sup>+</sup> rep3, and 2C- rep1 and 2C- rep2. Collectively, these data suggest that a defined subset of coordinated changes in the chromatome likely triggers the transition from 2C- to 2C<sup>+</sup> cells. Further experimental investigation of the chromatome dataset during the 2C-like transition would be interesting; however, we believe it is beyond the scope of this study.

      (6) Data with SUV39H1 and 2 is difficult to accommodate; what about other H3K9 methyltransferases or proteins such as TRIM28 (KAP1) and SETDB1 (this comes up in the discussion but is not assessed in the results section).

      We agree that investigating the role of TRIM28 (KAP1) and SETDB1 in this experimental setting could be of interest; however, we believe that these experiments go beyond the scope of the presented study.

      (7) Rationale for choosing TOPBP1 needs to be improved. How do TOPBP1 levels relate to TOPI/TOP2A/TOP2B levels across the 3 cell populations? By what criteria does topoisomerase inhibitor treatment increase 2C<sup>+</sup> like cells? Moreover, to what extent will inhibiting topoisomerases lead to global heterochromatin and cell cycle changes regardless of cell type.

      Following the reviewer’s suggestion, we have included some additional references throughout the text to strengthen our rationale for selecting TOPBP1, given its well-established critical role in DNA replication and repair. Additionally, we have revised the results and discussion sections to include new sentences that propose a potential mechanism by which topoisomerase inhibitors may indirectly recruit TOPBP1 to facilitate DNA repair, ultimately leading to an increase in 2C<sup>+</sup> cells.

      (8) Likewise, the decision to look at SMARCAD1 based solely on its interaction with TOPBP1 seems somewhat arbitrary and it did not seem to come up as of interest in the iPOTD analysis. Moreover, they were not able to validate the interaction with their own analyses.

      We have revised the text to clarify the connection further.

      (9) The flow of results is confusing. The first section concludes with a focus on TOPBP1 and SMARCAD1, then progresses to morphological characterization of heterochromatin regions in the next two sections before returning to TOPBP1 and SMARCAD1. It seems like it would make more sense to describe the model system and morphological characterization at the beginning of the results section and then transition to the proteomic analysis and characterization of TOPBP1 and SMARCAD1 (with the expectation that the rationale be improved).

      As suggested by the reviewer, we have reordered the figures, particularly Figure 1 and Figure 2, and rephrased the text to improve the overall reading flow.

      (10) There has been considerable work done on characterizing chromatin structure, epigenetic changes, and morphology during early embryonic development. It is therefore difficult to see how validating some of these changes in the inducible model adds much in the way of new knowledge. It may, but this is not articulated in the current text.

      As detailed before, we have rephrased the text to improve the overall reading flow, which we hope has improved the understanding of the impact of our results.

      (11) It is difficult to disentangle broader effects of both TOPBP1 and SMARCAD1 from those described here; they may induce phenotypes, but these may not be unique to this model system.

      We agree with the reviewer, but to address this point would require additional experiments which would go beyond the scope of the presented study.

      (12) One of the issues with this assay is global chromatin recovery; it is not focused on heterochromatin compartments. The statement "We identified a total of 2396 proteins, suggesting an efficient pull-down of chromatin-associated factors (fig. S2D and Table S1)" does not demonstrate efficiency. Additional functional annotation would be required to establish this claim, including what fraction are known chromatin-associated proteins (with a focus on the heterochromatin compartment).

      We have changed the text accordingly. The resulting statement reads as: “We identified a total of 2396 proteins, suggesting an effective pull-down of putative chromatin-associated factors (fig. S2D and Table S1)”.

      Reviewer #3 (Public Review):

      The manuscript entitled "SMARCAD1 and TOPBP1 contribute to heterochromatin maintenance at the transition from the 2C-like to the pluripotent state" by Sebastian-Perez et al. adopted the iPOTD method to compare the chromatin-bound proteome in ESCs and 2C-like cells generated by Dux overexpression. The authors identified 397 chromatin-bound proteins enriched only in ESC and 2C- cells, among which they further investigated TOPBP1 due to its potential role in controlling chromocenter reorganization. SMARCAD1, a known interacting protein of TOPBP1, was also investigated in parallel. The authors observed increased size and decreased number of H3K9me3-heterochromatin foci in Dux-induced 2C<sup>+</sup> cells. Interestingly, depletion of TOPBP1 or SMARCAD1 also led to increased size and decreased number of H3K9me3 foci. However, depletion of these proteins did not affect entry into or exit from the 2C-like state. Nevertheless, the authors showed that both TOPBP1 and SMARCAD1 are required for early embryonic development.

      Although this manuscript provides new insights into the features of 2C-like cells regarding H3K9me3-heterochromatin reorganization, it remains largely descriptive at this stage. It does not provide new insights into the following important aspects: 1) how SMARCAD1 associates with H3K9me3 and contributes to heterochromatin maintenance, 2) how TOPBP1 regulates the expression of SMARCAD1 and facilitates its localization in heterochromatin foci, 3) whether the remodelling of chromocenter is causally related to the mutual transitions between ESCs and 2C-like cells. Furthermore, some results are over-interpreted. Additional experiments and analyses are needed to increase the strength of mechanistic insights and to support all claims in the manuscript.

      We would like to thank the reviewer for their positive and thorough evaluation of our manuscript. We have revised the text and hope that the overall flow is now clearer. Moreover, while we acknowledge the value of including mechanistic studies, such an addition would require a substantial amount of experimental work that exceeds our current resources. 

      Reviewer #3 (Recommendations For The Authors):

      Major points:

      (1) Fig.2: the DNA decompaction of the chromatin fibers shown in 2C<sup>+</sup> cells may be more related to a relaxed 3D chromatin conformation (Zhu, NAR 2021; Olbrich, Nat Commun 2021) than chromatin accessibility. The authors should discuss this point.

      As suggested by the reviewer, we have included some additional sentences and references in the revised manuscript to address this concern.

      (2) Chemical inhibition of topoisomerases resulted in an increase in the percentage of 2C<sup>+</sup> cells. Does depletion of TOPBP1 also result in an increased percentage of 2C<sup>+</sup> cells? Please include this result in Fig. 3E. Additionally, it should be noted that DDR and p53 have been reported to activate Dux (Stashpaz, eLife 2020; Grow, Nat Genet 2021), and thus, may contribute to the increased percentage of 2C<sup>+</sup> cells observed upon topoisomerase inhibition. This point should be discussed in the manuscript.

      To address this concern, we have included some additional sentences and references in the revised manuscript.

      (3) Fig 3A: the TOPBP1 band in the IP sample is questionable, and therefore the conclusion that TOPBP1 is associated with H3K9me3 is difficult to draw from Fig 3A. Additionally, the authors mentioned that association of TOPBP1 and SMARCAD1 is undetected in ESCs, likely due to the suboptimal efficiency of available antibodies. As these are key conclusions in this study, the authors are suggested to try other commercially available TOPBP1 antibodies (e.g., Abcam #ab-105109, used by ElInati, PNAS 2017) or knock-in tags to perform the co-IP experiment.

      Following the reviewer’s suggestion and to improve the reading flow, we have restructured the order of figures and removed the original Figure 3A. The revised Figure 3A-C panel illustrates the SMARCAD1 association with H3K9me3 in ESCs and 2C- cells, while capturing the reduced SMARCAD1-H3K9me3 association in 2C<sup>+</sup> cells.

      (4) Fig. 3C-D, Fig. S3D: the authors claimed reduction of both SMARCAD1 expression and its co-localization with H3K9me3 foci in 2C<sup>+</sup> cells, but did not perform mechanistic studies. It is important to know if TOPBP1 expression also decreases in 2C<sup>+</sup> cells. Additionally, it is unclear if the reduced co-localization of SMARCAD1 with H3K9me3 foci results from its altered nuclear localization or simply from reduced expression level? In either case, please provide some mechanistic insights.

      While we acknowledge the value of including mechanistic studies, such an addition would require a substantial amount of experimental work that exceeds our current resources. 

      (5) Fig. 3K, Fig. S4D-E: does SMARCAD1 expression decrease upon TOPBP1 depletion? Statistical analysis of SMARCAD1 intensity in Fig. S4E is needed, and a Western blot analysis is strongly suggested. Additionally, it is unclear if the reduced co-localization of SMARCAD1 with H3K9me3 foci results from its altered nuclear localization or simply from reduced expression level? In Fig. 3K, TOPBP1-depleted cells appear to show decreased size and increased number of H3K9me3 foci, which is inconsistent with Fig. S4B-C. The authors should clarify this discrepancy. Furthermore, statistics should be performed to determine whether Smarcad1/Topbp1 knockdown could further increase the size and decrease the number of H3K9me3 foci in 2C<sup>+</sup> cells. This would provide additional evidence for the involvement of these proteins in heterochromatin maintenance.

      We did not observe Smarcad1 downregulation after Topbp1 knockdown (shown in fig. S4A). In Figs. S4B and S4C, we observed that the number of H3K9me3 foci decreased, and their area became larger after knocking down either Smarcad1 or Topbp1, compared to scramble controls. These results align with the reviewer’s comment. Additionally, it should be noted that these findings were derived from the quantification of tens of cells and hundreds of foci, as indicated in the figure legend. This resulted in statistical significance after applying the test indicated in the figure legend.

      (6) Fig. 3J is suggested to be moved to Fig. 4. Additionally, performing immunostaining of SMARCAD1, TOPBP1, and H3K9me3 during pre-implantation development would provide valuable information on their protein-level dynamics, interactions, and functions in early embryos. This would further strengthen the conclusions drawn in the manuscript.

      We agree that performing these additional experiments would provide valuable information; however, this would require a substantial amount of experimental work that exceeds our current resources.

      (7) Fig. 4 and Fig. S5: the authors observed reduced H3K9me3 signal in the Smarcad1 MO embryos at the 8-cell stage, but claim that they failed to examine Topbp1 MO embryos at the 8-cell stage due to their developmental arrest at the 4-cell stage. However, based on Fig. 4A, not all Topbp1 MO embryos were arrested at the 4-cell stage, and it is still possible to examine the H3K9me3 signal in 8-cell Topbp1 MO embryos, which is critical for demonstrating its function in early embryos. Also, how to interpret the increased HP1b signal in Topbp1 MO embryos?

      For Topbp1 silencing, we observed an even more severe phenotype than with the Smarcad1 MO. All the Topbp1 MO-injected embryos (100%) arrested at the 4-cell stage and did not develop further (shown in Fig. 4A and 4B). Therefore, the severity of the Topbp1 morpholino phenotype posed a technical challenge in evaluating the H3K9me3 signal in 8-cell Topbp1 MO embryos, as none of the injected embryos developed beyond the 4-cell stage.

      We believe the increased HP1b signal in Topbp1 MO embryos could indicate potential alterations in chromatin organization and heterochromatin stability. Specifically, we observed remodeling of heterochromatin in both 2-cell and 4-cell Topbp1 MO arrested embryos compared to controls, as evidenced by the spreading and increased HP1b signal (shown in fig. S5F-S5I). Further investigations could enhance our understanding of the underlying defects in Topbp1 knockdown embryos, extending beyond heterochromatin-related errors.

      Minor points:

      (1) Page 4, the third row from the bottom: please revise the sentence.

      We have reviewed the text and it now reads correctly in the revised manuscript.

      (2) Fig. 1C: The authors claimed "Luc replicates clustered separately from 2C<sup>+</sup> and 2C- conditions", however, Luc rep3 is apparently clustered with 2C conditions.

      (3) The GFP signal in Fig. S1E is confusing.

      (4) Please include ESC in Fig. 2D-E. Also label the colors in Fig. 2E.

      As indicated in the figure legend of the revised Fig. 1F: “Cells with a GFP intensity score > 0.2 are colored in green. Black dots indicate 2C- cells and green dots indicate 2C<sup>+</sup> cells.”

      (5) Fig. 2G: Transposition of the heatmap (show genes in rows) is suggested to improve readability.

      (6) Page 7, the third row from the bottom: incorrect citation of Fig. 1K.

      Thank you for spotting this incorrect citation. We have corrected it in the revised manuscript.

      (7) Page 8, row 15, Fig. S3D should be cited to support the decreased expression of SMARCAD1 in 2C<sup>+</sup> cells.

      We have cited the corresponding supplementary figure S3D in the mentioned sentence.

      (8) Fig. 2H: what is the difference between "2C-" and "ESC-like"?

      We use “2C-” to refer to cells that do not express the GFP reporter during the transition from ESCs to 2C<sup>+</sup> cells. We use “ESC-like” to refer to cells that do not express the GFP reporter during exit, that is, during the transition from sorted and purified 2C<sup>+</sup> cells to a GFP-negative state.

      (9) Fig. S4A-C: compared with shTopbp1#2, shTopbp1#1 appears to be slightly more effective in knockdown, but less dramatic changes in the size/number of H3K9me3 foci.

      (10) Fig. 4: please show the effectiveness of Topbp1 MO by Immunostaining of TOPBP1.

      (11) Fig. 4C: please label the developmental stage as in Fig. 4E and 4G.

      We have added an “8-cell” label to Figure 4C, as suggested by the reviewer.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, Zhao and colleagues investigate inflammasome activation by E. tarda infections. They show that E. tarda induces the activation of the NLRC4 inflammasome as well as the non-canonical pathway in human THP1 macrophages. Further dissecting NLRC4 activation, they find that T3SS translocon components eseB, eseC and eseD are necessary for NLRC4 activation and that delivery of purified eseB is sufficient to trigger NAIP-dependent NLRC4 activation. Sequence analysis reveals that eseB shares homology within the C-terminus with T3SS needle and rod proteins, leading the authors to test if this region is necessary for inflammasome activation. They show that the eseB CT is required and that it mediates interaction with NAIP. Finally, they show that homologs of eseB in other bacteria also share the same sequence and that they can activate NLRC4 in a HEK293T cell overexpression system.

      Strengths:

      This is a very nice study that convincingly shows that eseB and its homologs can be recognized by the human NAIP/NLRC4 inflammasome. The experiments are well designed, controlled and described, and the paper is convincing as a whole.

      Weaknesses:

      The authors need to discuss their study in the context of previous papers that have shown an important role for E. tarda flagellin in inflammasome activation and test whether flagellin and/or E. tarda T3SSs needle or rod can activate NLRC4.

      The authors show that eseB and its homologs can activate NLRC4, but there are also other translocon proteins, such as YopB or PopB, that are very different and share little homology with eseB. It would be nice to include a section comparing the different type 3 secretion systems. Are there two different families of T3SSs, those that feature translocon components that are recognized by NAIP-NLRC4 and those that cannot be recognized?

      (1) The authors need to discuss their study in the context of previous papers that have shown an important role for E. tarda flagellin in inflammasome activation and test whether flagellin and/or E. tarda T3SSs needle or rod can activate NLRC4.

      According to the reviewer’s suggestion, we added the relevant discussion (lines 326-334) and carried out additional experiments to examine whether E. tarda flagellin, needle, and rod could activate NLRC4. The relevant results are shown in Figure S3, Figure S5, and lines 226-230 and 269-274.

      (2) The authors show that eseB and its homologs can activate NLRC4, but there are also other translocon proteins, such as YopB or PopB, that are very different and share little homology with eseB. It would be nice to include a section comparing the different type 3 secretion systems. Are there two different families of T3SSs, those that feature translocon components that are recognized by NAIP-NLRC4 and those that cannot be recognized?

      According to the reviewer’s suggestion, additional experiments were performed to examine the NLRC4-activating potentials of 14 translocator proteins that share low sequence identities with EseB. The relevant results and discussion are shown in Figure S8 and lines 289-301; 364-372, and 377-379.

      Reviewer #2 (Public Review):

      Summary:

      This work by Zhao et al. demonstrates the role of the Edwardsiella tarda type 3 secretion system translocon in activating human macrophage inflammation and pyroptosis. The authors show the requirement of both the bacterial translocon proteins and particular host inflammasome components for E. tarda-induced pyroptosis. In addition, the authors show that the C-terminal region of the translocon protein, EseB, is both necessary and sufficient to induce pyroptosis when present in the cytoplasm. The most terminal region of EseB was determined to be highly conserved among other T3SS-encoding pathogenic bacteria and a subset of these exhibited functionally similar effects on inflammasome activation. Overall, the data support the conclusions and interpretations and provide interesting insights into interactions between bacterial T3SS components and the host immune system.

      Strengths:

      The authors use established and reliable molecular biology and bacterial genetics strategies to characterize the roles of the bacterial T3SS translocon and host inflammasome pathways to E. tarda-induced pyroptosis in human macrophages. These observations are naturally expanded upon by demonstrating the specific regions of EseB that are required for inflammasome activation and the conservation of this sequence among other pathogenic bacteria.

      Weaknesses:

      The functional assessment of EseB homologues is limited to inflammasome activation at the protein level but does not include the effects on cell viability as shown for E. tarda EseB. Confirmation that EseB homologues have similar effects on cell death would strengthen this portion of the manuscript.

      According to the reviewer’s suggestion, the effects of representative EseB homologs on cell death were examined in the revised manuscript (Figure 5D, Figure S7 and line 289).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I only have a few suggestions on how to improve the study:

      Activation of caspase-4 requires entry into the host cytosol. Can this be observed with E. tarda and is it T3SS dependent? The fact that deleting the translocon components abrogates all GSDMD activation (see Fig. 2D) suggests that also Casp4 activation requires an active T3SS. It would be useful for the reader to include some more information on the cellular biology of E. tarda.

      In our study, we found that E. tarda could enter THP-1 cells (Figure S1), and host cell entry was not affected by deletion of eseB-D (Δ_eseB-D_) in the T3SS system (Figure 2B, C). Additional experiments showed that Δ_eseB-D_ abolished the ability of E. tarda to activate Casp4 (Figure S2), implying that Casp4 activation required an active T3SS. Relevant changes in the revised manuscript: lines 223 and 224, 341-342.

      The data presented by the authors suggest that eseB is sensed by NLRC4 when overexpressed. They do not, however, prove that eseB is the main factor driving NLRC4 activation during an infection, since deficiency in eseB also abrogated translocation of other potential activators of NLRC4, e.g. flagellin and T3SS needle and rod subunits. I would thus find it essential to properly test if E. tarda flagellin can activate NLRC4 by comparing a WT and flagellin-deficient strain, and/or by transfecting or expressing E.t. flagellin in these cells, as well as testing whether E.t. rod and needle subunits act as NLRC4 activators. This is important as previous studies suggested that flagellin is the main activator of cytotoxicity during E. tarda infection.

      Previous studies have shown that flagellin is required for E. tarda-induced macrophage death in fish [1] but not in mice [2]. In the revised manuscript, we performed additional experiments to examine whether E. tarda flagellin, needle, and rod could activate NLRC4. The relevant results are shown in Figure S3, Figure S5, and lines 226-230, 269-274, and 326-334.

      References

      (1) Xie HX, Lu JF, Rolhion N, Holden DW, Nie P, Zhou Y, et al. Edwardsiella tarda-induced cytotoxicity depends on its type III secretion system and flagellin. Infect Immun. 2014;82(8):3436-45. doi: 10.1128/IAI.01065-13.

      (2) Chen H, Yang D, Han F, Tan J, Zhang L, Xiao J, et al. The bacterial T6SS effector EvpP prevents NLRP3 inflammasome activation by inhibiting the Ca<sup>2+</sup>-dependent MAPK-JNK pathway. Cell Host Microbe. 2017;21(1):47-58. doi: 10.1016/j.chom.2016.12.004.

      Figure 5/S4, please list the names of the eseB homologs. It is cumbersome to have to access GenBank with the accession number to be able to understand what proteins the authors define as homologs of eseB.

      The names were added to the revised Table S2, Figure 5 and Figure S6 (the original Figure S4).

      The authors mention that other translocon proteins, such as YopB/D and PopB/D, were suggested to cause inflammasome activation. How do these compare to eseB and its homologs? Do they share the CT motif?

      Additional experiments were performed to compare the inflammasome activation abilities of EseB and other translocator proteins including YopD and PopD. The relevant results and discussion are shown in Figure S8 and lines 289-301, 364-372, and 377-379.

      It would be nice to show that there are potentially two groups of translocon proteins, one group sharing homology to needle subunits within the CT region and another that is different. A quick look at the sequence of these proteins suggests that they are quite different and much larger than eseB.

      In our study, additional experiments with more translocator proteins indicated that the possession of EseB T6R-like terminal residues does not necessarily guarantee the protein to activate the NLRC4 inflammasome. Relevant results and discussion are shown in lines 289-301, 364-372, and 377-379.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, Satouh et al. report giant organelle complexes in oocytes and early embryos. Although these structures have often been observed in oocytes and early embryos, their exact nature has not been characterized. The authors named these structures "endosomal-lysosomal organelles form assembly structures (ELYSAs)". ELYSAs contain organelles such as endosomes, lysosomes, and probably autophagic structures. ELYSAs are initially formed in the perinuclear region and then migrate to the periphery in an actin-dependent manner. When ELYSAs are disassembled after the 2-cell stage, the V-ATPase V1 subunit is recruited to make lysosomes more acidic and active. The ELYSAs are most likely the same as the "endolysosomal vesicular assemblies (ELVAs)", reported by Elvan Böke's group earlier this year (Zaffagnini et al. doi.org/10.1016/j.cell.2024.01.031). However, it is clear that Satouh et al. identified and characterized these structures independently. These two studies could be complementary. Although the nature of the present study is generally descriptive, this paper provides valuable information about these giant structures. The data are mostly convincing, and only some minor modifications are needed for clarification and further explanation to fully understand the results.

      Reviewer #2 (Public Review):

      Satouh et al report the presence of spherical structures composed of endosomes, lysosomes, and autophagosomes within immature mouse oocytes. These endolysosomal compartments have been named as Endosomal-LYSosomal organellar Assembly (ELYSA). ELYSAs increase in size as the oocytes undergo maturation. ELYSAs are distributed throughout the oocyte cytoplasm of GV stage immature oocytes but these structures become mostly cortical in the mature oocytes. Interestingly, they tend to avoid the region which contains metaphase II spindle and chromosomes. They show that the endolysosomal compartments in oocytes are less acidic and therefore non-degradative but their pH decreases and becomes degradative as the ELYSAs begin to disassemble in the embryos post-fertilization. This manuscript shows that lysosomal switching does not happen during oocyte development, and the formation of ELYSAs prevents lysosomes from being activated. Structures similar to these ELYSAs have been previously described in mouse oocytes (Zaffagnini et al, 2024) and these vesicular assemblies are important for sequestering protein aggregates in the oocytes but facilitate proteolysis after fertilization. The current manuscript, however, provides further details of endolysosomal disassembly post-fertilization. Specifically, the V1-subunit of V-ATPase targeting the ELYSAs increases the acidity of lysosomal compartments in the embryos. This is a well-conducted study and their model is supported by experimental evidence and data analyses.

      Reviewer #3 (Public Review):

      Fertilization converts a cell defined as an egg to a cell defined as an embryo. An essential component of this switch in cell fate is the degradation (autophagy) of cellular elements that serve a function in the development of the egg but could impede the development of the embryo. Here, the authors have focused on the behavior during the egg-to-embryo transition of endosomes and lysosomes, which are cytoplasmic structures that mediate autophagy. By carefully mapping and tracking the intracellular location of well-established marker proteins, the authors show that in oocytes endosomes and lysosomes aggregate into giant structures that they term Endosomal LYSosomal organellar Assembl[ies] (ELYSA). Both the size distribution of the ELYSAs and their position within the cell change during oocyte meiotic maturation and after fertilization. Notably, during maturation, there is a net actin-dependent movement towards the periphery of the oocyte. By the late 2-cell stage, the ELYSAs are beginning to disintegrate. At this stage, the endo-lysosomes become acidified, likely reflecting the activation of their function to degrade cellular components.

      This is a carefully performed and quantified study. The fluorescent images obtained using well-known markers, using both antibodies and tagged proteins, support the interpretations, and the quantification method is sophisticated and clearly explained. Notably, this type of quantification of confocal z-stack images is rarely performed and so represents a real strength of the study. It provides sound support for the conclusions regarding changes in the size and position of the ELYSAs. Another strength is the use of multiple markers, including those that indicate the activity state of the endo-lysosomes. Altogether, the manuscript provides convincing evidence for the existence of ELYSAs and also for regulated changes in their location and properties during oocyte maturation and the first few embryonic cell cycles following fertilization.

      At present, precisely how the changes in the location and properties of the ELYSAs affect the function of the endo-lysosomal system is not known. While the authors' proposal that they are stored in an inactive state is plausible, it remains speculative. Nonetheless, this study lays the foundation for future work to address this question.

      Minor point: l. 299. If I am not mistaken, there is a typo. It should read that the inhibitors of actin polymerization prevent redistribution from the cytoplasm to the cortex during maturation.

      Minor point: A few statements in the Introduction would benefit from clarification. These are noted in the comments to the authors.

      We sincerely appreciate the editorial board of eLife and the reviewers for their helpful and constructive comments on our manuscript. We are pleased that the reviewers acknowledged that we identified and characterized this assembly structure independently. In the revised manuscript, we have carefully considered the reviewers’ comments and conducted additional analysis to address each of them.

      Regarding the typographical errors, we revised the description to fit with our findings and the reviewers’ comments. We also found that the primer sequence was correct, and we carefully checked the accuracy of the entire manuscript.

      We hope that the revised version will now be deemed suitable for publication in eLife.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Q. 1) The authors state in the Abstract that ELYSAs contain autophagosome-like membranes in the outer layer. However, this seems to be just speculation based on the LC3 staining results and is not directly shown. Are there autophagosome-like double membrane structures in ELYSAs?

      We appreciate this comment. We also agree with this concern; however, it was difficult to assert that they are autophagosomes based on the observation of the electron micrographs. For this reason, we rephrased it to be "Most ELYSAs are also positive for an autophagy regulator, LC3.” (line 33). In addition, we revised the notation to LC3-positive structures in the Results and Discussion sections (lines 165-169, 286).

      Q. 2) The data in Figure 2A, showing a decrease in the number of LAMP1 structures, seems to contradict the data in Figure 1B, showing an apparent increase in LAMP1 structures. Please explain this discrepancy. If the authors did not count structures just below the plasma membrane, please explain the rationale for this.

      We really appreciate the valuable comment. Regarding the number of LAMP1-positive structures, it is not suitable for comparison with Figure 1B, etc., as pointed out by the reviewer, since the distribution of the LAMP1 signal differs from plane to plane. To avoid any potential confusion, we added new images of the Z-projection of the immunostained images that can better reflect the number of positive structures in the whole oocyte/embryo in Figure 2.

      In addition, as the reviewer pointed out, there is a technical difficulty in measuring the LAMP1-positive signal on the plasma membrane or just below it. We explained how and why we had to delete plasma membrane signals in our response #21.

      Q. 3) The actin dependence is not observed in Figure 5C. What is the difference between Figure 5C and 5E? Please explain further.

      We apologize for the lack of clarity; Figures 5C and 5E show the average number of LAMP1-positive structures (5C) and the percentage of the sum of granule volumes in LAMP1 positive structure (5E), respectively, after classifying the LAMP1 positive granules by their diameters.

      We removed Figure 5E for the sake of conciseness since we already mentioned a similar fact in Figure 5C. To clarify the corresponding explanations, we moved figures that were not classified by diameter to Supplementary Figure 8 to improve readability. Moreover, we have rewritten the main text on lines 200–211.
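
      The two diameter-classified quantities described in this response (the number of LAMP1-positive structures per diameter class, and the share of total granule volume in each class) can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: the function name, the bin edges, and the example measurements are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def summarize_by_diameter(structures, bin_edges):
    """Group structures into diameter classes.

    structures: list of (diameter_um, volume_um3) tuples (hypothetical input format).
    bin_edges: ascending diameter edges; class i covers [edge_i, edge_i+1).
    Returns {(lo, hi): (count, percent_of_total_volume)}.
    """
    total_volume = sum(volume for _, volume in structures)
    counts = defaultdict(int)
    volumes = defaultdict(float)
    bins = list(zip(bin_edges, bin_edges[1:]))
    for diameter, volume in structures:
        for lo, hi in bins:
            if lo <= diameter < hi:
                counts[(lo, hi)] += 1
                volumes[(lo, hi)] += volume
                break
    return {
        b: (counts[b], 100.0 * volumes[b] / total_volume if total_volume else 0.0)
        for b in bins
    }

# Hypothetical measurements: (diameter in um, volume in um^3).
example = [(0.3, 0.014), (0.5, 0.065), (1.5, 1.77), (3.5, 22.4)]
summary = summarize_by_diameter(example, [0.0, 1.0, 2.0, 4.0])
for (lo, hi), (count, percent) in summary.items():
    print(f"{lo}-{hi} um: n={count}, {percent:.1f}% of total volume")
```

      In this toy input, the small classes hold most of the objects while the largest class holds most of the volume, which is why a count-based panel and a volume-based panel can look different even when they are computed from the same objects.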

      Q. 4) While the actin inhibitors reduce the number of peripheral LAMP1 structures (Figure 5F), they do not affect their number in the central region (Figure 5G). How can the authors conclude that actin inhibitors inhibit the migration of LAMP1 structures?

      We appreciate the comment. As pointed out, the number of large LAMP1-positive structures in the medial region did not change. Therefore, we have avoided the description that ELYSAs migrate from the middle region to the cell periphery and have unified the description of whether large structures in the periphery occur. Please refer to the subsection title (line 188), the following descriptions (lines 189–199), the related description in the Results (lines 200–211), and the title and the legend of Figure 5.

      Q. 5) The authors show that the V1A subunit associates with the surface of LAMP1 structures as punctate structures (Figure 6B). What are these V1A-positive structures? Is V1A recruited to some specific domains of ELYSAs, or are V1A-positive active lysosomes recruited to ELYSAs? Please provide an interpretation of these data. The phrase "The V1-subunit of V-ATPase is targeted to these structures" (line 262) is not appropriate because it is indistinguishable whether only the V1 subunits are recruited or active lysosomes containing the V1 subunit are recruited.

      Thank you for the valuable comment. Indeed, our analysis, including the analysis of Fig. 8 described on line 262, did not clarify whether free V1A-mCherry molecules accessed the ELYSA periphery or whether lysosomes with V1A-mCherry molecules newly merged into the ELYSA. Therefore, we added this interpretation to lines 232–234 of the Results and revised the Discussion as "The number of membrane structures positive for V1A-mCherry increase upon ELYSA disassembly, indicating further acidification of the endosomal/lysosomal compartment" (lines 292–294).

      Q. 6) Why did the authors use LysoSensor as a marker for ELYSA instead of LAMP1 in Figure 8 and 9? Some reasons should be given.

      There is a clear technical reason for this: when LAMP1-EGFP was expressed in a zygote, it largely migrated to the plasma membrane before and after the 2-cell stage, making it difficult to capture the change of ELYSAs. To circumvent this difficulty, we used LysoSensor to visualize ELYSAs instead of LAMP1-EGFP. This explanation was added to lines 258–260.

      Q. 7) In Figure 9A, it is not clear whether the activity of LysoSensor-positive structures is lower at this stage compared to other stages. It may be shown in Figure S7, but the data are not clearly visible. A direct comparison would be ideal.

      A new analysis similar to that shown in Fig. 9 for early 2-cells and 4-cells was performed and added to Figure S7. To support direct comparison, the ranges of axes were set to be similar.

      As a result, the quantified MagicRed signal on the isolated LysoSensor-positive punctate structure in MII oocyte was nearly the same as that in early 2-cells and 4-cells. In early 2-cells, LysoSensor gave a signal at the cellular boundary, where MagicRed staining was not observed, confirming that MagicRed activity is higher in the interior than in the cell periphery in post-fertilization embryos. We have included an additional description in the main text (lines 280–282).

      Q. 8) In the phrase "pregnant mare serum gonadotropin or an anti-inhibin antibody" (line 382), is "or" correct?

      When inducing superovulation, an anti-inhibin antibody (distributed as CARD HyperOva) can be used as a substitute for PMSG (followed by additional stimulation with hCG), and it yields eggs of similar quality to those obtained with PMSG. This was used in most experiments. To improve clarity, a reference (Takeo and Nakagata, PLoS One, 2015) was added to the description of HyperOva (line 417).

      Q. 9) In almost all graphs, please indicate what the X-axis is indicating (not just "number") so that readers can understand what number is being represented without reading the legends.

      We revised the axis titles in all figures.

      Q. 10) Since grayscale images provide better contrast than color images, it is recommended that single-color images be shown in grayscale.

      We replaced all single-color images with grayscale images.

      Reviewer #2 (Recommendations For The Authors):

      Specific comments:

      Q. 11) Figure 1 and S1- Both Rab5 and Rab7 co-localize with LAMP1. However, there seems to be a lot of LAMP1-free Rab5 dots as compared to the Lamp1-free Rab7. As a result, LAMP1 and Rab7 are co-localized more frequently than LAMP1 and Rab5 (video1). Could it be that early endosomes (Rab5+) are yet to be incorporated into ELYSAs? If so, a brief discussion of this phenomenon would be nice.

      Thank you very much for the comment. We agree with the reviewer’s interpretation. In accordance with this suggestion, we clearly stated in the main text: “Although small punctate structures that are RAB5-positive but LAMP1-negative also spread over the cytosol, most giant structures were positive for RAB5 and LAMP1 (Video 1)” (lines 91–93). In the Discussion section, a brief statement was included: “Considering the large number of RAB5-positive and LAMP1-negative punctate structures in MII oocytes, these layers may also reflect the assembly mechanism of the ELYSA” (lines 318–320).

      Q. 12) Video 3 (and Figure 6) clearly shows the dynamics of LAMP1-labelled vesicles during maturation, which is impressive. In contrast to the live cell imaging after LAMP1 mRNA injection, Figure 1 used anti-LAMP1 Ab to detect endogenous levels of LAMP1. It appears that mRNA microinjection causes LAMP1 overexpression, causing more (but smaller) vesicles to form. It should be easy to quantify and compare the vesicles in Figures 1 and 6.

      We appreciate the comment. As mentioned, injections of EGFP-LAMP1 mRNA are useful for the visualization of LAMP1 dynamics during the maturation phase from GV to MII by live cell imaging, which is not feasible with immunostaining. However, the fluorescence emitted by EGFP-LAMP1 is only a few tenths of that of antibody staining, and because of the technical difficulty of microinjection into GV oocytes, the signal-to-noise ratio sufficient for imaging was merely one in ten oocytes. In addition, live cell imaging of oocytes in Figure 6 had to be carried out with very low excitation light exposure to reduce the toxicity. It was also performed with a low magnification lens and a longer step size in the z-axis. For these reasons, in examining the point raised, we performed an additional 3D object analysis, in the same way as in Figure 2, on the data of IVM oocytes injected with EGFP-LAMP1 mRNA using the same lens as in Figure 1 and with a longer exposure time than in live imaging. The results were compared with the MII data of Figures 1 and 2.

      As a result, as shown in the new Figure S8, more objects with a diameter of 0.2–0.4 µm were found than in the immunostaining data, which fits the reviewer’s point. In addition, the counts were lower for the 0.6–1.0 µm diameter, but there was no significant difference in the number of larger LAMP1 positive structures corresponding to the ELYSA size. We consider that this was appropriate for the original purpose of characterizing the ELYSA formation process. A description of these points has been added to lines 221–225.

      Q. 13) In Figure 4A and B- Seems like not all LAMP1-positive structures were LC3-positive. Is there any size or location within the oocyte that determines LC3 positivity?

      We appreciate the valuable comment. To answer this comment, we performed a new 3D object-based co-localization analysis on LAMP1 and LC3, determined the number, volume, and distribution within the oocyte, and incorporated the results as Supplementary Figure 6. To examine the positivity, we further analyzed the percentage of double-positive structures among all the LAMP1-positive structures. The results showed that their average diameter significantly shifted from 2.36 µm (GV) to 3.78 µm (MII). Moreover, it was clearly indicated that LAMP1-positive structures smaller than 2 µm in diameter are rarely positive for LC3. In terms of location, measuring the distance of the double-positive structures from the oocyte center (the cellular geometric center) indicated that they tend to be observed at the periphery in oocytes of both stages (more than 80% at > 30 µm in the MII oocyte). Of note, no clear tendency of double positivity was observed. A description of these points has been added to lines 174–186.
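
      The two measurements named in this response (a diameter for each 3D object, and its distance from the geometric center of the oocyte) can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: it assumes the diameter is the sphere-equivalent diameter derived from object volume, the function names are hypothetical, and the centroid coordinates are made up. Only the 30 µm cutoff comes from the text.

```python
import math

def equivalent_diameter(volume_um3):
    """Diameter of a sphere with the given volume: d = (6V / pi)^(1/3).
    Assumption: object diameter is defined as the sphere-equivalent diameter."""
    return (6.0 * volume_um3 / math.pi) ** (1.0 / 3.0)

def peripheral_fraction(centroids, center, cutoff_um):
    """Fraction of object centroids lying farther than cutoff_um from center."""
    if not centroids:
        return 0.0
    far = sum(1 for c in centroids if math.dist(c, center) > cutoff_um)
    return far / len(centroids)

# Hypothetical centroids (x, y, z) in um, relative to an oocyte centered at the origin.
centroids = [(35.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 32.0, 0.0), (0.0, 0.0, 31.0)]
fraction = peripheral_fraction(centroids, (0.0, 0.0, 0.0), 30.0)
print(f"{100 * fraction:.0f}% of structures lie more than 30 um from the center")
```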

      Q. 14) In discussion, line 256- Small ELYSAs are formed in GV oocytes. Since you haven't checked the smaller-sized, growing oocytes, I suggest rephrasing this sentence as 'are present' rather than 'are formed'.

      We agree with the reviewer’s suggestion and changed it to "present" (line 287).

      Q. 15) Line 188- ELISA should instead be ELYSA

      Thank you for pointing this out. We have found a few more typographical errors, and all of them have been corrected (lines 213 and 321).

      Reviewer #3 (Recommendations For The Authors):

      Q. 16) Line 42: What do you mean by 'zygotic gene expression following the degradation of the cellular components of each maternal and paternal gamete'? ZGA requires this degradation? Please provide supporting references from the literature.

      We apologize for the confusing wording. We meant to say that both ZGA and degradation of parental components are required. To avoid misunderstanding, we have revised “zygotic gene expression as well as the degradation of the cellular components of each maternal and paternal gamete” and inserted a new reference (line 44).

      Q. 17) 50: MII means metaphase II, not meiosis II.

      We corrected the clerical mistake (line 50).

      Q. 18) 51: Define LC3.

      We added the definition of LC3 (lines 51-52).

      Q. 19) 60: 'lysosomal activity in oocytes is upregulated by sperm-derived factors as the oocytes grow and mature'. As written, the sentence implies that oocytes grow and mature after fertilization. This may be true for maturation, but I would be surprised to learn that there is growth of the oocyte after fertilization.

      We appreciate this valuable comment.

      C. elegans lives mainly as a hermaphrodite, whose body contains a pair of U-shaped gonad arms including the ovary, spermatheca and uterus. Oocytes grow in the ovary, mature upon receiving major sperm proteins secreted from sperm, and are ovulated into the spermatheca for fertilization. In 2017, Kenyon’s group reported that major sperm proteins act as sperm-secreted hormones to upregulate lysosomal activity in oocytes during oocyte growth and maturation. To avoid misunderstanding, we have revised the sentence to 'lysosomal activity in oocytes is upregulated by major sperm proteins secreted from sperms as the oocytes grow and mature' (lines 61-66).

      Q. 20) 94 and Figure 1B: While it is clear that many LAMP1 foci at the late 2-cell stage do not also contain RAB5, it seems that the majority of RAB5 loci also stain for LAMP1. This may be a minor point in the context of the paper but could be clarified.

      We could not easily agree with the suggestion because of the possibility that the images might give different impressions on each plane. Therefore, as a way to verify this point, we attempted to quantify the co-localization by reconstructing the 3D puncta information based on the two types of antibody staining data. Unfortunately, as shown in Fig. 1AB, Rab5 had a high cytoplasmic background, and although we were able to extract peaks, we could not reliably recalibrate the three-dimensional punctate structure (please refer to the new Supplementary Fig. 6). Therefore, co-localization on each other's punctate structure (LAMP1/RAB5 vs. RAB5/LAMP1) could not be verified. The validation using specific planes also showed large differences between planes, with overlapping punctate structures counted separately in adjacent planes, making reliable quantification difficult. This is an issue that will be addressed in the future.

      On the other hand, the newly added Z-projection figure (Fig. 1AB) shows that RAB5-positive and LAMP1-negative punctate structures tend to accumulate along the LAMP1-positive punctate structures larger than 1 µm at the late 2-cell stage in all observed embryos; we added this statement on lines 99–101.

      Q. 21) 100-102 and Figure 2A: Does the decrease in the total number of LAMP1 foci refer just to cytoplasmic or also to membrane foci? If the former, what was the reason for not including the membrane in the analysis?

      We appreciate the critical question. The LAMP1 signal on the plasma membrane interfered with the measurement of the signals just below the plasma membrane. The biological cause of this increased signal on the plasma membrane, as shown in Fig. 2E, seemed to be caused by the migration of the LAMP1 signals post-fertilization, which was also reported in a previous paper by Zaffagnini et al. (2024), published in Cell.

      In our analysis, oocytes are giant cells, and confocal imaging has a technical limitation in obtaining the same fluorescent intensity along the z-axis. However, 3D-object analysis requires thresholding based on absolute values. As a result of this situation, the presence of the plasma membrane signal caused punctate structures located close to the membrane to be captured and recognized as a single, very large LAMP1-positive structure, resulting in the loss of the punctate structure that should be measured.

      To avoid this issue, we have used several programs to correct the fluorescence difference along the z-axis; nonetheless, these attempts were unsuccessful. Therefore, as described in the Materials and Methods section, we applied only background subtraction at each z-position and then manually removed the plasma membrane signal (which was thin and continuous at the edges). Furthermore, when the plasma membrane and punctate structure signals overlapped, we paid attention not to remove the signals but to separate them. Thus, we believe that the decrease in the number and volume of LAMP1-positive structures after fertilization is still a phenomenon associated with the shift of LAMP1 to the plasma membrane.
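
      The reason background subtraction at each z-position matters before thresholding on absolute values can be shown with a toy example. This is a sketch only: the response does not say how the background was estimated, so the per-slice median used here is an assumption, and the function names and intensity values are hypothetical.

```python
from statistics import median

def subtract_background_per_slice(stack):
    """stack: list of z-slices, each a list of rows of pixel intensities.
    Assumption: the background of a slice is estimated as its median intensity."""
    corrected = []
    for z_slice in stack:
        background = median(v for row in z_slice for v in row)
        corrected.append([[max(v - background, 0) for v in row] for row in z_slice])
    return corrected

def threshold(stack, cutoff):
    """Apply one absolute cutoff to every slice; returns boolean masks."""
    return [[[v >= cutoff for v in row] for row in z_slice] for z_slice in stack]

def count_true(mask_slice):
    return sum(v for row in mask_slice for v in row)

# Two toy slices with one punctum each but different background levels.
shallow = [[10, 10, 10], [10, 60, 10], [10, 10, 10]]
deep = [[45, 45, 45], [45, 95, 45], [45, 45, 45]]
stack = [shallow, deep]

raw_mask = threshold(stack, 40)
corrected_mask = threshold(subtract_background_per_slice(stack), 40)
print([count_true(m) for m in raw_mask])        # pixels above cutoff, uncorrected
print([count_true(m) for m in corrected_mask])  # pixels above cutoff, corrected
```

      Without correction, the slice with the higher background passes the cutoff everywhere and its punctum merges into one large object; after per-slice subtraction the same cutoff isolates a single punctum in each slice.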

      Q. 22) Figure 2B, F, G: As the x-axis does not represent a continuous variable, adjacent data points should not be connected by a line. The histogram representations in A, C, and E are much easier to understand. I suggest presenting all data in this format.

      We changed the line graphs to bar graphs. In addition, to make the significant differences among populations clearer, they are now indicated with letters.

      Q. 23) Figure 2B, C: It seems that the values for the different stages are expressed relative to the value at MII. Why not use the GV value at the base-line? This would follow the developmental trajectory of the oocyte/embryo more directly and would not (I believe) change the conclusions.

      We appreciate the comment. We meant to express that ELYSA develops most in the MII phase and that it decreases after fertilization. Following the reviewer’s suggestion, we now express GV-to-MII changes relative to GV and changes after fertilization relative to the MII phase (Fig. 2C, D).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript is dedicated heavily to cell type mapping and identification of sub-type markers in the human testis but does not present enough results from cross-investigation between NOA cases versus control. Their findings are mostly based on transcriptome and the authors do not make enough use of the scATAC-seq data in their analyses as they put forward in the title. Overall, the authors should do more to include the differential profile of NOA cases at the molecular level - specific gene expression, chromatin accessibility, TF binding, pathway, and signaling that are perturbed in NOA patients that may be associated with azoospermia.

      Strengths:

      (1) The establishment of single-cell data (both RNA and ATAC) from the human testicular tissues is noteworthy.

      (2) The manuscript includes extensive mapping of sub-cell populations with some claimed as novel, and reports marker gene expression.

      (3) The authors present inter-cellular cross-talks in human testicular tissues that may be important in adequate sperm cell differentiation.

      Weaknesses:

      (1) A low sample size (2 OA and 3 NOA cases). There are no control samples from healthy individuals.

      Thank you for your comments. We recognize that the small sample size in this study somewhat limits its generalizability. However, in transcriptomic research, limited sample sizes are a common issue due to the complexities involved in acquiring samples, particularly in studies about the reproductive system. Healthy testicular tissue samples are difficult to obtain, and studies (doi: 10.18632/aging.203675) have used obstructive azoospermia as a control group in which spermatogenesis and development are normal.

      (2) Their argument about interactions between germ and Sertoli cells is not based on statistical testing.

      Thank you for your comments. Due to limited funding, we have not yet carried out thorough validation experiments, but we plan to do so at a later stage. We hope that the publication of this study will help us obtain more financial support to further investigate the interactions between germ cells and Sertoli cells.

      (3) Rationale/logic of the study. This study, in its present form, seems to be more about the role of sub-Sertoli population interactions in sperm cell development and does not provide enough insights about NOA.

      Thank you for your comments. In Figure 6, we conducted an in-depth analysis and comparison of the differences between the Sertoli cell subtypes and the germ cell subtypes involved in spermatogenesis in the OA and NOA groups. The results revealed that in the NOA group, especially in the NOA3 group, which has a lower sperm count compared to NOA2 and NOA1, there is a significant loss of Sertoli cell subtypes including SC3, SC4, SC5, SC6, and SC8. The NOA1 group, with a sperm count close to that of the OA group, also had a Sertoli cell profile similar to the OA group. The NOA2 group, with a sperm count between that of NOA1 and NOA3, also exhibited an intermediate profile of Sertoli cell subtypes. Therefore, we suggest that change in Sertoli cell subtypes is a key factor affecting sperm count, rather than just the total number of Sertoli cells. We believe that through these analyses, we can provide in-depth insights into NOA, and we hope that the publication of this study will help obtain more funding support to further validate and expand on these findings.

      (4) The authors do not make full use of the scATAC-seq data.

      Thank you for your comments. We have added further analysis of the scATAC-seq data, which is shown in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Shimin Wang et al. investigated the role of Sertoli cells in mediating spermatogenesis disorders in non-obstructive azoospermia (NOA) through stage-specific communications. The authors utilized scRNA-seq and scATAC-seq to analyze the molecular and epigenetic profiles of germ cells and Sertoli cells at different stages of spermatogenesis.

      Strengths:

      By understanding the gene expression patterns and chromatin accessibility changes in Sertoli cells, the authors sought to uncover key regulatory mechanisms underlying male infertility and identify potential targets for therapeutic interventions. They emphasized that the absence of the SC3 subtype would be a major factor contributing to NOA.

      Weaknesses:

      Although the authors used cutting-edge techniques to support their arguments, it is difficult to find conceptual and scientific advances compared to Zeng S et al.'s paper (Zeng S, Chen L, Liu X, Tang H, Wu H, and Liu C (2023) Single-cell multi-omics analysis reveals dysfunctional Wnt signaling of spermatogonia in non-obstructive azoospermia. Front. Endocrinol. 14:1138386.). Overall, the authors need to improve their manuscript to demonstrate the novelty of their findings in a more logical way.

      Thank you for your detailed review of our work. We greatly appreciate your feedback and have made revisions to our manuscript accordingly.

      Regarding the novelty of our research, we believe our study offers conceptual and scientific advances in several ways:

      We have systematically revealed the stage-specific roles of Sertoli cell subtypes in different stages of spermatogenesis, particularly emphasizing the crucial role of the SC3 subtype in non-obstructive azoospermia (NOA). Additionally, we identified that other Sertoli cell subtypes (SC1, SC2, SC3...SC8, etc.) also collaborate in a stage-specific manner with different subpopulations of spermatogenic cells (SSC0, SSC1/SSC2/Diffed, Pa...SPT3). These findings provide new insights into the understanding of spermatogenesis disorders.

      Compared to the study by Zeng S et al., our research not only focuses on the functional alterations in Sertoli cells but also comprehensively analyzes the interaction patterns between Sertoli cells and spermatogenic cells using scRNA-seq and scATAC-seq technologies. We uncovered several novel regulatory networks that could serve as potential targets for the diagnosis and treatment of NOA.

      We sincerely appreciate your constructive comments and will continue to explore this area further, aiming to make a more significant contribution to the understanding of NOA mechanisms.

      Reviewer #3 (Public Review):

      Summary:

      This study profiled the single-cell transcriptome of human spermatogenesis and provided many potential molecular markers for developing testicular puncture-specific marker kits for NOA patients.

      Strengths:

      Perform single-cell RNA sequencing (scRNA-seq) and single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq) on testicular tissues from two OA patients and three NOA patients.

      Weaknesses:

      Most results are analytical and lack specific experiments to support these analytical results and hypotheses.

      Thank you for your thorough review of our work. We highly value your feedback and have made revisions to our manuscript accordingly. Indeed, we have conducted immunofluorescence (IF) experiments to validate the data obtained from single-cell sequencing and have expanded the sample size to enhance the reliability of our results. To better present these validation experiments, we have reorganized and renamed the sample information, making it easier for you to understand which samples were used in the specific experiments. Following the publication of this paper, we plan to secure additional funding to deepen our research, particularly in the area of experimental validation. We sincerely appreciate your support and insightful suggestions, which have greatly helped guide our future research directions.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should include results from cross-investigation comparing NOA/OA patients versus controls.

      Thank you for your comments. In this study, OA was the control group. Healthy testicular tissue samples are difficult to obtain, and studies (doi: 10.18632/aging.203675) have used OA as a control group in which spermatogenesis and development are normal.

      (2) In Table S1, the authors should also include the metric for scATAC-seq, and do more to show the findings the authors obtained in RNA is replicated with chromatin accessibility.

      Thank you for your comments. We have added Table S2, which includes the metric for scATAC-seq.

      (3) A single sample from each OA and NOA group may not be enough to confirm colocalization. The authors should include results from all available samples and use quantitative measures.

      Thank you for your comments. We apologize that the sample size in this study was fewer than three, so we could not conduct quantitative analysis. We will increase the sample size and conduct the corresponding experiments in subsequent research.

      (4) The Methods section does not include enough description to follow how the analyses were carried out, and is missing information on some of the key procedures such as velocity and cell cycle analyses.

      Thank you for your comments. Methods for the velocity and cell cycle analyses have been added to the revised manuscript. The description is as follows:

      “Velocity analysis

      RNA velocity analysis was conducted using scVelo's (version 0.2.1) generalized dynamical model. The spliced and unspliced mRNA was quantified by Velocity (version 0.17.17).”

      “Cell cycle analysis

      To quantify the cell cycle phases for individual cells, we employed the CellCycleScoring function from the Seurat package. This function computes cell cycle scores using established marker genes for cell cycle phases as described in a previous study by Nestorowa et al. (2016). Cells showing a strong expression of G2/M-phase or S-phase markers were designated as G2/M-phase or S-phase cells, respectively. Cells that did not exhibit significant expression of markers from either category were classified as G1-phase cells.”
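The phase-assignment rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the Seurat implementation: the marker lists are a small illustrative subset, the expression values are made up, and the score is a simplified "marker mean minus global mean" rather than Seurat's comparison against expression-matched control gene sets.

```python
from statistics import mean

# Illustrative subset of markers; real analyses use curated S and G2/M gene lists.
S_MARKERS = ["PCNA", "MCM2"]
G2M_MARKERS = ["TOP2A", "MKI67"]


def module_score(expr, markers):
    """Mean expression of the marker genes minus the mean over all genes.

    Simplification (assumption): the background is the global mean, not
    expression-matched control gene sets.
    """
    background = mean(expr.values())
    present = [expr[g] for g in markers if g in expr]
    return mean(present) - background if present else 0.0


def assign_phase(expr):
    """Return 'S', 'G2M', or 'G1' for one cell's gene -> expression dictionary."""
    s_score = module_score(expr, S_MARKERS)
    g2m_score = module_score(expr, G2M_MARKERS)
    # Cells with no positive score for either marker set are called G1.
    if s_score <= 0 and g2m_score <= 0:
        return "G1"
    return "S" if s_score > g2m_score else "G2M"


# Hypothetical cells with made-up expression values.
cells = {
    "cell_a": {"PCNA": 5.0, "MCM2": 4.0, "TOP2A": 0.5, "MKI67": 0.5, "ACTB": 1.0},
    "cell_b": {"PCNA": 0.5, "MCM2": 0.5, "TOP2A": 6.0, "MKI67": 5.0, "ACTB": 1.0},
    "cell_c": {"PCNA": 1.0, "MCM2": 1.0, "TOP2A": 1.0, "MKI67": 1.0, "ACTB": 1.0},
}
phases = {name: assign_phase(expr) for name, expr in cells.items()}
print(phases)
```

The key design point is the fallback: G1 is not scored directly but assigned when neither the S nor the G2/M marker set stands out.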

      (5) For the purpose of transparency, the authors should upload codes used for analyses so that each figure can be reproduced. All raw and processed data should be made publicly available.

      Thank you for your comments. We have deposited scRNA-seq and scATAC-seq data in NCBI. ScRNA-seq data have been deposited in the NCBI Gene Expression Omnibus with the accession number GSE202647, and scATAC-seq data have been deposited in the NCBI database with the accession number PRJNA1177103.

      Reviewer #2 (Recommendations For The Authors):

      The detailed points the authors need to improve are attached below.

      The results presented in the study have several weaknesses:

      In Figure 1A, HE staining results should be shown for all patients who underwent single-cell analysis.

      Thank you very much for your valuable suggestions. In Figure 1, we present the HE staining results paired with the single-cell data, covering all patients involved in the single-cell analysis.

      - Saying "identification of novel potential molecular markers for distinct cell types" seems unsupported by the data.

      Thank you for your comments. I'm sorry for the inaccuracy of my description. We have revised this sentence. The description is as follows: These findings indicate that the scRNA-seq data from this study can serve for cellular classification.

      - The methods suggest an integrated analysis of scRNA-seq and scATAC-seq, but from the figures, it seems like separate analyses were performed. It's necessary to have data showing the integrated analysis.

      Thank you for your comments. We have added an integrated analysis of scRNA-seq and scATAC-seq. The results are shown in Figure S2.

      Figure 2 does not seem to well cover the diversity of germ cell subtypes. The main content appears to be about the differentiation process, and it seems more focused on SSCs (stem cell types), but the intended message is not clearly conveyed.

      Thank you for your comments. Figure S1 revealed the diversity of germ cell subtypes. The second part of the results described the integrated findings from Figures 2 and S1.

      - In Figure 2B, pseudotime could be shown, and I wonder if the pseudotime in this analysis shows a similar pattern as in Figure 2D.

      Thank you for your comments. Figure 2B shows the pseudotime analysis of the 12 germ cell subpopulations, and Figure 2D shows their RNA velocity. Both methods are used for cell trajectory analysis. The pseudotime in Figure 2B shows a pattern similar to that in Figure 2D.

      - While staining occurs within one tissue, saying they are co-expressed seems inaccurate as the staining locations are clearly distinct. For example, the staining patterns of A2M and DDX4 (a classical marker) are quite different, so it's hard to claim A2M as a new potential marker just because it's expressed. Also, TSSK6 was separately described as having a similar expression pattern to DDX4, but from the IF results, it doesn't seem similar.

      Thank you for your comments. We have revised the Figure.

      - It was described that A2M (expressed in SSC0-1), and ASB9 (expressed in SSC2) have open promoter sites in SSC0, SSC2, and Diffing_SPG, but it doesn't seem like they are only open in the promoters of those cell types. For example, there doesn't seem to be a peak in Diffing for either gene. The promoter region of the tracks is not very clear, so overall figure modification seems necessary.

      Thank you for your comments. We have revised the Figure.

      - The ATAC signal scale for each genomic region should be included, and clear markings for the TSS location and direction of the genes are needed.

      Thank you for your comments. We have revised the figure; the updated version is shown in the revised manuscript.

      Figure 3A mostly shows the SSC2 in the G2/M phase, so it seems questionable to call SSC0/1 quiescent. Also, I wonder if the expression of EOMES and GFRA1 is well distinguished in the SSC subtypes as expected.

      Thank you for your comments. We will validate in subsequent experiments whether the expression of EOMES and GFRA1 is clearly distinguished in the SSC subtypes.

      - In Figure 3C, it would be good to have labels indicating what the x and y axes represent. The figure seems complex, and the description does not seem to fully support it.

      Thank you for your comments. We have added labels indicating what the x and y axes represent in Figure 3C. The x and y axes represent spliced and unspliced mRNA ratios, respectively.

      - While TFs are the central focus, it's disappointing that scATAC-seq was not used.

      Thank you for your comments. TF analysis using scATAC-seq will be carried out in the future.

      Figure 4: It would be good to have a more detailed discussion of the differences between subtypes, such as through GO analysis. The track images need modification like marking the peaks of interest and focusing more on the promoter region, similar to the previous figures.

      Thank you for your comments. GO analysis results were put in Figure S5. The description is as follows:

      As shown in Figure S5, SC1 were mainly involved in cell differentiation, cell adhesion and cell communication; SC2 were involved in cell migration, and cell adhesion; SC3 were involved in spermatogenesis, and meiotic cell cycle; SC4 were involved in meiotic cell cycle, and positive regulation of stem cell proliferation; SC5 were involved in cell cycle, and cell division; SC6 were involved in obsolete oxidation−reduction process, and glutathione derivative biosynthetic process; SC7 were involved in viral transcription and translational initiation; SC8 were involved in spermatogenesis and sperm capacitation.

      In Figure 5, it would be good to have criteria for the novel Sertoli cell subtype presented. CCDC62 is presented as a representative marker for the SC8 cluster, but from Figure 4C, it seems to be quite expressed in the SC3 cluster as well. Therefore, in Figure 5E's protein-level check, it's unclear if this truly represents a novel SC8 subtype.

      Thank you for your comments. CCDC62 expression was higher in the SC8 cluster than in SC3. Since some molecular markers were not commercially available, CCDC62 was selected as the SC8 marker for immunofluorescence verification. The immunofluorescence results support CCDC62 as a novel SC8 marker.

      - It might have been more meaningful to use SOX9 as a control and show that markers in the same subtype are expressed in the same location.

      Thank you for your comments. To determine PRAP1, BST2, and CCDC62 as new markers for the SC subtype, we co-stained them with SOX9 (a well-known SC marker).

      - Figures 4 and 5 could potentially be combined into one figure.

      Thank you for your comments. Since combining Figures 4 and 5 into a single figure would make it unclear, we have kept them as two separate figures.

      In Figure 6, it would be good to support the results with more NOA patient data.

      Thank you for your comments. Patient clinical and laboratory characteristics are presented in Table 1.

      - Rather than claiming the importance of SC3 based on 3 single-cell patient data, it would be better to validate using public data with SC3 signature genes (e.g., showing the correlation between germ cell and SC3 ratios).

      Thank you for your comments. Unfortunately, we could not find public data with SC3 signature genes. In the future, we will verify the importance of SC3 through in vivo and in vitro experiments.

      - 462: It seems to be referring to Figure 6G, not 6D.

      Thank you for your comments. We have revised it. The description is as follows: As shown in Figure 6G, State 1 SC3/4/5 tended to be associated with PreLep, SSC0/1/2, and Diffing and Diffed-SPG sperm cells (R > 0.72).

      In Figure 7, the spermatogenesis process is basically well-known, so it would be better to emphasize what novel content is being conveyed here. Additionally, emphasizing the importance of SC3 in the overall process based on GO results leaves room for a better approach.

      Thank you for your valuable suggestions. Regarding Figure 7, we recognize that the spermatogenesis process is well-known, and we will focus on highlighting the novel content, particularly the role and significance of the SC3 subtype in spermatogenesis disorders. As for the importance of SC3 in the overall process based on GO results, we have validated this in Figure 8 through co-staining experiments between Sertoli cells and spermatogenic cells in OA and NOA groups. The results demonstrate a significant correlation between the number of SC3-positive cells and SPT3 spermatogenic cells, particularly in the NOA5-P8 group, where both SC3 and SPT3 cell counts are notably lower than in the NOA4-P7 group. This further supports the critical role of SC3 in the spermatogenesis process. Your suggestions have prompted us to refine our data presentation and more clearly emphasize the novel aspects of our research. We will continue to strive to ensure that every part of our research contributes meaningfully to the academic community. Thank you again for your guidance.

      In Figure 8, only the contents of the IF-stained proteins are listed, which seems slightly insufficient to constitute a subsection on its own. It might have been better to conclude by emphasizing some subtypes.

      Thank you for your comments. We have combined this part of the results with other results into one section. The description is as follows:

      “Co-localization of subpopulations of Sertoli cells and germ cells

      To determine the interaction between Sertoli cells and spermatogenesis, we applied CellPhoneDB to infer cellular interactions according to a ligand-receptor signalling database. As shown in Figure 6G, compared with other cell types, germ cells mainly interacted with Sertoli cells. We further performed Spearman correlation analysis to determine the relationship between Sertoli cells and germ cells. As shown in Figure 6H, State 1 SC3/4/5 tended to be associated with PreLep, SSC0/1/2, and Diffing and Diffed-SPG sperm cells (R > 0.72). Interestingly, SC3 was significantly positively correlated with all sperm subpopulations (R > 0.5), suggesting that SC3 plays an important role in spermatogenesis and is involved in the entire process. Subsequently, to understand whether the functions of germ cells and Sertoli cells correspond to each other, GO term enrichment analysis of germ cells and Sertoli cells was carried out (Figure S3, S4). We found that the functions could be divided into 8 categories, namely, material energy metabolism, cell cycle activity, the final stage of sperm cell formation, chemical reaction, signal communication, cell adhesion and migration, stem cells and sex differentiation activity, and stress reaction. These different events were labeled with different colors in order to quickly capture the important events occurring in the cells at each stage. As shown in Figure S3, we discovered that SSC0/1/2 was involved in SRP-dependent cotranslational protein targeting to membrane, and cytoplasmic translation; Diffing SPG was involved in cell division and cell cycle; Diffed SPG was involved in cell cycle and RNA splicing; Pre-Leptotene was involved in cell cycle and meiotic cell cycle; Leptotene_Zygotene was involved in cell cycle and meiotic cell cycle; Pachytene was involved in cilium assembly and spermatogenesis; Diplotene was involved in spermatogenesis and cilium assembly; SPT1 was involved in cilium assembly and flagellated sperm motility; SPT2 was involved in spermatid development and flagellated sperm motility; SPT3 was involved in spermatid development and spermatogenesis. As shown in Figure S4, SC1 were mainly involved in cell differentiation, cell adhesion and cell communication; SC2 were involved in cell migration, and cell adhesion; SC3 were involved in spermatogenesis, and meiotic cell cycle; SC4 were involved in meiotic cell cycle, and positive regulation of stem cell proliferation; SC5 were involved in cell cycle, and cell division; SC6 were involved in obsolete oxidation−reduction process, and glutathione derivative biosynthetic process; SC7 were involved in viral transcription and translational initiation; SC8 were involved in spermatogenesis and sperm capacitation. The above analysis indicated that the functions of 8 Sertoli cell subtypes and 12 germ cell subtypes were closely related.

      To further verify that Sertoli cell subtypes have "stage specificity" for each stage of sperm development, we first performed HE staining using testicular tissues from OA3-P6, NOA4-P7 and NOA5-P8 samples. The OA3-P6 group showed some sperm, with reduced spermatogenesis, thickened basement membranes, and a high number of Sertoli cells without spermatogenic cells. The NOA4-P7 group had no sperm initially, but a few malformed sperm were observed after sampling, leading to the removal of affected seminiferous tubules. The NOA5-P8 group showed no sperm in situ (Figure 7A). Immunofluorescence staining in Figure 7B was performed using these tissues for validation. ASB9 (SSC2) was primarily expressed in a wreath-like pattern around the basement membrane of testicular tissue, particularly in the OA group, while ASB9 was barely detectable in the NOA group. SOX2 (SC2) was scattered around SSC2 (ASB9), with nuclear staining, while TF (SC1) expression was not prominent. In NOA patients, SPATS1 (SC3) expression was significantly reduced. C9orf57 (Pa) showed nuclear expression in testicular tissues, primarily extending along the basement membrane toward the spermatogenic center, and was positioned closer to the center than DDX4, suggesting its involvement in germ cell development or differentiation. BEND4, identified as a marker for SC5, showed a developmental trajectory from the basement membrane toward the spermatogenic center. ST3GAL4 was expressed in the nucleus, forming a circular pattern around the basement membrane, similar to A2M (SSC1), though A2M was more concentrated around the outer edge of the basement membrane, creating a more distinct wreath-like arrangement. In cases of impaired spermatogenesis, this arrangement becomes disorganized and loses its original structure. SMCP (SC6) was concentrated in the midpiece region of the bright blue sperm cell tail. In the OA group, SSC1 (A2M) was sparsely arranged in a rosette pattern around the basement membrane, but in the NOA group, it appeared more scattered. SSC2 (ASB9) expression was not prominent. BST2 (SC7) was a transmembrane protein primarily localized on the cell membrane. In the OA group, A2M (SSC1) was distinctly arranged in a wreath-like pattern around the basement membrane, with expression levels significantly higher than ASB9 (SSC2). TSSK6 (SPT3) was primarily expressed in OA3-P6, while CCDC62 (SC8) was more abundantly expressed in NOA4-P7, with ASB9 (SSC2) showing minimal expression. Taken together, germ cells of a particular stage tended to co-localize with Sertoli cells of the corresponding stages. Germ cells and Sertoli cells at each differentiation stage were functionally heterogeneous and stage-specific (Figure 8). This suggests that each stage of sperm development requires the assistance of Sertoli cells to complete the corresponding stage of sperm development.”
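The Spearman correlation used in the quoted passage to relate Sertoli cell and germ cell subpopulations is a Pearson correlation computed on ranks. The sketch below is a minimal pure-Python version. The per-sample proportions are hypothetical values chosen for illustration and are not taken from the study, and the names `sc3_fraction` and `spt3_fraction` are assumptions.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-sample cell-type fractions (five samples), for illustration only.
sc3_fraction = [0.30, 0.28, 0.25, 0.12, 0.02]
spt3_fraction = [0.20, 0.22, 0.15, 0.05, 0.01]
rho = spearman(sc3_fraction, spt3_fraction)
print(round(rho, 3))
```

With only a handful of samples, a coefficient like this is descriptive at best, which is one reason the reviewers ask for validation in additional cohorts.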

      Reviewer #3 (Recommendations For The Authors):

      The authors revealed 11 germ cell subtypes and 8 Sertoli cell subtypes through single-cell analysis of two OA patients and three NOA patients, and found that the Sertoli cell SC3 subtype (marked by SPATS1) plays an important role in spermatogenesis. It also suggests that Notch1/2/3 signaling and integrins are involved in germ cell-Sertoli cell interactions. This is an interesting and useful article that at least gives us a comprehensive understanding of human spermatogenesis. It provides a powerful tool for further research on NOA. However, there are still some issues and questions that need to be addressed.

      (1) How was the testicular tissue collected? Please explain in detail, including which part of the testicular tissue was extracted. A schematic diagram would be helpful.

      Thank you for your comments. The process is as follows: Testicular tissues were obtained separately from two OA patients (OA1-P1 and OA2-P2) and three NOA patients (NOA1-P3, NOA2-P4, NOA3-P5) by microdissection testicular sperm extraction.

      (2) Whether the tissues of these patients are extracted simultaneously or separately, separated into single cells, and stored, and then single cell analysis is performed simultaneously. Please be specific.

      Thank you for your comments. The testicular tissues of these patients were extracted separately, then separated into single cells, and single cell analysis was performed simultaneously.

      (3) When performing single-cell analysis, cells from two OA patients were analyzed individually or combined. The same problem occurred in the cells of three NOA patients.

      Thank you for your comments. Cells from two OA patients and three NOA patients were analyzed individually.

      (4) Can you specifically point out the histological differences between OA and NOA in Figure 1A? This makes it easier for readers to understand the structure change between OA and NOA. Please also label representative supporting cells.

      Thank you for your comments. We have revised the description; the updated text is shown in the revised manuscript.

      (5) The authors demonstrate that "We speculate that this lack of differentiation may be due to the intense morphological changes occurring in the sperm cells during this period, resulting in relatively minor differences in gene expression." Please provide some verification of this hypothesis? For example, use immunofluorescence staining to observe morphological changes in sperm cells.

      Thank you for your comments. Due to limited funds, we will verify this hypothesis in future studies.

      (6) The authors demonstrate that " As shown in Figure 5E, we discovered that PRAP1, BST2, and CCDC62 were co-expressed with SOX9 in testes tissues." The staining in Figure 5D is unclear, and it is difficult to explain that SOX9 is co-expressed with PRAP1 BST2 CCDC62 based on the current staining results. The staining patterns of SOX9 (green) and SOX9 (red) are also different. (SOX9 (red) appears as dots, while the background for SOX9 (green) is too dark to tell whether its staining is also in the form of dots.) In summary, increasing the clarity of the staining makes it more convincing. Alternatively, use high magnification to display these results.

      Thank you for your comments. We have re-stained and updated this part of the immunofluorescence staining results. Please refer to the files named Figure 1, Figure 2, Figure 5, and Figure 8.

      (7) In Figure 8, the author emphasized the co-localization of Sertoli cells and Germ cells at corresponding stages and did a lot of staining, but it was difficult to distinguish the specific locations of co-localization, which was similar to Figure 5E. If possible, please mark specific colocalizations with arrows or use high magnification to display these results, in order to facilitate readers to better understand.

      Thank you for your comments. We have re-stained and updated this part of the data. Please refer to the immunofluorescence staining data in the updated Figure 8.

      (8) The authors emphasize that macrophages may play an important role in spermatogenesis. Therefore, adding relevant macrophage staining to observe the differences in macrophage expression between NOA and OA should better support this idea.

      Thank you for your comments. Macrophage-related experiments will be further explored in the future.

      (9) Notch1/2/3 signaling and integrin were discovered to be involved in germ cell-Sertoli cell interaction. However there are currently no concrete experiments to support this hypothesis. At least simple verification experiments are needed.

      Thank you for your comments. Due to limited funding, studies will be carried out in the future.

      (10) Data availability statements should not be limited to the corresponding author, especially for big data analysis. This is crucial to the credibility of this data (Have the scRNA-seq and scATAC-seq in this study been deposited in GEO or other databases, and when will they be released to the public?) The data for such big data analysis needs to be saved in GEO or other databases in advance so that more research can use it.

      Thank you for your comments. We have deposited scRNA-seq and scATAC-seq data in NCBI. “ScRNA-seq data have been deposited in the NCBI Gene Expression Omnibus with the accession number GSE202647, and scATAC-seq data have been deposited in the NCBI database with the accession number PRJNA1177103.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Strengths:

      The paper is solidly based on the ability of the authors to master molecular simulations of highly complex systems. In my opinion, this paper shows no major weaknesses. The simulations are carried out in a technically sound way. Comparative analyses of different systems provide valuable insights, even within the well-known limitations of MD. Plus, the authors further investigate why xCas9 exhibits improved recognition of the TGG PAM sequence compared to SpCas9 via well-tempered metadynamics simulations focusing on the binding of R1335 to the G3 nucleobase and the DNA backbone in both SpCas9 and xCas9. In this context, the authors provide a free-energy profiling that helps support their final model.

      The implementation of FEP calculations to mimic directed evolution improvement of DNA binding is also interesting, original and well-conducted.

      We thank the reviewer for their positive evaluation of our computational strategy. To further substantiate our findings, we have incorporated additional molecular dynamics and Free Energy Perturbation (FEP) calculations for the system bound to GAT. These results corroborate our previous observations obtained with AAG, reinforcing our conclusions.
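For readers unfamiliar with FEP, the simplest free-energy estimator is the Zwanzig exponential-averaging formula, ΔG = −kT ln⟨exp(−ΔU/kT)⟩₀, where ΔU is the energy difference between the perturbed and reference states sampled in the reference ensemble. The sketch below illustrates that formula only; the response does not state which estimator or software was used, and the function name and energy values are hypothetical.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def zwanzig_delta_g(delta_u, temperature=300.0):
    """Free-energy difference from energy differences sampled in state 0.

    delta_u: iterable of U1 - U0 values in kcal/mol (hypothetical inputs).
    Uses a log-sum-exp shift so large exponents do not overflow.
    """
    kt = KB_KCAL * temperature
    exponents = [-du / kt for du in delta_u]
    shift = max(exponents)
    log_mean = shift + math.log(
        sum(math.exp(e - shift) for e in exponents) / len(exponents)
    )
    return -kt * log_mean


# Made-up energy differences for illustration.
samples = [0.0, 2.0]
dg = zwanzig_delta_g(samples)
print(dg)
```

In practice a mutation is split into many small intermediate steps, and a relative binding free energy is obtained from a thermodynamic cycle as the difference between the free-energy change computed in the bound and unbound states.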

      Overall, my assessment of this paper is that it represents a strong manuscript, competently designed and conducted, and highly valuable from a technical point of view.

      Weaknesses:

      To make their impact even more general, the authors may consider expanding their discussion on entropic binding to other recent cases that have been presented in the literature (e.g., the identification of small molecules for Abeta peptides, or the identification of "fuzzy" mechanisms of binding to protein HMGB1). The point on flexibility helping adaptability and expansion of functional properties is important, and should probably be given more evidence and more direct links with a wider picture.

      We have expanded our discussion on the role of entropy in favoring TGG binding to xCas9. To this end, we performed entropy calculations using the Quasi-Harmonic approximation (details provided in the Materials and Methods section). This analysis reveals that R1335 in xCas9 experiences an entropy increase compared to SpCas9, enhancing its adaptability and interaction with the DNA. This analysis and its explanation are detailed on pages 8-9.

      Additionally, we have enriched the Discussion section by clarifying how DNA binding is entropically favored in xCas9, thereby facilitating the recognition of alternative PAM sequences. A refined explanation is also included in the Conclusions section, where we contextualize xCas9 within a broader evolutionary framework of protein-DNA recognition. This highlights how structural flexibility can enable sequence diversity while maintaining high specificity.

      Recommendations for the authors:

      Overall, this is a very interesting and elegant manuscript with compelling results that shed light on the atomistic determinants of genetic-editing technologies.

      Since the paper proposes new findings that may be helpful for experimentalists, it would be interesting if the authors point out (in their discussion/conclusions) specific amino acids to mutate/target for future tests by the experimental community. This should just appear as an open hypothesis/proposal for new experiments.

      In the Conclusions, we have incorporated a discussion on how modifications in the PAM-binding cleft can enhance the recognition of alternative PAM sequences. As an illustrative example, we reference the recently developed SpRY Cas9 variant, which is capable of recognizing a broader range of PAMs. This variant includes mutations within the PAM-binding cleft that likely increase the flexibility of the interacting residues, as suggested by recent cryo-EM structures (Hibshman et al. Nat. Commun. 2024). The importance of fine-tuning the flexibility of the PAM-interacting cleft for engineering strategies has also been highlighted in the abstract.

      Overall, in light of the reviewer’s comments and in consideration of our findings, we revised the manuscript title to: “Flexibility in PAM Recognition Expands DNA Targeting in xCas9.” This new title better highlights the key findings from our research and contextualizes them within the broader goal of expanding DNA targeting capabilities, a critical priority for developing enhanced CRISPR-Cas systems.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This study by Wu et al. provides valuable computational insights into PROTAC-related protein complexes, focusing on linker roles, protein-protein interaction stability, and lysine residue accessibility. The findings are significant for PROTAC development in cancer treatment, particularly breast and prostate cancers.

      The authors' claims about the role of PROTAC linkers and protein-protein interaction stability are generally supported by their computational data. However, the conclusions regarding lysine accessibility could be strengthened with more in-depth analysis. The use of the term "protein functional dynamics" is not fully justified by the presented work, which focuses primarily on structural dynamics rather than functional aspects.

      Strengths:

      (1) Comprehensive computational analysis of PROTAC-related protein complexes.

      (2) Focus on critical aspects: linker role, protein-protein interaction stability, and lysine accessibility.

      Weaknesses:

      (1) Limited examination of lysine accessibility despite its stated importance.

      (2) Use of RMSD as the primary metric for conformational assessment, which may overlook important local structural changes.

      Reviewer #1 (Recommendations for the authors):

      (1) The authors' claims about the role of PROTAC linkers and protein-protein interaction stability are generally supported by their computational data. However, the conclusions regarding lysine accessibility could be strengthened with more in-depth analysis. Expand the analysis of lysine accessibility, potentially correlating it with other structural features such as linker length.

      We thank the reviewer for the suggestions! We performed a time-dependent correlation analysis to correlate the dihedral angles of the PROTACs with the Lys-Gly distance (Figures 6 and S17). We included a detailed explanation on page 16:

      “To further examine the correlation between PROTAC rotation and the Lys-Gly interaction, we performed a time-dependent correlation analysis. This analysis showed that PROTAC rotation translates motion over time, leading to the Lys-Gly interaction, with a correlation peak around 60-85 ns, marking the time of the interaction (Figure 6 and Figure S17). In addition, the pseudo dihedral angles also showed a high correlation (0.85 in the case of dBET1) with Lys-Gly distance. This indicated that degradation complex undergoes structural rearrangement and drives the Lys-Gly interaction.”
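
A time-dependent (lagged) correlation of this kind can be sketched as follows. This is a minimal illustration, not the authors' analysis script. The function names, the synthetic series, and the 7-frame delay are assumptions, chosen only to show how a correlation peak at a nonzero lag indicates that motion in one coordinate is followed, some time later, by a change in another.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(driver, response, max_lag):
    """Correlate driver[t] with response[t + lag] for lag = 0..max_lag (in frames)."""
    result = {}
    for lag in range(max_lag + 1):
        n = len(driver) - lag
        result[lag] = pearson(driver[:n], response[lag:lag + n])
    return result

# Synthetic data (hypothetical): a "dihedral" signal and a "distance" that
# mirrors it with opposite sign after a fixed delay of DELAY frames.
DELAY = 7
dihedral = [math.sin(0.05 * t) for t in range(400)]
distance = [-dihedral[max(t - DELAY, 0)] for t in range(400)]

correlations = lagged_correlation(dihedral, distance, max_lag=20)
# The lag with the largest absolute correlation estimates the delay.
best_lag = max(correlations, key=lambda lag: abs(correlations[lag]))
print(best_lag, round(correlations[best_lag], 3))
```

In a real trajectory, the lag in frames would be converted to time using the frame-saving interval, and the peak would be broader and weaker than in this noise-free toy case.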

      (2) The use of the term "protein functional dynamics" is not fully justified by the presented work, which focuses primarily on structural dynamics rather than functional aspects. Consider changing "protein functional dynamics" to "protein dynamics" to more accurately reflect the scope of the study.

      We thank the reviewer for suggesting more accurate terminology! We agree that keeping “protein functional dynamics” in the title would require us to focus on how the overall protein dynamics link to function. Function is directly related to PROTAC-induced structural dynamics, as is commonly seen in protein structure-function relationships, but this link is not our main focus. Therefore, we replaced “functional” with “structural” in the title.

      (3) Incorporate more local and specific characterization methods in addition to RMSD for a more comprehensive conformational assessment.

      We thank the reviewer for the suggestion. We performed a time-dependent correlation analysis to understand how the rotation of the PROTACs can translate into the Lys-Gly interaction. In addition, we performed a dihedral entropy analysis for each dihedral angle in the linker of the PROTACs to better examine the flexibility of each PROTAC.

      We included a detailed explanation on page 18: “Our dihedral entropies analysis showed that dBET57 has ~0.3 kcal/mol lower entropies than the other three linkers, suggesting dBET57 is less flexible than other PROTACs (Figure S18).”

      Reviewer #2 (Public review):

      Summary:

      The manuscript reports the computational study of the dynamics of PROTAC-induced degradation complexes. The research investigates how different linkers within PROTACs affect the formation and stability of ternary complexes between the target protein BRD4BD1 and Cereblon E3 ligase, and the degradation machinery. Using computational modeling, docking, and molecular dynamics simulations, the study demonstrates that although all PROTACs form ternary complexes, the linkers significantly influence the dynamics and efficacy of protein degradation. The findings highlight that the flexibility and positioning of Lys residues are crucial for successful ubiquitination. The results also discussed the correlated motions between the PROTAC linker and the complex.

      Strengths:

      The field of PROTAC discovery and design, characterized by its limited research, distinguishes itself from traditional binary ligand-protein interactions by forming a ternary complex involving two proteins. The current understanding of how the structure of PROTAC influences its degradation efficacy remains insufficient. This study investigated the atomic-level dynamics of the degradation complex, offering potentially valuable insights for future research into PROTAC degradability.

      Reviewer #2 (Recommendations for the authors):

      (1) Regarding the modeling of the ternary complex, the BRD4 structure (3MXF) is from humans, whereas the CRBN structure in 4CI3 is derived from Gallus gallus. Is there a specific reason for not using structures from the same species, especially considering that human CRBN structures are available in the Protein Data Bank (e.g., 8OIZ, 4TZ4)?

      We appreciate the reviewer’s insightful comment regarding our use of BRD4 and CRBN crystal structures from two different species. We initially selected 4CI3 for CRBN because of its high resolution and its publication in Nature. Furthermore, the Gallus gallus CRBN structure shares a high degree of sequence and structural similarity with Homo sapiens CRBN, especially in the ligand-binding region. At the time of our study, we were aware of 4TZ4 as a Homo sapiens CRBN structure; however, we did not use it because no publication or detailed experimental information was associated with it. Additionally, PDB 8OIZ was not yet publicly available at the time.

      (2) Based on the crystal structure (PDB ID: 6BNB) discussed in Reference 6, the ternary complex of dBET57 exhibits a conformation distinct from other PROTACs, with CRBN adopting an "open" conformation. Using the same CRBN structure for dBET57 as for other PROTACs might result in inaccurate docking outcomes.

      We thank the reviewer for the comment! As noted by the authors of Reference 6, the open conformation of CRBN observed in the dBET57 ternary complex may result from the high-salt crystallization conditions, which could drive structural rearrangement, and from crystal contacts that may induce this conformation. The authors also mentioned that this open conformation could, in part, reflect CRBN’s intrinsic plasticity. However, they acknowledged that further studies are needed to determine whether this conformational flexibility is a characteristic feature of CRBN that enables it to accommodate a variety of substrates. Despite these observations, we believe that the compatibility of the observed BRD4<sup>BD1</sup> binding conformation with both open and closed CRBN states suggests that these conformational changes are all possible. Therefore, we believe that using the same initial CRBN structure for dBET57 as for the other PROTACs can still reasonably reveal the dynamic nature of the ternary complex and would not significantly affect the accuracy of our docking outcomes.

      (3) Figure 2 displays only a single frame from the simulations, which might not provide a comprehensive representation. Could a contact frequency heatmap of PROTAC with the proteins be included to offer a more detailed view?

      We thank the reviewer for the suggestion! We performed a contact map analysis to obtain the average distances between the PROTACs and BRD4<sup>BD1</sup> over the 400 ns MD simulations (new Figure S4 added).

      We included a detailed explanation on pages 8 and 9: “The residues contact map throughout the 400ns MD simulation also showed different pattern of protein-protein interactions, indicating that the linkers were able to adopt different conformations (Figure S4).”
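
The contact-frequency view requested by the reviewer can be illustrated with a short sketch. It is a hypothetical example rather than the analysis behind Figure S4, which reports average distances. The residue labels, coordinates, one-point-per-residue representation, and the 4.5 Å cutoff are all assumptions made for illustration.

```python
import math

def contact_frequency(frames, residues_a, residues_b, cutoff=4.5):
    """Fraction of frames in which each (a, b) pair lies within `cutoff` (in Å).

    frames: list of dicts mapping a residue label to one (x, y, z) position.
    """
    freq = {}
    for a in residues_a:
        for b in residues_b:
            hits = sum(
                1 for frame in frames
                if math.dist(frame[a], frame[b]) <= cutoff
            )
            freq[(a, b)] = hits / len(frames)
    return freq

# Toy four-frame "trajectory" with made-up labels and coordinates.
frames = [
    {"LIG": (0.0, 0.0, 0.0), "RES_A": (3.0, 0.0, 0.0), "RES_B": (9.0, 0.0, 0.0)},
    {"LIG": (0.0, 0.0, 0.0), "RES_A": (4.0, 0.0, 0.0), "RES_B": (4.0, 0.0, 0.0)},
    {"LIG": (0.0, 0.0, 0.0), "RES_A": (6.0, 0.0, 0.0), "RES_B": (8.0, 0.0, 0.0)},
    {"LIG": (0.0, 0.0, 0.0), "RES_A": (3.5, 0.0, 0.0), "RES_B": (7.0, 0.0, 0.0)},
]
freq = contact_frequency(frames, ["LIG"], ["RES_A", "RES_B"])
print(freq)
```

A frequency map separates a contact that is present in most frames from one that is close on average but rarely formed, which an average-distance map can blur.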

      (4) The conclusions in Figure 3 and S11 are based on a single 400 ns trajectory. The reproducibility of these results is therefore uncertain.

      We thank the reviewer for the suggestion! We ran one additional MD simulation with a different random seed for each PROTAC to ensure the reproducibility of the results. The results are shown in Figure S21, and the details of each MD run are updated in Table 1.

      (5) Figure 4 indicates significant differences between the first and last 100 ns of the simulations. Does this suggest that the simulations have not converged? If so, how can the statistical analysis presented in this paper be considered reliable?

      We thank the reviewer for the question. The simulations were initiated with a 10-15 Å gap between BRD4 and Ub to monitor the movement of the degradation machinery and the Lys-Gly interaction. The significant changes in the pseudo dihedral angles in Figure 4 show that the large-scale movement of the degradation complex can initiate Lys-Gly binding. These changes do not reflect unstable sampling, because the system remains very stable once BRD4 comes close to Ub.

      (6) In Figure 5, the dihedral angle of dBET57_#9MD1 is marked on a peptide bond. Shouldn't this angle have a high energy barrier for rotation?

      We thank the reviewer for catching the error! Indeed, the dihedral angle was mistakenly marked on the peptide bond. We reworked the figure and double-checked our dihedral correlation analysis. The updated selection of correlated dihedral angles and the correlation coefficients are shown in Figure 5.

      (7) Given that crystal structures for dBET 70, 23, and 57 are available, why is there a need to model the complex using protein-protein docking?

      We thank the reviewer for the feedback. Only dBET23 has a crystal structure of the complete ternary complex, containing the PROTAC and both proteins; no complete ternary complex structures are available for dBET1, dBET57 and dBET70. Although dBET70 has a crystal structure, the conformation of its PROTAC is not resolved, and thus we decided to still perform protein-protein docking with dBET70.

      We included the explanation on page 8: “Only dBET23 crystal structure is available with the PROTAC and both proteins, while the experimentally determined ternary complexes of dBET1, dBET57 and dBET70 are not available.”

      (8) On page 9, it is mentioned that "only one of the 12 PDB files had CRBN bound to DDB1 (PDB ID 4TZ4)." However, there are numerous structures of the DDB1-CRBN complex available, including those used for docking like 4CI3, as well as 4CI1, 4CI2, 8OIZ, etc.

      We thank the reviewer for the comment! We acknowledge the existence of several DDB1-CRBN complex crystal structures, such as PDB IDs 4CI1, 4CI2, 4CI3, and the more recent 8OIZ. For our study, we chose to use 4TZ4 to maintain consistency in complex construction and to align with the methodology established in a previously published JBC paper (https://doi.org/10.1016/j.jbc.2022.101653), which successfully utilized the same structure for a similar construct. At the time our study was conducted, the 8OIZ structure had not yet been released. We appreciate the suggestion and will consider incorporating alternative structures in future studies to further investigate our findings.

      (9) Table 2 is first referenced on page 8, while Table 1 is mentioned first on page 10. The numbering of these tables should be reversed to reflect their order of appearance in the text.

      We thank the reviewer for catching the error! We switched the order of Table 1 and Table 2.

      Reviewer #3 (Public review):

      The authors offer an interesting computational study on the dynamics of PROTAC-driven protein degradation. They employed a combination of protein-protein docking, structural alignment, atomistic MD simulations, and post-analysis to model a series of CRBN-dBET-BRD4 ternary complexes, as well as the entire degradation machinery complex. These degraders, with different linker properties, were all capable of forming stable ternary complexes but had been shown experimentally to exhibit different degradation capabilities. While in the initial models of the degradation machinery complex, no surface Lys residue(s) of BRD4 were exposed sufficiently for the crucial ubiquitination step, MD simulations illustrated protein functional dynamics of the entire complex and local side-chain arrangements to bring Lys residue(s) to the catalytic pocket of E2/Ub for reactions. Using these simulations, the authors were able to present a hypothesis as to how linker property affects degradation potency. They were able to roughly correlate the distance of Lys residues to the catalytic pocket of E2/Ub with observed DC50/5h values. This is an interesting and timely study that presents interesting tools that could be used to guide future PROTAC design or optimization.

      Reviewer #3 (Recommendations for the authors):

      (1) My most important comment refers to the MM/PBSA analysis, the results of which are shown in Figure S9: binding affinities of -40 to -50 kcal/mol are unrealistic. This would correspond to a dissociation constant of 10^-37 M. This analysis needs to be removed or corrected.

      We thank the reviewer for the comment! MM/PBSA analysis indeed cannot give realistic binding free energies. It does not include the configurational entropy loss, which should be a large positive value. In addition, while the implicit PBSA solvent model computes the solvation free energy, the absolute values may not be very accurate. However, because this is a commonly used energy calculation, and some readers may like to see quantitative values to ensure that the systems have stable intermolecular attractions, we kept the analysis in the SI. We edited the figure legend, moved Figure S10 to SI page 19, and added a sentence to clearly state that the calculations did not include the configurational entropy loss: “Note that the energy calculations focus on non-bonded intermolecular interactions and solvation free energy calculations using MM/PBSA, where the configuration entropy loss during protein binding was not explicitly included.”
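
The reviewer's arithmetic can be checked with the standard relation ΔG = RT ln K<sub>d</sub> (1 M standard state). The sketch below is only a worked illustration. The temperature of 298.15 K and the function name are assumptions, and the energies are treated as if they were true binding free energies, which the response above explains they are not.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def kd_from_dg(dg_kcal_per_mol, temperature_k=298.15):
    """Dissociation constant (M) implied by a binding free energy in kcal/mol."""
    return math.exp(dg_kcal_per_mol / (R_KCAL * temperature_k))

for dg in (-40.0, -50.0):
    kd = kd_from_dg(dg)
    print(f"dG = {dg:.0f} kcal/mol -> log10(Kd / M) = {math.log10(kd):.1f}")
```

Under these assumptions, -50 kcal/mol corresponds to a K<sub>d</sub> of roughly 10<sup>-37</sup> M, in line with the reviewer's estimate, and -40 kcal/mol to roughly 10<sup>-29</sup> M. Both are far outside any experimentally observed range, which is why the values should be read as interaction energies rather than affinities.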

      (2) I think that the analysis of what in the different dBETx makes them cause different degradation potency is underdeveloped. The dihedral angle analysis (Figure 4B) did not explain the observed behavior in my opinion. Please add additional, clearer analysis as to what structural differences in the dBETx make them sample very different conformations.

      We thank the reviewer for the suggestions! Following this suggestion, we performed a dihedral entropy analysis for each dihedral angle in the linker part of the PROTAC to examine the flexibility of each PROTAC. Because each PROTAC has a different linker, we now clearly label them in a new Figure S18 on SI page 27. Low dihedral entropies indicate a more rigid structure; this reduced flexibility makes it harder for a PROTAC to rearrange and facilitate the protein structural dynamics necessary for ubiquitination.

      We added a detailed explanation on page 18: “Our dihedral entropy analysis showed that dBET57 has ~0.3 kcal/mol lower configuration entropies than the other dBETs with three different linkers, suggesting that dBET57 is less flexible than the other PROTACs (Figure S18).”
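
One common way to estimate a per-dihedral entropy is from the histogram of sampled angles, S = -R Σ p<sub>i</sub> ln p<sub>i</sub>, reported as T·S in kcal/mol. The sketch below illustrates that idea only. The bin count, temperature, function name, and synthetic angle distributions are assumptions, and the manuscript's own protocol may differ.

```python
import math
import random

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def dihedral_ts(angles_deg, n_bins=36, temperature_k=298.15):
    """Histogram estimate of T*S (kcal/mol) for one dihedral angle."""
    counts = [0] * n_bins
    width = 360.0 / n_bins
    for angle in angles_deg:
        # Map the angle onto [0, 360) and guard against a rounding edge case.
        idx = min(int(((angle + 180.0) % 360.0) // width), n_bins - 1)
        counts[idx] += 1
    total = len(angles_deg)
    shannon = -sum((c / total) * math.log(c / total) for c in counts if c)
    return R_KCAL * temperature_k * shannon

# Synthetic samples (hypothetical): one narrow rotamer versus free rotation.
rng = random.Random(0)
rigid = [rng.gauss(60.0, 10.0) for _ in range(5000)]
flexible = [rng.uniform(-180.0, 180.0) for _ in range(5000)]
print(f"rigid: {dihedral_ts(rigid):.2f}  flexible: {dihedral_ts(flexible):.2f}")
```

Because the estimate depends on the bin width, only differences between dihedrals computed with the same binning are meaningful, which matches how the linkers are compared in the response.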

      (3) "The movement of the degradation machinery correlated with rotations of specific dihedrals of the linker region in dBETs (Figure 5).": this is not sufficiently clear from the figure. Definitely not in a quantitative way.

      We thank the reviewer for the suggestions! To further understand the correlation between the PROTAC dihedral angles and the movement of the degradation machinery, we performed a time-dependent correlation analysis to correlate the dihedral angles of the PROTACs with the Lys-Gly distance (Figures 6 and S17).

      We included a detailed explanation on page 16:

      “To further examine the correlation between PROTAC rotation and the Lys-Gly interaction, we performed a time-dependent correlation analysis. This analysis showed that PROTAC rotation translates motion over time, leading to the Lys-Gly interaction, with a correlation peak around 60-85 ns, marking the time of the interaction (Figure 6 and Figure S17). In addition, the pseudo dihedral angles also showed a high correlation (0.85 in the case of dBET1) with Lys-Gly distance. This indicated that degradation complex undergoes structural rearrangement and drives the Lys-Gly interaction.”

      (4) Cartoons are needed at multiple stages throughout the paper to enhance the clarity of what the modeled complexes looked like (e.g. which subunits they contained).

      We thank the reviewer for the suggestions. We added and remade several figures with cartoons to better represent the stages. We also used a higher resolution and included clearer labels for each protein system.

      (5) The difference between CRL4A E3 ligase and CRBN E3 ligase is not clear to the non-expert reader.

      We thank the reviewer for the comment! The terms "CRL4A E3 ligase" and "CRBN E3 ligase" refer to different levels of description of the protein complexes. To clarify them, we added a couple of sentences to the Figure 1 legend so that non-expert readers can clearly distinguish the two.

      As illustrated in Figure 1,

      • CRL4A E3 ligase refers to the full E3 ligase complex, which includes all protein components such as CRBN, DDB1, CUL4A, and RBX1.

      • CRBN E3 ligase, on the other hand, is a more colloquial term typically used to describe just the CRBN protein, often in isolation from the full CRL4A complex.

      (6) Figure 1, legend: unclear why it's E3 in A and E2 in B.

      We thank the reviewer for the question! The E3 ligase in Figure 1A refers to the CRBN E3 ligase, which researchers often simply call CRBN. We have added a sentence specifying that the CRBN E3 ligase is also termed CRBN for simplicity. In Figure 1B, the meaning of E2 was unclear. Its full name is E2 ubiquitin-conjugating enzyme; because this name is long, researchers also call it the E2 enzyme. We have corrected the legend and now use “E2 enzyme” to make it clearer.

      (7) "Although the protein-protein binding affinities were similar, other degraders such as dBET1 and dBET57 had a DC50/5h of about 500 nM". It's unclear what experimental data supports the assertion that the protein-protein binding affinities are similar.

      We thank the reviewer for the question. Indeed, the statement was unclear.

      We corrected the sentence on page 6: “Although utilizing the exact same warheads, other degraders such as dBET1 and dBET57 had a DC<sub>50/5h</sub> of about 500 nM.”

      (8) Was the construction of the degradation machinery complex guided by experimental data (maybe cryo-EM or tomography)? If not, what is the accuracy of the starting complex for MD? This may impact the reliability of the obtained results.

      Thank you for your insightful comments! Yes, the construction of the degradation machinery complex was guided by available high-resolution crystal structures, which were selected to maintain consistency and to align with the methodology established in a previously published JBC paper (https://doi.org/10.1016/j.jbc.2022.101653).

      We acknowledge that static crystal structures represent only a single snapshot of the system and may not capture the full conformational flexibility of the complex. To address this limitation, we performed MD simulations using multiple starting structures. This approach allowed us to explore a broader conformational landscape and reduced the dependence on any single starting configuration, thereby enhancing the reliability of the results.

      We hope this clarifies the robustness of our methodology and the steps taken to ensure accuracy in our simulations.

      (9) "With quantitative data, we revealed the mechanism underlying dBETx-induced degradation machinery": I think this may be too strong of an assertion. The authors may have developed a mechanistic hypothesis that can be tested experimentally in the future.

      We thank the reviewer for the suggestion. This is indeed a strong assertion and needed to be modified. We edited the sentence on page 7: “With quantitative data, we revealed the importance of the structural dynamics of dBETx-induced motions, which arrange positions of surface lysine residues of BRD4<sup>BD1</sup> and the entire degradation machinery.”

      (10) Figure S2: are the RMSDs calculated over all residues? Or just the BRD4 residues? Given that the structures are aligned with respect to CRBN, the reported RMSD numbers might be artificially low since there are many more CRBN residues than there are BRD4 residues. Also, why weren't the crystal structures used for dBET 23 and 70 for the modeling? Wouldn't you want to use the most accurate possible structures? Simulations were run for 23. Why not for 70?

      We thank the reviewer for the suggestion. We added a sentence to more clearly explain the RMSD calculations in Figure S2: “The structural superposition is performed based on the backbone of CRBN and RMSD calculation is conducted based on the backbone of BRD4<sup>BD1</sup>.”
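
The reviewer's concern about dilution can be made concrete with a toy calculation. The sketch assumes that two models have already been superposed on one subset of atoms (standing in for the CRBN backbone) and then compares the RMSD over a second subset (standing in for the BRD4<sup>BD1</sup> backbone) with the RMSD over all atoms. The coordinates, atom counts, and function name are hypothetical.

```python
import math

def rmsd(coords_a, coords_b, indices):
    """RMSD (same units as the input) over the selected atom indices.

    Assumes the two coordinate sets are already superposed.
    """
    indices = list(indices)
    squared = sum(
        (p - q) ** 2
        for i in indices
        for p, q in zip(coords_a[i], coords_b[i])
    )
    return math.sqrt(squared / len(indices))

# Toy system: atoms 0-7 play the role of the alignment subset and coincide
# exactly; atoms 8-9 play the role of the second protein, shifted by 3 Å.
reference = [(float(i), 0.0, 0.0) for i in range(10)]
model = [
    reference[i] if i < 8 else (reference[i][0] + 3.0, 0.0, 0.0)
    for i in range(10)
]

rmsd_second = rmsd(reference, model, range(8, 10))
rmsd_all = rmsd(reference, model, range(10))
print(f"second subset: {rmsd_second:.2f} Å, all atoms: {rmsd_all:.2f} Å")
```

Averaging over all atoms shrinks the apparent deviation because the aligned subset contributes zeros, so reporting the RMSD on the BRD4<sup>BD1</sup> backbone alone, as the added sentence specifies, avoids artificially low values.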

      Although dBET70 has a crystal structure, the structure of its PROTAC is not resolved, and thus we decided to still perform protein-protein docking with dBET70. dBET1 and dBET57 do not have crystal structures of their ternary complexes.

      We included the explanation on page 8: “Only dBET23 crystal structure is available with the PROTACs and both proteins, while the experimentally determined ternary complexes of dBET1, PROTACs of dBET57 and dBET70 are not available.”

      a. And there are no crystal structures available for 1 and 57? If so, please clearly say that. Otherwise please report the RMSD.

      We thank the reviewer for the suggestion. We included the explanation on page 8: “Only dBET23 crystal structure is available with the PROTACs and both proteins, while the experimentally determined ternary complexes of dBET1, PROTACs of dBET57 and dBET70 are not available.”

      (11) Table 2 is referenced before Table 1.

      We thank the reviewer for catching the error! We switched the order for Table 1 and Table 2.

      (12) Figure S3 is not referenced in the main paper.

      We thank the reviewer for catching the error! Figure S3 is now referenced on page 7.

      (13) Minor comments on grammar and sentence structure:

      a. It should be "binding of a ternary complex"

      b. "Our shows the importance": word missing.

      c. "...providing insights into potential orientations for ubiquitination. observe whether the preferred conformations are pre-organized for ubiquitination." Word or words missing.

      We thank the reviewer for catching the errors! We corrected grammatical errors and unclear sentences throughout the paper and revised the text to make it easier for non-expert readers to understand.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work describes a convincingly validated non-invasive tool for in vivo metabolic phenotyping of aggressive brain tumors in mice brains. The analysis provides a valuable technique that tackles the unmet need for patient stratification and hence for early assessment of therapeutic efficacy. However, wider clinical applicability of the findings can be attained by expanding the work to include more diverse tumor models.

      We thank the Editors for their comments. This concern was also raised by Reviewer 1 in the Public Review, where we address it in more detail (please refer to comment PR-R1.C1). In brief, we agree that a more clinically relevant model should provide results that are more translatable to patients, and we now acknowledge this more clearly in the revised manuscript: page 18 (lines 14-17), “While patient-derived xenografts and de novo models would be more suited to recapitulate human GBM heterogeneity and infiltration features, and genetic manipulation of glycolysis and mitochondrial oxidation pathways potentially relevant to ascertain DGE-DMI sensitivity for their quantification, (…)”. However, we also believe that the potential of DGE-DMI for application to different glioblastoma models or patients is demonstrated clearly enough with the two immunocompetent models we chose, extensively reported in the literature as reliable models of glioblastoma.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work introduces a new imaging tool for profiling tumor microenvironments through glucose conversion kinetics. Using GL261 and CT2A intracranial mouse models, the authors demonstrated that tumor lactate turnover mimicked the glioblastoma phenotype, and differences in peritumoral glutamate-glutamine recycling correlated with tumor invasion capacity, aligning with histopathological characterization. This paper presents a novel method to image and quantify glucose metabolites, reducing background noise and improving the predictability of multiple tumor features. It is, therefore, a valuable tool for studying glioblastoma in mouse models and enhances the understanding of the metabolic heterogeneity of glioblastoma.

      Strengths:

      By combining novel spectroscopic imaging modalities and recent advances in noise attenuation, Simões et al. improve upon their previously published Dynamic Glucose-Enhanced deuterium metabolic imaging (DGE-DMI) method to resolve spatiotemporal glucose flux rates in two commonly used syngeneic GBM mouse models, CT2A and GL261. This method can be standardized and further enhanced by using tensor PCA for spectral denoising, which improves kinetic modeling performance. It enables the glioblastoma mouse model to be assessed and quantified with higher accuracy using imaging methods.

      The study also demonstrated the potential of DGE-DMI by providing spectroscopic imaging of glucose metabolic fluxes in both the tumor and tumor border regions. By comparing these results with histopathological characterization, the authors showed that DGE-DMI could be a powerful tool for analyzing multiple aspects of mouse glioblastoma, such as cell density and proliferation, peritumoral infiltration, and distant migration.

      Weaknesses:

      (1) Although the paper provides clear evidence that DGE-DMI is a potentially powerful tool for the mouse glioblastoma model, it fails to use this new method to discover novel features of tumors. The data presented mainly confirm tumor features that have been previously reported. While this demonstrates that DGE-DMI is a reliable imaging tool in such circumstances, it also diminishes the novelty of the study.

      PR-R1.C1 – We thank the Reviewer for the detailed analysis and reply below to each point. PR-R1.C1.1 - novelty: We thank the Reviewer for the comments and understand their perspective. While we acknowledge that our paper is more methodologically oriented, we also believe that significant methodological advances are critical for new discoveries. This was our main motivation and is demonstrated in the present work, showing the ability to map in vivo metabolic fluxes in mouse glioma, a “hot topic” and very desirable in the cancer field. 

      PR-R1.C1.2 – additional tumor features: To strengthen the biological relevance of this methodologic novelty, we have now included immune cell infiltration among the tumor features assessed, besides perfusion, histopathology, cellularity and cell proliferation. For this, we performed iba-1 immunostaining for microglia/macrophages, now included in Fig. 2-B. These new results demonstrate significantly higher microglia/macrophage infiltration in CT2A tumors compared to GL261, particularly at the tumor border. This is very consistent with the respective tumor phenotypes, namely differences in cell density and cellularity between the 2 cohorts and across pooled cohorts, as we now report: page 9 (lines 10-18), “Such phenotype differences were reflected in the regional infiltration of microglia/macrophages: significantly higher at the CT2A peritumoral rim (PT-Rim) compared to GL261, and slightly higher in the tumor region as well (Fig 2B). Further quantitative regional analysis of Tumor-to-PT-Rim ROI ratios revealed: (i) 47% lower cell density (p=0.004) and 32% higher cell proliferation (p=0.026) in GL261 compared to CT2A (Fig 2C, Table S3); and (ii) strong negative correlations in pooled cohorts between microglia/macrophage infiltration and cellularity (R=-0.91, p=<0.001) or cell density (R=-0.77, p=0.016), suggesting more circumscribed tumor growth with higher peripheral/peritumoral infiltration of immune cells.”; and page 16 (lines 13-19), “GL261 tumors were examined earlier after induction than CT2A (17±0 vs. 30±5 days, p = 0.032), displaying similar volumes (57±6 vs. 60±14, p = 0.813) but increased vascular permeability (8.5±1.1 vs 4.3±0.5 10<sup>3</sup>/min: +98%, p=0.001), more disrupted stromal-vascular phenotypes and infiltrative growth (5/5 vs 0/5), consistent with significantly lower tumor cell density (4.9±0.2 vs. 8.2±0.3 10<sup>-3</sup> cells/µm<sup>2</sup>: -40%, p<0.001) and lower peritumoral rim infiltration of microglia/macrophages (2.1±0.7 vs. 10.0±2.3 %: -77%, p=0.008)”.

      PR-R1.C1.3 – new tumor features and DGE-DMI: Importantly, such regional differences in cellularity/cell density and immune cell infiltration between the two cohorts were remarkably mirrored by the lactate turnover maps (Fig 3-C), as we now report in the manuscript: page 12 (lines 6-15), “GL261 tumors accumulated significantly less lactate in the core (1.60±0.25 vs 2.91±0.33 mM: -45%, p=0.013) and peritumor margin regions (0.94±0.09 vs 1.46±0.17 mM: -36%, p=0.025) than CT2A – Fig 3 A-B, Table S1. Consistently, tumor lactate accumulation correlated with tumor cellularity in pooled cohorts (R=0.74, p=0.014). Then, lower tumor lactate levels were associated with higher lactate elimination rate, k<sub>lac</sub> (0.11±0.1 vs 0.06±0.01 mM/min: +94%, p=0.006) – Fig 3B – which in turn correlated inversely with peritumoral rim infiltration of microglia/macrophages in pooled cohorts (R=-0.73, p=0.027) – Fig 3-C. Further analysis of Tumor/P-Margin metabolic ratios (Table S3) revealed: (i) +38% glucose (p=0.002) and -17% lactate (p=0.038) concentrations, and +55% higher lactate consumption rate (p=0.040) in the GL261 cohort; and (ii) lactate ratios across those regions reflected the respective cell density ratios in pooled cohorts (R=0.77, p=0.010) – Fig 3-C”. This is a novel, relevant feature compared to our previous work, as highlighted in our discussion: page 17 (lines 1-8), “Tumor vs peritumor border analyses further suggest that lactate metabolism reflects regional histologic differences: lactate accumulation mirrors cell density gradients between and across the two cohorts; whereas lactate consumption/elimination rate coarsely reflects cohort differences in cell proliferation, and inversely correlates with peritumoral infiltration by microglia/macrophages across both cohorts. This is consistent with GL261’s lower cell density and cohesiveness, more disrupted stromal-vascular phenotypes, and infiltrative growth pattern at the peritumor margin area, where less immune cell infiltration is detected and relatively lower cell division is expected [43]”.

      We trust that these new features recovered from DGE-DMI (Fig 2-B and Fig 3-C) show its potential for new discoveries in glioblastoma.

      (2) When using DGE-DMI to quantitatively map glycolysis and mitochondrial oxidation fluxes, there is no comparison with other methods to directly identify the changes. This makes it difficult to assess how sensitive DGE-DMI is in detecting differences in glycolysis and mitochondrial oxidation fluxes, which undermines the claim of its potential for in vivo GBM phenotyping.

      PR-R1.C2: We thank the reviewer for raising this important point. The validity of the method for mapping specific metabolic kinetics in mouse glioma was reported in our previous work, using the same animal models, as specified in the introduction (page 4, lines 10-13): “we recently (…) propose[d] Dynamic Glucose-Enhanced (DGE) 2H-MRS [31], demonstrating its ability to quantify glucose fluxes through glycolysis and mitochondrial oxidation pathways in vivo in mouse GBM (…)”. Therefore, this was not reproduced in the present work. 

      In brief, our DGE-DMI results are highly consistent with our previous study, where DGE single-voxel deuterium spectroscopy was performed in the same tumor models with higher temporal resolution and SNR (as stated on page 16, lines 9-10: glycolytic lactate synthesis rate, 0.59±0.04 vs. 0.55±0.07 mM/min; glucose-derived glutamate-glutamine synthesis rate, 0.28±0.06 vs. 0.40±0.08 mM/min). Those earlier results in turn matched well the values reported by others for glucose consumption rate through:

      (i) glycolysis, in different tumor models including mouse lymphoma in vivo (0.99 mM/min, by DGE-DMI; Kreis et al. 2020), rat breast carcinoma in situ (1.43 mM/min, using a biochemical assay; Kallinowski et al. 1988), and even perfused GBM cells (1.35 fmol min<sup>−1</sup> cell<sup>−1</sup>, according to hyperpolarized 13C-MRS; Jeong et al. 2017), the latter very similar to our previous in vivo measurements in GL261 tumors: 0.50 ± 0.07 mM min<sup>−1</sup> = 1.25 ± 0.16 fmol min<sup>−1</sup> cell<sup>−1</sup> (Simoes et al. 2022);

      (ii) mitochondrial oxidation, very similar to previous in vivo measurements in mouse GBM xenografts (0.33 mM min<sup>−1</sup>, using 13C spectroscopy; Lai et al. 2018), and particularly to our in situ measurements in cell culture (GL261, 0.69 ± 0.09 fmol min<sup>−1</sup> cell<sup>−1</sup>; CT2A, 0.44 ± 0.08 fmol min<sup>−1</sup> cell<sup>−1</sup>), which are remarkably similar to the in vivo measurements in the respective tumors (GL261, 0.32 ± 0.10 mM min<sup>−1</sup> = 0.77 ± 0.23 fmol min<sup>−1</sup> cell<sup>−1</sup>; CT2A, 0.51 ± 0.11 mM min<sup>−1</sup> = 0.60 ± 0.12 fmol min<sup>−1</sup> cell<sup>−1</sup>) (Simoes et al. 2022).
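
      The paired values quoted above (mM min<sup>−1</sup> alongside fmol min<sup>−1</sup> cell<sup>−1</sup>) can be related with a short sketch. It assumes that the per-cell rate is simply the bulk rate divided by a volumetric cell density; this response does not spell out the conversion used in Simoes et al. 2022, so the function names and the back-calculated densities below are illustrative only.

```python
# Illustrative unit conversion between bulk and per-cell flux rates.
# Assumption (not stated in the response): per-cell rate = bulk rate / cell density.

FMOL_PER_MMOL = 1e12  # 1 mmol = 1e12 fmol, so 1 mM = 1e12 fmol per litre


def implied_cell_density(rate_mM_per_min, rate_fmol_per_min_per_cell):
    """Cells per litre implied by a (bulk, per-cell) pair of rates."""
    return rate_mM_per_min * FMOL_PER_MMOL / rate_fmol_per_min_per_cell


def per_cell_rate(rate_mM_per_min, cells_per_litre):
    """Convert a bulk rate (mM/min) to fmol/min/cell for a given cell density."""
    return rate_mM_per_min * FMOL_PER_MMOL / cells_per_litre


# Mean values quoted in points (i) and (ii); uncertainties are ignored here.
quoted_pairs = {
    "GL261 glycolysis": (0.50, 1.25),
    "GL261 oxidation": (0.32, 0.77),
    "CT2A oxidation": (0.51, 0.60),
}

for label, (bulk, per_cell) in quoted_pairs.items():
    density = implied_cell_density(bulk, per_cell)
    print(f"{label}: implied density {density:.2e} cells/L")
```

      Under this assumption the CT2A pair implies a higher cell density than the GL261 pairs, in the same direction as the histologic cell densities reported in PR-R1.C1.2.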

      (3) The study only used intracranial injections of two mouse glioblastoma cell lines, which limits the application of DGE-DMI in detecting and characterizing de novo glioblastomas. A de novo mouse model can show tumor growth progression and is more heterogeneous than a cell line injection model. Demonstrating that DGE-DMI performs well in a more clinically relevant model would better support its claimed potential usage in patients.

      PR-R1.C3: We agree that a more clinically relevant model, such as the one suggested by the Reviewer, would in principle be better suited to provide more translatable results to patients. We however believe that the potential of DGE-DMI for application to different glioblastoma models or patients, with GBM or any other types of brain tumors for that matter, is demonstrated clearly enough with the two syngeneic models we chose, given their robustness and general acceptance in the literature as reliable immunocompetent models of GBM, and for their different histologic and metabolic properties. This way we could fully focus on the novel metabolic imaging method, as compared to our previous single-voxel approach. While both tumor cohorts (GL261 and CT2A) were studied at more advanced stages of tumor progression, the metabolic differences depicted are consistent with the histopathologic features reported, as discussed in the manuscript; namely, the lower glucose oxidation rates. We have now modified the manuscript to highlight this point: page 18 (lines 12-14), “While patient-derived xenografts and de novo models would be more suited to recapitulate human GBM heterogeneity and infiltration features, and genetic manipulation of glycolysis and mitochondrial oxidation pathways could be relevant to ascertain DGE-DMI sensitivity for their quantification, (…)”.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors attempt to noninvasively image metabolic aspects of the tumor microenvironment in vivo, in 2 mouse models of glioblastoma. The tumor lesion and its surrounding appearance are extensively characterized using histology to validate/support any observations made with the metabolic imaging approach. The metabolic imaging method builds on a previously used approach by the authors and others to measure the kinetics of deuterated glucose metabolism using dynamic 2H magnetic resonance spectroscopic imaging (MRSI), supported by de-noising methods.

      Strengths:

      Extensive histological evaluation and characterization.

      Measurement of the time course of isotope labeling to estimate absolute flux rates of glucose metabolism.

      Weaknesses:

      (1) The de-noising method appears essential to achieve the high spatial resolution of the in vivo imaging to be compatible with the dimensions of the tumor microenvironment, here defined as the immediately adjacent rim of the mouse brain tumors. There are a few challenges with this approach. Often denoising methods applied to MR spectroscopy data have merely a cosmetic effect but the actual quantification of the peaks in the spectra is not more accurate than when applied directly to original non-denoised data. It is not clear if this concern is applicable to the denoising technique applied here. However, even if this is not an issue, no denoising method can truly increase the original spatial resolution at which data were acquired. A quick calculation estimates that the spatial resolution of the 2H MRSI used here is 30-40 times too low to capture the much smaller tumor rim volume, and therefore there is concern that normal brain tissue and tumor tissue will be the dominant metabolic signal in so-called tumor rim voxels. This means that the conclusions on metabolic features of the (much larger) tumor are much more robust than the observations attributed to the (much smaller) tumor microenvironment/tumor rim.

      PR-R2.C1: We thank the Reviewer for the constructive comments regarding resolution and tumor rim, and denoising. These issues were raised more extensively in the section Recommendations For The Authors, where they are addressed in detail (RA-R2.C2). In summary, we agree with the Reviewer that no denoising method can increase the nominal resolution; nor was that our purpose. Thus, we clarify the relevance of spectral matrix interpolation in MRSI, and how our display resolution should in principle provide a better approximation to the ground truth than the nominal resolution, which is relevant for ROI analysis in the tumor margin. While we further show relevant correlations between metabolic maps and histologic features in tumor core and margin, we agree with the Reviewer that our observations in the tumor core are more robust than those in the margin, and acknowledge this in the Discussion: page 19, lines 6-10: “Therefore, further DGE-DMI preclinical studies aimed at detecting and quantifying relatively weak signals, such as tumor glutamate-glutamine, and/or increase the nominal spatial resolution to better correlate those metabolic results with histology findings (e.g. in the tumor margin), should improve basal SNR with higher magnetic field strengths, more sensitive RF coils, and advanced DMI pulse sequences [55].”

      (2) To achieve their goal of high-level metabolic characterization the authors set out to measure the deuterium labeling kinetics following an intravenous bolus of deuterated glucose, instead of the easier measurement of steady-state after the labeling has leveled off. These dynamic data are then used as input for a mathematical model of glucose metabolism to derive fluxes in absolute units. While this is conceptually a well-accepted approach there are concerns about the validity of the included assumptions in the metabolic model, and some of the model's equations and/or defining of fluxes, that seem different than those used by others.

      PR-R2.C2: These concerns about the metabolic model were also raised in more detail in the section Recommendations For The Authors, where they are addressed more extensively – please refer to RA-R2.C3 (glucose infusion protocol) and RA-R2.C4 (equations). In brief, we explain that the total volume injected (100 uL per 25 g animal) is standard for i.v. administration in mice, and clarify this better in the manuscript (page 24, line 23). We also explain the differences between our kinetic model and the original one reported by Kreis et al. (Radiology 2020), who quantified glycolysis kinetics in a subcutaneous mouse model of lymphoma; that model is exclusively glycolytic, so the maximum glucose flux rate was estimated from the lactate synthesis rate (Vmax = Vlac). Instead, we extended this model to account for glucose flux rates for lactate synthesis (Vlac) and also for glutamate-glutamine synthesis (Vglx) in mouse glioblastoma, where Vmax = Vlac + Vglx, while acknowledging its simplistic approach in the Discussion (page 20, lines 22-24: “(…) metabolic fluxes [estimations] through glycolysis and mitochondrial oxidation (…) could potentially benefit from an improved kinetic model simultaneously assessing cerebral glucose and oxygen metabolism, as recently demonstrated in the rat brain with a combination of 2H and 17O MR spectroscopy [62] (…)”).
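
      The difference between the two flux definitions can be written out as a minimal sketch. The function names are hypothetical, and the input values are taken only for illustration from the first value of each pair of synthesis rates quoted in PR-R1.C2 (0.59 and 0.28 mM/min); the oxidation fraction Vglx/Vlac is the ratio referred to in RA-R1.C3.

```python
# Minimal sketch of the two flux definitions discussed above (names are hypothetical).

def vmax_glycolytic_only(v_lac):
    """Exclusively glycolytic case (Kreis et al.): Vmax = Vlac."""
    return v_lac


def vmax_extended(v_lac, v_glx):
    """Extended model for glioblastoma: Vmax = Vlac + Vglx."""
    return v_lac + v_glx


def oxidation_fraction(v_lac, v_glx):
    """Glucose oxidation fraction, defined as Vglx / Vlac."""
    return v_glx / v_lac


# Illustrative inputs in mM/min (see lead-in for their origin)
v_lac, v_glx = 0.59, 0.28

print(f"Vmax (glycolytic only): {vmax_glycolytic_only(v_lac):.2f} mM/min")
print(f"Vmax (extended):        {vmax_extended(v_lac, v_glx):.2f} mM/min")
print(f"Oxidation fraction:     {oxidation_fraction(v_lac, v_glx):.2f}")
```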

      Reviewer #3 (Public Review):

      Summary:

      Simoes et al enhanced dynamic glucose-enhanced (DGE) deuterium spectroscopy with Deuterium Metabolic Imaging (DMI) to characterize the kinetics of glucose conversion in two murine models of glioblastoma (GBM). The authors combined spectroscopic imaging and noise attenuation with histological analysis and showcased the efficacy of metabolic markers determined from DGE DMI to correlate with histological features of the tumors. This approach is also potent to differentiate the two models from GL261 and CT2A.

      Strengths:

      The primary strength of this study is to highlight the significance of DGE DMI in interrogating the metabolic flux from glucose. The authors focused on glutamine/glutamate and lactate. They attempted to correlate the imaging findings with in-depth histological analysis to depict the link between metabolic features and pathological characteristics such as cell density, infiltration, and distant migration.

      Weaknesses:

      (1) A lack of genetic interrogation is a major weakness of this study. It was unclear what underlying genetic/epigenetic aberrations in GL261 and CT2A account for the metabolic difference observed with DGE DMI. A correlative metabolic confirmation using mass spectrometry of the two tumor specimens would give insight into the observed imaging findings.

      PR-R3.C1: We thank the Reviewer for the helpful comments, which we break down below.

      PR-R3.C1.1 - genetic interrogation/manipulation: While we did not have access to conditional models for genetic manipulation of key enzymes in each metabolic pathway, we did assess the mitochondrial function in each cell line, showing a significantly higher respiration buffer capacity and more efficient metabolic plasticity between glycolysis and mitochondrial oxidation in GL261 cells compared to CT2A (Simoes et al. NIMG:Clin 2022). This could drive e.g. more active recycling of lactate through mitochondrial metabolism in GL261 cells, aligned with our observations of increased glucose-derived lactate consumption rate in those tumors compared to CT2A. We have now included this in the discussion (page 17, lines 8-12): “our results suggest increased lactate consumption rate (active recycling) in GL261 tumors with higher vascular permeability, e.g. as a metabolic substrate for oxidative metabolism [44] promoting GBM cell survival and invasion [45], aligned with the higher respiration buffer capacity and more efficient metabolic plasticity of GL261 cells than CT2A [31].”

      PR-R3.C1.2 - correlation with post-mortem metabolic assessment: implementing this validation step would require additional equipment that was also not accessible to us: a focalized irradiator, to instantly halt all metabolic reactions during animal sacrifice. We do believe that DGE-DMI could guide further studies of such nature, aimed at validating the spatio-temporal dynamics of regional metabolite concentrations in mouse brain tumors. Thus, the importance of end-point validation is now stressed more clearly in the manuscript (page 20, lines 13-16): “(…) mapping pathway fluxes alongside de novo concentrations (…) may be determinant for the longitudinal assessment of GBM progression, with end-point validation (…)”.

      These concerns and recommendations were also raised by the Reviewer in the Recommendations to Authors section, where we address them more extensively – please see RA-R1.C3 and RA-R1.C2, respectively.

      (2) A better depiction of the imaging features and tumor heterogeneity would support the authors' multimodal attempt.

      PR-R3.C2: We agree with the Reviewer that including more imaging features would improve the non-invasive characterization of each tumor. Due to the RF coil design and time constraints, we did not acquire additional data, such as diffusion MRI to assess tissue microstructure. Instead, our multi-modal protocol included two dynamic MRI studies on each animal, for multiparametric assessment of tumor volume, metabolism and vascular permeability, using 1H-MRI, 2H-spectroscopy during 2H-labelled glucose injection, and 1H-imaging during Gd-DOTA injection, respectively. Rather than aiming at tumor radiomics, we focused on the dynamic assessment of tumor metabolic turnover with heteronuclear spectroscopy, which is challenging per se and particularly in mouse brain tumors, given their very small size. For such multi-modal studies we used a previously developed dual-tuned RF coil: the deuterium coil (2H) was positioned on the mouse head, for optimal SNR, whereas the proton coil (1H) had suboptimal performance compared to a conventional single-tuned coil, and was used only for basic localization and adjustments, reference imaging and tumor volumetry (T2-weighted), and DCE-T1 MRI (T1-weighted). The latter was analyzed pixel-wise to assess spatial correlations between tumor permeability and metabolic metrics, as shown in Fig S3. The limited T2w MRI data collected were analyzed only for tumor volume assessment; no additional imaging features were extracted (e.g. kurtosis/skewness), since such assessment did not show any differences between the two tumor cohorts in our previous study (Simoes et al NIMG:Clin 2022).

      (3) Integration of the various cell types in the tumor microenvironment, as allowed with the resolution of DGE DMI, will explain the observed difference between GL261 and CT2A. Is there a higher percentage of infiltrative "other cells" observed in GL261 tumor?

      PR-R3.C3: While DGE-DMI resolution is far coarser than brain and brain tumor cell sizes, we have now performed additional analysis to assess the percentage of microglia/macrophages in both cohorts. The results are now included in the manuscript, namely Fig. 2B, as previously explained in PR-R1.C1.2. Interestingly though, we observed a lower percentage of infiltrative "other cells" in GL261 tumors compared to CT2A, which we discuss in the manuscript: pages 19-20 (lines 20-24 and 1-4), “Finally, our results are indicative of higher microglia/macrophage infiltration in CT2A than GL261 tumors, which is inconsistent with another study reporting higher immunogenicity of GL261 tumors than CT2A for microglia and macrophage populations [56]. Such discrepancy could be related to methodologic differences between the two studies, namely the endpoint-guided assessment of tumor growth (bioluminescence vs MRI, more precise volumetric estimations) and the stage when tumors were studied (GL261 at 23-28 vs 16-18 days post-injection, i.e. less time for immune cell infiltration in our case), presence/absence of a cell transformation step (GFP-Fluc engineered vs we used original cell lines), or perhaps media conditioning effects during cell culture due to the different formulations used (DMEM vs RPMI).”

      (4) This underlying technology with DGE DMI is capable of identifying more heterogeneous GBM tumors. A validation cohort of additional in vivo models will offer additional support to the potential clinical impact of this study.

      PR-R3.C4: We agree with the Reviewer that applying DGE-DMI to more clinically-relevant models of human brain tumors will enhance its translational impact to patients, as also suggested by Reviewer 1 and addressed in PR-R1.C3. We also believe that the feasibility and potential of DGE-DMI for application to different glioblastoma models or patients, with GBM or any other primary or secondary brain tumors, is clearly demonstrated in our work, using two reliable and well-described immunocompetent models of GBM. In any case, we have now modified the manuscript to better acknowledge this point: page 18 (lines 14-16), “(…) patient-derived xenografts and de novo models would be more suited to recapitulate human GBM heterogeneity and infiltration features (…)”.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors utilize longitudinal MRI to track tumor volumes but perform DMI at endpoint with late-stage tumors. Their previous publication applied metabolic imaging in tumors before the presence of necrosis. It would be valuable to perform longitudinal DMI to examine the evolution of glucose flux metabolic profile over time in the same tumor.

      RA-R1.C1: We thank the Reviewer for the very useful comments on our manuscript. We agree – in this work, we aimed at “extending” our previous DGE-2H single-voxel methodology to multivoxel (DMI), thoroughly demonstrating (1) its in vivo application to the same immunocompetent models of glioblastoma and (2) the ability to depict their phenotypic differences, and therefore (3) the potential for the metabolic characterization of more advanced models of GBM and/or their progression stages. We believe these objectives were achieved. Our results indeed open several possibilities, from longitudinal assessment of the spatio-temporal metabolic changes during GBM progression (and treatment-response) to its application to other models recapitulating more closely the human disease. Now that we have comprehensively demonstrated a protocol for DGE-DMI acquisition, processing and analysis in mouse GBM (a very challenging methodology), and applied it to different mouse GBM cell lines, new studies can be designed to tackle more specific questions, like the one suggested here by the Reviewer. We have modified the manuscript to make this point clearer: page 20 (lines 15-17), “This may be determinant for the longitudinal assessment of GBM progression, with end-point validation; and/or treatment-response, to help selecting among new therapeutic modalities targeting GBM metabolism (…)”; page 21 (lines 5-8), “(…) we report a DGE-DMI method for quantitative mapping of glycolysis and mitochondrial oxidation fluxes in mouse GBM, highlighting its importance for metabolic characterization and potential for in vivo GBM phenotyping in different models and progression stages.”

      (2) The authors demonstrate a promising correlation between metabolic phenotypes in vivo and key histopathological features of GBM at the endpoint. Directly assessing metabolites involved in glucose fluxes on endpoint tumor samples would strengthen this correlation.

      RA-R1.C2: While we acknowledge the Reviewer’s point, there were two main limitations to implementing such validation step in our protocol: 

      (1) Since we performed dynamic experiments, at the end of each study most 2H-glucose-derived metabolites were already below their maximum concentration (or barely detectable in some cases), as depicted by the respective kinetic curves (Fig 1-D and Fig S7), and thus no longer detectable in the tissues. Importantly, DGE-DMI could guide further studies towards selecting the ideal time-point for validating different metabolite concentrations in specific brain regions.

      (2) Such validation would require sacrificing the animals with a focalized irradiator (which we did not have), to instantly halt all metabolic reactions. Only then could we collect and analyze the metabolic profile of specific brain regions, either by in vitro MS or high-resolution NMR following extraction, or by ex vivo HRMAS analysis of the intact tissue, as reported previously by some of the authors for validation of glucose accumulation in different regions of mouse GL261 tumors (Simões et al. NMRB 2010: https://doi.org/10.1002/nbm.1421). Importantly, even if we did have access to a focalized irradiator, such protocols for metabolic characterization would compromise tissue integrity and thus the histopathologic analysis performed in this study.

      We do agree with the importance of end-point validation and therefore stress it more clearly in the revised manuscript (page 20, lines 14-16): “(…) mapping pathway fluxes alongside de novo concentrations (…) may be determinant for the longitudinal assessment of GBM progression, with end-point validation (…)”.

      (3) Genetic manipulation of key players in the metabolic pathways studied in this paper (glycolysis and mitochondrial oxidation) would offer a strong validation for the sensitivity of DGE-DMI in accurately distinguishing metabolites (lactate, glutamate-glutamine) and their dynamics.

      RA-R1.C3: Thank you for this comment; we agree. This would be particularly relevant in the context of treatment-response monitoring. While such models were not available to us (conditional spatio-temporal manipulation of metabolic pathway fluxes), we believe our results can still demonstrate this point: we previously used in vivo DGE 2H-MRS to show evidence of decreased glucose oxidation fraction (Vglx/Vlac) in GL261 tumors under acute hypoxia (FiO2=12 %) compared to regular anesthesia conditions (FiO2=31 %), consistent with the inhibition of OXPHOS due to lower oxygen tensions (Simoes et al. NIMG:Clin 2022). In the present work, enhanced glycolysis in tumors vs peritumoral brain regions was clearly observed in all the animals studied, from both cohorts, as shown in Fig 1-B and Fig S4. Moreover, the spectral background (before glucose injection) is limited to a single peak in all the voxels: basal DHO, used as internal reference for spatio-temporal quantification of glucose, glutamine-glutamate, and lactate, all de novo and extensively characterized in healthy and glioma-bearing rodent brain (Lu et al. JCBFM 2018; Zhang et al. NMR Biomed 2024; de Feyter et al. SciAdv 2018; Batsios et al. ClinCancerRes 2022; Simoes et al. NIMG:Clin 2022) and other rodent tumors (Kreis et al. Radiology 2020; Montrazi et al. SciRep 2023). We have modified the manuscript to clarify this point (page 18, lines 14-17): “(…) patient-derived xenografts and de novo models would be more suited to recapitulate human GBM heterogeneity and infiltration features, and genetic manipulation of glycolysis and mitochondrial oxidation pathways could be relevant to ascertain DGE-DMI sensitivity for their quantification (…)”.

      (4) Please explain in more detail why DGE-DMI can distinguish different glucose metabolites and how accurate it is.

      RA-R1.C4: DGE-DMI is the imaging extension of our previous work based on single-voxel deuterium spectroscopy, therefore relying on the same fundamental technique and analysis pipeline but moving from a temporal analysis to a spatio-temporal analysis for each metabolite, and thus dealing with more data. Unlike conventional proton spectroscopy (1H), only metabolites carrying the deuterium label (2H) will be detected in this case, including the natural abundance DHO (~0.03%), the deuterated glucose injected and its metabolic derivatives, namely deuterated lactate and deuterated glutamate-glutamine. Due to their different molecular structures, the deuterium atoms will resonate at specific frequencies (chemical shifts, ppm) during a 2H magnetic resonance spectroscopy experiment, as illustrated in Fig 1-A. The method is fully reproducible and accurate, and has been extensively reported in the literature from high-resolution NMR spectroscopy to in vivo spectroscopic imaging of different nuclei, such as proton (1H), deuterium (2H), carbon (13C), phosphorous (31P), and fluorine (19F). Since the fundamental principles of DMI and its application to brain tumors have been very well described in the flagship article by de Feyter et al., we have now highlighted this in the manuscript: page 4 (lines 4-7), “Deuterium metabolic imaging (DMI) has been (…) demonstrated in GBM patients, with an extensive rationale of the technique and its clinical translation [18], and more recently in mouse models of patient-derived GBM subtypes (…)”.

      (5) When mapping glycolysis and mitochondrial oxidation fluxes, add a control method to compare the reliability of DGE-DMI.

      RA-R1.C5: This concern (“lack of a control method”) was also raised by the Reviewer in the Public Reviews section, where we already address it (PR-R1.C2).

      (6) If using peritumoral glutamate-glutamine recycling as a marker of invasion capacity, what would be the correct rate of the presence of secondary brain lesions?

      RA-R1.C6: While our results suggest the potential of peritumoral glutamate-glutamine recycling as a marker for the presence of secondary brain lesions, this remains to be ascertained with higher sensitivity for glutamate-glutamine detection. Therefore, we cannot make further conclusions in this regard.  

      To make this point clear, we state in different sections of the discussion: page 19 (lines 1-2), “(…) recycling of the glutamate-glutamine pool may reflect a phenotype associated with secondary brain lesions.”; and page 19 (lines 6-10), “Therefore, further DGE-DMI preclinical studies aimed at detecting and quantifying relatively weak signals, such as tumor glutamate-glutamine, and/or increase spatial resolution to correlate those metabolic results with histology findings (e.g. in the tumor margin), should improve basal SNR with higher magnetic field strengths, more sensitive RF coils, and advanced DMI pulse sequences [55].”

      (7) There are duplicated Vlac in Figure S3 B.

      RA-R1.C7: This was a typo that has now been corrected. Thank you.

      (8) Figure 4, it would be better to add a metabolic map of a tumor without secondary brain lesions to compare.

      RA-R1.C8: We fully agree and have modified Fig 4 accordingly, together with its legend.

      Particularly, we have included tumors C4 (without secondary lesions) vs G4 (with) for this “comparison”, since details of their histology, including the secondary lesions, are provided in Fig 2.

      (9) Full name of SNR and FID should be listed when first mentioned.

      RA-R1.C9: Agreed and modified accordingly, on pages 6-7 (lines 22-1), “signal-to-noise ratio (SNR)”, and page 19 (lines 5-6), “free induction decay (FID)”.

      (10) Page 2, Line 14: (59±7 mm3) is not needed in the abstract.

      RA-R1.C10: As requested we have removed this specification from the Abstract.

      (11) Page 4, Line 22: Closing out the Introduction section with a statement on broader implications of the present work would enhance the effectiveness of the section.

      RA-R1.C11: We have added an additional sentence in this regard – pages 4-5 (lines 24-2): “Since DMI is already performed in humans, including glioblastoma patients [18], DGE-DMI could be relevant to improve the metabolic mapping of the disease.”

      (12) Define all acronyms to facilitate comprehension. For example, principal component analysis (PCA) and signal-to-noise ratio (SNR).

      RA-R1.C12: Thank you for the comment. We have now defined all the acronyms when first used, including PCA (page 4 (line 11), “Marchenko-Pastur Principal Component Analysis (MP-PCA)”) and SNR (pages 6-7 (lines 22-1), as indicated above in comment RA-R1.C9).

      (13) Some elements within the figures have lower resolution, specifically bar graphs.

      RA-R1.C13: We apologize for this oversight. All the Figures have been revised accordingly, to correct this problem. Thank you.

      (14) Page 13, Line 8: "underly" should be spelled "underlie."

      RA-R1.C14: The typo has been corrected on page 15 (line 8), thank you.

      (15) Page 14, Line 13: "better vascular permeability" would be more effectively phrased as "increased vascular permeability."

      RA-R1.C15: This has also been corrected on page 16 (line 14), thank you.

      Reviewer #2 (Recommendations For The Authors):

      (1) I strongly suggest adding a scale bar in the histology figures.

      RA-R2.C1: Thank you for spotting our oversight! This has now been added as requested to Fig 2.

      (2) The 2H MRSI data were acquired at a nominal resolution of 2.25 x 2.27 x 2.25 mm^3, resulting in a nominal voxel volume of 11.5 uL. (In reality, this is larger due to the point spread function leading to signal bleeding from adjacent voxels.) If we estimate the volume of the tumor rim, as indicated by the histology slides, as (generously) ~ 50 um in width, 3.2 mm long (the diagonal of a 2.25 x 2.25 mm^2 square), and 2.27 mm high, we get a volume of 0.36 uL. Therefore the native voxel volume of the 2H MRSI is at least 30 times larger than the volume occupied by the tumor rim/microenvironment. Normal tissue and tumor tissue will contribute the majority of the metabolic signal of that voxel. I feel an opposite approach could have been pursued: find out the spatial resolution needed to characterize the tumor rim based on the histology, then use a de-noising method to bring the SNR of those data to an acceptable level. (This is just a thought experiment that assumes de-noising actually works to improve quantification for MRS data instead of merely cosmetically improving the data; so far the jury is still out on that, in my view.)
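
      The Reviewer's estimate can be reproduced with a few lines of arithmetic. This is only a check of the numbers quoted in the comment, treating both the voxel and the rim as rectangular boxes; the variable and function names are illustrative.

```python
import math

# Nominal 2H MRSI voxel dimensions quoted in the comment (mm)
NOMINAL_VOXEL_MM = (2.25, 2.27, 2.25)


def box_volume_ul(dx_mm, dy_mm, dz_mm):
    """Volume of a rectangular box in microlitres (1 mm^3 = 1 uL)."""
    return dx_mm * dy_mm * dz_mm


voxel_ul = box_volume_ul(*NOMINAL_VOXEL_MM)

# Rim modelled as 50 um wide, as long as the diagonal of a 2.25 x 2.25 mm^2
# square (about 3.2 mm), and 2.27 mm high
rim_length_mm = math.hypot(2.25, 2.25)
rim_ul = box_volume_ul(0.050, rim_length_mm, 2.27)

ratio = voxel_ul / rim_ul

print(f"nominal voxel: {voxel_ul:.1f} uL")
print(f"tumor rim:     {rim_ul:.2f} uL")
print(f"voxel / rim:   {ratio:.0f}x")
```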

      RA-R2.C2 – We thank the Reviewer for the detailed analysis and reply below to each point.

      RA-R2.C2.1 – spatial resolution and tumor rim: Our nominal voxel volume was indeed 11.5 uL, defined in-plane by the PSF, which explains signal bleeding effects, as in any other imaging modality. The DMI raw data were Fourier interpolated before reconstruction, rendering a final in-plane resolution of 0.56 mm (0.72 uL voxel volume). The tumor rim (margin) analyzed was roughly 0.1 mm wide (please note, not 0.05 mm), as explained in the methods section (page 28, line 16) and now more clearly defined with the scale bars in Fig 2. According to the Reviewer’s analysis, this would correspond to 0.1*3.2*2.27 = 0.73 uL, which we approximated with 1 voxel (0.72 uL), as displayed in Fig 3-A. Importantly, it has long been demonstrated that Fourier interpolation provides a better approximation to the ground truth compared to the nominal resolution, and even to more standard image interpolation performed after FT - see for instance Vikhoff-Baaz B et al. (MRI 2001. 19: 1227-1234), now cited in the Methods section: page 24, line 24 ([69]). While we do agree that both normal brain and tumor should contribute significantly to the metabolic signal in this relatively small region, we rely on extensive literature to maintain that despite its smoothing effect, the display resolution provides a better approximation to the ground truth and is therefore more suited than the nominal resolution for ROI analysis in this region. Still, we acknowledge this potential limitation in the Discussion: page 19, lines 6-10: “Therefore, further DGE-DMI preclinical studies aimed at detecting and quantifying relatively weak signals, such as tumor glutamate-glutamine, and/or increase the nominal spatial resolution to better correlate those metabolic results with histology findings (e.g. in the tumor margin), should improve basal SNR with higher magnetic field strengths, more sensitive RF coils, and advanced DMI pulse sequences [55]).”
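      The volume arithmetic exchanged in this point can be checked with a minimal sketch. It assumes that the 2.27 mm dimension is left untouched by the in-plane Fourier interpolation and that the rim is treated as a simple rectangular slab of 3.2 mm length, as in the Reviewer's estimate; the function and variable names are illustrative only.

```python
# Sketch of the voxel and tumor-rim volume estimates discussed above.
# Assumption: volumes are simple rectangular boxes, and 1 mm^3 equals 1 uL.

def box_volume_ul(x_mm, y_mm, z_mm):
    """Volume of a rectangular box in microliters (1 mm^3 = 1 uL)."""
    return x_mm * y_mm * z_mm

# Nominal acquisition voxel (2.25 x 2.27 x 2.25 mm^3).
nominal_voxel = box_volume_ul(2.25, 2.27, 2.25)

# Displayed voxel after in-plane Fourier interpolation to 0.56 mm.
# Assumption: the 2.27 mm dimension is not interpolated.
display_voxel = box_volume_ul(0.56, 0.56, 2.27)

# Tumor rim modeled as a slab: width x 3.2 mm (rounded diagonal) x 2.27 mm.
rim_50um = box_volume_ul(0.05, 3.2, 2.27)   # Reviewer's width estimate
rim_100um = box_volume_ul(0.10, 3.2, 2.27)  # width stated in the response

print(f"nominal voxel: {nominal_voxel:.2f} uL")
print(f"display voxel: {display_voxel:.2f} uL")
print(f"rim (0.05 mm): {rim_50um:.2f} uL")
print(f"rim (0.10 mm): {rim_100um:.2f} uL")
print(f"nominal voxel / rim (0.05 mm): {nominal_voxel / rim_50um:.1f}x")
```

      Under these assumptions the 0.1 mm rim slab and the interpolated voxel come out at nearly the same volume, which is the approximation the response relies on.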

      RA-R2.C2.2 – metabolic and histologic features at the tumor rim: Furthermore, we also performed ROI analysis of lactate metabolic maps in tumor and peritumoral rim areas, which closely reflected regional differences in cellularity and cell density, and immune cell infiltration between the 2 tumor cohorts and across pooled cohorts, as explained in the Public Review section (PR-R1.1) and now reported in the manuscript: page 12 (lines 6-16), “GL261 tumors accumulated significantly less lactate in the core (1.60±0.25 vs 2.91±0.33 mM: -45%, p=0.013) and peritumor margin regions (0.94±0.09 vs 1.46±0.17 mM: -36%, p=0.025) than CT2A – Fig 3 A-B, Table S1. Consistently, tumor lactate accumulation correlated with tumor cellularity in pooled cohorts (R=0.74, p=0.014). Then, lower tumor lactate levels were associated with higher lactate elimination rate, k<sub>lac</sub> (0.11±0.1 vs 0.06±0.01 mM/min: +94%, p=0.006) – Fig 3B – which in turn correlated inversely with peritumoral margin infiltration of microglia/macrophages in pooled cohorts (R=-0.73, p=0.027) - Fig 3-C. Further analysis of Tumor/P-Margin metabolic ratios (Table S3) revealed: (i) +38% glucose (p=0.002) and -17% lactate (p=0.038) concentrations, and +55% higher lactate consumption rate (p=0.040) in the GL261 cohort; and (ii) lactate ratios across those regions reflected the respective cell density ratios in pooled cohorts (R=0.77, p=0.010) – Fig 3-C”; page 17 (lines 1-8), “Tumor vs peritumor border analyses further suggest that lactate metabolism reflects regional histologic differences: lactate accumulation mirrors cell density gradients between and across the two cohorts; whereas lactate consumption/elimination rate coarsely reflects cohort differences in cell proliferation, and inversely correlates with peritumoral infiltration by microglia/macrophages across both cohorts. This is consistent with GL261’s lower cell density and cohesiveness, more disrupted stromal-vascular phenotypes, and infiltrative growth pattern at the peritumor margin area, where less immune cell infiltration is detected and relatively lower cell division is expected [43]”.

      RA-R2.C2.3 – alternative method: Regarding the alternative method suggested by the Reviewer, we have tested a similar approach in another region (tumor) and it did not work, as explained in the Discussion section (page 19, lines 5-6) and Fig S11. Essentially, Tensor PCA performance improves with the number of voxels and therefore limiting it to a subregion hinders the results. In any case, if we understand correctly, the Reviewer suggests a method to further interpolate our data in the spatial dimension, which would deviate even more from the original nominal resolution and thus sounds counter-intuitive based on the Reviewer’s initial comment about the latter. More importantly, we would like to emphasize the importance of spectral denoising in this work, questioned by the Reviewer. There are several methods reported in the literature, most of them demonstrated only for MRI. We previously demonstrated how MPPCA denoising objectively improved the quantification of DCE-2H MRS in mouse glioma by significantly reducing the CRLBs: 19% improved fitting precision. In the present study, Tensor PCA denoising was applied to DGE-DMI, which led to an objective 63% increase in pixel detection based on the quality criteria defined, unambiguously reflecting the improved quantification performance due to higher spectral quality.

      (3) Concerns re. the metabolic model: 2g/kg of glucose infused over 120 minutes already leads to hyperglycemia in plasma. Here this same amount is infused over 30 seconds... such a supraphysiological dose could lead to changes in metabolite pool sizes -which are assumed to not change since they are not measured, and also fractional enrichment which is not measured at all. Such assumptions seem incompatible with the used infusion protocol.

      RA-R2.C3: We understand the concern. However, the protocol was reproduced exactly as originally reported by Kreis et al (Radiology 2020), who performed the measurements in mice and measured the fraction of deuterium enrichment (f=0.6). Since we also worked with mice, we adopted the same value for our model. The total volume injected was 100uL/25g animal, and adjusted for animal weight (96uL/24g average – Table S1), as we reported before (Simões et al. NIMG:Clin 2022), which is standard for i.v. bolus administration in mice as it corresponds to ~10% of the total blood volume. This volume is therefore easily diluted and not expected to introduce significant changes in the metabolic pool sizes. Continuous infusion protocols, on the other hand, will administer higher volumes, easily approaching the mL range when performed over periods as long as 120 min. This would indeed be incompatible with our bolus infusion protocol. We have now clarified this in the manuscript – page 24 (line 23): “i.v. bolus of 6,6′-<sup>2</sup>H<sub>2</sub>-glucose (2 mg/g, 4 µL/g injected over 30 s (…)”.
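      As a quick check of the dosing figures quoted above, a minimal sketch of the weight-scaled bolus follows. It uses only the 2 mg/g and 4 µL/g values from the response; the function name is hypothetical, and the implied solution concentration is derived here rather than stated in the text.

```python
# Sketch of the weight-adjusted bolus quoted above (2 mg/g in 4 uL/g).

def bolus_for_weight(weight_g, dose_mg_per_g=2.0, volume_ul_per_g=4.0):
    """Return (glucose dose in mg, injected volume in uL) for one animal."""
    dose_mg = dose_mg_per_g * weight_g
    volume_ul = volume_ul_per_g * weight_g
    return dose_mg, volume_ul

for weight in (25.0, 24.0):
    dose, volume = bolus_for_weight(weight)
    # Derived value, not stated in the response: concentration of the bolus.
    concentration = dose / volume
    print(f"{weight:.0f} g animal: {dose:.0f} mg in {volume:.0f} uL "
          f"({concentration:.1f} mg/uL)")
```

      The two cases reproduce the 100 µL per 25 g and 96 µL per 24 g volumes given in the response.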

      (4) Vmax = Vlac + Vglx. This is incorrect: Vmax = Vlac.

      RA-R2.C4: Thank you for raising this concern. As indicated in RA-R2.C3, our model (Simões et al. NIMG:Clin 2022) was adapted from the original model proposed by Kreis et al. (Radiology 2020), where the authors quantified glycolysis kinetics in a subcutaneous mouse model of lymphoma, which is exclusively glycolytic, and thus estimated the maximum glucose flux rate from the lactate synthesis rate (Vmax = Vlac). However, we extended this model to account for glucose flux rates for lactate synthesis (Vlac) and also for glutamate-glutamine synthesis (Vglx), where Vmax = Vlac + Vglx, as explained in our 2022 paper. Our kinetic model is rather simplistic compared to others - reported by 13C-MRS under continuous glucose infusion in healthy mouse brain (Lai et al. JCBFM 2018) and mouse glioma (Lai et al. IJC 2018) – as we acknowledge in the Discussion (page 20, lines 22-24: “(…) metabolic fluxes [estimations] through glycolysis and mitochondrial oxidation (…) could potentially benefit from an improved kinetic model simultaneously assessing cerebral glucose and oxygen metabolism, as recently demonstrated in the rat brain with a combination of 2H and 17O MR spectroscopy [62] (…)”). Nevertheless, our Vlac and Vglx results are consistent with our previous DGE 2H-MRS findings in the same glioma models, and well aligned with the literature, as discussed in PR-R1.C2.1.

      (5) Some other items that need attention: 0.03 % is used as the value for the natural abundance of DHO. The natural abundance of 2H in water can vary somewhat regionally, but I have never seen this value reported. The highest seen is 0.015%.

      RA-R2.C5: The Reviewer is referring to the natural abundance of deuterium in hydrogen: 1 in ~6400 is D, i.e. 0.015 %. Since each water molecule carries 2 hydrogen atoms, about 1 in ~3200 water molecules is DHO, i.e. 0.03%. Indeed the latter can have slight variations depending on the geographical region, as nicely reported by Ge et al (Front Oncol 2022), who showed a 16.35 mM natural abundance of DHO in the local tap water of St. Louis MO, USA (16.35/55500 ≈ 1/3394 ≈ 0.03%).
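      The step from the atomic abundance of deuterium to the abundance of DHO can be sketched as below. This assumes H and D are distributed at random over the two hydrogen positions of water (a binomial model), takes 55500 mM as the molarity of water as in the response, and uses illustrative names throughout.

```python
# Sketch: from deuterium atom abundance to natural-abundance DHO.
# Assumption: H and D occupy the two hydrogen positions of water at random.

WATER_MOLARITY_MM = 55500.0  # molarity of water, as used in the response

def dho_fraction(d_atom_fraction):
    """Fraction of water molecules carrying exactly one deuterium atom."""
    return 2.0 * d_atom_fraction * (1.0 - d_atom_fraction)

d_atom_fraction = 1.0 / 6400.0            # ~0.015 % of hydrogen atoms
fraction = dho_fraction(d_atom_fraction)  # fraction of water that is DHO
concentration_mm = fraction * WATER_MOLARITY_MM

print(f"D atom abundance: {100 * d_atom_fraction:.4f} %")
print(f"DHO abundance:    {100 * fraction:.4f} %")
print(f"DHO in water:     {concentration_mm:.1f} mM")
```

      With these inputs the DHO abundance is close to twice the atomic abundance, because the probability of two deuterium atoms on the same molecule is negligible.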

      (6) Based on the color scale bar in Figure 1, the HDO concentration appears to go as high as 30 mM. Even if this number is off because of the previous concern (HDO), it appears to be a doubling of the HDO concentration. Is this real? What would be the origin of that? No study using [6,6'-2H2]-glucose that I'm aware of has reported such an increase in HDO.

      RA-R2.C6: As explained before (RA-R2.C3 and RA-R2.C4), we based our protocol and model on Kreis et al (Radiology 2020), who reported ~10 mM basal DHO levels rising up to ~27 mM after 90 min, which are well within the ~30 mM range we report over a longer period (132 min).

      Similar DHO levels were mapped with DGE-DMI in mouse pancreatic tumors (Montrazi et al. SciRep 2023).

      (7) "...the central spectral matrix region selected (to discard noise regions outside the brain, as well as the olfactory bulb and cerebellum)". This reads as if k-space points correspond one-to-one with imaging pixels, which is not the case.

      RA-R2.C7: We rephrased the sentence to avoid such potential misinterpretation, specifically: page 25 (lines 19-21): “Each dataset was averaged to 12 min temporal resolution and the noise regions outside the brain, as well as the olfactory bulb and cerebellum, were discarded (…)”.

      (8) The use of the term "glutamate-glutamine recycling" is not really appropriate since these metabolites are not individually detected with 2H MRS, which is a requirement to measure this neurotransmitter cycling.

      RA-R2.C8: Thank you for this comment. To avoid this misinterpretation, we have now rephrased "glutamate-glutamine recycling" to “recycling of the glutamate-glutamine pool” in all the sentences, namely: page 2 (lines 14-15); page 15 (line 8); page 19 (line 1); page 21 (line 10).

      Reviewer #3 (Recommendations For The Authors):

      (1) One major issue is the lack of underlying genetics, and therefore it is hard for readers to put the observed difference between GL261 and CT2A into context. The authors might consider perturbing the genetic and regulatory pathways on glycolysis and glutamine metabolism, repeating DGE DMI measure, in order to enhance the robustness of their findings.

      RA-R3.C1: We thank the reviewer for the helpful revision and comments. The point made here is aligned with Reviewer 1’s, addressed in RA-R1.C3; and also with our previous reply to the Reviewer, PR-R3.C1. Thus, we agree that conditional spatio-temporal manipulation of metabolic pathway fluxes would be relevant to further demonstrate the robustness of DGE-DMI, particularly for treatment-response monitoring. While such models were not available to us, our previous findings seem compelling enough to demonstrate this point. Specifically, we previously showed a significantly higher respiration buffer capacity and more efficient metabolic plasticity between glycolysis and mitochondrial oxidation in GL261 cells compared to CT2A (Simões et al. NIMG:Clin 2022), which could enhance lactate recycling through mitochondrial metabolism in GL261 cells and thus explain our observations of increased glucose-derived lactate consumption rate in those tumors compared to CT2A. We have now included this in the discussion (page 17, lines 8-12): “our results suggest increased lactate consumption rate (active recycling) in GL261 tumors with higher vascular permeability, e.g. as a metabolic substrate for oxidative metabolism [44] promoting GBM cell survival and invasion [45], aligned with the higher respiration buffer capacity and more efficient metabolic plasticity of GL261 cells than CT2A [31].” Moreover, we previously showed evidence of DGE-2H MRS’ ability to detect decreased glucose oxidation fraction (Vglx/Vlac) in GL261 tumors under acute hypoxia (FiO2=12 %) compared to regular anesthesia conditions (FiO2=31 %), consistent with the inhibition of OXPHOS due to lower oxygen tensions (Simões et al. NIMG:Clin 2022).

      (2) Is increased resolution possible for DGE DMI to correlate with histological findings?

      RA-R3.C2: The resolution achieved with DGE DMI, or any other MRI method, is limited by the signal-to-noise ratio (SNR), which in turn depends on the equipment (magnetic field strength and radiofrequency coil), the pulse sequence used, and post-processing steps such as noise removal. Thus, increased resolution could be achieved with higher magnetic field strengths, more sensitive RF coils, more advanced DMI pulse sequences, and improved methods for spectral denoising if available. We have used the best configuration available to us and discussed such limitations in the manuscript, including now a few modifications to address the Reviewer’s point more clearly – page 19 (lines 6-10): “Therefore, further DGE-DMI preclinical studies aimed at detecting and quantifying relatively weak signals, such as tumor glutamate-glutamine, and/or increase the nominal spatial resolution to better correlate those metabolic results with histology findings (e.g. in the tumor margin), should improve basal SNR with higher magnetic field strengths, more sensitive RF coils, and advanced DMI pulse sequences [55])”.

      (3) The authors might consider measuring the contribution of stromal cells and infiltrative immune cells in the analysis of DGE DMI data, to construct a more comprehensive picture of the microenvironment.

      RA-R3.C3: Thank you for this important point. We have now added Iba-1 stainings of infiltrating microglia/macrophages, for each tumor, as suggested by the Reviewer; stromal cells would be more difficult to detect and we did not have access to a validated staining method for doing so. Our new data and results - now included in Fig 2B – indicate significantly higher levels of Iba-1 positive cells in CT2A tumors compared to GL261, which are particularly noticeable in the periphery of CT2A tumors and consistent with their better-defined margins and lower infiltration in the brain parenchyma. This has been explained more extensively in PR-R1.1.

      (4) Additional GBM models with improved understanding of the genetic markers would serve as an optimal validation cohort to support the potential clinical translation.

      RA-R3.C4: We agree with the Reviewer and refer again to RA-R1.C3, where we already addressed this suggestion in detail and introduced modifications to the manuscript accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this report, the authors investigated the effects of reproductive secretions on sperm function in mice. The authors attempt to weave together an interesting mechanism whereby a testosterone-dependent shift in metabolic flux patterns in the seminal vesicle epithelium supports fatty acid synthesis, which they suggest is an essential component of seminal plasma that modulates sperm function by supporting linear motility patterns.

      Strengths:

      The topic is interesting and of general interest to the field. The study employs an impressive array of approaches to explore the relationship between mouse endocrine physiology and sperm function mediated by seminal components from various glandular secretions of the male reproductive tract.

      Thank you for your positive evaluation of our study's topic and approach. We are pleased that you found our investigation into the effects of reproductive secretions on sperm function to be of general interest to the field. We appreciate your positive feedback on the diverse methods we employed to explore this complex relationship.

      Weaknesses:

      Unfortunately, support for the proposed mechanism is not convincingly supported by the data, and the experimental design and methodology need more rigor and details, and the presence of numerous (uncontrolled) confounding variables in almost every experimental group significantly reduce confidence in the overall conclusions of the study.

      The methodological detail as described is insufficient to support replication of the work. Many of the statistical analyses are not appropriate for the apparent designs (e.g. t-tests without corrections for multiple comparisons). This is important because the notion that different seminal secretions will affect sperm function would likely have a different conclusion if the correct controls were selected for post hoc comparison. In addition, the HTF condition was not adjusted to match the protein concentrations of the secretion-containing media, likely resulting in viscosity differences as a major confounding factor on sperm motility patterns.

      We appreciate you highlighting concerns regarding our weak points and apologize for our unclear description. We revised the manuscript to be as rigorous and detailed as possible. In addition, some experimental designs were changed to simpler direct comparisons, and additional experiments were conducted (New Figure 1A-F, lines 103-113). We have made our explanations more consistent with the provided data, which includes further experimentation with additional controls and larger sample sizes to increase the reliability of the findings.

      To address the multiple testing problem, a multiple testing correction was made by making the statistical tests more stringent (Please see Statistical analysis in the Methods section and the Figure legends). Based on different statistical methods, the analysis results did not require significant revisions of the previous conclusions.

      Because the experiments on mixing extracts from the seminal vesicles were exploratory, we planned to avoid correcting for multiple comparisons. Repeating the t-test could lead to a Type I error in some results, so we apologize for not interpreting and annotating them. In the revised version, we removed the dataset for experiments on mixing extracts from the seminal vesicles and prostate, and we changed the description to refer to the clearer dataset mentioned above.
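      The response does not name the correction that was applied. As one common option for controlling the family-wise error rate across repeated tests, a Holm-Bonferroni step-down adjustment could be sketched as follows; the function name and the example p-values are hypothetical and are not taken from the manuscript.

```python
# Sketch of a Holm-Bonferroni adjustment for a family of p-values.
# Assumption: this is an illustrative choice of correction, not necessarily
# the one used in the revised manuscript.

def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical raw p-values from three pairwise comparisons.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
for p, q in zip(raw, adjusted):
    print(f"raw p = {p:.3f} -> adjusted p = {q:.3f}")
```

      In this hypothetical family, two comparisons that look significant at 0.05 before adjustment no longer pass afterwards, which is the Type I error risk that repeated unadjusted t-tests carry.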

      The viscosity of the secretion-containing medium was measured with a viscometer, confirming that secretions did not significantly affect the viscosity of the solution. In addition, as the reviewer pointed out, we addressed the issue that the HTF condition could not be used as a control because of the heterogeneity in protein concentration (New Fig.1G, lines 110-111).

      Overall, we concluded that seminal vesicle secretion improves the linear motility of sperm more than prostate secretion.

      There is ambiguity in many of the measurements due to the lack of normalization (e.g. all Seahorse Analyzer measurements are unnormalized, making cell mass and uniformity a major confounder in these measurements). This would be less of a concern if basal respiration rates were consistently similar across conditions and there were sufficient independent samples, but this was not the case in most of the experiments.

      We apologize for the many ambiguities in the first manuscript. Cell culture experiments in the paper, including the flux analysis, were performed under conditions normalized or fixed by the number of viable cells. The description has also been revised to emphasize that the measurement values are standardized by cell count (lines 183-185, 189-190, 194-197). We emphasize that testosterone affects metabolism under the same number of viable cells (New Fig.4). This change in basal respiration is thought to be due to the shift in the metabolic pathway of seminal vesicle epithelial cells to a “non-normal TCA cycle” in which testosterone suppresses mitochondrial oxygen consumption, even under aerobic conditions (New Figs.3, 4, 5).
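      Normalizing flux readings by viable cell count, as described above, can be illustrated with a minimal sketch. The function name, the per-1000-cells convention, and the example numbers are hypothetical and only show the kind of scaling involved.

```python
# Sketch: normalizing an oxygen consumption rate (OCR) by viable cell count.
# Assumption: all names and numbers here are illustrative, not from the study.

def normalize_ocr(ocr_pmol_per_min, viable_cells, per_cells=1000):
    """Express OCR per `per_cells` viable cells."""
    if viable_cells <= 0:
        raise ValueError("viable_cells must be positive")
    return ocr_pmol_per_min * per_cells / viable_cells

# Two hypothetical wells with the same raw OCR but different cell counts.
well_a = normalize_ocr(120.0, viable_cells=20000)
well_b = normalize_ocr(120.0, viable_cells=30000)
print(f"well A: {well_a:.1f} pmol/min per 1000 cells")
print(f"well B: {well_b:.1f} pmol/min per 1000 cells")
```

      Identical raw readings can therefore correspond to different per-cell rates, which is why unnormalized values leave cell mass as a confounder.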

      The observation that oleic acid is physiologically relevant to sperm function is not strongly supported. The cellular uptake of 10-100uM labeled oleic acid is presumably due to the detergent effects of the oleic acid, and the authors only show functional data for nM concentrations of exogenous oleic acid. In addition, the effect sizes in the supporting data were not large enough to provide a high degree of confidence given the small sample sizes and ambiguity of the design regarding the number of biological and technical replicates in the extracellular flux analysis experiments.

      Thank you for your important critique. As you noted, the too-high oleic acid concentration did not reflect physiological conditions. Therefore, we changed the experimental design of an oleic acid uptake study and started again. We added an in vitro fertilization experiment corresponding to the functional data of exogenous oleic acid at nM concentrations (New Fig.7J,K, Lines 274-282).

      For the flux data to determine the effect of oleic acid on sperm metabolism, we have indicated in the text that the data were obtained based on eight male mice and two technical replicates. Pooled sperm isolated and cultured from multiple mice were placed in one well. The measurements were taken in three different wells, and each experiment was repeated four times. We did not use the extracellular flux analyzers XFe24 or XFe96. The measurements were also repeated because the XF HS Mini was used in an 8-well plate (only a maximum of 6 samples at a run since 2 wells were used for calibration).

      Overall, the most confident conclusion of the study was that testosterone affects the distribution of metabolic fluxes in a cultured human seminal vesicle epithelial cell line, although the physiological relevance of this observation is not clear.

      Thank you for noting that this finding is one of the more robust conclusions of our study. Below we describe our thoughts on the physiological relevance of this observation and our proposed revisions. In the mouse experiments, when the action of androgens was inhibited by flutamide, oleic acid was no longer synthesized in the seminal vesicles. The results of the experiments using cultured seminal vesicle epithelial cells showed that oleic acid was not being synthesized because of a change in metabolism dependent on testosterone. We have also added IVF data on the effects of oleic acid on sperm function (New Fig.7 and Supplementary Fig. 5, lines 274-282).

      As you can see, we have obtained consistent data in vitro and in vivo in mice. Our data also showed that the effects of testosterone on metabolic fluxes in vitro are similar in mouse and human seminal vesicle epithelial cells (New Fig.9). Therefore, it can be assumed that a decrease in testosterone levels causes abnormalities in the components of human semen. However, the conclusion was overstated in the original manuscript, so we changed the wording as follows: It could be assumed that a decrease in testosterone levels causes abnormalities in the components of human semen. (lines 422-423)

      In the introduction, the authors suggest that their analyses "reveal the pathways by which seminal vesicles synthesize seminal plasma, ensure sperm fertility, and provide new therapeutic and preventive strategies for male infertility." These conclusions need stronger or more complete data to support them.

      We appreciate your comments about the suggestion presented in the introduction.

      We also removed our conclusions regarding treatment and prevention strategies for male infertility (lines 96-98). We wanted to discuss our findings not conclusively but as future applications that could result from further research based on our initial findings.

      The last sentence of the introduction has been revised to tone down these assertions as follows: These analyses revealed that testosterone promotes the synthesis of oleic acid in seminal vesicle epithelial cells and its secretion into seminal plasma, and the oleic acid ensures the linear motility and fertilization ability of sperm.

      We are grateful for your suggestions, which have prompted us to refine our manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Using a combination of in vivo studies with testosterone-inhibited and aged mice with lower testosterone levels, as well as isolated mouse and human seminal vesicle epithelial cells, the authors show that testosterone induces an increase in glucose uptake. They find that testosterone induces differential gene expression with a focus on metabolic enzymes. Specifically, they identify increased expression of enzymes that regulate cholesterol and fatty acid synthesis, leading to increased production of 18:1 oleic acid.

      Strength:

      Oleic acid is secreted by seminal vesicle epithelial cells and taken up by sperm, inducing an increase in mitochondrial respiration. The difference in sperm motility and in vivo fertilization in the presence of 18:1 oleic acid and the absence of testosterone is small but significant, suggesting that the authors have identified one of the fertilization-supporting factors in seminal plasma.

      Thank you for your positive comments regarding our work on the role of testosterone in regulating metabolic enzymes and the subsequent production of 18:1 oleic acid in seminal vesicle epithelial cells. We are pleased that the strength of our findings, particularly identifying oleic acid as a factor influencing sperm motility and mitochondrial respiration, has been recognized.

      Weaknesses:

      Further studies are required to investigate the effect of other seminal vesicle components on sperm capacitation to support the author's conclusions. The author's experiments focused on potential testosterone-induced changes in the rate of seminal vesicle epithelial cell glycolysis and oxphos, however, provide conflicting results and a potential correlation with seminal vesicle epithelial cell proliferation should be confirmed by additional experiments.

      Thank you very much for your valuable criticism. Although we fully agree with your comment, conducting experiments to investigate the effects of other seminal vesicle components on the fertilization potential of sperm would be a great challenge for us. This is because it has taken us the last three years to identify oleic acid as a key factor in seminal plasma. We are considering a follow-up study to explore the effect of other seminal vesicle components on sperm capacitation. Therefore, we have revised the Introduction and conclusions to tone down our assertions.

      The revised manuscript also includes additional data showing a correlation between changes in metabolic flux and the proliferation of seminal vesicle epithelial cells using shRNA. As a result, it was shown that cell proliferation is promoted when mitochondrial oxidative phosphorylation is promoted by ACLY knockdown (New Fig.8D, lines 303-305). This shows a close relationship between the metabolic shift in seminal vesicle epithelial cells and cell proliferation. The revised manuscript includes an interpretation and discussion of these results (lines 369-379).

      We are grateful for your suggestions, which have prompted us to refine our manuscript.

      Reviewer #3 (Public Review):

      Summary:

      Male fertility depends on both sperm and seminal plasma, but the functional effect of seminal plasma on sperm has been relatively understudied. The authors investigate the testosterone-dependent synthesis of seminal plasma and identify oleic acid as a key factor in enhancing sperm fertility.

      Strengths:

      The evidence for changes in cell proliferation and metabolism of seminal vesicle epithelial cells and the identification of oleic acid as a key factor in seminal plasma is solid.

      Weaknesses:

      The evidence that oleic acids enhance sperm fertility in vivo needs more experimental support, as the main phenotypic effect in vitro provided by the authors remains simply as an increase in the linearity of sperm motility, which does not necessarily correlate with enhanced sperm fertility.

      We appreciate the positive feedback on the solid evidence of cell proliferation and metabolic changes in seminal vesicle epithelial cells and the identification of oleic acid as an important factor in seminal plasma. We fully agree with the assessment that the evidence linking oleic acid and increased sperm fertility in vivo needs further experimental support. To address this concern, we changed the experimental design of an oleic acid study and started again to be more physiological regarding the effect of oleic acid on fertility outcomes, increased the replicates of artificial insemination, and added in vitro fertilization assessments (New Fig.7 and supplementary Fig.5, lines 274-282). The revised manuscript describes these experiments and discusses the association between oleic acid and fertility.

      We are grateful for your suggestions, which have prompted us to refine our manuscript.

      Recommendations for the authors:

      Reviewing Editor's note:

      As you can see from the three reviewers' comments, the reviewers agree that this study can be potentially important if major concerns are adequately addressed. The major concern common to all the reviewers is the incomplete mechanistic link between the physiological androgen effect on the production of oleic acid and its effect on sperm function. Statistical analyses need more rigor and consideration of other important capacitation parameters are needed to address these concerns and to improve the manuscript to support the current conclusions.

      Thank you for summarizing the reviewers' feedback and for your insights regarding the major concerns raised. We appreciate the reviewers' understanding of the potential importance of our work and have addressed the issues highlighted to strengthen the manuscript. We believe these changes will improve the quality of the manuscript and provide a clearer and more complete understanding of the role of androgens and oleic acid in sperm function.

      Reviewer #1 (Recommendations For The Authors):

      The following comments are provided with the hope of aiding the authors in improving the alignment between the data and their interpretations.

      Thank you for allowing us to strengthen our manuscript with your valuable comments and queries. We have made our best efforts to reflect your feedback.

      Major Comments:

      (1) The methodological detail is not sufficient to reproduce the work. For example:

      a. Manufacturer protocols are referred to extensively. These protocols are neither curated nor version-controlled. Please consider describing the underlying components of the assays. If information is not available, please consider providing catalog numbers and lot numbers in the methods (if appropriate for journal style requirements).

      We appreciate this suggestion, which we believe is important to ensure reproducibility. We described the catalog number in our Methodology and included as much information as possible.

      b. Please consider describing the analyses in full, with consideration given to whether blinding was part of the design. For example- line 492: "apoptotic cells were quantified using ImageJ". How was this quantified? How were images pre-processed? Etc.

      Although blinding was not performed, experiments and analyses were conducted according to Fisher's three principles to eliminate bias (lines 549-552). To avoid false-positive or false-negative results, we clearly state that tissue sections treated with DNase were used as positive controls, and tissue sections without TdT were used as negative controls for apoptosis. We have added detailed quantification methods (lines 544-546).

      c. Please consider providing versions of all acquisition and analysis software used.

      We have added software version information in Materials and Methods.

      (2) Please consider revisiting the statistical analyses. Many of the analyses don't seem appropriate for the design. For example, the use of a t-test with multiple comparisons for repeated measures design in Figure 2 and the use of t-test for two-factor design in Figure 8. etc.

      To address the multiple testing issues, the statistical methodology was changed to a more rigorous one. Details are given in the Statistical analysis in the Methods section and the Figure legends.

      (3) The increase in % LIN in Figure 1 may be confounded by differences in viscosity between HTF and the fluid secretion mixtures. For this reason, HTF may not be an appropriate control for the ANOVA post hoc analysis. HTF protein was not adjusted to the same concentration as the secretion mixtures, correct? Ultimately, it does not appear that there would be a significant statistical effect of the different fluid mixtures if appropriate statistical comparisons were made. This detracts from the notion that the secretions impact sperm function.

      (4) Figure 1, the statistical analysis in the legend suggests that the experiments were analyzed with a t-test. Were corrections made for multiple comparisons in B-D? An ANOVA would probably be more appropriate.

      We used a viscometer to measure the viscosity of a solution of prostate and seminal vesicle secretions adjusted to a protein concentration of 10 mg/mL. The results showed that the secretions did not cause any significant viscosity changes (New Fig.1G, Lines 110-111).

      As you pointed out, the protein levels in the HTF medium and the secretion mixture are not adjusted to the same concentration. In addition, the original manuscript was not a controlled experiment because the two factors, seminal vesicle and prostate extracts, were modified. Therefore, to investigate the effect of prostate and seminal vesicle secretions on sperm motility, we modified the experimental design to directly compare the effects of the two groups: seminal vesicle and prostate extracts (New Fig.1A-G, lines 101-113). To show the sperm quality used in this study, motility data from sperm cultured in the HTF medium are presented independently in New Supplemental Fig.1A.

      (5) Additionally in Figure 1, there is no baseline quality control data to show that there are no intrinsic differences between sperm sampled from the two treatment groups. So baseline differences in sperm quality/viability remain a potential confounder.

      We thank you for this important point. Epididymal sperm were collected from healthy mice. We recovered only the seminal vesicle secretions from the flutamide-treated mice to pursue its role in the accessory reproductive glands, since testosterone targets the testes and accessory reproductive organs. So, there was no qualitative difference between the epididymal sperm before treatment. Nevertheless, incubation with seminal vesicle secretion for one hour altered the sperm motility pattern and in vivo fertilization results. Sperm function was altered by seminal vesicle secretion in a short period of culture time. We apologize for the confusion, and we have revised the text and figure to carry a clearer message (lines 128-132).

      (6) Figure 1E, did the authors confirm that flutamide-treated mice had decreased serum androgens? How often were mice treated with flutamide? This is important because flutamide has a relatively short half-life and is rapidly metabolized to inert hydroxyflutamide.

      Serum testosterone levels were unchanged. Flutamide was administered every 24 hours for 7 consecutive days. Although there was no change in blood testosterone levels (New Supplemental Fig.1B), a decrease in the weight of the seminal vesicles, prostate, and epididymis was confirmed. This is thought to be due to the pharmacological activity of flutamide.

      (7) Figure 1H, the meaning of 'relative activity of mitochondria' isn't clear. JC-1 does not measure 'activity'. A decreased average voltage potential across the inner mitochondrial membrane may indicate that more of the sperm from the flutamide group were dead. Additionally, J-aggregates are slow to form, generally requiring long incubation periods of at least 90 minutes or more. Additional positive and negative controls for predictable mitochondrial transmembrane voltage potential polarization states would have improved the quality of this experiment.

      Thank you for pointing this out. We have replaced "relative activity of mitochondria" with "high mitochondrial membrane potential" (New Fig.1M, lines 125-128). Indeed, the sperm cultured in seminal vesicle secretions from flutamide-treated mice are thought to have died, because their motility was also significantly reduced. Since antimycin reduces mitochondrial membrane potential, we have added an experiment in which sperm treated with 10 µM antimycin were used as a control to confirm that the JC-1 reaction is sensitive to changes in membrane potential.

      (8) Figure 4, the extracellular flux data appear to be unnormalized. The Seahorse instruments are extremely sensitive to the mass and uniformity of the cells at the bottom of the well. This may be a significant confounder in these results. For example, all of the observed differences between groups could simply be a product of differential cell mass, which is in line with the reduced growth potential of testosterone-treated cells indicated by the authors in the results section.

      We thank you for this important point. After correcting for cell viability, we seeded the same number of viable cultured cells per well in each experimental group before measuring them in the flux analyzer. There were no significant differences in survival rates in any experiment. As a result, an increase in glucose-induced ECAR and a suppression of mitochondrial respiration were observed. We would like to emphasize that this difference in the metabolic data does not reflect a reduction in the growth potential of the cells due to testosterone treatment.

      We described that these measurements are normalized based on cell count and viability (lines 184, 190, 195).

      (9) How did the authors know that the isolated mouse primary cells were epithelial cells? Was this confirmed? What was the relative sample purity?

      The cells were labeled with multiple epithelial cell markers (cytokeratin) and confirmed using immunostaining and flow cytometry. The percentage of cells positive for epithelial cell markers was approximately 80%. A stromal cell marker (vimentin) was also used to confirm purity, but only a few percent of cells were positive. The contaminating cell type was considered to be mainly muscle cells because the gene expression levels of muscle cell markers verified by RNA-seq were relatively high.

      (10) It is misleading to include the lactate/pyruvate media measurements in the middle of the figure in Figure 4 D and E because it seems at first glance like these measurements were made in the seahorse media but they are completely unrelated. Additionally, these measures are not normalized and are sensitive to confounding differences in cell viability, seeding density, mass, etc.

      Thank you for pointing this out. We have placed the lactate and pyruvate measurement graphs after the flux data of ECAR. We noted that these measurements are normalized based on cell count and viability (lines 189-190). The doubling time of seminal vesicle epithelial cells was approximately 3 days, and testosterone inhibited cell proliferation. Therefore, the seeding concentration of cells was increased 4-fold in the testosterone-treated group compared to the control, and experiments were conducted to ensure that the confluency at the time of measurement after 7 days of culture was comparable between groups.

      (11) The flux analyzer assays sold by Agilent have many ambiguities and problems of interpretation. Unfortunately, Agilent's interest in marketing/sales has outpaced their interest in scientific rigor. Please consider revising some of the language regarding the measurements. For example, 'ATP production rate' is not directly measured. Rather, oligomycin-sensitive respiration rate is measured. The conversion of OCR to ATP production rate is an estimation that depends on complex assumptions often requiring additional testing and validation. The same is true for other ambiguous terms such as 'maximal respiration' referring to FCCP uncoupled respiration, and glycolytic rate- which is also not measured directly. If the authors are interested in a more detailed description of the problems with Agilent's interpretation of these assays please see the following reference (PMID: 34461088).

      Thank you for your critical comments and thoughtful advice, as well as for sharing the excellent reference. We agree with you on the flux analyzer ambiguities and data interpretation problems. The description of the measured values has been revised as follows.

      We have replaced the “ATP production rate” with the “oligomycin-sensitive respiratory rate.” Similarly, we have replaced “maximal respiration” with “FCCP-induced unbound respiration.” (lines 197-202) We chose not to deal with the conversion of OCR to ATP production rate because it is outside the scope of interest in our study.

      We now avoid the term "glycolytic capacity" and use “oligomycin-sensitive ECAR” instead (line 186). We recognize that the ECARs measured in this study reflect experimental conditions and may not fully represent physiological glycolytic flux in vivo. Therefore, the main section includes a data set of glucose uptake studies to emphasize the significance of the changes obtained with the flux analyzer assay (New Fig.6, lines 230-254).
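
      The renamed quantities above can be illustrated with a minimal sketch. It assumes the common injection sequence for this kind of assay (oligomycin, then FCCP, then rotenone/antimycin A), which is not stated in this response; the function names, the per-1,000-cell scaling, and all numbers are hypothetical and are not the authors' analysis pipeline.

```python
def per_cell(rate, viable_cells, per=1000):
    """Normalize a raw rate to a fixed number of viable cells (hypothetical scaling)."""
    if viable_cells <= 0:
        raise ValueError("viable_cells must be positive")
    return rate * per / viable_cells


def ocr_metrics(basal, after_oligomycin, after_fccp, after_rot_aa):
    """Derive descriptive OCR quantities from plateau readings.

    Assumes readings were taken at baseline and after sequential
    oligomycin, FCCP, and rotenone/antimycin A injections.
    """
    return {
        # Drop in OCR caused by oligomycin (not a direct ATP measurement).
        "oligomycin_sensitive": basal - after_oligomycin,
        # FCCP-uncoupled respiration above the non-mitochondrial floor.
        "fccp_uncoupled": after_fccp - after_rot_aa,
        # Residual OCR after respiratory chain inhibition.
        "non_mitochondrial": after_rot_aa,
    }


# Hypothetical plateau values in pmol O2/min for a single well.
metrics = ocr_metrics(basal=100.0, after_oligomycin=40.0,
                      after_fccp=180.0, after_rot_aa=20.0)
normalized = {name: per_cell(value, viable_cells=20000)
              for name, value in metrics.items()}
```

      Keeping the names descriptive ("oligomycin-sensitive") rather than interpretive ("ATP production") mirrors the reviewer's point that the conversion to ATP production rests on further assumptions.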

      Figure 6, it's not surprising to see the accumulation of labeled oleic acid in the cells, however, this does not mean that oleic acid is participating in normal metabolic processes. Oleic acid will have detergent effects at high (µM) concentrations. The observation that sperm 'take up' OA at 10-100 µM concentrations should also be validated against sperm function, as the health of the cells is very likely to be negatively impacted. Additionally, no apparent accumulation is noted in the fluorescence imaging at 1 µM, but the authors insinuate that uptake occurs at low nM concentrations. The effects in Figure 6D-F are nominal at best and are likely a result of the small sample sizes.

      Thank you for your good suggestion. We agree with the reviewer that high concentrations of oleic acid have a detergent effect. To improve the consistency between the functional data and the observations, oleic acid uptake tests were performed over the same concentration range as the sperm motility tests (New Fig.7A-C). To reflect in vivo conditions, the oleic acid concentrations used were based on the oleic acid concentration in seminal fluid recovered from mice, as measured by GC-MS.

      Epididymal sperm were incubated with fluorescently labeled oleic acid and observed after quenching of extracellular fluorescence. Fluorescent signals were detected selectively in the midpiece of the sperm. The fluorescence intensity of sperm quantified by flow cytometry increased significantly in a dose-dependent manner (New Fig.7A-C, lines 261-264).

      Furthermore, increasing the sample size did not change the trend of the sperm motility data. Although the effect size of oleic acid on sperm motility was small (New Fig.7D-G, lines 265-268), an improvement in fertilization ability was observed both in vitro (IVF) and in vivo (AI) (New Fig.7J-L, lines 274-282, 286-291). We conclude that the effect of oleic acid on sperm is of substantial significance. These data and interpretations have been revised in the text in the Results section.

      (12) Figure 6H, I applaud the authors for attempting intrauterine insemination experiments to test their previous findings. That said, there is no supporting data included to show that the sperm from the treatment groups had comparable starting viability/quality. Additionally, it is difficult to tell if the results are due to the small sample sizes and particularly the apparent outlier in the flutamide-only group.

      Thank you for the praise and the suggestions for improvement. As we answered in response to your comment #5 above, the epididymal sperm were collected from healthy mice. Therefore, there was no qualitative difference in the epididymal sperm before treatment. This is described in the figure legend (lines 1130-1131). We apologize again for the confusion. We also more than doubled the number of replicates of the experiment, so the impact of the outlier should be minimal.

      (13) One final question related to Figure 6H: how did the authors know they were retrieving all of the possible 2-cell embryos from the uterus? Perhaps the authors could provide the raw counts of unfertilized eggs and 2-cell embryos so we can see if there were differences between the mice.

      We retrieved the pronuclear stage embryos from the fallopian tubes. It is not certain whether all embryos were recovered. Therefore, we added the number of embryos in the graph and in the supplementary data.

      (14) Figure 7 has the same seahorse assay normalization problem as mentioned earlier. Without normalization, it is difficult to tell if the effects are simply due to differences in cell mass. Were the replicates indicated in the graphs run on the same plate? If so, it would be much more convincing to see a nested design, with technical replicates within plates, and additional replicates run on separate plates.

      As we answered in response to your comment #8 above, these measurements were normalized based on sperm count. This is now noted in the text and the figure legend (lines 1123-1124).

      Pooled sperm isolated and cultured from multiple mice were placed in one well. The measurements were taken in three different wells, and each experiment was repeated four times. We did not use the XFe24 or XFe96 extracellular flux analyzers. Because we used the XF HS Mini with 8-well plates (a maximum of 6 samples per run, since 2 wells are used for calibration), the measurements had to be repeated across runs.

      (15) The statistical test in Figures 8E and F described in the legend is inappropriate (t-test), this appears to be a two-factor design.

      Thank you for pointing this out. Differences between groups were assessed using a two-way analysis of variance (ANOVA). When the two-way ANOVA was significant, differences among values were analyzed using Tukey's honest significant difference test for multiple comparisons.
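
      For a two-factor design of the kind discussed here, the variance partition behind a two-way ANOVA can be sketched as follows. This is a minimal illustration for a balanced, fully crossed design only; the factor labels and data are hypothetical, and it stops at the F statistics. Obtaining p-values and running Tukey's test would additionally require the F and studentized range distributions from a statistics package.

```python
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA with interaction.

    cells maps (level_a, level_b) -> list of replicate values.
    Returns sums of squares, degrees of freedom, and F statistics.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if len(cells) != len(a_levels) * len(b_levels):
        raise ValueError("design must be fully crossed")
    if n < 2 or any(len(v) != n for v in cells.values()):
        raise ValueError("design must be balanced with >= 2 replicates")

    grand = mean(x for v in cells.values() for x in v)
    a_mean = {a: mean(x for b in b_levels for x in cells[(a, b)])
              for a in a_levels}
    b_mean = {b: mean(x for a in a_levels for x in cells[(a, b)])
              for b in b_levels}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    # Main effects, interaction, and within-cell (error) sums of squares.
    ss_a = n * len(b_levels) * sum((m - grand) ** 2 for m in a_mean.values())
    ss_b = n * len(a_levels) * sum((m - grand) ** 2 for m in b_mean.values())
    ss_ab = n * sum((cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                    for a in a_levels for b in b_levels)
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab, df_err = df_a * df_b, len(cells) * (n - 1)
    ms_err = ss_err / df_err
    return {
        "ss": {"A": ss_a, "B": ss_b, "AxB": ss_ab, "error": ss_err},
        "df": {"A": df_a, "B": df_b, "AxB": df_ab, "error": df_err},
        "F": {"A": ss_a / df_a / ms_err,
              "B": ss_b / df_b / ms_err,
              "AxB": ss_ab / df_ab / ms_err},
    }


# Hypothetical 2 x 2 design with three replicates per cell.
example = {
    ("ctrl", "x"): [1.0, 2.0, 3.0],
    ("ctrl", "y"): [2.0, 3.0, 4.0],
    ("trt", "x"): [3.0, 4.0, 5.0],
    ("trt", "y"): [6.0, 7.0, 8.0],
}
result = two_way_anova(example)
```

      Unlike separate t-tests, this partition yields an explicit interaction term, which is what distinguishes a two-factor analysis from a set of pairwise comparisons.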

      (16) The data in Figure 8 are interesting, and the effects appear to be a little more consistent compared with the mouse primary cells, potentially due to cell uniformity. However, the data are unnormalized, causing significant ambiguity, and there are no measures of cell viability to determine if the effects are due to cell death (or at least relative cell mass).

      As we answered in response to your comments #8 and #14 above, these measurements were normalized based on cell count and viability. This is now noted in the figure legend (lines 1185-1186).

      Minor Comments:

      (1) The section title indicating the beginning of the results section is missing.

      A section title has been added to indicate the beginning of the results section.

      (2) There were several typos and confusingly worded statements throughout. Please consider additional editing.

      We used a proofreading service and corrected as much as possible.

      (3) In the introduction, a brief description of seminal fluid physiology is provided, but the reference is directed toward human physiology. Given that the research is performed solely in the mouse, a brief comparative description of mouse physiology would be helpful. For example, what is the role of mouse seminal fluid in the formation of the mating plug? What are the implications of the relative size disparity in seminal vesicles in mice versus humans? Etc.

      The third paragraph of the introduction has been revised (lines 57-60).

      Reviewer #2 (Recommendations For The Authors):

      Thank you for allowing us to strengthen our manuscript with your valuable comments and queries. We have made our best efforts to reflect your feedback.

      (1) The abstract is confusing and partly misleading and should be revised to more clearly and accurately summarize the study.

      The abstract was revised to be clearer and more accurate (lines 20-34).

      (2) The introduction should be revised to more accurately describe the sperm life cycle. Spermatogenesis, per definition, for example, exclusively takes place in the testis, sperm do not gain fertilization competence in the epididymis, sperm isolated from the epididymis cannot fertilize an oocyte unless in vitro capacitated, etc. In the last paragraph the connection between changes in fructose and citrate concentration, sperm metabolism and testicular-derived testosterone and AR remain unclear.

      The introduction was revised to be clearer and more accurate (lines 44-45).

      Citric acid and fructose are chemical components that are the subject of biochemical testing and are commonly used as semen testing items for humans and livestock. This is because the secretory function of the prostate and seminal vesicles is dependent on androgens. The measurement of citric acid and fructose concentrations in semen is routinely used to indicate testicular androgen production function (ISBN: 978-1-4471-1300-3, 978 92 4 0030787).

      (3) Throughout the manuscript the concept of (in vitro) capacitation is missing. Mixing sperm with seminal plasma is not the only way to achieve sperm that can fertilize the oocyte. Since media containing bicarbonate and albumin is the standard procedure in the field to capacitate epididymal mouse sperm in vitro, the manuscript would gain value from a comparison between the effect of seminal plasma and in vitro capacitating media. Interesting readouts in addition to motility would be, e.g., sAC activation, PKA-substrate phosphorylation, and acrosomal exocytosis.

      Thank you for raising this important point. As the reviewer points out, fertilization can be achieved in artificial insemination and in vitro fertilization using epididymal sperm that have not been exposed to seminal plasma. This has historically led to an underestimation of the role of accessory reproductive glands, such as the prostate and seminal vesicles. However, it has been reported that the removal of seminal vesicles in rodents decreases the fertilization rate after natural mating. This has been shown to be due to multiple factors affecting sperm motility rather than factors involved in plug formation (PMID: 3397934), but details of these factors and the whole picture of the role of the accessory glands were not known. This led us to become interested in the effects of seminal plasma on sperm other than fertilization and to begin research on the role of the accessory glands that synthesize seminal plasma.

      Early in our study, we found that simply exposing sperm to seminal vesicle extracts for 1 hour before IVF dramatically reduced fertilization rates, even in HTF medium containing bicarbonate and albumin. The experiment was designed on the assumption that seminal plasma contains factors that inhibit sperm from acquiring fertilizing ability. Therefore, we conducted experiments using modified HTF without albumin to avoid unintended motility patterns.

      However, we also respect the reviewer's opinion, and we have added our preliminary data related to IVF (New supplementary Fig.5).

      (4) In the introduction and throughout the manuscript it is unclear what the authors mean by "linear motility". An increase in VSL doesn't mean that the sperm swim in a more linear or straight way, or even that the sperm are 'straightened', it means that they swim faster from point A to point B. Do the authors mean progressive or hyperactivated motility? Please clarify.

      For all conditions tested the authors should follow the standard in the field and include the % of motile, progressively motile, and hyperactivated sperm.

      Thank you for pointing this out. We appreciate your feedback regarding the terminology. In our manuscript, "linear motility" refers to the degree to which sperm move in a straight line. We have clarified this by explaining that VSL (Straight-Line Velocity) and LIN (Linearity) are used to quantify and describe linear motility in sperm analysis: Higher VSL values indicate more direct, linear movement. A higher LIN value indicates a straighter path, thus representing greater linear motility. These terms have been standardized, and explanations have been added to the main text (lines 111-113).

      In response to your suggestion, we have included the percentage of motility and progressive motility for all conditions tested. However, since the experiment was performed using modified HTF without albumin, we have decided not to report the percentage of hyperactivation to avoid confusion.
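
      As an illustration of how the kinematic terms in this response relate to one another, the sketch below computes VSL, VCL (curvilinear velocity), and LIN from a tracked sperm head path, using the conventional CASA definition LIN = VSL / VCL × 100. VCL is not mentioned in the response and is introduced here only because LIN is defined relative to it; the function name, coordinates, and frame rate are hypothetical, and the calculation is not taken from the authors' CASA software.

```python
from math import hypot


def track_kinematics(points, frame_rate_hz):
    """Compute VCL, VSL (µm/s) and LIN (%) from one track.

    points: list of (x, y) head positions in µm, one per video frame.
    """
    if len(points) < 2:
        raise ValueError("a track needs at least two points")
    duration_s = (len(points) - 1) / frame_rate_hz

    # Curvilinear path: sum of frame-to-frame displacements.
    path_um = sum(hypot(x1 - x0, y1 - y0)
                  for (x0, y0), (x1, y1) in zip(points, points[1:]))
    # Straight-line distance between first and last positions.
    straight_um = hypot(points[-1][0] - points[0][0],
                        points[-1][1] - points[0][1])

    vcl = path_um / duration_s
    vsl = straight_um / duration_s
    lin = 100.0 * vsl / vcl if vcl > 0 else 0.0
    return {"VCL": vcl, "VSL": vsl, "LIN": lin}


# Hypothetical three-frame zigzag track sampled at 2 frames per second.
zigzag = track_kinematics([(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)],
                          frame_rate_hz=2.0)
```

      Because LIN is a ratio, it describes path straightness independently of speed, whereas VSL alone also increases when a sperm simply swims faster.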

      (5) Did the authors confirm that the injection of flutamide decreases androgen levels? That control needs to be included in the experiment to validate the conclusion.

      Injection of flutamide did not reduce androgen levels (see reviewer #1, comment 6). This is because flutamide's mechanism of action is based on antagonizing androgen and inhibiting its binding to the androgen receptor (New Fig.2A).

      (6) The role of mitochondrial activity in sperm progressive motility is still under investigation. PMID: 37440924 i.e. showed that inhibition of the ETC does not affect progressive but hyperactivated motility. The authors should either include additional experiments to confirm the correlation between mitochondrial activity and sperm progressive motility or tone down that conclusion.

      We have previously shown that treatment with D-chloramphenicol, an inhibitor of mitochondrial translation, significantly reduced sperm mitochondrial membrane potential, ATP levels, and linear motility (PMID: 31212063). Also, in the previous manuscript, we did not address progressive motility or hyperactivated motility in our analysis. We have chosen to discuss the effect of mitochondrial activity on linear motility rather than on progressive motility and hyperactivation of sperm.

      Was mitochondrial activity also altered in epididymal sperm incubated with and without seminal plasma or in aged mice?

      The mitochondrial membrane potential of epididymal sperm cultured with seminal vesicle extract (SV) was higher than that of epididymal sperm cultured without seminal vesicle extract (without SV: 67.3 ± 0.8%, with SV: 83.4 ± 1.8%). On the other hand, the mitochondrial membrane potential of epididymal sperm cultured with seminal vesicle extract recovered from aged mice was decreased (SV from aged: 60.3 ± 2.7%). It should be noted that the epididymal spermatozoa used in these experiments were from healthy individuals, different from those from which seminal vesicle extracts were collected. (See also the response to reviewer 1's comment #5.)

      (7) The quality of the provided images showing AR, Ki67, and TUNEL staining should be improved or additional images should be included. Especially the AR staining is hard to detect in the provided images. The authors should also include a co-staining between AR and vesicle epithelial cells. That epithelial cells are multilayered does not come across in the pictures provided.

      We apologize for any inconvenience caused. The image has been replaced with one of higher resolution, in which the multilayered structure of the epithelial cells can also be seen.

      For the 12-month-old mice, an age-matched control should be included to support the authors' conclusion.

      To clarify the seminal vesicle changes associated with aging, we included images of 3-month-old mice as controls (New Supplementary Fig.2D).

      Overall, the rationale for the experiment does not become clear. How are the amount of seminal vesicle epithelial cells, testosterone, and AR expression connected to seminal plasma secretions? Why is it a disadvantage to have proliferating seminal vesicle epithelial cells? How is proliferation connected to the proposed switch in metabolic pathway activity?

      We have added some explanations and supporting data to the manuscript (New Fig.8D, lines 303-305, 315-319, 369-379). Cell proliferation stopped when the metabolic shift occurred, redirecting glucose toward fatty acid synthesis. Fatty acid synthesis is an important function of the seminal vesicle, and in the presence of testosterone, fatty acid synthesis enhancement and arrest of proliferation occur simultaneously. The connection between metabolism and cell proliferation was further demonstrated when ACLY was knocked down by shRNA, which stopped fatty acid synthesis and released the proliferative arrest induced by testosterone, allowing the cells to proliferate again. However, we do not know what effects occur when cell proliferation is stopped.

      (8) The experiments provided for glycolysis and oxphos are inconsistent and insufficient to support the authors' conclusion that testosterone shifts glycolytic and oxphos activity of seminal vesicle epithelial cells. Multiple groups (PMID 37440924, 37655160, 32823893) have shown that the increased flux through central carbon metabolism during capacitation is accompanied by an accumulation of intracellular lactate and increased secretion of lactate into the surrounding media. How do the authors explain that they see an increase in glucose uptake and ECAR but not in lactate and a decrease in pyruvate? Did the authors additionally quantify intracellular pyruvate and lactate? Since pyruvate and lactate are in constant equilibrium, it is odd that one metabolite is changing and the other one is not.

      Thank you for pointing this out. Since ECAR is often used as an alternative to lactate production but does not directly measure lactate levels, we measured changes in lactate and pyruvate concentrations in the culture medium. Under our experimental conditions, glucose appeared to be directed primarily towards anabolic processes, such as fatty acid synthesis, rather than the OXPHOS pathway, which may explain the lack of lactate production. The observed decrease in pyruvate might indicate its conversion to acetyl-CoA in the mitochondria, supporting both fatty acid synthesis and the TCA cycle. This shift would be consistent with the metabolic reprogramming toward anabolic activity.

      What do the authors mean by "the glycolytic pathway was not enhanced despite the activation of glycolysis"? Seahorse, especially using a series of pathway inhibitors, only provides an indirect measurement of glycolysis and oxphos since the instrument does not provide a distinction from which pathways the detected protons are originating. The authors should consider a more optimized experimental design, i.e. the authors could monitor ECAR and OCR in the presence of glucose over time with and without the addition of testosterone. That would be less invasive since the sperm are not starved at the beginning of the experiment and would provide a more direct read-out. Did the authors normalize cell numbers in their experiment? Alternatively, the authors could consider performing metabolomics experiments.

      We agree with the reviewer. Buzzwords such as “glycolytic capacity” simply do not make sense, so we have removed them from the phrases noted by the reviewer. Please refer to our responses to reviewer 1's comments regarding the ambiguity of the data measured by the flux analyzer. Nevertheless, the assay design of the flux analysis can serve as a good “starting point” and provide information on the glycolytic system and respiratory control. Therefore, the interpretation of the flux analysis is supported by subsequent data sets.

      (9) The authors would strengthen their results by confirming their gene expression data by quantifying the expression of the respective proteins.

      Does testosterone treatment increase GLUT4 protein levels in isolated seminal vesicle epithelial cells? Or does it change the localization of the transporter? Are GLUT4 gene and protein levels altered in flutamide-treated cells? How do the authors explain that testosterone increases glucose uptake without changing Glut gene expression?

      We performed Western blot analysis to measure GLUT4 protein levels in seminal vesicle epithelial cells after testosterone treatment. The results showed that testosterone does not alter the expression of GLUT4 protein but simply changes its subcellular localization (New Fig.6C,D, lines 238-244).

      The discussion includes the interpretation of the observation that testosterone increases glucose uptake by altering localization without altering GLUT4 gene expression, a phenomenon commonly seen in other cells, such as cardiomyocytes (lines 362-364). The revised main figure also includes a data set of changes in GLUT4 localization, including flutamide-treated data. See also Reviewer 3's main comment #1.

      (10) Considering that the authors claim that SV secretions are crucial for sperm fertilization capacity, how do they explain that fertilization rates are still at 40 % when sperm are treated with flutamide?

      The fertilization rate is actually about 50% with HTF alone, because fertilization can occur without SV. Considering this baseline, we found that seminal vesicle secretions positively affect in vivo fertilization by sperm. On the other hand, seminal plasma from flutamide-treated mice reduced the fertilization ability of healthy sperm. These points are described in the text (lines 283-294).

      (11) It would be beneficial for the reader to include a schematic summarizing the results.

      Thank you for your advice from the reader's point of view. We have visualized the summaries of this study and added them to the manuscript (New Fig.10).

      Minor comments:

      Line 38: Male fertility, no article, please revise.

      We have changed “The male fertility” to “Male fertility” and added some references (lines 42-43).

      Line 55: Seminal plasma or TGFb? Please clarify.

      Corrected as follows. “TGFβ, a component of seminal plasma, increases antigen-specific Treg cells in the uterus of mice and humans, which induces immune tolerance, resulting in pregnancy.” (lines 60-62)

      Line 63: Why do the authors find it surprising that blood and seminal plasma have different compositions?

      This is because seminal plasma contains unique biochemical components that are not normally found in blood or only in small quantities. The intention was to emphasize the unique function of seminal plasma in supporting the physiological functions of sperm and to highlight its complex role by comparing it to blood. We clarified these intentions and reflected them in the revised text (lines 62-67).

      Line 94: The headline causes confusion. Seminal plasma does not induce sperm motility, it increases progressive sperm motility.

      Corrected as follows. “The effect of androgen-dependent changes in mouse seminal vesicle secretions on the linear motility of sperm” (lines 101-102)

      Reviewer #3 (Recommendations For The Authors):

      Thank you for allowing us to strengthen our manuscript with your valuable comments and queries. We have made our best efforts to reflect your feedback.

      Major:

      Figure 4 and Figure 5: The trend shows that GLUT3 is up-regulated and GLUT4 is downregulated although both of them are not statistically significant. However, GLUT4 is picked for all the following experiments based on protein localization. Providing other evidence/discussion why not to further consider other GLUTs will help to justify. Also, this reviewer suggests including GLUT4 localization data in the main figure as it is important data for the logical flow to link the following figures.

      We focused on GLUT4 because it was known that testosterone increases glucose uptake by changing the localization of GLUT4 without changing its expression (lines 230-231). In the revised manuscript, the increasing trend in Glut3 gene expression was also mentioned in the discussion, in addition to GLUT4 (lines 360-362). In any case, the results showed that testosterone increased glucose uptake by regulating the function of glucose transporters.

      Immunostaining of GLUT1~4 was performed to compare seminal vesicles from flutamide-treated mice with controls, and localization changes were observed only in GLUT4. Therefore, we hypothesized that GLUT4 is regulated by testosterone and performed the experiment. Fortunately, we were able to obtain a GLUT4-specific inhibitor, which dramatically inhibited the testosterone-dependent glucose uptake and subsequent lipid synthesis in seminal epithelial cells, leading us to believe that GLUT4 is a major glucose transporter.

      Increasing sperm linearity by oleic acid is observed and interpreted as enhanced sperm fertilizing potential. It is not clear why and how sperm linearity can be a determinant factor for enhancing sperm fertility in vivo. Providing an explanation of the effect of oleic acid on another key motility parameter more proven to be directly correlated with fertility (i.e., hyperactivation), and more direct evidence of oleic acid on enhancing sperm linearity indeed increasing sperm fertilization using IVF, is strongly recommended to support the author's main conclusion.

      Thank you for pointing this out. It is known that proteins derived from the seminal vesicles inhibit the hyperactivation of sperm and the acrosome reaction. Therefore, we conducted an experiment to add oleic acid, focusing on fatty acid synthesis caused by the metabolic shift of the seminal vesicles, which had not been known until now.

      Sperm were pretreated with an oleic acid-containing medium before IVF and oleic acid enhanced sperm linearity. When the sperm number was sufficient, there was no change in the cleavage rate after in vitro fertilization, but when the sperm count was reduced to one-tenth of the normal, the cleavage rate increased compared to the control (lines 274-282). In other words, the physiological role of oleic acid is to increase the probability of fertilization by keeping the sperm motility pattern linear or progressive. This increases the likelihood of the sperm passing through the female reproductive tract and environments that are unfavorable to sperm survival. Our research has uncovered significant insights into the role of seminal vesicle fluid and oleic acid in sperm fertilization. Due to the strong effect of the decapacitation factor, we found that seminal vesicle fluid reduces the fertilization rate in IVF. However, it does not interfere with the fertilization rate in vivo during artificial insemination. This emphasizes the importance of oleic acid, along with other protein components of seminal plasma, in ensuring the in vivo fertilization ability of sperm.

      Minor:

      Please correct a typo in Line 173: sifts to shifts

      All typographical errors have been corrected.

    1. Author response:

      We plan to submit a revised version of our manuscript eLife-RP-RA-2024-105013, in which we address all comments raised by the two expert reviewers.

      Below we describe what we would like to address in this revision. We understand that the provisional response is not meant to be a point-by-point reply. Therefore, our revision plan more generally summarizes the comments of the reviewers and how we plan to address them.

      Reviewer #1:

      This reviewer is overall very positive and states that our ‘work is likely to become the go-to resource for quantification in this field’. This reviewer raises a few weaknesses of the manuscript, which are explicitly described as minor.

      Microscopic resolution sufficient to support quantitative spine assessments?

      In the detailed revision, we will provide quantification of microscopic resolution and will relate this to the spine comparisons offered. Where needed, we will add caveats discussing measurement limits.

      Age of the human tissue.

      Most of the analysis is based on the study of three brains from elderly individuals. For the analysis of dendritic spines, we added measures from a younger brain (37 years old). We will make clearer which datasets contained these measures and what the results of our comparative analysis have been.

      Genetic diversity contributing to species differences?

      We provide an updated discussion on this interesting topic.

      Reviewer #2:

      This reviewer also expresses a largely positive view of the manuscript, noting that ‘..the data will be of widespread interest to the cerebellar field…’. 

      Microscopic resolution:

      see above.

      Figure panels / Fig. 3:

      We will make sure that the figures are readable and will provide a clarification of gray scales used in Fig. 3.

      Vertical vs horizontal dendrite orientation:

      This is a point that requires clarification. Per our definition, all dendrites fall either into the vertical or horizontal category. We will make sure that this is defined sufficiently well.

    1. Author response:

      Response to Referee 1

      We agree that convex walls increase the time that consortia remain trapped in pores at high magnetic fields. Since the non-monotonic behavior of the drift velocity with the Scattering number arises largely due to these long trapping times, we agree that experiments using concave pores are likely to show a peak drift velocity that is diminished or erased.

      However, we disagree that a random packing of spheres or similar particles provides an appropriate model for natural sediment, which is not composed exclusively of hard particles in a pure fluid. Pore geometry is also influenced by clogging. Biofilms growing within a network of convex pillars in two-dimensional microfluidic devices have been observed to connect neighboring pillars, thereby forming convex pores. Similar pore structures appear in simulations of biofilm growth between spherical particles in three dimensions. Moreover, the salt marsh sediment in which MMB live is more complex than simple sand grains, as cohesive organic particles are abundant. Experiments in microfluidic channels show that cohesive particles clog narrow passageways and form pores similar to those analyzed here. Thus, we expect convex pores to be present and even common in natural sediment where clogging plays a role.

      The concentration of convex pores in the experiments presented here is almost certainly much higher than in nature. Nonetheless, since magnetotactic bacteria continuously swim through the pore space, they are likely to regularly encounter such convexities. Efficient navigation of the pore space thus requires that magnetotactic bacteria be able to escape these traps. In the original version of this manuscript, this reasoning was reduced to only one or two sentences. That was a mistake, and we thank the reviewer for prompting us to expand on this point. As the reviewer notes, this reasoning is central to the analysis and should have been featured more prominently. In the final version, we will devote considerable space to this hypothesis and provide references to support the claims made above.

      The reviewer suggests that the generality of this work depends on our finding a "positive correlation between the swimming speed and alignment [rate] based on parameters derived from literature." We wish to emphasize that, in addition to predicting this correlation, our theory also predicts the function that describes it. The black line in Figure 3 is not fitted to the parameters found in the literature review; it is a pure prediction.

      Response to Referee 2

      In the "Recommendations for the Authors," this reviewer drew our attention to a manuscript that absolutely should have been prominently cited. As the reviewer notes, our manuscript meaningfully expands upon this work. We are pleased to learn that the phenomena discussed here are more general than we initially understood. It was an oversight not to have found this paper earlier. The final version will better contextualize our work and give due credit to the authors. We sincerely appreciate the reviewer for bringing this work to our attention.

      We disagree that the use of non-culturable organisms and our unrealistic array should be considered serious weaknesses. While any methodological choice comes with trade-offs, we believe these choices best advance our aims. First, the goal of our research, both within and beyond this manuscript, is to understand the phenotypes of magnetotactic bacteria in nature. While using pure cultures enables many useful techniques, phenotypic traits may drift as strains undergo domestication. We therefore prioritize studying environmental enrichments.

      Clearly, an array of obstacles does not fully represent natural heterogeneity. However, using regular pore shapes allows us to average over enough consortium-wall collisions to enable a parameter-free comparison between theory and experiment. Conducting an analysis like this with randomly arranged obstacles would require averaging over an ensemble of random environments, which is practically challenging given the experimental constraints. Since we find good agreement between theory and experiment in simple geometries, we are now in a position to justify extending our theory to more realistic geometries. Additionally, we note that a microfluidic device composed of a random arrangement of obstacles would also be a poor representation of environmental heterogeneity, as pore shape and network topology differ between two and three dimensions.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary: 

      Cruz-González and colleagues draw on DNA methylation and paired genetic data from 621 participants (n=308 controls; n=313 participants with Alzheimer's Disease). The authors generate a panel of epigenetic biomarkers of aging with a primary focus on the Horvath multi-tissue clock. The authors find weaker correlations between predicted epigenetic age and chronological age in subgroups with higher African ancestry than within a subgroup identified as White. The authors then examine genetic variation as a potential source for between-group differences in epigenetic clock performance. The authors draw on a large collection of publicly available methylation quantitative trait loci datasets and find evidence for substantial overlap between clock CpGs located within the Horvath clock and methQTLs. Going further, the authors show that methQTLs that overlap with Horvath clock CpGs show greater allelic variation in African ancestral groups pointing to a potential explanation for poorer clock performance within this group. 

      Thank you for this summary.

      Strengths:  

      This is an interesting dataset and an important research question. The authors cite issues of portability regarding polygenic risk scores as a motivation to examine between-group differences in the performance of a panel of epigenetic clocks. The authors benefit from a diverse cohort of individuals with paired genetic data and focus on a clinical phenotype, Alzheimer's disease, of clear relevance for studies evaluating age-related biomarkers.  

      Weaknesses:  

      While the authors tackle an important question using a diverse cohort the current manuscript is lacking some detail that may diminish the potential impact of this paper. For example:  

      (1) Information on chronological ages across groups should be reported to ensure there are no systematic differences in ages or age ranges between groups (see point below).  

      Thank you for pointing out this omission. The age ranges are similar across cohorts. No individuals under 60 were considered, and the average ages per cohort ranged from 72 to 76. Neither average age nor age range was consistently higher or lower in the admixed cohorts for which the clocks had lower performance compared to the White cohort. We will report the age distributions in supplementary material in the revision.

      (2) The authors compare correlations between chronological age and epigenetic age in sub-groups to correlations reported by Horvath (2013). Attempting to draw comparisons between these two datasets is problematic. The current study has a much smaller N (particularly for sub-group analyses) and has a more restricted age range (60-90 yrs versus 0-100 yrs). Thus, is an alternative explanation simply that any weaker correlations observed in this study are driven by sample size and a restricted age range? Reporting the chronological ages (and ranges) across subgroups in the current study would help in this regard. Similarly, given the lack of association between AD status and epigenetic age (and very small effect in the white group), it may be of interest to examine the correlation between chronological age and epigenetic age in each group including the AD participants: would the between-group differences in correlations between chronological age and epigenetic age be altered by increasing the sample size?

      Our conclusions about the reduced accuracy of the clocks in admixed individuals are based on comparisons within the MAGENTA cohorts, not on the comparisons to previous reports. We show significantly reduced accuracy on African American and Puerto Rican cohorts in MAGENTA compared to the White MAGENTA cohort. The reviewer is correct that the lower correlation in each of the cohorts compared to those in the Horvath study is due to the older age range of our cohort. Indeed, other studies applying the Horvath clock have seen similar correlations to those observed on the White MAGENTA cohort (Marioni et al., 2015, Horvath 2013, and Shireby et al., 2020). Following the suggestion to increase sample size, we conducted the chronological age vs. epigenetic age correlation analysis with the inclusion of AD cases. The significantly lower performance of the clock on Puerto Ricans and African Americans relative to White individuals remains after including all individuals in each cohort. We will include these results on the full cohorts in MAGENTA in the revision.
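
      The effect of a restricted age range on the age correlation can be illustrated with a small calculation. The sketch below is not part of the study's analysis: it assumes ages are uniformly distributed over the stated range and that predicted age equals true age plus independent noise, and the 5-year error SD is a hypothetical value chosen only for illustration.

```python
import math

def expected_correlation(age_min, age_max, error_sd):
    """Expected Pearson r between true age and predicted age.

    Assumes ages are uniform on [age_min, age_max] and that
    predicted age = true age + independent noise with SD error_sd.
    """
    # Variance of a uniform distribution on [age_min, age_max].
    age_var = (age_max - age_min) ** 2 / 12.0
    return math.sqrt(age_var / (age_var + error_sd ** 2))

ERROR_SD = 5.0  # hypothetical clock error in years, not a value from the study

r_wide = expected_correlation(0, 100, ERROR_SD)
r_narrow = expected_correlation(60, 90, ERROR_SD)
print(f"0-100 yrs: r = {r_wide:.3f}")
print(f"60-90 yrs: r = {r_narrow:.3f}")
```

      Under these assumptions the same clock error yields a lower correlation in the narrower age range, which is why comparisons within a single study, over similar age ranges, are the more informative ones.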

      (3) The correlation between chronological age and epigenetic age, while helpful is not the most informative estimate of accuracy. Median absolute error (and an analysis of MAE across subgroups) would be a helpful addition.  

      We used correlation because this is commonly used to evaluate the performance of epigenetic age clocks, but we agree that direct error quantification provides a complementary perspective. We confirm that the African American and Puerto Rican cohorts have higher error than the White cohort, and we will report these comparisons in the revision.
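
      As a minimal sketch of how the two accuracy measures differ, the example below computes both the Pearson correlation and the median absolute error per cohort. The cohort names and all ages are made-up toy values, not MAGENTA data, and the helper functions are illustrative rather than the analysis code used in the study.

```python
from math import sqrt
from statistics import mean, median

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def median_absolute_error(chronological, predicted):
    """Median of |predicted age - chronological age|, in years."""
    return median(abs(p - c) for c, p in zip(chronological, predicted))

# Toy, hypothetical data: (chronological ages, epigenetic age predictions).
cohorts = {
    "cohort_a": ([62, 70, 75, 81, 88], [60, 73, 74, 85, 86]),
    "cohort_b": ([62, 70, 75, 81, 88], [70, 64, 85, 73, 95]),
}

results = {}
for name, (chron, pred) in cohorts.items():
    results[name] = (pearson_r(chron, pred), median_absolute_error(chron, pred))
    print(f"{name}: r = {results[name][0]:.2f}, MAE = {results[name][1]} yrs")
```

      Reporting both numbers per cohort separates how well predictions track age (correlation) from how far off they are in years (error).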

      (4) More information should be provided about how DNAm data were generated. Were samples from each ancestral group randomized across plates/slides to ensure ancestry and batch are not associated? How were batch effects considered? Given the relatively small sample sizes, it would be important to consider the impact of technical variation on measures of epigenetic age used in the current study. The use of principal component-based versions of these clocks (Higgins Chen et al., 2023; Nature Aging https://doi.org/10.1038/s43587-022-00248-2) may help address such concerns.  

      Thank you for pointing out the need for additional context on data generation. All omics data from the MAGENTA study were generated using protocols that aim to minimize technical artifacts and batch effects. We will add detailed protocol information in the revision. We also thank the reviewer for their suggestion on applying the principal component clock to account for potential technical variation. We are planning to perform these analyses and include them in the revision.

      (5) Marioni et al., (2015) found a very weak cross-sectional association between DNAm Age and cognitive function (r~0.07) in a cohort of >900 participants. Given these effect sizes, I would not interpret the absence of an effect in the current study to reflect issues of portability of epigenetic biomarkers. 

      We agree that previous links between DNAm Age and AD/cognitive function have been small in magnitude. For example, the PhenoAge paper (Levine et al., 2018) and a study using the Horvath clock (Levine et al., 2015) found age acceleration of less than a year in AD patients relative to non-demented individuals. These effects have been detected in studies with relatively small sample sizes (e.g., 700 for Levine et al. 2015 and 604 for Levine et al. 2018). Our study is of similar size, but the cohort-specific analyses have lower power. Nonetheless, we replicate the modest, but significant association with AD in the white MAGENTA cohort. We have performed power calculations and find that we have 26% power to detect an effect of this size in the Cubans, 46% for the Peruvians, 66% for the Whites, 74% for the Puerto Ricans, and 84% for the African Americans. Given the relatively high power in the Puerto Rican and African American cohorts, we suggest that the reduced accuracy of the clocks contributes to the lack of association. We will also add caveats about power and the small sample size in the revision.
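
      For readers unfamiliar with power calculations, the sketch below shows one common approximation: the power of a two-sided, two-sample z-test for a standardized mean difference. It is not the calculation performed for the study; the effect size and the sample sizes are hypothetical, and the numbers it prints are not meant to reproduce the percentages quoted above.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_power(effect_size, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference (Cohen's d).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Shift of the test statistic under the alternative hypothesis.
    shift = effect_size * sqrt(n_cases * n_controls / (n_cases + n_controls))
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

HYPOTHETICAL_D = 0.3  # illustrative effect size, not taken from the study

for label, n_cases, n_controls in [("small cohort", 30, 30), ("large cohort", 100, 100)]:
    power = two_sample_power(HYPOTHETICAL_D, n_cases, n_controls)
    print(f"{label}: power = {power:.2f}")
```

      The same effect size is much harder to detect in a smaller cohort, which is why cohort-specific null results need to be read alongside their power.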

      (6) The methQTL analyses presented are suggestive of potential genetic influence on DNAm at some Horvath CpGs. Do authors see differences in DNAm across ancestral groups at these potentially affected CpGs? This seems to be a missing piece (e.g., estimating the likely impact of methQTL on clock CpG DNAm). 

      Thank you for this excellent suggestion. We will add this analysis in the revision. This will enable us to test for further evidence for our hypothesis about the role of ancestry-specific meQTL on clock accuracy.  

      Reviewer #2 (Public review):

      Summary:  

      This paper seeks to characterize the portability of methylation clocks across groups. Methylation clocks are trained to predict biological aging from DNA methylation but have largely been developed in datasets of individuals with primarily European ancestries. Given that genetic variation can influence DNA methylation, the authors hypothesize that methylation clocks might have reduced accuracy in non-European ancestries.  

      Strengths:  

      The authors evaluate five methylation clocks in 621 individuals from the MAGENTA study. This includes approximately 280 individuals sampled in Puerto Rico, Cuba, and Peru, as well as approximately 200 self-identified African American individuals sampled in the US. To understand how methylation clock accuracy varies with proportion of non-European ancestry, the authors inferred local ancestry for the Puerto Rican, Cuban, Peruvian, and African American cohorts. Overall, this paper presents solid evidence that methylation clocks have reduced accuracy in individuals with non-European ancestries, relative to individuals with primarily European ancestries. This should be of great interest to those researchers who seek to use methylation clocks as predictors of age-related, late-onset diseases and other health outcomes.

      Thank you for this summary.

      Weaknesses:  

      One clear strength of this paper is the ability to do more sophisticated analyses using the local ancestry calls for the MAGENTA study. It would be valuable to capitalize on this strength and assess portability across the genetic ancestry spectrum, as was recently advocated by Ding et al. in Nature (2023). For example, the authors could regress non-European local ancestry fraction on measures of prediction accuracy. This could paint a clearer picture of the relationship between genetic ancestry and clock accuracy, compared to looking at overall correlations within each cohort. 

      Thank you for this excellent suggestion. We agree that modeling portability across genetic ancestry as a spectrum would help support our conclusions. We will add this to the revision.
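
      One simple way to treat ancestry as a spectrum is an ordinary least-squares fit of each individual's absolute prediction error against their non-European ancestry fraction. The sketch below uses made-up toy values and a hand-written fit; it is an illustration of the idea, not the model that will be used in the revision, which may include covariates or a different accuracy measure.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Toy, hypothetical individuals: ancestry fraction and absolute clock error (years).
ancestry_fraction = [0.0, 0.2, 0.4, 0.6, 0.8]
absolute_error = [2.1, 2.9, 4.2, 4.8, 6.0]

slope, intercept = least_squares_line(ancestry_fraction, absolute_error)
print(f"error = {slope:.2f} * ancestry_fraction + {intercept:.2f}")
```

      A positive slope in such a fit would indicate that error grows with ancestry fraction across individuals, rather than only differing between recruitment cohorts.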

      The authors present two possible reasons that methylation clocks might have reduced accuracy in individuals with non-European ancestries: genetic variants disrupting methylation sites (i.e., "disruptive variants") and genetic variants influencing methylation sites (i.e., meQTLs). The authors conclude disruptive variants do not contribute to poor methylation clock portability, but the evidence in support of this conclusion is incomplete. The site frequency spectrum of disruptive variants in Figure 4 is estimated from all gnomAD individuals, and gnomAD is comprised of primarily European individuals. Thus, the observation that disruptive variants are generally rare in gnomAD does not rule them out as a source of poor clock portability in admixed individuals with non-European ancestries. 

      Thank you for this question. The allele frequencies were so low that even if they all occurred in individuals of non-European ancestries, they would still be incredibly rare. Nonetheless, in the revision, we will make this clear by reporting ancestry-specific allele frequencies.

      It is also unclear to what extent meQTLs impact methylation clock portability. The authors find that the frequency of meQTLs is higher in African ancestry populations, but this could reflect the fact that some of the analyzed meQTLs were ascertained in African Americans. The number of meQTL-affected methylation sites also varies widely between clocks, ranging from 6 to 271; thus, meQTLs likely impact the portability of different clocks in different ways. Overall, the paper would benefit from a more quantitative assessment of the extent to which meQTLs influence clock portability. 

      We agree that the meQTL likely influence the clocks in different ways and that the ascertainment of the meQTLs in different populations makes direct comparisons challenging. To provide mechanistic insights into the ways that meQTL influence the methylation clocks, we plan to leverage the individual-level genetic data generated for the MAGENTA individuals. This will allow us to explore whether the individuals who have the specified clock-influencing meQTL receive less accurate predictions from the methylation clocks. In addition, the new analysis of whether individuals from different cohorts have different methylation levels at clock CpGs with ancestry-variable meQTLs will help establish the differences between groups (see response to Reviewer #1 point 6). Finally, to resolve potential bias due to ascertaining some of the meQTL in African Americans, we will conduct the same analyses from the manuscript, holding out the set of meQTL from African Americans. These results will be included in the revision.

      The paper implies that methylation clocks have an inferior ability to predict AD risk in admixed populations relative to white individuals, but the difference between white AD patients and controls is not significant when correcting for multiple testing. This nuance should be made more explicit. 

      We agree that the signal is not particularly strong in the white cohort, but the effect size is in line with previous studies. We will add power calculations and discussion to help the interpretation of these results (see response to Reviewer #1 point 5).  

      Finally, this paper overlooks the possibility that environmental exposures co-vary with genetic ancestry and play a role in decreasing the accuracy of methylation clocks in genetically admixed individuals. Quantifying the impact of environmental factors is almost certainly outside of the scope of this paper. However, it is worth acknowledging the role of environmental factors to provide the field with a more comprehensive overview of factors influencing methylation clock portability. It is also essential to avoid the assumption that correlations with genetic ancestry necessarily arise from genetic causes.  

      We entirely agree about the importance of discussing environmental exposures. We did not intend to discount them in our manuscript. We will clarify their potential role and the scope of our analyses in the revision. We expect that environmental factors certainly contribute to differences between groups. The revisions outlined above may help us better quantify the genetic contribution.

      Reviewer #3 (Public review):

      This manuscript examines the accuracy of DNA methylation-based epigenetic clocks across multiple cohorts of varying genetic ancestry. The authors find that clocks were generally less accurate at predicting age in cohorts with large proportions of non-European (especially African) ancestry, compared to cohorts with high European ancestry proportions. They suggest that some of this effect might be explained by meQTLs that occur near CpG sites included in clocks, because these variants may be at higher frequencies (or at least different frequencies) in cohorts with high proportions of non-European ancestry relative to the training set. They also provide discussions of potential paths forward to alleviate bias and improve portability for future clock algorithms.  

      The topic is timely due to the increasing popularity of DNA methylation-based clocks and the acknowledgment that many algorithms (e.g., polygenic risk scores) lack portability when applied to cohorts that substantially differ in ancestry or other characteristics from the training set. This has been discussed to some degree for DNA methylation-based clocks, but could of course use more discussion and empirical attention which the authors nicely provide using an impressive and diverse collection of data.  

      The manuscript is clear and well-written, however, some key background was missing (e.g., what we know already about the ancestry composition of clock training sets) and most importantly several analyses would benefit from being taken one step further. For example, the main argument of the paper is that ancestry impacts clock predictions, but this is determined by subsetting the data by recruitment cohort rather than analyzing ancestry as a continuous variable. Extending some of the analyses could really help the authors nail down their hypothesized sources of lack of portability, which is critical for making recommendations to the community and understanding the best paths forward.  

      Thank you for these suggestions. As noted in our response to reviewer #2, we will analyze ancestry as a continuous variable in the revision. We will also add details on the training of previous clocks and previous work on clock accuracy.

    1. Author response:

      We thank the reviewers for the careful review of our manuscript. Overall, they were positive about our use of cutting-edge methods to identify six inversions segregating in Lake Malawi. Their distribution across ~100 Lake Malawi species demonstrated that they were differentially segregating in different ecogroups/habitats and could potentially play a role in local adaptation, speciation, and sex determination. Reviewers were positive about our finding that the chromosome 10 inversion was associated with sex determination in a deep benthic species and its potential role in regulating traits under sexual selection. They agree that this work is an important starting point in understanding the role of these inversions in the amazing phenotypic diversity found in the Lake Malawi cichlid flock.

      There were two main criticisms that were made which we summarize:

      (1) Lack of clarity. It was noted that the writing could be improved to make many technical points clearer. Additionally, certain discussion topics were not included that should be.

      We will rewrite the text and add additional figures and tables to address the issues that were brought up in a point-by-point response. We will improve/include (1) the nomenclature to understand the inversions in different lineages, (2) improved descriptions for various genomic approaches, (3) a figure to document the samples and technologies used for each ecogroup, and (4) integration of LR sequences to identify inversion breakpoints to the finest resolution possible.

      (2) We overstate the role that selection plays in the spread of these inversions and neglect other evolutionary processes that could be responsible for their spread.

      We agree with the overarching point. We did not show that selection is involved in the spread of these inversions and other forces can be at play. Additionally, there were concerns with our model that the inversions introgressed from a Diplotaxodon ancestor into benthic ancestors and incomplete lineage sorting or balancing selection (via sex determination) could be at play. Overall, we agree with the reviewers with the following caveats. 1. Our analysis of the genetic distance between Diplotaxodons and benthic species in the inverted regions is more consistent with their spread through introgression versus incomplete lineage sorting or balancing selection. 2. This question of selection is much more complicated in the context of the Lake Malawi cichlid radiation with ~800 different species. We believe the role of these inversions must be considered in a species- and time-specific way. In other words, the evolutionary forces acting on these inversions at the time of their formation are likely different from those acting now. Further, the role of these inversions is likely different in different species. For example, the inversions on chromosomes 10 and 11 play a role in sex determination in some species but not others, and the potential pressures acting on the inverted and non-inverted haplotypes will be very different. These are very interesting and important questions both for understanding the adaptive radiations in Lake Malawi and in general, and we are actively studying crosses to understand the role of these inversions in phenotypic variation between two species. We will modify the text to make all of these points clearer.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Koren et al. derive and analyse a spiking network model optimised to represent external signals using the minimum number of spikes. Unlike most prior work using a similar setup, the network includes separate populations of excitatory and inhibitory neurons. The authors show that the optimised connectivity has a like-to-like structure, which leads to the experimentally observed phenomenon of feature competition. The authors also examine how various (hyper)parameters-such as adaptation timescale, the excitatory-to-inhibitory cell ratio, regularization strength, and background current-affect the model. These findings add biological realism to a specific implementation of efficient coding. They show that efficient coding explains, or at least is consistent with, multiple experimentally observed properties of excitatory and inhibitory neurons. 

      As discussed in the first round of reviews, the model's ability to replicate biological observations such as the 4:1 ratio of excitatory vs. inhibitory neurons hinges on somewhat arbitrary hyperparameter choices. Although this may limit the model's explanatory power, the authors have made significant efforts to explore how these parameters influence their model. It is an empirical question whether the uncovered relationships between, e.g., metabolic cost and the fraction of excitatory neurons are biologically relevant.

      The revised manuscript is also more transparent about the model's limitations, such as the lack of excitatory-excitatory connectivity. Further improvements could come from explicitly acknowledging additional discrepancies with biological data, such as the widely reported weak stimulus tuning of inhibitory neurons in the primary sensory cortex of untrained animals.

      We thank the Reviewer for their insightful characterization of our paper and for further suggestions on how to improve it. We have now further improved the transparency about the model’s limitations, and we now explicitly acknowledge the discrepancy with biological data about connection probability and about the selectivity of inhibitory neurons (pages 4 and 15).

      Reviewer #2 (Public review): 

      Summary: 

      In this work, the authors present a biologically plausible, efficient E-I spiking network model and study various aspects of the model and its relation to experimental observations. This includes a derivation of the network into two (E-I) populations, the study of single-neuron perturbations and lateral-inhibition, the study of the effects of adaptation and metabolic cost, and considerations of optimal parameters. From this, they conclude that their work puts forth a plausible implementation of efficient coding that matches several experimental findings, including feature-specific inhibition, tight instantaneous balance, a 4 to 1 ratio of excitatory to inhibitory neurons, and a 3 to 1 ratio of I-I to E-I connectivity strength.

      Strengths: 

      While many network implementations of efficient coding have been developed, such normative models are often abstract and lacking sufficient detail to compare directly to experiments. The intention of this work to produce a more plausible and efficient spiking model and compare it with experimental data is important and necessary in order to test these models. In rigorously deriving the model with real physical units, this work maps efficient spiking networks onto other more classical biophysical spiking neuron models. It also attempts to compare the model to recent single-neuron perturbation experiments, as well as some long-standing puzzles about neural circuits, such as the presence of separate excitatory and inhibitory neurons, the ratio of excitatory to inhibitory neurons, and E/I balance. One of the primary goals of this paper, to determine if these are merely biological constraints or come from some normative efficient coding objective, is also important. Lastly, though several of the observations have been reported and studied before, this work arguably studies them in more depth, which could be useful for comparing more directly to experiments.

      Weaknesses: 

      This work is the latest among a line of research papers studying the properties of efficient spiking networks. Many of the characteristics and findings here have been discussed before, thereby limiting the new insights that this work can provide. Thus, the conclusions of this work should be considered and understood in the context of those previous works, as the authors state. Furthermore, the number of assumptions and free parameters in the model, though necessary to bring the model closer to biophysical reality, make it more difficult to understand and to draw clear conclusions from. As the authors state, many of the optimality claims depend on these free parameters, such as the dimensionality of the input signal (M=3), the relative weighting of encoding error and metabolic cost, and several others. This raises the possibility that it is not the case that the set of biophysical properties measured in the brain are accounted for by efficient coding, but rather that theories of efficient coding are flexible enough to be consistent with this regime. With this in mind, some of the conclusions made in the text may be overstated and should be considered in this light.

      Conclusions, Impact, and additional context: 

      Notions of optimality are important for normative theories, but they are often studied in simple models with as few free parameters as possible. Biophysically detailed and mechanistic models, on the other hand, will often have many free parameters by their very nature, thereby muddying the connection to optimality. This tradeoff is an important concern in neuroscientific models. Previous efficient spiking models have often been criticized for their lack of biophysically-plausible characteristics, such as large synaptic weights, dense connectivity, and instantaneous communication. This work is an important contribution in showing that such networks can be modified to be much closer to biophysical reality without losing their essential properties. Though the model presented does suffer from complexity issues which raise questions about its connections to "optimal" efficient coding, the extensive study of various parameter dependencies offers a good characterization of the model and puts its conclusions in context.

      We thank the Reviewer for their thorough and accurate assessment of our paper.  

      Reviewer #3 (Public review): 

      Summary: 

      In their paper the authors tackle three things at once in a theoretical model: how can spiking neural networks perform efficient coding, how can such networks limit the energy use at the same time, and how can this be done in a more biologically realistic way than previous work. 

      They start by working from a long-running theory on how networks operating in a precisely balanced state can perform efficient coding. First, they assume split networks of excitatory (E) and inhibitory (I) neurons. The E neurons have the task to represent some lower dimensional input signal, and the I neurons have the task to represent the signal represented by the E neurons. Additionally, the E and I populations should minimize an energy cost represented by the sum of all spikes. All this results in two loss functions for the E and I populations, and the networks are then derived by assuming E and I neurons should only spike if this improves their respective loss. This results in networks of spiking neurons that live in a balanced state, and can accurately represent the network inputs. 

      They then investigate in depth different aspects of the resulting networks, such as responses to perturbations, the effect of following Dale's law, spiking statistics, the excitation (E)/inhibition (I) balance, optimal E/I cell ratios, and others. Overall, they expand on previous work by taking a more biological angle on the theory and show the networks can operate in a biologically realistic regime.

      Strengths: 

      * The authors take a much more biological angle on the efficient spiking networks theory than previous work, which is an essential contribution to the field

      * They make a very extensive investigation of many aspects of the network in this context, and do so thoroughly

      * They put sensible constraints on their networks, while still maintaining the good properties these networks should have

      Weaknesses: 

      * One of the core goals of the paper is to make a more biophysically realistic network than previous work using similar optimization principles. One of the important things they consider is a split into E and I neurons. While this works fine, and they consider the coding consequences of this, it is not clear from an optimization perspective why the split into E and I neurons and following Dale's law would be beneficial. This would be out of scope for the current paper however.

      * The theoretical advances in the paper are not all novel by themselves, as most of them (in particular the split into E and I neurons and the use of biophysical constants) had been achieved in previous models. However, the authors discuss these links thoroughly and do more in-depth follow-up experiments with the resulting model. 

      Assessment and context: 

      Overall, although much of the underlying theory is not necessarily new, the work provides an important addition to the field. The authors succeeded well in their goal of making the networks more biologically realistic, and incorporate aspects of energy efficiency. For computational neuroscientists this paper is a good example of how to build models that link well to experimental knowledge and constraints, while still being computationally and mathematically tractable. For experimental readers the model provides a clearer link of efficient coding spiking networks to known experimental constraints and provides a few predictions.

      We thank the Reviewer for a positive assessment and for pointing out the merits of our work.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      The authors have addressed my previous concerns, and I agree that the manuscript has improved. However, I believe they could still do more to acknowledge two notable mismatches between the model and experimental data.

      (1) Stimulus selectivity of excitatory and inhibitory neurons 

      In the model, excitatory and inhibitory neurons exhibit similar stimulus selectivity, which appears inconsistent with most experimental findings. The authors argue that whether inhibitory neurons are less selective remains an open question, citing three studies in support. However, only one of these studies (Ranyan) was conducted in primary sensory cortex and it is, to my knowledge, one of the few papers showing this (indeed, it's often cited as an exception). The other two studies (Kuan and Najafi) recorded from the parietal cortex of mice trained on decision making tasks, and therefore seem less relevant to the model.

      In contrast to the cited studies, the overwhelming majority of the work has found that inhibitory neurons in sensory cortex, in particular those expressing Parvalbumin, are less stimulus selective than excitatory cells. And this is indeed the prevailing view, as summarized by the review from Hu et al. (Science, 2014): "PV+ interneurons exhibit broader orientation tuning and weaker contrast specificity than pyramidal neurons." This view emerged from numerous classical studies, including Sohya et al. (J. Neurosci., 2007), Cardin (J. Neurosci., 2007), Nowak (Cereb. Cortex, 2008), Niell et al. ( J. Neurosci., 2008), Liu (J. Neurosci., 2009), Kerlin (Neuron, 2010), Ma et al. (J. Neurosci., 2010), Hofer et al. (Nature Neurosci. 2011), and Atallah et al. (Neuron 2012). Weak inhibitory tuning has been confirmed by recent studies, such as Sanghavi & Kar (biorxiv 2023), Znamenskiy et al. (Neuron 2024), and Hong et al. (Nature, 2024).

      The authors should acknowledge this consensus and cite the conflicting evidence. Failing to do so is cherry picking from the literature. Since training can increase the stimulus selectivity of PV+ neurons to that of Pyr levels, also in primary visual cortex (Khan et al. Neuron 2018), a favourable interpretation of the model is that it represents a highly optimized, if not overtrained, state.

      We have carefully considered the literature cited by the Reviewer. We agree with the interpretation that stimulus selectivity of inhibitory neurons in our model is higher than the stimulus selectivity of Parvalbumin-positive inhibitory neurons in the primary sensory cortex of naïve animals. We have edited the text in Discussion (page 14).

      (2) Connection probability 

      The manuscript claims that "rectification sets the overall connection probability to 0.5, consistent with experimental results (Pala & Petersen; Campagnola et al.)." However, the cited studies, and others, report significantly lower probabilities, except for Pyr-PV (E-I connections in the model). For example, Campagnola et al. measured PV-Pyr connectivity at 34% in L2/3 and 20% in L5.

      It's perfectly acceptable that the model cannot replicate every detail of biological circuits. But it's important to be cautious when claiming consistency with experimental data.

      Here as well, we agree with the Reviewer that the connection probability of 0.5 is consistent with the reported connectivity of Pyr-PV neurons, but less so with the reported connectivity of PV-Pyr neurons. We have now made our claim about the compatibility of the connection probability in our model with empirical observations more precise (page 4).
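
      As a rough illustration of why rectification alone would give a connection probability near 0.5, the following is a minimal sketch under assumptions that are not taken from the manuscript: each neuron is assigned a random Gaussian decoding vector with three features (matching M=3), the unrectified weight between two neurons is the dot product of their vectors, and negative weights are set to zero. The function name and all parameters are hypothetical.

```python
import random


def connection_probability(n_pre=200, n_post=200, n_features=3, seed=0):
    """Fraction of nonzero weights after rectifying a like-to-like weight matrix.

    Assumption (illustrative only): the unrectified weight between two neurons
    is the dot product of their random Gaussian decoding vectors.
    """
    rng = random.Random(seed)

    def random_vector():
        return [rng.gauss(0.0, 1.0) for _ in range(n_features)]

    pre = [random_vector() for _ in range(n_pre)]
    post = [random_vector() for _ in range(n_post)]

    connected = 0
    for u in pre:
        for v in post:
            similarity = sum(a * b for a, b in zip(u, v))
            weight = max(similarity, 0.0)  # rectification removes negative weights
            if weight > 0.0:
                connected += 1
    return connected / (n_pre * n_post)


prob = connection_probability()
print(f"connection probability after rectification: {prob:.3f}")
```

      Because the dot product of two independent, symmetric random vectors is equally likely to be positive or negative, about half of the weights survive rectification in this toy setting. It does not address cell-type-specific probabilities such as those reported for PV-Pyr connections.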

      Reviewer #2 (Recommendations for the authors): 

      I commend the authors for an extremely thorough and detailed rebuttal, and for all of the additional work put in to address the reviewer concerns. For the most part, I am satisfied with the current state of the manuscript. 

      We thank the Reviewer for recognizing our effort to address the first round of Reviews to our best ability.

      Here are some small points still remaining that I think the authors should address: 

      (1) Pg. 8, "We verified the robustness of the model to small deviations from the optimal synaptic weights" - while the authors now cite Calaim et al. 2022 in the discussion, its relevance to several of the results justify its inclusion in other places. Here is one place where the authors test something that was also studied in this previous paper.

      The Reviewer is correct that Calaim et al. (eLife 2022) addressed the robustness of synaptic weights, and we now cite this study when describing our results on jittering of synaptic connections (page 8).

      (2) Pg. 9, "In our optimal E-I network we indeed found that optimal coding efficiency is achieved in absence of within-neuron feedback or with weak adaptation in both cell types" Pg. 10, "the absence of within-neuron feedback or the presence of weak and short-lasting spike-triggered adaptation in both E and I neurons are optimally efficient solutions" The authors seem to state that both weak adaptation and no adaptation at all are optimal. In contrast to the rest of the results presented, this is very vague and does not give a particular level of adaptation as being optimal. The authors should make this more clear. 

      We agree that the text about optimal level of adaptation was unclear. The optimal solution is no adaptation, while weak and short-lasting adaptation define a slightly suboptimal, yet still efficient, network state, as now stated on page 10.

      (3) Pg. 13, "In summary our analysis suggests that optimal coding efficiency is achieved with four times more E neurons than I neurons and with mean I-I synaptic efficacy about 3 times stronger..." --- claims such as these are still too strong, in my opinion. It is rather the case that the particular ratio of E to I neurons and connections strengths can be made consistent with an optimally efficient regime.

      We agree here as well. We have revised the text (page 13) to better explain our results.

      (4) Pg. 14, "firing rates in the 1CT model were highly sensitive to variations in the metabolic constant" (Fig. 8I, as compared to Fig. 6C). This difference between the 1CT and E-I networks is striking, and I would suspect it is due to some idiosyncrasies in the difference between the two models (e.g., the relative amount of delay that it takes for lateral inhibition to take effect, or the fact that E-E connections have not been removed in this model). The authors should ideally back up this result with some justified explanation. 

      We agree with the Reviewer that the delay for lateral inhibition in the E-I model is twice that of the 1CT model and that the E-I model gains stability from the lack of E-E connectivity. Furthermore, the tuning is stronger in I compared to E neurons in the E-I model, which contributes to making the E-I network inhibition-dominated (Fig. 1H). In contrast, the average excitation and inhibition in the 1CT model are of exactly the same magnitude. The property of being inhibition-dominated makes the E-I model more stable. We report these observations in the revised text (pages 14-15).

      Reviewer #3 (Recommendations for the authors): 

      Overall my points were very well responded to and I removed most of my weaknesses.

      I appreciate the authors implementing my suggested analysis change for Figure 8, and I find the result very clear. I would further suggest they add a bit of text for the reader as to why this is done. For a new reader without much knowledge of these networks at first it seems the inhibitory population is very good at representation in fig 8G: so why is it not further considered in fig 8H?

      We thank the Reviewer for providing further suggestions. We have now clarified in the text why only the excitatory population of the E-I model is considered in the comparison of the E-I and 1CT models (page 14).

      Thanks for sharing the code. From a quick browse through it looks very manageable to implement for follow up work, although some more guidance for how to navigate the quite complicated codebase and how to reproduce specific paper results would be helpful.

      We have also updated the code repository, where we have included more complete instructions on how to reproduce the results of each figure. We renamed the folders with the computer code so that they point to a specific figure in the paper. The repository has been completed with the output of the numerical simulations we ran, which allows immediate replotting of all figures. We have deposited the repository at Zenodo to have the final version of the code associated with the DOI https://doi.org/10.5281/zenodo.14628524. This is mentioned in the section Code availability (page 17).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 Public Review:

      Summary

      This very short paper shows a greater likelihood of C->U substitutions at sites predicted to be unpaired in the SARS-CoV-2 RNA genome, using previously published observational data on mutation frequencies in  SARS-CoV-2 (Bloom and Neher, 2023).

      General comments

      A preference for unpaired bases as a target for APOBEC-induced mutations has been demonstrated previously in functional studies so the finding is not entirely surprising. This of course assumes that A3A or other APOBEC is actually the cause of the majority of C->U changes observed in SARS-CoV-2 sequences.

      I'm not sure why the authors did not use the published mutation frequency data to investigate other potential influences on editing frequencies, such as 5' and 3' base contexts. The analysis did not contribute any insights into the potential mechanisms underlying the greater frequency of C->U (or G->U) substitutions in the SARS-CoV-2 genome.

      I have added additional discussions of mechanisms focusing on the question of whether basepairing bias is primarily driven by secondary structure dependence of underlying mutation rates or by conservation of secondary structure (Discussion lines 178–192) and I added a brief analysis of the 5′ and 3′ contexts of the relationship between being basepaired in a secondary structure model and apparent mutational fitness (Figures S1 and S2, Results lines 85–97). I found that the 5′ context of unpaired, but not paired basepairs influences apparent mutational fitness (preference for 5′ U), and that the  is also . Additionally, there is a 3′ preference for G, indicating some CpG suppression. This contrasts to some degree with another analysis based on counting lineage frequencies that may have lacked power to detect relatively small effects (Simmonds, mBio 2024).

      Reviewer 1 Author recommendations:

      There are at least 5 publications describing the mapping/prediction of SARS-CoV-2 RNA secondary structure from 2022-2023 and their predictions are not entirely consistent. Why did the authors only refer to the Lan et al. paper?

      I have added comparisons in which the Lan et al. secondary structure model is replaced by one of two others derived from SHAPE data (Results lines 110–122). Unsurprisingly, similar secondary structure models give similar results, and performance is modestly higher for the models from Lan et al. This is consistent with their observations that DMS reactivities performed better as classifiers of SL5 and ORF1 secondary structure (the reason I compared to this secondary structure model and reactivity data set rather than others), but I did not go into detail on this in the revision since there are many differences in methods beyond class of reactivity probe. For example, the somewhat stronger correlation for the Vero than the Huh7 dataset in Lan et al. could arise from combining data from two replicates, from cell type, or from differences in data analysis methods. It’s also a small difference and cannot be confidently distinguished from noise.

      I conducted a preliminary comparison of the performance of DMS and SHAPE data for predicting mutations where DMS data is available, but I opted against including this analysis in the manuscript for the same reasons. Instead, I included in results and discussion comments on how, in general, reactivity data contains information that is predictive of substitution rates that is not captured by binary secondary structure models. I also discuss how multiple data sources can potentially be integrated to more accurately predict the impact of a substitution on fitness (Discussion lines 195–201).

      Specific substitutions are referred to as C->T and C29085T for example, but as the genome of SARS-CoV-2 is RNA, and T should be a U.

      I agree and I have changed all “T” to “U” in the paper and analysis scripts. The choice of “T” was motivated by what seemed to appear most frequently in papers on SARS-CoV-2 mutational spectra, but “U” is nearly universal in papers on secondary structure and mutation mechanisms, so I agree it makes more sense in this paper.

      The C29085T substitution is somewhat non-canonical as it is a single base bulge in a longer duplex section of dsRNA, very unlike the favoured sites for mutation in the Nakata et al paper.

      I have added a discussion of Nakata et al. (NAR 2023) (Introduction lines 29–32). I did not go into this depth in the revision, but the analysis of ~2M patient sequences in Nakata et al. also noted a high rate of UUC→UUU substitution, so the UUUC context of C29095 (shared by 3 of the 10 positions highlighted in Nakata et al. that had high mutation frequencies with exogenous APOBEC3A expression) could be interesting to investigate further.

      High C29095U substitution frequency is indeed somewhat at odds with the results in that work, which found UC→UU substitutions to be elevated in longer single-stranded regions than the context of C29095U in SARS-CoV-2 secondary structure models (a single unpaired base opposing three unpaired bases in an asymmetric internal loop).

      I'm not sure why DMS reactivity is considered a separate variable from pairing likelihood as one informs the other.

      The intent here, which was not clear, was to show that a binary basepairing model that uses DMS reactivities as constraints does not capture all of the information available. I have clarified this as described above, in discussing the information in different reactivity datasets.

      The C29095U substitution is also relevant to the consideration of DMS reactivity in addition to the resulting secondary structure model. These are not considered as separate predictors and the reason for showing both is mentioned in the paper: “DMS reactivity was more strongly correlated with estimated mutational fitness than basepairing when analysis was limited to positions with detectable DMS reactivity.” I have clarified this in the revised manuscript, and it is also relevant to the discussion of a potential model integrating all available datasets.

      Reviewer 2 Public Review:

      Hensel investigated the implications of SARS-CoV-2 RNA secondary structure in synonymous and nonsynonymous mutation frequency. The analysis integrated estimates of mutational fitness generated by Bloom and Neher (from publicly available patient sequences) and a population-averaged model of RNA basepairing from Lan et al (from DMS mutational profiling with sequencing, DMS-MaPseq).

      The results show that base-pairing limits the frequency of some synonymous substitutions (including the most common C→T), but not all: G→A and A→G substitutions seem unaffected by base-pairing.

      The author then addressed nonsynonymous C→T substitutions at base-paired positions. While there is still a generally higher estimated mutational fitness at unpaired positions, they propose a coarse adjustment to disentangle base-pairing from inherent mutational fitness at a given position. This adjustment reveals that nonsynonymous substitutions at base-paired positions, which define major variants, have higher mutational fitness.

      Overall, this manuscript highlights the importance of considering RNA secondary structure in viral evolution studies.

      The conclusions of this work are generally well supported by the data presented. Particularly, the author acknowledges most limitations of the analyses, and addresses them. Even though no new sequencing results were generated, the author used available data generated from the analysis of roughly seven million sequenced patient samples. Finally, the author discusses ways to improve the current available models.

      There are a number of limitations of this work that should be highlighted, specifically in regard to the secondary structure data used in this paper. The Lan et al. dataset was generated using a multiplicity of infection (MOI) of 0.05, 24 hours post-infection (h.p.i.). At such a low MOI and late timepoint, viral replication is not synchronous and sequencing artifacts might be generated by cell debris and viral RNA degradation, therefore impacting the population-averaged results. In addition, the nonsynonymous base-paired positions in Figure 2 have relatively high population-averaged DMS reactivity, which suggests those positions are dynamic. Therefore, the proposed adjustment could result in an incorrect estimation of their inherent mutational fitness.

      I would go further than this to say that the proposed adjustment will usually result in an incorrect estimate. My intent is to propose an improved, but still likely incorrect, estimate by utilizing in vitro data to refine baseline mutation rates in order to obtain improved, but only coarsely adjusted, estimates of mutational fitness. I added a note in the discussion that in vitro reactivities (and, consequently, secondary structure models) may not reflect secondary structures in vivo (Discussion lines 204–205). I did not go into detail regarding the specific technical considerations mentioned here because they are outside the scope of my expertise.

      I am not sure that top-ranked non-synonymous C→U positions have particularly high DMS values after coarse adjustment for basepairing (labeled amino acid mutations in Figure 2). Of the six common mutations used as examples, three have minimum values in the dataset considered (which is processed normalized/filtered data rather than raw data) and three do not have very high DMS reactivity.

      However, there is clearly information in base reactivity that is not captured by a binary basepairing model, which is indicated by residual positive correlation between DMS reactivity and mutational fitness after adjustment. I now include a figure demonstrating this for synonymous C→U substitutions as Figure S3, and I have tried to clarify the language throughout the manuscript to make it clear that a more accurate adjustment is possible.

      Additionally, like all such RNA probing experiments within cells, it remains difficult to deconvolve DMS/SHAPE low reactivity with RNA accessibility (e.g. from protein binding).

      I agree, and in revising this manuscript it was interesting to see that Nakata et al. (discussed above) identified relatively large single-stranded regions with enhanced UC→UU substitution frequencies with exogenous APOBEC3A expression, while C29095U, for example, is a single unpaired base with high DMS reactivity and high empirical C→U substitution frequency (discussed briefly in the introduction of the revised manuscript). Future analyses could consider heterogeneity in secondary structure as well as secondary structures with low heterogeneity where strained conformations could have higher reactivity.

      This work presents clear methods and an easy-to-access bioinformatic pipeline, which can be applied to other RNA viruses. Of note, it can be readily implemented in existing datasets. Finally, this study raises novel mechanistic questions on how mutational fitness is not correlated to secondary structure in the same way for every substitution.

      Overall, this work highlights the importance of studying mutational fitness beyond an immune evasion perspective. On the other hand, it also adds to the viral intrinsic constraints to immune evasion.

      Reviewer 2 Author recommendations:

      Even though the experiment was not performed in this manuscript, it would be helpful for the readers if it was briefly explained how secondary structure is inferred from DMS reactivity, as this technique is not broadly used.

      It is not objective to refer to the Lan et al. model of RNA structure as "high quality" given the limitations of their experimental approach (low MOI, asynchronous infection, DMS-only, no long-range interactions) and the lack of external validation of the structure of the genome they propose.

      I removed “high-quality” from the abstract. Since a result of the paper is that secondary structure correlates with synonymous substitution rates, this is an observation that can be used to retrospectively compare the quality of secondary structure models in this respect. I updated the manuscript to include such a comparison, and did not find a large difference between secondary structure models (Results lines 110–122). I added a discussion of how multiple data sources can potentially be integrated to more accurately predict the impact of a substitution on viral fitness.

      I have also added a brief discussion of constraints on how much we can confidently infer from these experiments given limitations of the experimental approach. I note that DMS and SHAPE data provide information that can be combined to make a stronger model, and that predictions can be rapidly tested given observations by Gout (Symonds?) et al that in vitro substitution rates correlate with those observed during the pandemic (Discussion lines 195–201).

      Mutational fitness from Bloom & Neher was derived throughout the pandemic, much of which came from a period with the most active surveillance (Delta / Omicron waves). Consequently, these viruses differ from the WA1 strain used by Lan et al. far more than the 3 nt differences between lineage A and B that the author refers to. The following sentence should therefore be revised to avoid misleading the reader:

      "Additionally, note that DMS data was obtained in experiments using the WA1 strain in Lineage A, which differs from the more common Lineage B at 3 positions and could have different secondary structure."

      Revised:

      “Additionally, note that DMS data was obtained in experiments using the WA1 strain in Lineage A, which differs from the more common Lineage B at 3 positions and could have different secondary structure. Furthermore, mutational fitness is estimated from the phylogenetic tree of published sequences (the public UShER tree (Turakhia et al., 2021) additionally curated to filter likely artifacts such as branches with numerous reversions) that are typically far more divergent and subsequently will have somewhat different secondary structures. Since the dataset used for mutational fitness aggregates data across viral clades, my analysis will not capture secondary structure variation between clades or indels and masked sites that were not considered in that analysis (Bloom and Neher, 2023).”

      To determine the extent to which the results depend on the single RNA structure model, it would be informative to "turn the crank again" on the analysis with one of the other RNA structure datasets for SARS-CoV-2 (though most other datasets suffer from similar problems of asynchronicity of infection).

      I have added comparisons in which the Lan et al secondary structure model is replaced by one of two others derived from SHAPE data as described above. Also, I conducted preliminary comparisons of underlying DMS and SHAPE reactivity data as described above, but I opted not to include these in the revised manuscript given that the methods differed beyond the chemical probe used. I also discuss how multiple data sources can potentially be integrated to more accurately predict the impact of a substitution on viral fitness.

      In Figure 1 it would be helpful to add the values of the unpaired/basepaired ratios in the plot for clarity.

      Furthermore, a similar analysis using the substitution frequency, which strengthens the conclusions, is mentioned in the text, however, it is not shown. It could be shown as part of Figure 1, or as a supplementary figure.

      This was a good suggestion since numbers around 1 are not perceived as being very significant. I added the ratio of median unpaired:paired rates to Figure 1, updated the corresponding manuscript text and the figure caption, and note that the numbers are somewhat changed from the first version of my manuscript because of updating to use the most up-to-date mutational fitness estimates.

      It is not clear how the two constants were calculated to obtain the "adjusted mutational fitness". It could be shown as part of Figure 2, or as a supplementary figure.

      I added dashed lines and arrows to Figure 2 showing median paired/unpaired mutational fitnesses and the adjustment made to normalize to the overall median. I also added Figure S3 showing this for synonymous substitutions, where it is more clear given the lower fraction of mutations with substantial fitness impacts.
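
      The adjustment described above can be illustrated with a minimal sketch. It assumes the two constants are simply the offsets between each group's median and the overall median, which the response suggests but does not spell out; the function name `adjust_fitness` and the toy values are hypothetical and are not taken from the manuscript.

```python
from statistics import median


def adjust_fitness(fitness, paired):
    """Shift paired and unpaired sites so each group's median equals the overall median.

    Assumption: the two constants are (overall median - group median).
    """
    overall = median(fitness)
    paired_vals = [f for f, p in zip(fitness, paired) if p]
    unpaired_vals = [f for f, p in zip(fitness, paired) if not p]
    offset_paired = overall - median(paired_vals)
    offset_unpaired = overall - median(unpaired_vals)
    return [
        f + (offset_paired if p else offset_unpaired)
        for f, p in zip(fitness, paired)
    ]


# Toy, hypothetical fitness estimates: three basepaired and three unpaired sites.
fitness = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
paired = [True, True, True, False, False, False]
adjusted = adjust_fitness(fitness, paired)
```

      After the shift, both groups share the overall median, so any remaining difference at a given site is not explained by pairing status alone.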

      Minor comments

      Statements like "the current fast-growing lineage JN.1.7" never age well... please revise to state the period of time to which this refers.

      Revised:

      “…lineage JN.1.7, which had over 20% global prevalence in Spring 2024…”

      Also, I checked the list of mutations and the examples given remain in the top 15 ranked basepaired, non-synonymous C→U mutations (BA.2-defining C26060U is added to the list, but I did not update to include this). It replaces C9246U, which was not mentioned in the first version of the manuscript.

      Similarly, please provide context for the reader in the phrase: "This was one mutation that characterized the B.1.177 lineage" (e.g. add its early reference as "EU1" and that it predominated in Europe in autumn 2020, prior to the emergence of the Alpha variant).

      Revised to add detail:

      This was one of the mutations that characterized the B.1.177 lineage. This lineage, also known as EU1, characterized a majority of sequences in Spain in summer 2020 and eventually in several other countries in Europe prior to the emergence of the Alpha variant. However, it was unclear whether this lineage had higher fitness than other lineages or if A222V specifically conferred a fitness advantage.

      "massive sequencing of SARS-CoV-2" - the meaning of the word "massive" is unclear. Revise.

      Revised: “…millions of patient SARS-CoV-2 sequences published during the pandemic…”

    1. Author response:

      The following is the authors’ response to the original reviews.

      We were pleased that many of the critical comments of the reviewers have allowed us to improve our manuscript. In addition to revising the originally submitted figures, we performed new experiments (e.g. new Fig.2, Fig.3, Fig.4, and Fig.6) and revised the manuscript substantially following the reviewers’ comments and suggestions on our initial submission. A point-by-point response to the reviewers’ critiques is provided below, and new supportive data are provided in this revised manuscript. Per the Reviewers’ comments and revisions, we revised the title to be “Cold induces brain region-selective cell activity-dependent lipid metabolism”.

      Reviewer #1:

      Strengths:

      A strength of the study is trying to better understand how metabolism in the brain is a dynamic process, much like how it has been viewed in other organs. The authors also use a creative approach to measuring in vivo lipid peroxidation via delivery of a BD-C11 sensor through a cannula to the region in conjunction with fiber photometry to measure fluorescence changes deep in the brain.

      We thank the Reviewer so much for the positive comments on this interesting study on metabolism in the brain.

      Weaknesses:

      One weakness was many of the experiments were done in a manner that could not distinguish between the contributions of neurons and glial cells, limiting the extent of conclusions that could be made. While this is not easily doable for all experiments, it can be done for some. For example, the Fos experiments in Figure 3 would be more conclusive if done with the labeling of neuronal nuclei with NeuN, as glial cells can also express Fos. To similarly show more conclusively that neurons are being activated during cold exposure, the calcium imaging experiments in Figure S3 can be done with cold exposure. 

      We agree with the Reviewer’s comments. We revised the original Figure 3 (new Figure 6) and Figure S3 (new Figure S4). Our data show that cold increased Fos-positive cells in the PVH (Figure 6) and increased neuronal Ca2+ signals (new Figure S4). Because it is difficult to exclude the involvement of astrocytes in cold-induced lipid metabolism, we revised the title and the text, replacing “neuronal” activity with “cell” activity, and we conclude that cold-induced lipid metabolism depends on “cell activity” rather than “neuronal activity”. Studying cell type-specific contributions to the cold-induced effects on lipid metabolism, to which we assume both neurons and glial cells contribute, will require substantial effort beyond the scope of this study.

      Additionally, many experiments are only done with the minimal three animals required for statistics and could be more robust with additional animals included.  

      We thank this reviewer for the comments. We added the sample sizes accordingly in this revised manuscript.

      Another weakness is that the authors do not address whether manipulating lipid droplet accumulation or lipid peroxidation has any effect on PVH function (e.g. does it change neuronal activity in the region?).

      We thank this reviewer for bringing up this interesting point. The focus of this study was to examine how cold modulates lipid metabolism in the brain. How brain lipid metabolism (e.g. manipulating LD accumulation or lipid peroxidation) modulates neuronal activity is another interesting project, which will require substantial effort beyond the scope of this study. Manipulating LD or peroxidation would affect multiple cellular signaling pathways, and physiological experimental conditions would need to be developed. However, to address this reviewer’s question, we performed preliminary studies in which we treated brain slices with the lipid peroxidation inhibitor a-TP and recorded PVH neurons, but did not observe differences in firing rates between a-TP-treated brain slices and controls (data not shown).

      Reviewer #2:

      Strengths:

      A set of relatively novel and interesting observations. Creative use of several in vivo sensors and techniques.

      We thank the Reviewer so much for the positive comments on our studies in both concept and techniques. 

      Weaknesses:  

      (1) The physiological relevance of lipolysis and thermogenesis genes in the PVH. The authors need to provide quantitative and substantial characterizations of lipid metabolism in the brain beyond a panel of qPCRs, especially considering these genes are likely expressed at very low levels. mRNA and protein level quantification of genes in Fig 1, in direct comparison to BAT/iWAT, should be provided. Besides bulk mRNA/protein, IHC/ISH-based characterization should be added to confirm the cellular expression of these genes.

      We agree with the Reviewer’s comments and thank this reviewer for the constructive suggestions. To address them, we performed additional experiments to verify cold-induced expression of lipolytic genes and proteins. For example, we stained ATGL and HSL in both neurons and astrocytes in the PVH. Consistent with the increased gene expression, cold increased protein expression of ATGL (new Figure 2) and HSL (new Figure 3) in both neurons and astrocytes. We also performed western blots of p-HSL and HSL and observed that cold increased the expression level of p-HSL (new Figure 4). These new results support our conclusions and further demonstrate that cold increases lipid metabolism in the PVH.

      (2) The fiber photometry work they cited (Chen 2022, Andersen 2023, Sun 2018) used well-established, genetically encoded neuropeptide sensors (e.g., GRABs). The authors need to first quantitatively demonstrate that adapting BD-C11 and EnzCheck for in vivo brain FP could effectively and accurately report peroxidation and lipolysis. For example, the sensitivity, dynamic range, and off-time should all be calibrated with mass spectrometry measurements before any conclusions can be made based on plots in Figures 4, 5, and 6. This is particularly important because the main hypothesis heavily relies on this unvalidated technique.

      We thank this reviewer for the comments. Fiber photometry has been well demonstrated to detect fluorescent-labelled biomolecules in our laboratory and other labs, as indicated in the publications cited above. In this study, we combined photometry with commercially developed and well-validated fluorescent-labelled biomarkers of lipid metabolism to monitor lipid metabolic dynamics in vivo. We verified this approach in both brain (this study) and peripheral adipose tissues (another project). In particular, our data in this study show that the lipid peroxidation inhibitor a-TP blocked the cold-induced lipid peroxidation signals (Fig. 7A-C) and the pan-lipase inhibitor DEUP blocked the cold-induced lipolytic signals (Fig. 8A-C). These results demonstrate that the signals detected by photometry indeed reflect lipid peroxidation and lipolysis, respectively, in the brain. We agree with the reviewer’s suggestion regarding mass spectrometry measurements, but it is not feasible for us to perform mass spectrometry in the brain in vivo at this moment.

      (3) Generally, the histology data need significant improvement. It was not convincing, for example, in Figure 3, how the Fos+ neurons can be quantified based on the poor IF images where most red signals were not in the neurons. 

      We thank this reviewer for this comment. We performed additional experiments to increase the sample size and now present high-quality images.

      (4) The hypothesis regarding the direct role of brain temperature in cold-induced lipid metabolism is puzzling. From the introduction and discussion, the authors seem to suggest that there are direct brain temperature changes in response to cold, which could be quite striking. However, this was not supported by any data or experiments. The authors should consolidate their ideas and update a coherent hypothesis based on the actual data presented in the manuscript.

      We thank this reviewer for bringing up this comment and constructive suggestions. To make this study more concise on the cold-induced lipid metabolism, we removed the statements related to the brain temperature.

      Reviewer #1 (Recommendations For The Authors):

      An additional minor weakness is that the authors are redundant in their discussion, sometimes repeating sections from the introduction (e.g. this line in the discussion "Evidence shows that the brain's energy expenditure efficiency largely depends on the temperature (Yu et al., 2012), and temperature gradients between different brain regions exist (Anderson and Moser, 1995; Delgado and Hanai, 1966; Hayward and Baker, 1968; McElligott and Melzak, 1967; Moser and Mathiesen, 1996; Thornton, 2003)"). 

      We thank the Reviewer for these comments. We revised the text following the suggestions accordingly and removed the statements and references related to brain temperatures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary: 

      IPF is a disease lacking regressive therapies which has a poor prognosis, and so new therapies are needed. This ambitious phase 1 study builds on the authors' 2024 experience in Sci Tran Med with positive results with autologous transplantation of P63 progenitor cells in patients with COPD. The current study suggests that P63+ progenitor cell therapy is safe in patients with ILD. The authors attribute this to the acquisition of cells from a healthy upper lobe site, removed from the lung fibrosis. There are currently no cell-based therapies for ILD and in this regard the study is novel with important potential for clinical impact if validated in Phase 2 and 3 clinical trials. 

      Strengths: 

      This study addresses the need for an effective therapy for interstitial lung disease. It offers good evidence that the cells used for therapy are safe. In so doing it addresses a concern that some P63+ progenitor cells may be proinflammatory and harmful, as has been raised in the literature (articles which suggested some P63+ cells can promote honeycombing fibrosis; references 26 & 35). The authors attribute the safety they observed (without proof) to the high HOPX expression of administered cells (a marker found in normal Type 1 AECs). The totality of the RNASeq suggests the cloned cells are not fibrogenic. They also offer exploratory data suggesting a relationship between clone roundness and PFT parameters (and a negative association between patient age and clone roundness).

      We thank the reviewer for the important comments.

      Weaknesses: 

      The authors can conclude they can isolate, clone, expand, and administer P63+ progenitor cells safely; but with the small sample size and lack of a placebo group, no efficacy should be implied.

      We thank the reviewer for the suggestion and agree that we should be more cautious in discussing the efficacy observed in the current study.

      Specific points: 

      (1) The authors acknowledge most study weaknesses including the lack of a placebo group and the concurrent COVID-19 in half the subjects (the high-dose subjects). They indicate a phase 2 trial is underway to address these issues. 

      N/A

      (2) The authors suggest an efficacy signal on pages 18 (improvement in 2 subjects' CT scans) and 21 (improvement in DLCO) but with such a small phase 1 study and such small increases in DLCO (+5.4%) the authors should refrain from this temptation (understandable as it is). 

      We believe that exploring potential efficacy signals is also one aim of this study. All of these efficacy endpoint analyses were planned prior to the start of the clinical trial (as registered at ClinicalTrials.gov), and the data needed to be analyzed regardless.

      (3) Likewise most CT scans were unchanged and those that improved were in the mid-dose group (albeit DLCO improved in the 2 patients whose CT scans improved). 

      Yes, it is.

      (4) The authors note an impressive 58m increase in 6MWTD in the high-dose group but again there is no placebo group, and the low-dose group has no net change in 6MWTD at 24 weeks. 

      Yes.

      (5) I also raise the question of the enrollment criteria in which 5 patients had essentially normal DLCO/VA values. In addition there is no discussion as to whether the transplanted stem cells are retained or exert benefit by a paracrine mechanism (which is the norm for cell-based therapies).

      Thank you for your detailed feedback. The enrollment criteria are based on DLCO rather than DLCO/VA. We will further discuss the possible benefit of a paracrine mechanism in the revised manuscript.

      Recommendations for the authors: 

      (1) Four of the enrolled subjects had normal DLCO/VA (% of predicted) (>90% of predicted). This raises questions about the severity of their illness see: Table 1: Subjects 103, 105, 112, and 204 have DLCO/VA % predicted >90% of predicted and would appear not to qualify for the study. While technically enrollment criteria for DLCO are satisfied, DLCO/VA is an equally valid measure of ILD severity, and these 4 cases seem very mild. 

      Thank you for your detailed feedback. Yes, the current inclusion criteria are based on DLCO, not DLCO/VA. We believe that improvement in both DLCO and DLCO/VA is meaningful. In future trials, we will consider DLCO/VA as an inclusion criterion as well.

      (2) The authors state "Resolution of honeycomb lesion was also observed in patients of higher dose groups". This appears inaccurate as only 2 subjects in the study showed CT improvement and they were not in the highest dose group. This statement is an overreach for a Phase 1 study and should be removed from the abstract and more balance inserted in the text. The phase 2 study they are doing will answer these questions. 

      Thank you. We changed our statement about efficacy in the abstract.

      a) Under exclusion criteria: More detail is required as to what defines "subjects who cannot tolerate cell therapy". 

      Patients who could not tolerate a previous cell therapy, for example mesenchymal stem cell transplantation, were not included in the current trial.

      b) Figure S6 is important and should be in the main manuscript. This Figure shows that many (6) subjects had COVID at some trial measurement time points. This is an unfortunate confounder for efficacy signals (but efficacy is not the point of this study). Second, Figure 6 (in my view) shows little efficacy signal, which is a reminder to the authors that efficacy should not be implied in a study that was not powered to detect efficacy. 

      We agree that efficacy should be discussed very carefully.

      (3) Figure S3: It appears at some doses there is a significant rise in monocytes (1M cells) and neutrophils (3 M cells).

      Thank you for your reasonable concerns regarding the safety of the treatment. The monocyte counts of the patients shown in Figure S3, even after the increase, remain within the reference range, and we therefore consider this elevation not to be clinically meaningful. One patient exhibited a significant increase in neutrophils at 24 weeks, which was attributed to a grade II adverse event, acute bronchitis, that was unrelated to cell therapy. The symptoms resolved within three days following treatment with appropriate medication.

      (4) Figure 3: I wonder about the statistical significance of the 6MWD. Was this done by repeat measure ANOVA? The analysis suggests a p=0.08 but all error bars between low and high dose overlap and the biggest difference is at 24 weeks, and that appears to be labelled as not significant.

      Thank you for the reminder. The 6MWD result with a p-value of 0.008 was derived by comparing the 6MWD at the 24-week time point with baseline within the higher-dose group. Therefore, a paired t-test was used for this analysis. In the revised version, we label these comparisons more clearly.
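
      For readers less familiar with the within-group comparison described above, a paired t-test can be sketched as follows. This is only an illustration using the standard library; the walking distances are invented placeholder values, not trial data, and the function name `paired_t_statistic` is hypothetical. It assumes the paired differences are not all identical, since the standard deviation would otherwise be zero.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(baseline, followup):
    """Return (t, degrees of freedom) for a paired t-test on matched measurements."""
    if len(baseline) != len(followup) or len(baseline) < 2:
        raise ValueError("need at least two matched pairs")
    # The test is run on within-subject differences, not on the two groups separately.
    diffs = [after - before for before, after in zip(baseline, followup)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical 6MWD values in metres for four subjects (placeholders only).
baseline = [400, 420, 380, 450]
week_24 = [450, 480, 420, 520]
t, df = paired_t_statistic(baseline, week_24)
```

      The resulting statistic is compared against a t distribution with `df` degrees of freedom to obtain the p-value. Because each subject serves as their own control, overlapping error bars between groups do not by themselves rule out a significant within-group change.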

      Reviewer #2 (Public review):

      Summary: 

      This manuscript describes a first-in-human clinical trial of autologous stem cells to address IPF. The significance of this study is underscored by the limited efficacy of standard-of-care anti-fibrotic therapies and increasing knowledge of the role of p63+ stem cells in lung regeneration in ARDS. While models of acute lung injury and p63+ stem cells have benefited from widespread and dynamic DAD and immune cell remodeling of damaged tissue, a key question in chronic lung disease is whether such cells could contribute to the remodeling of lung tissue that may be devoid of acute and dynamic injury. A second question is whether normal regions of the lung in an otherwise diseased organ can be identified as a source of "normal" p63+ stem cells, and how to assess these stem cells given recently identified p63+ stem cell variants emerging in chronic lung diseases including IPF. Lastly, questions of feasibility, safety, and efficacy need to be explored to set the foundation for autologous transplants to meet the huge need in chronic lung disease. The authors have addressed each of these questions to different extents in this initial study, which has yielded important if incomplete information for many of them.

      Strengths: 

      As with a previous study from this group regarding autologous stem cell transplants for COPD (Ref. 24), they have shown that the stem cells they propagate do not form colonies in soft agar or cancers in these patients. While a full assessment of adverse events was confounded by a wave of Covid19 infections in the study participants, aside from brief fevers it appears these transplants are tolerated by these patients. 

      We thank the reviewer for the important comments.

      Weaknesses: 

      The source of stem cells for these autologous transplants is generally bronchoscopic biopsies/brushings from 5th-generation bronchi. Although stem cells have been cloned and characterized from nasal, tracheal, and distal airway biopsies, the systematic cloning and analysis of p63+ stem cells across the bronchial generations is less clear. For instance, p63+ stem cells from the nasal and tracheal mucosa appear committed to upper airway epithelia marked by 90% ciliated cells and 10% goblet cells (Kumar et al., 2011. Ref. 14). In contrast, p63+ stem cells from distal lung differentiate to epithelia replete with Club, AT2, and AT1 markers. The spectrum of p63+ stem cells in the normal bronchi of any generation is less studied. In the present study, cells are obtained by bronchoscopy from 3-5 generation bronchi and expanded by in vitro propagation. Single-cell RNA-seq identifies three clusters they refer to as C1, C2, and C3, with the major C1 cluster said to have characteristics of airway basal cells and C2 possibly the same cells in states of proliferation. Perhaps the most immediate question raised by these data is the nature of the C1/C2 cells. Whereas they are clearly p63/Krt5+ cells as are other stem cells of the airways, do they display differentiation character of "upper airway" marked by ciliated/goblet cell differentiation or those of the lung marked by AT2 and AT1 fates? This could be readily determined by 3-D differentiation in so-called air-liquid interface cultures pioneered by cystic fibrosis investigators and should be done as it would directly address the validity of the sourcing protocol for autologous cells for these transplants. This would more clearly link the present study with a previous study from the same investigators (Shi et al., 2019, Ref. 9) whereby distal airway stem cells mitigated fibrosis in the murine bleomycin model.
The authors should also provide methods by which the autologous cells are propagated in vitro as these could impact the quality and fate of the progenitor cells prior to transplantation. 

      We fully agree that the sub-populations of the progenitor cells should be further analyzed, and we will attempt this in the revised manuscript. The methods to expand P63+ lung progenitor cells have been described in full detail by the Frank McKeon/Wa Xian group (Rao et al., STAR Protocols, 2020) and were adapted to pharmaceutical-grade technology patented by Regend Therapeutics, Ltd.

      The authors should also make a more concerted effort to compare Clusters 1, 2, and 3 with the variant stem cell identified in IPF (Wang et al., 2023, Ref. 27). While some of the markers are consistent with this variant stem cell population, others are not. A more detailed informatics analysis of normal stem cells of the airways and any variants reported could clarify whether the bronchial source of autologous stem cells is the best route to these transplants.  

      We thank the reviewer for the good suggestion and will make a more detailed comparison in the revised manuscript.

      Other than these issues the authors should be commended for these first-in-human trials for this important condition.

      Thank you so much for the kind compliment.

      Recommendations for the authors: 

      Described in the review text but the authors need to be clear about how they propagated autologous stem cells in vitro.

      (1) Perhaps the most immediate question raised by these data is the nature of the C1/C2 cells. Whereas they are clearly p63/Krt5+ cells as are other stem cells of the airways, do they display differentiation character of "upper airway" marked by ciliated/goblet cell differentiation or those of the lung marked by AT2 and AT1 fates?

      The differentiation potential of P63+/KRT5+ basal progenitor cells has been analyzed in multiple previous studies, which are cited in the revised introduction. Briefly, human P63+ progenitor cells can differentiate into airway epithelial cells in the airway, while giving rise to immature but functional AT1 cells in the alveolar area.

      (2) The authors should also provide methods by which the autologous cells are propagated in vitro as these could impact the quality and fate of the progenitor cells prior to transplantation.

      The methods to expand P63+ lung progenitor cells have been described in full detail by the Frank McKeon/Wa Xian group (Rao et al., STAR Protocols, 2020) and were adapted to pharmaceutical-grade technology patented by Regend Therapeutics, Ltd.

      (3) A more detailed informatics analysis of normal stem cells of the airways and any variants reported could clarify whether the bronchial source of autologous stem cells is the best route to these transplants.

      We thank the reviewer for the kind suggestion and have included the comparative analysis in revised Figure S2.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1 (Public review)

      Weaknesses: 

      The main weakness of the manuscript is that to a large degree, one of its main conclusions (MAP symmetry underlies differences in regenerative capacity) relies mainly on a correlation, without firmly establishing a causal link. However, this weakness is relatively minor because (1) it is partially addressed with the Spastin KO and (2) there isn't a trivial way to show a causal relationship in this case.

      We thank Reviewer #1 for their positive assessment of our manuscript. To further strengthen the claim that MAP asymmetry underlies differences in regenerative capacity, we could investigate the effect of depleting other MAPs that lose asymmetry after conditioning lesion (CRMP5 and katanin). One would expect that similarly to spastin, this would disrupt the physiological asymmetry of DRG axons and impair axon regeneration. We further discussed this issue in the revised version of the manuscript (page 17, line 381).

      Reviewer #2 (Public review)

      Weaknesses:

      In order for the method to be used it needs to be better described. For instance what proportion of neurons develop just two axonal branches, one of which is different? How selective are the researchers in finding appropriate neurons?

      We thank Reviewer #2 for their positive assessment of our manuscript. As suggested, we included further methodological details on the in vitro system in the revised version of the manuscript. We have previously evaluated the percentage of DRG neurons exhibiting different morphologies in our cultures: multipolar (4±1%), bipolar (35±8%), bell-shaped (17±5%), and pseudo-unipolar neurons (43±3%). This was included in the revised manuscript in Figure 1B and on page 5, line 107. All the pseudo-unipolar neurons analysed had distinct axonal branches in terms of diameter and microtubule dynamics. For imaging purposes, we selected pseudo-unipolar neurons with axons unobstructed by other cells or neurites within a distance of at least 20–30 μm from the bifurcation point, to ensure optimal imaging. In the case of laser axotomy experiments, this distance was increased to 100–200 μm to ensure clear analysis of regeneration. These selection criteria are now detailed in the Methods (page 19, line 417, and page 21, line 474).

      Reviewer #3 (Public review):

      (1) Weaknesses:

      While some of the data are compelling, experimental evidence only partially supports the main claims. In its current form, the study is primarily descriptive and lacks convincing mechanistic insights. It misses important controls and further validation using 3D in vitro models.

      We recognize the importance of further exploring the contribution of other MAPs to microtubule asymmetry and regenerative capacity of DRG axons. In future work, we plan to investigate this issue using knockout mice for katanin and CRMP5. Regarding the mechanisms underlying the differential localization of proteins in DRG axons, we performed in-situ hybridization to evaluate the availability of axonal mRNA but no differences were found between central and peripheral DRG axons (Figure 4 – figure supplement 2). To address whether differences in protein transport exist, we attempted to transduce DRG neurons with GFP-tagged spastin both in vitro and in vivo. However, these experiments were inconclusive as very low levels of spastin-GFP were detected. We are actively optimizing these approaches and will address this challenge in future studies. These points were further discussed in the revised manuscript (page 15, line 330 and page 17, line 381).

      (2) Given the heterogeneity of dorsal root ganglion (DRG) neurons, it is unclear whether the in vitro model described in this study can be applied to all major classes of DRG neurons. 

      We acknowledge the diversity of DRG neurons and agree that assessing the presence of different DRG subtypes in our culture system will enrich its future use. Despite this heterogeneity, we focused on DRG neuron features that are common to all subtypes, i.e., pseudo-unipolarization and higher regenerative capacity of peripheral branches. This point was addressed on page 14, line 309 of the revised manuscript.

      (3) Also unclear is the inconsistency with embryonic DRG cultures with embryonic (E)16 from rats and E13 from mice (spastin knockout and wild-type controls). 

      Given our previous experience in establishing DRG neuron cultures from E16 Wistar rats and E13 C57BL/6 mice, these developmental stages are equivalent, yielding cultures of DRG neurons with similar percentages of different morphologies. Of note, in our colonies, gestation length is ~19 days in C57BL/6 mice (background of the spastin knockout line) and ~22 days in Wistar Han rats. This was further clarified in the Methods (page 18, line 404).

      (4) Furthermore, the authors stated (line 393) that only a small subset of cultured DRG neurons exhibited a pseudo-unipolar morphology. The authors should include the percentage of the neurons that exhibit a pseudo-unipolar morphology.

      We have previously evaluated the percentage of DRG neurons exhibiting different morphologies in our cultures: multipolar (4±1%), bipolar (35±8%), bell-shaped (17±5%), and pseudo-unipolar neurons (43±3%). This was included in the revised manuscript in Figure 1B and on page 5, line 107. In line 393, we referred specifically to an experimental setup where DRG neuron transduction was done, and 30 transduced neurons were randomly selected for longitudinal imaging. From these, the number of viable pseudo-unipolar DRG neurons was limited by both the random nature of viral transduction and light-induced toxicity throughout continuous imaging over seven consecutive days at hourly intervals. This was clarified in the revised manuscript (page 20, line 438).

      (5) The significance of studying microtubule polymerization to DRG asymmetry in vitro is questionable, especially considering the model's validity. The authors might consider eliminating the in vitro data and instead focus on characterizing DRG asymmetry in vivo both before and after a conditioning lesion. If the authors choose to retain the in vitro data, classifying the central and peripheral-like branches in cultured DRG neurons will require further in-depth characterization. Additional validation should be performed in adult DRG neuron cultures not aged in vitro.

      The in vitro system presented here reliably reproduces several key features of DRG neurons observed in vivo, including asymmetry in axon diameter, regenerative capacity, axonal transport, and microtubule dynamics. Of note, most studies in the field have used multipolar DRG neurons, which do not recapitulate in vivo morphology and asymmetries. Thus, the current in vitro model serves as a versatile tool for advancing our understanding of DRG biology and associated diseases. This system is particularly suited to studying axon regeneration asymmetries, and enables the investigation of mechanisms occurring at the stem axon bifurcation, such as asymmetric protein transport and microtubule dynamics, which are challenging to examine in vivo due to the length of the stem axon and the difficulty of locating the DRG T-junction. It will be important to optimize similar cultures using adult DRG neurons. However, this comes with challenges, such as lower cell viability. The same is true of many other neuron types, for which the vast majority of cultures are obtained from embryonic tissue. These concerns were addressed in the revised version of the manuscript (page 13, line 296, and page 14, line 302).

      (6) The comparison of asymmetry associated with a regenerative response between in vitro and in vivo paradigms has significant limitations due to the nature of the in vitro culture system. When cultured in isolation, DRG neurons fail to form functional connections with appropriate postsynaptic target neurons (the central branch) or to differentiate the peripheral domains associated with the innervation of target organs. Rather than growing neurons on a flat, hard surface like glass, more physiologically relevant substrates and/or culturing conditions should be considered. This approach could help eliminate potential artifacts caused by plating adult DRG neurons on a flat surface. Additionally, the authors should consider replicating their findings in a 3D culture model or using dorsal root ganglia explants, where both centrally and peripherally projecting axons are present.

      We agree that a more sophisticated system, such as a compartmentalized culture, holds great potential for future research, and we are currently developing such models. A compartmentalized system would enable the separation of three compartments: central nervous system neurons, DRG neurons, and peripheral targets. While previous efforts to create compartmentalized DRG cultures have been reported (e.g., PMID: 11275274 and PMID: 37578145), these systems have not demonstrated the development of pseudo-unipolar morphology. Incorporating non-neuronal DRG cells into the DRG neuron compartment may support the development of a pseudo-unipolar morphology.

      We also recognize the importance of dimensionality in fostering pseudo-unipolar morphology. Of note, our model provides a 3D-like environment, as DRG glial cells are continuously replicating over the 21 days in culture. In relation to DRG explants, we attempted their use but encountered limitations with confocal microscopy as the axial resolution was insufficient to resolve processes at the DRG T-junction or within individual branches. The above issues are now discussed in the revised manuscript (page 14, line 312).

      (7) Panels 5H-J require additional processing with astrocyte markers to accurately define the lesion borders. Furthermore, including a lower magnification would facilitate a direct comparison of the lesion site. 

      In our study, we relied on the alignment of nuclei to delineate the lesion site because, in our accumulated experience, this provides an accurate definition of the lesion border. Outside the lesion, the nuclei are well aligned, while at the lesion site they become randomly distributed. Additionally, CTB staining further supports the identification of the rostral border of the lesion, as most injured central DRG axons stop their growth at the injury site. This was further detailed in the Methods of the revised manuscript (page 32, line 730).

      (8) The use of cholera toxin subunit B (CTB) to trace dorsal column sensory axons is prone to misinterpretation, as the tracer accumulates at the axon's tip. This limitation makes it extremely challenging to distinguish between regenerating and degenerating axons.

      While alternative methods to trace or label regenerating axons exist, CTB is a well-established and widely used tracer for central sensory projections, as shown in different studies (PMID: 22681683, PMID: 26831088 and PMID: 33349630). Regarding the concern of possible CTB labeling in degenerating axons, we believe this is unlikely to be the case in our system, as CTB-positive axons are nearly absent in spinal cord injury controls. Also, as regeneration was investigated six weeks after injury, axon degeneration has most likely already occurred, as shown previously (PMID: 15821747 and PMID: 25937174).

      Recommendations for the authors: 

      Reviewer #1:

      (1) Figure 1 can be improved by adding a quantification of the fraction of neurons at each stage as a function of time.

      We have updated Figure 1 to include the quantification of the percentages of different DRG neuron morphologies at DIV21 (Figure 1B), which corresponds to the stage at which all in vitro experiments were conducted.

      (2) Figure 3A: why are retrograde transport events not shown?

      Retrograde transport events are not displayed as results did not reach statistical significance.

      (3) Figure 3 and 4: Combine the quantifications of with/without lesion, such that not only the differences between branches are apparent, but also the differences induced in each branch by the lesion.

      As requested, only combined quantifications of microtubule dynamics for naive and conditioning lesion are provided in the revised version of Figure 3 (Figures 3H and 3K), to highlight both branch-specific differences and lesion-induced changes. However, for Figure 4, as the western blots for naive and conditioning lesion were performed on separate gels, it is unfeasible to combine their quantification.

      (4) Figure 5: does spastin KO lead to a difference in the "MAP signature" of each branch? Also, if in addition to MAPs there are other known molecules (and an antibody is available) that show differential localization to peripheral/central branches, it would be nice to check if this asymmetry is also lost in spastin KO.

      Evaluating the MAP signature in DRG axons from spastin KO mice will be important to explore in future experiments. Despite some scattered reports in the literature, our study is the first to identify a distinct protein signature of central and peripheral DRG axons. This is especially relevant in the case of Tau, as irrespective of the experimental conditions, its levels are always increased in the peripheral DRG axon.

      Reviewer #2:

      (1) Please provide a more complete description of the culture method. Do all neurons develop two asymmetric branches or just a few, and how are they selected? Does the timing of the events in vitro correspond with what is happening to the neurons in embryos?

      We have included the percentages of the various DRG neuron morphologies at DIV21 in the revised manuscript (Figure 1B and on page 5, line 107). Additionally, a more detailed description of the culture method is now provided in the Methods, including the criteria used to select pseudo-unipolar neurons (page 19, line 417, and page 21, line 474). 

      Regarding the timing of events, upon DRG dissociation, neurons reinitiate polarization, taking 21 days to reach approximately 40% pseudo-unipolar morphology. A similar percentage is reached at E16.5 during rat development in vivo (PMID: 8729965).

      (2) Are the neurons and their branches resting on the glia? Is there any relation to the presence of glia and the type of growth that is seen?

      Yes, neurons and their branches rest on glia. This is required for DRG pseudo-unipolarization. In future studies, we plan to further investigate the neuron-extrinsic mechanisms leading to pseudo-unipolarization, and to identify the specific glial cell type(s) needed throughout this process. This is now discussed in the revised manuscript (page 14, line 306).

      (3) Is it possible to trace microtubules so as to see whether the microtubules of the two branches mix, or whether they remain separate all the way to the cell bodies?

      We used DRG neurons transduced with EB3-GFP to examine microtubule polymerization at the T-junction through live imaging. This revealed a high continuum of polymerization from the stem axon to the central-like axon (Figure 4 – figure supplement 2D-G). To further determine whether microtubules from both branches mix or remain separate, alternative techniques such as FIB-SEM could be used. This point is now further discussed in the revised manuscript (page 16, line 352).

      (4) Using the term MAPs would lead readers to expect to see an analysis of different levels of MAP1, MAP2, etc. It would be interesting to see this if the authors have done it, but it is not necessary for the paper.

      We assessed the expression of MAP2 via western blot in DRG peripheral and central axons and found no differences. This is now referred to in the Discussion (page 15, line 327).

      (5) The regeneration experiments on the spastin knockouts are complicated by the lesion being in CNS tissue, which introduces various issues. Is there a difference in regeneration after dorsal root crush?

      We have not yet examined whether regeneration differs after dorsal root crush in the spastin knockout model. However, this is an interesting question, as Schwann cells in the dorsal root may support regeneration of central DRG axons.

      Reviewer #3:

      The authors stated that the normality of the datasets was tested using the Shapiro-Wilk or D'Agostino-Pearson omnibus normality test. Given the low sample size (n=4) for some of the experiments presented (e.g., Figure 3B), it is not clear how normality was assessed which justifies the use of parametric tests.

      We followed GraphPad’s recommendations for selecting the appropriate normality test (https://www.graphpad.com/support/faqid/959/). The D'Agostino-Pearson omnibus K2 test, recommended for its versatility, was used when sample size was 8 or more. For smaller sample sizes (n < 8), we used the Shapiro-Wilk test, which is also widely used in biological research and can be employed with datasets of at least 3 values. These tests guided our decision-making regarding the use of parametric or non-parametric statistical tests.
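
      The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis (which was done in GraphPad): the function names are hypothetical, the thresholds (at least 3 values for Shapiro-Wilk, 8 or more for D'Agostino-Pearson) are taken from the response, and the use of SciPy's `shapiro` and `normaltest` as stand-ins for the GraphPad tests is an assumption.

```python
def choose_normality_test(n: int) -> str:
    """Pick a normality test from the sample size, following the rule in the response."""
    if n < 3:
        # Shapiro-Wilk needs at least 3 values; below that no test is attempted.
        raise ValueError("at least 3 values are needed for a normality test")
    # n < 8: Shapiro-Wilk; n >= 8: D'Agostino-Pearson omnibus K2.
    return "shapiro-wilk" if n < 8 else "dagostino-pearson"


def run_normality_test(values):
    """Run the chosen test with SciPy (assumed stand-in for GraphPad)."""
    # Imported lazily so the selection rule above works without SciPy installed.
    from scipy import stats

    test = choose_normality_test(len(values))
    if test == "shapiro-wilk":
        statistic, p_value = stats.shapiro(values)
    else:
        statistic, p_value = stats.normaltest(values)
    return test, statistic, p_value
```

      For a dataset with n = 4, such as the one the reviewer mentions, `choose_normality_test(4)` selects the Shapiro-Wilk test.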

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      The manuscript by Zhang et al. explores the effect of autophagy regulator ATG6 on NPR1-mediated immunity. The authors propose that ATG6 directly interacts with NPR1 in the nucleus to increase its stability and promote NPR1-dependent immune gene expression and pathogen resistance. This novel role of ATG6 is proposed to be independent of its role in autophagy in the cytoplasm. The authors demonstrate through biochemical analysis that ATG6 interacts with NPR1 in yeast and very weakly in vitro. They further demonstrate using overexpression transgenic plants that in the presence of ATG6-mcherry the stability of NPR1-GFP and its nuclear pool is increased.

      Comments on latest version:

      The term "invasion" has to be replaced with infection, as it doesn't have much meaning to this particular study. I already explained this point in the first review, but authors did not address it throughout the manuscript.

      Thank you for your constructive feedback. We have taken your suggestion into account and replaced "invasion" with "infection" in the revised manuscript (Lines 44,45,99,100,298,341,387,415,461,463,464,1002).

      In fig. 1e there's no statistical analysis. How can one show measurements from multiple samples without statistical analysis? All the data points have to be shown in the graph and statistics performed. In the arg6-npr1 and snrk-npr1 pairs no nuclear marker is included. How can one know where the nucleus is, particularly in such poor quality low res. images? The nucleus marker has to be included in this analysis and shown. This is an important aspect of the study as nuclear localization of ATG6 is proposed to be essential for its new function.

      Thank you for bringing this to our attention. We conducted the BiFC experiments again using nls-mCherry transgenic tobacco, which yielded clearer images. The results clearly demonstrate that ATG6 interacts with NPR1 in both the cytoplasm and nucleus. The YFP signal in the nucleus co-localizes with nls-mCherry (a nuclear localization marker). SnRK2.8 was employed as a positive control for NPR1 interaction. The relative fluorescence intensity of YFP was analyzed using ImageJ software; n = 15 independent images were analyzed to quantify YFP fluorescence. All data points are displayed in the graph, and we also conducted a Student's t-test. We have incorporated these results into the revised manuscript (Fig 1d and e).

      Co-localization provided in the fig. S2 cannot complement this analysis, particularly since no cytoplasmic fraction is present for NPR1-GFP in fig. S2.

      Thank you for your observation. We repeated the experiment and confirmed that NPR1 and ATG6 co-localize in both the nucleus and cytoplasm. The image in Figure S2 has been updated accordingly.

      In the alignment in fig 2c, it is not explained what are the species the atg6 is taken from. The predicted NLS has to be shown in the context of either the entire protein sequence alignment or at least individual domain alignment with the indication of conserved residues (consensus). They have to include more species in the analysis, instead of including 3 proteins from a single species. Also, the predicted NLS in atg6 doesn't really have the classical type architecture, which might be an indication that it is a weak NLS, consistent with the fact that the protein has significant cytoplasmic accumulation. They also need to provide the NLS prediction cut-off score, as this parameter is a measure of NLS strength.

      Line 150: the NLS sequence "FLKEKKKKK" is a wrong sequence.

      Thank you for your suggestion. In both plants and animals, proteins are transported to the nucleus via specific nuclear localization signals (NLSs), which are typically characterized by short stretches of basic amino acids (Dingwall and Laskey, 1991, Raikhel, 1992, Nigg, 1997). Following your recommendation, we re-predicted potential NLS sequences in the ATG6 protein using NLSExplorer (http://www.csbio.sjtu.edu.cn/bioinf/NLSExplorer). Although we did not identify a classical monopartite NLS, we discovered a bipartite NLS similar to the consensus bipartite sequence (KRX<sub>(10-12)</sub>K(KR)(KR)) (Kosugi et al., 2009) in the carboxy-terminal region (475-517 aa) of ATG6, with a cut-off score of 2.6. These findings are consistent with substantial accumulation of ATG6 in the cytoplasm and minimal accumulation in the nucleus. Additionally, a comparison of ATG6 C-terminal sequences across several species, including Microthlaspi erraticum, Capsella rubella, Brassica carinata, Camelina sativa, Theobroma cacao, Brassica rapa, Eutrema salsugineum, Raphanus sativus, Hirschfeldia incana and Brassica napus, indicates that this bipartite NLS is relatively conserved. We have incorporated these results into the revised manuscript (lines 450-160).
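
      As an illustration of the consensus bipartite pattern cited above, KRX(10-12)K(KR)(KR), the motif can be written as a regular expression and scanned against a protein sequence. This is only a sketch: it is not how NLSExplorer scores sequences, the function name is hypothetical, and the toy sequence below is invented for the example and is not the ATG6 sequence.

```python
import re

# Consensus bipartite NLS: K, R, then 10-12 residues of any kind, then K,
# followed by two residues that are each K or R.
BIPARTITE_NLS = re.compile(r"KR[A-Z]{10,12}K[KR][KR]")


def find_bipartite_nls(sequence: str):
    """Return (1-based start position, matched motif) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in BIPARTITE_NLS.finditer(sequence.upper())]


# Hypothetical toy sequence: 'KR', a 10-residue spacer, then 'KKR'.
toy_sequence = "MAS" + "KR" + "A" * 10 + "KKR" + "LE"
hits = find_bipartite_nls(toy_sequence)
```

      A strict pattern match of this kind only flags exact fits to the consensus; prediction tools typically report a score instead, which is why the cut-off value is quoted alongside the predicted motif.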

      In fig. 3d no explanation for the error bars is included, and what type of statistical analysis is performed is not explained.

      Thank you for bringing this to our attention. In Figure 3d, a Student's t-test was conducted to analyze the data. The mean and standard deviation were calculated from three biological replicates, and the relevant description has been included in the figure notes.

      Reference

      Dingwall, C. and Laskey, R.A. (1991) Nuclear targeting sequences--a consensus? Trends Biochem Sci, 16, 478-481.

      Kosugi, S., Hasebe, M., Matsumura, N., Takashima, H., Miyamoto-Sato, E., Tomita, M. and Yanagawa, H. (2009) Six classes of nuclear localization signals specific to different binding grooves of importin alpha. J Biol Chem, 284, 478-485.

      Nigg, E.A. (1997) Nucleocytoplasmic transport: signals, mechanisms and regulation. Nature, 386, 779-787.

      Raikhel, N. (1992) Nuclear targeting in plants. Plant Physiol, 100, 1627-1632.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      However, given that S1P is upstream NF-κB signaling, it is unclear if it offers conceptual innovations as compared to previous studies from the same team (Palazzo et al. 2020; 2022, 2023)

      We find distinct differences between the impacts of S1P- and NFκB-signaling on glial activation, neuronal differentiation of the progeny of MGPCs and neuronal survival in damaged retinas. In the current study we demonstrate that 2 consecutive daily intravitreal injections of S1P selectively activated mTor (pS6) and Jak/Stat3 (pStat3), but not MAPK (pERK1/2) signaling in Müller glia. Further, inhibition of S1P synthesis (SPHK1 inhibitor) decreased ATF3, mTor (pS6) and pSmad1/5/9 levels in activated Müller glia in damaged retinas. Inhibition of NFκB-signaling in damaged chick retinas did not impact the above-mentioned cell signaling pathways (Palazzo et al., 2020). Thus, S1P-signaling impacts cell signaling pathways in MG that are distinct from NFκB, but we cannot exclude the possibility of cross-talk between NFκB and these pathways. Further, inhibition of NFκB-signaling potently decreases numbers of dying cells and increases numbers of surviving ganglion cells (Palazzo et al., 2020). Consistent with these findings, a TNF orthologue, which presumably activates NFκB-signaling, exacerbates cell death in damaged retinas (Palazzo et al., 2020). By contrast, 5 different drugs targeting S1P-signaling had no effect on numbers of dying cells and only one S1PR1 inhibitor modestly decreased numbers of dying cells (current study). Although two different inhibitors of NFκB-signaling suppressed the proliferation of microglia in damaged retinas (Palazzo et al., 2020), none of the S1P-targeting drugs affected the proliferation of microglia (current study). In addition, inhibition of NFκB does not influence the neurogenic potential of MGPCs in damaged chick retinas (Palazzo et al., 2020), whereas inhibition of S1P receptors (S1PR1 and S1PR3) and inhibition of S1P synthesis (SPHK1) significantly increased the differentiation of amacrine-like neurons in damaged retinas (current study).

      Collectively, in comparison to the effects of pro-inflammatory cytokines and NFκB-signaling, our current findings indicate that S1P-signaling through S1PR1 and S1PR3 in Müller glia has distinct effects upon cell signaling pathways, neuronal regeneration and cell survival in damaged retinas. We will revise text in the Discussion (pages 33-34) to better highlight these important distinctions between NFκB- and S1P-signaling.

      Reviewer #2 (Public review):

      Weaknesses:

      The methodology is not very clean. A number of drugs (inhibitors/ antagonists/agonists signal modulators) are used to modulate S1P expression or signaling in the retina without evidence that these drugs are reaching the target cells. No alternative evaluation if the drugs, in fact, are effective. The drug solubility in the vehicle and in the vitreous is not provided, and how did they decide on using a single dose of each drug to have the optimal expected effect on the S1P pathway?

      Müller glia are the predominant retinal cell type that expresses S1P receptors. Consistent with these patterns of expression, we report Müller glia-specific effects of different agonists and antagonists that increase or decrease S1P-signaling. Since we compare cell-level changes within contralateral eyes wherein one retina is exposed to vehicle and the other is exposed to vehicle plus drug, it seems highly probable that the drugs are eliciting effects upon the Müller glia. It is possible that the responses we observed resulted from drugs acting on extra-retinal tissues, which might secondarily release factors that elicit cellular responses in Müller glia. However, this seems unlikely given the distinct patterns of expression for different S1P receptors in Müller glia, and the outcomes of inhibiting Sphk1 or S1P lyase on retinal levels of S1P.

      For example, we provide evidence that S1PR1 and S1PR3 expression is predominant in Müller glia in the chick retina using single cell-RNA sequencing and fluorescence in situ hybridization (FISH). Thus, we expect S1PR1/3-targeting small-molecule inhibitors to act directly on Müller glia, which is consistent with our read-outs of cell signaling with injections of S1P in undamaged retinas. We show that SPHK1 and SGPL1, which encode the enzymes that synthesize or degrade S1P, are expressed by different retinal cell types, including the Müller glia. The efficacy of the drugs that target SPHK1 and SGPL1 was assessed by measuring levels of S1P in the retina. By using liquid chromatography and tandem mass spectrometry (LC-MS/MS), we provide data that inhibition of S1P synthesis (inhibition of SPHK1) significantly decreased levels of S1P in normal retinas, whereas inhibition of S1P degradation (inhibition of SGPL1) increased levels of S1P in damaged retinas (Fig. 5). These data suggest that the SPHK1 inhibitor and the SGPL1 inhibitor specifically act at the intended target to influence retinal levels of S1P. Further, inhibition of SPHK1 (to decrease levels of S1P) results in decreased levels of ATF3, pS6 (mTor) and pSMAD1/5/9 in Müller glia, consistent with the notion that reduced levels of S1P in the retina impact signaling in Müller glia. Finally, we find similar cellular responses to chemically different agonists or antagonists, and we find opposite cellular responses to agonists and antagonists, which are expected to be complementary if the drugs are specifically acting at the intended targets in the retina. We will revise the Discussion to better address caveats and concerns regarding the actions and specificity of different drugs within the retina following intravitreal delivery.

      We will provide the drug solubility specifications and estimates of the initial maximum dose per eye for each drug. For chick eyes between P7 and P14, these estimates will assume a volume of about 100 µl of liquid vitreous, 800 µl of gel vitreous and an average eye weight of 0.9 grams. We will revise Table 1 (pharmacological compounds) with ranges of reported in vivo ED50’s (mg/kg) for drugs and we will list the calculated initial maximum dose (mg/kg equivalent) per eye. Doses were chosen based on estimates of the initial maximum ocular dose that were within the range of reported ED50’s. However, as is the case for any in vivo model system, it is difficult to predict rates of drug diffusion out of the vitreous, how quickly the drugs are cleared from the entire eye, how much of the compound enters the retina, and how quickly the drug is cleared from the retina. Accordingly, we assessed drug specificity and sites of activation by relying upon readouts of cell signaling pathways that are parsed with patterns of expression of different S1P receptors and measurements of retinal levels of S1P following exposure to drugs that target enzymes that synthesize or degrade S1P, as described above.
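
      The arithmetic behind such an estimate can be sketched as follows. This is an assumption about how an "initial maximum dose (mg/kg equivalent) per eye" might be computed from the stated values (0.9 g eye weight, 100 µl liquid plus 800 µl gel vitreous), not the authors' confirmed calculation; the function names and the 1 µg example amount are hypothetical.

```python
# Values stated in the response for chick eyes between P7 and P14.
LIQUID_VITREOUS_UL = 100.0
GEL_VITREOUS_UL = 800.0
EYE_WEIGHT_G = 0.9


def initial_max_dose_mg_per_kg(injected_ug: float, eye_weight_g: float = EYE_WEIGHT_G) -> float:
    """Dose relative to eye weight, assuming all injected drug stays in the eye.

    Micrograms per gram is numerically identical to milligrams per kilogram.
    """
    return injected_ug / eye_weight_g


def initial_max_vitreous_conc_ug_per_ul(
    injected_ug: float,
    vitreous_ul: float = LIQUID_VITREOUS_UL + GEL_VITREOUS_UL,
) -> float:
    """Upper-bound concentration if the drug mixed evenly through the whole vitreous."""
    return injected_ug / vitreous_ul


# Hypothetical example: 1 µg of compound injected into one eye.
dose = initial_max_dose_mg_per_kg(1.0)            # about 1.11 mg/kg equivalent
conc = initial_max_vitreous_conc_ug_per_ul(1.0)   # about 0.0011 µg/µl
```

      Both numbers are ceilings at the moment of injection; as noted above, diffusion and clearance make the dose actually reaching the retina hard to predict.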

      Reviewer #1 (Recommendations for the authors):

      I am wondering if Muller glia can be considered as fully differentiated at early postnatal stages as those used in this study. Is this mechanism operative in adult retinas? Could the authors perform studies in older animals, just to have the proof of principle that the proposed mechanism is retained.

      Chickens are considered to be adult at about 4 months of age, when the females start laying eggs. Unfortunately, housing, maintenance, handling and experimentation on large adult chickens have proven to be challenging. Nevertheless, there is evidence that Müller glia reprogramming remains robust in chick retinas from P1 through P30, but the zones of proliferation shift away from central retina and become increasingly confined to the retinal periphery (Fischer, 2005). MG “maturation” appears to occur in a central-to-peripheral gradient, much like the process of embryonic retinal differentiation, but a zone of regeneration-competent MG remains in the periphery during adolescent development (Fischer, 2005).

      We have defined central vs peripheral retina in the Methods.

      To partially address this question, we have generated a new supplemental Figure 6 showing (i) SPHK1 fluorescent in situ labeling of central and peripheral regions at P10, and (ii) analysis of EdU+Sox2+ MGPCs in central versus peripheral regions treated with NMDA +/- S1PR1 inhibitor or NMDA +/- SPHK1 inhibitor. We find that patterns of S1PR1 transcription in the central region are similar to the peripheral region (not shown), and S1PR1 inhibition modestly increased numbers of MGPCs in central regions. Unlike the peripheral regions of retina, SPHK1 FISH signal in the central region remains low at 48 hours post-injury (supplemental Fig. 6). Additionally, we found that the SPHK1 inhibitor had no effect on numbers of proliferating MGPCs in the central regions of retina, whereas SPHK1 inhibitors stimulated proliferation of MGPCs in the periphery (Fig. 4). It is likely that mature MG in central retinal regions are not responsive to SPHK1 inhibition due to low levels of SPHK1 expression.

      We have previously shown that Notch-related genes show unique patterns of expression in the central and peripheral retinas, and expression levels significantly change at P0, P7, and P21 (Ghai et al., 2010). We found that Notch inhibition reduced cell death and numbers of MGPCs in central regions but not peripheral regions. Recent scRNA-seq analysis of murine macula and peripheral retinal regions has revealed interesting differences in NFKBIA/Z and NFIA expression, possibly indicating a difference in the early inflammatory transcriptional response to retinal damage (Zhang et al., 2024, bioRxiv). We believe that spatial sequencing of peripheral “immature” and central “mature” chick Müller glia will be a useful tool in the future to reveal key differences in signaling pathway-related gene expression that confer a competence for regeneration in the periphery.

      We have added text to the Results (pages 20-21) and Discussion (page 32) to address the S1P-signaling in central (mature MG) vs peripheral (immature MG) regions of the retina.

      Minor points.

      The abstract is difficult to follow and consists of a list of what activates or represses the formation of MGPC. Please rewrite the abstract to integrate information and provide a clearer message. Also, please include the species of study in the abstract and mention it again at the beginning of the results, at least.

      We have rewritten the abstract to simplify and clarify our main points (p 2).

      Lines 65-69. The sentence is unclear, perhaps there are words either missing or in excess and there is a need to check the spelling.

      We have simplified this sentence to improve clarity and cited our recently published review in support.

      Lines 112-113. Please explain why " retinas were treated with saline, NMDA, or 2 or 3 doses insulin+FGF2 and the combination of NMDA and insulin+FGF2". There is a reference but readers will appreciate understanding right away why.

      We have added a sentence to clarify the purpose of comparing gene expression patterns in MG and MGPCs in NMDA-damaged retinas versus retinas treated with insulin+FGF2.

      Lines 223-257. This list of experiments is difficult to follow and perhaps should be summarized better. Somehow lines 257-261 say it all.

      We have revised this section to clarify differences in outcomes between S1PR1/3 activators and inhibitors. We also stated the enzymatic functions of SPHK1 and SGPL1 to improve clarity.

      Lines 392-441. Comparative expression analysis should be summarized as the message is somehow simple but the description is rather lengthy.

      We have revised our comparative expression analyses to be more concise.

      Reviewer #2 (Recommendations for the authors):

      (1) Only a single dose of the drugs (inhibitor/ antagonists/agonists signal modulators) is used for each drug, as shown in Table 1. How do they know this is an effective dose?

      We estimated the appropriate dose based on the initial maximum dose, which we based on the reported ED50 values for each drug. We have revised Table 1 to include this information.

      (2) Most of the drugs appeared to be hydrophobic, but except for sphingosine and S1P, all are described to be injected with sterile saline. They must provide solubility characteristics of these drugs in solvents. For example, FTY720 is not water-soluble, which raises the question of all of their drugs' solubility, bioavailability to the cells of interest, and their effectivity in signal transduction in the retinal cells.

      Some S1P-targeting compounds were delivered in 20% DMSO in saline to support the solubility of the different lipophilic small-molecule agonists/antagonists. We have added information to the Methods (p 5 and p 6) and to Table 1 to describe the use of DMSO to solubilize these drugs. We have also revised Table 1 with ranges of reported ED50’s (mg/kg) for all drugs and listed the calculated initial maximum dose (mg/kg) per eye.

      (3) Drugs were delivered to the vitreous chamber, but there was no information on how they would cross the inner limiting membrane to affect or modulate S1P metabolism in retinal MG or to bind the S1P receptors on MG or other retinal cell types.

      All selected compounds are small-molecule drugs, many of which are structural analogues of sphingosine or S1P. These drugs would be classified as BDDCS Class II drugs, meaning they have low solubility but high cell permeability. Thus, it is highly probable that they diffuse across the ILM to act on S1P receptors on MG, but it is also likely that their bioavailability is more limited, requiring a higher dose, repeated doses, and the use of solubilizing agents. We have clarified our use of DMSO to solubilize these drugs (p 6) according to vendor recommendations (p 5). This information has been added to the Methods.

      (4) Gene expression is a very dynamic process; without providing more evidence that the expression changes are the direct effect of the drug treatment, the conclusions made based on the gene expression profiles are not strong. Additional points:

      We do not assert that changes in scRNA-seq expression profiles are the direct result of S1P-targeting drugs. We report significant changes in cellular expression profiles following NMDA-induced retinal damage or ablation of microglia. We feel that new experiments to assess the gene expression profiles of retinal cells that are directly downstream of the different S1P-targeting drugs are better suited for future studies.

      (5) Please add in the introduction that there is only one sphingosine kinase in chicken, as no SPHK2 is known to be present.

      We have added additional information regarding the expression of SPHK1 and SPHK2 genes in the chick genome (p 4).

      (6) Fig 1d and in many other UMAP clusters, the low expressing genes are barely visible (Ex. 1d, S1PR2, and S1PR3); please extract them in separate UMAP clusters and provide them in supplements.

      We have revised supplemental Figure 1 to include separate panels for each of the S1P-related genes.

      (7) The Figure References for SPHK1 (Fig. 2e), SGPL1 (Fig. 2e), ASAH1 (Fig. 2f), CERS6 (Fig. 2f), and CERS5 (Fig. 2f) in the line # 124- 132 should belong to Figure 1, not Figure 2.

      We have corrected these figure references (p 14).

      (8) The description of the expression of zebrafish genes does not match the figures. For example, 'Similarly, sphk1 was detected in very few cells in the retina (Fig. 10j). By comparison, sphk2 was detected in a few bipolar cells and rod photoreceptors (Fig. 10j). Similar to patterns of expression seen in chick and human retinas, sgpl1 was detected in microglia and a few cells scattered among the different clusters of inner retinal neurons and rod photoreceptors (Fig. 10j)', the expression of these genes are not in very few or few scattered cells rather in many cells.

      We have revised these statements to improve clarity and more accurately describe the data in Figure 10 (p 28).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors employed a combinatorial CRISPR-Cas9 knockout screen to uncover synthetically lethal kinase genes that could play a role in drug resistance to kinase inhibitors in triple-negative breast cancer. The study successfully reveals FYN as a mediator of resistance to depletion and inhibition of various tyrosine kinases, notably EGFR, IGF-1R, and ABL, in triple-negative breast cancer cells and xenografts. Mechanistically, they demonstrate that KDM4 contributes to the upregulation of FYN and thereby is an important mediator of drug resistance. All together, these findings suggest FYN and KDM4A as potential targets for combination therapy with kinase inhibitors in triple-negative breast cancer. Moreover, the study may also have important implications for other cancer types and other inhibitors, as the authors suggest that FYN could be a general feature of drug-tolerant persister cells.

      Strengths:

      (1) The authors used a large combination matrix of druggable tyrosine kinase gene knockouts, enabling studying of co-dependence of kinase genes. This approach mitigates off-target effects typically associated with kinase inhibitors, enhancing the precision of the findings.

      (2) The authors demonstrate the importance of FYN in drug resistance in multiple ways. They demonstrate synergistic interactions using both knockouts and inhibitors, while also revealing its transcriptional upregulation upon treatment, strengthening the conclusion that FYN plays a role in the resistance.

      (3) The study extends its impact by demonstrating the potent in vivo efficacy of certain combination treatments, underscoring the clinical relevance of the identified strategies.

      Weaknesses:

      (1) The methods and figure legends are incomplete, posing a barrier to the reproducibility of the study and hindering a comprehensive understanding and accurate interpretation of the results.

      We thank the reviewer for pointing this out. We have added as much detail as possible to the methods and figure legends to maximize reproducibility and support accurate interpretation of our results, as described in our responses to the recommendations for authors.

      (2) The authors make use of a large quantity of public data (Fig. 2D/E, Fig. 3F/L/M, Fig 4C, Fig 5B/H/I), whereas it would have strengthened the paper to perform these experiments themselves. While some of this data would be hard to generate (e.g. patient data) other data could have been generated by the authors. The disadvantage of the use of public data is that it merely comprises associations, but does not have causal/functional results (e.g. FYN inhibition in the different cancer models with various drugs). Moreover, by cherry-picking the data from public sources, the context of these sources is not clear to the reader, and thus harder to interpret correctly. For example, it is not directly clear whether the upregulation of FYN in these models is a very selective event or whether it is part of a very large epigenetic re-programming, where other genes may be more critical. While some of the used data are from well-known curated databases, others are from individual papers that the reader should assess critically in order to interpret the data. Sometimes the public data was redundant, as the authors did do the experiments themselves (e.g. lung cancer drug-tolerant persisters), in this case, the public data could also be left out.

      More importantly, the original sources are not properly cited. While the GEO accession numbers are shown in a supplementary table, the articles corresponding to this data should be cited in the main text, and preferably also in the figure legend, to clarify that this data is from public sources, which is now not always the case (e.g. line 224-226). If these original papers do already mention the upregulation of FYN, and the findings from the authors are thus not original, these findings should be discussed in the Discussion section instead of shown in the Results.

      We welcome the reviewer’s concern. As the reviewer pointed out, our analysis of FYN expression levels across multiple studies of drug-tolerant cells may reflect association rather than a causal relationship. We have at least shown that FYN inhibition may reduce drug tolerance in TNBC and in EGFR inhibitor-treated lung cancer cells (figures 2H, 5E). The causal role of FYN in the emergence of drug tolerance in other cancers treated with different drugs (such as irinotecan-treated colon adenocarcinoma and gemcitabine-treated pancreatic adenocarcinoma) may be beyond the scope of this study. We added a brief discussion addressing this concern in lines 273-275.

      We also added proper citations for the public data used in this study to the main text and figure legends (lines 267-269). The GEO accession numbers are listed in supplementary table S2. Importantly, none of the referenced studies identified FYN as a key factor in generating drug-tolerant cells.

      (3) The claim in the abstract (and discussion) that the study "highlights FYN as broadly applicable mediator of therapy resistance and persistence", is not sufficiently supported by the results. The current study only shows functional evidence for this for an EGFR, IGF1R, and Abl inhibitor in TNBC cells. Further, it demonstrates (to a limited extent) the role of FYN in gefitinib and osimertinib resistance (also EGFR inhibitors) in lung cancer cells. Thus, the causal evidence provided is only limited to a select subset of tyrosine kinase inhibitors in two cancer types. While the authors show associations between FYN and drug resistance in other cancer types and after other treatments, these associations are not solid evidence for a causal connection as mentioned in this statement. Epigenetic reprogramming causing drug resistance can be accompanied by altered gene expression of many genes, and the upregulation of FYN may be a consequence, but not a cause of the drug resistance. Therefore, the authors should be more cautious in making such statements about the broad applicability of FYN as a mediator of therapy resistance.

      We fully agree with the reviewer’s concern that FYN upregulation is an association and may not be the cause of drug tolerance and resistance. Therefore, to accurately convey our findings, we edited lines 34-36 of the abstract to “FYN expression is associated with therapy resistance and persistence by demonstrating its upregulation in various experimental models of drug-tolerant persisters and residual disease following targeted therapy, chemotherapy, and radiotherapy” and lines 288-290 of the discussion to “Upregulation of FYN is a general feature of drug tolerant cancer cells, suggesting the association of FYN expression with drug resistance and tumor recurrence after treatment.” We hope this satisfies the reviewer.

      (4) The rationale for picking and validating FYN as the main candidate gene over other genes such as FGFR2, FRK2, and TEK is not clear.

      a. While gene pairs containing FGFR2 knockouts seemed to be equally effective as FYN gene pairs in the primary screening, these could not be validated in the validation experiment. It is unclear whether multiple individual or a pool of gRNAs were used for this validation, or whether only 1 gRNA sequence was picked per gene for this validation. If only 1 gRNA per gene was used, this likely would have resulted in variable knockout efficiencies. Moreover, the T7 endonuclease assay may not have been the best method to check knockout efficiency, as it only implies endonuclease activity around a gene (but not to the extent of indels that can cause frameshifts, such as by TIDE analysis, or extent of reduction in protein levels by western blot).

      b. Moreover, FRK2 and TEK, also demonstrated many synergistic gene pairs in the primary screen. However, many of these gene pairs were not included in the validation screening. The selection criteria of candidate gene pairs for validation screening is not clear. Still, TEK-ABL2 was also validated as a strong hit in the validation screen. The authors should better explain the choice of FYN over other hits, and/or mention that TEK and FRK2 may also be important targets for combination treatment that can be further elucidated.

      We thank the reviewer for improving our manuscript. We had concerns about the generalizability of FGFR2, FRK and TEK in TNBC because their expression is very low in MDA-MB-231 cells and they are not enriched in TNBC compared to cancer cell lines of other subtypes. We added a brief comment on this concern to the results and discussion sections (lines 150-154, figure S3). We acknowledge that the validation in figure 2B is based on only one guide RNA. Together with the validation by pharmacological inhibition of FYN (figure 2F-I), we hope the reader and reviewer can be convinced of our key findings on synthetic lethality between FYN and other tyrosine kinases.

      (5) On several occasions, the right controls (individual treatments, performed in parallel) are not included in the figures. The authors should include the responses to each of the single treatments, and/or better explain the normalization that might explain why the controls are not shown.

      a. Figure 2G: The effect of PP2 treatment, without combined treatment, is not shown.

      b. Figure 2H/3G: The effect of the knockouts on growth alone, compared to sgGFP, is not demonstrated. It is unclear whether the viability of knockouts is normalized to sgGFP, or to each untreated knockout.

      c. Figure 2L: The effect of SB203580 as a single treatment is not shown.

      We thank the reviewer for pointing this out. The data in all figures listed in these concerns were normalized to the change in viability caused by the pharmacological or genetic perturbation that synergized with the TKIs (NVP-ADW742, gefitinib, etc.) used in the figures in the original manuscript. As the reviewer suggested, we added the effects of SB203580 and PP2 treatment on cell viability in supplementary figures S4A and S4K. SB203580 had no significant effect on cell viability, while PP2 treatment caused a significant decrease in cell viability, which is expected because PP2 can inhibit the activity of multiple Src family kinases. Regardless of the effects of SB203580 and PP2 on cell viability as single agents, it is evident that TKI treatment synergistically decreased cell viability in cancer cell lines. The change in viability caused by FYN or histone lysine demethylase knockout is also provided in the newly added figures S4D and S6C. Notably, genetic ablation of FYN or histone lysine demethylases had modest, if any, influence on cell viability.

      (6) The study examines the effects at a single, relatively late time point after treatment with inhibitors, without confirming the sequential impact on KDM4A and FYN. The proposed sequence of transcriptional upregulation of KDM4A followed by epigenetic modifications leading to FYN upregulation would be more compellingly supported by demonstrating a consecutive, rather than simultaneous, occurrence of these events. Furthermore, the protein level assessment at 48 hours (for RNA levels not clearly described), raises concerns about potential confounding factors. At this late time point, reduced cell viability due to the combination treatment could contribute to observed effects such as altered FYN expression and P38 MAPK phosphorylation, making it challenging to attribute these changes solely to the specific and selective reduction of FYN expression by KDM4A.

      We thank the reviewer for pointing this out. We performed a time-course experiment for NVP-ADW742 treatment of MDA-MB-231 cells, shown in the newly added figure 3E. Surprisingly, NVP-ADW742 treatment increased the KDM4A protein level within two hours. FYN protein accumulation followed KDM4A accumulation after 24 hours. This observation, together with our chromatin immunoprecipitation data in figure 3O, provides evidence that FYN accumulation is a consequence of KDM4A accumulation and H3K9me3 demethylation upon TKI treatment. We discuss these data in the results and discussion sections in lines 214-216.

      (7) The cut-off for considering interactions "synergistic" is quite low. The manual of the used "SynergyFinder" tool itself recommends values above 10 as synergistic and between -10 and 10 as additive (https://synergyfinder.fimm.fi/synergy/synfin_docs/). Here, values between 5-10 are also considered synergistic. Caution should be taken when discussing those results. Showing the actual dose response (including responses to each single treatment) may be required to enable the reader to critically assess the synergy, along with its standard deviation.

      We thank the reviewer for the careful comments. We reanalyzed our data with the SynergyFinder Plus tool (Zheng, Genomics, Proteomics, and Bioinformatics 2022), which implements mathematical models distinct from those of SynergyFinder 3 for a more faithful implementation of the Bliss independence and Loewe additivity models and, more critically, calculates the statistical significance of the synergy. We provide updated synergy plots with statistics in figures 2F, 3J, and S4B. All drug combinations show statistically significant synergy (p<0.01). We also added the raw data used to calculate synergy in figures 2F, 3J and S4B as supplementary dataset S2.
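
      To make the scale of these scores concrete, the following is a minimal sketch of a Bliss-excess calculation for a single dose pair. It is not the SynergyFinder Plus implementation; the function names, the example inhibition values, and the handling of scores below -10 as "antagonistic" are illustrative assumptions. The cut-off of 10 is the one the reviewer quotes from the SynergyFinder manual.

```python
def bliss_excess(inhib_a: float, inhib_b: float, inhib_ab: float) -> float:
    """Observed minus Bliss-expected inhibition, in percentage points.

    All inputs are fractional inhibitions (0 = no effect, 1 = complete kill).
    Under Bliss independence the expected combined inhibition is
    a + b - a*b. The function names here are hypothetical.
    """
    for value in (inhib_a, inhib_b, inhib_ab):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inhibition must be a fraction between 0 and 1")
    expected = inhib_a + inhib_b - inhib_a * inhib_b
    return 100.0 * (inhib_ab - expected)


def classify(score: float, cutoff: float = 10.0) -> str:
    """Label a score using a symmetric cut-off (assumed convention)."""
    if score > cutoff:
        return "synergistic"
    if score < -cutoff:
        return "antagonistic"
    return "additive"


# Made-up example: 20% and 30% inhibition alone, 60% in combination.
score = bliss_excess(0.20, 0.30, 0.60)
print(round(score, 2), classify(score))
```

      With a cut-off of 10, a score of 7 would be labelled additive, whereas a cut-off of 5 would label the same score synergistic, which is the distinction the reviewer raises.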

      (8) As the effect size on Western blots is quite limited and sometimes accompanied by differences in loading control, these data should be further supported by quantifications of signal intensities of at least 3 biological replicates (e.g. especially Figure 3A/5A). The figure legends should also state how many independent experiments the blots are representative of.

      We added quantifications for figures 3A and 5A to better depict our results. Figure legends were edited to indicate that each blot is representative of three independent experiments.

      (9) While the article provides mechanistic insights into the likely upregulation of FYN by KDM4A, this constitutes only a fragment of the broader mechanism underlying drug resistance associated with FYN. The study falls short in investigating the causes of KDM4A upregulation and fails to explore the downstream effects (except for p38 MAPK phosphorylation, which may not be complete) of FYN upregulation that could potentially drive sustained cell proliferation and survival. These omissions limit the comprehensive understanding of the complete molecular pathway, and the discussion section does not address potential implications or pathways beyond the identified KDM4A-FYN axis. A more thorough exploration of these aspects would enhance the study's contribution to the field.

      We welcome the reviewer’s careful concern. We agree that our delineation of the mechanisms underlying TKI resistance in TNBC involving KDM4 and FYN is far from complete. Increases in the expression of histone demethylases have been observed in cancers treated with different drugs. The mechanisms governing this increase in histone demethylase expression are not known and are beyond the scope of this paper. We added this point to the discussion section in lines 299-304.

      (10) FYN has been implied in drug resistance previously, and other mechanisms of its upregulation, as well as downstream consequences, have been described previously. These were not evaluated in this paper, and are also not discussed in the discussion section. Moreover, the authors did not investigate whether any of the many other mechanisms of drug resistance to EGFR, IGF1R, and Abl inhibitors that have been described, could be related to FYN as well. A more comprehensive examination of existing literature and consideration of alternative or parallel mechanisms in the discussion would enhance the paper's contribution to understanding FYN's involvement in drug resistance.

      FYN has been implicated in TKI resistance in CML cell lines (Irwin, Oncotarget, 2015). In that study, FYN is similarly transcriptionally upregulated in imatinib-resistant CML, and this upregulation is dependent on the EGR1 transcription factor. To address this concern, we generated EGR1 KO MDA-MB-231 cells and tested whether these cells retain the ability to accumulate FYN. Consistent with the previous study, imatinib treatment increased the EGR1 protein level. However, EGR1 knockout did not influence FYN accumulation in MDA-MB-231 cells. EGR1-mediated accumulation of FYN may be a context-specific phenomenon in CML (Figure S5B). We discuss this result in the results section in lines 187-190. We also acknowledge that SRC family kinases are generally involved in drug resistance in many cancers. We discuss recent findings regarding SRC family kinases in drug resistance in the results section in lines 145-147 and the discussion section in lines 315-317.

      Reviewer #2 (Public Review):

      Summary:

      Kim et al. conducted a study in which they selected 76 tyrosine kinases and performed CRISPR/Cas9 combinatorial screening to target 3003 genes in Triple-negative breast cancer (TNBC) cells. Their investigation revealed a significant correlation between the FYN gene and the proliferation and death of breast cancer cells. The authors demonstrated that depleting FYN and using FYN inhibitors, in combination with TKIs, synergistically suppressed the growth of breast cancer tumor cells. They observed that TKIs upregulate the levels of FYN and the histone demethylase family, particularly KDM4, promoting FYN expression. The authors further showed that KDM4 weakens the H3K9me3 mark in the FYN enhancer region, and the inhibitor QC6352 effectively inhibits this process, leading to a synergistic induction of apoptosis in breast cancer cells along with TKIs. Additionally, the authors discovered that FYN is upregulated in various drug-resistant cancer cells, and inhibitors targeting FYN, such as PP2, sensitize drug-resistant cells to EGFR inhibitors.

      Strengths:

      This study provides new insights into the roles and mechanisms of FYN and KDM4 in tumor cell resistance.

      Weaknesses:

      It is important to note that previous studies have also implicated FYN as a potential key factor in drug resistance of tumor cells, including breast cancer cells. While the current study is comprehensive and provides a rich dataset, certain experiments could be refined, and the logical structure could be more rigorous. For instance, the rationale behind selecting FYN, KDM4, and KDM4A as the focus of the study could be more thoroughly justified.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The methods and figure legends are incomplete, posing a barrier to the reproducibility of the study and hindering a comprehensive understanding and accurate interpretation of the results. A critical revision of these aspects is needed, for example:

      a. Catalogue numbers of certain products critical to reproduce the study (e.g. antibodies) and/or at what company they have been purchased (e.g. used compounds)

      b. On several occasions the used concentrations of drugs or exposure time are not mentioned (e.g. Figure 2H, G (PP2), I, J, K, L, etc.)

      c. Figure legend of figure panels E-I in Figure 5 seems to be completely incorrect and not consistent with the figure axis etc.

      d. RT-qPCR methodology is not described in Methods.

      e. Western blot methods are very limited: these should be described in more detail or cite an article that does.

      f. Organoid culture: Information about the source of tumour cells (e.g. pre-treatment biopsy, material after surgery), isolation of tumor cells (e.g. methodology, characterization of material) and culture conditions (e.g. culture time before the experiment) is lacking.

      g. Information about how gefitinib/osimertinib-resistant PC9 and HCC827 cells are generated (as well as culture conditions and where they are from) is missing.

      We thank the reviewer for pointing these out. We have done our best to add experimental details for reproducibility in methods section and figure legends in lines 343-348, 408-426, 431-432, 439-453, 648-650, 671-672 and 691-693.

      (2) Figure 1B/C/D: it would be more meaningful if the most important hits (at least in one of these panels) were highlighted (e.g. line with gene-pair named), or visualized separately, so that the reader does not have to read the supplementary table to know what the most important hits were.

      We thank the reviewer for the careful comment. We added labels for key synergistic gene pairs in figure 1D, as the reviewer suggested.

      (3) qPCR data shown in Figure S4 is from 1 independent experiment. As these experiments (especially qPCR) can be rather variable and the effect size is not very large, I would highly recommend repeating these experiments, or excluding them, as conclusions from them are not solid.

      We reasoned that performing qPCR with the many drugs that did not cause substantial synergistic cell death with NVP-ADW742 in figure S5C (figure S4A in the previous version of the manuscript) would not provide much additional insight. Also, as we were more interested in finding direct regulators of FYN expression, we focused on drugs that inhibit epigenetic regulators that activate transcription. Therefore, we performed FYN qPCR with drug combinations involving GSK-J4 (KDM6 inhibitor) and pinometostat (DOT1L inhibitor). As shown in the newly added figure S5D, GSK-J4 inhibited FYN expression, whereas pinometostat failed to do so. We also confirmed that knockout of KDM5 or KDM6 reproducibly failed to decrease FYN expression upon TKI treatment (figures S5E and S5G). The new results are discussed in lines 193-198. We hope these additions satisfy the reviewer.

      (4) For validation of synergistic knockouts, it would be helpful for the interpretation to also show the viability/growth of each knockout (or treatment), instead of mostly normalized scores. For example, the reader now has no insight into whether FYN knockout itself already affects cell viability, or not. If it (or EGFR/IGF1R/ABL knockout) would already substantially affect cell viability, a further reduction in cell viability may not be as relevant as when it would not affect cell viability at all.

      We thank the reviewer for pointing this out. We replaced figure 2A and now show the raw changes in cell viability for each single- and double-knockout cell line in figure S2A. We hope this satisfies the reviewer.

      (5) The curve fitting as in Figure 2G is somewhat misleading. While the curve seems to be forced to go from 1-0, the +PP2 dose-response curve does actually not seem to start at 1, but rather at 0.8, likely resulting from the effect of PP2 as a single treatment, thus, effects may be interpreted as more synergistic than that they truly are.

      The results shown in figure 2G are normalized to cells treated or not treated with PP2, to better reflect the effect of NVP-ADW742, gefitinib and imatinib in the presence of PP2. The viability value starting at 0.8 is therefore not due to the effect of PP2 as a single agent (because the data are normalized to PP2-treated cells), but arises because a very small dose of NVP-ADW742 in particular resulted in a modest decrease in viability. To depict our findings more accurately, we added a data point to figure 2G at a TKI dose of 0 µM with viability 1. We also added details on the normalization of viability to the figure legends.
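
      The normalization described above can be sketched as follows. This is an illustrative example only, assuming that each treatment arm is divided by its own 0 µM TKI well; the function name and all signal values are hypothetical and are not taken from the manuscript.

```python
def normalize_arm(signals_by_dose: dict) -> dict:
    """Divide every reading in one arm by that arm's own 0 uM TKI reading.

    `signals_by_dose` maps TKI dose (uM) to a raw viability signal.
    Normalizing the +PP2 arm to its own zero-dose well removes the
    single-agent effect of PP2, so both arms start at exactly 1.
    """
    baseline = signals_by_dose[0.0]
    if baseline <= 0:
        raise ValueError("zero-dose signal must be positive")
    return {dose: signal / baseline for dose, signal in signals_by_dose.items()}


# Made-up raw signals: PP2 alone lowers the signal from 1000 to 600.
without_pp2 = {0.0: 1000.0, 0.1: 950.0, 1.0: 700.0}
with_pp2 = {0.0: 600.0, 0.1: 480.0, 1.0: 240.0}

print(normalize_arm(without_pp2))
print(normalize_arm(with_pp2))
```

      In this made-up example the +PP2 arm reads 1 at 0 µM and drops at the lowest TKI dose, so a curve fitted without the 0 µM point would appear to start below 1 even though the single-agent effect of PP2 has been divided out.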

      (6) The readability of the paper could be enhanced by higher-quality images (now the text is quite pixelated).

      We had technical difficulties in converting file types. We have replaced figures for better resolution for all main and supplementary figures.

      (7) The discussion now contains one paragraph about the selectivity of kinase inhibitors, and that repurposing of inhibitors with more relaxed specificity or multi-kinase inhibitors can be beneficial. This does not seem to fall within the scope of the study, as there was no comparison between selective and non-selective inhibitors. It was also not clearly mentioned that the non-selective inhibitors worked better than the gene knockouts, or that for example, KDM3 and KDM4 knockout together worked better than only KDM4 knockout. It is recommended to either remove this paragraph, or rephrase it so that it better fits the actual results

      We agree with the reviewer. We chose to remove this paragraph in lines 308-313.

      (8) The entire paper does not discuss any known functions of FYN. Its function could be very briefly introduced in the results section when highlighting it as an important hit. More importantly, its known role in cancer and especially drug resistance should be discussed in the discussion (see also Public review).

      We thank the reviewer for pointing this out. We added brief description of the role of FYN in cancer malignancy and drug resistance in lines 145-147. Particularly, FYN accumulation by EGR1 transcription factor had been described in the context of imatinib resistant chronic myeloid leukemia (Irwin, Oncotarget, 2015). To address this, we tested whether EGR1 knockout decreases FYN level in MDA-MB-231 (Figure S5A). Notably EGR1 knockout failed to decrease FYN protein level. This result was discussed in lines 187-190.

      (9) Textual changes including:

      a. Line 29 (and others) "Massively parallel combinatorial CRISPR screens": I would rather choose a more descriptive term, such as "combinatorial tyrosine kinase knockout CRISPR screen", which already clarifies the screen used knockouts of (druggable) tyrosine kinases only. Using both "Parallel" and "combinatorial" is somewhat redundant, and "massively" is subjective, in my opinion.

      Manuscript edited as suggested (lines 29, 63, 86, 283). The term “massively parallel” has been removed, as it does not significantly change our scientific findings.

      b. Line 67 (and others): "to identify ... for elimination of TNBC": while this may be its potential implication, this study has identified genes in (mostly) TNBC cell lines and cell line xenografts. Please rephrase to something more within the scope of this research.

      Manuscript edited as suggested (lines 68-69) as “we utilize CombiGEM-CRISPR technology to identify tyrosine kinase inhibitor combinations with synergistic effect in TNBC cell line and xenograft models for potential combinatorial therapy against TNBC.” We hope it satisfies the reviewer.

      c. Line 31 (and others): Please check the capitals of words describing inhibitors, and make them consistent (e.g. Imatinib written with capital I, other inhibitors without capitals).

      We thank the reviewer for catching this error. We changed all “imatinib” and “osimertinib” to lowercase.

      d. Line 71: "... combining PP2, saracatinib (FYN inhibitor), ..." Here it is not clear that PP2 is a FYN inhibitor, and, as saracatinib is a well-known Src inhibitor, it is not correct to just say "FYN inhibitor". Better to rephrase to something such as: "combining PP2 (Lck/Fyn inhibitor), saracatinib (Src/FYN inhibitor)".

      As the reviewer noted, most Src family kinase inhibitors are not selective for a specific Src family member. Therefore, we changed line 73 to “PP2, saracatinib (Src family kinase / FYN inhibitor).”

      e. Line 81: "The resulting library enabled massively parallel screens of pairwise knockouts, .." To clarify this is for the selected kinases only: "The resulting library enabled screens of pairwise knockouts of the 76 tyrosine kinase genes, .."

      Manuscript edited as suggested by the reviewer in line 86.

      f. Line 88 (and others): "after infection" consider rephrasing to "after transduction" as this is more commonly used when using lentiviral vectors only.

      We thank the reviewer for this. Every instance of “infection” that refers to lentiviral transduction was changed to “transduction”.

      g. Line 97-99: While being described as "good" correlation, a correlation of the same sgRNA pair, yet in a different order, of r=0.5 does not seem to be very good, neither does a correlation of r=0.74 for biological replicates. Please consider describing in a less subjective way.

      We removed the subjective terms and changed the manuscript as follows: “sgRNA pair (e.g., sgRNA-A + sgRNA-B and sgRNA-B + sgRNA-A) were positively correlated (r = 0.50) and were combined when calculating Z (Fig. S1D). The Z scores for three biological replicates were also correlated with r = 0.74 between replicates #2 and #3 (Fig. S1E).” in lines 97-101.
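
      As a hedged illustration of the two steps quoted above (correlating the two orientations of each sgRNA pair, then combining them), a minimal sketch follows. The manuscript does not state how the orientations were combined; averaging is an assumption made here, and the function names and example scores are hypothetical.

```python
from math import sqrt


def pearson_r(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def combine_orientations(scores: dict) -> dict:
    """Average scores of (A, B) and (B, A) into one order-independent entry.

    Averaging is an assumption; the source only says the two
    orientations "were combined".
    """
    grouped = {}
    for (first, second), score in scores.items():
        key = tuple(sorted((first, second)))
        grouped.setdefault(key, []).append(score)
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}


# Made-up scores for two orientations of the same pairs.
scores = {("sgA", "sgB"): -1.0, ("sgB", "sgA"): -3.0,
          ("sgA", "sgC"): 0.5, ("sgC", "sgA"): 1.5}
print(combine_orientations(scores))
```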

      h. Lines 92-96 and lines 102-115: The results section here contains quite a lot of technical information. While some information may be directly needed to understand the described results (such as a very short and simple explanation of how to interpret gene interaction score), other information may be more appropriate for the Methods section, to enhance the readability of the paper. Consider simplifying here and giving a more detailed overview in the Methods section. Also, the text is not entirely clear. You seem to give two separate explanations of how the GI scores were calculated (Starting in lines 106 and 111): please rephrase and clearly indicate the connections between those two explanations (in the Methods section).

      We thank the reviewer for the valuable suggestion. We moved significant portions of the technical descriptions to the methods section. We also clarified the text regarding the procedures for calculating GI scores in lines 385-387.

      i. Line 142: "These findings suggest that gene A could represent an attractive drug target.." "Gene A" should be "FYN"?

      We thank the reviewer for catching this. Indeed, it is “FYN” and we changed it in line 154.

      j. Line 149: Introduce Saracatinib, and make the reader aware that it actually mostly targets Src, and FYN with lower affinity.

      We newly added text in lines 73 and 164 to indicate that saracatinib is an inhibitor against Src family kinases.

      k. Line 469: "by the two sgRNA." "by the two sgRNAs".

      Corrected

      l. Throughout text/figures/figure legends, please check for consistency in the naming of cell lines, compounds, referring to figures etc. (E.g. MDA-MB-231/MDA MB 231/MDAMB-231 ; Fig. 1/Figure 1).

      Corrected. Thank you for catching this error.

      m. In Methods, frequently ug or uL are used instead of µg or µL

      Corrected.

      n. Legend Figure 5: Clarify what A, G, I, D, and P mean.

      Corrected in lines 685-686 to: “A: NVP-ADW742, G: gefitinib, I: imatinib, D: doxorubicin, P: Paclitaxel.”

      o. Line 303: What is meant by: "The six variable nucleotides were added in reverse primer for multiplexing". Could you clarify this in the text?

      We apologize for the confusion. The six nucleotides are an index sequence for multiplexed NGS runs. The text in lines 373-374 has been edited to: “The six nucleotides described as “NNNNNN” in reverse primer above represents unique index to identify biological replicates in multiplexed NGS run.”
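
      As a minimal sketch of how a six-nucleotide index can separate biological replicates after a multiplexed run, the Python below groups reads by an exact index match. The index sequences, replicate labels, and function names are hypothetical placeholders; the actual index sequences and the demultiplexing software used in the study are not stated here.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical index-to-replicate table (the real 6-nt indices are not given in the text).
INDEX_TO_REPLICATE: Dict[str, str] = {
    "ACGTAC": "replicate_1",
    "TGCATG": "replicate_2",
    "GATCGA": "replicate_3",
}


def assign_replicate(index_read: str) -> Optional[str]:
    """Return the replicate for a 6-nt index read, or None if it is not recognized."""
    return INDEX_TO_REPLICATE.get(index_read.upper())


def demultiplex(reads: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (index, sequence) pairs by replicate; unmatched reads go to 'undetermined'."""
    groups: Dict[str, List[str]] = {}
    for index_read, sequence in reads:
        replicate = assign_replicate(index_read) or "undetermined"
        groups.setdefault(replicate, []).append(sequence)
    return groups
```

      Exact matching is the simplest choice; tolerating a single mismatch in the index would be a possible extension but is not assumed here.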

      Reviewer #2 (Recommendations For The Authors):

      To enhance the robustness of the conclusions drawn from this study, certain concerns merit attention.

      Concerns:

      (1) Line 130 indicates that eight synergistic target gene combinations were validated. It would be helpful to clarify the criteria used to select these gene pairs and provide the rationale for studying these specific combinations of genes.

      We selected these gene pairs because sgRNAs against them were available when we performed the experiments, so we do not have a stronger rationale for the selection. Instead, we added a brief discussion in lines 304-306 noting that further validation is required for the gene pairs that were not experimentally tested.

      (2) According to Figure 2C, FYN was identified as crucial among the 30 gene pairs, and its upregulation in TNBC prompted further investigation. It would be informative to discuss the expression levels of TEK, FRK, and FGFR2 in TNBC and explain why these nodes were not studied. Is there existing evidence demonstrating the superiority of FYN over these other genes?

      A similar concern was raised by reviewer #1. The expression levels of TEK, FRK, and FGFR2 were relatively low in MDA-MB-231 cells and in TNBCs in general, and we were concerned about the generalizability of these targets for treating TNBC. While validating these genes for possible synthetic lethality may yield valuable insight, it is beyond the scope of this paper. This concern is now discussed in the Results and Discussion sections in lines 150-154.

      (3) The screening process employed only one cell line, and validation was conducted with only one cell line (Figure 2A). Consider supplementing the findings with more convincing evidence from other breast cancer cell lines to strengthen the conclusions.

      Although the CRISPR screens and primary validations were done in only one cell line, further validations with drug combinations were done in independent cancer cell lines such as Hs578T (Figures S4E-J). In addition, the possible association of FYN expression with drug-tolerant cells was demonstrated in lung cancer cells. We hope this satisfies the reviewer.

      (4) The network analysis in Figure 2C lacks a description of the methodology used. It would be beneficial to provide a brief explanation of the methods employed for this analysis.

      The network analysis was done manually, with the size of each node proportional to the number of gene pairs. We added text to the figure legend in line 638 to clarify this.
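
      The node-sizing rule can be illustrated with a short Python sketch that counts how many gene pairs each gene appears in and scales the node size by that count. This is only an illustration of the stated rule, not the procedure used for the figure (which was drawn manually); the partner names `GENE_1` to `GENE_3`, the example pairs, and the `scale` parameter are hypothetical placeholders.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def node_sizes(gene_pairs: Iterable[Tuple[str, str]], scale: float = 1.0) -> Dict[str, float]:
    """Return a node size per gene, proportional to the number of pairs it belongs to."""
    counts: Counter = Counter()
    for gene_a, gene_b in gene_pairs:
        counts[gene_a] += 1
        counts[gene_b] += 1
    return {gene: scale * n for gene, n in counts.items()}


# Placeholder pairs for illustration only; these are not the pairs from the screen.
example_pairs = [
    ("FYN", "GENE_1"),
    ("FYN", "GENE_2"),
    ("FYN", "GENE_3"),
    ("TEK", "GENE_1"),
]
example_sizes = node_sizes(example_pairs)
```

      With these placeholder pairs, the FYN node would be drawn three times as large as a node that appears in a single pair.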

      (5) The significance of gene A mentioned in line 142 is unclear. Please provide a clear explanation or context for the importance of this gene.

      This mistake was also pointed out by reviewer #1. “Gene A” should have been “FYN”. We corrected this in line 154.

      (6) In Figure 2J and Figure 2K, it would be more informative to measure the phosphorylation levels of FYN and SRC rather than just their baseline levels. Consider revising the figures accordingly.

      We thank the reviewer for the careful comment. We now provide supplementary Figure S5A to show that the phosphorylation level of FYN is increased; however, this increase was proportional to the increase in FYN protein level, so the pFYN/FYN ratio did not change significantly. We discuss this result in lines 187-190.
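
      The point about proportionality can be made concrete with a small Python sketch: if the phospho signal and the total protein signal rise by the same factor, their ratio is unchanged. The signal values below are arbitrary illustrative numbers, not densitometry data from the study, and `phospho_ratio` is a hypothetical helper.

```python
def phospho_ratio(phospho_signal: float, total_signal: float) -> float:
    """Return the phospho/total ratio (e.g., pFYN/FYN) from two band intensities."""
    if total_signal <= 0:
        raise ValueError("total_signal must be positive")
    return phospho_signal / total_signal


# Arbitrary illustrative intensities: both signals triple after treatment.
control_ratio = phospho_ratio(phospho_signal=1.0, total_signal=2.0)
treated_ratio = phospho_ratio(phospho_signal=3.0, total_signal=6.0)
```

      Here both ratios equal 0.5, even though the absolute phospho signal is threefold higher after treatment.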

      (7) Figure S4B lacks biological replicates, which could impact the reliability of the experimental results. Consider adding biological replicates to enhance the robustness of the findings.

      This was also pointed out by reviewer #1. Instead of performing qPCR for all drugs, we focused on validating the decrease in FYN mRNA level for drug combinations that synergistically kill cancer cells. We also aimed to identify a direct mediator of FYN mRNA upregulation, so we focused on drug combinations that include an inhibitor of an epigenetic regulator that promotes transcription. To this end, we tested the impact of GSK-J4 (KDM6 inhibitor) and pinometostat (DOT1L inhibitor), each in combination with a TKI, on FYN expression level. Notably, while GSK-J4 attenuated the FYN mRNA accumulation induced by NVP-ADW742 treatment, pinometostat failed to do so (Figure S5C). We describe these results in lines 192-197 of the Results section.

      (8) Line 186 indicates that KDM3 knockout was not tested in Figure S5A. It would be helpful to provide an explanation for this omission or consider including the data if available.

      We thank the reviewer for pointing this out. The T7 endonuclease assay results for KDM3, KDM4, and PHF8 have been added in Figure S6B. All guide RNAs used in the study efficiently generated indel mutations.

      (9) In line 206, KDM4A is introduced, but Figures 3J and 3M had already pointed to KDM4A. The authors did not analyze the ChIP results for other members of the KDM4 family at this point. Please address this inconsistency and provide a rationale for focusing on KDM4A. Additionally, in Figure 3M, consider adding peak labeling to the enriched portion for clarity.

      We appreciate the reviewer’s careful reading. KDM4 family enzymes catalyze identical reactions and are thought to be redundant. We therefore judged that the most abundantly expressed gene in the KDM4 family should be the primary target. To this end, we analyzed the expression levels of KDM4 family genes in supplementary Figure S6A. Indeed, KDM4A expression was the highest among the KDM4 family genes. We discuss this in the Results section in lines 218-220.

      (10) The author only indicated the relationship between the H3K9me3 level in the enhancer region and FYN expression. It would be valuable to verify the activity of the enhancers and investigate additional markers such as H3K27ac and H3K4me1. Consider discussing these aspects to provide a more comprehensive understanding.

      Since we and others had shown that histone demethylases are increased upon drug treatment, we focused on histone methylation marks that are associated with gene repression and whose removal by demethylases is associated with drug resistance. In this respect, KDM6 demethylases, which remove H3K27me3, could be an attractive alternative. In our newly added supplementary Figure S6E, ADW742 treatment did not decrease the H3K27me3 level at the FYN promoter, indicating that H3K9me3 may be the dominant epigenetic change that modulates FYN expression upon drug treatment. This is briefly discussed in lines 233-235.

      (11) In Figure 4A, the addition of the drug alone does not inhibit tumor growth. Please provide an explanation for this result and consider discussing potential reasons for the observed lack of inhibition.

      The drug doses were carefully adjusted to minimize tumor shrinkage by each single drug so that synergistic tumor shrinkage would be clearer.

      (12) Line 208 indicates missing parentheses in the text describing Figure 4C. Please correct the text accordingly to ensure clarity.

      Corrected. Thank you for catching this error.

      (13) The figure legends for Figures 5E, F, G, and H contain errors. Please correct the figure legends to accurately describe the respective figures.

      We thank the reviewer for catching this error. We have changed the figure legends in lines 691-697 to accurately describe the figures.

      (14) It may be beneficial for the authors to divide the results section into several subsections and add headings to improve the overall understanding of the findings.

      This is an excellent suggestion. We divided our results section into subsections and added headings in lines 80, 141, 181, 237 and 251 to help readers understand our findings.

      (15) The authors should include the sgRNA sequences used for gene targeting, along with details of the target genes and negative/positive controls, in the Supplementary Materials to enhance reproducibility and transparency.

      This is a critical point for improving reproducibility of our work. The sgRNA sequences used in the study are newly added in supplementary table S3.

      (16) The resolution of the figures in the Supplementary Materials is too low, which may impede the authors' ability to interpret the data. Consider providing higher-resolution figures for better readability.

      Reviewer #1 raised a similar concern. We have provided higher-resolution images for all main and supplementary figures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors constructed a novel HSV-based therapeutic vaccine to cure SIV in a primate model. The novel HSV vector is deleted for ICP34.5. Evidence is given that this protein blocks HIV reactivation by interference with the NF-kB pathway. The deleted construct supposedly would reactivate SIV from latency. The SIV genes carried by the vector ought to elicit a strong immune response. Together the HSV vector would elicit a shock and kill effect. This is tested in a primate model.

      Thank you for your kind comments and suggestions, which are very helpful in improving our manuscript. We have carefully revised our manuscript and performed additional experiments accordingly, and we now think this version has been substantially improved for your reconsideration.

      Strengths and weaknesses:

      (1) Deleting ICP34.5 from the HSV construct has a very strong effect on HIV reactivation. Why is no eGFP readout given in Figure 1C as for WT HSV? The mechanism underlying increased activation by deleting ICP34.5 is only partially explored. Overexpression of ICP34.5 has a much smaller effect (reduction in reactivation) than deletion of ICP34.5 (strong activation); so the story seems incomplete.

      Thank you for your careful review and kind reminder.

      (1) We are sorry for the misunderstanding regarding Figure 1C. In the experiment of Figure 1C, we used an HSV-1 17 strain containing GFP (HSV-GFP) and HSV-ΔICP34.5 (a recombinant HSV-1 17 strain with ICP34.5 deleted, based on HSV-GFP) to reactivate the HIV latency cell line (J-Lat 10.6 cells). Since GFP detection cannot distinguish between HSV infection and HIV reactivation, we assessed reactivation by measuring the mRNA levels of HIV LTR upon stimulation with either HSV-GFP or HSV-ΔICP34.5. In Figure 1B, we had already verified the reactivation efficacy by infecting J-Lat 10.6 cells with HSV-GFP and found significant upregulation of the mRNA levels of HIV-1 LTR, Tat, Gag, Vif, and Vpr. We have adjusted the corresponding descriptions accordingly in the revised manuscript.

      (2) We agree with your insightful comment that the mechanism underlying the increased activation by HSV-ΔICP34.5 is worth exploring further in future studies. In this study, we found that ICP34.5 plays an antagonistic role in the reactivation of HIV latency by HSV-1, mainly through the modulation of the host NF-κB and HSF1 pathways, while HSV-1 (especially HSV-ΔICP34.5) might reactivate HIV latency through NF-κB, HSF1, and other yet-to-be-determined mechanisms. Thus, ICP34.5 overexpression has only a partial effect on reducing HIV latency reactivation by HSV-1. We have addressed this issue in the revised Discussion section: “Intriguingly, these findings collectively indicated that ICP34.5 might play an antagonistic role in the reactivation of HIV by HSV-1, and thus our modified HSV-ΔICP34.5 constructs can effectively reactivate HIV/SIV latency through the release of imprisonment from ICP34.5. However, ICP34.5 overexpression had only a partial effect on the reduction of the HIV latency reactivation, indicating that HSV-ΔICP34.5-based constructs can also reactivate HIV latency through other yet-to-be-determined mechanisms.” (Lines 334 to 340).

      (2) No toxicity data are given for deleting ICP34.5. How specific is the effect for HIV reactivation? An RNA seq analysis is required to show the effect on cellular genes.

      Thank you for your questions and suggestions.

      (1) It is well known that ICP34.5 is a neurotoxicity factor that can antagonize host immune responses, and previous studies (in gene therapy and oncolytic virotherapy) have shown that the safety of recombinant HSV-based vectors can be improved by deleting ICP34.5. In this study, we also found that HSV-ΔICP34.5 exhibited lower virulence and replication ability than its parental strain (HSV-GFP) (Figure 1D, Figure S1). In addition, HSV-ΔICP34.5 induced a lower level of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH compared to HSV-GFP stimulation, likely due to its lower virulence and replication ability (Figure 1I-K). Furthermore, the CD4+/CD8+ T cell ratio (Figure 5I) and body weight (Figure S9) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of the ART+HSV-sPD1-SIVgag/SIVenv group (Figure S10). Thus, these data suggest that the safety profile of HSV-ΔICP34.5 in PLWH might be tolerable. We have added the corresponding description in the revised manuscript.

      (2) In our study, we found that neither adenovirus nor vaccinia virus could reactivate HIV latency (Figure S3). In addition, deletion of the ICP0 gene from HSV-1 diminished the reactivation of HIV latency by HSV-1 (Figure S4). Thus, these data suggest that the reactivation of HIV latency by HSV-1 might be virus-specific. Of course, this could be further investigated in future studies. We have added the corresponding description in the revised manuscript.

      (3) To explore the mechanism of reactivating viral latency by HSV-ΔICP34.5-based constructs, we performed RNA-seq analysis (Figure S5). We have added the corresponding description in the revised manuscript.

      (3) The primate groups are too small and the results too variable to make averages. In Figure 5, the group with ART and saline has two slow rebounders. It is not correct to average those with a single quick rebounder. Here the interpretation is NOT supported by the data.

      We agree with you that this is a pilot study with a limited number of rhesus macaques. Although the number of macaques was relatively limited, these nine macaques were distributed evenly based on background levels of age, sex, weight, CD4 count, and viral load (VL) (Table S2). All SIV-infected macaques used in this study had a long history of SIV infection and had received several courses of ART, which mimics the treatment of chronic HIV-1 infection in humans. These macaques had been infected with SIVmac239 for more than 5 years, and highly pathogenic SIV-infected macaques have been well validated as a stringent model that recapitulates HIV-1 pathogenesis and persistence during ART in humans. Indeed, in our Chinese rhesus model, ART effectively suppressed SIV infection to undetectable levels in plasma, and upon ART discontinuation the virus rapidly rebounded, which is very similar to what is seen in ART-treated HIV patients. We think the results of this pilot study are promising and justify further studies, which will expand the number of animals and then proceed to preclinical and clinical studies in our next projects. Thank you for your understanding.

      As for your question regarding “the two animals with low VL and slow rebound”, our explanation is as follows. As mentioned above, these macaques were distributed evenly based on the background levels of CD4 count and VL (Table S2), and the groups then showed different changes in viral load and viral rebound. Thus, we think these data support our interpretation. Moreover, our conclusion is supported by at least three lines of evidence.

      (1) The VL in the ART+saline group promptly rebounded after ART discontinuation, with an average 8.63-fold increase in the rebounded peak VL compared with the pre-ART VL (Figure 5A, D and E). However, plasma VL in the ART+HSV-sPD1-SIVgag/SIVenv group exhibited a delayed rebound interval (Figure 5B-D).

      (2) The rebounded peak VL was lower than the pre-ART VL in the ART+HSV-sPD1-SIVgag/SIVenv group (average 12.20-fold decrease), whereas it was higher than the pre-ART VL in the ART+HSV-empty group (average 2.74-fold increase) (Figure 5E).

      (3) We found significant suppression of total SIV DNA and integrated SIV DNA provirus in the ART+HSV-sPD1-SIVgag/SIVenv group. However, the copies of the SIV DNA provirus were significantly increased in the ART+HSV-empty group and ART+saline group (Figure 5F-G).

      Thank you for your understanding.
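
      To make the fold-change comparison above easier to follow, the Python sketch below computes a per-animal fold change (rebound peak VL divided by pre-ART VL) and averages it within a group. Whether the reported averages were computed as a mean of per-animal ratios is an assumption, and the viral load values are arbitrary placeholders rather than data from the study.

```python
from statistics import mean
from typing import Iterable, Tuple


def fold_change(pre_art_vl: float, rebound_peak_vl: float) -> float:
    """Return rebound peak VL relative to pre-ART VL (values above 1 mean an increase)."""
    if pre_art_vl <= 0:
        raise ValueError("pre_art_vl must be positive")
    return rebound_peak_vl / pre_art_vl


def mean_fold_change(animals: Iterable[Tuple[float, float]]) -> float:
    """Average the per-animal fold changes for (pre_art_vl, rebound_peak_vl) pairs."""
    return mean(fold_change(pre, peak) for pre, peak in animals)


def describe(fc: float) -> str:
    """Express a fold change as an 'N-fold increase' or 'N-fold decrease'."""
    if fc >= 1:
        return f"{fc:.2f}-fold increase"
    return f"{1 / fc:.2f}-fold decrease"


# Placeholder group of three animals (copies/mL), for illustration only.
placeholder_group = [(1e4, 2e4), (1e4, 4e4), (1e4, 6e4)]
group_summary = describe(mean_fold_change(placeholder_group))
```

      With only three animals per group, a single outlier can dominate an arithmetic mean of ratios, so per-animal values are worth reporting alongside any group average.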

      Discussion

      HSV vectors are mainly used in cancer treatment partially due to induced inflammation. Whether these are suitable to cure PLWH without major symptoms is a bit questionable to me and should at least be argued for.

      Thank you for your kind comment and question. We confirmed the enhanced reactivation of HIV latency by HSV-ΔICP34.5 in primary CD4+ T cells from people living with HIV (PLWH) (Figure S2). As mentioned above, previous studies have shown that the safety of recombinant HSV-based vectors can be improved by deleting ICP34.5. In this study, we also found that HSV-ΔICP34.5 exhibited lower virulence and replication ability than its parental strain (HSV-GFP) (Figure 1D, Figure S1). In addition, HSV-ΔICP34.5 induced a lower level of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH compared to HSV-GFP stimulation, likely due to its lower virulence and replication ability (Figure 1I-K). Furthermore, the CD4+/CD8+ T cell ratio (Figure 5I) and body weight (Figure S9) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of the ART+HSV-sPD1-SIVgag/SIVenv group (Figure S10). Thus, these data suggest that the safety profile of HSV-ΔICP34.5 in PLWH might be tolerable. We have added the corresponding description in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this article, Wen et al. describe the development of a 'proof-of-concept' bi-functional vector based on HSV-deltaICP-34.5's ability to purge latent HIV-1 and SIV genomes from cells. They show that co-infection of latent J-lat T-cell lines with an HSV-deltaICP-34.5 vector can reactivate HIV-1 from a latent state. Over- or stable expression of ICP 34.5 ORF in these cells can arrest latent HIV-1 genomes from transcription, even in the presence of latency reversal agents. ICP34.5 can co-IP with- and de-phosphorylate IKKa/b to block its interaction with NF-kB transcription factor. Additionally, ICP34.5 can interact with HSF1 which was identified by mass-spec. Thus, the authors propose that the latency reversal effect of HSV-deltaICP-34.5 in co-infected JLat cells is due to modulatory effects on the IKKa/b-NF-kB and PP1-HSF-1 pathway.

      Next, the authors cleverly construct a bifunctional HSV-based vector with deleted ICP34.5 and 47 ORFs to purge latency and avoid immunological refluxes, and additionally, expand the application of this construct as a vaccine by introducing SIV genes. They use this 'vaccine' in mouse models and show the expected SIV-immune responses. Experiments in rhesus macaques (RM), further elicit the potential for their approach to reactivate SIV genomes and at the same time block their replication by antibodies. What was interesting in the SIV experiments is that the dual-functional vector vaccine containing sPD1- and SIV Gag/Env ORFs effectively delayed SIV rebound in RMs and in some cases almost neutralized viral DNA copy detection in serum. Very promising indeed, however, there are some questions I wish the authors had explored to get answers to, detailed below.

      Overall, this is an elegant and timely work demonstrating the feasibility of reducing virus rebound in animals, with the potential to expand to clinical studies. The work was well-written, and sections were clearly discussed.

      Strengths:

      The work is well designed, rationale explained, and written very clearly for lay readers. Claims are adequately supported by evidence and well-designed experiments including controls.

      Thank you for your nice comments regarding our work.

      Weaknesses:

      (1) While the mechanism of ICP34.5 interaction and modulation of the NF-kB and HSF1 pathways are shown, this only proves ICP34.5 interactions but does not give away the mechanism of how the HSV-deltaICP-34.5 vector purges HIV-1 latency. What other components of the vector are required for latency reversal? Perhaps serial deletion experiments of the other ORFs in the HSV-deltaICP-34.5 vector might be revealing.

      Thank you for your valuable suggestion. In fact, we are currently exploring additional HSV-1 genes that might play a role in the reactivation of HIV latency. We have found that deletion of the ICP0 gene from HSV-1 diminished the reactivation of HIV latency by HSV-1 (Figure S4), showing that ICP0 might play a vital role in the reactivation. Of course, this could be further investigated in future studies. We have added the corresponding description in the revised manuscript.

      (2) The efficacy of the HSV vaccine vectors was evaluated in Rhesus Macaque model animals. Animals were chronically infected with SIV (a parent of HIV), treated with ART, challenged with bi-functional HSV vaccine or controls, and discontinued treatment, and the resulting virus burden and immune responses were monitored. The animals showed SIV Gag and Env-specific immune responses, and delayed virus rebound (however rebound is still there), and below-detection viral DNA copies. What would make a more convincing argument to this reviewer will be data to demonstrate that after the bi-functional vaccine, the animals show overall reduction in the number of circulating latent cells. The feasibility of obtaining such a result is not clearly demonstrated.

      Thank you for your valuable comment. We have now provided more data on this issue. We found significant suppression of total SIV DNA and integrated SIV DNA provirus in the ART+HSV-sPD1-SIVgag/SIVenv group. However, the copies of the SIV DNA provirus were significantly increased in the ART+HSV-empty group and ART+saline group (Figure 5F-G). We have added the corresponding description in the revised manuscript.

      (3) The authors state that the reduced virus rebound detected following bi-functional vaccine delivery is due to latent genomes becoming activated and steady-state neutralization of these viruses by antibody response. This needs to be demonstrated. Perhaps cell-culture experiments from specimens taken from animals might help address this issue. In lab cultures one could create environments without antibody responses, under these conditions one would expect a higher level of viral loads to be released in response to the vaccine in question.

      Thank you for your kind comment and suggestion. We performed the following cell experiment to address this issue. Primary CD4+ T cells from people living with HIV (PLWH) were isolated and then infected with HSV or HSV-∆ICP34.5 constructs. As expected, we confirmed the enhanced reactivation of HIV latency by HSV-∆ICP34.5 (Figure S2). Thank you.

      (4) How do the authors imagine neutralizing HIV-1 envelope epitopes by a similar strategy? A discussion of this point may also help.

      Thank you for your kind comment. We have added the corresponding discussion in the revised manuscript. “The current consensus on HIV/AIDS vaccines emphasizes the importance of simultaneously inducing broadly neutralizing antibodies and cellular immune responses. Therefore, we believe that incorporating the induction of broadly neutralizing antibodies into our future optimizing approaches may lead to better therapeutic outcomes.” (Lines 384 to 388)

      (5) I thought the empty HSV-vector control also elicited somewhat delayed kinetics in virus rebound and neutralization, can the authors comment on why this is the case?

      Thank you for your careful review and comment. We agree that the empty HSV-1 vector does exhibit a somewhat delayed rebound. A possible reason is that, although the empty HSV vector cannot elicit SIV-specific CTL responses, it effectively activates the latent SIV reservoirs, and these activated virions can then be partially killed by ART drugs. Therefore, even without carrying HIV/SIV antigens, somewhat delayed kinetics of virus rebound may be observed. Thank you.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should provide toxicity data for HSV transduction after deleting ICP34.5 and provide an explanation of why overexpression of ICP34.5 has such a small effect.

      Thank you for your questions and suggestions. As mentioned above, we now provide data on the safety of HSV-ΔICP34.5-based constructs.

      (1) It is well known that ICP34.5 is a neurotoxicity factor that can antagonize host immune responses, and previous studies (in gene therapy and oncolytic virotherapy) have shown that the safety of recombinant HSV-based vectors can be improved by deleting ICP34.5. In this study, we also found that HSV-ΔICP34.5 exhibited lower virulence and replication ability than its parental strain (HSV-GFP) (Figure 1D, Figure S1). In addition, HSV-ΔICP34.5 induced a lower level of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH compared to HSV-GFP stimulation, likely due to its lower virulence and replication ability (Figure 1I-K). Furthermore, the CD4+/CD8+ T cell ratio (Figure 5I) and body weight (Figure S9) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of the ART+HSV-sPD1-SIVgag/SIVenv group (Figure S10). Thus, these data suggest that the safety profile of HSV-ΔICP34.5 in PLWH might be tolerable. We have added the corresponding description in the revised manuscript.

      (2) We agree with your insightful comment that the mechanism underlying the increased activation by HSV-ΔICP34.5 is worth exploring further in future studies. In this study, we found that ICP34.5 plays an antagonistic role in the reactivation of HIV latency by HSV-1, mainly through the modulation of the host NF-κB and HSF1 pathways, while HSV-1 (especially HSV-ΔICP34.5) might reactivate HIV latency through NF-κB, HSF1, and other yet-to-be-determined mechanisms. Thus, ICP34.5 overexpression has only a partial effect on reducing HIV latency reactivation by HSV-1. We have addressed this issue in the revised Discussion section: “Intriguingly, these findings collectively indicated that ICP34.5 might play an antagonistic role in the reactivation of HIV by HSV-1, and thus our modified HSV-ΔICP34.5 constructs can effectively reactivate HIV/SIV latency through the release of imprisonment from ICP34.5. However, ICP34.5 overexpression had only a partial effect on the reduction of the HIV latency reactivation, indicating that HSV-ΔICP34.5-based constructs can also reactivate HIV latency through other yet-to-be-determined mechanisms.” (Lines 334 to 340).

      (2) How specific is the effect for HIV reactivation? An RNA seq analysis is required to show the effect on cellular genes.

      Thank you for your questions and suggestions.

      (1) In our study, we found that neither adenovirus nor vaccinia virus could reactivate HIV latency (Figure S3). In addition, deletion of the ICP0 gene from HSV-1 diminished the reactivation of HIV latency by HSV-1 (Figure S4). Thus, these data suggest that the reactivation of HIV latency by HSV-1 might be virus-specific. Of course, this could be further investigated in future studies. We have added the corresponding description in the revised manuscript.

      (2) To explore the mechanism of reactivating viral latency by HSV-ΔICP34.5-based constructs, we performed RNA-seq analysis (Figure S5). The results showed numerous differentially expressed genes (DEGs) in response to HSV-ΔICP34.5 infection: 2288 genes were upregulated and 611 genes were downregulated. GO analysis showed enrichment of these DEGs in cell cycle, cellular development, and cellular proliferation, and KEGG enrichment analysis indicated enrichment in pathways such as cell cycle and cytokine-cytokine receptor interaction. We have added the corresponding description in the revised manuscript.
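
      For readers unfamiliar with how DEGs are typically split into upregulated and downregulated sets, the Python sketch below applies a log2 fold-change cutoff and an adjusted p-value cutoff. The thresholds (|log2FC| of at least 1, adjusted p below 0.05), the gene names, and the function name are assumptions for illustration; the criteria actually used for Figure S5 are not stated in this response.

```python
from typing import Iterable, List, Tuple


def classify_degs(
    results: Iterable[Tuple[str, float, float]],
    log2fc_cutoff: float = 1.0,   # assumed threshold, not taken from the study
    padj_cutoff: float = 0.05,    # assumed threshold, not taken from the study
) -> Tuple[List[str], List[str]]:
    """Split (gene, log2_fold_change, adjusted_p) rows into up- and downregulated genes."""
    up: List[str] = []
    down: List[str] = []
    for gene, log2fc, padj in results:
        if padj >= padj_cutoff:
            continue  # not significant after multiple-testing correction
        if log2fc >= log2fc_cutoff:
            up.append(gene)
        elif log2fc <= -log2fc_cutoff:
            down.append(gene)
    return up, down
```

      Counting the lengths of the two returned lists gives the kind of up/down tallies reported above, although the exact numbers depend on the cutoffs chosen.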

      (3) A comparison in primates has to be given for constructs with or without ICP34.5 to validate cell culture data (what is an empty vector?)

      Thank you for your reminder. In the revised manuscript, we performed the following cell experiment to address this issue. Primary CD4+ T cells from people living with HIV (PLWH) were isolated, and then infected with HSV or HSV-∆ICP34.5 constructs. As expected, we confirmed the enhanced reactivation of HIV latency by HSV-∆ICP34.5 (Figure S2). Thank you.

      (4) Legends should be improved in writing and content.

      Thank you for your kind comment. In the revised version, we have improved the manuscript content, and the legends of all figures have been carefully revised in both writing and content. Thank you.

      (5) The primate groups should be enlarged before any reliable conclusions can be made. Inflammatory/tox data should be provided.

      Thank you for your question.

      (1) As mentioned above, we agree with you that this is a pilot study with a limited number of rhesus macaques. Although the number of macaques was relatively limited, these nine macaques were distributed evenly based on background levels of age, sex, weight, CD4 count, and viral load (VL) (Table S2). All SIV-infected macaques used in this study had a long history of SIV infection and had received several courses of ART, which mimics the treatment of chronic HIV-1 infection in humans. These macaques had been infected with SIVmac239 for more than 5 years, and highly pathogenic SIV-infected macaques have been well validated as a stringent model that recapitulates HIV-1 pathogenesis and persistence during ART in humans. Indeed, in our Chinese rhesus model, ART effectively suppressed SIV infection to undetectable levels in plasma, and upon ART discontinuation the virus rapidly rebounded, which is very similar to what is seen in ART-treated HIV patients. We think the results of this pilot study are promising and justify further studies, which will expand the number of animals and then proceed to preclinical and clinical studies in our next projects. Thank you for your understanding.

      (2) As is well known, ICP34.5 is a neurotoxicity factor that can antagonize host immune responses, and previous studies have shown that the safety of recombinant HSV-based vectors can be improved by deleting ICP34.5. In this study, we also found that HSV-ΔICP34.5 exhibited lower virulence and replication ability than its parental strain (HSV-GFP) (Figure 1D, Figure S1). In addition, HSV-ΔICP34.5 induced a lower level of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH compared to HSV-GFP stimulation, likely due to its lower virulence and replication ability (Figure 1I-K). Furthermore, the CD4+/CD8+ T cell ratio (Figure 5I) and body weight (Figure S9) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of the ART+HSV-sPD1-SIVgag/SIVenv group (Figure S10). Thus, these data suggest that the safety profile of HSV-ΔICP34.5 in PLWH might be tolerable. We have added the corresponding description in the revised manuscript.

      (6) Discuss the potential of inflammatory HSV vaccines to be used in PLWH without clinical symptoms.

      Thank you for raising this point. As discussed above, we found that HSV-ΔICP34.5 exhibited lower virulence and replication ability than its parental strain (Figure 1D, Figure S1), and we also found that HSV-ΔICP34.5 induced a lower level of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH compared to HSV-GFP stimulation, likely due to its lower virulence and replication ability (Figure 1I-K). In addition, the CD4+/CD8+ T cell ratio (Figure 5I) and body weight (Figure S9) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of the ART+HSV-sPD1-SIVgag/SIVenv group (Figure S10). Thus, these data suggest that the safety profile of HSV-ΔICP34.5 in PLWH might be acceptable. We have added the corresponding description in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      I think the authors have done due diligence to the experimental system, and collected evidence to show the feasibility of delaying virus rebound in macaques. However, I would encourage the authors to perform experiments that can back up the claim that delayed virus rebound is due to neutralization effects, or perhaps due to a reduction in viral reservoir. I believe insights into this process will add rigor, and push the relevance of the study to the next level.

      Thank you for your kind comment and valuable suggestion. We have now provided more data on this issue. We found significant suppression of total SIV DNA and integrated SIV DNA provirus in the ART+HSV-sPD1-SIVgag/SIVenv group. However, the copies of the SIV DNA provirus were significantly increased in the ART+HSV-empty group and ART+saline group (Figure 5F-G). In the revised Discussion section, we also discuss that incorporating the induction of broadly neutralizing antibodies into our future optimized approaches may lead to better therapeutic outcomes. We have added the corresponding description in the revised manuscript. Thank you.

      Altogether, all of the above comments and suggestions have been very helpful in improving our manuscript. We have taken these comments seriously and tried our best to address these questions point by point. After making extensive revisions, we now submit this revised manuscript for your reconsideration. Thank you again for all of your comments and suggestions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study uses single nucleus multiomics to profile the transcriptome and chromatin accessibility of mouse XX and XY primordial germ cells (PGCs) at three time-points spanning PGC sexual differentiation and entry of XX PGCs into meiosis (embryonic days 11.5-13.5). They find that PGCs can be clustered into sub-populations at each time point, with higher heterogeneity among XX PGCs and more switch-like developmental transitions evident in XY PGCs. In addition, they identify several transcription factors that appear to regulate sex-specific pathways as well as cell-cell communication pathways that may be involved in regulating XX vs XY PGC fate transitions. The findings are important and overall rigorous. The study could be further improved by a better connection to the biological system, including the addition of experiments to validate the 'omics-based findings in vivo and putting the transcriptional heterogeneity of XX PGCs in the context of findings that meiotic entry is spatially asynchronous in the fetal ovary. Overall, this study represents an advance in germ cell regulatory biology and will be a highly used resource in the field of germ cell development.

      Strengths:

      (1) The multiomics data is mostly rigorously collected and carefully interpreted.

      (2) The dataset is extremely valuable and helps to answer many long-standing questions in the field.

      (3) In general, the conclusions are well anchored in the biology of the germ line in mammals.

      Weaknesses:

      (1) The nature of replicates in the data and how they are used in the analysis are not clearly presented in the main text or methods. To interpret the results, it is important to know how replicates were designed and how they were used. Two "technical" replicates are cited but it is not clear what this means.

      The two independent technical replicates comprised different pools of paired gonads. This sentence was added to the methods section of the revised manuscript.

      (2) Transcriptional heterogeneity among XX PGCs is mentioned several times (e.g., lines 321-323) and is a major conclusion of the paper. It has been known for a long time that XX PGCs initiate meiosis in an anterior-to-posterior wave in the fetal ovary starting around E13.5. Some heterogeneity in the XX PGC populations could be explained by spatial position in the ovary without having to invoke novel subpopulations.

      We thank the reviewer for pointing out this important biological phenomenon. We also recognize that transcriptional heterogeneity among XX PGCs is likely due to the anterior-to-posterior wave of meiotic initiation in E13.5 ovaries and highlight this possibility in our manuscript. However, since our study utilizes single-nucleus RNA-sequencing and not spatial transcriptomics, we are not able to capture the spatial location of the XX PGCs analyzed in our dataset. As such, our analysis applied clustering tools to classify the populations of XX PGCs captured in our dataset. 

      (3) There is essentially no validation of any of the conclusions. Heterogeneity in the expression of a given marker could be assessed by immunofluorescence or RNAscope.

      In our revised manuscript, we included immunofluorescence staining of potential candidate factors involved in PGC sex determination, such as PORCN and TFAP2C. Testing and optimizing antibodies for the targets identified in this study are ongoing efforts in our lab and we look forward to sharing our results with the research community.

      (4) The paper sometimes suffers from a problem common to large resource papers, which is that the discussion of specific genes or pathways seems incomplete. An example here is from the analysis of the regulation of the Bnc2 locus, which seems superficial. Relatedly, although many genes and pathways are nominated for important PGC functions, there is no strong major conclusion from the paper overall.

      In this manuscript, we set out to use computational tools to identify candidate factors, some already known and many others unknown, involved in the developmental pathways of PGC sex determination. Our goal, as a research group and with future collaborators, is to screen these interesting candidates and discover their function in the primordial germ cell. The research presented in this study represents a launching pad from which to identify future projects that will investigate these factors in further detail.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript by Alexander et al describes a careful and rigorous application of multiomics to mouse primordial germ cells (PGCs) and their surrounding gonadal cells during the period of sex differentiation.

      Strengths:

      In thoughtfully designed figures, the authors identify both known and new candidate gene regulatory networks in differentiating XX and XY PGCs and sex-specific interactions of PGCs with supporting cells. In XY germ cells, novel findings include the predicted set of TFs regulating Bnc2, which is known to promote mitotic arrest, as well as the TFs POU6F1/2 and FOXK2 and their predicted targets that function in mitosis and signal transduction. In XX germ cells, the authors deconstruct the regulation of the premeiotic replication regulator Stra8, which reveals TFs involved in meiosis, retinoic acid signaling, pluripotency, and epigenetics among predictions; this finding, along with evidence supporting the regulatory potential of retinoic acid receptors in meiotic gene expression is an important addition to the debate over the necessity of retinoic acid in XX meiotic initiation. In addition, a self-regulatory network of other TFs is hypothesized in XX differentiating PGCs, including TFAP2c, TCF5, ZFX, MGA, and NR6A1, which is predicted to turn on meiotic and Wnt signaling targets. Finally, analysis of PGC-support cell interactions during sex differentiation reveals more interactions in XX, via WNTs and BMPs, as well as some new signaling pathways that predominate in XY PGCs including ephrins, CADM1, Desert Hedgehog, and matrix metalloproteases. This dataset will be an excellent resource for the community, motivating functional studies and serving as a discovery platform.

      Weaknesses:

      My one major concern is that the conclusion that PGC sex differentiation (as read out by transcription) involves chromatin priming is overstated. The evidence presented in the figures includes a select handful of genes including Porcn, Rimbp1, Stra8, and Bnc2 for which chromatin accessibility precedes expression. Given that the authors performed all of their comparisons between XX versus XY datasets at each timepoint, have they missed an important comparison that would be a more direct test of chromatin priming: between timepoints for each sex? Furthermore, it remains possible that common mechanisms of differentiation to XX and XY could be missing from this analysis that focused on sex-specific differences.

      We thank the reviewer for their thoughtful assessment and suggestions. We note that chromatin priming in PGCs prior to sex determination is a well-documented research finding (see references below) that is further supported by our single-nucleus multiomics data. To support these findings previously stated in the scientific literature, we included data demonstrating the asynchronous correlation between chromatin accessibility and gene expression during PGC sex determination. Specifically, we investigated the associations of differentially accessible chromatin peaks with differentially expressed genes for each PGC type (between sexes and across embryonic stages) using computational tools and methods that are well established and applied by the research community. In our manuscript, we note that the patterns we identified support the potential role of chromatin priming in PGC sex determination. Nevertheless, we further highlight that a comprehensive profile of 3D chromatin structure and enhancer-promoter contacts in differentiating PGCs is needed to fully understand how changes to chromatin facilitate PGC sex determination.

      References:

      (1) Chen, M., et al. Integration of single-cell transcriptome and chromatin accessibility of early gonads development among goats, pigs, macaques, and humans. Cell Reports 41 (2022).

      (2) Huang, T.-C. et al. Sex-specific chromatin remodelling safeguards transcription in germ cells. Nature 600, 737–742 (2021).

      Reviewer #3 (Public Review):

      Summary:

      Alexander et al. reported the gene-regulatory networks underpinning sex determination of murine primordial germ cells (PGCs) through single-nucleus multiomics, offering a detailed chromatin accessibility and gene expression map across three embryonic stages in both male (XY) and female (XX) mice. It highlights how regulatory element accessibility may precede gene expression, pointing to chromatin accessibility as a primer for lineage commitment before differentiation. Sexual dimorphism in these elements and gene expression increases over time, and the study maps transcription factors regulating sexually dimorphic genes in PGCs, identifying sex-specific enrichment in various transcription factors.

      Strengths:

      The study includes step-wise multiomic analysis with some computational approach to identify candidate TFs regulating XX and XY PGC gene expression, providing a detailed timeline of chromatin accessibility and gene expression during PGC development, which identifies previously unknown PGC subpopulations and offers a multimodal reference atlas of differentiating PGC clusters. Furthermore, the study maps a complex network of transcription factors associated with sex determination in PGCs, adding depth to our understanding of these processes.

      Weaknesses:

      While the multiomics approach is powerful, it primarily offers correlational insights between chromatin accessibility, gene expression, and transcription factor activity, without direct functional validation of identified regulatory networks.

      As stated in our response above to a similar concern, we note that our research study represents a launching pad from which to identify future projects that will investigate, in further detail, candidates that may be involved in PGC sex determination. With this rich dataset in hand, our goal in future research projects is to screen these candidates and discover their function in PGCs.

      Response to Recommendations

      Reviewer #1 (Recommendations For The Authors):

      (1) Clarify at first introduction how combined ATAC-seq/RNA-seq multiomics libraries were prepared, including if ATAC and RNA-seq data are from the same cell.

      This information was added to the introduction of the revised manuscript.

      (2) Clarify what the two technical replicates represent. Are they two libraries from the same gonad or the same pool of gonads? Are they from 2 different gonads?

      The two independent technical replicates comprised different pools of paired gonads. This sentence was added to the methods section of the revised manuscript.

      (3) In Supplemental Figure 1, there is substantial variation in the number of unique snATAC-seq fragments between some conditions. Could this create a systematic bias that affects clustering?

      We recognize the concern that substantial variation in the number of unique snATAC-seq fragments between conditions could potentially create a systematic bias that affects clustering. However, we analyzed our snATAC-seq dataset with Signac, which performs term frequency-inverse document frequency (TF-IDF) normalization. This is a process that normalizes across cells to correct for differences in cellular sequencing depth. Given that sequencing depth was taken into account in our normalization and clustering procedures, and that the unbiased clustering of PGCs also reflects the sex and embryonic stage of PGCs, we are confident that the clustering of the snATAC-seq datasets closely reflects the biological variability present in the PGCs collected.

      References:

      Signac website: https://stuartlab.org/signac/articles/pbmc_vignette

      Stuart, T., Srivastava, A., Madad, S., Lareau, C. A., & Satija, R. (2021). Single-cell chromatin state analysis with Signac. Nature methods, 18(11), 1333-1341.
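
      The depth correction described in the response to point (3) can be illustrated with a minimal sketch. It assumes one commonly used log-scaled form of TF-IDF and a small peaks-by-cells count matrix; the function name `tfidf_normalize` and the `scale_factor` default are illustrative assumptions, and this is not presented as the exact computation run by Signac or by the authors.

```python
import numpy as np

def tfidf_normalize(counts, scale_factor=1e4):
    """Log-scaled TF-IDF for a peaks x cells count matrix (illustrative sketch).

    TF divides each cell (column) by its total fragment count, which removes
    differences in sequencing depth between cells. IDF up-weights peaks that
    are rarely accessible across the dataset.
    """
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[1]

    # Term frequency: per-cell depth normalization (guard against empty cells).
    depth = counts.sum(axis=0, keepdims=True)
    tf = np.divide(counts, depth, out=np.zeros_like(counts), where=depth > 0)

    # Inverse document frequency: number of cells / total counts per peak.
    peak_totals = counts.sum(axis=1, keepdims=True)
    idf = np.divide(n_cells, peak_totals,
                    out=np.zeros_like(peak_totals), where=peak_totals > 0)

    return np.log1p(tf * idf * scale_factor)
```

      In this sketch, two cells with the same accessibility profile but a threefold difference in depth receive identical normalized values, which is the property the response relies on.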

      (4) In Figures 2a, 2e, 3a, and 3e, the visualization scheme is very difficult to follow. It's very hard to see the colors corresponding to average expression for many genes because the circles are so small. In addition, the yellow color is hard to see and makes it hard to estimate the size of the circle since the boundaries can be indistinct. I recommend that a different visualization scheme and/or set of size scales be used.

      In Figures 2a, 2e, 3a, and 3e, we chose this color palette to be inclusive of viewers who are colorblind. The chosen colors are visible on both a computer screen and on printed paper. We also included a legend of the color scale and dot size representing the average expression and percent of cells expressing the gene, respectively. If the color cannot be seen, it is because the cell population is not expressing the gene.

      (5) Perform in vivo validation (immunofluorescence or RNAscope) of at least some targets implicated in PGC development by this study.

      Such validations (immunofluorescence staining of PORCN and TFAP2C) are now included in Figure 4 and the supplement.

      (6) In line 351, the authors state that "we observed a strong demarcation between XX and XY PGCs at E12.5-E13.5." But in Figure 1j it looks like a reasonably high fraction of both XX and XY E12.5 cells are in cluster 1, which should mean that there is some overlap.

      While it is true that Figure 1j shows overlap of both XX and XY E12.5 cells in cluster 1, we were commenting on the separation of E12.5 XX (clusters 4 and 5) and E12.5 XY (clusters 8 and 9) PGCs. We have modified the sentence beginning at line 351 to state that the separation between XX and XY PGCs occurs at E13.5.

      (7) In lines 404-405: "We first linked snATAC-seq peaks to XY PGC functional genes". It is important to know how the peaks were linked to genes.

      We added the following sentence to address this comment: “Peak-to-gene linkages were determined using Signac functionalities and were derived from the correlation between peak accessibility and the intensity of gene expression.”
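
      As a minimal sketch of the idea in the quoted sentence, the correlation between one peak's accessibility and one gene's expression across the same cells can be computed as a Pearson coefficient. The function name and the input vectors are hypothetical, and the sketch omits the background-peak comparison and significance testing that a full peak-to-gene linkage workflow would normally include.

```python
import numpy as np

def peak_gene_correlation(accessibility, expression):
    """Pearson correlation between a peak's accessibility and a gene's
    expression, measured across the same set of cells (illustrative sketch)."""
    a = np.asarray(accessibility, dtype=float)
    e = np.asarray(expression, dtype=float)
    if a.shape != e.shape:
        raise ValueError("accessibility and expression must cover the same cells")

    a_centered = a - a.mean()
    e_centered = e - e.mean()
    denom = np.sqrt((a_centered ** 2).sum() * (e_centered ** 2).sum())
    if denom == 0:
        # A constant vector carries no information about co-variation.
        return 0.0
    return float((a_centered * e_centered).sum() / denom)
```

      A coefficient near 1 would indicate that cells with a more accessible peak also tend to express the gene more strongly; a coefficient near 0 would give no support for a link.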

      (8) In Supplemental Figure 5c, the XX E11.5 condition has a substantially higher fraction of ATAC peaks at promoter regions compared to the others. Does this have statistical and biological significance?

      This is an interesting observation beyond the scope of our manuscript. Many interesting questions arise from this study and it is our plan to investigate further in the future. 

      (9) Line 885: "The increased number of DA peaks at E13.5 may be the result of changes to chromatin structure as XX PGCs enter meiotic prophase I"; but in Figure 4b, there's only a modest increase in DAP number from E12.5 to E13.5 in XX PGCs, compared to a massive gain in XY PGCs.

      In our manuscript, we comment on both phenomena: the doubling of differentially accessible peaks in XX PGCs from E12.5 to E13.5 and the massive increase in differentially accessible peaks in XY PGCs from E12.5 to E13.5. In our description of these results, we propose several hypotheses leading to these increases in differentially accessible peaks. As such, it cannot be ruled out that the changes to chromatin structure that occur during meiotic prophase I contribute to the gain in differentially accessible peaks in XX PGCs at E13.5, and we included this statement in the manuscript accordingly.

      Reviewer #2 (Recommendations For The Authors):

      (1) The methods state at line 141 that nuclei with mitochondrial reads of more than 25% were removed, however our understanding from the Bioconductor manual and companion manuscript (Amezquita, R.A., Lun, A.T.L., Becht, E. et al. Orchestrating single-cell analysis with Bioconductor. Nat Methods 17, 137-145 (2020). https://doi.org/10.1038/s41592-019-0654-x) is that snRNA-seq approaches remove mitochondrial transcripts entirely and datasets containing mitochondrial transcripts are thought to feature incompletely stripped nuclei. It is thought that mitochondrial transcripts participating in nuclear import may remain hanging on to the nuclear envelope and get encapsulated into GEMs. If the mitochondrial read cutoff of 25% was used intentionally to keep this potentially contaminating signal, please justify why this was done for this dataset.

      We agree with the reviewer that the presence of mitochondrial transcripts may be potentially contaminating signal. In our preprocessing steps, we removed the mitochondrial genes and transcripts from our datasets so that they would not influence or affect our analyses. The following sentence was added to the methods section on snRNA-seq data processing: “Mitochondrial genes and transcripts were removed from the snRNA-seq datasets to eliminate any potentially contaminating signal.”
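
      The two filtering steps discussed in this exchange (a per-nucleus mitochondrial read cutoff and removal of mitochondrial genes) can be sketched as follows. This is a hedged illustration, not the authors' pipeline: the function name is hypothetical, the order of the two steps is an assumption, and mitochondrial genes are identified by the "mt-" prefix used in mouse gene symbols.

```python
import numpy as np

def filter_mitochondrial(counts, gene_names, max_mito_fraction=0.25, prefix="mt-"):
    """Drop nuclei above a mitochondrial-read cutoff, then drop mitochondrial
    genes from a genes x nuclei count matrix (illustrative sketch)."""
    counts = np.asarray(counts, dtype=float)
    # Assumption: mouse mitochondrial gene symbols start with "mt-".
    is_mito = np.array([g.lower().startswith(prefix) for g in gene_names])

    # Fraction of each nucleus's reads that map to mitochondrial genes.
    total = counts.sum(axis=0)
    mito = counts[is_mito].sum(axis=0)
    mito_fraction = np.divide(mito, total, out=np.zeros_like(total), where=total > 0)

    # Keep nuclei at or below the cutoff, then remove the mitochondrial rows.
    keep_nuclei = mito_fraction <= max_mito_fraction
    filtered = counts[~is_mito][:, keep_nuclei]
    kept_genes = [g for g, m in zip(gene_names, is_mito) if not m]
    return filtered, kept_genes, keep_nuclei
```

      Applying the cutoff before removing the genes matters in this sketch, because the mitochondrial fraction cannot be computed once those rows are gone.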

      (2) Methods line 227: please include log2fold change and p-adjusted value cutoffs for GO enrichment.

      We used clusterProfiler for our GO enrichment analysis. Our GO enrichment analysis did not include a log2 fold change cutoff, and the p-adjusted value cutoff is stated in the methods.

      (3) Results line 310: the claim that "At E12.5-E13.5, XY PGCs converged onto a single distinct population (cluster 7), indicating less transcriptional diversity among E12.5-E13.5 XY PGCs when compared to E12.5-E13.5 XX PGCs (Fig1d)" would be strengthened if the authors quantified transcriptional distance with distance metrics such as Euclidean or cosine distance.

      We used a clustering approach to gain insights into the transcriptional diversity of PGC populations. Using an additional metric, such as Euclidean or cosine distance, would not provide meaningful information not already achieved by clustering or change the conclusions presented in the manuscript.
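
      For readers unfamiliar with the metric the reviewer proposes, a minimal sketch of one way to summarize within-group transcriptional diversity is the mean pairwise cosine distance between cell profiles. The function name and the cells-by-genes input layout are assumptions made for illustration; the sketch takes no position on whether this analysis would change the conclusions.

```python
import numpy as np

def mean_pairwise_cosine_distance(profiles):
    """Mean cosine distance over all pairs of cells in a cells x genes
    matrix (illustrative sketch). Larger values mean more diverse profiles."""
    x = np.asarray(profiles, dtype=float)
    if x.ndim != 2 or x.shape[0] < 2:
        raise ValueError("need a 2-D matrix with at least two cells")

    norms = np.linalg.norm(x, axis=1, keepdims=True)
    if np.any(norms == 0):
        raise ValueError("cosine distance is undefined for all-zero profiles")

    unit = x / norms
    similarity = unit @ unit.T
    # Use each unordered pair of distinct cells exactly once.
    upper = np.triu_indices(x.shape[0], k=1)
    return float(np.mean(1.0 - similarity[upper]))
```

      Comparing this value between two groups of cells (for example, XX versus XY PGCs at a given stage) would give a single number per group, complementing rather than replacing a clustering-based view.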

      (4) Results line 317: the authors allude to Lars2 defining clusters 2 & 3 as a marker gene, but it is not clear why this is highlighted until the reader reaches the discussion, which alludes to the published role of Lars2 in reproduction. Please consider moving this sentence to the results section for clarity and perhaps expanding the discussion on the meaning.

      To provide clarity, we added the statement “genes with reported roles in reproduction” to the results section.

      (5) In Figure 2a, why do the authors choose to focus on Zkscan5 in XY PGCs when it is expressed by such a small portion of cells (<25%)? Do they assume that this is due to dropouts?

      We chose to focus on Zkscan5 as an example because of its enriched and differential expression in male PGCs, the absence of Zkscan5 motif enrichment in female PGCs, and the reported roles of Zkscan5 in regulating cellular proliferation and growth. Zkscan5 is an example of how candidate genes can be identified for further investigation.

      (6) Line 461: "the population of E13.5 XX PGCs displaying the strongest Stra8 expression levels corresponded to the same population of XX PGCs with the highest module score of early meiotic prophase I genes (Figure 3c; Supplementary Fig. 3a-b)". However did the authors also consider examining the Stra8+ XX PGCs that do not robustly express meiotic genes to understand more about their differentiation potential?

      We are thankful to the reviewer for this suggestion. However, this research question is beyond the scope of the manuscript. We plan to investigate further in future research studies.

      (7) Line 505: "when we searched for the presence of RA receptor motifs in peaks linked to genes related to meiosis and female sex determination, we found that Stra8, Rec8, Rnf2, Sycp1, Sycp2, Ccnb3, and Zglp1 contain the RA receptor motifs in their regulatory sequences (Supplementary Figure 4g)." My read of the text is that the authors are not taking a side on the RA and meiosis controversy, but rather trying to reveal what the data can tell us, and the answer is that there is a strong signature linking RA to meiotic genes, which supports this as a valid biological pathway. But what is the strength of the RA>meiosis pathway compared to other mechanisms (which must be functioning in the triple receptor KO)? Perhaps the authors could take this analysis further with the following questions: (1) ask whether meiotic genes are more enriched in RA motifs compared to other expressed genes or other motifs (2) compare the strength of peak-gene correlations for all peaks containing RA receptor motifs vs. those with peaks for Zglp1, Rnf2, etc binding. The strengths of these correlations could provide clues to how much gene expression varies in response to RA exposure vs. modulation of these other factors and thus tell us something about how much RA is playing a role.

      We agree with the reviewer that this is a very interesting and important question. We also thank the reviewer for their thoughtful suggestions on the types of bioinformatics analyses that could answer this question. However, the section on RA signaling during PGC sex determination is only a small part of the manuscript and would be better analyzed in greater detail in a future research study or publication.

      (8) The shift from promoters in E11.5 XX PGCs to distal intergenic regions is fascinating. What can we learn about epigenetic reprogramming/methylation changes across gene bodies? 

      We agree with the reviewer that this is an interesting question about gene regulation in E11.5 XX PGCs. However, we prefer to analyze the epigenetic reprogramming changes across gene bodies in this cell population in additional research studies. Our purpose and goal for this section was to link differentially accessible chromatin peaks with differentially expressed genes to identify putative gene regulatory networks.

      (9) Line 581: why did the authors choose to highlight and validate PORCN1 in PGCs? Please elaborate.

      As stated in the manuscript, we chose to highlight and validate PORCN1 in PGCs because of its role in WNT signaling and because of the visibly strong correlation between chromatin accessibility at the XX-enriched DAP in Fig. 4c (dashed box) and gene expression of PORCN1.

      (10) Figure 5f would be easier to interpret if presented as two columns rather than a circle; show one line of the proteins and the other line with the transcripts so that each is on the same line and there are connections between them.

      This comment is related to stylistic preferences. The purpose of Fig. 5f is to demonstrate that the candidate transcription factors may regulate the expression of other enriched transcription factors. Figure 5f accomplishes this goal.

      (11) Line 640: "The predicted target genes of TCFL5 totaled 74% (367/494) of all DEGs with peak-to-gene linkages in XX PGCs". This seems like a high number and a lot of work for just TCFL5; given the overlap between other TFs and target genes, how many of these 367 target genes overlap with other TFs?

      We agree with the reviewer that this is an important declaration to make. We added the following sentence to the results section on TCFL5: “A large majority of the predicted target genes of TCFL5 were also predicted to be the target genes of the enriched TFs presented in Fig. 5e, e.g., the predicted target genes of these TFs overlapped with 4%-100% of the predicted target genes of TCFL5.”
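
      The percentages in this exchange are simple set ratios, which can be sketched as below. The 367/494 figure comes from the reviewer's quotation; the function name and the placeholder gene identifiers in the usage note are hypothetical and do not represent the actual target lists.

```python
def overlap_fraction(reference_targets, other_targets):
    """Fraction of the reference TF's predicted targets that are also
    predicted targets of another TF (illustrative sketch)."""
    reference = set(reference_targets)
    if not reference:
        raise ValueError("reference target set is empty")
    return len(reference & set(other_targets)) / len(reference)

# Share of DEGs with peak-to-gene linkages predicted as TCFL5 targets,
# using the counts quoted by the reviewer.
tcfl5_share = 367 / 494
```

      For example, `overlap_fraction({"geneA", "geneB", "geneC", "geneD"}, {"geneB", "geneD", "geneE"})` gives 0.5 with these placeholder names, and `tcfl5_share` rounds to the 74% quoted above.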

      (12) The presentation of TCFL5 in the results section would make more sense with the additional mention of reproductive phenotypes already known (currently in the discussion Lines 914-917). I would furthermore suggest that the discussion goes into more depth on the difference between the regulatory network of TCFL5 in XX meiosis vs XY.

      We thank the reviewer for this comment; however, we already state in the results section that TCFL5 is known to influence XX PGC sex determination.

      (13) In the Methods, please state more clearly for those not familiar that the genetic background of mice is mixed.

      We described the mice with their official names, which provides the context of their genetic backgrounds.

      (14) Please specify which morphologic criteria were used to verify the stage of embryos in the methods.

      We added the following text to the methods section of the revised manuscript: “Plug date was used to determine the stage of embryos collected for single-nucleus RNA-seq and ATAC-seq. The stage of E11.5 embryos was confirmed by counting somites. The stage of embryos collected at E12.5 was confirmed by the morphological presence of the vessel and cords of the testes collected from XY embryos. Similarly, we confirmed the stage of embryos collected at E13.5 by the size of the gonads, the presence of more distinct cords in the testes of XY embryos, and the elongation of the ovaries of XX embryos.”

      (15) The total number of cells and PGCs that passed QC and are included in UMAPS should be stated.

      The requested information was added to the legend for Fig. 1 of the revised manuscript: “The number of PGCs per sex and embryonic stage are: 375 E11.5 XX PGCs; 1,106 E12.5 XX PGCs; 750 E13.5 XX PGCs; 110 E11.5 XY PGCs; 465 E12.5 XY PGCs; and 348 E13.5 XY PGCs.”

      (16) The order of timepoints changes between figures, and this is not for any obvious reason. Please make it consistent. Figures 1 and 6 list XX 11.5, 12.5, 13.5, and the same for XY, but Figures 2, 3, and 4 use the reverse order: XY E13.5, E12.5, E11.5, and then XX. 

      We thank the reviewer for this comment. However, we chose this order for each of the figures to match the coordinates of the graphs and where we would expect the reader to begin reading the graph first. For example, in Figure 3a, XX E11.5 is closest to the x-axis and would be expected to be read first.   

      (17) In Figure S2 the colors of clusters are hard to distinguish, and it is suggested that the cluster numbers should be listed above each colored bar to avoid frustration.

      We made the suggested correction to Figure S2.

      (18) In Figures 2e and 3e: what do the dashed boxes indicate?

      The dashed boxes are to guide the reader’s eyes to the fact that the order of transcription factors/genes under the Cistrome DB regulatory potential score and gene expression plots are the same.

      (19) In Figure 5a: break panels into i-iv so that the in-text call-outs are not all the same.

      We made the suggested correction to Figure 5a and modified the in-text call-outs.

      (20) Please indicate XX in Figure 5e and XY in Figure 5l.

      We made the suggested correction to Figure 5e and 5l.

      (21) In Figure S5c: Please reorganize DA chromatin peak charts so that columns are XX and XY with rows at the same timepoint.

      We made the suggested correction to Figure S5c.

      (22) In Figure S7a: please make images larger so that the overlapping expression of PORCN and TRA98 is more visible, and consider adding a more magnified panel.

      This image is now included in the main text, with expanded panels.

      (23) Line 742-754: this seems like a long introduction for the results section; please consider tightening it up.

      We believe this text is important and necessary to provide context to the bioinformatics analyses of cell signaling pathways in PGCs. Not all readers will be familiar with the ligand-receptor signals between gonadal support cells and PGCs, and this text provides details on which signaling pathways are known to direct sex determination of PGCs.

      (24) For UMAP plots in Figures 2c, 3c, S3b, and S4b, the text overlaid with the timepoints and sexes onto the UMAP plots is misleading, as it allows the reader to presume that the entire group of cells for a given sex/timepoint is located in the location of the text overlay. However, from the UMAP plots in Figure 1i-j, it is clear that the cells from a given sex/timepoint are actually spread across multiple identified clusters. Thus, the overlaid text obscures the important heterogeneity detected. To better represent the actual locations on the UMAP plot of cells from each sex/timepoint, it would be better to show inset density plots alongside these UMAP plots so the reader can locate the cells for themselves. 

      We thank the reviewer for this comment. However, we chose this formatting to offer simplicity and ease of understanding to our UMAPs in addition to highlighting the general biological patterns of gene expression. If the reader is interested in discerning more of the heterogeneity of the UMAPs, they may refer back to Figure 1.

      Reviewer #3 (recommendations for the authors):

      There are some errors or places that need clarification or corrections:

      (1) Figure 1f, according to the graph, it should be 8 clusters, not 9.

      There are 9 clusters because the numbering for the clusters starts at ‘0’.

      (2) Why did cluster 8 have so many different states of cells from both sexes?

      Cluster 8 is likely a sequencing artifact, and determining why it contains so many different cell states from both sexes would require several additional analyses. Although this would address a technical issue with the dataset, it would not change any major conclusions of the study.

      (3) Figure 1i, shouldn't that be ten instead of eleven?

      There are 11 clusters because the cluster numbering starts at ‘0’.
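
      The zero-based numbering behind both cluster-count answers can be illustrated with a minimal sketch. The label list below is hypothetical and is not taken from the study's data; it only shows that when labels run from 0 to 8, the highest label is 8 but the number of clusters is 9.

```python
# Hypothetical cluster labels for a handful of cells (not the study's data).
# Clustering tools commonly number clusters starting at 0.
labels = [0, 1, 2, 3, 4, 5, 6, 7, 8, 3, 0, 8]


def count_clusters(cluster_labels):
    """Return the number of distinct cluster labels."""
    return len(set(cluster_labels))


highest_label = max(labels)          # largest label shown on the plot
n_clusters = count_clusters(labels)  # number of distinct clusters

print(f"highest label: {highest_label}, number of clusters: {n_clusters}")
```

      The same reasoning gives 11 clusters when the labels run from 0 to 10.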

      (4) Figure 2a: the comparison of Zkscan5 expression levels is not obvious because the bubbles are small. What is the fold difference relative to XX PGCs?

      There is a 1.5-fold increase in the expression of Zkscan5 in XY PGCs compared with XX PGCs at E13.5. We included this information in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors constructed a novel HSV-based therapeutic vaccine to cure SIV in a primate model. The novel HSV vector is deleted for ICP34.5. Evidence is given that this protein blocks HIV reactivation by interference with the NFkappaB pathway. The deleted construct supposedly would reactivate SIV from latency. The SIV genes carried by the vector ought to elicit a strong immune response. Together the HSV vector would elicit a shock and kill effect. This is tested in a primate model.

      Strengths and weaknesses:

      (1) Deleting ICP34.5 from the HSV construct has a very strong effect on HIV reactivation. The mechanism underlying increased activation by deleting ICP34.5 is only partially explored. Overexpression of ICP34.5 has a much smaller effect (reduction in reactivation) than deletion of ICP34.5 (strong activation); this is acknowledged by the authors that no full mechanistic explanation can be given at this moment.

      Thank you for your comments. We agree that the mechanism underlying the increased reactivation caused by deleting ICP34.5 is only partially explored. As you pointed out, deleting ICP34.5 leads to significant reactivation, whereas overexpressing ICP34.5 has a relatively weak inhibitory effect on reactivation. This difference prompted us to consider further the role of HSV-1 in regulating HIV latency and reactivation. Our data (Figure S4), along with previous literature (Mosca et al., 1987; Nabel et al., 1988), indicate that the ICP0 protein might play a crucial role in reactivating latent HIV. However, we found for the first time that ICP34.5 can antagonize this reactivation. This is an interesting topic for understanding the complicated interactions between host cells and different viruses. We will investigate the mechanism in more depth in future studies, and we have mentioned this limitation in the revised Discussion section. Thank you!

      (2) No toxicity data are given for deleting ICP34.5. How specific is the effect for HIV reactivation? A RNA seq analysis is required to show the effect on cellular genes.

      A RNA seq analysis was done in the revised manuscript comparing the effect of HSV-1 and deleted vector in J-LAT cells (Fig S5). More than 2000 genes are upregulated after transduction with the modified vector in comparison with the WT vector. Hence, the specificity of upregulation of SIV genes is questioned. Authors do NOT comment on these findings. In my view it questions the utility of this approach.

      Thank you for your comments.

      (1) As for the toxicity of HSV-ΔICP34.5, it is well known that ICP34.5 is a neurotoxicity factor that can antagonize host immune responses, so deleting ICP34.5 improves the safety of HSV-based constructs. As expected, we demonstrated experimentally that HSV-ΔICP34.5 exhibited lower virulence and replication ability than wild-type HSV-1 (Figure S1). Importantly, we also observed a significant decrease in the expression of inflammatory factors in primary CD4+ T cells from PLWH when compared to wild-type HSV-1 (Figure 1I-K). These data suggest that HSV-ΔICP34.5 should be safer and better tolerated than the wild-type HSV vector.

      (2) The RNASeq analysis was intended to explore the signaling pathways induced by HSV-ΔICP34.5; it is not suitable for assessing the toxicity of HSV-ΔICP34.5 constructs. We think it is reasonable to observe many upregulated genes (which are involved in a variety of signaling pathways), since HSV-ΔICP34.5 constructs reactivated HIV latency more effectively than wild-type HSV by modulating the IKKα/β-NF-κB pathway and the PP1-HSF1 pathway.

      (3) To further validate whether HSV-ΔICP34.5 specifically activates the latent HIV reservoir, we conducted additional experiments using vaccinia virus and adenovirus as controls. Neither vaccinia virus nor adenovirus effectively reactivated latent HIV (Figure S3). Moreover, deleting the ICP0 gene from HSV-1 diminished the reactivation of latent HIV by HSV-1, and overexpressing ICP0 strongly reactivated latent HIV (Figure S4, Figure S5), implying that this reactivation is virus-specific and that ICP0 is an important factor in reversing HIV latency. Interestingly, we found that ICP34.5 can act as an antagonistic factor for this reactivation of HIV latency by HSV-1. Thus, after the deletion of ICP34.5, the ability of HSV to reverse HIV latency was significantly enhanced. Our research group will investigate the underlying mechanism in future studies. Thank you for your insightful comment.

      (3) The primate groups are too small and the results too variable to make averages. In Fig 5, the group with ART and saline has two slow rebounders. It is not correct to average those with the single quick rebounder. Here the interpretation is NOT supported by the data.

      Although authors provided some promising SIV DNA data, no additional animals were added. Groups of 3 animals are too small to make any conclusion, especially since the huge variability in response. The average numbers out of 3 are still presented in the paper, which is not proper science.

      No data are given of the effect of the deletion in primates. Now the deleted construct is compared with an empty vector containing no SIV genes. Authors provide new data in Fig S2 on the comparison of WT and modified vector in cells from PLWH, but data are not that convincing. A significant difference in reactivation is seen for LTR in only 2/4 donors and in Gag in 3/4 donors. (Additional question what is meaning of LTR mRNA, do authors relate to genomic RNA??)

      Thank you for your careful review and kind reminder.

      (1) We agree that it is not appropriate to use averages for this pilot study with a limited number of macaques. We are currently unable to conduct another experiment with a larger number of macaques, but we think the results of this pilot study are very promising for further studies. Following your suggestion, we have removed the averages and now present the data for each monkey individually in the revised manuscript. We have also modified the corresponding description accordingly (Lines 254 to 262). Thank you for your understanding.

      (2) Regarding your comment about the lack of data on the deletion of ICP34.5 from HSV-1, we apologize for the previously unclear description. In fact, the empty vector used in our animal experiments not only lacks SIV antigens but also carries the ICP34.5 deletion. We have revised the corresponding descriptions accordingly (for example, we now use HSV-ΔICP34.5ΔICP47-empty and HSV-ΔICP34.5ΔICP47-sPD1-SIVgag/SIVenv instead of HSV-empty and HSV-sPD1-SIVgag/SIVenv). We hope this revision addresses your question.

      (3) As for the reactivation effects observed in PLWH samples, the data may not be perfect, but we think this result (a significant difference in reactivation for LTR in 2/4 donors and for Gag in 3/4 donors) supports our conclusion that HSV-ΔICP34.5 reactivates latent virus in primary CD4+ T cells more strongly than wild-type HSV. The purpose of detecting LTR RNA is to evaluate the level of virus replication. We recognize that more samples are needed in future studies to gain a comprehensive understanding of the reactivation effect in different individuals. In addition, we corrected the description of LTR RNA (Lines 99-106 and 115-116). Thank you for the reminder!

      Discussion

      HSV vectors are mainly used in cancer treatment partially due to induced inflammation. Whether these are suitable to cure PLWH without major symptoms is a bit questionable to me and should at least be argued for.

      The RNA seq data add on to this worry and should at least be discussed.

      Thank you for your comment. As mentioned above, the RNASeq analysis was intended to explore the signaling pathways induced by HSV-ΔICP34.5; it is not suitable for assessing the toxicity of HSV-ΔICP34.5 constructs. ICP34.5 is a neurotoxicity factor that can antagonize innate immune responses, so deleting ICP34.5 improves the safety of HSV-based constructs. As expected, our data demonstrated experimentally that HSV-ΔICP34.5 exhibited lower virulence and replication ability than wild-type HSV-1 (Figure S1). Importantly, HSV-ΔICP34.5 induced lower levels of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH than wild-type HSV did, likely due to its lower virulence and replication ability (Figure 1I-K). In addition, the CD4+/CD8+ T cell ratio (Figure 5H) and body weight (Figure S10) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5ΔICP47-sPD1-SIVgag/SIVenv group. Our data also showed no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of the ART+HSV-ΔICP34.5ΔICP47-sPD1-SIVgag/SIVenv group (Figure S11). These data suggest that HSV-ΔICP34.5 should be safer and better tolerated than the wild-type HSV vector. We have added a more comprehensive description in the revised Discussion (Lines 328-334). Thank you again for all of your kind comments and suggestions.

      Reviewer #2 (Public review):

      Summary:

      In this article, Wen et al. describe the development of a 'proof-of-concept' bi-functional vector based on the ability of HSV-deltaICP-34.5 to purge latent HIV-1 and SIV genomes from cells. They show that co-infection of latent J-Lat T-cell lines with an HSV-deltaICP-34.5 vector can reactivate HIV-1 from a latent state. Over- or stable expression of the ICP34.5 ORF in these cells can arrest transcription of latent HIV-1 genomes, even in the presence of latency reversal agents. ICP34.5 can co-IP with and dephosphorylate IKKa/b to block its interaction with the NF-kB transcription factor. Additionally, ICP34.5 can interact with HSF1, which was identified by mass spectrometry. Thus, the authors propose that the latency reversal effect of HSV-deltaICP-34.5 in co-infected J-Lat cells is due to modulatory effects on the IKKa/b-NF-kB and PP1-HSF1 pathways.

      Next the authors cleverly construct a bifunctional HSV based vector with deleted ICP34.5 and 47 ORFs to purge latency and avoid immunological refluxes, and additionally expand the application of this construct as a vaccine by introducing SIV genes. They use this 'vaccine' in mouse models and show the expected SIV-immune responses. Experiments in rhesus macaques (RM), further elicit potential for their approach to reactivate SIV genomes and at the same time block their replication by antibodies. What was interesting in the SIV experiments is that the dual-functional vector vaccine containing sPD1- and SIV Gag/Env ORFs effectively delayed SIV rebound in RMs and in some cases almost neutralized viral DNA copy detection in serum. Very promising indeed, however there are some questions I wish the authors explored to answer, detailed below.

      Overall, this is an elegant and timely work demonstrating the feasibility of reducing virus rebound in animals, and potentially expand to clinical studies. The work was well written, and sections were clearly discussed.

      Strengths:

      The work is well designed, rationale explained and written very clearly for lay readers.

      Claims are adequately supported by evidence and well designed experiments including controls.

      We appreciate your positive comments on our work.

      Weaknesses:

      (1) It looks like ICP0 is also involved in latency reversal effects. More follow-up work will be required to test if this is in fact true.

      Both our data (Figure S4, Figure S5) and previous literature (Nabel et al., 1988; Mosca et al., 1987) indicate that HSV ICP0 may play a role in reversing HIV latency. However, the exact mechanisms behind this effect have not yet been fully elucidated. Of note, we report here for the first time that ICP34.5 can act as an antagonistic factor for this reactivation of HIV latency by HSV-1. Thus, after the deletion of ICP34.5, the ability of HSV to reverse HIV latency was significantly enhanced. Our research group will investigate the underlying mechanism in future studies. Thank you for your insightful comment.

      (2) It is difficult to estimate the depletion of the latent viral reservoir. The authors have tried to address this issue. A more convincing argument to this reviewer will be data to demonstrate that after the bi-functional vaccine, the animals show overall reduction in the number of circulating latent cells. The feasibility to obtain such a result is not clearly demonstrated.

      Thank you for your comment. As you mentioned, we have indeed measured both total DNA and integrated DNA (iDNA) in blood cells (see Figure 5E-F), which can provide support for the reduction of the latent viral reservoir. Thank you for your kind reminder.

      (3) The authors state that the reduced virus rebound detected following bi-functional vaccine delivery is due to latent genomes becoming activated and steady-state neutralization of these viruses by antibody response. This needs to be demonstrated. Perhaps cell-culture experiments from specimen taken from animals might help address this issue. In lab cultures one could create environments without antibody responses, under these conditions one would expect higher level of viral loads being released in response to the vaccine in question.

      Thank you for your valuable suggestion. We believe that the reduced virus rebound observed may be influenced by immune responses from T cells and antibodies induced by both ART and the vaccine. We appreciate your insight and agree that future studies should focus on investigating the activation effects of the vaccine under controlled conditions that simulate the absence of immune responses in primary animal cells. This will help us better understand the mechanisms involved and address your concerns more comprehensively.

      Reviewer #2 (Recommendations for the authors):

      The Authors have sufficiently addressed my comments. Below are a few minor changes that can help with clarity.

      Lines 126-127: This sentence should be changed. Perhaps, "these data suggests that .... Safety of... in PLWH might be tolerable, at least in vitro."

      Thanks for your suggestion. We have revised it accordingly. (Line 130).

      Lines 128-132: Would this not mean that reactivation is due to ICP0 gene? Have the authors tried to express ICP0-gene into J-Lat cells and see if that is the reason for reactivation? This seems somewhat incomplete. At the end of 132, please add ", in the presence of ICP0". Also a sentence describing this effect is warranted.

      Thank you for your insightful suggestion. Yes, both our data and previous literature support a significant role for the ICP0 gene in the reactivation of HIV latency (Figure S4, Figure S5). Of note, we report here for the first time that ICP34.5 can act as an antagonistic factor for this reactivation of HIV latency by HSV-1. Thus, after the deletion of ICP34.5, the ability of HSV to reverse HIV latency was significantly enhanced. We have described this effect in the revised version accordingly. Additionally, we have added the phrase “in the presence of ICP0” to the Results section (Line 137) to clarify this point.

      MOSCA, J. D., BEDNARIK, D. P., RAJ, N. B., ROSEN, C. A., SODROSKI, J. G., HASELTINE, W. A., HAYWARD, G. S. & PITHA, P. M. 1987. Activation of human immunodeficiency virus by herpesvirus infection: identification of a region within the long terminal repeat that responds to a trans-acting factor encoded by herpes simplex virus 1. Proc Natl Acad Sci U S A 84: 7408. DOI: https://doi.org/10.1073/pnas.84.21.7408, PMID: 2823260

      NABEL, G. J., RICE, S. A., KNIPE, D. M. & BALTIMORE, D. 1988. Alternative mechanisms for activation of human immunodeficiency virus enhancer in T cells. Science 239: 1299. DOI: https://doi.org/10.1126/science.2830675, PMID: 2830675

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      By using the biophysical chromosome stretching, the authors measured the stiffness of chromosomes of mouse oocytes in meiosis I (MI) and meiosis II (MII). This study was the follow-up of previous studies in spermatocytes (and oocytes) by the authors (Biggs et al. Commun. Biol. 2020: Hornick et al. J. Assist. Rep. and Genet. 2015). They showed that MI chromosomes are much stiffer (~10 fold) than mitotic chromosomes of mouse embryonic fibroblast (MEF) cells. MII chromosomes are also stiffer than the mitotic chromosomes. The authors also found that oocyte aging increases the stiffness of the chromosomes. Surprisingly, the stiffness of meiotic chromosomes is independent of meiotic chromosome components, Rec8, Stag3, and Rad21L. with aging.

      Strengths:

      This provides a new insight into the biophysical property of meiotic chromosomes, that is chromosome stiffness. The stiffness of chromosomes in meiosis prophase I is ~10-fold higher than that of mitotic chromosomes, which is independent of meiotic cohesin. The increased stiffness during oocyte aging is a novel finding.

      Weaknesses:

      A major weakness of this paper is that it does not provide any molecular mechanism underlying the difference between MI and MII chromosomes (and/or prophase I and mitotic chromosomes).

      We acknowledge that our study does not provide a comprehensive explanation for the stage-related alterations in chromosome stiffness; however, we believe that the observation of these changes is itself of broad interest. Initially, we hypothesized that DNA damage or depletion of meiosis-specific cohesin might contribute to the observed increase in chromosome stiffness. However, our experimental findings did not support these hypotheses, indicating that neither DNA damage nor cohesin depletion is responsible for the stiffness increase. The molecular basis of the stage-related stiffness increase remains elusive and requires exploration in future studies. In the Discussion, we propose that factors such as condensin, nuclear proteins, and histone methylation may play a role in regulating meiotic chromosome stiffness. The involvement of these factors in stage-related chromosome stiffening requires future investigation.

      Reviewer #2 (Public Review):

      This paper reports investigations of chromosome stiffness in oocytes and spermatocytes. The paper shows that prophase I spermatocytes and MI/MII oocytes yield high Young Modulus values in the assay the authors applied. Deficiency in each one of three meiosis-specific cohesins they claim did not affect this result and increased stiffness was seen in aged oocytes but not in oocytes treated with the DNA-damaging agent etoposide.

      The paper reports some interesting observations which are in line with a report by the same authors of 2020 where increased stiffness of spermatocyte chromosomes was already shown. In that sense, the current manuscript is an extension of that previous paper, and thus novelty is somewhat limited. The paper is also largely descriptive as it does neither propose a mechanism nor report factors that determine the chromosomal stiffness.

      There are several points that need to be considered.

      (1) Limitations of the study and the conclusions are not discussed in the "Discussion" section and that is a significant gap. Even more so as the authors rely on just one experimental system for all their data - there is no independent verification - and that in vitro system may be prone to artefacts.

      Our experimental system has been used to study the stiffness of different types of chromosomes as well as nuclear stiffness. We compared our results with previously published data and found them to be consistent across experiments. To address the reviewer’s concern, we now describe the limitations of our in vitro experimental approach in the Discussion section.

      (2) It is somewhat unfortunate that they jump between oocytes and spermatocytes to address the cohesin question. Prophase I (pachytene) spermatocyte chromosomes are not directly comparable to MI or MII oocyte chromosomes. In fact, the authors report Young Modulus values of 3700 for MI oocytes and only 2700 for spermatocyte prophase chromosomes, illustrating this difference. Why not use oocyte-specific cohesin deficiencies?

      In this study, our goal was to investigate the mechanism underlying the increased chromosome stiffness observed during prophase I. Ideally, we would have compared wild-type and cohesin-deleted mouse oocytes at the metaphase I (MI) stage. However, experimental constraints made this approach unfeasible: spermatocytes and oocytes from Rec8<sup>-/-</sup> and Stag3<sup>-/-</sup> mutant mice cannot reach the MI stage, and Rad21l<sup>-/-</sup> mutant mice are sterile (males) or subfertile (females), because cohesin proteins are crucial for germline cell development.

      Additionally, collecting prophase I chromosomes from oocytes is exceptionally challenging and requires fetal mice as the source, because oocytes progress to the diplotene stage during fetal development. The process is further complicated by the difficulty of genotyping fetal mice, making the study of female prophase I impracticable. By contrast, spermatocytes are continuously generated in males throughout life, with meiotic stages readily identifiable, making them more accessible for analysis.

      Our findings consistently showed increased chromosome stiffness in both prophase I spermatocytes and MI oocytes, suggesting that the phenomenon is not sex-specific. This observation implies that similar effects on chromosome stiffness may occur across meiotic stages, from prophase I to MI.

      (3) It remains unclear whether the treatment of oocytes with the detergent TritonX-100 affects the spindle and thus the chromosomes isolated directly from the Triton-lysed oocytes. In fact, it is rather likely that the detergent affects chromatin-associated proteins and thus structural features of the chromosomes.

      Regarding the use of Triton X-100, it is important to emphasize that the concentration used (0.05%) is very low and unlikely to significantly affect chromosome stiffness. To support this assertion, we have provided additional evidence in the revised manuscript demonstrating that this low concentration of Triton X-100 has a negligible effect on chromosome stiffness (Supplement Fig. 5, Right panel).

      (4) Why did the authors use mouse strains of different genetic backgrounds, CD-1, and C57BL/6? That makes comparison difficult. Breeding of heterozygous cohesin mutants will yield the ideal controls, i.e. littermates.

      The genetic mutant mice, all in a C57BL/6 background, were generously provided by Dr. Philip Jordan and delivered to our lab. As our lab does not currently maintain a C57BL/6 colony, and given that this strain typically produces small litters (which would have complicated the remainder of the study), we chose CD-1 mice as the control group and used C57BL/6 mice specifically for the cohesin study. To address potential concerns regarding genetic background differences, we compared our results with previously published data from C57BL/6 mice and found no significant differences (2710 ± 610 Pa versus 3670 ± 840 Pa, P = 0.4809) (Biggs et al., 2020). Furthermore, prophase I spermatocytes from CD-1 mice showed no significant difference compared to any of the three cohesin-deleted C57BL/6 mutant mice, suggesting that chromosome stiffness is not significantly influenced by genetic background.

      (5) How did the authors capture chromosome axes from STAG3-deficient spermatocytes which feature very few if any axes? How representative are those chromosomes that could be captured?

      We isolated chromosomes from prophase I mutant spermatocytes, which were identified by their large size, round shape, and thick chromosomal threads - characteristics indicative of advanced condensation and a zygotene-like stage during prophase I (Supplemental Fig. 3). The methodology for isolating these chromosomes has been described in detail in our previous publication (Biggs et al., 2020), which is referenced in the current manuscript.

      Reviewer #3 (Public Review):

      Summary:

      Understanding the mechanical properties of chromosomes remains an important issue in cell biology. Measuring chromosome stiffness can provide valuable insights into chromosome organization and function. Using a sophisticated micromanipulation system, Liu et al. analyzed chromosome stiffness in MI and MII oocytes. The authors found that chromosomes in MI oocytes were ten-fold stiffer than mitotic ones. The stiffness of chromosomes in MI mouse oocytes was significantly higher than that in MII oocytes. Furthermore, the knockout of the meiosis-specific cohesin component (Rec8, Stag3, Rad21l) did not affect meiotic chromosome stiffness. Interestingly, the authors showed that chromosomes from old MI oocytes had higher stiffness than those from young MI oocytes. The authors claimed this effect was not due to the accumulated DNA damage during the aging process because induced DNA damage reduced chromosome stiffness in oocytes.

      Strengths:

      The technique used (isolating the chromosomes in meiosis and measuring their stiffness) is the authors' specialty. The results are intriguing and informative to the chromatin/chromosome and other related fields.

      Weaknesses:

      (1) How intact the measured chromosomes were is unclear.

      Currently, a well-calibrated chromosome mechanics experiment requires the extracellular isolation of chromosomes. In experiments conducted parallel to those in our previous study (Biggs et al., 2020), we obtained quantitatively consistent results, including measurements of the Young's modulus for prophase I spermatocyte chromosomes. Our isolation approach is significantly gentler than bulk methods that rely on hypotonic buffer-driven cell lysis and centrifugation. If substantial chromosomal damage had occurred during isolation, we would expect greater variation between experiments, as different amounts or types of damage could influence the results.
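
      For readers unfamiliar with the quantity being compared, the following is a minimal sketch of how a Young's modulus can be computed from a stretching measurement. It uses the textbook definition (stress divided by strain) and assumes a cylindrical chromosome of uniform cross-section that responds linearly; the function name and all numbers are hypothetical and do not reproduce the authors' analysis pipeline.

```python
import math


def youngs_modulus(force_n, radius_m, rest_length_m, stretched_length_m):
    """Young's modulus in Pa from the textbook relation E = stress / strain.

    Assumes (for illustration only) a cylinder of uniform radius that
    responds linearly to the applied force.
    """
    if stretched_length_m == rest_length_m:
        raise ValueError("strain is zero; the modulus is undefined")
    area = math.pi * radius_m ** 2                                    # cross-section, m^2
    stress = force_n / area                                           # Pa
    strain = (stretched_length_m - rest_length_m) / rest_length_m    # dimensionless
    return stress / strain


# Hypothetical inputs: 5 nN of force, 1 micrometre radius,
# a 10 micrometre segment stretched to 15 micrometres.
modulus_pa = youngs_modulus(5e-9, 1e-6, 10e-6, 15e-6)
print(f"Young's modulus: {modulus_pa:.0f} Pa")
```

      Because the modulus normalizes force by cross-sectional area and extension by resting length, it allows chromosomes of different sizes to be compared on a common scale.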

      (2) Some control data needs to be included.

      We used wild-type prophase I spermatocytes and metaphase I (MI) oocytes as controls. To validate our findings, we compared some of our results with those reported in a previous study and observed consistent outcomes (Biggs et al., 2020).

      (3) The paper was not well-written, particularly the Introduction section.

      We have revised the paper and improved the overall quality of the manuscript.

      (4) How intact were the measured chromosomes? Although the structural preservation of the chromosomes is essential for this kind of measurement, the meiotic chromosomes were isolated in PBS with Triton X-100 and measured at room temperature. It is known that chromosomes are very sensitive to cation concentrations and macromolecular crowding in the environment (PMID: 29358072, 22540018, 37986866). It would be better to discuss this point.

      As suggested, we investigated the impact of PBS and Triton X-100 on chromosome stiffness. Our findings indicate that neither PBS nor Triton X-100 caused significant changes in chromosome stiffness (Supplemental Fig. 5).

      Recommendations For The Authors:

      Major points of Reviewers that the Editor indicated should be addressed

      (1) Reviewer's point 3, the effect of the high concentration of etoposide: It would be advisable to use lower concentrations of etoposide to observe the effect of DNA damage on chromosome stiffness more accurately.

      The effect of etoposide on oocytes is dose-dependent (Collins et al., 2015). Oocytes are generally not highly sensitive to DNA damage, and even at relatively high concentrations, not all may exhibit a response. To ensure sufficient DNA damage in the oocytes we isolated, we used a relatively high concentration of etoposide for the experiment. This concentration (50 μg/ml) falls within the typical range reported in the literature (Marangos and Carroll, 2012; Cai et al., 2023; Lee et al., 2023). As the reviewer suggested, we tested two additional lower concentrations of etoposide (5 μg/ml and 25 μg/ml) (see Fig. 5C). We did not observe any significant difference in chromosome stiffness between oocytes treated with 5 μg/ml etoposide and the control. However, the higher concentration of etoposide (25 μg/ml) significantly reduced oocyte chromosome stiffness compared to the control.

      Revision to manuscript:

      “Results at lower etoposide concentrations revealed that chromosome stiffness in untreated control oocytes was not significantly different from that in oocytes treated with 5 μg/ml etoposide (3780 ± 700 Pa versus 3930 ± 400 Pa, P = 0.8624). However, chromosome stiffness in untreated oocytes was significantly higher than that in oocytes treated with 25 μg/ml etoposide (3780 ± 700 Pa versus 1640 ± 340 Pa, P = 0.015) (Figure 5C).”

      (2) Reviewer's point 3, the effect of Triton X-100: This is related to the concern of Reviewer #3. It is critical to check whether the detergent affects the stiffness indirectly.

      To demonstrate that the low concentration of Triton X-100 does not influence chromosome stiffness, we conducted additional experiments. First, we isolated chromosomes and measured their stiffness. Then, we treated the chromosomes with 0.05% Triton X-100 via micro-spraying and remeasured the stiffness. The results showed no significant difference (see Supplement Fig. 5 right panel).

      Revision to manuscript:

      “In addition to past experiments indicating that mitotic chromosomes are stable for long periods after their isolation (Pope et al., 2006), we carried out control experiments on mouse oocyte chromosomes where we incubated them for 1 hour in PBS, or exposed them to a flow of Triton X-100 solution for 10 minutes; there was no change in chromosome stiffness in either case (Methods and Supplementary Fig. 5).”

      (3) Reviewer's point 1, the effect of the buffer composition: Please describe how the composition affects the stiffness of the chromosomes.

      PBS is an economical and effective buffer solution that closely mimics the osmotic conditions of the cytoplasm, which is important for maintaining chromosomal structural integrity. Appropriate ion concentrations are crucial for preserving chromosome integrity, as concentrations that are either too high or too low can alter chromosome morphology (Poirier and Marko, 2002). When chromosomes are stored in PBS, their stiffness remains relatively stable, even with prolonged exposure, ensuring minimal changes to their physical properties. To confirm this, we isolated chromosomes and measured their stiffness. After a one-hour incubation in PBS, we remeasured stiffness and observed no significant differences, demonstrating that chromosomes remain stable in PBS (see Supplement Fig. 5 left panel).

      Revision to manuscript:

      “In this study, we developed a new way to isolate meiotic chromosomes and measure their stiffness. However, one concern is that the measurements were conducted in PBS solution, which is different from the intracellular environment. To address this, we monitored chromosome stiffness over time in PBS solution and found that it remained stable over a period of one hour (Supplement Fig. 5 Left panel).”

      Reviewer #1 (Recommendations For The Authors):

      Major points:

      (1) Previously, the role of condensin complexes in chromosome stiffness was shown (Sun et al. Chromosome Research, 2018). Thus, the authors should at least describe the condensin staining on MI and MII chromosomes.

      We have added sentences in the discussion to elaborate on the role of condensin.

      Revision to manuscript:

      “Several factors, including condensin, have been found to affect chromosome stiffness (Sun et al., 2018). Condensin exists in two distinct complexes, condensin I and condensin II, and both are active during meiosis. Published studies indicate that condensin II is more sharply defined and more closely associated with the chromosome axis from anaphase I to metaphase II (Lee et al., 2011). Additionally, condensin II appears to play a more significant role in mitotic chromosome mechanics compared to condensin I (Sun et al., 2018). Thus, condensin II likely contributes more significantly to meiotic chromosome stiffness than condensin I.”

      (2) Although the authors nicely showed the difference in the stiffness between MI and MII chromosomes (Figure 2), as known, MI chromosomes are bivalent (with four chromatids) while MII chromosomes are univalent (with two chromatids). The physical property of the chromosomes would be affected by the number of chromatids. It would be essential for the authors to measure the physical properties of a univalent of MI chromosomes from mice defective in meiotic recombination such as Spo11 and/or Mlh3 KO mice.

      The reviewer correctly pointed out that the number of chromatids in chromosomes differs between metaphase I (MI) and metaphase II (MII) stages. We have addressed this difference by calculating Young’s modulus (E), a mechanical property that describes the elasticity of a material independent of its geometry. Young’s modulus describes the intrinsic properties of the material itself, rather than the specific characteristics of the object being tested. It is calculated as E=(F/A)/(∆L/L0), where F is the force applied to stretch the chromosome, A is the cross-sectional area, ∆L is the length change of the chromosome, and L0 is the original length of the chromosome. An increase in chromosome or chromatid number results in a larger cross-sectional area and therefore a higher doubling force (F). However, because the force is normalized by the cross-sectional area, this variation does not impact the calculation of chromosome stiffness/Young’s modulus (E). While study of the mutants suggested by the referee would certainly be interesting, it is likely that the absence of these key recombination factors would impact chromosome stiffness in a more complex way than just changing chromosome thickness; this type of study is beyond the scope of the present manuscript and is an exciting direction for future studies.
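
      As a minimal illustration of the formula quoted above (not the authors' analysis code), the sketch below computes E=(F/A)/(∆L/L0) for two hypothetical objects made of the same material. The function name and all numerical values are assumptions chosen for illustration only; the second object has twice the cross-sectional area and needs twice the force for the same strain, yet yields the same modulus.

```python
import math


def youngs_modulus(force_n, area_m2, delta_l, l0):
    """Return Young's modulus E = (F/A) / (dL/L0) in pascals.

    force_n : stretching force in newtons
    area_m2 : cross-sectional area in square metres
    delta_l : change in length (same unit as l0)
    l0      : original length
    """
    if area_m2 <= 0 or l0 <= 0 or delta_l == 0:
        raise ValueError("area and l0 must be positive; delta_l must be non-zero")
    stress = force_n / area_m2   # force per unit area
    strain = delta_l / l0        # fractional extension
    return stress / strain


# Hypothetical values, NOT measurements from the manuscript.
# Both objects are stretched by 50% of their original length.
thin = youngs_modulus(force_n=1.0e-9, area_m2=1.0e-12, delta_l=2.0e-6, l0=4.0e-6)
thick = youngs_modulus(force_n=2.0e-9, area_m2=2.0e-12, delta_l=2.0e-6, l0=4.0e-6)

print(f"thin:  {thin:.1f} Pa")
print(f"thick: {thick:.1f} Pa")
print("same modulus:", math.isclose(thin, thick))
```

      At the doubling force the strain ∆L/L0 equals 1, so the expression reduces to E = F/A, which is why a thicker object of the same material shows a higher doubling force but an unchanged modulus.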

      (3) In Figure 5, the authors measure the stiffness of etoposide-treated MI chromosomes. The concentration of the drug was 50 μg/ml, which is very high. The authors should analyze different concentrations of the drug to check the chromosome stiffness. Moreover, etoposide is an inhibitor of Topoisomerase II. The effect of the drug might be caused by the defective Top2 activity, rather than Top2-adducts, thus DNA damage. It is very important to check the other Top2 inhibitors or DNA-damaging agents to generalize the effect of DNA damage on chromosome stiffness. Moreover, DNA damage induces the DNA damage response. It is important to check the effect of DDR inhibitors on the damage-induced change of stiffness.

      The reviewer is correct in noting that etoposide can induce DNA damage and inhibit Top2 activity. Our previous DNase experiment addresses this concern and supports the results of this study (Biggs et al., 2020). That experiment was conducted in vitro, where DNase treatment caused DNA damage on chromosomes without affecting Top2 activity or triggering a DNA damage response. The results demonstrated that DNase treatment led to reduced chromosome stiffness, which aligns with the findings presented in our manuscript.

      (4) In the same line as the #3 point, the authors also need to check the effect of etoposide on the stiffness of mitotic chromosomes from MEF.

      Experiments on MEF mitotic chromosomes were designed to serve as a reference for the meiotic chromosome studies. The etoposide experiments on meiotic chromosomes specifically aimed to investigate how DNA damage affects meiotic chromosome structure. While it would be interesting to explore the effects of etoposide-induced DNA damage on mitotic chromosomes, it represents a distinct research question that falls outside the scope of the current study.

      Minor points:

      (1) Line 141-142: Previous studies by the author analyzed the stiffness of mitotic chromosomes from pro-metaphase. Which stage of cell cycles did the authors analyze here?

      To ensure consistency in our experiments, we also measured the stiffness of mitotic chromosomes at the prometaphase stage. The precise stage used is very near to metaphase, at the very end of the prometaphase stage. We have modified the manuscript to clarify this point.

      Revision to manuscript:

      “For comparison with the meiotic case, we measured the chromosome stiffness of Mouse Embryonic Fibroblasts (MEFs) at late pro-metaphase (just slightly before their attachment to the mitotic spindle) and found that the average Young’s modulus was 340 ± 80 Pa (Figure 2B). The value is consistent with our previously published data, where the modulus for MEFs was measured to be 370 ± 70 Pa (Biggs et al., 2020).”

      (2) Line 157: Here, the doubling force of MI (and MII) oocytes should be described in addition to those of spermatocytes.

      The purpose of this paragraph is to demonstrate the reproducibility and consistency of our experiments. In this section, we compared our data with previously published findings. Published data do not include chromosome stiffness measurements from MI mouse oocytes. Our experiment is the first to assess this. Therefore, we did not include MI mouse oocytes in that comparison. To clarify this, we have added sentences to highlight the comparison of doubling force.

      Revision to manuscript:

      “Here, we found that the doubling forces of chromosomes from MI and MII oocytes are 3770 ± 940 pN and 510 ± 50 pN, respectively. We conclude that chromosomes from MI oocytes are much stiffer than those from both mitotic cells and MII oocytes (Supplement Fig. 2), in terms of either Young’s modulus or doubling force.”

      (3) Line 202: What stage of prophase I do the authors mean by the spermatocyte stage here? Diakinesis, Metaphase I or prometaphase I? I am not sure how the authors can determine a specific stage of prophase I by only looking at the thickness of the chromosomes. Please show the thickness distribution of WT and Rec8<sup>-/-</sup> chromosomes.

      We have reworded the sentence and clarified that the spermatocyte stage is prophase I. Since Rec8<sup>-/-</sup> spermatocytes cannot progress beyond the pachytene stage of prophase I, the isolated chromosomes must be in prophase I rather than diakinesis, metaphase I, prometaphase I, or any later stages (Xu et al., 2005). Based on the cell size and degree of chromosome condensation (Biggs et al., 2020), it is most likely that the measured chromosomes are at the zygotene-like stage. However, as we cannot definitively determine the exact substage of prophase I, we have referred to them simply as prophase I.

      Revision to manuscript:

      “We isolated chromosomes from Rec8<sup>-/-</sup> prophase I spermatocytes, which displayed large and round cell size and thick chromosomal threads, indicative of advanced chromosome compaction after stalling at a zygotene-like prophase I stage (Supplement Fig. 3). The combination of large cell size and degree of chromosome compaction allowed us to reliably identify Rec8<sup>-/-</sup> prophase I chromosomes. Using micromanipulation, we measured chromosome stiffness by stretching the chromosomes (Supplement Fig. 3) (Biggs et al., 2019).”

      Reviewer #2 (Recommendations For The Authors):

      (1) Line 135: that statement is not substantiated; better to show retraction data and full reversibility.

      We added a figure showing oocyte chromosome stretching, which showed that the oocyte chromosome is elastic, and that the stretching process is reversible (Supplement Fig. 1).

      (2) Line 144: the authors claim that the Young Modulus of MII oocytes is "slightly" higher than that of mitotic cells (MEFs). Well, "slightly" means it is rather similar, and therefore the commonly used statement that MII is similar to mitosis is OK - contrary to the authors' claim.

      We have removed the word “slightly” in the manuscript. The difference is statistically significant.

      Revision to manuscript:

      “Surprisingly, despite this reduction, the stiffness of MII oocyte chromosomes was still significantly higher than that for mitotic cells (Figure 2B).”

      (3) There are a lot of awkward sentences in this text. Some sentences lack words, are not sufficiently precise in wording and/or logic, and there are numerous typos. Some examples can be found in lines 89 (grammar), 94, 95 ("looked"), 98, 101 ("difference" - between what?), and some are commonplaces or superficial (lines 92/93, 120..., ). Occasionally the present and past tense are mixed (e.g. in M&M). Thus the manuscript is quite poorly written.

      We thank the reviewer for these comments. We have revised all the sentences highlighted by the reviewer and polished the entire manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) Line 48. "We then investigated the contribution of meiosis-specific cohesin complexes to chromosome stiffness in MI and MII oocytes." There is no data on oocytes with meiosis-specific cohesin KO. This part should be corrected.

      We have corrected this error.

      Revision to manuscript:

      “We examined the role of meiosis-specific cohesin complexes in regulating chromosome stiffness.”

      (2) Lines 155-157. The result of MI mouse oocyte chromosomes should also be mentioned here (Supplementary Figure 1).

      Please see our response to Reviewer 1 – Minor Point 2.

      (3) Line 163. "The stiffness of chromosomes in MI mouse oocytes is significantly higher compared to MII oocytes."<br /> Is this because two homologs are paired in MI chromosomes (but not in MII chromosomes)? The authors may want to discuss the possible mechanism.

      Please see our response to Reviewer 1 – Major Point 2.

      (4) Line 188: "We hypothesized that MI oocytes... would have higher chromosome stiffness than MII oocytes." Why did the authors measure chromosomes from spermatocytes but not MI oocytes?

      Both spermatocytes and oocytes from Rec8<sup>-/-</sup>, Stag3<sup>-/-</sup>, and Rad21l<sup>-/-</sup> mutant mice cannot reach MI stage because cohesin proteins are crucial for germline-cell development. We chose to use spermatocytes in our study because collecting fetal meiotic oocytes is extremely difficult, and genotyping fetal mice adds another layer of complexity to the experiments. In females, all oocytes complete prophase I and progress to the dictyotene stage during the fetal stage. Obtaining individual oocytes at this stage is challenging. In contrast, spermatocytes are continuously generated at all stages in males.

      (5) To support the authors' conclusion, verifying the KO of REC8, STAG3, and RAD21L by immunostaining or other methods is essential.

      These mice were provided by one of the authors, Dr. Philip Jordan, who has published several papers using these knockout mice (Hopkins et al., 2014; Ward et al., 2016). The immunostaining of these models has already been well characterized in those previous studies. In addition to performing double genotyping, we also used the size of the collected testes as an additional verification of the mutant genotype. These knockout mice have significantly smaller testes compared to their wild-type counterparts, providing a clear physical indicator of the mutation.

      (6) Some of the cited papers and descriptions in the Introduction are not appropriate and confusing. This part should be improved:

      Line 79. Recent studies have revealed that the 30-nm fiber is not considered the basic structure of chromatin (e.g., review, PMID: 30908980; original papers, PMID: 19064912, 22343941, 28751582). This point should be included.

      We have corrected the references as needed. Additionally, thank you for the updated information regarding the 30-nm fiber. We have removed all the descriptions about the 30-nm fiber to ensure the information is accurate and up to date.

      (7) Line 83. Reviews on mitotic chromosomes, rather than Ref. 9, should be cited here. For instance, PMID: 33836947, 31230958.

      We have corrected it and added references according to the reviewer’s suggestion.

      (8) Line 85. Refs. 10 and 11 are not on the "Scaffold/Radial-Loop" model. For instance, PMID: 922894, 277351, 12689587. The other popular model is the hierarchical helical folding model (PMID: 98280, 15353545).

      We have corrected it and added appropriate references according to the reviewer’s suggestion. Regarding the hierarchical helical folding model, our experiments do not provide data that either support or refute this model. Thus, we have opted not to include any discussion of this model in our manuscript.

      (9) Figure legends. There is no description of the statistical test.

      We have added the description of the statistical test at the end of the figure legends for clarity.

      (10) Line 156. The authors should mention which stages in spermatocyte prophase I (pachytene?) were used for their measurement.

      We cannot precisely determine the substage of prophase I in the spermatocytes although it is most likely in the pachytene stage.

      (11) Line 241. "DNA damage reduces chromosome stiffness in oocytes." It would be better to show how much damage was induced in aged and etoposide-treated chromosomes, for example, by gamma-H2AX immunostaining. In addition, there are some papers that show DNA damage makes chromatin/chromosomes softer (e.g., PMID: 33330932). The authors need to cite these papers.

      The effects of etoposide and age on meiotic oocytes have been published (Collins et al., 2015; Marangos et al., 2015; Winship et al., 2018).

      We are grateful for the citation information provided by the reviewer and have added it to our manuscript.

      Revision to manuscript:

      “Overall, these findings suggest that DNA damage reduces chromosome stiffness in oocytes instead of increasing it, which aligns with other studies showing that DNA damage can make chromosomes softer (Dos Santos et al., 2021). These results suggest that the increased chromosome stiffness observed in aged oocytes is not due to DNA damage.”

      (12) Line 328. Senescence?

      This error is corrected in the revised manuscript.

      Revision to manuscript:

      “Defective chromosome organization is often related to various diseases, such as cancer, infertility, and senescence (Thompson and Compton, 2011; Harton and Tempest, 2012; He et al., 2018).”

      References:

      Biggs, R., P.Z. Liu, A.D. Stephens, and J.F. Marko. 2019. Effects of altering histone posttranslational modifications on mitotic chromosome structure and mechanics. Mol. Biol. Cell. 30:820–827. doi:10.1091/mbc.E18-09-0592.

      Biggs, R.J., N. Liu, Y. Peng, J.F. Marko, and H. Qiao. 2020. Micromanipulation of prophase I chromosomes from mouse spermatocytes reveals high stiffness and gel-like chromatin organization. Commun. Biol. 3:1–7. doi:10.1038/s42003-020-01265-w.

      Cai, X., J.M. Stringer, N. Zerafa, J. Carroll, and K.J. Hutt. 2023. Xrcc5/Ku80 is required for the repair of DNA damage in fully grown meiotically arrested mammalian oocytes. Cell Death Dis. 14:1–9. doi:10.1038/s41419-023-05886-x.

      Collins, J.K., S.I.R. Lane, J.A. Merriman, and K.T. Jones. 2015. DNA damage induces a meiotic arrest in mouse oocytes mediated by the spindle assembly checkpoint. Nat. Commun. 6. doi:10.1038/ncomms9553.

      Harton, G.L., and H.G. Tempest. 2012. Chromosomal disorders and male infertility. Asian J. Androl. 14:32–39. doi:10.1038/aja.2011.66.

      He, Q., B. Au, M. Kulkarni, Y. Shen, K.J. Lim, J. Maimaiti, C.K. Wong, M.N.H. Luijten, H.C. Chong, E.H. Lim, G. Rancati, I. Sinha, Z. Fu, X. Wang, J.E. Connolly, and K.C. Crasta. 2018. Chromosomal instability-induced senescence potentiates cell non-autonomous tumourigenic effects. Oncogenesis. 7. doi:10.1038/s41389-018-0072-4.

      Hopkins, J., G. Hwang, J. Jacob, N. Sapp, R. Bedigian, K. Oka, P. Overbeek, S. Murray, and P.W. Jordan. 2014. Meiosis-Specific Cohesin Component, Stag3 Is Essential for Maintaining Centromere Chromatid Cohesion, and Required for DNA Repair and Synapsis between Homologous Chromosomes. PLoS Genet. 10:e1004413. doi:10.1371/journal.pgen.1004413.

      Lee, C., J. Leem, and J.S. Oh. 2023. Selective utilization of non-homologous end-joining and homologous recombination for DNA repair during meiotic maturation in mouse oocytes. Cell Prolif. 56:1–12. doi:10.1111/cpr.13384.

      Lee, J., S. Ogushi, M. Saitou, and T. Hirano. 2011. Condensins I and II are essential for construction of bivalent chromosomes in mouse oocytes. Mol. Biol. Cell. 22:3465–3477. doi:10.1091/mbc.E11-05-0423.

      Marangos, P., and J. Carroll. 2012. Oocytes progress beyond prophase in the presence of DNA damage. Curr. Biol. 22:989–994. doi:10.1016/j.cub.2012.03.063.

      Marangos, P., M. Stevense, K. Niaka, M. Lagoudaki, I. Nabti, R. Jessberger, and J. Carroll. 2015. DNA damage-induced metaphase I arrest is mediated by the spindle assembly checkpoint and maternal age. Nat. Commun. 6:1–10. doi:10.1038/ncomms9706.

      Poirier, M.G., and J.F. Marko. 2002. Mitotic chromosomes are chromatin networks without a mechanically contiguous protein scaffold. Proc. Natl. Acad. Sci. U. S. A. 99:15393–15397. doi:10.1073/pnas.232442599.

      Pope, L.H., C. Xiong, and J.F. Marko. 2006. Proteolysis of Mitotic Chromosomes Induces Gradual and Anisotropic Decondensation Correlated with a Reduction of Elastic Modulus and Structural Sensitivity to Rarely Cutting Restriction Enzymes. Mol. Biol. Cell. 17:104. doi:10.1091/MBC.E05-04-0321.

      Dos Santos, Á., A.W. Cook, R.E. Gough, M. Schilling, N.A. Olszok, I. Brown, L. Wang, J. Aaron, M.L. Martin-Fernandez, F. Rehfeldt, and C.P. Toseland. 2021. DNA damage alters nuclear mechanics through chromatin reorganization. Nucleic Acids Res. 49:340–353. doi:10.1093/nar/gkaa1202.

      Sun, M., R. Biggs, J. Hornick, and J.F. Marko. 2018. Condensin controls mitotic chromosome stiffness and stability without forming a structurally contiguous scaffold. Chromosom. Res. 26:277–295. doi:10.1007/s10577-018-9584-1.

      Thompson, S.L., and D.A. Compton. 2011. Chromosomes and cancer cells. Chromosom. Res. 19:433–444. doi:10.1007/s10577-010-9179-y.

      Ward, A., J. Hopkins, M. Mckay, S. Murray, and P.W. Jordan. 2016. Genetic Interactions Between the Meiosis-Specific Cohesin Components, STAG3, REC8, and RAD21L. G3 (Bethesda). 6:1713–24. doi:10.1534/g3.116.029462.

      Winship, A.L., J.M. Stringer, S.H. Liew, and K.J. Hutt. 2018. The importance of DNA repair for maintaining oocyte quality in response to anti-cancer treatments, environmental toxins and maternal ageing. Hum. Reprod. Update. 24:119–134. doi:10.1093/humupd/dmy002.

      Xu, H., M.D. Beasley, W.D. Warren, G.T.J. van der Horst, and M.J. McKay. 2005. Absence of Mouse REC8 Cohesin Promotes Synapsis of Sister Chromatids in Meiosis. Dev. Cell. 8:949–961. doi:10.1016/j.devcel.2005.03.018.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:<br /> (1) I still think that the authors need to set the importance of the differences in aggregation in the context of toxicity arising from protein misfolding/aggregation. While the authors state the limitation in the response, and I agree that a single manuscript cannot complete a field of investigation I still think that this is an important point missing from this manuscript.

      We thank the reviewer for the comments. We are working to address this issue and will elucidate it in our future studies.

      (2) I retain my reservations about the fluorescence intensity data shown for Rho123, DCF, JC1, and MitoSox. The errors are much lower than what we typically achieve in biological experiments in our as well as our collaborator's lab. A glimpse at published literature would also support our statement. Specifically, Rho123 shows a large difference in errors between Figure 5 and Figure 5 Supplement 2. The point to note is that the absolute intensities do not vary between these figures, but the errors are an order of magnitude lower in the main figures. I, therefore, accept these figures in good faith without further interrogation.

      We really value these comments from the reviewer and also do not want to cause any potential misleading interpretations of the data. We have therefore asked a more experienced author to redo all the experiments on the physiological indicators (Rho123, JC1 and MitoSox) that directly reflect mitochondrial function, and left out the DCF data. The new experimental data are in line with our previous results. We have clearly described these changes in the Results, Materials and Methods and Figure legends sections.

      The new data from the repeated experiments are: Rho123 fluorescence intensity data in Figure 5A, B and C; Figure 6B; JC1 staining in Figure 6E; JC1 staining in Figure 7A, B and D.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper introduces a new approach to modeling human behavioral responses using image-computable models. They create a model (VAM) that is a combination of a standard CNN coupled with a standard evidence accumulation model (EAM). The combined model is then trained directly on image-level data using human behavioral responses. This approach is original and can have wide applicability. However, many of the specific findings reported are less compelling.

      Strengths:

      (1) The manuscript presents an original approach to fitting an image-computable model to human behavioral data. This type of approach is sorely needed in the field.

      (2) The analyses are very technically sophisticated.

      (3) The behavioral data are large both in terms of sample size (N=75) and in terms of trials per subject.

      Weaknesses:

      Major

      (1) The manuscript appears to suggest that it is the first to combine CNNs with evidence accumulation models (EAMs). However, this was done in a 2022 preprint (https://www.biorxiv.org/content/10.1101/2022.08.23.505015v1) that introduced a network called RTNet. This preprint is cited here, but never really discussed. Further, the two unique features of the current approach discussed in lines 55-60 are both present to some extent in RTNet. Given the strong conceptual similarity in approach, it seems that a detailed discussion of similarities and differences (of which there are many) should feature in the Introduction.

      Thanks for pointing this out—we agree that the novel contributions of our model (the VAM) with respect to prior related models (including RTNet) should be clarified, and have revised the Introduction accordingly. We include the following clarifications in the Introduction:

      “The key feature of the VAM that distinguishes it from prior models is that the CNN and EAM parameters are jointly fitted to the RT, choice, and visual stimulus data from individual participants in a unified Bayesian framework. Thus, both the visual representations learned by the CNN and the EAM parameters are directly constrained by behavioral data. In contrast, prior models first optimize the CNN to perform the behavioral task, then separately fit a minimal set of high-level CNN parameters [RTNet, Rafiei et al., 2024] and/or the EAM parameters to behavioral data [Annis et al., 2021; Holmes et al., 2020; Trueblood et al., 2021]. As we will show, fitting the CNN with human data—rather than optimizing the model to perform a task—has significant consequences for the representations learned by the model.”

      For example, in the case of RTNet, the variability of the Bayesian CNN weight distribution, the decision threshold, and the magnitude of the noise added to the images are adjusted to match the average human accuracy (separately for each task condition). RTNet is an interesting and useful model that we believe has complementary strengths to our own work.

      Since there are several other existing models in addition to the VAM and RTNet that use CNNs to generate RTs or RT proxies (by our count, at least six that we cite earlier in the Introduction), we felt it was inappropriate to preferentially include a detailed comparison of the VAM and RTNet beyond the passage quoted above.

      (2) In the approach here, a given stimulus is always processed in the same way through the core CNN to produce activations v_k. These v_k's are then corrupted by Gaussian noise to produce drift rates d_k, which can differ from trial to trial even for the same stimulus. In other words, the assumption built into VAM appears to be that the drift rate variability stems entirely from post-sensory (decisional) noise. In contrast, the typical interpretation of EAMs is that the variability in drift rates is sensory. This is also the assumption built into RTNet where the core CNN produces noisy evidence. Can the authors comment on the plausibility of VAM's assumption that the noise is post-sensory?

      In our view, the VAM is compatible with a model in which the drift rate variability for a given stimulus is due to sensory noise, since we do not specify the origin of the Gaussian noise added to the drift rates. As the reviewer notes, the CNN component of the VAM processes a given stimulus deterministically, yielding the mean drift rates. This does not preclude us from imagining an additional (unmodeled) sensory process that adds variability to the drift rates. The VAM simply represents this and other hypothetical sources of variability as additive Gaussian noise. We agree, however, that it is worthwhile to think about the origin of the drift rate variability, though it is not a focus of our work.
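
      The structure discussed in this exchange (deterministic mean drift rates v_k for a given stimulus, plus additive Gaussian noise giving trial-specific drift rates d_k) can be sketched as follows. This is an illustration under stated assumptions, not the VAM implementation: the function name, the two-response setup, the mean drift values, and the noise standard deviation are all hypothetical.

```python
import random


def sample_drift_rates(mean_drifts, noise_sd, rng):
    """Return one trial's drift rates d_k = v_k + Gaussian noise.

    mean_drifts : deterministic per-response outputs v_k for one stimulus
                  (here just a list of numbers standing in for CNN outputs)
    noise_sd    : standard deviation of the additive Gaussian noise (assumed)
    rng         : a random.Random instance, so results are reproducible
    """
    return [rng.gauss(v, noise_sd) for v in mean_drifts]


rng = random.Random(0)

# Hypothetical mean drift rates for two responses to the SAME stimulus.
mean_drifts = [2.0, 0.5]

# Two trials with the same stimulus: identical means, different drift rates.
trial_a = sample_drift_rates(mean_drifts, noise_sd=1.0, rng=rng)
trial_b = sample_drift_rates(mean_drifts, noise_sd=1.0, rng=rng)

print("trial A drift rates:", trial_a)
print("trial B drift rates:", trial_b)
```

      The sketch is agnostic about where the noise comes from: the same additive term could stand in for sensory variability, decisional variability, or both, which is the point made in the response above.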

      (3) Figure 2 plots how well VAM explains different behavioral features. It would be very useful if the authors could also fit simple EAMs to the data to clarify which of these features are explainable by EAMs only and which are not.

      In our view, fitting simple EAMs to the data would not be especially informative and poses a number of challenges for the particular task we study (LIM) that are neatly avoided by using the VAM. In particular, as we show in Figure 2, the stimuli vary along several dimensions that all appear to influence behavior: horizontal position, vertical position, layout, target direction, and flanker direction. Since the VAM is stimulus-computable, fitting the VAM automatically discovers how all of these stimulus features influence behavior (via their effect on the drift rates outputted by the CNN). In contrast, fitting a simple EAM (e.g. the LBA model) necessitates choosing a particular parameterization that specifies the relationship between all of the stimulus features and the EAM model parameters. This raises a number of practical questions. For example, should we attempt to fit a separate EAM for each stimulus feature, or model all stimulus features simultaneously?

      Moreover, while we could in principle navigate these issues and fit simple EAMs to the data, we do not intend to claim that simple EAMs fail to explain the relationship between stimulus features and behavior as well as the VAM. Rather, the key strength of the VAM relative to simple EAMs is that it includes a detailed and biologically plausible model of human vision. The majority of the paper capitalizes on this strength by showing how behavioral effects of interest (namely congruency effects) can be explained in terms of the VAM’s visual representations.

      (4) VAM is tested in two different ways behaviorally. First, it is tested to what extent it captures individual differences (Figure 2B-E). Second, it is tested to what extent it captures average subject data (Figure 2F-J). It wasn't clear to me why for some metrics only individual differences are examined and for other metrics only average human data is examined. I think that it will be much more informative if separate figures examine average human data and individual difference data. I think that it's especially important to clarify whether VAM can capture individual differences for the quantities plotted in Figures 2F-J.

      We would like to clarify that Fig. 2J in fact already shows how well the VAM captures individual differences for the average subject data shown in Fig. 2H (stimulus layout) and Fig. 2I (stimulus position). For a given participant and stimulus feature, we calculated the Pearson's r between model/participant mean RTs across each stimulus feature value. Fig. 2J shows the distribution of these Pearson’s r values across all participants for stimulus layout and horizontal/vertical position.
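
      The per-participant comparison described above can be sketched as follows: for one participant and one stimulus feature, compute Pearson's r between the model's and the participant's mean RTs across the values of that feature. The function name and the mean RT values below are hypothetical and serve only to show the calculation; they are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical mean RTs (seconds) for one participant and the matching model,
# one entry per value of a single stimulus feature (e.g. four layouts).
participant_mean_rt = [0.52, 0.55, 0.61, 0.58]
model_mean_rt = [0.50, 0.54, 0.60, 0.59]

r = pearson_r(model_mean_rt, participant_mean_rt)
print(f"Pearson's r for this participant and feature: {r:.3f}")
```

      Repeating this calculation for every participant gives one r value per participant and feature, and the distribution of those values is what a panel such as Fig. 2J summarizes.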

      Fig. 2G also already shows how well the VAM captures individual differences in behavior. Specifically, this panel shows individual differences in mean RT attributable to differences in age. For Fig. 2F, which shows how the model drift rates differ on congruent vs. incongruent trials, there is no sensible way to compare the models to the participants at any level of analysis (since the participants do not have drift rates). 

      (5) The authors look inside VAM and perform many exploratory analyses. I found many of these difficult to follow since there was little guidance about why each analysis was conducted. This also made it difficult to assess the likelihood that any given result is robust and replicable. More importantly, it was unclear which results are hypothesized to depend on the VAM architecture and training, and which results would be expected in performance-optimized CNNs. The authors train and examine performance-optimized CNNs later, but it would be useful to compare those results to the VAM results immediately when each VAM result is first introduced.

      Thanks for pointing this out. We apologize for any confusion caused by our presentation of the CNN analyses. We have added additional motivating statements, methodological clarifications, and relevant references to our Results, particularly for Figure 3 in which we first introduce the analyses of the CNN representations/activity. In general, each analysis is prefaced by a guiding question or specific rationale, e.g. “How do the models' visual representations enable target selectivity for stimuli that vary along several irrelevant dimensions?” We also provide numerous references in which these analysis techniques have been used to address similar questions in CNNs or the primate visual cortex.

      We chose to maintain the current organization of our results in which the comparison between the VAM and the task-optimized models is presented in a separate figure. We felt that including analyses of both the VAM and task-optimized models in the initial analyses of the CNN representations would be overwhelming for many readers. As the reviewer acknowledges, some readers may already find these results challenging to follow.

      (6) The authors don't examine how the task-optimized models would produce RTs. They say in lines 371-2 that they "could not examine the RT congruency effect since the task-optimized models do not generate RTs." CNNs alone don't generate RTs, but RTs can easily be generated from them using the same EAM add-on that is part of VAM. Given that the CNNs are already trained, I can't see a reason why the authors can't train EAMs on top of the already trained CNNs and generate RTs, so these can provide a better comparison to VAM.

      We appreciate this suggestion, but we judge the suggestion to “train EAMs on top of the already trained CNNs and generate RTs” to be a significant expansion of the scope of the paper with multiple possible roads forward. In particular, one must specify how the outputs of the task-optimized CNN (logits for each possible response) relate to drift rates, and there is no widely accepted or standard way to do this. Previously proposed methods include transforming representation distances in the last layer to drift rates (https://doi.org/10.1037/xlm0000968), fitting additional subject-specific parameters that map the logits to drift rates (https://doi.org/10.1007/s42113-019-00042-1), or using the softmax-scored model outputs as drift rates directly (https://doi.org/10.1038/s41562-024-01914-8), though in the latter case the RTs are not on the same scale as human data. In our view, evaluating these different methods is beyond the scope of this paper. An advantage of the VAM is that one does not have to fit two separate models (a CNN and an EAM) to generate RTs.

      Nonetheless, we agree that it would be informative to examine something like RTs in the task-optimized models. Our revised Results section now includes an analysis of the confidence of the task-optimized models’ decisions, which we use as a proxy for RTs:

      “Since the task-optimized models do not generate RTs, it is not possible to directly measure RT congruency effects in these models without making additional assumptions about how the CNN's classification decisions relate to RTs. However, as a coarse proxy for RT, we can examine the confidence of the CNN's decisions, defined as the softmax-scored logit (probability) of the most probable direction in the final CNN layer. This choice of RT proxy is motivated by some prior studies that have combined CNNs with EAMs [Annis et al., 2021; Holmes et al., 2020; Trueblood et al., 2021]. These studies explicitly or implicitly derive a measure of decision confidence from the activity of the last CNN layer. The confidence measure is then mapped to the EAM drift rates, such that greater decision confidence generally corresponds to higher drift rates (and therefore shorter RTs).

      We calculated the average confidence of each task-optimized CNN separately for congruent vs. incongruent trials. On average, the task-optimized models showed higher confidence on congruent vs. incongruent trials (W = 21.0, p < 1e-3, Wilcoxon signed-rank test; Cohen's d = 0.99; n = 75 models). These analyses therefore provide some evidence that task-optimized CNNs have the capacity to exhibit congruency effects, though an explicit comparison of the magnitude of these effects with human data requires additional modeling assumptions (e.g., fitting a separate EAM).”
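
      The confidence proxy described in the quoted passage can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the toy logits, and the trial labels are all assumptions made for the example. It takes the final-layer logits, applies a softmax, uses the probability of the most probable direction as the confidence, and averages it separately over congruent and incongruent trials.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum keeps exp() stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def mean_confidence_by_congruency(logits, is_congruent):
    """Return (mean confidence on congruent trials, mean on incongruent trials).

    Confidence is the softmax probability of the most probable response.
    """
    confidence = softmax(logits).max(axis=1)
    return confidence[is_congruent].mean(), confidence[~is_congruent].mean()

# Toy logits for 4 trials and 4 response directions (illustrative values only).
logits = np.array([
    [4.0, 0.0, 0.0, 0.0],  # congruent
    [3.0, 0.5, 0.0, 0.0],  # congruent
    [2.0, 1.5, 0.0, 0.0],  # incongruent
    [1.0, 1.0, 0.0, 0.0],  # incongruent
])
is_congruent = np.array([True, True, False, False])
conf_congruent, conf_incongruent = mean_confidence_by_congruency(logits, is_congruent)
```

      With one such pair of means per model, the paired comparison across models could then be run with a Wilcoxon signed-rank test, for example `scipy.stats.wilcoxon`.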

      (7) The Discussion felt very long and mostly a summary of the Results. I also couldn't shake the feeling that it had many just-so stories related to the variety of findings reported. I think that the section should be condensed and the authors should be clearer about which explanations are speculations and which are air-tight arguments based on the data.

      We have shortened the Discussion modestly and added language that distinguishes the more speculative arguments from those directly supported by our data.

      Specifically, we added in the phrase “we speculate that…” for two suggestions in the Discussion (paragraphs 3 and 5), and we ensured that any other more speculative suggestions contain such clarifying language. We have also added in subheadings in the Discussion to help readers navigate this section. 

      (8) In one of the control analyses, the authors train different VAMs on each RT quantile. I don't understand how it can be claimed that this approach can serve as a model of an individual's sensory processing. Which of the 5 sets of weights (5 VAMs) captures a given subject's visual processing? Are the authors saying that the visual system of a given subject changes based on the expected RT for a stimulus? I feel like I'm missing something about how the authors think about these results.

      We agree that these particular analyses may cause confusion and have removed them from our revised manuscript.

      Reviewer #2 (Public Review):

      In an image-computable model of speeded decision-making, the authors introduce and fit a combined CNN-EAM (a 'VAM') to flanker-task-like data. They show that the VAM can fit mean RTs and accuracies as well as the congruency effect that is present in the data, and subsequently analyze the VAM in terms of where in the network congruency effects arise.

      Overall, combining DNNs and EAMs appears to be a promising avenue to seriously model the visual system in decision-making tasks compared to the current practice in EAMs. Some variants have been proposed or used before (e.g., doi.org/10.1016/j.neuroimage.2017.12.078 , doi.org/10.1007/s42113-019-00042-1), but always in the context of using task-trained models, rather than models trained on behavioral data. However, I was surprised to read that the authors developed their model in the context of a conflict task, rather than a simpler perceptual decision-making task. Conflict effects in human behavior are particularly complex, and thereby, the authors set a high goal for themselves in terms of the to-be-explained human behavior. Unfortunately, the proposed VAM does not appear to provide a great account of conflict effects that are considered fundamental features of human behavior, like the shape of response time distributions, and specifically, delta plots (doi.org/10.1037/0096-1523.20.4.731). The authors argue that it is beyond the scope of the presented paper to analyze delta plots, but as these are central to studies of human conflict behavior, models that aim to explain conflict behavior will need to be able to fit and explain delta plots.

      Theories on conflict often suggest that negative/positive-trending delta plots arise through the relative timing of response activation related to relevant and irrelevant information. Accumulation for relevant and irrelevant information would, as a result, either start at different points in time or the rates vary over time. The current VAM, as a feedforward neural network model, does not appear to be able to capture such effects, and perhaps fundamentally not so: accumulation for each choice option is forced to start at the same time, and rates are a static output of the CNN.

      The proposed solution of fitting five separate VAMs (one for each of five RT quantiles) is not satisfactory: it does not explain how delta plots result from the model, for the same reason that fitting five evidence accumulation models (one per RT quantile) does not explain how response time distributions arise. If, for example, one would want to make a prediction about someone's response time and choice based on a given stimulus, one would first have to decide which of the five VAMs to use, which is circular. But more importantly, this way of fitting multiple models does not explain the latent mechanism that underlies the shape of the delta plots.

      As such, the extensive analyses on the VAM layers and the resulting conclusions that conflict effects arise due to changing representations across layers (e.g., "the selection of task-relevant information occurs through the orthogonalization of relevant and irrelevant representations"), while inspiring, remain hard to weigh, as they are contingent on the assumption that the VAM can capture human behavior in the conflict task, which it struggles with. That said, the promise of combining CNNs and EAMs is clearly there. A way forward could be to either adjust the proposed model so that it can explain delta plots, which would potentially require temporal dynamics and time-varying evidence accumulation rates, or perhaps to start simpler and build CNN-EAMs that are able to fit more standard perceptual decision-making tasks without conflict effects.

      We thank the reviewer for their thoughtful comments on our work. However, we note that the VAM does in fact capture the positive-trending RT delta plot observed in the participant data (Fig. S4A), though the intercepts for models/participants differ somewhat. On the other hand, the conditional accuracy functions (Fig. S4B) reveal a more pronounced difference between model and participant behavior. As the reviewer points out, capturing these effects is likely to require a model that can produce time-varying drift rates, whereas our model produces a fixed drift rate for a given stimulus. We also agree that fitting a separate VAM to each RT quantile is not a satisfactory means of addressing this limitation and have removed these analyses from our revised manuscript.
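
      For readers unfamiliar with RT delta plots, the following is a minimal sketch of one common way to compute them, not the procedure used for Fig. S4A. The quantile levels, the function name `delta_plot`, and the simulated gamma-distributed RTs are assumptions chosen for illustration. The congruency effect (incongruent minus congruent RT) is computed at matched quantiles and paired with the mean RT at each quantile.

```python
import numpy as np

def delta_plot(rt_congruent, rt_incongruent, quantiles=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Return (mean RT per quantile, congruency effect per quantile)."""
    q = np.asarray(quantiles)
    q_con = np.quantile(rt_congruent, q)
    q_inc = np.quantile(rt_incongruent, q)
    mean_rt = (q_con + q_inc) / 2.0  # x-axis of the delta plot
    delta = q_inc - q_con            # y-axis: congruency effect at each quantile
    return mean_rt, delta

# Simulated RTs in seconds (hypothetical values): incongruent trials are slower
# and more spread out, which yields a positive-trending delta plot.
rng = np.random.default_rng(0)
rt_con = 0.35 + rng.gamma(shape=4.0, scale=0.05, size=5000)
rt_inc = 0.35 + rng.gamma(shape=4.0, scale=0.06, size=5000)
mean_rt, delta = delta_plot(rt_con, rt_inc)
```

      A positive-trending delta plot corresponds to `delta` increasing with `mean_rt`; a negative-trending one corresponds to the effect shrinking at slower quantiles.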

      However, while we agree that accurately capturing these dynamic effects is a laudable goal, it is in our view also worthwhile to consider explanations for the mean behavioral effect (i.e. the accuracy congruency effect), which can occur independently of any consideration of dynamics. One of our main findings is that across-model variability in accuracy congruency effects is better attributed to variation in representation geometry (target/flanker subspace alignment) vs. variation in the degree of flanker suppression. This finding does not require any consideration of dynamics to be valid at the level of explanation we pursue (across-user variability in congruency effects), but also does not preclude additional dynamic processes that could give rise to more specific error patterns. Our revised discussion now includes a section where we summarize and elaborate on these ideas:

      “It is not difficult to imagine how the orthogonalization mechanism described above, which explains variability in accuracy congruency effects across individuals, could act in concert with other dynamic processes that explain variability in congruency effects within individuals (e.g., as a function of RT). In general, any process that dynamically gates the influence of irrelevant sensory information on behavioral outputs could accomplish this, for example ramping inhibition of incorrect response activation [https://doi.org/10.3389/fnhum.2010.00222], a shrinking attention spotlight [https://doi.org/10.1016/j.cogpsych.2011.08.001], or dynamics in neural population-level geometry [https://doi.org/10.1038/nn.3643]. To pursue these ideas, future work may aim to incorporate dynamics into the visual component and decision component of the VAM with recurrent CNNs [https://doi.org/10.48550/arXiv.1807.00053, https://doi.org/10.48550/arXiv.2306.11582] and the task-DyVA model [https://doi.org/10.1038/s41562-022-01510-8], respectively.”

      Reviewer #3 (Public Review):

      Summary:

      In this article, the authors combine a well-established choice-response time (RT) model (the Linear Ballistic Accumulator) with a CNN model of visual processing to model image-based decisions (referred to as the Visual Accumulator Model - VAM). While this is not the first effort to combine these modeling frameworks, it uses this combination of approaches uniquely.

      Specifically, the authors attempt to better understand the structure of human information representations by fitting this model to behavioral (choice-RT) data from a classic flanker task. This objective is made possible by using a very large (by psychological modeling standards) industry data set to jointly fit both components of this VAM model to individual-level data. Using this approach, they illustrate (among other results) (1) how the interaction between target and flanker representations influence the presence and strength of congruency effects, (2) how the structure of representations changes (distributed versus more localized) with depth in the CNN model component, and (3) how different model training paradigms change the nature of information representations. This work contributes to the ML literature by demonstrating the value of training models with richer behavioral data. It also contributes to cognitive science by demonstrating how ML approaches can be integrated into cognitive modeling. Finally, it contributes to the literature on conflict modeling by illustrating how information representations may lead to some of the classic effects observed in this area of research.

      Strengths:

      (1) The data set used for this analysis is unique and is made publicly available as part of this article. Specifically, they have access to data for 75 participants with >25,000 trials per participant. This scale of data/individual is unusual and is the foundation on which this research rests.

      (2) This is the first time, to my knowledge, that a model combining a CNN with a choice-RT model has been jointly fit to choice-RT data at the level of individual people. This type of model combination has been used before but in a more restricted context. This joint fitting, and in particular, learning a CNN through the choice-RT modeling framework, allows the authors to probe the structure of human information representations learned directly from behavioral data.

      (3) The analysis approaches used in this article are state-of-the-art. The training of these models is straightforward given the data available. The interesting part of this article (opinion of course) is the way in which they probe what CNN has learned once trained. I find their analysis of how distractor and target information interfere with each other particularly compelling as well as their demonstration that training on behavioral data changes the structure of information representations when compared to training models on standard task-optimized data.

      Weaknesses:

      (1) Just as the data in this article is a major strength, it is also a weakness. This type of modeling would be difficult, if not impossible to do with standard laboratory data. I don't know what the data floor would be, but collecting tens of thousands of decisions for a single person is impractical in most contexts. Thus this type of work may live in the realm of industry. I do want to re-iterate that the data for this study was made publicly available though!

      We suspect (but have not systematically tested) that the VAMs can be fitted with substantially less data. We use data augmentation techniques (various randomized image transformations) during training to improve the generalization capabilities of the VAMs, and these methods are likely to be particularly important when training on smaller datasets. One could consider increasing the amount of image data augmentation when working with smaller datasets, or pursuing other forms of data augmentation like resampling from estimated RT distributions (see https://doi.org/10.1038/s41562-022-01510-8 for an example of this). In general, we don’t think that prospective users of our approach should be discouraged if they have only a few hundred trials per subject (or less) - it’s worth trying!

      (2) While this article uses choice-RT data it doesn't fully leverage the richness of the RT data itself. As the authors point out, this modeling framework, the LBA component in particular, does not account for some of the more nuanced but well-established RT effects in this data. This is not a big concern given the already nice contributions of this article and it leads to an opportunity for ongoing investigation.

      We agree that fully capturing the more nuanced behavioral effects you mention (e.g. RT delta plots and conditional accuracy functions) is a worthwhile goal for future research—see our response to Reviewer #2 for a more detailed discussion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The phrase in the Abstract "convolutional neural network models of visual processing and traditional EAMs are jointly fitted" made me initially believe that the two models were fitted independently. You may want to re-word to clarify.

      We think that the phrase “jointly fitted” already makes it clear that both the CNN and EAM parameters are estimated simultaneously, in agreement with how this term is usually used. But we have nonetheless appended some additional clarifying language to that sentence (“in a unified Bayesian framework”).

      (2) Lines 27-28: EAMs "are the most successful and widely-used computational models of decision-making." This is only true for the specific type of decision-making examined here, namely joint modeling of choice and response times. Signal detection theory is arguably more widely-used when response times are not modeled.

      Thanks for pointing this out - we have revised the referenced sentence accordingly.

      (3) Could the authors clarify what is plotted in Figure 2F?

      Fig. 2F shows the drift rates for the target, flanker, and “other” (non-target/non-flanker) accumulators averaged over trials and models for congruent vs. incongruent trials. In case this was a source of confusion, we do not show the value of the flanker drift rates on congruent trials because the flanker and target accumulators are identical (i.e. the flanker/congruent drift rates are equivalent to the target/congruent drift rates).

      (4) Lines 214-7: "The observation that single-unit information for target direction decreased between the fourth and final convolutional layers while population-level decoding remained high is especially noteworthy in that it implies a transition from representing target direction with specialized "target neurons" to a more distributed, ensemble-level code." Can the authors clarify why this is the only reasonable explanation for these results? It seems like many other explanations could be construed.

      We have added additional clarification to this section and now use more tentative language:

      “The observation that single-unit information for target direction decreased between the fourth and final convolutional layers indicates that the units become progressively less selective for particular target directions. Since population-level decoding remained high in these layers, this suggests a transition from representing target direction with specialized "target neurons" to a more distributed, ensemble-level code.”

      (5) Lines 372-376: "Thus, simply training the model to perform the task is not sufficient to reproduce a behavioral phenomenon widely-observed in conflict tasks. This challenges a core (but often implicit) assumption of the task-optimized training paradigm, namely that training a model to do a task well will result in model representations that are similar to those employed by humans." While I agree with the general sentiment, I feel that its application here is strange. Unless I'm missing something, in the context of the preceding sentence, the authors seem to be saying that researchers in the field expect that CNNs can produce a behavioral phenomenon (RTs) that is completely outside of their design and training. I don't think that anyone actually expects that.

      We moved the discussion/analyses of RTs to the next paragraph. It should now be clear that this statement refers specifically to the absence of an accuracy congruency effect in the task-optimized models.

      (6) Lines 387-389: "As a result, the VAMs may learn richer representations of the stimuli, since a variety of stimulus features-layout, stimulus position, flanker direction-influence behavior (Figure 2)." That is certainly true of tasks like this one where an optimal model would only focus on a tiny part of the image, whereas humans are distracted by many features. I'm not sure that this distractibility is the same as "richer representations". When CNNs classify images based on the background, would the authors claim that they have richer representations than humans?

      We agree that “richer” may not be the best way to characterize these representations, and have changed it to “more complex”.

      (7) Is it possible that drift rate d_k for each response happens to be negative on a given trial? If so, how is the decision given on such trials (since presumably none of the accumulators will ever reach the boundary)?

      It is indeed possible for all of the drift rates to be negative, though we found that this occurred for a vanishingly small number of trials (mean ± s.e.m. percent trials/model: 0.080 ± 0.011%, n = 75 models), as reported in the Methods. These trials were excluded from analyses.
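
      The edge case discussed here can be made concrete with a small sketch of a single LBA trial. It is a simplified illustration under stated assumptions, not the VAM implementation: the function name `lba_trial`, its arguments, and the example values are hypothetical, and the per-trial drift rates and start points are taken as given rather than sampled. Each accumulator rises linearly from its start point, the first to reach the threshold determines the choice, and accumulators with non-positive drift never finish.

```python
import numpy as np

def lba_trial(drifts, starts, threshold, t0):
    """Simulate one LBA trial from per-accumulator drift rates and start points.

    Returns (choice_index, rt), or None when no accumulator has a positive
    drift rate, because evidence then never reaches the threshold.
    """
    drifts = np.asarray(drifts, dtype=float)
    starts = np.asarray(starts, dtype=float)
    positive = drifts > 0
    if not positive.any():
        return None  # no response is generated on this trial
    finish = np.full(drifts.shape, np.inf)
    finish[positive] = (threshold - starts[positive]) / drifts[positive]
    choice = int(np.argmin(finish))
    return choice, t0 + float(finish[choice])

# One accumulator has a negative drift but the trial still produces a response.
mixed = lba_trial([1.0, -0.5, 2.0], [0.0, 0.0, 0.5], threshold=1.0, t0=0.2)
# All drifts are negative: the trial yields no response.
all_negative = lba_trial([-1.0, -0.2], [0.0, 0.0], threshold=1.0, t0=0.2)
```

      In this sketch a trial only fails when every drift rate is non-positive, which matches the case described in the response above.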

      (8)  Can the authors comment on how they chose the CNN architecture and whether they expect that different architectures will produce similar results?

      Before establishing the seven-layer CNN architecture used throughout the paper, we conducted some preliminary experiments using other architectures that differed primarily in the number of CNN layers. We found that models with significantly fewer than seven layers typically failed to reach human-level accuracy on the task while larger models achieved human-level accuracy but (unsurprisingly) took longer to train.

      Reviewer #3 (Recommendations For The Authors):

      - In the introduction to this paper (particularly the paragraph beginning in line 33), the authors note that EAMs have typically been used in simplified settings and that they do not provide a means to account for how people extract information from naturalistic stimuli. While I agree with this, the idea of connecting CNNs of visual processing with EAMs for a joint modeling framework has been done. I recommend looking at and referencing these two articles as well as adjusting the tenor of this part of an introduction to better reflect the current state of the literature. For full disclosure, I am one of the authors on these articles. https://link.springer.com/article/10.1007/s42113-019-00042-1 https://www.sciencedirect.com/science/article/abs/pii/S0010027721001323

      We agree—thanks for pointing this out. The revised Introduction now discusses prior related models in more detail (including those referenced above) and better clarifies the novel contributions of our model. We specifically highlight that a novel contribution of the VAM is that “the CNN and EAM parameters are jointly fitted to the RT, choice, and visual stimulus data from individual participants in a unified Bayesian framework.”

      - The statement in lines 56-58 implies that this is the first article to glue CNNs together with EAMs. I would edit this accordingly based on the prior comment here and references provided. I will note that the second feature of the approach in this paper is still novel and really nice, namely the fact that the CNN and the EAM are jointly fitted. In the aforementioned references, the CNN is trained on the image set, and individual level Bayesian estimation was only applied to the EAM. Thus, it may be useful to highlight the joint estimation aspect of this investigation as well as how the uniqueness of the data available makes it possible.

      Agreed—see above.

      - Figure 3c and associated text. I understand the MI analysis you are performing here, however it is difficult to interpret as it stands. In the figure, what does a MI of 0.1 mean?? Can you give some context to that scale? I do find the interpretation of the hunchback shape in lines 210-222 to be somewhat of a stretch. The discussion that precedes (lines 199-209) this is clear and convincing. Can this discussion be strengthened more? And more interpretability of Figure 3c would be helpful; entropic scales can be hard to interpret without some context or scale associated.

      The MI analyses in Fig. 3C (and also Figs. 4C and 6E) show normalized MI, in which the raw MI has been divided by the entropy of the stimulus feature distribution. This normalization facilitates comparing the MI for different stimulus features, which is relevant for Figs. 4C and 6E. The normalized MI has a possible range of [0, 1], where 1 indicates perfect correlation between the two variables and 0 indicates complete independence. We now note in the legend of these figures that the possible normalized MI range is [0, 1], which should help with interpreting these values. Our revised results section for Fig. 3C now also includes some additional remarks on our interpretation of the hunchback shape of the MI.
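
      The normalization described above can be sketched as follows. This is a minimal illustration that assumes the stimulus feature and the unit activity have already been discretized into a contingency table of counts; the function name `normalized_mi` and the toy tables are hypothetical, and the discretization used in the paper is not reproduced here. The raw mutual information I(S;R) is divided by the entropy H(S) of the stimulus feature, so the result lies in [0, 1].

```python
import numpy as np

def normalized_mi(joint_counts):
    """Normalized MI, I(S;R) / H(S), from a (stimulus x response-bin) count table.

    Assumes at least two stimulus values occur, so that H(S) > 0.
    """
    joint = np.asarray(joint_counts, dtype=float)
    joint = joint / joint.sum()            # joint distribution p(s, r)
    p_s = joint.sum(axis=1, keepdims=True)  # marginal over stimulus values
    p_r = joint.sum(axis=0, keepdims=True)  # marginal over response bins
    nonzero = joint > 0                     # skip empty cells (0 * log 0 = 0)
    mi = np.sum(joint[nonzero] * np.log2(joint[nonzero] / (p_s @ p_r)[nonzero]))
    p_s_flat = p_s[p_s > 0]
    entropy_s = -np.sum(p_s_flat * np.log2(p_s_flat))
    return mi / entropy_s

# The response bin identifies the stimulus value exactly: normalized MI is 1.
perfect = normalized_mi([[5, 0], [0, 5]])
# The response bin is independent of the stimulus value: normalized MI is 0.
independent = normalized_mi([[1, 1], [1, 1]])
```

      On this scale, a value of 0.1 means that knowing the unit's activity removes about 10% of the uncertainty about the stimulus feature.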

      - Lines 244-248 and the analyses in Figure 3 suggest a change in the behavior of the CNN around layer 4. This is just a musing, but what would happen if you just used a 4 layer CNN, or even a 3 layer? This is not just a methods question. Your analysis suggests a transition from localized to distributed information representation. Right now, the EAM only sees the output of the distributed representation. What if it saw the results of the more local representations from early layers? Of course, a shallower network may just form the distributed representations earlier, but it would be interesting if there were a way to tease out not just the presence of distributed vs local representations, but the utility of those to the EAM.

      Thanks for this interesting suggestion. We did do some preliminary experiments in models with fewer layers, though we only examined the outputs of these models and did not assess their representations. We found that models with 3–5 layers generally failed to achieve human-level accuracy on the task. In principle, one could relate this observation to the representations of these models as a means of assessing the relative utility of distributed/local representations. However, there are confounding factors that one would ideally control for in order to compare models with different numbers of layers in this fashion (namely, the number of parameters).

      - Section Line 359 (Task optimized models) - It would be helpful to clarify here what these task-optimized models are being trained to do. As I understand it, they are being trained to directly predict the target direction. But are you asking them to learn to predict the true target direction? Or are you training them to predict what each individual responds? I think it is the second (since you have 75 of these), but it's not clear. I looked at the methods and still couldn't get a clear description of this. Also, are you just stripping the LBA off of the end of the CNN and then essentially putting a softmax in its place? If so, it would be helpful to say so.

      The task-optimized models were actually trained to output the true target direction in each stimulus, rather than trained to match the decisions of the human participants. We trained 75 such models since we wanted to use exactly the same stimuli as were used to train each VAM. The task-optimized CNNs were identical to those used in the VAMs, except that the outputs of the last layer were converted to softmax-scored probabilities for each direction rather than drift rates. The Results and Methods sections now include additional commentary that clarifies these points.

      - Line 373-376: This statement is pretty well established at this point in the similarity judgement literature. I recommend looking at and referencing https://onlinelibrary.wiley.com/doi/full/10.1111/cogs.13226 https://www.nature.com/articles/s41562-020-00951-3 https://link.springer.com/article/10.1007/s42113-020-00073-z

      Thanks for pointing this out. For reference, the statement in question is “Thus, simply training the model to perform the task is not sufficient to reproduce a behavioral phenomenon widely-observed in conflict tasks. This challenges a core (but often implicit) assumption of the task-optimized training paradigm, namely that training a model to do a task well will result in model representations that are similar to those employed by humans.”

      We agree that the first and third reference you mention are relevant, and we now cite them along with some other relevant work. In our view, the second reference you mention is not particularly relevant (that paper introduces a new computational model for similarity judgements that is fit to human data, but does not comment on training models to perform tasks vs. fitting to human data).

      - Line 387-388: "VAMs may learn richer representations". This is a bit of a philosophical point, but I'll go ahead and mention it. The standard VAM does not necessarily learn "richer" feature representations. Rather, you are asking the VAM and task-optimized models to do different things. As a result, they learn different representations. "Better" or "richer" is in the eye of the beholder. In one view, you could view the VAM performance as sub-par since it exhibits strange artifacts (congruency effects) and the expansion of dimensionality in the VAM representations is merely a side-effect of poor performance. I'm not advocating this view, just playing devil's advocate and suggesting a more nuanced discussion of the difference between the VAM and task-optimized models.

      We agree—this is a great point. We have changed this statement to read “the VAMs may learn more complex [rather than richer] representations of the stimuli”.

      - Lines 567-570: Here you discuss how the LBA backend of the VAM can't account for shrinking spotlight-like RT effects but that fitting models to different RT quantiles helps overcome this. I find this to be one of the weakest points of the paper (the whole process of fitting RT quantiles separately to begin with). This is just a limitation of the RT component of the model. This is a great paper but this is just a limitation inherent in the model. I don't see a need to qualify this limitation and think it would be better to just point out that this is a limitation of the LBA itself (be more clear that it is the LBA that is the limiting factor here) and that this leaves room for future research. From your last sentence of this paragraph, I agree that recurrent CNNs would be interesting. I will note that RNN choice-RT models are out there (though not with CNNs as part of the model).

      We agree and have revised this section of the Discussion accordingly (see our response to Reviewer #2 for more detail). We also removed the analyses of models trained on separate RT quantiles.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife Assessment

      The study presents a potentially valuable approach to genetically modify cells to produce extracellular matrices with altered compositions, termed cell-laid, engineered extracellular matrices (eECM). The evidence supporting the authors' conclusions regarding the utility of eECM for endogenous repair is solid, although there are some disagreements on the chondrogenicity of lyophilized constructs which was viewed as lacking robust evidence for endochondral ossification.

      We thank the reviewers for the assessment of our work. However, we strongly contest the lack of evidence for chondrogenicity and endochondral ossification. This is robustly demonstrated and a clear strength of our study.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to modify the characteristics of the extracellular matrix (ECM) produced by immortalized mesenchymal stem cells (MSCs) by employing the CRISPR/Cas9 system to knock out specific genes. Initially, they established VEGF-KO cell lines, demonstrating that these cells retained chondrogenic and angiogenic properties. Additionally, lyophilized cartilage tissues produced by these cells exhibited retained osteogenic properties.

      Subsequently, the authors established RUNX2-KO cell lines, which exhibited reduced COLX expression during chondrogenic differentiation and notably diminished osteogenic properties in vitro. Transplantation of lyophilized cartilage tissues produced by RUNX2-KO cell lines into osteochondral defects in rat knee joints resulted in the regeneration of articular cartilage tissues as well as bone tissues, a phenomenon not observed with tissues derived from parental cells. This suggests that gene-edited MSCs represent a valuable cell source for producing ECM with enhanced quality.

      Strengths:

      The enhanced cartilage regeneration observed with ECM derived from RUNX2-KO cells supports the authors' strategy of creating gene-edited MSCs capable of producing ECM with superior quality. Immortalized cell lines offer a limitless source of off-the-shelf material for tissue regeneration.

      Weaknesses:

      Most of the data align with anticipated outcomes, offering limited novelty to advance scientific understanding. Methodologically, the chondrogenic differentiation properties of immortalized MSCs appeared deficient, evidenced by Safranin-O staining of 3D tissues and histological findings lacking robust evidence for endochondral differentiation. This presents a critical limitation, particularly as authors propose the implantation of cartilage tissues for in vivo experiments. Instead, the bulk of data stemmed from type I collagen scaffold with factors produced by MSCs stimulated by TGFβ.

      We thank the reviewer for the thorough evaluation. We appreciate the highlighted novelty but overall disagree with key points from the provided assessment, the most important being the contested in vitro cartilage formation and endochondral ossification by engineered ECMs, for which we have provided compelling evidence. Of note, the reviewer points to the “osteogenic” properties of our tissues; this wording is incorrect since cells are absent from the final grafts. Here, the term “osteoinductivity” should be employed, in line with the model of ectopic ossification used to demonstrate de novo bone formation.

      In the revised version, the authors presented Safranin-O staining results of pellets prior to lyophilization. The inset of figures showing entire pellets revealed that Safranin-O-positive areas were limited, suggesting that cells in the negative regions had not differentiated into chondrocytes. In Figure 3F, DAPI staining showed devitalized cells in the outer layer but was negative in the central part, indicating the absence of cells in these areas and incomplete differentiation induction.

      We strongly disagree with the reviewer on the lack of demonstrated chondrogenicity. We have provided evidence of Safranin-O positivity, GAGs quantification, as well as collagen type 2 and collagen type X stainings (also quantified). Frankly, those are gold standard assays in the field and we do not understand the reviewer's point of view. We however agree that our grafts are not entirely composed of cartilage matrix. There are areas where cartilage is absent, in particular in the core of the tissues. This is expected from in vitro engineered cartilage pellets, even those from primary BM-MSC donors. By selecting primary donors it is possible to obtain superior cartilage formation. Our MSOD-B cells remain, to the best of our knowledge, the only human line capable of in vitro chondrogenesis, even if considered moderate.

      We agree with the absence of cells in the core area of our tissues, as correctly pointed out by the reviewer. This has been reported in other studies whereby the lack of media diffusion can lead to necrotic core formation.

      The rationale for establishing VEGF-KO cell lines remains unclear, and the authors' explanation in the revised manuscript is still equivocal. While they mention that VEGF is a late marker for endochondral ossification, the data in Figures 1D and 1E clearly show that VEGF-KO affects the early phase of endochondral ossification.

      We feel that the rationale for a VEGF-KO is sufficiently conveyed. In our study, VEGF-KO affects GAGs content in the tissue, but not the efficiency of ossification.

      Insufficient depth was given to elucidate the disparity in osteogenic properties between those observed in ectopic bone formation and those observed in transplantation into osteochondral defects.

      We here agree with the reviewer on the limited depth of our osteochondral assessment. However, this was performed as a proof-of-concept and we clearly conveyed both limitations and need of a follow-up study to demonstrate the repair efficacy of our tissue in such defect context.

      In the ectopic bone formation study, most of the collagenous matrix observed at 2 weeks was resorbed by 6 weeks, with only a small amount contributing to bone formation in MSOD-B cells (Figs. 2I and 4C). This finding does not align with the micro-CT data presented in Figures 2H and 4B. For the micro-CT experiments, it would be more appropriate to use a standard window for bone and present the data accordingly.

      Stainings report the deposition of collagens and can be misleading, since collagen deposition does not exclusively indicate frank bone formation. This is the reason why we provided microCT data, offering a quantitative assessment of the full grafts and more reliably evaluating mineralized/bone tissue. We feel that our results matched our conclusions.

      While the regeneration of articular cartilage in RUNX2-KO ECM presents intriguing results, the study lacked an exploration into underlying mechanisms, such as histological analyses at earlier time points.

      We do agree with the reviewer regarding this limitation. In addition to mechanisms and early timepoints, we are also interested in longer in vivo evaluation. This represents a significant amount of work which is beyond the scope of our present manuscript.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors have started off using an immortalized human cell line and then gene edited it to decrease the levels of VEGF1 (in order to influence vascularization), and the levels of Runx2 (to decrease osteogenesis). They first transplanted these cells with a collagen scaffold. The modified cells showed a decrease in vascularization when VEGF1 was decreased, and suggested an increase in cartilage formation.

      In another study, matrix generated by these cells subsequently remodeled into a bone marrow organ. When RUNX2 was decreased, the cells did not mineralize in vitro, and their matrices expressed types I and II collagen but not type X collagen in vitro, in comparison with unedited cells. In vivo, the author claims that remodeling of the matrices into bone was somewhat inhibited. Lastly, they utilized matrices generated by RUNX2-edited cells to regenerate chondro-osteal defects. They suggest that the edited cells regenerated cartilage in comparison with unedited cells.

      Strengths:

      - The notion of inducing changes in the ECM by genetically editing the cells is a novel one, as it has long been thought that ECM composition influences cell activity.

      - If successful, it may be possible to make off-the-shelf ECMs to carry out different types of tissue repair.

      Weaknesses:

      - The authors have not demonstrated robust cartilage formation (quantitation would be useful).

      - Measuring total GAG content does not prove the presence of cartilage

      - There are numerous overstatements about forming and implanting cartilage.

      - Although it is implied, RUNX2 deletion did not improve cartilage formation by the modified cells.

      - In the control line, MSOD-B, there was variability in the amount of safranin O positive material in various histological panels in the figures; more quantitation is needed.

      - In the in vivo articular defect experiments, an untreated injured joint is needed as a negative control.

      - Statements about bone generation are often not reflective of the microCT data presented.

      - The discussion over-interprets the results.

      We thank the reviewer for the further assessment of our work. We respectfully disagree with most of the provided statements. The chondrogenicity of our graft is robustly demonstrated using multiple readouts, including quantitative ones. Beyond GAGs, we provided clear Safranin-O stainings, as well as collagen type 2 and X stainings indicating the presence of hypertrophic cartilage matrix. Those are the gold standards in the field and we thus do not understand the reviewer's scepticism. We do agree that our grafts are not fully composed of cartilage matrix, with areas (in the core) deprived of cartilage. This does not impact the core findings of our study and its conclusions, and we strongly feel our statements about forming in vitro cartilage fully stand.

      We do not claim in the manuscript an increased cartilage formation following RUNX2 deletion. We report in vitro an impaired hypertrophy (collagen type X) and maintenance of collagen type 2 and GAGs content.

      We are confident in our data regarding de novo bone formation by priming endochondral ossification, confirmed both by stainings and microCT. We feel that our claims are well-supported.


      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors aimed to modify the characteristics of the extracellular matrix (ECM) produced by immortalized mesenchymal stem cells (MSCs) by employing the CRISPR/Cas9 system to knock out specific genes. Initially, they established VEGF-KO cell lines, demonstrating that these cells retained chondrogenic and angiogenic properties. Additionally, lyophilized cartilage tissues produced by these cells exhibited retained osteogenic properties.

      Subsequently, the authors established RUNX2-KO cell lines, which exhibited reduced COLX expression during chondrogenic differentiation and notably diminished osteogenic properties in vitro. Transplantation of lyophilized cartilage tissues produced by RUNX2-KO cell lines into osteochondral defects in rat knee joints resulted in the regeneration of articular cartilage tissues as well as bone tissues, a phenomenon not observed with tissues derived from parental cells. This suggests that gene-edited MSCs represent a valuable cell source for producing ECM with enhanced quality.

      Strengths: 

      The enhanced cartilage regeneration observed with ECM derived from RUNX2-KO cells supports the authors' strategy of creating gene-edited MSCs capable of producing ECM with superior quality. Immortalized cell lines offer a limitless source of off-the-shelf material for tissue regeneration. 

      We thank the reviewer for the interest in our work. We however want to clarify that the present manuscript does not report the generation of ECM with “superior quality”, but rather of modulated composition and thus function.  

      Weaknesses: 

      Most data align with anticipated outcomes, offering limited novelty to advance scientific understanding. Methodologically, the chondrogenic differentiation properties of immortalized MSCs appeared deficient, evidenced by Safranin-O staining of 3D tissues and histological findings lacking robust evidence for endochondral differentiation. This presents a critical limitation, particularly as authors propose the implantation of cartilage tissues for in vivo experiments. Instead, the bulk of data stemmed from type I collagen scaffold with factors produced by MSCs stimulated by TGFβ. 

      The chondrogenic differentiation of our MSOD-B line and its capacity to undergo endochondral ossification have been robustly demonstrated in previous studies (Pigeot et al., Advanced Materials 2021 and Grigoryan et al., Science Translational Medicine 2022). In the present manuscript, we thus compare the chondrogenic capacity of the newly established VEGF-KO and RUNX2-KO lines to that of MSOD-B cells. We demonstrate by qualitative (Safranin-O staining, Collagen type 2 and Collagen type X immuno-stainings) and quantitative (glycosaminoglycans assay) assays that the generated tissues consist of cartilage grafts of similar quality to the MSOD-B counterpart. Of note, the safranin-O stainings were performed on lyophilized tissues, which can alter the staining quality/intensity. We now provide additional stainings of generated tissues pre-lyophilization. This is implemented in Figure 1D and Figure 3D.

      The rationale behind establishing VEGF-KO cell lines remains unclear. What specific outcomes did the authors anticipate from this modification? 

      VEGF is a known master regulator of angiogenesis and a key mediator of endochondral ossification. It has also been extensively used in bone tissue engineering studies as a supplemented factor – primarily in the form of VEGFα – to increase the vascularization and thus outcome of bone formation of engineered grafts (https://www.nature.com/articles/s42003-020-01606-9, https://www.sciencedirect.com/science/article/pii/S8756328216301752). In our study, it was thus identified as a natural candidate to demonstrate the possibility to generate VEGF-KO cartilage and subsequently assess the functional impact on both the angiogenic and osteogenic potential of resulting cartilage tissue. This is now clarified in the manuscript (page 3, paragraph 4).

      Insufficient depth was given to elucidate the disparity in osteogenic properties between those observed in ectopic bone formation and those observed in transplantation into osteochondral defects. While the regeneration of articular cartilage in RUNX2-KO ECM presents intriguing results, the study lacked an exploration into underlying mechanisms, such as histological analyses at earlier time points. 

      Using RUNX2-KO ECM, we aimed at demonstrating the impact on cartilage remodeling and bone formation. This was performed ectopically but also in the rat osteochondral defect as a regenerative set-up of higher clinical relevance. We agree with the reviewer that additional experimental groups and time-points (not only earlier but also longer ones) would offer a better mechanistic understanding of the ECM contribution to the joint repair. However, as stated in our manuscript this is a proof-of-concept study that successfully demonstrated the influence of the cartilage ECM modification on the in vivo skeletal regeneration. A follow-up study would need to be performed to complement existing evidence and strengthen the relevance of our approach for cartilage repair. This is now further emphasized in the discussion (page 11, paragraph 3).  

      Reviewer #2 (Public Review): 

      The manuscript submitted by Sujeethkumar et al. describes an alternative approach to skeletal tissue repair using extracellular matrix (ECM) deposited by genetically modified mesenchymal stromal/stem cells. Here, they generate loss-of-function mutations in VEGF or RUNX2 in a BMP2-overexpressing MSC line and define the differences in the resulting tissue-engineered constructs following seeding onto a type I collagen matrix in vitro, and following lyophilization and subcutaneous and orthotopic implantation into mice and rats. Some strengths of this manuscript are the establishment of a platform by which modifications in cell-derived ECM can be evaluated both in vitro and in vivo, the demonstration that genetic modification of cells results in complexity of in vitro cell-derived ECM that elicits quantifiable results, and the admirable goal to improve endogenous cartilage repair. However, I recommend the authors clarify their conclusions and add more information regarding reproducibility, which was one limitation of primary-cell-derived ECMs.

      We thank the reviewer for the positive evaluation of our work.  

      Overcoming the limitations of native/autologous/allogeneic ECMs such as complete decellularization and reduction of batch-to-batch variability was not specifically addressed in the data provided herein. For the maintenance of ECM organization and complexity following lyophilization, evidence of complete decellularization was not addressed, but could be easily evaluated using polarized light microscopy and quantification of human DNA for example in constructs pre and post-lyophilization. 

      We appreciate the reviewer's comments and acknowledge the lack of information in the first version of our manuscript. In line with our previous study (Pigeot et al., Advanced Materials 2021), the ectopic evaluation of our cartilage pellets was strictly done with lyophilized tissues using immunocompromised animals. Lyophilized tissues are thus considered devitalized, and not decellularized. In contrast, the osteochondral defect experiment was performed with decellularized tissues in order to be able to implant the grafts in the immunocompetent rat model. This is now specified consistently throughout the manuscript. The decellularization process is also now incorporated accordingly in the method section (page 14, paragraph 2). We also provide quantifications of GAGs and DNA from tissues pre- and post-decellularization (Supplementary figure 6A and 6B), described in the result section of the manuscript (page 9, paragraph 1). The decellularization step led to 97-98% DNA removal.

      Importantly, we do not claim full maintenance of ECM integrity following lyophilization nor decellularization.  This is now clarified in the discussion (page 12, paragraph 2). However, we report their capacity to instruct skeletal regeneration in multiple contexts despite extensive processing.

      It would be ideal to see minimization of batch-to-batch variability using this approach, as mitigation of using a sole cell line is likely not sufficient (considering that the sole cell line-derived Matrigel does exhibit batch-to-batch and manufacturer-to-manufacturer variability). I recommend adding details regarding experimental design and outcomes not initially considered. Inter- and intraexperimental reproducibility was not adequately addressed. The size of in vitro-derived cartilage pellets was not quantified, and it is not clear that more than one independent 'differentiation' was performed from each gene-edited MSC line to generate in vitro replicates and constructs that were implanted in vivo. 

      We thank the Reviewer for the comment on the variability/reproducibility concern. Using a cell line does confer higher robustness but indeed does not grant unlimited consistency of batch production. We now temper our claims in the discussion and mention the need to regularly recharacterize cell line properties upon passages (page 12, paragraph 2). Using our edited lines, we have generated multiple batches of cartilage grafts for their in vitro characterization or in vivo performance assessment. We have now compiled batch variations of GAG content and pellet volume, provided as Supplementary figure 5. This revealed that batches are indeed not identical (nor is each pellet), but the production remains consistent.

      The use of descriptive language in describing conclusions may mislead the reader and should be modified accordingly throughout the manuscript. For example, although this reviewer agrees with the comparative statements made by the authors regarding parental and gene-edited MSC lines, non-quantifiable terms such as 'frank' 'superior' (example, line 242) are inappropriate and should rather be discussed in terms of significance. Another example is 'rich-collagenous matrix,' which was not substantiated by uniform immunostaining for type II collagen (line 189). 

      We thank the Reviewer for the constructive suggestions. We have revised the language accordingly throughout the manuscript. 

      I have similar recommendations regarding conclusive statements from the rat implantation model, which was appropriately used for the purpose of evaluating the response of native skeletal cells to the different cell-derived ECMs. Interpretations of these results should be described with more accuracy. For example, increased TRAP staining does not indicate reduced active bone formation (line 237). Many would not conclude that GAGs were retained in the RUNX2-KO line graft subchondral region based on the histology. Quantification of % chondral regeneration using histology is not accurate as it is greatly influenced by the location in the defect from which the section was taken. Chondral regeneration is usually semi-quantified from gross observations of the cartilage surface immediately following excision. The statements regarding integration (example line 290) are not founded by histological evidence, which should show high magnification of the periphery of the graft adjacent to the native tissue. 

      We have revised our language relative to the TRAP staining description (page 9, paragraph 2). We also agree with the reviewer on the semi-quantitative nature of our methodology, which we transparently disclosed both in the main text (page 9, paragraph 3) and method section (page 18, paragraph 2). The sectioning location does influence the analysis, but to mitigate this we performed an assessment at different depths (top, middle, bottom for each sample). This is now implemented in our method section (page 18, paragraph 3). On the tissue integration, we now provide higher magnification images of the implant/host tissue area (Figure 5F).

      Reviewer #3 (Public Review): 

      Summary: 

      In this study, the authors have started off using an immortalized human cell line and then gene-edited it to decrease the levels of VEGF1 (in order to influence vascularization), and the levels of Runx2 (to decrease chondro/osteogenesis). They first transplanted these cells with a collagen scaffold. The modified cells showed a decrease in vascularization when VEGF1 was decreased, and suggested an increase in cartilage formation.

      In another study, the matrix generated by these cells was subsequently remodeled into a bone marrow organ. When RUNX2 was decreased, the cells did not mineralize in vitro, and their matrices expressed types I and II collagen but not type X collagen in vitro, in comparison with unedited cells. In vivo, the author claims that remodeling of the matrices into bone was somewhat inhibited. Lastly, they utilized matrices generated by RUNX2 edited cells to regenerate chondro-osteal defects. They suggest that the edited cells regenerated cartilage in comparison with unedited cells. 

      Strengths: 

      - The notion of inducing changes in the ECM by genetically editing the cells is a novel one, as it has long been thought that ECM composition influences cell activity.

      - If successful, it may be possible to make off-the-shelf ECMs to carry out different types of tissue repair.

      We thank the Reviewer for the critical evaluation of our work and the highlighted novelty of it.  

      Weaknesses: 

      - The authors have not generated histologically identifiable cartilage or bone in their transplants of the cells with a type I scaffold. 

      The chondrogenic differentiation of our MSOD-B line and its capacity to undergo endochondral ossification have been robustly demonstrated in previous studies (Pigeot et al., Advanced Materials 2021 and Grigoryan et al., Science Translational Medicine 2022). In the present manuscript, we thus compare the chondrogenic capacity of the newly established VEGF-KO and RUNX2-KO lines to that of MSOD-B. We demonstrate by qualitative (Safranin-O staining, Collagen type 2 and Collagen type X immuno-stainings) and quantitative (glycosaminoglycans assay) assays that the generated tissues consist of cartilage tissue of similar quality to that of MSOD-B. Of note, the safranin-O stainings were performed on lyophilized tissues, which can alter the staining quality/intensity. We now provide additional stainings of generated tissues pre-lyophilization. This is implemented in Figure 1D and Figure 3D.

      On the contested formation of bone in vivo by our ECM grafts, we have provided compelling qualitative evidence via Masson's Trichrome stainings and quantification of mineralized volume by µCT. Both cortical bone and trabecular structures were identified ectopically. Those are standard evaluation methods in the field; we would be happy to receive additional suggestions from the Reviewer.

      - In many cases, they did not generate histologically identifiable cartilage with their cell-free-edited scaffold. They did generate small amounts of bone but this is most likely due to BMPs that were synthesized by the cells and trapped in the matrix. 

      We now appreciate that the Reviewer agrees on the successful formation of bone induced by our engineered grafts. We however still respectfully disagree with the “small amount of bone” statement since our MSOD-B and MSOD-B VEGF KO cartilage grafts led to the full generation of a mature ectopic bone organ (that is, also composed of extensive marrow). This has been assessed qualitatively and quantitatively. 

      We agree with the Reviewer on the key role of BMP-2 in the remodeling process into bone and bone marrow, which we have extensively described in our previous publication (Pigeot et al., Advanced Materials 2021). However, the low amount of BMP-2 (in the range of dozens of nanograms per tissue) embedded in the matrix is not sufficient per se to induce ectopic endochondral ossification. It is the combined presence of GAGs in the matrix (and thus of cartilage) that allows the success of bone formation.

      - There is a great deal of missing detail in the manuscript. 

      We have incorporated additional methodological details describing the lyophilization/decellularization process of our tissues prior to evaluation (see Material and Methods section). We also have included a description of the MSOD-B line and implemented genetic elements (Supplementary Figure 1A).  

      - The in vivo study is underpowered, the results are not well documented pictorially, and are not convincing. 

      We believe our group size supports our conclusions confirmed by statistical assessment. We have provided additional stainings and images of higher magnifications (Figure 5) for both the ectopic and orthotopic in vivo evaluation.  

      - Given the fact that they have genetically modified cells, they could have done analyses of ECM components to determine what was different between the lines, both at the transcriptome and the protein level. Consequently, the study is purely descriptive and does not provide any mechanistic understanding of what mixture of matrix components and growth factors works best for cartilage or bone. But this presupposes that they actually induced the formation of bona fide cartilage, at least. 

      We thank the Reviewer for the suggestion. However, our study did not aim at understanding which ECM graft composition works best for cartilage or bone regeneration, respectively. Instead, we propose the exploitation of our cellular tools to interrogate the function of key ECM constituents and their impact on skeletal regeneration. We once more confirm that we generated cartilage grafts, which is now better supported by additional histological assessment before lyophilization.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In a previous work, Prut and colleagues had shown that during reaching, high-frequency stimulation of the cerebellar outputs resulted in reduced reach velocity. Moreover, they showed that the stimulation produced reaches that deviated from a straight line, with the shoulder and elbow movements becoming less coordinated. In this report, they extend their previous work by the addition of modeling results that investigate the relationship between the kinematic changes and torques produced at the joints. The results show that the slowing is not due to reductions in interaction torques alone, as the reductions in velocity occur even for movements that are single joints. More interestingly, the experiment revealed evidence for the decomposition of the reaching movement, as well as an increase in the variance of the trajectory.

      Strengths:

      This is a rare experiment in a non-human primate that assessed the importance of cerebellar input to the motor cortex during reaching.

      Weaknesses:

      My major concerns are described below.

      If I understand the task design correctly, the monkeys did not need to stop their hand at the target. I think this design may be suboptimal for investigating the role of the cerebellum in control of reaching because a number of earlier works have found that the cerebellum's contributions are particularly significant as the movement ends, i.e., stopping at the target. For example, in mice, interposed nucleus neurons tend to be most active near the end of the reach that requires extension, and their activation produces flexion forces during the reach (Becker and Person 2019). Indeed, the inactivation of interposed neurons that project to the thalamus results in overshooting of reaching movements (Low et al. 2018). Recent work has also found that many Purkinje cells show a burst-pause pattern as the reach nears its endpoint, and stimulation of the mossy fibers tends to disrupt endpoint control (Calame et al. 2023). Thus, the fact that the current paper has no data regarding endpoint control of the reach is puzzling to me.

      We appreciate the reviewer’s point that cerebellar contributions can be particularly critical near the endpoint of a reach. In our current task design, monkeys were indeed required to hold at the target briefly—100 ms for Monkeys S and P, and 150 ms for Monkeys C and M—before receiving a reward. However, given the size of the targets and the velocity of movements, it often happened that the monkey didn’t have to stop its movement to obtain a reward. Importantly, we relaxed the task’s requirements (by increasing target size and reducing temporal constraints) to allow monkeys to perform the task under cerebellar block conditions, because we found that the strict criteria yielded a low success rate under these conditions. This design is suboptimal for studying endpoint accuracy which, as we now appreciate, is an important aspect of cerebellar control. In our revision, we will clarify these aspects of the task design and acknowledge that it is suboptimal for examining the role of the cerebellum in endpoint control. Future studies will address this point explicitly.

      Because stimulation continued after the cursor had crossed the target, it is interesting to ask whether this disruption had any effects on the movements that were task-irrelevant. The reason for asking this is because we have found that whereas during task-relevant eye or tongue movements the Purkinje cells are strongly modulated, the modulations are much more muted when similar movements are performed but are task-irrelevant (Pi et al., PNAS 2024; Hage et al. Biorxiv 2024). Thus, it is interesting to ask whether the effects of stimulation were global and affected all movements, or were the effects primarily concerned with the task-relevant movements.

      This is a very interesting suggestion. Although our main analysis focused on target-directed reaching movements, we have the data for the between-trial movements under continuous stimulation (e.g., return to center movements). In our revised supplementary material, we will examine the effect of cerebellar block on endpoint velocities in inter-trial movements versus task-related movements.

      If the schematic in Figure 1 is accurate, it is difficult for me to see how any of the reaching movements can be termed single joint. In the paper, T1 is labeled as a single joint, and T2-T4 are labeled as dual-joint. The authors should provide data to justify this.

      The reviewer is right: movements to all targets engage the shoulder and elbow, but the degree of single-joint participation varied in a target-specific manner. In the manuscript, we used the term “single-joint” to indicate a target direction in which one joint remains stationary, resulting in minimal coupling torque at the adjacent joint. Specifically, for Targets 1 and 5 in our experiments, the net torque (and thus acceleration) at the elbow was negligible, and hence the shoulder experienced correspondingly low coupling torque (as illustrated in Figure 3c of our manuscript). To avoid confusion, we will use the term ‘predominantly single-joint’ movements in our revised manuscript to indicate targets with low coupling torques. We will also include an additional figure in the revised supplementary material displaying the net torques at the shoulder and elbow, similar to Figures 2c and 3c. Our goal is to demonstrate that movements to targets 1 and 5 are characterized by predominantly one-joint engagement (i.e., the elbow is stationary with low net torque) and low coupling torques, rather than implying a purely isolated, single-joint motion.

      Because at least part of this work was previously analyzed and published, information should be provided regarding which data are new.

      We will include a clear statement in the Methods section specifying which components of the dataset and analyses are entirely new. While some of the same animals and stimulation protocol were presented in prior work, the inverse-dynamics modeling, analyses of progressive movement changes across trials under stimulation and invariance of motor noise to movement velocity are newly reported in this manuscript.

      Reviewer #2 (Public review):

      This manuscript asks an interesting and important question: what part of 'cerebellar' motor dysfunction is an acute control problem vs a compensatory strategy to the acute control issue? The authors use a cerebellar 'blockade' protocol, consisting of high-frequency stimuli applied to the cerebellar peduncle which is thought to interfere with outflow signals. This protocol was applied in monkeys performing center-out reaching movements and has been published from this laboratory in several preceding studies. I found the take-home-message broadly convincing and clarifying - that cerebellar block reduces muscle activation acutely particularly in movements that involve multiple joints and therefore invoke interaction torques, and that movements progressively slow down to in effect 'compensate' for these acute tone deficits. The manuscript was generally well written, and the data was clear, convincing, and novel. My comments below highlight suggestions to improve clarity and sharpen some arguments.

      Primary comments:

      (1) Torque vs. tone: Is it known whether this type of cerebellar blockade is reducing muscle tone or inducing any type of acute co-contraction that could influence limb velocity through mechanisms different than 'atonia'? If so, the authors should discuss this information in the discussion section starting around line 336, and clarify that this motivates (if it does) the focus on 'torques' rather than muscle activation. Relatedly, besides the fact that there are joints involved, is there a reason there is so much emphasis on torque per se? If the muscle is deprived of sufficient drive, it would seem that it would be more straightforward to conceptualize the deficit as one of insufficient timed drive to a set of muscles than joint force. Some text better contextualizing the choices made here would be sufficient to address this concern. I found statements like those in the introduction "hand velocity was low initially, reflecting a primary muscle torque deficit" to be lacking in substance. Either that statement is self-evident or the alternative was not made clear. Finally, emphasize that it is a loss of self-generated torque at the shoulder that accounts for the velocity deficits. At times the phrasing makes it seem that there is a loss of some kind of passive torque.

      We appreciate the reviewer’s emphasis on distinguishing reduced muscle tone and altered co-contraction patterns as possible explanations for decreased limb velocity. Our focus on torques arises from previous studies suggesting that the core deficit in cerebellar ataxia is impaired prediction of coupling torques. This point will be added in the discussion section of our revised manuscript where we will explain why we prioritize muscle torques and how muscle-level activation collectively contributes to net joint torques. Also, we will underscore that the observed velocity deficits primarily reflect a reduction of self-generated torque at the shoulder (whether acute or adaptive), rather than any reduction in passive torques.

      (2) Please clarify some of the experimental metrics: Ln 94 RESULTS. The success rate is used as a primary behavioral readout, but what constitutes success is not clearly defined in the methods. In addition to providing a clear definition in the methods section, it would also be helpful for the authors to provide a brief list of criteria used to determine a 'successful' movement in the results section before the behavioral consequences of stimulation are described. In particular, the time and positional error requirements should be clear.

      Successful trials were those in which the monkeys did not leave the center position before the go signal and reached the peripheral target within specific time criteria. These criteria varied between monkeys. We will include detailed definitions of our success criteria in the revised methods section of our manuscript. Specifically, we will update our methods section to include (i) the timing criteria of each phase of the trials and (ii) the size of the peripheral targets indicating the tolerance for endpoint accuracy.

      (3) Based on the polar plot in Figure 1c, it seemed odd to consider Targets 1-4 outward and 5-8 inward movements, when 1 and 5 are side-to-side. Is there a rationale for this grouping or might results be cleaner by cleanly segregating outward (targets 2-4) and inward (targets 6-8) movements? Indeed, by Figure 3 where interaction torques are measured, this grouping would seem to align with the hypothesis much more cleanly since it is with T2, T3, and T4 where clear coupling torque deficits are seen with cerebellar block.

      We acknowledge the reviewer’s observation regarding Targets 1 and 5 being side-to-side rather than strictly “outward” or “inward.” In the first section of our results, we grouped the targets in this way to emphasize the notably stronger effect of the cerebellar block on targets involving shoulder flexion (‘outward’) as compared to those involving shoulder extension (‘inwards’). For subsequent analyses we focused on the effects of cerebellar block on outward targets where movements were single-joint (Target 1) vs. multi-joint (Targets 2-4). To clarify this aspect, in our revised manuscript we will explain the rationale for grouping T1–T4 as “outward” and T5–T8 as “inward,” including how we defined them.

      (4) I did not follow Figure 3d. Both the figure axis labels and the description in the main text were difficult to follow. Furthermore, the color code per animal made me question whether the linear regression across the entire dataset was valid, or would be better performed within animal, and the regressions summarized across animals. The authors should look again at this section and figure.

      We will revise the figure labels and legend to clarify how each axis is defined. Please note that pooling the data was done after confirming that data from each animal expressed a similar trend. Specifically, the correlation coefficients were positive in all 4 monkeys and statistically significant in 3 of them. Moreover, following the reviewers’ feedback, we also did a partial correlation analysis (which controls for the variability across monkeys) and found a significant correlation (r = 0.33, p < 0.001). These points will be described in the revised manuscript.
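
      The concern about pooling can be made concrete with a small sketch. One common way to compute a partial correlation that controls for animal identity is to subtract each animal's mean from both variables and correlate the residuals. Whether this matches the exact procedure used in the manuscript is an assumption; the function names and the toy numbers below are hypothetical and are not the study's data.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def demean_within_group(values, groups):
    """Subtract each group's mean from its own observations."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def partial_corr_by_group(x, y, groups):
    """Correlation of x and y after removing between-group offsets."""
    return pearson(demean_within_group(x, groups),
                   demean_within_group(y, groups))


# Toy data: within each animal the trend is negative, but a large offset
# between animals makes the pooled correlation strongly positive.
x = [1, 2, 3, 11, 12, 13]
y = [3, 2, 1, 13, 12, 11]
monkey = ["A", "A", "A", "B", "B", "B"]

pooled_r = pearson(x, y)
partial_r = partial_corr_by_group(x, y, monkey)
```

      In this toy case the pooled coefficient is strongly positive while the within-animal coefficient is negative, which is the kind of between-animal bias the reviewer warns about. Checking each animal separately, or controlling for animal identity, guards against it.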

      (5) Line 206+ The rationale for examining movement decomposition with a cerebellar block is presented as testing the role of the cerebellum in timing. Yet it is not spelled out what movement decomposition and trajectory variability have to do with motor timing per se.

      The reviewer is right and the relations between timing, decomposition and variability need to be explicitly presented. In our revision, we will explain how decomposed movements may reflect impaired temporal coordination across multiple joints—a critical cerebellar function. We will also clarify how increased variability in joint coordination can result in increased trial-to-trial variability of trajectories.

      Reviewer #3 (Public review):

      Summary:

      In their manuscript, "Disentangling acute motor deficits and adaptive responses evoked by the loss of cerebellar output," Sinha and colleagues aim to identify distinct causes of motor impairments seen when perturbing cerebellar circuits. This goal is an important one, given the diversity of movement-related phenotypes in patients with cerebellar lesions or injuries, which are especially difficult to dissect given the chronic nature of the circuit damage. To address this goal, the authors use high-frequency stimulation (HFS) of the superior cerebellar peduncle in monkeys performing reaching movements. HFS provides an attractive approach for transiently disrupting cerebellar function previously published by this group. First, they found a reduction in hand velocities during reaching, which was more pronounced for outward versus inward movements. By modeling inverse dynamics, they find evidence that shoulder muscle torques are especially affected. Next, the authors examine the temporal evolution of movement phenotypes over successive blocks of HFS trials. Using this analysis, they find that in addition to the acute, specific effects on muscle torques in early HFS trials, there was an additional progressive reduction in velocity during later trials, which they interpret as an adaptive response to the inability to effectively compensate for interaction torques during cerebellar block. Finally, the authors examine movement decomposition and trajectory, finding that even when low-velocity reaches are matched to controls, HFS produces abnormally decomposed movements and higher than expected variability in trajectory.

      Strengths:

      Overall, this work provides important insight into how perturbation of cerebellar circuits can elicit diverse effects on movement across multiple timescales.

      The HFS approach provides temporal resolution and enables analysis that would be hard to perform in the context of chronic lesions or slow pharmacological interventions. Thus, this study describes an important advance over prior methods of circuit disruption, and their approach can be used as a framework for future studies that delve deeper into how additional aspects of sensorimotor control are disrupted (e.g., response to limb perturbations).

      In addition, the authors use well-designed behavioral approaches and analysis methods to distinguish immediate from longer-term adaptive effects of HFS on behavior. Moreover, inverse dynamics modeling provides important insight into how movements with different kinematics and muscle dynamics might be differentially disrupted by cerebellar perturbation.

      Weaknesses:

      The argument that there are acute and adaptive effects to perturbing cerebellar circuits is compelling, but there seems to be a lost opportunity to leverage the fast and reversible nature of the perturbations to further test this idea and strengthen the interpretation. Specifically, the authors could have bolstered this argument by looking at the effects of terminating HFS - one might hypothesize that the acute impacts on muscle torques would quickly return to baseline in the absence of HFS, whereas the longer-term adaptive component would persist in the form of aftereffects during the 'washout' period. As is, the reversible nature of the perturbation seems underutilized in testing the authors' ideas.

      We agree that our approach could more explicitly exploit the rapid reversibility of high-frequency stimulation (HFS) by examining post-stimulation ‘washout’ periods. However, for the present dataset, we ended the session after the set of cerebellar block trials. We plan to study the effect of cerebellar block on immediate post-block washout trials in the future.  

      The analysis showing that there is a gradual reduction in velocity during what the authors call an adaptive phase is convincing. That said, the argument is made that this is due to difficulty in compensating for interaction torques. Even if the inward targets (i.e., targets 6-8) do not show a deficit during the acute phase, these targets still have significant interaction torques (Figure 3c). Given the interpretation of the data as presented, it is not clear why disruption of movement during the adaptive phase would not be seen for these targets as well since they also have large interaction torques. Moreover, it is difficult to delve into this issue in more detail, as the analyses in Figures 4 and 5 omit the inward targets.

      The reviewer is right that movements to Targets 6–8 (inward) were seemingly unaffected despite also involving significant interaction torques. In fact, we already attempted to address this issue in the discussion section of version 1 of our manuscript. Specifically, we note that while outward targets (2–4) tend to involve higher coupling torque impulses on average, this alone does not fully explain the differential impact of cerebellar block, as illustrated by discrepancies at the individual target level (e.g., target 7 vs. target 1). We proposed two possible explanations: (1) a bias toward shoulder flexion in the effect of cerebellar block, consistent with earlier studies showing ipsilateral flexor activation or tone changes following stimulation or lesioning of the deep cerebellar nuclei; and (2) a posture-related facilitation of inward (shoulder extension) movements from the central starting position.

      The text in the Introduction and in the prior work developing the HFS approach overstates the selectivity of the perturbations. First, there is an emphasis on signals transmitted to the neocortex. As the authors state several times in the Discussion, there are many subcortical targets of the cerebellar nuclei as well, and thus it is difficult to disentangle target-specific behavioral effects using this approach. Second, the superior cerebellar peduncle contains both cerebellar outputs and inputs (e.g., spinocerebellar). Therefore, the selectivity in perturbing cerebellar output feels overstated. Readers would benefit from a more agnostic claim that HFS affects cerebellar communication with the rest of the nervous system, which would not affect the major findings of the study.

      The reviewer is right that the superior cerebellar peduncle carries both descending and ascending fibers, and that cerebellar nuclei project to subcortical as well as cortical targets. However, it is also important to note that in primates the cerebello-thalamo-cortical (CTC) pathway greatly expanded (at the expense of the cerebello-rubro-spinal tract) in mediating cerebellar control of voluntary movements (Horne and Butler, 1995). The cerebello-subcortical pathways lost their importance over the course of evolution (Nathan and Smith, 1982, Padel et al., 1981, ten Donkelaar, 1988). In our previous study we found that the ascending spinocerebellar axons which enter the cerebellum through the SCP are weakly task-related and the descending system is quite small (Cohen et al., 2017). However, we cannot rule out an effect of HFS mediated in part through other systems. In the revised introduction section, we will clarify this point and use more careful language about the scope of our stimulation, emphasizing that HFS disrupts cerebellar communication broadly, rather than solely the cerebello-thalamo-cortical pathway.

      The text implies that increased movement decomposition and variability must be due to noise. However, this assumption is not tested. It is possible that the impairments observed are caused by disrupted commands, independent of whether these command signals are noisy. In other words, commands could be low noise but still faulty.

      We recognize the reviewer’s concern about linking movement decomposition and trial-to-trial trajectory variability with motor noise. As presented in our discussion section, we interpret these motor abnormalities as a form of motor noise in the sense that they are generated by faulty motor commands. We draw our interpretation from the findings of previous research work which show that the cerebellum aids in the state estimation of the limb and subsequent generation of accurate feedforward commands. Therefore, disruption of the cerebellar output may lead to faulty motor commands resulting in the observed asynchronous joint activations (i.e., movement decomposition) and unpredictable trajectories (i.e., increased trial-to-trial variability). Both observed deficits resemble increased motor noise.

      Throughout the text, the use of the term 'feedforward control' seems unnecessary. To dig into the feedforward component of the deficit, the authors could quantify the trajectory errors only at the earliest time points (e.g., in Figure 5d), but even with this analysis, it is difficult to disentangle feedforward- and feedback-mediated effects when deficits are seen throughout the reach. While outside the scope of this study, it would be interesting to explore how feedback responses to limb perturbation are affected in control versus HFS conditions. However, as is, these questions are not explored, and the claim of impaired feedforward control feels overstated.

      We agree that to strictly focus on feedforward control, we could have examined the measured variables in the first 50-100 ms of the movement which has been shown to be unaffected by feedback responses (Pruszynski et al. 2008, Todorov and Jordan 2002, Pruszynski and Scott 2012, Crevecoeur et al. 2013). However, in our task the amplitude of movements made by our monkeys was small and therefore the response measures we used were too small in the first 50-100 ms for a robust estimation. Also, fixing a time window led to an unfair comparison between control and cerebellar block trials, in which velocity was significantly reduced and therefore movement time was longer. Therefore, we used the peak velocity, torque-impulse at the peak velocity and maximum deviation of the hand trajectory as response measures. We will acknowledge this point in the discussion section of our revised manuscript. We will also tone down references to feedforward control throughout the text of our revised manuscript as suggested by the reviewer.

      The terminology 'single-joint' movement is a bit confusing. At a minimum, it would be nice to show kinematics during different target reaches to demonstrate that certain targets are indeed single joint movements. More of an issue, however, is that it seems like these are not actually 'single-joint' movements. For example, Figure 2c shows that target 1 exhibits high elbow and shoulder torques, but in the text, T1 is described as a 'single-joint' reach (e.g. lines 155-156). The point that I think the authors are making is that these targets have low interaction torques. If that is the case, the terminology should be changed or clarified to avoid confusion.

      Indeed, as reviewer #1 also noted, movements to targets 1 and 5 are not purely single-joint but rather have relatively low coupling torques. Our intention while using the term “single-joint” was to indicate a target direction in which one joint remains stationary, resulting in minimal coupling torque at the adjacent joint. Specifically, for Targets 1 and 5 in our experiments, the net torque (and thus acceleration) at the elbow was negligible, and hence the shoulder experienced correspondingly low coupling torque (as illustrated in Figure 3c of our manuscript). To avoid confusion, we will use the term ‘predominantly single-joint’ movements in our revised manuscript to indicate targets with low coupling torques. We will also include an additional figure in the revised supplementary material displaying the net torques at the shoulder and elbow, similar to Figures 2c and 3c. Our goal is to demonstrate that movements to targets 1 and 5 are characterized by predominantly one-joint engagement (i.e., the elbow is stationary with low net torque) and low coupling torques, rather than implying a purely isolated, single-joint motion.

      The labels in Figure 3d are confusing and could use more explanation in the figure legend.

      In Figure 3d, it is stated that data from all monkeys is pooled. However, if there is a systematic bias between animals, this could generate spurious correlations. Were correlations also calculated for each animal separately to confirm the same trend between velocity and coupling torques holds for each animal?

      We will revise the figure legend and main-text explanation for Figure 3d. Please note that pooling the data was done after confirming that data from each animal expressed a similar trend. Specifically, the correlation coefficients were positive in all 4 monkeys and statistically significant in 3 of them. Moreover, following the reviewers’ feedback, we also did a partial correlation analysis (which controls for the variability across monkeys) and found a significant correlation (r = 0.33, p < 0.001). These points will be described in the revised manuscript.

      In Table S1, it would be nice to see target-specific success rates. The data would suggest that targets with the highest interaction torques will have the largest reduction in success rates, especially during later HFS trials. Is this the case?

      We will provide a breakdown of the success rates as a function of targets. However, one should note that success/failure may depend on several factors beyond impaired limb dynamics. In a previous study (Nashef et al. 2019) we identified several causes of failure such as (i) not entering the central target in time, (ii) moving out too early from the peripheral target, (iii) reaction time longer than permitted, or (iv) premature exit from the central target before permitted.

    1. Author response:

      We thank all reviewers for their highly detailed reviews and for the time and effort invested in them. We have read their perspectives, questions and suggested improvements with great interest. We have reflected on the public review in detail and have prepared provisional responses, which are outlined below. First, we would like to respond to four main issues pointed out by the editor and reviewers:

      (1) Lack of yield data in the manuscript: There have been yield data collected in most of the sites and years of our study, and these have already been published and cited in our manuscript. In the appendix of our manuscript, we included a table with yield data for the sites and years in which the beetle diversity was studied. These data show that strip cropping does not cause a systematic yield reduction.

      (2) Sampling design clarification: Our paper combines data from trials conducted at different locations and years. On the one hand this allows an analysis of a comprehensive dataset, but on the other hand in some cases there were slight inconsistencies in how data were collected or processed (e.g. taxonomic level of species identification). We will explain the sampling design and data analysis in more detail to increase clarity and transparency.

      (3) Additional data analysis: In the revised manuscript we will present an analysis on the responses of abundances of the 12 most common ground beetle genera to strip cropping. This will give better insight of the variation in responses among ground beetle taxa.

      (4) Restrict findings to our system: We will nuance our findings further and will focus more strongly on the implications of our data on ground beetle communities, rather than on agrobiodiversity in a broader sense.

      We will continue to improve the manuscript based on the reviewers' feedback in the coming weeks, aiming to submit a revised version of the manuscript at the end of February.

      Detailed response to editor and reviewers:

      Editor Comments:

      (1) You only have analyzed ground beetle diversity, it would be important to add data on crop yields, which certainly must be available (note that in normal intercropping these would likely be enhanced as well).

      Most yield data have been published in three previous papers, which we already cited or will cite (one was not yet published at the time of submission). Our argumentation is based on these studies. We had also already included a table in the appendix that showed the yield data that relates specifically to our locations and years of measurement. The finding that strip cropping does not majorly affect yield is based on these findings. We will consider changing the title of our manuscript to remove the explicit focus on yield.

      (2) Considering the heterogeneous data involving different experiments it is particularly important to describe the sampling design in detail and explain how various hierarchical levels were accounted for in the analysis.

      We agree that some important details of our analysis were not described sufficiently. Reviewer 2 in particular pointed out several relevant points that we did account for in our analyses, but which were not clear from the text in the methods section. We are convinced that our data analyses are robust and that our conclusions are supported by the data. We will revise the methods section to make our approach clearer and more transparent.

      (3) In addition to relative changes in richness and density of ground beetles you should also present the data from which these have been derived. Furthermore, you could also analyze and interpret the response of the different individual taxa to strip cropping.

      With our heterogeneous dataset it was quite complicated to show overall patterns of absolute changes in ground beetle abundance and richness, especially for the field-level analyses. As the sampling design was not always the same and occasionally samples were missing, the number of year series that made up a datapoint differed among locations and years. However, for each comparison of a paired monoculture and strip cropping field, we always made the number of year series equal through rarefaction. That is, the numbers of ground beetles and of ground beetle species are always expressed per 2 to 6 samples. Therefore, we prefer to stick to relative changes as we are convinced that this gives a fairer representation of our complex dataset.
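
      As a rough illustration of what equalizing sampling effort through rarefaction can look like, the sketch below repeatedly draws the same number of samples from a field and averages the resulting species count. This is a generic sample-based rarefaction written under our own assumptions, not the procedure from the manuscript; the function name, the number of draws, and the species labels are hypothetical.

```python
import random


def rarefied_richness(samples, n_samples, n_draws=200, seed=0):
    """Mean species count across random subsets of `n_samples` samples.

    samples   : list of sets, each holding the species found in one sample
    n_samples : common sampling effort to rarefy to
    n_draws   : number of random subsets to average over (assumed value)
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n_draws):
        subset = rng.sample(samples, n_samples)
        total += len(set().union(*subset))
    return total / n_draws


# Hypothetical fields with unequal effort: four samples versus two.
strip_field = [{"sp_a", "sp_b"}, {"sp_a"}, {"sp_c", "sp_d"}, {"sp_b", "sp_c"}]
mono_field = [{"sp_a"}, {"sp_a", "sp_b"}]

# Compare both fields at the effort of the smaller one (two samples).
effort = min(len(strip_field), len(mono_field))
strip_richness = rarefied_richness(strip_field, effort)
mono_richness = rarefied_richness(mono_field, effort)
```

      Expressing both fields at the same effort avoids crediting the more intensively sampled field with extra species simply because more samples were taken there.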

      We agree with the second point that both the editor and several reviewers pointed out. The indicator species analyses that we used were biased by rare species, and we now omit this analysis. Instead, we will include a GLM analysis on the responses of abundances of the 12 most common ground beetle genera to strip cropping. We chose genera here (and not species) because we could then include all locations and years within the analysis, and in most cases a genus was dominated by a single species (notable exceptions were Amara and Harpalus, which were made up of several species). We will still illustrate these findings in a similar fashion as we did for the indicator species analysis.

      (4) Keep to your findings and don't overstate them but try to better connect them to basic ecological hypotheses potentially explaining them.

      After careful consideration of the important points that the reviewers raised, we decided to nuance our points about biodiversity conservation along two key lines: (1) the extent to which ground beetles can be indicators of wider biodiversity changes; and (2) our findings, which are not as straightforwardly positive as our narrative suggests. We still believe that strip cropping contributes positively to carabid communities, and will carefully check the text to avoid overstatements.

      Reviewer 1:

      Summary:

      This study demonstrates that strip cropping enhances the taxonomic diversity of ground beetles across organically-managed crop systems in the Netherlands. In particular, strip cropping supported 15% more ground beetle species and 30% more individuals compared to monocultures.

      Strengths:

      A well-written study with well-analyzed data of a complex design. The data could have been analyzed differently e.g. by not pooling samples, but there are pros and cons for each type of analysis and I am convinced this will not affect the main findings. A strong point is that data were collected for 4 years. This is especially strong as most data on biodiversity in cropping systems are only collected for one or two seasons. Another strong point is that several crops were included.

      We thank reviewer 1 for their kind words and agree with this strength of the paper. The paper combines data from trials conducted at different locations and years. On the one hand this allows an analysis of a comprehensive dataset, but on the other hand in some cases there were slight inconsistencies in how data were collected or processed (e.g. taxonomic level of species identification).  

      Weaknesses:

      This study focused on the biodiversity of ground beetles and did not examine crop productivity. Therefore, I disagree with the claim that this study demonstrates biodiversity enhancement without compromising yield. The authors should present results on yield or, at the very least, provide a stronger justification for this statement.

      We acknowledge that we did not formally analyze yield in our study, but we have good reason for this. The claim that strip cropping does not compromise yield comes from several extensive studies (Juventia et al., 2024; Ditzler et al., 2023; Carillo-Reche et al., 2023) that were conducted in nearly all the sites and years that we included in our study. We chose not to include formal analyses of productivity for two key reasons: (1) a yield analysis would duplicate already published analyses, and (2) we prefer to focus on the ecology of ground beetles and the effect of strip cropping on biodiversity, rather than diverting our focus towards crop productivity. Nevertheless, we have shown the results on yield in Table S6 and refer extensively to the studies that have previously analyzed these data.

      Reviewer 2:

      Summary:

      The authors aimed to investigate the effects of organic strip cropping on carabid richness and density as well as on crop yields. They find on average higher carabid richness and density in strip cropping and organic farming, but not in all cases.

      Strengths:

      Based on highly resolved species-level carabid data, the authors present estimates for many different crop types, some of them rarely studied, at the same time. The authors did a great job investigating different aspects of the assemblages (although some questions remain concerning the analyses) and they present their results in a visually pleasing and intuitive way.

      We appreciate the kind words of reviewer 2 and their acknowledgement of the extensiveness of our dataset. In our opinion, the inclusion of many different crops is indeed a strength, rarely seen in similar studies; and we are happy that the figures are appreciated.

      Weaknesses:

      The authors used data from four different strip cropping experiments and there is no real replication in space as all of these differed in many aspects (different crops, different areas between years, different combinations, design of the strip cropping (orientation and width), sampling effort and sample sizes of beetles (differing more than 35 fold between sites; L 100f); for more differences see L 237ff). The reader gets the impression that the authors stitched data from various places together that were not made to fit together. This may not be a problem per se but it surely limits the strength of the data as results for various crops may only be based on small samples from one or two sites (it is generally unclear how many samples were used for each crop/crop combination).

      The paper indeed combines data from trials conducted at different locations and years. On the one hand this allows an analysis of a comprehensive dataset, but on the other hand in some cases there were slight differences in the experimental design. At the time that we did our research, there were only a handful of farmers that were employing strip cropping within the Netherlands, which greatly reduced the number of fields available for our study. Therefore, we worked in the sites that were available and studied as many crops as possible on these sites. Since there was variation in the crops grown in the sites, for some crops we have limited replication. In the revision we will explain this more clearly.

      One of my major concerns is that it is completely unclear where carabids were collected. As some strips were 3m wide, some others were 6m and the monoculture plots large, it can be expected that carabids were collected at different distances from the plot edge. This alone, however, was conclusively shown to affect carabid assemblages dramatically and could easily outweigh the differences shown here if not accounted for in the models (see e.g. Boetzl et al. (2024) or Knapp et al. (2019) among many other studies on within field-distributions of carabids).

      Point well taken, and we will present a more detailed description of the sampling design in the methods. Samples were always taken at least 10 meters into the field, and always in the middle of the strip. This does mean that there is a small difference between the 3 m and 6 m wide strips in the distance from the neighbouring strip, but this amounts to only 1.5 versus 3 meters from the strip edge. Based on our own extensive experience with ground beetle communities, a difference of this size will not have a large impact on the ground beetle findings. The distance from field/plot edges was similar between monocultures and strip cropped fields.

      The authors hint at a related but somewhat different problem in L 137ff - carabid assemblages sampled in strips were sampled in closer proximity to each other than assemblages in monoculture fields which is very likely a problem. The authors did not check whether their results are spatially autocorrelated and this shortcoming is hard to account for as it would have required a much bigger, spatially replicated design in which distances are maintained from the beginning. This limitation needs to be stated more clearly in the manuscript.

      This limitation is hard to avoid in comparisons between strip cropping and monoculture systems, because it is often not possible to combine a statistically robust design with sufficient replication and field sizes that are representative of farming practice. We will acknowledge this limitation in the revised manuscript. To allow a fair comparison based on a sufficient number of replicates, we chose to combine data from several years and locations (despite this not being the ideal experimental design). This approach has the drawback that ground beetle communities are difficult to compare. Therefore, we chose to further investigate two years of data from Wageningen, as the factorial design there allowed a fair comparison between monocultures and strip cropping. We analyzed three crop combinations over two years, but we still cannot exclude a potential influence of spatial autocorrelation. We acknowledged this limitation in our original submission, and we will clarify this point further in the revision.

      Similarly, we know that carabid richness and density depend strongly on crop type (see e.g. Toivonen et al. (2022)) which could have biased results if the design is not balanced (this information is missing but it seems to be the case, see e.g. Celeriac in Almere in 2022).

      The sample size ranges between 2 and 6 per combination of cropping design, crop, location and year. We believe that this will allow a meaningful analysis. Moreover, our main focus is the comparison between monoculture and strip cropping, and not the comparison between different crops. Even though we show that crop types have different ground beetle communities, we are most interested in the contrast between ground beetle communities in strip cropping and monoculture systems.

      A more basic problem is that the reader neither learns where traps were located, how missing traps were treated for analyses how many samples there were per crop or crop combination (in a simple way, not through Table S7 - there has to have been a logic in each of these field trials) or why there are differences in the number of samples from the same location and year (see Table S7). This information needs to be added to the methods section.

      Point well taken. We will clarify this further in the revised manuscript. As we combined data from several experimental designs that originally had slightly different research questions, this in part caused differences between numbers of rounds or samples per crop, location or year.

      As carabid assemblages undergo rapid phenological changes across the year, assemblages that are collected at different phenological points within and across years cannot easily be compared. The authors would need to standardize for this and make sure that the assemblages they analyze are comparable prior to analyses. Otherwise, I see the possibility that the reported differences might simply be biased by phenology.

      We agree, and we dealt with this issue by using year series instead of individual samples from different rounds. While this approach is not perfect, it allows us to get the best possible impression of the entire ground beetle community across seasons. For our analyses we had the choice to include only data from sampling rounds that were conducted at the same time, or to include all available data. We chose to analyze all data, and made sure that the number of samples from strip cropping and monoculture fields was always the same per location, year and crop by pooling and rarefaction. In this way we have analyzed a complex multi-year, multi-crop and multi-location dataset as well as we could.

      Surrounding landscape structure is known to affect carabid richness and density and could thus also bias observed differences between treatments at the same locations (lower overall richness => lower differences between treatments). Landscape structure has not been taken into account in any way.

      We did not include landscape structure as there are only 4 sites, which does not allow a meaningful analysis of potential effects of landscape structure. Studying how landscape interacts with strip cropping to influence insect biodiversity would require at least, say, 15 to 20 sites, which was not feasible for this study. However, such an analysis may be possible in an ongoing project (CropMix), which includes many farms that work with strip cropping.

      In the statistical analyses, it is unclear whether the authors used estimated marginal means (as they should) - this needs to be clarified.

      In the revised manuscript we will further clarify this point.

      In addition, and as mentioned by Dr. Rasmann in the previous round (comment 1), the manuscript, in its current form, still suffers from simplified generalizations that 'oversell' the impact of the study and should be avoided. The authors restricted their analyses to ground beetles and based their conclusions on a design with many 'heterogeneities' - they should not draw conclusions for farmland biodiversity but stick to their system and report what they found. Although I understand the authors have previously stated that this is 'not practically feasible', the reason for this comment is simply to say that the authors should not oversell their findings.

      In the revised manuscript, we will nuance our findings by explaining that strip cropping is a potentially useful tool to support ground beetle biodiversity in agricultural fields, but that the effects on other taxa still need to be explored further.

      Reviewer 3:

      Summary:

      In this paper, the authors made a sincere effort to show the effects of strip cropping, a technique of alternating crops in small strips of several meters wide, on ground beetle diversity. They state that strip cropping can be a useful tool for bending the curve of biodiversity loss in agricultural systems as strip cropping shows a relative increase in species diversity (i.e. abundance and species richness) of the ground beetle communities compared to monocultures. Moreover, strip cropping has the added advantage of not having to compromise on agricultural yields.

      Strengths:

      The article is well written; it has an easily readable tone of voice without too much jargon or overly complicated sentence structure. Moreover, as far as reviewing the models in depth without raw data and R scripts allows, the statistical work done by the authors looks good. They have thought out well how to handle heterogeneous, yet spatially and temporally correlated field data. The models applied and the model checks performed are appropriate for the data at hand. Combining RDA and PCA axes together is a nice touch.

      We thank reviewer 3 for their kind words and appreciation for the simple language and analysis that we used.

      Weaknesses:

      The evidence for strip cropping bringing added value for biodiversity is mixed at best. Yes, there is an increase in relative abundance and species richness at the field level, but it is not convincingly shown this difference is robust or can be linked to clear structural and hypothesised advantages of the strip cropping system. The same results could have been used to conclude that there are only very limited signs of real added value of strip cropping compared to monocultures.

      Point well taken. We agree that the effects of strip cropping on carabid beetle communities are subtle, and we will nuance the text in the revised version to reflect this.

      There are a number of reasons for this:

      (1) Significant differences disappear at crop level, as the authors themselves clearly acknowledge, meaning that there are no differences between pairs of similar crops in the strip cropping fields and their respective monoculture. This would mean the strips effectively function as "mini-monocultures".

      This is indeed in line with our conclusions. Based on our data and results, the advantages of strip cropping seem to arise mostly because crops with different communities are now grown in the same field, rather than because the strips themselves harbour mixtures of communities associated with different crops. We discussed this in the first paragraph of the discussion in the original submission.

      The significant relative differences at the field level could be an artifact of aggregation instead of structural differences between strip cropping and monocultures; with enough data points things tend to get significant despite large variance. This should have been elaborated further upon by the authors with additional analyses, designed to find out where differences originate and what it tells about the functioning of the system. Or it should have provided ample reason for cautioning in drawing conclusions about the supposed effectiveness of strip cropping based on these findings.

      We believe that this is a misunderstanding of our approach. In the field-level analyses we pooled samples from the same field (i.e. pseudo-replicates were pooled), resulting in a relatively small sample size of 50 samples. We will explain this better in the methods section. Therefore, the statement “with enough data points things tend to get significant” is not applicable here.

      (2) The authors report percentages calculated as relative change of species richness and abundance in strip cropping compared to monocultures after rarefaction. This is in itself correct, however, it can be rather tricky to interpret because the perspective on actual species richness and abundance in the fields and treatments is completely lost; the reported percentages are dimensionless. The authors could have provided the average cumulative number of species and abundance after rarefaction. Also, range and/or standard error would have been useful to provide information as to the scale of differences between treatments. This could provide a new perspective on the magnitude of differences between the two treatments which a dimensionless percentage cannot.

      We agree that this would be the preferred approach if we had a perfectly balanced dataset. However, this approach is not feasible with our unbalanced design and differences in sampling effort. While we acknowledge that percentages are limited in their interpretation, they do allow us to report relative changes for each combination of location, year and crop. The number of samples on which the percentages were based was always kept equal (through rarefaction) between the cropping systems (for each combination of location, year and crop), but not among crops, years and locations. The reason for this is that we did not always have an equal number of samples available for both cropping systems, and this approach allowed us to make a better estimation whenever more samples were available. For example, sometimes we had 2 samples from a strip cropped field and 6 from the monoculture; here we would use rarefaction to 2 samples (which simply gives a better estimation for the monoculture). In other cases, we had 4 samples from both the strip cropped and the monoculture field; here we chose to use rarefaction to 4 samples to get a better estimation altogether. Adding a value for actual richness or abundance to the figures would have distorted these findings, as the variation would be huge (it would represent the number of ground beetle species per 2 to 6 pitfall samples). Furthermore, the dimension that reviewer 3 describes would thus be “the number of ground beetle species / individuals per 2 to 6 samples”, which is not a very informative unit either. We chose to trade a more readily interpretable unit for better estimations of the difference between cropping systems.
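
      The equal-effort comparison described above can be illustrated with a minimal sketch. It assumes a sample-based (incidence) rarefaction estimator, which this response does not specify, and it is not the analysis code used for the manuscript. The function names, species labels and pitfall samples are invented for illustration only.

```python
from math import comb


def rarefied_richness(samples, n):
    """Expected species count in n samples drawn without replacement.

    `samples` is a list of sets, each holding the species found in one
    pitfall sample (hypothetical data structure, assumed for this sketch).
    """
    total = len(samples)
    if not 1 <= n <= total:
        raise ValueError("n must be between 1 and the number of samples")
    expected = 0.0
    for species in set().union(*samples):
        # Number of samples in which this species occurs.
        k = sum(1 for s in samples if species in s)
        # Probability that the species is missed by all n drawn samples.
        p_missed = comb(total - k, n) / comb(total, n)
        expected += 1.0 - p_missed
    return expected


def relative_change(strip, mono):
    """Percentage change of strip cropping relative to monoculture."""
    return 100.0 * (strip - mono) / mono


# Invented toy data: 2 strip-cropping samples versus 6 monoculture samples.
strip_samples = [{"sp_a", "sp_b", "sp_c"}, {"sp_a", "sp_d"}]
mono_samples = [
    {"sp_a", "sp_b"}, {"sp_a"}, {"sp_a", "sp_c"},
    {"sp_b"}, {"sp_a", "sp_b"}, {"sp_a"},
]

# Rarefy both systems to the smaller number of samples (here 2).
n = min(len(strip_samples), len(mono_samples))
strip_richness = rarefied_richness(strip_samples, n)
mono_richness = rarefied_richness(mono_samples, n)
change = relative_change(strip_richness, mono_richness)
```

      In this toy case the monoculture estimate draws on all 6 samples even though both systems are compared at an effort of 2 samples, which is the sense in which extra samples give a better estimation without unbalancing the comparison.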

      (3) The authors appear to not have modelled the abundance of any of the dominant ground beetle species themselves. Therefore it becomes impossible to assess which important species are responsible (if any) for the differences found in activity density between strip cropping and monocultures and the possible life history traits related reasons for the differences, or lack thereof, that are found. A big advantage of using ground beetles is that many life history traits are well studied and these should be used whenever there is reason, as there clearly is in this case. Moreover, it is unclear which species are responsible for the difference in species richness found at the field level. Are these dominant species or singletons? Do the strip cropping fields contain species that are absent in the monoculture fields and are not the cause of random variation or sampling? Unfortunately, the authors do not report on any of these details of the communities that were found, which makes the results much less robust.

      Thank you for raising this point. We have reconsidered our indicator species analysis and found that it is rather sensitive to rare species and insensitive to changes in common species. Therefore, we will replace the indicator species analysis with a GLM analysis for the 12 most common genera of ground beetles in the revised manuscript. This will allow us to go into more depth on specific traits of the genera whose abundances change depending on the cropping system. In the revised manuscript, we will also discuss these common genera in more depth, rather than focusing on rarer species. Furthermore, we will add information on rarity and habitat preference to the table that shows species abundances per location (Table S2).

      (4) In the discussion they conclude that there is only a limited amount of interstrip movement by ground beetles. Otherwise, the results of the crop-level statistical tests would have shown significant deviation from corresponding monocultures. This is a clear indication that the strips function more like mini-monocultures instead of being more than the sum of its parts.

      This is in line with our point in the first paragraph of the discussion and an important message of our manuscript.

      (5) The RDA results show a modelled variable of differences in community composition between strip cropping and monoculture. Percentages of explained variation of the first RDA axis are extremely low, and even then, the effect of location and/or year appears to peek through (Figure S3), even though these are not part of the modelling. Moreover, there is no indication of clustering of strip cropping on the RDA axis, or in fact on the first principal component axis in the larger RDA models. This means the explanatory power of different treatments is also extremely low. The crop-level RDAs show some clustering, but hardly any consistent pattern in either communities of crops or species correlations, indicating that differences between strip cropping and monocultures are very small.

      We agree and we make a similar point in the first paragraph of the discussion.

      Furthermore, there are a number of additional weaknesses in the paper that should be addressed:

      The introduction lacks focus on the issues at hand. Too much space is taken up by facts on insect decline and land sharing vs. land sparing and not enough attention is spent on the scientific discussion underlying the statements made about crop diversification as a restoration strategy. They are simply stated as facts or as hypotheses with many references that are not mentioned or linked to in the text. An explicit link to the results found in the large number of references should be provided.

      We will streamline the introduction by omitting the land sharing vs. land sparing topic and better linking references to our research findings.

      The mechanistic understanding of strip cropping is what is at stake here. Does strip cropping behave similarly to intercropping, a technique that has been proven to be beneficial to biodiversity because of added effects due to increased resource efficiency and greater plant species richness? This should be the main testing point and agenda of strip cropping. Do the biodiversity benefits that have been shown for intercropping also work in strip cropping fields? The ground beetles are one way to test this. Hypotheses should originate from this and should be stated clearly and mechanistically.

      We agree with the reviewer and will set out this research direction more clearly in the introduction of the revised manuscript.

      One could question how useful indicator species analysis (ISA) is for a study in which predominantly highly eurytopic species are found. These are by definition uncritical of their habitat. Is there any mechanistic hypothesis underlying a suspected difference to be found in preferences for either strip cropping or monocultures of the species that were expected to be caught? In other words, did the authors have any a priori reasons to suspect differences, or has this been an exploratory exercise from which unexplained significant results should be used with great caution?

      Point well taken. We agree that the indicator species analysis has limitations and have therefore replaced it with a GLM analysis for the 12 most common ground beetle genera.

      However, setting these objections aside there are in fact significant results with strong species associations both with monocultures and strip cropping. Unfortunately, the authors do not dig deeper into the patterns found a posteriori either. Why would some species associate so strongly with strip cropping? Do these species show a pattern of pitfall catches that deviate from other species, in that they are found in a wide range of strips with different crops in one strip cropping field and therefore may benefit from an increased abundance of food or shelter? Also, why would so many species associate with monocultures? Is this in any way logical? Could it be an artifact of the data instead of a meaningful pattern? Unfortunately, the authors do not progress along these lines in the methods and discussion at all.

      We thank reviewer 3 for these valuable perspectives. In the revised manuscript, we will further explore the species/genera that respond to cropping systems and discuss these findings in more detail.

      A second question raised in the introduction is whether the arable fields that form part of this study contain rare species. Unfortunately, the authors do not elaborate further on this. Do they expect rare species to be more prevalent in the strip cropping fields? Why? Has it been shown elsewhere that intercropping provides room for additional rare species?

      The answer is simply no; we did not find more rare species in strip cropping. In the revised manuscript, we will add a column for rarity (according to waarneming.nl) to the table showing abundances of species per location. We found only two rare species: one was represented by a single individual, and the other was more related to the open habitat created by a failed wheat field. We will cover this in more depth in the discussion.

      Considering the implications the results of this research can have on the wider discussion of bending the curve and the effects of agroecological measures, bold claims should be made with extreme restraint and be based on extensive proof and robust findings. I am not convinced by the evidence provided in this article that the claim made by the authors that strip cropping is a useful tool for bending the curve of biodiversity loss is warranted.

      We believe that strip cropping can be a useful tool because farmers readily adopt it and it can result in modest biodiversity gains without yield loss. However, strip cropping is indeed not a silver bullet (which we also don’t claim). We will nuance the implications of our study in the revised manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Goal: Find downstream targets of cmk-1 phosphorylation, identify one that also seems to act in thermosensory habituation, test for genetic interactions between cmk-1 and this gene, and assess where these genes are acting in the thermosensory circuit during thermosensory habituation.

      Methods: Two in vitro analyses of cmk-1 phosphorylation of C. elegans proteins. Thermosensory habituation of cmk-1 and tax-6 mutants and double mutants was assessed by measuring the rate of heat-evoked reversals (reversal probability) of C. elegans before and after 20s ISI repeated heat pulses over 60 minutes.

      Conclusions: cmk-1 and tax-6 act in separate habituation processes, primarily in AFD, that interact complexly, but both serve to habituate the thermosensory reversal response. They found that cmk-1 primarily acts in AFD and tax-6 primarily acts in RIM (and FLP for naïve responses). They also identified hundreds of potential cmk-1 phosphorylation substrates in vitro.

      Strengths:

      The effect size in the genetic data is quite strong and a large number of genetic interaction experiments between cmk-1 and tax-6 demonstrate a complex interaction.

      Thanks a lot for these positive remarks.

      Weaknesses:

      The major concern about this manuscript is the assumption that the process they are observing is habituation. The two previously cited papers using this (or a very similar) protocol, Lia and Glauser 2020 and Jordan and Glauser 2023, both use the word 'adaptation' to describe the observed behavioral decrement. Jordan and Glauser 2023 use the words 'habituation' or 'habituation-like' 10 times; however, they use 'adaptation' over 100 times. It is critical to distinguish habituation from sensory adaptation (or fatigue) in this thermal reversal protocol. These processes are often confused/conflated; however, they are very different. Sensory adaptation is a process that decreases how much the nervous system is activated by a repeated stimulus; therefore it can even occur outside of the nervous system. Habituation is a learning process where the nervous system responds less to a repeated stimulus, despite at least part of the nervous system still being similarly activated by the stimulus. Habituation is considered an attentional process, while adaptation is due to the fatigue of sensory transduction machinery. Control experiments such as tests for dishabituation (where the application of a different stimulus causes recovery of the decremented response) or rate of spontaneous recovery (more rapid recovery after short inter-stimulus intervals) are required to determine if habituation or sensory adaptation is occurring. These experiments will allow the results to be interpreted with clarity; without them, it isn't clear what biological process is actually being studied.

      Thanks for the comment. As this reviewer points out, “adaptation” and “habituation” are often conflated. Many scientists (though maybe not the majority) use a less stringent definition of the word habituation than the one presented by this reviewer. In particular, the term habituation is used in human pain research to refer solely to the reduction of response to repeated stimuli, in the absence of a detailed assessment of the more stringent criteria mentioned here. In addition to the practice in pain research, the main reason why we moved from the terminology of our previous publication toward ‘habituation’ is that it immediately conveys the idea of a response reduction, whereas ‘adaptation’ could in principle be either an up-regulation or a down-regulation of the response (again, based on various definitions). But we agree that using the word “habituation” came at the cost of creating confusion about the exact nature of the process for those who apply the stricter definition of the word. In the manuscript under revision, we are changing this terminology to “adaptation”. Also, following suggestions from Reviewer 2, we are strengthening the description of the protocol in the Results section and clarifying why the adaptation phenomenon is not a ‘thermal damage’ effect or a ‘fatigue’ effect in the neuro-muscular circuit controlling reversal.

      While the discrepancy between the in vitro phosphorylation experiments and the in silico predictions was discussed, the substantial discrepancy (over 85% of the substrates in the smaller in vitro dataset were not identified in the larger dataset) between the two different in vitro datasets was not discussed. This is surprising, as these approaches were quite similar, and it may indicate a measure of unreliability in the in vitro datasets (or high false negative rates).

      Thanks for the comment. This is an important aspect which we will more extensively cover in the Discussion section of the revised manuscript.

      The strong consistency of the CMK-1 recognition consensus sequences across the two in vitro datasets argues against the analyses being unreliable. Instead, there are a few points to highlight that explain the somewhat low degree of overlap between the two datasets, which indeed relate to the false negative rates, as this reviewer suggests.

      (1) In the peptide library analysis, trypsin cleavage prior to kinase treatment will leave a charged N- or C-terminus and, in addition, remove part of the protein context required for efficient kinase recognition. This will have a variable effect across the different substrates in the peptide library, depending on the distance between the cleavage site and the phosphosite, but will not affect the native protein library. This effect increases the false negative rate in the peptide library.

      (2) The number and distribution of “available substrate phosphosites” diverge in the two libraries. Indeed, the peptide library is expected to contain a markedly larger diversity of potential CMK-1 substrate sites than the protein library (because the trypsin digestion will reveal substrates that are normally buried in a native protein), but the depth of MS analysis is the same for the two libraries. In somewhat simplistic terms, the peptide-library analysis is prone to be saturated with abundant phosphorylated peptides, which prevents the detection of all phosphosites. If the peptide analysis could have been made deeper, we would probably have increased the overlap (at the cost of increasing the number of false positives too).

      (3) We have chosen quite strict criteria and applied them separately to define each hit list; therefore, we know we have many false negatives in each list, which will naturally reduce the expected overlap.

      As we will clarify in the revised manuscript, we tend to place more trust in the protein-library dataset (since substrates are in a configuration closer to that in vivo), and we consider the hits that are also present in the peptide dataset (as TAX-6 was) the most convincing, as they could be validated in a second type of experiment.

      Additionally, the rationale for, and distinction between, the two separate in vitro experiments is not made clear.

      We reasoned that both substrate types have their own benefits and limitations (as discussed in the manuscript), so there was added value in running both. We proposed that the subset of targets present in both datasets is the most solid list of candidates. We will also reinforce our point in the revised discussion that the protein library is likely to contain far fewer false positives.

      Line 207: After reporting that both tax-6 and cnb-1 mutants have high spontaneous reversals, it is not made clear why cnb-1 is not further explored in the paper. Additionally, this spontaneous reversal data should be in a supplementary figure.

      We kept the focus of the article primarily on TAX-6, because it was identified as a CMK-1 target in vitro; CNB-1 was not. Moreover, we didn’t have cnb-1(gf) mutants to pursue the analysis, and the constitutively high reversal rate of cnb-1(lf) mutants prevented any further follow-up. We have added a supplementary file to present the spontaneous reversal rates.

      Figure 3 -S1: This model doesn't explain why the cmk-1(gf) group and the cmk-1(gf) +cyclo A group cause enhanced response decrement (presumably by reducing the inhibition by tax-6) but the +cyclo A group (inhibited tax-6) showed weaker response decrement, as here there is even further weakened inhibition of tax-6 on this process. Also, the cmk-1(lf) +cyclo A group is labeled as constitutive habituation, however, this doesn't appear to be the case in Figure 3 (seems like a similar initial level and response decrement phenotype to wildtype).

      Thanks a lot for the comment. We are glad that the presentation of our complex dataset was clear enough to bring the reader to that level of detailed reflection and interpretation on the proposed model. To address the two points raised in this reviewer’s comment, we are modifying the model presentation and provide additional clarifications below, where we use the term adaptation instead of habituation (as in the revised Figure):

      Regarding the first point, “why the cmk-1(gf) group and the cmk-1(gf) +cyclo A group cause enhanced response decrement … but the +cyclo A group showed weaker response decrement”. This is really a very good point that cannot be easily explained if all the branches (arrows) in the model have the same weight or work as ON/OFF switches. We tried to convey the relative importance of each regulatory effect via the thickness of the arrow lines (which we will clarify in the legend in the revised manuscript). The main ‘quantitative’ nuances to take into consideration here originate from two assumptions of the model (which we are clarifying in the revised manuscript):

      Assumption 1: the inhibitory effect of TAX-6 on the CMK-1 anti-adaptation branch and the inhibitory effect of TAX-6 on the CMK-1 pro-adaptation branch are not of the same magnitude (we have further enhanced the line thickness differences in the revised model, top left panel for wild type).

      Assumption 2: the two antagonistic direct effects of CMK-1 on adaptation are not of the same magnitude, most strikingly in the context of CMK-1(gf) mutants.

      In our model, the cyclosporin A treatment alone (bottom left panel) causes a strong boost on the CMK-1 inhibitory branch and a less marked boost on the CMK-1 activator branch (following assumption 1). This causes an imbalance between the two antagonistic direct CMK-1-dependent drives, which reduces (but doesn’t fully block) adaptation. Indeed, we don’t observe a total block of adaptation with cyclosporin A in wild type, the effect being significantly milder than the totally non-adapting phenotypes seen, e.g., in TAX-6(gf) mutants. From there, the question is what happens in the CMK-1(gf) background that would mask the anti-adaptation effect of cyclosporin A. Here assumption 2 is relevant: the CMK-1(gf) pro-adaptation direct branch is always prevalent and tips the regulation toward faster adaptation (the role of TAX-6 becoming negligible in the CMK-1(gf) background, and ipso facto that of cyclosporin A).

      Regarding the second point, “the cmk-1(lf) +cyclo A group is labeled as constitutive habituation”. We regret a confusing word choice in the first version of the manuscript; we intended to mean “normal habituation phenotype” but in the joint absence of antagonistic CMK-1 and TAX-6 regulatory signaling (so the regulation is not like in wild-type, but the phenotype ends up like in wild type). We are modifying the label to “normal adaptation” and will leave a note in the legend that an apparently normal adaptation phenotype seems to be the “default” situation when the two antagonistic regulatory pathways are shut off.

      More discussion of the significance of the sites of cmk-1 and tax-6 function in the neural circuit should take place. Additionally, incorporating the suspected loci of cmk-1 and tax-6 in the neural circuit into the model would be interesting (using proper hypothetical language). For example, as it seems like AFD is not required for the naïve reversal response but just its reduction, cmk-1 activity in AFD might be generating inhibition of the reversal response by AFD. It certainly would be understandable if this isn't workable, given extrasynaptic signaling and other unknowns, but it potentially could also be helpful in generating a working model for these complex interactions. For example, cmk-1 induces AIZ inhibition of AVA (AIZ is electrically coupled to AFD), and tax-6 reduces RIM activation of AVA (these neurons are also electrically coupled according to the diagram). RIM is also a neuropeptide-rich neuron, so this could allow it to interact with the cmk-1-related process(es) in AFD. Some discussion of possibilities like this could be informative.

      Thanks for the comment. These hypothetical inter-cellular communication pathways are indeed nice possibilities. On the other hand, we could envision several additional pathways. Following this helpful suggestion, we will expand the discussion of hypothetical models in the revised manuscript.

      Provide an explanation for why some of the experiments in Figure 4 have such a high N, compared to other experiments.

      The conditions with the highest n correspond to conditions that we have also used as ‘control’ conditions for other types of experiments in the lab and as part of side projects, and whose data could be gathered for the present article. We have been working with cmk-1(lf) and tax-6(gf) mutants for many years, and their robust non-adapting phenotype served as a reference point and a quality control when analyzing other non-adapting mutants.

      Because the loss of function and gain of function mutations in cmk-1 have a similar effect, it is likely that this thermosensory plasticity phenotype is sensitive to levels of cmk-1 activity. Therefore, it is not surprising that the cmk-1 promoter failed to rescue very well as these plasmid-driven rescues often result in overexpression. Given this and that the cmk-1p rescue itself was so modest, these rescue experiments are not entirely convincing (and very hard to interpret; for example, is the AFD rescue or the ASER rescue more complete? The ASER one is actually closer to the cmk-1p rescue). Given the sensitivity to cmk-1 activity levels, a degradation strategy would be more likely to deliver clear results (or perhaps even the overactivation approach used for tax-6).

      Thanks for the comment. We respectfully disagree with this reviewer’s statement “the loss of function and gain of function mutations in cmk-1 have a similar effect”. We suspect a confusion here, because our data clearly show that these two mutant types have opposite phenotypes. That being said, we interpret the weak rescue effect with cmk-1p as a probable result of overexpression or incomplete/imbalanced expression across neurons (as the promoter used might not include all the relevant regulatory regions). We dedicated considerable effort to establishing an endogenous CMK-1::degron knock-in for tissue-specific auxin-induced degradation (AID), but we were unfortunately not able to obtain consistent results. The only useful data regarding the CMK-1 place of action are therefore the cell-specific rescue data already included in the report.

      Reviewer #2 (Public review):

      Summary:

      The reduction in a response to a specific stimulus after repeated exposures is called habituation. Alterations in habituation to noxious stimuli are associated with chronic pain in humans, however, the underlying molecular mechanisms involved are not clear. This study uses the nematode C. elegans to study genes and mechanisms that underlie habituation to a form of noxious stimuli based on heat, termed thermo-noxious stimuli. The authors previously showed that the Calcium/Calmodulin-dependent protein kinase (CMK-1) regulates thermo-nociceptive habituation in the nematode C. elegans. Although CMK-1 is a kinase with many known substrates, the downstream targets relevant for thermo-nociceptive habituation are not known. In this study, the authors use two different kinase screens to identify phosphorylation targets of CMK-1. One of the targets they identify is Calcineurin (TAX-6). The authors show that CMK-1 phosphorylates a regulatory domain of Calcineurin at a highly conserved site (S443). In a series of elegant experiments, the authors use genetic and pharmacological approaches to increase or decrease CMK-1 and Calcineurin signaling to study their effects on thermo-nociceptive habituation in C. elegans. They also combine these various approaches to study the interactions between these two signaling proteins. The authors use specific promoters to determine in which neurons CMK-1 and Calcineurin function to regulate thermo-nociceptive habituation. The authors propose a model based on their findings illustrating that CMK-1 and Calcineurin act mostly in different neurons to antagonistically regulate habituation to thermo-nociceptive stimuli in a complex manner.

      Strengths:

      (1) Given the conservation of habituation across phylogeny, identifying genes and mechanisms that underlie nociceptive habituation in C. elegans may be relevant for understanding chronic pain in humans.

      (2) The identification of canonical CaM Kinase phosphorylation motifs in the substrates identified in the CMK-1 substrate screen validates the screen.

      (3) The use of loss and gain of function approaches to study the effects of CMK-1 and Calcineurin on thermo-nociceptive responses and habituation is elegant.

      (4) The ability to determine the cellular place of action of CMK-1 and Calcineurin using neuron-specific promoters in the nematode is a clear strength of the genetic model system.

      Thanks a lot for these positive remarks.

      Weaknesses:

      (1) The manuscript begins by identifying Calcineurin as a direct substrate of CMK-1 but ends by showing that CMK-1 and Calcineurin mostly act in different neurons to regulate nociceptive habituation which disrupts the logical flow of the manuscript.

      We understand this point and have carefully considered (and reconsidered) the way to articulate the report. However, we could not present the story much differently, as we would have had no justification to investigate the role of TAX-6 and its interaction with CMK-1 if we had not first identified it as a phospho-target in vitro. Carefully considering this point, we found that the abstract of the first manuscript version was probably too cursory and liable to trigger wrong expectations among readers. We will extensively revise the abstract to clarify this point. Furthermore, we will reinforce this point in the last paragraph of the introduction.

      (2) The physiological relevance of CMK-1 phosphorylation of Calcineurin is not clear.

      We agree and will explicitly discuss this aspect in the revised Discussion section, and also make it clear from the abstract onward.

      (3) It is not clear if Calcineurin is already a known substrate of CaM Kinases in other systems or if this finding is new.

      We are not aware of any study showing that Calcineurin is a direct target of CaM kinase I. It was, however, found to be a substrate of CaM kinase II as well as of other kinases, as we explicitly presented in the Discussion section. We will complement the text by mentioning that we are not aware of Calcineurin having so far been reported to be a CaM kinase I substrate.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper by Lee and Ouellette explores the role of cyclic-d-AMP in chlamydial developmental progression. The manuscript uses a collection of different recombinant plasmids to up- and down-regulate cdAMP production, and then uses classical molecular and microbiological approaches to examine the effects of expression induction in each of the transformed strains.

      Strengths:

      This laboratory is a leader in the use of molecular genetic manipulation in Chlamydia trachomatis and their efforts to make such efforts mainstream is commendable. Overall, the model described and defended by these investigators is thorough and significant.

      Weaknesses:

      The biggest weakness in the document is their reliance on quantitative data that is statistically not significant, in the interpretation of results. These challenges can be addressed in a revision by the authors.

      Thank you for these comments. We have generated new data, which we hope the reviewer will find more compelling. These will be included in a revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This manuscript describes the role of the production of c-di-AMP on the chlamydial developmental cycle. Chlamydia are obligate intracellular bacterial pathogens that rely on eukaryotic host cells for growth. The chlamydial life cycle depends on a cell form developmental cycle that produces phenotypically distinct cell forms with specific roles during the infectious cycle. The RB cell form replicates amplifying chlamydia numbers while the EB cell form mediates entry into new host cells disseminating the infection to new hosts. Regulation of cell form development is a critical question in chlamydia biology and pathogenesis. Chlamydia must balance amplification (RB numbers) and dissemination (EB numbers) to maximize survival in its infection niche. The main findings in this manuscript show that overexpression of the dacA-ybbR operon results in increased production of c-di-AMP and early expression of the transitionary gene hctA and late gene omcB. The authors also knocked down the expression of the dacA-ybbR operon and reported a reduction in the expression of both hctA and omcB. The authors conclude with a model suggesting the amount of c-di-AMP determines the fate of the RB, continued replication, or EB conversion. Overall, this is a very intriguing study with important implications; however, the data is very preliminary and the model is very rudimentary and is not well supported by the data.

      Thank you for your comments. Chlamydia is not an easy experimental system, but we will do our best to address the reviewer’s concerns in a revised submission.

      Describing the significance of the findings:

      The findings are important and point to very exciting new avenues to explore the important questions in chlamydial cell form development. The authors present a model that is not quantified and does not match the data well.

      Describing the strength of evidence:

      The evidence presented is incomplete. The authors do a nice job of showing that overexpression of the dacA-ybbR operon increases c-di-AMP and that knockdown or overexpression of the catalytically dead DacA protein decreases the c-di-AMP levels. However, the effects on the developmental cycle and how they fit the proposed model are less well supported.

      dacA-ybbR ectopic expression:

      For the dacA-ybbR ectopic expression experiments they show that hctA is induced early but there is no significant change in OmcB gene expression. This is problematic as when RBs are treated with Pen (this paper) and (DOI 10.1128/MSYSTEMS.00689-20) hctA is expressed in the aberrant cell forms but these forms do not go on to express the late genes suggesting stress events can result in changes in the developmental expression kinetic profile. The RNA-seq data are a little reassuring as many of the EB/Late genes were shown to be upregulated by dacA-ybbR ectopic expression in this assay.

      As the reviewer notes, we also generated RNAseq data, which validates that late gene transcripts (including sigma28 and sigma54 regulated genes) are statistically significantly increased earlier in the developmental cycle in parallel to increased c-di-AMP levels. The lack of statistical significance in the RT-qPCR data for omcB, which shows a trend of higher transcripts, is less concerning given the statistically significant RNAseq dataset. We have reported the data from three replicates for the RT-qPCR and do not think it would be worthwhile to run more replicates in an attempt to “achieve” statistical significance.

      The authors also demonstrate that this ectopic expression reduces the overall growth rate but produces EBs earlier in the cycle but overall fewer EBs late in the cycle. This observation matches their model well as when RBs convert early there is less amplification of cell numbers.

      dacA knockdown and dacA(mut)

      The authors showed that dacA knockdown and ectopic expression of the dacA mutant both reduced the amount of c-di-AMP. The authors show that for both of these conditions, hctA and omcB expression is reduced at 24 hpi. This was also partially supported by the RNA-seq data for the dacA knockdown as many of the late genes were downregulated. However, a shift to an increase in RB-only genes was not readily evident. This is maybe not surprising as the chlamydial inclusion would just have an increase in RB forms and changes in cell form ratios would need more time points.

      Thank you for this comment. We agree that it is not surprising given the shift in cell forms. The reduction in hctA transcripts argues against a stress state as noted above by the reviewer, and the RNAseq data from dacA-KD conditions indicates at least that secondary differentiation has been delayed. We will try to clarify this in a revision.

      Interestingly, the overall growth rate appears to differ in these two conditions, growth is unaffected by dacA knockdown but is significantly affected by the expression of the mutant. In both cases, EB production is repressed. The overall model they present does not support this data well as if RBs were blocked from converting into EBs then the growth rate should increase as the RB cell form replicates while the EB cell form does not. This should shift the population to replicating cells.

      We agree that it seems that perturbing c-di-AMP production, whether by knockdown or overexpressing the mutant DacA(D164N), has an overall negative impact on chlamydial growth. We have generated new data, which we think will address this. These new data will be included in a revised manuscript.

      Overall this is a very intriguing finding that will require more gene expression data, phenotypic characterization of cell forms, and better quantitative models to fully interpret these findings.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors Eapen et al. investigated the peptide inhibitors of Cdc20. They applied a rational design approach, substituting residues found in the D-box consensus sequences to better align the peptides with the Cdc20-degron interface. In the process, the authors designed and tested a series of more potent binders, including ones that contain unnatural amino acids, and verified binding modes by elucidating the Cdc20-peptide structures. The authors further showed that these peptides can engage with Cdc20 in the cellular context, and can inhibit APC/C<sup>Cdc20</sup> ubiquitination activity. Finally, the authors demonstrated that these peptides could be used as portable degron motifs that drive the degradation of a fused fluorescent protein.

      Strengths:

      This manuscript is clear and straightforward to follow. The investigation of different peptide variations was comprehensive and well-executed. This work provided the groundwork for the development of peptide drug modalities to inhibit degradation or apply peptides as portable motifs to achieve targeted degradation. Both of which are impactful.

      Weaknesses:

      A few minor comments:

      (1) In my opinion, more attention to the solubility issue needs to be discussed and/or tested. On page 10, what is the solubility of D2 before a modification was made? The authors mentioned that position 2 is likely solvent exposed, it is not immediately clear to me why the mutation made was from one hydrophobic residue to another. What was the level of improvement in solubility? Are there any affinity data associated with the peptide that differ with D2 only at position 2?

      The reviewer is correct that we have not done any detailed solubility characterisation; we refer only to observations rather than quantitative analysis. We wrote that we reverted from Leu to Ala due to solubility; we will clarify this statement to say that we reverted to Ala because it was the residue present in D1, for which we observed a measurable affinity by SPR and saw a concentration-dependent response in the thermal shift analysis. We do not have any peptides or affinity data that explore single-site mutations with the parental peptide of D2. D2 is included in the paper because of its link to the consensus D-box sequence and thus was the logical path to the investigations into positions 3 and 7 that come later in the manuscript.

      (2) I'm not entirely convinced that the D19 density not observed in the crystal structure was due to crystal packing. This peptide is peculiar as it also did not induce any thermal stabilization of Cdc20 in the cellular thermal shift assay. Perhaps the binding of this peptide could be investigated in more detail (i.e., NMR?) Or at least more explanation could be provided.

      This section will be clarified. The lack of observed density was likely due to the relatively low affinity of D19 and also to the lack of binding of the three C-terminal residues in the crystal, and consequently it has a further reduced affinity. The current wording in the manuscript puts greater emphasis on this second aspect being a D19-specific issue, even though it applies to all four soaked peptides. The extent of peptide-induced thermal stabilisations observed by TSA and CETSA is different, with the latter experiment consistently showing smaller shifts. This observation may be due to the more complex medium (cell lysate vs. purified protein) and/or different concentrations of the proteins in solution. In the CETSA, we over-expressed a HiBiT-tagged Cdc20, which is present in addition to any endogenously expressed Cdc20. Although we did not investigate it, the near identical D-box binding sites on Cdc20 and Cdh1 would suggest that there will be cross-specificity, which could further influence the CETSA experiments.

      Reviewer #2 (Public review):

      Summary:

      The authors took a well-characterised (partly by them), important E3 ligase, in the anaphase-promoting complex, and decided to design peptide inhibitors for it based on one of the known interacting motifs (called D-box) from its substrates. They incorporate unnatural amino acids to better occupy the interaction site, improve the binding affinity, and lay foundations for future therapeutics - maybe combining their findings with additional target sites.

      Strengths:

      The paper is mostly strengths - a logical progression of experiments, very well explained and carried out to a high standard. The authors use a carefully chosen variety of techniques (including X-ray crystallography, multiple binding analyses, and ubiquitination assays) to verify their findings - and they impressively achieve their goals by honing in on tight-binders.

      Weaknesses:

      Some things are not explained fully and it would be useful to have some clarification. Why did the authors decide to model their inhibitors on the D-box motif and not the other two SLiMs that they describe?

      For completeness, in addition to the D-box we did originally construct peptides based on the ABBA and KEN-box motifs, but they did not show any shift in melting temperature of Cdc20 in the thermal shift assay whereas the D-box peptides did; consequently, we focused our efforts on the D-box peptides. Moreover, there is much evidence from the literature that points to the unique importance of the D-box motif in mediating productive interactions of substrates with the APC/C (i.e. those leading to polyubiquitination and degradation). One of the clearest examples is a study by Mark Hall’s lab (described in Qin et al. 2016), which tested the degradation of 15 substrates of yeast APC/C in strains carrying alleles of Cdh1 in which the docking sites for D-box, KEN or ABBA were mutated. They observed that whereas degradation of all 15 substrates depended on D-box binding, only a subset required the KEN binding site on Cdh1 and only one required the ABBA binding site. A more recent study from David Morgan’s lab (Hartooni et al. 2022) looking at binding affinities of different degron peptides concluded that the KEN motif has very low affinity for Cdc20 and is unlikely to mediate degradation of APC/C-Cdc20 substrates. Engagement of substrate with the D-box receptor is therefore the most critical event mediating APC/C activity and the interaction that needs to be blocked for most effective inhibition of substrate degradation.

      What exactly do they mean when they say their 'observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast 'pseudo-substrate' inhibitor Acm1, acts to impede polyubiquitination of the bound protein'? It's an interesting thing to think about, and probably the paper they cite explains it more but I would like to know without having to find that other paper.

      Interesting results from a number of labs (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011, Qin et al. 2019) have shown that mutations of degron SLiMs in Acm1 that weaken interaction with the APC/C have the unexpected consequence of converting Acm1 from APC/C inhibitor to APC/C substrate. A necessary conclusion of these studies is that the outcome of degron binding (i.e. whether the binder functions as substrate or inhibitor) depends on factors other than D-box affinity and that D-box affinity can counteract them. One idea is that if a binder interacts too tightly, this removes some flexibility required for the polyubiquitination process. The most recent study on this question (Qin et al. 2019) specifically pins the explanation for the inhibitory function of the high affinity D-box in Acm1 on its ‘D-box Extension’ (i.e. residues 8-12) preventing interaction with APC10. In our current study, the binding affinity of peptides is measured against Cdc20. In cellular assays, however, the D-box must also engage APC10 for degradation to occur. It may be that the peptide binding most strongly to the D-box pocket on Cdc20 is less able to bind to APC10 and therefore less effective in triggering APC10-dependent steps in the polyubiquitination pathway. The important Hartooni et al. paper from David Morgan’s lab confirms that even though the binding of D-box residues to APC10 is very weak on its own, it can contribute a 100X increase in affinity of a peptide by adding cooperativity to the interaction of D-box with co-activator.

      After further reading on this topic, we will modify the relevant piece of text from:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast ‘pseudo-substrate’ inhibitor Acm1, acts to impede polyubiquitination of the bound protein (Qin et al. 2019). Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. As shown in Qin et al., mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Qin et al. 2019). Overall, our results support the conclusions that all the D-box peptides engage productively with the APC/C and that the highest affinity interactors act as inhibitors rather than functional degrons of APC/C.”

      to:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with conclusions from other studies that affinity of degron binding does not necessarily correlate with efficiency of degradation. Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. A number of studies of a yeast ‘pseudo-substrate’ inhibitor Acm1, have shown that mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011) through a mechanism that governs recruitment of APC10 (Qin et al. 2019). Our study does not consider the contribution of APC10 to binding of our peptides to APC/C<sup>Cdc20</sup> complex, but since there is strong cooperativity provided by this additional interaction (Hartooni et al. 2022) we propose this as the critical factor in determining the ability of the different peptides to mediate degradation of associated mNeon.”

      Re Figure 6 and the fact that we did look at peptide binding in cells, these experiments were done in unsynchronised cells, so most Cdc20 would not be bound to APC/C.

      Reviewer #3 (Public review):

      Summary:

      Eapen and coworkers use a rational design approach to generate new peptide-inspired ligands at the D-box interface of cdc20. These new peptides serve as new starting points for blocking APC/C in the context of cancer, as well as manipulating APC/C for targeted protein degradation therapeutic approaches.

      Strengths:

      The characterization of new peptide-like ligands is generally solid and multifaceted, including binding assays, thermal stability enhancement in vitro and in cells, X-ray crystallography, and degradation assays.

      Weaknesses:

      One important finding of the study is that the strongest binders did not correlate with the fastest degradation in a cellular assay, but explanations for this behavior were not supported experimentally. Some minor issues regarding experimental replicates and details were also noted.

      Interesting results from a number of labs (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011, Qin et al. 2019) have shown that mutations of degron SLiMs in Acm1 that weaken interaction with the APC/C have the unexpected consequence of converting Acm1 from APC/C inhibitor to APC/C substrate. A necessary conclusion of these studies is that the outcome of degron binding (i.e. whether the binder functions as substrate or inhibitor) depends on factors other than D-box affinity and that D-box affinity can counteract them. One idea is that if a binder interacts too tightly, this removes some flexibility required for the polyubiquitination process. The most recent study on this question (Qin et al. 2019) specifically pins the explanation for the inhibitory function of the high affinity D-box in Acm1 on its ‘D-box Extension’ (i.e. residues 8-12) preventing interaction with APC10. In our current study, the binding affinity of peptides is measured against Cdc20. In cellular assays, however, the D-box must also engage APC10 for degradation to occur. It may be that the peptide binding most strongly to the D-box pocket on Cdc20 is less able to bind to APC10 and therefore less effective in triggering APC10-dependent steps in the polyubiquitination pathway. The important Hartooni et al. paper from David Morgan’s lab confirms that even though the binding of D-box residues to APC10 is very weak on its own, it can contribute a 100X increase in affinity of a peptide by adding cooperativity to the interaction of D-box with co-activator.

      After further reading on this topic, we will modify the relevant piece of text from:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast ‘pseudo-substrate’ inhibitor Acm1, acts to impede polyubiquitination of the bound protein (Qin et al. 2019). Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. As shown in Qin et al., mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Qin et al. 2019). Overall, our results support the conclusions that all the D-box peptides engage productively with the APC/C and that the highest affinity interactors act as inhibitors rather than functional degrons of APC/C.”

      to:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with conclusions from other studies that affinity of degron binding does not necessarily correlate with efficiency of degradation. Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. A number of studies of a yeast ‘pseudo-substrate’ inhibitor Acm1, have shown that mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011) through a mechanism that governs recruitment of APC10 (Qin et al. 2019). Our study does not consider the contribution of APC10 to binding of our peptides to APC/C<sup>Cdc20</sup> complex, but since there is strong cooperativity provided by this additional interaction (Hartooni et al. 2022) we propose this as the critical factor in determining the ability of the different peptides to mediate degradation of associated mNeon.”

      Re Figure 6 and the fact that we did look at peptide binding in cells, these experiments were done in unsynchronised cells, so most Cdc20 would not be bound to APC/C.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      In the manuscript entitled "A VgrG2b fragment cleaved by caspase-11/4 promotes Pseudomonas aeruginosa infection through suppressing the NLRP3 inflammasome", Qian et al. found an activation of the non-canonical inflammasome, but not the downstream NLRP3 inflammasome, during the infection of macrophage by P. aeruginosa, which is in sharp contrast to that by E. coli (Figure 1). In realizing that the suppression of the NLRP3 inflammasome is Caspase-11 dependent, the authors performed a screening among P. aeruginosa proteins and identified VgrG2b being a major substrate of Caspase-11 (Figure 2). Next, the authors mapped the cleavage site on VgrG2b to D883, and demonstrated that cleavage of VgrG2b by Caspase-11 is essential for the suppression of the NLRP3 inflammasome (Figure 3). Furthermore, they found that a binding between the C-terminal fragment of the cleaved VgrG2b and NLRP3 existed (Figure 4), which was then proved to block the association of NLRP3 with NEK7 (Figure 5). Finally, the authors demonstrated that blocking of VgrG2b cleavage, by either mutation of the D883 or administration of a designed peptide, effectively improved the survival rate of the P. aeruginosa-infected mice (Figure 6). This is a well-designed and executed study, with the results clearly presented and stated.

      We are deeply grateful for your recognition and positive comments on our article. Thank you for your effort and dedication in reviewing our manuscript. We are honored to have the opportunity to receive feedback from professional reviewers like you.

      Reviewer #2 (Public review):

      Summary:

      In their manuscript, Qian and colleagues identified a novel mechanism by which Pseudomonas controls inflammatory responses upon inflammasome activation. They identified a caspase-11 substrate (VgrG2b) which, upon cleavage, binds and inhibits NLRP3 to reduce the production of pro-inflammatory cytokines. This is a unique mechanism that allows for the tailoring of the innate immune response upon bacterial recognition.

      Strengths:

      The authors present here a novel conceptual framework in host-pathogen interactions. Their work is supported by a range of approaches (biochemical, cellular immunology, microbiology, animal models), and their conclusions are supported by multiple independent lines of evidence. The work is likely to have an important impact on the innate immunity and host-pathogen interaction fields and may guide the development of novel inhibitors.

      Weaknesses:

      Although quite exhaustive, a few of the authors' conclusions are not fully supported (e.g., caspase-11 directly cleaving VgrG2b, the unique affinity of VgrG2b-C for NLRP3) and would require complementary approaches to validate their findings fully. This is minimal.

      We sincerely appreciate your professional review and kind appraisal of our article. These comments are valuable and helpful for improving our manuscript. Following your suggestions, we have made some modifications and added supplemental data to make our results more convincing. The detailed responses are listed point by point below.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I really enjoyed reading your manuscript and believe this is an important conceptual advance for the innate immunity field. Your conclusions are in general well-supported, you used a range of methodologies and the quality of the presentation of the results is excellent. I have a few comments here that I hope will contribute to improving an already great piece of work:

      Elements to be improved:

      Line 109-110: the authors claim that the release of mito DNA is required for NLRP3 activation. I would support this with a reference. I believe this may not be fully agreed on in the field. Cleavage of GSDMD by caspase-4/11 is required, however. A few groups showed the requirement for K+ efflux in this context (Broz, Brough, Schroder labs).

      It is a very good suggestion. Indeed, there is still controversy over this issue, and we have revised our text to make our manuscript more neutral. We have also cited these important references to help readers understand where the controversy lies.

      I disagree that OMV + Pseudomonas is a natural way to simulate natural infection. I would argue it is even quite artificial. Pseudomonas alone should be sufficient to generate OMV without the addition of extra OMVs.

      This is a good point. Before we infected BMDM cells with PAO1 strains, we washed the bacteria with PBS at least three times to exclude interference from the contents of the LB medium. Moreover, in our experimental system, the time for co-incubation between bacteria and host cells is very limited. During this time, the amount of OMV secreted by the bacteria may not reach the level needed to activate inflammasomes, and this concentration is also relatively low compared to the OMV concentration secreted by bacteria under physiological conditions. Therefore, we added extra OMVs to simulate the chronic infection condition in a short time.

      The authors co-express the caspase with VgrG2b and assume the cleavage is direct. However, the study lacks experiments with recombinant proteases (commercially available), which would strengthen their conclusions regarding the ability of caspase-4/11 to directly cleave the protein. Based on the recognised sequence (DXXD), I believe caspase-4/11 is not directly responsible for this. These caspases were shown to cleave caspase-3/7, which can cleave such a sequence (DXXX). As caspase-4 can cleave caspase-3/7 in their lysates, I would recommend testing this hypothesis to further strengthen the authors' conclusions.

      These are very good points. As shown in Fig. 3F, we used recombinant VgrG2b and caspase-11 p22/p10 to demonstrate direct cleavage by caspase-11. To exclude the effect of caspase-3/7, we treated cells with caspase-3/7 inhibitors and found that caspase-3/7 are not the executors of VgrG2b cleavage (new Fig. S3E, F).

      The affinity between caspase-11 and VgrG2b-C is puzzling as one would normally expect the caspase and its substrates to quickly dissociate. Does VgrG2b-C impact the activity of caspase-4/11 upon cleavage? Can VgrG2b-C also interact with p20/p10 caspase-1? I believe the authors only tried the full-length version of caspase-1 in supplemental.

      These are very good questions. We agree that enzymes and substrates normally interact only transiently, which makes the interaction hard to capture. However, we used the caspase-11(C254A) mutant, which is unable to cleave its substrates, so that the binding of VgrG2b or VgrG2b-C to caspase-11(C254A) could be detected. This mutation is frequently used in immunoprecipitation (Wang K, Cell, 2020). We tested the impact of VgrG2b-C on the enzyme activity of caspase-4/11 and showed that VgrG2b-C did not affect the cleavage of GSDMD by caspase-11 (Fig. 5C). We also tested caspase-1 p20/p10 and found that it did not interact with VgrG2b-C (new Fig. S4G).

      Can more details be provided about the generation of recombinant caspase-11, VgrG2b-C, and other recombinant proteins tested?

      Thank you for your suggestion. We have revised our description in the new version.

      The authors assumed that VgrG2b-C does not impact other inflammasomes (such as NLRC4) based on their X-gal assay. I would also confirm this with a functional assay (e.g., transfection of flagellin in macrophages).

      This is a good suggestion. We have tested the impact of VgrG2b-C on the NLRC4 inflammasome and found that VgrG2b-C does not affect NLRC4 activation upon flagellin transfection (new Fig. S5K).

      Often, representative experiments are shown. For ELISA, cell death assays and quantitative experiments, pooling the data would be appropriate. Appropriate statistical analysis should be conducted based on this as well.

      Thanks for your suggestions. In the revised manuscript, we pooled the data of three independent experiments for our analysis of ELISA and cell death assays. We also added descriptions of statistical analysis in our revised text.

      VgrG2b has been suggested to be a metalloprotease (PMID: 31577948). Is its protease activity required for the phenomenon observed?

      This is a very good question. The active region of the metalloprotease VgrG2b-C is aa932-941, especially the HEXXH core sequence. Structural data also confirm that H935, E936, H939, E983 play key roles in the coordination with Zn ions (Sana TG, mBio, 2015; Wood TE, Cell reports, 2019). In our study, the cleavage of VgrG2b by caspase-4/11 depends on the recognition of the tetrapeptide sequence at aa880-883. We added data showing that the cleavage of VgrG2b and the inhibition of the NLRP3 inflammasome were not affected by VgrG2b enzymatic activity (new Fig. S4I-K).

      What is the affinity of VgrG2b-C for NLRP3? Is it higher than NEK7? A quantitative experiment would be required to claim this.

      This is a great point. We added quantitative data to the revised manuscript showing that VgrG2b-C has a higher affinity for NLRP3 than NEK7 does (326 nM vs. 681 nM).

      The Material and Method section is a bit light and would benefit from adding more information (e.g. cell density, microscopy details, number of cells imaged, etc).

      Thank you for your suggestion. We have added more details to the Material and Method section in the revised manuscript.

    1. Author response:

      We thank the reviewers for their concise and detailed summaries, and appreciate the constructive feedback on the article’s strengths and weaknesses. In response, we plan to strengthen our work in a revised version by presenting the model assumptions for the electrocyte more explicitly and by further elaborating on the generalisability of the results to other cell types with different ion channels, including calcium and chloride.

      Experimental work is beyond the scope of our modelling-based study. However, we would like our work to serve as a framework for future experimental studies into the role of the electrogenic pump current (and its possible compensatory currents) in disease, and its role in evolution of highly specialised excitable cells (such as electrocytes).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: 

      The authors demonstrate that, while the loss of Ezrin increases lysosomal biogenesis and function, its presence is required for the specific endocytosis of EGFR. Upon further investigation, the authors reveal that Ezrin is a crucial intermediary protein that links EGFR to AKT, leading to the phosphorylation and inhibition of TSC. TSC is a critical negative regulator of the mTORC1 complex, which is dysregulated in various diseases, making their findings a valuable addition to multiple fields of study. Their cell signaling findings are translatable to an in vivo Medaka fish model and suggest that Ezrin may play a crucial role in retinal degeneration.

      Strengths: 

      Giamundo, Intartaglia, et al. utilized unbiased proteomic and transcriptomic screens in Ezrin KO cells to investigate the mechanistic function of Ezrin in lysosome and cell signaling pathways. The authors' findings are consistent with past literature demonstrating Ezrin's role in the EGFR and mTORC1 signaling pathways. They used several cell lines, small molecule inhibitors, and cellular and in vivo knockout models to validate signaling changes through biochemical and microscopy assays. Their use of multiple advanced microscopy techniques is also impressive.

      We are grateful to the Editor and the Reviewers for their important and constructive comments, which helped us to improve our manuscript. We have now carried out new experiments and analyses to further support our findings.

      Weaknesses: 

      While the authors demonstrated activation of TSC1 (lysosomal accumulation) and inactivation of Akt (decreased phosphorylation in TSC1), as well as decreased mTORC1 signaling in Ezrin knockout cells, direct experiments showing the rescue of mTORC1 activity by AKT and TSC1 mutants are required to confirm the linear signaling pathway and establish Ezrin as a mediator of EGFR-AKT-TSC1-mTORC1 signaling. Although the authors presented representative images from advanced microscopy techniques to support their claims, there is insufficient quantification of these experiments. Additionally, several immunoblots in the manuscript lack vital loading controls, such as input lanes for immunoprecipitations and loading controls for western blots.

      We wish to thank the Reviewer for his/her important and constructive comments on our manuscript and for considering that our study provides new information for understanding the mechanism regulating the TSC/mTORC1 pathway. We have now extensively revised the manuscript according to his/her suggestions. To expand on the evidence demonstrating Ezrin as a mediator of EGFR-AKT-TSC1-mTORC1 signaling, the revised manuscript includes quantification of all advanced microscopy images, rescue experiments demonstrating the role of Ezrin in the AKT/TSC/mTORC1 molecular network, and controls for WBs and immunoprecipitations.

      Reviewer #2 (Public Review):

      Summary: 

      The authors begin with the stated goal of gaining insight into the known repression of autophagy by Ezrin, a major membrane-actin linker that assembles signaling complexes on membranes. RNA and protein expression analysis is consistent with upregulation of lysosomal proteins in Ezrin-deficient MEFs, which the authors confirm by immunostaining and western blotting for lysosomal markers. Expression analysis also implicates EGF signaling as being altered downstream of Ezrin loss, and the authors demonstrate that Ezrin promotes relocalization of EGFR from the plasma membrane to endosomes. Ezrin loss impacts downstream MAPK/Akt/mTORC1 signaling, although the mechanistic links remain unclear. An Ezrin mutant Medaka fish line was then generated to test Ezrin's role in retinal cells, which are known to be sensitive to changes in autophagy regulation. Phenotypes in this model appear generally consistent with observations made in cultured cells, though mild overall. 

      Strengths: 

      Data on the impact of Ezrin-loss on relocalization of EGFR from the plasma membrane are extensive, and thoroughly demonstrate that Ezrin is required for EGFR internalization in response to EGF. 

      A new Ezrin-deficient in vivo model (Medaka fish) is generated.

      Strong data demonstrates that Ezrin loss suppresses Akt signaling. Ezrin loss also clearly suppresses mTORC1 signaling in cell culture, although examination of mTORC1 activity is notably missing in Ezrin-deficient fish. 

      We thank the Reviewer for the recognition of our study and apologize for the insufficient evidence reported in the previous version of the manuscript. As requested by the Reviewer, we considerably expanded the number of experiments in the revised manuscript to support the role of the EZRIN/EGFR/TSC molecular network in regulating the autophagy pathway. Furthermore, following the Reviewer’s comment, we have expanded the interpretation of our findings in the “Discussion” section. We hope the new version of our manuscript will address the Reviewer’s concerns.

      Weaknesses: 

      LC3 is used as a readout of autophagy, however the lipidated/unlipidated LC3 ratio generally does not appear to change, thus there does not appear to be evidence that Ezrin loss is affecting autophagy in this study. 

      We certainly agree with the Reviewer on the importance of this issue and apologize for the lack of clarity. Ezrin is already widely characterized as a protein participating in the autophagy pathway. Several studies, including our previous studies, demonstrated that both silencing and pharmacological inhibition of Ezrin may promote autophagy by activating TFEB, in part through the TRPML1-calcineurin signaling pathway (Naso et al 2020; Intartaglia et al 2022; Lou et al 2024). However, how Ezrin controls autophagy has not yet been fully elucidated. As suggested by the Reviewer, to reinforce our data, we have now fixed this inaccuracy by better elucidating this aspect in the revised manuscript. Accordingly, we have monitored the autophagic flux and LC3 expression level following the guidelines for the use and interpretation of assays for monitoring autophagy (4th edition) by Klionsky et al. 2021. The data presented in the new Figure supplement 1 now better support the notion that depletion of Ezrin increases autophagic flux. We hope the new version of our manuscript will address the Reviewer’s concerns.

      The conclusion is drawn that Ezrin loss suppresses EGF signaling, however this is complicated by a strong increase in phosphorylation of the p38 MAPK substrate MK2. Without additional characterization of MAPK and Erk signaling, the effect of Ezrin loss remains unclear.  Causative conclusions between effects on MAPK, Akt, and mTORC1 signaling are frequently drawn, but the data only demonstrate correlations. For example, many signaling pathways can activate mTORC1 including MAPK/Erk, thus reduced mTORC1 activity upon Ezrin-loss cannot currently be attributed to reduced Akt signaling. Similarly, other kinases can phosphorylate TSC2 at the sites examined here, so the conclusion cannot be drawn that Ezrin-loss causes a reduction in Akt-mediated TSC2 phosphorylation.

      We agree with the Reviewer that this is an interesting and important question. However, we respectfully disagree that it should be addressed here: we feel that additional studies on both the MAPK and ERK pathways, as the Reviewer suggests, are outside the scope of this manuscript. We therefore prefer to address these questions in future studies. Following the Reviewer’s comment, we have nevertheless expanded the interpretation of our findings in the “Discussion” section. We hope the new version of our manuscript will address the Reviewer’s concerns.

      In Figure 7, the conclusion cannot be drawn that retinal degeneration results from aberrant EGFR signaling.

      We certainly agree with the Reviewer on the importance of this issue. We have now fixed this inaccuracy by adding TUNEL staining, which showed retinal degeneration in Ezrin KO medaka fish. The results of these assays are described in the Results section and documented in revised Figure 7, panel H.

      It is unclear why TSC1 is highlighted in the title, as there does not appear to be any specific regulation of TSC1 here. 

      We modified the title accordingly.

      In Figure 1 the conclusion is drawn that there is an increase in lysosome number with Ezrin KO, however it does not appear that the current analysis can distinguish an increased number from increased lysosome size or activity. Similarly, conclusions about increased lysosome "biogenesis" could instead reflect decreased turnover.

      Following this Reviewer’s observation, we changed the text according to his/her suggestion.

      Immunoprecipitation data for a role for Ezrin as a signaling scaffold appear minimal and seem to lack important controls.

      We apologize for these inaccuracies. We have now carried out new experiments to further support our findings. Moreover, all blots were replaced with better-exposed images, and the controls are shown in the revised Figures.

      In Figure 3A it seems difficult to conclude that EGFR dimerization is reduced since the whole blot, including the background between lanes, is lighter on that side.

      We have now fixed this inaccuracy. The blots in revised Figure 3, panel A, were replaced with better-exposed images and quantified.

      In Figure 6C specificity controls for the TSC1 and TSC2 antibodies are not included but seem necessary since their localization patterns appear very different from each other in WT cells.

      We apologize for the confusion. We have now corrected this mistake and revised all panels in Figure 6C (now Figure 6D) for consistency between figures and text. Concerning the specificity of the TSC1 and TSC2 antibodies and staining, the antibody labelling showed the typical pattern of the TSC complex in cells, as reported in Menon et al. 2014. We would like to point out that the antibodies are the same as those used in Menon et al. 2014, and that our data are based not only on TSC1 and TSC2 staining but also on a considerable number of in vivo and in vitro experiments in which many different markers were used across several complementary approaches (i.e. immunofluorescence, western blot analysis, Omics, etc.).

      Menon S, Dibble CC, Talbott G, Hoxhaj G, Valvezan AJ, Takahashi H, Cantley LC, Manning BD. Spatial control of the TSC complex integrates insulin and nutrient regulation of mTORC1 at the lysosome. Cell. 2014 Feb 13;156(4):771-85.

      In Figure 7 the signaling effects in Ezrin-deficient fish are mild compared to cultured cells, and effects on mTORC1 are not examined. Further data on the retinal cell phenotypes would strengthen the conclusions.

      We thank the Reviewer for his/her comment. We have now fixed this inaccuracy in the revised manuscript. We added the analysis of p4EBP1 (S65), an mTORC1 substrate, in Figure 7, panel D.

      In Figure 7F there appears to be more EGFR throughout the cell, so it is difficult to conclude that more EGFR at the PM in Ezrin-/- fish means reduced internalization. 

      We agree with the Reviewer that this is an important question, which helped us to improve the quality of the data presented. As correctly noted by the Reviewer, the EGFR protein level is increased due to EZRIN deletion. This is evident in Figure 7 panel F, in line with both the proteomic analysis and the in vitro experiments (Figure 2I; Figure 3E; Figure 5C). We also agree that the increase in EGFR protein level could strengthen the background of the immunofluorescence. Therefore, to better represent EGFR membrane translocation on flat-mount RPE from the medaka lines, we added a highlighted box showing it in both WT and KO medaka lines in the revised Figure 7 panel F.

      Reviewer #3 (Public Review): 

      Summary: 

      In this study, the authors have attempted to demonstrate a critical role for the cytoskeletal scaffold protein Ezrin, in the upstream regulation of EGFR/AKT/MTOR signaling. They show that in the absence of Ezrin, ligand-induced EGFR trafficking and activation at the endosomes is perturbed, with decreased endosomal recruitment of the TSC complex, and a corresponding decrease in AKT/MTOR signaling. 

      Strengths: 

      The authors have used a combination of novel imaging techniques, as well as conventional proteomic and biochemical assays to substantiate their findings. The findings expand our understanding of the upstream regulators of the EGFR/AKT/MTOR signaling and lysosomal biogenesis, appear to be conserved in multiple species, and may have important implications for the pathogenesis and treatment of diseases involving endo-lysosomal function, such as diabetes and cancer, as well as neuro-degenerative diseases like macular degeneration. Furthermore, pharmacological targeting of Ezrin could potentially be utilized in diseases with defective TFEB/TFE3 functions like LSDs. While a majority of the findings appear to support the hypotheses, there are substantial gaps in the findings that could be better addressed. Since Ezrin appears to directly regulate MTOR activity, the effects of Ezrin KO on MTOR-regulated, TFEB/TFE3-driven lysosomal function should be explored more thoroughly. Similarly, a more convincing analysis of autophagic flux should be carried out. Additionally, many immunoblots lack key controls (Control IgG in co-IPs) and many others merit repetition to either improve upon the quality of the existing data, validate the findings using orthogonal approaches, or provide a more rigorous quantitative assessment of the findings, as highlighted in the recommendation for authors.

      We thank the Reviewer for the recognition of our study and apologize for the previous inaccuracies. We also greatly appreciate the Reviewer’s efforts and help in improving our manuscript. As requested by the Reviewer, we considerably expanded the number of experiments in the revised manuscript to support the role of the EZRIN/EGFR/AKT network in controlling the mTORC1 pathway. We hope the new version of our manuscript will address the Reviewer’s concerns.

      Reviewer #1 (Recommendations for The Authors):

      Major comments: 

      (1) While the authors show that, in the absence of Ezrin, TSC accumulates on the lysosome and suppresses mTORC1 signaling, they should perform additional genetic experiments to strengthen their conclusions. Can they knockout or knockdown TSC1/2 in Ezrin-deficient cells to rescue mTORC1 activity? Can they mutate the lysosomal localization signal on TSC1 (TSC1Q149E/R204E/K238E) in Ezrin-deficient cells to rescue mTORC1 activity? Does constitutively active AKT (myr-AKT or AKT-E40K) restore mTORC1 activity in Ezrin-deficient cells? 

      We agree with the Reviewer that this is an important concern, which helped us to improve the quality of the data presented. In the revised Figure supplement 4F, we now provide the results of pharmacological inhibition of Ezrin in MEF-TSC2 KO cells. In line with our findings, the lack of TSC2 is able to rescue mTORC1 signaling in the absence of Ezrin activity. Thus, these data strongly support that Ezrin is required for the mTORC1 pathway via TSC complex targeting.

      (2) In the absence of Ezrin, TSC1 constitutively localizes on the lysosome and suppresses mTORC1. Does this suppression hold in the presence of other mTORC1-activating signals (i.e., amino acids, insulin, oxygen)? 

      Following the reviewer’s suggestion, we now provide this information in the revised Figure 6C, in which we show that stimulation with insulin does not exert its activating effect on mTORC1 signaling (i.e. phosphorylation of pP70 S6 - pT389). These new data, together with the experiments on MEF TSC2 KO cells, clearly support the model in which Ezrin works as a scaffold protein connecting AKT signaling to the TSC complex. The lack of Ezrin induces a disconnection between AKT and the TSC complex, which is translocated to lysosomes and is insensitive to inhibition of AKT signaling.

      (3) In Figure 3A, the authors showed EGFR dimerization through a western blot of a crosslinking assay. However, the western blot data are unclear and do not strongly support their statement. Additionally, the authors mentioned that the dimerization is confirmed by immunofluorescence analysis, but this statement should be revised since the imaging analysis only indirectly shows the copresence of EZR and EGFR, not necessarily the dimerized EGFR. The authors should perform additional experiments to strengthen their claim or tone down their statements in the text and model figure. 

      We certainly agree with the Reviewer on the importance of this issue and have now fixed this inaccuracy in the revised manuscript. The crosslinking blots were replaced with better-exposed images in revised Figure 3, panel A. Moreover, we also properly quantified the signals to support our conclusion.

      (4) It is interesting that Ezrin binds EGFR, AKT, and TSC as a scaffolding protein. To define the mechanisms by which Ezrin interacts with AKT, EGFR, and TSC, can the authors perform domain analyses to determine which regions of Ezrin are required for its binding with AKT, EGFR, and TSC in mediating EGFR-AKT-TSC-mTORC1 signaling? 

      We thank the Reviewer for his/her comment, which improves our manuscript. Conducting domain analysis in the lab would be ideal, although this seems to us a long tour de force that might be associated with several technical and experimental issues. However, in silico approaches provide a helpful alternative for generating initial hypotheses about domain-domain interactions, though they should be seen as a starting point rather than a complete solution. Recent advances in fold prediction suggest that AlphaFold3 could be used to predict dimer formation and, consequently, domain-domain interactions. However, such an approach is challenging in this case because some of the considered proteins are transmembrane, and all are prone to form multimeric complexes with multiple partners, making them poor candidates for reliable fold predictions. In fact, the predicted dimers are poorly supported, and AlphaFold3 lacks confidence in the relative positioning of interactors, limiting its interpretability. Alternatively, database mining and machine-learning methods, such as HINT, Domine, and PPIDomainMiner, provide more robust evidence. Indeed, these tools allow us to consistently identify a strong interaction between Ezrin's FERM central domain and EGFR's PK domain, now shown in Figure supplement 2C and Figure supplement 3C-H. Importantly, these findings generate valuable hypotheses, but experimental validation is still necessary, and we prefer to leave it for future studies.

      Minor Comments: 

      (1) There are several immunoblots that did not have adequate controls:

      - In Figure 2D, an input lane should be shown for each of the cell lysates to demonstrate the presence of other proteins in the cell lysate used for the IP.

      We have now fixed this inaccuracy in the revised manuscript.

      - Figure 3A does not have a loading control. Also, immunoblot quality should be significantly improved.

      We have now fixed this inaccuracy in the revised manuscript.

      - The HER2 western blot in Figure 5C does not accurately represent the data shown in the quantification graph.

      We have now fixed this inaccuracy by replacing HER2 western blot in the revised Figure 5C.

      - In Figure 6A, the authors should include an input as a control for the IP. To further support their claim in the model figure, can the authors also probe the IP lysate for Ezrin and Tsc2? If all are indeed in a complex together, they should be present. 

      Following this Reviewer’s observation, we added the input as a control for the IP in the revised Figure 6A. Moreover, we included the immunoprecipitation data for the EZRIN and TSC2 interaction (Figure 6A).

      - Phosphorylation sites across figures should be uniformly annotated for consistency and ease of understanding, e.g., pTSC2(S939), pS6K1(T389), and pAKT(S473).

      We have now fixed this inaccuracy in the revised text.

      (2) There are several microscopy data that lack adequate quantification. For instance, Figures 2E, 2F, 3C, 4A, 5A, and 6F only show very few cells as representative images, which is not sufficient to support their claims. 

      We thank the Reviewer for his/her comment, which improves our manuscript. Accordingly, we added adequate quantification and statistical analysis to the revised Figures.

      (3) Some suggestions to improve the readability of the manuscript: 

      -  In the abstract (line 32): "Loss of Ezrin was deficient in TSC repression by EGF and culminated in translocation of TSC to lysosomes triggering suppression of mTORC1 signaling." The wording is somewhat confusing, please change such as "Loss of Ezrin was not sufficient to repress TSC by EGF and culminated..." or "Loss of Ezrin blunted EGF-induced TSC suppression and culminated..." 

      We apologize for the lack of clarity and now we have fixed this inaccuracy by better elucidating this aspect in the revised manuscript.

      -  Figure 3D has a typo in the western blot labeling. Please change Citosol to Cytosol. 

      We have now fixed this inaccuracy in the revised text.

      -  Line 291: "Moreover, TSC2 resulted activated and AKT/mTOR signaling..." The wording is confusing. 

      We have now fixed this inaccuracy in the revised text. The text now reads: “Moreover, we found that TSC2 was dephosphorylated in response to light in the retina, when inactive Ezrin (Naso et al., 2020) and EGFR are weakly expressed (Figure supplement 6C) as a consequence of a decrease of the AKT/mTORC1 signaling…”

      -  The model in Figure 8 indicates that upon EGF stimulation, the activated Ezrin interacts with EGFR, causing its dissociation from actin filaments and leading to its endosome incorporation. However, the authors did not provide supporting data for this claim. Can the authors either cite literature or provide data for this? Otherwise, the model should be edited to remove actin filaments in the model. 

      We have now fixed this inaccuracy by removing actin filaments in the revised model.

      Reviewer #2 (Recommendations For The Authors):

      The data and written text seem to deal entirely with mTORC1, rather than mTORC2, thus it seems "mTOR" should be changed to "mTORC1" throughout. 

      We have now fixed this inaccuracy in the revised manuscript.

      For clarification, the TSC protein complex should be referred to as the "TSC complex", whereas "TSC" generally refers to the tumor syndrome Tuberous Sclerosis Complex.

      We have now fixed this inaccuracy in the revised manuscript.

      Quantification of colocalization would be helpful in all the panels where it is currently missing.

      We thank the Reviewer for his/her comment that improves our manuscript. Accordingly, we have added quantification of colocalization for each immunofluorescence panel in the revised Figures.

      Line 84 typo "thorough" should be "through" 

      We have now fixed this inaccuracy in the revised manuscript.

      Line 178 - typo 

      We have now fixed this inaccuracy in the revised manuscript.

      Line 209 - typo 

      We have now fixed this inaccuracy in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors): 

      Fig. 1 The data showing an increase in lysosomal biogenesis suggests an increase in transcriptional activity. This should be confirmed by one or more of the following: 1) Increased TFEB/TFE3 nuclear localization following EZR loss, 2) Increased CLEAR promoter luciferase activity assays, 3) Increased expression of multiple CLEAR transcripts (https://www.science.org/doi/10.1126/science.1174447) or 4) Increased TFEB/ TFE3/ CLEAR gene signatures by RNA seq. Similarly, data showing increased autophagic flux should be confirmed in the presence of chloroquine or bafilomycin. 

      We agree with the Reviewer that this is an important concern, and addressing it has improved the quality of the data presented. It is well established that nuclear translocation is a major mechanism regulating TFEB activity. We have now carried out new experiments demonstrating that depletion of Ezrin induces TFEB nuclear translocation in Ezrin<sup>-/-</sup> cells. These findings are in line with our previous data in which pharmacological inhibition and silencing of Ezrin induced the same cellular phenotype. We also apologize for the confusion: we had already carried out experiments with Bafilomycin to confirm the increase in autophagic flux. We have therefore replaced the autophagic flux blots with better-exposed images in revised Figure supplement 1H and modified the text to emphasize these findings.

      Fig 2D, the lanes with EZR -/- cells expressing the EZR mutants should be repeated on the same gel as the first 2 lanes (with the WT and EZR<sup>-/-</sup> cells) 

      We thank the Reviewer for his/her comment that improves our manuscript. To avoid any confusion in describing these results, we have now modified Figure 2D, providing the required controls in response to Reviewers #1 and #2. We hope the new version of our data addresses the Reviewer’s concerns.

      Fig 2F- The presence of reduced EGFR in intracellular compartments in Ezrin KO/ -/- cells should be quantified, and shown for a 2nd EZR null cell line as well (Ezrin null MEFs) 

      We have added EGFR quantification to Figure 2F. We have also carried out new experiments demonstrating that EGFR is localized on the cytoplasmic membrane in Ezrin KO MEFs (Figure supplement 2H).

      Fig 2G, did the authors test the effects of EZR depletion on basal and EGF stimulated EGFR autophosphorylation on Y1068 and Y1045 as well as downstream activation of p42/44 ERK MAPK?  Those should be tested in the HeLa system as well as the MEFs cells with EZR KO. 

      Following the Reviewer’s request, we have now added western blot data for EGFR autophosphorylation on Y1068 and p42/44 ERK MAPK in Figure 5C. Moreover, we have now added western blot data for p42/44 ERK MAPK in MEF cells in Figure supplement 2F. However, we cannot provide data for EGFR autophosphorylation on Y1068 in MEF cells, because the antibody did not work on proteins from these cells.

      Also, why would HER3 levels be expected to decrease? There seems to be minimal change in HER3 expression. Also, the significance of increased MK2 phosphorylation should be further elaborated. 

      The Reviewer raised justified concerns about HER3 and MK2. We have now discussed these aspects in the Results section.

      Fig 3A- Crosslinking of EGFR is not very apparent in this blot. The crosslinking blots should be repeated 3 times and quantified. 

      We certainly agree with the Reviewer on the importance of this issue and have addressed it in the revised manuscript. We have replaced the crosslinking blots with better-exposed images in revised Figure 3, panel A. Moreover, we have quantified the signals to support our conclusion.

      Fig 3D- How were membrane endosomes isolated? This should be stated in the methods. Membrane/ Cytosol and Endosome fractionation showing EGFR levels should be shown in Ezrin null MEFs as well, and membrane expression should be further substantiated with surface biotinylation for cell surface EGFR. 

      We now report more information about the method that we used for membrane endosome isolation in the Materials and Methods section. Following the Reviewer’s request, we also show that EGFR was not localized on endosomes upon EGF stimulation in Ezrin null MEFs. These data are reported in the new revised Figure Supplement 2G. Moreover, we have now carried out new experiments demonstrating the membrane localization of EGFR in MEF Ezrin KO cells. These findings are shown in Figure supplement 2H.

      Fig 5C: Similar to 2G, EGFR autophosphorylation on Y1068 and Y1045 should also be measured, as well as downstream activation of p42/44 ERK MAPK? 

      Following the Reviewer’s request, we have now carried out new experiments to assess the EGFR autophosphorylation on Y1068 and Y1045, as well as downstream activation of p42/44 ERK MAPK.  We added these new data in the revised Figure 5C, accordingly. 

      Fig 5D: Similar to 3D, Membrane/ Cytosol and Endosome fractionation showing EGFR levels should be shown in Ezrin null MEFs as well, and further substantiated with surface biotinylation for cell surface EGFR. 

      Following the Reviewer’s request, we show that EGFR was not localized on endosomes upon EGF (Figure Supplement 2G). 

      Supplement 2E: The blots show lower expression of EGFR and higher MAPK activation in EZR KO cells, contradicting the data in the other cells. 

      We apologize for the confusion. The error occurred during the preparation of Figure supplement 2E, which showed an image from an earlier, non-final version of the Figure. We have now removed the erroneous image and replaced it with the correct WB panel.

      Supplement 2F: The authors should repeat the NSC668394 experiment using: 1) multiple doses, 2) In both the Ezrin KO and null cell lines 3) and repeat 3X to quantify differences in total EGFR. 

      We respectfully disagree with the Reviewer and feel that addressing this point by additional studies on dose response of NSC668394, as the Reviewer suggests, is outside the scope of this manuscript. However, we would like to point out that we have already conducted extensive studies on the dose-response effects of NSC668394 administration in vitro (Patent: WO2020070333A1). 

      Moreover, we apologize for not having provided enough information about the number of independent biological replicates for WB analyses. To fill this gap, we have expanded the Materials and Methods section accordingly.

      Patent: WO2020070333A1 - Ezrin inhibitors and uses thereof

      Fig 6A: The IP experiments should be repeated with Control IgG 

      We have now fixed this inaccuracy in the revised manuscript.

      Typos: 

      (1) Figure 3D: Citosol 

      We have now fixed this inaccuracy in the revised manuscript.

      (2) Line 216-217: "increased EGFR protein levels on purified membranes and endosomes (Figure 3D and E)" - That should be decreased EGFR on endosomes in accordance with Figure 3D (lower panels) 

      We have now fixed this inaccuracy in the revised manuscript.

      (3) Abstract: "Consistently, Medaka fish deficient for Ezrin exhibit defective endo-lysosomal pathway" 

      We have now fixed this inaccuracy in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Shen et al. conducted three experiments to study the cortical tracking of the natural rhythms involved in biological motion (BM), and whether these involve audiovisual integration (AVI). They presented participants with visual (dot) motion and/or the sound of a walking person. They found that EEG activity tracks the step rhythm, as well as the gait (2-step cycle) rhythm. The gait rhythm specifically is tracked superadditively (power for A+V condition is higher than the sum of the A-only and V-only condition, Experiments 1a/b), which is independent of the specific step frequency (Experiment 1b). Furthermore, audiovisual integration during tracking of gait was specific to BM, as it was absent (that is, the audiovisual congruency effect) when the walking dot motion was vertically inverted (Experiment 2). Finally, the study shows that an individual's autistic traits are negatively correlated with the BM-AVI congruency effect.
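
      The superadditivity criterion described in this summary (A+V power exceeding the sum of A-only and V-only power) can be written as a simple index. The sketch below is only an illustration: the function name, the dictionary layout, and all numbers are hypothetical and are not taken from the study.

```python
from statistics import fmean

def superadditivity_index(power_av, power_a, power_v):
    """Return AV power minus the sum of the unimodal powers.

    A positive value means the audiovisual response exceeds the additive
    prediction (A + V); zero or negative means it does not.
    """
    return power_av - (power_a + power_v)

# Hypothetical per-participant spectral power at the gait frequency
# (arbitrary units); illustrative numbers only, not data from the study.
participants = [
    {"av": 1.9, "a": 0.6, "v": 0.8},
    {"av": 1.4, "a": 0.5, "v": 0.7},
    {"av": 1.1, "a": 0.6, "v": 0.6},
]

indices = [superadditivity_index(p["av"], p["a"], p["v"]) for p in participants]
mean_index = fmean(indices)  # > 0 suggests superadditive tracking on average
```

      In practice the per-participant indices would be tested against zero across the group; the index itself only expresses the comparison named in the summary.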

      Strengths:

      The three experiments are well designed and the various conditions are well controlled. The rationale of the study is clear, and the manuscript is pleasant to read. The analysis choices are easy to follow, and mostly appropriate.

      Weaknesses:

      On revision, the authors are careful not to overinterpret an analysis where the statistical test is not independent from the data (channel) selection criterion.

      Thanks for the suggestion and we have done this according to your recommendations below.

      Reviewer #1 (Recommendations for the authors):

      Re: the double-dipping concern: I appreciate the revision. Just to clarify: my concern rests with the selection of *electrodes* based on the interaction test for the 1Hz condition. The 2Hz condition analogous test yields no significant electrodes. You perform subsequent tests (t-tests and 3-way interaction) on the data averaged across the electrodes that were significant for the 1Hz condition. Therefore, these tests will be biased to find a pattern reflecting an interaction at 1Hz, while no similar bias exists for an effect at 2Hz. Therefore, there is a bias to observe a 3-way interaction, and simple effects compatible with a 2-way interaction only for 1Hz, not for 2Hz (which is exactly what you found). There is no good statistical alternative here, I appreciate that, but the bias exists nonetheless. I think the wording is improved in this revision, and the evidence is convincing even in light of this bias.

      We are grateful for your thoughtful comments on the analytical methods. We appreciate your concerns regarding the potential bias of examining 3-way interaction based on electrodes yielding a 2-way interaction effect. To address this issue, we have conducted a bias-free analysis based on electrodes across the whole brain. The results showed a similar pattern of 3-way interaction as previously reported (p = 0.051), suggesting that the previous findings might not be caused by electrode selection. Given that the main results of Experiment 2 were not based on whole-brain analysis, we did not include this analysis in the main text, and we have removed the three-way interaction results based on selected electrodes from the manuscript to reduce potential concerns. It is also noteworthy that, when performing analyses based on channels independent of the interaction effect at 1 Hz (i.e., significant congruency effects in the upright and inverted conditions, respectively, at 2 Hz), we obtained similar results as reported in the main text (i.e., non-significant interaction and correlation at 2 Hz). These results were presented in the supplementary file in previous versions and mentioned in the correlation part of the Results section (see Fig. S2). Once again, we sincerely appreciate your careful review of our research. We hope the above points adequately address your concern.
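
      The selection bias discussed in this exchange can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' analysis: the data are pure Gaussian noise (no true effect at either frequency), electrodes are treated as independent, and the sample size, electrode count, and function names are all hypothetical. It compares a follow-up test on electrodes chosen because their 1 Hz contrast looked significant with the same test averaged over all electrodes.

```python
import math
import random
import statistics

T_ONE_TAILED = 1.729  # critical t, df = 19, one-tailed alpha = 0.05
T_TWO_TAILED = 2.093  # critical t, df = 19, two-tailed alpha = 0.05

def t_stat(values):
    """One-sample t statistic against zero."""
    n = len(values)
    return statistics.fmean(values) / (statistics.stdev(values) / math.sqrt(n))

def mean_difference(hz1, hz2, electrodes):
    """Per-subject 1 Hz minus 2 Hz contrast, averaged over the given electrodes."""
    return [statistics.fmean([row1[e] - row2[e] for e in electrodes])
            for row1, row2 in zip(hz1, hz2)]

def simulate(n_sims=500, n_subjects=20, n_electrodes=32, seed=0):
    """Return false-positive rates (circular selection, all electrodes)."""
    rng = random.Random(seed)
    all_electrodes = list(range(n_electrodes))
    hits_selected = runs_selected = hits_all = 0
    for _ in range(n_sims):
        # Noise-only contrasts per subject and electrode: no true effect anywhere.
        hz1 = [[rng.gauss(0, 1) for _ in all_electrodes] for _ in range(n_subjects)]
        hz2 = [[rng.gauss(0, 1) for _ in all_electrodes] for _ in range(n_subjects)]
        # Circular step: keep electrodes whose 1 Hz contrast looks significant.
        chosen = [e for e in all_electrodes
                  if t_stat([row[e] for row in hz1]) > T_ONE_TAILED]
        if chosen:
            runs_selected += 1
            hits_selected += abs(t_stat(mean_difference(hz1, hz2, chosen))) > T_TWO_TAILED
        # Selection-free comparison: average over every electrode.
        hits_all += abs(t_stat(mean_difference(hz1, hz2, all_electrodes))) > T_TWO_TAILED
    return hits_selected / runs_selected, hits_all / n_sims

biased_rate, unbiased_rate = simulate()
```

      Under these assumptions the rate for the selected electrodes tends to sit well above the nominal 5%, while the all-electrode rate stays close to it, which is the asymmetry the reviewer describes for 1 Hz versus 2 Hz.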

      Reviewer #2 (Public review):

      Summary:

      The authors evaluate spectral changes in electroencephalography (EEG) data as a function of the congruency of audio and visual information associated with biological motion (BM) or non-biological motion. The results show supra-additive power gains in the neural response to gait dynamics, with trials in which audio and visual information was presented simultaneously producing higher average amplitude than the combined average power for auditory and visual conditions alone. Further analyses suggest that such supra-additivity is specific to BM and emerges from temporoparietal areas. The authors also find that the BM-specific supra-additivity is negatively correlated with autism traits.

      Strengths:

      The manuscript is well-written, with a concise and clear writing style. The visual presentation is largely clear. The study involves multiple experiments with different participant groups. Each experiment involves specific considered changes to the experimental paradigm that both replicate the previous experiment's finding yet extend it in a relevant manner.

      In the first revisions of the paper, the manuscript better relays the results and anticipates analyses, and this version adequately resolves some concerns I had about analysis details. In a further revision, it is clarified better how the results relate to the various competing hypotheses on how biological motion is processed.

      Weaknesses:

      Still, it is my view that the findings of the study are basic neural correlate results that offer only minimal constraint towards the question of how the brain realizes the integration of multisensory information in the service of biological motion perception, and the data do not address the causal relevance of observed neural effects towards behavior and cognition. The presence of an inversion effect suggests that the supra-additivity is related to cognition, but that leaves open whether any detected neural pattern is actually consequential for multi-sensory integration (i.e., correlation is not causation). In other words, the fact that frequency-specific neural responses to the [audio & visual] condition are stronger than those to [audio] and [visual] combined does not mean this has implications for behavioral performance. While the correlation to autism traits could suggest some relation to behavior and is interesting in its own right, this correlation is a highly indirect way of assessing behavioral relevance. It would be helpful to test the relevance of supra-additive cortical tracking on a behavioral task directly related to the processing of biological motion to justify the claim that inputs are being integrated in the service of behavior. Under either framework, cortical tracking or entrainment, the causal relevance of neural findings toward cognition is lacking.

      Overall, I believe this study finds neural correlates of biological motion that offer some constraint toward mechanism, and it is possible that the effects are behaviorally relevant, but based on the current task and associated analyses this has not been shown (or could not have been, given the paradigm).

      Reviewer #2 (Recommendations for the authors):

      Thank you for your revisions; I have updated the Strengths section, and reworded the weaknesses section. I now concede that the neural effects observed offer some constraint towards what the neural mechanisms for AV integration for BM are, whereas in my previous review, I said too strongly that these results do not offer any information about mechanism.

      Thank you again for your insightful thoughts and comments on our research. They have contributed greatly to enhancing the discussion of the article and provided valuable inspiration for future exploration of causal mechanisms.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This paper investigates the mechanism of axon growth directed by the conserved guidance cue UNC-6/Netrin. Experiments were designed to distinguish between alternative models in which UNC-6/Netrin functions as either a short-range (haptotactic) cue or a diffusible (chemotactic) signal that steers axons to their final destinations. In each case, axonal growth cones execute ventrally directed outgrowth toward a proximal source of UNC-6/Netrin. This work concludes that UNC-6/Netrin functions as both a haptotactic and chemotactic cue to polarize the UNC-40/DCC receptor on the growth cone membrane facing the direction of growth. Ventrally directed axons initially contact a minor longitudinal nerve tract (vSLNC) at which UNC-6/Netrin appears to be concentrated before proceeding in the direction of the ventral nerve cord (VNC) from which UNC-6/Netrin is secreted. Time-lapse imaging revealed that growth cones appear to pause at the vSLNC before actively extending ventrally directed filopodia that eventually contact the VNC. Growth cone contacts with the vSLNC were unstable in unc-6 mutants but were restored by the expression of a membrane-tethered UNC-6 in vSLNC neurons. In addition, the expression of membrane-tethered UNC-6/Netrin in the VNC was not sufficient to rescue initial ventral outgrowth in an unc-6 mutant. Finally, dual expression of membrane-tethered UNC-6/Netrin in both vSLNC and VNC partially rescued the unc-6 mutant axon guidance defect, thus suggesting that diffusible UNC-6 is also required. This work is important because it potentially resolves the controversial question of how UNC-6/Netrin directs axon guidance by proposing a model in which both of the competing mechanisms, e.g., haptotaxis vs chemotaxis, are successively employed. The impact of this work is bolstered by its use of powerful imaging and genetic methods to test models of UNC-6/Netrin function in vivo thereby obviating potential artifacts arising from in vitro analysis.

      Strengths:

      A strength of this approach is the adoption of the model organism C. elegans to exploit its ready accessibility to live cell imaging and powerful methods for genetic analysis.

      Weaknesses:

      A membrane-tethered version of UNC-6/Netrin was constructed to test its haptotactic role, but its neuron-specific expression and membrane localization are not directly determined although this should be technically feasible. Time-lapse imaging is a key strength of multiple experiments but only one movie is provided for readers to review.

      Thank you for your comments. We have now used SNAP labeling to directly visualize the localization of membrane-tethered UNC-6 and confirmed that it is detectable only on the sublateral and ventral nerve cords (Figure S3A). These data have been added to the manuscript on page 15, lines 342-347. We have also provided a representative movie for each imaged genotype (Videos S2-10).

      Reviewer #2 (Public Review):

      Nichols et al studied the role of axon guidance molecules and their receptors and how these work as long-range and/or local cues, using in-vivo time-lapse imaging in C. elegans. They found that the Netrin axon guidance system works in different modes when acting as a long-range (chemotaxis) cue vs local cue (haptotaxis). As an initial context, they take advantage of the postembryonic-born neuron, PDE, to understand how its axon grows and then is guided into its target. They found that this process occurs in various discrete steps, during which the growth cone migrates and pauses at specific structures, such as the vSLNC. The role of the UNC-6/Netrin and UNC-40/DCC axon guidance ligand-receptor pair was then looked at in terms of its requirement for

      (1) initial axon outgrowth direction

      (2) stabilization at the intermediate target

      (3) directional branching from the sublateral region or

      (4) ventral growth from the intermediate target to the VNC.

      They found that each step is disrupted in the unc-6/Netrin and unc-40/DCC mutants and observed how the localization of these proteins changed during the process of axon guidance in wild-type and mutant contexts. These observations were further supported by analysis of a mutant important for the regulation of Netrin signaling, the E3 ubiquitin ligase madd-2/Trim9/Trim67. Remarkably, the authors identified that this mutant affected axonal adhesion and stabilization, but not directional growth. Using membrane-tethered UNC-6 to specific localities, they then found this to be a consequence of the availability of UNC-6 at specific localities within the axon growth path. Altogether, this data and in-vivo analysis provide compelling evidence of the mechanistic foundation of Netrin-mediated axon guidance and how it works step by step.

      The conclusions are well-supported, with both imaging and quantification of each step of axon guidance and localization of UNC-6 and UNC-40. Using a different type of neuron to validate their findings further supports their conclusions and strengthens their model. It's not yet known whether this model holds true for other ligand-receptor pairs, but the current work sets the stage for future analysis of other axon guidance molecules using time-lapse in-vivo imaging. There are still two outstanding questions that are important to address to support the authors' model and conclusions.

      (1) The results of UNC-6-TM expression at different locations are clear and support the conclusions but need to consider that there's no diffusible UNC-6 available. What would happen if UNC-6 is tethered to the membrane in an otherwise completely 'normal' UNC-6 gradient. Does the axon guidance ensue normally or does it get stuck in the respective site of the membrane tethered-UNC-6 and doesn't continue to outgrow properly? This is an important control (expression of the UNC-6-TM at the vSLNC or VNC in the wild type background) that would help clarify this question and gain a better insight into the separability of both axon guidance steps and the ability to manipulate these.

      Thank you for your comments. We expressed UNC-6<sup>TM</sup> at the vSLNC and VNC in wild-type animals and examined adult morphology of both HSN and PDE in the control conditions you suggested. These data are available in Tables 1 and 2 and show no statistical differences compared to wild-type animals. We also provide still images of developing PDE axons near the vSLNC (Figure S3D) to confirm that this axon guidance step is intact when UNC-6<sup>TM</sup> is overexpressed in specific regions. Together, these data suggest that the TM rescue constructs do not interfere with endogenous axon guidance pathways. We have added these results to the manuscript on page 15, lines 347-349.

      (2) Axon guidance systems do not work in a vacuum and are generally competing against each other. For example, the SLT-1/Slit and SAX-3/ROBO axon guidance ligand-receptor pair is also required for PDE, and other post-embryonic neurons, axon guidance. It would be interesting to test mutants for these genes with the membrane tethered-UNC-6 to determine if the different steps of axon guidance are disrupted and if so, to what degree these are disrupted.

      Thank you for this suggestion. We have performed time-lapse imaging on slt-1 mutants and unc-6; slt-1 double mutants. These data are available in a new figure, Figure 3. Indeed, we found that slt-1 mutants showed abnormal direction of axon emergence and stabilization at the VNC but normal stabilization at the vSLNC and axonal branching (Fig. 3). These data can be found in the manuscript on pages 11-12, lines 248-269.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript from Nichols, Lee, and Shen tackles an important question of how unc6/netrin promotes axon guidance: i.e. haptotaxis vs chemotaxis. This has recently been a large topic of investigation and discussion in the axon guidance field. Using live cell imaging of unc6/netrin and unc40/DCC in several neurons that extend axons ventrally during development, as well as TM localized mutants of Unc6, they suggest that unc6 promotes first haptotaxis of the emerging growth cone followed by chemotaxis of the growth cone. This is timely, as a recent preprint from the Lundquist group, using a similar strategy to make only a TM anchored unc6 similarly found that this could rescue only the haptotaxis-like growth of the PDE neuron, but not the second phase of growth. However, their conclusions were quite different based on the overexpression of unc6 everywhere rescuing the second phase, and thus they conclude that a gradient is not present.

      Strengths:

      As this has been quite a controversy in both the invertebrate and vertebrate field, one strength of this paper is that they use an unc6-neon green to demonstrate unc6 localization, and show a gradient of localization.

      Weaknesses:

      This is important, although it could be strengthened by first showing a more zoomed-out image of unc6 in the animal, and second demonstrating the localization of the transmembrane anchored unc6 mutants, to help define what may be the "diffusible Unc6".

      Thank you for your comments. We have performed both of these experiments. In Figure 6A, we provide a zoomed out image of PDE growth cone interacting with UNC-6::mNG prior to reaching the vSLNC. Notably, we do not observe an obvious gradient that extends into this more dorsal region of the animal. We have also shown the membrane localization of UNC-6<sup>TM</sup> through SNAP labeling in Figure S3A. These data have been added to the manuscript on page 15, lines 342-347.

      I have two additional experimental or analysis suggestions. First, the authors should clarify the phenotype of ventral emergence of the growth cone. The manuscript images suggest that, no matter the mutant, the growth cone emerges ventrally and defects appear only later, yet the authors claim ventral emergence defects with the UNC6 tethered mutants without providing a comparison of rose plots. This is confusing and needs to be addressed.

      Thank you for your comment. We have now included images (i.e. slt-1(eh15) and unc-6(ev400); slt-1(eh15) genotypes in Figure 3) and movies showing misoriented axon emergence. We have also provided an additional quantification that allows for statistical comparison of emergence angle across genotypes. This quantification takes the sine function of the angle to quantify the relative emergence trajectory across the dorsal-ventral axis. A value of 1 indicates 90° dorsal emergence, and -1 indicates 90° ventral emergence. Statistical comparisons across genotypes demonstrate that axons in both unc-6 and slt-1 mutants are misoriented relative to wild-type axons. These comparisons can be found in Figures S1B, 3C, S2B, S3C.
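
      The sine-based quantification described in this response can be sketched as follows. The angle convention is an assumption, since the response does not state it: here 0° lies along the anterior-posterior axis, positive angles point dorsally, and negative angles point ventrally. The function name and the example angles are hypothetical.

```python
import math

def emergence_score(angle_deg):
    """Project an emergence angle onto the dorsal-ventral axis.

    Assumed convention (not stated in the response): 0 degrees lies along
    the anterior-posterior axis, positive angles are dorsal, negative
    angles are ventral. The score ranges from -1 (90 degrees ventral)
    to 1 (90 degrees dorsal).
    """
    return math.sin(math.radians(angle_deg))

# Hypothetical emergence angles in degrees, for illustration only.
example_angles = {
    "fully dorsal": 90.0,
    "fully ventral": -90.0,
    "along the A-P axis": 0.0,
    "oblique ventral": -30.0,
}
scores = {label: emergence_score(a) for label, a in example_angles.items()}
```

      Collapsing each angle to a single signed value on the dorsal-ventral axis is what allows ordinary statistical comparisons across genotypes, at the cost of discarding the anterior-posterior component shown in a rose plot.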

      Second, I have concerns that the analysis of unc40 polarization may be misleading in some cases when there appears to indeed be accumulation in the growth cone, but since the only analysis shown is relative to the rest of the cell, that can be lost.

      Thank you for sharing your concerns about the UNC-40 polarization quantifications. We have separately compared the value of the integrated density of UNC-40::GFP in each cellular domain (vSLNC-contacting area and the dorsal soma) between genotypes. While we did not include these comparisons in the original manuscript, we have now included them in the revised manuscript. Overall, these data support our conclusions that UNC-40 mispolarization occurs across the entire cell (Fig. S1F,G; S2E-H; S3E,F).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1:

      Comment 1: Within the scope of the current work there are no major weaknesses. That said, the authors themselves note pressing questions beyond the scope of this study that remain unanswered. For instance, the mechanistic nature of the interactions between FMO-4 and the other players in this story, for example in terms of direct protein-protein interactions, is not at all understood yet.

      We thank the reviewer for the positive review, and fully agree and acknowledge that there are unanswered questions for future studies that are beyond the scope of this manuscript.

      Reviewer 2:

      Comment 1: The effects of carbachol and EDTA on intracellular calcium levels are inferred, especially in the tissues where fmo-4 is acting. Validating that these agents and fmo-4 itself have an impact on calcium in relevant subcellular compartments is important to support conclusions on how fmo-4 regulates and responds to calcium.

      We thank the reviewer for this important suggestion. We agree that carbachol and EDTA can be broad agents and that validating their effect on calcium levels is very useful. While this is technically challenging, we attempted to address it by using worms that express the calcium indicator GCaMP7f in neurons and measuring their GFP fluorescence upon exposure to carbachol and EDTA. Assessing both short-term and long-term exposure to these agents, we were able to show that carbachol increases GFP fluorescence, indicating an increase in calcium levels, and EDTA decreases GFP fluorescence, indicating a decrease in calcium levels. Unfortunately, because FMO-4 is not neuronally expressed, we were not able to test the effects of FMO-4 on calcium in this strain, which would require hypodermal expression and possibly short-term modification of fmo-4 expression to test. We have made sure to temper our language about the indirect measures we used.

      Comment 2: Experiments are generally reliant on RNAi. While in most cases experiments reveal positive results, indicating RNAi efficacy, key conclusions could be strengthened with the incorporation of mutants.

      We appreciate and value this suggestion and agree that mutants could be helpful to strengthen our conclusions. We address this caveat in the discussion of the revised manuscript. We explain that we were concerned about knocking out key calcium regulating genes like itr-1 and mcu-1 that either already result in some level of sickness in the worms when knocked down (itr-1) or could lead to confounding metabolic changes if knocked out. We do find that our RNAi lifespan results are robust and reproducible, but we also understand and recognize the caveats that come with using RNAi knockdown instead of full deletion mutants.

      Reviewer 3:

      Comment 1: no obvious transcriptomic evidence supporting a link between fmo-4 and calcium signaling: either for knockout worms or fmo-4 overexpressing strains.

      We thank the reviewer for this feedback. While there is some transcriptomic evidence, we agree that it is not overwhelming. We do think that this evidence, combined with the phenotype observed under thapsigargin (i.e., significant reduction in worm size and significant delay or prevention of development) and the genetic connections to calcium regulation, provides additional compelling evidence that FMO-4 interacts with calcium signaling.

      Comment 2: no direct measures of alterations in calcium flux, signalling or binding that strongly support a connection with fmo-4.

      As described in reviewer 2 comment 1, we have successfully used GCaMP7f worms to assess calcium flux upon exposure to carbachol and EDTA. This approach confirmed the changes in calcium expected from these compounds. Unfortunately, because FMO-4 is not neuronally expressed, we were not able to test the effects of FMO-4 on calcium in this strain, which would require hypodermal expression and possibly short-term modification of fmo-4 expression to test. We have made sure to temper our language about the indirect measures we used.

      Comment 3: no measures of mitochondrial morphology or activity that strongly support a connection with fmo-4.

      This is a great point, and something we are currently working on to include for a future manuscript. 

      Comment 4: lack of a complete model that places fmo-4 function downstream of DR and mTOR signalling (first Results section), fmo-2 (second Results section) and at the same time explains connection with calcium signalling.

      We thank the reviewer for this helpful feedback. We have included a more complete working model in our revision.

      Recommendations for the authors:

      Reviewer 1:

      Comment 1: "We utilized fmo-4 (ok294) knockout (KO) animals on five conditions reported to extend lifespan in C. elegans." Here I believe "fmo-4 (ok294)" should be "fmo-4(ok294)". (No space).

      We thank the reviewer for this helpful revision. We have made this change as suggested.

      Comment 2: "Wild-type (WT) worms on DR experience a ~35% lifespan extension compared to fed WT worms, but when fmo-4 is knocked out this extension is reduced to ~10% and this interaction is significant by cox regression (p-value < 4.50e-6)." Here "cox regression" should be "Cox regression".

      We have made this change as suggested.

      Comment 3: "Having established this role, we continued lifespan analyses of fmo-4 KO worms exposed to RNAi knockdown of the S6-kinase gene rsks-1 (mTOR signaling), the von hippel lindau gene vhl-1 (hypoxic signaling), the insulin receptor daf-2 (insulin-like signaling), and the cytochrome c reductase gene cyc-1 (mitochondrial electron transport chain, cytochrome c reductase) (Fig 1C-F)." Here "von hippel lindau" should be "Von Hippel-Lindau".

      We have made this change as suggested.

      Comment 4: In three instances in the caption of Figure 5, the "4" in fmo-4 is not italicized when it should be.

      We have made this change as suggested.

      Comment 5: In two instances in the caption of Figure 7, the "4" in fmo-4 is not italicized when it should be, and in one instance in the caption of Figure 7, the "6" in atf-6 is not italicized when it should be.

      We have made this change as suggested.

      Comment 6: "Supplemental Data 3 provides the results of the Log-rank test and Cox regression analysis, which were run in Rstudio." Here Rstudio should be RStudio.

      We have made this change as suggested.

      Comment 7: In the references, within article titles italicization (e.g. of Caenorhabditis elegans) is frequently missing. While this is often an artifact introduced by reference management software, it should be corrected in the final manuscript.

      We thank the reviewer for all the helpful revision suggestions. We have made sure all the references are properly italicized where necessary.

      Reviewer 2:

      Comment 1: While FMO-4 is clearly placed in the ER calcium pathway genetically, the molecular mechanism by which FMO-4 would alter ER calcium is unclear. Notably, Tuckowski et al. highlight this gap in the discussion as well.

      We thank the reviewer for identifying this important caveat. We hope to address the molecular mechanism by which FMO-4 alters ER calcium in upcoming projects.

      Comment 2: Determining whether overexpression of catalytically dead FMO-4 or introduction of an inactivating point mutant into the endogenous locus phenocopy FMO-4 OE and KO animals would help distinguish between mechanisms involving protein-protein interactions or downstream metabolic regulation.

      We thank the reviewer for this valuable suggestion. This is an experiment we are hoping to do in the near future to better understand molecular mechanisms and protein-protein interactions.

      Reviewer 3:

      Comment 1: When measuring the effect of thapsigargin on development of fmo-4 mutants it would be great to use a developmental assay rather than quantifying normalized worm area. Also please add scale bars to Figure 3G and 4H, it seems that fmo-4 overexpression decreases worm size even in control conditions, clarify if this is the case.

      We thank the reviewer for this feedback. In addition to quantifying normalized worm area in Figure 3G-I, we have added a developmental assay (Figure 3J) that shows the development time of wild-type worms on DMSO or thapsigargin as well as the fmo-4 OE worms on DMSO or thapsigargin. These data validate that the fmo-4 OE worm development is either delayed significantly or even prevented when the worms are treated with thapsigargin.

      We have added scale bars to Figure 3G and 4H as suggested.

      We also appreciate the reviewer’s observation of the fmo-4 overexpression worms appearing smaller than wild-type worms in control conditions. We looked through the replicates and found that just one replicate showed a significant decrease in worm size, as observed in our unrevised manuscript. We repeated this experiment twice more to gather more data and determined that the fmo-4 overexpression worms were ultimately not significantly different in size compared to wild-type worms. We have included the new images and quantifications in Figure 3G-I and Figure 4H-J in the revised manuscript.

      Comment 2: correct or replace Supplementary Table 2, which is not showing a DAVID analysis as the title and text would suggest. We should see biological/molecular processes, effect sizes, p-values, ...

      We thank the reviewer for identifying this issue. We have added more detail to the Supplementary Table 2 so that it is clearer what is being shown in each tab.

      Comment 3: clarify the data presented in Supplementary Data 2 because it does not clearly explain what is shown

      This is a great point, and we have added more detail to the Supplementary Data 2 to make sure the data are more clearly explained in each tab.

      Comment 4: in Figure 5B the fluorescent images do not seem to reflect the quantification in panel 5C.

      Thank you for this feedback. We re-analyzed our data to make sure the proper fluorescent images are included with their matching quantifications in Figure 5B-C.

      Comment 5: where is Supplementary Data 3?

      We thank the reviewer for noticing this. Supplementary Data 3 was accidentally missing from the first submission, and has now been added.

      Comment 6: conceptually the last results section (regarding atf-6) does not add much to the story, I would consider removing these results

      We appreciate this feedback. We have decided to keep Figure 7 because we think it helps to validate fmo-4’s role in calcium movement from the ER. While we show genetic interactions between fmo-4 and key genes involved in calcium regulation (crt-1, itr-1, and mcu-1), we think that showing how fmo-4 also interacts with atf-6, a known regulator of calcium homeostasis, strengthens and supports the genetic mechanisms of fmo-4 proposed in this manuscript.

      Comment 7: the model proposed in Figure 7E is not convincingly supported by the results: the arrows connecting atf-6, fmo-4 and crt-1 (calreticulin) suggest that fmo-4 is downstream of atf-6 and upstream of crt-1: Berkowitz 2020 showed that atf-6 knockdown downregulates calreticulin, so unless the authors show that this downregulation is mediated directly by fmo-4, the more likely explanation is that atf-6 knockdown affects calcium levels which in turn induces fmo-4 expression.

      We thank the reviewer for this helpful feedback. We have addressed this by updating our proposed model. We used a solid arrow leading from the reduction of atf-6 to induction of fmo-4, as this is supported by our data in Figure 7A-B. We then used dashed arrows between fmo-4 and crt-1 as well as between atf-6 and crt-1 to indicate that more data is needed to clarify this part of the pathway.

      Comment 8: Avoid pointing at a mitochondrial connection in the title as the only evidence supporting this interaction comes from the mcu-1 RNAi epistasis.

      We appreciate the reviewer’s suggestion. We added another piece of evidence suggesting an interaction between fmo-4 and the mitochondria to Supplementary Figure 7G-H. Here we show that while fmo-4 OE worms are resistant to paraquat stress, knocking down vdac-1 (a calcium regulator located in the outer mitochondrial membrane) abrogates this effect. We have kept mitochondria in our title but have made sure to temper our language in the main text to avoid pointing to a strong mitochondrial connection, since we have only two pieces of evidence connecting fmo-4 to the mitochondria.

    1. Author response:

      Reviewer #1 (Public review):  

      Hüppe and colleagues had already developed an apparatus and an analytical approach to capture swimming activity rhythms in krill. In a previous manuscript they explained the system, and here they employ it to show a circadian clock, supplemented by exogenous light, produces an activity pattern consistent with "twilight" diel vertical migration (DVM; a peak at sunset, a midnight sink, and a peak in the latter half of the night). 

      They used light:dark (LD) followed by dark:dark (DD) photoperiods at two times of the year to confirm the circadian clock, coupled with DD experiments at four times of year to show rhythmicity occurs throughout the year along with DVM in the wild population. The individual activity data show variability in the rhythmic response, which is expected. However, their results showed rhythmicity was sustained in DD throughout the year, although the amplitude decayed quickly. The interpretation of a weak clock is reasonable, and they provide a convincing justification for the adaptive nature of such a clock in a species that has a wide distributional range and experiences various photic environments. These data also show that exogenous light increases the activity response and can explain the morning activity bouts, with the circadian clock explaining the evening and late-night bouts. This acknowledgement that vertical migration can be driven by multiple proximate mechanisms is important. 

      The work is rigorously done, and the interpretations are sound. I see no major weaknesses in the manuscript. Because a considerable amount of processing is required to extract and interpret the rhythmic signals (see Methods and previous AMAZE paper), it is informative to have the individual activity plots of krill as a gut check on the group data. 

      The manuscript will be useful to the field as it provides an elegant example of looking for biological rhythms in a marine planktonic organism and disentangling the exogenous response from the endogenous one. Furthermore, as high latitude environments change, understanding how important organisms like krill have the potential to respond will become increasingly important. This work provides a solid behavioral dataset to complement the earlier molecular data suggestive of a circadian clock in this species. 

      We appreciate the positive evaluation of our work by Reviewer 1, acknowledging our approach to record locomotor activity in krill as well as the importance of the findings in assessing krill’s potential to respond to environmental change in their habitat.  

      Reviewer #2 (Public review):  

      Summary: 

      This manuscript provides experimental evidence on circadian behavioural cycles in Antarctic krill. The krill were obtained directly from krill fishing vessels and the experiments were carried out on board using an advanced incubation device capable of recording activity levels over a number of days. A number of different experiments were carried out where krill were first exposed to simulated light:dark (L:D) regimes for some days followed by continuous darkness (DD). These were carried out on krill collected during late autumn and late summer. A further set of experiments was performed on krill across three different seasons (summer, autumn, winter), where incubations were all DD conditions. Activity was measured as the frequency by which an infrared beam close to the top of the incubation tube was broken over unit time. Results showed that patterns of increased and decreased activity that appeared synchronised to the LD cycle persisted during the DD period. This was interpreted as evidence of the operation of an internal (endogenous) clock. The amplitude of the behavioural cycles decreased with time in DD, which further suggests that this clock is relatively weak. The authors argued that the existence of a weak endogenous clock is an adaptation to life at high latitudes since allowing the clock to be modulated by external (exogenous) factors is an advantage when there is a high degree of seasonality. This hypothesis is further supported by seasonal DD experiments which showed that the periodicity of high and low activity levels differed between seasons. 

      Strengths 

      Although there have been many field observations of various circadian-type behaviours in Antarctic krill, relatively few experimental studies have been published considering this behaviour in terms of circadian patterns of activity. Krill are not a model organism and obtaining them and incubating them in suitable conditions are both difficult undertakings. Furthermore, there is a need to consider what their natural circadian rhythms are without the overinfluence of laboratory-induced artefacts. For this reason alone, the setup of the present study is ideal to consider this aspect of krill biology.

      Furthermore, the equipment developed for measuring levels of activity is well-designed and likely to minimise artefacts. 

      We would like to thank Reviewer 2 for their positive assessment of our approach to study the influence of the circadian clock on krill behavior. We are delighted that Reviewer 2 found our mechanistic approach to understanding daily behavioral patterns of Antarctic krill using the AMAZE set-up convincing, and that the challenging circumstances of working with a polar, non-model species are acknowledged.

      Weaknesses 

      I have little criticism of the rationale for carrying out this work, nor of the experimental design. Nevertheless, the manuscript would benefit from a clearer explanation of the experimental design, particularly aimed at readers not familiar with research into circadian rhythms. Furthermore, I have a more fundamental question about the relationship between levels of activity and DVM on which I will expand below. Finally, it was unclear how the observational results made here related to the molecular aspects considered in the Discussion. 

      (1) Explanation of experimental design - I acknowledge that the format of this particular journal insists that the Results are the first section that follows the Introduction. This nevertheless presents a problem for the reader since many of the concepts and terms that would generally be in the Methods are yet to be explained to the reader. Hence, right from the start of the Results section, the reader is thrown into the detail of what happened during the LD-DD experiments without being fully aware of why this type of experiment was carried out in the first place. Even after reading the Methods, further explanation would have been helpful. Circadian cycle type research of this sort often entrains organisms to certain light cycles and then takes the light away to see if the cycle continues in complete darkness, but this critical piece of knowledge does not come until much later (e.g. lines 369-372), leaving the reader guessing until this point why the authors took the approach they did. I would suggest the following: (1) that more effort is made in the Introduction to explain the exact LD/DD protocols adopted; (2) that a schematic figure is placed early on in the manuscript where the protocol is explained, including some logical flow charts of e.g. if behavioural cycle continues in DD then internal clock exists versus if cycle does not continue in DD, the exogenous cues dominate - followed by - major decrease in cyclic amplitude = weak clock versus minor decrease = strong clock and so on.

      We would like to thank Reviewer 2 for pointing out that the experimental design and the rationale behind it are not becoming clear early in the manuscript, especially for people outside the field of chronobiology. We think that the suggestion to include a schematic figure early in the manuscript is excellent and we plan to implement this in a revised version of the manuscript.  
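
The decision logic that the reviewer proposes for the schematic can be written out as a short sketch. This is only an illustration under stated assumptions: the function name, the `amplitude_ratio` input (rhythm amplitude in DD divided by amplitude in LD), and the 0.5 cut-off separating a "major" from a "minor" decrease are all hypothetical and do not come from the manuscript.

```python
def interpret_ld_dd(rhythm_persists_in_dd: bool,
                    amplitude_ratio: float,
                    weak_threshold: float = 0.5) -> str:
    """Interpret an LD-to-DD experiment following the reviewer's flow chart.

    amplitude_ratio: DD amplitude divided by LD amplitude (hypothetical input).
    weak_threshold:  illustrative cut-off, not a value from the study.
    """
    # No rhythm once the light cue is removed: behaviour follows external cues.
    if not rhythm_persists_in_dd:
        return "exogenous cues dominate"
    # Rhythm persists but its amplitude drops sharply: weak endogenous clock.
    if amplitude_ratio < weak_threshold:
        return "endogenous clock, weak"
    # Rhythm persists with little damping: strong endogenous clock.
    return "endogenous clock, strong"


if __name__ == "__main__":
    print(interpret_ld_dd(True, 0.3))
    print(interpret_ld_dd(True, 0.9))
    print(interpret_ld_dd(False, 0.0))
```

In practice the "major versus minor decrease" boundary would have to be defined by the authors; the sketch only makes the branching order explicit.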

      (2) Activity vs kinesis - in this study, we are shown data that (i) krill have a circadian cycle - incubation experiments; (ii) that krill swarms display DVM in this region - echosounder data (although see my later point). My question here is regarding the relationship between what is being measured by the incubation experiments and the in situ swarm behaviour observations. The incubation experiments are essentially measuring the propensity of krill to swim upwards since it logs the number of times an individual (or group) breaks a beam towards the top of the incubation tube. I argue that krill may be still highly active in the rest of the tube but just do not swim close to the surface, so this approach may not be a good measure of "activity". Otherwise, I suggest a more correct term of what is being measured is the level of "upward kinesis". As the authors themselves note, krill are negatively buoyant and must always be active to remain pelagic. What changes over the day-night cycle is whether they decide to expend that activity on swimming upwards, downwards or remaining at the same depth. Explaining the pattern as upward kinesis then also explains why swarms move upwards during the night. Just being more active at night may not necessarily result in them swimming upwards. 

      We believe that there is a slight misunderstanding about how what we call “activity” is measured. The experimental columns are equipped with five detector modules, evenly distributed over the height of the column. In our analysis we count all beam breaks that are caused by upward movement, i.e. every time a detector module is triggered after a detector module at a lower position has been triggered, and not only when the top detector module is triggered. In this way, we record upward swimming movements throughout the column, and not only when the krill swims all the way to the top of the column. This still means that what we are measuring is swimming activity, caused by upward swimming. We use this measure to deliberately separate increased swimming activity from baseline activity (i.e. swimming which solely compensates for negative buoyancy) and inactivity (i.e. passive sinking). 

      A higher activity is thus at first interpreted as an increase in swimming activity, which in the field may result in upwards directed swimming but also could mean a horizontal increase in activity, for example representing increased foraging and feeding activity. This would explain the daily activity pattern observed under LD cycles (Fig. 2), which shows a general increase in activity during the dark phase. This nighttime increase could be used for both upward directed migration during sunset as well as horizontal directed swimming for feeding and foraging throughout the night.

      We will formulate the description of the activity metric more clearly in the revised version of the manuscript.
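
The counting rule described above can be sketched as follows. This is a minimal illustration, not the AMAZE analysis code: it assumes detector modules are numbered 1 (bottom) to 5 (top), that the input is the time-ordered list of triggered modules for one animal, and that "after a detector module at a lower position" refers to the immediately preceding trigger. The function name is hypothetical.

```python
from typing import Sequence


def count_upward_beam_breaks(triggers: Sequence[int]) -> int:
    """Count triggers that directly follow a trigger at a lower detector position.

    triggers: time-ordered detector indices, 1 = lowest module, 5 = highest
              (the numbering convention is an assumption of this sketch).
    """
    count = 0
    previous = None
    for position in triggers:
        # An upward movement is registered whenever the current detector
        # sits higher in the column than the one triggered just before it.
        if previous is not None and position > previous:
            count += 1
        previous = position
    return count


if __name__ == "__main__":
    # Upward steps: 1->2, 2->3, 2->4, 4->5; downward and repeated triggers
    # (3->2, 2->2, 5->1) are not counted.
    events = [1, 2, 3, 2, 2, 4, 5, 1]
    print(count_upward_beam_breaks(events))  # 4
```

Under this rule a krill that swims upward only in the lower part of the column still contributes to the activity count, whereas passive sinking and repeated triggering of the same module do not.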

      (3) Molecular relevance - Although I am interested in molecular clock aspects behind these circadian rhythms, it was not made clear how the results of the present study allow any further insight into this. In lines 282 to 284, the findings of the study by Biscontin et al (2017) are discussed with regard to how TIM protein is degraded by light via the clock photoreceptor CRYPTOCHROME 1. This element of the Discussion would be a lot more relevant if the results of the present study were considered in terms of whether they supported or refuted this or any other molecular clock model. As it stands, this paragraph is purely background knowledge and a candidate for deletion in the interest of shortening the Discussion.  

      We agree that this part is not directly related to the data presented in the manuscript and will therefore omit this part in the revised version of the manuscript to keep the discussion concise and focused on the results. 

      Other aspects 

      (i) 'Bimodal swimming' was used in the Abstract and later in the text without the term being fully explained. I could interpret it to mean a number of things so some explanation is required before the term is introduced. 

      We thank the Reviewer for pointing this out and will provide an explanation for the term “bimodal swimming” in a revised version of the manuscript. 

      (ii) Midnight sinking - I was struck by Figure 2b with regards to the dip in activity after the initial ascent, as well as the rise in activity predawn. Cushing (1951) Biol Rev 26: 158-192 describes the different phases of a DVM common to a number of marine organisms observed in situ where there is a period of midnight sinking following the initial dusk ascent and a dawn rise prior to dawn descent. Tarling et al (2002) observe midnight sinking pattern in Calanus finmarchicus and consider whether it is a response to feeding satiation or predation avoidance (i.e. exogenous factors). Evidence from the present study indicates that midnight sinking (and potential dawn rise) behaviour could alternatively be under endogenous control to a greater or lesser degree. This is something that should certainly be mentioned in the Discussion, possibly in place of the molecular discussion element mentioned above - possibly adding to the paragraph Lines 303-319. 

      We would like to thank the Reviewer for pointing this out and agree that it would be interesting to add the idea of an endogenous control of midnight sinking to the discussion. We plan to implement this in a revised version of the manuscript. 

      (iii) Lines 200-207 - I struggled to follow this argument regarding Piccolin et al identifying a 12 h rhythm whereas the present study indicates a ~24 h rhythm. Is one contradicting the other - please make this clear. 

      In our study we found that the circadian clock drives a bimodal pattern of swimming activity in krill, meaning it controls two bouts of activity in a 24 h cycle. Piccolin et al. (2020) identified a swimming activity pattern of ~12 h (i.e. two peaks in 24 h) at the group level, which is in line with our findings at the individual level. We will revisit the mentioned section for more clarity in a revised version.   
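
The equivalence between "two bouts of activity per 24 h" and a "~12 h rhythm" can be illustrated with a synthetic signal. The sketch below is an assumption-laden toy example, not the analysis used in either study: it builds an hourly activity trace with two equal, evenly spaced peaks per day and finds the strongest period with a plain discrete Fourier transform. Real krill activity bouts need not be equal or evenly spaced.

```python
import math
from typing import List


def dominant_period_hours(samples: List[float], dt_hours: float) -> float:
    """Return the period (in hours) of the strongest Fourier component."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the constant offset
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    # Frequency index k corresponds to k cycles over the whole record.
    return n * dt_hours / best_k


if __name__ == "__main__":
    # Three days of hourly data with two activity peaks per 24 h (bimodal pattern).
    activity = [1.0 + math.cos(2 * math.pi * t / 12.0) for t in range(72)]
    print(dominant_period_hours(activity, dt_hours=1.0))
```

For this idealised trace the strongest component has a period of 12 h, even though the underlying pattern repeats as a two-peaked cycle every 24 h.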

      (iv) Although I agree that the hydroacoustic data should be included and is generally supportive of the results, I think that two further aspects should be made clear for context (a) whether there was any groundtruthing that the acoustic marks were indeed krill and not potentially some other group known to perform DVM such as myctophids (b) how representative were these patterns - I have a sense that they were heavily selected to show only ones with prominent DVM as opposed to other parts of the dataset where such a pattern was less clear - I am aware of a lot of krill research where DVM is not such a clear pattern and it is disingenuous to provide these patterns as the definitive way in which krill behaves. I ask this be made clear to the reader (note also that there is a suggestion of midnight sinking in Fig 5b on 28/2).  

      To clarify the mentioned points concerning the hydroacoustic data:

      a) As mentioned in the Methods section, only hydroacoustic data during active fishing was included in the analysis. E. superba occurs in large monospecific aggregations and the fishery is actively targeting E. superba and monitoring their catch and the proportion of non-target species continuously with cameras. Krill fishery bycatch rates are very low (0.1–0.3%, Krafft et al. 2018), and fishing operations would stop if non-target species were being caught in significant proportions at any time. Therefore, and supported by our own observations when we conducted the experiments, we argue that it is a valid assumption that the backscattering signal shown in Figure 5 is predominantly caused by E. superba. 

      b) We are aware of the fact that DVM patterns of Antarctic krill are highly variable and that normal DVM patterns do not need to be the rule (e.g. see our cited study on the plasticity of krill DVM by Bahlburg et al. 2023). The visualized data were not selected for their DVM pattern but represent the period directly preceding the sampling for behavioral experiments in four different seasons (namely S1-S4), including the day of sampling. These periods were chosen to assess the DVM behavior of krill swarms in the field in the days before and during the sampling for behavioral experiments. 

      We will include these aspects in the Methods section in a revised version of the manuscript in order to improve understanding.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors' research group had previously demonstrated the release of large multivesicular body-like structures by human colorectal cancer cells. This manuscript expands on their findings, revealing that this phenomenon is not exclusive to colorectal cancer cells but is also observed in various other cell types, including different cultured cell lines, as well as cells in the mouse kidney and liver. Furthermore, the authors argue that these large multivesicular body-like structures originate from intracellular amphisomes, which they term "amphiectosomes." These amphiectosomes release their intraluminal vesicles (ILVs) through a "torn-bag mechanism." Finally, the authors demonstrate that the ILVs of amphiectosomes are either LC3B positive or CD63 positive. This distinction implies that the ILVs either originate from amphisomes or multivesicular bodies, respectively.

      Strengths:

      The manuscript reports a potential origin of extracellular vesicle (EV) biogenesis. The reported observations are intriguing.

      Weaknesses:

      It is essential to note that the manuscript has issues with experimental designs and lacks consistency in the presented data. Here is a list of the major concerns:

      (1) The authors culture the cells in the presence of fetal bovine serum (FBS) in the culture medium. Given that FBS contains a substantial amount of EVs, this raises a significant issue, as it becomes challenging to differentiate between EVs derived from FBS and those released by the cells. This concern extends to all transmission electron microscopy (TEM) images (Figure 1, 2P-S, S5, Figure 4 P-U) and the quantification of EV numbers in Figure 3. The authors need to use an FBS-free cell culture medium.

      Although FBS indeed contains bovine EVs, the very large multivesicular EVs (amphiectosomes) that our manuscript focuses on have never been observed or reported in FBS. For reported size distributions of EVs in FBS, please find a few relevant references below:

      PMID: 29410778, PMID: 33532042, PMID: 30940830 and PMID: 37298194

      All the above publications show that the number of lEVs > 350-500 nm is negligible in FBS. The average diameter of MV-lEVs (amphiectosomes) described in our manuscript is around 1.00-1.50 micrometer.

      Reviewer #1: These papers evaluated the effectiveness of various methods to eliminate EVs from FBS, emphasizing the challenges associated with the presence of EVs in FBS. They also caution against using FBS in EV studies due to these issues. However, I did not find a clear indication regarding the size distributions of EVs in FBS in these papers.

      Please provide accurate reference supporting the claim that 'lEVs > 350-500 nm are negligible in FBS.' The papers cited by the authors do not address this specific point.

      In the revised manuscript, we addressed this point: because FBS is sterile-filtered, it cannot contain large (>0.22 µm) EVs.

      Our response to Reviewer #1 point 2. When we demonstrated the TEM of isolated EVs, we consistently used serum-free conditioned medium (Fig2 P-S, Fig2S5 J, O) as described previously (Németh et al 2021, PMID: 34665280).

      Reviewer #1: This is an important point that is not mentioned in the original main text, figure legend or method. Please address.

      We agree and we apologize for it. We added this information to the revised manuscript.

      Our response to Reviewer #1 point 3. Our TEM images show cells captured in the process of budding and scission of large multivesicular EVs, excluding the possibility that these structures could have originated from FBS.

      Reviewer #1: These images may also depict the engulfment of EVs in FBS. Hence, it is crucial to utilize EV-free or EV-depleted FBS.

      As we mentioned earlier, we added the information to the revised manuscript that sterile filtering of the FBS presumably removed EVs larger than 0.22 µm.

      Our response to Reviewer #1 point 4. In addition, in our confocal analysis, we studied Palm-GFP positive, cell-line derived MV-lEVs. Importantly, in these experiments, FBS-derived EVs are non-fluorescent, therefore, the distinction between GFP positive MV-lEVs and FBS-derived EVs was evident.

      Reviewer #1: I agree that these fluorescent-labeled assays conclusively indicate that the MV-lEVs are originating from the cells. However, the images of concern are the non-fluorescent-labeled images (Figure 1, 2P-S, S5, Figure 4 P-U and Figure 3). The MV-lEVs may derive from both the cells and FBS.

      Please see above our response to points 1-3.

      Our response to Reviewer #1 point 5. In addition, culturing cells in FBS-free medium (serum starvation) significantly affects autophagy. Given that in our study, we focused on autophagy related amphiectosome secretion, we intentionally chose to use FBS supplemented medium.

      Reviewer #1: If this is a concern, the authors should use EV-depleted FBS.

      As we discussed above, sterile filtration of FBS removes particles >0.22 µm. In addition, based on our preliminary experiments, EV-depleted serum may affect cell physiology. 

      Our response to Reviewer #1 point 6. Even though the authors of this manuscript are not familiar with the technological details of how FBS is processed before commercialization, it is reasonable to assume that the samples are subjected to sterile filtration (through a 0.22 micron filter), after which MV-lEVs cannot be present in the commercial FBS samples.

      Reviewer #1: This is a fair comment that needs to be included in the manuscript.

      As you suggested, this comment is now included in the revised manuscript.

      (2) The data presented in Figure 2 is not convincingly supportive of the authors' conclusion. The authors argue that "...CD81 was present in the plasma membrane-derived limiting membrane (Figures 2B, D, F), while CD63 was only found inside the MV-lEVs (Fig. 2A, C, E)." However, in Figure 2G, there is an observable CD63 signal in the limiting membrane (overlapping with the green signals), and in Figure 2J, CD81 also exhibits overlap with MV-lEVs.

      Both CD63 and CD81 are tetraspanins known to be present both in the membrane of sEVs and in the plasma membrane of cells (for references, please see Uniprot subcellular location maps: https://www.uniprot.org/uniprotkb/P08962/entry#subcellular_location https://www.uniprot.org/uniprotkb/P60033/entry#subcellular_location). However, following the reviewer's feedback, we will delete the sentence in question from the text for clarity.

      Reviewer #1: Please also justify the statement questioned in (3) as these arguments are interconnected.

      We hope you find our above responses to your comment acceptable.

      (3) Following up on the previous concern, the authors argue that CD81 and CD63 are exclusively located on the limiting membrane and MV-lEVs, respectively (Figure 2-A-M). However, in lines 104-106, the authors conclude that "The simultaneous presence of CD63, CD81, TSG101, ALIX, and the autophagosome marker LC3B within the MV-lEVs..." This statement indicates that CD63 and CD81 co-localize to the MV-lEVs. The authors need to address this apparent discrepancy and provide an explanation.

      There must be a misunderstanding because we did not claim or imply in the text that “CD81 and CD63 are exclusively located on the limiting membrane and MV-lEVs”. Here we studied co-localization of the above proteins in the case of intraluminal vesicles (ILVs). In Fig. 2, we did not show any analysis of limiting membrane co-localization.

      Reviewer #1: I have indicated that this statement is found in lines 104-106, where the authors argue, 'The simultaneous presence of CD63, CD81, TSG101, ALIX, and the autophagosome marker LC3B within the MV-lEVs...' If the authors acknowledge the inaccuracy of this statement, please provide a justification for this argument.

      For clarity, we modified the description of the data shown in Fig. 2 in the revised manuscript.

      (4) The specificity of the antibodies used in Figure 2 should be validated through knockout or knockdown experiments. Several of the antibodies used in this figure detect multiple bands on western blots, raising doubts about their specificity. Verification through additional experimental approaches is essential to ensure the reliability and accuracy of all the immunostaining data in this manuscript.

      We will consider this suggestion during the revision of the manuscript.

      Reviewer #1: Please do so.

      We carefully considered the suggestion, but we realized that it was not feasible for us to perform gene silencing for all the antibodies we used before resubmitting our revised manuscript. However, we repeated the Western blot for mouse anti-CD81 (Invitrogen MAA5-13548) and replaced the previous Western blot with it in the revised manuscript (Fig.2-S4H).

      (5) In Figures 2P-R, the morphology of the MV-lEVs does not resemble those shown in Figures 1-A, H, and D, indicating a notable inconsistency in the data.

      EM images in Figure 2P-R show sEVs separated from serum-free conditioned media as opposed to MV-lEVs, which were captured in situ in fixed tissue cultures (Fig. 1). Therefore, the two EV populations necessarily have different sizes and structures. Furthermore, Fig. 1 shows images of ultrathin sections, while in Figure 2P-R we used negative-positive contrasting of intact sEVs without embedding and sectioning.

      (6) There are no loading controls provided for any of the western blot data.

      Not even the latest MISEV 2023 guidelines give recommendations for a proper loading control for separated EVs in Western blots (MISEV 2023, DOI: 10.1002/jev2.12404, PMID: 38326288). Here we applied our previously developed method (PMID: 37103858), which, in our opinion, is the most reliable approach for sEV Western blotting. For whole cell lysates, we used actin as loading control (Fig3-S2B).

      Reviewer #1: The blots referenced here (Fig2-S3; Fig2-S4B; Fig3-S2B) were conducted using total cell lysates, not EV extracts. Only one blot in Fig3-S2B includes an actin control. All remaining blots should incorporate actin controls for consistency.

      Fig2-S3 (corresponding to Fig2-S4 in the revised manuscript) only shows reactivity of the used antibodies. This Western blot is not intended to serve as a basis of any quantitative conclusions. Fig2-S4 (corresponding to Fig2-S5 in the revised manuscript) includes the actin control. Fig3-S2B shows the complete membrane, which was cut into 4 pieces, and the immune reactivity of different antibodies was tested. The actin band was included on the anti-LC3B blot. For clarity, we rephrased the figure legend.

      Additionally, for Figures 2-S4B, the authors should run the samples from lanes i-iii in a single gel.

      Please note that in Figure 2- S4B, we did run a single gel, and the blot was cut into 4 pieces, which were tested by anti-GFP, anti-RFP, anti-LC3A and anti-LC3B antibodies. Full Western blots are shown in Fig.3_S2 B, and lanes “1”, “2” and “3” correspond to “i”, “ii” and “iii” in Fig.2-S4, respectively.

      Reviewer #1: In the original Figure 2- S4B, the blots were sectioned into 12 pieces. If lanes "i," "ii," and "iii" were run on the same blot, the authors are advised to eliminate the grids between these lanes.

      Grids separating the lanes have been eliminated on Fig.2_S4 (now Fig.2_S5 in the revised manuscript).

      (7) In Figure 2-S4, is there co-localization observed between LC3RFP (LC3A?) with other MV-lEV markers? How about LC3B? Does LC3B co-localize with other MV-lEV markers?

      In Supplementary Figure 2-S4, we showed successful generation of the HEK293T-PalmGFP-LC3RFP cell line. In this case we tested the cells, and not the released MV-lEVs. LC3A co-localized with the RFP signal as expected.

      Reviewer #1: Does LC3RFP colocalize with MV-lEV markers in the HEK293T-PalmGFP-LC3RFP cell line? This experiment aims to clarify the conclusion made in lines 104-106, where the authors assert that 'The concurrent existence of CD63, CD81, TSG101, ALIX, and the autophagosome marker LC3B within the MV-lEVs...'

      In the case of PalmGFP-LC3RFP cells, LC3-RFP is overexpressed. Simultaneous assessment of this overexpressed protein with non-overexpressed molecules detected by fluorescent antibodies proved to be challenging because of spectral overlaps and inadequate signal-to-noise ratios. Furthermore, in association with EVs, the number of antibody-detected molecules is substantially lower than in cells. Therefore, even though we tried, we could not successfully perform these experiments.

      (8) The TEM images presented in Figure 2-S5, specifically F, G, H, and I, do not closely resemble the images in Figure 2-S5 K, L, M, N, and O. Despite this dissimilarity, the authors argue that these images depict the same structures. The authors should provide an explanation for this observed discrepancy to ensure clarity and consistency in the interpretation of the presented data.

      As indicated in Materials and Methods, Fig. 2-S5 F, G, H and I are conventional TEM images of samples fixed with 4% glutaraldehyde and 1% OsO<sub>4</sub> for 2 h and embedded in Epon resin, with post-contrasting by 3.75% uranyl acetate for 10 min and lead citrate for 12 min. Samples processed this way have very high structure preservation and better image quality; however, they are not suitable for immune detection. In contrast, Fig. 2-S5 K, L, M, N show immunogold labelling of in situ fixed samples. In this case we used milder fixation (4% PFA, 0.1% glutaraldehyde, postfixed with 0.5% OsO<sub>4</sub> for 30 min) and LR-White hydrophilic resin embedding. This special resin enables immunogold TEM analysis. The sections were exposed to H<sub>2</sub>O<sub>2</sub> and NaBH<sub>4</sub> to render the epitopes accessible in the resin. Because different techniques were applied, the preservation of the structure is not the same. In the case of Fig.2 J, O, separated sEVs were visualised by negative-positive contrast and immunogold labelling as described previously (PMID: 37103858).

      Reviewer #1: Please include this justification in the revised version.

      We included this justification in the revised manuscript.

      (9) For Figures 3C and 3-S1, the authors should include the images used for EV quantification. Considering the concern regarding potential contamination introduced by FBS (concern 1), it is advisable for the authors to employ an independent method to identify EVs, thereby confirming the reliability of the data presented in these figures.

      In our revised manuscript, we will provide all the images used for EV quantification in Figure 3C. Given that Figures 3C and 3-S1 show MV-lEVs released by HEK293T-PalmGFP cells, the possible interference by FBS-derived non-fluorescent EVs can be excluded.

      Reviewer #1: Please provide all the images.

      Original LASX files are provided (DOI: 10.6019/S-BIAD1456).

      Reviewer #1: The images raising concerns regarding the contamination of EVs in FBS primarily consist of transmission electron microscopy (TEM) images, namely, Figure 1, 2P-S, S5, and Figure 4 P-U, along with the quantification of EV numbers in Figure 3. These concerns persist despite the use of fluorescent-labeled experiments. While fluorescent-labeled MV-lEVs are conclusively identified as originating from the cells, the MV-lEVs observed in Figure 1, 2P-S, S5, and Figure 4 P-U and Figure 3 may derive from both the cells and FBS.

      Large EVs (with diameter >800 nm) derived from FBS were not present in our experiments, as discussed above.

      (10) Do the amphiectosomes released from other cell types as well as cells in mouse kidneys or liver contain LC3B positive and CD63 positive ILVs?

      Based on our confocal microscopic analysis, in addition to the HEK293T-PalmGFP cells, HT29 and HepG2 cells also release similar LC3B and CD63 positive MV-lEVs. Preliminary evidence shows MV-lEV secretion by additional cell types.

      The response of Reviewer #1: Please show these data in the revised manuscript. Moreover, do cells in mouse kidneys or liver contain LC3B positive and CD63 positive ILVs?

      We have added new confocal microscopic images to Fig2-S3 showing amphiectosomes released also by the H9c2 (ATCC) cardiomyoblast cell line. To preserve the ultrastructure of MV-lEVs in complex organs like kidney and liver, fixation with 4% glutaraldehyde with 1% OsO4 appears to be essential. This fixation does not allow for immune detection to assess LC3B and CD63 positive MV-lEVs in the ultrathin sections.

      Reviewer #2 (Public Review):

      Summary:

      The authors had previously identified that a colorectal cancer cell line generates small extracellular vesicles (sEVs) via a mechanism where a larger intracellular compartment containing these sEVs is secreted from the surface of the cell and then tears to release its contents. Previous studies have suggested that intraluminal vesicles (ILVs) inside endosomal multivesicular bodies and amphisomes can be secreted by the fusion of the compartment with the plasma membrane. The 'torn bag mechanism' considered in this manuscript is distinctly different because it involves initial budding off of a plasma membrane-enclosed compartment (called the amphiectosome in this manuscript, or MV-lEV). The authors successfully set out to investigate whether this mechanism is common to many cell types and to determine some of the subcellular processes involved.

      The strengths of the study are:

      (1) The high-quality imaging approaches used, seem to show good examples of the proposed mechanism.

      (2) They screen several cell lines for these structures, also search for similar structures in vivo, and show the tearing process by real-time imaging.

      (3) Regarding the intracellular mechanisms of ILV production, the authors also try to demonstrate the different stages of amphiectosome production and differently labelled ILVs using immuno-EM.

      Several of these techniques are technically challenging to do well, and so these are critical strengths of the manuscript.

      The weaknesses are:

      (1) Most of the analysis is undertaken with cell lines. In fact, all of the analysis involving the assessment of specific proteins associated with amphiectosomes and ILVs are performed in vitro, so it is unclear whether these processes are really mirrored in vivo. The images shown in vivo only demonstrate putative amphiectosomes in the circulation, which is perhaps surprising if they normally have a short half-life and would need to pass through an endothelium to reach the vessel lumen unless they were secreted by the endothelial cells themselves.

      Our previous results analyzing PFA-fixed, paraffin embedded sections of colorectal cancer patients provided direct evidence that MV-lEV secretion also occurs in humans in vivo (PMID: 31007874). Regarding your comment on the presence of amphiectosomes in the circulation despite their short half-lives, we would like to point out that Fig1.X shows a circulating lymphocyte which releases MV-lEV within the vessel lumen. Furthermore, in the revised manuscript, an additional Fig.1-S1 is provided. Here, we show the release of MV-lEVs both by an endothelial and a sub-endothelial cell (Fig.1-S1G). In addition, these images show the simultaneous presence of MV-lEVs and sEVs in the circulation (Fig.1-S1.A,C,D,H and I). The transmission electron micrographs of mouse kidney and liver sections provide additional evidence that the MV-lEVs are released by different types of cells, and the “torn bag release” also takes place in vivo (Fig.1.V).

      (2) The analysis of the intracellular formation of compartments involved in the secretion process (Figure 2-S5) relies on immuno-EM, which is generally less convincing than high-/super-resolution fluorescence microscopy because the immuno-labelling is inevitably very sporadic and patchy. High-quality EM is challenging for many labs (and seems to be done very well here), but high-/super-resolution fluorescence microscopy techniques are more commonly employed, and the study already shows that these techniques should be applicable to studying the intracellular trafficking processes.

      As you suggested, in the revised manuscript, we present additional super-resolution microscopy (STED) data. The intracellular formation of amphisomes, the fragmentation of LC3B-positive membranes and the formation of LC3B-positive ILVs were captured (Fig. 3B-F).

      (3) One aspect of the mechanism, which needs some consideration, is what happens to the amphisome membrane, once it has budded off inside the amphiectosome. In the fluorescence images, it seems to be disrupted, but presumably, this must happen after separation from the cell to avoid the release of ILVs inside the cell. There is an additional part of Figure 1 (Figure 1Y onwards), which does not seem to be discussed in the text (and should be), that alludes to amphiectosomes often having a double membrane.

      We agree with your comment regarding the amphisome membrane and we added a sentence to the Discussion of the revised manuscript. Fig. 1Y onwards is now discussed in the manuscript. In addition, we labelled the surface of living HEK293 cells with wheat germ agglutinin (WGA), which binds to sialic acid and N-acetyl-D-glucosamine. After removing the unbound WGA by washes, the cells were cultured for an additional 3 hours, and the release of amphiectosomes was studied. The budding amphiectosome had a WGA positive membrane, providing evidence that the external limiting membrane had a plasma membrane origin (Fig.3G).

      (4) The real-time analysis of the amphiectosome tearing mechanism seemed relatively slow to me (over three minutes), and if this has been observed multiple times, it would be helpful to know if this is typical or whether there is considerable variation.

      Thank you for this comment. In the revised manuscript, we highlight that the first released LC3 positive ILV was detected within as little as 40 sec.

      Overall, I think the authors have been successful in identifying amphiectosomes secreted from multiple cell lines and demonstrating that the ILVs inside them have at least two origins (autophagosome membrane and late endosomal multivesicular body) based on the markers that they carry. The analysis of intracellular compartments producing these structures is rather less convincing and it remains unclear what cells release these structures in vivo.

      I think there could be a significant impact on the EV field and consequently on our understanding of cell-cell signalling based on these findings. It will flag the importance of investigating the release of amphiectosomes in other studies, and although the authors do not discuss it, the molecular mechanisms involved in this type of 'ectosomal-style' release will be different from multivesicular compartment fusion to the plasma membrane and should be possible to be manipulated independently. Any experiments that demonstrate this would greatly strengthen the manuscript.

      We appreciate these comments of the reviewer. Experiments are on their way to elucidate the mechanism of the “ectosomal style” exosome release and will be the topic of our next publication.

      In general, the EV field has struggled to link up analysis of the subcellular biology of sEV secretion and the biochemical/physical analysis of the sEVs themselves, so from that perspective, the manuscript provides a novel angle on this problem.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors describe a novel mode of release of small extracellular vesicles. These small EVs are released via the rupture of the membrane of so-called amphiectosomes that resemble "morphologically" Multivesicular Bodies.

      These structures have been initially described by the authors as released by colorectal cancer cells (https://doi.org/10.1080/20013078.2019.1596668). In this manuscript, they provide experiments that allow us to generalize this process to other cells. In brief, amphiectosomes are likely released by ectocytosis of amphisomes that are formed by the fusion of multivesicular endosomes with autophagosomes. The authors propose that their model puts forward the hypothesis that LC3 positive vesicles are formed by "curling" of the autophagosomal membrane which then gives rise to an organelle where both CD63 and LC3 positive small EVs co-exist and would be released then by a budding mechanism at the cell surface that appears similar to the budding of microvesicles /ectosomes. Very correctly the authors make the distinction from migrasomes because these structures appear very similar in morphology.

      Strengths:

      The findings are interesting despite that it is unclear what would be the functional relevance of such a process and even how it could be induced. It points to a novel mode of release of extracellular vesicles.

      Weaknesses:

      This reviewer has comments and concerns concerning the interpretation of the data and the proposed model. In addition, in my opinion, some of the results in particular micrographs and immunoblots (even shown as supplementary data) are not of quality to support the conclusions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Highlight MV-lEV, ILV and limiting membrane in Figure-1G, N, and U.

      Following this suggestion, we revised Figure 1.

      (2) Figure 1-Y-AF are not mentioned in the text.

      In the revised manuscript, we discuss Figure 1Y-AF.

      (3) The term "IEVs" in Figure 2-S2 is not defined.

      We modified the figure legend: we changed MV-lEV to amphiectosome.

      (4) Need to quantify co-localization in Figure 2-S2.

      As suggested, we carried out the co-localisation analysis (Fig2-S2I), and Fig2-S2 was re-edited.

      Reviewer #2 (Recommendations For The Authors):

      I have two recommendations for improving the manuscript through additional experiments:

      (1) I think the description of the intracellular processes taking place in order to form amphiectosomes would be much stronger if some super-resolution imaging could be undertaken. This should label the different compartments before and after fusion with specific markers that highlight the protein signature of the different limiting and ILV membranes much more clearly than immuno-EM. It will also help in characterising the double-membrane structure of amphiectosomes at the point of budding and reveal whether the patchy labelling of the inner membrane emerges after amphiectosome release (the schematic model currently suggests that it happens before).

      Thank you for your suggestion. STED microscopy was applied, the results are shown in the new Fig. 3, and the schematic model was modified accordingly.

      (2) The implications of the manuscript would be more wide-ranging if the authors could test genetic manipulations that are believed to block exosome or ectosome release, eg. Rab27a or Arrdc1 knockdown. This may allow them to determine whether MV-lEVs can be released independently of the classical exosome release mechanism because they use a different route to be released from the plasma membrane. This experiment is not essential, but I think it would start to address the core regulatory mechanisms involved, and if successful, would easily allow the authors to determine the ratio of CD63-positive sEVs being secreted via classical versus amphiectosome routes.

      The suggestion is very valuable for us and these studies are being performed in a separate project.

      I think there are several other ways in which the manuscript could be improved to better explain some of the approaches, findings and interpretation:

      (1) Include some explanation in the text of certain key tools, particularly:

      a. Palm-GFP and whether its expression might alter the properties of the plasma membrane since this is used in a lot of experiments and is the only marker that seems to uniformly label the outer membrane of amphiectosomes. One concern might be that its expression drives amphiectosome secretion.

      We found evidence for amphiectosome release also in the case of several different cells not expressing Palm-GFP. We believe this excludes the possibility that Palm-GFP expression is the inducer of amphiectosome release. By both fluorescence and electron microscopy, the cells not expressing Palm-GFP showed very similar MV-lEVs. In addition, we made similar observations in the case of non-transduced HEK293 cells labelled with fluorescent WGA.

      b. Lactadherin - does this label the amphiectosomes after their release or does the wash-off step mean that it only labels cells, which subsequently release amphiectosomes?

      Lactadherin labels the amphiectosomes after their release and fixation. Living cells cannot be labelled by lactadherin as PS is absent in the external plasma membrane layer of living cells. We used WGA on HEK293 cells to further support the plasma membrane origin of the external membrane of amphiectosomes.

      (2) Explain the EM and confocal imaging approaches more clearly. Most importantly, is a 3D reconstruction always involved to confirm that 'separated' amphiectosomes are not joined to cells in another Z-plane.

      Thank you for your suggestion. We have modified the manuscript accordingly.

      (3) Presenting triple-labelled images with red, green and yellow channels does not allow individual labelling to be determined without single-channel images and even then, it is much more informative to use three distinguishable colours that make a different colour with overlap, eg. CMY? Fig.2_S2D and E do not display individual channels, so definitely need to be changed.

      In the case of Fig.2_S2D, we now show the individual channels; the earlier E image has been removed. In the case of the STED images, CMY colors were used, as you suggested.

      (4) Please discuss in the text the data in Figure 1Y onwards concerning single/double membranes on MV-lEVs.

      In the revised manuscript, we discuss the question of single/double membranes and we refer to Figure 1Y-AF.

      (5) On line 162, reword 'intraluminal TSPAN4 only' to 'one in which TSPAN4 is only intraluminal' to make it clear that other proteins are also marking the intraluminal region, not TSPAN4 only.

      We modified the text accordingly.

      (6) Points for further discussion and further conclusions:

      a. In vivo experiments - discuss the limitations of this part of the analysis - it seems that none of the amphiectosome markers have been analysed in this part of the study and the MV-lEVs are only in the circulation.

      b. Can the authors give any further indication of the levels of MV-lEVs relative to free sEVs from any of their studies?

      Using our current approach, it is not possible to determine the levels of MV-lEVs relative to free sEVs. Without analyzing serial ultrathin sections, determination of the relative ratio of MV-lEVs and sEVs would depend on the actual section plane. In future projects, we will determine the ratio of LC3 positive and negative sEVs by single EV analysis techniques (such as SP-IRIS). In the revised manuscript, additional TEM images are included to provide evidence for the simultaneous presence of sEVs and MV-lEVs, and for the presence of MV-lEVs both inside and outside of the circulation.

      c. Please discuss the single versus double membrane issue (relating to experiments proposed above).

      We discuss this question in more detail in the revised manuscript.

      d. Please point out that the release mechanism (plasma membrane budding) will involve different molecular mechanisms to establish exosome release, and this might provide a route to determine relative importance.

      We are currently running a systematic analysis of the release mechanism of amphiectosomes, and this will be the topic of a separate manuscript.

      Reviewer #3 (Recommendations For The Authors):

      * The model is not supported.

      * The data is not of quality.

      * The appropriate methods are not exploited.

      We are sorry, we cannot respond to these unsupported critiques.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment

      This important study showing that sleep deprivation increases functional synapses while depleting silent synapses supports previous findings that excitatory signaling increases during wakefulness. This manuscript focuses in particular on AMPA/NMDA ratios. An interesting, although speculative, aspect of the manuscript is the inclusion of a model for the accumulation of sleep need that is based upon the MEF2C transcription factor but also links to the sleep-regulating SIK3-HDAC4/5 pathway. The authors have clarified some questions raised in the previous review, but the evidence for major claims was still found to be incomplete, requiring additional experimentation.

      The major claims of this study are: 1) SD increases the AMPA/NMDA receptor ratio and RS restores it; 2) SD decreases silent synapses compared to CS and RS restores their number after SD; 3) the majority of SD-induced DEGs are found in ExIT cells (glutamate pyramidal neurons projecting within the telencephalon); 4) ExIT SD-induced DEGs are enriched for genes encoding synaptic shaping components and for autism spectrum disorder risk and; 5) these DEGs are also enriched for DEGs induced by Mef2c loss of function restricted to forebrain glutamate neurons (ExIT cells comprise a subset of these) and by over-expression of constitutively nuclear HDAC4 that represses MEF2c transcriptional function. The last claim is consistent with an intracellular signaling model (presented as a hypothesis to be tested, in figure 4B).

      [The above is added to the start of the discussion section.]

      The specific claims are supported by solid evidence provided in this manuscript. The statistical support is now more clearly presented, with several changes in response to queries by reviewer 1.

      The technical issues raised by reviewer 1 do not detract from the claims, thus supported. The rationale for this assessment is expanded below in response to reviewer 1.

      Summary:

      This manuscript by Vogt et al examines how the synaptic composition of AMPA and NMDA receptors changes over sleep and wake states. The authors perform whole-cell patch clamp recordings to quantify changes in silent synapse number across conditions of spontaneous sleep, sleep deprivation, and recovery sleep after deprivation. They also perform single nucleus RNAseq to identify transcriptional changes related to AMPA/NMDA receptor composition following spontaneous sleep and sleep deprivation. The findings of this study are consistent with a decrease in silent synapse number during wakefulness and an increase during sleep. However, these changes cannot be conclusively linked to sleep/wake states. Measurements were performed in motor cortex, and sleep deprivation was achieved by forced locomotion, raising the possibility that recent patterns of neuronal activity, rather than sleep/wake states, are responsible for the observed results.

      Strengths:

      This study examines an important question. Glutamatergic synaptic transmission has been a focus of studies in the sleep field, but AMPA receptor function has been the primary target of these studies. Silent synapses, which contain NMDA receptors but lack AMPA receptors, have important functional consequences for the brain. Exploring the role of sleep in regulating silent synapse number is important to understanding the role of sleep in brain function. The electrophysiological approach of measuring the failure rate ratio, supported by AMPA/NMDA ratio measurements, is a rigorous tool to evaluate silent synapse number.

      The authors also perform snRNAseq to identify genes differentially expressed in the spontaneous sleep and sleep deprivation groups. This analysis reveals an intriguing pattern of upregulated genes controlled by HDAC4 and Mef2c, along with synaptic shaping component genes and genes associated with autism spectrum disorder, across cell types in the sleep deprivation group. This unbiased approach identifies candidate genes for follow-up studies. The finding that ASD-risk genes are differentially expressed during SD also raises the intriguing possibility that normal sleep function is disrupted in ASD.

      Weaknesses:

      A major consideration to the interpretation of this study is the use of forced locomotion for sleep deprivation. Measurements are made from motor cortex, and therefore the effects observed could be due to differences in motor activity patterns across groups, rather than lack of sleep per se.

      Experimentally induced lack of sleep always involves differences in motor activity. As previously noted in revision 1, motor learning is unlikely to occur in this paradigm, and inspection of the video (in supplementary materials) shows no repetitive motor behavioral sequences during the sleep deprivation; nor can this be considered exercise, given the very slow speed of treadmill movement employed. The obvious major difference between groups is a lack of sleep per se. (See below in the “Recommendations for authors”, reviewer 1, for comments on localized wake activity inducing localized sleep-need responses.)

      Considering that other groups have failed to find a difference in AMPA/NMDA ratio in mice with different spontaneous sleep/wake histories (Bridi et al., Neuron 2020), confirmation of these findings in a different brain region would greatly strengthen the study.

      The study of Bridi et al., Neuron 2020, is not comparable to our study for several important reasons. First, their compared groups were from different circadian phases (180 degrees out of phase), whereas in our study, the circadian times for each group were matched (ZT = 6 hours). Second, their study did not include experimentally induced sleep loss, whereas it was a focus of ours. Third, spontaneous sleep/wake cannot be accurately matched amongst subjects, whereas in our study, sleep loss was matched exactly between groups.

      We agree that assessment of AMPA/NMDA ratio and silent synapse number in sleep deprived compared to ad libitum sleep in other areas of the neocortex is of great interest and something we hope to pursue. It would not be surprising to find differences as preliminarily reported by Bahl, et al., Nat Commun. 2024 Jan 26;15(1):779. However, such data would not further strengthen our already well supported evidence for the differences we report in the motor cortex.

      The electrophysiological measurements and statistical analyses raise several questions. Input resistance (cutoffs and actual values) are not provided, making it difficult to assess recording quality.

      As stated in our first reply, these data were omitted (an admitted oversight on our part) but are now supplied in the methods section as, “Series resistance values for the recording pipette ranged between 8 and 15 MOhm and experiments with changes larger than 25% were not used for further analyses”. We have now also added the Rs/Rm (as a separate column) for each recorded neuron in table 1.
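
      The exclusion rule quoted above can be expressed as a simple filter. The following is a minimal sketch, assuming the change is measured relative to the series resistance at the start of the recording; the function name and the example values are hypothetical and are not taken from the manuscript.

```python
def keep_recording(rs_start_mohm, rs_end_mohm, max_rel_change=0.25):
    """Return True if series resistance changed by no more than max_rel_change.

    Assumption: the change is expressed relative to the starting value.
    """
    if rs_start_mohm <= 0:
        raise ValueError("series resistance must be positive")
    rel_change = abs(rs_end_mohm - rs_start_mohm) / rs_start_mohm
    return rel_change <= max_rel_change


# Hypothetical (start, end) series resistance pairs in MOhm.
recordings = {"cell_a": (10.0, 12.0), "cell_b": (10.0, 13.0)}
kept = [name for name, (start, end) in recordings.items()
        if keep_recording(start, end)]
```

      Here `cell_a` (20% change) would be retained and `cell_b` (30% change) excluded.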

      Parametric one-way ANOVAs were used, although the data do not appear to be normally distributed.

      We have now removed all the one-way ANOVA tests for clarity (non-parametric tests were previously supplied in addition to the one-way ANOVA tests). Determining significance with the Kruskal-Wallis non-parametric test has not altered the statistical support for our conclusions.
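      For readers unfamiliar with the Kruskal-Wallis test, the sketch below computes the tie-corrected H statistic for three groups (as with CS, SD and RS). It is a minimal pure-Python illustration, not the authors' analysis: the function name and the numbers are hypothetical, and the p-value uses the chi-square approximation with 2 degrees of freedom, which is only approximate for small samples.

```python
import math
from itertools import chain


def kruskal_wallis_3groups(a, b, c):
    """Tie-corrected Kruskal-Wallis H and chi-square (df = 2) p-value."""
    groups = [list(a), list(b), list(c)]
    pooled = sorted(chain.from_iterable(groups))
    n = len(pooled)

    # Assign average ranks to tied values and accumulate the tie term.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        i = j

    rank_sum_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    h /= 1 - tie_term / (n**3 - n)  # tie correction (equals 1 without ties)

    # For 2 degrees of freedom the chi-square survival function is exp(-h / 2).
    p = math.exp(-h / 2)
    return h, p


# Hypothetical values for three fully separated groups
h_stat, p_value = kruskal_wallis_3groups([1, 2, 3], [4, 5, 6], [7, 8, 9])
```

      Because the test uses only ranks, it does not require the normality assumption that a one-way ANOVA does.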

      Reviewer 1 correctly points out that we had not tested for normality of our distributions. The distributions are likely to be normal, but the sample size is too small to make this call confidently for the ratio data, which is why we removed the one-way ANOVAs entirely from table 1.

      Two-way ANOVAs are used to assess AMPA and EPSC amplitudes and failure rates (table 1, tab 2&5) across sleep conditions. As now indicated (table 1, tab 2&5), the distributions of AMPA and NMDA amplitudes and FRs passed the D'Agostino & Pearson test for normality, and QQ plots illustrate this claim.

      In addition, for the AMPA/NMDA and FRR measurements (Figures 1E, F), the SD group (rather than the control sleep group) was used as the control group for post-hoc comparisons, but it is unclear why.

      The label of “control group” is arbitrary. CS and RS groups are similar (sleep density for RS>CS as expected). Since this appears to be confusing, we now compare all groups to one another in table 1, with the same statistical outcome (additional comparison of CS to RS).

      While the data appear in line with the authors' conclusions, the number of mice (3/group) and cells recorded is low, and adding more would better account for inter-animal variability and increase the robustness of the findings.

      Of course, the larger the sample, the better the approximation to the population. Our sample sizes yielded significant differences at the usual p<=0.05 threshold with non-parametric testing. A larger sample size could allow for normality testing of the distributions of the data, but fortunately, this was not necessary to support our conclusions.

      The snRNAseq data are intriguing. However, several genes relevant to the AMPA/NMDA ratio are mentioned, but the encoded proteins would be expected to have variable effects on AMPA/NMDA receptor trafficking and function, making the model presented in Figure 4C oversimplified. A more thorough discussion of the candidate genes and pathways that are upregulated during sleep deprivation, the spatiotemporal/posttranslational control of protein expression, and their effects on AMPA/NMDA trafficking vs function is warranted.

      We have not studied the candidate genes at this point and do not yet understand their potential role(s) in sleep-related AMPA/NMDA functional ratio, only that their expression levels are altered with sleep condition. We agree with the reviewer that the data are intriguing and in need of further investigation. An important first step that can help direct such studies is the identification and preliminary characterization of good candidate genes with respect to their cell type specificity, significance and fold change, as we have done. Their potential roles likely depend on “the spatiotemporal/posttranslational control” and other factors, as reviewer 1 notes.

      Reviewer #2 (Public review):

      Here Vogt et al., provide new insights into the need for sleep and the molecular and physiological response to sleep loss. The authors expand on their previously published work (Bjorness et al., 2020) and draw from recent advances in the field to propose a neuron-centric molecular model for the accumulation and resolution of sleep need and basis of restorative sleep function. While speculative, the proposed model successfully links important observations in the field and provides a framework to stimulate further research and advances on the molecular basis of sleep function. In my review, I highlight the important advances of this current work, the clear merits of the proposed model, and indicate areas of the model that can serve to stimulate further investigation.

      Strengths:

      Reviewer comment on new data in Vogt et al., 2024

      Using classic slice electrophysiology, the authors conclude that wakefulness (sleep deprivation (SD)) drives a potentiation of excitatory glutamate synapses, mediated in large part by "un-silencing" of NMDAR-active synapses to AMPAR-active synapses. Using a modern single nuclear RNAseq approach the authors conclude that SD drives changes in gene expression primarily occurring in glutamatergic neurons. The two experiments combined highlight the accumulation and resolution of sleep need centered on the strength of excitatory synapses onto excitatory neurons. This view is entirely consistent with a large body of extant and emerging literature and provides important direction for future research.

      Consistent with prior work, wakefulness/SD drives an LTP-type potentiation of excitatory synaptic strength on principal cortical neurons. It has been proposed that LTP associated with wake leads to the accumulation of sleep need by increasing neuronal excitability, and by the "saturation" of LTP capacity. This saturation subsequently impairs the capacity for further ongoing learning. This new data provides a satisfying mechanism of this saturation phenomenon by introducing the concept of silent synapses. The new data show that in well-rested mice, a substantial number of synapses are "silent", containing an NMDAR component but not AMPARs. Silent synapses provide a type of reservoir for learning in that activity can drive the un-silencing, increasing the number of functional synapses. SD depletes this reservoir of silent synapses to essentially zero, explaining how SD can exhaust learning capacity. Recovery sleep led to restoration of silent synapses, explaining how recovery sleep can renew learning capacity. In their prior work (Bjorness et al., 2020) this group showed that SD drives an increase in mEPSC frequency onto these same cortical neurons, but without a clear change in pre-synaptic release probability, implying a change in the number of functional synapses. This prediction is now borne out in this new dataset.

      The new snRNAseq dataset indicates the sleep need is primarily seen (at the transcriptional level) in excitatory neurons, consistent with a number of other studies. First, this conclusion is corroborated by an independent, contemporary snRNAseq analysis recently available as a pre-print (Ford et al., 2023 BioRxiv https://doi.org/10.1101/2023.11.28.569011). A recently published analysis on the effects of SD in Drosophila imaged synapses in every brain region in a cell-type dependent manner (Weiss et al., PNAS 2024), concluding that SD drives brain-wide increases in synaptic strength almost exclusively in excitatory neurons. Further, Kim et al., Nature 2022, heavily cited in this work, show that the newly described SIK3-HDAC4/5 pathway promotes sleep depth via excitatory neurons and not inhibitory neurons.

      The new experiments provided in Fig1-3 are expertly conducted and presented. This reviewer has no comments of concern regarding the execution and conclusions of these experiments.

      Reviewer comment on model in Vogt et al., 2024

      To the view of this reviewer the new model proposed by Vogt et al., is an important contribution. The model is not definitively supported by new data, and in this regard should be viewed as a perspective, providing mechanistic links between recent molecular advances, while still leaving areas that need to be addressed in future work. New snRNAseq analysis indicates SD drives expression of synaptic shaping components (SSCs) consistent with the excitatory synapse as a major target for the restorative basis of sleep function. SD induced gene expression is also enriched for autism spectrum disorder (ASD) risk genes. As pointed out by the authors, sleep problems are commonly reported in ASD, but the emphasis has been on sleep amount. This new analysis highlights the need to understand the impact on sleep's functional output (synapses) to fully understand the role of sleep problems in ASD.

      Importantly, SD induced gene expression in excitatory neurons overlaps with genes regulated by the transcription factor MEF2C and HDAC4/5 (Fig. 4). In their prior work, the authors show loss of MEF2C in excitatory neurons abolished the SD transcriptional response and the functional recovery of synapses from SD by recovery sleep. Recent advances identified HDAC4/5 as major regulators of sleep depth and duration (in excitatory neurons) downstream of the recently identified sleep promoting kinase SIK3. In Zhou et al., and Kim et al., Nature 2022, both groups propose a model whereby "sleep-need" signals from the synapse activate SIK3, which phosphorylates HDAC4/5, driving cytoplasmic targeting, allowing for the de-repression and transcriptional activation of "sleep genes". Prior work shows that HDAC4/5 are repressors of MEF2C. Therefore, the "sleep genes" derepressed by HDAC4/5 may be the same genes activated in response to SD by MEF2C. The new model thereby extends the signaling of sleep need at synapses (through SIK3-HDAC4/5) to the functional output of synaptic recovery by expression of synaptic/sleep genes by MEF2C. The model thereby links aspects of expression of sleep need with the resolution of sleep need by mediating sleep function: synapse renormalization.

      Weaknesses:

      Areas for further investigation.

      In the discussion section Vogt et al., explore the links between excitatory synapse strength, arguably the major target of "sleep function", and NREM slow-wave activity (SWA), the most established marker of sleep need. SIK3-HDAC4/5 have major effects on the "depth" of sleep by regulating NREM-SWA. The effects of MEF2C loss of function on NREM SWA activity are less obvious, but clearly impact the recovery of glutamatergic synapses from SD. The authors point out how adenosine signaling is well established as a mediator of SWA, but the links with adenosine and glutamatergic strength are far from clear. The mechanistic links between SIK3/HDAC4/5, adenosine signaling, and MEF2C, are far from understood. Therefore, the molecular/mechanistic links between a synaptic basis of sleep need and resolution with NREM-SWA activity require further investigation.

      Additional work is also needed to understand the mechanistic links between SIK3-HDAC4/5 signaling and MEF2C activity. The authors point out that constitutively nuclear (cn) HDAC4/5 (acting as a repressor) will mimic MEF2C loss of function. This is reasonable, however, there are notable differences in the reported phenotypes of each. Notably, cnHDAC4/5 suppresses NREM amount and NREM SWA but had no effect on the NREM-SWA increase following SD (Zhou et al., Nature 2022).

      We speculate that the effect of cnHDAC4/5 to reduce NREM-SWA together with the reduction of NREM amount may be due to a localized increase in neuronal excitability of arousal centers, which would be expected to mask NREM-SWA. Rebound NREM-SWA may reflect the relative rebound increase of NREM-SWA still present under chronic masking conditions (induced by cnHDAC4/5) of increased arousal system excitability. A similar effect to overcome NREM-SWA masking was reported in a Kcna2 KO mouse (a Shaker homologue) by Douglas, et al. (2007, BMC Biol).

      Loss of MEF2C in CaMKII neurons had no effect on NREM amount and suppressed the increase in NREM-SWA following SD (Bjorness et al., 2020). These instances indicate that cnHDAC4/5 and loss of MEF2C do not exactly match suggesting additional factors are relevant in these phenotypes. Likely HDAC4/5 have functionally important interactions with other transcription factors, and likewise for MEF2C, suggesting areas for future analysis.

      This is not a surprising outcome, since both MEF2C and HDAC4/5 are transcription factors whose function(s) are determined by multiple other factors, a subset of which are relevant to sleep conditions while others are not necessarily relevant to sleep. These factors can include their phosphorylation state, genomic accessibility, and interaction with other transcription factors. All these other factors are known to be both cell type specific and determined by intracellular conditions that, in turn, are affected by extracellular conditions and ligands. We certainly agree that much future analysis is needed.

      One emerging theme may be that the SIK3-HDAC4/5 axis is a major regulator of the sleep state, perhaps stabilizing the NREM state once the transition from wakefulness occurs. MEF2C is less involved in regulating sleep per se, and more involved in executing sleep function, by promoting restorative synaptic modifications to resolve sleep need.

      A useful way to restate the above might be to distinguish between control of arousal levels determining the behavioral states, wake or sleep (including REM sleep) and control of sleep function. The term, sleep, is typically used to describe the behavioral state of sleep that acts as a permissive gate to sleep function (that resolves sleep need). The sleep state should not be conflated with sleep function. There is abundant evidence that control of arousal can be dissociated from sleep need and sleep function.

      Finally, advances in the roles of the respective SIK3-HDAC4/5 and MEF2C pathways point towards transcription of "sleep genes", as clearly indicated in the model of Fig. 4. Clearly more work is needed to understand how the expression of such genes ultimately leads to resolution of sleep need by functional changes at synapses.

      We are in full agreement. We also note the SIK3-HDAC4/5 pathway may have more than one role, i.e., to affect arousal centers to alter behavioral state and, more generally, to control MEF2c’s transcriptional activity thus controlling sleep-related, glutamate, synaptic phenotype.

      What are these sleep genes and how do they mechanistically resolve sleep need? Thus, the current work provides a mechanistic framework to stimulate further advances in understanding the molecular basis for sleep need and the restorative basis of sleep function.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) I appreciate the authors' thoughtful discussion of the use of forced locomotion for their sleep deprivation technique in their response, as well as the additional information that was provided regarding use of the treadmill in the manuscript. However, given that previous studies have failed to find a difference in AMPA/NMDA ratio following spontaneous sleep vs wake, confirmation of the findings in a non-motor brain region with the same SD technique (or confirmation within motor cortex with a different technique, although the authors correctly point out that other techniques also increase locomotor activity) would greatly strengthen the paper.

      Addressed above

      Notably, differences in motor activity patterns, not necessarily overall amount of locomotion, may induce differential synaptic changes between groups. This point at least warrants acknowledgement and discussion, but this has not been incorporated into the text of the manuscript.

      We will incorporate the following into the discussion:

      There is evidence that learning of a motor task or experience of forced altered motor activity can result in localized increases in NREM (slow wave sleep)-slow wave activity in the motor cortex (Huber R, Ghilardi MF, Massimini M, Tononi G. Local sleep and learning. Nature. 2004;430(6995):78-81; Huber et al., 2006). Since SWS-SWA is considered a marker for sleep homeostasis, the increase of SWS-SWA induced by altered motor activity was considered evidence for sleep-related function. Our earlier work has clearly shown that the treadmill method of SD increases frontal cortical SWS-SWA rebound, indicating a sleep-homeostatic process (Bjorness et al., 2016; Bjorness et al., 2020). Furthermore, we have also shown that this means of experimental SD causes glutamate synaptic changes similar to those observed using other means of SD, such as gentle handling (Liu, et al., JoNS 2010).

      (2) The number of mice and cells used for electrophysiology in this study remains low; more animals should be included to account for inter-animal variability.

      For this study, increasing the number of mice and cells will have a p<0.05 chance of altering our conclusions by rejecting the null hypotheses of the electrophysiology findings.

      (3) The additional methodological information provided allays some of my concerns regarding the electrophysiological data. However, information about the input resistance (cutoffs used and/or actual values) is still not provided, which is important for assessing recording quality.

      We have now supplied the experimentally determined input resistance for each neuron used in this study (a separate column in table 1, tabs marked, “data”).

      (4) It is not meaningful to compare raw AMPA or NMDA responses because stimulus electrode placement will differ between cells, potentially activating different numbers of afferents. Presenting these comparisons (Figure 1C) has the potential to mislead the reader.

      This is not misleading (it didn’t mislead reviewer 1) as we described the conditions. As expected by reviewer 1, the variability using “raw AMPA or NMDA responses…” was too great, but did indicate an interaction between receptor responses and sleep condition. This provided (as stated in the results section) the rationale to examine, and to draw conclusions only from, the AMPA/NMDA amplitude and FR ratios.

      (5) I appreciate clarification on the statistics and the authors' response has answered some of my questions. However, this also raises additional questions. What test was used to determine normality (and therefore whether to perform a parametric vs nonparametric test)?

      Described above.

      Why was the FRR data analysis changed to a parametric test, when it does not appear that the data are normally distributed?

      Showing the parametric test was a mistake on our part. There are not enough samples to conclude with confidence that the distributions are normal, as reviewer 1 correctly suspects. However, the non-parametric Kruskal-Wallis tests that we also show in table 1 indicate significant differences between conditions, and the non-parametric, two-stage linear step-up procedure of Benjamini, Krieger and Yekutieli indicates significant differences between CS-SD and RS-SD but not for CS-RS, supporting our conclusions. The (unsupported) parametric tests are now removed from Table 1, leaving only the non-parametric tests.
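      The two-stage linear step-up procedure of Benjamini, Krieger and Yekutieli (BKY) controls the false discovery rate across a family of comparisons. The sketch below is a minimal pure-Python illustration of the published procedure, assuming a target FDR of q = 0.05. The function names and the three p-values are hypothetical and are not taken from table 1.

```python
def bh_reject_count(sorted_pvals, q):
    """Number of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    m = len(sorted_pvals)
    k = 0
    for i, p in enumerate(sorted_pvals, start=1):
        if p <= i * q / m:
            k = i  # keep the largest rank that satisfies the bound
    return k


def bky_two_stage(pvals, q=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    sorted_p = [pvals[i] for i in order]

    # Stage 1: Benjamini-Hochberg at the reduced level q / (1 + q).
    q1 = q / (1 + q)
    r1 = bh_reject_count(sorted_p, q1)

    if r1 == 0:
        k = 0
    elif r1 == m:
        k = m
    else:
        # Stage 2: estimate the number of true nulls and rerun BH.
        m0_hat = m - r1
        k = bh_reject_count(sorted_p, q1 * m / m0_hat)

    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


# Hypothetical pairwise p-values, in the order CS-SD, RS-SD, CS-RS
decisions = bky_two_stage([0.004, 0.009, 0.60])
```

      With these made-up inputs the first two comparisons are flagged and the third is not, which is the pattern of outcomes described in the response above.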

      Why were post-hoc tests chosen to compare to a control group rather than all pairwise comparisons,

      We now provide all-pairwise post-hoc comparisons using the BKY analysis, which give the same results.

      and why was the SD rather than CS group used as the control in Figures 1E and F?

      Why were different post-hoc tests chosen for the data in Figures 1E, F?

      There was no need for this, and we now show only the statistics that are used to draw our conclusions for the AMPA/NMDA EPSC ratio data shown in Figure 1E and the Failure Rate Ratio data shown in Figure 1F (the conclusions are supported by the non-parametric post-hoc test and remain unchanged).

      (6) Genes in the SSC, ASD, Mef2cKO, and HD4cn categories are almost exclusively upregulated in the SD group compared to the CS group (Figure 4A). As the authors point out in their response, "No claim of mechanism linking the changed expression to altered AMPAR or NMDAR activity can be made at this point," largely due to the fact that we do not know the spatiotemporal or posttranslational modification patterns of the translated proteins, and how they affect receptor trafficking vs function. This is in agreement with my original point: as written (and as illustrated in Figure 4C), the manuscript implies that upregulation during SD increases the AMPA/NMDA ratio via receptor trafficking,

      The model indicates a likely (but not necessarily exclusive) role for AMPA/NMDA trafficking to explain the functional electrophysiological data that we do report and which is not in dispute. The SSC-DEGs in ExIT cells are consistent with sleep-altered AMPA/NMDA trafficking but remain only a correlation. However, the point is taken: Figure 4C has been revised to reflect only what we have observed electrophysiologically, and the speculated mechanism(s) mediated by observed SSC-DEGs are illustrated with “?’s”.

      while in reality the picture is likely much more complicated, and therefore a more thorough discussion is warranted. Some discussion was provided in the authors' response but does not appear to have been incorporated into the text or Figure 4C.

      As indicated above, the proposed model is changed in Figure 4C to indicate more explicitly which aspects reflect our electrophysiological data and which reflect only an association of observations.

      Minor comments:

      (1) Please justify only using male mice

      We had to start somewhere with our limited resources. Our intentions are to follow up with similar experiments using female mice, should funding be realized.

      (2) The model in Figure 4C is oversimplified and remains problematic, for the reasons stated in comment #6, above.

      See responses above.

      (3) Figure 4D remains confusing

      We agree. The unnecessary addition of adenosine effects on cholinergic arousal centers (experimentally well supported) has been removed from the figure to provide a more focused indication of how SWS-SWA can be related to MEF2C and/or to ADORA1 activation through reduction of glutamate synaptic strength. ADORA1 activation reduces glutamate synaptic activity through pre- and postsynaptic inhibition, whereas MEF2C activation is essential for the sleep-elicited reduction of glutamate EPSCs. Reduced glutamate synaptic strength, whatever the cause, is associated with increased SWS-SWA.

    1. Author response:

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The study by Aguirre-Botero et al. shows the dynamics of 3D11 anti-CSP monoclonal antibody (mAb) mediated elimination of rodent malaria Plasmodium berghei (Pb) parasites in the liver. The authors show that the anti-CSP mAb could protect against intravenous (i.v.) Pb sporozoite challenge along with the cutaneous challenge, but requires higher concentration of antibody. Importantly, the study shows that the anti-CSP mAb not only affects sporozoite motility, sinusoidal extravasation, and cell invasion but also partially impairs the intracellular development inside the liver parenchyma, indicating a late effect of this antibody during liver stage development. While the study is interesting and conducted well, the only novel yet very important observation made in this manuscript is the effect of the anti-CSP mAb on liver stage development.

      Major

      This observation is highlighted in the manuscript title but is supported by only limited data. As such, it needs to be substantiated and a mechanism should be investigated. The phenomenon of intracellular effects of the anti-CSP mAb should be analyzed in much more detail. For example, can the authors demonstrate uptake of the Ab together with the parasite during hepatocyte invasion? What cellular mechanism leads to elimination?

      Lines 234 - 243; 308 - 325: These results are the gist of the entire study and also defined the title of the manuscript. Thus, it would be premature to claim a substantial effect of the 3D11 antibody in late killing of the parasite in the infected hepatocytes just by looking at the decreased GFP fluorescence. The authors need to at least verify the fitness of the liver stages by measuring the size of the developing parasites as well as using different parasite-specific markers (UIS4, MSP1, HSP70 etc.) in immunofluorescence assays on the infected liver sections and in vitro infections.

      We greatly appreciate the comments. We have taken the suggestions into consideration and deepened the characterization of 3D11's late killing of parasites. We first analyzed the presence of 3D11 in the intracellular parasite after the invasion and compared it with the CSP expression on the surface of control parasites (new Fig. 4F). Next, we tested a potential action of 3D11 added in the cell culture after the invasion (new Fig. 4G). The two new panels and the text accompanying them are shown below.

      “Post-invasion labeling of 3D11 bound to the membrane of intracellular parasites revealed a strong staining surrounding the parasite at 2 and 15h, but only punctate traces of 3D11 at 44h (Figure 4F, 3D11, 3D11). Of note, CSP was detected surrounding the control parasites at all time-points, indicating that the lack of staining at 44h is not due to a decrease in the CSP amount on the parasite surface (Figure 4F, CSP, Control). To evaluate the potential post-invasion entry of 3D11 into the PV of infected cells and subsequent neutralization of intracellular parasites, we incubated invaded cells from 2 to 44 h with 3D11, but no effect on the parasite intracellular development was observed (Figure 4G, 2h p.i.). 3D11 incubated for 2 h with sporozoites and cells elicited, as expected, a dose-dependent inhibition of parasite development. Altogether, our results indicate that the late inhibition of parasite development is already achieved at 15h and likely caused by antibodies dragged inside cells bound to sporozoites before or during the invasion.”

      Finally, we better characterized the parasite loss of fitness caused by 3D11 in infected cells by quantifying the parasite size, GFP intensity and the presence and intensity of UIS4, a parasitophorous vacuole membrane developmental marker at 2, 4 and 44h as described below in the new figure 5 and accompanying text.

      “To further characterize the killing of intracellular parasites by 3D11 in HepG2 cells, we next evaluated the expression of the parasitophorous vacuole membrane (PVM) marker, UIS4 (ref. 37), to infer the parasite intracellular development at 2, 4 and 44h. HepG2 cells were incubated with Pb-GFP expressing sporozoites in the absence (Control, Figure 5) or presence of 1.25 µg/mL of 3D11 during the first two hours of incubation (3D11, Figure 5). The chosen 3D11 concentration led to ~50% decrease in cell invasion (Figure 4C, 2h) and ~30% decrease in the post-invasion number of EEFs (Figure 4D), leaving enough parasites to be analyzed by microscopy. To distinguish between extracellular and intracellular parasites at 2h, washed and fixed samples were incubated with mouse 3D11 mAb (1µg/mL) and revealed with a fluorescent anti-mouse secondary antibody (Figure 5A, 3D11 in blue). Samples were then permeabilized and incubated with a goat anti-UIS4 polyclonal antibody revealed with a fluorescent anti-goat secondary antibody (Figure 5A, UIS4 in red). DNA was stained with Hoechst (Figure 5A, DNA in white).

      Extracellular GFP+ sporozoites were identified by their 3D11+UIS4- phenotype (Figure 5A, 2h, extracellular). Conversely, intracellular parasites were identified by their 3D11- phenotype and stained positive or negative for UIS4 (Figure 5A, 2h and 44h, intracellular). UIS4+ PVM is normally associated with a productive cell infection (ref. 37). However, a small number of EEFs can develop in the absence of UIS4 (ref. 37), likely inside the host cell nucleus (Figure 5A, 44h, intranuclear).

      In the control and 3D11-treated groups, the percentage of intracellular UIS4- parasites decreased 2 to 3-fold from 2 to 44h, as expected of a parasite population negative for a marker of productive infection (Figure 5B). However, while at 2h in the control group, this population represented 14% of intracellular parasites, in the 3D11-treated group, it reached 48% (Figure 5B). This ~3-fold increase in the UIS4-negative population could explain the late killing of intracellular sporozoites by 3D11. Whether this population is constituted by intracellular transmigratory sporozoites lacking a PVM or parasites surrounded by a PVM, but incapable of secreting UIS4, still needs to be determined. At 44h, surviving EEFs in the 3D11-treated samples presented an area and UIS4 staining intensity similar to those of control parasites (Figure 5C, D). However, as observed by flow cytometry (Figure 4D), the GFP intensity of 3D11-treated parasites was significantly lower than that of control EEFs, indicating that 3D11 can somehow affect protein expression with undetermined effects in the genesis of red blood cell infecting stages.”

      Minor

      • Line 44 - 43: The statement is applicable only to the rodent infecting Plasmodium parasites. The authors need to clarify that.

      This is an important clarification. We have modified the text that now reads:

      “The sporozoite surface is covered by a dense coat of the circumsporozoite protein (CSP), shown to be an immunodominant protective antigen using a rodent malaria model”

      • Line 68: Replace the second 'against' after the CSP with 'of'.

      It is done.

      • Line 141 - 143: The 3D11 mAb does affect the homing and killing in the blood of cutaneous injected sporozoites. The authors need to clearly state that the statement is true only for i.v. injected sporozoites.

      Thank you for the comment. Now the text reads:

      “Altogether, these data indicate that 3D11 rather than having an early effect on i.v. inoculated sporozoites in the blood circulation, e.g. by inhibiting the homing or killing the parasite in the blood, requires more than 4 h to eliminate most parasites in the liver.”

      • Figure 3B: The numbers of sporozoites detected in the experiment varies from 0 h (line 172) to 2 h (line 184). Therefore, the numbers need to be mentioned on all the bars of each timepoint.

      We have now added the numbers at the top of the graph from Figure 3B.

      • Figure 3C: If the authors have used flk1-GFP mice, then how well they were able to detect the Pb-PfCSP GFP parasites in the vessel vs. parenchyma in the intravital imaging? The representative images for Pb-PfCSP GFP should also be included.

      Since 3D11 does not target PbPf parasites, most of them are motile in the movies, making them easily distinguishable from the endothelial cells. In addition, the stronger GFP intensity of sporozoites makes them detectable in the sinusoids. Representative images were added in the new Figure S3.

      • It is not mentioned anywhere how the viability of the sporozoites was determined. This has to be described especially in the methods section.

      • Also, the flow acquisition and data analysis of the sporozoites and infected HepG2 cells must be described in the method section.

      We briefly mentioned it in the results (line 228- 230): “In addition, by comparing the total number of recovered GFP+ sporozoites at 2 h in the two studied conditions, we measured the early lethality (%viable sporozoites, Figure 4B) of the anti-CSP Ab on the extracellular forms of the parasite (Figure 4A).”

      A more detailed description has been added in the methods section that now reads:

      “After 2 h, the supernatant was collected, and the culture was washed 2x with 0.5 volume of PBS. The cells were subsequently trypsinized. The supernatant plus the washing steps and the trypsinized cells were analyzed by flow cytometry to quantify the amount of GFP+ events inside and outside cells (Figure 3A and Figure S4). Viability was then quantified by the sum of the total number of sporozoites (GFP+ events) in the supernatant, inside and outside the cells. We calculated the percentage of parasite viability by dividing the average of the total number of sporozoites in the treated samples by the average in controls using three technical replicates for each condition. Additionally, we quantified the percentage of infected cells using the total number of GFP+ events in the HepG2 gate (Figure S4). To compare the biological replicates, we further normalized to the control of each experiment. For the samples used to analyze parasite development, the cells were incubated for 15 or 44 h after sporozoite addition, and the medium was changed after 2 and 24 h. The cells were trypsinized and the percentage of intracellular parasites was determined by flow cytometry as described above (Figure S4). The prolonged effect between 2 h and 15/44 h was calculated by normalizing the percentage of infected cells at 15/44 h to that of 2 h. For all flow cytometry measurements, the same volume was acquired.”
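      The calculations described in this methods paragraph reduce to a few ratios. The sketch below is a minimal illustration in Python. The function names and all event counts are hypothetical, chosen only to show the arithmetic, and it is not the authors' analysis script.

```python
from statistics import mean


def percent_viability(treated_totals, control_totals):
    """Mean total GFP+ events in treated replicates as a % of the control mean."""
    return 100 * mean(treated_totals) / mean(control_totals)


def normalize_to_control(value, control_value):
    """Express a measurement relative to the control of the same experiment."""
    return value / control_value


def prolonged_effect(pct_infected_late, pct_infected_2h):
    """Percentage of infected cells at 15 or 44 h normalized to that at 2 h."""
    return pct_infected_late / pct_infected_2h


# Hypothetical total GFP+ event counts from three technical replicates
treated = [50, 60, 40]
control = [100, 110, 90]
viability = percent_viability(treated, control)

# Hypothetical percentages of infected cells at 2 h and 44 h
late_ratio = prolonged_effect(pct_infected_late=1.5, pct_infected_2h=3.0)
```

      Acquiring the same volume for every sample, as stated in the methods, is what allows raw event counts to be compared directly in this way.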

      • Figure 4: The flow layouts should be included for at least comparing the 0 vs. 5 μg/ml of 3D11 mAb concentrations.

      Flow layouts were added to Supplementary Figure 4.

      • Line 651 (Figure S1 legend): Typographical error '14'.

      Thank you for noticing. We corrected it.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Aguirre-Botero and collaborators report on the dynamics of Plasmodium parasite elimination in the liver using the 3D11 anti-CSP monoclonal antibody (mAb). By using microscopy and bioluminescence imaging in the P. berghei rodent malaria model, the authors first demonstrate that higher antibody concentrations are required for protection against intravenous sporozoite challenge, when compared to cutaneous challenge, which is not surprising. The study also shows that the 3D11 mAb reduces sporozoite motility, impairs hepatic sinusoidal barrier crossing, and more relevantly inhibits intracellular development of liver stages through its cytotoxic activity. These findings highlight the role of this specific monoclonal antibody, 3D11 mAb against CSP, in targeting sporozoites in the liver.

      Major Comments

      The study provides valuable insights into the mechanisms of protection conferred by the 3D11 anti-CSP monoclonal antibody against P. berghei sporozoites, and these findings allow the field to speculate that other monoclonal antibodies against CSP of P. falciparum may act similarly. However, an important experiment is missing that would significantly strengthen the conclusions. Specifically, the authors should perform experiments where the monoclonal antibody is added immediately after the sporozoites have completed invasion. This should be done both in vitro and in vivo to show whether the antibody has any effect on intracellular development of liver stages when added after invasion.

      While the claims are generally supported by the data presented, to comprehensively conclude the late cytotoxic effects of 3D11, the additional experiment of post-invasion antibody application is relevant. This would help determine if the observed effects are due to the antibody's action during invasion or its continued action post-invasion.

      The data and methods are presented in a manner that allows for reproducibility. The use of microscopy and bioluminescence imaging is well-documented. The experiments appear adequately replicated, and statistical analyses are appropriate.

      We thank reviewer 2 for these important suggestions. To be sure that the effect might not come from the internalization of the antibodies after sporozoite invasion, we tested the amount of 3D11 bound to the parasite following invasion (new Fig. 4F) and the potential post-invasion neutralizing effect of 3D11 in vitro. The results obtained are presented below.

      “Post-invasion labeling of 3D11 bound to the membrane of intracellular parasites revealed a strong staining surrounding the parasite at 2 and 15h, but only punctual traces of 3D11 at 44h (Figure 4F, 3D11, 3D11). Of note, CSP was detected surrounding the control parasites at all time-points indicating that the lack of staining at 44h is not due to a decrease in the CSP amount on the parasite surface (Figure 4F, CSP, Control).  To evaluate the potential post-invasion entry of 3D11 into the PV of infected cells and posterior neutralization of intracellular parasites, we incubated invaded cells from 2 to 44 h with 3D11, but no effect on the parasite intracellular development was observed (Figure 4G, 2h p.i.). 3D11 incubated for 2 h with sporozoites and cells elicited, as expected, a dose-dependent inhibition of parasite development. Altogether, our results indicate that the late inhibition of parasite development is already achieved at 15h and likely caused by antibodies dragged inside cells bound to sporozoites before or during the invasion.”

      Minor Comments

      The text and figures are clear and accurate. Some minor typographical errors should be corrected.

      Thank you for the remark; we have verified the text again to remove typographical errors.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Aguirre-Botero et al have studied the effect of a potent monoclonal antibody against the circumsporozoite protein, the major surface protein of the malaria sporozoite. This is an elegantly designed, performed, and analyzed study. They have efficiently delineated the mode of action of anti-CSP repeat mAb and confirmed previous in vitro work (not cited) that demonstrated the same intracellular effect. 

      Specific comments

      Line 51: The authors claim a correlation between high antibody levels and protection. However, they did not provide direct proof that these antibodies were responsible for protection, nor did they establish a cut-off level of anti-CSP antibodies that would distinguish between protected and unprotected individuals.

      We thank reviewer 3 for the comments. Indeed, we agree with reviewer 3 that these are correlative studies in which causality cannot be established. We modified the ensuing sentence to specify the causality between anti-CSP mAbs and in vivo protection against sporozoite infection. Now the text reads:

      “Extensive research has demonstrated a positive correlation between high levels of anti-CSP antibodies (Abs) induced by the RTS,S/AS01 vaccine and efficacy against malaria(11-13). Remarkably, anti-CSP monoclonal Abs (mAbs) have been proven to protect in vivo against malaria in various experimental settings, including mice(14-21), monkeys(23), and humans(24-26)”

      Line 326: The late intrahepatic effect of mAb against the CSP repeat has been previously reported (see Figure 2, Nudelman et al, J Immunol, 1989). The effect was shown to affect the transition from liver trophozoites to liver schizonts. This study should be cited and discussed.

      Thank you for this important remark. We included this seminal reference and now the modified text reads:

      “Notably, a similar effect has been previously reported using sera from mice immunized with PfCSP or mAb against P. yoelii (Py) CSP. Incubation of Pf or Py sporozoites with the immune sera or mAbs not only affected sporozoite invasion in vitro but continued to affect intracellular forms for several days after invasion(38,39). Additionally, using anti-PfCSP sera, it was also observed that late EEFs from sera-treated sporozoites had abnormal morphology(38). Altogether, it was thus concluded that the anti-CSP Abs present in the sera had a long-term effect on the parasites(38,39).”

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Kaya et al. studies the effect of food consumption on hippocampal sharp wave ripples (SWRs) in mice. The authors use multiple foods and forms of food delivery to show that the frequency and power of SWRs increase following food intake, and that this effect depends on the caloric content of food. The authors also studied the effects of the administration of various food-intake-related hormones on SWRs during sleep, demonstrating that ghrelin, but not GLP-1, insulin, or leptin, negatively affects SWR rate and power. Finally, the authors use fiber photometry to show that GABAergic neurons in the lateral hypothalamus increase activity during a SWR event.

      Strengths:

      The experiments in this study seem to be well performed, and the data are well presented, visually. The data support the main conclusions of the manuscript that food intake enhances hippocampal SWRs. Taken together, this study is likely to be impactful to the study of the impact of feeding on sleep behavior, as well as the phenomena of hippocampal SWRs in metabolism.

      Weaknesses:

      Details of experiments are missing in the text and figure legends. Additionally, the writing of the manuscript could be improved.

      We thank the reviewer for their favorable assessment of the work and its potential impact. We will add all requested details in the text and figure legends and will revise the wording of the manuscript to improve its clarity.

      Reviewer #2 (Public review):

      Summary:

      Kaya et al uncover an intriguing relationship between hippocampal sharp wave-ripple production and peripheral hormone exposure, food intake, and lateral hypothalamic function. These findings significantly expand our understanding of hippocampal function beyond mnemonic processes and point a direction for promising future research.

      Strengths:

      Some of the relationships observed in this paper are highly significant. In particular, the inverse relationship between GLP1/Leptin and Insulin/Ghrelin is compelling, as this aligns well with opposing hormone functions on satiety.

      Weaknesses:

      I would be curious if there were any measurable behavioral differences that occur with different hormone manipulations.

      We thank the reviewer for their favorable assessment of the work and its contribution to our understanding of non-mnemonic hippocampal function. Whether there are behavioral differences that occur following administration of the different hormones is a great question, yet unfortunately our study design did not include fine behavioral monitoring to the degree that would allow answering it. While some previous studies have partially addressed the behavioral consequences of the delivery of these hormones (we will include a reference to these studies in the revised manuscript), how these changes may interact with the hippocampal and hypothalamic effects we observe is a very interesting next step.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Kaya et al. explores the effects of feeding on sharp wave-ripples (SWRs) in the hippocampus, which could reveal a better understanding of how metabolism is regulated by neural processes. Expanding on prior work that showed that SWRs trigger a decrease in peripheral glucose levels, the authors further tested the relationship between SWRs and meal consumption by recording LFPs from the dorsal CA1 region of the hippocampus before and after meal consumption. They found an increase in SWR magnitude during sleep after food intake, in both food restricted and ad libitum fed conditions. Using fiber photometry to detect GABAergic neuron activity in the lateral hypothalamus, they found increased activity locked to the onset of SWRs. They conclude that the animal's satiety state modulates the amplitude and rate of SWRs, and that SWRs modulate downstream circuits involved in regulating feeding. These experiments provide an important step forward in understanding how metabolism is regulated in the brain. However, currently, the paper lacks sufficient analyses to control for factors related to sleep quality and duration; adding these analyses would further support the claim that food intake itself, as opposed to sleep quality, is primarily responsible for changes in SWR activity. Adding this, along with some minor clarifications and edits, would lead to a compelling case for SWRs being modulated by a satiety state. The study will likely be of great interest in the field of learning and memory while carrying broader implications for understanding brain-body physiology.

      Strengths:

      The paper makes an innovative foray into the emerging field of brain-body research, asking how sharp wave-ripples are affected by metabolism and hunger. The authors use a variety of advanced techniques including LFP recordings and fiber photometry to answer this question. Additionally, they perform comprehensive and logical follow-up experiments to the initial food-restricted paradigm to account for deeper sleep following meal times and the difference between consumption of calories versus the experience of eating. These experiments lay the groundwork for future studies in this field, as the authors pose several follow-up questions regarding the role of metabolic hormones and downstream brain regions.

      We thank the reviewer for their appreciation and constructive review of the work.

      Weaknesses:

      Major comments:

      (1) The authors conclude that food intake regulates SWR power during sleep beyond the effect of food intake on sleep quality. Specifically, they made an attempt to control for the confounding effect of delta power on SWRs through a mediation analysis. However, a similar analysis is not presented for SWR rate. Moreover, this does not seem to be a sufficient control. One alternative way to address this confound would be to subsample the sleep data from the ad lib and food restricted conditions (or high calorie and low calorie, etc), to match the delta power in each condition. When periods of similar mean delta power (i.e. similar sleep quality) are matched between datasets, the authors can then determine if a significant effect on SWR amplitude and rate remains in the subsampled data.

      This is an important point that we believe we addressed in a few complementary ways. First, the mediation analysis we implemented measures the magnitude and significance of the contribution of food on SWR power after accounting for the effects of delta power, showing a highly significant food-SWR contribution. While the objective of subsampling is similar, mediation is a more statistically robust approach as it models the relationship between food, SWR power, and delta power in a way that explicitly accounts for the interdependence of these variables. Further, subsampling introduces the risk of losing statistical power by reducing the sample size, due to exclusion of data that might contain relevant and valuable information. Mediation analysis, on the other hand, uses the full dataset and retains statistical power while modeling the relationships between variables more holistically. However, as we were not satisfied with a purely analytical approach to test this issue, we carried out a new set of experiments in ad-libitum fed mice, where there is no potential issue of food restriction impairing sleep quality in the pre-sleep session. In these conditions food amount also significantly correlated with, and showed significant mediation of, the SWR power change. Finally, we acknowledge and discuss this point in the Discussion, highlighting that given the known relationship between cortical delta and SWRs, it is challenging to fully disentangle these signals.

      (2) Relatedly, are the animals spending the same amount of time sleeping in the ad lib vs. food restricted conditions? The amount of time spent sleeping could affect the probability of entering certain stages of sleep and thus affect SWR properties. A recent paper (Giri et al., Nature, 2024) demonstrated that sleep deprivation can alter the magnitude and frequency of SWRs. Could the authors quantify sleep quantity and control for the amount of time spent sleeping by subsampling the data, similar to the suggestion above?

      We will include a comparison of sleep amount in the revised manuscript.

      Additionally, we will add details to the Methods section that were missing in the original submission that are relevant to this point. Specifically, within the sleep sessions, the ongoing sleep states were scored using the AccuSleep toolbox (https://github.com/zekebarger/AccuSleep) using the EEG and EMG signals. NREM periods were detected based on high EEG delta power and low EMG power, REM periods were detected based on high EEG theta power and low EMG power, and Wake periods were detected based on high EMG power. Importantly, only NREM periods were included for subsequent SWR detection, quantification and analyses (in particular, reported SWR rates reflect the number of SWRs per second of NREM sleep).
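
      The NREM-restricted rate described above can be sketched as follows. This is a minimal illustration that does not use AccuSleep itself: it assumes sleep scoring has already produced non-overlapping NREM intervals, and the function name `swr_rate_in_nrem`, the interval format, and the timestamps are hypothetical.

```python
def swr_rate_in_nrem(swr_times, nrem_intervals):
    """Return the number of SWRs per second of NREM sleep.

    swr_times: SWR event times in seconds.
    nrem_intervals: non-overlapping (start, end) pairs in seconds.
    """
    total_nrem = sum(end - start for start, end in nrem_intervals)
    if total_nrem <= 0:
        raise ValueError("no NREM time available; rate is undefined")
    # Count only events that fall inside a scored NREM interval.
    n_in_nrem = sum(
        any(start <= t < end for start, end in nrem_intervals)
        for t in swr_times
    )
    return n_in_nrem / total_nrem


# Hypothetical session: two NREM bouts (200 s total) and seven detected events.
nrem = [(0.0, 100.0), (150.0, 250.0)]
swrs = [5.0, 20.0, 99.0, 120.0, 160.0, 240.0, 260.0]
rate = swr_rate_in_nrem(swrs, nrem)
print(f"{rate:.3f} SWRs per second of NREM")
```

      Normalizing by NREM duration rather than by total session length keeps the rate comparable between sessions in which the animals slept for different amounts of time.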

      (3) Plot 5I only reports significance but does not clearly show the underlying quantification of LH GABAergic activity. Upon reading the methods for how this analysis was conducted, it would be informative to see a plot of the pre-SWR and post-SWR integral values used for the paired t-test whose p-values are currently shown. For example, these values could be displayed as individual points overlaid on a pair of box-and-whisker plots of the pre- and post-distribution within the session (perhaps for one example session per mouse with the p-value reported, to supplement a plot of the distribution of p-values across sessions and mice). If these data are non-normal, the authors should also use a non-parametric statistical test.

      We will include this quantification and visual representation in the revised manuscript.

      Minor comments:

      (4) A brief explanation (perhaps in the discussion) of what each change in SWR property (magnitude, rate, duration) could indicate in the context of the hypothesis may be helpful in bridging the fields of metabolism and memory. For example, by describing the hypothesized mechanistic consequence of each change, could the authors speculate on why ripple rate may not increase in all the instances where ripple power increases after feeding? Why do the authors speculate that ripple duration does not increase, given that prior work (Fernandez-Ruiz et al. 2019) has shown that prolonged ripples support enhanced memory?

      We will include a discussion of these points in the revised manuscript.

      (5) The authors suggest that "SWRs could modulate peripheral metabolism" as a future implication of their work. However, the lack of clear effects from GLP-1, leptin and insulin complicates this interpretation. It might be informative for readers if the authors expanded their discussion of what specific role they speculate that SWRs could play in regulating metabolism, given these negative results.

      While we provided potential explanations for the lack of effects of the hormone administrations, we will further elaborate on this point in the revised manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors identified that

      (1) CDK4/6i treatment attenuates the growth of drug-resistant cells by prolongation of the G1 phase; 

      (2) CDK4/6i treatment results in an ineffective Rb inactivation pathway and suppresses the growth of drug-resistant tumors;

      (3) Addition of endocrine therapy augments the efficacy of CDK4/6i maintenance;

      (4) Addition of CDK2i with CDK4/6 treatment as second-line treatment can suppress the growth of resistant cells;

      (5) Cyclin E is a key driver of resistance to CDK4/6 and CDK2 inhibition.

      Strengths:

      To prove their complicated proposal, the authors employed orchestration of several kinds of live cell markers, timed in situ hybridization, IF and Immunoblotting. The authors strongly recognize the resistance of CDK4/6 + ET therapy and demonstrated how to overcome it.

      Weaknesses:

      The authors need to underscore their proposed results from what is to be achieved by them and by other researchers. 

      Thank you for your thoughtful review and for highlighting both the strengths and weaknesses of our manuscript. We appreciate your recognition of the methodological rigor and the significance of our findings in addressing resistance to CDK4/6 inhibitors combined with endocrine therapy.

      To address your concern regarding the need to delineate our results from those achieved by other researchers, we will incorporate clarifications in the revised manuscript. Specifically, we will:

      (1) Clearly distinguish our novel contributions from prior findings in the field.

      (2) Explicitly cite and discuss relevant studies to contextualize our work, ensuring that our contributions are appropriately framed within the broader body of knowledge.

      These revisions will enhance the transparency and impact of our manuscript, as well as highlight the originality and significance of our findings. Thank you again for your constructive feedback.

      Reviewer #2 (Public review):

      Summary:

      This study elucidated the mechanism underlying drug resistance induced by CDK4/6i as a single agent and proposed a novel and efficacious second-line therapeutic strategy. It highlighted the potential of combining CDK2i with CDK4/6i for the treatment of HR+/HER2- breast cancer.

      Strengths:

      The study demonstrated that CDK4/6 induces drug resistance by impairing Rb activation, which results in diminished E2F activity and a delay in G1 phase progression. It suggests that the synergistic use of CDK2i and CDK4/6i may represent a promising second-line treatment approach. Addressing critical clinical challenges, this study holds substantial practical implications.

      Weaknesses: 

      (1) Drug-resistant cell lines: Was a drug concentration gradient treatment employed to establish drug-resistant cell lines? If affirmative, this methodology should be detailed in the materials and methods section. 

      We greatly appreciate the reviewer for raising this important question. In the revised manuscript, we will update the methods section to include a detailed description of how the drug-resistant cell lines were developed. Specifically, we will clarify whether a drug concentration gradient treatment was employed and provide step-by-step details to ensure reproducibility.

      (2) What rationale informed the selection of MCF-7 cells for the generation of CDK6 knockout cell lines? Supplementary Figure 3. A indicates that CDK6 expression levels in MCF-7 cells are not notably elevated. 

      We appreciate the reviewer’s insightful question about the rationale for selecting MCF-7 cells to generate CDK6 knockout cell lines. This choice was guided by prior studies highlighting the significant role of CDK6 in mediating resistance to CDK4/6 inhibitors (1-4). Moreover, we observed a 4.6-fold increase in CDK6 expression in CDK4/6 inhibitor-resistant MCF-7 cells compared to their drug-naïve counterparts (Supplementary Figure 3A). While we did not detect notable differences in CDK4/6 activity between wild-type and CDK6 knockout cells under CDK4/6 inhibitor treatment, these findings point to a potential non-canonical function of CDK6 in conferring resistance to CDK4/6 inhibitors.

      (3) For each experiment, particularly those involving mice, the author must specify the number of individuals utilized and the number of replicates conducted, as detailed in the materials and methods section. 

      We sincerely thank the reviewer for bringing this to our attention. In the revised manuscript, we will provide explicit details regarding the number of replicates and mice used for each experiment. This information will be included in the materials and methods section, figure legends, and relevant text to ensure transparency and clarity.

      (4) Could this treatment approach be extended to triple-negative breast cancer? 

      We greatly appreciate the reviewer’s inquiry about extending our findings to triple-negative breast cancer (TNBC). Based on our data presented in Figure 1 and Supplementary Figure 2, which include the TNBC cell line MDA-MB-231, we anticipate that the benefits of maintaining CDK4/6 inhibitors could indeed be applied to TNBC with an intact Rb/E2F pathway.

      Reviewer #3 (Public review):

      Summary:

      In their manuscript, Armand and colleagues investigate the potential of continuing CDK4/6 inhibitors or combining them with CDK2 inhibitors in the treatment of breast cancer that has developed resistance to initial therapy. Utilizing cellular and animal models, the research examines whether maintaining CDK4/6 inhibition or adding CDK2 inhibitors can effectively control tumor growth after resistance has set in. The key findings from the study indicate that the sustained use of CDK4/6 inhibitors can slow down the proliferation of cancer cells that have become resistant, and the combination of CDK2 inhibitors with CDK4/6 inhibitors can further enhance the suppression of tumor growth. Additionally, the study identifies that high levels of Cyclin E play a significant role in resistance to the combined therapy. These results suggest that continuing CDK4/6 inhibitors along with the strategic use of CDK2 inhibitors could be an effective strategy to overcome treatment resistance in hormone receptor-positive breast cancer.

      Strengths:

      (1) Continuous CDK4/6 Inhibitor Treatment Significantly Suppresses the Growth of Drug-Resistant HR+ Breast Cancer: The study demonstrates that the continued use of CDK4/6 inhibitors, even after disease progression, can significantly inhibit the growth of drug-resistant breast cancer.

      (2) Potential of Combined Use of CDK2 Inhibitors with CDK4/6 Inhibitors: The research highlights the potential of combining CDK2 inhibitors with CDK4/6 inhibitors to effectively suppress CDK2 activity and overcome drug resistance.

      (3) Discovery of Cyclin E Overexpression as a Key Driver: The study identifies overexpression of cyclin E as a key driver of resistance to the combination of CDK4/6 and CDK2 inhibitors, providing insights for future cancer treatments.

      (4) Consistency of In Vitro and In Vivo Experimental Results: The study obtained supportive results from both in vitro cell experiments and in vivo tumor models, enhancing the reliability of the research.

      (5) Validation with Multiple Cell Lines: The research utilized multiple HR+/HER2- breast cancer cell lines (such as MCF-7, T47D, CAMA-1) and triple-negative breast cancer cell lines (such as MDA-MB-231), validating the broad applicability of the results.

      Weaknesses:

      (1) The manuscript presents intriguing findings on the sustained use of CDK4/6 inhibitors and the potential incorporation of CDK2 inhibitors in breast cancer treatment. However, I would appreciate a more detailed discussion of how these findings could be translated into clinical practice, particularly regarding the management of patients with drug-resistant breast cancer. 

      We greatly appreciate this opportunity to further contextualize our findings within clinical practice. In the revised manuscript, we will expand the discussion to explore how the identified mechanisms can inform patient stratification and therapeutic combinations. We will also highlight the potential of integrating CDK2 inhibitors with continued CDK4/6 inhibition as a second-line strategy for HR+ breast cancer patients who exhibit resistance to CDK4/6 inhibitors, leveraging insights from current and ongoing clinical trials. This will provide a clearer framework for translating our findings into actionable therapeutic strategies.

      (2) While the emergence of resistance is acknowledged, the manuscript could benefit from a deeper exploration of the molecular mechanisms underlying resistance development. A more thorough understanding of how CDK2 inhibitors may overcome this resistance would be valuable. 

      Thank you for this insightful suggestion. In the revised manuscript, we will delve deeper into the molecular mechanisms by which CDK2 inhibitors counteract resistance to CDK4/6 inhibitors and endocrine therapy. We will emphasize the role of the non-canonical Rb inactivation pathway and upregulated transcriptional activity in reactivating CDK2, which contribute to resistance under CDK4/6 inhibition. Furthermore, we will discuss how dual inhibition of CDK4/6 and CDK2 effectively suppresses this resistance pathway, offering a mechanistic rationale for the therapeutic potential of this combination strategy.

      (3) The manuscript supports the continued use of CDK4/6 inhibitors, but it lacks a discussion on the long-term efficacy and safety of this approach. Additional studies or data to support the safety profile of prolonged CDK4/6 inhibitor use would strengthen the manuscript. 

      We greatly appreciate the reviewer for raising this important point. To address this, we will incorporate a discussion on the long-term safety and efficacy of CDK4/6 inhibitor maintenance therapy. Drawing from clinical trials and retrospective analyses (5-9), we will highlight data supporting the tolerability of prolonged CDK4/6i treatment, particularly in combination with endocrine therapy. We will also discuss its clinical benefits over chemotherapy or endocrine therapy alone, contextualizing these findings with our proposed therapeutic approach (6,8-11).

      References:

      (1) Yang C, Li Z, Bhatt T, Dickler M, Giri D, Scaltriti M, et al. Acquired CDK6 amplification promotes breast cancer resistance to CDK4/6 inhibitors and loss of ER signaling and dependence. Oncogene 2017;36:2255-64

      (2) Li Q, Jiang B, Guo J, Shao H, Del Priore IS, Chang Q, et al. INK4 Tumor Suppressor Proteins Mediate Resistance to CDK4/6 Kinase Inhibitors. Cancer Discov 2022;12:356-71

      (3) Ji W, Zhang W, Wang X, Shi Y, Yang F, Xie H, et al. c-myc regulates the sensitivity of breast cancer cells to palbociclib via c-myc/miR-29b-3p/CDK6 axis. Cell Death & Disease 2020;11:760

      (4) Wu X, Yang X, Xiong Y, Li R, Ito T, Ahmed TA, et al. Distinct CDK6 complexes determine tumor cell response to CDK4/6 inhibitors and degraders. Nature Cancer 2021;2:429-43

      (5) Martin JM, Handorf EA, Montero AJ, Goldstein LJ. Systemic Therapies Following Progression on First-line CDK4/6-inhibitor Treatment: Analysis of Real-world Data. Oncologist 2022;27:441-6

      (6) Xi J, Oza A, Thomas S, Ademuyiwa F, Weilbaecher K, Suresh R, et al. Retrospective Analysis of Treatment Patterns and Effectiveness of Palbociclib and Subsequent Regimens in Metastatic Breast Cancer. J Natl Compr Canc Netw 2019;17:141-7

      (7) Basile D, Gerratana L, Corvaja C, Pelizzari G, Franceschin G, Bertoli E, et al. First- and second-line treatment strategies for hormone-receptor (HR)-positive HER2-negative metastatic breast cancer: A real-world study. Breast 2021;57:104-12

      (8) Kalinsky K, Accordino MK, Chiuzan C, Mundi PS, Sakach E, Sathe C, et al. Randomized Phase II Trial of Endocrine Therapy With or Without Ribociclib After Progression on Cyclin-Dependent Kinase 4/6 Inhibition in Hormone Receptor–Positive, Human Epidermal Growth Factor Receptor 2–Negative Metastatic Breast Cancer: MAINTAIN Trial. Journal of Clinical Oncology;0:JCO.22.02392

      (9) Kalinsky K, Bianchini G, Hamilton EP, Graff SL, Park KH, Jeselsohn R, et al. Abemaciclib plus fulvestrant vs fulvestrant alone for HR+, HER2- advanced breast cancer following progression on a prior CDK4/6 inhibitor plus endocrine therapy: Primary outcome of the phase 3 postMONARCH trial. Journal of Clinical Oncology 2024;42:LBA1001-LBA

      (10) Mayer EL, Wander SA, Regan MM, DeMichele A, Forero-Torres A, Rimawi MF, et al. Palbociclib after CDK and endocrine therapy (PACE): A randomized phase II study of fulvestrant, palbociclib, and avelumab for endocrine pre-treated ER+/HER2- metastatic breast cancer. Journal of Clinical Oncology 2018;36:TPS1104-TPS

      (11) Llombart-Cussac A, Harper-Wynne C, Perello A, Hennequin A, Fernandez A, Colleoni M, et al. Second-line endocrine therapy (ET) with or without palbociclib (P) maintenance in patients (pts) with hormone receptor-positive (HR[+])/human epidermal growth factor receptor 2-negative (HER2[-]) advanced breast cancer (ABC): PALMIRA trial. Journal of Clinical Oncology 2023;41:1001-

    1. Author response:

      We appreciate the time and thoughtful reviews of all 3 reviewers. Ahead of a full revision of the paper, we would like to respond briefly to a couple of points the reviewers have raised, which we plan to address in more detail in our full revision.

      (1) The relationship between membrane tension and interfacial tension: The major request by reviewers was for a better explanation of the relationship between measured mechanical parameters and membrane interfacial tension. We plan to include a schematic of the different forces at play in the membrane and to clarify our discussion; here, we provide a brief explanation.

      In our study, we identified a relationship between channel activation pressure and two membrane mechanical properties (area expansion modulus (K<sub>A</sub>) and bending rigidity (K<sub>c</sub>)), though we did not find a correlation between channel activation pressure and a third mechanical property (membrane fluidity). Through further computational analysis of the membranes, we identified an additional property called interfacial tension that helps unify and explain our results. Interfacial tension (γ) is a property akin to surface tension that reflects the chemical composition at the interface of the membrane (between the polar headgroups of the lipids and the hydrophobic acyl chains of the lipids) and balances the repulsive interaction of the nonpolar hydrocarbon chains with the polar headgroup regions of the lipids. In the established polymer brush model, the expansion modulus is proportional to the interfacial tension (W. Rawicz, Biophysical Journal, 2000):

      γ = K<sub>A</sub>/C,

      where C is a constant. Interfacial tension occurs at the boundary between the lipid bilayer and external aqueous environment and is different from mechanical tension. While mechanical membrane tension (t) reflects a physical force in plane with the membrane, interfacial tension reflects the chemical composition at each interface of the membrane. While mechanical membrane tension depends on the size and shape of the membrane, interfacial tension is independent of these features and depends on the molecular composition of the liquid-liquid interface. An expanded discussion on this topic was recently provided (Lipowsky. Faraday Discussions. 2024). While distinct, these two properties can be related to one another via the area expansion modulus (K<sub>A</sub>). Typically, one would imagine that reducing interfacial tension, and correspondingly reducing K<sub>A</sub>, should lower the energy needed to stretch the membrane to the same extent and should therefore reduce the activation pressure (and corresponding in-plane mechanical tension) required to open an embedded mechanosensitive channel. Interestingly though, interfacial tension also works to pull the channel open, so a reduction in interfacial tension also means that more energy will be required to open the channel. We find that the reduction in interfacial tension, and the corresponding increase in energy required to open embedded channels, outweighs the reduced tension that should be required to stretch the membrane. We plan to explain this tradeoff more clearly in our revision. Overall, our findings identify the exact properties driving mechanosensitive channel behavior in our study. Further, they provide a guide to understanding how and why shifts in mechanosensitive channel activation occur by connecting chemical composition changes to the changes in membrane tension propagation in a given membrane.

      (2) Data presentation to support determined area expansion modulus and bending rigidity values: We will show the stress-strain curves used to derive the K<sub>A</sub> and K<sub>c</sub> values.

      (3) Address why membrane tension data was not shown for ephys experiments: The micropipette and patch clamp setups are different, and we did not use the same system for both measurements. In fact, limitations in tools that would allow for concurrent tension measurements while conducting channel activation measurements have limited our understanding of the role of membrane tension on mechanosensation to date. While recent studies have attempted to resolve this limitation through the design of new tools that enable concurrent monitoring of mechanosensitive channel activation and membrane tension (Lüchtefeld et al. Nature Methods. 2024), these tools were not available to us during our study or now. Because our study also attempted to connect these two features (membrane tension and channel activation) but we lacked tools to do so simultaneously, we used two sets of measurements to separately uncover membrane mechanical properties and channel activation pressure.

      One reason it is difficult to measure membrane tension during a typical patch clamp study is the limitations of the imaging equipment and pipettes used for this assay. The experiment is usually done by looking through the eyepiece, and the pipette angle is around 45 degrees from the plane of the stage, so it would be hard to visualize changes in the patch geometry in the tip of the pipette. Basically, we are able to see the pipette touch the GPMV, but cannot resolve the patch moving up the pipette. In response to the reviewer comment that tension = pressure difference times pipette radius divided by 2, we were unable to measure the radius and changes in radius of a patch upon increases in applied pressure due to the above-mentioned imaging constraints. This limitation is why we were unable to directly measure applied tension with our current patch clamp setup.
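
      The relation quoted from the reviewer (tension = pressure difference × radius / 2) can be written as a short calculation, which also makes clear why an unmeasured radius leaves the tension undetermined. This is a minimal sketch: the function name, the unit choices and the numeric inputs are hypothetical illustrations, not values from the study.

```python
# Sketch of the relation quoted in the response:
#   tension = pressure difference * radius / 2
# All numeric inputs below are hypothetical, chosen only for illustration.

MMHG_TO_PA = 133.322  # pascals per mmHg


def laplace_tension(pressure_difference_pa: float, radius_m: float) -> float:
    """Return tension in N/m from a pressure difference (Pa) and a radius (m)."""
    if radius_m <= 0:
        raise ValueError("radius must be positive")
    return pressure_difference_pa * radius_m / 2.0


# Hypothetical applied pressure and patch radius.
applied_pressure_pa = 30 * MMHG_TO_PA
patch_radius_m = 1e-6

tension_n_per_m = laplace_tension(applied_pressure_pa, patch_radius_m)
print(f"tension = {tension_n_per_m * 1e3:.2f} mN/m")

# The same pressure with a different (unobserved) radius gives a different
# tension, so pressure alone does not fix the tension.
print(f"tension = {laplace_tension(applied_pressure_pa, 2e-6) * 1e3:.2f} mN/m")
```

      Because the result scales linearly with the radius, any uncertainty in the patch radius carries over directly to the computed tension.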

      (4) Interfacial tension is not experimentally measured: Interfacial tension = K<sub>A</sub>/C, where C is a constant (typically C = 4 for bilayer membranes). The best way to measure interfacial tension is to determine K<sub>A</sub> (the area expansion modulus), which we have done experimentally by generating stress vs strain curves for GPMVs. In the literature, reductions in the interfacial tension of a membrane are typically determined experimentally by measuring a corresponding reduction in the associated K<sub>A</sub> value (e.g., Ly and Longo. Biophys J. 2004). We have similarly followed this approach.
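
      The conversion stated in point (4), interfacial tension = K<sub>A</sub>/C with C = 4 for bilayers, is a one-line calculation. The sketch below uses hypothetical K<sub>A</sub> values purely to show that a fractional drop in K<sub>A</sub> maps onto the same fractional drop in interfacial tension; none of the numbers are measurements from the study.

```python
# Sketch of the stated relation: interfacial tension = K_A / C, with C = 4
# taken as the typical bilayer value quoted in the response.
# The K_A inputs are hypothetical and only illustrate the proportionality.


def interfacial_tension(k_a_mn_per_m: float, c: float = 4.0) -> float:
    """Return interfacial tension (mN/m) from an area expansion modulus (mN/m)."""
    if c <= 0:
        raise ValueError("C must be positive")
    return k_a_mn_per_m / c


k_a_control = 240.0  # hypothetical K_A, mN/m
k_a_treated = 180.0  # hypothetical reduced K_A, mN/m

gamma_control = interfacial_tension(k_a_control)
gamma_treated = interfacial_tension(k_a_treated)

print(gamma_control, gamma_treated)
print(f"fractional change: {(gamma_treated - gamma_control) / gamma_control:.2f}")
```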

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Therefore, their tool may be useful for stimulating multiple populations using a blue excitatory opsin in neuron A and their tool for red excitation of neuron B… Yet, there are no data presented that showcases their new tool for this purpose

      We agree with the reviewer that in this manuscript we have not experimentally shown the applicability of our system for dual optical stimulation. However, the suppression of blue-light excitation of ZipV/T-IvfChr-expressing neurons strongly suggests that it can be used in experiments exciting populations of neurons, as was shown for BiPOLES. We see no theoretical reason why this experiment could not be done if sufficient cell-targeting mechanisms (such as Cre-lox or retroAAV) are utilized. We have started several projects pursuing these applications in the meantime.

      While they do show that red light = excitation and blue light = inhibition, they neither show 1) all-optical on/off modulation of the same cell; nor 2) high-frequency inhibition or excitation (max stim rate of 20hz, which is the same as the BiPOLES paper used for their LC stimulation paradigm; Vierock, as above, Figure 7a-d).

      Regarding point 1, we understand that the reviewer asks if we have optically excited (with red light) and inhibited (with blue light) the same neurons. If so, Figure 4B1 (optical excitation of ZipT-IvfChr with red light) and Figure 5A (optical inhibition of ZipT-IvfChr with blue light) represent largely the same set of neurons.

      Regarding point 2, we respectfully disagree with the reviewer’s interpretation of Figure 7a-d in Vierock et al. As we understand it, in this part the authors apply a 20 Hz optical stimulation protocol to the LC neurons in vivo. However, there are no data showing that individual neurons follow this stimulation protocol. To be clear, we are not saying that BiPOLES cannot drive 20 Hz APs. Very likely it can: it is based on ChrimsonR, which is capable of doing so (Klapoetke et al., Figure 2). Although we have not shown data for optical stimulation above 20 Hz in this manuscript, our system is based on vfChrimson, which is known to drive APs at 100 Hz and above (Mager et al., Figures 2 and 3).

      they must revise the manuscript to show that their approach is both 1) different in some way when compared to BiPOLES (it is my understanding that they did not do this, as per the supplementary alignment of the BiPOLES sequence and the sequence of the BiPOLES-like construct that they did test) and 2) that the properties that the investigators specifically tailored their construct to have confer some sort of experimental advantage when compared to the existing standard.

      In the latest version of the manuscript, we have compared our ZipV-IvfChr and the BiPOLES construct adapted with vfChrimson (Fig. 2 Suppl 1). The mean photocurrent amplitude of IvfChr in the ZipV-IvfChr construct is ~2.7 x higher than BiPOLES adapted with vfChrimson (14 randomly selected HEK293 cells in each group) (Fig. 2 Suppl 1B). We conducted this experiment in HEK293 cells to ensure accurate voltage-clamping and less biased cell selection. Even adjusting for the smaller photocurrent of vfChrimson vs ChrimsonR, this would still translate to ~1.6 x greater photocurrent with ZipV-IvfChr compared to the original BiPOLES utilizing ChrimsonR. We believe the increased efficiency of excitation is an important aspect of adapting vfChrimson for red-light excitation of neurons.
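
      The two fold-changes quoted above (~2.7× over BiPOLES adapted with vfChrimson, ~1.6× over the original BiPOLES) together imply a scaling factor between vfChrimson and ChrimsonR photocurrents. The sketch below back-calculates that factor under the assumption that the adjustment is a simple multiplicative scaling; the implied ratio is an inference from the two quoted numbers and is not a value reported in the response.

```python
# Back-of-the-envelope check of the two fold-changes quoted in the response.
# Assumption (not stated by the authors): the adjustment is a simple
# multiplicative scaling by the vfChrimson/ChrimsonR photocurrent ratio.

fold_over_bipoles_vfchrimson = 2.7  # quoted: ZipV-IvfChr vs BiPOLES with vfChrimson
fold_over_original_bipoles = 1.6    # quoted: ZipV-IvfChr vs original BiPOLES (ChrimsonR)

# fold_over_original = fold_over_vf_version * (vfChrimson / ChrimsonR)
implied_vf_to_chrimsonr_ratio = fold_over_original_bipoles / fold_over_bipoles_vfchrimson

print(f"implied vfChrimson/ChrimsonR photocurrent ratio: {implied_vf_to_chrimsonr_ratio:.2f}")
```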

      Reviewer #2 (Public Review):

      (1) In the Introduction or Discussion, the authors could better motivate the need for a red-shifted actuator that lacks blue crosstalk, by giving some specific examples of how the tool could be productively used, e.g. pairing with another blue-shifted excitatory opsin in a different population, or pairing with a GFP-based fluorescent indicator, e.g. GCaMP. The motivation for the current tool is not obvious to non-experts.

      In the discussion, we now provide examples of potential uses of the tool. For example, one key application is the induction of spike-timing-dependent plasticity with two wavelengths of light, in which a blue-light channelrhodopsin such as oChIEF is used to evoke presynaptic release and ZipT-IvfChr is expressed in the postsynaptic neuron. In this situation, the rapid termination of the inhibitory response is critical so that it does not interfere with the induction of LTP or LTD. Other possible experiments include the alternating control of projection neurons and interneurons in cortical areas, or the independent control of neurons of the direct and indirect pathways in the striatum to manipulate behavior.

      (2) Simultaneous excitation and inhibition are not the same as non-excitation. The authors mentioned shunting briefly. Another possible issue is changes in osmotic balance. Activation of a Na+ channel and a Cl- channel will lead to net import of NaCl into the cell, possibly changing osmotic pressure. Please discuss.

      We agree that osmotic, ionic and pH changes in small neuronal structures can be disruptive to physiology. This is the reason we developed our approach around the fastest channelrhodopsins: they minimize the channel opening time and the flux of ions through the channels when brief light illumination is applied. Not only is the flux of protons, sodium ions and calcium ions minimized; the flux of chloride should be minimal as well (the membrane potential should be close to the chloride reversal potential, hence low ion flow). Our approach should therefore be minimally disruptive compared to most other existing channelrhodopsin-based approaches when short or minimal light pulses are used in conjunction with our tools. This recommendation is included in the updated manuscript.

      (3) The authors showed that in ZipT-IvfChr, orange light drives excitation and blue light does not. But what about simultaneous blue and orange light? Can the blue light overwhelm the effect of the orange light? Since the stated goal is to open the blue part of the spectrum for other applications, one is now worried about "negative" crosstalk. Please discuss and, ideally, characterize this phenomenon.

      We have now performed this experiment. Simultaneous blue (470 nm) and red (635 nm) light stimulation does not produce APs (Fig. 4 Suppl 1A). This suggests that the inhibitory effect of the ACR is more efficient than the excitatory effect of IvfChr because of its higher conductance. It also re-emphasizes that rapid termination of the ACR effect is critical for minimal disruption of physiology in such a pairing strategy.

      (3.1) Does the use of the new tool require careful balancing of the expression levels of the ZipT and the IvfChr? Does it require careful balancing of blue and orange light intensities?

      As with any optogenetic tool, users should validate the efficacy of the tool in their own system. Our tool relies solely on the balanced expression provided by the 2A system, the efficiency of the two opsins and their degradation over the time span of expression. These aspects would be better addressed in future versions of the tool or by improving the BiPOLES-type tandem expression in subsequent versions. On the instrumentation side, the light intensity and differential penetration depth require careful consideration. However, this holds true for most optogenetic and fluorescence imaging-based approaches as well. In the current update of the manuscript, we have included further discussion of these aspects.

      (3.2) Also, many opsins show complex and nonlinear responses to dual-wavelength illumination, so each component should be characterized individually under simultaneous blue + orange light.

      We have now performed this experiment (please see our response to point 3).

      (3.3) I was expecting to see photocurrents at different holding potentials as a function of illumination wavelength for the coexpressed construct (i.e. to see at what wavelength it switches from being excitatory to inhibitory); and also to see I-V curves of the photocurrent at blue and orange wavelengths for the co-expressed constructs (i.e. to see the reversal potential under blue excitation). Overall, the patch clamp and spectroscopic characterization of the individual constructs was stronger than that of the combined constructs.

      We have added the I-V curves for the co-expressed construct at different holding potentials for 470 nm and 635 nm wavelengths. This shows the reversal potentials for the two wavelengths that are intended for in vitro and in vivo applications. Performing a similar experiment for a variety of wavelengths would not be as valuable, in part because of the enormous amount of data generated. As we have shown in the study, the response of any channelrhodopsin varies with light duration and light intensity in addition to wavelength and holding potential. The results for each recorded cell could include stimulation at different wavelengths, illumination intensities and light durations in addition to different holding potentials. Not only would the results be highly variable from cell to cell, there would potentially be hundreds or thousands of combinations to test per cell (e.g., 5 light intensities @ 1, 2.5, 5, 10 and 20 mW/mm<sup>2</sup>, 8 different wavelengths @ 450 nm, 475 nm, 500 nm, 525 nm, 550 nm, 575 nm, 600 nm and 625 nm, 7 light durations @ 1 ms, 5 ms, 10 ms, 50 ms, 100 ms, 500 ms and 1 s, and 6 holding potentials @ -80 mV, -70 mV, -60 mV, -40 mV, -20 mV and 0 mV would result in 1680 stimulation conditions per recorded cell). Technically, the significant lowering of membrane resistance when both IvfChr and ZipACR variants are activated simultaneously would compromise the quality of voltage-clamping even in HEK293 cells with series resistance compensation. We have yet to see any other study that has included such an ambitious electrophysiology experiment for channelrhodopsin characterization, likely because of the limited feasibility of such an experiment.
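
      The count of 1680 stimulation conditions per cell follows from taking every combination of the four parameter lists given in the response. The sketch below enumerates that grid with the parameter values quoted there; only the variable names are our own.

```python
from itertools import product

# Parameter values as listed in the response; variable names are illustrative.
intensities_mw_per_mm2 = [1, 2.5, 5, 10, 20]
wavelengths_nm = [450, 475, 500, 525, 550, 575, 600, 625]
durations_ms = [1, 5, 10, 50, 100, 500, 1000]
holding_potentials_mv = [-80, -70, -60, -40, -20, 0]

# Full factorial design: every combination of the four parameters.
conditions = list(
    product(intensities_mw_per_mm2, wavelengths_nm, durations_ms, holding_potentials_mv)
)

n_conditions = len(conditions)
print(n_conditions)  # 5 * 8 * 7 * 6 = 1680
```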

      Reviewer #3 (Public Review):

      (1) The enhanced vf-Chrimson could potentially be a highlight of the manuscript, serving broader applications. Yet, gauging the overall improvements of ivf-Chrimson in comparison to other Chrimson variants remains intricate due to several reasons. First, photocurrents from ivf-Chrimson seem smaller than those from C-Chrimson (Supplemental Figure 3), and a direct comparison with standard vf-Chrimson is absent.

      We appreciate the reviewer’s positive view of our modified variant. We did not emphasize this particular modification as it was identical to our previously published modification and similar to that previously published by others (CsChrimson and C1Chrimson). In all these cases, improved membrane expression was consistently detected. We believe that the expression data and our comparison of C-Chrimson and IvfChr are sufficient to justify the improved membrane expression and function.

      Second, while membrane expression of ivf-Chrimson appears enhanced in provided brightfield recordings, the quantitative analysis would necessitate confocal microscopy and a membrane marker (Supplemental Figure)

      We have now quantified the results with a membrane palmitoylated mCherry using confocal microscopy, shown in Fig. 2 Suppl 1A. We measured the Pearson Correlation Coefficient of the mCherry with EGFP or Citrine signal for the 6 constructs (vfChrimson, vfChrimson with trafficking sequence, vfChrimson with N-terminal signaling peptide from oChIEF (C-vfChrimson), vfChrimson with trafficking sequence and N-terminal signaling peptide from oChIEF (IvfChr), BiPOLES with EGFP or citrine and vfChrimson) and the results were identical and consistent with the prior results using epifluorescence microscopy.
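
      For readers unfamiliar with the colocalization metric named above, the Pearson correlation coefficient between two fluorescence channels can be computed from paired pixel intensities. The sketch below shows only the standard definition of the coefficient; the function name and the toy intensity values are hypothetical, and the response does not state which software or region-of-interest selection the authors used.

```python
import math

# Standard Pearson correlation coefficient on paired pixel intensities.
# This illustrates the metric only; it is not the authors' analysis pipeline.


def pearson_correlation(channel_a, channel_b):
    """Return Pearson's r for two equal-length sequences of pixel intensities."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_a = sum(channel_a) / len(channel_a)
    mean_b = sum(channel_b) / len(channel_b)
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant channel has no defined correlation")
    return covariance / math.sqrt(var_a * var_b)


# Hypothetical flattened pixel intensities from a membrane marker channel
# and an opsin-tag channel of the same cell.
membrane_marker = [10, 20, 30, 40, 50]
opsin_tag = [12, 19, 33, 41, 48]

r = pearson_correlation(membrane_marker, opsin_tag)
print(f"r = {r:.3f}")
```

      Values of r near 1 indicate that the two signals rise and fall together across pixels, which is how the metric is read as evidence of membrane localization.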

      (2) Finally, other N-terminal modified Chrimson variants, like CsChrimson by Klapoetke et al. in 2014 and C1Chrimson by Oda et al. in 2018, have been generated. Comparing ivf-Chrimson to vf-CsChrimson or vf-C1Chrimson would be important to evaluate the benefits of the applied N-terminal modification.

      Our development of IvfChrimson is similar to the approach of vf-CsChrimson and identical to that of vf-C1Chrimson and we do not claim these modifications to be unique or superior. However, we have developed our design independently of these other studies and we have more extensive functional comparison and characterization data of our IvfChrimson variant than the other studies.

      (2.1) The action spectra of ZipACR suggest peak absorption of ZipACR WT and its mutant at 525 - 550 nm (Fig. 3). This is even further red-shifted than previously reported by Govorunova et al. Further action spectra recordings differ for all constructs between recordings initiated with blue or red light (Supplementary Fig. 5). This discrepancy is unexpected and should be discussed.

      We thank the reviewer for the comment. This was a mistake in the traces used for the figure: the example traces were the spectral responses measured in the 400 nm to 650 nm order instead of the 650 nm to 400 nm order shown in the spectral data. This has now been corrected.

      Additionally, the representative photocurrents of Zip(151V) in Fig. 3D1 do not align with the corresponding action spectrum in Fig. 3D2 as they show maximal photocurrents for 400 nm excitation.

      Please see the point above.

      (3) The authors introduce two different bicistronic expression cassettes-ZipT-IvfChR and ZipV-IvfChR-without providing clear guidelines on their conditions of use. Although the authors assert that ZipT is slower and further red-shifted than ZipV, the differences in the data for both ACR mutants are small and the benefits of the different final constructs should be explained.

      In our testing in neurons, ZipT produced fewer ‘escaped’ spikes after the termination of the light pulses. However, this depends on the membrane properties of the cells, such as capacitance and resistance. ZipV has a faster termination time and may be necessary in some situations because it causes less disruption of physiological processes.

      We have now included this discussion in our updated manuscript.

      (4) The ZipT/V-IvfChRs are designed as bicistronic constructs; yet, disparities in membrane trafficking and protein degradation between the two channels could lead to divergences in blue and red light photoresponses. For future applicants, understanding the extent of expression ratio variations across cells using the presented expression cassettes could be of significance and should be discussed.

      We now have included this discussion in our responses above.

      Reviewer #1 (Recommendations For The Authors):

      (1) The Figure 1a mV cartoon traces for chloride are confusing. The chloride currents are depolarizing, not hyperpolarizing. As noted by the authors, these channels largely generate AP blockade through shunting inhibition (division), not hyperpolarization (subtraction).

      The figure has been corrected.

      (2) Figure 2A does not show where the light is applied. Why are some of the bars blue and some of them not filled?

      This has been corrected.

      (3) Figure 2C1 does not show where the light is applied. There should be an inset to detail the blue-light-cessation-evoked AP. Also doesn't give the holding potential.

      The requested details are added.

      (4) Figure 2C2 inset is described as showing that "Light-induced currents with 470 nm illumination were initially outward but turned inward immediately following light offset." Is that correct? It looks to me like the current turns inward about half-way through the light pulse and then becomes even stronger after the light turns off. That is also consistent with the CC traces, which appear to show a transition toward depolarization during the light pulse before the AP initiation at light offset.

      Yes, the reviewer's observation is correct. There are blue light-induced outward and inward current peaks at the onset and offset of the light. Accordingly, we have modified the phrasing for Fig. 2C2.

      (5) Figure 3D1 shows that Zip(151V) has a peak current at 400nm, with a steady increase in current from red to blue, however, this is not the case in the summary data in 3D2. It's also not shown in Supplementary Figure 5B. What's going on?

      We apologize for the error in the version of the figure included with the first submission. The example traces from 400 nm -> 650 nm were incorrectly included in the figure, whereas the 650 nm -> 400 nm example traces should have been included. This has been corrected.

      (6) Figure 3D1 has no time scale.

      It has now been included.

      (7) Figure 3E1 should read "Transduced" and not "Transfected"

      This has been corrected.

      (8) IvfChr fidelity drops off dramatically at 20hz...down to 50% efficiency of generating APs. This is described in the legend as "high frequency". Maybe the cart came before the horse in this figure...as it looks like in panel C that using less light power density improves fidelity in the dual opsin configuration with red light stimulation...why not use that power for the characterization? Did you try any higher frequencies? Or longer pulse widths? This is an important characterization to inform further use of the tool. This shortcoming isn't a cell-intrinsic limitation, as the 470nm stim with IVfChr was 100% successful at both 10hz and 20hz.

      It is known that red but not blue light pulses induce desensitization (optical fatigue) in red-shifted ChR variants. Indeed, one can reinstate the response to red light by giving violet-blue light pulses (Fig. 4 Suppl 2). We think this is the reason that the 470 nm stimulation was more effective in inducing APs in cells expressing IvfChR. Higher light intensities induce greater desensitization, but are preferred for faster opening of channels and depolarization of neurons. This can explain why, in some situations, lower light intensities were more effective in producing APs when pulse trains were used. We have recordings from cells firing APs at 40 Hz (not included). All these cells had high expression levels of the opsin.

      (9) Figure 4D: why use 100ms pulse width? How do you know that this isn't causing depol block? Or some of the nefarious concerns that are raised in the discussion, such as "...disrupt[ion of] normal neuronal physiology and signal processing that occurs in millisecond time scale"?

      We used 100ms pulse duration to follow the published protocol that this experiment is based on (Lin et al., 2013, Nature Neuroscience). 

      (10) Figure 4E-bottom: What is the blue peak at light onset? Is the tool driving early activation before silencing?

      There seems to be an early, sharp and brief activation by blue light. We don’t know the definite cause of this, but we speculate that it is driven by blue-light activation of ZipACR and not the IvfChr portion of the construct. The reason is that such a sharp rise is absent when only IvfChr is expressed (Fig. 4E, upper panel). A soma-targeting motif tethered to channelrhodopsins is known to result in preferential expression of channels close to the soma, but it does not exclude the expression of channelrhodopsin in axonal and dendritic compartments, especially when animals are allowed to recover for a long period of time after viral injection. We believe that ZipACR at axonal terminals, where the chloride concentration is high, can still cause blue-light-evoked depolarization and transmitter release. We observed this phenomenon in two mice in their first trial. The data for individual trials for each mouse are included in a supplementary table.

      (11) Figure 4G: Earlier in this same figure (B2, C), 470nm light was more effective at stimulating IvfChr than 635nm light. Is it unexpected that 638nm light would in this in vivo context be more effective at driving IvfChr responses than 450 nm light (at least as reflected by the AUC measurements)? Does this reflect fiber placement and light penetration/scattering?

      The spectral peaks of Chrimson-based variants, including vfChrimson, are all centered around 600 nm. At 635 / 638 nm, the amplitudes of the photoresponse decline, the channel onset slows, and the channels suffer greater desensitization. In isolated preparations, where light penetration is similar between 635 / 638 nm and 470 nm, 470 nm responses can outperform 635 / 638 nm responses because of the lack of desensitization and the higher consistency of the response. This is also a strong reason that we have developed our current approach. In the in vivo preparation shown in Fig. 4D-G, the much higher tissue penetration of 638 nm light, due to reduced absorption and reduced scattering, can offset the advantage that IvfChr shows with 450 nm light.

      (12) In the methods, it is noted that different viral batches appear to generate different levels of neuronal toxicity. If that is the case, how did you differentiate between true differences between constructs vs. differential cell health effects?

      For figure 4D-F (whisker movement), we determined virus toxicity using NeuN staining. In slice recordings, we used the electrophysiological property of the neurons to assess their health. For this manuscript, we had one batch of virus that produced toxicity. We did not include any data from this batch.

      Reviewer #2 (Recommendations For The Authors):

      ● Define AUC on first use.

      It is now defined.

      ● Figure 3C2: Please explain how the photocurrents were normalized. As presented, it looks like under strong orange light, the ZipACR has higher photocurrent than the ivfChr.

      This is due to the fact that vfChrimson and other Chrimson-based variants do not fully recover in the dark after 590 nm stimulation. We tested IvfChrimson with and without a 405 nm reconditioning light pulse, and we can consistently reach a greater ‘maximal’ response from the same cell after 405 nm reconditioning (see Fig. 4 Suppl 2). We therefore normalize the response to the maximal recorded response of the cell, which is often achieved with 10 or 20 mW/mm<sup>2</sup> 590 nm stimulation after 405 nm reconditioning. We understand this can be confusing and have now replaced the light-intensity response in Fig. 3C2 with the one obtained with 405 nm reconditioning, which is easier for readers to interpret.
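
      The normalization described above, dividing each response by the largest response recorded from the same cell, explains why a curve measured without reconditioning may never reach 1.0. The sketch below illustrates this with hypothetical photocurrent amplitudes; the function name, the use of absolute amplitudes and all numbers are assumptions for illustration, not the authors' analysis code or data.

```python
# Normalizing responses to the maximal response recorded from the same cell.
# All amplitudes are hypothetical; absolute values are used so that inward
# (negative) currents are handled the same way as outward ones.


def normalize_to_cell_max(responses_pa, reconditioned_responses_pa):
    """Scale responses by the largest amplitude seen in either recording set."""
    all_amplitudes = [abs(v) for v in list(responses_pa) + list(reconditioned_responses_pa)]
    cell_max = max(all_amplitudes)
    if cell_max == 0:
        raise ValueError("cell shows no response to normalize against")
    return [abs(v) / cell_max for v in responses_pa]


# Hypothetical intensity series from one cell, without and with a 405 nm
# reconditioning pulse before each 590 nm stimulus.
without_reconditioning = [120.0, 300.0, 480.0]
with_reconditioning = [150.0, 400.0, 600.0]

normalized = normalize_to_cell_max(without_reconditioning, with_reconditioning)
print(normalized)  # the largest value stays below 1.0 without reconditioning
```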

      ● P. 3: "As expected, blue light pulses induce transient membrane suppression..." Unclear what "suppression" means. Shunting? Hyperpolarization?

      We rephrased this to “As expected, blue light pulses transiently suppress APs…”

      ● P. 3: "illumination at 470 nm and 590 nm wavelengths led to similar amounts of courtship song (110.1 ± 12.8 and 78.5 ± 11.6, n = 16-17, respectively)". What are the units of "courtship song"?

      The unit for courtship song is the number of pulses per 10 seconds. This has been clarified in the figure.

      ● P. 5: The quantification of photocurrent in terms of pA/pF/A.U. is non-standard. I understand the impetus to normalize by expression to give something proportional to per-molecule conductance, but a user cares about overall photocurrent. Please also give the real photocurrents, either pA or pA/pF.

      We have provided the real photocurrent in pA or pA/pF where scientifically appropriate. To avoid selection and experimenter bias in our data, we did not set criteria for eliminating cells based on fluorescence intensity or photocurrent amplitude. As a result, responses from the same construct can vary up to 20-fold in many experiments. We do not believe that averaging absolute photocurrent amplitudes would be justified, because of the imbalanced weighting this would introduce in the results. We acknowledge that not selecting or eliminating data points introduces higher noise in recordings with smaller responses, but this is preferable to the selection or experimenter bias that would likely be introduced otherwise.

      ● Please quote illumination intensities wherever possible.

      ● P. 7: why was the red light crosstalk into Zip(151T) tested at 635 nm instead of 590 nm? Isn't the relevant parameter 590 nm, since that will be used for the excitatory opsin?

      In all our characterizations of the constructs using slice electrophysiology recordings, we used 635nm instead of 590nm. The reason is that compared to 590nm wavelength, at 635nm the photocurrent for Zip(151T) and Zip(151V) is significantly reduced (Fig. 3D1,D2).

      ● P. 10: "we examined the power at which responses to 470 nm and 635 nm lights induce APs in neurons expressing ZipT-IvfChr, ZipV-IvfChr, or IvfChr", but the preceding sentence says you didn't test the ZipT-IvfChr. This is confusing, please clarify.

      The previous paragraph refers to the photocurrent recordings in HEK293 cells, where our fast LED-based illumination system is limited to 590 nm light, whereas the subsequent paragraph refers to the brain slice neuronal recordings. We have now emphasized the difference between the experiments in the rewrite.

      ● Fig. 4B1, top: Why don't the blue traces return to the same baseline after the stimulus epochs?

      We observed this shift in baseline (~4 mV more depolarized) in cells expressing IvfChR (or vfChR) only with blue light stimulation. This was observed in the neurons recorded in the CA1 as well (data not shown). There was no such change following red light stimulation (Fig. 4B1). Therefore, this should not affect the applicability of our construct. The original paper introducing vfChR did not test the responses of their constructs to blue light. There could be another photocycle state that is activated more strongly by 470 nm than by 590 nm light and that has a slow off-rate, but this is only speculation on our side. It must be noted that we did not observe such a phenomenon in cells expressing ChrimsonR (Fig. 1 Suppl 1C).

      ● Fig. S3B, right: The two colors are barely distinguishable on the graph. Consider more distinct colors and/or different symbols.

      It has been changed accordingly.

      ● P. 15: "However, we do not recommend the use of orange light pulses, as we observed a significant photocurrent in this wavelength." Not clear what this is referring to. Which construct? Under which circumstances shouldn't one use orange light pulses? Where's the data showing this?

      This refers to Fig. 3D1,D2 and Figure 4 suppl Fig. 2, which show a normalized photocurrent of ~40-50% at 590nm. The figure references for these data have now been added to the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Audio et al. measured cerebral blood volume (CBV) across cortical areas and layers using high-resolution MRI with contrast agents in non-human primates. While the non-invasive CBV MRI methodology is often used to enhance fMRI sensitivity in NHPs, its application for baseline CBV measurement is rare due to the complexities of susceptibility contrast mechanisms. The authors determined the number of large vessels and the areal and laminar variations of CBV in NHP and compared those with various other metrics.

      Strengths:

      Non-invasive mapping of relative cerebral blood volume is novel for non-human primates. A key finding was the observation of variations in CBV across regions; primary sensory cortices had high CBV, whereas other higher areas had low CBV. The measured CBV values correlated with previously reported neuronal and receptor densities.

      Weaknesses:

      A weakness of this manuscript is that the quantification of CBV with postprocessing approaches to remove susceptibility effects from pial and penetrating vessels, as well as orientation dependency, is not fully validated, especially on a laminar scale. Further specific comments follow.

      We suspect that the comment regarding the lack of validation at the laminar level stems from an error made by the corresponding author in the original bioRxiv submission (v1, May 17th https://www.biorxiv.org/content/10.1101/2024.05.16.594068v1?versioned=true), where Figure 3, which contains the laminar validation, was lost during pdf conversion. After submission to eLife, this mistake was quickly identified, and a corrected manuscript was re-uploaded to bioRxiv (v2, June 5th, https://doi.org/10.1101/2024.05.16.594068). Although we informed the eLife staff about the update, it appears that the revised manuscript may not have reached reviewer #1 in time. We sincerely apologize for any confusion or inconvenience this may have caused.

      (1) Baseline CBV indices were determined using contrast agent-enhanced MRI (deltaR2*). Although this approach is suitable for areal comparisons, its application on a laminar scale has not been validated in the literature or in this study. By comparing with histological vascular information of V1, the authors attempted to validate their approach. However, the generalization of their method is questionable. The main issue is whether the large vessel contribution is minimized by processing approaches properly in various cortical areas (such as clusters 1-3 in Figure 5). It would be beneficial to compare deltaR2* with deltaR2 induced by contrast agents in a few selected slices, as deltaR2 is supposed to be sensitive to microvessels, not macrovessels. Please discuss this issue.

      The requested validation is presented in Figure 3F, which compares our deltaR2* measurements with previous invasive estimates of large vessel, capillary and cytochrome oxidase (CO) levels in V1 (Weber et al., 2008; doi.org/10.1093/cercor/bhm259). Our deltaR2* values show a stronger correspondence with microvascularity and CO levels than with large vessels. Moreover, Figure 3D illustrates relative differences between V1 and V2, which closely align with the relative vascular volume differences reported by Zheng et al., 1991. It is important to note that Weber and colleagues averaged across V2-V5 due to similar vascularity across these areas. In our material, we also observed similar vascularity in these areas, though V5 (e.g., MT) has slightly denser vascularity, in agreement with reports of CO staining.

      Additionally, we report similar GM/WM vascular density, and high vascular density in primary sensory areas. Unfortunately, available ground-truth data on vascularity does not provide further (general) validation data for laminar vasculature in macaques (such as those in cluster 1-3; Fig. 5). That said, we have provided substantial evidence linking whole-brain vascular measures with variations in neuron (for data distribution, see Supp. Fig. 6F) and receptor densities, which we believe provides strong support for our approach.

      We would like to clarify that the authors do not assert that gradient-echo MRI is exclusively sensitive to microvessels and not macrovessels. This is not stated anywhere in the manuscript. If any sentence appears misleading, please let us know, and we will consider revising it. It is well-established that large vessels contribute to ΔR2* (Ogawa et al., 1993; Boxerman et al., 1995), and this is clearly stated in the manuscript (introduction, methods, results and discussion) and demonstrated in Figures 2A, B, and Supp. Figs. 2, 3, and 4. The primary concern, as the reviewer also noted, is whether we have sufficiently minimized the contribution of large vessels in our parcellated data analysis.

      At the parcellated level, we used the median value to avoid skewness in the data distribution, which primarily arises from large vessels, as regions near these vessels exhibit higher ΔR2*. The skewness of ΔR2* is also visible in Figure 1F, G. While this approach mitigates the large-small vessel issue, it does not entirely resolve it, as a slight linear increase toward the cortical surface remains (in all parcels). This is likely due to our inability to delineate all penetrating vessels, as shown in Figure 2E, and because contrast agent accumulates toward the superficial layers, where blood originates from and returns to the pial surface. To mitigate this issue, we detrended the parcellated profiles across layers, obtaining results similar to the ground-truth measures of vascularity in V1-V5 and CO histology in V1.
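The median-then-detrend step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names and toy values are hypothetical, the trend is fit by ordinary least squares against layer index, and preserving the profile mean after detrending is an assumption.

```python
from statistics import median


def layer_medians(parcel_values):
    """Median of vertex values per layer.

    parcel_values: list of layers, each a list of vertex values within one parcel.
    The median limits the influence of the long tail caused by large vessels.
    """
    return [median(layer) for layer in parcel_values]


def detrend_linear(profile):
    """Remove the least-squares linear trend across layer index.

    The profile mean is preserved (an assumption made for this sketch).
    """
    n = len(profile)
    xs = list(range(n))
    x_mean = sum(xs) / n
    y_mean = sum(profile) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, profile))
    slope = sxy / sxx
    return [y - slope * (x - x_mean) for x, y in zip(xs, profile)]


# Hypothetical toy parcel: 3 layers, one vertex in the top layer sits near a large vessel.
toy_parcel = [[9.0, 10.0, 60.0], [7.0, 8.0, 9.0], [5.0, 6.0, 7.0]]
profile = layer_medians(toy_parcel)   # the outlier (60.0) does not dominate
flattened = detrend_linear(profile)   # linear surface-to-depth trend removed
```

Using the median before detrending means that a single vertex adjacent to a large vessel does not set the slope of the fitted trend.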

      (2) High-resolution MRI with a critical sampling frequency estimated from previous studies (Weber 2008, Zheng 1991) was performed to separate penetrating vessels, which is considered one of the major advancements in this study. However, this approach is still insufficient to accurately identify the number of vessels due to the blooming effects of susceptibility and insufficient spatial resolution. There was no detailed description of the detection criteria. More importantly, the number of observable penetrating vessels is dependent on imaging parameters and the dose of the contrast agent. If imaging slices were obtained in parallel to the cortex with higher in-plane resolution, it would likely improve the detection of penetrating vessels. Using higher-field MRI would further enhance the detection of penetrating vessels. Therefore, the reported value is only applicable to the experimental and processing conditions used in this study. Detailed selection criteria should be mentioned, and all potential pitfalls should be discussed.

      We believe that Figure 2 represents a significant conceptual and data analysis advancement in the field of vascular imaging. To the best of our knowledge, this is the first MRI study attempting to assess vessel density across cortical layers and compare the number of vessels to the known ground-truth. While we do not claim to have achieved a perfect solution (as shown in Figure 2), we offer a robust challenge to the imaging community by introducing this novel benchmarking approach. Our hope is that this conceptual framework will inspire the MR imaging community to tackle this challenge.

      Regarding imaging parameters, TE did not have much effect on our results, with a slight effect observed in the superficial layers due to the presence of large pial vessels (blooming effect; Fig. 2C). This also suggests that similar results could be achieved by changing the contrast agent dose, though there are, of course, CNR requirements and limitations at either end of the spectrum.

      We completely agree with the reviewer that spatial resolution is critical in resolving the arterio-venous networks, and we have dedicated significant attention to this topic in the introduction, results and discussion sections. We also agree with the reviewer that if imaging slices were obtained in parallel to the cortex with higher in-plane resolution, it would improve the detection of vessels. However, while this approach is ideal for counting vessels in a single plane and isolated region of cortex, it is less suited to the surface mapping of vessels, which is the focus of our study.

      Regarding the exclusion of vessels, based on visual comparison of vessels in volume space, Frangi-filter detection of vessels in volume space, and surface detection of vessels, we found no evidence to develop additional exclusion criteria (Supp. Fig. 3). On the contrary, we identified a number of false negatives in both the surface maps and volume maps. Notable exceptions to this rule seemed to occur at premotor areas F2 and F3 (Matelli et al., 1984; Patterns of cytochrome oxidase activity in the frontal agranular cortex of the macaque monkey). In these regions, we observed peculiar “pockets” of signal drop-out in equivolumetric layers 4-5. It is unclear what these signal-voids represent but it is interesting to note that these cortical areas F1-F5 were originally delineated by distinct CO+ positive large cells (Matelli et al., 1984).

      (3) Attempts to obtain pial vascular structures were made (Figure 2). As mentioned in this manuscript, the blooming effect of susceptibility contrasts is problematic. In the MRI community, T1-based Gd contrast agents have been used for mapping large vasculature, which is a better approach for obtaining pial vascular structures. Alternatively, computer tomography with a blood contrast agent can be used for mapping blood vasculature noninvasively. This issue should be discussed.

      We agree with the reviewer that T1-based contrast agents may offer more precise direct localization of large vessels in the pial vasculature. However, the primary focus of our study was not on visualizing pial vascular structures, but rather on measuring vascular volume across cortical layers. For this purpose, we opted to use ferumoxytol, which provides superior T2*-contrast and an approximately ten times longer plasma half-life than gadolinium. While we anticipated artifacts from the pial network, we developed a novel method to indirectly map these long-distance susceptibility artifacts arising from large vessels onto the cortical surface (Fig. 2A). If the goal were to specifically visualize pial vessels, we applaud the high-resolution TOF angiography developed for direct vessel visualization (Bollmann et al., 2022; https://doi.org/10.7554/eLife.71186).

      Changes in text:

      “4.1 Methodological considerations - vessel density informed MRI

      While the pial vessels can be directly visualized using high-resolution time-of-flight MRI (Bollmann et al., 2022), and computed tomography (Starosolski et al., 2015), imaging of the dense vascularity within the large and highly convoluted primate gray matter presents other formidable challenges. Here, we used a combination of ferumoxytol contrast agent and cortical layer resolution 3D gradient-echo MRI to map cerebrovascular architecture in macaque monkeys. These methods allowed us to indirectly delineate large vessels and indirectly estimate translaminar variations in cortical microvasculature.”

      (4) Since baseline R2* is related to baseline R2, vascular volume, iron content, and susceptibility gradients, it is difficult to correlate it with physiological parameters. Baseline R2* is also sensitive to imaging parameters; higher spatial resolution tends to result in lower R2* values (closer to the R2 value). Therefore, baseline R2* findings need to be emphasized.

      We agree with the reviewer's comment on the complexity of correlating baseline R2* with vasculature, given its sensitivity to multiple factors such as venous oxygenation, iron content, and imaging parameters such as image resolution. While our study focuses on vascular measurements, one could also highlight iron’s role in brain energy metabolism. Deoxygenated blood affects R2*, iron in oligodendrocytes supports myelination and neuronal signaling, and iron’s role in cytochrome c oxidase during electron transport impacts mitochondrial energy production. These metabolic factors collectively affect baseline R2* and link it to vasculature. Though quantitative susceptibility mapping (QSM) could help differentiate these different factors, it is beyond the scope of this study.

      (5) CBV-weighted deltaR2* is correlated with various other metrics (cytoarchitectural parcellation, myelin/receptor density, cortical thickness, CO, cell-type specificity, etc.). While testing the correlation between deltaR2* and these other metrics may be acceptable as an exploratory analysis, it is challenging for readers to discern a causal relationship between them. A critical question is whether CBV-weighted deltaR2* can provide insights into other metrics in diseased or abnormal brain states. If this is the case, then high-resolution deltaR2* will be useful. Please comment on this possibility.

      We agree with the reviewer that correlating deltaR2* with other metrics, such as myelin and cortical thickness, receptors and interneuron types, remains exploratory. Establishing causal relationships requires advanced multivariate analysis across cortical layers, but mapping histological stains to cortical layers is still under development. While this exploratory approach is promising, the ability to apply these insights to diseased or abnormal brain states is not yet clear. Layer-specific analysis of vasculature and function in disease is a future goal, and ongoing work aims to expand this line of inquiry. For now, while high-resolution deltaR2* may indeed offer diagnostic potential, we prefer to refrain from overstating its clinical utility at this stage. We agree that multimodal studies integrating neuroanatomy, function, and vascular metrics will be valuable for deeper insights into brain abnormalities.

      Changes in text:

      “4.3 The vascular network architecture is intricately connected to the neuroanatomical organization within cerebral cortex

      …To comprehensively understand the factors contributing to the vascular organization of the brain, experimental disentanglement through multivariate analysis of laminar cell types and receptor densities is needed (Hayashi et al., 2021, Froudist-Walsh et al., 2023).”

      (6) There is no discussion about the deltaR2* difference across subcortical areas (Figure 1). This finding is intriguing and warrants a thorough discussion in the context of the cortical findings.

      We thank the reviewer for this comment. We have expanded discussion on subcortical structures:

      Section 4.3, 1st paragraph:

      “In the cerebral cortex, neurons account for a significant portion (≈80-90%) of energy demand, with most of this energy allocated to signaling (≈80%) and maintaining membrane resting potentials (≈20%) (Attwell and Laughlin, 2001; Howarth et al., 2012). Since firing frequency is modulatory and the neural networks utilize distributed coding, the maintenance of resting-state membrane potential determines the minimal energy budget and the lower-limit for cerebral perfusion. Based on neuronal variability and energy dedicated to maintaining surface potential, this suggests an approximate (4 × 20% ≈) 80% variation in CBF and a resultant 25% variation in CBV across the cortex, in line with Grubb's law (CBV = 0.80 × CBF<sup>0.38</sup>) (Grubb et al., 1974). In the cerebellar cortex, neuron density is higher, and the resting potentials are thought to account for more than 50% of energy usage (Howarth et al., 2012), aligning with its higher vascular volume compared to the cerebral cortex (Fig. 1F). However, this is a simplified estimation, and a more comprehensive assessment would need to account for an aggregate of biophysical factors such as…”
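The step from an 80% variation in CBF to a 25% variation in CBV follows from the power-law form quoted above. The short Python check below is a sketch under that formula only. The function names are hypothetical, and the scale factor 0.80 cancels when a relative change is computed, so only the exponent 0.38 matters.

```python
def grubb_cbv(cbf, scale=0.80, exponent=0.38):
    """CBV predicted from CBF by the power law CBV = scale * CBF**exponent."""
    return scale * cbf ** exponent


def relative_cbv_change(relative_cbf_change, exponent=0.38):
    """Fractional change in CBV for a given fractional change in CBF.

    The ratio CBV_high / CBV_low equals (CBF_high / CBF_low)**exponent,
    so the scale factor drops out.
    """
    return (1.0 + relative_cbf_change) ** exponent - 1.0


# An 80% increase in CBF (ratio 1.8) gives roughly a 25% increase in CBV.
cbv_variation = relative_cbv_change(0.80)
```

Because the exponent is well below 1, large regional differences in flow translate into much smaller differences in volume.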

      Section 4.3, 4th paragraph:

      “When viewed in terms of information flow, CBV appears to decrease along the canonical circuit pathway (e.g., L4→L2/3→L5) in the primary visual cortex (Douglas and Martin, 2007) and as one ascends the hierarchy (e.g., V1→V2→V3&4→MT→7A) from primary sensory areas (Fig. 3F, Supp. Fig. 8) (Felleman and Van Essen, 1991; Markov et al., 2014). A similar pattern is observed in the auditory hierarchy, where the inferior colliculus, an early processing hub, exhibits the highest vascular volume, followed by a gradual reduction along cortical auditory ‘where’ and ‘what’ pathways (Fig. 1F, Fig. 3B).”

      (7) Figure 3 is missing. Several statements in the manuscript require statistics (e.g., bimodality in Figure 2D, Figure 3F).

      We apologize to the reviewer for the absence of Figure 3 in the initial submission.

      As for statistical testing of bimodality, we respectfully disagree and feel that this would not add much value to the manuscript. We think a descriptive, rather than rigorous, approach is sufficient in this context.

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents a new approach for non-invasive, MRI-based measurements of cerebral blood volume (CBV). Here, the authors use ferumoxytol, a high-contrast agent, and apply specific sequences to infer CBV. The authors then move to statistically compare measured regional CBV with the known distribution of different types of neurons, markers of metabolic load, and others. While the presented methodology captures an estimated 30% of the vasculature, the authors corroborated previous findings regarding the lack of vascular compartmentalization around functional neuronal units in the primary visual cortex.

      Strengths:

      Non-invasive methodology geared to map vascular properties in vivo.

      Implementation of a highly sensitive approach for measuring blood volume.

      Ability to map vascular structural and functional vascular metrics to other types of published data.

      Weaknesses:

      The key issue here is the underlying assumption about the appropriate spatial sampling frequency needed to capture the architecture of the brain vasculature. Namely, ~7 penetrating vessels / mm2 as derived from Weber et al 2008 (Cer Cor). The cited work begins by characterizing the spacing of penetrating arteries and ascending veins using a vascular cast of 7 monkeys (Macaca mulatta, same as in the current paper). The ~7 penetrating vessels / mm2 are computed by dividing the total number of identified vessels by the area imaged. The problem here is that all measurements were made in a "non-volumetric" manner and only in V1. Extrapolating from here to the entire brain seems like an over-assumption, particularly given the region-dependent heterogeneity that the current paper reports.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      - For broader readership, it would be beneficial to provide a guide on how to interpret baseline R2* versus ΔR2*.

      The text was edited as follows:

      “…For quantitative assessment, R<sub>2</sub>* values were estimated from multi-echo gradient-echo images acquired both before and after the administration of ferumoxytol contrast agent (Table 1). Subsequently, the baseline R<sub>2</sub>* and ΔR<sub>2</sub>*, an indirect proxy measure of CBV (Boxerman et al., 1995), volume maps for each subject were mapped onto the twelve native equivolumetric layers (ELs) (Fig. 1C). Each vertex was then corrected for the orientation of the cortical normal relative to the B<sub>0</sub> direction (Supp. Fig. 1). Surface maps for each subject were registered onto a Mac25Rhesus average surface using cortical curvature landmarks and then averaged across the subjects (Fig. 1D, E). Around cortical midthickness, the distribution of R<sub>2</sub>*, an aggregate measure for ferritin-bound iron, myelin content and venous oxygenation levels (Langkammer et al., 2012), resembled the spatial pattern of ΔR<sub>2</sub>* vascular volume. However, across cortical layers, these measures exhibited reversed patterns: R<sub>2</sub>* increased toward the white matter surface, whereas ΔR<sub>2</sub>* decreased (Fig. 1E, G).”
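For readers less familiar with relaxometry, the relationship between baseline R2* and ΔR2* can be illustrated with a small Python sketch. It assumes the standard mono-exponential decay model S(TE) = S0 · exp(−TE · R2*) and a log-linear least-squares fit. The echo times and signal values are hypothetical (the actual acquisition parameters are in Table 1 of the manuscript and are not reproduced here), and the fitting procedure used in the study may differ.

```python
import math


def fit_r2star(echo_times_s, signals):
    """Estimate R2* (in 1/s) from multi-echo signals by a log-linear fit.

    Model: S(TE) = S0 * exp(-TE * R2*), so log S is linear in TE with slope -R2*.
    """
    logs = [math.log(s) for s in signals]
    n = len(echo_times_s)
    t_mean = sum(echo_times_s) / n
    l_mean = sum(logs) / n
    stt = sum((t - t_mean) ** 2 for t in echo_times_s)
    stl = sum((t - t_mean) * (l - l_mean) for t, l in zip(echo_times_s, logs))
    return -stl / stt


def delta_r2star(echo_times_s, pre_signals, post_signals):
    """Delta R2* = R2*(post-contrast) - R2*(pre-contrast, i.e. baseline)."""
    return fit_r2star(echo_times_s, post_signals) - fit_r2star(echo_times_s, pre_signals)


# Hypothetical voxel: baseline R2* of 30 1/s, rising to 80 1/s after contrast.
te = [0.005, 0.010, 0.015, 0.020]
pre = [100.0 * math.exp(-30.0 * t) for t in te]
post = [100.0 * math.exp(-80.0 * t) for t in te]
change = delta_r2star(te, pre, post)  # close to 50 1/s
```

In this picture, baseline R2* reflects tissue properties present before injection, while ΔR2* isolates the additional decay introduced by the intravascular contrast agent.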

      - The legends in Figure 1 describe green/cyan arrows, which are not visible in the figure itself.

      We thank the reviewer for noting this discrepancy. The reference to green/cyan arrows was removed from the Figure 1 legend.

      - There are typos in Section 3.3: "(Figure 4A, E)" and "(cluster 3; Figure 3)" should be corrected to Figure 5.

      We thank the reviewer for noting this error. The references to the Figures were corrected.

      Reviewer #2 (Recommendations for the authors):

      The work is elegantly presented and very easy to follow. The figures and the data presented there are compelling and well-organized. I have enjoyed reading the paper, despite my disagreement with the validity of the methodology presented.

      Validation against MRA methods (high resolution needed here, Bolan et al 2006, cited also by the authors). Certainly, that work used a much higher magnetic field. This could be done through collaboration if such a magnet is not available. In my humble opinion, the current arguments provided in the paper as validation fall short in convincing future readers. Other TOF approaches might be better suited (in combination with line scanning or single plane sequences) for the 3T used in this work.

      We appreciate the reviewer’s suggestion regarding time-of-flight (TOF) angiography at ultra-high magnetic fields, such as 9.4T, for improved visualization of fast-flowing blood in arterial vessels, as elegantly demonstrated in Bolan et al., 2006. However, our focus was on mapping vasculature across cortical layers, and TOF is not optimal for imaging slow capillary blood inflow. To enhance CNR at the capillary level as well, we used ferumoxytol contrast agent to create quantitative CBV-weighted cortical layer maps (Boxerman et al., 1995).

      We are open to collaborative opportunities to revisit this work using ultra-high magnetic field strengths and more detailed neuroanatomical ground-truth measures. However, the recommended line scanning or single-plane sequences, at least on first impression, seem inadequate for whole-brain coverage and cortical surface mapping.

      Some of the methodology can be made more accessible to non-MRI readers. For example, a more elaborate explanation of R2* and ΔR2 could benefit future readers.

      Elaborated as requested (see above reply).

      A more detailed discussion of the limitations of the methodology could also be beneficial here. Explain the potential implications of under-sampling denser vascular areas (i.e. with potentially more than 7 penetrating vessels per mm2).

      V1, with its highest neuronal density, likely also has the highest feeding/draining vessel density. Based on this, we hypothesized that a 0.23 mm isotropic image resolution would sufficiently capture cortical arterio-venous networks, but we did not achieve the expected detection of 7 penetrating vessels per mm<sup>2</sup>. Consequently, we refrained from quantifying vessel density in other areas, albeit we did report the total vessel count.

      This under-sampling likely biases our ΔR2* estimates, skewing them toward larger vessels. To address this, we used median parcel values to avoid over-representing large vessels (the long tail in the distribution of ΔR2* parcel data represents large vessels) and corrected for the cortical surface bias where blood originates from and returns to the pial network. These steps helped mitigate large vessel bias as described in the methods, results and discussion (see also our response to Reviewer #1, question #1).
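To give a rough sense of scale for the under-sampling discussed above, the sketch below relates the quoted vessel density (7 penetrating vessels per mm<sup>2</sup>) to the 0.23 mm voxel size. It is a back-of-the-envelope illustration, not the authors' derivation of the sampling criterion: the function names are hypothetical, and treating vessels as points on a regular square lattice is an assumption.

```python
import math


def mean_spacing_mm(density_per_mm2):
    """Vessel-to-vessel spacing if vessels sat on a regular square lattice (assumption)."""
    return 1.0 / math.sqrt(density_per_mm2)


def voxels_per_vessel(density_per_mm2, voxel_mm):
    """Number of in-plane voxel footprints available per vessel."""
    return 1.0 / (density_per_mm2 * voxel_mm ** 2)


spacing = mean_spacing_mm(7.0)               # about 0.38 mm between neighbouring vessels
footprints = voxels_per_vessel(7.0, 0.23)    # about 2.7 voxel footprints per vessel
```

Under these assumptions, each vessel has fewer than three voxel footprints to itself, which is consistent with the difficulty of separating neighbouring vessels once susceptibility blooming is taken into account.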

      To improve clarity for readers, we further clarified:

      Methods:

      “The effect of blood accumulation in large feeding arteries and draining veins toward the superficial layers was estimated using a linear model and regressed out from the parcellated ΔR<sub>2</sub>* maps.”

      Results:

      “To mitigate bias resulting from undersampling the large-caliber vessels (Fig. 2A, B), median parcel values were obtained and M132 parcellated ΔR2* profiles were then detrended across ELs in each subject and then averaged.”

      Discussion:

      “This methodology, however, has known limitations. First, gradient-echo imaging is more sensitized toward large pial vessels running along the cortical surface and large penetrating vessels, which could differentially bias the estimation of ΔR<sub>2</sub>* across cortical layers (Fig. 2A, 2B) (Boxerman et al., 1995; Zhao et al., 2006). Additionally, vessel orientation relative to the B<sub>0</sub> direction introduces strong layer-specific biases in quantitative ΔR<sub>2</sub>* measurements (Supp. Fig. 1C) (Ogawa et al., 1993; Viessmann et al., 2019; Lauwers et al., 2008). To address these concerns, we conducted the necessary corrections for B<sub>0</sub>-orientation, obtained parcel median values and regressed out the linear trend, thereby mitigating the effect of undersampling large-caliber vessels across ELs (Fig. 2C, Supp. Fig. 1).”

      Please note, we are currently unable to create BALSA links to the figures due to maintenance issues at the data repository. As a result, we have opted to remove the links.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper investigates the effects of the explicit recognition of statistical structure and sleep consolidation on the transfer of learned structure to novel stimuli. The results show a striking dissociation in transfer ability between explicit and implicit learning of structure, finding that only explicit learners transfer structure immediately. Implicit learners, on the other hand, show an intriguing immediate structural interference effect (better learning of novel structure) followed by successful transfer only after a period of sleep.

      Strengths:

      This paper is very well written and motivated, and the data are presented clearly with a logical flow. There are several replications and control experiments and analyses that make the pattern of results very compelling. The results are novel and intriguing, providing important constraints on theories of consolidation. The discussion of relevant literature is thorough. In summary, this work makes an exciting and important contribution to the literature.

      Weaknesses:

      There have been several recent papers that have identified issues with alternative forced choice (AFC) tests as a method of assessing statistical learning (e.g. Isbilen et al. 2020, Cognitive Science). A key argument is that while statistical learning is typically implicit, AFC involves explicit deliberation and therefore does not match the learning process well. The use of AFC in this study thus leaves open the question of whether the AFC measure benefits the explicit learners in particular, given the congruence between knowledge and testing format, and whether, more generally, the results would have been different had the method of assessing generalization been implicit. Prior work has shown that explicit and implicit measures of statistical learning do not always produce the same results (eg. Kiai & Melloni, 2021, bioRxiv; Liu et al. 2023, Cognition).

      We agree that numerous papers in the Statistical Learning literature discuss how different test measures can lead to different results and, in principle, using a different measure could have led to varying results in our study. In addition, we believe there are numerous additional factors relevant to this issue, including the dichotomous vs. continuous nature of implicit vs. explicit learning and the complexity of the interactions between the (degree of) explicitness of the participants' knowledge and the applied test method. These factors transcend a simple labeling of tests as implicit or explicit and strongly constrain the type of variations the results of different tests would produce. Therefore, running the same experiments with different learning measures in future studies could provide additional interesting data with potentially different results.

      However, the most important aspect of our reply concerning the reviewer's comment is that although quantitative differences between the learning rate of explicit and implicit learners are reported in our study, they are not of central importance to our interpretations. What is central are the different qualitative patterns of performance shown by the explicit and the implicit learners, i.e., the opposite directions of learning differences for “novel” and “same” structure pairs, which are seen in comparisons within the explicit group vs. within the implicit group and in the reported interaction. Following the reviewer's concern, any advantage an explicit participant might have in responding to 2AFC trials using “novel” structure pairs should also be present in the replies of 2AFC trials using the “same” structure pairs and this effect, at best, could modulate the overall magnitude of the across groups (Expl/Impl.) effect but not the relative magnitudes within one group. Therefore, we see no parsimonious reason to believe that any additional interaction between the explicitness level of participants and the chosen test type would impede our results and their interpretation.

      Given that the explicit/implicit classification was based on an exit survey, it is unclear when participants who are labeled "explicit" gained that explicit knowledge. This might have occurred during or after either of the sessions, which could impact the interpretation of the effects.

      We agree that this is a shortcoming of the current design, and obtaining the information about participants’ learning immediately after Phase 1 would have been preferred. However, we made this choice deliberately as the disadvantage of assessing the level of learning at the end of the experiment is far less damaging than the alternative of exposing the participants to the exit survey question earlier and thereby letting them achieve explicitness or influence their mindset otherwise through contemplating the survey questions before Phase 2. Our Experiment 5 shows how realistic this danger of unwanted influence is: with a single sentence alluding to pairs in the instructions of Exp 5, we could completely change participants' quantitative performance and qualitative response pattern. Unfortunately, there is no implicit assessment of explicitness we could use in our experimental setup. We also note that given the cumulative nature of statistical learning, we expect that the effect of using an exit survey for this assessment only shifts absolute magnitudes (i.e. the fraction of people who would fall into the explicit vs. implicit groups) but not aspects of the results that would influence our conclusions.

      Reviewer #2 (Public Review):

      Summary:

      Sleep has not only been shown to support the strengthening of memory traces but also their transformation. A special form of such transformation is the abstraction of general rules from the presentation of individual exemplars. The current work used large online experiments with hundreds of participants to shed further light on this question. In the training phase, participants saw composite items (scenes) that were made up of pairs of spatially coupled (i.e., they were next to each other) abstract shapes. In the initial training, they saw scenes made up of six horizontally structured pairs, and in the second training phase, which took place after a retention phase (2 min awake, 12 h incl. sleep, 12 h only wake, 24 h incl. sleep), they saw pairs that were horizontally or vertically coupled. After the second training phase, a two-alternatives-forced-choice (2-AFC) paradigm, where participants had to identify true pairs versus randomly assembled foils, was used to measure the performance of all pairs. Finally, participants were asked five questions to identify, if they had insight into the pair structure, and post-hoc groups were assigned based on this. Mainly the authors find that participants in the 2-minute retention experiment without explicit knowledge of the task structure were at chance level performance for the same structure in the second training phase, but had above chance performance for the vertical structure. The opposite was true for both sleep conditions. In the 12 h wake condition these participants showed no ability to discriminate the pairs from the second training phase at all.

      Strengths:

      All in all, the study was performed to a high standard and the sample size in the implicit condition was large enough to draw robust conclusions. The authors make several important statistical comparisons and also report an interesting resampling approach. There is also a lot of supplemental data regarding robustness.

      Weaknesses:

      My main concern regards the small sample size in the explicit group and the lack of experimental control.

      The sample sizes of the explicit participants in our experiments are, indeed, much smaller than those of the implicit participants because of how we obtain the members of the two groups. However, the sample sizes of the explicit groups are not small at all compared to typical experiments reported in Visual Statistical Learning studies; rather, they tend to be average to large. It is the sizes of the implicit subgroups that are unusually high due to the aforementioned data collection process. Moreover, the explicit subgroups have significantly larger effect sizes than the implicit subgroups, bolstering the achieved power. This is also confirmed by the reported Bayes Factors, which support the “effect” or the “no effect” conclusions in the various tests and range in value from substantial to very strong. Based on these statistical measures, we think the sample sizes of the explicit participants in our studies are adequate.

      As for the lack of experimental control, indeed, we could not fully randomize consolidation condition assignment. Instead, the assignment was a product of when the study was made available on the online platform Prolific. This method could, in theory, lead to an unobserved covariate, such as morningness, being unbalanced between conditions. We do not have any reasons to believe that such a condition would critically alter the effects reported in our study, but as it follows from the nature of unobserved variables, we obviously cannot state this with certainty. Therefore, we added an explicit discussion of these potential pitfalls in the revised version of the manuscript.

      Reviewer #3 (Public Review):

      In this project, Garber and Fiser examined how the structure of incidentally learned regularities influences subsequent learning of regularities, that either have the same structure or a different one. Over a series of six online experiments, it was found that the structure (spatial arrangement) of the first set of regularities affected the learning of the second set, indicating that it has indeed been abstracted away from the specific items that have been learned. The effect was found to depend on the explicitness of the original learning: Participants who noticed regularities in the stimuli were better at learning subsequent regularities of the same structure than of a different one. On the other hand, participants whose learning was only implicit had an opposite pattern: they were better in learning regularities of a novel structure than of the same one. This opposite effect was reversed and came to match the pattern of the explicit group when an overnight sleep separated the first and second learning phases, suggesting that the abstraction and transfer in the implicit case were aided by memory consolidation.

      These results are interesting and can bridge several open gaps between different areas of study in learning and memory. However, I feel that a few issues in the manuscript need addressing for the results to be completely convincing:

      (1) The reported studies have a wonderful and complex design. The complexity is warranted, as it aims to address several questions at once, and the data is robust enough to support such an endeavor. However, this work would benefit from more statistical rigor. First, the authors base their results on multiple t-tests conducted on different variables in the data. Analysis of a complex design should begin with a large model incorporating all variables of interest. Only then, significant findings would warrant further follow-up investigation into simple effects (e.g., first find an interaction effect between group and novelty, and only then dive into what drives that interaction). Furthermore, regardless of the statistical strategy used, a correction for multiple comparisons is needed here. Otherwise, it is hard to be convinced that none of these effects are spurious. Last, there is considerable variation in sample size between experiments. As the authors have conducted a power analysis, it would be good to report that information per each experiment, so readers know what power to expect in each.

      Answering the questions we were interested in required us to investigate two related but separate types of effects within our data: general above-chance performance in learning, and within- and across-group differences.

      Above-chance performance: As typical in SL studies, we needed to assess whether learning happened at all and which types of items were learned. For this, a comparison to the chance level is crucial and, therefore, one-sample t-test is the statistical test of choice. Note that all our t-tests were subject to experiment-wise correction for multiple comparisons using the Holm-Bonferroni procedure, as reported in the Supplementary Materials.
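
      The Holm-Bonferroni step-down procedure mentioned above can be sketched as follows. This is a minimal illustration in Python, not the authors' analysis code; the function name `holm_bonferroni`, the default `alpha`, and the example p-values are assumptions made purely for illustration.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Hypothetical helper: p-values are sorted in ascending order and the
    k-th smallest (k = 0, 1, ...) is compared against alpha / (m - k).
    Testing stops at the first p-value that fails its threshold.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject


# Illustrative p-values only; these are not values from the study.
example = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
```

      With these illustrative inputs, the two smallest p-values pass their thresholds (0.0125 and about 0.0167), while 0.03 exceeds 0.025, so it and the remaining larger value are retained.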

      Within- and across-group differences: To obtain our results regarding group and pair-type differences and their interactions, we used mixed ANOVAs and appropriate post-hoc tests, as the reviewer suggested. These results are reported in the Methods section.

      Concerning power analysis, in the revised version of the manuscript we added analysis of achieved power for the statistical tests most critical to our arguments.

      (2) Some methodological details in this manuscript I found murky, which makes it hard to interpret results. For example, the secondary results section of Exp1 (under Methods) states that phase 2 foils for one structure were made of items of the other structure. This is an important detail, as it may make testing in phase 2 easier, and tie learning of one structure to the other. As a result, the authors infer a "consistency effect", and only 8 test trials are said to be used in all subsequent analyses of all experiments. I found the details, interpretation, and decision in this paragraph to lack sufficient detail, justification, and visibility. I could not find either of these important design and analysis decisions reflected in the main text of the manuscript or in the design figure. I would also expect to see a report of results when using all the data as originally planned.

      We thank the reviewer for pointing out these critical open questions in our manuscript that need further clarification. The inferred “consistency effect” is based on patterns found in the data, which show an increase in negative correlation between test types during the test phase. As this is apparently an effect of the design of the test phase and not an effect of the training phase, which we were interested in, we decided to minimize this effect as far as possible by focusing on the early test trials. For the revised version of the manuscript, we revamped and expanded the discussion of how this issue was handled and also added a short comment in the main text, mentioning the use of only a subset of test trials and pointing the interested reader to the details.

      Similarly, the matched sample analysis is a great addition, but details are missing. Most importantly, it was not clear to me why the same matching method should be used for all experiments instead of choosing the best matching subgroup (regardless of how it was arrived at), and why the nearest-neighbor method with replacement was chosen, as it is not evident from the numbers in Supplementary Table 1 that it was indeed the best-performing method overall. Such omissions hinder interpreting the work.

      Since our approach provided four different balance metrics (see Supp. Tables 1-4) for each matching method, it is not completely straightforward to make a principled decision across the methods. In addition, selecting the best method for each experiment separately carries the suspicion of cherry-picking the most suitable results for our purposes. For the revised version, we expanded on our description of the matching and decision process and added supplementary descriptive plots showing what our data look like under each matching method for each experiment. These plots highlight that the matching techniques produce qualitatively very similar results and that picking one of them over another does not alter the conclusions of the test. The plots give the interested reader all the necessary information to assess the extent to which our design decisions influence our results.
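
      For readers unfamiliar with the technique discussed here, nearest-neighbor matching with replacement can be sketched in a few lines. This is a simplified, one-dimensional illustration under stated assumptions, not the authors' matching pipeline: the function name `match_with_replacement`, the choice of a single score as the matching variable, and the example scores are all hypothetical.

```python
def match_with_replacement(target_scores, pool_scores):
    """For each target score, return the index of the closest pool score.

    Hypothetical sketch: because matching is done with replacement, the
    same pool member may be selected for several targets.
    """
    if not pool_scores:
        raise ValueError("pool_scores must not be empty")
    return [
        min(range(len(pool_scores)), key=lambda j: abs(pool_scores[j] - t))
        for t in target_scores
    ]


# Illustrative scores only; these are not data from the study.
matches = match_with_replacement([0.80, 0.70], [0.50, 0.75, 1.00])
```

      In this toy case both targets are matched to the same pool member (index 1), which is exactly what "with replacement" permits.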

      (3) To me, the most surprising result in this work relates to the performance of implicit participants when phase 2 followed phase 1 almost immediately (Experiment 1 and Supplementary Experiment 1). These participants had a deficit in learning the same structure but a benefit in learning the novel one. The first part is easier to reconcile, as primacy effects have been reported in statistical learning literature, and so new learning in this second phase could be expected to be worse. However, a simultaneous benefit in learning pairs of a new structure ("structural novelty effect") is harder to explain, and I could not find a satisfactory explanation in the manuscript.

      Although we might not have worded it clearly, we do not claim that our "structural novelty effect" comes from a “benefit” in learning pairs of the novel structure. Rather, we used the term “interference” and the lack of this interference. In other words, we believe that one possible explanation is that there is no actual benefit for learning pairs of the novel structure, but simply unhindered learning for pairs of the novel structure and simultaneous interference for learning pairs of the same structure. Stronger interference for the same-structure items than for the novel-structure items seems a reasonable interpretation, as similarity-based interference is well established in the general (not SL-specific) literature under the label of proactive interference.

      After possible design and statistical confounds (my previous comments) are ruled out, a deeper treatment of this finding would be warranted, both empirically (e.g., do explicit participants collapsed across Experiments 1 and Supplementary Experiment 1 show the same effect?) and theoretically (e.g., why would this phenomenon be unique only to implicit learning, and why would it dissipate after a long awake break?).

      Across all experiments, the explicit participants showed the same pattern of results but no significant difference between pair types, probably due to insufficient sample sizes. We already included in the main text the collapsed explicit results across Experiments 1-4 and Supplementary Experiment 1 (p. 16). This analysis confirmed that, indeed, there was a significant generalization for explicit participants across the two learning phases. We could re-run the same analysis for only Experiment 1 and Supplementary Experiment 1, but due to the small sample of N=12 in Suppl. Exp. 1, this test would likely be severely underpowered. Obtaining a sufficient sample size for this one test would require an excessive number (several hundred) of new participants.

      In terms of theoretical treatment, we already presented our interpretation of our results in the discussion section, which we expanded on in the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) It would be very useful to add individual data points (and/or another depiction of the distribution) to the bar plots. If not in the main figures, as added figures in the supplement.

      We added violin plots for all results in the Supplementary.

      (2) It would be helpful to include in the supplement some examples of responses that led to the 'explicit' or 'implicit' classification. Specifically, what kind of response was considered to contain a partial recognition of the underlying structure vs. no recognition?

      We added example responses used for classification in the Supplementary.

      (3) It would be useful to show the results of Experiment 5 as well as the diagonal version as supplemental figures.

      We added the requested figures in the Supplementary.

      Typos: page 10: "in in the tests", page 15: "rerun"

      Fixed.

      Reviewer #2 (Recommendations For The Authors):

      (1) My strongest reservation relates to the small sample size in the explicit group. The authors do report stats for all experiments together in one analysis and I think this is the only robust finding for this group. I would suggest removing any comparisons between this smaller group and the larger implicit group since they do not make a lot of sense due to the imbalance in sample size in my opinion. If they do want to report the explicit group individually for each experiment, they should at least test for differences between the experiments also for this group using ANOVA.

      We do agree that the unbalanced nature of the sample sizes can be problematic for the between-group comparisons. The t-tests reported for between-group comparisons are in fact Welch’s t-test better suited for unequal sample sizes and variances. Previously, we failed to report that these t-tests were Welch’s t-test, which we fixed in the revised version.
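
      To make concrete why Welch's test suits unequal group sizes and variances, the statistic and its Welch-Satterthwaite degrees of freedom can be sketched as follows. This is a minimal standard-library illustration, not the authors' analysis code; the function name `welch_t` and the example samples are assumptions.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test.

    Hypothetical helper: each group contributes its own variance term
    s^2 / n, and the degrees of freedom follow the Welch-Satterthwaite
    approximation instead of the pooled value n_a + n_b - 2.
    """
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Illustrative samples only; these are not data from the study.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6])
```

      When the two groups have equal sizes and equal variances, the approximation reduces to the familiar pooled value; otherwise the degrees of freedom shrink toward those of the smaller, noisier group.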

      In the Supplementary, we previously reported an ANOVA including all explicit participants from all experiments. This showed significant main effects of Experiment and test type, but no significant interaction. We take this as evidence that although specific levels of learning vary by experimental condition, the overall pattern of learning (i.e., which pairs are learned better) is the same across all experiments.

      (2) Moreover, the explicit group does not only differ in the explicitness of their memory but also regarding learning performance per se (as evidenced by performance differences for the first training). This important confound needs to be acknowledged and discussed more thoroughly!

      We agree that this topic is important, which is why the subsection “The Type of Transfer Depends on Quality of Knowledge, Not Quantity of Knowledge” deals exclusively with this issue. See our reply to the next point.

      (3) The resampling approach is somewhat interesting to solve the issue raised in 2. However, I doubt that the authors actually achieve what they are claiming. Since we have a 2-AFC task the possibility must be considered that participants who chose correctly in the implicit group did so by chance. This means that the assumption that the matched pairs actually have the same amount of memory for the first training period as the explicit group is likely false. Therefore, this analysis is still comparing apples and oranges.

      We address this idea in detail in the supplementary materials, first by pointing out that the matched results showed the same pattern as the full results, suggesting that Phase 1 and Phase 2 results are independent for this group, and second by arguing that a randomly selected subset of participants should not show a significant deviation from null performance in the Same vs. Novel comparison in Phase 2.

      (4) One important issue, when conducting online experiments is assuring random allocation of participants. How did the authors recruit participants to ensure they did not select participants for the different experiments that differed regarding their preference for wake vs. sleep retention intervals? If no care was taken in this regard, I would suggest reporting this and maybe briefly discussing it.

      This shortcoming was now reported and addressed in the discussion section of the revised manuscript.

      (5) I could not find any information about the exact questions that were asked about the task rules. Also, there was no information on how the answers were used to assign groups. Both should be added.

      The exact questions were added to the revised Supplementary.

      (6) I think that the literature on sleep and rule extraction is well-represented in the manuscript. However, I think also referring more thoroughly to the literature on how sleep leads to gist extraction, schemas, and insight would help understand the relevance of the present research.

      We subsumed references to the mentioned areas of research under the labels of abstraction and generalization. In the revised section, we listed the appropriate labels along with the already used references to make the connection to a vast literature treating generalization in related but distinct ways more explicit.

      (7) It is unclear to me why the items learned in the first learning phase interfere with those learned in the second learning phase (without sleep) and not vice versa. What is the author's explanation for this?

      We added a paragraph on this to our revised discussion section. In short, there may also be retroactive interference. However, we would need yet another variation of the paradigm to properly measure it, and this was outside the scope of the current work.

      (8) As far as I can tell the study lacks all of the usual control tasks that are used in the field of sleep and memory (especially subjective sleepiness and objective vigilance). In addition, this research has the circadian confound, and therefore additional controls would have been warranted, e.g., morningness-eveningness, retrieval capabilities. Also, performance immediately after training phase 1 was not tested, which would serve as an important control for circadian differences in initial learning of the rule.

      The study uses a number of the control measures established in the sleep and memory literature, such as habitual sleep quality and sleep quality during the night of and the night before the experiment. However, there are, of course, more potentially interesting measures, such as the ones named by the reviewer.

      Testing performance right after training phase 1 would have been very interesting indeed. However, due to the nature of statistical learning tasks, this would have completely confounded the implicitness of learning by presenting participants with segmented input; i.e. isolated pairs. Therefore, we opted for the lesser of two evils in our design decision.

      (9) As far as I can tell, there is no effect of sleep on correctly identifying pairs from training phase 1. This would be expected and thus should be discussed.

      As noted and referenced in the discussion section, the effect of sleep on statistical learning per se is a subject of controversy in the literature, where some studies apparently find effects, while others find no effect on statistical learning whatsoever.

      (10) The manuscript should explicitly mention if the study was preregistered.

      It was not.

      Reviewer #3 (Recommendations For The Authors):

      The topic of this project is close to my heart, and I commend the authors for conducting numerous variations of the experiment with large sample sizes. I have some suggestions I feel will make the paper stronger, and a few minor comments that caught my eye during reading:

      (1) First and foremost, I found the paper's structure cumbersome. For instance, different aspects of Experiment 1 results are reported in (1) the main text, (2) under methods, and (3) in Supplementary. This makes reading unnecessarily difficult. This relates not only to the analysis results - the sample size is reported as 226 in the main text, 226+3 in Methods, and 226+3+19 in Supplementary. I strongly suggest removing all results from the Methods section and merging the supplementary results with the main results.

      We overhauled the structure of the paper, moving much more information into the proper method section and out of the Supplementary.

      (2) "Attention checks" and "response bias" appear first in Supplementary Experiment 1 but are explained only later under Experiment 1. The same thing for the experimental procedure. I therefore suggest placing Experiment 1 before Supplementary Experiment 1, but related to my previous comment - have one paragraph dedicated to Subject Exclusion of all experiments.

      The new structure of the Method sections solves this.

      (3) Figure 4 is mentioned but does not appear in the manuscript.

      This has been fixed. The paragraph in question now references the correct supplementary figure.

      (4) OSF project includes only data with no README file on how to understand the data. The work would also benefit from sharing the experimental and analysis codes.

      A README file was added.

      (5) This sentence is repeated in relation to four experiments: "Bayes Factors from Bayesian t-tests for implicit participants reported for experiments 1, 2, and 3 used an r-scale parameter of 0.5 instead of the default √2/2, reflecting that Experiment 1 found small effect sizes for this group". First, it is missing an explanation of what the r-scale means. Second, it sounds as if this was a product of the procedure, but in fact it was a decision by the researcher if I am correct. If so, it is missing a description of how and why this choice was made.

      This was indeed a decision by the researchers, in line with a Bayesian logic of evidence accumulation. We made the explanation in the paper clearer.

      (6) Did I understand correctly that each pair was tested 4 times? Was it against the same foil? Did you make sure not to repeat the same pair in back-to-back trials? These details, in addition to what I noted in the public review, are needed.

      Each pair was tested 4 times. Each time against a different foil pair. Details have been added to the Method section.

      (7) Also in relation to my public review, I could not understand why the sample size was overshot by so much in Experiment 1 (229 instead of 198.15)?

      The calculated sample size of 198.15 was for the implicit subgroup alone, while 229 included explicit and implicit participants.

      (8) The correlation between phase 1 and phase 2 is only tested in explicit participants. Why is that? A test in implicit participants is needed for completeness.

      Correlations for implicit participants have been added.

      (9) There is a known asymmetry between the horizontal and vertical planes in our visual system (with a preference for horizontal stimuli). I was missing a comparison between learning in the two structures, and a report of how many participants received either in Phase 1.

      The allocation of participants to horizontal and vertical conditions was balanced. In the Method section we already report an ANOVA testing for a potential effect of orientation condition, which was not significant.

      Minor/aesthetic comments:

      (1) "In Phase 2, explicit participants performed above chance for learning pairs that shared their higher level orientation structure with that of pairs in Phase 1". This sounds as if there was a separate test following the two learning phases. Perhaps reword to "for phase 2 pairs".

      Fixed

      (2) "the two asleep-consolidation groups (Exp. 3 and 4)" - I think you mean Exp. 2 and 4.

      Fixed.

      (3) "acquiring explicitness in Experiment 5 as compared to 1" I think you mean Supplementary Experiment 1 as compared to 1.

      Fixed

      (4) "without such a redescription, the previously learned patterns in Phase 1 interfere with new ones in Phase 2, when redescription occurs..." The comma should be a dot.

      Fixed

      (5) In Experiment 4, did 168 or 169 participants survive exclusion? Both accounts exist, and so do reports of degrees of freedom that allow both 23 and 24 explicit participants.

      Fixed.

      (6) "Implicit learners also performed above chance.." in Experiment 2 is missing (n=XX).

      Fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      We are grateful to the reviewers and the editorial team for their feedback and thorough revisions of our paper. We also appreciate their acknowledgement that this study represents a significant advancement in the field of reproductive neuroendocrinology and offers insights on the contribution of obesity vs melanocortin signaling in women’s fertility. In the revised version, we will provide a more detailed clarification of the data and methodology and adhere to the reviewers’ suggestions.

      Please find below our answers to specific concerns in the public review:

      Given the fact that mice lacking MC4R in Kiss1 neurons remained fertile despite some reproductive irregularities, the overall tone and some of the conclusions of the manuscript (e.g., from the abstract: "... Mc4r expressed in Kiss1 neurons is required for fertility in females") were overstated. Perhaps this can be described as a contributing pathway, but other mechanisms must also be involved in conveying metabolic information to the reproductive system.

      We will tone down these statements throughout the manuscript to indicate that MC4R in Kiss1 neurons plays a role in the metabolic control of fertility (rather than “…is required for fertility”)

      The mechanistic studies evaluating melanocortin signalling in Kiss1 neurons were all completed in ovariectomised animals (with and without exogenous hormones) that do not experience cyclical hormone changes. Such cyclical changes are fundamental to how these neurons function in vivo and may dynamically alter the way they respond to neuropeptides. Therefore, eliminating this variable makes interpretation difficult.

      Mice lack true follicular and luteal phases and therefore it is impossible to separate estrogen-mediated changes from progesterone-mediated changes (e.g., in a proestrous female). Therefore, we use an ovariectomized female model in which we can generate a LH surge with an E2-replacement regimen [1]. This model enables us to focus on estrogen effects, exclude progesterone effects, and minimize variability. Inclusion of cycling females would make interpretation much more difficult.

      (1) Bosch et al., 2013 Mol & Cell Endo; https://doi.org/10.1016/j.mce.2012.12.021

      Use of the POMC-Cre to target optogenetic inputs to Kiss1 neurons might have targeted a wider population of cells than intended.

      POMC is transiently expressed during embryonic development in a portion of cells fated to be Kiss1 or NPY/AgRP neurons [1-2]. Therefore, this is a valid concern when crossing with a floxed mouse. However, use of AAVs in adult animals avoids this issue and leads to specific expression in POMC neurons [3]. This POMC-Cre mouse has been used extensively with AAVs to drive specific expression in POMC neurons by other laboratories [4-7]. Therefore, we are confident that our optogenetic studies have narrowly targeted POMC inputs.

      (1) Padilla et al., 2010 Nat Med; https://doi.org/10.1038/nm.2126

      (2) Lam et al., 2017 Mol Metab; https://doi.org/10.1016/j.molmet.2017.02.007

      (3) Stincic et al., 2018 eNeuro; https://doi.org/10.1523/eneuro.0103-18.2018

      (4) Fenselau et al., 2017 Nat Neuro; https://doi.org/10.1038/nn.4442

      (5) Rau & Hentges, 2019 J Neuro; https://doi.org/10.1523/jneurosci.3193-18.2019

      (6) Fortin et al., 2021 Nutrients; https://doi.org/10.3390/nu13051642

      (7) Villa et al., 2024 J Neuro; https://doi.org/10.1523/jneurosci.0222-24.2024

      Recommendations for Authors

      We thank the reviewers and the editorial team for their comments and thorough revisions of our paper. We have now addressed their comments and edited the manuscript accordingly:

      Reviewer #1 (Recommendations For The Authors):

      L80 -This is an awkward sentence; it isn't an inverse agonist of the AgRP; this may read better just to say that the inverse agonist, AgRP.

      Thank you for this comment. This has now been changed in the text (L80).

      L86 - This text reads as if mice have an inherent obesity issue.

      This has also now been addressed in the text (L86).

      L131 - The numbers of digits past the decimal point should match for both mean and SEM.

      This has also now been addressed throughout the text.

      Figure 1D: Revise the bar graphs with distinct SEM bars, as these data are not generated within the same mice.

      The graphs are now changed, and they include distinct SEM and individual data points.

      Figure 2I-L - An n of 3 for controls is pretty minimal, though the clustering of data points is tight.

      We thank the reviewer for this comment. We agree that n=3 for controls is minimal; however, the mRNA levels in this group are close to one another, which is why the data points cluster tightly. We are happy to provide the raw data values for these groups if the reviewer wishes.

      L159 - The role of reduced dynorphin mRNA is pretty speculative with regard to basal levels of LH, especially since no other indices of LH secretion were affected. It should also be recognized that mRNA levels do not always equate to activity.

      We agree with the reviewer that our explanation of the role of reduced dynorphin with regard to the elevated basal LH is speculative. However, we only report that the higher LH levels correlate with the lower expression of the Pdyn gene, which is in line with the well-documented role of dynorphin in inhibiting LH secretion. We also recognize that mRNA levels do not necessarily reflect activity. We have now added this statement to the text (L159).

      L164 - Given the ovary data, it seems that the increase seen in KO mice isn't quite sufficient, but is it known how much of a surge is necessary for ovulation in mice?

      We agree with the reviewer’s comment that the LH surge in the Kiss1MC4RKO group is not enough to consistently induce ovulation, which is supported by the decrease in the number of corpora lutea (Figure 2, O).

      According to the literature, an LH surge in female mice is defined by an LH value >4 ng/ml (Bahougne et al., 2020). By this criterion, our data show that only two out of six females in the KO group had an LH surge, whereas four out of five females in the control group did.
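
      The threshold criterion described above amounts to a simple count, sketched below. The 4 ng/ml cutoff is taken from the text; the function name `count_surges` and the LH values are hypothetical placeholders and are not the measurements from this study.

```python
SURGE_THRESHOLD_NG_ML = 4.0  # cutoff cited in the text (Bahougne et al., 2020)


def count_surges(peak_lh_values, threshold=SURGE_THRESHOLD_NG_ML):
    """Count animals whose peak LH value is strictly above the threshold."""
    return sum(1 for value in peak_lh_values if value > threshold)


# Hypothetical peak LH values in ng/ml, one per animal (not study data).
hypothetical_peaks = [1.0, 4.0, 4.1, 9.5]
n_surging = count_surges(hypothetical_peaks)
```

      Note that the comparison is strict, so a value of exactly 4.0 ng/ml is not counted as a surge in this sketch.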

      L211 - According to the figure, LH pulses were not recovered and remained similar to KO levels. Looking at the LH secretory patterns presented, it seems like the pulse frequency data should be interpreted with some caution, given that some of the pulses identified are tenuous at best.

      We agree that the LH pulses identified by our software (criteria described in the Methods) are variable in shape (LH pulses are difficult to detect clearly in gonad-intact females) and did not differ in number between groups; however, the reinsertion of Mc4r within Kiss1 neurons restored LH basal levels, amplitude, and total secretory mass, which are clear indications of a significant improvement in the ability of these mice to release LH.

      L218 - Is there a reason why the surge was not looked at in these groups?

      Ovarian histology is the best indicator of ovulation. In these mice, corpora lutea were absent, indicating impaired ovulation; thus, we did not consider it necessary to perform an LH surge protocol.

      L244 - This would also fit with previous findings in sheep that not all Kiss neurons express MC receptors

      We agree with this comment.

      L329 - Given the rapidity of its actions, how would this membrane ER function during a normal surge?

      Rapid estrogen signaling can act to ease transitions between states. Membrane delimited E2 actions can quickly attenuate or enhance coupling between receptors and signaling cascades. These effects will precede E2-driven changes in gene expression that produce more stable alterations in signaling. This combination of mechanisms will reduce any lag between rises in serum E2 and physiological effects. Considering the abbreviated mouse reproductive cycle, parallel mechanisms acting on different timescales are particularly important.

      L365 - I'm a little confused as to how this particular work sheds light on a role for MC3R. Is the relative distribution of the two isoforms within Kiss neurons known?

      In the present study, we report that hypothalamic Mc3r expression decreases leading up to the age of puberty onset (p30), in line with the profile of expression of Mc4r and a recent publication involving Mc3r in puberty onset (Lam et al., 2021), suggesting that both receptors may be involved in the control of reproductive function, potentially through the direct regulation of Kiss1 neurons as characterized in our present study.

      L422 - While I understand the nature of this statement, the receptor may simply reflect the activity of what binds to it, i.e., AgRP vs. alpha-MSH, suggesting that maybe the prepubertal period is more AgRP-dominated.

      We agree with this statement, and this needs to be further investigated.

      L495 - Reinsertion of Mc4R in Kiss1 neurons

      Thank you for this comment. This is now corrected in the text (L501).

      L524 - Bilateral ovariectomy of 6-month

      Thank you for this comment. This is now corrected in the text (L530).

      L538 - Is it known what stage of the cycle these mice were in when samples were collected?

      Yes, the samples were collected in diestrus. This is now mentioned in the text (L548).

      L556 - Pulse amplitude is usually measured relative to the preceding nadir.

      The method that we have consistently used in our lab is the average of the 4 highest LH values in the sample collection period for each animal. We have found this to be consistent and representative of the overall amplitude (McCarthy et al., 2021; Talbi et al., 2021).
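
      As a minimal sketch of the amplitude measure described above (the mean of the 4 highest LH values per animal), assuming the measurements for one animal are available as a plain list; the function name and the example values are hypothetical:

```python
def lh_amplitude(lh_values, n_highest=4):
    """Mean of the n_highest largest LH values for one animal."""
    if len(lh_values) < n_highest:
        raise ValueError("need at least n_highest measurements")
    # Sort in descending order and keep the largest values.
    top_values = sorted(lh_values, reverse=True)[:n_highest]
    return sum(top_values) / n_highest


# Hypothetical LH series for a single animal (illustration only).
example_series = [0.2, 1.0, 0.4, 2.0, 0.6, 3.0, 0.3, 2.0]
print(lh_amplitude(example_series))
```

      Unlike a nadir-to-peak measure, this value does not depend on which samples are classified as pulses, only on the largest values in the series.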

      L594 - This is a little confusing - the whole MBH would contain the ARH, but only the ARH was collected from the KO mice. If the whole MBH, dynorphin and Tac3, and Tac3 are expressed outside of the ARC, making interpretation of changes specifically within the ARH is difficult.

      Here (L592), we describe two different experiments, as mentioned by i) and ii).

      For experiment 1 (i): MBH was used in the WT mice at ages P10, P15, P22 and P30 to investigate the expression of the melanocortin genes (Agrp, Pomc, Mc3r and Mc4r).

      For experiment 2 (ii): In both KO and control groups, only the micro-dissected ARH was used to investigate the expression of Pdyn, Kiss1, Tac2 and Tacr3.

      Reviewer #2 (Recommendations For The Authors):

      The validation experiments for the various manipulations are currently presented in the supplementary data. Still, in my opinion, these are critically important for interpreting the data, and it should be considered to present these more comprehensively in the main body of the manuscript. In Figure S1, it seems that the exposure of the two images is not the same, with a higher background in the control. Has this image been adjusted to highlight the staining, while the other has not? It looks like there remains a low level of expression still present in at least some of the KO cells - this may reflect difficulties using RNAscope (with its extreme amplification) to detect the absence of a signal, or it could also be that the knockout is incomplete. A percentage of cells still express MC4R. I think this should be acknowledged or discussed.

      We thank the reviewer for the feedback. While we agree that the validation of the mouse model is critical, we would like to keep it in the supplemental data.

      We also agree that the exposure looks different between the KO and WT controls, and we thank the reviewer for this comment. The quality of the photograph decreased when transferring to the manuscript. This has now been improved in the revised figure.

      As for the MC4R expression in some of the KO cells, we believe that MC4R is expressed in non-Kiss1 cells, as shown in the merged figure. Therefore, we believe that the knockout of Mc4r in Kiss1 neurons is complete in these mice.

      The clear difference from the PVN's lack of effect is convincing and indicates that a specific knockout has been achieved. Is equivalent data also available for the AVPV population of cells that are examined later in the manuscript? Do those Kiss1 neurons also express the MC4R? The same question applies to the knock-in experiment: Was the expression of MC4R also driven in the AVPV population using this approach?

      Yes, Kiss1 neurons in the AVPV also express MC4R as indicated in this study, and thus Mc4r is removed/reinserted in the AVPV as well in this mouse model.

      The quantitative RT-qPCR data on developmental changes in metabolic signaling molecules are really peripheral to the paper's main question. Relative to the validation experiments (as discussed above), I think these are less important data and could be placed into a supplementary figure. The discussion of these data becomes problematic, e.g., on line 359, the changes are described as "a low melanocortin tone..." but this seems problematic when referring to reduced expression of AgRP, an inverse agonist at the MC4R. If you are going to present these data, individual data points should be shown. Similarly, the question about whether this is a PCOS-like phenotype is perhaps worth asking. Still, the simple assessment of T and AMH could also be reported in a sentence without necessarily showing the data (or placing it in a supplementary figure). Better to focus on the key question - which is the role of MC4R signaling in Kiss1 neurons.

      We understand this reviewer’s concerns; however, given the impact of MC4R signaling (particularly in the context of AgRP) on puberty, we strongly believe that the reader will benefit from the expression profile across ages, so we respectfully disagree and have kept these data in the main figure.

      Per this reviewer’s comment, we have now added individual data points to Figure 1D.

      We also agree with the reviewer that the T and AMH data are not in the main scope of the paper, but since we uncovered a PCOS-like phenotype in female mice with specific deletion of Mc4r from Kiss1 neurons, it is important to keep these data in the main figure to show that the phenotype does not fully resemble a PCOS model.

      Having praised the experimental design, I think it is fair to acknowledge that the reproductive data from these experiments remain difficult to interpret. I understand that it is difficult to illustrate estrous cycles, but the "quantitative" data on percentages of time spent in any one stage are not as informative as seeing the actual individual patterns in Figure 2B. Were all of the animals consistently like the one illustrated, with persistent diestrus and only occasional evidence of ovulation?

      We agree that Figure 2C may be difficult to interpret, but it is the best way to capture all the data points for each group.

      All 5 Kiss1MC4RKO females had persistent diestrus phases with only one or two days of estrus over 15 days (except for one female that had 4 days of estrus), compared with control females, which had 7 to 9 days of estrus, as shown in the graph (except for one female that had 5 days of estrus over the 15-day period).

      Given that LH pulses appear to be normal, does this, in fact, suggest an ovarian problem? Is that possible? Are MC4R and Kiss1 co-expressed in the ovary? Or do you think this suggests an ovulation problem, perhaps driven by the impaired LH surge?

      This reviewer is correct that our findings suggest a central defect in ovulation, based on the deficit observed in the preovulatory LH surge. It is possible to have normal LH pulses, which are driven by one population of Kiss1 neurons (ARH), together with an impaired LH surge, which is driven by a distinct population of Kiss1 neurons (AVPV).

      Similarly, the response to the "LH surge induction protocol" is impaired (why not look at endogenous LH surges?). It seems that ovulation should be an all-or-none phenomenon in that if the LH surge is sufficient to induce ovulation, then all available follicles would be ovulated. If it is not, then no follicles will be ovulated. Why fewer follicles are ovulated in the gene-targeted animals seems more likely to be due to impaired follicular development rather than a subthreshold LH surge. So, this again points back to the ovary. Or perhaps we need a more thorough assessment of the pattern of LH pulses throughout the cycles in these animals.

      An LH surge induction protocol allows us to subject all female mice to the same conditions and expect a similar response, which makes it optimal for comparison with animals that have an expected ovulation deficit, as it eliminates external factors. We disagree that ovulation is an all-or-none phenomenon: in mice, numerous follicles mature at the same time, and thus a decrease in the number of ovulated oocytes may be significant between groups even if the animals are not completely infertile.

      Collectively, my assessment of these data is that there are effects on reproduction, but they are actually relatively subtle. There were abnormal cycles and impaired LH surge in response to exogenous estrogen. But the animals are not actually infertile, so can ovulate and express normal reproductive behavior. So while there is a role for MC4R signalling in Kiss1 neurons, it may be a contributing modulatory role rather than a major regulatory mechanism. I think the tone of the descriptions should reflect this. I like the way it is framed in some parts of the discussion ("reproductive impairments...mediated by MC4R in Kiss1 neurons and not by their obese phenotype"), but the overall significance of this is overstated in some places, such as the abstract and in other parts of the discussion ("this population is tightly controlled by melanocortins").

      As mentioned in previous responses, ovulation in mice is not all-or-nothing. While the mice can reproduce, the disruption of the central mechanisms that control ovulation and the irregular estrous cycles represent a significant advance for the field, with strong translational potential for species in which only one oocyte is usually ovulated, such as humans, where reproductive disorders in MC4R patients have been attributed to the obesity phenotype rather than to a central action of MC4R (as the reviewer captured in their comment). This is one of the main findings of this study.

      The overstatement has now been addressed throughout the text.

      For in vitro studies, all mice were ovariectomized and given estradiol "replacement." What was the rationale for this? Wouldn't this suppress the basal activity of these neurons? Then it appears that some of the animals were studied as ovariectomised (for an unspecified time but apparently ">7 days"), without hormone replacement. The basal activity of these cells would be dramatically different. I think these artificial manipulations make these data quite difficult to interpret. How does this reflect the situation in a normal (or abnormal) estrous cycle? My understanding is that the brain slice approach already compromises the ability of this population of cells to function as a coordinated network (i.e., coordinated episodes of activity that are seen in vivo have not been observed in vitro in brain slices). Ovariectomizing and providing exogenous hormones also removes the additional regulatory elements of the cyclical changes in hormone inputs, so the cells may or may not behave like they would in vivo. Perhaps the authors could justify their choice of experimental model.

      We have clarified that the mice were ovariectomized for 7-10 days. Groups of 3 mice were ovariectomized at once and then used on subsequent days a week later. This delay allows both for the recovery of the animal and for “washout” of endogenous ovarian hormones. For optogenetic studies, we were not measuring basal activity. Rather, we prioritized the ability to detect a postsynaptic response. While E2 decreases the networked activity of Kiss1<sup>ARH</sup> neurons, the Hcn channels, calcium channels, and Vglut2 expression are all increased. This leads to increased excitability and more glutamate release. Mice lack true follicular and luteal phases, and therefore it is impossible to separate estrogen-mediated changes from progesterone-mediated changes (e.g., in a proestrous female). Therefore, we use an ovariectomized female model in which we can generate an LH surge with an E2-replacement regimen (Bosch et al., J Mol Cell Endocrinology 2013). This model enables us to focus on estrogen effects, exclude progesterone effects, and minimize variability. Finally, we have documented that Kiss1<sup>ARH</sup> neurons retain the synchronization of their neuronal firing in the hypothalamic slice preparation (Qiu et al., eLife 2016).

      Figure 4E shows neurons' staining after expressing a Cre-dependent channel rhodopsin vector into POMC-Cre mice. The number of labelled cells looks markedly larger than expected for adult POMC neurons. Was the specificity of this approach to neurons expressing POMC checked? I understand that the POMC-Cre mice have been criticised for ectopic expression of Cre during development in other populations of neurons in the arcuate nucleus that does not express POMC, such as the AgRP neurons (e.g., PMID: 22166984). Is it possible that this is not a problem in adult animals? Has that been validated in these animals? The description of the method suggests that it is acknowledged that some of the expression driven in these animals might be in AgRP neurons. Still, optogenetic activation of these cells will include all cells expressing Cre at the time of AAV administration.

      POMC is transiently expressed during embryonic development in a portion of cells fated to be Kiss1 or NPY/AgRP neurons. Therefore, this is a valid concern when crossing with a floxed mouse. However, use of AAVs in adult animals avoids this issue and leads to specific expression in POMC neurons. This POMC-Cre mouse has been used extensively with AAVs to drive specific expression in POMC neurons by other laboratories (Padilla et al., Nat Med 2010; Lam et al., Mol Metab 2017; Stincic et al., eNeuro 2018; Fenselau et al., Nat Neuro 2017). We have previously shown that AAV-driven mCherry expression is limited to cells labeled with a beta-endorphin antibody (Stincic et al., eNeuro 2018). Therefore, we are confident that our optogenetic studies have narrowly targeted POMC inputs.

      Some additional explanation of the electrophysiology result may be required. For example, on Line 292, I'm confused by Fig 4M. Why is the response to 20Hz stimulation different in this cell (compared to the one in 4L) before administering naloxone? What proportion of cells showed this opposite response? On line 307: Is 5 cells sufficient for testing the POMC inputs onto AVPV and PeN Kiss1 neurons? How many slices/animals are included in collecting these 5 cells? The rapid action of STX illustrates the ability to modulate the response to MTII, but I am struggling to understand the implications of this in a physiological context. Suppose this response is desensitized by longer-term treatment with E2 (as indicated in the manuscript). Is it relevant to normal regulation during the cycle (particularly in the AVPV, where the key regulatory step seems to be the prolonged exposure to high estradiol as part of the preovulatory signals leading up to the LH surge)?

      As stated in the text, E2 has been shown to increase POMC expression and β-endorphin immunostaining. We do not know the effects of E2 on αMSH expression and release. E2 also tends to attenuate the coupling between inhibitory postsynaptic metabotropic (Gi,o-coupled) receptors and signaling cascades. So, there is likely a combination of pre- and post-synaptic mechanisms contributing to these responses. However, the focus of the current studies was on the predominant melanocortin signaling and, as such, we chose to eliminate the influence of opioid signaling. We have added two more cells to this group, both of which were successfully rescued, for a total of 5 of 6 cells (6 slices, 5 animals). Between the labeling of β-endorphin fibers and the high rate of rescue, we do believe that this is sufficient evidence to support a direct POMC input to Kiss1<sup>AVPV/PeN</sup> neurons.

      Line 52: "Here, we show that Mc4r expressed in Kiss1 neurons is required for fertility in females." The knockout animals remain fertile, so this conclusion needs to be re-worded.

      Thank you for this comment. This has now been changed (L52).

      Line 80: "The melanocortin 4 receptor (MC4R) binds α-melanocyte stimulating hormone (αMSH), an agonist product of the pro-opiomelanocortin (Pomc) gene, and the inverse agonist of the agouti-related peptide (AgRP) to regulate food intake and energy expenditure" Is this the correct wording? I think it should be stated that AgRP is an inverse agonist at the MC4R, not that αMSH is the inverse agonist of AgRP. Re-work this sentence.

      Thank you for this comment. This has now been changed (L79-80).

      Line 88: "... however, conflicting reports exist". Describe what these conflicting reports show. Many MC4 variants ("mutations") are expressed in humans, but few will fully inactivate signalling like the mouse knockout.

      We thank the reviewer for this comment. By conflicting data, we refer to the studies that report no reproductive impairments in women with MC4R mutations. This may be either because the metabolic impairments (obesity, hyperphagia, hyperinsulinemia, hyperleptinemia, etc.) are so strong that the focus is skewed to these issues, without a full reproductive assessment in these women, or simply because, as the reviewer mentioned, not all MC4R mutations fully inactivate its signaling in humans, as opposed to mouse models, where reproductive disruption has been described previously in full-body MC4R KOs.

      Line 91: "...that largely affects females". Is this a genuine sex difference, or are reproductive deficits simply more overt in female rodents? I think the Coss paper (reference 19 in the manuscript) showed a greater effect of diet-induced obesity in males than in females.

      We believe that sex differences exist with regard to the role of MC4R in the regulation of fertility, as we show that most of this effect is mediated by MC4R signaling in Kiss1 AVPV neurons, a neuronal population that is specific to the female brain.

      As far as we can tell, the Coss paper (Villa et al., 2024) tested only males, not females. Moreover, they investigated the effect of diet-induced obesity on fertility in mice (specifically LH secretion), while in this study we specifically examine the deletion of MC4R from Kiss1 neurons, and these mice were not obese (Figure 2A). While both conditions impair fertility, the mechanisms and signaling pathways are different (our mice lack MC4R signaling, while the obese mice have decreased MC4R expression but the signaling is still functional).

      Line 392: also Hessler et al. PMID: 32337804.

      This reference is now added to the text (Line 393).

      Line 433. The discussion of how advanced puberty onset (seen in the Kiss1-specific KO animals) might be caused by MC4R signalling in AVPV Kiss1 neurons, which are sexually dimorphic, which might explain sex differences in puberty timing in mammals seems extremely speculative and based on limited data. More targeted experiments would be needed to address this, and I think this speculation should be removed here.

      This speculation has now been removed from the text.

      Line 438: "Furthermore, our findings suggest that metabolic cues, through the regulation of the melanocortin output onto Kiss1AVPV/PeN neurons, are essential for the timing and magnitude of the GnRH/LH surge." Again, I think this is overstating the present data, which has only looked at an artificial hormone administration regime. The animals are fertile and, thus, must be able to mount a sufficient LH surge. The major effect, in fact, seems to be on their cycle, perhaps leading to impaired follicular development. Please acknowledge that this will be one of the multiple pathways by which metabolic information is fed into the HPG axis.

      In addition to the effect on their cycles mentioned by the reviewer, the Kiss1MC4RKO females also display impaired fertility (Figure 2, S-T) and fewer corpora lutea, which is in line with the impaired mounting of the LH surge (Figure 2, M). Even though the LH surge is induced by the hormone administration protocol, it simply reflects the natural ability of the HPG axis to mount the surge, as this regimen only mimics, in a controlled manner, the endogenous hormonal changes leading to the LH surge and therefore ovulation. Nonetheless, we agree with this reviewer that this is not the sole mechanism by which metabolism regulates reproductive function, and this has been emphasized in the paper (line 443).

      Reviewer #3 (Recommendations For The Authors):

      The decreased melanocortin tone drives puberty onset (Figure 1D), and this is correlative. The transgenic animals' hypothalamic expression of Agrp, Pomc, Mc4r, and Mc3r can be measured to strengthen the claim. Hprt expression should be demonstrated, as this housekeeping gene was used as a common denominator.

      We thank the reviewer for this comment. While we agree that measuring Agrp, Pomc, Mc4r, and Mc3r gene expression in the transgenic mice would strengthen our claim and give more insight into melanocortin tone during pubertal maturation, this is unfortunately not feasible, as it would involve generating a large number of mice at different ages throughout puberty (at least n=40 pups for n=5/group, KO and control littermates, females only, which would require setting up many breeding pairs).

      As for the gene expression of Hprt, because we have 6 mice per age and 4 ages in total, every gene (Agrp, Pomc, Mc4r, Mc3r) was run on a separate plate with Hprt as its own housekeeping gene. Samples were run in duplicate for both Hprt and the melanocortin gene on a 96-well plate: 48 wells for Hprt and 48 wells for the melanocortin gene. Therefore, it is not possible to present a single Hprt expression value for all four genes; however, every gene was normalized to the Hprt expression that was run on the same plate.
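
      The plate arithmetic in this response can be checked with a short sketch. The constant names are hypothetical; the numbers are those stated in the response.

```python
# Numbers stated in the response.
MICE_PER_AGE = 6
N_AGES = 4
REPLICATES = 2      # samples run in duplicate
PLATE_WELLS = 96

# One sample per mouse per age.
n_samples = MICE_PER_AGE * N_AGES
# Wells needed for one gene across all samples.
wells_per_gene = n_samples * REPLICATES
# Each plate holds Hprt plus one melanocortin gene.
wells_used = 2 * wells_per_gene

print(n_samples, wells_per_gene, wells_used)
```

      With these numbers a single target gene plus its Hprt reference fills one 96-well plate exactly, which is why each melanocortin gene has its own plate-matched Hprt values.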

      In Figures 4 and 5, dot plots can be used (as opposed to the bar graphs) to better reflect the individual data points.

      Figures 4 and 5 have been revised to include individual data points.

      The electrophysiology experiment requires more details in the method section. In addition to the publication cited, a brief recap of the methodology used in this paper, such as the focal application of MTII (Figure 4B), is also needed.

      We have added more details to the Methods.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      In the manuscript the authors describe a new pipeline to measure changes in vasculature diameter upon optogenetic stimulation of neurons. The work is useful to better understand the hemodynamic response on a network/graph level.

      Strengths:

      The manuscript provides a pipeline that allows to detect changes in the vessel diameter as well as simultaneously allows to locate the neurons driven by stimulation.

      The resulting data could provide interesting insights into the graph level mechanisms of regulating activity dependent blood flow.

      Weaknesses:

      (1) The manuscript contains (new) wrong statements and (still) wrong mathematical formulas.

      The symbols in these formulas have been updated to disambiguate them, and the accompanying statements have been adjusted for clarity.

      (2) The manuscript does not compare results to existing pipelines for vasculature segmentation (open-source or commercial). Comparing performance of the pipeline to a random forest classifier (ilastik) on images that are not preprocessed (i.e. corrected for background etc.) seems not a particularly useful comparison.

      We’ve now included comparisons to Imaris (commercial software) for segmentation and VesselVio (open-source software) for graph extraction.

      For the ilastik comparison, the images were preprocessed prior to ilastik segmentation, specifically by doing intensity normalization.

      Example segmentations utilizing Imaris have now been included. Imaris leaves gaps and discontinuities in the segmentation masks, as shown in Supplementary Figure 10. The Imaris segmentation masks also tend to be more circular in cross-section despite irregularities on the surface of the vessels observable in the raw data and identified in manual segmentation. This approach also requires days to months to generate per image stack.

      “Comparison with commercial and open-source vascular analysis pipelines

      To compare our results with those achievable on these data with other pipelines for segmentation and graph network extraction, we compared segmentation results qualitatively with Imaris version 9.2.1 (Bitplane) and vascular graph extraction with VesselVio [1]. For the Imaris comparison, three small volumes were annotated by hand to label vessels. Example slices of the segmentation results are shown in Supplementary Figure 10. Imaris tended to either over- or under-segment vessels, disregard fine details of the vascular boundaries, and produce jagged edges in the vascular segmentation masks. In addition to these issues with segmentation mask quality, manual segmentation of a single volume took days for a rater to annotate. To compare to VesselVio, binary segmentation masks (one before and one after photostimulation) generated with our deep learning models were loaded into VesselVio for graph extraction, as VesselVio does not have its own method for generating segmentation masks. This also facilitates a direct comparison of the benefits of our graph extraction pipeline to VesselVio. Visualizations of the two graphs are shown in Supplementary Figure 11. VesselVio produced many hairs at both time points, and the total number of segments varied considerably between the two sequential stacks: while the baseline scan resulted in 546 vessel segments, the second scan had 642 vessel segments. These discrepancies are difficult to resolve in post-processing and preclude a direct comparison of individual vessel segments across time. As the segmentation masks we used in graph extraction derive from the union of multiple time points, we could better trace the vasculature and identify more connections in our extracted graph.
      Furthermore, VesselVio relies on the distance transform of the user-supplied segmentation mask to estimate vascular radii; consequently, these estimates are highly susceptible to variations in the input segmentation masks. We repeatedly saw slight variations between boundary placements of all of the models we utilized (ilastik, UNet, and UNETR) and those produced by raters. Our pipeline mitigates this segmentation method bias by using intensity gradient-based boundary detection from centerlines in the image (as opposed to using the distance transform of the segmentation mask, as in VesselVio).”
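
      To illustrate the general idea of gradient-based boundary detection (as opposed to reading a radius off a segmentation mask), here is a deliberately simplified one-dimensional sketch. It is not the pipeline's implementation: the function name, the single-ray setup, and the sample values are all hypothetical. It assumes intensities are sampled at equal spacing along one ray that starts on the vessel centerline and moves outward, and it places the boundary at the steepest intensity drop.

```python
def radius_from_profile(profile, spacing_um=1.0):
    """Estimate a radius as the distance to the steepest intensity drop.

    profile[0] is the intensity on the centerline; later entries are
    sampled at increasing distance from it, spacing_um apart.
    """
    if len(profile) < 2:
        raise ValueError("need at least two samples along the ray")
    # Intensity decrease between each pair of neighbouring samples.
    drops = [profile[i] - profile[i + 1] for i in range(len(profile) - 1)]
    # Index of the largest decrease (first one if there are ties).
    steepest = max(range(len(drops)), key=lambda i: drops[i])
    # Place the boundary halfway between the two samples it separates.
    return (steepest + 0.5) * spacing_um


# Hypothetical intensity profile: bright lumen, then dark background.
example_profile = [100, 98, 95, 40, 10, 8]
print(radius_from_profile(example_profile))
```

      Because the estimate is driven by the image intensities along the ray, a small shift in where a segmentation model draws the mask edge would not change it, which is the property the quoted passage emphasizes.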

      (3) The manuscript does not clearly visualize performance of the segmentation pipeline (e.g. via 2d sections, highlighting also errors etc.). Thus, it is unclear how good the pipeline is, under what conditions it fails or what kind of errors to expect.

      Following the reviewer’s comment, 2D slices have been added to Supplementary Figure 4.

      (4) The pipeline is not fully open-source due to use of matlab. Also, the pipeline code was not made available during review contrary to the authors claims (the provided link did not lead to a repository). Thus, the utility of the pipeline was difficult to judge.

      All code has been uploaded to GitHub and is available at the following location: https://github.com/AICONSlab/novas3d

      The Matlab code for skeletonization is better at preserving centerline integrity during the pruning of hairs from centerlines than the currently available open-source methods.

      - Generalizability: The authors addressed the point of generalizability by applying the pipeline to other data sets. This demonstrates that their pipeline can be applied to other data sets and makes it more useful. However, the visualizations make it hard to see the performance of the pipeline, where the pipeline fails, etc. The 3D visualizations are not particularly helpful in this respect. In addition, the Dice measure seems quite low, indicating roughly 20-40% of voxels do not overlap between inferred and ground truth. I did not notice this high discrepancy earlier. A thorough discussion of the errors appearing in the segmentation pipeline would be necessary in my view to better assess the quality of the pipeline.

      2D slices from the additional datasets have been added to Supplementary Figure 13 to aid in visualizing the models’ ability to generalize to other datasets.

      The Dice range we report (0.7-0.8) is good when compared to those (0.56-0.86) reported for 3D segmentations of large datasets in microscopy [2], [3], [4], [5], [6]. Furthermore, we had two additional raters segment three images from the original training set. We found that the raters had a mean inter-class correlation of 0.73 [7]. Our model outperformed this Dice score on unseen data: Dice scores from our generalizability tests on C57 mice and Fischer rats were on par with or higher than this baseline.
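
      For readers less familiar with the metric, the Dice score discussed here is twice the overlap between two binary masks divided by the sum of their sizes. A minimal sketch, assuming the masks are flattened into equal-length sequences of 0/1 values (the function name and the toy masks are hypothetical):

```python
def dice_score(predicted, reference):
    """Dice coefficient between two flattened binary masks."""
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled as vessel in both masks.
    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    # Total vessel voxels in each mask, summed.
    total = sum(1 for p in predicted if p) + sum(1 for r in reference if r)
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * overlap / total


# Toy masks: 4 vessel voxels each, 3 of them shared.
predicted_mask = [1, 1, 1, 0, 0, 1]
reference_mask = [1, 1, 0, 0, 1, 1]
print(dice_score(predicted_mask, reference_mask))
```

      In the toy example a quarter of each mask's vessel voxels falls outside the other mask, which gives a score within the 0.7-0.8 range under discussion.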

      Reviewer #2 (Public review):

      The authors have addressed most of my concerns sufficiently. There are still a few serious concerns I have. Primarily, the temporal resolution of the technique still makes me dubious about nearly all of the biological results. It is good that the authors have added some vessel diameter time courses generated by their model. But I still maintain that data sampling every 42 seconds - or even 21 seconds - is problematic. First, the evidence for long vascular responses is lacking. The authors cite several papers:

      Alarcon-Martinez et al. 2020 show and explicitly state that their responses (stimulus-evoked) returned to baseline within 30 seconds. The responses to ischemia are long lasting but this is irrelevant to the current study using activated local neurons to drive vessel signals.

      Mester et al. 2019 show responses that all seem to return to baseline by around 50 seconds post-stimulus.

      In Mester et al. 2019, diffuse stimulations with blue light showed a return to baseline around 50 seconds post-stimulus (cf. Figures 1E, 2C, 2D). However, focal stimulations, where the stimulation light is raster scanned over a small region focused in the field of view, show longer-lasting responses (cf. Figure 4) that have not returned to baseline by 70 seconds post-stimulus [8]. Alarcon-Martinez et al. do report that their responses return to baseline within 30 seconds; however, their physiological stimulation may lead to different neuronal and vessel response kinetics than those elicited by the optogenetic stimulations in the current work.

      O'Herron et al. 2022 and Hartmann et al. 2021 use opsins expressed in vessel walls (not neurons as in the current study) and directly constrict vessels with light. So this is unrelated to neuronal activity-induced vascular signals in the current study.

      We agree that optogenetic activation of vessel-associated cells is distinct from optogenetic activation of neurons, but we do expect the effects of such perturbations on the vasculature to have some commonalities.

      There are other papers including Vazquez et al 2014 (PMID: 23761666) and Uhlirova et al 2016 (PMID: 27244241) and many others showing optogenetically-evoked neural activity drives vascular responses that return back to baseline within 30 seconds. The stimulation time and the cell types labeled may be different across these studies which can make a difference. But vascular responses lasting 300 seconds or more after a stimulus of a few seconds are just not common in the literature and so are very suspect - likely at least in part due to the limitations of the algorithm.

      Vazquez et al. 2014 used diffuse photostimulation with a fiberoptic probe, similar to the diffuse stimulation in Mester et al. 2019, as opposed to the raster-scanned focal stimulation used in this study and in Mester et al. 2019, where focal photostimulation elicited vascular responses lasting longer than a minute. Uhlirova et al. 2016 used photostimulation powers between 0.7 and 2.8 mW, likely lower than our 4.3 mW/mm2 photostimulation. Further, even with focal photostimulation, we do see a dependence of the duration of the vascular responses on light intensity. Indeed, in Supplementary Figure 2, 1.1 mW/mm2 photostimulation leads to briefer dilations/constrictions than does 4.3 mW/mm2; the 1.1 mW/mm2 responses are in line, duration-wise, with those in Uhlirova et al. 2016.

      Critically, as per Supplementary Figure 2, the analysis of the experimental recordings acquired at 3-second temporal resolution did likewise show responses in many vessels lasting for tens of seconds and even hundreds of seconds in some vessels.

      Another major issue is that the time courses provided show that the same vessel constricts at certain points and dilates later. So where in the time course the data is sampled will have a major effect on the direction and amplitude of the vascular response. In fact, I could not find how the "response" window is calculated. Is it from the first volume collected after the stimulation - or an average of some number of volumes? But clearly down-sampling the provided data to 42 or even 21 second sampling will lead to problems. If the major benefit to the field is the full volume over large regions that the model can capture and describe, there needs to be a better way to capture the vessel diameter in a meaningful way.

      In the main experiment (i.e., excluding the additional experiments presented in Supplementary Figure 2, which were collected over a limited FOV at 3 s per stack), we collected one stack every 42 seconds. Acquisition of the first slice of the volume starts following the photostimulation, and the last slice finishes at 42 seconds. Each slice takes ~0.44 seconds to acquire. The data analysis pipeline (as demonstrated by Supplementary Figure 2) is not in any way limited to data acquired at this temporal resolution and, provided a reasonable signal-to-noise ratio (cf. Figure 5), is applicable, as is, to data acquired at much higher sampling rates.

      It still seems possible that if responses are bi-phasic, then depth dependencies of constrictors vs dilators may just be due to where in the response the data are being captured - maybe the constriction phase is captured in deeper planes of the volume and the dilation phase more superficially. This may also explain why nearly a third of vessels are not consistent across trials - if the direction the volume was acquired is different across trials, different phases of the response might be captured.

      Alternatively, like neuronal responses to physiological stimuli, the vascular responses elicited by increases in neuronal activity may themselves be variable in both space and time.

      I still have concerns about other aspects of the responses but these are less strong. Particularly, these bi-phasic responses are not something typically seen and I still maintain that constrictions are not common. The authors are right that some papers do show constriction. Leaving out the direct optogenetic constriction of vessels (O'Herron 2022 & Hartmann 2021), the Alarcon-Martinez et al. 2020 paper and others such as Gonzales et al 2020 (PMID: 33051294) show different capillary branches dilating and constricting. However, these are typically found either with spontaneous fluctuations or due to highly localized application of vasoactive compounds. I am not familiar with data showing activation of a large region of tissue - as in the current study - coupled with vessel constrictions in the same region. But as the authors point out, typically only a few vessels at a time are monitored so it is possible - even if this reviewer thinks it unlikely - that this effect is real and just hasn't been seen.

      Uhlirova et al. 2016 (PMID: 27244241) observed biphasic responses in the same vessel with optogenetic stimulation in anesthetized and unanesthetized animals (cf Fig 1b and Fig 2, and section “OG stimulation of INs reproduces the biphasic arteriolar response”). Devor et al. (2007) and Lindvere et al. (2013) also reported on constrictions and dilations being elicited by sensory stimuli.

      I also have concerns about the spatial resolution of the data. It looks like the data in Figure 7 and Supplementary Figure 7 have a resolution of about 1 micron/pixel. It isn't stated so I may be wrong. But detecting changes of less than 1 micron, especially given the noise of an in vivo prep (brain movement and so on), might just be noise in the model. This could also explain constrictions as just spurious outputs in the model's diameter estimation. The high variability in adjacent vessel segments seen in Figure 6C could also be explained the same way, since these also seem biologically and even physically unlikely.

      Thank you for your comment. To address this important issue, we performed an additional validation experiment: we placed a special order of fluorescent beads with a known diameter of 7.32 ± 0.27 um, imaged them following our imaging protocol, and then used our pipeline to estimate their diameter. Our analysis converged on the manufacturer-specified diameter, estimating it to be 7.34 ± 0.32 um. The manuscript has been updated to detail this experiment, as below:

      Methods section insert

      “Second, our boundary detection algorithm was used to estimate the diameters of fluorescent beads of a known size imaged under similar acquisition parameters. Polystyrene microspheres labelled with Flash Red (Bangs Laboratories, Inc., CAT# FSFR007) with a nominal diameter of 7.32 um and a specified range of 7.32 ± 0.27 um, as determined by the manufacturer using a Coulter counter, were imaged on the same multiphoton fluorescence microscope set-up used in the experiment (identical light path, resonant scanner, objective, detector, excitation wavelength and nominal lateral and axial resolutions, with 5x averaging). The images of the beads had a higher SNR than our images of the vasculature, so Gaussian noise was added to the images to degrade the SNR to the same level as that of the blood vessels. The images of the beads were segmented with a threshold, centroids were calculated for individual spheres, and planes with a random normal vector were extracted from each bead and used to estimate the diameter of the beads. The same smoothing and PSF deconvolution steps were applied in this task. We then reported the mean and standard deviation of the distribution of the diameter estimates. A variety of planes were used to estimate the diameters.”

      Results Section Insert

      “Our boundary detection algorithm successfully estimated the radius of precisely specified fluorescent beads. The bead images had a signal-to-noise ratio of 6.79 ± 0.16 (about 35% higher than our in vivo images): to match their SNR to that of in vivo vessel data, following deconvolution, we added Gaussian noise with a standard deviation of 85 SU to the images, bringing the SNR down to 5.05 ± 0.15. The data processing pipeline was kept unaltered except for the bead segmentation, performed via image thresholding instead of our deep learning model (trained on vessel data). The bead boundary was computed following the same algorithm used on vessel data: i.e., by the average of the minimum intensity gradients computed along 36 radial spokes emanating from the centreline vertex in the orthogonal plane. To demonstrate an averaging-induced decrease in the uncertainty of the bead radius estimates on a scale that is finer than the nominal resolution of the imaging configuration, we tested four averaging levels in 289 beads. Three of these averaging levels were lower than that used on the vessels, and one matched that used on the vessels (36 spokes per orthogonal plane and a minimum of 10 orthogonal planes per vessel). As the amount of averaging increased, the uncertainty on the diameter of the beads decreased, and our estimate of the bead's diameter converged upon the manufacturer's Coulter counter-based specifications (7.32 ± 0.27um), as tabulated in Table 1.”
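
      As an illustration of the boundary estimate described in the insert above, the following minimal sketch locates the edge of a synthetic blurred disk by finding the steepest intensity drop along 36 radial spokes and averaging the results. The synthetic image, the blur width, the sampling step, and all variable names are assumptions made for illustration; this is not the authors' pipeline code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

# Synthetic stand-in for a bead cross-section: a disk of known radius,
# blurred to mimic the point spread function (all values are assumptions).
true_radius = 12.0
center = 50.0
yy, xx = np.mgrid[:101, :101]
disk = ((yy - center) ** 2 + (xx - center) ** 2 <= true_radius ** 2).astype(float)
image = gaussian_filter(disk, sigma=2.0)

# Sample intensity profiles along 36 equally spaced radial spokes.
radii = np.arange(0.0, 30.0, 0.25)
angles = np.linspace(0.0, 2.0 * np.pi, 36, endpoint=False)

edge_estimates = []
for theta in angles:
    rows = center + radii * np.sin(theta)
    cols = center + radii * np.cos(theta)
    profile = map_coordinates(image, [rows, cols], order=1)
    gradient = np.gradient(profile, radii)
    # The boundary is taken where intensity falls fastest (most negative gradient).
    edge_estimates.append(radii[np.argmin(gradient)])

edge_estimates = np.array(edge_estimates)
radius_estimate = edge_estimates.mean()
# Standard error of the mean over spokes: shrinks as more spokes are averaged.
standard_error = edge_estimates.std(ddof=1) / np.sqrt(edge_estimates.size)
print(f"estimated radius: {radius_estimate:.2f} px (true {true_radius} px)")
```

      Averaging over spokes (and, in the pipeline described above, over orthogonal planes) is what allows the estimate to be finer than the sampling step of any single profile.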

      Reviewer #1 (Recommendations for the authors):

      Comments to the authors replies to the reviews:

      - Supplementary Figure 13:

      As indicated before, the 3D images and their scale make it impossible to judge the quality of the outputs.

      As mentioned above, 2D slices have been added to Supplementary Figure 13.

      - Supplementary Table 3:

      There is a significant increase in the Hausdorff and Mean Surface Distance measures for the new data. Why?

      A single vessel being missed by either the rater or the model would significantly affect the Hausdorff distance (HD) and by extension Mean Surface Distance: this is particularly pertinent in the LSFM image with its much larger FOV and thus a potential for much larger max distances to result from missed vessels in the prediction or ground truth data. Large Hausdorff distances may indicate a vessel was missed in either the ground truth or the segmentation mask.
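
      The sensitivity of the Hausdorff distance to a single missed vessel can be seen in a toy example. The sketch below uses hypothetical point sets (two parallel straight "vessels" 80 units apart) and SciPy's `directed_hausdorff`; the geometry and variable names are assumptions, not data from the study.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

x = np.arange(0.0, 101.0)
zeros = np.zeros_like(x)
# Ground truth: two parallel vessels 80 units apart (hypothetical geometry).
vessel_1 = np.column_stack([x, zeros, zeros])
vessel_2 = np.column_stack([x, zeros + 80.0, zeros])
ground_truth = np.vstack([vessel_1, vessel_2])

# Prediction A: both vessels found, each offset by 1 unit.
offset = np.array([0.0, 1.0, 0.0])
pred_complete = ground_truth + offset
# Prediction B: same offset, but the second vessel is missed entirely.
pred_missed = vessel_1 + offset

hd_complete = hausdorff(ground_truth, pred_complete)
hd_missed = hausdorff(ground_truth, pred_missed)
print(hd_complete, hd_missed)
```

      In this toy case the distance jumps from the 1-unit boundary offset to roughly the spacing between the two vessels, which is why a larger FOV allows larger worst-case values.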

      Of note, a different rater annotated these additional datasets from the raters who labeled the ground truth data. There is high variability in boundary placement between raters. In a test where three raters segmented the same three images from the original dataset, we computed an ICC of 0.73 across their segmentations. Our model's Dice scores on predictions in out-of-distribution datasets were on par with the inter-rater ICC on the Thy1ChR2 2PFM data.

      - Supplementary Figure 2: The authors provide useful data on the time responses. However, looking at those figures, it is puzzling why certain vessels were selected as responding as there seems almost no change after stimulation. In addition, some of the responses seem to actually start several tens of seconds before the actual stimulus (particularly in A).

      Only some traces in C and D (dark blue) seem to be actually responding vessels.

      This is not discussed and unclear.

      Supplementary Figure 2 displays the time courses of vessel calibre for all vessels in the FOV, not just those deemed responders.

      The aforementioned effects are due to the loess smoothing filter having been applied to the time courses for the preliminary response, which has been rectified in the updated figures. In particular, Supplementary Figure 2 has been updated with separate loess smoothing before and after photostimulation. The (pre-stimulation) effect is gone once the loess smoothing has been separated.
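
      The leakage of a post-stimulus change into the pre-stimulus period can be reproduced with any symmetric smoother. The sketch below uses a simple moving average as a stand-in for loess (an assumption made to keep the example dependency-free) on a hypothetical step-like trace, and compares smoothing the whole trace with smoothing the pre- and post-stimulus segments separately.

```python
import numpy as np

def moving_average(x, window=11):
    """Centered moving average with edge padding (stand-in for loess)."""
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

# Hypothetical vessel-calibre trace: flat baseline, step change at stimulus onset.
stim_index = 50
trace = np.concatenate([np.zeros(stim_index), np.ones(50)])

# Smoothing across the stimulus boundary leaks the response backwards in time.
joint = moving_average(trace)

# Smoothing each segment separately keeps the baseline untouched.
separate = np.concatenate([
    moving_average(trace[:stim_index]),
    moving_average(trace[stim_index:]),
])

print("joint, 2 samples before stimulus:", joint[stim_index - 2])
print("separate, 2 samples before stimulus:", separate[stim_index - 2])
```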

      - R Point 7: As indicated before and in agreement with the alternative reviewer, the quality of the results in 3d is difficult to judge. No 2d sections that compare 'ground truth' with inferred results are shown in the current manuscript which would enable a much better judgment. The provided video is still 3d and not a video going through 2d slices. Also, in the video the overlap of vasculature and raw data seems to be very good and near 100%, why is the dice measure reported earlier so low ? Is this a particularly good example ?

      Some examples, indicating where the pipeline fails (and why) would be helpful to see, to judge its performance better (ideally in 2d slices).

      As discussed in the public comments, the 2D slices are now included in Suppl. Fig. 4 and Suppl. Fig. 13 to facilitate visual assessment. The vessels are long and thin, so slight dilations or constrictions affect the Dice scores without being easy to see.

      - Author response images 6 and 7. From the presented data the constrictions measured in the smaller vessels may be a result (at least partly) of noise. This seems to be particularly the case in Author response image 7 left top and bottom for example. It would be helpful to show the actual estimates of the vessels radii overlaid in the (raw) images. In some of the pictures the noise level seems to reach higher values than the 10-20% of noise used in the tests by the authors in the revision.

      The vessel radii are estimated as averages across all vertices of the individual vessels: it is thus not possible to overlay them meaningfully in 2D slices: in Figure 2B, we do show a rendering of sample vessel-wise radii estimates.

      - "We tested the centerline detection in Python, scipy (1.9.3) and Matlab. We found that the Matlab implementation performed better due to its inclusion of a branch length parameter for the identification of terminal branches, which greatly reduced the number of false branches; the Python implementation does not include this feature (in any version) and its output had many more such "hair" artifacts. Clearmap skeletonization uses an algorithm by Palagyi & Kuba(1999) to thin segmentation masks, which does not include hair removal. Vesselvio uses a parallelized version of the scipy implementation of Lee et al. (1994) algorithm which does not do hair removal based on a terminal branch length filter; instead, Vesselvio performs a threshold-based hair removal that is frequently overly aggressive (it removes true positive vessel branches), as highlighted by the authors."

      This statement is wrong. The removal of small branches in skeletons is algorithmically independent of the skeletonization algorithm itself. The authors cite a reference concerned with the algorithm they are currently employing for the skeletonization. Careful assessment of that reference shows that this algorithm removes small length branches after skeletonization is performed. This feature is available in open-source packages as well, or could be easily implemented.

      We appreciate that skeletonization is distinct from hair removal and have reworded this paragraph for clarity. We are currently working with SciPy developers to implement hair removal in their image processing pipeline so as to render our pipeline fully open-source.

      The removal of hairs after skeletonization with length based thresholding leads to the possibility of removing parts of centerlines in the main part of vessels after branch points with hairs. The Matlab implementation does not do this and leaves the main branches intact.

      This text has been updated to:

      “Hair” segments shorter than 20 μm and terminal on one end were iteratively removed, starting with the shortest hairs and merging the longest hairs at junctions with 2 terminal branches with the main vessel branch to reduce false positive vascular branches and minimize the amount of centerlines removed. This iterative hair removal functionality of the skeletonization algorithm is currently unavailable in Python, but is available in Matlab [9].
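
      One way such an iterative, shortest-first pruning could be sketched on a vessel graph is shown below. This is a simplified illustration using NetworkX, not the Matlab implementation or the authors' code: the graph, the `length` edge attribute, and the `prune_hairs` helper are hypothetical, and edge cases such as loops are only minimally handled.

```python
import networkx as nx

def prune_hairs(graph, min_length=20.0):
    """Iteratively remove short terminal branches, shortest first.

    After each removal, a junction left with two branches is merged into a
    single segment, so the longest hair at a junction joins the main branch.
    """
    g = graph.copy()
    while True:
        # A hair is terminal on exactly one end and shorter than the threshold.
        hairs = [
            (data["length"], u, v)
            for u, v, data in g.edges(data=True)
            if (g.degree(u) == 1) != (g.degree(v) == 1)
            and data["length"] < min_length
        ]
        if not hairs:
            return g
        _, u, v = min(hairs, key=lambda h: h[0])
        tip, junction = (u, v) if g.degree(u) == 1 else (v, u)
        g.remove_node(tip)
        if g.degree(junction) == 2:
            a, b = list(g.neighbors(junction))
            if not g.has_edge(a, b):  # skip merging if it would close a loop
                merged = g[junction][a]["length"] + g[junction][b]["length"]
                g.remove_node(junction)
                g.add_edge(a, b, length=merged)

# Hypothetical junction J with a main branch and two short terminal branches.
vessels = nx.Graph()
vessels.add_edge("A", "J", length=60.0)  # main branch
vessels.add_edge("J", "B", length=10.0)  # true continuation of the vessel
vessels.add_edge("J", "h", length=5.0)   # spurious hair
pruned = prune_hairs(vessels)
print(list(pruned.edges(data="length")))
```

      In this toy graph a plain length threshold would delete both short branches and shorten the centerline, whereas the shortest-first procedure removes only the 5 μm hair and merges the remaining branch into the main vessel.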

      - "On the reviewer's comment, we did try inputting normalized images into Ilastik, but this did not improve its results." This is surprising. Reasonable standard preprocessing (e.g. background removal, equalization, and vessel enhancement) would probably restore most of ilastik's performance in the indicated panel.

      While the improvement may be present in a particular set of images, the generalizability of such improvement to other patches is often poor in our experience, as reflected by the aforementioned results and the widespread uptake of DL approaches to image segmentation. The in vivo datasets also contain artifacts arising from, e.g., bleeding into the FOV, to which ilastik is highly sensitive. This is an example of noise that is not easily removed by standard preprocessing.

      - "Typical pre-processing/standard computer vision techniques with parameter tuning do not generalize on out-of-distribution data with different image characteristics, motivating the shift to DL-based approaches."

      I disagree with this statement. DL approaches can typically generalize when trained with a sufficient amount of diverse data. However, DL approaches can also fail with new out-of-distribution data. In that situation they can only be 'rescued' via new, time-intensive data generation and retraining. Simple standard image pre-processing steps (e.g. to remove background or boost vessel structures) have well-defined parameters that can be easily adapted to new out-of-distribution data, as clear interpretations are available. The time to adapt those parameters is typically much smaller than that needed to retrain DL frameworks.

      We find that standard image processing approaches with parameter tuning work robustly only if fine-tuned on individual images; i.e., the fine-tuning does not generalize across datasets. This approach thus does not scale to experiments that yield large images or run at high throughput. While DL models may not generalize to out-of-distribution data, fine-tuning DL models with a small subset of labels generally produces models that are superior to parameter tuning and can be applied to entire studies. Moreover, DL fine-tuning is typically an efficient process because very limited labelling and training time is required.

      - It is still unclear how the authors' pipeline performs compared with other (open-source or commercially) available pipelines. As indicated before, comparing to ilastik, particularly when feeding it non-preprocessed data, does not seem to be a particularly high bar.

      This question has also been raised by the other reviewer who asked to compare to commercially available pipelines.

      This question was not answered by the authors, and instead the authors reply by claiming to provide an open-source pipeline. In fact, the use of Matlab in their pipeline does not make it fully open-source either. Moreover, as mentioned before, open-source pipelines for comparison do exist.

      As discussed above, the manuscript now includes comparisons to Imaris for segmentation and Vesselvio for graph extraction. The pipeline is on GitHub.

      -"We agree with the review that this question is interesting; however, it is not addressable using present data: activated neuronal firing will have effects on their postsynaptic neighbors, yet we have no means of measuring the spread of activation using the current experimental model."

      Distances to the closest neuron in the manuscript are measured without checking if it's active. Thus, distances to the first set of n neurons could be measured in the same way, ignoring activation effects.

      Shorter distances to an entire ensemble of neurons would still be (more) informative of metabolic demands.

      This could indeed be done within the existing framework. The connected-components-3d package can be used to extract individual neurons in the FOV from the neuron segmentation mask. The distance from each neuron to each point on the vessel centerlines could then be calculated.
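
      A minimal sketch of this idea follows. It swaps in `scipy.ndimage.label` for connected-components-3d to keep dependencies small, represents each neuron by its centroid (a simplification), and uses a toy mask and centerline; all array contents and variable names are assumptions.

```python
import numpy as np
from scipy import ndimage
from scipy.spatial import cKDTree

# Toy neuron segmentation mask with two separate neurons.
neuron_mask = np.zeros((20, 20, 20), dtype=bool)
neuron_mask[2:4, 2:4, 2:4] = True
neuron_mask[15:17, 15:17, 15:17] = True

# Label connected components: one integer label per neuron.
labels, n_neurons = ndimage.label(neuron_mask)
centroids = np.array(
    ndimage.center_of_mass(neuron_mask.astype(float), labels, range(1, n_neurons + 1))
)

# Toy vessel centerline: a straight line of (z, y, x) points.
centerline = np.array([(10.0, 10.0, float(x)) for x in range(20)])

# Distance from every centerline point to its k nearest neurons.
k = 2
tree = cKDTree(centroids)
distances, neuron_ids = tree.query(centerline, k=k)
mean_ensemble_distance = distances.mean(axis=1)
print(mean_ensemble_distance[:3])
```

      Taking the mean over the k nearest neurons gives one ensemble distance per centerline point, in the spirit of the reviewer's suggestion.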

      - model architecture:

      It is unclear from the description if any positional encoding was used for the image patches.

      It is unclear if the architecture / pipeline can handle any volume size or is trained on fixed volume shapes. In the latter case, how is the pipeline applied?

      The model includes positional encoding, as described in Hatamizadeh et al. 2021.

      The model can be applied to images of any size, as demonstrated on larger images in Supplementary Figure 9 and on smaller images in Supplementary Figure 2. The pipeline is applied in the same way. It will read in the size of an input image and output an image of the same size.

      - Transformer models often show better results when using a learning rate scheduler that adjusts the learning rate (typically with up and down ramps). Did the authors test such approaches?

      We did not use a learning rate scheduler, as we found we were getting good results without using one.

      - formula (4): The 95% percentile of two numbers is the max, and thus (5) is certainly not what the HD95 metric is. The formula is simply wrong as displayed.

      Thank you. The formula has been updated.
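
      For reference, one commonly used definition of the 95th-percentile Hausdorff distance is given below. The updated formula from the manuscript is not reproduced in this response, so the notation here is an assumption: X and Y denote the sets of surface points of the two segmentations and P95 denotes the 95th percentile over the indicated set. Some implementations define the percentile slightly differently.

```latex
d_{95}(X \rightarrow Y) = \operatorname{P}_{95}\Big\{ \min_{y \in Y} \lVert x - y \rVert \;:\; x \in X \Big\},
\qquad
\mathrm{HD}_{95}(X, Y) = \max\big( d_{95}(X \rightarrow Y),\; d_{95}(Y \rightarrow X) \big)
```

      Here the percentile is taken over the full set of point-to-surface distances in each direction, rather than over the two directed distances.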

      - formula (5): formula 5 is certainly wrong: n_X, n_y are either integer numbers as indicated by the sum indices or sets when used in the distances, but can't be both at the same time.

      Thank you for your comment. The Formula has been updated.

      - The statement:

      "this functionality of the skeletonization algorithm is currently unavailable in any python implementation, but is available in Matlab [56]."

      is not correct (see reply above)

      Please see the response above. This text has been updated to:

      “Hair” segments shorter than 20 μm and terminal on one end were iteratively removed, starting with the shortest hairs and merging the longest hairs at junctions with 2 terminal branches with the main vessel branch to reduce false positive vascular branches and minimize the amount of centerlines removed. This iterative hair removal functionality of the skeletonization algorithm is currently unavailable in Python, but is available in Matlab [9].

      - the centerline extraction is performed after taking the union of smoothed masks. The union operation can induce novel 'irregular' boundaries that degrade skeletonization performance. I would expect to apply smoothing after the union?

      Indeed the images were smoothed via dilation after taking the union, as described in the previous set of responses to private comments.

      - "The radius estimate defined the size of the Gaussian kernel that was convolved with the image to smooth the vessel: smaller vessels were thus convolved with narrower kernels."

      It is unclear which images were filtered.

      We have updated this text for clarity:

      The radius estimate defined the size of the Gaussian kernel that was convolved with the 2D image slice to smooth the vessel: smaller vessels were thus convolved with narrower kernels.

      - Was deconvolution on the raw images applied or after Gaussian filtering ?

      The deconvolution was applied before Gaussian filtering.

      - ",we extracted image intensities in the orthogonal plane from the deconvolved raw registered image. A 2D Gaussian kernel with sigma equal to 80% of the estimated vessel-wise radius was used to low-pass filter the extracted orthogonal plane image and find the local signal intensity maximum searching, in 2D, from the center of the image to the radius of 10 pixels from the center."

      Would it not be better to filter the 3d image before extracting a 2d plane and filter then ?

      That could be done, but would incur a significant computational speed penalty. 2D convolutions are faster, and produced excellent accuracy when estimating radii in our bead experiment.

      What algorithm was used to obtain the 2d images.

      The 2d images were obtained using scipy.ndimage.map_coordinates.
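
      A minimal sketch of how an orthogonal plane can be sampled with `scipy.ndimage.map_coordinates` and then low-pass filtered with a Gaussian whose sigma is 80% of the radius estimate is given below. The `orthogonal_plane` helper, the synthetic cylinder, and the interpolation order are assumptions for illustration and may differ from the authors' implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def orthogonal_plane(volume, center, tangent, half_width=10):
    """Sample the plane through `center` that is perpendicular to `tangent`."""
    t = np.array(tangent, dtype=float)
    t /= np.linalg.norm(t)
    # Build two unit vectors spanning the plane orthogonal to the tangent.
    helper = np.zeros(3)
    helper[np.argmin(np.abs(t))] = 1.0
    u = np.cross(t, helper)
    u /= np.linalg.norm(u)
    v = np.cross(t, u)
    offsets = np.arange(-half_width, half_width + 1)
    a, b = np.meshgrid(offsets, offsets, indexing="ij")
    coords = (
        np.array(center, dtype=float)[:, None, None]
        + u[:, None, None] * a
        + v[:, None, None] * b
    )
    # Linear interpolation; points outside the volume are filled with 0.
    return map_coordinates(volume, coords, order=1, mode="constant", cval=0.0)

# Synthetic vessel: a cylinder of radius 5 voxels running along the z axis.
radius = 5.0
zz, yy, xx = np.mgrid[:21, :31, :31]
volume = ((yy - 15) ** 2 + (xx - 15) ** 2 <= radius ** 2).astype(float)

plane = orthogonal_plane(volume, center=(10, 15, 15), tangent=(1, 0, 0))
# 2D low-pass filter with sigma equal to 80% of the radius estimate.
smoothed = gaussian_filter(plane, sigma=0.8 * radius)
print(plane.shape, smoothed.max())
```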

      - Figure 2: H is this the filtered image or the raw data ?

      Panel H is raw data.

      - It would be good to see a few examples of the raw data overlaid with the radial estimates to evaluate the approach (beyond the example in K).

      Additional examples are shown in Figure 5.

      - Figure 2 K: Why are boundary points greater than 2 standard deviations away from the mean excluded ?

      They are excluded to account for irregularities as vessels approach junctions [10], [11].
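
      The exclusion rule can be sketched as follows, assuming hypothetical boundary distances from 36 spokes with one irregular point; whether the population or sample standard deviation is used in the pipeline is not stated, and the population form is assumed here.

```python
import numpy as np

# Hypothetical boundary distances (in pixels) along 36 spokes,
# with one spoke running into a neighbouring branch near a junction.
boundary_radii = np.full(36, 5.0)
boundary_radii[7] = 15.0

mean = boundary_radii.mean()
sd = boundary_radii.std()  # population SD (an assumption)

# Keep only boundary points within 2 standard deviations of the mean.
keep = np.abs(boundary_radii - mean) <= 2.0 * sd
vertex_radius = boundary_radii[keep].mean()
print(keep.sum(), vertex_radius)
```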

      - Figure 2 L: what exactly is plotted here ? What are vertex wise changes, is that the difference between the minimum and maximum of all the detected radii for a single vertex? Why do some vessels (red) show high values consistently throughout the vessel ?

      Figure 2L displays change in the radius of vertices - in this FOV- following photostimulation in relation to baseline.

      - Assortativity: to calculate the assortativity, are radius changes binned in any form to account for the fact that otherwise, $e_{xy}$ and related measures will likely be based on single data points?

      Assortativity is not calculated from single data points. It can be calculated by either binning into categories or computing it on scalars i.e. average radius across a vessel segment:

      See here for info on calculating assortativity from binned categories (ie classifying a vessel as a constrictor, dilator or non-responder):

      https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.assortativity.attribute_assortativity_coefficient.html#networkx.algorithms.assortativity.attribute_assortativity_coefficient

      And see here for calculating assortativity from scalar values:

      https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.assortativity.numeric_assortativity_coefficient.html#networkx.algorithms.assortativity.numeric_assortativity_coefficient

      We calculated the assortativity using scalar values.

      In both cases, one uses all nodes and calculates the correlation between each node and its neighbours with an attribute that can be binned or is a scalar. Binning the value on a given node would not affect the number of nodes in a graph.
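
      Both variants can be tried on a toy graph with the two NetworkX functions linked above. In the sketch below, the graph, the `delta_radius` values, and the `response_class` labels are hypothetical, and a recent NetworkX release that accepts floating-point attributes in `numeric_assortativity_coefficient` is assumed. The manual cross-check computes the Pearson correlation of the attribute across the two ends of every edge.

```python
import networkx as nx
import numpy as np

# Toy vessel graph; node attribute = hypothetical radius change per segment.
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (1, 4)])
delta_radius = {0: 0.5, 1: 0.4, 2: 0.3, 3: -0.2, 4: -0.3, 5: -0.4}
nx.set_node_attributes(G, delta_radius, "delta_radius")

# Binned version of the same attribute (dilator vs. constrictor).
response_class = {
    n: ("dilator" if d > 0 else "constrictor") for n, d in delta_radius.items()
}
nx.set_node_attributes(G, response_class, "response_class")

r_scalar = nx.numeric_assortativity_coefficient(G, "delta_radius")
r_binned = nx.attribute_assortativity_coefficient(G, "response_class")

# Cross-check: Pearson correlation over edge ends, each edge in both directions.
pairs = [(delta_radius[u], delta_radius[v]) for u, v in G.edges()]
pairs += [(b, a) for a, b in pairs]
x, y = np.array(pairs).T
r_manual = np.corrcoef(x, y)[0, 1]
print(r_scalar, r_binned, r_manual)
```

      Every edge contributes to the coefficient in both cases, so binning changes the attribute values being correlated, not the number of nodes or edges entering the calculation.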

      - "Ilastik tended to over-segment vessels, i.e. the model returned numerous false positives, having a high recall (0.89 ± 0.19) but low precision (0.37 ± 0.33) (Figure 3, Supplementary Table 3)."

      As indicated before, and looking at Figure 4, over segmentation seems due to too high background. A suggested preprocessing step on the raw images to remove background could have avoided this.

      The images were normalized in preprocessing.

      - Figure 4: The 3d panels are not much easier to read in the revised version. As suggested by other reviewers, 2d sections indicating the differences and errors would be much more helpful to judge the pipelines quality more appropriately.

      As discussed above, 2D sections are now available in a supplementary figure.

      - Figure 3: What would be the Dice score (and other measures) between two ground truths extracted from annotations by two humans (assisted e.g. by ilastik)?

      Two additional human raters annotated images. We observed an ICC of 0.73 across a total of three raters on the three images.

      - Figure 5: The authors only provide the absolute value of SU for the sigma noise levels. This only has some meaning when compared to the mean or median SU of the images. In the text the maximal intensity of 1023 SU is mentioned, but what are those values in images with weaker / smaller vessels (as provided in the constriction examples in the revision)?

      I am unclear why this validation figure should be part of the main manuscript while generalization performance is left out.

      The manuscript has been updated with the mean SNR value of 5.05 ± 0.15 to provide context for the quality of our images.

      Bibliography

      (1) J. R. Bumgarner and R. J. Nelson, “Open-source analysis and visualization of segmented vasculature datasets with VesselVio,” Cell Rep. Methods, vol. 2, no. 4, Apr. 2022, doi: 10.1016/j.crmeth.2022.100189.

      (2) G. Tetteh et al., “DeepVesselNet: Vessel Segmentation, Centerline Prediction, and Bifurcation Detection in 3-D Angiographic Volumes,” Front. Neurosci., vol. 14, Dec. 2020, doi: 10.3389/fnins.2020.592352.

      (3) N. Holroyd, Z. Li, C. Walsh, E. Brown, R. Shipley, and S. Walker-Samuel, “tUbe net: a generalisable deep learning tool for 3D vessel segmentation,” Jul. 24, 2023, bioRxiv. doi: 10.1101/2023.07.24.550334.

      (4) W. Tahir et al., “Anatomical Modeling of Brain Vasculature in Two-Photon Microscopy by Generalizable Deep Learning,” BME Front., vol. 2020, p. 8620932, Dec. 2020, doi: 10.34133/2020/8620932.

      (5) R. Damseh, P. Delafontaine-Martel, P. Pouliot, F. Cheriet, and F. Lesage, “Laplacian Flow Dynamics on Geometric Graphs for Anatomical Modeling of Cerebrovascular Networks,” ArXiv191210003 Cs Eess Q-Bio, Dec. 2019, Accessed: Dec. 09, 2020. [Online]. Available: http://arxiv.org/abs/1912.10003

      (6) T. Jerman, F. Pernuš, B. Likar, and Ž. Špiclin, “Enhancement of Vascular Structures in 3D and 2D Angiographic Images,” IEEE Trans. Med. Imaging, vol. 35, no. 9, pp. 2107–2118, Sep. 2016, doi: 10.1109/TMI.2016.2550102.

      (7) T. B. Smith and N. Smith, “Agreement and reliability statistics for shapes,” PLOS ONE, vol. 13, no. 8, p. e0202087, Aug. 2018, doi: 10.1371/journal.pone.0202087.

      (8) J. R. Mester et al., “In vivo neurovascular response to focused photoactivation of Channelrhodopsin-2,” NeuroImage, vol. 192, pp. 135–144, May 2019, doi: 10.1016/j.neuroimage.2019.01.036.

      (9) T. C. Lee, R. L. Kashyap, and C. N. Chu, “Building Skeleton Models via 3-D Medial Surface Axis Thinning Algorithms,” CVGIP Graph. Models Image Process., vol. 56, no. 6, pp. 462–478, Nov. 1994, doi: 10.1006/cgip.1994.1042.

      (10) M. Y. Rennie et al., “Vessel tortuousity and reduced vascularization in the fetoplacental arterial tree after maternal exposure to polycyclic aromatic hydrocarbons,” Am. J. Physiol.-Heart Circ. Physiol., vol. 300, no. 2, pp. H675–H684, Feb. 2011, doi: 10.1152/ajpheart.00510.2010.

      (11) J. Steinman, M. M. Koletar, B. Stefanovic, and J. G. Sled, “3D morphological analysis of the mouse cerebral vasculature: Comparison of in vivo and ex vivo methods,” PLOS ONE, vol. 12, no. 10, p. e0186676, Oct. 2017, doi: 10.1371/journal.pone.0186676.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1:

      The authors explain that an action potential that reaches an axon terminal emits a small electrical field as it "annihilates". This happens even at chemical synapses, where there is no gap junction. The generated electrical field is simulated to show that it can affect a nearby, disconnected target membrane by tens of microvolts for tenths of a microsecond. Longer effects are simulated for target locations a few microns away.

      To simulate action potentials (APs), the paper does not use the standard Hodgkin-Huxley formalism because it fails to explain AP collision. Instead it uses the Tasaki and Matsumoto (TM) model, which is simplified to model APs with only three parameters and as a membrane transition between two states, resting and excited. The authors expand the strictly binary, discrete TM method to a Relaxing Tasaki Model (RTM) that models the relaxation of the membrane potential after an AP. They find that the membrane leak can be neglected in determining AP propagation and that the capacitive currents dominate the process.

      The strength of the work is that the authors identified an important interaction between neurons that is neglected by the standard models. A weakness of the proposed approach is the assumptions that it makes. For instance, the external medium is modeled as a homogeneous conductive medium, which may be further explored to properly account for biological processes. To the authors’ credit, the external medium can vary widely and could be left out of the general model, to be modeled only in specific instances.

      The authors provide convincing evidence by performing experiments to record action potential propagation and collision properties and then developing a theoretical framework to simulate effect of their annihilation on nearby membranes. They provide both experimental evidence and rigorous mathematical and computer simulation findings to support their claims. The work has a potential of explaining significant electrical interaction between nerve centers that are connected via a large number of parallel fibers.

      Comments on revisions:

      The authors responded to all of my previous concerns and significantly improved the manuscript.

      We thank the reviewer for his comments and are pleased that we were able to adequately address all of his previous concerns. As a small comment to the remark of the reviewer “potential of explaining ... interaction ... via a large number of parallel fibers” we would like to add: The ephaptic coupling is prominent when APs annihilate at axon terminals, as we illustrate in Figure 4 and 5. Across parallel fibers, the impact of propagating APs is much lower but still may result in synchronization of APs.

      Reviewer 2:

      In this study, the authors measured extracellular electrical features of colliding APs travelling in different directions down an isolated earthworm axon. They then used these features to build a model of the potential ephaptic effects of AP annihilation, i.e. the electrical signals produced by colliding/annihilating APs that may influence neighbouring tissue. The model was then applied to some different hypothetical scenarios involving synaptic connections. In a revised version of the manuscript, it was also applied, with success, to published experimental data on the cerebellar basket cell-to-Purkinje cell pinceau connection. The conclusion is that an annihilating AP at a presynaptic terminal can ephaptically influence the voltage of a postsynaptic cell (this is, presumably, the ’electrical coupling between neurons’ of the title), and that the nature of this influence depends on the physical configuration of the synapse.

      As an experimental neuroscientist who has never used computational approaches, I am unable to comment on the rigour of the analytical approaches that form the bulk of this paper. The experimental approaches appear very well carried out, and the data showing equal conduction velocity of anti- and orthodromically propagating APs in every preparation is now convincing.

      The conclusions drawn from the synaptic modelling have been considerably strengthened by the new Figure 5. Here, the authors’ model - including AP annihilation at a synaptic terminal - is used to predict the amplitude and direction of experimentally observed effects at the cerebellar basket cell-to-Purkinje cell synapse (Blot & Barbour 2014). One particular form of the model (RTM with tau=0.5ms and realistic non-excitability of the terminal) matches the experimental data extremely well. This is a much more convincing demonstration that the authors’ model of ephaptic effects can quantitatively explain key features of experimental data pertaining to synaptic function. As such, the implications for the relevance of ephaptic coupling at different synaptic contacts may be widespread and important.

      However, it appears that all of the models in the new Fig5 involve annihilating APs, yet only one fits the data closely. A key question, which should be addressed if at all possible, is what happens to the predictive power of the best-fitting model in Fig5 if the annihilation, and only the annihilation, is removed? In other words, can the authors show that it is specifically the ephaptic effects of AP annihilation, rather than other ephaptic effects of, say AP waveform/amplitude/propagation, that explain the synaptic effects measured in Blot & Barbour (2014)? This would appear to be a necessary demonstration to fully support the claims of the title.

      Reviewer 2 (Recommendations for the authors):

      Can you clarify whether all models shown in Fig5 involve an annihilating AP? Is it possible to plot the predicted effects of the most successful model (RTM 0.5ms in B) with *only* the annihilation selectively removed?

      We are grateful for the reviewer’s comments and the specific suggestion for improvement (’...can the authors show that it is specifically the ephaptic effects of AP annihilation, rather than other ephaptic effects...’). For illustrating the importance of annihilation, we added the results of our calculation when no annihilation occurs, i.e. for propagating APs in the source neuron (Figure 5A) and we modified the geometry of the source neuron in Figure 5B such that only the annihilation takes place. Together with the source neuron with similar properties to the Basket cell (Figure 5C), we now show the effect of annihilation and the effect of Basket cell specific geometry and physiology. We added and edited in the main text the following 4 sentences:

      ll 271: In our two models (TM and RTM), the modulation of not terminating but propagating APs along the source axon on the AP rate of the target cell is minute (Figure 5A). Note that this geometry does not correspond to the Purkinje cell-Basket cell connectivity. For annihilating APs at the axon terminal, with excitable segments up to the very end, our models reveal a moderate modulation, and only about half of what was reported for the Purkinje cell by Blot and Barbour (2014). This illustrates the importance of AP annihilation for ephaptic coupling (Figure 5B). We added and edited the figure legend:

      Figure 5. ... (A) excluding the annihilation of an AP at the source neuron, i.e. a propagating AP, causes only minute modulation of the predicted AP rate in the target neuron. Note that this example does not represent the Basket cell terminal with annihilating APs. (B) annihilation of an AP at the terminal of the source neuron, with all segments being excitable in our calculation, causes moderate modulation. (C) source neuron with similar properties to the Basket cell, i.e. a bouton and last segments non-excitable (corresponding to 15 µm with no switch from resting state to excited state), causes inhibition and rebound very similar to that described by Blot and Barbour (2014).

      In the discussion, we extended one sentence to refer to Figure 5:

      ll 346: This may cause synchronization of APs and our proposed model also can be used to study the observed phenomena of synchronization due to ephaptic coupling, even in the case of zero discharge (see Figure 4A, and local impact on the target, integrated on timescales >1 ms in Figure 5).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We sincerely appreciate the time and effort you and the reviewers have invested in evaluating our work.

      We are grateful for the constructive criticism of the reviewers. Building on their feedback, we have made additions to the reviewed preprint. Specifically, we have added information to the supplementary materials to give additional context on the impact of the fixed experimental design on infants’ looking behavior. Further, we have adapted the text throughout the manuscript to incorporate a thorough discussion of the impact of the experimental design.

      We believe that these revisions and the inclusion of supplementary analyses provide a clearer understanding of our findings.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors observed a decline in autophagy and proteasome activity in the context of Milton knockdown. Through proteomic analysis, they identified an increase in the protein levels of eIF2β, subsequently pinpointing a novel interaction within eIF subunits where eIF2β contributes to the reduction of eIF2α phosphorylation levels. Furthermore, they demonstrated that overexpression of eIF2β suppresses autophagy and leads to diminished motor function. It was also shown that in a heterozygous mutant background of eIF2β, Milton knockdown could be rescued. This work represents a novel and significant contribution to the field, revealing for the first time that the loss of mitochondria from axons can lead to impaired autophagy function via eIF2β, potentially influencing the acceleration of aging. To further support the authors' claims, several improvements are necessary, particularly in the methods of quantification and the points that should be demonstrated quantitatively. It is crucial to investigate the correlation between aging and the proteins eIF2β and eIF2α.

      Thank you so much for your review and comments. We included analyses of protein levels of eIF2α, eIF2β, and eIF2γ at 7 days and 21 days (Figure 4D). The manuscript was revised as below:

      Lines 242-245 ‘As for the other subunits of eIF2 complex, proteome analysis did not detect a significant difference in the protein levels of eIF2α and eIF2γ between milton knockdown and control flies at 7 and 21 days (Figure 4D).’

      Reviewer #2 (Public Review):

      In the manuscript, the authors aimed to elucidate the molecular mechanism that explains neurodegeneration caused by the depletion of axonal mitochondria. In Drosophila, starting with siRNA depletion of Milton and Miro, the authors attempted to demonstrate that the depletion of axonal mitochondria induces the defect in autophagy. From proteome analyses, the authors hypothesized that autophagy is impacted by the abundance of eIF2β and the phosphorylation of eIF2α. The authors followed up the proteome analyses by testing the effects of eIF2β overexpression and depletion on autophagy. With the results from those experiments, the authors proposed a novel role of eIF2β in proteostasis that underlies neurodegeneration derived from the depletion of axonal mitochondria.

      The manuscript has several weaknesses. The reader should take extra care while reading this manuscript and when acknowledging the findings and the model in this manuscript.

      The defect in autophagy by the depletion of axonal mitochondria is one of the main claims in the paper. The authors should work more on describing their results of LC3-II/LC3-I ratio, as there are multiple ways to interpret the LC3 blotting for the autophagy assessment. Lysosomal defects result in the accumulation of LC3-II thus the LC3-II/LC3-I ratio gets higher. On the other hand, the defect in the early steps of autophagosome formation could result in a lower LC3-II/LC3-I ratio. From the results of the actual blotting, the LC3-I abundance is the source of the major difference for all conditions (Milton RNAi and eIF2β overexpression and depletion). In the text, the authors simply state the observation of their LC3 blotting. The manuscript lacks an explanation of how to evaluate the LC3-II/LC3-I ratio. Also, the manuscript lacks an elaboration on what the results of the LC3 blotting indicate about the state of autophagy by the depletion of axonal mitochondria.

      Thank you for pointing it out, and we apologize for an insufficient description of the result. We included quantitation of the levels of LC3-I and LC3-II in Figure 2A, 2D, 3D, 6B and 7B. As the reviewer pointed out, changes in the LC3-II/LC3-I ratio do not necessarily indicate autophagy defects. However, since p62 also accumulated (Figure 2B, 2E, 3E, 6C, 7C in the original manuscript), these results collectively suggest that autophagy is lowered. We revised the manuscript to include this discussion as below:

      Lines 174-186 ‘During autophagy progression, LC3 is conjugated with phosphatidylethanolamine to form LC3-II, which localizes to isolation membranes and autophagosomes. LC3-I accumulation occurs when autophagosome formation is impaired, and LC3-II accumulation is associated with lysosomal defects(31,32). p62 is an autophagy substrate, and its accumulation suggests autophagic defects(31,32). We found that milton knockdown increased LC3-I, and the LC3-II/LC3-I ratio was lower in milton knockdown flies than in control flies at 14 days old (Figure 2A). We also analyzed p62 levels in head lysates sequentially extracted using detergents with different stringencies (1% Triton X-100 and 2% SDS). Western blotting revealed that p62 levels were increased in the brains of 14-day-old milton knockdown flies (Figure 2B). The increase in the p62 level was significant in the Triton X-100-soluble fraction but not in the SDS-soluble fraction (Figure 2B), suggesting that depletion of axonal mitochondria impairs the degradation of less-aggregated proteins.’

      Line 189-190 : ‘At 30 day-old, LC3-I was still higher, and the LC3-II/LC3-I ratio was lower, in milton knockdown compared to the control (Figure 2D).’

      Line 199-201: ‘However, in contrast with milton knockdown, Pfk knockdown did not affect the levels of LC3-I, LC3-II or the LC3-II/LC3-I ratio (Figure 3D).’

      Line 275-281: ‘Neuronal overexpression of eIF2β increased LC3-II, while the LC3-II/LC3-I ratio was not significantly different (Figure 6A and B). Overexpression of eIF2β significantly increased the p62 level in the Triton X-100-soluble fraction (Figure 6C, 4-fold vs. control, p < 0.005 (1% Triton X-100)) but not in the SDS-soluble fraction (Figure 6C, 2-fold vs. control, p = 0.062 (2% SDS)), as observed in brains of milton knockdown flies (Figure 2B). These data suggest that neuronal overexpression of eIF2β accumulates autophagic substrates.’

      Line 307-315: ‘Neuronal knockdown of milton causes accumulation of autophagic substrate p62 in the Triton X-100-soluble fraction (Figure 2B), and we tested if lowering eIF2β ameliorates it. We found that eIF2β heterozygosity caused a mild increase in LC3-I levels and decreases in LC3-II levels, resulting in a significantly lower LC3-II/LC3-I ratio in milton knockdown flies (Figure 7B). eIF2β heterozygosity decreased the p62 level in the Triton X-100-soluble fraction in the brains of milton knockdown flies (Figure 7C). The p62 level in the SDS-soluble fraction, which is not sensitive to milton knockdown (Figure 2B), was not affected (Figure 7C). These results suggest that suppression of eIF2β ameliorates the impairment of autophagy caused by milton knockdown.’
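
      The reviewer's point that the LC3-II/LC3-I ratio can fall because LC3-I rises, not only because LC3-II falls, can be illustrated with a short sketch. The band intensities below are hypothetical placeholder values chosen for illustration, not measurements from the manuscript, and the function name is an assumption of this sketch.

```python
# Minimal sketch: how a rise in LC3-I alone lowers the LC3-II/LC3-I ratio.
# All intensities are HYPOTHETICAL densitometry values in arbitrary units,
# not data from the manuscript.

def lc3_ratio(lc3_i, lc3_ii):
    """Return the LC3-II/LC3-I ratio from two band intensities."""
    if lc3_i <= 0:
        raise ValueError("LC3-I intensity must be positive")
    return lc3_ii / lc3_i

# Hypothetical control and knockdown lanes: LC3-II is identical,
# LC3-I is doubled in the knockdown lane.
control = {"lc3_i": 1.0, "lc3_ii": 0.8}
knockdown = {"lc3_i": 2.0, "lc3_ii": 0.8}

control_ratio = lc3_ratio(**control)
knockdown_ratio = lc3_ratio(**knockdown)

# The ratio drops even though LC3-II did not change.
ratio_dropped_by_lc3_i = (
    knockdown_ratio < control_ratio
    and knockdown["lc3_ii"] == control["lc3_ii"]
)
```

      In this toy case the ratio halves with no change in LC3-II, which is why reporting LC3-I and LC3-II separately, alongside p62, makes the ratio easier to interpret.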

      Another main point of the paper is that the up-regulation of eIF2β by depleting the axonal mitochondria leads to the proteostasis crisis. This claim is formed by the findings from the proteome analyses. The authors should have presented their proteomic data with a much more thorough presentation and explanation. As in the experiment scheme shown in Figure 4A, the authors did two proteome analyses: one from the 7-day-old sample and the other from the 21-day-old sample. The manuscript only shows a plot of the result from the 7-day-old sample, but not that of the result from the 21-day-old sample. For the 21-day-old sample, the authors only provided data in the supplemental table, in which the abundance ratio of eIF2β from the 21-day-old sample is 0.753, meaning eIF2β is depleted in the 21-day-old sample. The authors should have explained the impact of the eIF2β depletion in the 21-day-old sample, so the reader could fully understand the authors' interpretation of the role of eIF2β on proteostasis.

      Thank you for pointing it out. We included plots of the results of the 21-day-old proteome as a part of the main figure (Figure 4C). As the reviewer pointed out, eIF2β protein levels are reduced at 21 days old. Since a reduction in eIF2β ameliorated milton knockdown-induced locomotor defects in aged flies (Figure 7D), the reduction in eIF2β observed in the 21-day-old milton knockdown flies is not likely to negatively contribute to milton knockdown-induced defects. We included this discussion in the manuscript as below:

      Lines 337-341:‘eIF2β protein levels are reduced at the 21-day-old; however, since a reduction in the eIF2β ameliorated milton knockdown-induced locomotor defects in aged flies (Figure 7), the reduction in eIF2β observed in the 21-day-old is not likely to negatively contribute to milton knockdown-induced defects.’

      The manuscript consists of several weaknesses in its data and explanation regarding translation.

      (1) The authors are likely misunderstanding the effect of phosphorylation of eIF2α on translation. The P-eIF2α is inhibitory for translation initiation. However, the authors seem to be mistaken that the down-regulation of P-eIF2α inhibits translation.

      We are sorry for our insufficient explanation in the previous version. As the reviewer pointed out, it is well known that the phosphorylated form of eIF2α inhibits translation initiation. Neuronal knockdown of milton caused a reduction in p-eIF2α (Figure 4J and K), and it also lowered translation (Figure 5); the relationship between these two events is currently unclear. We do not think that a reduction in p-eIF2α suppressed translation; rather, we propose that an imbalance in the expression levels of the components of the eIF2 complex negatively affects translation. We revised the discussion section to describe our interpretation in more detail as below:

      Line 368-378: ‘eIF2β is a component of eIF2, which mediates translational regulation and ISR initiation. When ISR is activated, phosphorylated eIF2α suppresses global translation and induces translation of ATF4, which mediates transcription of autophagy-related genes(39,40). Since ISR can positively regulate autophagy, we suspected that suppression of ISR underlies a reduction in autophagic protein degradation. We found neuronal knockdown of milton reduced phosphorylated eIF2α, suggesting that ISR is reduced (Figure 4). However, we also found that global translation was reduced (Figure 5). It may be possible that increased levels of eIF2β disrupt the eIF2 complex or alter its functions. The stoichiometric mismatch caused by an imbalance of eIF2 components may inhibit ISR induction. Supporting this model, we found that eIF2β upregulation reduced the levels of p-eIF2α (Figure 6).’

      We have revised the graphical abstract and removed the eIF2 complex since its role in the loss of proteostasis caused by milton knockdown has not been elucidated yet.

      (2) The result of polysome profiling in Figure 4H is implausible. By 10%-25% sucrose density gradient, polysomes are not expected to be observed. The authors should have used a gradient with much denser sucrose, such as 10-50%.

      Thank you for pointing it out. The stated gradient was a mistake; the correct range is 10-50%, and we apologize for the oversight. It was corrected (Figure 5).

      (3) Also on the polysome profiling, as in the method section, the authors seemed to fractionate ultra-centrifuged samples from top to bottom and then measured A260 by a plate reader. In that case, the authors should have provided a line plot with individual data points, not the smoothly connected ones in the manuscript.

      Thank you for pointing it out. We revised the graph (Figure 5).

      (4) For both the results from polysome profiling and puromycin incorporation (Figure 4H and I), the difference between control siRNA and Milton siRNA are subtle, if not nonexistent. This might arise from the lack of spatial resolution in their experiment as the authors used head lysate for these data but the ratio of Phospho-eIF2α/eIF2α only changes in the axons, based on their results in Figure 4E-G. The authors could have attempted to capture the spatial resolution for the axonal translation to see the difference between control siRNA and Milton siRNA.

      Thank you for your comment. We agree that it would be an interesting experiment, but it will take a considerable amount of time to analyze axonal translation with spatial resolution. We will try to include such analyses in the future. For this manuscript, we revised the discussion section to include the reviewer's suggestion as below:

      Lines 351-353: ‘Further analyses to dissect the effects of milton knockdown on proteostasis and translation in the cell body and axon by experiments with spatial resolution would be needed.’

      Recommendations for the authors:

      From the Reviewing Editor:

      As the Reviewing Editor, I have read your manuscript and the associated peer reviews. I have concerns about publishing this work in its current form. I think that your manuscript cannot claim to have found a novel function of eIF2beta because of technical uncertainties and conceptual problems that should be addressed.

      Thank you so much for your review and comments. We addressed all the concerns raised by the reviewers. Point-by-point responses are listed below.

      First, your manuscript is based partly on what appears to be a mistaken understanding of the mechanistic basis of the ISR. Specifically, eIF2 is a heterotrimeric complex of alpha, beta, and gamma subunits. When eIF2a is phosphorylated, the heterotrimer adopts a new conformation. This conformation directly binds and inhibits eIF2B, the decameric GEF that exchanges the GDP bound to the gamma subunit of the eIF2 complex for GTP. Unless I misunderstood your paper, you seem to propose that decreasing levels of phospho-eIF2a will inhibit translation, but this is backward from what we know about the ISR.

      Thank you for your insightful comment, and we are sorry for the confusion. We did not mean to propose that decreasing levels of phospho-eIF2α inhibits translation. We apologize for our insufficient explanation, which might have caused a misunderstanding (Lines 312-318 in the original version). We agree with the reviewer that ‘mismatch due to elevated eIF2-beta could change the behavior of the ISR’. We revised the text in the Results section as follows:

      Lines 259-264 (in the Result section) ‘Phosphorylation of eIF2α induces conformational changes in the eIF2 complex and inhibits global translation(36). To analyze the effects of milton knockdown on translation, we performed polysome gradient centrifugation to examine the level of ribosome binding to mRNA. Since p-eIF2α was downregulated, we hypothesized that milton knockdown would enhance translation. However, unexpectedly, we found that milton knockdown significantly reduced the level of mRNAs associated with polysomes (Figure 5A and B).’

      Lines 368-378 (in the Discussion section): ‘eIF2β is a component of eIF2, which mediates translational regulation and ISR initiation. When ISR is activated, phosphorylated eIF2α suppresses global translation and induces translation of ATF4, which mediates transcription of autophagy-related genes(39,40). Since ISR can positively regulate autophagy, we suspected that suppression of ISR underlies a reduction in autophagic protein degradation. We found neuronal knockdown of milton reduced phosphorylated eIF2α, suggesting that ISR is reduced (Figure 4). However, we also found that global translation was reduced (Figure 5). It may be possible that increased levels of eIF2β disrupt the eIF2 complex or alter its functions. The stoichiometric mismatch caused by an imbalance of eIF2 components may inhibit ISR induction. Supporting this model, we found that eIF2β upregulation reduced the levels of p-eIF2α (Figure 6).’

      It may be possible that a stoichiometric mismatch due to elevated eIF2-beta could change the behavior of the ISR, but your paper doesn't adequately address the expression levels of all three eIF2 subunits: alpha, beta, and gamma. The proteomic data shown in Fig 4B is unconvincing on its own because the changes in the beta subunit are subtle. The Western blot in Figure 4C suggests that the KD changes the mass or mobility of the beta subunit, and most importantly, there are no Western blots measuring the levels of eIF2a, eIF2a-phospho, or eIF2-gamma.

      We appreciate the reviewer’s comment and agree that the stoichiometric mismatch due to elevated eIF2β may interfere with ISR. We found that overexpression of eIF2β lowered p-eIF2α (Figure S2 in V1), which supports this model. We included this data in the main figure in the revised manuscript (Figure 6D) and revised the text as below:

      Lines 279-281: ‘Since milton knockdown reduced the p-eIF2α level (Figure 4K), we asked whether an increase in eIF2β affects p-eIF2α. Neuronal overexpression of eIF2β did not affect the eIF2α level but significantly decreased the p-eIF2α level (Figure 6D, E).’

      Expression data for eIF2α and eIF2γ have been extracted from the proteome analyses and included as a table (Figure 4D). Western blots of phospho-eIF2α (Figure S1 in V1) are now included in the main figure (Figure 4G). The Results section was revised as below:

      Lines 242-245: ‘As for the other subunits of eIF2 complex, proteome analysis did not detect a significant difference in the protein levels of eIF2α and eIF2γ between milton knockdown and control flies at 7 and 21 days (Figure 4D).’

      Reviewer #1 (Recommendations For The Authors):

      L125-128: In this section, while the efficiency of Milton knockdown is referenced from a previous publication, it is necessary to also mention that the Miro knockdown has been similarly reported in the literature. Additionally, the Methods section lacks details on the Miro RNAi line used, and Table 2 does not include the genotype for Miro RNAi. This information should be included for clarity and completeness.

      Thank you for pointing it out. Knockdown efficiency with this strain has been reported (Iijima-Ando et al., PLoS Genet, 2012). We revised the text to include citation and knockdown efficiency as follows:

      Lines 139-147: ‘There was no significant increase in ubiquitinated proteins in milton knockdown flies at 1-day old, suggesting that the accumulation of ubiquitinated proteins caused by milton knockdown is age-dependent (Figure S1). We also analyzed the effect of the neuronal knockdown of Miro, a partner of milton, on the accumulation of ubiquitin-positive proteins. Since severe knockdown of Miro in neurons causes lethality, we used UAS-Miro RNAi strain with low knockdown efficiency, whose expression driven by elav-GAL4 caused 30% reduction of Miro mRNA in head extract(24). Although there was a tendency for increased ubiquitin-positive puncta in Miro knockdown brains, the difference was not significant (Figure 1B, p>0.05 between control RNAi and Miro RNAi). These data suggest that the depletion of axonal mitochondria induced by milton knockdown leads to the accumulation of ubiquitinated proteins before neurodegeneration occurs.’

      L132-L136: The current phrasing in this section suggests an increase in ubiquitinated proteins for both Milton and Miro knockdowns. However, since there is no significant difference noted for Miro, it is incorrect to state an increase in ubiquitin-positive puncta. Furthermore, combining the results of Milton knockdown to claim an increase in ubiquitinated proteins prior to neurodegeneration is misleading. At the very least, the expression here needs to be moderated to accurately reflect the findings.

      Thank you for pointing it out. We revised the text as above.

      L137-L141: Results in Figure 1 indicate that Milton knockdown leads to an increase in ubiquitinated proteins at 14 days, while Miro knockdown shows no difference from the control at either 14 or 30 days. Conversely, both the control and Miro exhibit an increase in ubiquitinated proteins with aging, but this trend does not seem to apply to Milton knockdown. This observation suggests that Milton KD may not affect the changes in protein quality control associated with aging. It implies that Milton's function might be more related to protein homeostasis in younger cells, or that changes due to aging might overshadow the effects of Milton knockdown. These interpretations should be included in the Results or Discussion sections for a more comprehensive analysis.

      Thank you for your insightful comment. We revised the text to include those points as follows:

      Lines 152-153: ‘These results suggest that depletion of axonal mitochondria may have more impact on proteostasis in young neurons than in old neurons.’

      Lines 355-362: ‘The depletion of axonal mitochondria and accumulation of abnormal proteins are both characteristics of aged brains(37,38). Our results suggest that the loss of axonal mitochondria is an event upstream of proteostasis collapse during aging. Neuronal knockdown of milton had more impact on proteostasis in young neurons than in old neurons (Figure 1). Proteome analyses also showed that age-related pathways, such as immune responses, are enhanced in young flies with milton knockdown (Table 2). The reduction in axonal transport of mitochondria may be one of the triggering events of age-related changes and accelerate the onset of aging in the brain.’

      L143 : Please remove the erroneously included quotation mark.

      Thank you for pointing it out. We corrected it.

      L145-L147:

      - While it is understood that Milton knockdown results in a reduction of mitochondria in axons, as reported previously and seemingly indicated in Figure 1E, this paper repeatedly refers to axonal depletion of mitochondria. Therefore, it would be beneficial to quantitatively assess the number of mitochondria in the axonal terminals located in the lamina via electron microscopy. Such quantification would robustly reinforce the argument that mitochondrial absence in axons is a consequence of Milton knockdown.

      Thank you for pointing it out. We included quantitation of the number of mitochondria in the synaptic terminals (Figure 1E).

      The text and figure legend were revised accordingly:

      Lines 156-157: ‘As previously reported(24), the number of mitochondria in presynaptic terminals decreased in milton knockdown (Figure 1E).’

      - The knockdown of Milton is known to reduce mitochondrial transport from an early stage, but what about swelling? By observing swelling at 1 day and 14 days, it may be possible to confirm the onset of swelling and discuss its correlation with the accumulation of ubiquitinated proteins.

      Quantitation of axonal swelling has also been included (Figure 1F).

      We appreciate the reviewer’s comments on the correlation between the accumulation of ubiquitinated proteins and axonal swelling. Axonal swelling was not observed at 3 days old (Iijima-Ando et al., PLoS Genetics, 2012), indicating that axonal swelling is an age-dependent event. Dense materials are found in swollen axons more often than in normal axons, suggesting a positive correlation between disruption of proteostasis and axonal damage. It would be interesting to analyze the time course of events further; however, we feel it is beyond the scope of this manuscript. We revised the text as below to include this discussion:

      Lines 157-159: ‘The swelling of presynaptic terminals, characterized by the enlargement and roundness, was not reported at 3-day-old(24) but observed at this age with about 4% of total presynaptic terminals (Figure 1F, asterisks).’

      Lines 162-167: ‘Dense materials are rarely found in age-matched control neurons, indicating that milton knockdown induces abnormal protein accumulation in the presynaptic terminals (Figure 1G and H). In milton knockdown neurons, dense materials are found in swollen presynaptic terminals more often than in presynaptic terminals without swelling, suggesting a positive correlation between the disruption of proteostasis and axonal damage (Figure 1G).’

      Lines 362-365: ‘Disruption of proteostasis is expected to contribute to neurodegeneration(38), and it would be interesting to analyze the sequence of protein accumulation and axonal degeneration in milton knockdown ((24,29) and Figure 1) in detail with higher time resolution.’

      L147-L151: Though Figures 1F and 1G provide qualitative representations, it is advisable to quantitatively assess whether dense materials significantly accumulate. Such quantitative analysis would be required to verify the accumulation of dense materials in the context of the study.

      Thank you for pointing it out. We included quantitation of the number of neurons with dense material (Figure 1G). We revised the manuscript as follows:

      Line 161-163: ‘Dense materials are rarely found in age-matched control neurons, indicating that milton knockdown induces abnormal protein accumulation in the presynaptic terminals (Figure 1G and H).’

      Regarding Figure 1B, C:

      - Even though the count of puncta in the whole brain appears to be fewer than 400, the magnification of the optic lobe suggests a substantial presence of puncta. Please clarify in the Methods section what constitutes a puncta and whether the quantification in the whole brain is based on a 2D or 3D analysis. Detail the methodology used for quantification.

      Thank you for your comment. We revised the method section to include more details as below:

      Lines 434-437: ‘Quantitative analysis was performed using ImageJ (National Institutes of Health) with maximum projection images derived from Z-stack images acquired with the same settings. Puncta were identified by mean intensity and area using ImageJ.’

      - What about 1-day-old specimens? Does Milton knockdown already show an increase in ubiquitinated protein accumulation at this early stage? Investigating whether ubiquitin-protein accumulation is involved in aging promotion or is already prevalent during developmental stages is a necessary experiment.

      Thank you for your comment. We carried out immunostaining with an anti-ubiquitin antibody in the brains at 1-day-old. No significant difference was detected between the control and milton knockdown. This result has been included as Figure S1 in the revised manuscript. The result section was revised as below:

      Line 136-139 ‘There was no significant increase in ubiquitinated proteins in milton knockdown flies at 1-day old, suggesting that the accumulation of ubiquitinated proteins caused by milton knockdown is age-dependent (Figure S1).’

      For Figure 1E: In the Electron Microscopy section of the Methods, define how swollen axons were identified and describe the quantification methodology used.

      Thank you for your comment. Swollen axons are, unlike normal axons, round in shape and enlarged. We revised the text as below:

      Lines 157-160: ‘The swelling of presynaptic terminals, characterized by the enlargement and roundness, was not reported at 3-day-old(24) but observed at this age with about 4% of total presynaptic terminals (Figure 1F, asterisks).’

      Lines 683-684, Figure 1 legend: ‘Swollen presynaptic terminals (asterisks in (F)), characterized by the enlargement and higher circularity, were found more frequently in milton knockdown neurons.’

      L218-L219: Throughout the text, the expression 'eIF2β is "upregulated" in response to Milton knockdown' is frequently used. However, considering the presented results, it might be more accurate to interpret that under the condition of Milton knockdown, eIF2β is not undergoing degradation but rather remains stable.

      Thank you for pointing it out. We replaced ‘upregulated’ with ‘increased’ throughout the text.

      L234-L235: On what basis is the conclusion drawn that there is a reduction? Given that three experiments have been conducted, it would be possible and more convincing to quantify the results to determine if there is a significant decrease.

      Thank you for pointing it out. We quantified the AUC of polysome fraction and carried out statistical analysis. There is a significant decrease in polysome in milton knockdown, and this result has been included in Figure 5B. We revised the figure and the legend accordingly.

      L236: 5H-> 4H

      Thank you for pointing it out, and we are sorry for the confusion. We corrected it.

      L238-L239: Since there is no significant difference observed, it may not be accurate to interpret a reduction in puromycin incorporation.

      Thank you for pointing it out. As described above, quantification of polysome fractions showed that milton knockdown significantly reduces polysomes (Figure 5B). We revised the manuscript as below:

      Lines 263-264: ‘However, unexpectedly, we found that milton knockdown significantly reduced the level of mRNAs associated with polysomes (Figure 5A and B).’

      Figure 5D and Figure 6D: Climbing assays have been conducted, but I believe experiments should also be performed to examine whether overexpression or heterozygous mutants of eIF2β induce or suppress degeneration.

      Thank you for pointing it out. We analyzed the eyes with eIF2β overexpression for neurodegeneration. Although there was a tendency toward elevated neurodegeneration in the retina with eIF2β overexpression, the difference between control and eIF2β overexpression did not reach statistical significance (Figure S2). This result has been included as Figure S2 in the revised manuscript, and the following sentences have been included in the text:

      Lines 288-293: ‘We asked if eIF2β overexpression causes neurodegeneration, as depletion of axonal mitochondria in the photoreceptor neurons causes axon degeneration in an age-dependent manner(24). eIF2β overexpression in photoreceptor neurons tends to increase neurodegeneration in aged flies, while it was not statistically significant (p>0.05, Figure S2).’

      L271-L272: The results in Figure 6B are surprising. I anticipated a greater increase compared to the Milton knockdown alone. While p62 appears to be reduced, it is not clear why these results lead to the conclusion that lowering eIF2β rescues autophagic impairment. Please add a discussion section to address this point.

      Thank you for pointing it out. We apologize for the unclear description of the result. Milton knockdown flies show p62 accumulation (Figure 2), and deleting one copy of eIF2beta in milton knockdown background reduced p62 accumulation (Figure 7C). We revised the text as below:

      Lines 307-315: ‘Neuronal knockdown of milton causes accumulation of autophagic substrate p62 in the Triton X-100-soluble fraction (Figure 2B), and we tested if lowering eIF2β ameliorates it. We found that eIF2β heterozygosity caused a mild increase in LC3-I levels and decreases in LC3-II levels, resulting in a significantly lower LC3-II/LC3-I ratio in milton knockdown flies (Figure 7B). eIF2β heterozygosity decreased the p62 level in the Triton X-100-soluble fraction in the brains of milton knockdown flies (Figure 7C). The p62 level in the SDS-soluble fraction, which is not sensitive to milton knockdown (Figure 2B), was not affected (Figure 7C). These results suggest that suppression of eIF2β ameliorates the impairment of autophagy caused by milton knockdown.’

      L369: Please specify the source of the anti-ubiquitin antibody used.

      Thank you for pointing it out. We included the antibody information in the method section.

      Figure 7: While the relationship between Milton knockdown and the eIF2β and eIF2α proteins has been elucidated through the authors' efforts, I would like to see an investigation into whether eIF2β is upregulated and eIF2α phosphorylation is reduced in simply aged Drosophila. This would help us understand the correlation between aging and eIF2 protein dynamics.

      Thank you for your comment. We agree that it is an important question, and we are working on it. However, we feel that it is beyond the scope of the current manuscript.

      L645-L646: If the mushroom body is identified using mito-GFP, then include mito-GFP in the genotype listed in Supplementary Table 2.

      We are sorry for the oversight. We corrected it in Supplementary Table 2.

      Additionally, while it is presumed that the mito-GFP signal decreases in axons with Milton RNAi, how was the lobe tips area accurately selected for analysis? Please include these details along with a comprehensive description of the quantification methodology in the Methods section.

      Thank you for your comment. Although the mito-GFP signal in the axon is weak in the milton knockdown neurons, it is sufficient to distinguish the mushroom body structure from the background. We revised the method section to include this information:

      Line 437-438: ‘For eIF2α and p-eIF2α immunostaining, the mushroom body was detected by mitoGFP expression.’

    1. Author response:

      The following is the authors’ response to the original reviews.

      Point-by-point response to the public review:

      General Comment: “Using computational modeling, this manuscript explores the effect of growth feedback on the performance of gene networks capable of adaptation. The authors selected 425 hypothetical synthetic circuits that were shown to achieve nearly perfect adaptation in two earlier computational studies (see Ma et al. 2009, and Shi et al. 2017). They examined the effects of cell growth feedback by introducing additional terms to the ordinary differential equation-based models, and performed numerical simulations to check the retainment and the loss of the adaptation responses of the circuits in the presence of growth feedback. The authors show that growth feedback can disrupt the gene network adaptation dynamics in different ways, and report some exceptional core motifs which allow for robust performance in the presence of growth feedback. They also used a metric to establish a scaling law between a circuit robustness measure and the strength of growth feedback. These results have important implications in the field of synthetic biology, where unforeseen interactions between designed gene circuits and the host often disrupt the desired behavior. The paper’s conclusions are supported by their simulation results, although these are presented in their summary formats and it would be useful for the community if the detailed results for each topology were available as a supplementary file or through the authors’ GitHub repository.”

      We are grateful for the referee’s positive evaluation of our work. We have updated our GitHub and OSF repositories with detailed results for each topology. Additionally, we have included other simulation codes, result data, and detailed explanations in these two repositories that may be of interest to our readers.

      Strength 1: “This work included a detailed investigation of the reasons for adaptation failure upon introducing cell growth to the systems. The comprehensiveness of the analysis makes the work stand out among studies of functional screening of network topologies of gene regulation.”

      We are grateful for the referee’s positive assessment of our work, notably the recognition of the ‘detailed investigation’ we conducted, and the ‘comprehensiveness of the analysis’ we provided.

      Strength 2: “The authors’ approaches for assessment of robustness, such as the survival ratio Q, can be useful for a wide range of topologies beyond adaptation. The scaling law obtained with those approaches is interesting.”

      We are grateful for the referee’s positive evaluation of our defined factors for assessing circuit robustness. We also appreciate the acknowledgment of the “interesting” nature of the scaling law we discovered using the assessment factor R.

      Weaknesses 1: “The title suggests that the work investigates the ’effects of growth feedback on gene circuits’. However, the performance of ’nearly perfect adaptation’ was chosen for the majority of the work, leaving the question of whether the authors’ conclusion regarding the effects of growth feedback is applicable to other functional networks.”

      We agree that our present title can be too broad, and we have changed it from “Effects of growth feedback on gene circuits: A dynamical understanding” to “Effects of growth feedback on adaptive gene circuits: A dynamical understanding”. Although we have some brief results and discussions on the gene circuits with bistability, we admit that most of our results and discussions are focused on circuits that have adaptation.

      The new title is more specific and should be a more appropriate summary of the paper.

      Weaknesses 2: “This work relies extensively on an earlier study, evaluating only a selected set of 425 topologies that were shown to give adaptive responses (Shi et al., 2017). This limited selection has two potential issues. First, as the authors mentioned in the introduction, growth feedback can also induce emerging dynamics even without existing function-enabling gene circuits, as an example of the ”effects of growth feedback on gene circuits”. Limiting the investigation to only successful circuits for adaptation makes it unclear whether growth feedback can turn the circuits that failed to produce adaptation by themselves into adaptation-enabling circuits. Secondly, as the Shi et al. (2017) study also used numerical experiments to achieve their conclusions about successful topologies, it is unclear whether the numerical experiments in the present study are compatible with the earlier work regarding the choice of equation forms and ranges of parameter values. The authors also assumed that all readers have sufficient understanding of the 425 topologies and their derivation before reading this paper.”

      We agree with the reviewer that several issues need to be clarified in our new manuscript. We have added new discussions for all of them.

      We agree with the reviewer that growth feedback could turn non-adaptive circuits into adaptation-enabling circuits, and this indeed presents a compelling topic for future research. We have added the following discussion of a related matter to our paper. We find that in our simulated dataset, there are cases where a higher degree of growth feedback can restore the adaptation that has been lost in a circuit. However, as we discussed in this new paragraph, a comprehensive study in the direction of turning non-adaptive circuits into adaptation-enabling circuits will “require entirely different approaches for sampling circuit parameters and selecting candidate network topologies, demanding significantly high computational costs.” Given that this topic extends beyond the scope of the current paper, we leave this matter to future research.

      “Although the primary focus of this paper is on how growth feedback can undermine an originally adaptive circuit and how to design circuits that are robust against such feedback, our simulated dataset reveals instances where growth feedback can benefit the circuit within certain ranges. Specifically, we identified 2,092 circuits across 306 different topologies where adaption, lost at an intermediate level of growth feedback, is restored at higher levels. This is 1.4% of all circuits tested. We anticipate that additional circuits exhibiting this loss-and-recovery behavior exist, as our sampling of six discrete levels of k<sub>g</sub> (0,0.2,0.4,0.6,0.8,1.0) might have overlooked numerous cases. This result again suggests the possible advantages of growth feedback in gene circuits (Tan et al., 2009; Nevozhay et al., 2012; Deris et al., 2013; Feng et al., 2014; Melendez-Alvarez and Tian, 2022). A comprehensive study into how growth feedback can endow or enhance adaption in circuits would require entirely different approaches for sampling circuit parameters and selecting candidate network topologies, demanding significantly high computational costs. Given that this topic extends beyond the scope of the current paper, we leave this matter to future research.”
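
      The loss-and-recovery pattern described in this paragraph can be sketched as a simple check over the six sampled levels of k<sub>g</sub>. This is a minimal illustration, not the authors' code: the function name and the boolean-list input format are assumptions, and the actual adaptation criteria are those defined in the Methods.

```python
# The six discrete growth-feedback levels quoted in the paragraph above.
K_G_LEVELS = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)


def shows_loss_and_recovery(adaptive_flags):
    """Return True if adaptation is lost at some level and restored at a higher one.

    adaptive_flags[i] is True when the circuit adapts at K_G_LEVELS[i]
    (hypothetical input format, one flag per sampled level).
    """
    lost = False
    for i in range(1, len(adaptive_flags)):
        if adaptive_flags[i - 1] and not adaptive_flags[i]:
            lost = True  # adaptation lost when moving up to this level
        elif lost and adaptive_flags[i]:
            return True  # adaptation restored after an earlier loss
    return False


# Made-up example: adaptive at 0 and 0.2, lost at 0.4 and 0.6, restored at 0.8.
example = [True, True, False, False, True, True]
recovered = shows_loss_and_recovery(example)
```

      Because only six levels are sampled, a circuit that loses and regains adaptation between two adjacent levels would be missed by such a check, which is the limitation the paragraph itself points out.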

      We have added the following discussions about the reasoning behind using the 425 network topologies selected from the study Shi et al. (2017).

      “We use these 425 network topologies from the study (Shi et al., 2017), avoiding redundancy with established results. Due to the unique focus of our research on the effects of growth feedback and the need to evaluate quantitative ratios of robust circuits among all functional ones, we have chosen to use a 20-fold increase in the number of random parameter sets for each network topology compared to the simulations in (Shi et al., 2017). This approach makes it computationally prohibitive to scan all possible 16,038 three-node circuits. We carefully follow the settings in (Shi et al., 2017), which also analyzed TRNs with the AND logic as in this paper. Detailed descriptions of our simulation experiments are provided in the Methods section. To make our results more convincing, we have adopted a set of adaptation criteria that are stricter than those used in (Shi et al., 2017). Consequently, the ratio of adaptive circuits is somewhat lower in our study, with 4 out of the 425 network topologies not demonstrating adaptation.”

      Other than the more strict adaptation criteria and much larger sampling sizes, as we mentioned in this paragraph, we have carefully followed the simulation details of the study Shi et al. (2017). This includes but is not limited to: the dynamical equations (when k<sub>g</sub> = 0), the input signals, the scales and ranges of the circuit parameters to be randomly sampled, and the sampling method (Latin hypercube sampling). One of the authors of the current paper was also the first author of the study Shi et al. (2017), who helped us verify the details of simulations (among many other contributions). These identical settings justify our usage of the established results with the 425 network topologies.
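
      For readers unfamiliar with Latin hypercube sampling, the following is a minimal pure-Python sketch of the idea: each parameter axis is divided into as many equal strata as there are samples, and every stratum is used exactly once per axis. The function names, the log-uniform mapping, and the range of 1e-3 to 1e3 are illustrative assumptions; the actual scales and ranges are those of Shi et al. (2017) and the authors' repositories.

```python
import math
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims with one point per stratum in each dimension."""
    columns = []
    for _ in range(n_dims):
        # One random point inside each of the n_samples equal-width strata.
        strata = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(strata)  # decouple the stratum order across dimensions
        columns.append(strata)
    # Transpose so that each row is one parameter set.
    return [list(point) for point in zip(*columns)]


def to_log_range(u, low, high):
    """Map u in [0, 1) to a log-uniformly distributed value between low and high."""
    return 10 ** (math.log10(low) + u * (math.log10(high) - math.log10(low)))


rng = random.Random(0)
unit_samples = latin_hypercube(n_samples=10, n_dims=3, rng=rng)
# Hypothetical range of 1e-3 to 1e3 applied to every parameter.
parameter_sets = [[to_log_range(u, 1e-3, 1e3) for u in row] for row in unit_samples]
```

      Compared with independent uniform draws, this stratification covers each parameter's range evenly even when the number of samples is small relative to the dimension of the parameter space.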

      To provide more information about these 425 network topologies, we have added the following introduction. It introduces the structural features of the networks, especially the shared core motifs for adaptation. In our GitHub and OSF repositories, we have also provided relevant data about the 425 topologies, including the topology structures and the parameter sets we scanned.

      “These topologies can be classified into two families based on the core topology: networks with a negative feedback loop (NFBL) and networks with an incoherent feed-forward loop (IFFL) (Shi et al., 2017). More specifically, there are 206 network topologies in the NFBL family. All of these NFBL topologies have a negative feedback loop for node B. This negative feedback loop can be formed by the loop from node B to A and back to B (such as the circuit shown in Fig. 1 (a)), by node B to C and back to B, or by a longer route, from node B to A and then to C and back to B. There is always a self-activation link from B to B in all these 206 NFBL networks. There are 219 network topologies in the IFFL family. All of them have two feed-forward pathways from the input node A to the output node C. One pathway goes from node A to C directly, while the other involves node B in the middle. One of the pathways is activating while the other one is inhibitory.”
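
      The two structural families described above can be illustrated with a small check on a signed edge list. This is a sketch under stated assumptions: the dictionary representation (+1 for activation, -1 for inhibition), the function names, and the two example circuits are hypothetical, and the classification actually used in Shi et al. (2017) may rely on additional criteria.

```python
from itertools import permutations

NODES = ("A", "B", "C")


def has_negative_loop_through(edges, node):
    """True if some cycle of two or three edges through `node` has a negative sign product.

    edges maps (source, target) to +1 (activation) or -1 (inhibition).
    Self-loops are not counted as feedback loops here.
    """
    others = [n for n in NODES if n != node]
    for length in (1, 2):
        for middle in permutations(others, length):
            path = (node,) + middle + (node,)
            sign = 1
            for src, dst in zip(path, path[1:]):
                sign *= edges.get((src, dst), 0)  # a missing edge zeroes the product
            if sign < 0:
                return True
    return False


def has_incoherent_feed_forward(edges):
    """True if the direct A->C path and the A->B->C path both exist with opposite signs."""
    direct = edges.get(("A", "C"), 0)
    indirect = edges.get(("A", "B"), 0) * edges.get(("B", "C"), 0)
    return direct * indirect < 0


# Hypothetical NFBL-like circuit: B inhibits A, A activates B, B self-activates.
nfbl_example = {("A", "B"): 1, ("B", "A"): -1, ("B", "B"): 1, ("A", "C"): 1}
# Hypothetical IFFL-like circuit: A activates C directly and inhibits it via B.
iffl_example = {("A", "C"): 1, ("A", "B"): 1, ("B", "C"): -1}
```

      The loop check enumerates the routes listed in the paragraph (B to A and back, B to C and back, and the longer three-node routes), multiplying edge signs along each route.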

      Weaknesses 3: “The authors’ model does not describe the impact of growth via a biological mechanism: they model growth as an additional dilution rate and calculate growth rate based on a phenomenological description with growth rate occurring at a maximum (k<sub>g</sub>) scaled by the circuit ’burden’ b(t). Therefore, the authors’ model does not capture potential growth rate changes in parameter values (e.g., synthetic protein production falls with increasing growth rate; see Scott & Hwa, 2023).”

      In our paper, we consider dilution due to cell growth as the dominant factor of growth feedback. Here we compared the adaptive circuits under no-growth conditions and their ability to maintain their adaptive behaviors after dilution into a fresh medium, which mediated a significant dilution to the circuits. This is based on our previous work, Zhang, et al. Nature chemical biology 16.6 (2020): 695-701. We agree that an increased growth rate can change synthetic protein production. However, the dynamic roles of the dilution and growth-affected production rate should be analogous, given that they both act as inhibitory factors arising from cell growth as mentioned by the reviewer. Still, we agree that taking the growth effect on the production rate into account would provide a more comprehensive study, but it is beyond the scope of the present work. We have added the following paragraph in the Discussion section of our paper.

      “In our paper, we consider dilution due to cell growth as the dominant factor of growth feedback. Here we compared the adaptive circuits under no-growth conditions and their ability to maintain their adaptive behaviors after dilution into a fresh medium, which mediated a significant dilution to the circuits. This is based on our previous work (Zhang et al. (2020)). However, growth feedback is inherently complex (Klumpp et al. (2009)). For instance, an increased growth rate can change protein synthesis rate (Hintsche and Klumpp (2013); Scott and Hwa (2023)), and cell growth rates can affect the distribution of protein expression in cell populations (Gouda et al. (2019)). In our paper, we concentrate on a simplified model with dilution, which we consider to have captured the dominant factor. The dynamic roles of the dilution and growth-affected production rate should be analogous, given that they both act as inhibitory factors arising from cell growth. Incorporating the impact of growth rate on protein synthesis into our model would offer a more comprehensive analysis, a task beyond the scope of this paper but presenting an intriguing opportunity for future research to address the complexities of growth feedback.”

      Weaknesses 4: “The authors made several claims about the bifurcations (infinite-period, saddle-node, etc) underlying the abrupt changes leading to failures of adaptations. There is a lack of evidence supporting these claims. Both local and global bifurcations can be demonstrated with semi-analytic approaches such as numerical continuation along with investigations of eigenvalues of the Jacobian matrix. The claims based on ODE solutions alone are not sound.”

      After our further simulations and verification, we found that most of the bifurcation-induced failures we mentioned in type-V and type-VI failures should be categorized as bistability or multistability-induced failures. They are still abrupt switching between adaptive and non-adaptive states, as we described in the previous version of the manuscript. However, they are actually still far away from the bifurcation points at the critical k<sub>g</sub>. We have corrected all relevant descriptions and figures, including panel Fig. 4 (c) and its captions. We have added the following paragraph in the paper to explain this issue.

      “One might expect bifurcations to play an important role in many type-V and type-VI failures. However, in our simulations, failures precisely at the bifurcation point are not observed. This is because the bifurcation points under consideration, such as fold bifurcations, are where one of the attraction basins diminishes to zero. For a failure to occur exactly at the bifurcation point, the initial condition would need to coincide precisely with the infinitesimally small basin just before it vanishes. More realistically, failures almost always largely precede the exact bifurcation point. They happen while the basin is still contracting and the basin boundary crosses the initial condition or O<sub>1</sub>. An example is shown in Fig. 4(b), where bistability persists, yet the lighter orange basin with a larger O<sub>1</sub>(C) cannot be reached as the boundary shifts away from the initial condition A<sub>0</sub> and B<sub>0</sub>. As another example, in Fig. 4 (c) from a different circuit, the higher O<sub>2</sub>(C) state disappears at k<sub>g</sub> ≈ 0.012 and switches to a lower O<sub>2</sub>(C), but this point is not a bifurcation.

      It is the point where the stable O<sub>1</sub> continuously crosses the basin boundary of O<sub>2</sub>.”
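
      The distinction drawn here, an abrupt switch caused by a moving basin boundary rather than by a bifurcation, can be seen in a one-dimensional toy system. The sketch below is not one of the circuits in the paper: the cubic vector field, the linear dependence of the boundary on k, and the initial condition 0.4 are all made-up assumptions chosen only to show the mechanism.

```python
def final_state(k, x0=0.4, dt=0.01, steps=20000):
    """Integrate dx/dt = -x (x - u) (x - 1) with forward Euler and return the end point.

    x = 0 and x = 1 are stable for every 0 < u < 1, and x = u is the basin boundary.
    The dependence u = 0.2 + 0.5 k is a made-up stand-in for the effect of k_g.
    """
    u = 0.2 + 0.5 * k
    x = x0
    for _ in range(steps):
        x += dt * (-x * (x - u) * (x - 1.0))
    return x


for k in (0.0, 0.2, 0.3, 0.5, 0.7):
    state = "high" if final_state(k) > 0.5 else "low"
    print(f"k = {k:.1f}: boundary at u = {0.2 + 0.5 * k:.2f}, settles in the {state} state")
```

      In this toy system both stable states exist for every value of k shown, so no bifurcation occurs in that range. The outcome nevertheless switches abruptly once the boundary u passes the fixed initial condition, which mirrors the argument that failures precede the bifurcation point.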

      Our further simulations have verified the existence of the oscillation-related bifurcations. We have added a new appendix discussing the phenomena associated with them in more detail.

      Weaknesses 5: “The impact of biochemical noise is not evaluated in this work; the author’s analysis is only carried out in a deterministic regime.”

      In this paper, we have not taken into account biochemical noise as we focus solely on scenarios where all protein concentrations are high. In these circumstances, the influence of noise is relatively minor. Incorporating biochemical noise, which originates from various sources and possesses diverse characteristics, would significantly complicate the analysis beyond the scope of our current work. However, exploring this aspect could be an intriguing avenue for future research. We have included the following discussions in our paper.

      “Our study focuses on scenarios where random noises are ignored. Realistically, gene circuits are subjected to diverse types of noise, which can complicate their predictable behavior and design. These noises can originate externally from a noisy input signal I, or intrinsically, directly affecting the circuit components. Further, these noises can be classified based on various mechanisms that cause them (Colin et al. (2017); Sartori and Tu (2011)). And with different mechanisms, each type of noise can be characterized by different attributes such as frequency, amplitude, and noise color. These variances can lead to different impacts on the circuits, potentially necessitating unique mechanisms or designs for the attenuation of each category (Sartori and Tu (2011); Qiao et al. (2019)). Given the extensive complexity and the need for thorough investigation, these noise-related challenges are beyond the scope of this paper and require a series of future studies.”

      Point-by-point response to the recommendations for the authors:

      Comment 1: - The authors’ github repository, detailed in their code availability statement, is currently unavailable and likely contains some of the answers to the queries here.

      We have updated our GitHub and OSF repositories with simulation codes, result data, and detailed explanations. The link to our GitHub repository in the previous version of the manuscript contained a format error, making it inaccessible to the referees. We apologize for this mistake and have corrected it.

      Comment 2: - At present, it is not clear how the 425 topologies are created from the system of equations (Eq. 6-8) or from the circuit diagram in Fig 1a. This could do with being explicitly stated for the reader.

      We have added the following paragraph to discuss how the 425 topologies are selected and which common motifs and connections they share.

      “Previous research identified 425 different three-node TRN network topologies that can achieve adaptation in the absence of growth feedback (Shi et al., 2017), providing the base of our computational study. These topologies can be classified into two families based on the core topology: networks with a negative feedback loop (NFBL) and networks with an incoherent feed-forward loop (IFFL) (Shi et al., 2017). More specifically, there are 206 network topologies in the NFBL family. All of these NFBL topologies have a negative feedback loop for node B. This negative feedback loop can be formed by the loop from node B to A and back to B (such as the circuit shown in Fig. 1 (a)), by node B to C and back to B, or by a longer route, from node B to A and then to C and back to B. There is always a self-activation link from B to B in all these 206 NFBL networks. There are 219 network topologies in the IFFL family. All of them have two feed-forward pathways from the input node A to the output node C. One pathway goes from node A to C directly, while the other involves node B in the middle. One of the pathways is activating while the other one is inhibitory. We use these 425 network topologies from the study (Shi et al., 2017), avoiding redundancy with established results. Due to the unique focus of our research on the effects of growth feedback and the need to evaluate quantitative ratios of robust circuits among all functional ones, we have chosen to use a 20-fold increase in the number of random parameter sets for each network topology compared to the simulations in (Shi et al., 2017). This approach makes it computationally prohibitive to scan all possible 16,038 three-node circuits. We carefully follow the settings in (Shi et al., 2017), which also analyzed TRNs with the AND logic as in this paper. Detailed descriptions of our simulation experiments are provided in the Methods section. To make our results more convincing, we have adopted a set of adaptation criteria that are stricter than those used in (Shi et al., 2017). Consequently, the ratio of adaptive circuits is somewhat lower in our study, with 4 out of the 425 network topologies not demonstrating adaptation.”

      Comment 3: - In the main text, the authors mentioned that they chose 425 network topologies for this study, whereas the number is 435 in the abstract. Please correct the error.

      The number 435 in our previous abstract referred to the 10 four-node circuits that we studied in the appendix, in addition to the 425 three-node network topologies. To avoid confusion and potential misunderstandings among readers, we have revised this expression of “435 distinct topological structures” to “more than four hundred topological structures”.

      Comment 4: - Please can the authors include the topologies they have studied in an appendix or as supplementary material. The impact of this work would increase significantly if for each topology the authors could include a pie chart similar to the one shown in Fig 2 so that others can use these results.

      We fully acknowledge the potential benefits of providing simulation results for each topology. However, including over four hundred more figures in this paper is not feasible. Moreover, we expect that many readers may also be interested in results not only for individual topologies but also for subsets sharing specific motifs or regulatory connections. Therefore, we have provided all the necessary data and codes in our GitHub repository to make these pie charts. We have included a detailed guide on how to generate these pie charts in the GitHub Readme file. These allow readers to plot the pie chart and extract distributions for any individual topology or use conditions to filter any subset of topologies as required. We believe this approach offers greater flexibility for our readers. We have also added the following explanation in the Methods section.

      “The codes implementing these criteria are available in our GitHub repository, with the link provided in the ”Code Availability” section. The failure type results for all circuits tested are available in our OSF repository, with the link provided in the ”Data Availability” section. An additional note is provided in the README file of our GitHub repository for further guidance on generating pie charts similar to Fig. 2 for any network topology or subset of topologies.”
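
      As a rough illustration of how per-topology distributions could be extracted from such a results table, the sketch below tallies outcome categories for a single topology or a filtered subset. The record format, the outcome labels, and the function name are assumptions made for this example; the actual file layout and plotting scripts are those documented in the authors' README.

```python
from collections import Counter


def failure_fractions(records, topology_filter=None):
    """Return the fraction of circuits in each outcome category.

    records: iterable of (topology_id, outcome) pairs; the labels are hypothetical.
    topology_filter: optional predicate on topology_id selecting a subset.
    """
    counts = Counter(
        outcome
        for topology_id, outcome in records
        if topology_filter is None or topology_filter(topology_id)
    )
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()} if total else {}


# Made-up records for illustration only; they are not results from the paper.
records = [
    (1, "robust"), (1, "type-I"), (1, "type-V"), (1, "robust"),
    (2, "type-I"), (2, "type-I"),
]
single = failure_fractions(records, lambda t: t == 1)
everything = failure_fractions(records)
```

      The resulting dictionary of fractions is the input a pie-chart routine would need, and the predicate argument corresponds to filtering by topology or by any shared motif.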

      Comment 5: - At present, the authors have not given sufficient detail for their numerical methods (e.g. to identify bistability or oscillations) to enable the work to be repeated. I would appreciate it if the authors could expand their Methods section or provide a description of their method as an appendix. Additionally, the authors must clarify how many parameter sets per topology showed successful adaptation.

      In response to this comment, we have reorganized and expanded our Methods section, especially the new “Numerical simulations of circuit dynamics” and “Numerical criteria for functional adaptation and failure types” subsections. We added details on how we define and evaluate a “relatively steady state”, how to determine if there is an oscillation, how to determine the critical k<sub>g</sub> value, and how to determine if a failure is continuous or abrupt. Readers can also find the corresponding codes in our GitHub repository, where we provide a README file to help the readers locate the script file they need.

      The number of parameter sets per topology that showed successful adaptation is precisely our definition of the Q-value. Q-values of most of the circuits we tested are shown in multiple figures in the paper. A complete table of Q-values with different topologies and different k<sub>growth</sub> values can be found in our OSF repository.

      Comment 6: - Looking at the Model Description, there seem to be multiple issues, as follows. The model should be rewritten and all simulations redone with the model corrected as described below:

      (a) The ”strength of growth feedback” is modeled by the maximal growth parameter k<sub>g</sub> in Equation (12). However, this rate does not represent growth feedback. In fact, this parameter must be present also for the system without growth feedback, Equations (6 - 8), because those cells grow as well! So Equation (12) with b(t)=0 should also be added to Equations (6 - 8), in addition to the dilution terms in each equation.

      (b) The dilution due to growth (dN/dt)*(B/N) is only added to Equations (9 - 11). This is wrong - growth affects (dilutes) all protein concentrations, even without growth feedback, so similar terms must be added even to equations without growth feedback, i.e., to Equations (6 - 8).

      (c) The term representing growth feedback is actually the fraction 1/(1+b(t)). To adjust the strength of growth feedback, some parameters should be introduced into this term. Specifically, the term currently has a Hill form with Hill coefficient = 1 and sensitivity = 1. The term should be converted into a general Hill function, and the parameters of that function should be altered to represent growth feedback. This Hill function is called a cellular (phenotypic) fitness landscape, see Nevozhay et al., 2012.

      Equations (6-8) describe only one part of the full model we study. We present them solely to introduce the parameters gradually, so that readers are not overwhelmed by a large number of newly defined parameters at once. They are not used in our simulations; they serve only to explain the meaning of the parameters. All simulations in the paper use Eqs. (9-13) (with various topologies). We have revised the text to make this point clear and have added the following descriptions to the Model Description section:

      “In order not to overwhelm readers with too many terms and parameters, we first describe a partial model (an isolated circuit without growth feedback) before introducing the complete model that we study in this work.”

      “Equations (9) to (13) are the dynamical equations we actually use for simulating the circuit dynamics.”

      Additionally, in the newly added subsection “Numerical simulations of circuit dynamics” in the Methods, we explicitly mention that:

      “The dynamical equations we use are similar to Eqs. (9-13) but with different topologies.”

      We consider dilution due to cell growth as the dominant factor of growth feedback. In fact, we study the adaptive circuits without growth and their ability to maintain their adaptive behaviors after dilution into a fresh medium, based on a recent work [Zhang, et al., Nature Chemical Biology 16.6 (2020): 695-701]. The dynamic roles of the dilution and growth-affected production rate should be analogous, given that they both act as inhibitory factors arising from cell growth. The term mentioned in the comment is about how the burden of the circuit affects cell growth. We agree that it can be interesting to have a more comprehensive study on how different degrees of nonlinearity of this term can have different effects on the overall robustness towards the growth feedback problem, but this is not part of our primary focus and is beyond the scope of this paper. In this paper, we are mostly concerned with the variability of the strength of the growth feedback/dilution, controlled by the parameter k<sub>g</sub>, instead of the different types of nonlinearity.
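
      For readers following point (c) of the comment, the generalization the reviewer proposes can be written compactly. This is only a sketch of the reviewer's suggestion, not a form used in the paper: the symbols K (sensitivity) and n (Hill coefficient) are introduced here for illustration and are assumptions, and the current term is recovered at K = 1, n = 1.

```latex
% Current burden term (Hill coefficient 1, sensitivity 1), as described in the comment:
%   f(b) = 1 / (1 + b(t))
% Generalized Hill form suggested by the reviewer (K and n are hypothetical symbols):
\[
  f\bigl(b(t)\bigr) \;=\; \frac{1}{1 + \left(\dfrac{b(t)}{K}\right)^{n}},
  \qquad
  f\bigl(b(t)\bigr)\Big|_{K=1,\;n=1} \;=\; \frac{1}{1 + b(t)} .
\]
```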

      Comment 7:  - On the right side of Equation (7), the first term should be inhibitory, right?

      This is indeed an error. We accidentally reversed the regulation from A to B and B to A when inputting the formula. We have corrected both terms.

      Comment 8: - It seems to me that a better transition from Figs 6 and 7 to Fig 8 can be made. Did the authors choose the three circuits in Fig 8 based on the three distinct groups shown in Fig 6 and 7? The rationale for choosing the three topologies given the clusters identified earlier can be explained more clearly.

      We agree more explanation can be provided here. We have added the following descriptions, in the caption of Fig.8:

      “The other three curves represent circuits with different robustness levels: high (Circuit No. 98), moderate (Circuit No. 3), and low (Circuit No. 28) values of R, to demonstrate that this scaling behavior is generic. Each of these three circuit topologies is selected from one of the three groups illustrated in Fig. 6 and Fig. 7, and they have the highest Q(k<sub>g</sub> = 0) value within their respective groups.”

      and in the main text:

      “The three other curves represent circuit topologies that have a relatively high, moderate, and low value R among the 425 topologies tested, to demonstrate that this scaling behavior is generic. (These three topologies are the highest Q(k<sub>g</sub> = 0) topology in each of the three groups shown in Fig. 6 and Fig. 7.)”

      Comment 9: - The insights from the neural network model seem to be very limited. It would be interesting to see if the model can predict the performance of network topologies that have not been exposed to the model during training.

      Machine learning is not a focus of this paper. For the section the comment was referring to, the main research question is on the relationship between circuit robustness and topology, and the point we are trying to make is that the robustness dependency varies across different connections — some connections are critical, while others are less impactful. The neural-network-based analysis was only used to provide further support to this point by demonstrating that through optimization, neural networks automatically assign different levels of weights to different connections in the circuits.

      We agree that it would be interesting to study how machine learning can help us design functional and robust circuits, as discussed in the final paragraph of the Discussion section. However, such an investigation would require a series of more comprehensive and carefully designed simulation experiments to validate whether “neural networks can predict the performance of network topologies that have not been exposed to the model during training”. One point requiring extra care is that many of the network topologies we study are very similar to many others, with shared motifs and links. These considerations extend beyond the scope of this paper.

      Other potential improvements or future work

      Comment 10: - The growth feedback examined in this paper comes from the effect of protein levels on the cell division rate (growth rate). However, the opposite effect can also occur; cell growth rates can affect the distribution of protein expression in cell populations. A good reference is Kheir Gouda et al., which is already on the list of references. These opposite effects should be described and discussed.

      We agree that growth feedback is inherently complex and has many biological effects, and in our paper, we are using a simplified model to study the dominant factor of growth feedback. We have added the following paragraph in the Discussion section, which involves the opposite effect mentioned in the comment.

      “In our paper, we consider dilution due to cell growth as the dominant factor of growth feedback. Here we compared the adaptive circuits under no-growth conditions and their ability to maintain their adaptive behaviors after dilution into a fresh medium, which mediated a significant dilution to the circuits. This is based on our previous work (Zhang et al. (2020)). However, growth feedback is inherently complex (Klumpp et al. (2009)). For instance, an increased growth rate can change protein synthesis rate (Hintsche and Klumpp (2013); Scott and Hwa (2023)), and cell growth rates can affect the distribution of protein expression in cell populations (Gouda et al. (2019)). In our paper, we concentrate on a simplified model with dilution, which we consider to have captured the dominant factor. The dynamic roles of the dilution and growth-affected production rate should be analogous, given that they both act as inhibitory factors arising from cell growth. Incorporating the impact of growth rate on protein synthesis into our model would offer a more comprehensive analysis, a task beyond the scope of this paper but presenting an intriguing opportunity for future research to address the complexities of growth feedback.”

      Comment 11: - It may be worth mentioning that growth feedback can lead to persistence, see PMID:27010473.

      We have included this research as a citation.

      Comment 12: - While some other networks (two-node) are discussed, it would be worth doing this analysis for all one- and two-node networks, perhaps controlled by small molecules added externally. If not here, then as a future plan.

      We agree that this is an interesting idea for future studies.

      Comment 13: - The manuscript analyzes the deterministic dynamics of a set of gene networks. However, gene expression is always stochastic, and gene circuits have been designed to control stochastic gene expression. For example, gene expression distributions can be reshaped, or even new peaks can appear, which would be worth mentioning, PMID: 30341217. The effect of growth feedback on stochastic gene expression and future perspectives of systematically studying this should be discussed.

      We have added the following paragraph to the Discussion section to address the effects of noise and stochasticity. The research mentioned in the comment is also cited.

      “Our study focuses on scenarios where random noises are ignored. Realistically, gene circuits are subjected to diverse types of noise, which can complicate their predictable behavior and design. These noises can originate externally from a noisy input signal I, or intrinsically, directly affecting the circuit components. Further, these noises can be classified based on various mechanisms that cause them (Colin et al. (2017); Sartori and Tu (2011)). And with different mechanisms, each type of noise can be characterized by different attributes such as frequency, amplitude, and noise color. These variances can lead to different impacts on the circuits, potentially necessitating unique mechanisms or designs for the attenuation of each category (Sartori and Tu (2011); Qiao et al. (2019)). Given the extensive complexity and the need for thorough investigation, these noise-related challenges are beyond the scope of this paper and require a series of future studies.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, the authors present a cornucopia of data generated using deep mutational scanning (DMS) of variants in MET kinase, a protein target implicated in many different forms of cancer. The authors conducted a heroic amount of deep mutational scanning, using computational structural models to augment the interpretation of their DMS findings.

      Strengths:

      This powerful combination of computational models, experimental structures in the literature, dose-response curves, and DMS enables them to identify resistance and sensitizing mutations in the MET kinase domain, as well as consider inhibitors in the context of the clinically relevant exon-14 deletion. They then try to use the existing language model ESM1b augmented by an XGBoost regressor to identify key biophysical drivers of fitness. The authors provide an incredible study that has a treasure trove of data on a clinically relevant target that will appeal to many.

      We thank Reviewer 1 for their generous assessment of our manuscript!

      Weaknesses:

      However, the authors do not equally consider alternative possible mechanisms of resistance or sensitivity beyond the impact of mutation on binding, even though the measure used to discuss resistance and sensitivity is ultimately a resistance score derived from the increase or decrease of the presence of a variant during cell growth.

      For this resistance screen, Ba/F3 was a carefully chosen cellular selection system due to its addiction to exogenously provided IL-3, undetected expression of endogenous RTKs (including MET), and dependence on kinase transgenes to promote signaling and growth under IL-3 withdrawal. Together this allows for the readout of variants that alter kinase-driven proliferation without the caveat of bypass resistance. In our previous phenotypic screen (Estevam et al., 2024, eLife), we also carefully examined the impact of all possible MET kinase domain mutations both in the presence and absence of IL-3 withdrawal, but no inhibitors. There, we identified a small group of mutations that were associated with gain-of-function behavior located at conserved regulatory motifs outside of the catalytic site, yet these mutations were largely sensitive to inhibitors within this screen.

      Here, the majority of resistance mutations were located at or near the ATP-binding pocket, suggesting an impact on resistance through direct drug interactions. However, there was also a small population of distal mutations that met our statistical definitions of resistance. Within the crizotinib selection, sites such as T1293, L1272, T1261, amongst others, demonstrated resistance profiles but were located in C-lobe away from the catalytic site. While we did not experimentally validate these specific mutations, it is possible that non-direct drug binders instead promote resistance through allosteric or conformational mechanisms which preserve kinase activity and signaling. Indeed, our ML framework explicitly included conformational and stability effects as significant in improving predictions.

      We would be happy to further discuss any specific alternative resistance mechanisms Reviewer 1 has in mind! Thank you for highlighting this!

      There are also points of discussion and interpretation that rely heavily on docked models of kinase-inhibitor pairs without considering alternative binding modes or providing any validation of the docked pose. Lastly, the use of ESM1b is powerful but constrained heavily by the limited structural training data provided, which can lead to misleading interpretations without considering alternative conformations or poses.

      The majority of our interpretations are grounded in the X-ray structures of WT MET bound to the inhibitors studied (or close analogs). The use of docked models (note - to mutant structures predicted by UMol, not ESM, that can have conformational changes) is primarily in the ML part of the manuscript. Indeed, in our models, conformational and binding mode changes are taken into account as features (see Ligand RMSD, Residue RMSD). There are certainly improved methods (AF3 variants) emerging that might have even more power to model these changes, but they come with greater computational costs and are something we will be evaluating in the future.

      We added to the results section: “While our features can account for some changes in MET-mutant conformation and altered inhibitor binding pose, the prediction of these aspects can likely be improved with new methods.”

      Reviewer #2 (Public review):

      Summary:

      This manuscript provides a comprehensive overview of potential resistance mutations within MET Receptor Tyrosine Kinase and defines how specific mutations affect different inhibitors and modes of target engagement. The goal is to identify inhibitor combinations with the lowest overlap in their sensitivity to resistant mutations and determine if certain resistance mutations/mechanisms are more prevalent for specific modes of ATP-binding site engagement. To achieve this, the authors measured the ability of ~6000 single mutants of MET's kinase domain (in the context of a cytosolic TPR fusion) to drive IL-3-independent proliferation (used as a proxy for activity) of Ba/F3 cells (deep mutational profiling) in the presence of 11 different inhibitors. The authors then used co-crystal and docked structures of inhibitor-bound MET complexes to define the mechanistic basis of resistance and applied a protein language model to develop a predictive model of inhibitor sensitivity/resistance.

      Strengths:

      The major strengths of this manuscript are the comprehensive nature of the study and the rigorous methods used to measure the sensitivity of ~6000 MET mutants in a pooled format. The dataset generated will be a valuable resource for researchers interested in understanding kinase inhibitor sensitivity and, more broadly, small molecule ligand/protein interactions. The structural analyses are systematic and comprehensive, providing interesting insights into resistance mechanisms. Furthermore, the use of machine learning to define inhibitor-specific fitness landscapes is a valuable addition to the narrative. Although the ESM1b protein language model is only moderately successful in identifying the underlying mechanistic basis of resistance, the authors' attempt to integrate systematic sequence/function datasets with machine learning serves as a foundation for future efforts.

      We thank Reviewer 2 for their thoughtful assessment of our manuscript!

      Weaknesses:

      The main limitation of this study is that the authors' efforts to define general mechanisms between inhibitor classes were only moderately successful due to the challenge of uncoupling inhibitor-specific interaction effects from more general mechanisms related to the mode of ATP-binding site engagement. However, this is a minor limitation that only minimally detracts from the impressive overall scope of the study.

      We agree. We have added to the discussion: “A full landscape of mutational effects can help to predict drug response and guide small molecule design to counteract acquired resistance. The ability to define molecular mechanisms towards that goal will likely require more purposefully chosen chemical inhibitors and combinatorial mutational libraries to be maximally informative.”

      Reviewer #3 (Public review):

      Summary:

      In the manuscript 'Mapping kinase domain resistance mechanisms for the MET receptor tyrosine kinase via deep mutational scanning' by Estevam et al, deep mutational scanning is used to assess the impact of ~5,764 mutants in the MET kinase domain on the binding of 11 inhibitors. Analyses were divided by individual inhibitor and kinase inhibitor subtypes (I, II, I 1/2, and III). While a number of mutants were consistent with previous clinical reports, novel potential resistance mutants were also described. This study has implications for the development of combination therapies, namely which combination of inhibitors to avoid based on overlapping resistance mutant profiles. While one pair of inhibitors with the least overlapping resistance mutation profiles was suggested, this manuscript presents a proof of concept toward a more systematic approach for improved selection of combination therapeutics. Furthermore, in a final part of this manuscript the data was used to train a machine learning model, the ESM-1b protein language model augmented with an XGBoost Regressor framework, and the authors found that they could improve predictions of resistance mutations above the initial ESM-1b model.

      Strengths:

      Overall this paper is a tour-de-force of data collection and analysis to establish a more systematic approach for the design of combination therapies, especially in targeting MET and other kinases, a family of proteins significant to therapeutic intervention for a variety of diseases. The presentation of the work is mostly concise and clear with thousands of data points presented neatly and clearly. The discovery of novel resistance mutants for individual MET inhibitors, kinase inhibitor subtypes within the context of MET, and all resistance mutants across inhibitor subtypes for MET has clinical relevance. However, probably the most promising outcome of this paper is the proposal of the inhibitor combination of Crizotinib and Cabozantinib as Type I and Type II inhibitors, respectively, with the least overlapping resistance mutation profiles and therefore potentially the most successful combination therapy for MET. While this specific combination is not necessarily the point, it illustrates a compelling systematic approach for deciding how to proceed in developing combination therapy schedules for kinases. In an insightful final section of this paper, the authors approach using their data to train a machine learning model, perhaps understanding that performing these experiments for every kinase for every inhibitor could be prohibitive to applying this method in practice.

      We thank Reviewer 3 for their assessment of our manuscript (we are very happy to have it described as a tour-de-force!)

      Weaknesses:

      This paper presents a clear set of experiments with a compelling justification. The content of the paper is overall of high quality. Below are mostly regarding clarifications in presentation.

      Two places could use more computational experiments and analysis, however. Both are presented as suggestions, but at least a discussion of these topics would improve the overall relevance of this work. In the first case it seems that while the analyses conducted on this dataset were chosen with care to be the most relevant to human health, further analyses of these results and their implications for our understanding of allosteric interactions and their effects on inhibitor binding would be a relevant addition. For example, for any given residue type found to be a resistance mutant, are there consistent amino acid mutations for which a large or small effect is found? For example, is a mutation from alanine to phenylalanine always deleterious, even though one can assume the exact location of a residue matters significantly? Some of this analysis is done in dividing resistance mutants by those that are near the inhibitor binding site and those that aren't, but more of these types of analyses could help the reader understand the large amount of data presented here. A mention at least of the existing literature in this area and the lack or presence of trends would be worthwhile. For example, is there any correlation with a simpler metric like the Grantham score to predict effects of mutations (in a way the ESM-1b model is a better version of this, so this is somewhat implicitly discussed)?

      Indeed we experimented with including these types of features in the XGBoost scheme (particularly residue volume change and distance) to augment the predictive power of the ESM model - see Figure 8 - figure supplement 1; however, we didn’t find them as significant. Therefore, the signal is likely very small and/or incorporated into the baseline ESM model.

      Indeed, this discussion relates to the second point this manuscript could improve upon: the machine learning section. The main actionable item here is that this results section seems the least polished and could do a better job describing what was done. In the figure it looks like results for certain inhibitors were held out as test data - was this all mutants for a single inhibitor, or some other scheme? Overall I think the implications of this section could be fleshed out, potentially with more experiments.

      Figure 8A and the methods section contain a very detailed explanation of test data. We have thought about it and do not have any easy path to improve the description, which we reproduce here:

      “Experimental fitness scores of MET variants in the presence of DMSO and AMG458 were ignored in model training and testing since having just one set of data for a type I ½ inhibitor and DMSO leads to learning by simply memorizing the inhibitor type, without generalizability. The remaining dataset was split into training and test sets to further avoid overfitting (Figure 8A). The following data points were held out for testing - (a) all mutations in the presence of one type I (crizotinib) and one type II (glesatinib analog) inhibitor, (b) 20% of randomly chosen positions (columns) and (c) all mutations in two randomly selected amino acids (rows) (e.g. all mutations to Phe, Ser). After splitting the dataset into train and test sets, the train set was used for XGBoost hyperparameter tuning and cross-validation. For tuning the hyperparameters of each of the XGBoost models, we held out 20% of randomly sampled data points in the training set and used the remaining 80% data for Bayesian hyperparameter optimization of the models with Optuna (Akiba et al., 2019), with an objective to minimize the mean squared error between the fitness predictions on 20% held out split and the corresponding experimental fitness scores. The following hyperparameters were sampled and tuned: type of booster (booster - gbtree or dart), maximum tree depth (max_depth), number of trees (n_estimators), learning rate (eta), minimum leaf split loss (gamma), subsample ratio of columns when constructing each tree (colsample_bytree), L1 and L2 regularization terms (alpha and beta) and tree growth policy (grow_policy - depthwise or lossguide). After identifying the best combination of hyperparameters for each of the models, we performed 10-fold cross validation (with re-sampling) of the models on the full training set. The training set consists of data points corresponding to 230 positions and 18 amino acids. We split these into 10 parts such that each part corresponds to data from 23 positions and 2 amino acids. Then, at each of 10 iterations of cross-validation, models were trained on 9 of 10 parts (207 positions and 16 amino acids) and evaluated on the 1 held out part (23 positions and 2 amino acids). Through this protocol we ensure that we evaluate performance of the models with different subsets of positions and amino acids. The average Pearson correlation and mean squared error of the models from these 10 iterations were calculated and the best performing model out of 8192 models was chosen as the one with the highest cross-validation correlation. The final XGBoost models were obtained by training on the full training set and also used to obtain the fitness score predictions for the validation and test sets. These predictions were used to calculate the inhibitor-wise correlations shown in Figure 8B.”
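
      The blocked cross-validation described in the quoted Methods text can be illustrated with a small sketch. This is not the authors' code: the function name, the placeholder amino-acid letters, and in particular the way the two held-out amino acids are drawn for each fold (an independent random draw per fold, our reading of "with re-sampling") are assumptions made for illustration only.

```python
import random


def blocked_cv_folds(positions, amino_acids, n_folds=10, aa_per_fold=2, seed=0):
    """Build (train, test) lists of (position, amino_acid) cells for each fold.

    Positions are partitioned into n_folds disjoint blocks. For each fold,
    aa_per_fold amino acids are drawn at random (assumption: drawn
    independently per fold, so a letter may be reused across folds).
    Training cells avoid both the held-out positions and the held-out
    amino acids; test cells lie in both.
    """
    rng = random.Random(seed)
    shuffled = list(positions)
    rng.shuffle(shuffled)
    block = len(shuffled) // n_folds  # any remainder positions are never held out
    folds = []
    for i in range(n_folds):
        held_pos = set(shuffled[i * block:(i + 1) * block])
        held_aa = set(rng.sample(list(amino_acids), aa_per_fold))
        train = [(p, a) for p in positions for a in amino_acids
                 if p not in held_pos and a not in held_aa]
        test = [(p, a) for p in positions for a in amino_acids
                if p in held_pos and a in held_aa]
        folds.append((train, test))
    return folds


# Sizes taken from the quoted text: 230 positions and 18 amino acids.
positions = list(range(230))
amino_acids = list("ACDEFGHIKLMNPQRSTV")  # 18 placeholder letters, hypothetical
folds = blocked_cv_folds(positions, amino_acids)
```

      With these sizes, each fold trains on 207 positions by 16 amino acids and evaluates on 23 positions by 2 amino acids, matching the counts given in the quoted text.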

      As mentioned in the 'Strengths' section, one of the appealing aspects of this paper is indeed its potential wide applicability across kinases -- could you use this ML model to predict resistance mutants for an entirely different kinase? This doesn't seem far-fetched, and would be an extremely compelling addition to this paper to prove the value of this approach.

      This is exactly where we want to go next! But as we see here, it is going to be hard and will require more purposeful selection of chemicals, and likely combinatorial mutations, to be maximally informative (see also our response to Reviewer 2, where we have added text).

      Another area in which this paper could improve its clarity is in the description of caveats of the assay. The exact math used to define resistance mutants and its dependence on the DMSO control is interesting, it is worth discussing where the failure modes of this procedure might be. Could it be that the resistance mutants identified in this assay would differ significantly from those found in patients? That results here are consistent with those seen in the clinic is promising, but discrepancies could remain.

      Thank you for pointing this out. The greatest trade-off of probing the intracellular MET kinase (juxtamembrane, kinase domain, c-tail) in the constitutively active TPR system is that while we gain cytoplasmic expression, constitutive oligomerization, and HGF-independent activation, other features like membrane-proximal effects are lost and translatability of some mutations in non-proliferative conditions may also be limited. Nevertheless, Ba/F3 allows IL-3 withdrawal to serve as an effective readout of transgenic kinase variant effects due to its undetectable expression of endogenous RTKs and addiction to exogenous interleukin-3 (IL-3).

      In our previous study, we were also interested in comparing the phenotypic results to available patient populations in cBioPortal. We observed that our DMS captured known oncogenic MET kinase variants, in addition to a population of gain-of-function variants within clinical residue positions that have not been clinically reported. Interestingly, the population of possible novel gain-of-function mutant codons were more distant in genetic space (2-3 Hamming distance) from wild type than the clinically reported variant codon (1-2 Hamming distance).
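
      The notion of codon distance used above can be made concrete with a short sketch. The helper names and the example codons are hypothetical and are not taken from the MET sequence or from the authors' analysis code; the only facts assumed are the standard genetic code assignments CAT (His), TAT and TAC (Tyr).

```python
def codon_hamming(codon_a, codon_b):
    """Number of nucleotide positions at which two codons differ (0 to 3)."""
    if len(codon_a) != 3 or len(codon_b) != 3:
        raise ValueError("codons must be exactly three nucleotides long")
    return sum(x != y for x, y in zip(codon_a.upper(), codon_b.upper()))


def min_distance_to_amino_acid(wt_codon, target_codons):
    """Fewest nucleotide changes needed to reach any codon of a target amino acid."""
    return min(codon_hamming(wt_codon, codon) for codon in target_codons)


# Hypothetical example: a His codon (CAT) mutating to Tyr (TAT or TAC).
his_codon = "CAT"
tyr_codons = ["TAT", "TAC"]
distance = min_distance_to_amino_acid(his_codon, tyr_codons)
```

      A variant reachable at distance 1 needs a single nucleotide substitution, whereas distance 2 or 3 requires multiple simultaneous changes within one codon, which is one plausible reason such variants are seen less often in patients.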

      For this inhibitor screen, we also carefully compared previously reported and validated resistance mutations across referenced publications to those of our inhibitor screen, and observed large agreement, as noted in the text. While discrepancies could definitely remain, there is precedent for consistency.

      Furthermore a more in depth discussion of the MetdelEx14 results is warranted. For example, why is the DMSO signature in Figure 1 - supplement 4 so different from that of Figure 1?

      In our previous study (Estevam et al., 2024), we more directly compared MET and METΔExon14, and while we observed several differences, especially at conserved regulatory motifs, the TPR expression system did not provide a robust differential. Therefore, we hypothesize that a membrane-bound context is likely necessary to obtain a differential that captures juxtamembrane regulatory effects for these two isoforms. For that reason, we did not place heavy emphasis on the differences between MET and METΔExon14 in this study. Nevertheless, we performed parallel analysis of the METΔExon14 inhibitor DMS and provided all source and analyzed data in our GitHub repository (https://github.com/fraser-lab/MET_kinase_Inhibitor_DMS).

      In our analysis of resistance, we used Rosace to score and compare DMSO and inhibitor landscapes. We present the full distribution of raw scores in Figure 1 for each condition. However, to visually highlight resistance mutations as a heatmap, we subtracted the scores of each variant in each inhibitor condition from the raw DMSO score, making the heatmaps in Figure 1 - supplement 4 appear more “blue.”

      And finally, there is a lot of emphasis put on the unexpected results of this assay for the tivantinib "type III" inhibitor - could this in fact be because the molecule "is highly selective for the inactive or unphosphorylated form of c-Met" according to Eathiraj et al JBC 2011?

      The work presented by Eathiraj et al JBC 2011 is a key study we reference and is foundational to tivantinib. While the point brought up about tivantinib’s selective preference for an inactive conformation is valid, this is also true for type II kinase inhibitors. In our study, regardless of inhibitor conformational preference, tivantinib was the only one with a nearly identical landscape to DMSO and exhibited selection even in the absence of Ba/F3 MET-addiction (Figure 1E). This result is in closer agreement with MET agnostic behavior reported by Basilico et al., 2013 and Katayama et al., 2013.

      While this paper is crisply written with beautiful figures, the complexity of the data warrants a bit more clarity in how the results are visualized. Namely, clearly highlighting mutants that have been previously reported and those identified by this study across all figures could help significantly in understanding the more novel findings of the work.

      To better compare and contrast the novel mutations identified in this study with others, we compiled a list of reported resistance mutations from recent clinical and experimental studies (Pecci et al., 2024; Yao et al., 2023; Bahcall et al., 2022; Recondo et al., 2020; Rotow et al., 2020; Fujino et al., 2019), since, to the best of our knowledge, a direct database with resistance annotations does not exist for MET. In total, this amounted to 31 annotated resistance mutations across crizotinib, capmatinib, tepotinib, savolitinib, cabozantinib, merestinib, and glesatinib, which we have now tabulated in a new figure (Figure 4) with accompanying commentary in the main text:

      To assess the agreement between our DMS and previously annotated resistance mutations, we compiled a list of reported resistance mutations from recent clinical and experimental studies (Pecci et al., 2024; Yao et al., 2023; Bahcall et al., 2022; Recondo et al., 2020; Rotow et al., 2020; Fujino et al., 2019) (Figure 4A,B). Overall, previously discovered mutations are strongly shifted to a GOF distribution for the drugs where resistance is reported from treatment or experiment; in contrast, the distribution is centered around neutral for those sites for other drugs not reported in the literature (Figure 4C). However, even in cases such as L1195V, we observe GOF DMS scores indicative of resistance to previously reported inhibitors. Given this overall strong concordance with prior literature and clinical results, we can also provide hypotheses to clarify the role of mutations that are observed in combination with others. For example, H1094Y is a reported driver mutation that has been linked to resistance in METΔEx14 for glesatinib with either the secondary L1195V mutation or in isolation (Recondo et al., 2020). However, in our assay H1094Y demonstrated slight sensitivity to glesatinib, suggesting that resistance is linked to either the exon 14 deletion isoform, the L1195V mutation, or a cellular factor not modeled well by the Ba/F3 system.

      Finally, the potential impacts and follow-ups of this excellent study could be communicated better - it is recommended that they advertise better this paper as a resource for the community both as a dataset and as a proof of concept. In this realm I would encourage the authors to emphasize the multiple potential uses of this dataset by others to provide answers and insights on a variety of problems.

      Please see below

      Related to this, the decision to include the MetdelEx14 results, but not discuss them at all is interesting, do the authors expect future analyses to lead to useful insights? Is it surprising that trends are broadly the same to the data discussed?

      Our previous paper suggests that Ba/F3 isn’t a great model for measuring the differences between MET and METΔEx14, so we haven’t emphasized these results other than to point to our previous paper. We include the full analysis here nonetheless as a resource. The greatest differences between resistance mutant behaviors would potentially be observed in the full-length, membrane-bound MET and METΔEx14 receptor isoforms. While outside of the scope of this study, there is great potential to use the resistance mutations identified in this study as a filtered group to test and map differential inhibitor sensitivities between receptor isoforms.

      And finally it could be valuable to have a small addition of introspection from the authors on how this approach could be altered and/or improved in the future to facilitate the general application of this approach for combination therapies for other targets.

      See also reviewer 2 response where we have added text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major points of revision:

      (1) It seems like much of the structural interpretation of the inhibitor binding mode, outside of crizotinib binding, appears to come from docked models of the inhibitor to the MET kinase domain. Given the potential variability of the docked structure to the kinase domain, it would be useful for the authors to consider alternative possible binding modes that their docking pipeline may have suggested. It could also be useful to provide some degree of validation or contextualization of their docking models.

      All individual figures were very carefully inspected against existing crystal structures of either the inhibitor itself or closely related inhibitors (ATP, 3DKC; crizotinib, 2WGJ; tepotinib, 4R1V; tivantinib, 3RHK; AMG-458, 5T3Q; NVP-BVU972, 3QTI; merestinib, 4EEV; savolitinib, 6SDE). In total, four structural interpretations were the result of docking onto reference experimental structures (capmatinib, cabozantinib, glumetinib, glesatinib). As we wrote above, different conformations and binding modes are possible in predicted mutant structures (which we generated here at scale) and are already included in the ML analysis.

      (2) In the first section, the authors classify an inhibitor as Type Ia based on docking models, but mention the conflicting literature describing it as type Ib - it would be helpful to provide a contextualization of why this distinction between Ia and Ib matters, and what difference it might make. It would also be useful to know if their docking score only suggested poses compatible with Ia or if other poses were provided as well. Validation using another method might be beneficial, especially since they acknowledge the conflicting literature for classification. Or at least recontextualization that more evidence would be needed.

      Kinase inhibitors have several canonical structural definitions, which we use as the basis for the classifications in this study. Specifically, type I inhibitors are classified in MET by interactions with Y1230, D1228, and K1110, in addition to their conformation in the ATP-binding site. Type I inhibitors are further subdivided into type Ia in MET if they leverage interactions with the solvent front and residue G1163. In the prior literature referenced, tepotinib was classified as type Ib, which would imply it does not have solvent front interactions, like savolitinib (PDB 6SDE) or NVP-BVU972 (PDB 3QTI). However, in the tepotinib experimental structure (PDB 4R1V), we observed a greater structural resemblance to other type Ia inhibitors than to type Ib inhibitors (Figure 1 - figure supplement 1b).

      (3) The measure used to discuss resistance and sensitivity is ultimately a resistance score derived from the increase or decrease of the presence of a variant during cell growth. This is not a measure of direct binding. It would be helpful if the authors discussed alternative mechanisms through which these variants may impact resistance and/or sensitivity, such as stability, protonation effects, or kinase activity. The score itself may be convolving over all these potential mechanisms to drive GOF and LOF observed behavior.

      See the response to the public review. Indeed, our ML framework explicitly included conformational and stability effects as significant in improving predictions.

      (4) While it is promising to try and improve the predictive properties of ESM1b, it is not exactly clear why the authors considered their structural data of 11 inhibitors a sufficient dataset with which to augment the model. It would be useful for the authors to provide some additional context for why they wished to augment ESM1b in particular with their dataset, and provide any metrics indicating that their training data of 11 inhibitors provided an adequate statistical sample.

      We don’t understand what this means. Sorry!

      (5) The authors use ESM-1b to predict the fitness impact of each mutation and augment it using protein structural data of drug-target interactions. However, using an XGBoost regressor on a single set of 11 kinase-inhibitor interaction pairs is an incredibly sparse dataset to train upon. It would be useful for the authors to consider the limitations of their model, as well as its extensibility in the context of alternate binding poses, alternate conformations, or changes in protonation states of ligand or inhibitor.

      On the contrary - this is 11 chemicals across 3000 mutations. We have discussed alternative interpretations above.

      Minor points:

      (1) It would also be useful for the authors to provide more context around their choice of regressor. XGBoost is a powerful regressor but can easily overfit high dimensional data when paired with language models such as ESM-1b. This would be particularly useful since some of the features to train on were also generated using existing models such as ThermoMPNN.

      Yes - we are quite concerned about overfitting and have tried to assess overfitting by careful design of test and validation sets.

      (2) The authors also mention excluding their DMSO and AMG458 scores in the model training and testing due to overfitting issues - it would be useful to have an SI figure pointing to this data.

      No - we exclude the DMSO because that is the reference (baseline) and AMG because it has a different binding mode. This isn’t related to overfitting.

      (3) The authors mention in their docking pipeline that 5 binding modes were used for each ligand docking, but it appears that only one binding mode is considered in the main figures. It would be useful for the authors to provide additional details about what were the other binding modes used for, how different were each binding mode, and how was the "primary" mode selected (and how much better was its score than the others).

      The reviewer misinterprets the difference between poses shown in figures, based on mostly crystal structures or carefully selected templates, and the use of docked models in feature engineering for the ML part of the study. Where existing crystal structures do not exist, we performed docking for capmatinib, cabozantinib, glumetinib, glesatinib onto reference structures bound to type I (2WGJ) and type II (4EEV) inhibitors. We selected one representative binding mode based on the reference inhibitor, and while not exact, at a minimum these models provide a basis for structural interpretation.

      Reviewer #2 (Recommendations for the authors):

      My main suggestion is for the authors to add a few sentences (in non-technical language) to the results section, specifically before the results shown in Figure 3, defining gain-of-function, loss-of-function, resistance, and sensitivity. While these definitions are present in the materials and methods section, explicitly discussing them prior to the relevant results would significantly improve the overall readability of the manuscript.

      We defined “gain-of-function” and “loss-of-function” mutations as those with fitness scores statistically greater or lower than wild-type. Within the DMSO condition, gain-of-function and loss-of-function labels describe mutational perturbation to protein function, whereas within inhibitor conditions, the labels describe the difference in fitness introduced by an inhibitor.

      We have also clarified these definitions where the terms are first introduced: “As expected, the DMSO control population displayed a bimodal distribution with mutations exhibiting wild-type fitness centered around 0, with a wider distribution of mutations that exhibited loss- or gain-of-function effects, as defined by fitness scores with statistically significant lower or greater scores than wild-type, respectively.”
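
      The labeling rule described above can be sketched in a few lines. This is only an illustration, not the scoring pipeline used in the study: the function name, the use of a standard error with a normal approximation, and the 0.05 threshold are all assumptions made for the example, since the response does not specify the statistical test.

```python
import math


def classify_variant(score, std_err, wt_score=0.0, alpha=0.05):
    """Label a variant relative to wild-type fitness.

    Hypothetical helper: the two-sided normal test and the alpha cutoff
    are illustrative assumptions, not the method used in the study.
    """
    if std_err <= 0:
        raise ValueError("std_err must be positive")
    # Standardized distance of the variant score from the wild-type score.
    z = (score - wt_score) / std_err
    # Two-sided p-value under a standard normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    if p_value >= alpha:
        return "neutral"
    return "gain-of-function" if z > 0 else "loss-of-function"
```

      Under these assumptions, a variant is labeled only when its score differs significantly from wild-type; the same rule applies within the DMSO condition and within each inhibitor condition, and only the interpretation of the label changes.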

      Figure 7D. Please add a bit more detail to the legend on how fold change (y-axis) was calculated.

      Here, fold change represents the number of viable cells at each inhibitor concentration relative to the TKI control, measured with the CellTiter-Glo® Luminescent Cell Viability Assay (Promega) as an end point readout. We have updated the legend of Figure 7D with calculation details: “Dose-response for each inhibitor concentration is represented as the fraction of viable cells relative to the TKI free control.”
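
      As a minimal sketch of this calculation, the fraction of viable cells is the end-point signal at a given inhibitor concentration divided by the signal of the TKI-free control. The function names and the luminescence values below are hypothetical and are not data from the study.

```python
def fraction_viable(signal, control_signal):
    """Viability at one inhibitor concentration relative to the TKI-free control."""
    if control_signal <= 0:
        raise ValueError("control_signal must be positive")
    return signal / control_signal


def dose_response(signals_by_conc, control_signal):
    """Map each inhibitor concentration to its fraction of viable cells."""
    return {
        conc: fraction_viable(signal, control_signal)
        for conc, signal in signals_by_conc.items()
    }


# Hypothetical luminescence readings (arbitrary units), keyed by concentration.
example = dose_response(
    {10.0: 800.0, 100.0: 400.0, 1000.0: 100.0},
    control_signal=1000.0,
)
```

      With this definition, a value of 1 means no change relative to the untreated control, and values below 1 indicate reduced viability at that concentration.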

      I must admit, I did not understand what "Specific inhibitor fitness landscapes also aid in identifying mutations with potential drug sensitivity, such as R1086 and C1091 in the MET P-loop" means. These are positions where most mutations lead to greater sensitivity to crizotinib. Is the idea that there are potentially clinically-relevant MET mutations that can be targeted over wild type with crizotinib?

      Thank you for highlighting this! The P-loop (phosphate-binding loop) is a glycine-rich structural motif conserved in kinase domains. This motif is located in the N-lobe, where its primary role is to gate ATP entry into the active site and stabilize the phosphate groups of ATP when bound. Therefore, the P-loop is a common target region for ATP-competitive inhibitor design, but also a site where resistance can emerge (Roumiantsev et al., 2002). The idea we’d like to convey is that identifying residues that offer the potential for drug stabilization, with the added benefit of a lower risk of resistance, is an attractive consideration for novel inhibitor design.

      We have added to the text: “Individual inhibitor resistance landscapes also aid in identifying target residues for novel drug design by providing insights into mutability and known resistance cases. This enables the selection of vectors for chemical elaboration with potential lower risk of resistance development. Sites with mutational profiles such as R1086 and C1091, located in the common drug target P-loop of MET, could be likely candidates for crizotinib.”

      Reviewer #3 (Recommendations for the authors):

      (1) Suggested Improvements to the Figures:

      a)  Figure 4A - T1261 seems to be mislabeled

      b)  In Figure 3A it's suggested to highlight mutants determined to be resistance mutants by this scheme.

      c)  In Figure 3D it would be informative to highlight which of these resistance mutants have already been previously reported and which are novel to this study

      d)  Throughout figures 3A, 3D, and 4G the graphical choices on how to highlight synonymous mutations and mutations not performed in the assay need improvement.

      The Green vs Grey 'TRUE' vs 'FALSE' boxes are confusing. Just a green box indicating synonymous mutations would be sufficient. Additionally these green boxes are hard to see, and often edges of this green box are currently missing making it even more difficult to see and interpret.

      * In Figure 4A mutants do not seem to be indicated by a line or plus sign, but this is not explained in the legend or the caption. Please add.

      * In 3D and 4G it is not clear if the mutants not performed are indicated at all - perhaps they are indicated in white, making them indistinguishable from scores with 0. Please clarify.

      T1261 and G1242 are now correctly labeled.

      In text we have also highlighted reported resistance mutations for crizotinib, which are inclusive of clinical reports and in vitro characterization: “These sites, and many of the individual mutations, have been noted in prior reports, such as: D1228N/H/V/Y, Y1230C/H/N/S, G1163R.”

      We have adjusted the heatmaps to improve visual clarity. Mutations with score 0 are white, as indicated by the scale bar, and mutations uncaptured by the screen are now in light yellow. The green outline distinguishing WT synonymous mutations has also been adjusted so edges are no longer cut off. In our representations, we only distinguished mutations by the score color scale bar and WT outline. What looked like a “plus” or “line” in the original figure was only the heatmap background, which should now be resolved in the updated figures and legends for Figure 3 and Figure 4.

      (2) Some Minor Suggested Improvements to the Text:

      a)  The abbreviation CBL for 'CBL docking site' is used without being defined.

      b)  Figure 3G is referenced, but it does not exist.

      c)  In the sentence 'Beyond these well characterized sites, regions with sensitivity occurred throughout the kinase, primarily in loop-regions which have the greatest mutational tolerance in DMSO, but do not provide a growth advantage in the presence of an inhibitor (Figure 1 - Figure Supplement 1; Figure 1 - Figure Supplement 2).'. It is not clear why these supplemental figures are being referenced.

      d)  In the supplement section 'Enrich2 Scoring' has what seem like placeholders for citations in [brackets]

      Cbl is an E3 ubiquitin ligase that plays a role in MET regulation through engagement with exon 14, specifically at Y1003 when phosphorylated. This mode of regulation was highlighted more in our previous study. However, since Cbl was only mentioned briefly in this study, we have removed reference to it to simplify the text.

      In addition, we have removed the figure 3G reference and corrected the in-text range. We have also removed references to figure supplements where unnecessary and edited the “Enrich2 scoring” method section to now reference missing citations.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary: 

      In organisms with open mitosis, nuclear envelope breakdown at mitotic entry and re‐assembly of the nuclear envelope at the end of mitosis are important, highly regulated processes. One key regulator of nuclear envelope re‐assembly is the BAF (Barrier‐to‐Autointegration) protein, which contributes to cross‐linking of chromosomes to the nuclear envelope. Crucially, BAF has to be in a dephosphorylated form to carry out this function, and PP2A has been shown to be the phosphatase that dephosphorylates BAF. The Ankle2/LEM4 protein has previously been identified as an important regulator of PP2A in the dephosphorylation of BAF but its precise function is not fully understood, and Li and colleagues set out to investigate the function of Ankle2/LEM4 in both Drosophila flies and Drosophila cell lines.

      Strengths: 

      The authors use a combination of biochemical and imaging techniques to understand the biology of Ankle2/LEM4. On the whole, the experiments are well conducted and the results look convincing. A particular strength of this manuscript is that the authors are able to study both cellular phenotypes and organismal effects of their mutants by studying both Drosophila D‐mel cells and whole flies.

      The work presented in this manuscript significantly enhances our understanding of how Ankle2/LEM4 supports BAF dephosphorylation at the end of mitosis. Particularly interesting is the finding that Ankle2/LEM4 appears to be a bona fide PP2A regulatory protein in Drosophila, as well as the localisation of Ankle2/LEM4 and how this is influenced by the interaction between Ankle2 and the ER protein Vap33. It would be interesting to see, though, whether these insights are conserved in mammalian cells, e.g. does mammalian Vap33 also interact with LEM4? Is LEM4 also a part of the PP2A holoenzyme complex in mammalian cells? 

      We feel that conducting experiments to test the level of conservation of our findings in mammalian cells is outside the scope of our study, and we will leave it for other labs to investigate.

      Weaknesses: 

      This work is certainly impactful but more discussion and comparison of the Drosophila versus mammalian cell system would be helpful. Also, to attract the largest possible readership, the Ankle2 protein should be referred to as Ankle2/LEM4 throughout the paper to make it clear that this is the same molecule. 

      We have reinforced our presentation and discussion of similarities and differences between Ankle2 from Drosophila vs humans where relevant throughout the Introduction and Discussion sections. Additionally, we have added the mention that Ankle2 is also called LEM4 in humans in the Abstract and Introduction. However, when referring to Drosophila Ankle2, we do not use LEM4 because it is not listed as an alternate name for this gene/protein in FlyBase.

      A schematic model at the end of the final figure would be very useful to summarise the findings.

      We have already provided a schematic model in Figure S3, where we think it is better placed.

      Reviewer #2 (Public review):

      The authors first identify Ankle2 as a regulatory subunit and direct interactor of PP2A, showing they interact both in vitro and in vivo to promote BAF dephosphorylation. The Ankyrin domain of Ankle2 is important for the interaction with PP2A. They then show Ankle2 also interacts with the ER protein Vap33 through FFAT motifs and they particularly co‐localize during mitosis. The recruitment of Ankle2 to Vap33 is essential for its localization to the ER and nuclear envelope membrane in telophase, while earlier in mitosis, its recruitment to the nuclear membrane and spindle envelope relies on the C terminus but not the FFAT motifs. The molecular determinants and receptors are currently not known. The authors check the function of the PP2A recruitment to Ankle2/Vap33 in the context of embryos and show this recruitment pathway is functionally important. While the Ankle2/Vap33 interaction is dispensable in adult flies (looking at wing development), the PP2A/Ankle2 interaction is essential for correct wing and fly development. Overall, this is a very complete paper that reveals the molecular mechanism of PP2A recruitment to Ankle2 and studies both the cellular and the physiological effect of this interaction in the context of fly development.

      Strengths: 

      The paper is well written and the narrative is well‐developed. The figures are of high quality, well‐controlled, clearly labelled, and easy to understand. They support the claims made by the authors.

      Weaknesses: 

      The study would benefit from being discussed in the context of what is already known on Ankle2 biology in C. elegans and human cells. It is important to highlight that the structures shown in the paper are AlphaFold models, rather than validated structures.

      We have enhanced our presentation of what is known about LEM‐4L/Ankle2 in C. elegans and humans in the Introduction, and further developed comparisons of our findings regarding Drosophila Ankle2 with these orthologs in the Results and Discussion sections. We have also specified in all sections and figure legends that the structures shown are AlphaFold3 models.

      Reviewer #3 (Public review): 

      Summary: 

      The authors were interested in how Ankle2 regulates nuclear envelope reformation after cell division. Other published manuscripts, including those from the authors, show without a doubt that Ankle2 plays a role in this critical process. However, the mechanism by which Ankle2 functions was unclear. Previous work using worms and humans (Asencio et al., 2012) established that human ANKLE2 could bind endogenous PP2A subunits. The binding was direct and was mediated through a region before and including the first ankyrin repeat in human ANKLE2. In addition to its interaction with PP2A, Asencio et al., 2012 also show that ANKLE2 regulates VRK1 kinase activity. Together PP2A and VRK1 regulate BAF phosphorylation for proper nuclear envelope reformation. Here, the authors provide more evidence for interaction with PP2A by also mapping the domain of interaction to the ankyrin repeat in Drosophila. In addition, the ankyrin repeat is essential for nuclear envelope reformation after division. They show that Ankle2 can bind in a PP2A complex without other known regulatory subunits of PP2A. The authors also identify a novel interaction with ER protein Vap33, but functional relevance for this interaction in nuclear envelope reformation is not provided in the manuscript, which the authors explicitly state. This manuscript does not comment on the activity of Ballchen/VRK1 in relation to Ankle2 loss and BAF phosphorylation or nuclear envelope reformation, even though links were previously shown by multiple studies (Asencio et al., Link et al., Apridita Sebastian et al.,). Nuclear envelope defects were rescued by the reduction of VRK1 in two of these manuscripts. It is possible that BAF phosphorylation phenotypes can be contributed by both PP2A inactivity and VRK1 overactivity due to the loss of Ankle2.

      Strengths: 

      This manuscript is a useful finding linking Ankle2 function during nuclear envelope reformation to the PP2A complex. The authors present solid data showing that Ankle2 can form a complex with PP2A‐29B and Mts and generate a phosphoproteomic resource that is fundamentally important to understanding Ankle2 biology. 

      Weaknesses: 

      However, the main findings/conclusions about subcellular localization might be incomplete since they are drawn from overexpression experiments. In addition, throughout the text, some conclusions are overstated or are not supported by data. 

      It is true that all experiments studying subcellular localization were done with tagged proteins overexpressed in flies and cell culture. Nevertheless, we show that Ankle2‐GFP is functional since it rescues phenotypes resulting from the loss of endogenous Ankle2 in both flies and cultured cells. The antibodies we generated against Ankle2 were unable to reliably detect the endogenous protein by immunofluorescence. We have now stated this caveat in our manuscript. Regarding the validity of our conclusions in relation to our data, we address each point raised by the reviewer under the Recommendations for the authors. In some cases, we have adjusted our conclusions and in other cases, we have provided additional clarification or justification. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      There are a few experimental issues that should be addressed, specific comments are listed below: 

      (1) Figure 1F: In this experiment, the authors immunoprecipitate GFP‐PP2A‐29B or PP2A‐29B‐GFP and Western blot for Ankle2 and Mts to demonstrate that both are co‐immunoprecipitated. To demonstrate that these interactions are specific, the authors should also blot for a protein that is expected to definitely NOT co‐immunoprecipitate with PP2A‐29B; e.g. tubulin.

      Our conclusion that GFP‐PP2A‐29B and PP2A‐29B‐GFP specifically interact with Ankle2 and Mts is also based on mass spectrometry analysis of the purification products from embryos and cells in culture, comparing with products of purification of GFP alone (Fig 1E‐F, S1C‐D and Tables S2, S3). The lists of identified proteins reveal that most proteins (including tubulins) are not enriched with GFP‐PP2A‐29B or PP2A‐29B‐GFP like Ankle2 and Mts are.

      (2) Figure 2A: The colour coding of the dots is not explained in the figure legend. 

      We have now added the explanation.

      (3) Figure 2B: The competition experiment is a good idea. Do the authors get the same results when they conduct the experiment the other way round, i.e. keep the concentration of Tws the same but increase the concentration of Ankle2? 

      We have tried this reverse experiment but saw little effect. The failure to observe displacement of Tws by Ankle2 in this context could be due to a higher affinity of Tws than Ankle2 in the PP2A complex, or to lower expression levels achieved for Ankle2 (a larger protein) relative to Tws.

      (4) Figure 5D: The hyperphosphorylation of BAF is very difficult to see, and it is impossible to tell whether the hyperphosphorylation has been rescued or not by the different Ankle2 constructs. Can the phosphorylated and the hyperphosphorylated bands be separated better? This panel needs significant improvements to support the claims in the text.

      In our opinion, the hyperphosphorylated (upper band) and unphosphorylated (lower band) forms of BAF are well resolved and readily distinguishable. The fainter band in the middle could correspond to a partially phosphorylated form of BAF but we do not venture to speculate on its precise identity nor do we need it to draw our conclusions. The important information from this blot is that the level of unphosphorylated BAF after Ankle2 RNAi increases when Ankle2WT‐GFP and Ankle2Fm+FL1‐GFP are expressed but not when Flag‐GFP or Ankle2ANK‐GFP are expressed. In these experiments, the rescue of unphosphorylated BAF is incomplete because not all cells express the GFP‐tagged protein in our non‐clonal stable cell lines.

      Reviewer #2 (Recommendations for the authors):

      (1) The AlphaFold models need to be labelled as such more clearly on the figures, to distinguish them from X‐ray crystallography structures. AlphaFold will always propose a solution but it is not necessarily correct.

      We have added the note “MODEL” directly in Figures 2C, 2D, 4F and S3B, in addition to the information already provided in the text and figure legends specifying that these are models generated by AlphaFold3.

      (2) Figure 4 F. Annotate the Ankle2 FL1 peptide. 

      We have indicated the amino acid residues in the figure.

      (3) Problems with the statistical tests. T‐tests cannot be used for comparing multiple groups, as this favors error propagation. 

      All of our t‐tests compare only two groups at a time, as indicated. In this regard, our labeling in Fig 5C may have been misleading. We have now changed it.

      (4) Close‐ups of ring canal in Figure S2. In Figure S2, there seem to be lots of GFP‐Ankle2 vesicles in the cytoplasm of the oocyte. 

      We agree that the image showing Ankle2‐GFP alone in the RNAi Vap33 condition suggested a cytoplasmic granular localization of unknown nature. However, upon examination, we realized that this image did not correspond to the same z‐step as the matching merged image (which also included DNA staining). We have now replaced the image with the correct one.

      Reviewer #3 (Recommendations for the authors): 

      Be more accurate about what conclusions can be made from reported data, particularly from overexpression and deletion studies. 

      (1) The domain analysis for physical interaction is quite thorough. However, localization information is taken from overexpressed constructs. While these data show what could happen, the authors are not using endogenous levels of Ankle2 in cells or tissues that are known to require Ankle2. As a result, it is difficult to determine whether localization results are biologically meaningful. 

      We have added the following text at the end of the third Results section:

      “We were unable to examine the localization of endogenous Ankle2 because the antibodies that we generated gave inconclusive results in immunofluorescence. For the remainder of our study, we relied on the overexpression of Ankle2‐GFP, which may not perfectly reflect the localization and function of endogenous Ankle2. However, Ankle2‐GFP is functional as it can rescue phenotypes observed when endogenous Ankle2 is depleted (see below).”

      (2) The data showing that Ankle2 is a regulator unit of the PP2A complex also relies on in vitro binding assays in an over‐expression context. Data certainly show Ankle2 can bind proteins in the PP2A complex when overexpressed. However, the authors could not isolate enough of the complex from the animal to test function, so Ankle2 acting as a regulatory subunit isn't functionally shown. There are other possibilities, such as Ankle2 acts as a scaffold for complex assembly.  

      The competition experiments shown in Fig 2 are based on complexes assembling in cells and are not in vitro binding assays. We show 4 lines of evidence supporting the idea that Ankle2 functions as a regulatory subunit of PP2A: 1) Ankle2 interacts with the structural (PP2A‐29B) and catalytic (Mts) subunits of PP2A without any known regulatory subunit of PP2A. 2) Depletion of Ankle2 leads to the hyperphosphorylation of the known PP2A substrate BAF. 3) The PP2A regulatory subunit Tws/B55 competes with Ankle2 for formation of a complex with PP2A. 4) AlphaFold3 predicts that Ankle2 engages in a complex with PP2A at a position similar to that of known regulatory subunits of PP2A including Tws/B55, and consistent with their mutually exclusive presence in PP2A complexes. If Ankle2 acted as a scaffold for the formation of a PP2A complex containing other regulatory subunits, we would expect to detect Ankle2 and another regulatory subunit in the same complex.

      (3) Throughout the text, some conclusions are overstated or are not supported by data. Examples are below: 

      a. Page 1: "we show for the first time that Ankle2 is a regulatory subunit of PP2A"  The authors show binding and changes in BAF phosphorylation levels, but changes in PP2A activity with modulation of Ankle2 weren't shown. 

      We have replaced this phrase with this one:

      “…we provide several lines of evidence that suggest that Ankle2 is a regulatory subunit of PP2A…”

      b. Page 3: "The requirement for Ankle2 in the development of the central nervous system was initially discovered through its targeting by the microcephaly‐causing Zika virus (Shah et al., 2018)."

      This is not the first paper showing ANKLE2 plays a role in the development of the CNS. Yamamoto et al., 2014 identified mutants in Ankle2 with defects in CNS development in flies and humans, establishing it as a human microcephaly‐causing gene. 

      We are sorry for this oversight. We have now cited this important work.

      c. Page 6: "Moreover, BAF appears to be the only obligatory substrate of Ankle2‐dependent dephosphorylation for cell proliferation as lowering the dose of the BAF kinase NHK‐1/Ballchen rescues wing development defects caused by the partial depletion of Ankle2 (Li et al., 2024)."  It is unclear why the authors conclude this since Ballchen/VRK1 can phosphorylate many things besides BAF. 

      Although the conclusion cannot be drawn categorically, it seems to be by far the most likely scenario. However, we agree that in principle, other mechanisms could also account for these genetic observations, such as the dephosphorylation of another, still unidentified obligatory substrate of PP2A‐Ankle2 that would also be phosphorylated by NHK‐1/Ballchen. However, we have also shown that expression of an unphosphorylatable mutant form of BAF rescues phenotypes observed upon loss of Ankle2 function (Li et al, 2024). We have changed our sentence as follows:

      "Moreover, BAF could be the only obligatory substrate of Ankle2‐dependent dephosphorylation for cell proliferation as lowering the dose of the BAF kinase NHK‐1/Ballchen or expression of an unphosphorylatable mutant form of BAF rescues wing development defects caused by the partial depletion of Ankle2 (Li et al., 2024).”

      d. Page 10: "These results suggest that a Vap33‐Ankle2‐PP2A complex can mediate the recruitment of a pool of PP2A at the NE."

      There is insufficient evidence to indicate that Vap33‐Ankle2‐PP2A exists in a stable state in the cell and that this complex mediates recruitment of PP2A at the NE. The images do not include Vap33, showing no evidence it is present when PP2A is at the NE and the complex could only be detected with overexpression. 

      We agree with this caveat and recognize the need to be cautious when proposing our model. In this regard, we feel that our wording is reasonable and appropriate, using “suggest” rather than “prove”, “show” or “indicate”.

      e. Page 11: "These results suggest that the interaction of Ankle2 with PP2A is essential for its function in BAF dephosphorylation and nuclear reassembly." Page 14: "these results indicate that the interaction of Ankle2 with PP2A is essential during embryo". Page 14: "These results indicate that the interaction of Ankle2 with PP2A but not with Vap33 is essential for its function during cell proliferation in imaginal wing disc development."

      These experiments show that the ankyrin repeat in Ankle2 is necessary for these processes. It does not say PP2A interaction with Ankle2 is necessary because other things could bind the domain. 

      We have revised the segments of the text mentioned, taking the reviewer’s legitimate concerns into consideration. We have also added the following sentence to the Discussion:

      “However, it remains formally possible that the deletion of Ankyrin repeats used to disrupt the Ankle2‐PP2A interaction abrogated another, unknown aspect of Ankle2 function.”

      f. Page 12: "Overall, we conclude that in addition to its N‐terminal PP2A‐interacting Ankyrin domain, Ankle2 requires the integrity of its C‐terminal portion for its essential function in nuclear reassembly." 

      No data was shown for differences in nuclear reassembly, only the ability for ANKLE2 truncation mutants to localize to the nuclear envelope. It isn't clear whether the nuclear envelope reformation is normal in Figure S6 which the authors refer to. Lamin staining could help determine and conclude the C‐terminal region is important for nuclear envelope reformation. 

      Our conclusion is drawn from the results shown in Figures S4 and S5 (described in the same section), where a rescue assay in cells was performed to assess the functionality of different variants of Ankle2‐GFP when endogenous Ankle2 was depleted. In this assay, Lamin and DNA staining were used to examine nuclear reassembly (as in Figure 5). Figure S6 shows the localizations of the different variants of Ankle2‐GFP, but endogenous Ankle2 is not depleted in these cells.

      g. Page 13: "We conclude that the ability of Ankle2 to interact with PP2A is required for the timely recruitment of BAF at reassembling nuclei and ensuing NE reassembly."

      It's possible the Ankyrin domain in ANKLE2 is interacting with proteins other than PP2A to recruit BAF at reassembling nuclei, especially since ANKLE2 is found to regulate VRK1 (Link 2019) which has been found to phosphorylate BAF during the cell cycle (Molitor 2014). Additionally, the images in Figure 6A appear to show fully reassembled nuclear envelopes in all mutants by 180s. 

      This point relates to point e, raised above by this reviewer. We have re‐written the sentence as follows:

      “We conclude that the Ankyrin domain, required for the ability of Ankle2 to interact with PP2A, is necessary for the timely recruitment of BAF at reassembling nuclei and ensuing NE reassembly.”

      Please note that in this paragraph, we discuss a delay in RFP‐BAF recruitment, rather than the complete elimination of this recruitment. 

      h. Page 16: "Our unbiased phosphoproteomic analysis confirmed that BAF dephosphorylation depends on Ankle2, despite the absence of a detectable interaction between Drosophila Ankle2 and BAF, which may be due to the lack of a LEM domain in the former (Fishburn et al., 2024). Moreover, while Ankle2 was shown to bind and inhibit the BAF counteracting kinase VRK1 in humans (Asencio et al., 2012), we detected no interaction between Ankle2 and NHK‐1/Ballchen (VRK1 ortholog) in Drosophila. This suggests that the loss of Ankle2 causes BAF hyperphosphorylation by preventing PP2A‐dependent dephosphorylation rather than by preventing inhibition of NHK‐1"

      There could be transient binding between Ankle2 and Ballchen/VRK1/NHK‐1 or activity can be indirect, but that doesn't mean there is not a contribution of BAF phosphorylation by Ballchen/VRK1/NHK‐1. Genetic evidence from three model systems, including Drosophila, indicates there is a strong genetic interaction between Ankle2 and Ballchen/VRK1/NHK‐1 that includes rescue of lethality.

      We agree and we have re‐written in this way:

      “While a putative interaction between Ankle2 and NHK‐1 in Drosophila could occur transiently, thereby escaping detection, the simplest interpretation of our results is that the loss of Ankle2 causes BAF hyperphosphorylation by preventing PP2A‐dependent dephosphorylation rather than by preventing inhibition of NHK‐1.”

      We do not question the fact that Ballchen/VRK1/NHK‐1 phosphorylates BAF and genetically interacts with Ankle2. The antagonistic relationship between Ballchen/VRK1/NHK‐1 and Ankle2 observed genetically can be explained by the fact that the kinase phosphorylates BAF while PP2A‐Ankle2 dephosphorylates it, without the need to invoke an additional inhibition of the kinase by Ankle2.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The hypothesis is based on the idea that inversions capture genetic variants that have antagonistic effects on male sexual success (via some display traits) and survival of females (or both sexes) until reproduction. Furthermore, a sufficiently skewed distribution of male sexual success will tend to generate synergistic epistasis for male fitness even if the individual loci contribute to sexually selected traits in an additive way. This should favor inversions that keep these male-beneficial alleles at different loci together in cis-LD. A series of simulations is presented and shows that the scenario works at least under some conditions. While a polymorphism at a single locus with large antagonistic effects can be maintained for a certain range of parameters, a second such variant with somewhat smaller effects tends to be lost unless closely linked. It becomes much more likely for genomically distant variants that add to the antagonism to spread if they get trapped in an inversion; the model predicts this should drive accumulation of sexually antagonistic variants on the inversion versus standard haplotype, leading to the evolution of haplotypes with very strong cumulative antagonistic pleiotropic effects. This idea has some analogies with one of the predominant hypotheses for the evolution of sex chromosomes, and the authors discuss these similarities. The model is quite specific, but the basic idea is intuitive and thus should be robust to the details of the model assumptions. It makes perfect sense in the context of the geographic pattern of inversion frequencies. One prediction of the model (notably that it leads to the evolution of nearly homozygously lethal haplotypes) does not seem to reflect the reality of chromosomal inversions in Drosophila, as the authors carefully discuss, but it is the case of some other "supergenes", notably in ants. So the theoretical part is a strong novel contribution.

      We appreciate the detailed and accurate summary of our main theoretic results.

      To provide empirical support for this idea, the authors study the dynamics of inversions in population cages over one generation, tracking their frequencies through amplicon sequencing at three time points: (young adults), embryos and very old adult offspring of either sex (>2 months from adult emergence). Out of four inversions included in the experiment, two show patterns consistent with antagonistic effects on male sexual success (competitive paternity) and the survival of offspring, especially females, until an old age, which the authors interpret as consistent with their theory.

      As I have argued in my comments on previous versions, the experiment only addresses one of the elements of the theoretical hypothesis, namely antagonistic effects of inversions on male reproductive success and other fitness components, in particular of females. Furthermore, the design of this experiment is not ideal from the viewpoint of the biological hypothesis it is aiming to test. This is in part because, rather than testing for the effects of inversion on male reproductive success versus the key fitness components of survival to maturity and female reproductive output, it looks at the effects on male reproductive success versus survival to a rather old age of 2 months. The relevance of survival until old age to fitness under natural conditions is unclear, as the authors now acknowledge. Furthermore, up to 15% of males that may have contributed to the next generation did not survive until genotyping, and thus the difference between these males' inversion frequency and that in their offspring may be confounded by this potential survival-based sampling bias. The experiment does not test for two other key elements of the proposed theory: the assumption of frequency-dependence of selection on male sexual success, and the prediction of synergistic epistasis for male fitness among genetic variants in the inversion. To be fair, particularly testing for synergistic epistasis would be exceedingly difficult, and the authors have now included a discussion of the above caveats and limitations, making their conclusions more tentative. This is good but of course does not make these limitations of the experiment go away. These limitations mean that the paper is stronger as a theoretical than as an empirical contribution.

      We discuss the choice to focus on exploring the potential antagonistic effects of the inversion karyotype on male reproductive success and survival in our general response above. Primarily, this prediction seemed to be the most specific to the proposed model as compared to other alternate models. Still, further studies are clearly needed to elucidate the potential frequency dependence and genetic architecture of the inversions.

      Regarding the choice of age at collection, it is unknown to what degree our selected collection age of 10 weeks correlates with survival in the wild, but we feel confident that there will be some positive correlation.

      We now further clarify that across our experiments, a minimum of 5% and a mean of 9% of the males used in the parental generation died before collection. These proportions do not appear sufficient to explain the differences between paternal and embryo inversion frequencies shown in Figure 9.
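      The reasoning about unsampled paternal mortality can be made concrete with a simple bounding calculation. The following is a minimal sketch, not the analysis used in the manuscript: it assumes that a fraction `mortality` of sires died before genotyping and asks how far the true paternal inversion frequency could lie from the frequency measured in survivors if the dead males carried either no inversions or only inversions. The function name and the survivor frequency of 0.20 are hypothetical; the mortality values of 5%, 9%, and 15% are the figures quoted in this response and in the review.

```python
def paternal_frequency_bounds(freq_survivors, mortality):
    """Bound the inversion frequency among ALL sires (hypothetical helper).

    freq_survivors: inversion frequency measured in sires that survived to genotyping.
    mortality: fraction of sires that died before genotyping (unsampled).
    Returns (lower, upper): lower assumes no dead male carried the inversion,
    upper assumes every dead male carried only inverted chromosomes.
    """
    if not 0.0 <= freq_survivors <= 1.0:
        raise ValueError("freq_survivors must be in [0, 1]")
    if not 0.0 <= mortality < 1.0:
        raise ValueError("mortality must be in [0, 1)")
    surviving_share = (1.0 - mortality) * freq_survivors
    return surviving_share, surviving_share + mortality


if __name__ == "__main__":
    # 0.20 is an illustrative survivor frequency, not a value from the study.
    for m in (0.05, 0.09, 0.15):
        low, high = paternal_frequency_bounds(0.20, m)
        print(f"mortality={m:.2f}: true paternal frequency in [{low:.3f}, {high:.3f}]")
```

      These are worst-case bounds; any realistic bias would fall well inside them, so an observed paternal-to-embryo shift larger than the interval width cannot be attributed to mortality alone under these assumptions.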

      Reviewer #2 (Public review):

      Summary:

      In their manuscript the authors address the question whether the inversion polymorphism in D. melanogaster can be explained by sexually antagonistic selection. They designed a new simulation tool to perform computer simulations, which confirmed their hypothesis. They also show a tradeoff between male reproduction and survival. Furthermore, some inversions display sex-specific survival.

      Strengths:

      It is an interesting idea on how chromosomal inversions may be maintained.

      Weaknesses:

      The authors motivate their study by the observation that inversions are maintained in D. melanogaster and because inversions are more frequent closer to the equator, the authors conclude that it is unlikely that the inversion contributes to adaptation in more stressful environments. Rather the inversion seems to be more common in habitats that are closer to the native environment of ancestral Drosophila populations.

      While I do agree with the authors that this observation is interesting, I do not think that it rules out a role in local adaptation. After all, the inversion is common in Africa, so it is perfectly conceivable that the non-inverted chromosome may have acquired a mutation contributing to the novel environment.

      Based on their hypothesis, the authors propose an alternative strategy, which could maintain the inversion in a population. They perform some computer simulations, which are in line with the predicted behavior. Finally, the authors perform experiments and interpret the results as empirical evidence for their hypothesis. While the reviewer is not fully convinced about the empirical support, the key problem is that the proposed model does not explain the patterns of clinal variation observed for inversions in D. melanogaster. According to the proposed model, the inversions should have a similar frequency along latitudinal clines. So in essence, the authors develop a complicated theory because they felt that the current models do not explain the patterns of clinal variation, but this model also fails to explain the pattern of clinal variation.

      To the contrary: in the Discussion paragraph beginning on Line 671, we explain why we would predict that a tradeoff between survival and reproduction should lead to clinal inversion frequencies. We suggest that a karyotype associated with a survival penalty should be increasingly disadvantageous in more challenging environments (such as high altitudes and latitudes for this species). Furthermore, an advantage in male reproductive competition conferred by that same haplotype may be reduced by the lower population densities that we would expect in more challenging environments (meaning that each female should encounter fewer males). Individually or jointly, these two factors predict that the equilibrium frequency of a balanced inversion polymorphism should depend on a local population’s environmental harshness and population density, with the ensuing prediction that inversion frequency should correlate with certain environmental variables.

      Reviewer #3 (Public review):

      Summary:

      In this study, McAllester and Pool develop a new model to explain the maintenance of balanced inversion polymorphism, based on (sexually) antagonistic alleles and a trade-off between male reproduction and survival (in females or both sexes). Simulations of this model support the plausibility of this mechanism. In addition, the authors use experiments on four naturally occurring inversion polymorphisms in D. melanogaster and find tentative evidence for one aspect of their theoretical model, namely the existence of the above-mentioned trade-off in two out of the four inversions.

      Strengths:

      (1) The study develops and analyzes a new (Drosophila melanogaster-inspired) model for the maintenance of balanced inversion polymorphism, combining elements of (sexually) antagonistically (pleiotropic) alleles, negative frequency-dependent selection and synergistic epistasis. Simulations of the model suggest that the hypothesized mechanism might be plausible.

      (2) The above-mentioned model assumes, as a specific example, a trade-off between male reproductive display and survival; in the second part of their study, the authors perform laboratory experiments on four common D. melanogaster inversions to study whether these polymorphisms may be subject to such a trade-off. The authors observe that two of the four inversions show suggestive evidence that is consistent with a trade-off between male reproduction and survival.

      Open issues:

      (1) A gap in the current modeling is that, while a diploid situation is being studied, the model does not investigate the effects of varying degrees of dominance. It would thus be important and interesting, as the authors mention, to fill this gap in future work.

      (2) It will also be important to further explore and corroborate the potential importance and generality of trade-offs between different fitness components in maintaining inversion polymorphisms in future work.

      We appreciate the work put in to evaluating, improving, and summarizing our study. We agree that further work studying the effects of dominance and of the fitness components of the inversions is important.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      l. 354 : I don't understand what the authors mean by "an antagonistic and non-antagonistic allele". If there is an antagonistic polymorphism at a locus, then both alleles have antagonistic effects; i.e., allele B increases trait 1 and reduces trait 2 relative to allele A and vice versa.

      Edited, agreed that the terminology used here was sub-optimal.

      Reviewer #2 (Recommendations for the authors):

      The motivation for their model is their claim that the clinal inversion frequencies are not compatible with local adaptation. The reviewer doubts this strong statement. Furthermore, the proposed model also fails to explain the inversion frequencies in natural populations.

      Hence, rather than building a straw man, it would be better if the authors first show their experiments and then present their model as an explanation for the empirical results. Nevertheless, it is also clear that the empirical data are not very strong and cannot be fully explained by the proposed model.

      The claim that we reject any role of local adaptation in clinal variation and selection upon inversion polymorphism is not supported by a reading of our manuscript. We even suggest that locally varying selective pressures must be playing some role, although that does not imply that local adaptation is the ultimate driver of inversion frequencies. Indeed, we suggest that local adaptation alone is an insufficient explanation for inversion frequency clines in D. melanogaster, in part because (1) these frequency clines do not approach the alternate fixed genotypes predicted by local directional selection, and (2) these derived inversions tend to be more frequent in more ancestral environments (l.113-158).

      In our public review response above, and in the Discussion section of our paper, we explain why our model can predict both the clinal frequencies of many Drosophila inversions and their intermediate maximal frequencies. Of course, we do not predict that most inversions in this species should follow the specific tradeoff investigated here. In fact, we were surprised to find even two inversions that experimentally supported our predicted tradeoff. Still, it remains possible that other inversions in this species are subject to other balanced tradeoffs not investigated here, which could help explain why they rarely reach high local frequencies.

      Reviewer #3 (Recommendations for the authors):

      My previous comments have been adequately addressed.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      […]

      To provide empirical support for this idea, the authors study the dynamics of inversions in population cages over one generation, tracking their frequencies through amplicon sequencing at three time points: (young adults), embryos and very old adult offspring of either sex (>2 months from adult emergence). Out of four inversions included in the experiment, two show patterns consistent with antagonistic effects on male sexual success (competitive paternity) and the survival of offspring, especially females, until an old age, which the authors interpret as consistent with their theory.

      There are several reasons why the support from these data for the proposed theory is not watertight.

      (1) As I have already pointed out in my previous review, survival until 2 months (in fact, it is 10 weeks and so 2.3 months) of age is of little direct relevance to fitness, whether under natural conditions or under typical lab conditions.

      The authors argue this objection away with two arguments

      First, citing Pool (2015) they claim that the average generation time (i.e. the average age at which flies reproduce) in nature is 24 days. That paper made an estimate of 14.7 generations per year under the North Carolina climate. As also stated in Pool (2015), the conditions in that locality for Drosophila reproduction and development are not suitable during three months of the year. This yields an average generation length of about 19.5 days during the 9 months during which the flies can reproduce. On the highly nutritional food used in the lab and at the optimal temperature of 25 C, Drosophila need about 11-12 days to develop from egg to adult. Even assuming these perfect conditions, the average age (counted from adult eclosion) would be about 8 days. In practice, larval development in nature is likely longer for nutritional and temperature reasons, and thus the genomic data analyzed by Pool imply that the average adult age of reproducing flies in nature would be about 5 days, and not 24 days, and even less 10 weeks. This corresponds neatly to the 2-6 days median life expectancy of Drosophila adults in the field based on capture-recapture (e.g., Rosewell and Shorrocks 1987).

      Second, the authors also claim that survival over a period of 2 months is highly relevant because flies have to survive long periods where reproduction is not possible. However, to survive the winter flies enter a reproductive diapause, which involves profound physiological changes that indeed allow them to survive for months, remaining mostly inactive, stress resistant and hidden from predators. Flies in the authors' experiment were not diapausing, since they were given plentiful food and kept warm. It is still possible that survival to the ripe old age of 10 weeks under these conditions correlates well with surviving diapause under harsh conditions, but if so, the authors should cite relevant data. Even then, I do not think this allows the authors to conclude that longevity is "the main selective pressure" on Drosophila (l. 936).

      This is overall a thoughtfully presented critique and we have endeavored to improve our discussion of Pool (2015) and to clarify some of the language used about survival elsewhere. While we agree that challenges other than survival to 10 weeks are very relevant to Drosophila melanogaster, collection at 10 weeks does encompass some of these other challenges. Egg to adult viability still contributes to the frequencies of the inversions at collection and is not separable from longevity in this data. Collection at longevity was chosen in part to encompass all lifetime fitness challenges that might influence the inversion frequency at collection, albeit still within permissive laboratory conditions. Future experiments exploring specific stressors independently and beyond permissive lab conditions would generate a clearer picture.

      In addition to general edits, the specific phrase mentioned at l. 936 [now line 1003] has been revised from “In many such cases females are in reproductive diapause, and so longevity is the main selective pressure.” to “While longevity is a key selective pressure underlying overwintering, the relationship between longevity in permissive lab conditions without diapause and in natural conditions under diapause is unclear (Schmidt et al. 2005; Flatt 2020), and our experiment represents just one of many possible ways to examine tradeoffs involving survival.”

      (2) It appears that the "parental" (in fact, paternal) inversion frequency was estimated by sequencing sires that survived until the end of the two-week mating period. No information is provided on male mortality during the mating period, but substantial mortality is likely given constant courtship and mating opportunities. If so, the difference between the parental and embryo inversion frequency could reflect the differential survival of males until the point of sampling rather than / in addition to sexual selection.

      We have further clarified that when referenced as parental frequency, the frequency presented is ½ the paternal frequency as the mothers were homokaryotypic for the standard arrangement. We chose to present both due to considerations in representing the frequency change from paternal to embryo frequencies, where a hypothetical change from 0.20 frequency in fathers to 0.15 frequency in embryos represents a selective benefit (a frequency increase in the population), despite the reality that this is a decrease in allele frequency between paternal and embryo cohorts.
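      The distinction between paternal and parental frequency can be checked with a short calculation. This is a minimal sketch under the stated design, assuming that mothers and fathers contribute equally to the offspring gene pool (as for an autosomal inversion) and that mothers carry only the standard arrangement, so the maternal inversion frequency is 0. The function names are hypothetical and the 0.20 and 0.15 values are the hypothetical figures from the paragraph above.

```python
def parental_frequency(paternal_freq, maternal_freq=0.0):
    """Expected inversion frequency in offspring absent selection.

    Assumes equal genetic contribution from mothers and fathers; the default
    maternal_freq=0.0 reflects mothers homokaryotypic for the standard arrangement.
    """
    return 0.5 * (paternal_freq + maternal_freq)


def frequency_changes(paternal_freq, embryo_freq, maternal_freq=0.0):
    """Compare the embryo frequency against both possible baselines."""
    parental = parental_frequency(paternal_freq, maternal_freq)
    return {
        "parental": parental,
        "embryo_minus_paternal": embryo_freq - paternal_freq,
        "embryo_minus_parental": embryo_freq - parental,
    }


if __name__ == "__main__":
    result = frequency_changes(paternal_freq=0.20, embryo_freq=0.15)
    for name, value in result.items():
        print(f"{name}: {value:+.3f}")
```

      With these inputs the embryo frequency is lower than the paternal frequency but higher than the parental expectation of 0.10, which is why the same data can read as a decrease between cohorts and an increase in the population.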

      We mentioned a maximum 15% paternal mortality at line 827 [now l.1056], but have now added complete data on the counts of flies in the experiment as a supplemental table (Table S1) and have added or corrected further references to this in the results and methods [lines 555, 638, 975]. It is true that this may influence the observed frequency changes to some degree, and while we adjusted our sampling method to account for the effects of this mortality on statistical power [l.1056ff], we have now edited the manuscript to better highlight potential effects of this phenomenon on the recorded frequency changes.

      It is also worth noting that, if mortality among fathers over the mating period is codirectional with mortality among aged offspring, this would bias the results against detecting an opposing antagonistic selective effect of the inversions on paternity share. This is now also mentioned in the manuscript, l.639ff.

      (3) Finally, irrespective of the above caveats, the experimental data only address one of the elements of the theoretical hypothesis, namely antagonistic effects of inversions on reproduction and survival, notably that of females. It does not test for two other key elements of the proposed theory: the assumption of frequency-dependence of selection on male sexual success, and the prediction of synergistic epistasis for male fitness among genetic variants in the inversion. To be fair, particularly testing the latter prediction would be exceedingly difficult. Nonetheless, these limitations of the experiment mean that the paper is a much stronger theoretical than empirical contribution.

      This is a fair criticism of the limitations of our results, and we now summarize such caveats more directly in the discussion summary, lines 876ff.

      Reviewer #2 (Public Review): 

      […]

      Comments on the latest version:

      I would like to give an example of the confusing terminology of the authors:

      "Additionally, fitness conveyed by an allele favoring display quality is also frequency-dependent: since mating success depends on the display qualities of other males, the relative advantage of a display trait will be diminished as more males carry it..."

      I do not understand the difference to an advantageous allele, as it increases in frequency the frequency increase of this allele decreases, but this has nothing to do with frequency dependent selection. In my opinion, the authors re-define frequency dependent selection, as for frequency dependent selection needs to change with frequency, but from their verbal description this is not clear.

      We have edited this text for greater clarity, now line 232ff. We did not seek to redefine frequency dependence, and did mean by “the relative advantage of a display trait will be diminished” that an equivalent s would diminish with frequency. We have now remedied terminological issues introduced in the prior revision with regard to frequency dependent selection.

      One example of how challenging the style of the manuscript is comes from their description of the DNA extraction procedure. In principle a straightforward method, but even here the authors provide a convoluted uninformative description of the procedure.

      We have edited for clarity the text on lines 1016-1020. Citing a published protocol and mentioning our modifications seems an appropriate trade-off between representing what was done accurately, citing the sources we relied on in doing it, and limiting the volume of information in the main text for such a straightforward and common method. 

      It is not apparent to the reviewer why the authors have not invested more effort to make their manuscript digestible.

      We have invested a great deal of effort in making this manuscript as clear as we are able to.  We regret that our writing has not been to this reviewer’s liking. We believe we have been highly responsive to all specific criticisms, including revising all passages cited as unclear. In this round, we have again scrutinized the entire manuscript for any opportunity to clarify it, and we have made further changes throughout.  Although our subject matter is conceptually nuanced, we nevertheless remain optimistic that a careful, fresh reading of our revised manuscript would yield a more favorable impression.

      Reviewer #3 (Public Review):

      […]

      Weaknesses:

      A gap in the current modeling is that, while a diploid situation is being studied, the model does not investigate the effects of varying degrees of dominance. It would be important and interesting to fill this gap in future work.

      Agreed, and now reinforced at lines 892ff.

      Comments on the latest version:

      Most of the comments which I have made in my public review have been adequately addressed.

      Some of the writing still seems somewhat verbose and perhaps not yet maximally succinct; some additional line-by-line polishing might still be helpful at this stage in terms of further improving clarity and flow (for the authors to consider and decide).

      We have made further changes and some polishing in this draft, and greatly appreciate the guidance provided in improving the draft so far. 

      Reviewer #1 (Recommendations For The Authors):

      (1) While the model results are convincing, some of the verbal interpretation is confusing. In particular, the authors state that in their model the allele favoring male display quality shows a negative frequency dependence whereas the alternative allele has a positive frequency dependence. This does not make sense to me in the context of population genetics theory. For a one-locus, two-allele model the change of allele frequency under selection depends on the fitness of the genotypes concerned relative to each other. Thus, at least under no dominance assumed in this model, if the relative fitness of AA decreases with the frequency of allele A, the relative fitness of aa must decrease with the frequency of allele a. I.e., if selection is negatively frequency dependent, then it is so for both alleles.

      This phrasing was wrong, and we have edited the relevant section.

      (2) I am still not entirely sure that the synergistic epistasis assumed in the verbal model is actually generated in the simulations; this would be easy enough to check by extracting the mating success of males with different genotypes from the simulation output, which should be reported, e.g., as a figure supplement.

      Our new Figure S2, which depicts haplotype frequencies for a set of the simulations presented in Figure 4, should demonstrate a necessary presence of synergistic epistasis. These results further clarify that the weaker allele B is only kept when linked to A. The same fitness classes of genotype are present in the simulations with and without the inversion, so the only mechanical difference is the rate of recombination, and the only way this might change selection on the alleles is if a variant has a different fitness in one haplotype background than another – i.e. epistasis. The maintenance of haplotypes AB and ab to the exclusion of Ab and aB relies on the lesser relative fitness of Ab and aB. And since survival values are multiplicative, this additional contribution must come from the mate success of AB being disproportionately larger than Ab or aB, indicating the emergent synergistic epistasis posited by our model. We have clarified this point in the text at line 363ff.
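
      The multiplicative-baseline argument above can be illustrated with a minimal sketch. The fitness values and the function name below are hypothetical placeholders chosen for illustration, not values or code from the simulations. If the effects of A and B combined purely multiplicatively, the log-scale deviation computed here would be zero; a positive value corresponds to the synergistic epistasis described in the response.

```python
import math

def log_epistasis(w_ab, w_Ab, w_aB, w_AB):
    """Deviation from multiplicativity on the log scale.

    Zero means the two variants combine multiplicatively (no epistasis);
    a positive value means AB is fitter than the product of the single
    effects predicts (synergistic epistasis).
    """
    return math.log(w_AB) + math.log(w_ab) - math.log(w_Ab) - math.log(w_aB)

# Hypothetical relative fitnesses, for illustration only.
multiplicative = log_epistasis(1.0, 1.2, 1.1, 1.2 * 1.1)
synergistic = log_epistasis(1.0, 1.2, 1.1, 2.0)

print(f"multiplicative case: {multiplicative:.3f}")
print(f"synergistic case:    {synergistic:.3f}")
```

      In this framing, maintenance of AB and ab to the exclusion of Ab and aB is what one would expect when the deviation is positive and recombination between the loci is suppressed.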

      (3) l. 318ff: What was this set number of males? I could not find this information anywhere. Also, this model of the mating system is commonly referred to as "best of N", so the authors may want to include this label in the description.

      We indicate this detail just after the referenced line, now reworded and on l. 338-340 as “For each female’s mating competition, 100 males were sampled, though see Figure S1 for plots with varying encounter number.”  Among these edits, “one hundred” has been changed to a numeral for easier skimming, and Figure S1 is now referenced here earlier in the text. Several edits have also been made in the caption of Figures 2 and 3, and in the relevant methods section to clarify the number of encountered males simulated, mention best of N terminology, and clarify how the quality score is used in the mate competition.
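
      For readers unfamiliar with the term, a "best of N" competition can be sketched as follows. This is a simplified illustration under stated assumptions, not the SAIsim implementation: males are sampled without replacement, the female deterministically mates with the highest-scoring male she encounters, and the function and variable names are hypothetical. The actual simulator may use the quality score differently (for example, with added noise).

```python
import random

def best_of_n(male_qualities, n_encounters, rng):
    """Return the index of the male chosen by one female.

    Assumption: the female samples up to n_encounters distinct males
    uniformly at random and mates with the one of highest quality score.
    """
    n = min(n_encounters, len(male_qualities))
    encountered = rng.sample(range(len(male_qualities)), n)
    return max(encountered, key=lambda i: male_qualities[i])

# Hypothetical population of 1000 males with arbitrary quality scores.
rng = random.Random(0)
qualities = [rng.random() for _ in range(1000)]

# One female's mating competition with 100 encountered males.
winner = best_of_n(qualities, 100, rng)
print(winner, qualities[winner])
```

      Raising the encounter number makes it more likely that the top-quality males are among those sampled, which concentrates mating success in fewer males; this is the parameter varied in Figure S1.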

      (4) The description of the experiment is still confusing. The number of individuals of each sex entered in each mating cage is missing from the Methods (l. 914); although I did finally find it in the Results. These flies were laying over 2 weeks - does this mean that offspring from the entire period were used to obtain the embryo and aged offspring frequencies, or only from a particular egg collection? If the former, does this mean that the offspring obtained from different egg batches were aged separately? Were the offspring aged in cages or bottles, at what density? Given that only those males that survived until the end of the two-week mating period were sequenced, it is important to know what % of the initial number of males these survivors were. A substantial mortality of the parental males could bias the estimate of parental frequencies. How many parental males, embryos and aged offspring were sequenced? Were all individuals of a given cage and stage extracted and sequenced as a single pool or were there multiple pools? The description could also be structured better. For example, the food and grape agar recipes and cage construction are inserted at random points of the description of the crossing design, which does not help.

      We have now reorganized and edited these portions of the Methods text. Portions of this comment overlap with edits responding to (2) of the Public Review and below for l. 921 in Details. Offspring from different laying periods were aged in different bottles, further separated by the time at which they eclosed. They were then pooled for DNA extraction and library preparation by sex and a binary early or late eclosion time. This data was present in the “D. mel. Sample Size” column of supplemental tables S6 and S7 (now S7 and S8), but we have added and referenced a new table to specifically collate the sample sizes of different experimental stages, table S1. Now referenced at lines 555, 638, 975, 1057.

      (5) The caption of figure 9 and the discussion of its results should be clear and explicit about the fact that "adult offspring" in Fig 9A and "female" and "male" refer to adults surviving to old age (whereas "parental" in Fig 9A refers to young adults in their reproductive prime). This has consequences for the interpretation of the difference between "parental" and "adult offspring", as it combines one generation of usual selection as it occurs under the conditions of the lab culture (young adult at generation t -> young adult in generation t+1) with an additional step of selection for longevity. Thus, a marked change in allele frequency does not imply that the "parental" frequency does not represent an equilibrium frequency of the inversions under the lab culture conditions. Furthermore, it would be useful to state explicitly that Figure 9B represents the same results as figure 9A, but with the aged offspring split by sex.

      Figure caption edited to provide further clarity on the age of cohorts and presented data, along with the relevant results section (2.3) referencing this figure.

      We avoid making any statements about the equilibrium frequencies of inversions under lab conditions, and whether or not any step of our experiment reflects such equilibria, because our investigation does not rely upon or test for such conditions. Instead, our analysis focuses on whether inversions have contrasting effects (as indicated by frequency changes that are incompatible with neutral sampling) between different life history components.  Under our model, such frequency reversals might be detectable both at equilibrium balanced inversion frequencies and also at frequencies some distance away from equilibria. We have now clarified this point at l. 970-972.

      Details:

      l. 211: this should be modified as male-only costs are now included.

      Edited. “survival likelihood (of either or both sexes).”

      l. 343: misplaced period

      Edited.

      l. 814: "We confirmed model predictions...": This sounds like it refers to an empirical confirmation of a theory prediction, but I think the authors just want to say that their simulations predicted antagonistic variants can be maintained at an intermediate equilibrium frequency. So the wording should be changed to avoid ambiguity.

      Edited. Now line 869.

      l. 853: How can a genome be "empty"? Do the authors mean an absence of any polymorphism?

      Edited to: “In SAIsim, a population is instantiated as a python object, and populated with individuals which are also represented by python objects. These individuals may be instantiated using genomes specified by the user, or by default carry no genomic variation.” Lines 913ff.

      l. 853: I do not see this diagramed in Figure 5

      Apologies, fixed to Fig. 2

      l. 864: is crossing-over in the model limited to female gametogenesis (reflecting the Drosophila case) or does it occur in both sexes?

      There is a variable in the simulator to make crossover female-specific. All simulations were performed with female-only crossover. Edited for clarity. “While the simulator can allow recombination in both sexes, all simulations presented only generate crossovers and gene conversion events for female gametes, in accordance with the biology of D. melanogaster.” Lines 928-929.

      l. 906: "F2" is ambiguous; does this mean that the mix of lines was allowed to breed for two generations? Also, in other places in the manuscript these flies appear to be referred to as "parental". So do not use F2.

      Edited, F2 language removed and replaced with being allowed to breed for two generations. Now lines 967ff.

      l. 910: this is incorrect/imprecise; what can be inferred is the frequency of the inversions in male gametes that contributed to fertilization. This would correspond to the frequency in successful males only if each successful male genotype had the same paternity share.

      Edited, now “Since no inversions could be inherited through the mothers, inversion frequencies among successful male gametes could be inferred from their pooled offspring.” Now line 994.
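
      The inference in the reworded sentence can be made concrete with a small sketch. It assumes an autosomal inversion, mothers that all carry only the standard arrangement, and a pool in which every offspring contributes one maternal and one paternal chromosome equally; the function name and the example frequency are hypothetical. Under those assumptions, the inversion frequency among paternal gametes is twice the frequency measured in the pooled offspring.

```python
def paternal_gamete_frequency(offspring_pool_frequency):
    """Infer inversion frequency among successful male gametes.

    Assumptions: autosomal inversion, inversion-free mothers, and equal
    representation of maternal and paternal chromosomes in the pool, so
    only half of the pooled chromosomes can carry the inversion.
    """
    if not 0.0 <= offspring_pool_frequency <= 0.5:
        raise ValueError("pooled frequency must lie in [0, 0.5] under these assumptions")
    return 2.0 * offspring_pool_frequency

# Hypothetical pooled-offspring frequency, for illustration only.
print(paternal_gamete_frequency(0.15))
```

      As the reviewer notes, this quantity describes gametes that contributed to fertilization; it equals the frequency among successful males only if each successful genotype has the same paternity share.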

      l. 912: "without a controlled day/night cycle" meaning what? Constant light? Constant darkness? Daylight falling through the windows?

      Edited to “Unless otherwise noted, all flies were kept in a lab space of 23°C with around a degree of temperature fluctuation and without a controlled day/night cycle. Light exposure was dependent on the varying use of the space by laboratory workers but amounted to near constant exposure to at least a minimal level of lighting, with some variable light due to indirect lighting from adjacent rooms with exterior windows.” Now lines 1007-1010.

      l. 921: I cannot parse this sentence. Were the offspring isolated as virgins?

      No, the logistics of collecting virgins would have been prohibitive, and it did not seem essential for our experiment. Hopefully the edits to this section are clearer, now lines 978ff.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review): 

      Summary: 

      This manuscript reports the substrate-bound structure of SiaQM from F. nucleatum, which is the membrane component of a Neu5Ac-specific Tripartite ATP-independent Periplasmic (TRAP) transporter. Until recently, there was no experimentally derived structural information regarding the membrane components of TRAP transporters, limiting our understanding of the transport mechanism. Since 2022, there have been 3 different studies reporting the structures of the membrane components of Neu5Ac-specific TRAP transporters. While it was possible to narrow down the binding site location by comparing the structures to proteins of the same fold, a structure with substrate bound has been missing. In this work, the authors report the Na+-bound state and the Na+ plus Neu5Ac state of FnSiaQM, revealing information regarding substrate coordination. In previous studies, 2 Na+ ion sites were identified. Here, the authors also tentatively assign a 3rd Na+ site. The authors reconstitute the transporter to assess the effects of mutating the binding site residues they identified in their structures. Of the 2 positions tested, only one of them appears to be critical to substrate binding.

      Strengths: 

      The main strength of this work is the capture of the substrate bound state of SiaQM, which provides insight into an important part of the transport cycle.

      Weaknesses: 

      The main weakness is the lack of experimental validation of the structural findings. The authors identified the Neu5Ac binding site, but only test 2 residues for their involvement in substrate interactions, which is quite limited. However, comparison with previous mutagenesis studies on homologues supports the location of the Neu5Ac binding site. The authors tentatively identified a 3rd Na+ binding site, which if true would be an impactful finding, but this site was not sufficiently experimentally tested for its contribution to Na+ dependent transport. This lack of experimental validation prevents the authors from unequivocally assigning this site as a Na+ binding site. However, the reporting of these new data is important as it will facilitate follow up studies by the authors or other researchers. 

      Comments on revisions: 

      Overall, the authors have done a good job of addressing the reviewers' comments. It's good to know that the authors are working on the characterisation of the potential metal binding site mutants - characterizing just a few of these will provide much-needed experimental support for this potential Na+ site. 

      The new MD simulations provide additional support for the new Na+ site and could be included.

      However, as the authors know, direct experimental characterisation of mutants is the ideal evidence of the Na+ site.

      Aside from the characterisation of mutants, which seems to be held up by technical issues, the only remaining issue is the comparison of the Na+- and Na+/Neu5Ac-bound states with ASCT2. It still does not make sense to me why the authors are not directly comparing their Na+ only and Na+/Neu5Ac states with the structures of VcINDY in the Na+-only and Na+/succinate bound states. These VcINDY structures also revealed no conformational changes in the HP loops upon binding succinate, as the authors see for SiaQM. Therefore, this comparison is very supportive. It is understood that the similarity to the DASS structure is mentioned on p.17, but it is also interesting and useful to note that TRAP and DASS transporters also share a lack of substrate-induced local conformational changes, to the extent these things have been measured.

      We acknowledge the summary weakness that experimental data to support the third Na binding site is critical.

      Based on the reviewer’s suggestion, we added the following in the main text and a supplementary figure comparing the Na ion binding sites between VcINDY and SiaQM. Page 13.

      “These two sodium ion binding sites are also conserved in the structure of VcINDY (Supplementary Figure 7) (Sauer et al., 2022). In both cases, the sodium ions are bound at the helix-loop-helix ends of HP1 and HP2. The binding sites utilize both side chains and main chain carbonyl groups. The number of main chain carbonyl interactions suggests that they are critical, and using main chain rather than side chain interactions minimizes the likelihood of point mutations affecting the binding.”

      Reviewer #3 (Public review): 

      The manuscript by Goyal et al reports substrate-bound and substrate-free structures of a tripartite ATP-independent periplasmic (TRAP) transporter from a previously uncharacterized homolog, F. nucleatum. This is one of the most mechanistically fascinating transporter families, by means of its QM domain (the domain reported in this manuscript) operating as a monomeric 'elevator', and its P domain functioning as a substrate-binding 'operator' that is required to deliver the substrate to the QM domain; together, this is termed an 'elevator with an operator' mechanism.

      Remarkably, previous structures had not demonstrated the substrate Neu5Ac bound. In addition, they confirm the previously reported Na+ binding sites, and report a new metal binding site in the transporter, which seems to be mechanistically relevant. Finally, they mutate the substrate binding site and use proteoliposomal uptake assays to show the mechanistic relevance of the proposed substrate binding residues.

      Strengths: 

      The structures are of good quality, the presentation of the structural data has improved, the functional data is robust, the text is well-written, and the authors are appropriately careful with their interpretations. Determination of a substrate bound structure is an important achievement and fills an important gap in the 'elevator with an operator' mechanism.

      Weaknesses: 

      Although the possibility of the third metal site is compelling, I do not feel it is appropriate to model in a publicly deposited PDB structure without directly confirming experimentally. The authors do not extensively test the binding sites due to technical limitations of producing relevant mutants; however, their model is consistent with genetic assays of previously characterized orthologs, which will be of benefit to the field. Finally, some clarifications of EM processing would be useful to readers, and it would be nice to have a figure visualizing the unmodeled lipid densities - this would be important to contextualize to their proposed mechanism.

      Reviewer #3 (Recommendations for the authors): 

      I appreciate the authors' responses to our critiques; the revised manuscript is much improved and has addressed most of my concerns. I look forward to seeing their follow up experiments testing mutational effects. I think MD simulations of ion-binding sites on their own are supportive but by themselves not sufficient to prove the existence of a functional Na+-binding site. Some clarifications in the methods/supplements would satisfy my concerns about data processing and analysis.

      - Unliganded map: were the 141,272 particles used for one class of ab initio? This is unusual, usually multiple ab initio classes are used to further eliminate junk particles. The authors themselves use 6 classes for the substrate-bound dataset.

      We classified the particles into multiple 3-D classes.  There was no improvement in statistics or maps on splitting these further.  Hence, we did not pursue that further. 

      - Substrate-bound map: how did the four 'identical' classes independently refine? Are similar Na+/substrate densities found in each separate class?

      The other classes refined to worse than 4.5 Å resolution. We stopped characterizing them past that point.  We were hoping to see multiple conformations that are different – and hopefully a class where only two sodium ions could be bound.  However, any interpretation at 4.5 Å would be unreliable.

      - Both maps: all ab initio classes prior to final refinement should be displayed in the supplementary workflow, this is common for EM processing diagrams.

      We agree it is common – however, unless there is a good reason to discuss the other classes, we are not convinced of the value of crowding the figures.

      - What specific refinement package and version of Phenix are the authors using? It seems unusual that it is not possible to refine without a metal in Phenix real-space refinement, I have seen many structures where there is no issue refining without critical ions/waters. The authors should double check that they are using the appropriate scattering table for cryo-EM, which should be "electron".

      Sorry for the confusion – we did not mean to say we cannot refine without a metal. If we want to add something to the density, we cannot refine it without suggesting a metal or solvent.  The site without anything added will refine without any issues, but in the absence of additional verification, we cannot be sure of the identity of the ions. We are confident of the metal binding site – but not confident of the exact metal bound.  We used sodium as our first hypothesis.

      We don’t think the scattering factors will help in the identification of the ions. Servalcat, as part of CCP-EM, can produce difference maps, and we believe that identification of ions will require higher resolution (<2.5 Å); at this resolution, we can say that there is a non-protein density but not more than that. We were using “electron” (which we believe is the default with phenix.real_space_refine). The refinement was performed using standard protocols and appropriate scattering factors (Phenix version 1.19x), and we have previously used similar refinement protocols for other maps/models (e.g., Vinothkumar KR, Arya CK, Ramanathan G, Subramanian R. 2021. Comparison of CryoEM and X-ray structures of dimethylformamidase. Progress in Biophysics and Molecular Biology, CryoEM microscopy developments and their biological applications 160:66–78. doi:10.1016/j.pbiomolbio.2020.06.008).

      To convince the reviewer of the quality of the maps, we have added figures that show the model-to-map fit of all of the main secondary structural elements in both the unliganded and the Neu5Ac bound forms.

      - I certainly understand the authors' reluctance to not model the entirety of protein densities; however, I think it would be useful to highlight these densities in the global context of the protein. A common way to show this is to show the density proximal to protein chains in one color, and the remaining densities in a contrasting color (Figure 1 somewhat demonstrates this but it is difficult to tell). I think this would be a nice figure to show the presence and location of unmodeled densities.

      We have modified supplementary figure 3 to include unmodelled densities in panels G and H for both structures.

      - Small detail, "uniform" is misspelled as "unifrom" in supplementary Figure 3. 

      Thank you.  Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We appreciate the positive assessment and agree that the experimental data offer valuable insights into HBV capsid assembly inhibition. Based on the reviewers' suggestions, we have clarified the cryo-EM data and added structural and mechanistic details throughout the manuscript, which we believe significantly enhance its overall clarity and impact. The manuscript now better reflects a promising strategy to interfere with the HBV life cycle. We have carefully addressed all comments to improve both the clarity and quality of the manuscript.

      Response to Public Reviews

      We greatly appreciate the insightful comments and suggestions from the reviewers. Below, we provide responses to the points raised in the public reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this paper, the authors present an interesting strategy to interfere with the HBV life cycle: the preparation of geranyl and peptides' dimers that could impede the correct assembly of hepatitis B core protein HBc into viable capsids. These dimers are of different nature, depending on the HBc site the authors plan to target. A preliminary study with geranyl dimers (targeting a hydrophobic site of HBc) was first investigated. The second series deals with peptide-PEG linker-peptide dimers, targeting the tips of HBc dimer spikes.

      Strengths:

      This work is very well conducted, combining ITC experiments (for determination of dimers' KD), cellular effects (thanks to the grafting of previously developed dimers with polyarginine-based cell penetrating peptide) HBV infected HEK293 cells and Cryo-EM studies.

      The findings of these research teams unambiguously demonstrated the interest of such dimeric structures in impeding the correct HBV life cycle and thus, could bring solutions in the control of its development. Ultimately, a new class of HBV Capsid Assembly Modulators could arise from this study.

      There is no doubt that this work could bring very interesting information for people working on HBV.

      Weaknesses:

      Some minor corrections must be made, especially for a more precise description of the strategy and the chemical structure of the designed new HBV capsid assembly modulators.

      We are grateful for the positive feedback on the experimental design, the combination of ITC, cellular effects, and Cryo-EM studies, and the potential for developing new classes of HBV Capsid Assembly Modulators (CAMs). In the revised version we have clarified the design rationale for the choice of the PEG linker length in the Supplementary Information, linking it to the structural measurements of the capsid. Chemical structures and detailed molecular formulas were added and terms have been corrected. A scrambled dimeric peptide served as a negative control, which showed no binding, confirming the specificity of our designed peptide and ruling out non-specific interactions from other elements of the molecules such as the linkers. Finally, we have revised the nomenclature for the geranyl dimers to better reflect the chemical structure. All figures, including Figure 3, have been updated to high-resolution. All mentioned typos have been corrected. Consultation dates have been added to the website references. HPLC terminology was corrected.

      Reviewer #2 (Public Review):

      Summary:

      Vladimir Khayenko et al. discovered two novel binding pockets on HBc with in vitro binding and electron microscopy experiments. While the geranyl dimer targeting a central hydrophobic pocket displayed a micromolar affinity, the P1-dimer binding to the spike tip of HBc has a nanomolar affinity. In the turbidity assay and at the cellular level, an HBc aggregation from peptide crosslinking was demonstrated.

      Strengths:

      The study identifies two previously unexplored binding pockets on HBc capsids and develops novel binders targeting these sites with promising affinities.

      Weaknesses:

      While the in vitro and cellular HBc aggregation effects are demonstrated, the antiviral potential against HBV infection is not directly evaluated in this study.

      Thank you for recognizing the innovative approach of our work and the potential for developing novel antivirals targeting HBc. We have now included additional discussion on potential future experiments aimed at evaluating the compounds' effects on cellular physiology and viral infectivity.

      Reviewer #3 (public Review):

      Summary:

      HBV is a continuing public health problem and new therapeutics would be of great value. Khayenko et al examine two sites in the HBc dimer as possible targets for new therapeutics. Older drugs that target HBc bind at a pocket between two HBc dimers. In this study Khayenko et al examine sites located in the four helix bundle at the dimer interface.

      The first site is a pocket first identified as a Triton X-100 binding site. The authors suggest it might bind terpenes and use geraniol as an example. They also test a decyl maltose detergent and a geraniol dimer intended for bivalent binding. The KDs were all in the 100µM range. Cryo-EM shows that geraniol binds the targeted site.

      The second site is at the tip of the spike. Peptides based on a 1995 study (reference 43) were investigated. The authors test a core peptide, two longer peptides, and a dimer of the longest peptide. A deep scan of the longest monomer sequence shows the importance of a core amino acid sequence. The dimeric peptide (P1-dimer) binds almost 100 fold better than the monomer parent (P1). Cryo-EM structures confirm the binding site. The dimeric peptide caused HBc capsid aggregation. When HBc expressing cells were treated with active peptide attached to a cell penetrating peptide, the peptide caused aggregation of HBc antigen mirroring experiments with purified proteins.

      Strengths:

      The two sites have not been well investigated. This paper marks a start. The small collection of substrates investigated led to discovery of a dimeric peptide that leads to capsid aggregation, presumably by non-covalent crosslinking. The structures determined could be very useful for future investigations.

      Weaknesses:

      In this draft, the rationale for targets for the Triton X-100 site is not well laid out. The target molecules bind with KDs weaker than 50µM. The way the structural results are displayed, one cannot be sure of the important features of the binding site with respect to the substrate. The peptide site and substrates are better developed, but structural and mechanistic details need to be described in greater detail.

      We appreciate the reviewer’s positive comments on identifying and targeting previously unexplored sites on HBc, and the potential utility of our dimeric peptides in future studies. We have revised the Results section to better explain the rationale behind targeting the hydrophobic binding site. Additionally, the structures have been revised for clearer presentation, and we now emphasize the key features of the binding site and the role of substrate specificity.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      For clarity, the chemical structure of SLLGRM peptide, geraniol and HAP molecules must be indicated, preferably in Fig. 1 (at least in the Supplementary Information section).

      We have now included the chemical structures of the SLLGRM peptide, geraniol, and HAP molecules for clarity in Figure 1 and in the main manuscript to ensure they are easily accessible for reference and to provide further detail and context.

      In the same idea, in Fig. 1 (and in the text): The molecular formula of heteroaryldihydropyrimidine HAP must be clearly indicated, as the nature of the heteroatom (S, O, N?) in this "heteroaryl" derivative is not indicated.

      The full molecular formula of HAP ((2S)-1-[[(4R)-4-(2-chloranyl-4-fluoranyl-phenyl)-5-methoxycarbonyl-2-(1,3-thiazol-2-yl)-1,4-dihydropyrimidin-6-yl]methyl]-4,4-bis(fluoranyl)-pyrrolidine-2-carboxylic acid) is now included in the figure legend.

      "with a polyethylene glycol (PEG) linker that could bridge the distance of 38 Å between the two opposing hydrophobic pockets": what is the rationale of the design of this linker? Authors must explain briefly why/how they have chosen this linker length and nature (please indicate a reference for the appropriate choice of PEG linker). Same remarks for dimers targeting the capsid spike tips, having 50 angstroms PEG linkers. So, the choice of the linker length must be clearly explained and not be only mentioned in the sentence of the discussion part "Using our structural knowledge of the capsid, particularly the distances between the spikes".

      We have now better clarified the rationale for the design of the PEG linker length. The linker lengths were specifically chosen based on structural knowledge of the capsid, particularly the measured distances between the spike tips (60 Å) and the hydrophobic pockets (40 Å). In the Supplementary Information (Supplementary Figure 1), we now clearly explain how these measurements guided the choice of PEG linker length, allowing for optimal bridging and interaction between the binding sites. This supplementary figure now explicitly connects the design rationale to the specific structural features of the capsid.

      I do not agree with the authors when they claim a "nanomolar affinity of 312 nM". To me, a nanomolar affinity would require a few or several tens of nM (but not three hundred) ... So, please correct with "sub-micromolar affinity of 312 nM" and all the other parts of the manuscript (title and caption of Figure 3..., "the peptide dimer (P1dC) with nanomolar affinity" "nanomolar levels"...).

      We thank Reviewer #1 for pointing this out. Since the term "nanomolar affinity" can indeed be interpreted as referring to the lower end of the nanomolar range, rather than values close to 300 nM, we have revised the manuscript to refer to "sub-micromolar affinity" where applicable. This change has been made throughout the manuscript, including the subtitles, figure captions, and text.

      The drug design strategy was to combine two peptides showing low affinity, attached by a PEG linker with an appropriate length, and appears obvious to me. But a control experiment is anyway missing: the peptide-PEG linker derivative (not the dimer peptide-PEG linker-peptide...) should have been evaluated for an unambiguous proof of concept of these dimeric peptides. In my opinion, for the publication of this work, these experiments should be provided (e.g., when describing the affinities of SLLGR dimers). I agree that Cryo-EM experiments provide evidence of the dimer binding, but the affinity values for (peptide-PEG linker) derivatives would bring an additional proof (as the flexible PEG linkers were not resolved by Cryo-EM).

      Thank you for your thoughtful comment regarding the use of a monovalent control for the peptide-PEG linker. A scrambled dimeric peptide serves as a negative control. In ITC it showed no binding at all, thereby ruling out possible nonspecific interactions mediated by the introduced PEG linker or the handle itself.

      Given the complete lack of binding with the scrambled dimeric peptide, we believe an additional monovalent control is not needed, as this result provides strong evidence that the observed binding is driven specifically by the designed peptide sequence and not by the linker or other structural components. We have now made this clarification more explicit in the revised manuscript to avoid any ambiguity. We hope this addresses your concern, and we appreciate your suggestion to further strengthen the rigor of the work. Despite its identical charge, molecular weight, and atom composition, the scrambled control did not cause HBc aggregation in living cells, indicating sequence-specific action of the aggregating dimer.

      The nomenclature of the dimers must be modified because there is no logic between the name "long dimer" and the chemical structure. Particularly, the number of ethylene glycol motifs must be indicated: authors have to find an appropriate nomenclature indicating both the linker length and nature (small molecule or peptide) of the bivalent parts (and hence, do not mention anymore "short geranyl dimer" "long geranyl dimer").

      Thank you for your valuable suggestion regarding the nomenclature of the dimers. We agree that the terms "short geranyl dimer" and "long geranyl dimer" do not fully reflect the chemical structure of the molecules. In response, we have revised the nomenclature to provide a clearer indication of both the linker length and the nature of the bivalent parts. We now refer to the dimers as (Geranyl)<sub>2</sub>-Lys for the dimer with two geranyl groups attached to lysine and (Geranyl-PEG3)<sub>2</sub>-Lys for the dimer with a PEG3 linker (three ethylene glycol units) between the lysine amine and the geranyl groups. These revised names more accurately describe the structural differences and should avoid any ambiguity.

      Lines 198-199: "Among these, the dimerized P1 exhibited a higher occupation of the binding site, as illustrated in Supplementary Figure 9." But in Supp. Fig. 9, dimer P1dC (10) is described. As the text above is describing P1-dimer (9), the Supp. Fig. 9 must be provided, if available. If not, please modify this conclusion accordingly. In the text, when mentioning dimerized P1 peptide, authors must indicate with which compound it deals: (9) or (10)?

      Thank you for your careful reading of the manuscript and for pointing out the discrepancy. In Supplementary Figure 9, the dimer described is P1dC, not P1d. The text has been revised to clarify this. We appreciate your attention to detail.

      Please note that the graphic quality of Figure 3 is bad as it results in pixelized drawings (especially for the chemical structures).

      Thank you for your feedback regarding the quality of Figure 3. We have now updated all figures, including Figure 3, to high-resolution PNG format with 300-500 dpi to ensure optimal graphic quality. This should resolve the pixelization issue, particularly for the chemical structures.

      Minor typos: "clinical studies, a third are CAMs.[6]" "to the spike base hydrophobic pocket" "geraniol affinity to the central hydrophobic pocket, we designed"

      We have corrected the punctuation in the mentioned sentences and appreciate your careful review of the manuscript.

      Concerning the citation of a website (references 5 and 6), I guess that the consultation date should be mentioned.

      We have now updated the references accordingly, including the consultation dates.

      In the Materials and Methods part, Peptide synthesis paragraph, authors must write "semi-preparative HPLC".

      It’s now corrected to "semi-preparative HPLC".

      In the supplementary information file, 1H and 13C NMR spectrum for the small molecule "Short Geranyl Dimer (SGD)" should be provided.

      The purity and identity of this geranyl derivative were confirmed by UV detection in LC-MS and supported by the mass spectra, which provide robust and clear evidence of the compound's structure; LC-MS is a well-accepted method for confirming the structure in this context. While we understand the value of NMR in structural analysis, we believe that additional analytical evidence is not critical for this study.

      Reviewer #2 (Recommendations For The Authors):

      Overall, this study presents an innovative approach to target the HBV core protein and paves the way for developing new classes of antivirals with a distinct mechanism of action. The findings expand the current knowledge of druggable sites on HBc capsids and provide promising lead compounds. Future studies exploring the antiviral effects and optimizing the binders for therapeutic applications would be valuable next steps.

      We sincerely thank the reviewer for the positive assessment of our work and for highlighting its innovative approach to targeting the HBV core protein. We appreciate your recognition of the study's potential in paving the way for developing new classes of antivirals with distinct mechanisms of action. Below, we provide responses to each of the points raised.

      The significance of the central hydrophobic pocket as a target may require additional experiments for validation. Currently, the substrate binding activity is relatively low and appears to have a non-significant impact on HBc.

      We agree that the central hydrophobic pocket exhibits relatively weak binding affinity with the ligands tested in this study. However, we have provided additional structural evidence and affinity data to support its relevance as a druggable site. In recognition of the weak affinity of these small molecules, we expanded our focus to include peptide-based binders, which yielded higher affinities, particularly when dimerized.

      It might be more effective to present Figure 1B after summarizing all the results.

      We understand the reviewer’s suggestion. However, we decided to highlight and summarize the major findings early in the manuscript. We included Figure 1B at the beginning to allow readers to quickly grasp the core concepts and outcomes of our study.

      The labels for P1/P2 are presented in Figure 1A, yet their definitions are not provided until the second part of the Results section.

      We appreciate the reviewer’s observation. While we see a benefit in showing three trackable sites on HBV early and as an overview, we also agree that the early presentation of P1/P2 could lead to some confusion. To resolve this, we have revised the figure to introduce only the minimal peptide to avoid any ambiguity. The full dimer sequences and names are introduced later.

      Further investigation of the cytotoxic potential of peptide-induced HBc aggregation is necessary.

      Investigating the cytotoxicity together with infectivity is an important future direction but outside the scope of this study. We now elaborate on this point in the discussion.

      Reviewer #3 (Recommendations For The Authors):

      Two sites in the dimer interface are shown to bind ligands. It is not shown that filling these regions will change infection. The exhaustive studies by Bruss showed point mutations directly alter infection and would be of value to discuss.

      We thank Rev#3 for this very helpful comment. We now highlight how point mutations in these regions were shown to affect HBV infectivity, thereby providing a link between our findings and how ligand binding might influence the viral life cycle.

      It is not shown whether the two sites interact. Molecular dynamics by Hadden or Gumbart may be informative. The failure to look for a connection between these sites is an oversight.

      We thank Rev#3 for the insightful suggestion to explore potential interactions between the two binding sites. We acknowledge that molecular dynamics (MD) simulations, such as those performed by Gumbart et al. and Hadden et al., could indeed provide valuable insights into the structural dynamics and potential cooperativity between these sites. Indeed, molecular dynamics of the HBV capsid by Perilla and Hadden has demonstrated significant flexibility in the capsid spikes and their interactions with neighboring subunits, suggesting that the dynamics of binding sites could influence ligand accessibility and potential crosstalk.

      We believe that our own previous structural studies, together with the data in this work, provide substantial experimental evidence on this topic. In Makbul et al. 2021a (doi.org/10.3390/microorganisms9050956), we observed that peptide binding (particularly P2) did not stabilize the spikes; instead, the upper part of the spikes exhibited considerable wobbling. This variability mirrored the conformational diversity reported in MD simulations. Using local classification, we noted that the variability in the spike's upper region was greater when P2 was bound than in its absence. Additionally, in Makbul et al. 2021b (doi.org/10.3390/v13112115), we showed that peptide binding had little effect on the hydrophobic pocket beneath the mobile spike region, located in the more rigid part of the capsid. While we observed F97 in the D-monomer adopting two alternate rotamer orientations upon P2 binding, this was not exclusive to P2, as similar changes were noted in the L60V mutant even without bound peptide.

      We have updated the manuscript to briefly discuss this crosstalk, which provides additional context to our findings. Interestingly, only TX100, but not geraniol, completely flipped F97 into an alternate orientation, forming a new π-π stacking interaction with the mobile region of the spike. This finding suggests that interactions within the hydrophobic pocket are transmitted to the tips of the spikes through ligand-specific interactions. It thus supports and refines the concept of crosstalk between binding sites, primarily initiated from the hydrophobic pocket in a ligand-specific fashion.

      The logic for proposing a terpene ligand is strained. Comparisons are made to HBs and the HDV delta antigen. However, HBs is myristoylated not farnesylated and delta antigen binds HBs not HBc.

      We have revised the text to clarify the rationale for testing terpenes as ligands, focusing instead on the specific properties of the hydrophobic pocket targeted by geraniol.

      The authors suggest larger terpenes as binding agents, but there does not appear to be room for a longer molecule in the binding site. The authors do not discuss whether a longer molecule could be modeled in the site based on their density.

      We appreciate this observation and agree that the potential for larger terpenes to bind this site is not obvious from the structural data presented in this work. We have now included a more detailed visualization (Fig 2D) and discussion of the hydrophobic binding pocket, based on the density observed in the presented geraniol structure and the previous Triton structure, and we discuss the implications for the binding of larger hydrophobic molecules in the site.

      The authors note that the structure could explain molecular details of this site, but these are not discussed. A more complete analysis of the geraniol protein is necessary, including an estimate of the resolution of that density.

      We agree that a more complete analysis of the hydrophobic binding site was warranted. We have now expanded the discussion of the structural details of this binding site based on the geraniol-bound structure, including the density and occupancy accounted for by this ligand. These additional details (Fig 2C,D and Fig 5) should provide a clearer understanding of the binding interactions observed.

      The dimeric geraniol is marginally better binding than the monomer, two-fold, but this could be due to doubling the number of geraniols per ligand or due to an undefined interaction of the extended molecule with the surface of the capsid. A geraniol linker should be tested.

      The modest improvement in binding may indeed only reflect the doubled number of geraniols rather than linker-mediated avidity effects. Interaction of the linker with the capsid surface is ruled out by the scrambled control, which included the same linkers but did not show any capacity to bind.

      Is the enhanced binding of dimer due to bivalent binding of dimer to one capsid? Is it a chance interaction of the linker with the surface of HBc, which is easily tested? Is it an avidity effect due to aggregation of capsids?

      Thank you for this insightful question. Our data suggest that the enhanced binding is due to bivalent interactions. To address the possibility of non-specific interactions from either the handle or the linker, we included a scrambled dimeric peptide as a negative control, which showed no binding. This rules out non-specific interactions from the linker or handle. Given this, we believe an additional monovalent control is unnecessary, as the scrambled control confirms that the binding is driven by the geraniol and peptide warheads alone. We have clarified this in the revised manuscript and appreciate your suggestion to strengthen the study.

      The experimental analysis of point mutation of P1 is not analyzed beyond stating that it shows the importance of the core peptide sequence. Is there rationale for the effect of R3 to E and K10 to E mutation?

      We appreciate the reviewer's curiosity and request for a more detailed discussion of the P1 deep mutational scan data and its implications. The observed low mutation tolerance of the core peptide sequence SLLGRM regarding HBc binding is highly consistent with our prior structural data and binding studies in solution (https://doi.org/10.3390/microorganisms9050956), with the results from the original phage library screening (M. R. Dyson, K. Murray, Proceedings of the National Academy of Sciences 1995, 92, 2194–2198), and with the binding data presented here. Notably, the data set does not suggest that additional binding interfaces contribute to the aggregation seen with N-terminal elongated P1 and P2 versus the non-aggregating shorter SLLGRM. While the positional scan largely aligns with the previous phage binding hierarchy and quantified ligands, we had previously noticed, in the same way as Rev#3, surprising affinity gains for positive-to-negative amino acid exchanges in related peptides. Specifically, “SLLGEM” has been predicted, previously and here, to show enhanced affinity over “SLLGRM”. Quantification in solution, however, could not confirm this enhanced HBV binding affinity (Makbul et al. 2021 Microorganisms). In the revised version of the manuscript, we now highlight the possibly limited predictive power of this assay for positions where positively charged residues are exchanged for negatively charged residues (figure legend of Fig 3D).

      The fluctuations in Figure 3B could be largely magnification of noise due to changing the y-axis. The fluctuations can be characterized as standard variation, excluding the injections, to allow a quantitative judgment.

      Isothermal titration calorimetry heat fluctuations without injections are now shown in the supplementary information scaled to the same y-axis (Supplementary Figure 3D). 

      Molecular graphics throughout are too small and poorly labeled.

      We have revised the molecular graphics throughout the manuscript to increase their size and improve labeling for clarity. All figures are now provided at 500 dpi.

      In Figure 2, compounds 1 and 2 are pyrophosphates. The label in the figure should be corrected.

      Thank you for pointing this out. These compounds were removed for clarity.

      In the introduction, the phrase "discontinuation frequently leads to relapse" should be changed to something less ambiguous.

      Thank you for highlighting this point regarding the phrasing in the introduction. We have revised the statement to more accurately reflect the clinical situation by specifying that stopping treatment often results in viral rebound and disease recurrence in many patients. This adjustment clarifies the intended meaning and addresses the ambiguity you identified. We hope this revision better aligns with the clinical context of HBV management and improves the overall clarity of the manuscript.

      Define "functional cure" in the introduction.

      Thank you for your suggestion to clarify the term "functional cure". We have revised the manuscript and, instead of "functional cure", we now refer to the goal of sustained viral suppression without detectable HBV DNA and loss of hepatitis B surface antigen (HBsAg) without the need for continuous therapy. This should provide greater clarity for readers and improve the overall comprehensibility of the introduction.

      The sentence beginning line 92 is not clear unless one has already read the paper. Figure 1 is not well described.

      Thank you for your valuable feedback regarding the clarity of this sentence and the legend of Figure 1. We have revised the text and legend to provide more context and improve the flow for readers who are unfamiliar with the specifics of the study. The revised version now clearly explains the targeted binding sites and the purpose of the bivalent binders at the beginning of the results section.

      In line 235 the meaning is not clear. What is in excess? Is there free CPP in solution? Is it the charge on the CPP?

      We have clarified the passage as requested.

      When describing peptide-induced aggregation, Figures 5 and 6, figure 1B is never referred to. Figure 1B would work better as part of Figure 6.

      We understand the reviewer’s suggestion. However, we decided to highlight and summarize the major findings and the underlying hypothesis early in the manuscript. We included Figure 1B at the beginning to allow readers to quickly grasp a core concept and outcome of our study.

      However, we now refer to Figure 1B in the text and, together with all the other changes, hope that we have improved the clarity and quality of the manuscript.

      We appreciate your constructive feedback and the opportunity to further refine the work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Rigor in the design and application of scientific experiments is an ongoing concern in preclinical (animal) research. Because findings from these studies are often used in the design of clinical (human) studies, it is critical that the results of the preclinical studies are valid and replicable. However, several recent peer-reviewed published papers have shown that some of the research results in cardiovascular research literature may not be valid because their use of key design elements is unacceptably low. The current study is designed to expand on and replicate previous preclinical studies in nine leading scientific research journals. Cardiovascular research articles that were used for examination were obtained from a PubMed Search. These articles were carefully examined for four elements that are important in the design of animal experiments: use of both biological sexes, randomization of subjects for experimental groups, blinding of the experimenters, and estimating the proper size of samples for the experimental groups. The findings of the current study indicate that the use of these four design elements in the reported research in preclinical research is unacceptably low. Therefore, the results replicate previous studies and demonstrate once again that there is an ongoing problem in the experimental design of preclinical cardiovascular research.

      Strengths:

      This study selected four important design elements for study. The descriptions in the text and figures of this paper clearly demonstrate that the rate of use of all four design elements in the examined research articles was unacceptably low. The current study is important because it replicates previous studies and continues to call attention once again to serious problems in the design of preclinical studies, and the problem does not seem to lessen over time.

      Weaknesses:

      The current study uses both descriptive and inferential statistics extensively in describing the results. The descriptive statistics are clear and strong, demonstrating the main point of the study, that the use of these design elements is quite low, which may invalidate many of the reported studies. In addition, inferential statistical tests were used to compare the use of the four design elements against each other and to compare some of the journals. The use of inferential statistical tests appears weak because the wrong tests may have been used in some cases. However, the overall descriptive findings are very strong and make the major points of the study.

      We sincerely appreciate the reviewer’s comments and detailed feedback and their recognition of the importance of this work in replicating previous studies and calling attention to the problems in preclinical study design. In response to the reviewer’s suggestions, we have recalculated our inferential statistics. In place of our previous inferential statistics, we have used an alternative correction calculation for p-values (Holm-Bonferroni corrections) and used median-based linear model analyses and nonparametric Kruskal-Wallis tests that are more appropriate for analyzing this dataset. Our overall trends in results remain the same.

      Reviewer #2 (Public Review):

      Summary

      This study replicates a 2017 study in which the authors reviewed papers for four key elements of rigor: inclusion of sex as a biological variable, randomization of subjects, blinding outcomes, and pre-specified sample size estimation. Here they screened 298 published papers for the four elements. Over a 10 year period, rigor (defined as including any of the 4 elements) failed to improve. They could not detect any differences across the journals they surveyed, nor across models. They focused primarily on cardiovascular disease, which both helps focus the research but limits the potential generalizability to a broader range of scientific investigation. There is no reason, however, to believe rigor is any better or worse in other fields, and hence this study is a good 'snapshot' of the progress of improving rigor over time.

      Strengths

      The authors randomly selected papers from leading journals (e.g., PNAS). Each paper was reviewed by 2 investigators. They pulled papers over a 10-year period, 2011 to 2021, and have a good sample of time over which to look for changes. The analysis followed generally accepted guidelines for a structured review.

      Weaknesses

      The authors did not use the exact same journals as they did in the 2017 study. This makes comparing the results complicated. Also, they pulled papers from 2011 to 2021, and hence cannot assess the impact of their own prior paper.

      The authors write "the proportion of studies including animals of both biological sexes generally increased between 2011 and 2021, though not significantly (R2= 0.0762, F(1,9)= 0.742, p= 0.411 (corrected p=8.2". This statement is not rigorous because the regression result is not statistically significant. Their data supports neither a claim of an increase nor a decrease over time. A similar problem repeats several times in the remainder of their results presentation.

      I think the Introduction and the Discussion are somewhat repetitive and the wording could be reduced.

      Impact and Context

      Lack of reproducibility remains an enormous problem in science, plaguing both basic and translational investigations. With the increased scrutiny on rigor, and requirements at NIH and other funding agencies for more rigor and transparency, one would expect to find increasing rigor, as evidenced by authors including more study design elements (SDEs) that are recommended. This review found no such change, and this is quite disheartening. The data implies that journals (editors and reviewers) will have to increase their scrutiny and standards applied to preclinical and basic studies. This work could also serve as a call to action to investigators outside of cardiovascular science to reflect on their own experiences and when planning future projects.

      We sincerely appreciate the reviewer’s insights and comments and their recognition of our work contributing to the growing body of evidence on the lack of rigor in preclinical cardiovascular research study design. Regarding the weaknesses the reviewer noted, the referenced 2017 publication describes a study by Ramirez et al. that was not conducted by our group. Our study aimed to expand upon their findings by using a more recent timeframe and an alternative list of highly respected cardiovascular research journals. We have now better clarified this distinction in the manuscript. We have also addressed our phrasing regarding the lack of statistical significance in the increase of the proportion of studies including animals of both sexes from 2011-2021.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Many of the methods in this study were strong or adequate. Although the descriptive statistics appear solid, there are significant problems that need to be addressed in the selection and use of inferential statistics.

      (1) One of the design elements that was studied was sample size estimation. This is usually done by a power analysis. The authors should consider what group size for the examined journals is adequate for their statistics to be valid. Or they could report the power of their studies to achieve a given meaningful difference.

      We thank the reviewer for this excellent observation. We unfortunately failed to conduct an a priori power analysis. Previous research (Gupta, et al. 2016) suggests that post-hoc power calculations should not be carried out after the study has been conducted. We acknowledge the importance of establishing a sufficient sample size to draw sound conclusions based on an adequate effect size, and we regret that we did not carry out the appropriate estimations. We are very appreciative of the reviewer’s suggestions and aim to implement such an appropriate study design element in future studies.

      Gupta KK, Attri JP, Singh A, Kaur H, Kaur G. Basic concepts for sample size calculation: Critical step for any clinical trials!. Saudi J Anaesth. 2016;10(3):328-331. doi:10.4103/1658-354X.174918

      (2) A Bonferroni correction was used extensively. Because of its use, the corrected p values often appear much too high. The Bonferroni test becomes much too conservative for more than 3 or 4 tests. I suggest using a different test for multiple comparisons.

      We thank the reviewer for their insightful suggestion. We have updated all p-values to reflect a Holm-Bonferroni correction instead. All p-values have been corrected and updated.
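
      To illustrate the difference between the two corrections discussed in this exchange, here is a minimal sketch of Holm-Bonferroni and plain Bonferroni adjustment. The function names (`holm_adjust`, `bonferroni_adjust`) and the raw p-values are hypothetical and are not taken from the study; the sketch only shows the standard step-down procedure, not the authors' actual analysis code.

```python
def bonferroni_adjust(p_values):
    """Plain Bonferroni: multiply every p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def holm_adjust(p_values):
    """Holm-Bonferroni step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = (m - rank) * p_values[idx]
        # Adjusted values must not decrease along the sorted order.
        running_max = max(running_max, candidate)
        # Cap at 1 so the result remains a valid probability.
        adjusted[idx] = min(1.0, running_max)
    return adjusted


if __name__ == "__main__":
    raw = [0.010, 0.040, 0.030, 0.411]  # hypothetical raw p-values
    print("Bonferroni:", bonferroni_adjust(raw))
    print("Holm:      ", holm_adjust(raw))
```

      With these made-up inputs, Holm leaves the largest p-value unchanged, whereas plain Bonferroni pushes it to the cap of 1. Either way, a properly capped adjusted p-value can never exceed 1.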

      (3) The use of the chi-square test for categorical data is appropriate. However, the t-test and multiple regression tests are designed for continuous variables. Here, it appears that they were used for the nominal variables (Table 1). For these nominal data, other nonparametric tests should be used.

      We thank the reviewer for this valuable insight. We have updated our statistical analysis methods and now use nonparametric Kruskal-Wallis tests, instead of the chi-square test, to analyze differences in SDE reporting across journals. Our reported p-values have been adjusted accordingly.
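
      For readers unfamiliar with the test named here, the following is a minimal sketch of how the Kruskal-Wallis H statistic is computed from pooled ranks, with average ranks for ties and the usual tie correction. The function name `kruskal_wallis_h` and the example groups are hypothetical and do not come from the study; the p-value step (comparing H with a chi-square distribution with k - 1 degrees of freedom) is omitted.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of groups."""
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)

    # Assign each distinct value the average of the ranks it occupies (1-based).
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    # H = 12 / (n (n + 1)) * sum(R_g^2 / n_g) - 3 (n + 1), R_g = rank sum of group g.
    rank_sum_term = sum(
        sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups
    )
    h = 12.0 / (n * (n + 1)) * rank_sum_term - 3.0 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tied blocks of size t.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


if __name__ == "__main__":
    # Hypothetical scores for three groups (e.g., three journals).
    example = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(kruskal_wallis_h(example))
```

      Because the statistic depends only on ranks, it makes no assumption that the underlying values are normally distributed, which is the usual reason for preferring it over a t-test or ANOVA on ordinal or skewed data.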

      (4) It is not clear exactly when each test is used. The stats section in Methods should better delineate when each test is used. In addition, it would be helpful to include the test used in the figure legends.

      We thank the reviewer for bringing up this important point. We have now updated the methods section to better delineate which tests were used, and also included the specific tests in the figure legends.

      (5) You will need to rewrite some sections of the text to reflect the changes due to changing your use of statistics.

      We have rewritten the sections of the text to reflect the changes in our use of statistics.

      Here are a few comments on the presentation.

      (1) Some of the figure legends are almost impossible to read. They are too congested.

      We thank the reviewer for pointing this out. We have edited the figure legends to make them more readable. We will also attach a pdf with the graphs to allow for easier formatting.

      (2) Also, is it possible to drop some of the panels in Figure 1?

      The panels in Figure 1 have been rearranged to make them more readable. We believe that each panel provides a valuable visual summary of our data that will aid readers in understanding our results.

      (3) It is not mandatory that values of y-axis on the graphs go up 100% (Figs 2 and 3). Using a maximum value of 100% clumps the lines visually. I suggest a max value on the y-axis of the graph of 50% or 60%. That will spread the lines better visually so differences can better be seen.

      We thank the reviewer for considering the experience of our paper’s readers. The y-axes of Figures 2 and 3 have been truncated to 50%. The trend lines in each Figure now appear more separated and differences can better be seen.

      Reviewer #2 (Recommendations For The Authors):

      The authors did not use the exact same journals as they did in the 2017 study. This makes comparing the results complicated. Also, they pulled papers from 2011 to 2021, and hence cannot assess the impact of their own prior paper.

      We appreciate the reviewer’s concern in maintaining consistency with the paper published by Ramirez et al. in 2017. To clarify, our efforts focused on providing a replication study that expanded upon the original Ramirez publication, with which we have no affiliation. For our study, we used different academic journals than those used by Ramirez et al., and also a different time frame. We have updated the language in the manuscript to better clarify the purpose and parameters of our study relative to the previous, unaffiliated study.

      The authors write "the proportion of studies including animals of both biological sexes generally increased between 2011 and 2021, though not significantly (R2= 0.0762, F(1,9)= 0.742, p= 0.411 (corrected p=8.2". This statement is not rigorous because the regression result is not statistically significant. Their data supports neither a claim of an increase nor a decrease over time. A similar problem repeats several times in the remainder of their results presentation.

      Thank you for bringing this information to our attention. We agree with the concern regarding the statement, “the proportion of studies including animals of both biological sexes generally increased between 2011 and 2021, though not significantly (R2= 0.0762, F(1,9)= 0.742, p= 0.411 (corrected p=8.2.” We have rephrased the statement. Our updated Holm-Bonferroni corrected p-value is now noted in this more appropriately worded description of our results. Lastly, we have addressed the wording and redundancy seen in both the introduction and discussion and have made both more concise.
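
      As a quick consistency check on the regression statistics quoted above, and assuming a simple linear regression with a single predictor (as the degrees of freedom F(1,9) suggest), the F statistic follows directly from R². Here k is the number of predictors and the residual degrees of freedom are taken to be 9; this is an illustrative calculation, not part of the authors' analysis.

```latex
F = \frac{R^2 / k}{(1 - R^2) / \mathrm{df}_{\text{res}}}
  = \frac{0.0762 / 1}{(1 - 0.0762) / 9}
  = \frac{9 \times 0.0762}{0.9238}
  \approx 0.742
```

      The quoted values of R² and F are therefore mutually consistent, and both describe a fit that explains under 8% of the year-to-year variance.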

      I think the Introduction and the Discussion are somewhat repetitive and the wording could be reduced.

      We thank the reviewer for bringing this to our attention. We have addressed the redundancy across the Introduction and the Discussion. We have also altered the wording to reflect a more concise explanation of our study.

      The 'trends' are not statistically significant. A non-significant trend does not exist and no claim of a 'trend' is justified by the data.

      We thank the reviewer for this observation. We have updated the phrasing of ‘trends’ in all areas of the manuscript.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The authors of this article have previously shown the involvement of the transcription factor Zinc finger homeobox-3 (ZFHX3) in the function of the circadian clock and the development/differentiation of the central circadian clock in the suprachiasmatic nucleus (SCN) of the hypothalamus. Here, they show that ZFHX3 plays a critical role in the transcriptional regulation of numerous genes in the SCN. Using inducible knockout mice, they further demonstrate that the deletion of Zfhx3 induces a phase advance of the circadian clock, both at the molecular and behavioral levels. 

      Strengths: 

      - Inducible deletion of Zfhx3 in adults 

      - Behavioral analysis 

      - Properly designed and analyzed ChIP-Seq and RNA-Seq supporting the conclusion of the behavioral analysis 

      Weaknesses: 

      - Further characterization of the disruption of the activity of the SCN is required. 

      (1) We thank the reviewer for their valuable input. Indeed, a comprehensive behavioral assessment of mice of this genotype was carried out in the Wilcox et al., 2017 study. In Wilcox et al., 2017, Figure 4, the 6-h phase advance (jetlag) experiment clearly showed faster re-entrainment in ZFHX3-KO mice than in controls.

      - The description of the controls needs some clarification. 

      (2) We agree with the reviewer and will modify the text to clearly describe the controls wherever mentioned.

      Reviewer #2 (Public review): 

      Summary: 

      ZFHX3 is a transcription factor expressed in discrete populations of adult SCN and was shown by the authors previously to control circadian behavioral rhythms using either a dominant missense mutation in Zfhx3 or conditional null Zfhx3 mutation using the Ubc-Cre line (Wilcox et al., 2017). In the current manuscript, the authors assess the function of ZFHX3 by using a multi-omics approach including ChIPSeq in wildtype SCNs and RNAseq of SCN tissues from both wildtype and conditional null mice. RNAseq analysis showed a loss of oscillation in Bmal1 and changes in expression levels of other clock output genes. Moreover, a phase advance gene transcriptional profile using the TimeTeller algorithm suggests the presence of a regulatory network that could underlie the observed pattern of advanced activity onset in locomotor behavior in knockout mice. 

      In figure1, the authors identified the ZFHX3 bound sites using ChIPseq and compared the loci with other histone marks that occur at promoters, TSS, enhancers and intergenic regions. And the analysis broadly points to a role for ZFHX3 in transcriptional regulation. The vast majority of nearly 40000 peaks overlapped H3K4me3 and K27ac marks, active promoters which also included genes falling under the GO category circadian rhythms. However, no significant differential ZFHX3 bound peaks were detected between ZT3 and ZT15. In these experiments, it is not clear if and how the different ChIP samples (ZFHX3 and histone PTM ChIPs) were normalized/downsampled for analysis. Moreover, it seems that ZFHX3 binding or recruitment has little to do with whether the promoters are active.

      (3) We thank the reviewer for their valuable comment. The different ChIP samples (ZFHX3 and histone PTM ChIPs) were treated in the same manner, from preprocessing (quality control by FastQC, trimming, alignment to the mm10 genome) through peak calling using MACS2, as mentioned in Methods. The data were normalized using bamCoverage, and bigWig files were generated for visual inspection in the UCSC Genome Browser. These additional details will be added to Methods. Finally, BEDTools was employed to study overlapping peaks between ZFHX3 and histone PTMs.

      We agree that the current data alone do not support a claim that ZFHX3 is crucial for a promoter to be active. Our data do, however, clearly suggest that the vast majority of ZFHX3 genomic binding in the SCN occurs at active promoters marked by H3K4me3 and H3K27ac, where ZFHX3 potentially regulates gene transcription. 
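      As a rough illustration of the peak-overlap step described above, the sketch below reports which peaks in one set overlap any peak in another. It is a conceptual stand-in, not the BEDTools implementation or the authors' pipeline; the function name `overlapping_peaks`, the half-open (chrom, start, end) convention, and the coordinates are assumptions made for the example.

```python
from collections import defaultdict


def overlapping_peaks(peaks_a, peaks_b):
    """Return peaks from peaks_a that overlap at least one peak in peaks_b.

    Peaks are (chrom, start, end) tuples with half-open coordinates,
    as in BED files. Hypothetical helper for illustration only.
    """
    # Group the second peak set by chromosome for quick lookup.
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))

    hits = []
    for chrom, start, end in peaks_a:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < b_end and b_start < end
               for b_start, b_end in by_chrom.get(chrom, ())):
            hits.append((chrom, start, end))
    return hits


if __name__ == "__main__":
    # Made-up coordinates, not real ZFHX3 or H3K4me3 peaks.
    tf_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
    histone_peaks = [("chr1", 150, 300), ("chr2", 200, 250)]
    print(overlapping_peaks(tf_peaks, histone_peaks))
```

      The quadratic scan is adequate for a small example; dedicated interval tools use sorted sweeps or interval trees to handle tens of thousands of peaks efficiently.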

      Based on an enrichment of ARNT domains next to K4Me3 and K27ac PTMs, the authors propose a model where the core-clock TFs and ZFHX3 interact. If the authors develop other assays beyond just predictions to test their hypothesis, it would strengthen the argument for a role in circadian transcription in the SCN. It would be important in this context to perform a ChIP-seq experiment for ZFHX3 in the knockout animal (described from Figure 2 onwards) to eliminate the possibility of non-specific enrichment of signal from "open chromatin". Alternatively, a ChIPseq analysis for BMAL1 or CLOCK could also strengthen this argument to identify the sites co-occupied by ZFHX3 and core-clock TFs. 

      (4a) We agree that follow-up experiments such as BMAL1/CLOCK ChIPseq suggested by the reviewer will further confirm the proposed interaction of ZFHX3 with core-clock TFs. However, this is beyond the scope of the current study. 

      (4b) Again, conducting complementary ChIPseq in ZFHX3 knockout mice would strengthen the findings, but TF-ChIPseq in a specific brain tissue such as the SCN (unlike peripheral tissues such as liver) not only requires multiple animals per sample but is also technically challenging and time-consuming if sample specificity is to be ensured. For these reasons, datasets such as ours on the SCN are uncommon. Furthermore, in this particular context, we are confident, based on the current dataset, that the (narrow) ZFHX3 peaks we observed were well defined and met the specified statistical criteria, mitigating the risk of signal arising from non-specific enrichment at open-chromatin regions. 

      Next, they compared locomotor activity rhythms in floxed mice with or without tamoxifen treatment. As reported before in Wilcox et al 2017, the loss of ZFHX3 led to a shorter free running period and reduced amplitude and earlier onset of activity. Overall, the behavioral data in Figure 2 and supplementary figure 2 has been reported before and are not novel.

      (5) We recognise that a detailed circadian behavior assessment of adult mice lacking ZFHX3 has been conducted previously by the Nolan lab (Wilcox et al., 2017). In the current study, however, we used a separate cohort of mice to focus on the behavioral advance noted in the 24-h LD cycle and to generate a more refined assessment. Importantly, these mice were also used for transcriptomic studies as detailed in Figure 3, which we consider to be a positive feature of our experimental design: behavior and molecular analyses were performed on the same animals. 

      Next, the authors performed RNAseq at 4hr intervals on wildtype and knockout animals maintained in light/dark cycles to determine the impact of loss of ZFHX3. Overall transcriptomic analysis indicated changes in gene expression in nearly 36% of expressed genes, with nearly half being upregulated while an equal fraction was downregulated. Pathways affected included mostly neuropeptide neurotransmitter pathways. Surprisingly, there was no correlation between the direction in change in expression and TF binding since nearly all the sites were bound by ZFHX3 and the active histone PTMs. The ChIP-seq experiment for ZFHX3 in the UBC-Cre+Tam mice again could help resolve the real targets of ZFHX3 and the transcriptional state in knockout animals. 

      (6) We agree with the reviewer that most of the differentially expressed genes showed ZFHX3 binding at active promoter sites. That said, the current dataset is in line with recently published ZFHX3 ChIPseq data from Baca et al., 2024 [PMID: 38412861] in human neural stem cells and Hu et al., 2024 [PMID: 38871709] in human prostate cancer cells, which clearly suggest that ZFHX3 binds at active promoters and acts as a chromatin remodeller/mediator that modulates gene transcription depending on the accessory TFs assembled at target genes. Therefore, finding no correlation with the direction of change in expression is not surprising.

      To determine the fraction of rhythmic transcripts, the authors use dryR to categorise the rhythmic transcriptome into modules that include genes that lose rhythmicity in the KO, gain rhythmicity in the KO or remain unaffected or partially affected. The analysis indicates that a large fraction of the rhythmic transcriptome is affected in the KO model. However, among core-clock genes only Bmal1 expression is affected showing a complete loss of rhythm. The authors state a decrease in Clock mRNA expression (line 294) but the panel figure 4A does not show this data. Instead it depicts the loss in Avp expression - {{ misstated in line 321 ( we noted severe loss in 24-h rhythm for crucial SCN neuropeptides such as Avp (Fig. 3a).}} 

      (7a) Indeed, among the core-clock genes, rhythmic expression is lost after ZFHX3 knockout only for Bmal1. However, given that the mice were rhythmic (as assessed by wheel-running activity) in LD conditions, the observed 24-h gene expression rhythm in the majority of core-clock genes (Pers and Crys) is consistent with the behavioral data and suggests a molecular clock operating under the plausible scenarios explained at line 439. That said, the unique and well-defined changes (in amplitude and phase) demonstrated in Figure 5 highlight a model in which ZFHX3 exerts differential control: for example, Per2 showed an advance in its molecular rhythm (~2 h), whereas Cry showed no such change. This presents an opportunity to further delineate the regulation of TTFL genes. 

      (7b) Line 294 states both the loss of Bmal1 rhythm and the reduction in Clock mRNA. Figure 4a supports the former. We shall revise the text for clarity. 

      (7c) As rightly pointed out by the reviewer, line 321 refers to the loss of Avp expression, and we shall correct the typo by replacing “Figure 3a” with “Figure 4a”. Thank you.  

      However, core-clock genes such as Pers and Crys show minor or no change in expression patterns while Per2 and Per3 show a ~2hr phase advance. While these could only weakly account for the behavioral phase advance, the authors used TimeTeller to assess circadian phase in wildtype and ZFHX3 deficient mice. This approach clearly indicated that while the clock is not disrupted in the knockout animals, the phase advance can be correctly predicted from a network of gene expression patterns. 

      Strengths: 

      The authors use a multiomic strategy in order to reveal the role of the ZFHX3 transcription factor with a combination of TF and histone PTM ChIPseq, time-resolved RNAseq from wildtype and knockout mice and modeling the transcriptomic data using TimeTeller. The RNAseq experiments are nicely controlled and the analysis of the data indicates a clear impact on gene-expression levels in the knockout mice and the presence of a regulatory network that could underlie the advanced activity onset behavior. 

      Weaknesses: 

      It is not clear whether ZFHX3 has a direct role in any of the processes and seems to be a general factor that marks H3K4me3 and K27ac marked chromatin. Why it would specifically impact the core-clock TTFL clock gene expression or indeed daily gene expression rhythms is not clear either. Details for treatment of different ChIP samples (ZFHX3 and histone PTM ChIPs) on data normalization for analysis are needed. The loss of complete rhythmicity of Avp and other neuropeptides or indeed other TFs could instead account for the transcriptional deregulation noted in the knockout mice.

      (8) We thank the reviewer for the constructive feedback. The current data suggest that ZFHX3 acts as a mediating factor, occupying targeted active promoter sites and regulating gene expression by partnering with other key TFs in the SCN. Please see point 7 for clarification. The binding sites of ZFHX3 clearly showed enrichment for the E-box (CACGTG) motif bound by CLOCK/BMAL1, along with binding sites for key SCN-specific TFs such as RFX (please see Supplementary Fig. 1). Our data thereby show that ZFHX3 affects both core-clock and clock output genes (at varied levels), exercising a pervasive control over the SCN transcriptome. 

      For the treatment of ChIP samples, please see point 3. We followed ENCODE guidelines strictly.

    1. Author response:

      We sincerely appreciate the insightful feedback and constructive suggestions provided by the reviewers. We thank reviewers for their valuable support in improving our manuscript.

      In response to the public reviews raised by reviewers, we plan to make the following revisions:

      (1) Most metadata have been rectified through collaborative review of original literature sources rather than automated processes. We intend to incorporate a detailed discussion on this matter in the revised manuscript.

      (2) We will include a corrections table for entries to provide clarity and transparency regarding any amendments made.

      (3) Additional references will be included to elucidate the rationale behind the method used to define interacting residues and the chosen threshold. The threshold is not fixed. In the current version we used a 5 Å cutoff, listing all residues with distances less than 5 Å alongside the corresponding distances. Researchers can then screen the residues by distance according to their own cutoff. To offer further flexibility, we will also provide interacting residues and corresponding distances at higher cutoffs for custom screening. These enhancements will be detailed in the revised manuscript.

      (4) We acknowledge the importance of expanding the database to include a wider range of experimental information and complexes with diverse target sizes. Regrettably, immediate updates to address these limitations are not feasible at this time. Thus, we will give an illustration in the later detailed response to reviewers.
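      The custom screening described above could look something like the following minimal sketch. It assumes the database exports each contact as a (residue label, distance in Å) pair; the function name `screen_residues`, the residue labels, and the distances are hypothetical and not taken from the database.

```python
def screen_residues(residue_distances, cutoff=5.0):
    """Keep residues closer than `cutoff` (in angstroms), nearest first.

    `residue_distances` is an iterable of (residue_label, distance) pairs.
    Hypothetical helper for illustration; the default mirrors the 5 A cutoff
    mentioned in the response.
    """
    kept = [(res, dist) for res, dist in residue_distances if dist < cutoff]
    # Sort by distance so the closest contacts come first.
    return sorted(kept, key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up contacts, not entries from the database.
    contacts = [("TYR101", 4.8), ("ARG45", 3.2), ("GLU77", 6.1)]
    print(screen_residues(contacts))              # default 5 A cutoff
    print(screen_residues(contacts, cutoff=4.0))  # stricter user-defined cutoff
```

      Because the distances are stored alongside the residues, a stricter cutoff can be applied after the fact without recomputing anything, whereas a more permissive cutoff requires the higher-cutoff listings the authors propose to add.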

    1. Author response:

      We very much appreciate the reviewers’ and editor’s overall positive responses to our manuscript "Evolution of lateralized gustation in nematodes".

      Reviewer #1:

      The mechanism of lsy-6-independent establishment of ASEL/R asymmetry in P. pacificus remains uncharacterized. 

      We thank the reviewer for recognizing the novel contributions of our work in revealing the existence of alternative pathways for establishing neuronal lateral asymmetry despite the absence of the lsy-6 miRNA in a divergent nematode species. We are certainly encouraged now to search for genetic factors that abolish asymmetric expression of gcy-22.3.

      Reviewer #2:

      (1) The authors observe only weak attraction of C. elegans to NaCl. These results raise the question of whether the weak attraction observed is the result of the prior salt environment experienced by the worms. More generally, this study does not address how prior exposure to gustatory cues shapes gustatory responses in P. pacificus. Is salt sensing in P. pacificus subject to the same type of experience-dependent modulation as salt sensing in C. elegans? 

      Proposed revision: For our live imaging experiments, we had not considered if starved P. pacificus animals in the presence of salt may exhibit responses different from a well-fed state. However, we will venture to address the effect of experience-dependent modulation in P. pacificus chemotaxis behavior using NH4Cl.

      (2) A key finding of this paper is that the Ppa-CHE-1 transcription factor is expressed in the Ppa-AFD neurons as well as the Ppa-ASE neurons, despite the fact that Ce-CHE-1 is expressed specifically in Ce-ASE. However, additional verification of Ppa-AFD neuron identity is required. Based on the image shown in the manuscript, it is difficult to unequivocally identify the second pair of CHE-1-positive head neurons as the Ppa-AFD neurons. Ppa-AFD neuron identity could be verified by confocal imaging of the CHE-1-positive neurons, co-expression of Ppa-che-1p::GFP with a likely AFD reporter, thermotaxis assays with Ppa-che-1 mutants, and/or calcium imaging from the putative Ppa-AFD neurons. 

      We are happy to provide additional evidence to confirm Ppa-AFD neuron identity since the expression of Ppa-CHE-1 in non-ASE amphid neurons is one of the major differences between the two nematode species.

      Proposed revision: We will provide results showing the Ppa-ttx-1::gfp reporter expression in finger-like neuronal endings and Ppa-TTX-1::ALFA co-localization with Ppa-che-1::gfp in the putative AFD neurons and discuss the possible role of Ppa-CHE-1 in AFD differentiation. We attempted to obtain AFD markers using several reporter strains. However, Ppa-gcy-8.1p::gfp(csuEx101) (PPA24212) showed no expression while Ppa-gcy-8.2p::gfp(csuEx100) (PPA41407) showed expression only in pharyngeal cells.

      (4) The authors show that silencing Ppa-ASE has a dramatic effect on salt chemotaxis behavior. However, these data lack control with histamine-treated wild-type animals, with the result that the phenotype of Ppa-ASE-silenced animals could result from exposure to histamine dihydrochloride. This is an especially important control in the context of salt sensing, where histamine dihydrochloride could alter behavioral responses to other salts. 

      Proposed revision: Thank you for noticing this oversight. The control for histamine-treated wild-type worms in the Ppa-ASE silencing experiments was inadvertently left out in the original submission. Because the HisCl transgene is on a randomly segregating transgene array, we have scored worms with and without the transgene expressing the co-injection marker (Ppa-egl-20p::rfp expressed in the tail) to show that the presence of the transgene is necessary for the knockdown of NH4Br attraction.

      We will also address most of the other more minor suggestions and clarifications sought by the reviewers.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper Kawasaki et al describe a regulatory role for the PIWI/piRNA pathway in rRNA regulation in Zebrafish. This regulatory role was uncovered through a screen for gonadogenesis defective mutants, which identified a mutation in the meioc gene, a coiled-coil germ granule protein. Loss of this gene leads to redistribution of Piwil1 from germ granules to the nucleolus, resulting in silencing of rRNA transcription.

      Strengths:

      Most of the experimental data provided in this paper is compelling. It is clear that in the absence of meioc, PiwiL1 translocates into the nucleolus and results in downregulation of rRNA transcription. The genetic compensation of meioc mutant phenotypes (both organismal and molecular) through reduction in PiwiL1 levels is evidence for a direct role for PiwiL1 in mediating the phenotypes of the meioc mutant.

      Weaknesses:

      Questions remain on the mechanistic details by which PiwiL1 mediated rRNA down regulation, and whether this is a function of Piwi in an unperturbed/wildtype setting. There is certainly some evidence provided in support of the natural function for piwi in regulating rRNA transcription (figure 5A+5B). However, the de-enrichment of H3K9me3 in the heterozygous (Figure 6F) is very modest and in my opinion not convincingly different relative to the control provided. It is certainly possible that PiwiL1 is regulating levels through cleavage of nascent transcripts. Another aspect I found confounding here is the reduction in rRNA small RNAs in the meioc mutant; I would have assumed that the interaction of PiwiL1 with the rRNA is mediated through small RNAs but the reduction in numbers do not support this model. But perhaps it is simply a redistribution of small RNAs that is occurring. Finally, the ability to reduce PiwiL1 in the nucleolus through polI inhibition with actD and BMH-21 is surprising. What drives the accumulation of PiwiL1 in the nucleolus then if in the meioc mutant there is less transcription anyway?

      Despite the weaknesses outlined, overall I find this paper to be solid and valuable, providing evidence for a consistent link between PIWI systems and ribosomal biogenesis. Their results are likely to be of interest to people in the community, and provide tools for further elucidating the reasons for this link.

      The amount of cytoplasmic rRNA in piwi+/- was increased by 26% on average (Figure 5A+5B), the H3K9 ChIP-qPCR signal was decreased by about 26% (Figure 6F), and the Piwil1 ChIP-qPCR signal was decreased by 35% (Figure 6G), so we do not think there is a big discrepancy. On the other hand, the H3K9 ChIP-qPCR signal in meioc<sup>mo/mo</sup> was increased by about 130% (Figure 6F), while the Piwil1 ChIP-qPCR signal was increased by 50%, so there may be a mechanism of H3K9 regulation by Meioc that is not mediated by Piwil1. As for what drives the accumulation of Piwil1 in the nucleolus, although we have found that Piwil1 has affinity for rRNA (Fig. 6A), we do not know what recruits it. Significant increases in the 18-35 nt small RNAs of 18S rRNA, 28S rRNA and R2 were not detected in meioc<sup>mo/mo</sup> testes enriched for 1-8 cell spermatogonia, compared with meioc<sup>+/mo</sup> testes. The nucleolar localization of Piwil1 was revealed in this study and will be a new topic for future research.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors report that Meioc is required to upregulate rRNA transcription and promote differentiation of spermatogonial stem cells in zebrafish. The authors show that upregulated protein synthesis is required to support spermatogonial stem cells' differentiation into multi-celled cysts of spermatogonia. Coiled coil protein Meioc is required for this upregulated protein synthesis and for increasing rRNA transcription, such that the Meioc knockout accumulates 1-2 cell spermatogonia and fails to produce cysts with more than 8 spermatogonia. The Meioc knockout exhibits continued transcriptional repression of rDNA. Meioc interacts with and sequesters Piwil1 to the cytoplasm. Loss of Meioc increases Piwil1 localization to the nucleolus, where Piwil1 interacts with transcriptional silencers that repress rRNA transcription.

      Strengths:

      This is a fundamental study that expands our understanding of how ribosome biogenesis contributes to differentiation and demonstrates that zebrafish Meioc plays a role in this process during spermatogenesis. This work also expands our evolutionary understanding of Meioc and Ythdc2's molecular roles in germline differentiation. In mouse, the Meioc knockout phenocopies the Ythdc2 knockout, and studies thus far have indicated that Meioc and Ythdc2 act together to regulate germline differentiation. Here, in zebrafish, Meioc has acquired a Ythdc2-independent function. This study also identifies a new role for Piwil1 in directing transcriptional silencing of rDNA.

      Weaknesses:

      There are limited details on the stem cell-enriched hyperplastic testes used as a tool for mass spec experiments, and additional information is needed to fully evaluate the mass spec results. What mutation do these testes carry? Does this protein interact with Meioc in the wildtype testes? How could this mutation affect the results from the Meioc immunoprecipitation?

      Stem cell-enriched hyperplastic testes came from wild-type adult sox17::GFP transgenic zebrafish. Sperm were found in these hyperplastic testes, and when stem cells were transplanted, they self-renewed and differentiated into sperm. It is not known if the hyperplasias develop due to a genetic variant in the line. We will add the following comment.

      “The stem cell-enriched hyperplastic testes, which are occasionally found in adult wildtype zebrafish, contain cells at all stages of spermatogenesis. Hyperplasia-derived SSCs self-renewed and differentiated in the same manner as SSCs of normal testes in transplants of aggregates mixed with normal testicular cells.”

      Reviewer #3 (Public review):

      Summary:

      The paper describes the molecular pathway to regulate germ cell differentiation in zebrafish through ribosomal RNA biogenesis. Meioc sequesters Piwil1, a Piwi homolog, which suppresses the transcription of the 45S pre-rDNA by the formation of heterochromatin, to the perinuclear bodies. The key results are solid and useful to researchers in the field of germ cell/meiosis as well as RNA biosynthesis and chromatin.

      Strengths:

      The authors nicely provided the molecular evidence on the antagonism of Meioc to Piwil1 in the rRNA synthesis, which is supported by the genetic evidence that the inability of the meioc mutant to enter meiosis is suppressed by the piwil1 heterozygosity.

      Weaknesses:

      (1) Although the paper provides very convincing evidence for the authors' claim, the scientific contents are poorly written and incorrectly described. As a result, it is hard to read the text. Checking by scientific experts would be highly recommended. For example, on line 38, "the global translation activity is generally [inhibited]", is incorrect and, rather, a sentence like "the activity is lowered relative to other cells" is more appropriate here. See minor points for more examples.

      Thank you for pointing that out. I will correct the parts pointed out.

      (2) In some figures, it is hard for readers outside of zebrafish meiosis to evaluate the results without more explanation and drawing.

      We will refine Figure 1A and add a schematic of the spermatogonia culture system in a supplemental figure. 

      (3) Figure 1E, F, cycloheximide experiments: Please mention the toxicity of the concentration of the drug in cell proliferation and viability.

      When testicular tissue culture was performed at 0.1, 1, 10, 100, 250, and 500mM, abnormally strong OP-puro signals, including in nuclei, were found in cells at 10mM or more. We will add the results to the Supplemental Material. In addition, at 1mM, growth was perturbed in fast-growing cysts of ≥32 spermatogonia, but not in 1-4-cell spermatogonia, as described in L122-125.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      By way of background, the Jiang lab has previously shown that loss of the type II BMP receptor Punt (Put) from intestinal progenitors (ISCs and EBs) caused them to differentiate into EBs, with a concomitant loss of ISCs (Tian and Jiang, eLife 2014). The mechanism by which this occurs was activation of Notch in Put-deficient progenitors. How Notch was upregulated in Put-deficient ISCs was not established in this prior work. In the current study, the authors test whether a very low level of Dl was responsible. But co-depletion of Dl and Put led to a similar phenotype as depletion of Put alone. This result suggested that Dl was not the mechanism. They next investigate genetic interactions between BMP signaling and Numb, an inhibitor of Notch signaling. Prior work from Bardin, Schweisguth and other labs has shown that Numb is not required for ISC self-renewal. However, the authors wanted to know whether loss of both the BMP signal transducer Mad and Numb would cause ISC loss. This result was observed for RNAi depletion from progenitors and for mad, numb double mutant clones. Of note, ISC loss was observed in 40% of mad, numb double mutant clones, whereas 60% of these clones had an ISC. They then employed a two-color tracing system called RGT to look at the outcome of ISC divisions (asymmetric (ISC/EB) or symmetric (ISC/ISC or EB/EB)). Control clones had 69%, 15% and 16%, respectively, whereas mad, numb double mutant clones had much lower ISC/ISC (11%) and much higher EB/EB (37%). They conclude that loss of Numb in moderate BMP loss-of-function mutants increased symmetric differentiation, which led to ISC loss. They also reported that numb<sup>15</sup> and numb<sup>4</sup> clones had a moderate but significant increase in ISC-lacking clones compared to control clones, supporting the model that Numb plays a role in ISC maintenance. Finally, they investigated the relevance of these observations during regeneration. After bleomycin treatment, there was a significant increase in ISC-lacking clones and a significant decrease in clone size in numb<sup>4</sup> and numb<sup>15</sup> clones compared to control clones. Because bleomycin treatment has been shown to cause variation in BMP ligand production, the authors interpret the numb clone under bleomycin results as demonstrating an essential role of Numb in ISC maintenance during regeneration.

      Strengths:

      (i) Most data is quantified with statistical analysis

      (ii) Experiments have appropriate controls and large numbers of samples

      (iii) Results demonstrate an important role of Numb in maintaining ISC number during regeneration and a genetic interaction between Mad and Numb during homeostasis.

      Weaknesses:

      (i) No quantification for Fig. 1

      Thank you for your suggestion. Quantification of Fig. 1 will be added.

      (ii) The premise is a bit unclear. Under homeostasis, strong loss of BMP (Put) leads to loss of ISCs, presumably regardless of Numb level (which was not tested). But moderate loss of BMP (Mad) does not show ISC loss unless Numb is also reduced. I am confused as to why numb does not play a role in Put mutants. Did the authors test whether concomitant loss of Put and Numb leads to even more ISC loss than Put-mutation alone.

      Thank you for your comment. We have tested the genetic interaction between punt and numb using punt RNAi and numb RNAi driven by esg<sup>ts</sup>. According to the results in this study and our previously published data, a punt mutant clone or esg<sup>ts</sup>> punt RNAi can induce a rapid loss of ISCs (within 8 days). We did not observe that numb RNAi further enhanced the stem cell loss phenotype caused by punt RNAi.

      (iii) I think that the use of the word "essential" is a bit strong here. Numb plays an important role but in either during homeostasis or regeneration, most numb clones or mad, numb double mutant clones still have ISCs. Therefore, I think that the authors should temper their language about the role of Numb in ISC maintenance.

      Thank you. We will revise the language.

      Reviewer #2 (Public review):

      Summary:

      This work assesses the genetic interaction between the Bmp signaling pathway and the factor Numb, which can inhibit Notch signalling. It follows up on the previous studies of the group (Tian, Elife, 2014; Tian, PNAS, 2014) regarding BMP signaling in controlling stem cell fate decision as well as on the work of another group (Sallé, EMBO, 2017) that investigated the function of Numb on enteroendocrine fate in the midgut. This is an important study providing evidence of a Numb-mediated back up mechanism for stem cell maintenance.

      Strengths:

      (1) Experiments are consistent with these previous publications while also extending our understanding of how Numb functions in the ISC.

      (2) Provides an interesting model of a "back up" protection mechanism for ISC maintenance.

      Weaknesses:

      (1) Aspects of the experiments could be better controlled or annotated:

      (a) As they "randomly chose" the regions analyzed, it would be better to have all from a defined region (R4 or R2, for example) or to at least note the region as there are important regional differences for some aspects of midgut biology.

      Thank you. Since we mainly focus on region 4, we have added the clarification in the manuscript.

      (b) It is not clear to me why MARCM clones were induced and then flies grown at 18{degree sign}C? It would help to explain why they used this unconventional protocol.

      To avoid spontaneous clones, we kept the flies at 18°C.

      (2) There are technical limitations with trying to conclude from double-knockdown experiments in the ISC lineage, such as those in Figure 1 where Dl and put are both being knocked down: depending on how fast both proteins are depleted, it may be that only one of them (put, for example) is inactivated and affects the fate decision prior to the other one (Dl) being depleted. Therefore, it is difficult to definitively conclude that the decision is independent of Dl ligand.

      In our hands, Dl-RNAi is very effective and caused loss of N pathway activity, as determined by the N pathway reporter Su(H)-lacZ (Fig. 1D). Therefore, the ectopic Su(H)-lacZ expression in Punt Dl double RNAi (Fig. 1E) is unlikely to be due to residual Dl expression. Nevertheless, we will change the statement “BMP signaling blocks ligand-independent N activity” to “Loss of BMP signaling results in ectopic N pathway activity even when Dl is depleted”.

      (3) Additional quantification of many phenotypes would be desired.

      (a) It would be useful to see esg-GFP cells/total cells and not just field as the density might change (2E for example).

      We focused on the R4 region for quantification, where the cell density did not change appreciably between experimental groups. In addition, we examined many guts for quantification. It is therefore unlikely that the difference in esg+ cell number is caused by a change in cell density.

      (b) Similarly, for 2F and 2G, it would be nice to see the % of ISC/ total cell and EB/total cell and not only per esgGFP+ cell.

      Unfortunately, we didn’t have the suggested quantification. However, we believe that quantification of the percentage of ISC or EB among all progenitor cells, as we did here, provides a faithful measurement of the self-renewal status of each experimental group.

      (c) Fig1: There is no quantification - specifically it would be interesting to know how many esg+ are su(H)lacZ positive in Put- Dl- condition compared to WT or Put- alone. What is the n?

      Quantification will be added.

      (d) Fig2: Pros + cells are not seen in the image? Are they all DllacZ+?

      Anti-Pros and anti-E(spl)mβ-CD2 were stained in the same channel (magenta). Pros+ appears as nuclear dot-like staining, while CD2 outlines the cell membrane of EB cells.

      (e) Fig3: it would be nice to have the clone size quantification instead of the distribution between groups of 2-cell, 3-cell and 4-cell clones.

      Thank you for your suggestion. In this study, we have quantified the clone size of each clone and calculated the average size for each genotype. However, the frequency distribution analysis was chosen because it highlights the significance of the clone size differences among genotypes.
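
      The difference between the two summaries can be illustrated with a minimal sketch. It assumes a hypothetical binning of clone sizes (2, 3, 4-8 and >8 cells) and uses made-up clone sizes purely for illustration; neither is taken from the manuscript. Two genotypes with the same mean clone size can still have very different size distributions, which a single average would hide.

```python
from collections import Counter

# Hypothetical size bins (inclusive bounds; None means open-ended).
DEFAULT_BINS = ((2, 2), (3, 3), (4, 8), (9, None))


def bin_label(lo, hi):
    """Human-readable label for one size bin."""
    if hi is None:
        return f">{lo - 1}"
    return str(lo) if lo == hi else f"{lo}-{hi}"


def clone_size_distribution(sizes, bins=DEFAULT_BINS):
    """Percentage of clones in each size bin.

    Clones that fall outside every bin (e.g. single-cell clones)
    are ignored and do not count towards the total.
    """
    labels = [bin_label(lo, hi) for lo, hi in bins]
    counts = Counter()
    for size in sizes:
        for (lo, hi), label in zip(bins, labels):
            if size >= lo and (hi is None or size <= hi):
                counts[label] += 1
                break
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in labels}
    return {label: 100.0 * counts[label] / total for label in labels}


# Illustrative (invented) clone sizes with identical means.
genotype_a = [2, 2, 2, 2, 12]
genotype_b = [4, 4, 4, 4, 4]

mean_a = sum(genotype_a) / len(genotype_a)
mean_b = sum(genotype_b) / len(genotype_b)
dist_a = clone_size_distribution(genotype_a)
dist_b = clone_size_distribution(genotype_b)
```

      Here both invented genotypes have a mean clone size of 4 cells, yet one is dominated by 2-cell clones and the other by mid-sized clones, so the binned view separates them where the mean does not.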

      (f) How many times were experiments performed?

      All experiments were performed 3 times.

      (4) The authors do not comment on the reduction of clone size in DSS treatment in Figure 6K. How do they interpret this? Does it conflict with their model of Bleo vs DSS?

      Guts containing numb<sup>4</sup> clones and treated with DSS exhibited a slight reduction in clone size, evident from a higher percentage of 2-cell clones and a lower percentage of >8-cell clones. This reduction is less significant in guts containing numb<sup>15</sup> clones. However, the percentage of Dl<sup>+</sup>-containing clones is similar between DSS- and mock-treated guts. It is possible that ISC proliferation is slightly reduced due to the numb<sup>4</sup> mutation or the genetic background.

      (5) There is probably a mistake on sentence line 314 -316 "Indeed, previous studies indicate that endogenous Numb was not undetectable by Numb antibodies that could detect Numb expression in the nervous system".

      We will correct the sentence.

      Reviewer #3 (Public review):

      Summary:

      The authors provide an in-depth analysis of the function of Numb in adult Drosophila midgut. Based on RNAi combinations and double mutant clonal analyses, they propose that Numb has a function in inhibiting Notch pathway to maintain intestinal stem cells, and is a backup mechanism with BMP pathway in maintaining midgut stem cell mediated homeostasis.

      Strengths:

      Overall, this is a carefully constructed series of experiments, and the results and statistical analyses provide believable evidence that Numb has a role, albeit weak compared to other pathways, in sustaining ISC and in promoting regeneration especially after damage by bleomycin, which may damage enterocytes and therefore disrupt BMP pathway more. The results overall support their claim.

      The data are highly coherent, and support a genetic function of Numb, in collaborating with BMP signaling, to maintain the number and proliferative function of ISCs in adult midguts. The authors used appropriate and sophisticated genetic tools of double RNAi, mutant clonal analysis and dual marker stem cell tracing approaches to ensure the results are reproducible and consistent. The statistical analyses provide confidence that the phenotypic changes are reliable albeit weaker than many other mutants previously studied.

      Weaknesses:

      In the absence of Numb itself, the midgut has a weak reduction of ISC number (Fig. 3 and 5), as well as a weak, albeit not statistically significant, reduction of ISC clone size/proliferation. I think the authors published similar experiments with BMP pathway mutants. The mad<sup>1-2</sup> allele used here, as stated below, may not be very representative of other BMP pathway mutants. Therefore, it could be beneficial to compare ISC numbers and clone sizes with those from other BMP experiments to provide the readers with a clearer picture of how these two pathways individually contribute (stronger/weaker effects) to the ISC number and gut homeostasis.

      Thank you for your comment. We have tested other components of the BMP pathway in our previous study (Tian et al., 2014). More complete loss of BMP signaling (for example, Put clones, Put RNAi, Tkv/Sax double mutant clones or double RNAi) resulted in ISC loss regardless of the status of numb, suggesting a more predominant role of BMP signaling in ISC self-renewal compared with Numb. We speculate that the weak stem cell loss phenotype associated with numb mutant clones in an otherwise wild-type background could be due to fluctuation of BMP signaling in homeostatic guts.

      The main weakness of this manuscript is the analysis of the BMP pathway components, especially the mad<sup>1-2</sup> allele. The mad RNAi and mad<sup>1-2</sup> alleles (P insertion) are supposed to be weak alleles and that might be suitable for genetic enhancement assays here together with numb RNAi. However, the mad<sup>1-2</sup> allele, and sometimes the mad RNAi, showed weakly increased ISC clone size. This is kind of counter-intuitive that they should have a similar ISC loss and ISC clone size reduction.

      We used mad<sup>1-2</sup> and mad RNAi here to test the genetic interaction with numb because our previous studies showed that partial loss of BMP signaling under these conditions did not cause stem cell loss and may therefore provide a sensitized background in which to determine the role of Numb in ISC self-renewal. The increased ISC proliferation/clone size associated with mad<sup>1-2</sup> and mad RNAi is due to the fact that reduction of BMP signaling in either ECs or EBs non-autonomously induces stem cell proliferation. However, in mad numb double mutant clones, there was a reduction in clone size, which correlated with loss of ISCs.

      A much stronger phenotype was observed when numb mutants were subjected to treatment with the tissue-damaging agent Bleomycin, which causes damage in different ways than DSS. Bleomycin was previously shown to cause mainly enterocyte damage, and is therefore more likely to disrupt BMP signaling from ECs. Therefore, this treatment together with loss of numb led to a highly significant reduction of ISC in clones and reduction of clone size/proliferation. One improvement is that it is not clear whether the authors discussed the nature of the two numb mutant alleles used in this study and the comparison to the strength of the RNAi allele. Because the phenotypes are weak and more variable, the use of specific reagents is important.

      Numb<sup>15</sup> is a null allele, and the nature of numb<sup>4</sup> has not been elucidated. According to Domingos, P.M. et al., numb<sup>15</sup> induced a more severe phenotype than numb<sup>4</sup> did. Consistently, we also found that more numb<sup>15</sup> mutant clones than numb<sup>4</sup> mutant clones were devoid of stem cells.

      Furthermore, the use of possible activating alleles of either or both pathways to test genetic enhancement or synergistic activation will provide strong support for the claims.

      Activation of BMP (Tkv<sup>CA</sup>) also induced stem cell tumors (Tian et al., 2014), which makes it unsuitable for synergistic activation experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study offers a useful treatment of how the population of excitatory and inhibitory neurons integrates principles of energy efficiency in their coding strategies. The analysis provides a comprehensive characterisation of the model, highlighting the structured connectivity between excitatory and inhibitory neurons. However, the manuscript provides an incomplete motivation for parameter choices. Furthermore, the work is insufficiently contextualized within the literature, and some of the findings appear overlapping and incremental given previous work.

      We are genuinely grateful to the Editors and Reviewers for taking time to provide extremely valuable suggestions and comments, which will help us to substantially improve our paper. We decided to do our very best to implement all suggestions, as detailed in the point-by-point rebuttal letter below. We feel that our paper has improved considerably as a result. 

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary: Koren et al. derive and analyse a spiking network model optimised to represent external signals using the minimum number of spikes. Unlike most prior work using a similar setup, the network includes separate populations of excitatory and inhibitory neurons. The authors show that the optimised connectivity has a like-to-like structure, leading to the experimentally observed phenomenon of feature competition. They also characterise the impact of various (hyper)parameters, such as adaptation timescale, ratio of excitatory to inhibitory cells, regularisation strength, and background current. These results add useful biological realism to a particular model of efficient coding. However, not all claims seem fully supported by the evidence. Specifically, several biological features, such as the ratio of excitatory to inhibitory neurons, which the authors claim to explain through efficient coding, might be contingent on arbitrary modelling choices. In addition, earlier work has already established the importance of structured connectivity for feature competition. A clearer presentation of modelling choices, limitations, and prior work could improve the manuscript.

      Thanks for these insights and for this summary of our work.  

      Major comments:

      (1) Much is made of the 4:1 ratio between excitatory and inhibitory neurons, which the authors claim to explain through efficient coding. I see two issues with this conclusion: (i) The 4:1 ratio is specific to rodents; humans have an approximate 2:1 ratio (see Fang & Xia et al., Science 2022 and references therein); (ii) the optimal ratio in the model depends on a seemingly arbitrary choice of hyperparameters, particularly the weighting of encoding error versus metabolic cost. This second concern applies to several other results, including the strength of inhibitory versus excitatory synapses. While the model can, therefore, be made consistent with biological data, this requires auxiliary assumptions.

      We now describe better the ratio of numbers of E and I neurons found in real data, as suggested. The first submission already contained an analysis of how the optimal ratio of E vs I neuron numbers depends in our model on the relative weighting of the loss of E and I neurons and on the relative weighting of the encoding error vs the metabolic cost in the loss function (see Fig. 7E). We revised the text on page 12 describing Fig. 7E. 

      To allow readers to easily form a clear idea of how the weighting of the error vs the cost may influence the optimal network configuration, we now present how optimal parameters depend on the weighting in a systematic way, by always including this type of analysis when studying all other model parameters (time constants of single E and I neurons, noise intensity, metabolic constant, ratio of mean I-I to E-I connectivity). These results are shown in Supplementary Fig. S4 A-D and H, and we comment briefly on each of them in the Results sections (pages 9, 10, 11 and 12) that analyze each of these parameters.

      Following this Reviewer’s comment, we now include a joint analysis of network performance relative to the ratio of E-I neuron numbers and the ratio of mean I-I to E-I connectivity (Fig. 7J). We found a positive correlation between the optimal values of these two ratios. This implies that a lower ratio of E-I neuron numbers, such as the 2:1 ratio in human cortex mentioned by the reviewer, predicts a lower optimal ratio of I-I to E-I connectivity and thus weaker inhibition in the network. We made sure that this finding is suitably described in the revision (page 13).

      (2) A growing body of evidence supports the importance of structured E-I and I-E connectivity for feature selectivity and response to perturbations. For example, this is a major conclusion from the Oldenburg paper (reference 62 in the manuscript), which includes extensive modelling work. Similar conclusions can be found in work from Znamenskiy and colleagues (experiments and spiking network model; bioRxiv 2018, Neuron 2023 (ref. 82)), Sadeh & Clopath (rate network; eLife, 2020), and Mackwood et al. (rate network with plasticity; eLife, 2021). The current manuscript adds to this evidence by showing that (a particular implementation of) efficient coding in spiking networks leads to structured connectivity. The fact that this structured connectivity then explains perturbation responses is, in the light of earlier findings, not new.

      We agree that the main contribution of our manuscript in this respect is to show how efficient coding in spiking networks can lead to structured connectivity implementing lateral inhibition similar to that proposed in the recent studies mentioned by the Reviewer. We apologize if this was not clear enough in the previous version. We streamlined the presentation to make it clearer in revision.  We nevertheless think it useful to report the effects of perturbations within this network because these results give information about how lateral inhibition works in our network. Thus, we kept presenting it in the revised version, although we de-emphasized and simplified its presentation. We now give more emphasis to the novelty of the derivation of this connectivity rule from the principles of efficient coding (pages 4 and 6). We also describe better (page 8) what the specific results of our simulated perturbation experiments add to the existing literature.

      (3) The model's limitations are hard to discern, being relegated to the manuscript's last and rather equivocal paragraph. For instance, the lack of recurrent excitation, crucial in neural dynamics and computation, likely influences the results: neuronal time constants must be as large as the target readout (Figure 4), presumably because the network cannot integrate the signal without recurrent excitation. However, this and other results are not presented in tandem with relevant caveats.

      We improved the Limitations paragraph in Discussion, and also anticipated caveats in tandem with results when needed, as suggested. 

      We now mention the assumption of equal time constants between the targets and readouts in the Abstract. 

      We have now added the analysis of the network performance and dynamics as a function of the time constant of the target (τ<sub>x</sub>) to Supplementary Fig. S5 (C-E). These results are briefly discussed in the text on page 13. The only measure sensitive to τ<sub>x</sub> is the encoding error of E neurons, with a minimum at τ<sub>x</sub> = 9 ms, while the encoding error of I neurons and the metabolic cost show no dependency. Firing rates, variability of spiking, as well as the average and instantaneous balance show no dependency on τ<sub>x</sub>. We note that τ<sub>x</sub> = τ, with τ = 1/λ the time constant of the population readout (Eq. 9), is an assumption we use when we derive the model from the efficiency objective (Eq. 18 to 23). In our new and preliminary work (Koren, Emanuel, Panzeri, bioRxiv 2024), we derived a more general class of models where this assumption is relaxed, which gives a network with E-E connectivity that adapts to the time constant of the stimulus. Thus, the reviewer is correct in the intuition that the network requires E-E connectivity to better integrate target signals with a different time constant than the time constant of the membrane. We now better emphasize this limitation in Discussion (page 16).

      (4) On repeated occasions, results from the model are referred to as predictions claimed to match the data. A prediction is a statement about what will happen in the future – but most of the “predictions” from the model are actually findings that broadly match earlier experimental results, making them “postdictions”.

      This distinction is important: compared to postdictions, predictions are a much stronger test because they are falsifiable. This is especially relevant given (my impression) that key parameters of the model were tweaked to match the data.

      We now comment on every result from the model as either matching earlier experimental results, or being a prediction for experiments. 

      In Section “Assumptions and emergent properties of the efficient E-I network derived from first principles”, we report (page 4) that neural networks have connectivity structure that relates to tuning similarity of neurons (postdiction). 

      In Section “Encoding performance and neural dynamics in an optimally efficient E-I network” we report (page 5) that in a network with optimal parameters, I neurons have higher firing rate than E neurons (postdiction), that single neurons show temporally correlated synaptic currents (postdiction) and that the distribution of firing rates across neurons is log-normal (postdiction). 

      In Section “Competition across neurons with similar stimulus tuning emerging in efficient spiking networks” we report (page 6)  that the activity perturbation of E neurons induces lateral inhibition on other E neurons, and that the strength of lateral inhibition depends on tuning similarity (postdiction). We show that activity perturbation of E neurons induces lateral excitation in I neurons (prediction). We moreover show that the specific effects of the perturbation of neural activity rely on structured E-I-E connectivity (prediction for experiments, but similar result in Sadeh and Clopath, 2020). We show strong voltage correlations but weak spike-timing correlations in our network (prediction for experiments, but similar result in Boerlin et al. 2013). 

      In Section “The effect of structured connectivity on coding efficiency and neural dynamics”, we report (page 7) that our model predicts a number of differences between networks with structured and unstructured (random) connectivity. In particular, structured networks differ from unstructured ones by showing better encoding performance, lower metabolic cost, weaker variance over time in the membrane potential of each neuron, lower firing rates and weaker average and instantaneous balance of synaptic currents.

      In Section “Weak or no spike-triggered adaptation optimizes network efficiency”, we report (page 9) that our model predicts better encoding performance in networks with adaptation compared to facilitation. Our results suggest that adaptation should be stronger in E compared to I (PV+) neurons (postdiction). In the same section, we report (page 10) that our results suggest that the instantaneous balance is a better predictor of model efficiency than average balance (prediction).

      In Section “Non-specific currents regulate network coding properties”, we report (page 10) that our model predicts that more than half of the distance between the resting potential and firing threshold is taken by external currents that are unrelated to feedforward processing (postdiction). We also report (page 11) that our model predicts that moderate levels of uncorrelated (additive) noise is beneficial for efficiency (prediction for experiments, but similar results in Chalk et al., 2016, Koren et al., 2017, Timcheck et al. 2022).

      In Section “Optimal ratio of E-I neuron numbers and of mean I-I to E-I synaptic efficacy coincide with biophysical measurements”, we predict the optimal ratio of E to I neuron numbers to be 4:1 (postdiction) and the optimal ratio of mean I-I to E-I connectivity to be 3:1 (postdiction). Further, we report (page 13) that our results predict that a decrease in the ratio of E-I neuron numbers is accompanied by a decrease in the ratio of mean I-I to E-I connectivity.

      Finally, in Section “Dependence of efficient coding and neural dynamics on the stimulus statistics”, we report (page 13) that our model predicts that the efficiency of the network has almost no dependence on the time scale of the stimulus (prediction). 

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors present a biologically plausible, efficient E-I spiking network model and study various aspects of the model and its relation to experimental observations. This includes a derivation of the network into two (E-I) populations, the study of single-neuron perturbations and lateral-inhibition, the study of the effects of adaptation and metabolic cost, and considerations of optimal parameters. From this, they conclude that their work puts forth a plausible implementation of efficient coding that matches several experimental findings, including feature-specific inhibition, tight instantaneous balance, a 4 to 1 ratio of excitatory to inhibitory neurons, and a 3 to 1 ratio of I-I to E-I connectivity strength. It thus argues that some of these observations may come as a direct consequence of efficient coding.

      Strengths:

      While many network implementations of efficient coding have been developed, such normative models are often abstract and lacking sufficient detail to compare directly to experiments. The intention of this work to produce a more plausible and efficient spiking model and compare it with experimental data is important and necessary in order to test these models.

      In rigorously deriving the model with real physical units, this work maps efficient spiking networks onto other more classical biophysical spiking neuron models. It also attempts to compare the model to recent single-neuron perturbation experiments, as well as some longstanding puzzles about neural circuits, such as the presence of separate excitatory and inhibitory neurons, the ratio of excitatory to inhibitory neurons, and E/I balance. One of the primary goals of this paper, to determine if these are merely biological constraints or come from some normative efficient coding objective, is also important.

      Though several of the observations have been reported and studied before (see below), this work arguably studies them in more depth, which could be useful for comparing more directly to experiments.

      Thanks for these insights and for the kind words of appreciation of the strengths of our work.  

      Weaknesses:

      Though the text of the paper may suggest otherwise, many of the modeling choices and observations found in the paper have been introduced in previous work on efficient spiking models, thereby making this work somewhat repetitive and incremental at times. This includes the derivation of the network into separate excitatory and inhibitory populations, discussion of physical units, comparison of voltage versus spike-timing correlations, and instantaneous E/I balance, all of which can be found in one of the first efficient spiking network papers (Boerlin et al. 2013), as well as in subsequent papers. Metabolic cost and slow adaptation currents were also presented in a previous study (Gutierrez & Deneve 2019). Though it is perfectly fine and reasonable to build upon these previous studies, the language of the text gives them insufficient credit.

      We indeed built our work on these important previous studies, and we apologize if this was not clear enough. We thus improved the text to make sure that credit to previous studies is more precisely and more clearly given (see detailed reply for the list of changes made). 

      To facilitate the understanding of how we built on previous work, we expanded the comparison of our results with the results of Boerlin et al. (2013) about voltage correlations and uncorrelated spiking (page 7), the comparison with the derivation of physical units of Boerlin et al. (2013) (page 3), the discussion of how results on the ratio of the number of E to I neurons relate to Calaim et al. (2022) and Barrett et al. (2016) (page 16), and the comment on the previous work by Gutierrez and Deneve about adaptation (page 8).

      Furthermore, the paper makes several claims of optimality that are not convincing enough, as they are only verified by a limited parameter sweep of single parameters at a time, are unintuitive and may be in conflict with previous findings of efficient spiking networks. This includes the following. 

      Coding error (RMSE) has a minimum at intermediate metabolic cost (Figure 5B), despite the fact that intuitively, zero metabolic cost would indicate that the network is solely minimizing coding error and that previous work has suggested that additional costs bias the output. 

      Coding error also appears to have a minimum at intermediate values of the ratio of E to I neurons (effectively the number of I neurons) and the number of encoded variables (Figures 6D, 7B). These both have to do with the redundancy in the network (number of neurons for each encoded variable), and previous work suggests that networks can code for arbitrary numbers of variables provided the redundancy is high enough (e.g., Calaim et al. 2022). 

      Lastly, the performance of the E-I variant of the network is shown to be better than that of a single cell type (1CT: Figure 7C, D). Given that the E-I network is performing a similar computation as to the 1CT model but with more neurons (i.e., instead of an E neuron directly providing lateral inhibition to its neighbor, it goes through an interneuron), this is unintuitive and again not supported by previous work. These may be valid emergent properties of the E-I spiking network derived here, but their presentation and description are not sufficient to determine this.

      With regard to the concern that our previous analyses considered optimal parameter sets determined with a sweep of a single parameter at a time, we have addressed this issue in two ways. First, we presented (Fig. 6I and 7J and text on pages 11 and 13) results of joint sweeps of pairs of parameters whose joint variations are expected to influence optimality in a way that cannot be understood by varying one parameter at a time. These new analyses complement the joint parameter sweep of the time constants of single E and I neurons (τ<sub>r</sub><sup>E</sup> and τ<sub>r</sub><sup>I</sup>) that was already presented in Fig. 5A (former Fig. 4A). Second, we conducted, within a reasonable/realistic range of possible variations of each individual parameter, a Monte-Carlo random joint sampling (10000 simulations with 20 trials each) of all 6 model parameters that we explored in the paper. We present these new results in Fig. 2 and discuss them on pages 5-6.
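
      A random joint sweep of this kind can be sketched as follows. This is only an illustration under stated assumptions: the parameter names, their ranges, and the `placeholder_loss` function (a noisy quadratic standing in for one simulated trial of the network) are all hypothetical and are not taken from the manuscript.

```python
import random

# Hypothetical names and ranges for six jointly sampled parameters.
PARAM_RANGES = {
    "tau_r_E": (5.0, 50.0),
    "tau_r_I": (5.0, 50.0),
    "noise_sigma": (0.0, 10.0),
    "metabolic_beta": (0.0, 30.0),
    "ratio_E_to_I_neurons": (1.0, 8.0),
    "ratio_II_to_EI_weights": (1.0, 6.0),
}


def sample_parameters(rng):
    """Draw every parameter independently and uniformly from its range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}


def placeholder_loss(params, rng):
    """Stand-in for one simulated trial (NOT the network model).

    Normalised squared distance from each range midpoint plus trial noise.
    """
    loss = 0.0
    for name, (lo, hi) in PARAM_RANGES.items():
        midpoint = 0.5 * (lo + hi)
        loss += ((params[name] - midpoint) / (hi - lo)) ** 2
    return loss + rng.gauss(0.0, 0.01)


def random_joint_search(n_samples, n_trials, seed=0):
    """Return the sampled parameter set with the lowest trial-averaged loss."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_samples):
        params = sample_parameters(rng)
        mean_loss = sum(placeholder_loss(params, rng) for _ in range(n_trials)) / n_trials
        if mean_loss < best_loss:
            best_params, best_loss = params, mean_loss
    return best_params, best_loss


# Small illustrative run; the response reports 10000 samples with 20 trials each.
best_params, best_loss = random_joint_search(n_samples=200, n_trials=5, seed=1)
```

      Averaging over trials before comparing parameter sets keeps a single lucky noisy trial from being mistaken for the optimum, and sampling all parameters at once exposes interactions that one-at-a-time sweeps cannot.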

      The Reviewer is correct in stating that the error (RMSE) exhibits a counterintuitive minimum as a function of the metabolic constant despite the fact that, intuitively, for vanishing metabolic constant the network is solely minimizing the coding error (Fig. 6B). In our understanding, this counterintuitive finding is due to the presence of noise in the membrane potential dynamics. In the presence of noise, a non-vanishing metabolic constant is needed to suppress “inefficient” spikes purely induced by noise that do not contribute to coding and increase the error. This gives rise to a form of “stochastic resonance”, where the noise improves detection of the signal coming from the feedforward currents. We note that the metabolic constant and the noise variance both appear in the non-specific external current (Eq. 29f in Methods), and, thus, a covariation in their optimal values is expected. Indeed, we find that the optimal metabolic constant monotonically increases as a function of the noise variance, with stronger regularization (larger beta) required to compensate for larger variability (larger sigma) (Fig. 6I). Finally, we note that a moderate level of noise (which, in turn, induces a non-trivial minimum of the coding error as a function of beta) in the network is optimal. The beneficial effect of moderate levels of noise on performance in networks with efficient coding has been shown in different contexts in previous work (Chalk et al. 2016, Koren and Deneve, 2017). The intuition is that the noise prevents the excessive synchronization of the network and insufficient single neuron variability that decrease the performance. The points above are now explained in the revised text on page 11.

      The Reviewer is also correct in stating that the network exhibits an optimal performance for intermediate values of the number of I neurons and the number of encoded features. In our understanding, the optimal number of encoded features of M=3 arises simply because all the other parameters were optimized for that value of M. The purpose of those analyses was not to state that a network optimally encodes only a given number of features, but to show how a network whose parameters are optimized for a given M performs reasonably well when M is varied. We clarify this on page 13 of Results and in Discussion on page 16. In the same Discussion paragraph we also refer to the results of Calaim et al. mentioned by the Reviewer.

      To address the concern about the comparison of efficiency between the E-I and the 1CT model, we took advantage of the Reviewer’s suggestions to consider this issue more deeply. In revision, we now compare the efficiency of the 1CT model with the E population of the E-I model (Fig. 8H). This new comparison changes the conclusion about which model is more efficient, as it shows the 1CT model is slightly more efficient than the E-I model. Nevertheless, the E-I model performance is more robust to small variations of optimal parameters, e.g., it exhibits biologically plausible firing rates for non-optimal values of the metabolic constant. See also the reply to point 3 of the Public Review of Reviewer 2 for more detail. We added these results and the ensuing caveats for the interpretation of this comparison on Page 14, and also revised the title of the last subsection of Results.  

      Alternatively, the methodology of the model suggests that ad hoc modeling choices may be playing a role. For example, an arbitrary weighting of coding error and metabolic cost of 0.7 to 0.3, respectively, is chosen without mention of how this affects the results. Furthermore, the scaling of synaptic weights appears to be controlled separately for each connection type in the network (Table 1), despite the fact that some of these quantities are likely linked in the optimal network derivation. Finally, the optimal threshold and metabolic constants are an order of magnitude larger than the synaptic weights (Table 1). All of these considerations suggest one of the following two possibilities. One, the model has a substantial number of unconstrained parameters to tune, in which case more parameter sweeps would be necessary to definitively make claims of optimality. Or two, parameters are being decoupled from those constrained by the optimal derivation, and the optima simply correspond to the values that should come out of the derivation.

      We thank the reviewer for raising these important questions.

      In the first submission, we presented the encoding error and the metabolic cost separately as a function of the parameters, so that readers could understand how stable the optimal parameters are to changes in the relative weighting of encoding error and metabolic cost. We specified this in Results (page 5), and we continue to present the encoding and metabolic terms separately in the revision.

      However, we agree that it is important to present an explicit quantification of how the optimal parameters depend on g<sub>L</sub>. In the first submission, we showed the analysis for all possible weightings for the two parameters for which we found this analysis most relevant: the ratio of neuron numbers (Fig. 7E, Fig. 6E in first submission) and the optimal number of input features M (see last paragraph on page 13 and Fig. 8D). We now show this analysis also for the remaining model parameters studied, in Supplementary Fig. S4 (A-D and H). This is discussed on pages 9, 10, 11 and 12.

      With regard to the concern that the scaling of synaptic weights should not be controlled separately for each connection type in the network, we agree and we would like to clarify that we did not control such scaling separately. Apologies if this was not clear enough. From the optimal analytical solution, we obtained that the connectivity scales with the standard deviation of decoding weights (s<sub>w</sub><sup>E</sup> and s<sub>w</sub><sup>I</sup>) of the pre- and postsynaptic populations (Methods, Eq. 32). We studied the network properties as a function of the ratio of average I-I to E-I connectivity (Fig. 7 F-I; Supplementary Fig. S4 D-H), which is equivalent to the ratio of standard deviations s<sub>w</sub><sup>I</sup> /s<sub>w</sub><sup>E</sup> (see Methods, Eq. 35). We clarified this in the text on page 12.

      Next, it is correct that our synaptic weights are an order of magnitude smaller than the metabolic constant. We analysed a simpler version of the network that has coding and dynamics identical to our full model (Methods, Eq. 25) but without the external currents. We found that the optimal parameters determining the firing threshold in such a simpler network were biologically implausible (see Supplementary Text 2 and Supplementary Table S1). As another simple solution, we considered rescaling the synaptic efficacy so as to obtain a biologically plausible threshold. However, that gave an implausible mean synaptic efficacy (see Supplementary Text 2). Thus, to be able to define a network with a biologically plausible firing threshold and mean synaptic efficacy, we introduced the non-specific external current. After introducing this current, we were able to shift the firing threshold to biologically plausible values while keeping realistic values of mean synaptic efficacy. Biologically plausible values for the firing threshold are around 15 to 20 mV above the resting potential (Constantinople & Bruno, 2013), which is the value that we have in our model. A plausible value for the average synaptic strength is between a fraction of a millivolt and a couple of millivolts (Constantinople & Bruno, 2013; Campagnola et al., 2022), which also corresponds to the values that the synaptic weights take. The above results are briefly explained in the revised text on page 4.

      Finally, to study the optimality of the network when changing multiple parameters at a time, we added a new analysis with Monte-Carlo random joint sampling (10,000 parameter sets with 20 trials for each set) of all 6 model parameters that we explored in the paper. We compared (Fig. 2) the results of each simulation with those obtained by varying one or two parameters at a time (optimal parameters reported in Table 1 and used throughout the paper). We found (Fig. 2) that the optimal configuration in Table 1 was never improved upon by any other simulation we performed, and that the first three random simulations that came the closest to the optimal one of Table 1 had stronger noise intensity but also stronger metabolic cost than the configuration in Table 1. The second, third and fourth configurations had longer time constants of both E and I single neurons (adaptation time constants). The ratio of E-I neuron numbers and the ratio of I-I to E-I connectivity in the second, third and fourth best configurations were either jointly increased or jointly decreased with respect to our configuration. These results are reported in Fig. 2 and in Tables 2-3 and are discussed in Results (page 5).
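
The joint sampling procedure described above can be sketched as follows. This is only an illustrative sketch, not the simulation code used for the paper: `toy_loss` is a hypothetical stand-in for one simulation trial of the network, the uniform sampling range [0, 1] is an assumption, and the number of parameter sets is reduced for speed.

```python
import numpy as np

def toy_loss(params, rng):
    """Hypothetical stand-in for one simulation trial (NOT the network model).

    A quadratic bowl around an arbitrary optimum plus trial-to-trial noise.
    """
    target = np.full(params.shape, 0.5)
    return float(np.sum((params - target) ** 2) + 0.01 * rng.standard_normal())

def joint_random_search(n_sets, n_trials, n_params, seed=0):
    """Draw all parameters jointly at random and rank sets by mean loss."""
    rng = np.random.default_rng(seed)
    # Assumed sampling ranges: every parameter uniform in [0, 1].
    param_sets = rng.uniform(0.0, 1.0, size=(n_sets, n_params))
    mean_loss = np.empty(n_sets)
    for i, params in enumerate(param_sets):
        # Average the loss over independent trials for this parameter set.
        mean_loss[i] = np.mean([toy_loss(params, rng) for _ in range(n_trials)])
    order = np.argsort(mean_loss)  # best (lowest loss) first
    return param_sets[order], mean_loss[order]

# 6 parameters and 20 trials per set as in the text; 200 sets instead of 10,000.
ranked_params, ranked_loss = joint_random_search(n_sets=200, n_trials=20, n_params=6)
```

The top-ranked rows of `ranked_params` can then be compared with a reference configuration, in the spirit of the comparison with Table 1.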

      Reviewer #3 (Public Review):

      Summary:

      In their paper the authors tackle three things at once in a theoretical model: how can spiking neural networks perform efficient coding, how can such networks limit the energy use at the same time, and how can this be done in a more biologically realistic way than previous work?

      They start by working from a long-running theory on how networks operating in a precisely balanced state can perform efficient coding. First, they assume split networks of excitatory (E) and inhibitory (I) neurons. The E neurons have the task to represent some lower dimensional input signal, and the I neurons have the task to represent the signal represented by the E neurons. Additionally, the E and I populations should minimize an energy cost represented by the sum of all spikes. All this results in two loss functions for the E and I populations, and the networks are then derived by assuming E and I neurons should only spike if this improves their respective loss. This results in networks of spiking neurons that live in a balanced state, and can accurately represent the network inputs.

      They then investigate in-depth different aspects of the resulting networks, such as responses to perturbations, the effect of following Dale's law, spiking statistics, the excitation (E)/inhibition (I) balance, optimal E/I cell ratios, and others. Overall, they expand on previous work by taking a more biological angle on the theory and showing the networks can operate in a biologically realistic regime.

      Strengths:

      (1) The authors take a much more biological angle on the efficient spiking networks theory than previous work, which is an essential contribution to the field.

      (2) They make a very extensive investigation of many aspects of the network in this context, and do so thoroughly.

      (3) They put sensible constraints on their networks, while still maintaining the good properties these networks should have.

      Thanks for this summary and for these kind words of appreciation of the strengths of our work.  

      Weaknesses:

      (1) The paper has somewhat overstated the significance of their theoretical contributions, and should make much clearer what aspects of the derivations are novel. Large parts were done in very similar ways in previous papers. Specifically: the split into E and I neurons was also done in Boerlin et al (2008) and in Barrett et al (2016). Defining the networks in terms of realistic units was already done by Boerlin et al (2008). It would also be worth it to discuss Barrett et al (2016) specifically more, as there they also use split E/I networks and perform biologically relevant experiments.

      We improved the text to make sure that credit to previous studies is more precisely and more clearly given (see rebuttal to the specific suggestions of Reviewer 2 for a full list).

      We apologize if this was not clear enough in the previous version. 

      With regard to the specific point raised here about the E-I split, we revised the text on page 2. With regard to the realistic units, we revised the text on page 3. Finally, we commented on the relation between our results and those of Barrett et al. (2016) on page 16.

      (2) It is not clear from an optimization perspective why the split into E and I neurons and following Dale's law would be beneficial. While the constraints of Dale's law are sensible (splitting the population in E and I neurons, and removing any non-Dalian connection), they are imposed from biology and not from any coding principles. A discussion of how this could be done would be much appreciated, and in the main text, this should be made clear.

      We indeed removed non-Dalian connections because Dale’s law is a major constraint for biological plausibility. Our logic was to consider efficient coding within the space of networks that satisfy this (and other) biological plausibility constraints. We did not intend to claim that removing the non-Dalian connections was the result of an analytical optimization. We clarified this in revision (page 4).

      (3) Related to the previous point, the claim that the network with split E and I neurons has a lower average loss than a 1 cell-type (1-CT) network seems incorrect to me. Only the E population coding error should be compared to the 1-CT network loss, or the sum of the E and I populations (not their average). In my author recommendations, I go more in-depth on this point.

      We carefully considered these possibilities and decided to compare only the E population of the E-I model with the 1-CT model. In Fig. 8G (7C of the first submission), E neurons have a slightly higher error and cost than the 1-CT network. In the revision, we compared the loss of the E neurons of the E-I model with the loss of the 1-CT model. With this comparison, we found that the 1-CT network has a lower loss and is more efficient than the E neurons of the E-I model. We revised Figure 8H and the text on page 14 to address this point.

      (4) While the paper is supposed to bring the balanced spiking networks they consider in a more experimentally relevant context, for experimental audiences I don't think it is easy to follow how the model works, and I recommend reworking both the main text and methods to improve on that aspect.

      We tried to make the presentation of the model more accessible to a non-computational audience in the revised paper. We carefully edited the text throughout to make it as accessible as possible. 

      Assessment and context:

      Overall, although much of the underlying theory is not necessarily new, the work provides an important addition to the field. The authors succeeded well in their goal of making the networks more biologically realistic, and incorporating aspects of energy efficiency. For computational neuroscientists, this paper is a good example of how to build models that link well to experimental knowledge and constraints, while still being computationally and mathematically tractable. For experimental readers, the model provides a clearer link between efficient coding spiking networks to known experimental constraints and provides a few predictions.

      Thanks for these kind words. We revised the paper so that these points emerge more clearly and in a more accessible way.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Referring to the major comments:

      (1) Be upfront about particular modelling choices and why you made them; avoid talk of a "striking/surprising", etc. ability to explain data when this actually requires otherwise-arbitrary choices and auxiliary assumptions. Ideally, this nuance is already clear from the abstract.

      We removed all the "striking/surprising" and similar expressions from the text. 

      We added to the Abstract the assumption of equal time constants of the stimulus and of the membrane of E and I neurons and the assumption of the independence of encoded stimulus features.

      In revision, we performed additional analyses (joint parameter sweeps, Monte-Carlo joint sampling of all 6 model parameters) providing additional evidence that the network parameters in Table 1 capture the optimal solution reasonably well. These are reported in Figs. 2, 6I and 7J and in Results (pages 5, 11 and 13). See the rebuttal to the weaknesses in the public review of Reviewer 2 for details.

      (2) Make even more of an effort to acknowledge prior work on the importance of structured E-I and I-E connectivity.

      We have revised the text (page 4) to better place our results within previous work on structured E-I and I-E connectivity.

      (3) Be clear about the model's limitations and mention them throughout the text. This will allow readers to interpret your results appropriately.

      We now comment more on the model's limitations, in particular the simplifying assumption about the network's computation (page 16), the lack of E-E connectivity (page 3), the absence of long-term adaptation (page 10), and the simplification of having only one type of inhibitory neuron (page 16).

      (4) Present your "predictions" for what they are: aspects of the model that can be made consistent with the existing data after some fitting. Except in the few cases where you make actual predictions, which deserve to be highlighted.

      We followed the suggestion of the reviewer and distinguished cases where the model is consistent with the data (postdictions) from actual predictions, where empirical measurements are not available or not conclusive. We compiled a list of predictions and postdictions in response to point 4 of Reviewer 1. In revision, we now describe every property of the model as either reproducing a known property of biological networks (postdiction) or being a prediction. We improved the text in Results on pages 4, 5, 6, 7, 9, 10, 11, 12 and 13 to accommodate these requests.

      Minor comments and recommendations

      It's a sizable list, but most can be addressed with some text edits.

      (1) The image captions should give more details about the simulations and analyses, particularly regarding sample sizes and statistical tests. In Figure 5, for example, it is unclear if the lines represent averages over multiple signals and, if so, how many. It's probably not a single realization, but if it is, this might explain the otherwise puzzling optimal number of three stimuli. Box plots visualize the distribution across simulation trials, but it's not clear how many. In Figure 7d, a star suggests statistical significance, but the caption does not mention the test or its results; the y-axis should also have larger limits.

      All statistical results were computed on 100 or 200 simulation trials, depending on the figure, with each trial lasting 1 second of simulated time. To compute statistical results in Fig. 1, we used 10 trials of 10 seconds each. Each trial consisted of M independent realizations of Ornstein-Uhlenbeck (OU) processes as stimuli, independent noise in the membrane potential and an independent draw of tuning parameters, so that the results generalize across specific realizations of these random variables. Realizations of the OU processes were independent across stimulus dimensions and across trials. We added this information to the caption of each figure.
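
As an illustration of the stimulus generation described above, the following minimal sketch draws M independent OU processes with a simple Euler-Maruyama scheme. The function name, the 1 ms time step, the noise amplitude and the 10 ms time constant used in the example call are assumptions made for illustration; they are not taken from the simulation code.

```python
import numpy as np

def simulate_ou(n_features, n_steps, dt, tau, sigma, rng):
    """Simulate independent Ornstein-Uhlenbeck processes (Euler-Maruyama).

    tau may be a scalar or one time constant per feature (in seconds).
    Returns an array of shape (n_features, n_steps), starting at zero.
    """
    tau = np.broadcast_to(np.asarray(tau, dtype=float), (n_features,))
    x = np.zeros((n_features, n_steps))
    for t in range(1, n_steps):
        noise = rng.standard_normal(n_features)  # independent per feature
        drift = -(x[:, t - 1] / tau) * dt        # decay towards zero mean
        x[:, t] = x[:, t - 1] + drift + sigma * np.sqrt(dt) * noise
    return x

rng = np.random.default_rng(1)
# One trial: M = 3 features, 1 second of simulated time at an assumed dt of 1 ms.
stimuli = simulate_ou(n_features=3, n_steps=1000, dt=0.001, tau=0.01, sigma=1.0, rng=rng)
```

A fresh call with the same generator gives an independent trial. Passing an array such as `tau=[0.01, 0.02, 0.04]` gives each stimulus dimension its own time constant.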

      The optimal number of M=3 stimuli is the result of measuring the performance of the network in 100 simulation trials (for each parameter value), thus following the same procedure as for all other parameters. Boxplots in Fig. 8G-H were also generated from results computed in 100 simulation trials, which we have now specified in the caption of the figure, together with the statistical test used for assessing the significance (two-tailed t-test). We also enlarged the limits of Fig. 8H (7D in the previous version).

      (2) The Oldenburg paper (reference 62) finds suppression of all but nearby neurons in response to two-photon stimulation of small neural ensembles (instead of single neurons, as in Chettih & Harvey). This isn't perfectly consistent with the model's results, even though the Oldenburg experiments seem more relevant given the model's small size, and strong connectivity/high connection probability between similarly tuned neurons. What might explain the potential mismatch?

      We sincerely apologize for not having been precise enough on this point when comparing our model against Chettih & Harvey and Oldenburg et al. We corrected the sentence (page 6) to remove the claim that our model reproduces both. 

      We speculate that the discrepancy between perturbing our model and the Oldenburg data may arise from the lack of E-E connectivity in our model. Synaptic connections between E neurons with similar selectivity could create enhancement instead of suppression between neuronal pairs with very similar tuning. We added a sentence about this in the section with perturbation experiments “Competition across neurons with similar stimulus tuning emerging in efficient spiking networks” (page 7), where we discuss this limitation of our model. We feel that this example shows the utility of deriving perturbation results from our model, as not all networks with some degree of lateral inhibition will show the same perturbation results. Comparing our model's perturbation results with those from real data thus has some value for better appreciating the strengths and limitations of our approach.

      (3) "Previous studies optogenetically stimulated E neurons but did not determine whether the recorded neurons were excitatory or inhibitory " (p. 11). I believe Oldenburg et al. did specifically image excitatory neurons.

      The reviewer is correct about Oldenburg et al. imaging specifically excitatory neurons. We have revised this part of the Discussion (page 15). 

      (4) The authors write that efficiency is particularly achieved where adaptation is stronger in E compared to I neurons (p. 7; Figure 4). Although this would be consistent with experimental data (the I neurons in the model seem akin to fast-spiking Pv+ cells), I struggle to see it in the figure. Instead, it seems like there are roughly two regimes. If either of the neuronal timescales is faster than the stimulus timescale, the optimisation fails. If both are at least as slow, optimisation succeeds.

      We agree with the reviewer that the adaptation properties of our inhibitory neurons are compatible with Pv+ cells. What is essential for determining the dynamical regime of the network is less the relation to the time constant of the stimulus (t<sub>x</sub>) but rather the relation between the time constant of the population readout (t, which is also the membrane time constant) and the time constant of the single neuron (t<sub>r</sub><sup>y</sup> for y=E and y=I; see Eq. 23, 25 or 29e). The relation between t and t<sub>r</sub><sup>y</sup> determines if single neurons generate spike-triggered adaptation (t<sub>r</sub><sup>y</sup> > t) or spike-triggered facilitation (t<sub>r</sub><sup>y</sup> < t; see Table 4). In regimes with facilitation in either E or I neurons (or both), the network performance strongly deteriorates compared to regimes with adaptation (Fig. 5A). 
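
The rule stated above can be written as a small helper. This is only an illustrative sketch: the function name and the sign convention (a spike-triggered kernel whose strength is proportional to 1/t − 1/t<sub>r</sub><sup>y</sup>) are assumptions chosen so that the output matches the classification in the text; the actual expressions are those of Eq. 23, 25 and 29e.

```python
def spike_triggered_regime(tau, tau_r):
    """Classify the single-neuron regime from two time constants.

    tau   : time constant of the population readout (membrane time constant)
    tau_r : time constant of the single neuron
    Assumed convention: the kernel strength is proportional to the
    difference of inverse time constants, 1/tau - 1/tau_r.
    """
    strength = 1.0 / tau - 1.0 / tau_r
    if strength > 0:       # tau_r > tau
        return "adaptation"
    if strength < 0:       # tau_r < tau
        return "facilitation"
    return "none"          # equal time constants: no spike-triggered current

# Illustrative values in seconds (not the values of Table 1).
regime_e = spike_triggered_regime(tau=0.010, tau_r=0.020)
regime_i = spike_triggered_regime(tau=0.010, tau_r=0.005)
```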

      Beyond adaptation leading to better performance, we also found different effects of adaptation in E and I neurons. We acknowledge that the difference between these effects was difficult to see in Fig. 4B of the first submission. We have now replotted the results previously shown in Fig. 4B to focus on the adaptation regime only (since Fig. 5A already establishes that this is the regime with better performance). We also added figures showing the differential effect of adaptation in the E and I cell types on the firing rate and on the average loss (Fig. 5C-D). Fig. 5B and C (top plots) show that with adaptation in E neurons, the error and the loss increase more slowly than with adaptation in I neurons. Moreover, the firing rate in both cell types decreases with adaptation in E neurons, while this is not the case with adaptation in I neurons (Fig. 5D). These results are added to the figure panels specified above and discussed in the text on page 9.

      To clarify the relation between neuronal and stimulus timescale, we now also added the analysis of network performance as a function of the time constant of the stimulus t<sub>x</sub> (Supplementary Fig. S5 C-E). We found that the model's performance is optimal when the time constant of the stimulus is close to the membrane time constant t. This result is expected, because the equality of these time constants was imposed in our analytical derivation of the model (t<sub>x</sub>  = t). We see a similar decrease in performance for values of t<sub>x</sub>  that are faster and slower with respect to the membrane time constant (Supplementary Fig. S5C, top). These results are added to the figure panels specified above and discussed in text on page 13.

      (5) A key functional property of cortical interneurons is their lower stimulus selectivity. Does the model replicate this feature?

      We think that whether I neurons are less selective than E neurons is still an open question. A number of recent empirical studies reported that the selectivity of I neurons is comparable to the selectivity of E neurons (see, e.g., Kuan et al. Nature 2024, Runyan et al. Neuron 2010, Najafi et al. Neuron 2020). In our model, the optimal solution prescribes a precise structure in recurrent connectivity (see Eq. 24 and Fig. 1C(ii)), and this structured connectivity endows I neurons with stimulus selectivity. To show this, we added plots of example tuning curves and the distribution of the selectivity index across E and I neurons (Fig. 8E-F) and described these new results in Results (page 14). Tuning curves in our network were similar to those computed in a previous work that addressed stimulus tuning in efficient spiking networks (Barrett et al. 2016). We evaluated tuning curves using M=3 constant stimulus features, varying one of the features while the other two were kept fixed. We provided details on how the tuning curves and the selectivity index were computed in a new Methods subsection (“Tuning curves and selectivity index”) on page 50.

      (6) The final panels of Figure 4 are presented as an approach to test the efficiency of biological networks. The authors seem to measure the instantaneous (and time-averaged) E-I balance while varying the adaptation parameter and then correlate this with the loss. If that is indeed the approach (it's difficult to tell), this doesn't seem to suggest a tractable experiment. Also, the conclusion is somewhat obvious: the tighter the single neuron balance, the fewer unnecessary spikes are fired. I recommend that the authors clearly explain their analysis and how they envision its application to biological data.

      We indeed measured the instantaneous (and time-averaged) E-I balance while varying the adaptation parameters and then correlated this with the loss. We did not want to imply that the latter panels of Figure 4 are a means to test the efficiency of biological networks or that we are suggesting new and possibly unfeasible experiments. We see it as a way to better understand conceptually how spike-triggered adaptation helps the network’s coding efficiency, by tightening the E-I balance in a way that reduces the number of unnecessary spikes. We apologize if the previous text was confusing in this respect. We have now removed the initial paragraph of the former Results subsection (including its title) and added new text about the different effects of adaptation in E and I neurons on page 9. We also thoroughly revised Figure 5.

      (7) The external stimuli are repeatedly said to vary (or be tracked) across "multiple time scales", which might inadvertently be interpreted as (i) a single stimulus containing multiple timescales or (ii) simultaneously presented stimuli containing different timescales. These scenarios are potential targets for efficient coding through neuronal adaptation (reference 21 in the manuscript and Pozzorini et al. Nat. Neuro. 2013), but they are not addressed in the current model. I recommend the authors clarify their statements regarding timescales (and if they're up for it, acknowledge this as a limitation).

      We thank the reviewer for bringing up this interesting point. To address the second point raised by the Reviewer (simultaneously presented stimuli containing different timescales), we performed new analyses to test the model with simultaneously presented stimuli that have different timescales. We found that the model efficiently encodes such stimuli. We tested the case of a 3-dimensional stimulus where each dimension is an Ornstein-Uhlenbeck process with a different time constant. More precisely, we kept the time constant in the first dimension fixed (at 10 ms) and varied the time constants in the second and third dimensions such that the time constant in the third dimension is double that of the second dimension. We plotted the encoding error in every stimulus dimension for E and I neurons (Fig. 8B, left plot) as well as the encoding error and the metabolic cost averaged across stimulus dimensions (Fig. 8B, right plot). The results are briefly described in the text on page 13.

      Regarding case (i) (a single stimulus containing multiple timescales), we considered two possibilities. One possibility is that the timescales of the stimulus are separable, in which case a single stimulus containing several timescales can be decomposed into several stimuli with a single timescale each. As we assign a new set of weights to each dimension of the decomposed stimulus, this case is similar to case (ii), which we already addressed. Another possibility is that the timescales of the stimulus cannot be separated. This case is not covered in the present analysis and we listed it among the limitations of the model. We revised the text (page 13) around the question of multiple timescales and included the citation of Pozzorini et al. (2013).

      (8) It is claimed that the model uses a mixed code to represent signals, citing reference 47 (Rigotti et al., Nature 2013). But whereas the model seems to use linear mixed selectivity, the Rigotti reference highlights the virtues of nonlinear mixed selectivity. In my understanding, a linearly mixed code does not enjoy the same benefits since it’s mathematically equivalent to a non-mixed code (simply rotate the readout matrix). I recommend that the authors clarify the type of selectivity used by their model and how it relates to the paper(s) they cite.

      The reviewer is correct that our selectivity is a linear mixing of input variables, and differs from the selectivity in Rigotti et al. (2013) which is non-linear. We revised the sentence on page 4 to clarify better that the mixed selectivity we consider is linear and we removed Rigotti’s citation. 
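
The reviewer's point about linear mixing can be illustrated with a toy rate-based example (a sketch only; it ignores spiking dynamics, and the population size, the random mixing weights `W` and the variable names are assumptions made for illustration). When responses are a linear mixture of the features, a linear readout recovers each feature exactly, so the mixed code carries the same linearly decodable information as an unmixed one.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_features, n_samples = 8, 3, 50

x = rng.standard_normal((n_features, n_samples))  # stimulus features
W = rng.standard_normal((n_neurons, n_features))  # hypothetical linear mixing weights
r_mixed = W @ x                                   # linearly mixed responses

# A linear readout that undoes the mixing: the pseudo-inverse of W.
# (W has full column rank with probability one for Gaussian entries.)
D = np.linalg.pinv(W)
x_hat = D @ r_mixed

max_err = np.max(np.abs(x_hat - x))  # reconstruction error, at numerical precision
```

With nonlinear mixed selectivity, as in Rigotti et al. (2013), no such change of readout would reduce the code to an unmixed one, which is why the two cases are distinguished.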

      (9) Reference 46 is cited as evidence that leaky integration of sensory features is a relevant computation for sensory areas. I don’t think this is quite what the reference shows. Instead, it finds certain morphological and electrophysiological differences between single pyramidal neurons in the primary visual cortex compared to the prefrontal cortex. Reference 46 then goes on to speculate that these are differences relevant to sensory computation. This may seem like a quibble, but given the centrality of the objective function in normative theories, I think it's important to clarify why a particular objective is chosen.

      We agree that our citation of Amatrudo et al. was not the best choice and that the previous text was confusing. We have therefore tried to improve its clarity. We looked at the previous theoretical efficient coding papers that introduced this leaky integration, and we could not find in them a justification of this assumption based on experimental papers. However, there is evidence that neurons in sensory structures and in cortical association areas respond to time-varying sensory evidence by summing stimuli over time with a weight that decreases steadily going back in time from the time of firing, which suggests that neurons integrate time-varying sensory features. In many cases, these integration kernels decay approximately exponentially going back in time, and several models that successfully explain perceptual readouts of neural activity assume leaky integration. This suggests that the mathematical approximation of leaky integration of sensory evidence, though possibly simplistic, is reasonable. We revised the text in this respect (page 2).
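
The link between an exponentially decaying integration kernel and leaky integration can be checked numerically. The sketch below is a discrete-time illustration under assumed values (time step, time constant and a random input); it is not the model's target signal. It shows that the recursive leaky update and a weighted sum of past inputs with geometrically decaying weights give the same output.

```python
import numpy as np

def leaky_integrate(x, dt, tau):
    """Recursive leaky integration: y[t] = (1 - dt/tau) * y[t-1] + dt * x[t]."""
    decay = 1.0 - dt / tau
    y = np.empty(len(x))
    prev = 0.0
    for t, x_t in enumerate(x):
        prev = decay * prev + dt * x_t
        y[t] = prev
    return y

def kernel_readout(x, dt, tau):
    """Same quantity as a sum over past inputs with a decaying kernel."""
    decay = 1.0 - dt / tau
    kernel = dt * decay ** np.arange(len(x))  # weight for input k steps back
    return np.convolve(x, kernel)[: len(x)]

rng = np.random.default_rng(3)
signal = rng.standard_normal(500)                      # arbitrary time-varying input
y_recursive = leaky_integrate(signal, dt=0.001, tau=0.01)
y_kernel = kernel_readout(signal, dt=0.001, tau=0.01)
same = bool(np.allclose(y_recursive, y_kernel))
```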

      (10) The definition of the objective function uses beta as a tuning parameter, but later parts of the text and figures refer to a parameter g_L which might only be introduced in the convex combination of Eq. 40a.

      This is correct. Parameter optimization was performed on a weighted sum of the average encoding error and cost, as given by Eq. 39a (40a in first submission), with the weighting g<sub>L</sub> for the error versus the cost, and not with the beta that is part of the objective in Eq. 10. The convex combination in Eq. 39a allowed us to find a set of optimal parameters that lies within biologically realistic parameter ranges, including realistic values for the firing threshold. The average encoding error and metabolic cost (the two terms on the right-hand side of Eq. 39a, without weighting with g<sub>L</sub>) in our network are of the same order (see Fig. 8G for the E-I model, where these values are plotted separately for the optimal network). Weighting the cost with the optimal beta, which is in the range of ~10, would have yielded a network that optimizes almost exclusively the metabolic cost and would have biased the results towards solutions with poor encoding accuracy.
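
To make the role of the weighting concrete, here is a minimal sketch of a convex combination of error and cost and of how the minimizing parameter can shift with g<sub>L</sub>. The form `g_l * error + (1 - g_l) * cost` is our reading of the weighted sum described above, and the error and cost curves are hypothetical toy functions of a single parameter, not outputs of the model.

```python
import numpy as np

def average_loss(error, cost, g_l):
    """Convex combination of encoding error and metabolic cost (assumed form)."""
    return g_l * error + (1.0 - g_l) * cost

param = np.linspace(0.0, 10.0, 101)  # hypothetical model parameter
toy_error = 1.0 / (1.0 + param)      # toy curve: error falls with the parameter
toy_cost = 0.05 * param              # toy curve: cost grows with the parameter

best_by_weight = {}
for g_l in (0.3, 0.5, 0.7):
    loss = average_loss(toy_error, toy_cost, g_l)
    best_by_weight[g_l] = float(param[np.argmin(loss)])  # minimizing parameter
```

In this toy case, putting more weight on the error moves the optimum towards larger parameter values; how strongly the optimum moves depends on the shape of the two curves.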

      To document more fully how the choice of weighting of the error with the cost (g<sub>L</sub>) affects the optimal parameters, we have added a new analysis (Fig. 8D and Supplementary Fig. S4 A-D and H) showing the optimal parameters as a function of this weighting. We commented on these results in the text on pages 9-11 and 12. For further details, please see also the reply to point 1 of Reviewer 1.

      (11) Figure 1J: "In E neurons, the distribution of inhibitory and of net synaptic inputs overlap". In my understanding, they are in fact identical, and this is by construction. It might help the reader to state this.

      We apologize for an unclear statement. In E neurons, net synaptic current is the sum of the feedforward current and of recurrent inhibition (Eq. 29c and Eq. 42). With our choice of tuning parameters that are symmetric around zero and with stimulus features that have vanishing mean, the mean of the feedforward current is close to zero. Because of this, the mean of the net current is negative and is close to the mean of the inhibitory current. We have clarified this in the text (page 5).

      (12) A few typos:

      -  p1. "Minimizes the encoding accuracy" should be "maximizes..."

      -  p1: "as well the progress" should be something like "as well as the progress"

      -  p.11: "In recorded neurons where excitatory or inhibitory.", "where" should be "were"

      -  Fig3: missing parentheses (B)

      -  Fig4B: the 200 ticks on the y-scale are cut off.

      -  Panel Fig. 5a: "stimulus" should be "stimuli".

      -  Ref 24 "Efficient andadaptive sensory codes" is missing a space.

      -  p. 26: "requires" should be "required".

      -  On several occasions, the article "the" is missing.

      We thank the reviewer for kindly pointing out the typos, which we have now corrected.

      Reviewer #2 (Recommendations For The Authors):

      I would like to give the authors more details about the two main weaknesses discussed above, so that they may address specific points in the paper. First, there is the relation to previous work. Several published articles have presented very similar results to those discussed here, including references 5, 26, 28, 32, 33, 42, 43, 48, and an additional reference not cited by the authors (Calaim et al. 2022 eLife e73276). This includes:

      (1) Derivation of an E-I efficient spiking network, which is found in refs. 28, 42, 43, and 48. This is not reflected in the text: e.g., "These previous implementations, however, had neurons that did not respect Dale's law" (Introduction, pg. 1); "Unlike previous approaches (28, 48), we hypothesize that E and I neurons have distinct normative objectives...". The authors should discuss how their derivation compares to these.

      We have now fully clarified on page 3 that our model builds on the seminal previous works that introduced E-I networks with efficient coding (Supplementary text in Boerlin et al. 2013, Chalk et al. 2016, Barrett et al. 2016). 

      (2) Inclusion of a slow adaptation current: I believe this also appears in a previous paper (Gutierrez & Deneve 2019, ref. 33) in almost the exact same form, and is again not reflected in the text: "The strength of the current is proportional to the difference in inverse time constants ... and is thus absent in previous studies assuming that these time constants are equal (... ref. 33)". Again, the authors should compare their derivation to this previous work.

      We thank the reviewer for pointing this out. We sincerely apologize if our previous version did not recognize sufficiently clearly that the previous work of Gutierrez and Deneve (eLife 2019; ref 33) first introduced the slow adaptation current that is similar to spike-triggered adaptation in our model. We have made sure that the revised text acknowledges this more clearly. We have also explained better what we changed or added with respect to this previous work (see revised text on page 8).

      The work by Gutierrez and Deneve (2019) emphasizes the interplay between a single-neuron property (an adapting current in single neurons) and a network property (network-level coding through structured recurrent connections). They use a network that does not distinguish E and I neurons. Our contribution instead focuses on adaptation in an E-I network. To improve the presentation following the Reviewer’s comment, we now better emphasize the differential effect of adaptation in E and in I neurons in the revision (Fig. 5 B-D). Moreover, Gutierrez and Deneve studied the effect of adaptation on slower time scales (1 or 2 seconds), while we study adaptation on a finer time scale of tens of milliseconds. These points are detailed in the revised text on page 8.

      (3) Background currents and physical units: Pg. 26: "these models did not contain any synaptic current unrelated to feedforward and recurrent processing" and "Moreover previous models on efficient coding did not thoroughly consider physical units of variables" - this was briefly described in ref. 28 (Boerlin et al. 2013), in which the voltage and threshold are transformed by adding a common constant, and additional aspects of physical units are discussed.

      It is correct that Boerlin et al. (2013) suggested adding a common constant to introduce physical units. We have now revised the text to make clearer the relation between our results and those of Boerlin et al. (2013) (page 3). In our paper, we built on Boerlin et al. (2013) and assigned physical units to the computational variables that define the model's objective (the targets, the estimates, the metabolic constant, etc.). We assigned units to computational variables in such a way that physical variables (such as membrane potential, transmembrane currents, firing thresholds and resets) have the correct physical units. We have now clarified how we derived physical units in the section of Results where we introduce the biophysical model (page 3) and specified how this derivation relates to the results in Boerlin et al. (2013).

      (4) Voltage correlations, spike correlations, and instantaneous E/I balance: this was already pointed out in Boerlin et al. 2013 (ref 28; from that paper: "Despite these strong correlations of the membrane potentials, the neurons fire rarely and asynchronously") and others including ref. 32. The authors mention this briefly in the Discussion, but it should be more prominent that this work presents a more thorough study of this well-known characteristic of the network.

      We agree that it is important to comment on how our results relate to these results in Boerlin et al. (2013). It is correct that in Boerlin et al. (2013) neurons have strong correlations in the membrane potentials, but fire asynchronously, similarly to what we observe in our model. However, the asynchronous dynamics in Boerlin et al. (2013) strongly depend on the assumption of instantaneous synaptic transmission and on time discretization, with a “one spike per time bin” rule in the numerical implementation. This rule enforces that at most one spike is fired in each time bin, thus actively preventing any synchronization across neurons. If this rule is removed, their network synchronizes, unless the metabolic constant is strong enough to control such synchronization and bring the network back to the asynchronous regime (see ref. 36). Our implementation does not contain any specific rule that would prevent synchronization across neurons. We now cite the paper by Boerlin and colleagues and briefly summarize this discussion when we describe the result of Fig. 3D on page 7.
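
      The difference between a “one spike per time bin” rule and an unconstrained threshold rule can be illustrated with a minimal Python sketch. It assumes that, under the rule, the neuron with the largest suprathreshold margin is the one that fires; this choice, the function names and the example values are illustrative assumptions and are not taken from either paper.

```python
def spikes_one_per_bin(voltages, thresholds):
    """Return a 0/1 spike vector with at most one spike in this time bin.

    Assumption for illustration: the neuron whose voltage exceeds its
    threshold by the largest margin is the one allowed to fire.
    """
    margins = [v - th for v, th in zip(voltages, thresholds)]
    winner = max(range(len(margins)), key=margins.__getitem__)
    spikes = [0] * len(margins)
    if margins[winner] > 0.0:
        spikes[winner] = 1
    return spikes


def spikes_unconstrained(voltages, thresholds):
    """Every neuron above threshold fires, so neurons can spike synchronously."""
    return [1 if v > th else 0 for v, th in zip(voltages, thresholds)]


# Two neurons are above threshold in the same bin (hypothetical values).
voltages = [1.2, 1.5, 0.3]
thresholds = [1.0, 1.0, 1.0]
constrained = spikes_one_per_bin(voltages, thresholds)
free = spikes_unconstrained(voltages, thresholds)
```

      In this toy bin the constrained rule lets only the second neuron fire, whereas the unconstrained rule lets the first two neurons fire together, which is the kind of synchronous event the rule suppresses.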

      (5) Perturbations and parameters sweep: I found one previous paper on efficient spiking networks (Calaim et al. 2022) which the authors did not cite, but appears to be highly relevant to the work presented here. Though the authors perform different perturbations from this previous study, they should ideally discuss how their findings relate to this one. Furthermore, this previous study performs extensive sweeps over various network parameters, which the authors might discuss here, when relevant. For example, on pg. 8, the authors write “We predict that, if number of neurons within the population decreases, neurons have to fire more spikes to achieve an optimal population readout” – this was already shown in Calaim et al. 2022 Figure 5, and the authors should mention if their results are consistent.

      We apologize for not being aware of Calaim et al. (2022) when we submitted the first version of our paper. This important study is now cited in the revised version. We have now, as suggested, performed sweeps of multiple parameters inspired by the work of Calaim et al. This new analysis is described extensively in the reply to Weaknesses in the Public Review of Reviewer 2, is shown in Figs. 2, 6I and 7J, and is described on pages 5, 11 and 13.

      The Reviewer is also correct that the compensation mechanism that applies when changing the ratio of E-I neuron numbers is similar to the one described in Barrett et al. (2016) and related to our claim “if number of neurons within the population decreases, neurons have to fire more spikes to achieve an optimal population readout”. We have now added (page 11) that this prediction is consistent with the finding of Barrett et al. (2016).

      With regard to the dependence of optimal coding properties on the number of neurons, we have tried to better describe the similarities and differences between our work and that of Calaim et al. (2022), as well as the work of Barrett et al. (2016), which reports highly relevant results. These additional considerations are summarized in a paragraph in Discussion (page 16).

      (6) Overall, the authors should distinguish which of their results are novel, which ones are consistent with previous work on efficient spiking networks, and which ones are consistent in general with network implementations of efficient and sparse coding. In many of the above cases, this manuscript goes into much more depth and study of each of the network characteristics, which is interesting and commendable, but this should be made clear. In clarifying the points listed above, I hope that the authors can better contextualize their work in relation to previous studies, and highlight what are the unique characteristics of the model presented here.

      We made a number of clarifications of the text to provide better contextualization of our model within existing literature and to credit more precisely previous publications. This includes commenting on previous studies that introduced separate objective functions of E and I neurons (page 2), spike-triggered adaptation (page 8), physical units (page 3), and changes in the number of neurons in the network (page 16). 

      Next, there are the claims of optimal parameters. As explained on pg. 35 (criterion for determining optimal model parameters), it appears to me that they simply vary each parameter one at a time around the optimal value. This argument appears somewhat circular, as they would need to know the optimal parameters before starting this sweep. In general, I find these optimality considerations to be the most interesting and novel part of the paper, but the simulations are relatively limited, so I would ask the authors to either back them up with more extensive parameter sweeps that consider covariations in different parameters simultaneously (as in Calaim et al. 2022). Furthermore, the authors should make sure that they are not breaking any of the required relationships between parameters necessary for the optimization of the loss function. Again, some of the results (such as coding error not being minimized with zero metabolic cost) suggests that there might be issues here. 

      We thank the reviewer for this insightful suggestion. We have now added a joint sweep of all relevant model parameters using a Monte-Carlo parameter search with 10,000 iterations. We randomly drew parameter configurations from predetermined parameter ranges that are detailed in the newly added Table 2. Parameters were sampled from a uniform distribution. We varied all six model parameters studied in the paper (metabolic constant, noise intensity, time constants of single E and I neurons, ratio of E to I neurons and ratio of the mean I-I to E-I connectivity). We now present these results in a new Figure 2. We did not find any set of parameters with lower loss than the parameters in Table 1 when the weighting of the error with the cost was in the following range: 0.4 < g<sub>L</sub> < 0.81 (Fig. 2C). While our large but finite Monte-Carlo random sampling does not fully prove that the configuration we selected as optimal (in Table 1) is a global optimum, it shows that this configuration is highly efficient. Further, and as detailed in the rebuttal to the Weaknesses of the Public Review of Referee 2, analyses of the near-optimal solutions are compatible with the notion (resulting from the joint parameter sweep studies that we added to Figures 6 and 7) that network optimality may be influenced by joint covariations in parameters. These new results are reported in Results (pages 5, 11 and 13) and in Figures 2, 6I and 7J.
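
      The logic of such a Monte-Carlo search with uniform sampling can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parameter ranges are hypothetical placeholders (the actual ranges are those of Table 2), and `toy_loss` is a stand-in for the average loss, which in the real analysis requires simulating the network.

```python
import random

# Hypothetical parameter ranges (placeholders, not the values of Table 2).
PARAM_RANGES = {
    "metabolic_constant": (0.0, 30.0),
    "noise_intensity": (0.0, 10.0),
    "tau_E": (5.0, 20.0),
    "tau_I": (5.0, 20.0),
    "ratio_E_to_I_neurons": (1.0, 8.0),
    "ratio_II_to_EI_connectivity": (1.0, 6.0),
}


def toy_loss(params):
    """Stand-in for the average loss: a quadratic bowl centred on the
    midpoint of each range. A real run would simulate the network here."""
    return sum(
        ((params[name] - 0.5 * (lo + hi)) / (hi - lo)) ** 2
        for name, (lo, hi) in PARAM_RANGES.items()
    )


def random_search(loss_fn, ranges, n_iter, seed=0):
    """Draw n_iter configurations uniformly at random and keep the best one."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_iter):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        loss = loss_fn(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


# Reference configuration playing the role of the parameters in Table 1.
reference = {name: 0.5 * (lo + hi) for name, (lo, hi) in PARAM_RANGES.items()}
best_params, best_loss = random_search(toy_loss, PARAM_RANGES, n_iter=10_000)
reference_is_unbeaten = toy_loss(reference) <= best_loss
```

      Finding that no sampled configuration beats the reference does not prove that the reference is a global optimum; it only shows that none of the sampled configurations improves on it.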

      Some more specific points:

      (1) In general, I find it difficult to understand the scaling of the RMSE, cost, and loss values in Figures 4-7. Why are RMSE values in the range of 1-10, whereas loss and cost values are in the range of 0-1? Perhaps the authors can explicitly write the values of the RMSE and loss for the simulation in Figure 1G as a reference point.

      Encoding error (RMSE), metabolic cost (MC) and average loss for a well-performing network are within the range of 1-10 (see Fig. 8G or 7C in the first submission). To ease the visualization of results, we normalized the cost and the loss in Figs. 6-8 in order to plot them on the same figure (the computation of the optima follows Eq. 39 and is done without normalization). We have now explicitly written the values of RMSE, MC and the average loss (non-normalized) for the simulation in Fig. 1D on page 5, as suggested by the reviewer. We have also revised Fig. 4 and now show the absolute and not the relative values of the RMSE and the MC.

      (2) Optimal E-I neuron ratio of 4:1 and efficacy ratio of 3:1: besides being unintuitive in relation to previous work, are these two optimal settings related to one another? If there are 4x more excitatory neurons than inhibitory neurons, won't this affect the efficacy ratio of the weights of the two populations? What happens if these two parameters are varied together?

      Thanks for this insightful point. Indeed, the optima of these two parameters are interdependent and positively correlated: if we decrease the E-I neuron ratio, the optimal efficacy ratio decreases as well. To better show this relation, we added figures with a 2-dimensional parameter search (Fig. 7J) in which we jointly varied the two ratios. The red cross on the right figure marks the ratios used as optimal parameters in our study. These findings are discussed on page 13.

      (3) Optimal dimensionality of M=[1,4]: Again, previous work (Calaim et al. 2022) would suggest that efficient spiking networks can code for arbitrary dimensional signals, but that performance depends on the redundancy in the network - the more neurons, the better the coding. From this, I don't understand how or why the authors find a minimum in Figure 7B. Why does coding performance get worse for small M?

      We optimized all model parameters with M=3 and this is the reason why M=3 is the optimal number of inputs when we vary this parameter. Our network shows a distinct minimum of the encoding error as a function of the stimulus dimensionality for both E and I neurons (Fig. 8C, top). This minimum is reflected in the minimum of the average loss (Fig. 8C, bottom). The minimum of the loss is shifted (or biased) by the metabolic cost, with strong weighting of the cost lowering the optimal number of inputs. This is discussed on pages 13-14.

      Here is a list of other, more minor points that the authors can consider addressing to make the results and text clearer:

      (1) Feedforward efficient coding models: in the introduction (pg. 1) and discussion (pg. 11) it is mentioned that early efficient coding models, such as that of Olshausen & Field 96, were purely feedforward, which I believe to be untrue (e.g., see Eq. 2 of O&F 96). Later models made this even more explicit (Rozell et al. 2008). Perhaps the authors can either clarify what they meant by this, or downplay this point.

      We sincerely apologize for the oversight present in the previous version of the text. We agree with the reviewer that the model in Olshausen and Field (1996) indeed defines a network with recurrent connections, and the same type of recurrent connectivity has been used by Rozell et al. (2008, 2013). The structure of the connectivity in Olshausen and Field (as well as in Rozell et al (2008)) is closely related to the structure of connectivity that we derived in our model. We have corrected the text in the introduction (page 1) to remove these errors.

      (2) Pg. 2 - The authors state: "We draw tuning parameters from a normal distribution...", but in the methods, it states that these are then normalized across neurons, so perhaps the authors could add this here, or rephrase it to say that weights are drawn uniformly on the hypersphere.

      We rephrased the description of how weights were determined (page 2).

      (3) Pg. 2 - "We hypothesize the time-resolved metabolic cost to be proportional to the estimate of a momentary firing rate of the neural population" - from what I can see, this is not the usual population rate, which would be an average or sum of rates across the population.

      Indeed, the time-dependent metabolic cost is not the population rate (in the sense of the sum of instantaneous firing rates across neurons), but is proportional to it by a factor of 1/t. More precisely, we can define the instantaneous estimate of the firing rate of a single neuron i as z<sub>i</sub>(t) = 1/t<sub>r</sub> r<sub>i</sub>(t) with r<sub>i</sub>(t) as in Eq. 7. We have clarified this in the revised text on page 3. 

      (4) Pg. 3: "The synaptic strength between two neurons is proportional to their tuning similarity if the tuning similarity is positive" - based on the figure and results, this appears to be the case for I-E, E-I, and I-I connections, but not for E-E connections. This should be clarified in the text. Furthermore, one reference given in the subsequent sentence (Ko et al. 2011, ref. 51), is specifically about E-E connections, so doesn't appear to be relevant here.

      We have now specified that Eq. 24 does not describe E-E connections. We also agree that the reference (Ko et al. 2011) did not adequately support our claim; we thus removed it and revised the text on page 3 accordingly.

      (5) Pg. 3: "the relative weight of the metabolic cost over the encoding error controls the operating regime of the network" and "and an operating regime controlled by the metabolic constant" - what do you mean by operating regime here?

      We used the expression “operating regime” in the sense of a dynamical regime of the network.  However, we agree that this expression may be confusing and we removed it in revision. 

      (6) Pg. 3: "Previous studies interpreted changes of the metabolic constant beta as changes to the firing thresholds, which has less biological plausibility" - can the authors explain why this is less plausible, or ideally provide a reference for it?

      In biological networks, global variables such as brain state can strongly modulate the way neural networks respond to a feedforward stimulus. These variables influence neural activity in at least two distinct ways. One is by changing non-specific synaptic inputs to neurons, which is a network-wide effect (Destexhe and Pare, Nature Reviews Neurosci. 2003). This is captured in our model by changing the strength of the mean and fluctuations in the external currents. Beyond modulating synaptic currents, another way of modulating neural activity is by changing cell-intrinsic factors that modulate the firing threshold in biological neurons (Pozzorini et al. 2013). Previous studies on spiking networks with efficient coding interpreted the effect of the metabolic constant as changes to the firing threshold (Koren and Deneve, 2017, Gutierrez and Deneve 2019), which corresponds to cell-intrinsic factors. Here we instead propose that the metabolic constant modulates the neural activity by changing the non-specific synaptic input, homogeneously across all neurons in the network. Interpreting the metabolic constant as setting the mean of the non-specific synaptic input was necessary in our model to find an optimal set of parameters (as in Table 1) that is also biologically plausible. We revised the text accordingly (page 4).

      (7) Pg. 4: Competition across neurons: since the model lacks E-E connectivity, it seems trivial to conclude that there is competition through lateral inhibition, and it can be directly determined from the connectivity. What is gained from running these perturbation experiments?

      We agree that a reader with a good understanding of sparse / efficient coding theory can tell that there is competition across neurons with similar tuning already from the equation for the recurrent connectivity (Eq. 24). However, we presume that not all readers can see this from the equations and that it is worth showing this with simulations.

      Following the reviewer's comment, we have now downplayed the result about the model manifesting lateral inhibition in general on page 6. We have also removed its extensive elaboration in Discussion.

      One reason to run perturbation experiments was to test to what extent the optimal model qualitatively replicates empirical findings, in particular, single neuron perturbation experiments in Chettih and Harvey, 2019, without specifically tuning any of the model parameters. We found that the model reproduces qualitatively the main empirical findings, without tuning the model to replicate the data. We revised the text on page 5 accordingly.

      A further reason to run these experiments was to refine predictions about the minimal amount of connectivity structure that generates perturbation response profiles that are qualitatively compatible with empirical observations. To establish this, we did perturbation experiments while removing the connectivity structure of particular connectivity sub-matrices (E-I, I-I or I-E; Fig. S3 F). This allowed us to determine which connectivity matrix has to be structured to observe results that qualitatively match empirical findings. We found that the structure of E-I and I-E connectivity is necessary, but not the structure of I-I connectivity. Finally, we tested partial removal of the connectivity structure, where we replaced the precise (and optimal) connectivity structure with a simpler connectivity rule. In the optimal connectivity, the connection strength is proportional to the tuning similarity. A simpler connectivity rule, in contrast, only specifies that neurons with similar tuning share a connection, and beyond this the connection strength is random. Running perturbation experiments in a network obeying this simpler connectivity rule still qualitatively replicated empirical results from Chettih and Harvey (2019). This is shown in Supplementary Fig. S2F and described on page 8.

      (8) Pg. 4: "the optimal E-I network provided a precise and unbiased estimator of the multidimensional and time-dependent target signal" - from previous work (e.g., Calaim et al. 2022), I would guess that the estimator is indeed biased by the metabolic cost. Why is this not the case here? Did you tune the output weights to remove this bias?

      Output weights were not tuned to remove the bias. In Fig. 1H of the first submission, we plotted the bias for the network that minimizes the encoding error. We forgot to specify this in the text and figure caption, for which we apologize. We have now replaced this figure with a new one (Fig. 1E) where we plot the bias of the network minimizing the average loss (with parameters as in Table 1). The bias of the network minimizing the error is close to zero, B^E = 0.02 and B^I = 0.03. The bias of the network minimizing the loss is stronger and negative, B^E = -0.15 and B^I = -0.34. In the text of Results, we now report the bias of both networks (i.e., optimizing the encoding error and optimizing the loss). We also added a plot showing trial-averaged estimates and a time-dependent bias in each stimulus dimension (Supplementary Fig. S1 F). Note that the network minimizing the encoding error requires a lower metabolic constant (β = 6) than the network optimizing the loss (β = 14); however, the optimal metabolic cost in both networks is nonzero. We revised the text and explained these points on page 5.

      (9) Pg. 4: "The distribution of firing rates was well described by a log-normal distribution" - I find this quite interesting, but it isn't clear to me how much this is due to the simulation of a finite-time noisy input. If the neurons all have equal tuning on the hypersphere, I would expect that the variability in firing is primarily due to how much the input correlates with their tuning. If this is true, I would guess that if you extend the duration of the simulation, the distribution would become tighter. Can you confirm that this is the stationary distribution of the firing rates?

      We have now simulated the network with a longer simulation time (10 seconds of simulated time instead of the 2 seconds used previously) and also iterated the simulation across 10 trials to report a result that is general across random draws of tuning parameters (previously a single set of tuning parameters was used). The reviewer is correct that the distribution of firing rates of E neurons has become tighter with longer simulation time, but the distributions remain log-normal. We also recomputed the coefficient of variation (CV) using the same procedure. We updated these plots in Fig. 1F.
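
      One simple way to check whether firing rates are compatible with a log-normal distribution is to look at the distribution of log-rates, which should then be approximately normal. The Python sketch below fits the two log-normal parameters from the mean and standard deviation of the log-rates. The synthetic rates and the values of `true_mu` and `true_sigma` are hypothetical and serve only to show that the fit recovers known parameters; this is not necessarily the fitting procedure used in the paper.

```python
import math
import random
import statistics


def fit_lognormal(rates):
    """Estimate (mu, sigma) of a log-normal from the mean and std of log-rates.

    Silent neurons (rate 0) are excluded because their logarithm is undefined.
    """
    logs = [math.log(r) for r in rates if r > 0.0]
    return statistics.fmean(logs), statistics.stdev(logs)


rng = random.Random(0)
# Synthetic firing rates in Hz; mu and sigma are hypothetical values.
true_mu, true_sigma = math.log(5.0), 0.5
rates = [rng.lognormvariate(true_mu, true_sigma) for _ in range(5000)]
mu_hat, sigma_hat = fit_lognormal(rates)
```

      A tighter distribution of rates, as reported for the longer simulations, would show up as a smaller fitted `sigma_hat`.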

      (10) Pg. 4: "We observed a strong average E-I balance" - based on the plots in Figure 1J, the inputs appear to be inhibition-dominated, especially for excitatory neurons. So by what criterion are you calling this strong average balance?

      The reviewer is correct about the fact that the net synaptic input to single neurons in our optimal network shows excess inhibition and the network is inhibition-dominated, so we revised this sentence (page 5) accordingly.  

      (11) Pg. 4: Stronger instantaneous balance in I neurons compared to E neurons - this is curious, and I have two questions: (1) can the authors provide any intuition or explanation for why this is the case in the model? and (2) does this relate to any literature on balance that might suggest inhibitory neurons are more balanced than excitatory neurons?

      In our model, I neurons receive excitatory and inhibitory synaptic currents through synaptic connections that are precisely structured. E neurons receive structured inhibition and a feedforward current. The feedforward current consists of M=3 independent OU processes projected on the tuning vectors of E neurons w<sub>i</sub><sup>E</sup>. We speculate that because the synaptic inhibition and feedforward current are different processes and the 3 OU inputs are independent, it is harder for E neurons to achieve an instantaneous balance as precise as that in I neurons. While we think that the feedforward current in our model reflects biologically plausible sensory processing, it is not a mechanistic model of feedforward processing. In biological neurons, real feedforward signals are implemented as a series of complex feedforward synaptic inputs from downstream areas, while the feedforward current in our model is a sum of stimulus features, and is thus a simplification of a biological process that generates feedforward signals. We speculate that a mechanistic implementation of the feedforward current could increase the instantaneous balance in E neurons. Furthermore, the presence of E-E connections could potentially also increase the instantaneous balance in E neurons. These important questions concern limitations of the model and could be addressed in future work. We could not find any empirical evidence directly comparing the instantaneous balance in E versus I neurons. We have reported these considerations in the revised Discussion (page 16).
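
      The construction of the feedforward current described above (M=3 independent OU processes projected on the tuning vectors of E neurons) can be sketched as follows. This is a minimal illustration under assumptions: the time step, time constant, noise amplitude and number of neurons are hypothetical, the OU processes are integrated with a simple Euler-Maruyama scheme, and the tuning vectors are drawn from a normal distribution and normalized to unit length.

```python
import math
import random
import statistics


def simulate_ou(n_steps, dt, tau, sigma, rng):
    """Euler-Maruyama simulation of a zero-mean Ornstein-Uhlenbeck process."""
    x, trace = 0.0, []
    for _ in range(n_steps):
        x += -x / tau * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        trace.append(x)
    return trace


def unit_vector(dim, rng):
    """Tuning vector drawn from a normal distribution, normalized to length 1."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


rng = random.Random(0)
M, N_E, N_STEPS = 3, 20, 10_000  # hypothetical sizes
# Hypothetical units: dt and tau in ms, sigma in arbitrary units.
features = [simulate_ou(N_STEPS, dt=0.2, tau=10.0, sigma=1.0, rng=rng) for _ in range(M)]
tuning = [unit_vector(M, rng) for _ in range(N_E)]

# Feedforward current of neuron i at time t: sum_k w_ik * s_k(t).
currents = [
    [sum(w[k] * features[k][t] for k in range(M)) for t in range(N_STEPS)]
    for w in tuning
]

mean_current = statistics.fmean(statistics.fmean(c) for c in currents)
std_current_0 = statistics.pstdev(currents[0])
```

      Because the stimulus features have vanishing mean and the tuning vectors are symmetric around zero, the mean feedforward current in this sketch stays close to zero while its fluctuations are substantial.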

      (12) Pg. 5, comparison with random connectivity: "Randomizing E-I and I-E connectivity led to several-fold increases in the encoding error as well as to significant increases in the metabolic cost" and Discussion, pg. 11: "the structured network exhibits several fold lower encoding error compared to unstructured networks": I'm wondering if these comparisons are fair. First, regarding activity changes that affect the metabolic cost - it is known that random balanced networks can have global activity control, so it is not straightforward that randomizing the connectivity will change the metabolic cost. What about shuffling the weights but keeping an average balance for each neuron's input weights? Second, regarding coding error, it is trivial that random weights will not map onto the correct readout. A fairer comparison, in my opinion, would at least be to retrain the output weights to find the best-fitting decoder for the three-dimensional signal, something more akin to a reservoir network.

      Thank you for raising these interesting questions. The purpose of comparing networks with and without connectivity structure was to observe causal effects of the connectivity structure on the neural activity. We agree that the effect on the encoding error is close to trivial, because shuffling of connectivity weights decouples neural dynamics from decoding weights. We have carefully considered Reviewer's suggestions to better compare the performance of structured and unstructured networks. 

      In reply to the first point, we followed the reviewer's suggestion and compared the optimal network with a shuffled network that matched the optimal network in its average balance. This was achieved by increasing the metabolic constant, decreasing the noise intensity and slightly decreasing the feedforward stimulus (we did not find a way to match the net current in both cell types by changing a single parameter). As we compared the metabolic cost between the optimal and the shuffled network with matched average balance, we still found lower metabolic cost in the optimal network, even though the difference was now smaller. We replaced Fig. 3B from the first submission with these new results in Fig. 4B and commented on them in the text (page 7).

      In reply to the second point, we followed the reviewer’s suggestion and compared the encoding error (RMSE) of the optimal network and of the network with shuffled connectivity, with decoding weights trained so as to optimally reconstruct the target signal. As suggested, we analyzed the encoding error of the networks using decoding weights trained on the set of spike trains generated by the network, using linear least-squares regression to minimize the decoding error. For a fair and quantitative comparison, and because we did not train the decoding weights of our structured model, we performed this same analysis using spike trains generated by networks with structured and with shuffled recurrent connectivity. We found that the encoding error is smaller in the E population and much smaller in the I population in the structured network compared to the random network. Decoding weights found numerically in the optimal network approach the uniform distribution of weights that we used in our model (Fig. 4A, right). In contrast, decoding weights obtained from the random network do not converge to a uniform distribution, but instead form a much sparser distribution, in particular in I neurons (Supplementary Fig. S3 A). These additional results, reported in the above-mentioned figures, are discussed in the text on page 14.
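
      Training decoding weights by linear least-squares regression can be sketched with NumPy as below. This is an illustration under assumptions, not the analysis code of the paper: the matrix `rates` is a random stand-in for (filtered) spike trains, the target is generated from hypothetical `true_weights`, and the sizes are arbitrary. The sketch also shows that, on the data used for fitting, the least-squares decoder cannot do worse than a decoder whose rows have been shuffled across neurons.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_steps, n_dims = 40, 2000, 3  # hypothetical sizes

# Toy stand-in for low-pass filtered spike trains (time x neurons).
rates = rng.random((n_steps, n_neurons))
# Hypothetical ground-truth weights used only to generate a target signal.
true_weights = rng.normal(size=(n_neurons, n_dims))
target = rates @ true_weights + 0.01 * rng.normal(size=(n_steps, n_dims))


def fit_decoder(rates, target):
    """Least-squares decoding weights minimizing ||rates @ W - target||^2."""
    weights, *_ = np.linalg.lstsq(rates, target, rcond=None)
    return weights


def rmse(estimate, target):
    """Root mean squared error between the estimate and the target."""
    return float(np.sqrt(np.mean((estimate - target) ** 2)))


weights = fit_decoder(rates, target)
error_trained = rmse(rates @ weights, target)
# Shuffle which neuron gets which decoding vector (permutes rows).
error_shuffled = rmse(rates @ rng.permutation(weights), target)
```

      In an actual comparison, the decoder would be fitted separately to the spike trains of the structured and of the shuffled network, and the resulting errors would then be compared.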

      (13) Pg. 5: "a shift from mean-driven to fluctuation-driven spiking" and Pg. 11 "a network structured as in our efficient coding solution operates in a dynamical regime that is more stimulus-driven, compared to an unstructured network that is more fluctuation driven" - I would expect that the balanced condition dictates that spiking is always fluctuation driven. I'm wondering if the authors can clarify this.

      We agree with the reviewer that networks with and without connectivity structure are both fluctuation-driven, because in a mean-driven network the mean current must be suprathreshold (Ahmadian and Miller, 2021), which is not the case in either network. We removed the claim of a change from the mean-driven to the fluctuation-driven regime in the revised paper. We are grateful to the Reviewer for helping us tighten the elaboration of our findings.

      (14) Pg. 5: "suggesting that variability of spiking is independent of the connectivity structure" - the literature of balanced networks argues against this. Is this not simply because you have a noisy input? Can you test this claim?

      We thank the reviewer for the suggestion. We tested this claim by measuring the coefficient of variation in networks receiving a constant stimulus. In particular, we set the same strength in each of the M=3 stimulus dimensions and set the stimulus amplitude so as to match the firing rate of the optimal network in response to the OU stimulus. We computed the coefficient of variation in 200 simulation trials. The removal of connectivity structure did not cause a significant change of the coefficient of variation in a network driven by a constant stimulus (Fig. 4E). These additional results are discussed in the text on page 7.

      We have also taken on board the reviewer's point about the claim that variability of spiking is independent of the connectivity structure. We removed this claim in the revision, because we only tested a couple of specific cases where the connectivity is structured with respect to tuning similarity (fully structured, fully unstructured and partially unstructured networks). This is not exhaustive of all possible structures that recurrent connectivity may have.
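
      For reference, the coefficient of variation of spiking is commonly computed from interspike intervals (ISIs) as the ratio of their standard deviation to their mean. The Python sketch below assumes this usual definition, which is not spelled out in the text above, and uses synthetic spike trains (a perfectly regular train and a Poisson train) purely as an illustration.

```python
import random
import statistics


def coefficient_of_variation(spike_times):
    """CV of interspike intervals: std(ISI) / mean(ISI)."""
    isis = [t1 - t0 for t0, t1 in zip(spike_times, spike_times[1:])]
    if len(isis) < 2:
        raise ValueError("need at least three spikes to estimate the CV")
    return statistics.pstdev(isis) / statistics.fmean(isis)


def poisson_spike_train(rate_hz, duration_s, rng):
    """Homogeneous Poisson spike train with exponential ISIs (illustration only)."""
    t, spikes = 0.0, []
    while True:
        t += rng.expovariate(rate_hz)
        if t > duration_s:
            return spikes
        spikes.append(t)


rng = random.Random(1)
regular = [0.1 * k for k in range(100)]            # perfectly regular train
irregular = poisson_spike_train(10.0, 200.0, rng)  # irregular, Poisson-like train
cv_regular = coefficient_of_variation(regular)
cv_irregular = coefficient_of_variation(irregular)
```

      A regular train gives a CV near 0 and a Poisson train a CV near 1, so comparing the CV between networks with and without connectivity structure quantifies whether the irregularity of spiking has changed.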

      (15) Pg. 6: "we also removed the connectivity structure only partially, keeping like-to-like connectivity structure and removing all structure beyond like-to-like" - can you clarify what this means, perhaps using an equation? What connectivity structure is there besides like-to-like?

      In the optimal model, the strength of the synapse between a pair of neurons is proportional to the tuning similarity of the two neurons, Y<sub>ij</sub> proportional to J<sub>ij</sub> for Y<sub>ij</sub> >0 (see Eq. 24 and Fig. 1C(ii)). Besides networks with optimal connectivity, we also tested networks with a simpler connectivity rule. Such a simpler rule prescribes a connection if the pair of neurons has similar tuning (Y<sub>ij</sub> >0), and no connection otherwise. The strength of the connection following this simpler connectivity rule is otherwise random (and not proportional to pairwise tuning similarity Y<sub>ij</sub> as it is in the optimal network). We clarified this in the revision (page 8), also by avoiding the term “like-to-like” for the second type of networks, which could indeed be prone to confusion.
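
      The two connectivity rules in this reply can be contrasted in a short sketch. It assumes that a matrix of pairwise tuning similarities Y is already available, and it implements the "otherwise random" strengths of the simpler rule by permuting the optimal weights among the connected pairs. That permutation, the `scale` parameter and the function names are illustrative assumptions; the source does not state how the random strengths were drawn.

```python
import numpy as np

def optimal_weights(Y, scale=1.0):
    """Optimal rule: strength proportional to tuning similarity where Y_ij > 0, else no connection."""
    Y = np.asarray(Y, dtype=float)
    return scale * np.where(Y > 0, Y, 0.0)

def simpler_rule_weights(Y, rng, scale=1.0):
    """Simpler rule: connect only pairs with Y_ij > 0, but with strengths unrelated to Y_ij.

    Assumption: the random strengths are obtained by permuting the optimal
    weights among the connected pairs, which keeps the weight distribution fixed.
    """
    J = optimal_weights(Y, scale)
    mask = J > 0
    J_simple = np.zeros_like(J)
    J_simple[mask] = rng.permutation(J[mask])
    return J_simple
```

      Both rules produce the same set of connected pairs; they differ only in whether the strength of each connection tracks the tuning similarity of the pair.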

      (16) Pgs. 6-7: "we indeed found that optimal coding efficiency is achieved with weak adaptation in both cell types" and "adaptation in E neurons promotes efficient coding because it enforces every spike to be error- correcting" - this was not clear to me. First, it appears as though optimal efficiency is achieved without adaptation nor facilitation, i.e., when the time constants are all equal. Indeed, this is what is stated in Table 1. So is there really a weak adaptation present in the optimal case? Second, it seems that the network already enforces each spike to be errorcorrecting without adaptation, so why and how would adaptation help with this?

      We agree with the Reviewer that the network without adaptation in E and I neurons is already optimal. It is also true that most spikes in an optimal network should already be error-correcting (besides some spikes that might be caused by the noise). However, regimes with weak adaptation in E neurons remain close to optimality. Spike-triggered facilitation, meanwhile, adds spikes that are unnecessary and decreases network efficiency. We revised Fig. 5 (Fig. 4 in first submission) and replaced the 2-dimensional plots in Fig. 4C-F with plots that show the differential effect of adaptation in E neurons (top) and in I neurons (bottom plots) for the measures of the encoding error (RMSE), the efficiency (average loss) and the firing rate (Fig. 5B-D). In the new Fig. 5C it is evident that the loss of the E and I populations grows slowly with adaptation in E neurons (top), while it grows faster with adaptation in I neurons (bottom). These considerations are explained in the revised text on page 9.

      (17) Pg. 7: "adaptation in E neurons resulted in an increase of the encoding error in E neurons and a decrease in I neurons" - it would be nice if the authors could provide any explanation or intuition for why this is the case. Could it perhaps be because the E population has fewer spikes, making the signal easier to track for the I population?

      We agree that this could indeed be the case. We commented on it in revision (page 9).

      (18) Pg. 7: "The average balance was precise...with strong adaptation in E neurons, and it got weaker when increasing the adaptation in I neurons (Figure 4E)" - I found the wording of this a bit confusing. Didn't the balance get stronger with larger I time constants?

      By increasing the time constant of I neurons, the average imbalance got weaker (closer to zero) in E neurons (Fig. 5G, left), but stronger (further away from zero) in I neurons (Fig. 5G, right). We have revised the text on page 9 to make this clearer.

      (19) Pg. 7: Figure 4F is not directly described in the text.

      We have now added text (page 9) commenting on this figure in revision.

      (20) Pg. 8: "indicating that the recurrent network dynamics generates substantial variability even in the absence of variability in the external current" -- how does this observation relate to your earlier claim (which I noted above) that "variability of spiking is independent of connectivity structure"?

      We agree that the claim about variability of spiking being independent of connectivity structure was overstated and we thus removed it. The observation that we wanted to report is that both structured and unstructured networks have very similar levels of variability of spiking of single neurons. The fact that much of the variability of the optimal network is generated by recurrent connections is not incompatible. We revised the related text (page 11) for clarity.

      (21) Pg. 9: "We found that in the optimally efficient network, the mean E-I and I-E synaptic efficacy are exactly balanced" - isn't this by design based on the derivation of the network?

      True, the I-E connectivity matrix is the transpose of the E-I connectivity matrix, and their means are the same by the analytical solution. This however remains a finding of our study. We have clarified this in the revised text (page 12).

      (22) Pg. 30, eq. 25: the authors should verify if they include all possible connectivity here, or if they exclude EE connectivity beforehand.

      We now specify that the equation for recurrent connectivity (Eq. 24, Eq. 25 in first submission) does not include the E-E connectivity in the revised text (page 41).

      Reviewer #3 (Recommendations For The Authors):

      Essential

      (1)  Currently, they measure the RMSE and cost of the E and I population separately, and the 1CT model. Then, they average the losses of the E and I populations, and compare that to the 1CT model, with the conclusion that the 1CT model has a higher average loss. However, it seems to me that only the E population should be compared to the 1CT model. The I population loss determines how well the I population can represent the E population representation (which it can do extremely well). But the overall coding accuracy of the network of the input signal itself is only represented by the E population. Even if you do combine the E and I losses, they should be summed, not averaged. I believe a more fair conclusion would be that the E/I networks have generally slightly worse performance because of needing to follow Dale's law, but are still highly efficient and precise nonetheless. Of course, I might be making a critical error somewhere above, and happy to be convinced otherwise!

      We carefully considered the reviewer's comment and tested different ways of combining the losses of the E and I populations. We decided to follow the reviewer's suggestion and to compare the loss of the E population of the E-I model with the loss of the one cell type model. As is already evident from Fig. 8G, this comparison indeed changes the result, making the 1CT model more efficient. Summing the losses of E and I neurons likewise results in the 1CT model being more efficient than the E-I model. Note, however, the robustness of the E-I model to changes in the metabolic constant (Fig. 6C, top). The firing rates of the E-I model stay within physiological ranges for any value of the metabolic constant, while the firing rates of the 1CT model skyrocket when the metabolic constant is lower than optimal (Fig. 8I).

      We added to Results (page 14) a summary of these findings.

      (2) The methods and main text should make much clearer what aspects of the derivation are novel, and which are not novel (see review weaknesses for specifics).

      We specified these aspects, as discussed in more detail in the above reply to point 4 of the public review of Reviewer 1.

      Request:

      If possible, I would like to see the code before publication and give recommendations on that (is it easy to parse and reproduce, etc.)

      We are happy to share the computer code with the reviewer and the community. We added a link to our public repository containing the computer code that we used for simulations and analysis to the preprint and submission (section “Code availability” on page 17). 

      Suggestions:

      (1) I believe that for an eLife audience, the main text is too math-heavy at the beginning, and it could be much simplified, or more effort could be made to guide the reader through the math.

      We tried to do our best to improve the clarity of description of mathematical expressions in the main text.

      (2) Generally vector notation makes network equations for spiking neurons much clearer and easier to parse, I would recommend using that throughout the paper (and not just in the supplementary methods).

      We now use vector notation throughout the paper whenever we think that this improves the intelligibility of the text. 

      (3) In the discussion or at the end of the results adding a clear section summarizing what the minimal requirements or essential assumptions are for biological networks to implement this theory would be helpful for experimentalists and theorists alike.

      We have added such a section in Discussion (page 15). 

      (5) I think the title is a bit too cumbersome and hard to parse. Might I suggest something like 'Efficient coding and energy use in biophysically realistic excitatory-inhibitory spiking networks' or 'Biophysically constrained excitatory-inhibitory spiking networks can efficiently implement efficient coding'.

      We followed the reviewer’s suggestion and changed the title to “Efficient coding in biophysically realistic excitatory-inhibitory spiking networks.”

      (6) How the connections were shuffled exactly was not clear to me in how it was described now. Did they just take the derived connectivity, and shuffle the connections around? I recommend a more explicit methods section on it (I might have missed it).

      Indeed, the connections of the optimal network were randomly shuffled, without repetition, between all neuronal pairs of a specific connectivity matrix. This preserves all properties of the distribution of connectivity weights and removes only the structure of the connectivity, which is precisely what we wanted to test. We have now added a section in Methods (“Removal of connectivity structure”) on pages 51-52 where we explain how the connectivity structure is removed.
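
      Shuffling without repetition, as described in this reply, amounts to a random permutation of the entries of one connectivity matrix. A minimal sketch, in which the function name and the use of a NumPy random generator are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

def shuffle_connectivity(J, rng):
    """Randomly reassign the weights of one connectivity matrix to neuronal pairs.

    Every weight is used exactly once (a permutation, i.e. sampling without
    repetition), so the distribution of weights is preserved while any relation
    between a weight and the tuning of the connected pair is destroyed.
    """
    J = np.asarray(J, dtype=float)
    return rng.permutation(J.ravel()).reshape(J.shape)
```

      Because the operation is a permutation, the mean, variance and sparsity of the weights are unchanged, so any change in network behaviour can be attributed to the loss of structure alone.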

      (7) Figure 1 sub-panel ordering was confusing to read (first up down, then left right). Not sure if re- arranging is possible, but perhaps it could be A, B, and C at the top, with subsublabels (i) and (ii). Might become too busy though.

      We followed this suggestion and rearranged Fig. 1 as suggested by the reviewer.

      (8) Equation 3 in the main text should specify that 'y' stands for either E or I.

      This has been specified in the revision (page 3). 

      (9) Figure 1D shows a rough sketch of the types of connectivities that exist, but I would find it very useful to also see the actual connection strengths and the effect of enforcing Dale's law.

      We revised this figure (now Fig. 1B (ii)) and added connection strengths as well as a sketch of a connection that was removed because of Dale’s law.

      (10) The main text mentions how the readout weights are defined (normal distributions), but I think this should also be mentioned in the methods.

      Agreed. We indeed had a Methods section, “Parametrization of synaptic connectivity” (page 46), where we explain how readout weights are defined. We apologize if the reference to this section was not salient enough in the first submission. We made sure that the revised main text contains a clear pointer to this Methods section for details.

      (11) The text seems to mix ‘decoding weights’ and ‘readout weights’.

      Thanks for this suggestion to use consistent language. We opted for ‘decoding weights’ and removed ‘readout weights’.

      (12) The way the paper is written makes it quite hard to parse what are new experimental predictions, and what results reproduce known features. I wonder if some sort of 'box' is possible with novel predictions that experimentalists could easily look at and design an experiment around.

      We now revised the text. We clarified for every property of the model if this property is a prediction of facts that were not yet experimentally tested or if it accounts for previously observed properties of biological neurons. Please see the reply to point 4 of Reviewer 1. 

      (13) Typo's etc.:

      Page 5 bottom -- ("all") should have one of the quotes change direction (common latex typo, seems to be the only place with the issue).

      We thank the reviewer for pointing out this typo that has been removed in revision.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigated the anatomical features of the synaptic boutons in layer 1 of the human temporal neocortex. They examined the size of each synapse, the macular or perforated appearance, the size of the synaptic active zone, the number and volume of the mitochondria, and the number of synaptic and dense core vesicles, also differentiating between the readily releasable, the recycling, and the resting pool of synaptic vesicles. The coverage of the synapse by astrocytic processes was also assessed, and all the above parameters were compared to other layers of the human temporal neocortex. The authors conclude that the subcellular morphology of the layer 1 synapses are suitable for the functions of the neocortical layer, i.e. the synaptic integration within the cortical column. The low glial coverage of the synapses might allow increased glutamate spillover from the synapses, enhancing synaptic crosstalk within this cortical layer.

      Strengths:

      The strengths of this paper are the abundant and very precious data about the fine structure of the human neocortical layer 1. Quantitative electron microscopy data (especially that derived from the human brain) are very valuable since this is a highly time- and energy-consuming work. The techniques used to obtain the data, as well as the analyses and the statistics performed by the authors are all solid, strengthen this manuscript, and mainly support the conclusions drawn in the discussion.

      We would like to thank reviewer#1 for his very positive comments on our manuscript, stating that such data about the fine structure of the human neocortex are highly relevant.

      Weaknesses:

      There are several weaknesses in this work. First, the authors should check and review extensively for improvements to the use of English. Second, several additional analyses performed on the existing data could substantially elevate the value of the data presented. Much more information could be gained from the existing data about the functions of the investigated layer, of the cortical column, and about the information processing of the human neocortex. Third, several methodological concerns weaken the conclusions drawn from the results.

      We would like to thank the reviewer for his critical and thus helpful comments on our manuscript. We addressed the reviewer's first comment concerning the English and improved our manuscript by rephrasing and shortening sentences. Secondly, according to the reviewer several additional analyses should be performed on the existing data, which could substantially elevate the value of the data presented. We will implement some of the suggestions in the improved version of the manuscript where appropriate. We will give a more detailed answer to the reviewer’s queries in our replies to her/his suggestions to the authors (see below). However, the reviewer states himself: “The techniques used to obtain the data, as well as the analyses and the statistics performed by the authors are all solid, strengthen this manuscript, and mainly support the conclusions drawn in the discussion”.

      Reviewer #2 (Public review):

      Summary:

      The study of Rollenhagen et al. examines the ultrastructural features of Layer 1 of the human temporal cortex. The tissue was derived from drug-resistant epileptic patients undergoing surgery, and was selected as far as possible from the epilepsy focus, and as such considered to be non-epileptic. The analyses included 4 patients with different ages, sex, medication, and onset of epilepsy. The manuscript is a follow-on study with 3 previous publications from the same authors on different layers of the temporal cortex:

      Layer 4 - Yakoubi et al 2019 eLife

      Layer 5 - Yakoubi et al 2019 Cerebral Cortex

      Layer 6 - Schmuhl-Giesen et al 2022 Cerebral Cortex.

      They find, that the L1 synaptic boutons mainly have a single active zone, a very large pool of synaptic vesicles, and are mostly devoid of astrocytic coverage.

      Strengths:

      The manuscript is well-written and easy to read. The Results section gives a detailed set of figures showing many morphological parameters of synaptic boutons and glial elements. The authors provide comparative data of all the layers examined by them so far in the Discussion. Given that anatomical data in the human brain are still very limited, the current manuscript has substantial relevance. The work appears to be generally well done, the EM and EM tomography images are of very good quality. The analysis is clear and precise.

      We would like to thank the reviewer for his very positive evaluation of our paper and the comments that such data have a substantial relevance, in particular in the human neocortex. In contrast to reviewer#1, this reviewer’s opinion is that the manuscript is well written and easy to read.

      Weaknesses:

      One of the main findings of this paper is that "low degree of astrocytic coverage of L1 SBs suggests that glutamate spillover and as a consequence synaptic cross-talk may occur at the majority of synaptic complexes in L1". However, the authors only quantified the volume ratio of astrocytes in all 6 layers, which is not necessarily the same as the glial coverage of synapses. In order to strengthen this statement, the authors could provide 3D data (that they have from the aligned serial sections) detailing the percentage of synapses that have glial processes in close proximity to the synaptic cleft, that would prevent spillover.

      We agree with the reviewer that we only quantified the volume ratio of the astrocytic coverage but not necessarily the percentage of synapses that may or may not contribute to the formation of the ‘tripartite’ synapse. As suggested, we will re-analyze our material with respect to the percentage of coverage for individual synaptic boutons in each layer and will implement the results in the improved version of the manuscript. However, since this is a completely new and time-consuming analysis, we would like to ask the reviewer for additional time to perform this task.

      A specific statement is missing on whether only glutamatergic boutons were analyzed in this MS, or GABAergic boutons were also included. There is a statement, that they can be distinguished from glutamatergic ones, but it would be useful to state it clearly in the Abstract, Results, and Methods section what sort of boutons were analyzed. Also, what is the percentage of those boutons from the total bouton population in L1?

      We would like to thank the reviewer for this comment. Although our title clearly states that we focused on quantitative 3D-models of excitatory synaptic boutons, we will point this out more clearly in the Methods and Results sections. Our data support recent findings by others (see for example Cano-Astorga et al. 2023, 2024; Shapson-Coe et al. 2024) who have evaluated the ratio of excitatory to inhibitory synaptic boutons in the temporal lobe neocortex, the same area as in our study, and found between 10-15% inhibitory terminals but with a significant layer- and region-specific difference. We will include the excitatory vs. inhibitory ratio and the corresponding citations in the Results section.

      Synaptic vesicle diameter (that has been established to be ~40nm independent of species) can properly be measured with EM tomography only, as it provides the possibility to find the largest diameter of every given vesicle. Measuring it in 50 nm thick sections results in underestimation (just like here the values are ~25 nm) as the measured diameter will be smaller than the true diameter if the vesicle is not cut in the middle, (which is the least probable scenario). The authors have the EM tomography data set for measuring the vesicle diameter properly.

      We partially disagree with the reviewer on this point. Using high-resolution transmission electron microscopy, we measured the outer-to-outer membrane distance only on those synaptic vesicles that were round in shape with a clear ring-like structure to avoid double counts, and discarded all those that were only partially cut according to criteria developed by Abercrombie (1946) and Boissonnat (1988). We assumed that within a 55±5 nm thick ultrathin section (silver to gray interference contrast) all clear ring-like vesicles were contained in this section, assuming a vesicle diameter between 25 and 40 nm. For large DCVs, double counts were excluded by careful examination of adjacent images, and DCVs were only counted in the image where they appeared largest.

      In addition, we have measured synaptic vesicles using TEM tomography and came to similar results. We will address this in Material and Methods that both methods were used.

      It is a bit misleading to call vesicle populations at certain arbitrary distances from the presynaptic active zone as readily releasable pool, recycling pool, and resting pool, as these are functional categories, and cannot directly be translated to vesicles at certain distances. Indeed, it is debated whether the morphologically docked vesicles are the ones, that are readily releasable, as further molecular steps, such as proper priming are also a prerequisite for release.

      We thank the reviewer for this comment. However, nobody before us tried to define a morphological correlate for the three functionally defined pools of synaptic vesicles, since synaptic vesicles are normally distributed over the entire nerve terminal. As already mentioned above, after long and thorough discussions with Profs. Bill Betz, Chuck Stevens, Thomas Schikorski and other experts in this field, we tried to define the readily releasable (RRP), recycling (RP) and resting pools by measuring the distance of each synaptic vesicle to the presynaptic density (PreAZ). Using distance as a criterion, we defined the RRP as all vesicles located within a distance (perimeter) of 10 to 20 nm from the PreAZ, which is less than an average vesicle diameter (between 25 and 40 nm). The RP was defined as vesicles within a distance of 60-200 nm, still quite close and rapidly available on demand, and the remaining ones beyond 200 nm were suggested to belong to the resting pool. This concept was developed for our first publication (Sätzler et al. 2002), and this approximation has since been widely acknowledged by scientists working in the field of synaptic neuroscience and by computational neuroscientists. Several labs worldwide have asked us whether they can use our perimeter analysis data for modeling. We agree that our definition of the three pools can be seen as arbitrary, but we never claimed that our approach is the truth and nothing but the truth. Concerning the debate whether only docked vesicles or also those very close to the PreAZ should constitute the RRP, we have a paper in preparation using our perimeter analysis, EM tomography and simulations to try to clarify this debate. Our preliminary results suggest that the size of the RRP should be reconsidered.
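
      The distance criterion described in this reply can be written down as a small classification sketch. The thresholds are the ones quoted above (up to 20 nm for the RRP, 60-200 nm for the RP, beyond 200 nm for the resting pool); the function names, and the choice to label vesicles between 20 and 60 nm as "unassigned", are assumptions made for illustration, since the reply does not say how that range is treated.

```python
def classify_vesicle(distance_nm, rrp_max_nm=20.0, rp_min_nm=60.0, rp_max_nm=200.0):
    """Assign one vesicle to a putative pool from its distance to the PreAZ (in nm)."""
    if distance_nm < 0:
        raise ValueError("distance must be non-negative")
    if distance_nm <= rrp_max_nm:
        return "RRP"
    if rp_min_nm <= distance_nm <= rp_max_nm:
        return "RP"
    if distance_nm > rp_max_nm:
        return "resting"
    # Assumption: the 20-60 nm range is not covered by the quoted definition.
    return "unassigned"

def pool_sizes(distances_nm):
    """Count the vesicles of one bouton in each putative pool."""
    counts = {"RRP": 0, "RP": 0, "resting": 0, "unassigned": 0}
    for d in distances_nm:
        counts[classify_vesicle(d)] += 1
    return counts
```

      Keeping the thresholds as parameters makes it straightforward to test how sensitive the estimated pool sizes are to the chosen perimeters, which is the point of contention raised by the reviewer.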

      Tissue shrinkage due to aldehyde fixation is a well-documented phenomenon that needs compensation when dealing with density values. The authors cite Korogod et al 2015 - which actually draws attention to the problem comparing aldehyde fixed and non-fixed tissue, still the data is non-compensated in the manuscript. Since all the previous publications from this lab are based on aldehyde fixed non-compensated data, and for this sake, this dataset should be kept as it is for comparative purposes, it would be important to provide a scaling factor applicable to be able to compare these data to other publications.

      We thank the reviewer for his suggestion. However, for several reasons we did not correct for shrinkage caused by aldehyde fixation. Papers by Eyre et al. (2007) and the mentioned paper by Korogod et al. (2015) have demonstrated that cryo-fixation reveals larger numbers of docked synaptic vesicles, a smaller glial volume, and a less intimate glial coverage of synapses and blood vessels compared to chemical fixation. Other structural subelements such as active zone size and shape and the total number of synaptic vesicles remained unaffected. In two further publications, Zhao et al. (2012a, b), investigating hippocampal mossy fiber boutons using cryo-fixation and substitution, came to similar results with respect to bouton and active zone size and the number and diameter of synaptic vesicles compared to aldehyde fixation as described by Rollenhagen et al. (2007) for the same nerve terminal. This was one of the reasons for not correcting for shrinkage. In addition, all cited papers state that chemical fixation in general provides a much better ultrastructural preservation of tissue samples when compared with cryo-fixation and substitution, where optimal preservation is only regional within a block of tissue and therefore less suitable for large-scale ultrastructural analyses such as those we performed.

      Reviewer #3 (Public review):

      Summary:

      Rollenhagen et al. offer a detailed description of layer 1 of the human neocortex. They use electron microscopy to assess the morphological parameters of presynaptic terminals, active zones, vesicle density/distribution, mitochondrial morphology, and astrocytic coverage. The data is collected from tissue from four patients undergoing epilepsy surgery. As the epileptic focus was localized in all patients to the hippocampus, the tissue examined in this manuscript is considered non-epileptic (access) tissue.

      Strengths:

      The quality of the electron microscopic images is very high, and the data is analyzed carefully. Data from human tissue is always precious and the authors here provide a detailed analysis using adequate approaches, and the data is clearly presented.

      We are very thankful to the reviewer for his very positive comments about our data analysis and presentation.

      Weaknesses:

      The study provides only morphological details, these can be useful in the future when combined with functional assessments or computational approaches. The authors emphasize the importance of their findings on astrocytic coverage and suggest important implications for glutamate spillover. However, the percentage of synapses that form tripartite synapses has not been quantified, the authors' functional claims are based solely on volumetric fraction measurements.

      We thank the reviewer for his critical comments on our findings concerning the layer-specific astrocytic coverage, as also suggested by reviewer#2. As already stated above, we will analyze the astrocytic coverage and the layer-specific percentage of astrocytic contribution to the ‘tripartite’ synapse in more detail. We are, however, a bit puzzled by the comment, one that structural anatomists often receive, that our study only provides morphological details. The structural and synaptic parameters of synaptic boutons that we analyzed thoroughly underlie and might even predict the function of synaptic boutons in a given microcircuit or network, and will thus very much improve our understanding and knowledge of the functional properties of these structures, in particular in the human brain where such studies are still quite rare. The main goal of our studies in the human neocortex was the quantitative morphology of synaptic boutons and thus the synaptic organization of the cortical column, layer by layer, which to our knowledge is the first such detailed study undertaken in the human brain. Our efforts have set a gold standard in the analysis of synaptic boutons embedded in different microcircuits, which is meanwhile internationally very well accepted.

      The distinction between excitatory and inhibitory synapses is not clear, they should be analyzed separately.

      As already stated above in response to reviewer#1 our study focused on excitatory synaptic boutons since they represent the majority of synapses. However, in the improved version of our manuscript in the Material and Method section we included a paragraph with structural criteria to distinguish excitatory from inhibitory terminals (see also our comment to reviewer#1 concerning this point) including appropriate citations.

      The text connects functional and morphological characteristics in a very direct way. For example, connecting plasticity to any measurement the authors present would be rather difficult without any additional functional experiments. References to various vesicle pools based on the location of the vesicles are also more complex than suggested in the manuscript. The text should better reflect the limitations of the conclusions that can be drawn from the authors' data.

      We thank the reviewer for this comment. However, numerous publications have meanwhile shown that the shape and size of the active zone, together with the pool of synaptic vesicles and the astrocytic coverage, critically determine synaptic transmission and synaptic strength, but can also contribute to the modulation of synaptic plasticity (see also citations within the text). It has been shown that synaptic boutons can switch under certain stimulation conditions to different modes of release (uni- vs. multiquantal, uni- vs. multivesicular release) and from asynchronous to synchronous release, leading also to the modulation of synaptic short- and long-term plasticity. To the second comment: When we started with our first paper about the Calyx of Held – principal neuron synapse in the MNTB (Sätzler et al. 2002) we tried to define a morphological correlate for the three functionally defined pools. As already mentioned above in our reply to the other two reviewers, this is rather difficult since synaptic vesicles are normally distributed over the entire nerve terminal. After long and thorough discussions with Bill Betz, Chuck Stevens and other leading scientists in the field of synaptic neuroscience, we together with Bert Sakmann tried to define a morphological correlate for the functionally defined pools using a perimeter analysis. We defined the readily releasable pool as vesicles 10 to 20 nm away from the presynaptic active zone, the recycling pool as those at 60-200 nm distance, and the remaining ones as belonging to the resting pool. It has been shown by capacitance measurements (see for example Hallermann et al. 2003), FM1-43 investigations (see for example Henkel et al. 1996) and high-resolution electron microscopy (see for example Schikorski and Stevens 2001; Schikorski 2014) that our estimate of the RRP nearly perfectly matches the functionally defined pools at hippocampal and cortical synapses (Silver et al. 2003). In addition, in one of our own papers (Rollenhagen et al. 2018) we also estimated the RP functionally from trains of EPSPs using an exponential fit analysis and arrived at a similar estimate of its size as with the perimeter analysis.

      Of course, as stated by the reviewer, the scenario could be more complex when other criteria are used, but we never claimed that our morphologically defined pools are the truth and nothing but the truth; we believe they offer quite a good approximation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Abstract:

      Avoid the numerous abbreviations in the abstract. The paragraph describing the results obtained in this study is too short. Include more results, such as the size of the active zone, the proportion of perforated synapses, the ratio of synapses terminating on dendrites/spines, the percentage of volume occupied by mitochondria, etc. In the last paragraph, compare the layer-specific data to other layers of the neocortex before writing the concluding sentence.

      To meet the word limit of the abstract (150 words) defined by eLife, we had to use abbreviations. We followed the reviewer’s suggestions and expanded our abstract by adding the proportion of macular vs. perforated active zones and the percentage of mitochondria within an SB. However, we did not include the comparison of structural parameters in the Abstract since this is discussed thoroughly elsewhere in the MS (see Results and Discussion).

      Results:

      First of all, wonderful data! Lots of work, very valuable quantitative electron microscopy results.

      Main concerns:

      Adding several analyses would give much more information about the cortical synaptic organization. It would be very useful to differentiate between excitatory and inhibitory terminals (and give their ratio) and include this information in all different analyses, such as in the SV number, SV pool analysis, mitochondrion analysis, etc., that would give functional information as well. You have all the data for this, and you know how to differentiate between inhibitory and excitatory synapses, it can be done. We could see the possible morphological differences between excitatory and inhibitory synapses (maybe one is larger/has more SVs, etc. than the other). Based on these possible differences conclusions could be drawn about functional hypotheses, such as one or the other is more efficient in inducing postsynaptic potentials, excitation or inhibition is more pronounced in layer 1, etc. Furthermore, looking at the ratio of perforated synapses, we could gain information about the formation of new synapses. Maybe there is a difference between excitatory and inhibitory circuits in this point of view.

      To the first point: Since our focus was on excitatory synaptic boutons, as already stated in the title, we have not analyzed inhibitory SBs. To do so, we would have to re-analyze our complete data set, which is time-consuming and an additional workload. However, we can give the ratio of excitatory vs. inhibitory synaptic boutons, which was between 10 and 15% but with layer-specific differences. Our findings are in good agreement with a recent publication in Science by the Lichtman group (Shapson-Coe et al. 2024) and work by the DeFelipe group (Cano-Astorga et al. 2023, 2024), which, as we did, estimated the number of inhibitory boutons in different layers of the temporal lobe neocortex at 10-15%. We included a short paragraph about inhibitory synapses and their percentage, together with the citations, in our Results section. Concerning the ratio between macular, non-perforated vs. perforated active zones, we stated that the majority of synaptic boutons were of the macular, non-perforated type (~75%; see improved version of the MS). Perforations, when present, were found predominantly on the postsynaptic site but were quite rare in L1 SBs. Since GABAergic terminals had only a small or no clearly visible PSD, this would be hard to assess.

      To the last point: it has been demonstrated that the number of dense core vesicles and their fusion with the presynaptic density could be a critical factor in the build-up of the active zone. In addition, the findings of the Geinisman group suggesting that perforated synapses are more efficient than non-perforated ones are nowadays highly controversial, since other factors such as the size of the active zone (see for example Matz et al. 2010; Holderith et al. 2012) and the astrocytic coverage contribute to synaptic efficacy and strength.

      Related to this topic: although in the case of rat CA1 pyramidal cells all inhibitory synapses terminated on dendritic shafts (Megias et al., Neuroscience 2001), please be aware that both excitatory and inhibitory synapses can terminate on both dendritic shafts and spines in humans (inhibitory synapses are though rare on spines, usually less than 10%, but they do exist, see for example Wittner et al, Neuroscience, 2001). Please, define the excitatory/inhibitory nature of the synapses based on morphological features (not on their postsynaptic target), i.e., flattened vesicles and thin postsynaptic density for GABAergic synapses, whereas larger, round vesicles and thick postsynaptic density for glutamatergic synapses. Anyway, the ratio of excitatory and inhibitory synapses on dendrites and spines in the two sublamina would also give useful information about the synaptic organization of the human neocortical layer 1.

      We are aware that not all terminals targeting spines are excitatory; in turn, it has been shown that not all terminals on shafts are inhibitory, as was long thought (Silver et al. 2003). However, as stated by the reviewer, their abundance on spines is rather low. At the moment it is rather unclear what functional impact inhibitory terminals on spines have beyond local inhibition (see for example Kubota et al. eLife 2015), and thus their role remains rather speculative since excitatory synapses are the predominant class on dendritic spines. As already stated above, the ratio of excitatory vs. inhibitory terminals is between 10 and 15% and not significantly different between the two sublaminae. We have added this to the Results section (see the improved version of the manuscript).

      (2) About the glial coverage: Please, specify how glial elements were determined. What were the morphological features specific to astroglial processes? In Figure 5, how could we know whether the glial element marked by green is not a spine neck? The lack of morphological features specific to glial processes makes this analysis weak. The most accurate would be to make it with the aid of GFAP staining. I know this is not possible with your existing data, but at least, provide information on how glial processes were identified.

      We used the criteria first described by Peters et al. (1991) and Ventura and Harris (1999), identifying astrocytic profiles by their irregular stellate shape, relatively clear cytoplasm, numerous glycogen granules and bundles of intermediate filaments. After more than 20 years of structural investigations, we hope that the reviewers will trust that we can identify astrocytic processes at the high-resolution TEM level. In some of our publications (Rollenhagen et al. 2007; 2015; 2018; Yakoubi et al. 2019a) we have used glutamine synthetase pre-embedding immunohistochemistry to identify astrocytic processes, but a disadvantage of this method is the reduced ultrastructural preservation of the tissue. We have included the criteria used to identify astrocytic processes in our manuscript together with the two citations (see improved version of the manuscript).

      (3) The authors state that the total number of SVs was very variable. How was the distribution of the number of SVs? Homogenous distribution suggests that different types of synapses cannot be distinguished based on their morphological features, whereas distribution with more than one peak would suggest that different types of synapses are present in L1, and that they can be differentiated by their appearance (number of SVs, for example). This might be also related to the type of synapse (i.e., excitatory or inhibitory). The same applies to the number of RP and resting pool SVs.

      To look for differences in structural and synaptic parameters that could further classify synaptic boutons, we performed a hierarchical cluster and multivariate analysis. However, it turned out that no further classification into subtypes was possible on the basis of structural and functional parameters.

      (4) The authors should check and review extensively for improvements to the use of English. The Results and Discussion sections contain many sentences which are not easy to understand. They have either a too complicated structure, or they are incomplete and hard to follow. Few examples: "The RRP/PreAZ at p20 nm criterium was on average 19.05 ± 17.23 SVs (L1a: 25.04 ± 21.09 SVs and L1b: 13.07 ± 13.87SVs) and thus nearly 2-fold larger for L1a." If you take out the parenthesis, the sentence has no meaning. "The majority of SBs in L1 of the human TLN had a single at most three AZs that could be of the non-perforated macular or perforated type comparable with results for other layers in the human TLN but by ~1.5-fold larger than in rodent and non-human primates." Rephrase these types of sentences, please.

      We partially agree with the reviewer. We have improved our manuscript by rephrasing and shortening sentences.

      Other suggestions:

      (1) Put the synaptic density part after the description of the neuronal and synaptic composition part, it is more logical this way (i.e., first qualitative description, the distinction between sublayers, then quantitative data). Please write down in the description of the neuronal and synaptic composition part how L1a and L1b were differentiated (see also my comment on Figure 1).

      We agree with the reviewer and made the change as suggested. For a better understanding, we have also expanded the neuronal and synaptic description of the two sublaminae in L1.

      (2) Introduce a list of abbreviations at the beginning, that would help.

      It is quite unusual to provide a list of abbreviations in eLife. However, each abbreviation is now spelled out in full at first use.

      (3) What is cleft width? Usually, it refers to the distance between the pre- and the postsynaptic membrane, but here, I think it refers to the size (diameter) of the active zone. Please, clarify in the Result section (as it appears earlier than the Methods section, where it is explained). I would probably use the expression "synaptic cleft size" instead of "synaptic cleft width" to avoid misunderstanding.

      We thank the reviewer for the suggestion. We now use "synaptic cleft size" for better clarity and have moved the explanatory sentence from the Material and Methods to the Results section.

      (4) The description of the different SVs (RRP, RP, etc.) is not clear in lines 236-242. What does it mean, that RRP vesicles are located ≤10 nm and ≤20 nm from the active zone? Explain, why the two different distance criteria were used. Furthermore, how were the vesicles located at p20-p60 defined? Why were these vesicles not considered in the determination of the different pools?

      As stated in our reply to the reviewers’ concerns in the public review, we have tried to define a morphological correlate for the three functionally defined pools. After thorough discussions with leading scientists in the field of synaptic neuroscience, we decided to use the distance of individual vesicles from the PreAZ and to sort vesicles according to these criteria. One can argue that this approach is arbitrary; however, these distance criteria were described by Rizzoli and Betz (2004, 2005) and Denker and Rizzoli (2010). As also stated in the public review, it is still controversial whether only docked or omega-shaped SVs constitute the RRP. We decided that SVs within 10 and 20 nm of the PreAZ, which is less than one SV diameter, may also contribute to the RRP, since it has been shown that SVs are quite mobile.
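
      The distance-based sorting described in this reply can be illustrated with a minimal Python sketch. It is not the analysis code used in the study. The thresholds (≤10 or ≤20 nm for the RRP, 60-200 nm for the RP, beyond 200 nm for the resting pool) are taken from the responses on this page; the function names are hypothetical, and leaving vesicles between the RRP limit and 60 nm unassigned is an assumption, since the response does not say how they were treated.

```python
from collections import Counter

# Distance limits in nm, as quoted in the responses; the names are illustrative only.
RRP_MAX_NM = 20.0   # use 10.0 for the stricter p10 criterion
RP_MIN_NM = 60.0
RP_MAX_NM = 200.0


def classify_vesicle(distance_nm, rrp_max_nm=RRP_MAX_NM):
    """Assign one vesicle to a pool from its distance to the PreAZ (in nm)."""
    if distance_nm < 0:
        raise ValueError("distance must be non-negative")
    if distance_nm <= rrp_max_nm:
        return "RRP"
    if RP_MIN_NM <= distance_nm <= RP_MAX_NM:
        return "RP"
    if distance_nm > RP_MAX_NM:
        return "resting"
    # Assumption: vesicles between the RRP limit and 60 nm stay unassigned.
    return "unassigned"


def pool_sizes(distances_nm, rrp_max_nm=RRP_MAX_NM):
    """Count vesicles per pool for one bouton."""
    return Counter(classify_vesicle(d, rrp_max_nm) for d in distances_nm)


if __name__ == "__main__":
    # Hypothetical distances for a single bouton, for illustration only.
    example = [5.0, 15.0, 40.0, 100.0, 250.0]
    print(pool_sizes(example))                   # p20 criterion
    print(pool_sizes(example, rrp_max_nm=10.0))  # p10 criterion
```

      Passing the RRP limit as a parameter makes the effect of the two criteria (p10 vs. p20) on the pool sizes easy to compare for the same bouton.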

      (5) Please, explain how the number of docked vesicles can be 3x larger in L1b, than the number of vesicles located at p10? Docked vesicles are the closest (with the membrane touching the PreAZ)... if this comes from the fact that another pool of boutons was used for the EM tomography analysis, then the entire pool of boutons analyzed, then it means that the selection of boutons for the EM tomography is highly biased. This also implies that EM tomography data are most probably not valid for the entire L1b. The difference might also come from the different ratios of dendrite/spine synapses included in the two different analyses. In this case, it would be helpful to distinguish between synapses terminating on dendrites/spines and analyse them separately (same as for inhibitory/excitatory, which is not exactly the same as dendrite/spine!). Different n numbers of synapses are given in the text (n=25, 25, 25 25) and in Table 2 (n=91, 98, 87, and 84) for the analysis of the docked vesicles, please, correct this.

      This is a correct value and thus there is a nearly 3-fold difference. The TEM tomography was carried out on the same blocks that have been used for our 3D-volume reconstructions. To carry out TEM tomography we had to use thicker sections (250 nm) to look for complete SBs as we also did in our serial sections, but of course, we could not quantify the same SBs. The completeness of SBs was one of our main criteria to reconstruct structural and synaptic parameters. The second was that the synaptic cleft was cut perpendicular. Only SBs that met these criteria were chosen for further quantitative analysis. In this respect we are of course biased in both methods.

      Secondly, as already stated we did not quantify inhibitory terminals in serial sections. However, we did not find significant differences between shaft vs. spine synapses.

      Finally, Table 2 gives the total number of ‘docked’ SVs counted across all SBs analyzed.

      Discussion:

      Please include the recent findings of human L1 neurons, including the "rosehip" cells in the L1 neuronal network, see Boldog et al., Nat Neurosci 2018. It would be also useful to consider in the discussion the human-specific cortical synchrony and integration phenomena derived from in vitro data (Mansvelder, Lein, Tamas, Wittner, Larkum, Huberfeld labs, etc.), and how the synaptic morphology can be related to these.

      We thank the reviewer and have included the reference in our section on functional significance.

      Figures and Tables:

      Figure 1: In the legend, it is written that CR cells are marked by an asterisk, but on the figure it is marked by arrowheads. H: I would put the dashed line slightly lower, just above the two neuronal cell bodies. Now it looks like in the middle of the astrocytic layer. One of the asterisks marking the CR cell is not above the nucleus of that cell. I: the gabaergic neuron is outside of the framed area. I would delete the frame, anyway, the arrowheads and the asterisk are enough to show what the authors want to show.

      We have changed the Figure according to the suggestions raised by the reviewer.

      Figure 3: The transparent yellow is not visible. It is a bit disturbing that the contours of the boutons are not visible, I would make the transparent yellow stronger (less transparent). The SVs in green/magenta will be still visible.

      We wanted to highlight the internal subelements of SBs and thus made the covering transparent but we think it is still visible.

      Figure 6C: The data concerning other layers than L1 are most probably taken from other publications of the research group. One is cited (for L6), but not the others. Please correct this, or if not, then write this in the Results and Methods.

      We changed the citation in the improved version of the manuscript. We overlooked that the values for L4 and L5 were already published in Schmuhl-Giesen et al. 2022.

      Table 1: What does central and lateral cleft width mean in Table 1? Furthermore, please, give the name for abbreviations CV and IQR in Tables 1 and 2.

      The measurements of the synaptic cleft are now described in detail in the Results section. We now have given the full names for CV and IQR in the legends of tables 1 and 2.

      Supplemental Figures 1 and 2: Why Hu01 and Hu02 are twice? What is the difference? Based on the figure legend, it is L1a and L1b? If yes, please, indicate on the figure or in the legend.

      Supplemental Table 1: What is TLE in the case of Hu_04? If it is temporal lobe epilepsy, then why age at epilepsy onset is missing?

      Yes, Hu01 and Hu02 were each selected for both L1a and L1b, in separate serial section preparations. We have now indicated this in the figure legend. Concerning Hu_04, unfortunately we do not have any further information about the medical background of the patient.

      Supplemental Table 1 (Patient table), that there are many abbreviations explained which do not appear in the table (lBAZ: Brivaracetam CBZ: Carbamazepine; CLB: Clobazam; ESL: Eslicarbazepin; GGL: Ganglioglioma, etc.), please check and correct.

      We have removed the unnecessary abbreviations.

      Other minor suggestions:

      What is Pr? Please, give the name at first appearance (line 368).

      We explained Pr (release probability) when used for the first time.

      Give the name for t-LTD, please (lines 442-443).

      We explained t-LTD (timing-dependent long-term depression) when used for the first time.

      Typo in line 169: DCW instead of DCV (dense core vesicle), DCV is used in the figure legends.

      We changed DCW to DCV.

      Typo in line 190: Yokoubi instead of Yakoubi (reference).

      We changed Yokoubi to Yakoubi.

      Typo in line 237: Rizzoloi instead of Rizzoli (reference).

      We changed Rizzoloi to Rizzoli.

      Line 229-230: One reference is not inserted properly - Piccolo and Bassoon.

      The references to Schoch and Gundelfinger and to Mukherjee on the build-up of the active zone and the role of DCVs containing Piccolo and Bassoon are now properly cited in the text.

      Typo in line 398: exit instead of exist.

      Corrected

      Typo in line 700: Reynolds (1063) instead of 1963.

      Corrected

      Reviewer #2 (Recommendations for the authors):

      Abstract:

      The last sentence seems far-fetched, and unrelated to the manuscript. How mostly single active zone boutons can "mediate, integrate and synchronize contextual and cross-modal information, enabling flexible and state-dependent processing of feedforward sensory inputs from other layers of the cortical column"? Which of the anatomical findings of the manuscript led to these conclusions?

      According to the review by Schuman et al. (2021), layer 1 is regarded as a layer that mediates, integrates and synchronizes contextual and cross-modal information, enabling flexible and state-dependent processing of feedforward sensory inputs from other layers of the cortical column. The structural quantitative 3D models of SBs contribute to this view since SBs are an integral element connecting neurons and building networks.

      I am also puzzled by the authors' statement in more than one place of the manuscript that "L1a can be characterized as a predominantly astrocytic sublamina". If the L1 contains the lowest measured volume ratio of glial processes (Figure 6), then this description does not seem to hold. Please rephrase.

      The reviewer is right and we rephrased the sentences for more clarity in the improved version of our manuscript.

      Results:

      The authors find large inter-patient variability in the synapse density at L1, which raises the issue of what were the criteria to include certain patients in the analyses. Apparently, these are different from the ones analysed in their previous papers, and all the provided parameters were different (sex, age, medication, onset of epilepsy), and any of them can result in altered synapse density.

      First, we have not used all patients for this study. Secondly, it was not possible to use all patients for all six layers.

      It would be useful to add a panel for Figure 1 with synapse density across the different layers, as they provide this data in the Discussion.

      We have added Supplementary Table 1, which lists the synaptic density values for all layers compared in the Discussion.

      I cannot find Source Data 1 in the manuscript although it is referred to in more than 1 place (e.g. page 5 line 100).

      Source data were uploaded as Supplemental Material when our manuscript was submitted directly to eLife. However, as stated by bioRxiv, ‘any Supplemental Materials associated with this manuscript have not been transferred to bioRxiv to avoid the posting of potentially sensitive information’; the source data have therefore not been uploaded to the preprint server.

      Page 5 line 100 the correct value is 7.3*10^7 or rather 10^8?

      We corrected the value in the improved version of the MS.

      It would be nice to put the synapse density values into context by comparing them to e.g. mouse, rat, or monkey data.

      Since we are working on the human temporal lobe neocortex, we avoided comparing our data with those estimated in experimental animals. In addition, as discussed by DeFelipe et al. (1999), different methods were used to quantify synaptic density in experimental animals, so these results are difficult to compare.

      Page 5 Line 117 CR-cells stands for Cayal-Retzius cells?

      CR-cells is the abbreviation for Cajal-Retzius cells.

      Page 6 Line 146 repeated sentence.

      We deleted the repeated sentence.

      Page 7 Line 154 "file-scale TEM" ??

      We replaced file-scale by fine-scale.

      Page 7 Line 164 "GABAergic synapses identified by the smaller more spherical SVs". With this fixation condition, GABAergic vesicles are more ovoid than glutamatergic ones. What were the criteria to distinguish them?

      To our knowledge, in the by now numerous publications using the same fixation, inhibitory terminals contain smaller and more spherical, not roundish, synaptic vesicles and show no clear, prominent PSDs, as described in our paper. We have addressed this more clearly in the Results section of the improved version of the MS.

      Page 8 line 197 "The majority (~98%) of SBs in L1a and L1b had only a single (Figures 2C-E, 3A-C, E) at most two or three AZs" is in striking contrast with the other statement from page 7 Line 163 "Numerous SBs in both sublaminae were seen to establish either two or three synaptic contacts on the same spine or dendrite". Which of these statements is valid? Please provide exact quantification for this statement and decide which one is true.

      It is true that the majority of synaptic boutons had a single active zone. However, for example on a spine not only a single but also two or three SBs can be found. We have rephrased this sentence for more clarity.

      Page 9 Line 206 "L1 AZs did not show a large variability in size as indicated by the low SD, CV, and variance (Table 1)" Is this inter-patient variance of mean values? As in Supplementary Figure 1, both the SBs volume and PreAZ area show large variability in a given patient sample. Only the inter-patient variability of mean values seems low. Please state it clearly throughout the MS for other datasets as well.

      For clarity concerning the variability between patients and structural parameters we have generated box plots (Suppl. Figures 1 and 2).

      Page 9 Line 208 data is on Figure 5A and not 8A.

      We thank the reviewer and have corrected the citation of the Figure.

      Page 12 Line 295 how can the number of docked vesicles for L1b be larger than the one measured by the perimeter p10 nm? This later should contain the docked and PreAZ membrane proximal pool as well. This difference is even larger if we assume, that at EM tomography only partial AZs were analysed in a 200 nm thick section, not the entire AZ as for the perimeter measurement. Can the authors provide density estimates by dividing the docked / p10 nm vesicle numbers with the AZ area and comparing them?

      This result comes from comparing both methods. To the second concern: as stated in the text, only synaptic boutons in which the active zone could be followed from its beginning to its end and in which the synaptic cleft was cut perpendicularly were included in the TEM tomography sample, as was also the case for our 3D-volume reconstructions.

      Methods:

      Page 25 Line 624 While the PSD area can be equivocally measured, due to the dense appearance of the PSD on the EM images, the PreAZ is more difficult to outline due to lack of evident anatomical markers except the synaptic cleft (the dense material is much thinner). That is why in many publications the PreAZ area is considered to be identical to the PSD area. What are the anatomical criteria used here for the PreAZ? Why do the authors correct the PSD area, which is easy to measure with the PreAZ area that is much less certain to outline?

      As stated in Material and Methods, neither the pre- nor the postsynaptic density was defined by placing a closed contour around the density, because one cannot be certain about the extent of the dense accumulation of particles defining both areas: the impregnation (staining) and contrast of both structures critically depend on the uranyl and lead staining, and different staining results could lead to misinterpretation. That is why we drew a contour line from the beginning to the end of the presynaptic density and extrapolated it to the postsynaptic density (for details see Material and Methods). In our samples, both the pre- and postsynaptic densities were always clearly visible in the boutons analyzed further.

      Page 26 Line 640 vesicle density measurement: All the synaptic vesicles that are in the 50 nm thick section in their entirety are missed, and there are methods based on EM tomography to correct these estimations. One can not assume, that the error caused by "double counts" of vesicles cancels for the lost ones. There are stereological methods to estimate both types of error please include them and correct the values.

      We would like to point out that the whole body of our work on the structural analysis of vesicle pools is based on image data from transmission electron microscopy (TEM), which generates a projection of the entire volume of the ultra-thin section, and NOT from scanning electron microscopy (SEM), where only a small volume close to the surface of the section would be captured. Operating in TEM mode ensures that no vesicle is missed merely because it is embedded in its entirety in the section, as postulated by the reviewer. Hence, EM tomography, which is basically TEM operated at different incident angles relative to the specimen or section, does not provide any advantage in detecting these vesicles. It does, however, help to better position a 3D object within the section volume itself and therefore makes it possible to separate objects that overlap from one viewing angle by using another angle. As the average vesicle diameter is similar to the section thickness, however, the probability of a complete overlap is almost zero. And as we only count clear ring-like structures, a stereological correction factor calculated according to Abercrombie (1946) would underestimate real counts (see also Saetzler et al. 2002). If there is, however, relevant literature on "methods based on EM tomography" and "stereological methods to estimate both types of error" (over- and underestimates) that we are missing, we would appreciate the reviewer providing us with the corresponding references so that we can include such calculations in our paper.
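
      For orientation, the Abercrombie (1946) correction mentioned in this reply scales a raw profile count by T / (T + D), where T is the section thickness and D the mean object diameter. The short Python sketch below only illustrates that formula. The function name and the input numbers are hypothetical, chosen to reflect the statement that the vesicle diameter is similar to the section thickness, and are not values from the study.

```python
def abercrombie_corrected_count(raw_count, section_thickness_nm, mean_diameter_nm):
    """Abercrombie (1946) correction: N = n * T / (T + D)."""
    if section_thickness_nm <= 0:
        raise ValueError("section thickness must be positive")
    if mean_diameter_nm < 0:
        raise ValueError("mean diameter must be non-negative")
    factor = section_thickness_nm / (section_thickness_nm + mean_diameter_nm)
    return raw_count * factor


if __name__ == "__main__":
    # Hypothetical example: 100 counted profiles, 50 nm sections, and a
    # vesicle diameter assumed equal to the section thickness.
    corrected = abercrombie_corrected_count(100, 50.0, 50.0)
    print(f"corrected count: {corrected:.1f}")
```

      When D is close to T the factor approaches 0.5, so applying it to counts restricted to clear ring-like profiles would reduce them substantially, which is the concern raised in the reply.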

      Page 27 Line 664 and 665 "sections" are still tissue blocks, as sectioning comes after if the process is correctly written. Please correct.

      We have corrected this according to the reviewer’s comment.

      Page 43 Figure 4 D Data for L1b is missing, only the correlation line is visible.

      Corrected in a new Figure.

      Page 44 Figure 5 C arrowheads are in the correct places? Some of them do not seem to point to the edge of the synapse.

      We carefully checked the Figure and adjusted the arrowheads.

      Figure 5 E lower arrowhead labels something, that is difficult to identify but does not seem to be a vesicle.

      We agree with the reviewer on this point and changed the figure accordingly.

      Figure 5 F, the upper vesicle is at least 10 nm apart from the PreAZ membrane. Did the authors consider it as docked (indicated with arrowhead, according to the legend it labels docked vesicles)?

      We agree with the reviewer on this point and changed the figure accordingly.

      Page 45 Figure 6 B one of the 2 synaptic boutons (sb), sb2 has a tangential active zone that precludes the identification of the pre- and post-synaptic membranes, still 2 "docked vesicles" are labeled. How were they classified as docked? Please remove these tangential synapses from the dataset, as membranes can not be identified.

      The reviewer is right that the active zone is tangentially cut, however, the two vesicles are associated with the AZ. In addition, we did not use this AZ for vesicle data analysis.

      Page 46 Line 1124 interneuron axon labelled in green not brown.

      Corrected as suggested by the reviewer.

      Line 1129 SStC is missing.

      Changed according to the reviewer’s comment.

      Page 48 Table 2 Number of docked vesicles Median values are rounded to integer values? If yes why?

      The statistics package used rounded the medians to the given values.

      Page 51 Supplementary Table 1 Hu_04 Histopathology, what does TLE stands for?

      TLE: temporal lobe epilepsy. We included the abbreviation in the legend of Supplementary Table 1, which is now Table 2.

      Reviewer #3 (Recommendations for the authors):

      (1) Reanalysis of astrocytic coverage based on the % of synapses that form tripartite synapses.

      We have reanalyzed the data concerning this point (new Figure 6D).

      (2) Segregation of excitatory and inhibitory synapses.

      We have now included a paragraph in our results section to distinguish between excitatory and inhibitory synapses.

      (3) Better explanation of the limits of the study to assess functional parameters.

      We disagree with the reviewer on this point and have not included an explanation concerning the limits of this study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment 

      This useful study uses high-field fMRI to test the hypothesized involvement of subcortical structures, particularly the striatum, in WM updating. It overcomes limitations in prior work by applying high-field imaging with a more precise definition of ROIs. Thus, the empirical observations are of use to specialists interested in working memory gating or the reference back task specifically. However, evidence to support the broader implications, including working memory gating as a construct, is incomplete and limited by the ambiguities in this task and its connection to theory.

      We would like to express our gratitude to the editor and the reviewers for their time and effort in providing insightful and valuable comments. We greatly value the critical perspective on the relationship between fMRI contrasts and the PBWM model. We hope to have addressed all the last critical points and changed the manuscript according to the reviewers’ suggestions. Furthermore, we would like to point out that the behavioral results section was edited, as a double-check of the results section revealed some erroneous descriptive statistics.

      Public Reviews:

      Reviewer #1:

      Summary: 

      Trutti and colleagues used 7T fMRI to identify brain regions involved in subprocesses of updating the content of working memory. Contrary to past theoretical and empirical claims that the striatum serves a gating function when new information is to be entered into working memory, the relevant contrast during a reference-back task did not reveal significant subcortical activation. Instead, the experiment provided support for the role of subcortical (and cortical) regions in other subprocesses. 

      Strengths: 

      The use of high-field imaging optimized for subcortical regions in conjunction with the theory-driven experimental design mapped well to the focus on a hypothetical striatal gating mechanism. 

      Consideration of multiple subprocesses and the transparent way of identifying these, summarized in a table, will make it easy for future studies to replicate and extend the present experiment.   

      Weaknesses: 

      The reference-back paradigm seems to only require holding a single letter in working memory (X or O; Figure 1). It remains unclear how such low demand on working memory influences associated fMRI updating responses. It is also not clear whether reference-switch trials with 'same' response truly tax working-memory updating (and gate opening), as the working-memory content/representation does not need to be updated in this case. These potential design issues, together with the rather low number of experimental trials, raise concerns about the demonstrated absence of evidence for striatal gate opening. 

      We acknowledge that a limitation of our study is that the task involved relatively low working memory demands. It remains to be clarified whether the same neural mechanisms would be engaged under a higher working memory load, and this is an important consideration for future research.

      We also fully agree that it is uncertain whether reference-switch trials requiring a ‘same’ (or ‘match’) response truly engage working memory updating (or gate opening), as the working memory content or representation does not need to be altered in these cases. This concern is addressed in detail in the discussion section titled “No Support for Striatal Gate Opening” (see second paragraph).

      Regarding our references to dopamine, we completely agree with the reviewer about the speculative nature of these discussions. In response, we thoroughly reviewed the manuscript and made revisions where necessary to ensure that we consistently emphasize the speculative nature of our commentary on dopamine and dopaminergic pathways.

      Finally, we acknowledge the concerns about the design and the relatively low number of trials. However, our fMRI analyses of other reference-back task contrasts did reveal activity in the striatum and other subcortical ROIs. This suggests that our scanning protocol and task design are sufficiently sensitive to detect striatal activity, even with the limited number of trials.

      The authors provide a motivation for their multi-step approach to fMRI analyses. Still, the three subsections of fMRI results (3.2.1; 3.2.2; 3.3.3) for 4 subprocesses each (gate opening, gate closing, substitution, updating mode) made the Results section complex and it was not always easy to understand why some but not other approaches revealed significant effects (as the midbrain in gate opening). 

      We thank the reviewer for this important remark and the opportunity to clarify our approach. We conducted whole-brain general linear models (GLMs) to generate a comprehensive whole-brain map of brain activity for each contrast. However, the whole-brain statistical parametric maps (SPMs) involve data smoothing, which, while improving signal detection, reduces spatial precision. This is especially problematic in smaller or closely adjacent regions, where spatial blurring can merge distinct activations or make localized signals appear more widespread.

      Additionally, the statistical thresholds in whole-brain analyses may detect weak or borderline significant effects, whereas ROI-wise GLMs, which assume uniform behavior across the entire region, may miss the same effects if the signal is weak or inconsistent across the ROI.

      Since our primary focus was on the subcortex, we relied more heavily on ROI-wise GLMs, which were limited to subcortical regions. We prioritized findings that were supported by either the ROI-wise GLMs or by both GLM analyses. For instance, the midbrain activations found in our whole-brain analysis but not in the ROI analysis may result from smoothing (where activation from neighboring regions spreads into midbrain voxels) or from functional heterogeneity within the ROI, which can obscure localized activations when averaged in the ROI-wise GLMs. Inferences from each GLM approach, along with their discrepancies, are discussed for each contrast throughout the discussion, with additional details on the cluster-based ROI analysis in the discussion section titled “Dopaminergic involvement in working memory substitution” (see third paragraph).
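
      The two effects described above (smoothing leaking signal across a region boundary, and ROI averaging diluting a focal activation) can be illustrated with a toy one-dimensional example. This is only a sketch under stated assumptions and not the analysis code used in the study: the 20-voxel "brain", the ROI boundaries, the kernel width, and all function names are hypothetical choices made for illustration.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, sigma=1.5, radius=4):
    """Convolve a 1D signal with a Gaussian kernel (zero-padded edges)."""
    kernel = gaussian_kernel(sigma, radius)
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out

def roi_mean(signal, roi):
    """Average signal over the voxels of an ROI (what an ROI-wise model sees)."""
    return sum(signal[v] for v in roi) / len(roi)

# Hypothetical layout: voxels 0-9 are a neighbouring region, 10-19 are the ROI.
ROI = list(range(10, 20))

# Case 1: activation only in the neighbouring region, none inside the ROI.
neighbour_only = [0.0] * 20
for v in range(6, 10):
    neighbour_only[v] = 1.0
# After smoothing, the first ROI voxel carries signal it never had.
leak = smooth(neighbour_only)[ROI[0]]

# Case 2: a focal activation in 2 of the 10 ROI voxels, no smoothing.
focal = [0.0] * 20
focal[12] = focal[13] = 1.0
# Averaging over the whole ROI dilutes the focal response.
diluted = roi_mean(focal, ROI)

print(f"signal leaked into ROI edge voxel after smoothing: {leak:.3f}")
print(f"ROI mean of a focal (2/10 voxel) activation: {diluted:.3f}")
```

      In this toy setting, a voxel-wise map of the smoothed data would show apparent activity at the ROI edge, while the ROI average of a genuinely focal response is only a fraction of its peak, which is one way the two kinds of analysis can disagree.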

      We acknowledge that the results section may seem complex, and we apologize for any inconvenience this may cause.

      Reviewer #2:

      Summary: 

      The study reported by Trutti et al. uses high-field fMRI to test the hypothesized involvement of subcortical structure, particularly striatum, in WM updating. Specifically, participants were scanned while performing the Reference Back task (e.g., Rac-Lubashevsky and Kessler, 2016), which tests constructs like working memory gate opening and closing and substitution. While striatal activation was involved in substitution, it was not observed in gate opening. This observation is cited as a challenge to cortico-striatal models of WM gating, like PBWM (Frank and O'Reilly, 2005). 

      Strengths: 

      While there have been prior fMRI studies of the reference back task (Nir-Cohen et al., 2020), the present study overcomes limitations in prior work, particularly with regard to subcortical structures, by applying high-field imaging with a more precise definition of ROIs. And, the fMRI methods are careful and rigorous, overall. Thus, the empirical observations here are useful and will be of interest to specialists interested in working memory gating or the reference back task specifically. 

      Weaknesses: 

      I am less persuaded by the more provocative points regarding the challenge it presents to models like PBWM, made in several places by the paper. As detailed below, issues with conceptual clarity of the main constructs and their connection to models, like PBWM, along with some incomplete aspects of the results, make this stronger conclusion less compelling. 

      (1) The relationship of the Nir-Cohen et al. (2020) task analysis of the reference back task, with its contrasts like gate opening and closing, and the predictions of PBWM is far from clear to me for several reasons. 

      First, contrasts like gate opening and gate closing make strong finite state assumptions. As far as I know, this is not an assumption of PBWM, certainly not for gate opening. At a minimum, PBWM is default closed because of the tonic inhibition of cortico-thalamic dynamics by the globus pallidus. Indeed, this was even noted in the discussion of this paper, which seems to acknowledge this discrepancy, but then goes on to conclude that they have challenged the PBWM model anyway.  

      We thank the reviewer for this remark and agree that the reference-back task contrasts do not perfectly align with the predictions of the PBWM model. In the discussion section "No support for striatal gate opening," we note that our data support the PBWM model by emphasizing the central role of the basal ganglia in working memory processes. However, we acknowledge that it may not have been sufficiently clear in the manuscript that the way the reference-back task is operationalised does not allow for a precise test of the PBWM's gating predictions. To address this, we have revised the manuscript to shift focus away from framing it as a direct challenge to the PBWM model. Below, some edits are highlighted.

      ‘This contrasts with the findings of Nir-Cohen et al. (2020) and raises questions about the relationship between the gate opening process in the reference back task and the indirect striatal gating mechanism described in the PBWM model (Frank et al., 2001; Hazy et al., 2007; O’Reilly & Frank, 2006) and other neurocomputational theories (Hazy et al., 2007; Jongkees, 2020). According to these models, a dopaminergic signal in the striatum is required to trigger gating. Although the orthogonal contrasts in the reference-back task are intended to isolate working memory subprocesses inspired by models of working memory, the two gating contrasts do not fully capture the gating mechanism as originally proposed in neurocomputational models (Frank et al., 2001; Hazy et al., 2007; O’Reilly & Frank, 2006).’ (line 721-730)

      ‘Another explanation for the lack of enhanced striatal activity in gate opening challenges the conceptualization of the gating mechanism in the reference-back task, which does not accurately map onto the PBWM predictions.’ (line 746)

      ‘Moreover, despite the lack of striatal involvement during gate opening, our findings do not rule out the possibility that the PBWM model's predictions about striatal gating in working memory are correct, given the misalignment between the gate opening contrast and the PBWM’s proposal regarding striatal gating. It remains unclear whether the absence of striatal activation during gate opening trials is specific to low-demand tasks, like the reference-back task, which does not require as much gating compared to high working memory-demand tasks involving preparation for updating, or whether the gate opening contrast does not sufficiently capture the gating mechanism proposed by the PBWM. Further investigation is needed to determine whether (dopamine-driven) striatal gating occurs in high-demand working memory tasks, where the gating process plays a more critical role.’

      Second, as far as I know, PBWM emphasizes go/no-go processes around constructs of input- and output-gating, rather than state shifts between gate opening and closing. While this relationship is less clear in reference back, substituting task-relevant items into working memory does appear to be an example of input gating, as modeled by PBWM. Thus, it is not clear to me why the substitution contrast would not be more of a test of input gating than the gate opening contrast, which requires assumptions that are not clear are required by the model, as noted above. 

      We fully agree with the reviewer, which is why we proposed that neural mechanisms involving the midbrain and striatum are more likely to be observed in the substitution contrast rather than the gate opening contrast.

      Third, PBWM relies on striatal mechanisms to solve the problem of selective gating, inputting, or outputting items in memory while also holding on to others. Selective gating contrasts with global gating, in which everything in memory is gated or nothing. The reference back task is a test of global gating. It is an important distinction because non-striatal mechanisms that can solve global gating, cannot solve selective gating. Indeed, this limitation of non-striatal mechanisms was the rationale for PBWM adding striatum. The connectivity of the striatum with the cortex permits this selectivity. It is not clear that the reference back task tests these selective demands in the first place. That limitation in this task was the rationale behind the recent Rac-Lubashevsky and Frank (2022) paper using the reference back 2 procedure that modifies the original reference back for selective gating. 

      We thank the reviewer for highlighting this excellent reference. We believe it holds exciting potential for future high-field fMRI studies that explore the neural mechanisms underlying selective gating.

      So, if the primary contribution of the paper is to test PBWM, as suggested by the first line of the abstract, then it is not clear that the reference back task in general, or the gate opening contrast in particular, is the best test of these predictions. Other contrasts (substitution), or indeed, tasks (reference back 2) would have been better suited. 

      We agree with the reviewer that the gate opening contrast may not be the optimal test for the PBWM model predictions. However, previous studies have found evidence of striatal gate-opening mechanisms using the reference-back task, which cannot be overlooked. We hypothesized that striatal mechanisms are likely active only when working memory content requires replacement, as seen in the substitution contrast in line with the PBWM model. Additionally, the reference-back 2 task (Rac-Lubashevsky & Frank, 2021) had not yet been published when we began data collection. Exploring this task in future studies, particularly with a 7 T fMRI protocol optimized for subcortical regions, would be an exciting avenue for further investigation.

      Finally, in response to the reviewer’s remark, we have revised the abstract to remove the emphasis on challenging the PBWM model.

      (2) In general, observations of univariate activity in the striatum have been notoriously variable in the context of WM. Indeed, Chatham et al. (2014) who tested working memory output gating - notably in a direct test of the predictions of PBWM - noted this variability. They too did not observe univariate activation in the striatum associated with selective output gating. Rather they found evidence of increased connectivity between the striatum and cortex during selective output gating. They argued that one account of this difference is that striatal gating dynamics emerge from the balance between the firing of both Go and NoGo cell populations that decide whether to gate or not. It is not always clear how this balance should relate to univariate activation in the striatum. Thus, the present study might also test cortico-striatal connectivity, rather than relying exclusively on univariate activation, in their test of striatal involvement in these WM constructs. 

      We appreciate the reviewer’s insightful observation regarding the variability of univariate activity in the striatum, particularly in the context of working memory and the challenges noted by Chatham et al. (2014). We agree that striatal gating dynamics likely reflect a balance between Go and NoGo cell populations, which may not always manifest in univariate activation alone. In line with the reviewer’s suggestion, examining cortico-striatal connectivity could provide a more comprehensive understanding of striatal involvement in working memory processes, particularly selective gating.

      While our current study focused primarily on univariate activity, we recognize the importance of connectivity-based approaches and plan to incorporate functional connectivity analyses in future studies to further explore these dynamics. Such an approach, especially when combined with ultra-high-field fMRI, may offer valuable insights into the interaction between the striatum and cortex during working memory tasks.

      (3) It is concerning that there was no behavioral cost for comparison switch vs. repeat trials. This differs from prior observations from the reference back (e.g., Nir-Cohen et al., 2020), and in general, is odd given the task switch/cue interpretation component. This failure to observe a basic behavioral effect raises a concern about how participants approached this task and how that might differ from prior reports of the reference back. If they were taking an unusual strategy, it further complicates the interpretation of these results and the implications they hold for theory. 

      We understand the reviewer’s concern regarding the lack of behavioral response time costs for comparison switch versus repeat trials, which does indeed differ from previous findings in studies such as Nir-Cohen et al. (2020). It is possible that this results from our fMRI task design, such as increased inter-trial intervals compared to behavioral studies. While this is certainly a point of concern, we believe that the neural data still provide valuable insights into the mechanisms underlying working memory gating despite the absence of a clear behavioral effect.

      In future studies, we aim to increase the number of trials and more closely align our task design with previous studies to mitigate this issue. We agree that further investigation is necessary to ensure the robustness of these effects and their theoretical implications.

      In summary, the present observations are useful, particularly for those interested in the reference back task. For example, they might call into question verbal theories and task analyses of the reference back task that tie constructs like gate-opening to striatal mechanisms. However, given the ambiguities noted above, the broader implications for models like PBWM, or indeed, other models of working memory gating, are less clear.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Weaknesses (Reviewer 1):

      The role of Fgf signaling in gliogenesis and Foxg1 in neurogenesis is well known. It is not clear if Fgf18 is a direct target of Foxg1.

      We agree with the reviewer: Fgf signaling is an established pro-gliogenic pathway (Duong et al 2019) and Foxg1 overexpression is known to promote neurogenesis in cultured neural stem cells (Branacaccio et al 2019). Our study links these two mechanisms, as the Reviewer has summarized: (a) we demonstrate that FOXG1 works via modulating Fgf signaling cell-autonomously within progenitors by regulating the levels of Fgfr3. (b) Loss of Foxg1 in postmitotic neurons results in the upregulation of Fgf ligand expression (possibly via indirect mechanisms) and this non-cell autonomously increases Fgf signaling in progenitors. Our study is entirely performed in vivo.

      Revision: We have revised the manuscript to reflect that Fgf18 may be an indirect target of FOXG1 in postmitotic neurons.

      Weaknesses (Reviewer 2):

      It wasn't clear to me why the authors chose postnatal day 14 to examine the effects of Foxg1 deletion at E15 - this is a long time window, giving time for indirect consequences of Foxg1 deletion to influence development and thereby potentially complicating the interpretation of findings. For example, the authors show that there is no increased proliferation of astrocytes or death of neurons lacking Foxg1 shortly after cre-mediated deletion, but it remains formally possible (if perhaps unlikely) that these processes could be affected later during the time window. The rationale underlying the choice of this time point should be explained.

      I don't agree with the statement in the very last sentence of the results section that "neurogenesis is not possible in the absence of [Foxg1]" as there are multiple reports in the literature demonstrating the presence of neurons in Foxg1-/- mice (eg: Xuan et al., 1995; Hanashima et al., 2002, Martynoga et al., 2005, Muzio and Mallamaci 2005). Perhaps the statement refers specifically to late-born cortical neurons. This point also arises in the discussion section.

      Revisions:

      (a) We have revised the manuscript to explain why we chose postnatal day 14 to examine the effects of Foxg1 deletion at E15.

      ●  We have examined the transcriptomic dysregulation after Foxg1 deletion at E17.5, which is a reasonable period to identify potential direct targets. Furthermore, FOXG1 occupies the Fgfr3 locus in ChIP-seq performed at E15.5. Together, these support the interpretation that Fgfr3 is a direct target of Foxg1.

      ● As the Reviewer notes, we have investigated the possibility of increased proliferation of astrocytes and death of neurons and found no evidence suggesting these phenomena occur in the 3 days after loss of Foxg1. Cortical neurons are postmitotic and differentiated by E18.5, the stage at which we examined CC3 staining and found no difference in cell death in control and mutants (Supplementary Figure S2C, C’). The majority of progenitors (PAX6+ve cells) that lose Foxg1 at E15.5 express the gliogenic transcription factor NFIA by E18.5 (Figure 2C, C’), but hardly any express intermediate (neurogenic) progenitor marker TBR2 (Supplementary Figure S2B, B’). It is therefore unlikely that neurons are born from Foxg1 mutant progenitors and then die at a later stage.

      ● The cellular consequences of loss of Foxg1 require additional time to detect; e.g., it takes ~5 days for GFAP to be detected in astrocytes once they are born. The P14 timepoint permits the assessment of oligogenesis, which begins after astrogliogenesis, and therefore permits a comprehensive assessment of the lineage of E15.5 Foxg1 null progenitors.

      (b) Thank you for pointing out that the last sentence of the results section implied (incorrectly) that ALL neurogenesis is not possible in the absence of Foxg1. We have modified this (and the discussion) to reflect that this applies to E14/15 progenitors and late-born cortical neurons.

      Recommendations for the authors (Reviewer 2):

      (c) We thank the reviewer for this suggestion. We will modify the schematic (Figure 7) to remove any ambiguity regarding Foxg1 expression.

    1. Author response:

      The following is the authors’ response to the current reviews.

      These comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections that we hope will meet with approval.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Ma et al. describes a multi-model (pig, mouse, organoid) investigation into how fecal transplants protect against E. coli infection. The authors identify A. muciniphila and B. fragilis as two important strains and characterize how these organisms impact the epithelium by modulating host signaling pathways, namely the Wnt pathway in lgr5 intestinal stem cells.

      Strengths:

      The strengths of this manuscript include the use of multiple model systems and follow up mechanistic investigations to understand how A. muciniphila and B. fragilis interacted with the host to impact epithelial physiology.

      Weaknesses:

      As in previous revisions, there remains concerning ambiguity in the methodology used for microbiota sequence analysis and it would be difficult to replicate the analysis in any meaningful way. In this revision, concerns about the rigor and reproducibility of this component of the manuscript have been increased. Readers should be cautious with interpretation of this data.

      (1) In previous versions of the manuscript it would appear the correct bioproject accession was listed, but the actual link went to an unrelated project. The updated accession link appears to contain raw data; however, the authors state they used an Illumina HiSeq 2500. This would be an unusual choice for V3-V4 as it would not have read lengths long enough to overlap. Inspection of the first sample (SRR19164796) demonstrates that this is absolutely not the raw data, as there is a ~400 nt forward read, and a 0 length reverse read. All quality scores are set to 30. There is no logical way to go from HiSeq 2500 raw data and read lengths to what was uploaded to the SRA and it was certainly not described in the manuscript.

      What we uploaded to the SRA were the contig files for each sample; we have modified the description on line 694.

      (2) No multiple testing correction was applied to the microbiome data.

      The alpha diversity indices were tested using the t-test and the Wilcoxon test, and we show the result of the t-test in Figure S1B. The p-values were corrected for multiple testing using the Benjamini-Hochberg method; we have modified the description on line 322.
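
      For readers unfamiliar with the Benjamini-Hochberg procedure mentioned above, the adjustment can be sketched as follows. This is a minimal, generic illustration and not the authors' analysis script; the function name and the example p-values are hypothetical. Each p-value is multiplied by m/rank (m tests, rank in ascending order), and the results are made monotone by taking a running minimum from the largest p-value downward.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value never exceeds the adjusted value of any larger p-value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical p-values from four comparisons.
example = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(example))
```

      A comparison is then called significant at a false discovery rate of, say, 0.05 when its adjusted value is at or below 0.05.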

      ---------

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      Ma X. et al proposed that A. muciniphila was a key strain that promotes the proliferation and differentiation of intestinal stem cells through acting on the Wnt/β-catenin signaling pathway. They used various models, such as piglet model, mouse model and intestinal organoids to address how A. muciniphila and B. fragilis offer the protection against ETEC infection. They showed that FMT with fecal samples, A. muciniphila or B. fragilis protected piglets and/or mice from ETEC infection, and this protection is manifested as reduced intestinal inflammation/bacterial colonization, increased tight junction/Muc2 proteins, as well as proper Treg/Th17 cells. Additionally, they demonstrated that A. muciniphila protected basal-out and/or apical-out intestinal organoids against ETEC infection via Wnt signaling.

      Comments on revised version:

      Please add proper references to indicate the invasion of ETEC into organoids after 1 h of infection.

      We have added references on line 211.

      References:

      Xiao K, Yang Y, Zhang Y, Lv QQ, Huang FF, Wang D, Zhao JC, Liu YL. 2022. Long-chain PUFA ameliorate enterotoxigenic Escherichia coli-induced intestinal inflammation and cell injury by modulating pyroptosis and necroptosis signaling pathways in porcine intestinal epithelial cells. Br. J. Nutr. 128(5):835-850.

      Qian MQ, Zhou XC, Xu TT, Li M, Yang ZR, Han XY. 2023. Evaluation of Potential Probiotic Properties of Limosilactobacillus fermentum Derived from Piglet Feces and Influence on the Healthy and E. coli-Challenged Porcine Intestine. Microorganisms. 11(4).

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Ma et al. describes a multi-model (pig, mouse, organoid) investigation into how fecal transplants protect against E. coli infection. The authors identify A. muciniphila and B. fragilis as two important strains and characterize how these organisms impact the epithelium by modulating host signaling pathways, namely the Wnt pathway in lgr5 intestinal stem cells.

      Strengths:

      The strengths of this manuscript include the use of multiple model systems and follow up mechanistic investigations to understand how A. muciniphila and B. fragilis interacted with the host to impact epithelial physiology.

      Weaknesses:

      After an additional revision, the bioinformatics section of the methods has changed significantly from previous versions and now indicates a third sequencer was used instead: Ion S5 XL. Important parameters required to replicate analysis have still not been provided. Inspection of the SRA data indicates a mix of Illumina MiSeq and Illumina HiSeq 2500. It is now unclear which sequencing technology was used as authors have variably reported 4 different sequencers for these samples. Appropriate metadata was not provided in the SRA, although some groups may be inferred from sample names. These changing descriptions of the methodologies and ambiguity in making the data available create concerns about rigor of study and results.

      We apologize for the multiple incorrect modifications of the method description, which arose because we confused the sequencing method used in this experiment with that used for samples from other experiments. We have modified the method for microbiome sequencing technology on line 304. The sequencing technology is Illumina HiSeq 2500. The SRA metadata can be viewed at https://www.ncbi.nlm.nih.gov/sra/PRJNA837047. The sample names ep1-6 and ef1-6 correspond to the EP and EF groups, respectively.

      Recommendations For the Authors:

      As in the previous revision:

      -provide important parameters required to replicate analysis

      -ensure that reporting of sequencing technology is correct as data listed on SRA appears to be derived from Illumina sequencers, and was deposited indicating as such.

      -update SRA metadata such that experimental groups are clear and match the nomenclature used in the manuscript (particularly for samples which are labelled [A-Z][0-9])

      - The multiple testing correction wasn’t applied.

      -We apologize for the multiple incorrect modifications of the method description, which arose because we confused the sequencing method used in this experiment with that used for samples from other experiments. We have modified the method for microbiome sequencing technology on line 304. The sequencing technology is Illumina HiSeq 2500.

      - The SRA metadata can be viewed at https://www.ncbi.nlm.nih.gov/sra/PRJNA837047. The sample names ep1-6 and ef1-6 correspond to the EP and EF groups, respectively.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate the effects of aging on auditory system performance in understanding temporal fine structure (TFS), using both behavioral assessments and physiological recordings from the auditory periphery, specifically at the level of the auditory nerve. This dual approach aims to enhance understanding of the mechanisms underlying observed behavioral outcomes. The results indicate that aged animals exhibit deficits in behavioral tasks for distinguishing between harmonic and inharmonic sounds, which is a standard test for TFS coding. However, neural responses at the auditory nerve level do not show significant differences when compared to those in young, normal-hearing animals. The authors suggest that these behavioral deficits in aged animals are likely attributable to dysfunctions in the central auditory system, potentially as a consequence of aging. To further investigate this hypothesis, the study includes an animal group with selective synaptic loss between inner hair cells and auditory nerve fibers, a condition known as cochlear synaptopathy (CS). CS is a pathology associated with aging and is thought to be an early indicator of hearing impairment. Interestingly, animals with selective CS showed physiological and behavioral TFS coding similar to that of the young normal-hearing group, contrasting with the aged group's deficits. Despite histological evidence of significant synaptic loss in the CS group, the study concludes that CS does not appear to affect TFS coding, either behaviorally or physiologically.

      We agree with the reviewer’s summary.

      Strengths:

      This study addresses a critical health concern, enhancing our understanding of mechanisms underlying age-related difficulties in speech intelligibility, even when audiometric thresholds are within normal limits. A major strength of this work is the comprehensive approach, integrating behavioral assessments, auditory nerve (AN) physiology, and histology within the same animal subjects. This approach enhances understanding of the mechanisms underlying the behavioral outcomes and provides confidence in the actual occurrence of synapse loss and its effects. The study carefully manages controlled conditions by including five distinct groups: young normal-hearing animals, aged animals, animals with CS induced through low and high doses, and a sham surgery group. This careful setup strengthens the study's reliability and allows for meaningful comparisons across conditions. Overall, the manuscript is well-structured, with clear and accessible writing that facilitates comprehension of complex concepts.

      Weaknesses:

      The stimulus and task employed in this study are very helpful for behavioral research, and using the same stimulus setup for physiology is advantageous for mechanistic comparisons. However, I have some concerns about the limitations in auditory nerve (AN) physiology. Due to practical constraints, it is not feasible to record from a large enough population of fibers that covers a full range of best frequencies (BFs) and spontaneous rates (SRs) within each animal. This raises questions about how representative the physiological data are for understanding the mechanism in behavioral data. I am curious about the authors' interpretation of how this stimulus setup might influence results compared to methods used by Kale and Heinz (2010), who adjusted harmonic frequencies based on the characteristic frequency (CF) of recorded units. In contrast, the harmonic frequencies in this study are fixed across all CFs, meaning that many AN fibers may not be tuned closely to the stimulus frequencies.

      We chose the stimuli for the AN recordings to be identical to the stimuli used in the behavioral evaluation of the perceptual sensitivity. Only with this approach can we directly compare the response of the population of AN fibres with perception measured in behaviour. We will address this more clearly in the revision.

      If units are not responsive to the stimulus, further clarification on detecting mistuning and phase locking to TFS effects within this setup would be valuable.

      It is unclear to us what the reviewer alludes to. We ask the reviewer to rephrase the question.

      Given the limited number of units per condition (sometimes as few as three for certain conditions), I wonder if CF-dependent variability might impact the results of the AN data in this study; discussing this factor would help with understanding the results. While the use of the same stimuli for both behavioral and physiological recordings is understandable, a discussion on how this choice affects interpretation would be beneficial. In addition, a 60 dB stimulus could saturate high spontaneous rate (HSR) AN fibers, influencing neural coding and phase-locking to TFS. Separating SR groups could help address these issues and improve interpretive clarity.

      In the discussion of a revised version of the manuscript, we will point out the pros and cons of using fixed-level stimuli that were not adjusted in frequency to the BF.

      A deeper discussion on the role of fiber spontaneous rate could also enhance the study. How might considering SR groups affect AN results related to TFS coding? While some statistical measures are included in the supplement, a more detailed discussion in the main text could help in interpretation.

      We do not think that it will be necessary to conduct any statistical analysis in addition to that already reported in the supplement.

      We will consider moving some supplementary information back into the main manuscript when revising.

      Although Figure S2 indicates no change in median SR, the high-dose treatment group lacks LSR fibers, suggesting a different distribution based on SR for different animal groups, as seen in similar studies on other species. A histogram of these results would be informative, as LSR fiber loss with CS, whether induced by ouabain in gerbils or noise in other animals, is well documented (e.g., Furman et al., 2013).

      We will add information on the distribution when revising.

      Although ouabain effects on gerbils have been explored in previous studies, since these data already seem to have been recorded for the animals in this study, a brief description of changes in auditory brainstem response (ABR) thresholds, wave 1 amplitudes, and tuning curves for animals with cochlear synaptopathy (CS) in this study would be beneficial. This would confirm that ouabain selectively affects synapses without impacting outer hair cells (OHCs). For aged animals, since ABR measurements were taken, comparing hearing differences between normal and aged groups could provide insights into the pathologies besides CS in aged animals. Additionally, examining subject variability in treatment effects on hearing and how this correlates with behavior and physiology would yield valuable insights. If space is limited, a brief clarification or inclusion in the supplementary material may be sufficient.

      We do indeed have data on ABR amplitudes and the wave 1 growth functions but only in response to broadband clicks. For more frequency-specific information, mass-potential recordings are available, obtained before and after ouabain treatment. Regarding neural tuning, we did not obtain full frequency-threshold curves but do have bandwidths for response curves recorded close to threshold. We are in the process of analyzing all these data further and will consider how to best incorporate them into the manuscript, to address the reviewer’s concerns.

      Another suggestion is to discuss the potential role of the MOC efferent system and the effect of anesthesia in reducing efferent effects in AN recordings. This is particularly relevant for aged animals, as CS might affect LSR fibers, potentially disrupting the medial olivocochlear (MOC) efferent pathway. Anesthesia could lessen MOC activity in both young and aged animals, potentially masking efferent effects that might be present in behavioral tasks. Young gerbils with functional efferent systems might perform better behaviorally, while aged gerbils with impaired MOC function due to CS might lack this advantage. A brief discussion on this aspect could potentially enhance mechanistic insights.

      Our provisional response below will be integrated in similar form into the Discussion.

      Olivocochlear efferent activity is a potential modulator of OHC gain (by medial olivocochlear neurons, MOC) and afferent activity (by lateral olivocochlear neurons, LOC). Beyond this general observation it is, however, difficult to speculate about its specific role in the TFS1 test, as almost nothing is known about efferent activity under naturalistic conditions in a behaving animal (reviewed by Lauer et al., 2022). We note, however, that efferent activity is believed to be reduced under general anesthesia (reviewed by Guinan, 2011, DOI 10.1007/978-1-4419-7070-1_3) and possibly abnormal in other ways, considering the potential top-down inputs to the efferent neurons from extensive brain networks (reviewed by Schofield, 2011, DOI 10.1007/978-1-4419-7070-1_9; Romero and Trussell, 2022, DOI: 10.1016/j.heares.2022.108516). Thus, it is reasonable to assume a reduced efferent influence in our auditory-nerve data, compared to the behavioral test situation. In contrast, we assume more comparable efferent influences in young-adult and old gerbils. It was recently shown that, despite age-related losses in both MOC and LOC cochlear innervation, this basically reflected the loss of efferent target structures (OHC and type-I afferents), with the surviving cochlear circuitry remaining largely normal (Steenken et al., 2024, DOI: 10.3389/fnsyn.2024.1422330). The main difference was an increased proportion of OHC without any efferent innervation, predominantly in low-frequency cochlear regions (Steenken et al., 2024). Such OHC are thus not under efferent control, and they are more numerous (about 10 – 30%) in old gerbils.

      Lastly, although synapse counts did not differ between the low-dose treatment and NH I sham groups, separating these groups rather than combining them with the sham might reveal differences in behavior or AN results, particularly regarding the significance of differences between aged/treatment groups and the young normal-hearing group.

      For maximizing statistical power, we combined those groups in the statistical analysis. These two groups did not differ in synapse number and had quite similar ABR wave 1 growth functions.

      Reviewer #2 (Public review):

      Summary:

      Using a gerbil model, the authors tested the hypothesis that loss of synapses between sensory hair cells and auditory nerve fibers (which may occur due to noise exposure or aging) affects behavioral discrimination of the rapid temporal fluctuations of sounds. In contrast to previous suggestions in the literature, their results do not support this hypothesis; young animals treated with a compound that reduces the number of synapses did not show impaired discrimination compared to controls. Additionally, their results from older animals showing impaired discrimination suggest that age-related changes aside from synaptopathy are responsible for the age-related decline in discrimination.

      We agree with the reviewer’s summary.

      Strengths:

      (1) The rationale and hypothesis are well-motivated and clearly presented.

      (2) The study was well conducted with strong methodology for the most part, and good experimental control. The combination of physiological and behavioral techniques is powerful and informative. Reducing synapse counts fairly directly using ouabain is a cleaner design than using noise exposure or age (as in other studies), since these latter modifiers have additional effects on auditory function.

      (3) The study may have a considerable impact on the field. The findings could have important implications for our understanding of cochlear synaptopathy, one of the most highly researched and potentially impactful developments in hearing science in the past fifteen years.

      Weaknesses:

      (1) My main concern is that the stimuli may not have been appropriate for assessing neural temporal coding behaviorally. Human studies using the same task employed a filter center frequency that was (at least) 11 times the fundamental frequency (Marmel et al., 2015; Moore and Sek, 2009). Moore and Sek wrote: "the default (recommended) value of the centre frequency is 11F0." Here, the center frequency was only 4 or 8 times the fundamental frequency (4F0 or 8F0). Hence, relative to harmonic frequency, the harmonic spacing was considerably greater in the present study. By my calculations, the masking noise used in the present study was also considerably lower in level relative to the harmonic complex than that used in the human studies. These factors may have allowed the animals to perform the task using cues based on the pattern of activity across the neural array (excitation pattern cues), rather than cues related to temporal neural coding. The authors show that mean neural driven rate did not change with frequency shift, but I don't understand the relevance of this. It is the change in response of individual fibers with characteristic frequencies near the lowest audible harmonic that is important here.

      The auditory filter bandwidth of the gerbil is about double that of human subjects. Because of this, the masking noise has a larger overall level within the filter than in the human studies. This precludes the gerbils from using excitation patterns, especially in the condition with a center frequency of 1600 Hz and a fundamental of 200 Hz and in the condition with a center frequency of 3200 Hz and a fundamental of 400 Hz.

      The case against excitation pattern cues needs to be better made in the Discussion. It could be that gerbil frequency selectivity is broad enough for this not to be an issue, but more detail needs to be provided to make this argument. The authors should consider what is the lowest audible harmonic in each case for their stimuli, given the level of each harmonic and the level of the pink noise. Even for the 8F0 center frequency, the lowest audible harmonic may be as low as the 4th (possibly even the 3rd). In humans, harmonics are thought to be resolvable by the cochlea up to at least the 8th.

      Because of the gerbil’s broader auditory filters, harmonics are not resolved, with the exception of the condition with a center frequency of 1600 Hz and a fundamental of 400 Hz. We will expand the topic of potential excitation pattern cues in the discussion of the revised version and add results on modeled excitation patterns to the supplement.
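      The harmonic ranks and the resolvability argument in this exchange can be checked with some rough arithmetic. The sketch below is only a back-of-the-envelope illustration, not the authors' excitation-pattern model. It assumes the human equivalent rectangular bandwidth formula of Glasberg and Moore (1990), takes "about double" literally as a factor of 2, and uses harmonic spacing relative to filter bandwidth as a crude proxy for resolvability. All function and constant names are hypothetical.

```python
def harmonic_rank(center_hz: float, f0_hz: float) -> float:
    """Rank of the harmonic at the passband center (center frequency / F0)."""
    return center_hz / f0_hz


def human_erb_hz(freq_hz: float) -> float:
    """Human equivalent rectangular bandwidth in Hz (Glasberg and Moore, 1990)."""
    return 24.7 * (4.37 * freq_hz / 1000.0 + 1.0)


# ASSUMPTION: "about double" the human bandwidth is taken literally as 2.0.
GERBIL_TO_HUMAN_BANDWIDTH = 2.0

# (F0, center frequency) pairs in Hz named in the comments and responses.
CONDITIONS = [(200.0, 1600.0), (400.0, 1600.0), (400.0, 3200.0)]


def spacing_per_filter(f0_hz: float, center_hz: float,
                       scale: float = GERBIL_TO_HUMAN_BANDWIDTH) -> float:
    """Harmonic spacing (equal to F0) in units of the assumed filter bandwidth."""
    return f0_hz / (scale * human_erb_hz(center_hz))


for f0, fc in CONDITIONS:
    print(f"F0={f0:.0f} Hz, center={fc:.0f} Hz: "
          f"rank={harmonic_rank(fc, f0):.0f}, "
          f"spacing/bandwidth={spacing_per_filter(f0, fc):.2f}")
```

      Under these assumptions, only the 400/1600 condition has harmonics spaced by roughly one filter bandwidth, while the other two conditions fall near half a bandwidth. The center-harmonic ranks of 4 and 8 compare with the default of 11 quoted from Moore and Sek.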

      (2) The synapse reductions in the high ouabain and old groups were relatively small (mean of 19 synapses per hair cell compared to 23 in the young untreated group). In contrast, in some mouse models of the effects of noise exposure or age, a 50% reduction in synapses is observed, and in the human temporal bone study of Wu et al. (2021, https://doi.org/10.1523/JNEUROSCI.3238-20.2021) the age-related reduction in auditory nerve fibres was ~50% or greater for the highest age group across cochlear location. It could be simply that the synapse loss in the present study was too small to produce significant behavioral effects. Hence, although the authors provide evidence that in the gerbil model the age-related behavioral effects are not due to synaptopathy, this may not translate to other species (including human). This should be discussed in the manuscript.

      Our provisional response below will be integrated in similar form into the Discussion.

      The observed extent of age-related or noise-induced loss of type-I afferent synapses on IHC varies widely between species and studies. For example, in ageing CBA/CaJ mice, mean losses of between 20 and 50% of afferent synapses (depending on cochlear location and precise age) were reported (Sergeyenko et al., 2013, DOI: 10.1523/JNEUROSCI.1783-13.2013; Kobrina et al., 2020, DOI: 10.1016/j.neurobiolaging.2020.08.012). Humans showed more pronounced losses of peripheral axons, of 40–100%, again depending on cochlear location, precise age, and noise history (Wu et al., 2019, DOI: 10.1016/j.neuroscience.2018.07.053; 2021, DOI: 10.1523/JNEUROSCI.3238-20.2021). The age-related and induced synapse losses in our gerbils were in a more moderate range, around 20% (Steenken et al., 2021, DOI: 10.1016/j.neurobiolaging.2021.08.019; this study). Thus, it is possible that a more severe, induced synaptopathy would have resulted in behavioral deficits in young-adult gerbils. However, in the absence of additional noise or pharmacologically induced damage, our study provides strong evidence for other factors causing temporal processing problems with advancing age. Our 3-year-old gerbils are approximately comparable to a 60-year-old human (Castano-Gonzalez et al., 2024, DOI: 10.1016/j.heares.2024.108989) with beginning but not yet clinically relevant hearing loss (Hamann et al., 2002, DOI: 10.1016/S0378-5955(02)00454-9).
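      The "around 20%" figure can be related to the mean counts quoted in the reviewer's comment (19 versus 23 synapses per hair cell). The following is a minimal arithmetic sketch; the function name is illustrative and the two counts are the only inputs taken from the text.

```python
def percent_loss(control: float, treated: float) -> float:
    """Relative loss of `treated` with respect to `control`, in percent."""
    return 100.0 * (control - treated) / control


# Mean synapses per hair cell as quoted in the reviewer's comment.
loss = percent_loss(control=23.0, treated=19.0)
print(f"{loss:.1f}% fewer synapses per hair cell")
```

      This gives roughly 17%, in line with "around 20%" and well below the 50% reduction mentioned for some mouse models.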

      It would be informative to provide synapse counts separately for the animals who were tested behaviorally, to confirm that the pattern of loss across the group was the same as for the larger sample.

      Yes, the pattern was the same for the subgroup of behaviorally tested animals. We will add this information to the revised version of the manuscript.

      (3) The study was not pre-registered, and there was no a priori power calculation, so there is less confidence in replicability than could have been the case. Only three old animals were used in the behavioral study, which raises concerns about the reliability of comparisons involving this group.

      The results for the three old subjects differed significantly from those of young subjects and young ouabain-treated subjects. This indicates sufficient statistical power, since otherwise no significant differences would be observed.

      Reviewer #3 (Public review):

      This study is a part of the ongoing series of rigorous work from this group exploring neural coding deficits in the auditory nerve, and dissociating the effects of cochlear synaptopathy from other age-related deficits. They have previously shown no evidence of phase-locking deficits in the remaining auditory nerve fibers in quiet-aged gerbils. Here, they study the effects of aging on the perception and neural coding of temporal fine structure cues in the same Mongolian gerbil model.

      They measure TFS coding in the auditory nerve using the TFS1 task which uses a combination of harmonic and tone-shifted inharmonic tones which differ primarily in their TFS cues (and not the envelope). They then follow this up with a behavioral paradigm using the TFS1 task in these gerbils. They test young normal hearing gerbils, aged gerbils, and young gerbils with cochlear synaptopathy induced using the neurotoxin ouabain to mimic synapse losses seen with age. In the behavioral paradigm, they find that aging is associated with decreased performance compared to the young gerbils, whereas young gerbils with similar levels of synapse loss do not show these deficits. When looking at the auditory nerve responses, they find no differences in neural coding of TFS cues across any of the groups.

      However, aged gerbils show an increase in the representation of periodicity envelope cues (around f0) compared to young gerbils or those with induced synapse loss. The authors hence conclude that synapse loss by itself doesn't seem to be important for distinguishing TFS cues, and that the behavioral deficits with age are instead likely related to the misrepresented envelope cues.

      We agree with the reviewer’s summary.

      The manuscript is well written, and the data presented are robust. Some of the points below will need to be considered while interpreting the results of the study, in its current form. These considerations are addressable if deemed necessary, with some additional analysis in future versions of the manuscript.

      Spontaneous rates - Figure S2 shows no differences in median spontaneous rates across groups. But taking the median glosses over some of the nuances there. Ouabain (in the Bourien study) famously affects low spont rates first, and to a greater degree than medium or high spont rates. It seems to be the case (qualitatively) in Figure S2 as well, with almost no units in the low spont region in the ouabain group, compared to the other groups. Looking at distributions within each spont rate category and comparing differences across the groups might reveal some of the underlying causes for these changes. Given that overall, the study reports that low-SR fibers had a higher ENV/TFS log-z-ratio, the distribution of these fibers across groups may reveal specific effects of TFS coding by group.

      As the reviewer points out, our sample from the group treated with a high concentration of ouabain showed very few low-spontaneous-rate auditory-nerve fibers, as expected from previous work. However, this was also true, e.g., for our sample from sham-operated animals, and may thus well reflect a sampling bias. We are therefore reluctant to attach much significance to these data distributions. We will consider moving some supplementary information back into the main manuscript when revising.

      Threshold shifts - It is unclear from the current version if the older gerbils have changes in hearing thresholds, and whether those changes may be affecting behavioral thresholds. The behavioral stimuli appear to have been presented at a fixed sound level for both young and aged gerbils, similar to the single unit recordings. Hence, age-related differences in behavior may have been due to changes in relative sensation level. Approaches such as using hearing thresholds as covariates in the analysis will help explore if older gerbils still show behavioral deficits.

      Unfortunately, we did not obtain behavioral thresholds that could be used here. The ABR thresholds, although not directly comparable to behavioral thresholds, suggest that our old animals had at most a moderate threshold increase in quiet. Furthermore, we want to point out that the TFS1 stimuli had an overall level of 68 dB SPL, and the pink noise masker would have increased the threshold more than expected from the moderate, age-related hearing loss in quiet. Thus, the masked thresholds for all gerbil groups are likely similar and should have no effect on the behavioral results.

      Task learning in aged gerbils - It is unclear if the aged gerbils really learn the task well in two of the three TFS1 test conditions. The d' of 1, which is usually used as the criterion for learning, was not reached by the aged gerbils, even for the easiest stimuli, in all but one condition (Fig. 5H), and in that condition there do not seem to be any age-related deficits in behavioral performance (Fig. 6B). Hence dissociating the inability to learn the task from the inability to perceive TFS1 cues in those animals becomes challenging.

      Even in the group of gerbils with the lowest sensitivity, the animals achieved an average d’ above 1 for the condition 400/1600. Furthermore, stimuli were well above threshold and audible, even when no discrimination could be observed. Finally, as explained in the methods, different stimulus conditions were interleaved in each session, providing stimuli that were easy to discriminate together with those that were difficult to discriminate. This approach ensures that the gerbils were under stimulus control, meaning properly trained to perform the task. Thus, an inability to discriminate does not indicate a lack of proper training.
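      For readers less familiar with the d’ criterion discussed here, the sketch below shows the standard signal-detection definition, d’ = z(hit rate) - z(false-alarm rate). It is illustrative only: the response rates are hypothetical, and the exact procedure used in the manuscript (including any correction for extreme rates) is not described in this exchange.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index d' = z(H) - z(FA) for a yes/no detection task.

    Both rates must lie strictly between 0 and 1; inv_cdf raises
    StatisticsError otherwise.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# HYPOTHETICAL rates, chosen only to land near the d' = 1 criterion.
print(f"d' = {d_prime(hit_rate=0.69, false_alarm_rate=0.31):.2f}")
```

      With symmetric rates, a hit rate near 69% paired with a false-alarm rate near 31% corresponds to a d’ of about 1, which gives a sense of how modest the learning criterion is.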

      Increased representation of periodicity envelope in the AN - the mechanisms for increased representation of periodicity envelope cues are unclear. The authors point to some potential central mechanisms, but given that these are recordings from the auditory nerve, what central mechanisms these may be is unclear. If the authors are suggesting some form of efferent modulation only at the f0 frequency, no evidence for this is presented. It appears more likely that the enhancement may be due to outer hair cell dysfunction (widened tuning, distorted tonotopy). Given this increased envelope coding, the potential change in sensation level for the behavior (from the comment above), and no change in neural coding of TFS cues across any of the groups, a simpler interpretation may be that TFS coding is not affected in remaining auditory nerve fibers after age-related or ouabain-induced synapse loss, but behavioral performance is affected by outer hair cell dysfunction with age.

      A similar point is made by Reviewer #1. As indicated above, we do have limited data on neural bandwidths and will explore if these are sufficient to address the reviewers’ questions about potential, age-related changes in neural tuning in our sample. Previous work found no substantial OHC losses (Tarnowski et al., 1991, DOI: 10.1016/0378-5955(91)90142-V; Adams and Schulte, 1997, DOI: 10.1016/S0378-5955(96)00184-0; Steenken et al., 2024, DOI: 10.3389/fnsyn.2024.1422330) nor any deterioration in neural frequency tuning (Heeringa et al., 2020, DOI: 10.1523/JNEUROSCI.2784-18.2019), in quiet-aged gerbils of similar age as the ones used here.

      Emerging evidence seems to suggest that cochlear synaptopathy and/or TFS encoding abilities might be reflected in listening effort rather than behavioral performance. Measuring some proxy of listening effort in these gerbils (like reaction time) to see if that has changed with synapse loss, especially in the young animals with induced synaptopathy, would make an interesting addition to explore perceptual deficits of TFS coding with synapse loss.

      This is an interesting suggestion that we will explore in the revision of the manuscript. Reaction times were recorded for responses and can be used as a proxy for listening effort.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      Summary:

      This computational modeling study addresses the observation that variable observations are interpreted differently depending on how much uncertainty an agent expects from its environment. That is, the same mismatch between a stimulus and an expected stimulus would be less significant, and specifically would represent a smaller prediction error, in an environment with a high degree of variability than in one where observations have historically been similar to each other. The authors show that if two different classes of inhibitory interneurons, the PV and SST cells, (1) encode different aspects of a stimulus distribution and (2) act in different (divisive vs. subtractive) ways, and if (3) synaptic weights evolve in a way that causes the impact of certain inputs to balance the firing rates of the targets of those inputs, then pyramidal neurons in layer 2/3 of canonical cortical circuits can indeed encode uncertainty-modulated prediction errors. To achieve this result, SST neurons learn to represent the mean of a stimulus distribution and PV neurons its variance.

      The impact of uncertainty on prediction errors is an understudied topic, and this study provides an intriguing and elegant new framework for how this impact could be achieved and what effects it could produce. The ideas here differ from past proposals about how neuronal firing represents uncertainty. The developed theory is accompanied by several predictions for future experimental testing, including the existence of different forms of coding by different subclasses of PV interneurons, which target different sets of SST interneurons (as well as pyramidal cells). The authors are able to point to some experimental observations that are at least consistent with their computational results. The simulations shown demonstrate that if we accept its assumptions, then the authors’ theory works very well: SSTs learn to represent the mean of a stimulus distribution, PVs learn to estimate its variance, firing rates of other model neurons scale as they should, and the level of uncertainty automatically tunes the learning rate, so that variable observations are less impactful in a high uncertainty setting.
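      The core idea summarized above, that the same mismatch yields a smaller prediction error when the environment is more variable, can be illustrated with a toy calculation. The sketch below is not the authors' circuit model: it replaces the SST and PV populations with simple running estimates of the mean and variance, and it assumes that the uncertainty-modulated error is the mismatch divided by the variance estimate. All names and parameter values are hypothetical.

```python
import random


def learn_mean_and_variance(stimuli, eta=0.02):
    """Delta-rule estimates of the stimulus mean (SST-like role) and
    variance (PV-like role). ASSUMPTION: a toy stand-in for the circuit."""
    mean, variance = 0.0, 1.0
    for s in stimuli:
        error = s - mean
        mean += eta * error                        # track the mean
        variance += eta * (error ** 2 - variance)  # track the squared error
    return mean, variance


def uncertainty_modulated_error(stimulus, mean, variance):
    """Mismatch scaled divisively by the estimated variance."""
    return (stimulus - mean) / variance


rng = random.Random(0)  # fixed seed so the run is reproducible
low_noise = [rng.gauss(5.0, 0.5) for _ in range(5000)]
high_noise = [rng.gauss(5.0, 2.0) for _ in range(5000)]

mean_low, var_low = learn_mean_and_variance(low_noise)
mean_high, var_high = learn_mean_and_variance(high_noise)

probe = 7.0  # the same unexpected stimulus presented in both environments
upe_low = uncertainty_modulated_error(probe, mean_low, var_low)
upe_high = uncertainty_modulated_error(probe, mean_high, var_high)
print(f"low uncertainty:  variance={var_low:.2f}, error signal={upe_low:.2f}")
print(f"high uncertainty: variance={var_high:.2f}, error signal={upe_high:.2f}")
```

      In this toy setting the probe stimulus produces a much larger error signal after training in the low-variance environment than after training in the high-variance one, which is the qualitative behavior the review describes.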

      Strengths:

      The ideas in this work are novel and elegant, and they are instantiated in a progression of simulations that demonstrate the behavior of the circuit. The framework used by the authors is biologically plausible and matches some known biological data. The results attained, as well as the assumptions that go into the theory, provide several predictions for future experimental testing. The authors have taken into account earlier review comments to revise their paper in ways that enhance its clarity.

      Weaknesses:

      One weakness could be that the proposed theory does rely on a fairly large number of assumptions. However, there is at least some biological support for these. Importantly, the authors do lay out and discuss their key assumptions in the Discussion section, so readers can assess their validity and implications for themselves.

      Thank you very much, we are very satisfied with this public review.

      Reviewer #4 (Public Review):

      Summary:

      Wilmes and colleagues develop a model for the computation of uncertainty modulated prediction errors based on an experimentally inspired cortical circuit model for predictive processing. Predictive processing is a promising theory of cortical function. An essential aspect of the model is the idea of precision weighting of prediction errors. There is ample experimental evidence for prediction error responses in cortex. However, a central prediction of the theory is that these prediction error responses are regulated by the uncertainty of the input. Testing this idea experimentally has been difficult due to a lack of concrete models. This work provides one such model and makes experimentally testable predictions.

      Strengths:

      The model proposed is novel and well-implemented. It has sufficient biological accuracy to make useful and testable predictions.

      Weaknesses:

      One key idea the model hinges on is that stimulus uncertainty is encoded in the firing rate of parvalbumin positive interneurons. This assumption, however, is rather speculative and there is no direct evidence for this.

      Thank you very much for this nice description. With regard to the weakness: it is true that the key idea hinges on uncertainty being encoded in the firing of inhibitory neurons. If it turns out that these inhibitory neurons are not PV neurons, however, the theory does not break down. The suggestion of PV neurons is fueled by the observation that PV neurons implement shunting and hence divisive inhibition and by the connectivity of PVs in the circuit. We discuss this in the discussion section: "To provide experimental predictions that are immediately testable, we suggested specific roles for SSTs and PVs, as they can subtractively and divisively modulate pyramidal cell activity, respectively. In principle, our theory more generally posits that any subtractive or divisive inhibition could implement the suggested computations. With the emerging data on inhibitory cell types, subtypes of SSTs and PVs or other cell types may turn out to play the proposed role."

      Recommendations for the authors:

      Reviewer #4 (Recommendations For The Authors):

      (1) Line numbers would simplify reviewing.

      We will add line numbers to our next submission.

      (2) The existence of positive and negative PE was already suggested by Rao & Ballard.

      We added the citation to the sentence "Because baseline firing rates are low in layer 2/3 pyramidal cells () positive and negative prediction errors were suggested to be represented by distinct neuronal populations [44,66],[...]" in the section "Computation of UPEs in cortical microcircuits".

      (3) wekk should probably read well.

      Indeed, thank you. We fixed it.

      (4) Figure 4. legends A-C are mixed up. What are the two values of |s-u| in F and I - the same as in D and F?

      Thank you, we fixed this.

      (5) "representation neurons, the activity of which reflects the internal model". For consistency with the original definitions this should read "the activity of which reflects the internal representation". The internal "model" is the synaptic weights (or transformation between areas) - the activity of representation neurons (as the name implies) is the internal "representation".

      Thank you, we changed it.

      (6) "Mice trained in a predictable environment [...] [4]." This should read "reared" in an unpredictable environment, etc. Relatedly, the problem with this argument is that the referenced paper argues that the mice never learned to predict and that the reduced PE responses are a consequence of a reduction in prediction strength (these mice never - in life - had experience of visuomotor coupling). Better evidence might be the acute changes observed in normal mice (see e.g. Figure 3B in https://pubmed.ncbi.nlm.nih.gov/22681686/). However, another finding from the paper referenced is that in mice reared without visuomotor coupling, MM responses of SST interneurons are unchanged, while those in PV interneurons are completely absent. Would the authors' model come to similar results if trained in an environment with (very) high uncertainty and then tested in a low uncertainty environment?

      Thank you for pointing us to Figure 3B of Keller et al. 2012. We are now citing this result as it is indeed better evidence.

      Thank you very much for your illuminating question and for pointing out that a mouse that never experienced a predictable visual flow may not have formed a model of the visual flow, and hence may not have any prediction about its visual experience. We haven’t considered this scenario in our paper before. So far, we only considered scenarios in which it is possible to learn a prediction, i.e. to infer the mean from the sensory input. We now consider this other scenario, in which a mouse reared in an unpredictable environment did not form a prediction, compare SST (1) and PV (2) activity in this mouse to that in a mouse that learned to form a prediction, and have added this comparison to the section "Predictions for different cell types":

      "Second, prediction error activity seems to decrease in less predictable, and hence more uncertain, contexts: in mice reared in a predictable environment [where locomotion and visual flow match, 42], error neuron responses to mismatches in locomotion and visual flow decreased with each day of experiencing these unpredictable mismatches. Third, the responses of SSTs and PVs to mismatches between locomotion and visual flow [4] are in line with our model (note that in this experiment the mismatches are negative prediction errors as visual flow was halted despite ongoing locomotion): In this study, SST responses decreased during mismatch, i.e. when the visual flow was halted, and there was no difference between mice reared in a predictable or unpredictable environment. In line with these observations, the authors concluded that SST responses reflected the actual visual input. In our model negative PE circuit, SSTs also reflect the actual stimulus input, which in our case was a whisker stimulus (SST rates in Fig. 6C and I reflect the stimuli (black and grey bar) in A and G, respectively) and SST rates are the same for high and low uncertainty (corresponding to mice reared in a predictable or unpredictable environment). In the same study, PV responses were absent towards mismatches in animals reared in an unpredictable environment [4]. The authors argued that mice reared in an unpredictable environment did not learn to form a prediction. In our model, the missing prediction corresponds to missing predictive input from the auditory domain (e.g. due to undeveloped synapses from the predictive auditory input). If we removed the predictive input in our model, PVs in the negative PE circuit would also be silent as they would not receive any of the excitatory predictive inputs."

      (7) "Our model further posits the existence of two distinct subtypes of SSTs in positive and negative error circuits." There is some evidence for this: Figure 5a in https://pubmed.ncbi.nlm.nih.gov/36747710/

      Thank you, we added this citation to the corresponding section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The focus of this manuscript was to investigate the role of Cldn9 in the development of the mammalian cochlea. The main rationale of the study is the fact that cochlear hair cells do not regenerate, so when damaged they are lost forever, causing irreparable hearing loss. The authors have attempted to address this problem by inducing the ectopic production of additional hair cells and testing whether they acquire the morphological and functional characteristics of native hair cells. They show that downregulation of Cldn9 using a well-established genetic manipulation of transgenic mice led to the production of extra numerary inner hair cells, which were able to survive for several months. By performing a large battery of experiments, the authors were able to determine that the native and ectopic inner hair cells have comparable morphological and physiological characteristics. There are several conclusions highlighted by the authors in different parts of the manuscript, including the key role of Cldn9 in coordinating embryonic and postnatal development, the differentiation of supporting cells into inner hair cells, and the possible use of Cldn9 to induce inner hair cell differentiation following deafness induced by hair cell loss.

      Strengths:

      Several of the conclusions in this study are well supported by the experimental work.

      Weaknesses:

      Some aspects of the data and its interpretation need better explanation and require further investigation.

      (1) The Results section is the most difficult part to read and understand. It contains a very limited, and in some places confusing and repetitive, description of the data. Statistical analysis is missing for some of the key data (e.g., ABRs), and in some places the text contradicts the data presented in the figures (e.g., Figure 8). I am sure carefully revising the text would clarify some of these issues.

      We thank the reviewer for the suggestion. We revised parts of the results section and added the statistical analysis to the ABRs and DPOAE (lines 151-159; Page 29, lines 846-880). 

      (2) One puzzling finding that is not addressed in the manuscript is the lack of functional benefit from these additional inner hair cells. In fact, it appears to be detrimental based on the increased ABR thresholds. Maybe it would be useful to analyze the wave 1 characteristics.

      We thank the reviewer for the suggestion. We added the wave 1 characteristics as S8.

      (3) It is not clear what direct evidence there is, apart from some immunostaining, indicating that the ectopic inner hair cells derive from the supporting cells. This part would benefit from a more careful consideration and maybe an attempt at a more direct experimental approach.

      We thank the reviewer for the suggestion. We intend to investigate the origin of the ectopic inner hair cells in a future study using, for example, qRT-PCR and smFISH.

      (4) One point that should be made clear throughout the manuscript is that the ectopic inner hair cells are generated in a cochlea that is undergoing normal maturation. Thus, there is no guarantee that modulating the expression levels of Cldn9 in a deaf mouse lacking hair cells would produce the same result as that shown in this study. My guess is that it probably won't, but I am sure this could be tested (maybe in the future) using the excellent experimental approach applied in this study.

      That is a great point. We will explore it in our future experiments.

      Reviewer #2 (Public Review):

      Summary:

      The generation of functional extranumerary inner hair cells (IHCs) in postnatal mice, particularly with virus-mediated knockdown of Cldn9 mRNA expression in the neonatal cochlear duct, is an important observation. It is significant because not many studies exist that report molecular manipulations of the neonatal organ of Corti that result in the generation of new hair cells that remain functional and appear to be intact for an extended time, here more than one year. Overall, this is a carefully conducted study; the observations are clear, and the methods are solid. Two independent methods for reducing the expression of Cldn9 mRNA were used: a conditional transgenic model and AAV-mediated knockdown with shRNA. The lack of a functional explanation of how the reduced expression of Cldn9 specifically leads to the formation of extranumerary IHCs leaves open questions. For example, it is not clear whether there is indeed a fate change happening and whether Cldn9 reduction affects developmental processes. The discussion of how Cldn9 reduction potentially affects Notch signaling, without hard evidence, is handwaving.

      Strengths:

      It is a very interesting observation and somewhat unexpected in its specificity for inner hair cells. Using two different approaches to manipulate Cldn9 expression provides a strong experimental foundation. The study is conducted quantitatively and with care.

      Weaknesses:

      The lack of mechanistic insight results in an open-ended story where at least the potential interaction of Cldn9 reduction with known and well-characterized signaling pathway components should have been investigated. This missed opportunity limits the scope of the study and should be addressed: How does Cldn9 downregulation affect the expression levels of other known genes linked to hair cell production and cell fate decisions? Quantitative RT-PCR works well for the authors, and comparing the expression of Notch or other known pathway components could provide mechanistic insight.

      We thank the reviewer for the suggestion. We will use quantitative RT-PCR to compare the expression of Notch and other known pathway components in our future work. In addition, we used smFISH with ccnd1 and cdkn1b probes to detect cyclin D1 and cyclin-dependent kinase inhibitor 1B (p27), respectively, in the mouse cochlea. GAPDH was selected as a reference gene. The quantification results showed no significant difference between Cldn9<sup>+/T</sup> mice and Cldn9<sup>+/+</sup> mice at P2, P7, and P14.

      It is unclear how P21 inner hair cells were identified for the patch-clamp experiments shown in Fig 4E-H. This is a challenging endeavor without the possibility of using specific markers.

      We did not have a specific marker for IHCs. However, an experimenter familiar with hair bundle morphology and with the location of IHCs in the epithelium can identify them under an upright microscope.

      Please also address the numerous minor points outlined below; it will improve the paper's readability.

      Thanks. Please find the point-by-point answers below.

      Please include page numbers and line numbers in a revised manuscript.

      We have included page numbers and line numbers in the revised manuscript.

      Reviewer #3 (Public Review):

      This important study by Chen et al. helps advance our knowledge about the regulation of inner hair cell (IHC) development and reveals the role of Cldn9 in embryonic and postnatal IHC induction by transdifferentiation from the supporting cells. The authors developed an inducible doxycycline (dox)-tet-OFF-Cldn9 transgenic mouse to regulate expression levels of Cldn9 and show that downregulation of Cldn9 resulted in an additional, although incomplete, row of IHCs immediately adjacent to the original IHC row. These induced extra IHCs had similarly well-developed hair bundles, were able to mechanotransduce, and were innervated by auditory neurons, resembling wild-type IHCs. In addition, the authors knocked down Cldn9 postnatally using shRNA injections in P1-7 mice, with similar induction of extranumerary IHCs next to the original row of IHCs. The conclusions of this paper are mostly well supported by the data, but some data analysis needs to be clarified and some crucial controls should be provided to improve the confidence in the presented results. There is great potential for practical use of these valuable findings and new knowledge on IHC developmental regulation to design Cldn9 gene therapy in the future.

      The mechanisms of extra hair cell generation by suppression of the expression level of the tight junction protein Cldn9, as described by Chen et al., are very interesting and previously unknown. In particular, the generation of extra IHCs postnatally using downregulation of Cldn9 by shRNA could potentially be very useful as a replacement of HCs lost after noise-induced trauma, ototoxic agents, or other environmental trauma. On the other hand, the replacement of hair cells lost due to various genetic mutations by inducing supernumerary IHCs with the same abnormalities would not be reasonable.

      The authors show that postnatally generated ectopic IHCs are viable and mechanotransducive, but it would be nice to show the maturation steps of ectopic IHCs during this postnatal period. For example, stereocilia bundles of the ectopic hair cells should mature later than those of the original IHCs. A few days after viral delivery of shRNA, you should be able to observe immature IHC bundles that will unequivocally define newly generated IHCs. Unfortunately, the authors show only examples of already mature ectopic IHCs at P21 and in 5-6-week-old mice, and at relatively low resolution. Also, during maturation, IHCs usually have transient axo-somatic synapses that are not present in mature IHCs. It would be great to see whether, in 5-6-week-old mice, the ectopic IHCs still have axo-somatic synapses, and whether the majority of the ectopic IHCs have innervation. Some of the data in this study would benefit from corresponding controls, and some from higher-resolution imaging.

      We appreciate the reviewer's suggestion. The objective of the paper is to report the phenomenon and present the coarse features of the Cldn9-mediated induced ectopic hair cells. The systematic details are for future studies, which are ongoing and out of the current scope.

      In the mammalian cochlea, each HC is separated from the next by intervening supporting cells, forming an invariant and alternating mosaic along the cochlea's length. Cochlear supporting cells in some conditions can divide and trans-differentiate into HCs, serving as a potential resource for HC differentiation, using transcription and other developmental signaling factors.

      However, when ectopic hair cells are generated from supporting cell trans-differentiation, the intricate mosaic of the organ of Corti is altered, which could by itself lead to hearing issues. In case of downregulation of Cldn9, the extra row of IHCs seems to be positioned immediately adjacent to the original IHC row. It is not clear if the newly formed unusual junctions between the ectopic and original IHCs are sufficiently tight to prevent leakage of the endolymph to the basolateral surface of IHCs. Also, it is not clear if the other organ of Corti tight junctions could lose their tightness due to the downregulation of Cldn9, which could over time affect the endocochlear potential as shown by this study and hearing abilities.

      The slightly increased ABR threshold (5-15 dB) (Fig. 4A), together with the decrease in the magnitude of the EP and the rise in the K<sup>+</sup> concentration in the endolymph and perilymph of Cldn9+/T mice compared with age-matched littermates (S10), indicated that the epithelial tight junctions might be compromised. The downregulation of Cldn9 affected the endocochlear potential and hearing abilities (Fig. 4A, S10) after 2 months, suggesting an age-dependent effect. Effective downregulation of Cldn9 would require proper titration of Cldn9 levels to induce extra hair cells while preserving epithelial integrity; this may require additional studies.

      Importantly, CLDN9 immunofluorescence staining data that show cytoplasmic staining of supporting cells should be revisited and the organ of Corti schematics showing CLDN9 expression should be corrected, considering that CLDN9 localizes to the tight junctions of the reticular lamina as was shown by immunoEM in this study and described in previous publications (Kitajiri et al., 2004; Nakano et al., 2009, Ramzan et al., 2021). While the current version of the manuscript will interest scientists working in the inner ear development and regeneration field, it could be more valuable to hearing researchers outside this immediate field and perhaps developmental biologists and cell biologists after proper revision.

      We appreciate the reviewer's comments. We were concerned about the observation, but the results were consistent. Indeed, that was the motivation for performing the immunoEM (S3). A follow-up report may address it further.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Please address the points I made about the presentation (word choice, inconsistencies in labeling, etc). It ultimately helps a reader to understand and to follow your logic. This is an important observation.

      We corrected the inconsistencies in labeling and addressed the points you suggested.

      Making the extra effort to investigate a possible interaction between Cldn9 and Notch signaling would substantially increase the significance of the work.

      Thanks for the suggestions. We will explore it in our future work.

      Minor points:

      Some sentences would benefit from revision:

      - The abstract argues that hearing loss is incurable because mammalian hair cells are terminally differentiated (3rd sentence). This is not accurate.

      Mammalian HCs are terminally differentiated by birth, making HC loss challenging to replace.

      - The second sentence of the second paragraph of the introduction, "Cochlear SCs can divide and trans-differentiate into HCs, serving as a potential resource for HC differentiation, using transcription and developmental signaling factors (White et al., 2006)," should be referenced in the context of the animal's age. This feature of supporting cells is transient and only observed in neonatal mice. The following sentences in the same paragraph would also benefit from being placed into the same context when appropriate.

      We thank the reviewer for the suggestion. These sentences have been corrected.

      - Introduction: "But functional features of the newly developed HC are circumspect." The authors probably meant "circumspect," but is this the appropriate word? Also, please use the plural of HC = HCs.

      The sentence has been corrected to “but the functional features of the newly developed HCs are circumspect”.

      - Introduction: Isn't an essential function of tight junctions in the organ of Corti the separation of fluid-filled spaces? Perhaps additional functions of tight junction proteins are unclear, but at least this one function appears clear.

      We thank the reviewer for the suggestion. We added “additional” before “function” in this sentence.

      - Introduction: "using shRNA injection in postnatal (P) days (P1-7) mice." This is a rather vague statement that could be better defined. Perhaps mention that the injections targeted the round window and that an AAV-based method was used. Also, it is not clear from the methods whether the injection needle pierced the round window. Please clarify. Likewise, the methods state that these experiments were conducted in P1-P15 mice, but the main text says P1-P7. Later, in the results section and in the figure legend for Fig 7, the mice are between P1-P7 and P14; the figure itself is labeled with P1 and P14. However, data is presented (Fig 6) for injections at P2, P4, P7, and P14. In the text referring to Fig 6B in the results section, it is stated, "By contrast, the P14-21 inner ear transfected with Cldn9-shRNA produced no detectable increase..." Only data for P2, P4, P7, and P14 injections are presented. These are minor issues, but please check the inconsistencies because they make it difficult to follow.

      We corrected this sentence to “Analogous additional putative IHCs differentiation was observed when Cldn9-shRNA was injected through the round window to postnatal (P) days (P2-7, and P14) mice…”. The label in Fig 7A has been changed to P2-7, and the text referring to Fig 6B in the Results section has been changed to “the P14 inner ear transfected with Cldn9-shRNA produced no detectable increase...".

      - Last statement of the Introduction: "making Cldn9 a viable target for generating transformed IHCs." It is not clear what transformed IHCs are.

      We replaced “transformed” with “supernumerary”.

      - To understand the Southern Blot analysis in Fig 1E, the location of BstAPI and BamHI restriction sites and the probe need to be illustrated in Fig 1D.

      The BstAPI (Bst) and BamHI (Bam) restriction sites are now indicated in Fig. 1D.

      - Please define the purple arrows and arrowheads in Fig 1D. What do the different colors for the backbone mean? I see red and green, but also orange and yellow in the floxed allele. In Fig 1F, is "Knock-in" synonymous with homozygote? Would it be clearer to use the nomenclature Cldn9(T/T), Cldn9(T/+), and Cldn9(+/+), which is used later in the text?

      We have made the changes as requested.

      - Results, first paragraph: "Results of RT-PCR..." This refers to quantitative RT-PCR; please add the word "quantitative."

      Thanks. We added “quantitative” to the sentence.

      - Results and Fig S1. Is the strong upregulation of Cldn9 mRNA (S1A) also reflected in stronger Cldn9 immunoreactivity?

      Yes, the strong upregulation of Cldn9 mRNA was reflected in stronger Cldn9 immunoreactivity.

      - Results, Fig 1. Please add a schematic drawing showing all elements of the inducible gene expression cassette in the final transgenic allele, and please illustrate how the system works. This helps the reader to understand the strong Cldn9 mRNA upregulation in Cldn9(T/T) mice, where expression is likely driven by the CMV promoter and reciprocally, in the presence of doxycycline, the suppression of transcription by binding of the tTA-dox protein to the TRE elements of the modified CMV promoter. Is this a correct assumption?

      Yes, this is a correct assumption.

      - Results, about Fig S3. Why is it important to investigate Cldn6 and ILDR1 levels in the context of Cldn9 downregulation? Also, what is meant by "no comparative differences in others?". If a potential compensatory effect is suspected, why are the authors not systematically characterizing the expression of other tight junction proteins with quantitative RT-PCR? The results shown in S3 are anecdotal, without proper quantification, and lack context.

      The goal is to examine the potential compensatory changes in other TJ proteins. It was not to examine all possible TJ proteins localized in the inner ear.

      Results, section headed with "Downregulation of..." First sentence. Fig. 2A-C -> Fig. 2A-E.

      Thanks. We corrected the sentence “5-week-old mice Cldn9<sup>+/T</sup> cochleae displayed a notable row of ectopic HCs (Fig. 2A-C).” to “5-week-old mice Cldn9<sup>+/T</sup> cochleae displayed a notable row of ectopic HCs (Fig. 2A-E).”

      The same section: "were negatively labeled with anti-prestin antibody." Consider "were not labeled with antibody to prestin." Likewise, a few sentences below, please consider rephrasing "the ectopic HCs ... reacted positively to otoferlin antibodies". Also, "...expressed multiple CtBP2 labeling..." - this reads like an incomplete sentence.

      Thanks for the suggestions. We have corrected the three sentences mentioned.

      The phrase "putative ectopic" lacks clarity because "putative" could refer to "ectopic" (like an adverb). Consider swapping the two words and writing "ectopic putative IHCs" or simply "ectopic IHCs."

      Thanks for the suggestions. We replaced the “putative ectopic IHCs” with “ectopic IHCs” in all contexts.

      Please use more precise figure labels when referring to a specific figure panel. For example, "Additionally, the ectopic HCs show IHC bundle features (Fig. 2)," - Bundles are shown in Fig 2D and Fig 2E. Please check all instances where a full figure is mentioned, but the specific reference is to a panel of the figure. Another example, "... using quantitative RT-PCR (S7)..." would be more specific if Fig S7A is referred to.

      Thanks for the suggestions. We checked all instances and corrected the labels. Thanks!

      "IHC counts at different ages (P2-P21) and the cochlear frequency segments (4-32 kHz) demonstrate..."- the figure shows data for 8 kHz and 32 kHz; please revise: "segments (8 kHz and 32 kHz) demonstrate."

      This sentence has been revised based on your suggestion. Thanks!

      Please add a legend to Fig. 3C (like the one shown in Fig. 2F).

      Thanks for the reminder. The legend for Fig. 3C was modified.

      Fig 4A and Fig 4B. It is impossible to distinguish the open/closed circles and the many lines. Please consider a different format or an extended supplemental figure. Also, drawing a line connection between the 32 kHz and click data points in 4A is inappropriate.

      Instead of open/closed circles, dashed lines now represent Cldn9<sup>+/+</sup> mice and solid lines represent Cldn9<sup>+/T</sup> mice. We added the line labels. The line connecting the 32 kHz and click data points was removed.

      Fig 4, legend. Please define BHB and BHC levels.

      BHB and BHC are defined.

      The paragraph "Synaptic features of PE IHCs match original IHCs" is confusing because it states the following: "The synapses between the IHCs and auditory neurons at the apical, middle, and basal cochlear locations from 5-week-old Cldn9+/+ and Cldn9+/T mice show substantial differences." The meaning of the heading, therefore, does not match what is ultimately shown and discussed.

      We have changed the title to “Synaptic features of ectopic IHCs and original IHCs”.

      Moreover, no actual features of synapses are investigated; CtBP2/Homer pairs were used to identify afferent synapses, which this reviewer would argue provides a reasonable estimate of the number of synapses where pre- and post-synaptic markers are detected in close vicinity. It would be helpful to describe the method for counting juxtaposed CtBP2 and Homer-labeled puncta with more detail.

      The Methods section now includes more information about the synapse count, which provides a reasonable estimate of the number of synapses where pre- and post-synaptic markers are detected in close proximity.

      The final concluding sentence of the section also suggests that synaptic transmission from PE IHCs might be compromised because significant differences in synapse numbers were identified. It would be important to mention this.

      Thanks for the reminder. We added this information to the final concluding sentence.

      Fig. 5C, 5D; legend. Is "co-expressed" the right word choice? Consider "colocalized" or "juxtaposed".

      "Co-expressed" has been replaced with "colocalized".

      Voltage-clamp recordings of P21 inner hair cell mechanoelectrical transduction currents. This reviewer cannot identify a previous publication describing the details of this method on P21 cochlear inner hair cells; this seems like an excellent methodological advance.

      Yes, we can record data from older mice. Thanks for pointing it out.

      "Transfection in vivo of Cldn9 shRNA," the P14-21 inner ear transfected with Cldn9-shRNA." Plus, additional use of the word "transfection." Transfection generally means the introduction of plain nucleic acid into cells. The word refers to methods that do not use viruses. In contrast, "transduction" is the term used for virus-mediated gene transfer. The authors used AAVs. Please correct for appropriate scientific terminology.

      Thanks for the clarification. This information has been corrected accordingly.

      "A slight decline in the amplitude of the EP and a substantial rise in perilymph K+ was detected in 8-month-old Cldn9+/T (S7)." Probably Fig. S8A,B is meant.

      Yes, it referred to Fig. S8 A, B. We corrected it in the result section. Thanks!

      Heading "Discussions" -> "Discussion"

      The focus of the second part of the discussion on potential interactions between Cldn9 suppression and known signaling pathways is essential. The logic that is presented with respect to Notch signaling, however, is not clear and misleading. For example, it is not obvious what is meant by "Cldn9 subserves the signaling catalyst to activate NICD cascades" and whether this statement is supported by any published data.

      The statement was a suggestion and has been qualified with a “may” clause (line 299).

      The authors might consider discussing whether the observed effect caused by Cldn9 elimination is a specific role of the Cldn9 protein itself or is an epiphenomenon resulting from cytomechanical changes in the developing and maturing organ of Corti. This would add a potential Notch-independent component for a possible interpretation of the observations.

      We now state in lines 302-304: “Alternatively, Cldn9 levels disruption may alter the mechanical properties of the developing and maturing organ of Corti that may trigger ectopic IHC differentiation, an epiphenomenon independent of the Notch signaling”.

      Methods:

      "Deletion of the selection marker in the tTA cassette by crossing the F1 mouse with the embryonic Cre line (B6.129S4-Meox2tm1(cre)Sor/J)." This sentence seems to be incomplete.

      Thanks for pointing it out. This sentence has been rewritten.

      "Images were captured under a confocal microscope." Consider writing "with a confocal microscope".

      This sentence has been corrected. Thanks!

      RNA extraction and... How many mice were used per experiment? 10-15 or just 10?

      Between 10 and 15 mice were used for the RNA extraction. Thanks.

      Reviewer #3 (Recommendations For The Authors):

      Below are my suggestions, questions, and criticisms.

      (1) The red outline on Fig1A schematic does not correspond to the previously published expression pattern of CLDN9 in the organ of Corti reticular lamina tight junctions (Kitajiri et al, 2004, Nakano et al., 2009, Ramzan et al., 2021). Also, there are no tight junctions all around the pillar cells. The tight junctions are restricted to the sites of tight attachments between two cells. The immunofluorescence staining using CLDN9 antibody looks rather cytoplasmic (Fig 1 and Fig S1) than associated with the tight junctions as it was shown by immunoEM data here and reported previously (Kitajiri et al, 2004; Nakano et al, 2009; Ramzan et al, 2021). Please correct the schematic and explain your data.

      We have redrawn the diagram (Fig. 7).

      (2) The CLDN9 staining in Figure 1, B and C, highlights the cytoplasm of the supporting cells, and hair cells devoid of the staining. From the images in Fig. S1C, it also looks like CLDN9 is present only in supporting cells and not in hair cells? How would the authors reconcile their data with Cldn9 expression data from the gEAR database and Ramzan et al.'s 2021 RNAscope data? Please provide the validation of the antibody used in this study.

      We recognize the reviewer’s concern, but RNA and protein levels do not always change in parallel.

      (3) Figure 1D. The dash lines from the targeting vector to the wt allele seem to indicate a recombination event. Please do not show the recombination event, instead just show what part of the targeting vector was incorporated to replace wt Cldn9. There is no description in the figure 1 legend what purple arrows and arrowheads mean and what yellow and orange line segments in the floxed allele schematic indicate. Please also show where the BstAPI and BamHI restriction enzyme sites are.

      We have provided supplementary Fig. 1 and have marked the BstAPI and BamHI restriction enzyme sites in Fig. 1D.

      (4) What does the organ of Corti that has a 40-to-55-fold increase in Cldn9 mRNA expression look like before dox treatment? Are there any abnormalities at all? What does CLDN9 protein localization look like in the Cldn9+/T untreated mice? Do they have a normal number of IHCs? Cldn9+/T untreated mice should be used as another control, at least in Figure S1.

      The untreated Cldn9<sup>+/T</sup> mice can grow normally but are not fertile. So, we used a very low concentration of dox water (0.1 mg/ml) instead of normal water to keep the breeding pairs. The protein level increased in the Cldn9<sup>+/T</sup> mice compared with Cldn9<sup>+/+</sup> mice. With 0.1 mg/ml dox water, they also showed ectopic IHCs.

      (5) It is interesting that decline of 0.4-0.6-fold in mRNA level leads to about 8-fold decrease in protein level based on your immunoEM data on tight junctions of IHC with supporting cells. Do you observe the same effect in OHC-SC tight junctions, or the decrease was observed selectively around IHCs?

      The reviewer is alluding to matching RNA and protein levels. It appears that for Cldn9 one cannot expect a closely matched relationship.

      (6) The quality of the immunoEM data is great, but a control of secondary antibody alone staining in wt and Cldn9+/T dox treated should be shown and compared to the Cldn9+/T treated sample.

      We thank the reviewer for raising the issue. Secondary antibodies are used as a control in all immunoEMs in the laboratory. We opted not to show negative results.

      (7) The authors observed a decrease in Cldn6 expression, albeit not quantified, in response to Cldn9 downregulation. How were the immunofluorescence signals compared and evaluated? Please provide a detailed description of the method used. Did the authors use the same image acquisition parameters? Was the Cldn9 and Cldn6 immunostaining done using the same protocol with the same aliquot and dilution of the secondary antibodies, etc.? The staining for CLDN6 seems to be concentrated in the cytoplasm of supporting cells, and not in the tight junctions, similar to CLDN9 immunoreactivity shown in Fig. S1C and to the ILDR1 pattern of staining in Fig. S3. How can the authors explain this? How were the antibodies validated?

      The Cldn9 and Cldn6 immunostaining were done using the same protocol with the same aliquot and dilution of the secondary antibodies.

      (8) CLDN14 is also expressed in the organ of Corti tight junctions. What happened to this TJ protein during CLDN9 downregulation?

      We detected Cldn14 with immunostaining in the Cldn9+/T mice and Cldn9+/+ mice fed with 0.25 mg/ml dox water, and the results showed increased expression of Cldn14 in Cldn9+/T mice. Detailed alterations of other TJ proteins have been reserved for future studies.

      (9) When supernumerary IHCs were observed in Cldn9+/T mice, have the authors noticed a corresponding decrease in supporting cells surrounding IHCs? Quantification of the IHCs supporting cells would be useful. Do the ectopic IHCs have apical tight junctions with original IHCs or they are surrounded by supporting cells?

      We quantified the SCs around the IHCs but did not detect significant differences among the groups.

      (10) The authors indicated that viable PE IHCs were observed in 15 months old Cldn9+/T dox treated mice. How stereocilia bundles look in these ectopic hair cells? Are they preserved similar to the original IHCs or degenerated? It is hard to see this in Fig 3, phalloidin panel. High-resolution SEM would show this better.

      For the ectopic IHCs remaining at 15 months, we did not detect apparent differences in hair bundles compared with the original IHCs.

      (11) Interestingly, the authors indicate that the highest number of the ectopic IHCs were developed in the apical turn and the higher elevation of ABR threshold was also observed at low frequencies end. This may indicate that extra IHCs do not help hearing function.

      The extra IHCs appeared along the whole cochlea, although they were more obvious in the apical turn. The decline in hearing may have resulted from leakage of endolymph K+ into the perilymph and a decline in EP.

      (12) No age-matched wt control is shown for decreased expression of Cldn9 after shRNA injection at P2 (Fig. 6A).

      As indicated earlier, we opted to state, but not show, negative results.

      (13) Figure 6C. The better- quality SEM images showing a longer stretch of IHCs are needed to convince readers that there are ectopic IHCs that are well preserved in 5-6 weeks old mice in all cochlear turns after GFP-Cldn9 shRNA treatment at P2-P7.

      In S4, we showed that there are ectopic IHCs along the cochlear axis.

      (14) Do scrambled shRNA control samples had some ectopic IHCs? This control is missing in Fig.6D.

      No, scrambled shRNA controls did not show ectopic IHCs. We have stated this.

      (15) Figure 7B, lower schematic. There are no known continuous tight junctions and CLDN9 expression around the OHCs and IHCs. CLDN9 is known to be concentrated at the reticular lamina tight junctions which separate the endolymph from perilymph. Please, correct all schematics accordingly.

      We have made the changes as requested.

      Minor comments:

      (1) Page 1, Abstract. I would not say "making HC loss incurable" since recent gene therapy results show some advances in this direction. Please rephrase more accurately.

      We have made the changes as requested.

      (2) Page 4, Results, line 5; please rephrase "PCR of tail tissue samples performed genotyping."

      It has been corrected to “The genotyping was performed by the PCR with the tail tissue.”

      (3) Fig. 1 legend, panel B, replace "showing IHC stained myosin7a" with "showing IHC stained by myosin7a". Also, in the same sentence, "phalloidin, actin (green) antibodies," Phalloidin is not an antibody; please change this.

      Thanks. We have corrected this information.

      (4) Fig 2C, IHC label obscures the view of IHCs, please move this label out and use an arrow to point to IHCs.

      We have made the changes as requested.

      (5) Figure 4, title. Replace "currents elicited original" with "current elicited from original".

      This sentence has been corrected. Thanks.

      (6) Figure 4, panel A. It is hard to see the open symbols on the graph. Are they associated with the dash lines? Please make them more visible or indicate what dash lines are. "ABR threshold for (n=12)" should be "ABR threshold for Cldn9+/+(n=12)"?

      Yes, they are associated with the dashed lines. We added labels for the solid and dashed lines. "ABR threshold for (n=12)" was corrected to "ABR threshold for Cldn9+/+(n=12)."

      (7) Figure 4, legend. "Within each wt and heterozygote mice, there was no significant shift...". Do you mean within each group of mice? Also "Mean DPOAE threshold for 2-8 mos (n=9) was tested,..." Do you mean (n=9) for each group or what group?

      Yes, "Within each wt and heterozygote mice, there was no significant shift..." has been revised. The number of mice in each group for the DPOAE test was clarified in the Fig. 4B legend. Thanks.

      (8) Please label the X axis in Figure 4D.

      The X-axis has been labeled (Time (s))

      (9) Figure 4 B, do the colors of the lines indicate the same age groups as in Fig 4A? Do the dash lines associate with open symbols? Please state this clearly in the figure's legend.

      Yes. We added this information in Fig. 4B legend.

      (10) Figure 4D. Please label the X axis of the fluorescence intensity graph.

      The X-axis has been labeled (Time (s))

      (11) Figure 4G, legend. Replace "(mean +std)" with "(mean +SD)" for consistency here and in Figure 5 legend.

      Thanks. We replaced "(mean +std)" with "(mean +SD)" in the legends of Fig. 4G, Fig. 5, and Fig. 6.

      (12) Figure 5B, legend. Replace "makers" with "markers".

      Thanks. This information was corrected.

      (13) Figure 6A, legend. There is no downregulation of Cldn9 by shRNA shown in "S5". Do the authors mean Figure S7? Please, correct "S5" to "Fig. S7".

      This information was corrected. Thanks.

      (14) Figure 6A, legend. There is no reduced CLDN9 protein expression shown in Fig. 1C. Do the authors mean Fig. 6A, third panel? Please correct the phrase "reduced protein expression (Fig. 1C) is shown in the 3rd Panel (Cldn9, red)" accordingly, and do not capitalize "p" in the "3rd Panel".

      This information was corrected. Thanks (lines 917-918).

      (15) Also there, replace "The right Panel shows two rows of IHCs (marked HC marker, Myo7a (cyan), and the merged photomicrograph" with "The right panel shows the merged image with two rows of IHCs stained with HC marker Myo7a (cyan) and the expression of Ad-GFP-mCldn9 shRNA (green) in the adjacent row of supporting cells". Please indicate in what cells Ad-GFP-mCldn9 shRNA (green) is expressed. It looks like only one row of supporting cells has this green signal.

      This information was corrected.

      (16) Figure 6B, legend. Replace "Examples of photomicrographs of sections of the whole-mount cochlea of P2, P4, P7, and P14 Cldn9 shRNA injected mice" with "Examples of phalloidin stained whole-mount organ of Corti samples from cochleae of the wild-type mice injected at P2, P4, P7 and P14 with Cldn9 shRNA"

      This sentence has been modified based on your suggestions. Thanks!

      (17) Replace "action labeling" with "actin labeled."

      Thanks! The "action labeling" has been replaced with "actin labeled" (line 924).

      (18) Figure 6C. Insert "C" before SEM images description in the legend. The authors stated that SEM images of "5-6-wks-old mice" are shown. Please indicate the exact age of mice shown on each image and at what age these mice received the virus injection.

      Thanks! The “C” has been added. We have noted in the legend that the SEM images are from 5-week-old mice and that the virus was injected at P2.

      (19) Figure 6D, legend. Last sentence: move "are significantly different" and insert this between "IHCs" and "at P2 apex".

      This information was corrected.

      (20) Figure S7, legend. Replace "(sram)" with "(scram)" as in the figure itself. Also, Indicate the age of samples at the harvesting time for imaging and the age at injection of Cldn9 shRNA.

      "(sram)" has been replaced with "(scram)". The age of samples at the harvesting time for imaging and the age at injection of Cldn9 shRNA are indicated.

      (21) Figure S8. Replace "4 mos-old" and "8 mos-old" with "4 months-old" and "8 months-old" everywhere in the legend and in the figure labels.

      We have made the changes as suggested.

      (22) Page 8, 5th lane from the bottom. Change "EP and K+ concentration endolymph" to "EP and K+ concentration of the endolymph".

      It has been corrected. Thanks.

      (23) Page 8, next to the last sentence before the Discussion. Wrong figure number, please replace "(S7)" with "Fig. S8".

      It has been corrected. Thanks.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Joint Public Review:

      Summary:

      The authors aimed to identify the neural sources of behavioral variation in fruit flies deciding between odor and air, or between two odors.

      Strengths:

      - The question is of fundamental importance.

      - The behavioral studies are automated, and high-throughput.

      - The data analyses are sophisticated and appropriate.

      - The paper is clear and well-written aside from some initially strong wording.

      - The figures beautifully illustrate their results.

      - The modeling efforts mechanistically ground observed data correlations.

      Weaknesses:

      - The correlations between behavioral variations and neural activity/synapse morphology are relatively weak, and sometimes overstated in the wording that describes them.

      We sincerely thank the reviewers for these evaluations.

      Recommendations for the authors:

      Line 56: "We hypothesize that as sensory cues are encoded and transformed to produce motor outputs, their representation in the nervous system becomes increasingly idiosyncratic and predictive of individual behavioral responses". This seems obvious a priori. The sensory stimuli are the same, but the motor responses are different. Along the way there has to be a progression from same to different. Is there an alternative hypothesis? If so, perhaps state the alternative.

      We added text to the first paragraph of the introduction (lines 58-60) laying out an alternative hypothesis that individuality emerges through biomechanical differences and environmental interactions, and we have altered our motivating question to assess whether circuit elements in which activity is predictive of individual behavior exist, and if so, where (lines 60-62).

      Line 157: typo "remaining"

      We changed “remaining” to “remain” (line 160).

      Line 163: why report r sometimes and R^2 other times? Better to use R^2 throughout.

      We changed all instances of r to R^2, notably when reporting combined train/test statistics for calcium - behavior models (line 162). We also reframed the outputs (medians + 90% confidence intervals) of the supplemental analysis inferring the strength of the latent calcium-behavior relationship to be in terms of R^2 (lines 166, 173-175, 241, 252; modified text in Inference of correlation between latent calcium and behavior states in Materials and Methods; adjusted figure and caption for Figure 1 – figure supplement 9).
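      The relationship behind this change can be illustrated with a minimal sketch. For a straight-line fit with an intercept, R^2 equals the square of Pearson's r, so the conversion loses only the sign of the correlation. The function names and the data below are hypothetical and are not taken from the authors' analysis code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared_linear_fit(x, y):
    """R^2 of an ordinary least-squares line (with intercept) of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical predictor (e.g. a calcium-derived score) and behavior measure.
predictor = [0.1, 0.4, 0.5, 0.9, 1.3, 1.8]
behavior = [0.2, 0.1, 0.7, 0.6, 1.1, 0.9]

r = pearson_r(predictor, behavior)
r2 = r_squared_linear_fit(predictor, behavior)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, R^2 = {r2:.3f}")
```

      Because squaring discards the sign, a reader comparing R^2 values still needs the direction of each relationship from the text or figures.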

      Line 182: "odorant". Should be "odorant receptors"?

      We respectfully disagree – our ORN and PN calcium data are responses to odorants in 5 glomerulus/odorant receptor types. When we group PCA loadings by glomerulus for both ORN and PN calcium, the consistency within groups is much stronger than when we group the loadings by odorant (Figure 1 – figure supplement 8). Additionally, “odorant receptor organization” would mean the same thing as “glomerular organization,” since all ORNs expressing the same odorant receptor project to a single glomerulus.

      Line 331: "harbor". Maybe more modestly "contribute to"?

      We changed “harbor” to “contribute to” (line 334) and added additional moderating language that the difference in DC2 and DM2 activations in PNs explains a large portion of the individuality signal (lines 337-339).

      Line 403: typo "is"

      We retained “is” as the corresponding verb for “the net effect,” but we adjusted the position of the reference to Gomez-Marin and Ghazanfar, 2019 for more clarity (lines 406-408).

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This manuscript details the results of a small pilot study of neoadjuvant radiotherapy followed by combination treatment with hormone therapy and dalpiciclib for early-stage HR+/HER2-negative breast cancer.

      Strengths:

      The strengths of the manuscript include the scientific rationale behind the approach and the inclusion of some simple translational studies.

      Weaknesses:

      The main weakness of the manuscript is that overly strong conclusions are made by the authors based on a very small study of twelve patients. A study this small is not powered to fully characterize the efficacy or safety of a treatment approach, and can, at best, demonstrate feasibility. These data need validation in a larger cohort before they can have any implications for clinical practice, and the treatment approach outlined should not yet be considered a true alternative to standard evidence-based approaches.

      I would urge the authors and readers to exercise caution when comparing results of this 12-patient pilot study to historical studies, many of which were much larger, and had different treatment protocols and baseline patient characteristics. Cross-trial comparisons like this are prone to mislead, even when comparing well powered studies. With such a small sample size, the risk of statistical error is very high, and comparisons like this have little meaning.

      We greatly appreciate your evaluation of our study and fully agree with the limitations you have pointed out. We have clearly stated the limitations of the small sample size and emphasized the need for a larger population to validate our preliminary findings in the discussion section (Lines 311-316).

      We acknowledge that this small sample size is not powered to characterize this regimen as a promising alternative regimen in the treatment of patients with HR-positive, HER2-negative breast cancer. Therefore, we have revised the description of this regimen to serve as a feasible option for neoadjuvant therapy in HR-positive, HER2-negative breast cancers both in the discussion (Lines 317-320) and the abstract (Lines 71-72).

      We agree with you that cross-trial comparisons should be approached with caution due to differences in study designs and patient populations. In our discussion section, we acknowledge that small sample size limited the comparison of our data with historical data in the literature due to the potential bias (Lines 312-313). We clearly state that such comparisons hold limited significance (Lines 313-314) and suggest a larger population to validate our preliminary findings.

      • Why was dalpiciclib chosen, as opposed to another CDK4/6 inhibitor?

      Thank you for your comments. The rationale for selecting dalpiciclib over other CDK4/6 inhibitors in our study is primarily based on the following considerations:

      (1) Clinical Efficacy: In several clinical trials, including DAWNA-1 and DAWNA-2, the combination of dalpiciclib with endocrine therapies such as fulvestrant, letrozole, or anastrozole has been shown to significantly extend the progression-free survival (PFS) in patients with hormone receptor-positive, HER2-negative advanced breast cancer (1-2).

      (2) Tolerability and Management of Adverse Reactions: The primary adverse reactions associated with dalpiciclib are neutropenia, leukopenia, and anemia. Despite these potential side effects, the majority of patients are able to tolerate them, and with proper monitoring and management, these reactions can be effectively mitigated (1-2).

      (3) Comparable pharmacodynamics to other CDK4/6 inhibitors: The combination of CDK4/6 inhibitors, including palbociclib, ribociclib, and abemaciclib, with aromatase inhibitors has demonstrated an enhanced ability to suppress tumor proliferation and increase the rate of clinical response in neoadjuvant therapy for HR-positive, HER2-negative breast cancer (3-5). Furthermore, preclinical studies have shown that dalpiciclib has in vivo and in vitro pharmacodynamic activity comparable to that of palbociclib, suggesting its potential effectiveness in similar treatment regimens (6).

      (4) Accessibility and Regulatory Approval: Dalpiciclib gained marketing approval in China on December 31, 2021, which facilitates access to this medication and makes it a more convenient option when considering treatment plans.

      References:

      (1) Zhang P, Zhang Q, Tong Z, et al. Dalpiciclib plus letrozole or anastrozole versus placebo plus letrozole or anastrozole as first-line treatment in patients with hormone receptor-positive, HER2-negative advanced breast cancer (DAWNA-2): a multicentre, randomised, double-blind, placebo-controlled, phase 3 trial. The Lancet Oncology, 2023, 24(6): 646-657.

      (2) Xu B, Zhang Q, Zhang P, et al. Dalpiciclib or placebo plus fulvestrant in hormone receptor-positive and HER2-negative advanced breast cancer: a randomized, phase 3 trial. Nature Medicine, 2021, 27(11): 1904-1909.

      (3) Hurvitz S A, Martin M, Press M F, et al. Potent cell-cycle inhibition and upregulation of immune response with abemaciclib and anastrozole in neoMONARCH, phase II neoadjuvant study in HR+/HER2− breast cancer. Clinical Cancer Research, 2020, 26(3): 566-580.

      (4) Prat A, Saura C, Pascual T, et al. Ribociclib plus letrozole versus chemotherapy for postmenopausal women with hormone receptor-positive, HER2-negative, luminal B breast cancer (CORALLEEN): an open-label, multicentre, randomised, phase 2 trial. The Lancet Oncology, 2020, 21(1): 33-43.

      (5) Ma C X, Gao F, Luo J, et al. NeoPalAna: neoadjuvant palbociclib, a cyclin-dependent kinase 4/6 inhibitor, and anastrozole for clinical stage 2 or 3 estrogen receptor–positive breast cancer. Clinical Cancer Research, 2017, 23(15): 4055-4065.

      (6) Long F, He Y, Fu H, et al. Preclinical characterization of SHR6390, a novel CDK 4/6 inhibitor, in vitro and in human tumor xenograft models. Cancer Science, 2019, 110(4): 1420-1430.

      • The eligibility criteria are not consistent throughout the manuscript, sometimes saying early breast cancer, other times saying stage II/III by MRI criteria.

      Thank you for pointing out the inconsistency in the eligibility criteria in our manuscript. We deeply apologize for any confusion caused by these inconsistencies. We have revised the term from “early-stage HR-positive, HER2-negative breast cancer” to “early or locally advanced HR-positive, HER2-negative breast cancer” (Lines 128 and 150). The term “early or locally advanced” encompasses two different stages of breast cancer, whereas “Stage II/III by MRI criteria” refers to specific stages within the TNM staging system.

      • The authors should emphasize the 25% rate of conversion from mastectomy to breast conservation and also report the type and nature of axillary lymph node surgery performed. As the authors note in the discussion section, rates of pathologic complete response/RCB scores are less prognostic for hormone-receptor-positive breast cancer than other subtypes, so one of the main rationales for neoadjuvant medical therapy is for surgical downstaging. This is a clinically relevant outcome.

      We appreciate your constructive comments. Based on your suggestions, we have made the following revisions and additions to the article.

      The breast conservation rate serves as a secondary endpoint in our study (Lines 62 and 179). We have highlighted the significant 25% conversion rate from mastectomy to breast conservation in both the results (Lines 229-230) and discussion sections (Lines 290-292).

      In our study, all patients underwent lymph node surgery, including sentinel lymph node biopsy or axillary lymph node dissection. Among them, 58.3% of patients (7/12) underwent sentinel lymph node biopsies.

      We agree with your point that the prognostic value of the pathologic complete response/RCB score is lower for hormone receptor-positive breast cancer than for other subtypes. We have revised the discussion section to clarify that one of the principal objectives of neoadjuvant therapy in this patient population is to facilitate downstaging and enhance the rate of breast conservation (Lines 289-290). We have also emphasized that this neoadjuvant therapeutic regimen appeared to improve the likelihood of pathological downstaging and of achieving a margin-free resection, particularly for those with locally advanced and high-risk breast cancer (Lines 293-295).
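      The reviewer's caution about small samples can be made concrete with a rough sketch. It computes a Wilson score interval for a proportion observed in 12 patients, using the reported 7 of 12 (58.3%) sentinel lymph node biopsies as the example. The choice of the Wilson method and the 95% level (z = 1.96) are assumptions made for illustration; the study itself does not report this interval.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% interval (an assumption
    chosen for illustration, not a value taken from the study).
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# 7 of 12 patients underwent sentinel lymph node biopsy (58.3%).
low, high = wilson_interval(7, 12)
print(f"observed 7/12 = {7 / 12:.1%}, interval {low:.1%} to {high:.1%}")
```

      With only 12 patients the interval spans roughly half of the possible range, which is why any rate from this cohort is best read as preliminary.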

      Reviewer #2 (Public review):

      Firstly, as this is a single-arm preliminary study, we are curious about the order of radiotherapy and the endocrine therapy. Besides, considering the radiotherapy, we are also concerned about the recovery of the wound after surgery and whether related data were collected.

      Thanks for the comments. The treatment sequence in this study is to first administer radiotherapy, followed by endocrine therapy. A meta-analysis has indicated that concurrent radiotherapy with endocrine therapy does not significantly impact the incidence of radiation-induced toxicity or survival rates compared to a sequential approach (1). In light of preclinical research suggesting enhanced therapeutic efficacy when radiotherapy is delivered prior to CDK4/6 inhibitors, we have opted to administer radiotherapy before the combination therapy of CDK4/6 inhibitors and hormone therapy (2).

      In our study, we collected data on surgical wound recovery. All 12 patients had Class I incisions, which healed by primary intention. The wounds exhibited no signs of redness, swelling, exudate, or fat necrosis.

      References:

      (1) Li Y F, Chang L, Li W H, et al. Radiotherapy concurrent versus sequential with endocrine therapy in breast cancer: A meta-analysis. The Breast, 2016, 27: 93-98.

      (2) Petroni G, Buqué A, Yamazaki T, et al. Radiotherapy delivered before CDK4/6 inhibitors mediates superior therapeutic effects in ER+ breast cancer. Clinical Cancer Research, 2021, 27(7): 1855-1863.

      Secondly, in the methodology, please describe the sample size estimation of this study and follow up details.

      Thanks for pointing out this crucial omission. Sample size estimation and follow-up details have been added to the methodology section. The Statistical analysis section now states: “This exploratory study involves 12 patients, with the sample size determined based on clinical considerations, not statistical factors (Lines 210-211).” The Procedures section now states: “A 5-year follow-up is conducted every 3 months during the first 2 years, and every 6 months for the subsequent 3 years. Additionally, safety data are collected within 90 days after surgery for subjects who discontinue study treatment (Lines 169-172).”
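      The follow-up schedule quoted above can be spelled out as a short sketch. It assumes that months are counted from the start of follow-up and that the first visit falls at month 3; the response does not state either detail, and the function name is hypothetical.

```python
def follow_up_months(first_phase_years=2, first_interval=3,
                     second_phase_years=3, second_interval=6):
    """List visit times in months from the start of follow-up.

    Defaults mirror the quoted schedule: every 3 months for the first
    2 years, then every 6 months for the subsequent 3 years. Counting
    from month 0 with the first visit at month 3 is an assumption.
    """
    first_end = first_phase_years * 12
    total_end = first_end + second_phase_years * 12
    visits = list(range(first_interval, first_end + 1, first_interval))
    visits += list(range(first_end + second_interval, total_end + 1, second_interval))
    return visits


schedule = follow_up_months()
print(len(schedule), schedule)
```

      Under these assumptions the schedule gives 8 visits in the first 2 years and 6 in the following 3 years, for 14 visits over 5 years.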

      Thirdly, in Table 1, the item HER2 expression, it's better to categorise HER2 into 0, 1+, 2+ and FISH-.

      Thank you very much for pointing out this issue. The item HER2 expression in Table 1 has been revised from “negative, 1+, 2+ and FISH-” to “0, 1+, 2+ and FISH-”.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      Lodhiya et al. demonstrate that antibiotics with distinct mechanisms of action, norfloxacin and streptomycin, cause similar metabolic dysfunction in the model organism Mycobacterium smegmatis. This includes enhanced flux through the TCA cycle and respiration as well as a build-up of reactive oxygen species (ROS) and ATP. Genetic and/or pharmacologic depression of ROS or ATP levels protect M. smegmatis from norfloxacin and streptomycin killing. Because ATP depression is protective, but in some cases does not depress ROS, the authors surmise that excessive ATP is the primary mechanism by which norfloxacin and streptomycin kill M. smegmatis. In general, the experiments are carefully executed; alternative hypotheses are discussed and considered; the data are contextualized within the existing literature.

      We thank the reviewer for the very comprehensive summary of the study.

      Strengths:

      The authors tackle a problem that is both biologically interesting and medically impactful, namely, the mechanism of antibiotic-induced cell death.

      Experiments are carefully executed, for example, numerous dose- and time-dependency studies; multiple, orthogonal readouts for ROS; and several methods for pharmacological and genetic depletion of ATP.

      There has been a lot of excitement and controversy in the field, and the authors do a nice job of situating their work in this larger context.

      Inherent limitations to some of their approaches are acknowledged and discussed e.g., normalizing ATP levels to viable counts of bacteria.

      We thank the reviewer for the encouraging comments.

      Weaknesses:

      All of the experiments performed here were in the model organism M. smegmatis. As the authors point out, the extent to which these findings apply to other organisms (most notably, slow-growing pathogens like M. tuberculosis) is to be determined. To avoid the perception of overreach, I would recommend substituting "M. smegmatis" for Mycobacteria (especially in the title and abstract).

      At first glance, a few of the results in the manuscript seem to conflict with what has been previously reported in the (referenced) literature. In their response to reviewers, the authors addressed my concerns. It would also be ideal to include a few lines in the manuscript briefly addressing these points. (Other readers may have similar concerns).

      In the first round of review, I suggested that the authors consider removing Figs. 9 and 10A-B as I believe they distract from the main point of the paper and appear to be the beginning of a new story rather than the end of the current one. I still hold this opinion. However, one of the strengths of the eLife model is that we can agree to disagree.

      We acknowledge the reviewer’s concern and have changed the title of the manuscript to specify Mycobacterium smegmatis instead of Mycobacteria. The abstract already did so.

      In the discussion section of the revised manuscript, we have already addressed and analysed our results extensively within the context of the available literature, regardless of whether our findings aligned with or differed from previous studies. We still believe that this discussion suffices to explain our results to readers.

      In this manuscript we also sought to assess the bacteria's ability to counteract drug-induced stresses, contributing to our understanding of how antibiotic tolerance develops in Mycobacterium smegmatis. Results presented in Figure 9 clearly demonstrate that M. smegmatis attempts to reduce respiration by decreasing flux through the complete TCA cycle, thereby mitigating ROS and ATP production in response to antibiotics. Additionally, the bacterial response included increased expression of the protein Eis, which is an exemplar of intrinsic drug resistance, with a concomitant increase in mutation frequency, thereby hinting at the development of antibiotic tolerance followed by resistance. We still believe that these data should be included because they support our observations and make the study more comprehensive.

      Reviewer #2 (Public review):

      Summary:

      The authors are trying to test the hypothesis that ATP bursts are the predominant driver of antibiotic lethality of Mycobacteria

      Strengths:

      No significant strengths in the current state as it is written.

      Weaknesses:

      A major weakness is that M. smegmatis has a doubling time of three hours and the authors are trying to conclude that their data would reflect the physiology of M. tuberculosis, which has a doubling time of 24 hours. Moreover, the authors try to compare OD measurements with CFU counts and thus observe great variabilities.

      Comments on revisions:

      I am surprised that the authors simply did not repeat the study in figure one with CFU counts and repeated in triplicate. Since this is M. smegmatis, it would take no longer than two weeks to repeat this experiment and replace the figure. I understand that obtaining CFU counts is much more laborious than OD measurements but it is necessary. Your graph still says that there is 0 bacteria at time 0, yet in your legend it says you started with 600,000 CFU/ml. I don't understand why this experiment was not repeated with CFU counts measured throughout. This is not a big ask since this is M. smegmatis but it appears that the authors do not want to repeat this experiment. Minimally, fix the graph to represent the CFU.

      We acknowledge the reviewer’s concern and have changed the title of the manuscript to specify Mycobacterium smegmatis instead of Mycobacteria.

      It is still not clear to the authors what the reviewer means by OD measurements. All the data presented in the manuscript, including those in Figure 1, are based solely on CFU measurements. So, as suggested by the reviewer, all experiments are already presented in terms of CFU.

    1. Author response:

      We thank the editors and reviewers for the constructive assessment. We plan to address the comments as follows:

      Reviewer #1 (Public review):

      We are generating a new cohort of Lv-TGFB2 overexpressing mice in which IOP will be compared under anesthesia conditions that are identical for diurnal and nocturnal states. Parenthetically, we used the awake (diurnal) and isoflurane (nocturnal) anesthesia conditions to mirror those in the Patel et al (2021) PNAS study.

      Reviewer #2 (Public review):

      We are not sure what the Reviewer means by the “difference between the message and transcript data” and are not sure whether providing evidence about the TRPV4-dependence of the expression of fibrotic genes and canonical TGFβ2 pathway genes fits within the scope of our study (which focuses on the TGFB2-dependence of TRPV4 expression and IOP regulation). We propose to address this by including new data about the TGFβ2- and TRPV4-dependence of TRPV4 and Piezo1 expression. We could share, on a confidential basis, information about the effect of TGFB2 on fibrosis-related genes from a submitted study in which we used RNASeq to investigate the TGFB2- and TGFB2 + HC067047-dependence of gene expression in TM cells, but we would not include it in the revised manuscript.

      - Re: β-tubulin comment [β-tubulin associates with the plasma membrane by binding to integral membrane proteins in the plasma and organellar membranes, through palmitoylation and attachment to linker proteins, and as an integral component of exocytotic vesicles (Wolff, BBA 2009; Hogerheide et al., PNAS 2017). Together with β-actin and Gapdh, it is often used as a loading control to assess cellular TRPV4 protein expression (e.g., https://www.cellsignal.com/products/primary-antibodies/trpv4-antibody/65893; Grove et al., Science Signaling 2019 and Moore et al., PNAS 2013). Our qPCR and RNASeq studies show that TGFB2 does not affect β-tubulin expression.]

      - We will provide a higher resolution image for Fig. 4A

      - We will address the Fig 5A and 6A comment [We thank the Reviewer for noticing the ambiguity and have revised the Figure Legends to clarify that “pre-injection” in Figures 5B and 6B refers to IOP measurements before the intracameral injection of HC-06, not before the injection of lentiviral constructs].

      - We will address the issue of constitutive TRPV4 activity and Piezo1 involvement in the revised Discussion.

      We hope this is sufficient information at this point but would be more than happy to provide more information if needed.

      Thank you, we are very impressed by the eLife review protocols.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment:

      This study provides valuable insights, addressing the growing threat of multi-drug-resistant (MDR) pathogens by focusing on the enhanced efficacy of colistin when combined with artesunate and EDTA against colistin-resistant Salmonella strains. The evidence is solid, supported by comprehensive microbiological assays, molecular analyses, and in vivo experiments demonstrating the effectiveness of this synergic combination. However, the discussion on the clinical application challenges of this triple combination is incomplete, and it would benefit from addressing the high risk associated with using three potential nephrotoxic agents in vivo.

      The development of novel pharmaceutical dosage forms and the pharmacokinetic, pharmacodynamic, and safety analyses of the triple combination will be conducted in our next study to provide a theoretical basis for future clinical use. A discussion of the potential toxicity of AS, colistin, EDTA, and the triple combination has been added in lines 318 to 337.

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) The study focuses on a limited number of Salmonella strains, and broader testing on various MDR pathogens would strengthen the findings.

      The number of COL-resistant clinical strains actually used when evaluating the antimicrobial activities of AS, EDTA and COL, alone or in combination, was larger than that mentioned in our original article. However, because the results for the additional mcr-1-positive Salmonella strains were redundant, we omitted them from the original article; they are now given in Table supplements 7 and 8 of the revised supplementary materials. Additionally, more Gram-negative and Gram-positive MDR bacteria, such as Klebsiella pneumoniae, Pseudomonas aeruginosa and Staphylococcus aureus, will be included in our next study, which will also cover the development of novel pharmaceutical dosage forms and pharmacokinetic, pharmacodynamic and safety analyses.

      (2) While the study elucidates several mechanisms, further molecular details could provide deeper insights into the interactions between these drugs and bacterial targets.

      In our next study, further molecular details will be focused on the regulatory targets of CheA and SpvD-related pathways, as well as the precise inhibition targets of MCR protein by the triple combination, through the generation of deletion or point mutations, and analysis of intermolecular interactions.

      (3) The time-kill experiment was conducted over 12 hours instead of the recommended 24 hours. To demonstrate a synergistic effect among the drugs, a reduction of at least 2 log10 in colony count should be shown in a 24-hour experiment. Additionally, clarifying the criteria for selecting drug concentrations is important to improve the interpretation of the results.

      The time-kill experiment has been repeated over 24 hours, and the results replace Figure 1 of the original paper. The new Figure 1 has been uploaded, and the change does not affect our interpretation of the results.

      In vitro studies showed that the synergistic antibacterial activity increased gradually with increasing doses of AS and EDTA, but higher doses may also result in more toxic side effects. Thus, in our study, 1/8 MIC of AS and of EDTA was selected to ensure excellent antibacterial activity while minimizing potential toxicity. An explanation of the selection of drug concentrations has been added in lines 323 to 326.

      (4) While the combination of EDTA, artesunate, and colistin shows promising in vitro results against Salmonella strains, the clinical application of this combination warrants careful consideration due to potential toxicity issues associated with these compounds.

      The development of novel pharmaceutical dosage forms, together with pharmacokinetic, pharmacodynamic and safety analyses of the triple combination, will be carried out in our next study to provide a theoretical basis for future clinical use.

      Reviewer #2 (Public Review):

      (1) The study by Zhai et al describes repurposing of artesunate, to be used in combination with EDTA to resensitize Salmonella spp. to colistin. The observed effect applied both to strains with and without mobile colistin resistance determinants (MCR). It was already known that EDTA in combination with colistin has an inhibitory effect on MCR-enzymes, but at the same time, both colistin and EDTA can contribute to nephrotoxicity, something which is also true for artesunate. Thus, the triple combination of three nephrotoxic agents has significant challenges in vivo, which is not particularly discussed in this paper.

      A discussion of the potential toxicity of the triple combination has been added in lines 318 to 337.

      (2) The selection of strains is not very clear. Nothing is known about the sequence types of the strains or how representative they are for strains circulating in general. Thus, it is difficult to generalize from this limited number of isolates, although the studies done in these isolates are comprehensive.

      The strains tested in this study were all COL-resistant clinical isolates; genome sequencing and comparative analysis of these strains have not been performed. The antibacterial activities of different antimicrobial drugs against the S16 and S30 strains have been measured and are listed in Table supplement 9 of the revised supplementary materials. Considering that the number of COL-resistant clinical strains actually used was larger than that mentioned in our original article (see response No. 1 to Reviewer #1, Public Review), we think that the results obtained in this study are representative to some extent.

      (3) Nothing is known about the susceptibility of the strains to other novel antimicrobial agents. Colistin has a limited role in the treatment of gram-negative infections, and although it can be used sometimes in combination, it is not clear why it would be combined with two other nephrotoxic agents and how this could have relevance in a clinical setting.

      The antibacterial activities of different antimicrobial drugs against the S16 and S30 strains have been measured and are listed in Table supplement 9 of the revised supplementary materials. Additionally, a discussion of the potential toxicity of the triple combination has been added in lines 318 to 337.

      (4) It is not clear whether their transcriptomics analysis should at least be carried out in duplicate for reasons of being able to assess reproducibility. It is also not clear why the samples were incubated for 6 hours - no discussion is presented on the selection of a time point for this.

      As can be seen from the time-kill curves, the number of surviving bacteria started to decrease after 4 h of incubation with the drug combinations. If the incubation time is too short (for example, less than 4 h), the differentially expressed genes cannot be fully revealed, whereas too long an incubation (such as 8 h or 12 h) may lead to a significant reduction in bacterial CFU and result in inaccurate sequencing results. Therefore, we selected an incubation time of 6 h, at which point the drugs exhibited significant antibacterial effects and there were still enough surviving bacteria in the sample for transcriptome analysis. Each sample had three replicates to ensure the accuracy of the results.

      (5) Discussion is lacking on the reproducibility and selection of details for the methodology.

      The results obtained in this paper have been repeated several times, which indicates that the detailed operating steps described in the Materials and Methods section are reproducible. To avoid redundancy, we did not include too much detail in the Discussion section.

      Reviewer #3 (Public Review):

      (1) Number of strains tested.

      The number of COL-resistant clinical strains actually used was larger than that mentioned in our original article (see response No. 1 to Reviewer #1, Public Review).

      (2) Response to comment: Lack of data on cytotoxicity.

      Pharmacokinetic, pharmacodynamic and safety analyses of the triple combination will be carried out in our next study to provide a theoretical basis for future clinical use.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Introduction:

      The introduction should provide more context about the pathogen Salmonella, its significance in both human and veterinary medicine, and the impact of colistin resistance in these pathogens. Salmonella is a leading cause of foodborne illnesses worldwide, resulting in substantial morbidity and mortality. It can cause a range of diseases, from gastroenteritis to more severe systemic infections like typhoid fever and invasive non-typhoidal salmonellosis. In veterinary medicine, Salmonella infections can lead to significant economic losses in livestock industries due to illness and death among animals, as well as through the contamination of animal products.

      This description has been added to the Introduction in lines 47 to 53.

      (2) Results and Discussion:

      (1) While the combination of EDTA, artesunate, and colistin shows promising in vitro results against Salmonella, the clinical application of this combination warrants careful consideration due to potential toxicity issues associated with these compounds. Colistin is known for nephrotoxicity and neurotoxicity, limiting its use to severe cases where the benefits outweigh the risks. EDTA, as a chelating agent, can disrupt essential metal ions in the body, posing risks of metabolic imbalances. Although it has clinical applications, primarily in cases of heavy metal poisoning, its use as an adjuvant in antibiotics may present risks. Although generally well-tolerated for malaria, interactions of artesunate with other drugs and long-term safety in combined therapies require thorough investigation.

      A discussion of the potential toxicity of the triple combination has been added in lines 318 to 337.

      (2) Table 1: The manuscript mentions that some strains used in the study are mcr-positive and mcr-negative. It is important to indicate in Table 1, in addition to the identification of Salmonella species, which strains are mcr-positive or mcr-negative.

      The relevant information has been added in Table 1.

      (3) Figure 2: What is the authors' hypothesis regarding the growth curves labeled "a" and "e" where strains JS and S16 resume growth 12 hours after treatment with AS? In the legend of Figure 2, describe what was used as the "positive control group."

      The growth curves labeled “a” and “e” are in Figure 1. After incubation with AC for 8 h, the surviving CFUs of the JS and S16 strains showed a slight reduction, but living cells remained. Since the bactericidal activity of AC is not strong enough to be sustained, these remaining living cells resume growth by 12 h of treatment with AC. The “positive control group” in the legend of Figure 2 is now defined in line 724.

      (4) What is the authors' hypothesis for the differences observed in the transcriptome and metabolome?

      Changes in gene transcription may cause corresponding changes at the protein level, but not all of these proteins are involved in bacterial metabolic processes. For example, the MCR protein is encoded by the COL resistance gene mcr and mediates the modification of lipid A, but it is not involved in cellular metabolic processes. Therefore, a change in transcription of the mcr gene may affect the production of MCR protein but not bacterial metabolic processes, which explains the differences observed between the transcriptome and the metabolome.

      (5) In some parts of the text, the authors state that artesunate and EDTA potentiate the action of colistin, which is a bacteriostatic drug. However, in other parts, the authors describe the effect of the AEC combination as bacteriostatic (Abstract: line 32; Results: line 179). How do the authors explain this inconsistency?

      Artesunate and EDTA can be regarded as “adjuvants” for the bacteriostatic drug colistin. Adjuvants themselves have no or only weak antibacterial effects. For antimicrobial drugs, “adjuvants” are compounds that are generally used in combination with antibacterial drugs to re-sensitize bacteria that have developed drug resistance. Thus, in this paper, the AEC combination can be regarded as bacteriostatic.

      (6) According to Brennan & Kirby (2019; doi: 10.1016/j.cll.2019.04.002), to evaluate the synergism between different drug combinations, bacterial growth curves need to be assessed over 24 hours. If the colony count is ≥ 2 log10 lower than that of the most active antimicrobial alone, the combination is considered synergistic. Based on the growth curve results shown in Figure 1, the experiment was conducted for 12 hours, and in some cases, only a small reduction in growth was observed, even at the maximum concentration of colistin. Moreover, in some cases, the curve resumes rising between 8 and 12 hours. What is the authors' hypothesis in this case? It is important to conduct the assay over 24 hours to confirm the synergism between these drugs.

      The time-kill experiment has been repeated over 24 hours, and the results replace Figure 1 of the original paper. Additionally, the phenomenon that “the curve resumes rising between 8 and 12 hours” has been explained in the response to the comment “Reviewer #1 (Recommendations For The Authors), Results and Discussion, (3) Figure 2”.
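
      The synergy criterion cited by the reviewer (a colony count at 24 hours that is at least 2 log10 lower than that of the most active single agent) can be expressed as a short calculation. The following is only an illustrative sketch: the function names and the CFU/mL values in the usage lines are hypothetical and are not taken from the manuscript.

```python
import math


def log10_reduction(cfu_single_agent: float, cfu_combination: float) -> float:
    """Log10 units by which the combination count lies below the single-agent count."""
    if cfu_single_agent <= 0 or cfu_combination <= 0:
        raise ValueError("CFU/mL counts must be positive")
    return math.log10(cfu_single_agent) - math.log10(cfu_combination)


def is_synergistic(cfu_most_active_single: float,
                   cfu_combination: float,
                   threshold_log10: float = 2.0) -> bool:
    """Apply the >= 2 log10 criterion to counts measured at the 24 h time point."""
    return log10_reduction(cfu_most_active_single, cfu_combination) >= threshold_log10


# Hypothetical 24 h counts (CFU/mL), for illustration only.
print(is_synergistic(1e8, 1e5))  # 3 log10 lower than the best single agent
print(is_synergistic(1e6, 1e5))  # only 1 log10 lower
```

      The comparison is made against the most active single drug rather than the untreated control, which is why the first argument should be the lowest single-agent count at the same time point.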

      (7) To prove that CheA and SpvD play a critical role in the effect of the AEC combination, deletion of these genes should be performed, and the mutant strains should be tested.

      The deletion of cheA and spvD will be carried out in our next study.

      (8) To demonstrate that the flagellum is no longer assembled, a transmission electron microscopy image using antibodies against flagellin should be performed, along with motility tests.

      Motility assays have been performed and are shown as Figure supplement 5 in the revised supplementary materials.

      (9) Figure 7: In the X-axis legend, specify what "model" refers to.

      The “model” refers to the PBS control group, in which mice were treated with PBS after intraperitoneal injection of 100 µL of bacterial solution (1.31 × 10<sup>5</sup> CFU).

      (10) Figure 8 Legend: In the legend of Figure 8 (line 717), are the authors referring to E. coli or Salmonella?

      It refers to Salmonella, which is now stated in the title of Figure 8 in the revised manuscript.

      (3) Materials and Methods:

      (1) Bacterial Strains and Agents: It would be beneficial to include in the table the species of the strains used in the study, as well as the concentrations of colistin, artesunate, and EDTA utilized (lines 321 - 332).

      We tried to add the above information to Table 1, but doing so made the table too large and pushed it beyond the page margins, which is not conducive to the table layout. We therefore chose to present this information in the Materials and Methods section instead of in the table.

      (2) Antibacterial Activity In Vitro: Ensure clarity and well-defined ranges for the concentrations of colistin, EDTA, and artesunate used separately and in combinations (lines 335 - 344).

      The drug concentrations are listed in lines 369 to 371.

      (3) Time-Kill Assays: Clarify the criteria for selecting concentrations, whether based on MICs or peak and trough concentrations relevant to human and animal treatments with colistin (lines 345 - 351).

      In vitro studies showed that the synergistic antibacterial activity increased gradually with increasing doses of AS and EDTA, but higher doses may also result in more toxic side effects. Thus, in our study, 1/8 MIC of AS and of EDTA was selected to ensure excellent antibacterial activity while minimizing potential toxicity. An explanation of the selection of drug concentrations has been added in lines 323 to 326.

      (4) General Corrections: Throughout the manuscript, correct typographical errors and consistently include the concentration values in mg/L alongside the MIC fractions. Specify the strains used for all experiments to ensure clarity. In the manuscript, the term "medication regimens" is used to describe the experimental setups involving different combinations of drugs tested in vitro. To improve accuracy and clarity, it is recommended to use the term "drug combination" instead. This term is more appropriate for in vitro experiments and will help avoid confusion with clinical treatment protocols.

      The typographical errors have been checked and corrected throughout the manuscript, and the “medication regimens” have been replaced by “drug combinations”.

      Reviewer #2 (Recommendations For The Authors):

      Please see above for recommendations on what can be done to improve the manuscript.

      While other omics analyses have been conducted herein, the authors do not comment on the genomic analysis of their own strains. It would have been a natural step to sequence all the strains used in the experiments.

      Due to limited program funding, genome sequencing and comparative analysis of these strains have not been performed. The antibacterial activities of different antimicrobial drugs against the S16 and S30 strains have been measured and are listed in Table supplement 9 of the revised supplementary materials.

      Some minor comments:

      (1) There are some spelling errors - e.g. "bacteria strains" instead of "bacterial strains".

      The grammar and spelling errors have been corrected throughout the manuscript.

      (2) I would avoid words like "unfortunately".

      The word “unfortunately” has been changed.

      (3) Some MIC-values in Table 1 seem incorrect - e.g. 24 mg/L. This is not a 2-log value - the value should be 32 mg/L if the dilution series has been carried out correctly.

      We apologize for the mistake. The value has been corrected, and we have also checked the other data.
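
      The reviewer's point is that MICs from a two-fold dilution series should fall on powers of two (for example 16 or 32 mg/L), so a value of 24 mg/L cannot arise from such a series. A quick check can be sketched as follows; it assumes the series is anchored at 1 mg/L, which the manuscript does not state, and the function names are hypothetical.

```python
import math


def on_twofold_series(mic_mg_per_l: float, anchor_mg_per_l: float = 1.0) -> bool:
    """True if the MIC is the anchor multiplied by an integer power of two."""
    steps = math.log2(mic_mg_per_l / anchor_mg_per_l)
    return math.isclose(steps, round(steps), abs_tol=1e-9)


def neighbouring_twofold_values(mic_mg_per_l: float,
                                anchor_mg_per_l: float = 1.0) -> tuple[float, float]:
    """Return the series values immediately at or below and at or above the MIC."""
    steps = math.log2(mic_mg_per_l / anchor_mg_per_l)
    lower = anchor_mg_per_l * 2 ** math.floor(steps)
    upper = anchor_mg_per_l * 2 ** math.ceil(steps)
    return lower, upper


print(on_twofold_series(24))            # 24 mg/L is not on the series
print(neighbouring_twofold_values(24))  # the series values that bracket it
```

      Because an MIC is read as the lowest tested concentration that inhibits growth, an off-series value is normally reported as the next higher tested concentration, which is consistent with the reviewer's suggestion of 32 mg/L.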

      Reviewer #3 (Recommendations For The Authors):

      Below are some suggestions.

      (1) Sentences L47 & L48 "Infections with antibiotic-resistant pathogens, especially carbapenemase-producing Enterobacteriaceae, represent an impending catastrophe of a return to the pre-antibiotic era" - this is slightly exaggerated! I also wonder if we need to use Enterobacterales instead of Enterobacteriaceae.

      The sentences in L47 & L48 have been changed. We googled the “carbapenemase-producing Enterobacteriaceae” and found it is a high-frequency word in numerous reports.

      (2) L48. The drying up of the antibiotic discovery pipeline is NOT necessarily the reason to use colistin as a drug of last resort!

      The sentence has been revised.

      (3) The manuscript requires extensive English editing but has merit based on the strong compilation of data.

      We have optimized and revised the writing of the whole article.

      (4) I suggest the authors have some data on the cytotoxicity of AS alone, colistin alone, and both of them against eucaryotic cells (Caco-) and if possible determine IS (index selectivity). This additional experiment will strengthen the quality of the manuscript. The authors must also explain how to put such tri-therapy into practice.

      The development of novel pharmaceutical dosage forms, together with pharmacokinetic, pharmacodynamic and safety analyses of the triple combination, will be carried out in our next study to provide a theoretical basis for future clinical use. A discussion of the potential toxicity of AS, colistin, EDTA and the triple combination has been added in lines 318 to 337.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this study, Bu et al examined the dynamics of TRPV4 channel in cell overcrowding in carcinoma conditions. They investigated how cell crowding (or high cell confluence) triggers a mechano-transduction pathway involving TRPV4 channels in high-grade ductal carcinoma in situ (DCIS) cells that leads to large cell volume reduction (or cell volume plasticity) and proinvasive phenotype.

      In vitro, this pathway is highly selective for highly malignant invasive cell lines derived from a normal breast epithelial cell line (MCF10CA) compared to the parent cell line, but not present in another triple-negative invasive breast epithelial cell line (MDA-MB-231). The authors convincingly showed that enhanced TRPV4 plasma membrane localization correlates with high-grade DCIS cells in patient tissue samples.

      Specifically in invasive MCF10DCIS.com cells, they showed that overcrowding or overconfluence leads to a decrease in cell volume and intracellular calcium levels. This condition also triggers the trafficking of TRPV4 channels from intracellular stores (nucleus and potentially endosomes), to the plasma membrane (PM). When these over-confluent cells are incubated with a TRPV4 activator, there is an acute and substantial influx of calcium, attesting to the fact that there are a high number of TRPV4 channels present on the PM. Long-term incubation of these over-confluent cells with the TRPV4 activator results in the internalization of the PM-localized TRPV4 channels.

      In contrast, cells plated at lower confluence primarily have TRPV4 channels localized in the nucleus and cytosol. Long-term incubation of these cells at lower confluence with a TRPV4 inhibitor leads to the relocation of TRPV4 channels to the plasma membrane from intracellular stores and a subsequent reduction in cell volume. Similarly, incubation of these cells at low confluence with PEG 3000 (a hyperosmotic agent) promotes the trafficking of TRPV4 channels from intracellular stores to the plasma membrane.

      Strengths:

      The study is elegantly designed and the findings are novel. Their findings on this mechanotransduction pathway involving TRPV4 channels, calcium homeostasis, cell volume plasticity, motility, and invasiveness will have a great impact in the cancer field and are potentially applicable to other fields as well. Experiments are well-planned and executed, and the data is convincing. The authors investigated TRPV4 dynamics using multiple different strategies (overcrowding, hyperosmotic stress, and pharmacological means) and showed a good correlation between different phenomena.

      Weaknesses:

      A major emphasis in the study is on pharmacological means to relate TRPV4 channel function to the phenotype. I believe the use of genetic means would greatly enhance the impact and provide compelling proof for the involvement of TRPV4 channels in the associated phenotype.

      In this regard, I wonder if siRNA-mediated knockdown of TRPV4 in over-confluent cells (or knockout) would lead to an increase in cell volume and normalize the intracellular calcium levels back to normal, thus ultimately leading to a decrease in cell invasiveness.

      We greatly appreciate the positive feedback regarding the design of our study and the novelty of our findings. We also acknowledge the valuable suggestion to complement our pharmacological approaches with genetic manipulation of TRPV4.

      In response to the comment regarding siRNA-mediated knockdown or knockout of TRPV4, we fully agree that this would further substantiate our findings. In the revised manuscript, we implemented shRNA targeting TRPV4 to investigate its functional effects on intracellular calcium level changes, cell volume plasticity, and invasiveness phenotypes, assessed through single-cell motility assays under cell crowding or hyperosmotic stress. These results have been incorporated into the revised manuscript, and detailed descriptions of these findings are included below.

      Using the shRNA approach that resulted in ~50% reduction of TRPV4 expression (Supplementary Figure 6A and 6B show TRPV4 expression levels via IF and immunoblots, respectively), we examined the effect of reduced TRPV4 on intracellular calcium levels in MCF10DCIS.com cells under normal density (ND) and stress conditions (confluent, Con; and hyperosmotic, PEG) using Fluo-4 AM imaging (Fig. 4S-X). We found that shRNA TRPV4 slightly decreased calcium levels in ND cells, likely due to fewer active calcium channels at the plasma membrane resulting from lower TRPV4 expression (as shown in the summary plot in Fig. 4W). With fewer active calcium channels, cells treated with shRNA TRPV4 exhibited less reduction in intracellular calcium levels under cell crowding conditions compared to control cells. Additionally, hyperosmotic stress using PEG 300 induced smaller calcium spikes in shRNA cells compared to the significant spike observed in control cells. This reduced calcium response to Con and hyperosmotic stress in shRNA cells was reflected in the decreased cell volume reduction by PEG 300 shown in Fig. 4Y. Consequently, shRNA-mediated TRPV4 reduction impaired cell volume plasticity in MCF10DCIS.com cells and abolished the pro-invasive mechanotransduction capability involving cell volume reduction, as evidenced by no increase in cell motility (both cell diffusivity and directionality) under hyperosmotic conditions (Fig. 5H-J). These findings demonstrate the critical role of TRPV4 in conferring pro-invasive mechanotransduction capability to MCF10DCIS.com cells through cell volume reduction.

      Reviewer #2 (Public review):

      Summary:

      The metastasis poses a significant challenge in cancer treatment. During the transition from non-invasive cells to invasive metastasis cells, cancer cells usually experience mechanical stress due to a crowded cellular environment. The molecular mechanisms underlying mechanical signaling during this transition remain largely elusive. In this work, the authors utilize an in vitro cell culture system and advanced imaging techniques to investigate how non-invasive and invasive cells respond to cell crowding, respectively.

      Strengths:

      The results clearly show that pre-malignant cells exhibit a more pronounced reduction in cell volume and are more prone to spreading compared to non-invasive cells. Furthermore, the study identifies that TRPV4, a calcium channel, relocates to the plasma membrane both in vitro and in vivo (patient samples). Activation and inhibition of the TRPV4 channel can modulate the cell volume and cell mobility. These results unveil a novel mechanism of mechanical sensing in cancer cells, potentially offering new avenues for therapeutic intervention targeting cancer metastasis by modulating TRPV4 activity. This is a very comprehensive study, and the data presented in the paper are clear and convincing. The study represents a very important advance in our understanding of the mechanical biology of cancer.

      Weaknesses:

      However, I do think that there are several additional experiments that could strengthen the conclusions of this work. A critical limitation is the absence of genetic ablation of the TRPV4 gene to confirm its essential role in the response to cell crowding.

      We are deeply grateful for the positive assessment of our study and its contribution to advancing our understanding of mechanical signaling in cancer progression. We also greatly appreciate the suggestion to incorporate genetic ablation experiments to further validate the role of TRPV4 in cell crowding responses.

      As noted in our response to Reviewer #1, we employed an shRNA approach to investigate the functional effects of TRPV4 knockdown on intracellular calcium level changes, cell volume plasticity, and invasiveness phenotypes. We assessed these effects using Fluo-4 AM calcium assay, single-cell volume measurements, and single-cell motility assays under cell crowding or hyperosmotic stress. These results have been incorporated into the revised manuscript and are described in detail in our response to Reviewer #1's "weaknesses" comment.

      Reducing TRPV4 expression levels by shRNA diminished mechanosensing intracellular calcium changes under cell crowding and hyperosmotic conditions using PEG 300 treatment. Furthermore, a significantly reduced cell volume plasticity was observed under hyperosmotic conditions in shRNA treated cells compared to control cells (Fig. 4S-X). This diminished mechanosensing capability abolished the pro-invasive mechanotransduction effect, as assessed by single cell motility under hyperosmotic conditions (Fig. 5H-J). These findings demonstrate the critical role of TRPV4 in conferring pro-invasive mechanotransduction capability to MCF10DCIS.com cells through cell volume reduction.

      Reviewer #1 (Recommendations for the authors):

      The way the results or discussion section is written. It was a little confusing for me to relate to some phenomena. For example, it is not clear how TRPV4 inhibition (due to overcrowding) leads to a decrease in intercellular calcium levels, especially when TRPV4 channels were intercellular (not on the PM) to begin with (in normal density (ND) conditions). Along the same lines, how GSK219 causes a dip in calcium levels in ND cells when TRPV4 channels are primarily intercellular (Figure 4E). If most of the TRPV4 channels that are translocated to the PM in response to cell crowding are in an inactive state, how do they confer enhanced cell volume plasticity relative to non-invasive cell lines?

      Thank you very much for raising this important point. We fully agree with your concern and have significantly revised the manuscript to clarify this aspect. Specifically, we have emphasized that a modest level of TRPV4 channels are constitutively active at the plasma membrane in normal density (ND) cells. This is now discussed in detail in the context of Fig. 4:

      Page 14: “Considering these factors, we hypothesized that cell crowding might inhibit calcium-permeant ion channels that are constitutively active at the plasma membrane, including TRPV4, which would then lower intracellular calcium levels and subsequently reduce cell volume via osmotic water movement.”

      Page 16-17: “… However, the temporal profile of Fluo-4 intensity in Fig. 4E, which corresponds to the time points marked in Fig. 4D (t<sub>1</sub>: baseline and t<sub>2</sub>: dip), clearly shows the dip at t<sub>2</sub>, indicated by ΔCa (the vertical dashed line between the dip and baseline). This modest Fluo-4 dip at t<sub>2</sub> represents the inhibition of activity by GSK219 on a small population of constitutively active TRPV4 channels at the plasma membrane under ND conditions.

      In Con cells, 1 nM GSK219 caused a smaller dip in Fluo-4 intensity compared to the one observed in ND cells, with no subsequent changes. This is likely due to fewer constitutively active TRPV4 at the plasma membrane in Con cells than in ND cells. …These findings suggest that a substantial portion of TRPV4 channels relocated to the plasma membrane under cell crowding was inactive, and some constitutively active TRPV4 channels already present in the membrane became inactive as a result of cell crowding.”

      'Internalization' might be a better word than 'uptake' in the following line in the results section

      "...activating TRPV4 under cell crowding conditions triggered channel uptake, indicating that TRPV4 trafficking depended on the channel's activation status."

      Thank you very much for this suggestion. As recommended, we replaced ‘uptake’ with ‘internalization’ on page 18:

      “However, in Con cells, where a large number of inactive TRPV4 channels are likely located at the plasma membrane, GSK101 treatment notably reduced plasma membrane-associated TRPV4 in a dose-dependent manner through internalization (Fig. 4O, 4Q), consistent with previous findings<sup>65</sup>. These data suggest that plasma membrane TRPV4 levels were largely regulated by the channel activity status. Specifically, channel activation led to the internalization of TRPV4, while channel inhibition promoted the relocation of TRPV4 to the plasma membrane.”

      Out of curiosity:

      Is there any information on what the intercellular TRPV4 channels are doing in the cytosol and in the nucleus? Is there any role of intercellular calcium stores in the proposed pathway?

      We greatly appreciate this insightful question. Although we were unable to find studies specifically exploring the roles of cytosolic TRPV4, a recent study (Reference 74) identified a role for nuclear TRPV4 in regulating calcium within the nucleus. We speculate that when TRPV4 activity is severely impaired, such as with additional TRPV4 inhibition under cell crowding conditions, some TRPV4 channels may be redirected to the nucleus. This redistribution could help maintain nuclear calcium homeostasis.

      This discussion is included on page 18 of the manuscript:

      “These findings suggest that further TRPV4 inhibition under crowding conditions triggers a distinct trafficking alteration. Recent studies have implicated nuclear TRPV4 in regulating nuclear Ca2+ homeostasis and Ca2+-regulated transcription74. In light of this study and our findings, TRPV4 may relocate to the nucleus as a compensatory mechanism to maintain nuclear calcium regulation. This relocation could reflect an adaptive response to preserve calcium-dependent transcriptional programs or other nuclear processes essential for cell survival under mechanical stress.”

      One recommendation is to add some explanation or some minor details for the convenience of the reader. For example:

      At normal or lower confluence, cells show an acute large dip in intercellular calcium when an inhibitor is applied implying that there are a few TRPV4 channels on the PM and they are constitutively active.

      Thank you very much for highlighting this important point and for the helpful suggestion to improve clarity. We have significantly revised the text associated with Fig. 4 to ensure this point is clear. Specifically, we have added the following explanation on page 16:

      "This modest Fluo-4 dip at t2 represents the inhibition of activity by GSK219 on a small population of constitutively active TRPV4 channels at the plasma membrane under ND conditions."

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1. The authors frequently change the medium to prevent acidification in overconfluent cultures. A cell viability assay should be performed to ensure that the over-confluent cells remain healthy and viable during the experiments. There are commercial kits that can be easily used to quantify the number of viable cells and the extent of cell toxicity. The number of viable cells would provide a more reliable basis for comparison between normal density and overconfluent conditions.

      Thank you very much for raising this important point. We have consistently observed that cell crowding does not induce significant cell death in MCF10DCIS.com cells. To address your recommendation, we performed a viability assay using propidium iodide (PI) to selectively stain dead cells and WGA-488 to stain all live cells. Cell death was quantified under normal density (ND) conditions and at 1, 3, 5, 7, and 10 days post-confluence.

      Our results indicate that cells remain similarly viable post-confluence, with minimal cell death (~1.5%) compared to ND cells (~0.75%). These findings are summarized in Supplementary Figure 2, demonstrating that over-confluent cultures remain healthy and viable during the experiments.

      (2) Figure 2. To determine whether the reduction in cell volume is reversible, over-confluent cells can be further diluted back to normal density. Additionally, the reversibility of TRPV4 channel trafficking to the plasma membrane should be assessed under these conditions in IF experiments and cell surface biotinylation.

      Thank you for this suggestion. We reseeded the previously overcrowded (OC) cells at normal density and observed that their TRPV4 distribution predominantly returned to being intracellular, with only modest plasma membrane localization, as shown by line analysis (Supplementary Figure 10A-C, page 13). Furthermore, their invasiveness decreased to levels comparable to the original normal density (ND) cells (Supplementary Figure 3C and 3E, page 6). These results demonstrate the reversibility of TRPV4 trafficking changes and the increase in invasiveness under mechanical stress.

      Page 6. "The enhanced invasiveness of MCF10DCIS.com cells under cell crowding was largely reversible. When OC cells were reseeded at normal density for invasion assays, their invasive cell fraction decreased to approximately 15%, slightly lower (p = 0.012) than the initial value of around 24% (Suppl. Fig. 3C, 3E)."

      Page 13. “We investigated whether TRPV4 relocation to the plasma membrane induced by cell crowding is reversible, as suggested by its impact on invasiveness (Suppl. Fig. 3E). To test this, previously OC MCF10DCIS.com cells were reseeded under ND conditions. We then assessed TRPV4 localization via immunofluorescence (IF) imaging to determine if most channels returned to the cytoplasm and could be relocated to the plasma membrane under mechanical stress, such as hyperosmotic conditions. Consistent with their initial ND state, reseeded ND MCF10DCIS.com cells displayed intracellular TRPV4 distribution (Suppl. Fig. 10A). Upon exposure to hyperosmotic stress (74.4 mOsm/Kg PEG300), TRPV4 was again relocated to the plasma membrane (Suppl. Fig. 10B). These findings, quantified through line analysis (Suppl. Fig. 10C), demonstrate that the mechanosensing response of MCF10DCIS.com cells is reversible.”

      (3) Figure 3B. A control using intracellular proteins such as GAPDH or Tubulin is missing. Including this control would help exclude the possibility of cell rupture or compromised cell membranes in crowded environments, which is very common in a cell crowding environment.

      Thank you very much for pointing this out. The control lanes (GAPDH) were already included in the full gel results shown in Supplementary Figure 5. For the immunoprecipitation and immunoblotting of surface-biotinylated cell lysates, we did not expect to detect GAPDH; however, some GAPDH signals were still observed. As shown for MCF10DCIS.com cells, less GAPDH was detected under OC conditions, but the immunoprecipitated samples displayed significantly higher levels of TRPV4 on the cell surface compared to ND cells (Supplementary Figure 5A). For the whole cell lysates, TRPV4 protein levels were comparable across different cell lines based on the immunoblot results, with consistent GAPDH signals serving as a loading control (Supplementary Figure 5B).

      (4) Figure 4. To convincingly demonstrate TRPV4 relocation to the plasma membrane, IF should be performed under non-permeable conditions (i.e., without detergents like saponin). This approach ensures that only plasma membrane proteins are accessible to antibodies, reducing intracellular background. The same approach should be applied to Piezo1 and TfR.

      Thank you for this suggestion. We observed that under non-permeable conditions, primary antibodies could still access intracellular proteins. To address this issue, we employed extracellular-binding TRPV4 antibodies to selectively detect TRPV4 relocation to the plasma membrane under hyperosmotic conditions (74.4 mOsm/kg PEG 300) in live MCF10DCIS.com cells, as shown in Supplementary Figure 9. These results clearly demonstrate the plasma membrane relocation of TRPV4 under hyperosmotic conditions, distinguishing it from control conditions. Unfortunately, we were unable to identify high-affinity extracellular-binding antibodies for Piezo1 and TfR. Nevertheless, our findings strongly support the mechanosensing plasma membrane relocation of TRPV4.

      Essential Weakness:

      Throughout the study, only TRPV4 inhibitors and activators were used to show that TRPV4 relocation is associated with intracellular calcium concentration and cell size changes. It is crucial to use TRPV4 KO or KD cells to confirm that the observed effects are specific to TRPV4 and not due to off-target effects on other proteins. Additionally, fusing a plasma membrane targeting sequence to TRPV4 to make a constitutive plasma membrane-localized construct could demonstrate the opposite effect.

      Thank you very much for this important comment. As noted in our response to Reviewer #1, we employed an shRNA approach to investigate the functional effects of TRPV4 knockdown on intracellular calcium level changes, cell volume plasticity, and invasiveness phenotypes. We assessed these effects using Fluo-4 AM calcium assay, single-cell volume measurements, and single-cell motility assays under cell crowding or hyperosmotic stress. These results have been incorporated into the revised manuscript and are described in detail in our response to Reviewer #1's "weaknesses" comment.

      Reducing TRPV4 expression levels by shRNA diminished mechanosensing intracellular calcium changes under cell crowding and hyperosmotic conditions using PEG 300 treatment. Furthermore, a significantly reduced cell volume plasticity was observed under hyperosmotic conditions in shRNA treated cells compared to control cells (Fig. 4S-X). This diminished mechanosensing capability abolished the pro-invasive mechanotransduction effect, as assessed by single cell motility under hyperosmotic conditions (Fig. 5H-J). These findings demonstrate the critical role of TRPV4 in conferring pro-invasive mechanotransduction capability to MCF10DCIS.com cells through cell volume reduction.

      Minor Points:

      The introduction section is poorly written; many results currently included in the introduction would be more appropriately placed in the discussion section. The long redundant introduction makes the article hard to read through.

      Thank you very much for pointing this out. In the revised introduction, we have significantly reduced references to the results, streamlining the section to make it more concise and focused. This adjustment ensures the introduction is clearer and avoids redundancy, improving the readability of the manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      "Neural noise", here operationalized as an imbalance between excitatory and inhibitory neural activity, has been posited as a core cause of developmental dyslexia, a prevalent learning disability that impacts reading accuracy and fluency. This study is the first to systematically evaluate the neural noise hypothesis of dyslexia. Neural noise was measured using neurophysiological (electroencephalography [EEG]) and neurochemical (magnetic resonance spectroscopy [MRS]) methods in adolescents and young adults with and without dyslexia. The authors did not find evidence of elevated neural noise in the dyslexia group from EEG or MRS measures, and Bayes factors generally informed against including the grouping factor in the models. Although the comparisons between groups with and without dyslexia did not support the neural noise hypothesis, a mediation model that quantified phonological processing and reading abilities continuously revealed that EEG beta power in the left superior temporal sulcus was positively associated with reading ability via phonological awareness. This finding lends support for analysis of associations between neural excitatory/inhibitory factors and reading ability along a continuum, rather than with a case/control approach, and indicates the relevance of phonological awareness as an intermediate trait that may provide a more proximal link between neurobiology and reading ability. Further research is needed across developmental stages and over a broader set of brain regions to more comprehensively assess the neural noise hypothesis of dyslexia, and alternative neurobiological mechanisms of this disorder should be explored.
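
      The mediation logic summarized above (beta power relating to reading ability via phonological awareness) can be sketched as two ordinary least-squares regressions. This is only an illustrative sketch, not the authors' model: the function names and the toy values are hypothetical, and a real analysis would add covariates and a bootstrap or Bayesian estimate of uncertainty for the indirect effect.

```python
from statistics import fmean


def _centered_sums(x, m, y):
    """Return centered sums of squares and cross-products for x, m, y."""
    mx, mm, my = fmean(x), fmean(m), fmean(y)
    dx = [v - mx for v in x]
    dm = [v - mm for v in m]
    dy = [v - my for v in y]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    return dot(dx, dx), dot(dm, dm), dot(dx, dm), dot(dx, dy), dot(dm, dy)


def simple_mediation(x, m, y):
    """Single-mediator OLS mediation: x -> m -> y.

    a: slope of m on x
    b: slope of y on m, controlling for x
    direct: slope of y on x, controlling for m
    total: slope of y on x alone (equals direct + a * b for OLS)
    """
    sxx, smm, sxm, sxy, smy = _centered_sums(x, m, y)
    det = sxx * smm - sxm ** 2
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / det
    direct = (smm * sxy - sxm * smy) / det
    total = sxy / sxx
    return {"a": a, "b": b, "indirect": a * b, "direct": direct, "total": total}


# Hypothetical toy values (not data from the study).
beta_power = [0.2, 0.5, 0.1, 0.9, 0.7, 0.4]
phonological_awareness = [1.1, 1.9, 0.8, 3.2, 2.4, 1.8]
reading_score = [2.0, 3.1, 1.7, 5.0, 4.2, 2.9]

result = simple_mediation(beta_power, phonological_awareness, reading_score)
```

      In this decomposition the total effect equals the direct effect plus the indirect effect (a × b), which is why a predictor can matter for reading mainly through the mediator.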

      Strengths:

      The inclusion of multiple methods of assessing neural noise (neurophysiological and neurochemical) is a major advantage of this paper. MRS at 7T confers an advantage of more accurately distinguishing and quantifying glutamate, which is a primary target of this study. In addition, the subject-specific functional localization of the MRS acquisition is an innovative approach. MRS acquisition and processing details are noted in the supplementary materials according to the experts' consensus recommended checklist (https://doi.org/10.1002/nbm.4484). Commenting on the rigor of the EEG methods is beyond my expertise as a reviewer.

      Participants recruited for this study included those with a clinical diagnosis of dyslexia, which strengthens confidence in the accuracy of the diagnosis. The assessment of reading and language abilities during the study further confirms the persistently poorer performance of the dyslexia group compared to the control group.

      The correlational analysis and mediation analysis provide complementary information to the main case-control analyses, and the examination of associations between EEG and MRS measures of neural noise is novel and interesting.

      The authors follow good practice for open science, including data and code sharing. They also apply statistical rigor, using Bayes Factors to support conclusions of null evidence rather than relying only on non-significant findings. In the discussion, they acknowledge the limitations and generalizability of the evidence and provide directions for future research on this topic.
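
      The idea of using Bayes factors to quantify support for a null group effect can be illustrated with the rough BIC approximation BF01 ≈ exp((BIC1 − BIC0) / 2). This is a minimal sketch and not the authors' analysis: the function names and toy values are hypothetical, and the approximation assumes Gaussian residuals and an implicit unit-information prior.

```python
import math


def bic_gaussian(residuals, n_mean_params):
    """BIC (up to a constant shared by both models) for a Gaussian model."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + n_mean_params * math.log(n)


def bf01_group_effect(control, dyslexia):
    """Approximate Bayes factor in favor of the null (no group difference).

    Null model: one grand mean. Alternative model: one mean per group.
    Values above 1 favor the null; values below 1 favor a group effect.
    """
    pooled = list(control) + list(dyslexia)
    grand_mean = sum(pooled) / len(pooled)
    res_null = [y - grand_mean for y in pooled]

    mean_c = sum(control) / len(control)
    mean_d = sum(dyslexia) / len(dyslexia)
    res_alt = [y - mean_c for y in control] + [y - mean_d for y in dyslexia]

    bic_null = bic_gaussian(res_null, 1)
    bic_alt = bic_gaussian(res_alt, 2)
    return math.exp((bic_alt - bic_null) / 2)


# Hypothetical toy values (not data from the study).
control = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
similar_group = [1.1, 0.9, 1.0, 1.2, 0.8, 1.0]
shifted_group = [2.0, 2.2, 1.9, 2.1, 2.0, 1.8]

bf_similar = bf01_group_effect(control, similar_group)
bf_shifted = bf01_group_effect(control, shifted_group)
```

      When the two groups have the same mean, the extra parameter buys no reduction in residual error, so the approximation favors the null; a clear shift between groups drives BF01 well below 1.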

      Weaknesses:

      Though the methods employed in the paper are generally strong, the MRS acquisition was not optimized to quantify GABA, so the findings (or lack thereof) should be interpreted with caution. Specifically, while 7T MRS affords the benefit of quantifying metabolites, such as GABA, without spectral editing, this quantification is best achieved with echo times (TE) of 68 or 80 ms in order to minimize the spectral overlap between glutamate and GABA and reduce contamination from the macromolecular signal (Finkelman et al., 2022, https://doi.org/10.1016/j.neuroimage.2021.118810). The data in the present study were acquired at TE=28 ms, and are therefore likely affected by overlapping Glu and GABA peaks at 2.3 ppm that are much more difficult to resolve at this short TE, which could directly affect the measures that are meant to characterize the Glu/GABA+ ratio/imbalance. In future research, MRS acquisition schemes should be optimized for the acquisition of Glutamate, GABA, and their relative balance.

      As the authors note in the discussion, additional factors such as MRS voxel location, participant age, and participant sex could influence associations between neural noise and reading abilities and should be considered in future studies.

      We have modified Figure 2 and revised the paragraph discussing the MRS methodological limitations in accordance with Reviewer #1's recommendations. Additionally, we have included the CRLB and linewidth thresholds in the Results section. Furthermore, a new figure showing the correlations between EEG and MRS biomarkers has been added (Figure 3).

      Appraisal:

      The authors present a thorough evaluation of the neural noise hypothesis of developmental dyslexia in a sample of adolescents and young adults using multiple methods of measuring excitatory/inhibitory imbalances as an indicator of neural noise. The authors concluded that there was not support for the neural noise hypothesis of dyslexia in their study based on null significance and Bayes factors. This conclusion is justified, and further research is called for to more broadly evaluate the neural noise hypothesis in developmental dyslexia.

      Impact:

      This study provides an exemplar foundation for the evaluation of the neural noise hypothesis of dyslexia. Other researchers may adopt the model applied in this paper to examine neural noise in various populations with/without dyslexia, or across a continuum of reading abilities, to more thoroughly examine evidence (or lack thereof) for this hypothesis. Notably, the lack of evidence here does not rule out the possibility for a role of neural noise in dyslexia, and the authors point out that presentation with co-occurring conditions, such as ADHD, may contribute to neural noise in dyslexia. Dyslexia remains a multi-faceted and heterogeneous neurodevelopmental condition, and many genetic, neurobiological and environmental factors play a role. This study demonstrates one step toward evaluating neurobiological mechanisms that may contribute to reading difficulties.

      Reviewer #2 (Public review):

      Summary:

      This study utilized two complementary techniques (EEG and 7T MRI/MRS) to directly test a theory of dyslexia: the neural noise hypothesis. The authors report finding no evidence to support an excitatory/inhibitory balance, as quantified by beta in EEG and Glutamate/GABA ratio in MRS. This is important work and speaks to one potential mechanism by which increased neural noise may occur in dyslexia.

      Strengths:

      This is a well-conceived study with in-depth analyses and publicly available data for independent review. The authors provide transparency with their statistics and display the raw data points along with the averages in figures for review and interpretation. The data suggest that an E/I balance issue may not underlie deficits in dyslexia and is a meaningful and needed test of a possible mechanism for increased neural noise.

      Weaknesses:

      The researchers did not include a visual print task in the EEG task, which limits analysis of reading-specific regions such as the visual word form area, which is a commonly hypoactivated region in dyslexia. This region is a common one of interest in dyslexia, yet the researchers measured the I/E balance in only one region of interest, specific to the language network.

      Reviewer #3 (Public review):

      Summary:

      This study by Glica and colleagues utilized EEG (i.e., Beta power, Gamma power, and aperiodic activity) and 7T MRS (i.e., MRS IE ratio, IE balance) to reevaluate the neural noise hypothesis in dyslexia. Supported by Bayesian statistics, their results show convincing evidence of no differences in EI balance between groups, challenging the neural noise hypothesis.

      Strengths:

      Combining EEG and 7T MRS, this study utilized both the indirect (i.e., Beta power, Gamma power, and aperiodic activity) and direct (i.e., MRS IE ratio, IE balance) measures to reevaluate the neural noise hypothesis in dyslexia.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      When you search for something, you need to maintain some representation (a "template") of that target in your mind/brain. Otherwise, how would you know what you were looking for? If your phone is in a shocking pink case, you can guide your attention to pink things based on a target template that includes the attribute 'pink'. That guidance should get you to the phone pretty effectively if it is in view. Most real-world searches are more complicated. If you are looking for the toaster, you will make use of your knowledge of where toasters can be. Thus, if you are asked to find a toaster, you might first activate a template of a kitchen or a kitchen counter. You might worry about pulling up the toaster template only after you are reasonably sure you have restricted your attention to a sensible part of the scene.

      Zhou and Geng are looking for evidence of this early stage of guidance by information about the surrounding scene in a search task. They train Os to associate four faces with four places. Then, with Os in the scanner, they show one face - the target for a subsequent search. After an 8 sec delay, they show a search display where the face is placed on the associated scene 75% of the time. Thus, attending to the associated scene is a good idea. The questions of interest are "When can the experimenters decode which face Os saw from fMRI recording?" "When can the experimenters decode the associated scene?" and "Where in the brain can the experimenters see evidence of this decoding?" The answer is that the face but not the scene can be read out during the face's initial presentation. The key finding is that the scene can be read out (imperfectly but above chance) during the subsequent delay when Os are looking at just a fixation point. Apparently, seeing the face conjures up the scene in the mind's eye.

      This is a solid and believable result. The only issue, for me, is whether it is telling us anything specifically about search. Suppose you trained Os on the face-scene pairing but never did anything connected to the search. If you presented the face, would you not see evidence of recall of the associated scene? Maybe you would see the activation of the scene in different areas and you could identify some areas as search specific. I don't think anything like that was discussed here.

      You might also expect this result to be asymmetric. The idea is that the big scene gives the search information about the little face. The face should activate the larger useful scene more than the scene should activate the more incidental face, if the task was reversed. That might be true if the finding is related to a search where the scene context is presumed to be the useful attention guiding stimulus. You might not expect an asymmetry if Os were just learning an association.

      It is clear in this study that the face and the scene have been associated and that this can be seen in the fMRI data. It is also clear that a valid scene background speeds the behavioral response in the search task. The linkage between these two results is not entirely clear but perhaps future research will shed more light.

      It is also possible that I missed the clear evidence of the search-specific nature of the activation by the scene during the delay period. If so, I apologize and suggest that the point be underlined for readers like me.

      We will respond to this question by acknowledging that the reviewer is right in that the delay period activation of the scene is not necessarily search-specific. We will then discuss how this possibility affects the interpretation of our results and what kind of studies would need to be conducted in order to fully establish a causal link between delay period activity and visual search performance. We will also discuss the literature on cued attention and situate our work within the context of these other studies that have used similar task paradigms to infer attentional processes. Finally, we will discuss the interpretation of delay period activity in PPA and IFJ.

      Reviewer #2 (Public review):

      Summary:

      This work is one of the best instances of a well-controlled experiment and theoretically impactful findings within the literature on templates guiding attentional selection. I am a fan of the work that comes out of this lab and this particular manuscript is an excellent example as to why that is the case. Here, the authors use fMRI (employing MVPA) to test whether during the preparatory search period, a search template is invoked within the corresponding sensory regions, in the absence of physical stimulation. By associating faces with scenes, a strong association was created between two types of stimuli that recruit very specific neural processing regions - FFA for faces and PPA for scenes. The critical results showed that scene information that was associated with a particular cue could be decoded from PPA during the delay period. This result strongly supports the invoking of a very specific attentional template.
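
      The decoding logic described above (reading out the associated scene from delay-period activity patterns) can be sketched as a leave-one-run-out nearest-centroid classifier. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the classifier choice, the function names, and the toy three-voxel patterns are all hypothetical.

```python
def _mean_pattern(patterns):
    """Average a list of equal-length voxel patterns element-wise."""
    n = len(patterns)
    return [sum(vals) / n for vals in zip(*patterns)]


def _sq_dist(u, v):
    """Squared Euclidean distance between two patterns."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def leave_one_run_out_accuracy(runs):
    """Cross-validated decoding accuracy.

    runs: list of dicts, one per scanner run, mapping a condition label
    to the voxel pattern measured for that condition in that run.
    Each run is held out once; centroids come from the remaining runs.
    """
    correct = 0
    total = 0
    for held_out, test_run in enumerate(runs):
        train = [r for i, r in enumerate(runs) if i != held_out]
        centroids = {
            label: _mean_pattern([r[label] for r in train]) for label in test_run
        }
        for true_label, pattern in test_run.items():
            predicted = min(
                centroids, key=lambda lab: _sq_dist(pattern, centroids[lab])
            )
            correct += predicted == true_label
            total += 1
    return correct / total


# Hypothetical delay-period patterns in a scene-selective region, three runs.
runs = [
    {"scene_A": [1.0, 0.1, 0.0], "scene_B": [0.1, 1.0, 0.0]},
    {"scene_A": [0.9, 0.0, 0.1], "scene_B": [0.0, 1.1, 0.1]},
    {"scene_A": [1.1, 0.2, 0.0], "scene_B": [0.2, 0.9, 0.0]},
]
accuracy = leave_one_run_out_accuracy(runs)
chance = 1 / len(runs[0])
```

      Accuracy reliably above the chance level (here one divided by the number of conditions) is what "decodable" means in this context; holding out whole runs keeps training and test patterns independent.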

      Strengths:

      There is so much to be impressed with in this report. The writing of the manuscript is incredibly clear. The experimental design is clever and innovative. The analysis is sophisticated and also innovative. The results are solid and convincing.

      Weaknesses:

      I only have a few weaknesses to point out.

      This point is not so much of a weakness, but a further test of the hypothesis put forward by the authors. The delay period was long - 8 seconds. It would be interesting to split the delay period into the first 4 seconds and the last 4 seconds and run the same decoding analyses. The hypothesis here is that semantic associations take time to evolve, and it would be great to show that decoding gets stronger in the second delay period as opposed to the period right after the cue. I don't think this is necessary for publication, but I think it would be a stronger test of the template hypothesis.

      We will conduct the suggested analysis. Depending on the outcome, we will include it in supplemental materials or the main text.

      Typo in the abstract: "curing" vs "during."

      We will fix this.

      It is hard to know what to do with significant results in ROIs that are not motivated by specific hypotheses. However, for Figure 3, what are the explanations for ROIs that show significant differences above and beyond the direct hypotheses set out by the authors?

      We will address how each of the ROIs was selected based on the use of a priori networks as masks with ROIs as sub-parcels. We will explain why specific ROIs were associated with the strongest hypotheses but how the entire networks are relevant and related to existing literatures on attentional control and working memory. This content will be included in the introduction and discussion sections.

      Reviewer #3 (Public review):

      The manuscript contains a carefully designed fMRI study, using MVPA pattern analysis to investigate which high-level association cortices contain target-related information to guide visual search. A special focus is hereby on so-called 'target-associated' information, that has previously been shown to help in guiding attention during visual search. For this purpose the authors trained their participants and made them learn specific target-associations, in order to then test which brain regions may contain neural representations of those learnt associations. They found that at least some of the associations tested were encoded in prefrontal cortex during the cue and delay period.

      The manuscript is very carefully prepared. As far as I can see, the statistical analyses are all sound and the results integrate well with previous findings.

      I have no strong objections against the presented results and their interpretation.

      Thank you.

    1. Author response:

      eLife Assessment

      This study addresses a novel and interesting question about how the rise of the Qinghai-Tibet Plateau influenced patterns of bird migration, employing a multi-faceted approach that combines species distribution data with environmental modeling. The findings are valuable for understanding avian migration within a subfield, but the strength of evidence is incomplete due to critical methodological assumptions about historical species-environment correlations, limited tracking data, and insufficient clarity in species selection criteria. Addressing these weaknesses would significantly enhance the reliability and interpretability of the results.

      We would like to thank you and two anonymous reviewers for your careful, thoughtful, and constructive feedback on our manuscript. These reviews made us revisit a lot of our assumptions and we believe the paper will be much improved as a result. In addition to minor points, we will make three main changes to our manuscript in response to the reviews. First, we will address the concerns on the assumptions of historical species-environment correlations from perspectives of both theoretical and empirical evidence. Second, we will discuss the benefits and limitations of using tracking data in our study and demonstrate how the findings of our study are consolidated with results of previous studies. Third, we will clarify our criteria for selecting species in terms of both eBird and tracking data.

      Below, we respond to each comment in turn. Once again, we thank you all for your feedback.

      Reviewer #1 (Public review):

      Strengths:

      This is an interesting topic and a novel theme. The visualisations and presentation are to a very high standard. The Introduction is very well-written and introduces the main concepts well, with a clear logical structure and good use of the literature. The methods are detailed and well described and written in such a fashion that they are transparent and repeatable.

      We appreciate the reviewer’s careful reading of our manuscript, encouraging comments and constructive suggestions.

      Weaknesses:

      I only have one major issue, which is possibly a product of the structure requirements of the paper/journal. This relates to the Results and Discussion, line 91 onwards. I understand the structure of the paper necessitates delving immediately into the results, but it is quite hard to follow due to a lack of background information. In comparison to the Methods, which are incredibly detailed, the Results in the main section read as quite superficial. They provide broad overviews of broad findings but I found it very hard to actually get a picture of the main results in their current form. For example, how the different species factor in, etc.

      Yes, it is the journal's requirement to format short reports in this way (Methods follow the Results and Discussion). As suggested, in the revision we will elaborate on details of our findings, especially the species-specific responses, in terms of (i) shifts of distribution of avian breeding and wintering areas under the influence of the uplift of the Qinghai-Tibetan Plateau, and (ii) major factors that shape current migration patterns of birds in the Plateau. We will also better reference the approaches we used in the study.

      Reviewer #2 (Public review):

      Summary:

      The study tries to assess how the rise of the Qinghai-Tibet Plateau affected patterns of bird migration between their breeding and wintering sites. They do so by correlating the present distribution of the species with a set of environmental variables. The data on species distributions come from eBird. The main issue lies in the problematic assumption that species correlations between their current distribution and environment were about the same before the rise of the Plateau. There is no ground truthing and the study relies on Movebank data of only 7 species which are not even listed in the study. Similarly, the study does not outline the boundaries of breeding sites NE of the Plateau. Thus it is absolutely unclear potentially which breeding populations it covers.

      We are very grateful for the careful review and helpful suggestions. We will revise the manuscript carefully in response to the reviewer’s comments and believe that it will be much improved as a result. Below are our point-by-point replies to the comments.

      Strengths:

      I like the approach for how you combined various environmental datasets for the modelling part.

      We appreciate the reviewer’s encouragement.

      Weaknesses:

      The major weakness of the study lies in the assumption that species correlations between their current distribution and environments found today are back-projected to the far past before the rise of the Q-T Plateau. This would mean that species responses to the environmental cues do not evolve which is clearly not true. Thus, your study is a very nice intellectual exercise of too many ifs.

      This is a valid concern. We will address this from both the perspectives of the theoretical design of our study and empirical evidence.

      First, we agree with the reviewer that species responses to environmental cues might vary over time. Nonetheless, the simulated environments before the uplift of the plateau serve as a counterfactual state in our study. Counterfactual is an important concept to support causation claims by comparing what happened to what would have happened in a hypothetical situation: “If event X had not occurred, event Y would not have occurred” (Lewis 1973). Recent years have seen an increasing application of the counterfactual approach to detect biodiversity change, i.e., comparing diversity between the counterfactual state and real estimates to attribute the factors causing such changes (e.g., Gonzalez et al. 2023). Whilst we do not aim to provide causal inferences for avian distributional change, using the counterfactual approach, we are able to estimate the influence of the plateau uplift by detecting the changes of avian distributions, i.e., by comparing where the birds would have been distributed without the plateau to where they are currently distributed. We regard the counterfactual environments as a powerful tool for eliminating, to the extent possible, vagueness, as opposed to simply describing the current distributions of birds. Therefore, we assume species’ responses to environments are conservative and their evolution should not discount our findings. We will clarify this in both the Introduction and Methods.

      Second, we used species distribution modelling to contrast the distributions of birds before and after the uplift of the plateau under the assumption that species tend to keep their ancestral ecological traits over time (i.e., niche conservatism). This indicates a high probability for species to distribute in similar environments wherever suitable. Particularly, considering birds are more likely to be influenced by food resources (Martins et al. 2024), and the distribution of available food before the uplift (Jia et al. 2020), we believe the findings can provide valuable insights into the influence of the plateau on avian migratory patterns. Having said that, we acknowledge other factors, e.g., carbon dioxide concentrations (Zhang et al. 2022), can influence the simulations of environments and our prediction of avian distribution. We will clarify the assumptions and evidence we have for the modelling in Methods. We will further point out the direction for future studies in the Discussion.

      The second major drawback lies in the way you estimate the migratory routes of particular birds. No matter how good the data eBird provides is, you do not know population-specific connections between wintering and breeding sites. Some might overwinter in India, some populations in Africa and you will never know the teleconnections between breeding and wintering sites of particular species. The few available tracking studies (seven!) are too coarse and with limited aspects of migratory connectivity to give answer on the target questions of your study.

      We agree with the reviewer that establishing interconnections for birds is important for estimating the migration patterns of birds. We employed a dynamic model to assess their weekly distributions. Thus, we can track the movement of species every week, and capture the breeding and wintering areas for specific populations. That being said, we acknowledge that our approach can be subject to the patchy sampling of eBird data. We will better demonstrate this in the main text.

      Tracking data can provide valuable insights into the movement patterns of species but are limited to small numbers of species due to the considerable costs and time needed. We aimed to adopt the tracking data to examine the influence of focal factors on avian migration patterns, but only seven species, to the best of our ability, were acquired. Moreover, similar results were found in studies that used tracking data to estimate the distribution of breeding and wintering areas of birds in the plateau (e.g., Prosser et al. 2011, Zhang et al. 2011, Zhang et al. 2014, Liu et al. 2018, Kumar et al. 2020, Wang et al. 2020, Pu and Guo 2023, Yu et al. 2024, Zhao et al. 2024). We believe the conclusions based on seven species are rigorous, but their implications could be restricted by the number of tracked species we obtained. We will demonstrate how our findings on breeding and wintering areas of birds are reinforced by other studies reporting the locations of those areas. We will also add a separate caveat section to discuss the limitations stated above.

      Your set of species is unclear, selection criteria for the 50 species are unknown and variability in their migratory strategies is likely to affect the direction of the effects.

      We will clarify the selection criteria for the 50 species. We first obtained a full list of birds in the plateau from Prins and Namgail (2017). We then extracted species identified as full migrants in Birdlife International (https://datazone.birdlife.org/species/spcdistPOS) from the full list.

      In addition, the position of the breeding sites relative to the Q-T plate will affect the azimuths and resulting migratory flyways. So in fact, we have no idea what your estimates mean in Figure 2.

      We calculated the azimuths not only by the angles between breeding sites and wintering sites but also based on the angles between the stopovers of birds. Therefore, the azimuths are influenced by the relative positions of breeding, wintering and stopover sites. We will better explain this both in the Methods and legend of Figure 2.
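      As an illustration of the idea described above, the following is a minimal sketch of how a per-leg azimuth could be computed along a breeding, stopover and wintering sequence. It assumes the initial great-circle bearing as the definition of azimuth, which the response does not specify; the function names and the example coordinates are hypothetical and are not taken from the study.

```python
import math


def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2.

    Inputs are in decimal degrees; the result is in degrees clockwise
    from north, in the range [0, 360). Assumed definition of azimuth.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360.0


def leg_azimuths(sites):
    """Azimuth of each consecutive leg in an ordered list of (lat, lon) sites."""
    return [initial_bearing(*a, *b) for a, b in zip(sites, sites[1:])]


# Hypothetical route: breeding site, one stopover, wintering site.
route = [(50.0, 95.0), (36.0, 100.0), (25.0, 90.0)]
azimuths = leg_azimuths(route)
```

      Because each leg contributes its own azimuth, moving a stopover east or west changes the result even when the breeding and wintering sites stay fixed.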

      There is no way one can assess the performance of your statistical exercises, e.g. performances of the models.

      As suggested, we will add the AUC values to assess the performances of the models.
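      For readers unfamiliar with the metric, the sketch below shows one standard way to compute AUC: as the probability that a randomly chosen presence record receives a higher predicted suitability than a randomly chosen absence (or background) record, with ties counted as one half. It is an illustration under that assumption only; the function name and the scores are hypothetical, and it does not describe the software the authors used.

```python
def auc(presence_scores, absence_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (presence, absence) pairs in which the
    presence record has the higher score; ties count as 0.5.
    """
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical predicted suitabilities for two presences and two absences.
example_auc = auc([0.9, 0.3], [0.5, 0.1])  # 3 of 4 pairs ranked correctly
```

      A value of 0.5 corresponds to a model that ranks presences no better than chance, and 1.0 to perfect ranking.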

      References

      Gonzalez, A., J. M. Chase, and M. I. O'Connor. 2023. A framework for the detection and attribution of biodiversity change. Philosophical Transactions of the Royal Society B: Biological Sciences 378: 20220182.

      Jia, Y., H. Wu, S. Zhu, Q. Li, C. Zhang, Y. Yu, and A. Sun. 2020. Cenozoic aridification in Northwest China evidenced by paleovegetation evolution. Palaeogeography, Palaeoclimatology, Palaeoecology 557:109907.

      Kumar, N., U. Gupta, Y. V. Jhala, Q. Qureshi, A. G. Gosler, and F. Sergio. 2020. GPS-telemetry unveils the regular high-elevation crossing of the Himalayas by a migratory raptor: implications for definition of a “Central Asian Flyway”. Scientific Reports 10:15988.

      Lewis, D. 1973. Counterfactuals. Oxford: Blackwell.

      Liu, D., G. Zhang, H. Jiang, and J. Lu. 2018. Detours in long-distance migration across the Qinghai-Tibetan Plateau: individual consistency and habitat associations. PeerJ 6:e4304.

      Martins, L. P., D. B. Stouffer, P. G. Blendinger, K. Böhning-Gaese, J. M. Costa, D. M. Dehling, C. I. Donatti, C. Emer, M. Galetti, R. Heleno, Í. Menezes, J. C. Morante-Filho, M. C. Muñoz, E. L. Neuschulz, M. A. Pizo, M. Quitián, R. A. Ruggera, F. Saavedra, V. Santillán, M. Schleuning, L. P. da Silva, F. Ribeiro da Silva, J. A. Tobias, A. Traveset, M. G. R. Vollstädt, and J. M. Tylianakis. 2024. Birds optimize fruit size consumed near their geographic range limits. Science 385:331-336.

      Prins, H. H. T., and T. Namgail. 2017. Bird migration across the Himalayas : wetland functioning amidst mountains and glaciers. Cambridge University Press, Cambridge.

      Prosser, D. J., P. Cui, J. Y. Takekawa, M. Tang, Y. Hou, B. M. Collins, B. Yan, N. J. Hill, T. Li, Y. Li, F. Lei, S. Guo, Z. Xing, Y. He, Y. Zhou, D. C. Douglas, W. M. Perry, and S. H. Newman. 2011. Wild bird migration across the Qinghai-Tibetan Plateau: a transmission route for highly pathogenic H5N1. PloS One 6:e17622.

      Pu, Z., and Y. Guo. 2023. Autumn migration of black-necked crane (Grus nigricollis) on the Qinghai-Tibetan and Yunnan-Guizhou plateaus. Ecology and Evolution 13:e10492.

      Wang, Y., C. Mi, and Y. Guo. 2020. Satellite tracking reveals a new migration route of black-necked cranes (Grus nigricollis) in Qinghai-Tibet Plateau. PeerJ 8:e9715.

      Yu, X., G. Song, H. Wang, Q. Wei, C. Jia, and F. Lei. 2024. Migratory flyways and connectivity of brown headed gulls (Chroicocephalus brunnicephalus) revealed by GPS tracking. Global Ecology and Conservation 56:e03340.

      Zhang, G.G., D.P. Liu, Y.Q. Hou, H.X. Jiang, M. Dai, F.W. Qian, J. Lu, T. Ma, L.X. Chen, and Z. Xing. 2014. Migration routes and stopover sites of Pallas’s gulls Larus ichthyaetus breeding at Qinghai Lake, China, determined by satellite tracking. Forktail 30:104-108.

      Zhang, G.G., D.P. Liu, Y.Q. Hou, H.X. Jiang, M. Dai, F.W. Qian, J. Lu, Z. Xing, and F.S. Li. 2011. Migration routes and stop-over sites determined with satellite tracking of bar-headed geese (Anser indicus) breeding at Qinghai Lake, China. Waterbirds 34:112-116.

      Zhang, R., D. Jiang, C. Zhang, and Z. Zhang. 2022. Distinct effects of Tibetan Plateau growth and global cooling on the eastern and central Asian climates during the Cenozoic. Global and Planetary Change 218:103969.

      Zhao, T., W. Heim, R. Nussbaumer, M. van Toor, G. Zhang, A. Andersson, J. Bäckman, Z. Liu, G. Song, M. Hellström, J. Roved, Y. Liu, S. Bensch, B. Wertheim, F. Lei, and B. Helm. 2024. Seasonal migration patterns of Siberian Rubythroat (Calliope calliope) facing the Qinghai–Tibet Plateau. Movement Ecology 12:54.

    1. Author response:

      Reviewer #2 (Public review):

      (1) Given their results the authors conclude that upregulation of Frizzled on the plasma membrane is not sufficient to explain the stabilization of beta-catenin seen in the ZNRF3/RNF43 mutant cells. This interpretation is sound, and they suggest in the discussion that ZNRF3/RNF43-mediated ubiquitination could serve as a sorting signal to sort endocytosed FZD to lysosomes for degradation and that absence or inhibition of this process would promote FZD recycling. This should be relatively easy to test using surface biotinylation experiments and would considerably strengthen the manuscript.

      Thank you for your valuable suggestions and comments. We will perform cell surface biotinylation experiments.

      (2) The authors show that the FZD5 CRD domain is required for endocytosis since a mutant FZD5 protein in which the CRD is removed does not undergo endocytosis. This is perhaps not surprising since this is the site of Wnt binding, but the authors show that a chimeric FZD5CRD-FZD4 receptor can confer Wnt-dependent endocytosis to an otherwise endocytosis incompetent FZD4 protein. Since the linker region between the CRD and the first TM differs between FZD5 and FZD4 it would be interesting to understand whether the CRD specifically or the overall arrangement (such as the spacing) is the most important determinant.

      Our results in Fig. 1F-G clearly show that the CRD of FZD5 specifically is both necessary and sufficient for Wnt3a/5a-induced FZD5 endocytosis, as replacing the CRD alone in FZD5 with the CRD from either FZD4 or FZD7 completely abolished Wnt-induced endocytosis, whereas replacing the CRD alone in FZD4 or FZD7 with the FZD5 CRD alone could confer Wnt-induced endocytosis.

      (3) I find it surprising that only FZD5 and FZD8 appear to undergo endocytosis or be stabilized at the cell surface upon ZNRF3/RNF43 knockout. Is this consistent with previous literature? Is that a cell-specific feature? These findings should be tested in a different cell line, with possibly different relative levels of ZNRF3 and RNF43 expression.

      Thank you for your comments and suggestions. Our finding that ZNRF3/RNF43 specifically regulates FZD5/8 degradation is consistent with recently published studies in which FZD5 is required for the survival of RNF43-mutant PDAC or colorectal cancer cells (Nature Medicine, 2017, PMID: 27869803) and FZD5 is required for the maintenance of intestinal stem cells (Developmental Cell, 2024, PMID: 39579768 and 39579769), and in both cases, FZDs other than FZD5/8 are also expressed but not sufficient to compensate for the function of FZD5. The mechanism by which Wnt3a/5a specifically induces FZD5/8 endocytosis and degradation is currently unknown and needs to be explored in the future. We speculate that Wnt binding to FZD5/8 may recruit another protein on the cell surface to specifically facilitate FZD5/8 endocytosis. On the other hand, we cannot exclude the possibility that Wnts other than Wnt3a/5a may induce the endocytosis and degradation of FZDs other than FZD5/8 since there are 19 Wnts and 10 FZDs in humans. We will perform flow cytometry experiments using FZD5/8-specific antibodies to examine whether Wnt3a/5a induces FZD5/8 endocytosis in more cell lines.

      (4) If FZD7 is not a substrate of ZNRF3/RNF43 and therefore is not ubiquitinated and degraded, how do the authors reconcile that its overexpression does not lead to elevated cytosolic beta-catenin levels in Figure 5B?

      We are currently not sure of the mechanism underlying this result. Considering that most FZDs are expressed in 293A cells, we do not know how much of the mature form of overexpressed FZD7 was presented to the plasma membrane.

      (5) For Figure 5B, it would be interesting if the authors could evaluate whether overexpression of FZD5 in the ZNRF3/RNF43 double knockout lines would synergize and lead to further increase in cytosolic beta-catenin levels. As control if the substrate selectivity is clear FZD7 overexpression in that line should not do anything.

      We will perform these experiments as you suggested.

      (6) In Figure 6G, the authors need to show cytosolic levels of beta-catenin in the absence of Wnt in all cases.

      We did not add Wnt CM in this experiment. RSPO1 activity, which relies on endogenous Wnt, has been well documented in previous studies.

      (7) Since the authors show that DVL is not involved in the Wnt- and ZNRF3-dependent endocytosis, they should repeat the proximity biotinylation experiment in Figure 7 in the DVL triple KO cells. This is an important experiment since previous studies showed that DVL was required for the ZNRF3/RNF43-mediated ubiquitination of FZD.

      Thank you for your valuable suggestions. We will perform the proximity biotinylation experiment in DVL TKO cells.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      This manuscript aimed to study the role of Rudhira (also known as Breast Carcinoma Amplified Sequence 3), an endothelium-restricted microtubule-associated protein, in regulating TGFβ signaling. The authors demonstrate that Rudhira is a critical signaling modulator for TGFβ signaling by releasing Smad2/3 from cytoskeletal microtubules and how Rudhira is a Smad2/3 target gene. Taken together, the authors provide a model of how Rudhira contributes to TGFβ signaling activity to stabilize the microtubules, which is essential for vascular development.

      Strengths

      The study used different methods and techniques to achieve aims and support conclusions, such as Gene Ontology analysis, functional analysis in culture, immunostaining analysis, and proximity ligation assay. This study provides an unappreciated additional layer of TGFβ signaling activity regulation after ligand receptor interaction.

      We thank the reviewer for acknowledging the importance of our study and providing a clear summary of our findings.

      Weaknesses

      (1) It is unclear how current findings provide a better understanding of Rudhira KO mice, which the authors published some years ago.

      Our previous study demonstrated that Rudhira KO mice have a predominantly developmental cardiovascular phenotype that phenocopies TGFβ loss of function (Shetty, Joshi et al., 2018). Additionally, we found that at the molecular level, Rudhira regulates cytoskeletal organization (Jain et al., 2012; Joshi and Inamdar, 2019). Our current study builds upon these previous findings, showing an essential role of Rudhira in maintaining TGFβ signaling and controlling the microtubule cytoskeleton during vascular development. On one hand Rudhira regulates TGFβ signaling by promoting the release of Smads from microtubules, while on the other, Rudhira is a TGFβ target essential for stabilizing microtubules. Thus, our current study provides a molecular basis for Rudhira function in cardiovascular development.

      (2) Why do they use HEK cells instead of SVEC cells in Figure 2 and 4 experiments?

      Our earlier studies have characterized the role of Rudhira in detail using both loss and gain of function methods in multiple cell types (Jain et al., 2012; Shetty, Joshi et al., 2018; Joshi and Inamdar, 2019). As endothelial cells are particularly difficult to transfect, and because the function of Rudhira in promoting cell migration is conserved in HEK cells, it was practical and relevant to perform these experiments in HEK cells (Figures 2 and 4E).

      (3) A model shown in Figure 5E needs improvement to grasp their findings easily.

      We have modified Figure 5E for clarity.

      Reviewer #2 (Public Review):

      Summary

      It was first reported in 2000 that Smad2/3/4 are sequestered to microtubules in resting cells and TGF-β stimulation releases Smad2/3/4 from microtubules, allowing activation of the Smad signaling pathway. Although the finding was subsequently confirmed in a few papers, the underlying mechanism has not been explored. In the present study, the authors found that Rudhira/breast carcinoma amplified sequence 3 is involved in the release of Smad2/3 from microtubules in response to TGF-β stimulation. Rudhira is also induced by TGF-β and is probably involved in the stabilization of microtubules in the delayed phase after TGF-β stimulation. Therefore, Rudhira has two important functions downstream of TGF-β in the early as well as delayed phase.

      Strengths:

      This work aimed to address an unsolved question on one of the earliest events after TGF-β stimulation. Based on loss-of-function experiments, the authors identified a novel and potentially important player, Rudhira, in the signal transmission of TGF-β.

      We thank the reviewer for the critical evaluation and appreciation of our findings.

      Weaknesses:

      The authors have identified a key player that triggers Smad2/3 released from microtubules after TGF-β stimulation probably via its association with microtubules. This is an important first step for understanding the regulation of Smad signaling, but underlying mechanisms as well as upstream and downstream events largely remain to be elucidated.

      We acknowledge that the mechanisms regulating cytoskeletal control of Smad signaling are far from clear, but these are out of scope of this manuscript. This manuscript rather focuses on Rudhira/Bcas3 as a pivot to understand vascular TGFβ signaling and microtubule connections.

      (1) The process of how Rudhira causes the release of Smad proteins from microtubules remains unclear. The statement that "Rudhira-MT association is essential for the activation and release of Smad2/3 from MTs" (lines 33-34) is not directly supported by experimental data.

      We agree with the reviewer’s comment. Although we provide evidence that the loss of Rudhira (and thereby deduced loss of Rudhira-MT association) prevents release of Smad2/3 from MTs (Fig 3C), it does not confirm the requirement of Rudhira-MT association for this. In light of this, we have modified the statement to “Rudhira associates with MTs and is essential for the activation and release of Smad2/3 from MTs”.

      (2) The process of how Rudhira is mobilized to microtubules in response to TGF-β remains unclear.

      Our previous study showed that Rudhira associates with microtubules, and preferentially binds to stable microtubules (Jain et al., 2012; Joshi and Inamdar, 2019). Since TGFβ stimulation is known to stabilize microtubules, we hypothesize that TGFβ stimulation increases Rudhira binding to stable microtubules. We have mentioned this in our revised manuscript.

      (3) After Rudhira releases Smad proteins from microtubules, Rudhira stabilizes microtubules. The process of how cells return to a resting state and recover their responsiveness to TGF-β remains unclear.

      We show that dissociation of Smads from microtubules is an early response and stabilization of microtubules is a late TGFβ response. However, we agree that the sequence of these molecular events has not been characterized in depth in this or any other study, making it difficult to assign causal roles (e.g., whether release of Smads from MTs is a prerequisite for MT stabilization by Rudhira) or reversibility. Nevertheless, the TGFβ pathway is autoregulatory, leading to increased turnover of receptors and Smads and increased expression of inhibitory Smads, which may recover responsiveness to TGFβ. Additionally, the still short turnover time of stable microtubules (several minutes to hours) may also promote a quick return to the resting state. We have discussed this in our revised manuscript.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for The Authors):

      (1) Overall: Duration of TGF-β stimulation in cell-based assays should be described in the legends for readers' convenience. Avoid simple bar graphs because sample numbers are only 3. A scatter plot should be super-imposed.

      Details added, as suggested. Duration of treatment is mentioned in Materials and methods section for figures 1C-D; 2A-B; 3; 4A-C; 5A-C; S2D; S3A-C; S4C, D. Bar graphs have been replaced with a bar + scatter plot. Note that, as the Excel file for data related to fig 4A was corrupted, we repeated the experiments to generate fresh data. Hence the graph had to be replaced. However, the result holds true as before.

      (2) Figure 1A: This panel is too small. Gene names are almost invisible.

      Modified for clarity.

      (3) Figure 1B: Show TGFβRI expression by immunoblotting (re-probing) to verify that it is expressed in the rightmost lane.

      TGFβRI overexpression was confirmed by qPCR in a replicate in the same experiment (Fig S2C).

      (4) Figure 1C: Show expression of Rudhira. In addition, confirm the positions of molecular weight markers. Smad2 migrated slower than pSmad2.

      Rudhira expression is shown in Fig S1B. Molecular weight markers have been corrected.

      (5) Figure 3A: This panel shows a negative result that Smad2/3 fails to interact with Rudhira. A positive control, for example, Smad4, would make the data convincing.

      Although it would be nice to have a positive control for interaction, we do not agree that a positive control of Smad4 is essential for our conclusion from this experiment, which is that ‘we were unable to detect an interaction between Rudhira and Smad2/3’.

      (6) Fig. 3B: Show Rudhira blot. If possible, show that the Rudhira-MT association precedes Smad phosphorylation by a time course experiment. This is an important point but not experimentally demonstrated.

      The interaction between Rudhira and microtubules with or without TGFβ is demonstrated by PLA (Fig 3E). Although important, the suggested time course experiments to assess the sequence of events are beyond the scope of this manuscript. 

      (7) Figure 3E: Does the process require the type I receptor kinase activity or non-Smad signaling pathways?

      Since TGFβ pathway is complex and is regulated at multiple steps, this possibility has not been tested and is beyond the scope of current study.

      (8) Figure 4A: The authors did not examine if these elements are functional. Therefore, this panel can be presented as a supplementary figure.

      As suggested, the panel has been moved to supplementary information.

      (9) Figure 4E: The figure legend does not say that cells were TGF-β-stimulated. It remains unclear if Smad2 and Smad3 are involved in Rudhira expression as phosphorylated or non-phosphorylated forms. Therefore, the authors should show a pSmad2 blot. In the absence of TGF-β stimulation, Smad2 and Smad3 are expected to be sequestrated to microtubules and therefore not phosphorylated. In the case that cells were stimulated with TGF-β, show if Rudhira is induced by TGF-β in HEK293T cells. This is not shown in this manuscript.

      This experiment was not performed under regulated conditions with or without TGFβ, hence the sensitivity to TGFβ could not be assessed. Cells were not stimulated with exogenous TGFβ, but cultured in regular medium with serum, which can have up to ~40 ng/ml of TGFβ (latent and active). Additionally, owing to severe depletion of Smad2 or Smad3 by shRNAs, we expect sufficient loss of phospho-Smad2/3.

      (10) Figure S1A: Rudhira migrated at the position corresponding to 91 kD only in this panel.

      Corrected the position of molecular weight marker.

      (11) Line 205-206, "Since in vivo studies indicate that rudhira depletion severely affects the TGFβ pathway [11]": Refer to Reference 11. The paper does not say anything about TGFβ.

      Reference corrected to Ref #14.

      (12) Smad4 was previously reported to be sequestered to microtubules [Ref. 7]. Does Rudhira release Smad4 also?

      This is an interesting point which could be followed up in our future studies.

      (13) It would be nice if the authors examined how Rudhira causes the release of Smad2/3 from microtubules. Currently, it remains unclear whether the association of Rudhira to microtubules is required for the release of Smad2/3. Does a Rudhira mutant lacking microtubule binding fail to induce the release of Smad2/3 after TGF-β stimulation? If so, do Rudhira and Smad2/3 share the same binding site on microtubules? In that case, the mechanism can be regarded as "competitive".

      This is a thoughtful experiment much beyond the scope of current manuscript. In our previous study we were able to localize the Tubulin binding sites of Rudhira primarily to its Bcas3 domain (Joshi and Inamdar, 2019), however the equivalent sites in Tubulin were not assessed. While MH2 domains of Smad2/3 bind β-tubulin, amino acids 114-243 of β-tubulin bind to Smad2/3 (Dai et al., 2007). A systematic study of these tripartite interactions including Rudhira would be an interesting follow up for our future study.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):  

      Summary:  

      The authors show that SVZ derived astrocytes respond to a middle cerebral artery occlusion (MCAO) hypoxia lesion by secreting and modulating hyaluronan at the edge of the lesion (penumbra) and that hyaluronan is a chemoattractant to SVZ astrocytes. They use lineage tracing of SVZ cells to determine their origin. They also find that SVZ derived astrocytes express Thbs-4 but astrocytes at the MCAO-induced scar do not. Also, they demonstrate that decreased HA in the SVZ is correlated with gliogenesis. While much of the paper is descriptive/correlative they do overexpress Hyaluronan synthase 2 via viral vectors and show this is sufficient to recruit astrocytes to the injury. Interestingly, astrocytes preferred to migrate to the MCAO than to the region of overexpressed HAS2.

      Strengths:  

      The field has largely ignored the gliogenic response of the SVZ, especially with regards to astrocytic function. These cells and especially newborn cells may provide support for regeneration. Emigrated cells from the SVZ have been shown to be neuroprotective via creating pro-survival environments, but their expression and deposition of beneficial extracellular matrix molecules is poorly understood. Therefore, this study is timely and important. The paper is very well written and the flow of results logical.  

      Comments on revised version:  

      The authors have addressed my points and the paper is much improved. Here are the salient remaining issues that I suggest be addressed.  

      We appreciate the feedback by the reviewer, and we are glad that the paper is considered to be much improved. We have done our best to address the remaining issues in this 2nd revision.

      The authors have still not shown, using loss of function studies, that Hyaluronan is necessary for SVZ astrogenesis and or migration to MCAO lesions.

      This is true. Unfortunately, complete removal of hyaluronan (via Hyase) triggers epilepsy, already described in 1963 by James Young (Exp Neurol paper). Degradation by Hyase also provokes neuroinflammation (Soria et al., 2020 Nat Commun). Two alternatives could be 1) partial depletion with Has inhibitor 4MU (but it is also associated with increased inflammation) or 2) a Has-KO mouse, such as Has3-/- (Arranz et al., 2014), although, to our knowledge, this mouse line is not openly available. We have added a sentence in line 332 addressing this shortcoming: “Loss-of-function studies, using HA-depletion models or HA synthase (Has)-deficient mice are still needed to corroborate this finding, although the inflammation associated with HA deficiency might confound interpretation.”

      (1) The co-expression of EGFr with Thbs4 and the literature examination is useful.  

      We thank the reviewer for the kind comment.

      (2) Too bad they cannot explain the lack of effect of the MCAO on type C cells. The comparison with kainate-induced epilepsy in the hippocampus may or may not be relevant.

      As stated in the previous response, we also found this interesting, and it does warrant further exploration by looking into possible direct NSC-astrocyte differentiation. But we believe that both this possible direct differentiation and the reactive status for these astrocytes are out of the scope of the study. We will not speculate about this in the discussion, either.

      (3) Thanks for including the orthogonal confocal views in Fig S6D.  

      (4) The statement that "BrdU+/Thbs4+ cells mostly in the dorsal area" and therefore they mostly focused on that region is strange. Figure 8 clearly shows Thbs4 staining all along the striatal SVZ. Do they mean the dorsal segment of the striatal SVZ or the subcallosal SVZ? Fig. 4b and Fig 4f clearly show the "subcallosal" area as the one analysed but other figures show the dorsal striatal region (Fig. 2a). This is important because of the well-known embryological and neurogenic differences between the regions.  

      While it is true that Thbs4 is also expressed in the other subregions of the SVZ (lateral, ventral and medial), as observed in Fig 8, we chose the dorsal area because it is the subregion where we observed the largest increase in slow proliferative NSCs (Thbs4/GFAP/BrdU-positive cells) after MCAO (Fig S3). As observed in the quantifications in Fig S3, we found that Thbs4/GFAP/BrdU-positive cells increase in the lateral, medial and ventral SVZ, but the increase is not significant. Therefore, from Fig 4 onwards, we focused on the dorsal SVZ, which the reviewer mentions as the “subcallosal” area. We chose the term “dorsal” as stated in single-cell studies (Cebrian-Silla et al, 2021, eLife; Marcy et al., 2023, Sci Adv) and reviews (Sequerra 2014 Front Cell Neurosci) that investigate or mention this subregion, respectively. In the abstract, we are perfectly clear in stating that newborn astrocytes migrate from both dorsal and medial areas.

      In Fig 2a, the immunofluorescence image shows medial and lateral SVZ, but at this point in the paper, we have not yet made specific subregional quantifications, and the Nestin, DCX and Thbs4 quantifications refer to the SVZ as a whole, both in the IF and in the WB (Fig 2e-g). We apologize for the confusion. We have clarified this in the text (line 119).  

      (5) It is good to know that the harsh MCAO's had already been excluded.  

      (6) Sorry for the lack of clarity - in addition to Thbs4, I was referring to mouse versus rat Hyaluronan degradation genes (Hyal1, Hyal2 and Hyal3) and hyaluronan synthase genes (HAS1 and HAS2) in order to address the overall species differences in hyaluronan biology thus justifying the "shift" from mouse to rat. You examine these in the (weirdly positioned) Fig. 8h,i. Please add a few sentences on mouse vs rat Thbs4 and Hyaluronan relevant genes.  

      We thank the reviewer for these remarks. We have now added a sentence pointing to the similar internalization and degradation in rat and mouse (reviewed by Sherman et al., 2015). This correction is in line 233. Hyaluronan is, in evolutionary terms, a very “old” molecule, part of the “ancient” glycan-based matrix, before the evolution of proteoglycans and fibrous proteins such as collagen, laminin etc. Hence, its machinery is highly conserved across species.

      We have also reorganized the panels in Fig 8, where 8h and 8i were indeed weirdly positioned. We hope that the new version of this figure is more easily readable.

      (7) Thank you for the better justification of using the naked mole rat HA synthase.  

      Reviewer #3 (Public review):  

      Summary:  

      The authors aimed to study the activation of gliogenesis and the role of newborn astrocytes in a post-ischemic scenario. Combining immunofluorescence, BrdU-tracing and genetic cellular labelling, they tracked the migration of newborn astrocytes (expressing Thbs4) and found that Thbs4-positive astrocytes modulate the extracellular matrix at the lesion border by synthesis but also degradation of hyaluronan. Their results point to a relevant function of SVZ newborn astrocytes in the modulation of the glial scar after brain ischemia. This work's major strength is the fact that it is tackling the function of SVZ newborn astrocytes, whose role is undisclosed so far.  

      Strengths:  

      The article is innovative, of good quality, and clearly written, with properly described Materials and Methods, data analysis and presentation. In general, the methods are designed properly to answer the main question of the authors, being a major strength. Interpretation of the data is also in general well done, with results supporting the main conclusions of this article.  

      In this revised version, the points raised/weaknesses were clarified and discussed in the article.  

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):  

      Minor points:  

      (1) Thanks for the clarification.  

      (2) Thanks for the clarification.  

      (3) The magnification is not apparent in Fig. 5.  

      We had removed two brain slices (from 4 to 2) in order to increase the size of the image 2-fold. We have now further increased the TTC panel, 25% from the revised version, 125% from the original.

      (4) Thanks for the clarification.  

      (5) Thanks for the clarification.  

      (6) Thanks for the clarification.  

      (7) Thanks for the clarification.  

      (8) Thanks for the clarification.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) As VRMate (a component of behaviorMate) is written using Unity, what is the main advantage of using behaviorMate/VRMate compared to using Unity alone paired with Arduinos (e.g. Campbell et al. 2018), or compared to using an existing toolbox to interface with Unity (e.g. Alsbury-Nealy et al. 2022, DOI: 10.3758/s13428-021-01664-9)? For instance, one disadvantage of using Unity alone is that it requires programming in C# to code the task logic. It was not entirely clear whether VRMate circumvents this disadvantage somehow -- does it allow customization of task logic and scenery in the GUI? Does VRMate add other features and/or usability compared to Unity alone? It would be helpful if the authors could expand on this topic briefly.

      We have updated the manuscript (lines 412-422) to clarify the benefits of separating the VR system as an isolated program and a UI that can be run independently. We argue that “…the recommended behaviorMate architecture has several important advantages. Firstly, by rendering each viewing angle of a scene on a dedicated device, performance is improved by splitting the computational costs across several inexpensive devices rather than requiring specialized or expensive graphics cards in order to run…, the overall system becomes more modular and easier to debug [and] implementing task logic in Unity would require understanding Object-Oriented Programming and C# … which is not always accessible to researchers that are typically more familiar with scripting in Python and Matlab.”

      VRMate receives detailed configuration information from behaviorMate at runtime specifying which VR objects to display, and it receives position updates during experiments. Any other necessary information about triggering rewards or presenting non-VR cues is still handled by the UI, so no editing of Unity is necessary. Scene configuration information is in the same JSON format as the settings files for behaviorMate. Additionally, Unity Editor scripts provided in the VRMate repository permit customizing scenes through a “drag and drop” interface and then writing the scene configuration files programmatically. Users interested in these features should see our GitHub page to find example scene.vr files and download the VRMate repository (including the editor scripts). We provided 4 VR contexts, as well as a settings file that uses one of them, which can be found on the behaviorMate GitHub page (https://github.com/losonczylab/behaviorMate) in the “vr_contexts” and “example_settigs_files” directories. These examples are provided to assist VRMate users in getting set up and could provide a more detailed example of how VRMate and behaviorMate interact.

      (2) The section on "context lists", lines 163-186, seemed to describe an important component of the system, but this section was challenging to follow and readers may find the terminology confusing. Perhaps this section could benefit from an accompanying figure or flow chart, if these terms are important to understand.

      We maintain the use of the term context and context list in order to maintain a degree of parity with the java code. However, we have updated lines 173-175 to define the term context for the behaviorMate system: “... a context is grouping of one or more stimuli that get activated concurrently. For many experiments it is desirable to have multiple contexts that are triggered at various locations and times in order to construct distinct or novel environments.”

      a. Relatedly, "context" is used to refer to both when the animal enters a particular state in the task like a reward zone ("reward context", line 447) and also to describe a set of characteristics of an environment (Figure 3G), akin to how "context" is often used in the navigation literature. To avoid confusion, one possibility would be to use "environment" instead of "context" in Figure 3G, and/or consider using a word like "state" instead of "context" when referring to the activation of different stimuli.

      Thank you for the suggestion. We have updated Figure 3G to say “Environment” in order to avoid confusion.

      (3) Given the authors' goal of providing a system that is easily synchronizable with neural data acquisition, especially with 2-photon imaging, I wonder if they could expand on the following features:

      a. The authors mention that behaviorMate can send a TTL to trigger scanning on the 2P scope (line 202), which is a very useful feature. Can it also easily generate a TTL for each frame of the VR display and/or each sample of the animal's movement? Such TTLs can be critical for synchronizing the imaging with behavior and accounting for variability in the VR frame rate or sampling rate.

      Different experimental demands require varying levels of precision in this kind of synchronization signal. For this reason, we have opted against a “one-size fits all” approach to synchronization with physiology data in behaviorMate. Importantly, this keeps the individual rig costs low, which can be useful when constructing setups specifically for training animals. behaviorMate will log TTL pulses sent to GPIO pins set up as sensors, and can be configured to generate TTL pulses at regular intervals. Additionally, all UDP packets received by the UI are time stamped and logged. We also include the output of the Arduino millis() function in all UDP packets, which can be used for further investigation of clock drift between system components. Importantly, since the system is event driven, there cannot be accumulating drift across running experiments between the behaviorMate UI and networked components such as the VR system.
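
      As an illustration of how the logged millis() values could be used to look for clock drift, the following is a minimal sketch in Python. It assumes that each logged packet yields a pair of timestamps in milliseconds (the PC receive time and the Arduino millis() value); the function name and argument names are hypothetical and do not reflect the actual behaviorMate log format.

```python
def estimate_drift_ppm(pc_ms, arduino_ms):
    """Least-squares slope of Arduino time against PC time, in parts per million.

    Illustrative only: pc_ms and arduino_ms are assumed to be equal-length
    sequences of timestamps (in ms) taken from the same logged packets.
    A result of 0 means the two clocks advance at the same rate.
    """
    n = len(pc_ms)
    if n < 2 or n != len(arduino_ms):
        raise ValueError("need at least two paired timestamps")
    mean_pc = sum(pc_ms) / n
    mean_ard = sum(arduino_ms) / n
    cov = sum((p - mean_pc) * (a - mean_ard) for p, a in zip(pc_ms, arduino_ms))
    var = sum((p - mean_pc) ** 2 for p in pc_ms)
    if var == 0:
        raise ValueError("PC timestamps must not all be identical")
    slope = cov / var
    return (slope - 1.0) * 1e6
```

      A constant offset between the two clocks does not affect the slope, so only the relative rate is reported. Network jitter adds noise to the PC timestamps, so such an estimate is only meaningful over many packets.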

      For these reasons, we have not needed to implement a VR frame synchronization TTL for any of our experiments, however, one could extend VRMate to send "sync" packets back to behaviorMate to log when each frame was displayed precisely or TTL pulses (if using the same ODROID hardware we recommend in the standard setup for rendering scenes). This would be useful if it is important to account for slight changes in the frame rate at which the scenes are displayed. However, splitting rendering of large scenes between several devices results in fast update times and our testing and benchmarks indicate that display updates are smooth and continuous enough to appear coupled to movement updates from the behavioral apparatus and sufficient for engaging navigational circuits in the brain.

      b. Is there a limit to the number of I/O ports on the system? This might be worth explicitly mentioning.

      We have updated lines 219-220 in the manuscript to provide this information: Sensors and actuators can be connected to the controller using one of the 13 digital or 5 analog input/output connectors.

      c. In the VR version, if each display is run by a separate Android computer, is there any risk of clock drift between displays? Or is this circumvented by centralized control of the rendering onset via the "real-time computer"?

      This risk is mitigated by the real-time computer/UI sending position updates to the VR displays. The maximum amount scenes can be out of sync is limited because they will all recalibrate on every position update – which occurs multiple times per second as the animal is moving. Moreover, because position updates are constantly being sent by behaviorMate to VRMate and VRMate is immediately updating the scene according to this position, the most the scene can become out of sync with the mouse's position is proportional to the maximum latency multiplied by the running speed of the mouse. For experiments focusing on eliciting an experience of navigation, such a degree of asynchrony is almost always negligible. For other experimental demands it could be possible to incorporate more precise frame timing information but this was not necessary for our use case and likely for most other use cases. Additionally, refer to the response to comment 3a.
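
      The bound described above (maximum latency multiplied by running speed) can be written out as a small helper. This is only a worked illustration in Python; the function name is hypothetical and the numbers in the usage note are made up for the example, not measured values.

```python
def max_position_lag_cm(latency_ms, speed_cm_per_s):
    """Upper bound on how far the displayed scene can trail the animal.

    latency_ms: worst-case delay between a position update and the redraw.
    speed_cm_per_s: running speed of the animal.
    """
    if latency_ms < 0 or speed_cm_per_s < 0:
        raise ValueError("latency and speed must be non-negative")
    return (latency_ms / 1000.0) * speed_cm_per_s
```

      For example, an assumed worst-case latency of 20 ms at a running speed of 30 cm/s gives a bound of 0.6 cm.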

      Reviewer #2 (Public review):

      (1) The central controlling logic is coupled with GUI and an event loop, without a documented plugin system. It's not clear whether arbitrary code can be executed together with the GUI, hence it's not clear how much the functionality of the GUI can be easily extended without substantial change to the source code of the GUI. For example, if the user wants to perform custom real-time analysis on the behavior data (potentially for closed-loop stimulation), it's not clear how to easily incorporate the analysis into the main GUI/control program.

      Without any edits to the existing source code, behaviorMate is highly customizable through the settings files, which allow users to combine the existing contexts and decorators in arbitrary combinations. Therefore, by generating novel settings files, users have been able to perform a wide variety of 1D navigation tasks, well beyond our anticipated use cases. The typical method for providing closed-loop stimulation would be to set up a context which is triggered by animal behavior using decorators (e.g. based on position, lap number and time) and then trigger the stimulation with a TTL pulse. Rarely, if users require a behavioral condition not currently implemented or composable out of existing decorators, it would require generating custom code in Java to extend the UI. Performing such edits requires only knowledge of basic object-oriented programming in Java and generating a single subclass of either the BasicContextList or ContextListDecorator classes. In addition, the JavaFX version of behaviorMate (under development) incorporates plugin support which doesn't require recompiling the code in order to make these changes. However, since the JavaFX software is currently under development, documentation does not yet exist. All software is open-sourced and available on github.com for users interested in generating plugins or altering the source code.

      We have added the additional caveat to the manuscript in order to clarify this point (Line 197-202): “However, if the available set of decorators is not enough to implement the required task logic, some modifications to the source code may be necessary. These modifications, in most cases, would be very simple and only a basic understanding of object-oriented programming is required. A case where this might be needed would be performing novel customized real-time analysis on behavior data and activating a stimulus based on the result”
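
      To make the idea of composing contexts with decorators more concrete, the following is a minimal sketch of the decorator pattern in Python. It is not the behaviorMate implementation: the real classes (BasicContextList and ContextListDecorator) are written in Java, and every class and method name below is hypothetical and chosen only for illustration.

```python
class BasicContext:
    """Hypothetical base context: active whenever it is checked."""

    def is_active(self, position, lap, time_s):
        return True


class ContextDecorator:
    """Wraps another context and adds one extra activation condition."""

    def __init__(self, inner):
        self.inner = inner

    def condition(self, position, lap, time_s):
        raise NotImplementedError

    def is_active(self, position, lap, time_s):
        # Active only if the wrapped context and this condition both agree.
        return self.inner.is_active(position, lap, time_s) and self.condition(
            position, lap, time_s
        )


class PositionWindow(ContextDecorator):
    """Restricts a context to a stretch of the track."""

    def __init__(self, inner, start, stop):
        super().__init__(inner)
        self.start, self.stop = start, stop

    def condition(self, position, lap, time_s):
        return self.start <= position < self.stop


class EveryNthLap(ContextDecorator):
    """Restricts a context to laps whose number is a multiple of n."""

    def __init__(self, inner, n):
        super().__init__(inner)
        self.n = n

    def condition(self, position, lap, time_s):
        return lap % self.n == 0


# A zone between positions 100 and 120 that is only armed on even laps.
reward_zone = EveryNthLap(PositionWindow(BasicContext(), 100, 120), n=2)
```

      A new behavioral condition would then correspond to one additional subclass that overrides a single method, which is the sense in which such an extension stays small.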

      (2) The JSON messaging protocol lacks API documentation. It's not clear what the exact syntax is, supported key/value pairs, and expected response/behavior of the JSON messages. Hence, it's not clear how to develop new hardware that can communicate with the behaviorMate system.

      The most common approach for adding novel hardware is to use TTL pulses (or accept an emitted TTL pulse to read sensor states). This type of hardware addition is possible through the existing GPIO without the need to interact with the software or JSON API. Users looking to take advantage of the ability to set up and configure novel behavioral paradigms without the need to write any software would be limited to adding hardware which could be triggered with, and report to the UI with, a TTL pulse (however, fairly complex actions could be triggered this way).

      For users looking to develop more customized hardware solutions that interact closely with the UI or GPIO board, additional documentation on the JSON messaging protocol has been added to the behaviormate-utils repository (https://github.com/losonczylab/behaviormate_utils). Additionally, we have added a link to this repository in the Supplemental Materials section (line 971) and referenced this in the manuscript (line 217) to make it easier for readers to find this information.

      Furthermore, developers looking to add completely novel components to the UI can implement the interface described by Context.java in order to exchange custom messages with hardware (described in the JavaDoc: https://www.losonczylab.org/behaviorMate-1.0.0/). These messages would be defined within the custom context and interact with the custom hardware (meaning the interested developer would make a novel addition to the messaging API). Additionally, it should be noted that without editing any software, any UDP packets sent to behaviorMate from an IP address specified in the settings will get time stamped and logged in the stored behavioral data file, meaning that there is a large variety of hardware implementation solutions, using both standard UDP messaging and TTL pulses, that can work with behaviorMate with minimal effort. Finally, see response to R2.1 for a discussion of the JavaFX version of the behaviorMate UI including plugin support.

      (3) It seems the existing control hardware and the JSON messaging only support GPIO/TTL types of input/output, which limits the applicability of the system to more complicated sensor/controller hardware. The authors mentioned that hardware like Arduino natively supports serial protocols like I2C or SPI, but it's not clear how they are handled and translated to JSON messages.

      We provide an implementation for an I2C-based capacitance lick detector which interested developers may wish to copy if support for novel I2C or SPI devices is needed. Users with less development experience wishing to expand the hardware capabilities of behaviorMate could also develop adapters which can be triggered on a TTL input/output. Additionally, more information about the JSON API and how messages are transmitted to the PC by the Arduino is described in point (2) and the expanded online documentation.

      a. Additionally, because it's unclear how easy to incorporate arbitrary hardware with behaviorMate, the "Intranet of things" approach seems to lose attraction. Since currently, the manuscript focuses mainly on a specific set of hardware designed for a specific type of experiment, it's not clear what are the advantages of implementing communication over a local network as opposed to the typical connections using USB.

      As opposed to serial communication protocols as typical with USB, networking protocols seamlessly function based on asynchronous message passing. Messages may be routed internally (e.g. to a PC's localhost address, i.e. 0.0.0.0) or to a variety of external hardware (e.g. using IP addresses such as those in the range 192.168.1.2 - 192.168.1.254). Furthermore, network-based communication allows modules, such as VR, to be added easily. behaviorMate systems can be easily expanded using low-cost Ethernet switches and consume only a single network adapter on the PC (e.g. not limited by the number of physical USB ports). Furthermore, UDP message passing is implemented in almost all modern programming languages in a platform-independent manner (meaning that the same software can run on OSX, Windows, and Linux). Lastly, as we have pointed out (Line 117), a variety of tools exist for inspecting network packets and debugging, meaning that it is possible to run behaviorMate with simulated hardware for testing and debugging.
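
      As a rough illustration of this kind of message passing with simulated hardware, the following Python sketch sends a single JSON-formatted UDP datagram from a stand-in "device" socket to a listener on the same machine. The payload fields, the function names, and the use of the loopback address 127.0.0.1 with an OS-assigned port are assumptions made for the example; they are not the behaviorMate message schema or its default network settings.

```python
import json
import socket


def send_json(sock, message, address):
    """Serialize a dict as JSON and send it as a single UDP datagram."""
    sock.sendto(json.dumps(message).encode("utf-8"), address)


def receive_json(sock, bufsize=4096):
    """Block until one datagram arrives and decode it as JSON."""
    data, sender = sock.recvfrom(bufsize)
    return json.loads(data.decode("utf-8")), sender


def loopback_demo():
    """Send one message from a simulated device to a listener on this machine."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.settimeout(2.0)
    device = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Hypothetical payload; not behaviorMate's actual message schema.
        send_json(device, {"sensor": "lick", "millis": 1234}, listener.getsockname())
        message, _ = receive_json(listener)
        return message
    finally:
        device.close()
        listener.close()
```

      The same two helper functions would work unchanged if the listener address were replaced by that of a device on the local network, which is the practical appeal of the approach.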

      The IOT nature of behaviorMate means there is no requirement for novel hardware to be implemented using an Arduino, since any system capable of UDP communication can be configured. For example, VRMate is usually run on Odroid C4s; however, one could easily create a system using Raspberry Pis or even additional PCs. behaviorMate is agnostic to the format of the UDP messages, but packaging any data in the JSON format for consistency would be encouraged. If the new hardware is a sensor whose input needs to be time stamped and logged, then all that is needed is to add the IP address and port information to the ‘controllers’ list in a behaviorMate settings file. If more complex interactions are needed with novel hardware, then a custom implementation of ContextList.java may be required (see response to R2.2). However, the provided UdpComms.java class could be used to easily send/receive messages from custom Context.java subclasses.

      Solutions for highly customized hardware do require basic familiarity with object-oriented programming using the Java programming language. However, in our experience most behavioral experiments do not require these kinds of modifications. The majority of 1D navigation tasks, which behaviorMate is currently best suited to control, require touch/motion sensors, LEDs, speakers, or solenoid valves, which are easily controlled by the existing GPIO implementation. It is unlikely that custom subclasses would even be needed.

      Reviewer #3 (Public review):

      (1) While using UDP for data transmission can enhance speed, it is thought that it lacks reliability. Are there error-checking mechanisms in place to ensure reliable communication, given its criticality alongside speed?

      The provided GPIO/behavior controller implementation sends acknowledgement packets in response to all incoming messages as well as start and stop messages for contexts and “valves”. In this way the UI can update to reflect both requested state changes as well as when they actually happen (although there is rarely a perceptible gap between these two states unless something is unplugged or not functioning). See Line 85 in the revised manuscript “acknowledgement packets are used to ensure reliable message delivery to and from connected hardware”.
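
      One common way to build on acknowledgement packets is for the sender to wait briefly for a reply and resend if none arrives. The Python sketch below illustrates that general pattern with a simulated controller on the loopback interface. The retry logic, the message fields, and all function names are assumptions made for illustration; the response above states only that acknowledgements are sent, not how the UI reacts to a missing one.

```python
import json
import socket
import threading


def send_with_ack(sock, message, address, retries=3, timeout_s=0.2):
    """Send a JSON datagram and wait for any reply; resend on timeout.

    Returns the decoded reply, or None if no reply arrived after all attempts.
    """
    payload = json.dumps(message).encode("utf-8")
    sock.settimeout(timeout_s)
    for _ in range(retries):
        sock.sendto(payload, address)
        try:
            data, _ = sock.recvfrom(4096)
            return json.loads(data.decode("utf-8"))
        except socket.timeout:
            continue
    return None


def ack_once(sock):
    """Simulated controller: acknowledge a single incoming datagram."""
    data, sender = sock.recvfrom(4096)
    request = json.loads(data.decode("utf-8"))
    sock.sendto(json.dumps({"ack": request.get("id")}).encode("utf-8"), sender)


def demo():
    """Run one request/acknowledge exchange on the loopback interface."""
    controller = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    controller.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    controller.settimeout(2.0)
    ui = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    worker = threading.Thread(target=ack_once, args=(controller,))
    worker.start()
    try:
        # Hypothetical request; the "id" and "valve" fields are illustrative.
        return send_with_ack(ui, {"id": 7, "valve": "open"}, controller.getsockname())
    finally:
        worker.join()
        ui.close()
        controller.close()
```

      Returning None after the final attempt lets the caller decide how to surface an unplugged or unresponsive device, rather than blocking indefinitely.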

      (2) Considering this year's price policy changes in Unity, could this impact the system's operations?

      VRMate is not affected by the recent changes in pricing structure of the Unity project.

      The existing compiled VRMate software does not need to be regenerated to update VR scenes, or implement new task logic (since this is handled by the behaviorMate GUI). Therefore, the VRMate program is robust to any future pricing changes or other restructuring of the Unity program and does not rely on continued support of Unity. Additionally, while the solution presented in VRMate has many benefits, a developer could easily adapt any open-source VR Maze project to receive the UDP-based position updates from behaviorMate or develop their own novel VR solutions.

      (3) Also, does the Arduino offer sufficient precision for ephys recording, particularly with a 10ms check?

      Electrophysiology recording hardware typically has additional I/O channels which can provide assistance with tracking behavior/synchronization at a high resolution. While behaviorMate could still be used to trigger reward valves, either the ephys hardware or some additional high-speed DAQ would be recommended to maintain accurate alignment with high-speed physiology data. behaviorMate could still be set up as normal to provide closed and open-loop task control at behaviorally relevant timescales alongside a DAQ circuit recording events at a consistent temporal resolution. While this would increase the relative cost of the individual recording setup, identical rigs for training animals could still be configured without the DAQ circuit, avoiding unnecessary cost and complexity.

      (4) Could you clarify the purpose of the Sync Pulse? In line 291, it suggests additional cues (potentially represented by the Sync Pulse) are needed to align the treadmill screens, which appear to be directed towards the Real-Time computer. Given that event alignment occurs in the GPIO, the connection of the Sync Pulse to the Real-Time Controller in Figure 1 seems confusing.

      A number of methods exist for synchronizing recording devices like microscopes or electrophysiology recordings with behaviorMate’s time-stamped logs of actuators and sensors. For example, the GPIO circuit can be configured to send sync triggers, or receive timing signals as input. Alternatively a dedicated circuit could record frame start signals and relay them to the PC to be logged independently of the GPIO (enabling a high-resolution post-hoc alignment of the time stamps). The optimal method to use varies based on the needs of the experiment. Our setups have a dedicated BNC output and specification in the settings file that sends a TTL pulse at the start of an experiment in order to trigger 2p imaging setups (see line 224, specifically that this is a detail of “our” 2p imaging setup). We provide this information as it might be useful suggesting how to have both behavior and physiology data start recording at the same time. We do not intend this to be the only solution for alignment. Figure 1 indicates an “optional” circuit for capturing a high speed sync pulse and providing time stamps back to the real time PC. This is another option that might be useful for certain setups (or especially for establishing benchmarks between behavior and physiology recordings). In our setup event alignment does not exclusively occur on the GPIO.

      a. Additionally, why is there a separate circuit for the treadmill that connects to the UI computer instead of the GPIO? It might be beneficial to elaborate on the rationale behind this decision in line 260.

      Event alignment does not occur on the GPIO; separating concerns between position tracking and more general input/output features improves performance and simplifies debugging. In this sense we maintain a single event loop on the Arduino, avoiding the need to either run multithreaded operations or rely extensively on interrupts, which can cause unpredictable code execution (e.g. when multiple interrupts occur at the same time). Our position tracking circuit is therefore coupled to a separate, low-cost Arduino Mini which has the singular responsibility of position tracking.

      b. Moreover, should scenarios involving pupil and body camera recordings connect to the Analog input in the PCB or the real-time computer for optimal data handling and processing?

      Pupil and body camera recordings would be independent data streams which can be recorded separately from behaviorMate. Aligning these forms of full motion video could require frame triggers, which could be configured on the GPIO board using single TTL-like outputs or by configuring a valve to be “pulsed”, which is a provided customization type.

      We also note that a more advanced developer could easily leverage camera signals to provide closed-loop control by writing an independent module that sends UDP packets to behaviorMate. For example, a separate computer-vision-based position tracking module could be written in any preferred language and use UDP messaging to send body tracking updates to the UI without editing any of the behaviorMate source code (and could even be used for updating 1D location).

      (5) Given that all references, as far as I can see, come from the same lab, are there other labs capable of implementing this system at a similar optimal level?

      To date, two additional labs have published using behaviorMate, the Soltez and Henn labs (see revised lines 341-342). Since behaviorMate has only recently been published and made available open source, only external collaborators of the Losonczy lab have had access to the software and design files needed to do this. These collaborators did, however, set up their own behavioral setups in separate locations with minimal direct support from the authors, similar to what anyone seeking to set up a behaviorMate system would find online on our GitHub page or by posting to the message board.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (4) To provide additional context for the significance of this work, additional citations would be helpful to demonstrate a ubiquitous need for a system like behaviorMate. This was most needed in the paragraph from lines 46-65, specifically for each sentence after line 55, where the authors discuss existing variants on head-fixed behavioral paradigms. For instance, for the clause "but olfactory and auditory stimuli have also been utilized at regular virtual distance intervals to enrich the experience with more salient cues", suggested citations include Radvansky & Dombeck 2018 (DOI: 10.1038/s41467-018-03262-4), Fischler-Ruiz et al. 2021 (DOI: 10.1016/j.neuron.2021.09.055).

      We thank the reviewer for the suggested missing citations and have updated the manuscript accordingly (see line 58).

      (5) In addition, it would also be helpful to clarify behaviorMate's implementation in other laboratories. On line 304 the authors mention "other labs" but the following list of citations is almost exclusively from the Losonczy lab. Perhaps the citations just need to be split across the sentence for clarity? E.g. "has been validated by our experimental paradigms" (citation set 1) "and successfully implemented in other labs as well" (citation set 2).

      We have split the citation set as suggested (see lines 338-342).

      Minor Comments:

      (6) In the paragraph starting line 153 and in Fig. 2, please clarify what is meant by "trial" vs. "experiment". In many navigational tasks, "trial" refers to an individual lap in the environment, but here "trial" seems to refer to the whole behavioral session (i.e. synonymous with "experiment"?).

      In our software implementation we had originally used “trial” to refer to an imaging session rather than “experiment” (and have made updates to start moving to the more conventional lexicon). To avoid confusion, we have removed this use of “trial” throughout the manuscript and replaced it with “experiment” whenever possible.

      (7) This is very minor, but in Figure 3 and 4, I don't believe the gavage needle is actually shown in the image. This is likely to avoid clutter but might be confusing to some readers, so it may be helpful to have a small inset diagram showing how the needle would be mounted.

      We assessed the image both with and without the gavage needle and found the version in the original (without) to be easier to read and less cluttered and therefore maintained that version in the manuscript.

      (8) In Figure 5 legend, please list n for mice and cells.

      We have updated the Figure 5 legend to indicate that for panels C-G, n=6 mice (all mice were recorded in both VR and TM systems), with 3253 cells classified as significantly tuned place cells in VR and 6101 tuned cells in TM.

      (9) Line 414: It is not necessary to tilt the entire animal and running wheel as long as the head-bar clamp and objective can rotate to align the imaging window with the objective's plane of focus. Perhaps the authors can just clarify the availability of this option if users have a microscope with a rotatable objective/scan head.

      We have added the suggested caveat to the manuscript in order to clarify when the goniometers might be useful (see lines 281-288).

      (10) Figure S1 and S2 could be referenced explicitly in the main text with their related main figures.

      We have added explicit references to Figures S1 and S2 in the relevant sections (see lines 443, 460 and 570).

      (11) On line 532-533, is there a citation for "proximal visual cues and tactile cues (which are speculated to be more salient than visual cues)"?

      We have added citations to both Knierim & Rao 2003 and Renaudineau et al. 2007 which discuss the differential impact of proximal vs distal cues during navigation as well as Sofroniew et al. 2014 which describe how mice navigate more naturally in a tactile VR setup as opposed to purely visual ones.

      (12) There is a typo at the end of the Figure 2 legend, where it should say "Arduino Mini."

      This typo has been fixed.

      Reviewer #2 (Recommendations For The Authors):

      (4) As mentioned in the public review: what is the major advantage of taking the IoT approaches as opposed to USB connections to the host computer, especially when behaviorMate relies on a central master computer regardless? The authors mentioned the readability of the JSON messages, making the system easier to debug. However, the flip side of that is the efficiency of data transmission. Although the bandwidth/latency is usually more than enough for transmitting data and commands for behavior devices, the efficiency may become a problem when neural recording devices (imaging or electrophysiology) need to be included in the system.

      behaviorMate is not intended to do everything, and is limited to mainly controlling behavior and providing some synchronizing TTL style triggers. In this way the system can easily and inexpensively be replicated across multiple recording setups; particularly this is useful for constructing additional animal training setups. The system is very much sufficient for capturing behavioral inputs at relevant timescales (see the benchmarks in Figures 3 and 4 as well as the position correlated neural activity in Figures 5 and 6 for demonstration of this). Additional hardware might be needed to align the behaviorMate output with neural data for example a high-speed DAQ or input channels on electrophysiology recording setups could be utilized (if provided). As all recording setups are different the ideal solution would depend on details which are hard to anticipate. We do not mean to convey that the full neural data would be transmitted to the behaviorMate system (especially using the JSON/UDP communications that behaviorMate relies on).

      (5) The author mentioned labView. A popular open-source alternative is bonsai (https://github.com/bonsai-rx/bonsai). Both include a graphical-based programming interface that allows the users to easily reconfigure the hardware system, which behaviorMate seems to lack. Additionally, autopilot (https://github.com/auto-pi-lot/autopilot) is a very relevant project that utilizes a local network for multiple behavior devices but focuses more on P2P communication and rigorously defines the API/schema/communication protocols for devices to be compatible. I think it's important to include a discussion on how behaviorMate compares to previous works like these, especially what new features behaviorMate introduces.

      We believe that behaviorMate provides a more opinionated and complete solution than the projects mentioned. A wide variety of 1D navigational paradigms can be constructed in behaviorMate without the need to write any novel software. For example, bonsai is a “visual programming language” and would require experimenters to construct a custom implementation of each of their experiments. We have opted to use Java for the UI, with computations distributed across modules written in various languages. Given the IOT methodology, it would be possible to use any number of programming languages or APIs; a large number of design decisions were made when building the project, and we have opted not to include this level of detail in the manuscript in order to maintain readability. We strongly believe in using non-proprietary and open-source projects when possible, which is why the comparison with LabView-based solutions was included in the introduction. Also, we have added a reference to autopilot in the section of the introduction where this is discussed.

      (6) One of the reasons labView/bonsai are popular is they are inherently parallel and can simultaneously respond to events from different hardware sources. While the JSON events in behaviorMate are asynchronous in nature, the handling of those events seems to happen only in a main event loop coupled with GUI, which is sequential by nature. Is there any multi-threading/multi-processing capability of behaviorMate? If so it's an important feature to highlight. If not I think it's important to discuss the potential limitation of the current implementation.

      IoT solutions are inherently concurrent since the computation is distributed. Additional parallelism could be added by further distributing concerns among additional independent modules running on independent hardware. The UI has an event loop which aggregates inputs and then sequentially updates contexts based on the current state of those inputs. This sort of “snapshot” of the current state is necessary to reason about when to start certain contexts based on their settings and applied decorators. While the behaviorMate UI uses multithreading libraries in Java to be more performant in certain cases, the degree to which this represents true vs “virtual” concurrency depends on the architecture of the individual PC it is run on and on how the operating system allocates resources. For this reason, we have argued in the manuscript that behaviorMate is sufficient for controlling experiments at behaviorally relevant timescales, and we have presented benchmarks and discussed different synchronization approaches so that users can determine whether this is sufficient for their needs.

      (7) The context list is an interesting and innovative approach to abstract behavior contingencies into a data structure, but it's not currently discussed in depth. I think it's worth highlighting how the context list can be used to cover a wide range of common behavior experimental contingencies with detailed examples (line 185 might be a good example to give). It's also important to discuss the limitation, as currently the context lists seem to only support contingencies based purely on space and time, without support for more complicated behavior metrics (e.g. deliver reward only after X% correct).

      To access more complex behavior metrics during runtime, custom context list decorators would need to be implemented. While this is less common in the sort of 1D navigational behaviors the project was originally designed to control, adding novel decorators is a simple process that requires only basic object-oriented programming knowledge. As discussed, we are also implementing a plugin architecture in the JavaFX update to streamline these types of additions.
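
      As a rough illustration of the decorator idea described above, the following minimal sketch wraps a spatial reward context so that it only becomes active once a performance threshold has been reached. It is written in Python for brevity, whereas the behaviorMate UI itself is written in Java; the class names, method names, and parameters (`RewardContext`, `PercentCorrectDecorator`, `is_active`, `record_trial`, `min_fraction_correct`) are hypothetical and are not taken from the behaviorMate code base.

```python
class RewardContext:
    """Hypothetical context that is active inside a fixed stretch of track."""

    def __init__(self, start_mm, end_mm):
        self.start_mm = start_mm
        self.end_mm = end_mm

    def is_active(self, position_mm, time_s):
        # Purely spatial contingency: active while the animal is in the zone.
        return self.start_mm <= position_mm < self.end_mm


class PercentCorrectDecorator:
    """Hypothetical decorator: suppress the wrapped context until the
    fraction of correct trials reaches a threshold."""

    def __init__(self, wrapped, min_fraction_correct, min_trials=1):
        self.wrapped = wrapped
        self.min_fraction_correct = min_fraction_correct
        self.min_trials = min_trials
        self.n_trials = 0
        self.n_correct = 0

    def record_trial(self, correct):
        self.n_trials += 1
        if correct:
            self.n_correct += 1

    def is_active(self, position_mm, time_s):
        # Not enough trials yet (this also avoids dividing by zero).
        if self.n_trials == 0 or self.n_trials < self.min_trials:
            return False
        # Performance below threshold: keep the context switched off.
        if self.n_correct / self.n_trials < self.min_fraction_correct:
            return False
        # Otherwise defer to the wrapped, purely spatial context.
        return self.wrapped.is_active(position_mm, time_s)
```

      Because the decorator exposes the same `is_active` interface as the context it wraps, it could in principle be placed in a context list wherever the undecorated context would go; this is an assumption of the sketch rather than a description of the existing Java classes.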

      Minor Comments:

      (8) In line 202, the author suggests that a single TTL pulse is sent to mark the start of a recording session, and this is used to synchronize behavior data with imaging data later. In other words, there are no synchronization signals for every single sample/frame. This approach either assumes the behavior recording and imaging are running on the same clock or assumes evenly distributed recording samples over the whole recording period. Is this the case? If so, please include a discussion on limitations and alternative approaches supported by behaviorMate. If not, please clarify how exactly synchronization is done with one TTL pulse.

      While the TTL pulse triggers the start of neural data acquisition in our setups, various options exist for controlling for the described clock drift across experiments, and the appropriate one depends on the type of recordings made, the frame rate, the duration of the recording, etc. Therefore, behaviorMate leaves open many options for synchronization at different time scales (e.g. adding a frame-sync circuit as shown in Figure 1, or sending TTL pulses to the same DAQ that records the electrophysiology data). Expanded consideration of different synchronization methods has been included in the manuscript (see lines 224-238).
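
      One generic way to correct for clock drift when repeated sync pulses are recorded on both clocks is piecewise-linear interpolation between matched pulses. The sketch below is an illustration of that general technique only, not the procedure used in the manuscript; the function name `to_acquisition_time` and its arguments are hypothetical, and it assumes at least two matched pulses with strictly increasing timestamps.

```python
from bisect import bisect_right


def to_acquisition_time(t_behavior, pulses_behavior, pulses_acquisition):
    """Map a behavior-clock time onto the acquisition clock.

    Both lists hold the times of the same sync pulses in ascending order,
    one list per clock. Times between two pulses are linearly interpolated;
    times outside the pulse range are extrapolated from the nearest pair.
    """
    if len(pulses_behavior) != len(pulses_acquisition) or len(pulses_behavior) < 2:
        raise ValueError("need at least two matched sync pulses")
    # Index of the segment that brackets t_behavior, clamped to valid segments.
    i = bisect_right(pulses_behavior, t_behavior) - 1
    i = min(max(i, 0), len(pulses_behavior) - 2)
    b0, b1 = pulses_behavior[i], pulses_behavior[i + 1]
    a0, a1 = pulses_acquisition[i], pulses_acquisition[i + 1]
    # Linear map defined by the two bracketing pulses.
    return a0 + (t_behavior - b0) * (a1 - a0) / (b1 - b0)
```

      With only a single start-of-recording pulse, no drift estimate is possible and the two clocks must be assumed to run at the same rate, which is the limitation the reviewer raises.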

      (9) Is the computer vision-based calibration included as part of the GUI functionality? Please clarify. If it is part of the GUI, it's worth highlighting as a very useful feature.

      The computer vision-based benchmarking is not included in the GUI. It is in the form of a script made specifically for this paper. However, for treadmill-based experiments, behaviorMate has other calibration tools built in (see lines 301-303).

      (10) I went through the source code of the Arduino firmware, and it seems most "open X for Y duration" functions are implemented using the delay function. If this is indeed the case, it's generally a bad idea since delay completely pauses the execution and any events happening during the delay period may be missed. As an alternative, please consider approaches comparing timestamps or using interrupts.

      We have avoided the use of interrupts on the GPIO due to the potential for unpredictable code execution. A blocking delay is executed only if the duration is 10 ms or less, as we cannot guarantee that the Arduino event loop cycles with a precision finer than this. Durations longer than 10 ms are timestamped and non-blocking. We have changed this MAX_WAIT threshold so that it is specified as a macro and can be more easily adjusted (or set to 0).
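
      The hybrid scheme described above (block for short pulses, compare timestamps for long ones) can be sketched as a small simulation. This is not the Arduino firmware, which is written in C++; it is a Python model of the control logic under the stated 10 ms threshold, and the names `FakeClock`, `Valve`, `open_for`, and `update` are hypothetical.

```python
MAX_WAIT_MS = 10  # threshold described above; 0 makes every nonzero pulse non-blocking


class FakeClock:
    """Stand-in for the microcontroller's millisecond counter (hypothetical)."""

    def __init__(self):
        self.now_ms = 0

    def sleep(self, ms):
        # Blocking wait: time advances and nothing else runs in the meantime.
        self.now_ms += ms


class Valve:
    """Hypothetical output that can be opened for a fixed duration."""

    def __init__(self, clock, max_wait_ms=MAX_WAIT_MS):
        self.clock = clock
        self.max_wait_ms = max_wait_ms
        self.is_open = False
        self.close_at_ms = None  # deadline used by the non-blocking path

    def open_for(self, duration_ms):
        self.is_open = True
        if duration_ms <= self.max_wait_ms:
            # Short pulse: block for the full duration, then close.
            self.clock.sleep(duration_ms)
            self.is_open = False
        else:
            # Long pulse: store a deadline and return immediately.
            self.close_at_ms = self.clock.now_ms + duration_ms

    def update(self):
        """Called once per pass through the main loop."""
        if self.close_at_ms is not None and self.clock.now_ms >= self.close_at_ms:
            self.is_open = False
            self.close_at_ms = None
```

      In this model a long pulse closes on the first loop pass at or after its deadline, so its timing error is bounded by the loop period; this is the trade-off that motivates blocking only for very short durations.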

      (11) Figure 3 B, C, D, and Figure 4 D, E suffer from noticeable low resolution.

      We have converted Figure 3B, C, D and 4C, D, E to vector graphics in order to improve the resolution.

      (12) Figure 4C is missing, which is an important figure.

      This figure appeared when we rendered and submitted the manuscript. We apologize if the figure was generated such that it did not load properly in all pdf viewers. The panel appears correctly in the online eLife version of the manuscript. Additionally, we have checked the revision in Preview on Mac OS as well as Adobe Acrobat and the built-in viewer in Chrome and all figure panels appear in each so we hope this issue has been resolved.

      (13) There are thin white grid lines on all heatmaps. I don't think they are necessary.

      The grid lines have been removed from the heatmaps as suggested.

      (14) Line 562 "sometimes devices directly communicate with each other for performance reasons", I didn't find any elaboration on the P2P communication in the main text. This is potentially worth highlighting as it's one of the advantages of taking the IoT approaches.

      In our implementation it was not necessary to rely on P2P communication beyond what is indicated in Figure 1. The direct communication referred to in line 562 is meant only to refer to the examples expanded on in the rest of the paragraph i.e. the behavior controller may signal the microscope directly using a TTL signal without looping back to the UI. As necessary users could implement UDP message passing between devices, but this is outside the scope of what we present in the manuscript.

      (15) Line 147 "Notably, due to the systems modular architecture, different UIs could be implemented in any programming language and swapped in without impacting the rest of the system.", this claim feels unsupported without a detailed discussion of how new code can be incorporated in the GUI (plugin system).

      This comment refers to the idea of implementing “different UIs”. This would apply to users who wish to take advantage of the JSON messaging API and the proposed electronics while fully implementing their own interface. To facilitate this option, we have improved the documentation of the messaging API posted in the README file accompanying the Arduino source code. To clarify this point, we have added a reference to the supplemental materials, where readers can find a link to the JSON API implementation.

      Additionally, while a plugin system is available in the JavaFX version of behaviorMate, that version is currently under development, and we will update the online documentation as it matures. It is, however, unrelated to the intended claim about completely swapping out the UI.

      Reviewer #3 (Recommendations For The Authors):

      (6) Figure 1 - the terminology for each item is slightly different in the text and the figure. I think making the exact match can make it easier for the reader.

      - Real-time computer (figure) vs real-time controller (ln88).

      The manuscript was adjusted to match figure terminology.

      - The position controller (ln565) - position tracking (Figure).

      We have updated Figure 1 to highlight that the position controller does the position tracking.

      - Maybe add a Behavior Controller next to the GPIO box in Figure 1.

      We updated Figure 1 to highlight that the Behavior Controller performs the GPIO responsibility such that "Behavior Controller" and "GPIO circuit" may be used interchangeably.

      - Position tracking (fig) and position controller (subtitle - ln209).

      We updated Figure 1 to highlight that the position controller does the position tracking.

      - Sync Pulse is not explained in the text.

      The caption for Figure 1 has been updated to better explain the Sync Pulse and the additional system boxes.

      (7) For Figure 3B/C: What is the number of data points? It would be nice to see the real population, possibly using a swarm plot instead of box plots. How likely are these outliers to occur?

      In order to better characterize the distributions presented in our benchmarking data, we have added mean and standard deviation information to the plots in Figures 3 and 4. Figure 3B: 0.0025 +/- 0.1128; Figure 3C: 12.9749 +/- 7.6581; Figure 4C: 66.0500 +/- 15.6994; Figure 4E: 4.1258 +/- 3.2558.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Time periods in which experience regulates early plasticity in sensory circuits are well established, but the mechanisms that control these critical periods are poorly understood. In this manuscript, Leier and Foden and colleagues examine early-life critical periods that regulate the Drosophila antennal lobe, a model sensory circuit for understanding synaptic organization. Using early-life (0-2 days old) exposure to distinct odorants, they show that constant odor exposure markedly reduces the volume, synapse number, and function of the VM7 glomerulus. The authors offer evidence that these changes are mediated by invasion of ensheathing glia into the glomerulus where they phagocytose connections via a mechanism involving the engulfment receptor Draper.

      This manuscript is a striking example of a study where the questions are interesting, the authors spent a considerable amount of time to clearly think out the best experiments to ask their questions in the most straightforward way, and expressed the results in a careful, cogent, and well-written fashion. It was a genuine delight to read this paper. I have two experimental suggestions that would really round out existing work to better support the existing conclusions and some instances where additional data or tempered language in describing results would better support their conclusions. Overall, though, this is an incredibly important finding, a careful analysis, and an excellent mechanistic advance in understanding sensory critical period biology.

      We thank the reviewer for their thoughtful and constructive comments on our manuscript. In response to their critiques, we conducted several new experiments, performed additional analysis, and made changes to the text. As requested, we carried out an electrophysiological analysis of VM7 PN firing in draper knockdown animals with and without odor exposure. To our surprise, loss of glial Draper fully suppresses the dramatic reduction in spontaneous PN activity observed following critical period ethyl butyrate exposure, arguing that the functional response is restored alongside OSN morphology. It also suggests that the OR42a OSN terminals are intact and functional until they are phagocytosed by ensheathing glia. In other words, glia are not merely clearing axon terminals that have already degenerated. This evidence provides additional support for the claim that the VM7 glomerulus will be an outstanding model for defining mechanisms of experience-dependent glial pruning. Detailed responses to the reviewers’ comments follow below.

      Regarding the apparent disconnect between the near-complete silencing of PNs and the 50% reduction in OR42a OSN infiltration volume, we agree with the reviewer that this tracks with previous data in the field. While our Imaris pipeline is relatively sensitive, it may not pick up modest changes to terminal arbor architecture. Indeed, as described in Jindal et al. (2023) and in the Methods in this manuscript, we chose conservative software settings that, if anything, would undercount the percent change in infiltration volume. We also note that increased inhibitory LN inputs onto PNs could contribute to the dramatic PN silencing we observe. While fascinating, we view LN plasticity as beyond the scope of the current manuscript. We removed any mention of ‘silent synapses’ and now speculate about increased inhibition.

      Reviewer #1 (Recommendations For The Authors):

      Major Elements:

      (1) The authors demonstrate that loss of draper in glia can suppress many of the pruning related phenotypes associated with EB exposure. However, they do not assess electrophysiological output in these experiments, only morphology. It would be great to see recordings from those animals to see if the functional response is also restored.

      We performed the experiment the reviewer requested (see Figure 4F-J). We are pleased to report that our recordings from VM7 PNs match our morphology measurements: in repo-GAL4>UAS-draper RNAi flies, there was no difference in the innervation of VM7 PNs between animals exposed to mineral oil or 15% EB from 0-2 DPE. This result is in sharp contrast to the near-total loss of OSN-PN innervation in flies with intact glial Draper signaling, and strongly validates the role we propose for Draper in the Or42a OSN critical period.

      (2) There is a disconnect between physiology and morphology with a near complete loss of activity from VM7 PNs but a less severe loss of ORN synapses. While not completely incongruent (previous work in the AL showed a complete loss of attractive behavior though synapse number was only reduced 40% - Mosca et al. 2017, eLife), it is curious. Can the authors comment further? Ideally, some of these synapses could be visualized by EM to determine if the remaining synapses are indeed of correct morphology. If not, this could support their assertion of silent inputs from page 7. Further, what happens to the remaining synapses? VM7 PNs should be receiving some activity from other local interneurons as well as neighboring PNs.

      We agree that on the surface, our electrophysiology results are more striking than one might expect solely from our measurements of VM7 morphology and presynaptic content. As the reviewer points out, previous studies of fly olfaction have consistently found that relatively modest shifts in glomerular volume in response to prolonged early-life odorant exposure can be accompanied by drastic changes in physiology and behavior (in addition, we would add Devaud et al., 2003; Devaud et al., 2001; Acebes et al., 2012; and Chodankar et al., 2020, as foundational examples of this phenomenon).

      A major driver of these changes appears to be remodeling of antennal lobe inhibitory LNs (see Das et al., 2011; Wilson and Laurent, 2005; Chodankar et al., 2020), especially GABAergic inhibitory interneurons. Perhaps increased LN inhibition of chronically activated PNs, on top of the reduced excitatory inputs resulting from ensheathing glial pruning of the Or42a OSN terminal arbor, would explain the near-total loss of VM7 PN activity we observe after critical period EB exposure. However, given that the scope of our study is limited to critical-period glial biology and does not address the complex topics of LN rewiring or synapse morphology, we have removed the sentence in which we raise the possibility of “silent synapses” in order to avoid confusion. The reviewer is also correct that VM7 PNs have inputs from non-ORN presynaptic partners, including LNs and PNs. So again, perhaps increased inhibitory inputs contribute to the near-complete silencing of the PNs. Given the heterogeneity of LN populations, we view this area as fertile ground for future research.

      Language / Data Considerations:

      (1) Or42a OSNs have other inputs, namely, from LNs. What are they doing here? Are they also affected?

      As discussed above, the question of how LN innervation of Or42a OSNs is altered by critical-period EB exposure is an intriguing one that fully deserves its own follow-up study, and we have tried to avoid speculation about the role of LNs when discussing our pruning phenotype. We note at multiple points throughout the text the importance of LNs and refer to previous studies of LN plasticity in response to chronic odorant exposure. 

      (2) In all of the measurements, what happens to synaptic density? Is it maintained? Does it scale precisely? This would be helpful to know.

      We have performed the analysis as requested, which is now included in a supplement to Figure 5. We found that synaptic density shows no trend in variation across conditions and glial driver genotypes.

      (3) In Figure 5, the controls for the alrm-GAL4 experiments show a much more drastic phenotype than controls in previous figures? Does this background influence how we can interpret the results? Could the response have instead hit a floor effect and it's just not possible to recover?

      The reviewer is correct that following EB exposure, astrocyte vs. ensheathing glial driver backgrounds displayed modest differences in the extent of pruning by volume (0.27 for astrocytes, 0.36 for ensheathing glia). We note that the two drpr RNAi lines that we used had non-significant (but opposite) effects on the estimated Or42a OSN volume in combination with the astrocyte driver, arguing against a floor effect. In addition, a recent publication by Nelson et al. (2024) replicated our findings with a different astrocyte GAL4 driver and draper RNAi line. Thus, we are confident that this result is biologically meaningful and not an artifact of genetic background.

      (4) The estimation of infiltration measurement in Figure 6 is tricky to interpret. It implies that the projections occupy the same space, which cannot be possible. I'd advocate a tempering of some of this language and consider an intensity measurement in addition to their current volume measurements (or perhaps an "occupied space" measurement) to more accurately assess the level of resolution that can be obtained via these methods.

      We completely agree that our language in describing EG infiltration could have been more precise, and we modified our language as suggested. The combination of the Or42a-mCD8::GFP label we and others use, our use of confocal microscopy, and our Surface pipeline in Imaris combine to create a glomerular mask that traces the outline of the OSN terminal arbor, but is nonetheless not 100% “filled” by neuronal membrane and/or glial processes. 

      (5) Do the authors have the kind of resolution needed to tell whether there is indeed Or42a-positive axon fragmentation (as asserted on p16 and from their data in figures 4, 5, 7). If the authors want to say this, I would advocate for a measurement of fragmentation / total volume to prove it - if not, I would advocate tempering of the current language.

      The reviewer brings up a fair criticism: while our assertion about axon fragmentation was based on our visual observations of hundreds of EB-exposed brains, the resolution limits of confocal microscopy do not allow us to rigorously rule out fragmentation within a bundle of OSN axons. Instead, our most compelling evidence for the lack of EB-induced Or42a OSN fragmentation in the absence of glial Draper comes from our new electrophysiology data (Figure 4F-J) in repo-GAL4>UAS-draper RNAi animals. We found no difference in spontaneous release from Or42a terminals in flies exposed to mineral oil or 15% EB from 0-2 DPE, which would not be the case if there were Draper-independent fragmentation along the axons or terminal arbors upon EB exposure. We have updated our discussion of fragmentation so that our statements are based on this new evidence, and not on confocal microscopy.

      (6) There is an interesting Discussion opportunity missed here. Some experiments would, ostensibly, require pupae to detect odorants within the casing via structures consistently in place for olfaction during pupation. It would be useful for the authors to discuss a little more deeply when this critical period may arise and why the experiment where pupae are exposed to EB two days before eclosion and there is no response, occurs as it does. I agree that it's clearly a time when they are not sensitive to the odorant, but that could just be because there's no ability to detect odorants at that time. Is it a question of non-sensitivity to EB or just non-sensitivity to everything?

      We share the reviewer’s interest in the plasticity of the olfactory circuit during pupariation, although, as they correctly point out, it is difficult to conceive of an odorant-exposure experiment that could disentangle the barrier effects of the puparium from the sensitivity of the circuit itself, and our pre-eclosion data in Figure 3A, D, G do not distinguish between the two. While an investigation into the mechanism by which the critical period for ethyl butyrate exposure opens and closes is outside the scope of the present study, we would consider the physical barrier of the puparium to be a satisfactory explanation for why eclosion marks the functional opening of experience-dependent plasticity. As the reviewer suggests, we have added this important nuance to our discussion of the opening of the critical period in the corresponding paragraph of the Results, as well as to the Discussion section “Glomeruli exhibit dichotomous responses to critical period odor exposure.”

      Minor Elements:

      (1) Page 6 bottom: "Or4a-mCD8::GFP" should be "Or42a-mCD8::GFP"

      (2) Page 15, end of last full paragraph. Remove the "e"

      Thank you for pointing out these typos. They have been corrected. 

      Reviewer #2 (Public Review):

      Sensory experiences during developmental critical periods have long-lasting impacts on neural circuit function and behavior. However, the underlying molecular and cellular mechanisms that drive these enduring changes are not fully understood. In Drosophila, the antennal lobe is composed of synapses between olfactory sensory neurons (OSNs) and projection neurons (PNs), arranged into distinct glomeruli. Many of these glomeruli show structural plasticity in response to early-life odor exposure, reflecting the sensitivity of the olfactory circuitry to early sensory experiences.

      In their study, the authors explored the role of glia in the development of the antennal lobe in young adult flies, proposing that glial cells might also play a role in experience-dependent plasticity. They identified a critical period during which both structural and functional plasticity of OSN-PN synapses occur within the ethyl butyrate (EB)-responsive VM7 glomerulus. When flies were exposed to EB within the first two days post-eclosion, significant reductions in glomerular volume, presynaptic terminal numbers, and postsynaptic activity were observed. The study further highlights the importance of the highly conserved engulfment receptor Draper in facilitating this critical period plasticity. The authors demonstrated that, in response to EB exposure during this developmental window, ensheathing glia increase Draper expression, infiltrate the VM7 glomerulus, and actively phagocytose OSN presynaptic terminals. This synapse pruning has lasting effects on circuit function, leading to persistent decreases in both OSN-PN synapse numbers and spontaneous PN activity as analyzed by perforated patch-clamp electrophysiology to record spontaneous activity from PNs postsynaptic to Or42a OSNs.

      In my view, this is an intriguing and potentially valuable set of data. However, since I am not an expert in critical periods or habituation, I do not feel entirely qualified to assess the full significance or the novelty of their findings, particularly in relation to existing research.

      We thank the reviewer for their insightful critique of our work. In response to their comments, we added additional physiological analysis and tempered our language around possible explanations for the apparent disconnect between the physiological and morphological effects of critical period odor exposure. These changes are explained in more detail in the response to the public review by Reviewer 1 and also in our responses outlined below.

      Reviewer #2 (Recommendations For The Authors):

      I though do have specific comments and questions concerning the presynaptic phenotype they deduce from confocal BRP stainings and electrophysiology.

      Concerning the number of active zones: this can hardly be deduced from standard-resolution confocal images and, maybe more importantly, lacking postsynaptic markers. This particularly also in the light of them speculating about "silent synapses". There are now tools existing concerning labeled, cell type specific expression of acetylcholine-receptor expression and cholinergic postsynaptic density markers (importantly Drep2). Such markers should be entailed in their analysis. They should refer to previous concerning "brp-short" concerning its original invention and prior usage.

      We thank the reviewer for their thoughtful approach to our methodology and claims. While the use of confocal microscopy of Bruchpilot puncta to estimate numbers of presynapses is standard practice (see Furusawa et al., 2023; Aimino et al., 2022; Urwyler et al., 2019; Ackerman et al., 2021), the reviewer is correct that a punctum does not an active zone make. Bruchpilot staining and quantification is a well-validated tool for approximating the number of presynaptic active zones, not a substitute for super-resolution microscopy. We made changes to our language about active zones to make this distinction clearer. We have also removed the sentence where we discuss the possibility of “silent synapses,” which both reviewers felt was too speculative for our existing data. Finally, we are highly interested in characterizing the response of PNs and higher-order processing centers to critical-period odorant exposure as a future direction for our research. However, given the complexity of the subject, we chose to limit the scope of this study to the interactions between OSNs and glia. 

      Regarding their electrophysiological analysis and the plausibility of their findings: I am uncertain whether the moderate reduction in BRP puncta at the relevant OSN::PN synapse can fully account for the significantly reduced spontaneous PN activity they report. This seems particularly doubtful in the absence of any direct evidence for postsynaptically silent synapses. Perhaps this is my own naivety, but I wonder why they did not use antennal nerve stimulation in their experiments?

      We refer to previous studies of the AL indicating that moderate changes in glomerular volume and presynaptic content can translate to far more striking alterations in electrophysiology and behavior (Devaud et al., 2003; Devaud et al., 2001; Acebes et al., 2012; Chodankar et al., 2020; Mosca et al., 2017). This literature has demonstrated that chronic odorant exposure can result in remodeling of inhibitory local interneurons to suppress over-active inputs from OSNs. While we do not address the complex subject of interneuron remodeling in the present study, we find it highly likely that there would be significant changes in interneuron innervation of PNs, independent of glial phagocytosis of OSN excitatory inputs, resulting in additional inhibition. Moving forward, we are very interested in expanding these studies to include odor-evoked changes in PN activity.

      Additional minor point: The phrase "Soon after its molecular biology was described (et al., 1999), the Drosophila melanogaster" seems somewhat misleading. Isn't the field still actively describing the molecular biology of the fly olfactory system?

      We completely agree and have removed this sentence entirely.  

      Reviewing Editor's Note: to enhance the evidence from mostly compelling in most facets to solid would be to add physiology to the Draper analysis.

      These experiments have been completed and are presented in Figure 4F-J. 

      References

      Acebes A, Devaud J-M, Arnés M, Ferrús A. 2012. Central Adaptation to Odorants Depends on PI3K Levels in Local Interneurons of the Antennal Lobe. J Neurosci 32:417–422. doi:10.1523/jneurosci.2921-11.2012

      Ackerman SD, Perez-Catalan NA, Freeman MR, Doe CQ. 2021. Astrocytes close a motor circuit critical period. Nature 592:414–420. doi:10.1038/s41586-021-03441-2

      Aimino MA, DePew AT, Restrepo L, Mosca TJ. 2022. Synaptic Development in Diverse Olfactory Neuron Classes Uses Distinct Temporal and Activity-Related Programs. J Neurosci 43:28–55. doi:10.1523/jneurosci.0884-22.2022

      Chodankar A, Sadanandappa MK, VijayRaghavan K, Ramaswami M. 2020. Glomerulus-Selective Regulation of a Critical Period for Interneuron Plasticity in the Drosophila Antennal Lobe. J Neurosci 40:5549–5560. doi:10.1523/jneurosci.2192-19.2020

      Das S, Sadanandappa MK, Dervan A, Larkin A, Lee JA, Sudhakaran IP, Priya R, Heidari R, Holohan EE, Pimentel A, Gandhi A, Ito K, Sanyal S, Wang JW, Rodrigues V, Ramaswami M. 2011. Plasticity of local GABAergic interneurons drives olfactory habituation. Proc Natl Acad Sci 108:E646–E654. doi:10.1073/pnas.1106411108

      Devaud J, Acebes A, Ramaswami M, Ferrús A. 2003. Structural and functional changes in the olfactory pathway of adult Drosophila take place at a critical age. J Neurobiol 56:13–23. doi:10.1002/neu.10215

      Devaud J-M, Acebes A, Ferrús A. 2001. Odor Exposure Causes Central Adaptation and Morphological Changes in Selected Olfactory Glomeruli in Drosophila. J Neurosci 21:6274–6282. doi:10.1523/jneurosci.21-16-06274.2001

      Furusawa K, Ishii K, Tsuji M, Tokumitsu N, Hasegawa E, Emoto K. 2023. Presynaptic Ube3a E3 ligase promotes synapse elimination through down-regulation of BMP signaling. Science 381:1197–1205. doi:10.1126/science.ade8978

      Mosca TJ, Luginbuhl DJ, Wang IE, Luo L. 2017. Presynaptic LRP4 promotes synapse number and function of excitatory CNS neurons. eLife 6:e27347. doi:10.7554/elife.27347

      Nelson N, Vita DJ, Broadie K. 2024. Experience-dependent glial pruning of synaptic glomeruli during the critical period. Sci Rep 14:9110. doi:10.1038/s41598-024-59942-3

      Urwyler O, Izadifar A, Vandenbogaerde S, Sachse S, Misbaer A, Schmucker D. 2019. Branch-restricted localization of phosphatase Prl-1 specifies axonal synaptogenesis domains. Science 364. doi:10.1126/science.aau9952

      Wilson RI, Laurent G. 2005. Role of GABAergic Inhibition in Shaping Odor-Evoked Spatiotemporal Patterns in the Drosophila Antennal Lobe. J Neurosci 25:9069–9079. doi:10.1523/jneurosci.2070-05.2005

    1. Author response:

      We thank the reviewers and the editor for the detailed and constructive feedback provided. We look forward to submitting a revised version of the manuscript that addresses their comments. We acknowledge that further clarification is needed about the novelty brought by our experimental setup and model in comparison to previous studies using different methodologies. We also acknowledge that more details can be included about the calibration steps and sensitivity of the model parameters. Below we detail the planned changes for the revised version regarding the points raised by the reviewers.

      Reviewer #1 (Public review):

      - The authors then claim that the fragmentation of aggregates due to fluid flows occurs through erosion of small pieces. Because their experimental setup does not allow them to explicitly observe this process (for example, by watching one aggregate break into pieces), they implement an idealized model to show that the nature of the changes to the size histogram agrees with an erosion process. However, in Figure 2C there is a noticeable gap between their experiment and the prediction of their model. Additionally, in a similar experiment shown in Figure S6, the experiment cannot distinguish between an idealized erosion model and an alternative, an idealized binary fission model where aggregates split into equal halves. For these reasons, this claim is weakened.

      The two idealized models of fragment distribution, namely erosion and binary fission, lead to distinguishable final size distributions. We believe that our experiments support the hypothesis of the erosion mechanism. Please note that Figure 2 is concerned with the fragmentation of large colonies, whereas Figure 3 and associated Figure S6 are concerned with very small colonies of a few cells formed by aggregation of single-cell suspension. Indeed, for very small colonies of a few cells, our experimental results cannot distinguish between a binary fission model and an erosion model (Figure S6).

      The situation is very different for large colonies. To address the reviewer’s concern, we will add a new figure in the Supplementary Information (SI), similar to our Figure 2C, in which we compare the erosion model with a binary fission model for large colonies fragmented under ε = 5.8 m²/s³. We have already carried out this comparison. The results in this new supplementary figure will show that the idealized binary fission model (i.e., where every fracture event produces exactly two fragments) does not capture the experimental fragmentation behaviour of large colonies. In contrast, the idealized erosion model provides a much better prediction of the experimental results, within the experimental uncertainty and variability in colony strength, and has the notable advantage of a straightforward computational implementation.
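
      As a purely illustrative sketch of why the two idealized mechanisms lead to distinguishable size distributions, the following toy simulation applies the same number of fracture events under each rule. It is not the model used in the manuscript: the function names, the random choice of which colony fractures, and the eroded fragment size `chip` are all assumptions made for this example.

```python
import random


def binary_fission(sizes, n_events, rng):
    """Toy rule: each event splits one randomly chosen colony into two (near-)equal halves."""
    sizes = list(sizes)
    for _ in range(n_events):
        # Only colonies with at least 2 cells can split.
        candidates = [i for i, s in enumerate(sizes) if s >= 2]
        if not candidates:
            break
        i = rng.choice(candidates)
        s = sizes[i]
        half = s // 2
        sizes[i] = half
        sizes.append(s - half)
    return sizes


def erosion(sizes, n_events, rng, chip=1):
    """Toy rule: each event detaches a small fragment of `chip` cells from one colony."""
    sizes = list(sizes)
    for _ in range(n_events):
        # Only colonies larger than the fragment can be eroded.
        candidates = [i for i, s in enumerate(sizes) if s > chip]
        if not candidates:
            break
        i = rng.choice(candidates)
        sizes[i] -= chip
        sizes.append(chip)
    return sizes


if __name__ == "__main__":
    start = [1024]  # one large colony, size in cells (hypothetical value)
    print("erosion:", sorted(erosion(start, 3, random.Random(0))))
    print("fission:", sorted(binary_fission(start, 3, random.Random(0))))
```

      Under the erosion rule the parent colony shrinks slowly while a population of very small fragments appears, giving a bimodal distribution; under the fission rule the whole distribution shifts towards smaller sizes. Both rules conserve the total number of cells.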

      - The fourth major result of the manuscript is displayed in Equation 8 and Figure 5, where the authors derive an expression for the ratio between the rate of increase of a colony due to aggregation vs. the rate due to cell division. They then plot this line on a phase map, altering two physical parameters (concentration and fluid turbulence) to show under what conditions aggregation vs. cell division are more important for group formation. Because these results are derived from relatively simple biophysical considerations, they have the potential to be quite powerful and useful and represent a significant conceptual advance. However, there is a region of this phase map that the authors have left untested experimentally. The lowest energy dissipation rate that the authors tested in their experiment seemed to be \dot{epsilon}~1e-2 [m^2/s^3], and the highest particle concentration they tested was 5e-4, which means that the authors never tested Zone II of their phase map. Since this seems to be an important zone for toxic blooms (i.e. the "scum formation" zone), it seems the authors have missed an important opportunity to investigate this regime of high particle concentrations and relatively weak turbulent mixing.

      We agree with the reviewer that Zone (II) of Figure 5 is of great importance to dense bloom formation under wind mixing and that this parameter range was not covered by our experiments using a cone-and-plate shear flow. The measuring range of our device was motivated by engineering applications such as artificial mixing of eutrophic lakes using bubble plumes, as well as by preliminary experiments which demonstrated that high dissipation rates were required to achieve fragmentation. The dissipation rates of our cone-and-plate experiments capture Zones (III) and (IV) and the higher end of Zone (I). However, the cone-and-plate experiments are less suitable for the lower dissipation rates of Zone (II), as indicated by the red bars in Figure 5, due to the accumulation of colonies in stagnation points.

      Instead, in our revision we will discuss more extensively recent results from the literature that provide evidence of aggregation dominance in Zone (II). The experimental studies of Wu et al. (2019) and Wu et al. (2024) (full citations below) investigated the formation of Microcystis surface scum layers at high colony concentrations (high biovolume fraction) in wind-mixed mesocosms. These studies identified aggregation of colonies at rates faster than cell division, while the stable colony size decreased with mixing rate. The parameter range of these studies falls within Zone (II), and their experimental results agree with our model predictions. In the revised version we will include these references and a detailed discussion of the parameter range covered by our experiments and of the findings of other studies.

      Wu, X., Noss, C., Liu, L., & Lorke, A. (2019). Effects of small-scale turbulence at the air-water interface on Microcystis surface scum formation. Water Research, 167, 115091.

      Wu, H., Wu, X., Rovelli, L., & Lorke, A. (2024). Dynamics of Microcystis surface scum formation under different wind conditions: the role of hydrodynamic processes at the air-water interface. Frontiers in Plant Science, 15, 1370874.

      Other items that could use more clarity:

      - The authors rely heavily on size distributions to make the claims of their paper. Yet, how they generated those size distributions is not clearly shown in the text. Of primary concern, the authors used a correction function (Equation S1) to estimate the counts of different size classes in their image analysis pipeline. Yet, it is unclear how well this correction function actually performs, what kinds of errors it might produce, and how well it mapped to the calibration dataset the authors used to find the fit parameters.

      We agree with the reviewer that more details of the calibration processes should be included. We will include in the revised version of the SI more details of the calibration steps and direct comparison of raw and corrected histograms of the size distribution and its associated uncertainty.

      - Second, in their models they use a fractal dimension to estimate the number of cells in the group from the group radius, but the agreement between this fractal dimension fit and the data is not shown, so it is not clear how good an approximation this fractal dimension provides. This is especially important for their later derivation of the "aggregation-to-cell division" ratio (Equation 8)

      We agree with the reviewer that more details on the estimation of fractal dimension are needed. The revised version of the SI will include the estimation procedure, the number of colonies analysed, and the associated uncertainty.
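
      For readers unfamiliar with the concept, a fractal dimension D relates the number of cells N in a colony to its radius R through a power law, N ≈ k·R^D. The sketch below shows one common way to estimate D, an ordinary least-squares fit of log N against log R. It is an assumption for illustration only and is not necessarily the estimation procedure used in the manuscript; the function name and the synthetic data are hypothetical.

```python
import math


def fit_fractal_dimension(radii, cell_counts):
    """Least-squares fit of log(N) = log(k) + D * log(R); returns (D, k)."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(n) for n in cell_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # estimated fractal dimension D
    intercept = mean_y - slope * mean_x    # log of the prefactor k
    return slope, math.exp(intercept)


if __name__ == "__main__":
    # Synthetic, noise-free example data generated with D = 2.5 and k = 2.
    radii = [1.0, 2.0, 4.0, 8.0]
    counts = [2.0 * r ** 2.5 for r in radii]
    print(fit_fractal_dimension(radii, counts))
```

      With real measurements the scatter of the points around the fitted line gives a direct impression of how well the power law approximates the data.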

      Reviewer #2 (Public review)

      - Especially the introduction seems to imply that shear force is a very important parameter controlling colony formation. However, if one looks at the results this effect is overall rather modest, especially considering the shear forces that these bacterial colonies may experience in lakes. The main conclusion seems that not shear but bacterial adhesion is the most important factor in determining colony size. As the importance of adhesion had been described elsewhere, it is not clear what this study reveals about cyanobacterial colonies that was not known before.

      As we explain in the Introduction, it is a major open question whether cyanobacterial colonies are formed mainly by cell division (after which the dividing cells remain attached to each other by the EPS layer) or mainly by the aggregation of independent cells & colonies. See for example the highly cited review of Xiao & Reynolds 2018 (our ref 17), and references therein. This question has not been resolved and is investigated in our study. We would like to emphasize several key findings that our study reveals about the mechanical behaviour of cyanobacterial colonies under flow:

      (i) Quantification of mechanical strength in cyanobacterial colonies: Our results demonstrate the high mechanical strength of cyanobacterial colonies (much higher than previously thought in references 32 and 39 of the manuscript), as evidenced by the requirement of very high shear rates to achieve fragmentation. To this end, our study highlights their resilience against naturally occurring flows and bridges the gap between theoretical assumptions about colony strength and experimentally measured mechanical properties.

      (ii) Validation of a hypothesis regarding colony formation: Using a fluid-mechanical approach, we confirm the findings of recent genetic studies (references 25 and 64 of the manuscript) which indicated that colony formation of cyanobacteria under natural conditions occurs predominantly via cell division rather than via the aggregation of individual cells. Only in very dense blooms and surface scums does colony formation by the aggregation of smaller colonies likely play a role.

      (iii) Practical guidelines for cyanobacterial bloom control: Our findings provide valuable insights into the design of artificial mixing systems that are used to suppress surface blooms of buoyant cyanobacteria in lakes. In these lake applications, in which we have been involved, the aim of the mixing is to disperse the colonies over the water column so that they cannot form a surface layer (i.e., the mixing intensity should overcome the flotation velocity of the colonies), which takes away the competitive advantage of buoyant cyanobacteria over nonbuoyant phytoplankton species. However, it has always been an open question whether the high shear of artificial mixing would cause colony fragmentation. An understanding of changes in colony size is relevant for the design of artificial mixing, because smaller colonies have a lower flotation velocity. Our results show that the dissipation rates that are generated by artificial mixing are sufficient to prevent aggregation of large colonies, but not high enough to induce fragmentation of division-formed colonies.

      In the revised version of the manuscript, we will improve the writing to better clarify these three novel insights obtained from our study.

      - The agreement between model and experiments is impressive, but the role of the fit parameters in achieving this agreement needs to be further clarified.

      The influence of the fit parameters (namely the stickiness α1 and the pairs of colony strength parameters S1,q1,S2,q2) is discussed in the sections “DYNAMICAL CHANGES IN COLONY SIZE MODELED BY A TWO-CATEGORY DISTRIBUTION” and “MATERIALS AND METHODS.” We kept the discussion concise to maintain readability. However, we agree with the reviewer that additional details about the importance of the fit parameters and the sensitivity of the results to these parameters could be beneficial. In the revised version of the SI, we will include a more detailed discussion of the fit parameters.

      - The article may not be very accessible for readers with a biology background. Overall, the presentation of the material can be improved by better describing their new method.

      We apologize for the limited readability of the description of the experimental setup and model used. In the revised version of the manuscript, we aim to expand the description of the new methods presented here for a broader biology audience.

    1. Author response:

      We thank all the reviewers for their insightful comments on this work.

      Response to Reviewer #1:

      We greatly appreciate your comments on the general reliability and significance of our work. We fully agree that it would have been ideal to have additional evidence related to the role of PEBP1 in HRI activation. Unfortunately, we have not been able to find phospho-HRI antibodies that work reliably. The literature seems to agree with this as a band shift using total-HRI antibodies is usually used to study HRI activation. However, with the cell lines showing the most robust effect with PEBP1 knockout or knockdown, we are yet to convince ourselves with the band shifts we see. This could be addressed by optimizing phos-tag gels although these gels can be a bit tricky with complex samples such as cell lysates which contain many phosphoproteins.

      To address the interaction between PEBP1 and eIF2alpha more rigorously we were inspired by the insights you and reviewer #2 provided. While we are unable to do further experiments, we now think it would indeed be possible to do this using purified proteins and/or CETSA WB. These experiments could also provide further evidence for the role of PEBP1 phosphorylation. Although phosphorylation of PEBP1 at S153 has been implicated as being important for other functions of PEBP1, we are not sure about its role here. It may indeed have little relevance for ISR signalling.

      For the in vitro thermal shift assay, we have performed two independent experiments. While it appears that there is a slight destabilization of PEBP1 by oligomycin, the conclusion of this experiment remains incomplete: despite the apparent simplicity of the assay, the background fluorescence of oligomycin alone leaves room for alternative explanations. We now provide a lysate-based CETSA analysis which does not display the same PEBP1 stabilization as the intact cell experiment. As for the signal saturation in the ATF4-luciferase reporter assay, this is a valid point.

      Response to Reviewer #2:

      We strongly agree that CETSA has a lot of potential to inform us about cellular state changes and this was indeed the starting point for this project. We apologize for being (too) brief with the explanations of the TPP/MS-CETSA approach and we have now added a bit more detail. With regard to the cut-offs used for the mass spectrometry analysis, you are absolutely right that we did not establish a stringent cut-off that would show the specificity of each drug treatment. Our take on the data was that using the p values (and ignoring the fold-changes) of individual protein changes as in Fig 1D, we can see that mitochondrial perturbations display a coordinated response. We now realize that the downside of this representation is that it obscures the largest and specific drug effects. As mentioned in the response to Reviewer #1, we now also think that it would be possible to obtain more evidence for the potential interaction between PEBP1 and eIF2alpha using CETSA-based assays.

      Response to Reviewer #3:

      Thank you for your assessment, we agree that this manuscript would have been made much stronger by having clearer mechanistic insights. As mentioned in the responses to other reviewers above, we aim to address this limitation in part by looking at the putative interaction between PEBP1 and eIF2alpha with orthogonal approaches. However, we do realize that analysis of protein-protein interactions can be notoriously challenging due to false negative and false positive findings. As with any scientific endeavor, we will keep in mind alternative explanations to the observations, which could eventually provide that cohesive model explaining how precisely PEBP1, directly or indirectly, influences ISR signalling.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      The data overall are very solid, and I would only recommend the following minor changes: 

      (1) Line 187 and line 268: there is perhaps a trend towards slightly increased ATF4-luc reporter with PEBP1-S153D, but it is not statistically significant, so I would tone down the wording here. 

      We now modified this part to "This data is consistent with the modest increase…" .

      (2) The recently discovered SIFI complex (Haakonsen 2024, https://doi.org/10.1038/s41586-023-06985-7) regulates both HRI and DELE1 through bifunctional localization/degron motifs. It seems like PEBP1 also contains such a motif, which suggests a potential mechanism for enrichment near mitochondria, perhaps even in response to stress. Maybe the authors could further speculate on this in the discussion. 

      While working on the manuscript, we considered the possibility that PEBP1 function could be related to the SIFI complex and concluded that there is a critical difference: while SIFI specifically acts to turn off stress response signalling, loss of PEBP1 prevents eIF2alpha phosphorylation. We did not, however, consider that PEBP1 could have a localization/degron motif. Motif analysis by deepmito (busca.biocomp.unibo.it) and similar tools did not identify any conventional mitochondrial targeting signal, although we acknowledge that PEBP1 has a terminal alpha-helix which was identified for SIFI complex recognition. We are not sure why you think PEBP1 contains such a motif and therefore are hesitant to speculate on this further in the manuscript.

      (3) Line 358: references 50 and 45 are identical. 

      Thank you for spotting this. Corrected now. 

      (4) Figure S1D: it looks like Oligomycin has a significant background fluorescence, which makes interpretation of these graphs difficult - do you have measurements of the compound alone that can be used to subtract this background from the data? Based on the Tm I would say it does stabilize recombinant PEBP1, and there is no quantification of the variance across the 3 replicates to say there is no difference. 

      You are right, this assay is problematic due to the background fluorescence. Subtracting the measurements with oligomycin alone results in slightly negative values and nonsensical thermal shift curves. We now additionally show quantification from two different experiments (unfortunately we ran out of reagents for further experiments), and this quantification shows that, if anything, oligomycin causes mild destabilization of recombinant PEBP1. We also used a lysate CETSA assay, which does not show thermal stabilization of PEBP1 by oligomycin, ruling out a direct effect. We attempted to use ferrostatin-1 as a positive control as it may bind the PEBP1-ALOX protein complex, and it appeared to show marginal stabilization of PEBP1. 

      Reviewer #2 (Recommendations for the authors): 

      I have a few comments for the authors to address: 

      (1) The MS-CETSA experiment is quite briefly described and this could be expanded somewhat. Not clear if multiple biological replicates are used. Is there any cutoff in data analysis based on fold change size (which correlated to the significance of cellular effects), etc? As expected from only one early timepoint (see eg PMID: 38328090), there appear to be a limited number of significant shifts over the background (as judged from Figure S1A). In the Excel result file, however (if I read it right) there are large numbers of proteins that are assigned as stabilized or destabilized. This might be to mark the direction of potential shifts, but considering that most of these are likely not hits, this labeling could give a false impression. Could be good to revisit this and have a column for what could be considered significant hits, where a fold change cutoff could help in selecting the most biologically relevant hits. This would allow Figure 1D to be made crisper when it likely dramatically overestimates the overlap between significant CETSA shifts for these drugs.  

      Fair point; while we focused more on PEBP1, it is important to have a sufficient description of the methods. We used duplicate samples for the MS, which is probably the most important point that was absent from the original submission and is now added to the methods. We also added slightly more description of the data analysis. While the AID method does not explicitly use log2 fold changes, it does consider the relative abundance of proteins under different temperature fractions. Since the Tm (melting temperature) for each protein can be at any temperature, we felt that it would be complicated to compare fractions where the protein stability is changed the most, and even more so if we consider both significance and log2FC. Therefore, we used this multivariate approach, which indicates the proteins with the most likely changes across the range of temperatures. To acknowledge that most of the statistically significant changes are not much above the background, as you correctly pointed out, we now add to the main text that “However, most of these changes are relatively small. To focus our analysis on the most significant and biologically relevant changes…” We also agree that it may be confusing that the AID output reports de/stabilization direction for all proteins. In general, we are not big fans of cutoffs as these are always arbitrary, but with a multivariate p value of 0.1 it becomes clear that there are only a relatively small number of hits with larger changes. We have now added to the guide in the data sheet that "Primarily, use the adjusted p value of the log10 Multivariate normal pvalue for selecting the overall statistically significant hits (p<0.05 equals -1.30 or smaller; p<0.01 equals -2 or smaller)". We have also added to the guide part of the table that “Note that this prediction does not consider whether the change is significant or not, it only shows the direction of change.”
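
      The thresholds quoted in the guide follow directly from taking the base-10 logarithm of the p value: log10(0.05) ≈ -1.30 and log10(0.01) = -2, so smaller (more negative) values are more significant. A minimal sketch of how a reader might apply this rule is shown below; the function name and the example values are hypothetical and do not reflect the actual column layout of the data sheet.

```python
import math


def is_significant(log10_p, alpha=0.05):
    """Return True if a log10-transformed p value is at or below log10(alpha)."""
    return log10_p <= math.log10(alpha)


if __name__ == "__main__":
    # Hypothetical log10 p values for three proteins.
    for value in (-2.5, -1.5, -1.0):
        print(value, is_significant(value), is_significant(value, alpha=0.01))
```

      Note that this only selects statistically significant entries; as stated in the guide, the reported direction of stabilization or destabilization carries no significance information by itself.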

      (2) On page 4 the authors state "We reasoned that thermal stability of proteins might be particularly interesting in the context of mitochondrial metabolism as temperature-sensitive fluorescent probes suggest that mitochondrial temperature in metabolically active cells is close to 50 °C". I don't see the relevance of this statement as an argument for using TPP/CETSA. When this is also not further addressed in the work, it could be deleted.

      Deleted. We agree, while this is an interesting point, it is not that relevant in this paper. 

      (3) To exclude direct drug binding to PEBP1, a thermofluor experiment is performed (Fig S1D). However, the experiment gives a high background at the lower temperatures and it could be argued that this is due to the fluoroprobe binding to a hydrophobic pocket of the protein, and that oligomycin at higher concentrations competes with this binding, attenuating fluorescence. These are complex experiments and there could be other explanations, but the authors should address this. An alternative means to provide support for non-binding would be a lysate CETSA experiment, with very short (1-3 minutes) drug exposure before heating. This would typically give a shift when the protein is indicated to be CETSA responsive as in this case. 

      Agree. However, we don't have good means to perform the thermofluor experiments to rule out alternative explanations. What we can say is (as discussed above for reviewer #1, point 4) that quantification from two different experiments shows that oligomycin does not thermally stabilize recombinant PEBP1. To complement this conclusion, we used a lysate CETSA assay, which does not show thermal stabilization of PEBP1 by oligomycin. In this assay we attempted to use ferrostatin-1 as a positive control as it may bind the PEBP1-ALOX protein complex, and it appeared to show marginal stabilization of PEBP1. But since we lack a robust positive control for these assays, some doubt will inevitably remain.

      (4) The authors appear to have missed that there is already a MS-CETSA study in the literature on oligomycin, from Sun et al (PMID: 30925293). Although this data is from a different cell line and at a slightly longer drug treatment and is primarily used to access intracellular effects of decreased ATP levels induced by oligomycin, the authors should refer to this data and maybe address similarities if any.  

      Apologies for the oversight; the oligomycin data from this paper eluded us as it was mainly presented in the supplementary data. We compared the two datasets and found some overlap despite the differences in the experimental details. Both datasets share translational components (e.g. EIF6 and ribosomal proteins), but most notably our other top hit BANF1, which we mentioned in the main text, was also identified by Sun et al. We have updated the manuscript text as "Other proteins affected by oligomycin included BANF1, which binds DNA in an ATP dependent manner [16], and has also been identified as an oligomycin stabilized protein in a previous MS-CETSA experiment [23]", citing the Sun et al paper.

      (5) The confirmation of protein-protein interaction is notoriously prone to false positives. The authors need to use overexpression and a sensitive reporter to get positive data but collect additional data using mutants which provide further support. Typically, this would be enough to confirm an interaction in the literature, although some doubt easily lingers. When the authors already have a stringent in-cell interaction assay for PEBP1 in the CETSA thermal shift, it would be very elegant to also apply the CETSA WB assay to the overexpressed constructs and demonstrate differences in the response of oligomycin, including the mutants. I am not sure this is feasible but it should be straightforward to test. 

      This is a very good suggestion. Unfortunately, due to the time constraints of the graduate students (who must write up their thesis very soon), we are not able to perform and repeat such experiments to the level of confidence that we would like.

      (6) At places the story could be hard to follow, partly due to the frequent introduction of new compounds, with not always well-stated rationale. It could be useful to have a table also in the main manuscript with all the compounds used, with the rationale for their use stated. Although some of the cellular pathways addressed are shown in miniatures in figures, it could be useful to have an introduction figure for the known ISR pathways, at least in the supplement. There are also a number of typos to correct. 

      We agree that there are many compounds used. We have attempted to clarify their use by adding this information to the table of compounds in the methods, adding an overall schematic to Fig S1G, and adding a note on line 132: "(see Figure 1-figure supplement 1G for summary of drugs used to target PEBP1 and ISR in this manuscript)". We have also attempted to remove typos as far as possible.

      (7) EIF2a phosphorylation in S1E does not appear to be more significant for Sodium Arsenite argued to be a positive control, than CCCP, which is argued to be negative. Maybe enough with one positive control in this figure? 

      This experiment was used as a justification for our 30 min time point for the proteomics. By showing the 30 min and 4 h time points as Fig 1G and Figure 1-figure supplement 1F, our point was to demonstrate that the kinetics of phosphorylation and dephosphorylation are relevant. As you correctly pointed out, the stress response induced by sodium arsenite, and also by tunicamycin, is already attenuated at the 4 h time point. We prefer to keep all samples to facilitate comparisons.

      (8) Page 7 reference to Figure S2H, which doesn't exist. Should be S3H.  

      Apologies for the mistake, now corrected to Figure 2-figure supplement 1B.

      (9) Finally, although the TPP labeling of the method is used widely in the literature this is CETSA with MS detection and MS-CETSA is a better term. This is about thermal shifts of individual proteins which is a very well-established biophysical concept. In contrast, the term Thermal Proteome Profiling does not relate to any biophysical concept, or real cell biology concept, as far as I can see, and is a partly misguided term. 

      We changed the term TPP into MS-CETSA, but also include the term TPP in the introduction to facilitate finding this paper by people using the TPP term.

      Reviewer #3 (Recommendations for the authors): 

      Major Issues 

      (1) The one major issue of this work is the lack of a mechanism showing precisely how PEBP1 amplifies the mitochondrial integrated stress response. The work, as it is described, presents data suggesting PEBP1's role in the ISR but fails to present a more conclusive mechanism. The idea of mitochondrial stress causing PEBP1 to bind to eIF2a, amplifying ISR is somewhat vague. Thus, the lack of a more defined model considerably weakens the argument, as the data is largely corollary, showing KO and modulation of PEBP1 definitely has a unique effect on the ISR, however, it is not conclusive proof of what the authors claim. While KO of PEBP1 diminishes the phosphorylation of eIF2a, taken together with the binding to eIF2a, different pathways could be simultaneously activated, and it seems premature to surmise that PEBP1 is specific to mitochondrial stress. Could PEBP1 be reacting to decreased ATP? Release of a protein from the mitochondria in response to stress? Is PEBP1's primary role as a modulator of the ISR, or does it have a role in non-stress-related translation? A cohesive model would tie together these separate indirect findings and constitute a considerable discovery for the ISR field, and the mitochondrial stress field.  

      Thank you for your assessment, we agree that this manuscript would have been much stronger by having clearer mechanistic insights. As with any scientific endeavor, we will keep in mind alternative explanations to the observations, which could eventually provide that cohesive model explaining how precisely PEBP1, directly or indirectly, influences ISR signalling.

      (2) The data relies on the initial identification of PEBP1 thermal stabilization concomitant with mitochondrial ISR induction post-treatment of several small molecules. However, the experiment was performed using a single timepoint of 30 minutes. There was no specific rationale for the choice of this time point for the thermal proteome profiling. 

      The reasoning for this was explicitly stated:  "We reasoned that treating intact cells with the drugs for only 30 min would allow us to observe rapid and direct effects related to metabolic flux and/or signaling related to mitochondrial dysfunction in the absence of major changes in protein expression levels.”

      Minor Issues 

      (1) In Lines 163-166 the authors state "The cells from Pebp1 KO animals displayed reduced expression of common ISR genes (Figure 2F), despite upregulation of unfolded protein response genes Ern1 (Ire1α) and Atf6 genes. This gene expression data therefore suggests that Pebp1 knockout in vivo suppresses induction of the ISR". This statement should be reassessed. While an arm of the UPR does stimulate ISR, this arm is controlled by PERK, and canonically IRE1 and ATF6 do not typically activate the ISR, thus their upregulation is likely unrelated to ISR activation and does not contribute the evidence necessary for this statement. 

      Apologies for the confusion, we aimed to highlight that as there is an increase in the two UPR arms, it is more likely that ISR instead of UPR is reduced. We have now changed the statement to the following:

      "The cells from Pebp1 PEBP1 KO animals displayed reduced expression of common ISR genes (Figure 2F), while there was mild upregulation of the unfolded protein response genes Ern1 (Ire1α) and Atf6 genes. This gene expression data therefore suggests that the reduced expression of common ISR genes is less likely to be mediated by changes in PERK, the third UPR arm, and more likely due to suppression of ISR by Pebp1 knockout in vivo."

      (2) In Lines 169 and 170 the authors state "Western blotting indicated reduced phosphorylation of eIF2α in RPE1 cells lacking PEBP1, suggesting that PEBP1 is involved in regulating ISR signaling between mitochondria and eIF2α". This conclusion is not supported by evidence. A number of pathways could be activated in these knockout cells, and simply observing an increase in p-eIF2α after knocking out PEBP1 does not constitute an interaction, as correlation doesn't mean causation. This KO could indirectly affect the ISR, with PEBP1 having no role in the ISR. While taken together there is enough circumstantial evidence in the manuscript to suggest a role for PEBP1 in the ISR, statements such as these have to be revised so as not to overreach the conclusions that can be achieved from the data, especially with no discernible mechanism.  

      We have now revised this statement by removing the conclusion and stating only the observation:  "Western blotting indicated reduced phosphorylation of eIF2α in RPE1 cells lacking PEBP1 (Fig. 3A)."

    1. Author response:

      The following is the authors’ response to the original reviews.

      comprehensiveness and rigor of the study are notable. Rarely have I reviewed a manuscript reporting the results of so many orthogonal experiments, all of which support the authors' hypotheses, and of so many excellent controls.” Reviewer 2 commented: “They have elegantly demonstrated how some mutants alter each step of processing. Together with FLIM experiments, this study provides additional evidence to support their 'stalled complex hypotheses'….This is a beautiful biochemical work. The approach is comprehensive.”

      Below we respond to the relatively minor concerns of Reviewer 2, which may be included with the first version of the Reviewed Preprint.

      Reviewer 2:

      (1) It appears that the purified γ-secretase complex generates the same amount of Aβ40 and Aβ42, which is quite different in cellular and biochemical studies. Is there any explanation for this?  

      Roughly equal production of Aβ40 and Aβ42 is a phenomenon seen with purified enzyme assays, and the reason for this has not been identified. However, we suggest that what is meaningful in our studies is the relative difference between the effects of FAD-mutant vs. WT PSEN1 on each proteolytic processing step. All FAD mutations are deficient in multiple cleavage steps in γ-secretase processing of APP substrate, and these deficiencies correlate with stabilization of E-S complexes.

      (2) It has been reported the Aβ production lines from Aβ49 and Aβ48 can be crossed with various combinations (PMID: 23291095 and PMID: 38843321). How does the production line crossing impact the interpretation of this work?  

      In the cited reports, such crossover was observed when using synthetic Aβ intermediates as substrate. In PMID: 23291095 (Okochi M et al, Cell Rep, 2013), Aβ43 is primarily converted to Aβ40, but also to some extent to Aβ38. In PMID: 38843321 (Guo X et al, Science, 2024), Aβ48 is ultimately converted to Aβ42, but also to a minor degree to Aβ40. We have likewise reported such product line “crossover” with synthetic Aβ intermediates (PMID: 25239621; Fernandez MA et al, JBC, 2014). However, when using APP C99-based substrate, we did not detect any noncanonical tri- and tetrapeptide co-products of Aβ trimming events in the LC-MS/MS analyses (PMID: 33450230; Devkota S et al, JBC, 2021). In the original report on identification of the small peptide coproducts for C99 processing by γ-secretase using LC-MS/MS (PMID: 19828817; Takami M et al, J Neurosci, 2009), only very low levels of noncanonical peptides were observed. In the present study, we did not search for such noncanonical trimming coproducts, so we cannot rule out some degree of product line crossover.

      (3) In Figure 5, did the authors look at the protein levels of PS1 mutations and C99-720, as well as secreted Aβ species? Do the different amounts of PS1 full-length and PS1-NTF/CTF influence FLIM results?

      FLIM results depend on the degree to which C99 and long Aβ intermediates are bound to γ-secretase compared to unbound C99 and Aβ. The 6E10-Alexa 488 lifetime is significantly decreased by FAD mutations compared to WT PSEN1 (Fig. 5). However, the observed decrease in lifetime with the PSEN1 FAD mutants might also be due to lower levels of C99-720 expression or higher levels of PSEN1 CTF (i.e., mature γ-secretase complexes). We checked the C99-720 fluorescence intensities in the FLIM experiments and found that C99-720 intensities are not significantly different between cells transfected with WT and those with FAD PSEN1. Furthermore, Western blot analysis shows that, in cells expressing FAD PSEN1 compared to those expressing WT PSEN1, levels of C99-720 are not significantly lower and levels of PSEN1 CTF are not higher. Although PSEN1 CTF levels trend low for PSEN1 F386S, this mutant resulted in decreased FLIM only in Aβ-rich regions. Thus, the reduced FLIM apparently reflects effects of FAD mutation on E-S complex stability. Levels of full-length PSEN1 were also determined and found not to correlate with FLIM effects, although full-length PSEN1 represents protein not incorporated into fully active γ-secretase complexes and therefore does not interact with C99-720.

      (4) It is interesting that both Aβ40 and Aβ42 ELISA kits detect Aβ43. Have the authors tested other kits on the market? It might change the interpretation of some published work.

      We have not tested other ELISA kits. Considering our findings, it would be a good idea for other investigators to test whatever ELISAs they use for specificity vis-à-vis Aβ43.

    1. Author response:

      Reviewer #2 (Public Review): 

      Comment 1: In terms of the biological significance of this interaction, it would be good to examine (via co-immunoprecipitation) whether the CEP89/NCS-1/C3ORF14 interaction takes place upon serum starvation. Does the complex change? 

      NCS1 centriolar localization requires CEP89, as no NCS1 localization was observed in CEP89 knockout cells (Figure 2L; Figure 2-figure supplement 2B). Both CEP89 and NCS1 centriolar localization were observed (Figure 2C; Figure 1D of PMID: 36711481) in cells grown in serum-containing media, although their localization was further enhanced in serum-starved cells. From these results, we predict that CEP89 and NCS1 can interact and colocalize in both serum-fed and serum-depleted conditions. We think it may not be easy to assess the change in interaction with the co-immunoprecipitation assay, as interactions occur in a test tube, which may not reflect the binding conditions inside the cells.

      Comment 2: Also, for the subdistal appendage localization of NCS-1 and C3ORF14, would this also change upon serum starvation? 

      We agree that it would be interesting to see whether the subdistal appendage localization changes upon serum starvation, as NCS1 may capture the ciliary vesicle at the subdistal appendages as we discussed. However, the loss of the subdistal appendage protein, CEP128, blocks subdistal appendage localization of CEP89 [PMID: 32242819] without affecting cilium formation [PMID: 27818179]. This suggests that the subdistal appendage localization of NCS1 or C3ORF14 is likely dispensable for cilium formation.

      Comment 3: For the ciliation results and the recruitment of IFT88 in CEP89 knockout cell lines, this contradicts previous work from Tanos et al (PMID: 23348840), as well as Hou et al (PMID: 36669498). A parallel comparison using siRNA, a transient knockout system, or a degron system would help understand this. A similar point goes for Figure 4, where the effect on ciliogenesis is minimal in knockout cells, but acute siRNA has been shown to have a stronger phenotype. 

      Hou et al. [PMID: 36669498] investigated the role of the distal appendage proteins CEP164, CEP89, and FBF1 in the ciliated chordotonal organ of Drosophila melanogaster by generating knockout Drosophila strains. The results were markedly different from what was observed in mammalian cells. Notably, CEP164 is not required for cilium formation, and CEP89 is required for FBF1 localization in the animal. CEP89 was required for cilium formation in the cells of the ciliated chordotonal organ, where cilium formation is dependent on IFT machinery. They did not show whether IFT centriolar recruitment is affected in the CEP89 mutant cells. These differences likely reflect the divergence of the organization of the distal appendage during evolution.

      The ciliation phenotype of our CEP89 knockout cells is milder than what was shown in Tanos et al [PMID: 23348840], but largely consistent with the results from the Bornens group, which used siRNA to deplete CEP89 [PMID: 23789104]. Besides, NCS1 knockout cells showed a very similar phenotype to the CEP89 knockout cells, and relatively acute deletion of NCS1 (14 days after infection with the lentivirus containing sgNCS1, without single-cell cloning) displayed an almost identical ciliation defect (Figure 4B-C). Thus, we believe CEP89 is only partially required for cilium formation in RPE-hTERT cells and that the differences are more technical than definitive.

      Comment 4: An elegant phenotype rescue is shown in Figure 5. An interesting question would be, how does this mutant and/or the myristoylation affect the recruitment of C3ORF14? 

      NCS1 is not required for the localization of C3ORF14 (Figure 2M; Figure 2-figure supplement 2C), so we can assume that the myristoylation-defective mutant does not affect C3ORF14 recruitment.

      Comment 5: For the EF-hand mutants, it would be good to use control mutants, from known Ca2+ binding proteins as a control for the experiment shown. 

      In Figure 5-figure supplement 1A-C, we generated a series of EF-hand mutants of NCS1 to see if calcium binding affects the CEP89 interaction, NCS1 localization, and cilium formation. NCS1 is the only protein among the calcium-binding NCS family proteins that was found as a positive hit in the mass spec data of CEP89 tandem affinity purification. Therefore, we cannot use other NCS family proteins as a control for CEP89 binding, NCS1 localization, and cilium formation.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Using a knock-out mutant strain, the authors tried to decipher the role of the last gene in the mycofactocin operon, mftG. They found that MftG was essential for growth in the presence of ethanol as the sole carbon source, but not for the metabolism of ethanol, evidenced by the equal production of acetaldehyde in the mutant and wild type strains when grown with ethanol (Fig 3). The phenotypic characterization of ΔmftG cells revealed a growth-arrest phenotype in ethanol, reminiscent of starvation conditions (Fig 4). Investigation of cofactor metabolism revealed that MftG was not required to maintain redox balance via NADH/NAD+, but was important for energy production (ATP) in ethanol. Since mycobacteria cannot grow via substrate-level phosphorylation alone, this pointed to a role of MftG in respiration during ethanol metabolism. The accumulation of reduced mycofactocin points to impaired cofactor cycling in the absence of MftG, which would impact the availability of reducing equivalents to feed into the electron transport chain for respiration (Fig 5). This was confirmed when looking at oxygen consumption in membrane preparations from the mutant and wild type strains with reduced mycofactocin electron donors (Fig 7). The transcriptional analysis supported the starvation phenotype, as well as perturbations in energy metabolism, and may be beneficial if described prior to respiratory activity data.

      The data and conclusions support the role of MftG in ethanol metabolism.

      We thank the reviewer for the positive evaluation of our manuscript.

      Reviewer #3 (Public review):

      Summary:

      The work by Graca et al. describes a GMC flavoprotein dehydrogenase (MftG) in the ethanol metabolism of mycobacteria and provides evidence that it shuttles electrons from the mycofactocin redox cofactor to the electron transport chain.

      Strengths:

      Overall, this study is compelling, exceptionally well designed and thoroughly conducted. An impressively diverse set of different experimental approaches is combined to pin down the role of this enzyme and scrutinize the effects of its presence or absence in mycobacteria cells growing on ethanol and other substrates. Other strengths of this work are the clear writing style and stellar data presentation in the figures, which makes it easy also for non-experts to follow the logic of the paper. Overall, this work therefore closes an important gap in our understanding of ethanol oxidation in mycobacteria, with possible implications for the future treatment of bacterial infections.

      Weaknesses:

      I see no major weaknesses of this work, which in my opinion leaves no doubt about the role of MftG.

      We thank the reviewer for the positive evaluation of our manuscript.

      Reviewer #4 (Public review):

      Summary:

      The manuscript by Graça et al. explores the role of MftG in the ethanol metabolism of mycobacteria. The authors hypothesise that MftG functions as a mycofactocin dehydrogenase, regenerating mycofactocin by shuttling electrons to the respiratory chain of mycobacteria. Although the study primarily uses M. smegmatis as a model microorganism, the findings have more general implications for understanding mycobacterial metabolism. Identifying the specific partner to which MftG transfers its electrons within the respiratory chain of mycobacteria would be an important next step, as pointed out by the authors.

      Strengths:

      The authors have used a wide range of tools to support their hypothesis, including co-occurrence analyses, gene knockout and complementation experiments, as well as biochemical assays and transcriptomics studies.

      An interesting observation that the mftG deletion mutant grown on ethanol as the sole carbon source exhibited a growth defect resembling a starvation phenotype.

      MftG was shown to catalyse the electron transfer from mycofactocinol to components of the respiratory chain, highlighting the flexibility and complexity of mycobacterial redox metabolism.

      Weaknesses:

      Could the authors elaborate more on the differences between the WT strains in Fig. 3C and 3E? In Fig. 3C, the ethanol concentration for the WT strain is similar to that of WT-mftG and ∆mftG-mftG, whereas the acetate concentration in the WT strain differs significantly from the other two strains. How does this observation relate to ethanol oxidation, as indicated on page 12?

      This is a good question, and we agree with the reviewer that the sum of processes leading to the experimental observations shown in Figure 3 is not completely understood. For instance, when looking at ethanol concentrations, evaporation is a dominating effect, and the situation is furthermore confounded by the fact that the rate of ethanol evaporation appears to be inversely correlated to the optical density of the samples (see Figure 3E and compare media control as well as the samples of ΔmftG and ΔmftG at OD<sub>600</sub> = 1). Additionally, the growth rate and thus the OD<sub>600</sub> of all strains monitored are different at each time point, thus further complicating the analysis. This is why we assume that the rate of ethanol oxidation is mirrored more clearly by acetate formation, at least in the early phase before 48 h (Figure 3E), i.e., before acetate consumption becomes dominant in ΔmftG-mftG and WT-mftG. Here, we see that the rate of acetate formation is zero for media controls, low for ΔmftG, but high for WT as well as ΔmftG-mftG and WT-mftG. The latter two strains also showed an earlier starting point of growth as well as acetate formation and the following phase of acetate depletion.

      All of these observations are in line with our general statement, i.e., “Parallel to the accelerated and enhanced growth described above (Figure 3A), the overexpression strains displayed higher rates of ethanol consumption as well as an earlier onset of acetate overflow metabolism and acetate consumption (Figure 3D).” We are still convinced that this summary describes the findings well and avoids unnecessary speculation.

      The authors conclude from their functional assays that MftG catalyses single-turnover reactions, likely using FAD present in the active site as an electron acceptor. While this is plausible, the current experimental setup doesn't fully support this conclusion, and the language around this claim should be softened.

      This is a fair point. We revised our claim accordingly. In particular, we changed:

      Page 28: we added “possibly”

      Page 28: we changed “single-turnover reactions” to “reactions reminiscent of a single-turnover process”.

      The authors suggest in the manuscript that the quinone pool (page 24) may act as the electron acceptor from mycofactocinol, but later in the discussion section (page 30) they propose cytochromes as the potential recipients. If the authors consider both possibilities valid, I suggest discussing both options in the manuscript.

      This is true. However, no change to the manuscript is necessary, since both options were discussed on page 30.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors addressing some of the original recommendations is appreciated e.g. title change. Other recommendations that were not adequately addressed would mostly improve the clarity and help comprehension for the reader, but they are at the author's discretion.

      Reviewer #3 (Recommendations for the authors):

      Abstract: "Here, we show that MftG enzymes strictly require mft biosynthetic genes and are found in 75% of organisms harboring these genes". I read this sentence several times and I am still somewhat confused and not sure what exactly is meant here. I suggest to rephrase, e.g., to "Here, we show that in 75% of all organisms that harbour the mft biosynthetic genes, MftG enzymes are also encoded and functionally associated with these genes" (if that was meant; also the abbreviation mft should be introduced in the abstract or otherwise the full name be used).

      We thank the reviewer for the good hint. We changed the sentence to “Here, we show that MftG enzymes are almost exclusively found in genomes containing mycofactocin biosynthetic genes and are present in 75% of organisms harboring these genes”.

      p.3, 2nd paragraph: "Although the role of MFT in alcohol metabolism is well established, further biological roles of mycofactocin appear to exist." Mycofactocin is once written as MFT and once in full length, which is slightly confusing. Consider rephrasing, e.g., to "...further biological roles of this cofactor appear to exist".

      Thank you, we adopted the suggested change.

      Fig. 1: Consider adding MftG in brackets after "mycofactocin dehydrogenase" in panel B.

      Good suggestion. We added (MftG) to the figure.

      Fig. 3: Legend should be corrected. The color of the signs should be teal diamond for "M. smegmatis double presence of the mftG gene" and orange upward facing triangle for "Medium with 10 g L-1 of ethanol without bacterial inoculation". Aside from the coloration, the order should ideally also be identical to the one shown in the upper right part.

      Thank you for the valuable hint! We corrected the legend and unified the legends in the figure caption and figure.

      p.20 : It is not exactly clear to me why "semipurified cell-free extracts from M. smegmatis ∆mftG-mftGHis6 " were used here rather than the purified enzyme. Was the purification by HisTrap columns not feasible or was the protein unstable when fully purified? In any case, it would help the reader to quickly state the reason in this section.

      Indeed, the problem with M. smegmatis as an expression host was a combination of low protein yield and poor binding to Ni-NTA columns. In E. coli, poor expression, low solubility, or poor binding was the issue. Unfortunately, the use of other affinity tags resulted in either poor expression or inactive protein. We have briefly mentioned the major issues on page 21 and prefer not to focus on failed attempts too much.

      p. 21: "We, therefore, concluded that MftG can indeed interact with mycofactocins as electron donors but might require complex electron acceptors, for instance, proteins present in the respiratory chain." I agree. For the future it might be worthwhile to determine the redox potential of MftG, which could provide hints on the natural electron acceptor.

      Thank you for the suggestion. We will consider this question in our future work.

      p. 23: "In M. smegmatis, cyanide is a known inhibitor of the cytochrome bc/aa3 but not of cytochrome bd (34), therefore, the decrease of oxygen consumption when MFTs were added to the membrane fractions in combination with KCN (Figure 7), revealed that MFT-induced oxygen consumption is indeed linked to mycobacterial respiration." It might be a good idea to quickly recapitulate the functions of these cytochromes here. Also, I think it should read "bc1aa3" (also correct in legend of Fig. 8 that says "bcc-aa3").

      Thank you for the good observation. We changed all instances to the correct designation (bc1-aa3).

      Reviewer #4 (Recommendations for the authors):

      Abstract: revise the wording "MftG enzymes strictly require mft biosynthetic genes". It should be either mftG gene with the mft biosynthetic genes or MftG enzyme with the Mft biosynthetic proteins. I also suggest replacing "require" with a more appropriate term.

      This was taken care of. See above.

      Page 3, end of the first paragraph; does the alcohol dehydrogenase refer to Mno/Mdo?

      Partially, yes, but also to other alcohol dehydrogenases.

      Page 4, radical SAM; define upon first use

      Good point, we changed “radical SAM” to “radical S-adenosyl methionine (rSAM)”.

      Page 6; Rossman fold refers to the fold and not only the FAD binding pocket.

      Good point. We deleted “(Rossman fold)”

      Page 11; not exactly sure what this means "the growth curve of the complemented strain, which could be dysregulated in mftG expression"

      By “dysregulated” expression, we mean that the expression of mftG could be higher or lower than in the WT and could follow different regulatory signals than in the wild type. Since this phenomenon is not well understood, we would like to avoid speculative discussions.

      Page 11; Figures 2E and 2C should be 3E and 3C. Likewise on page 12 Figure 2D.

      Thank you very much for the valuable hint. We corrected the figure numbers as suggested.

      Page 12; the last Figure 3D in the page should be 3E?

      Yes, good catch, we corrected the Figure number.

      Page 17, KO; define upon first use.

      Good suggestion, we changed both instances of “KO” to “knockout”

      Page 24; revise: "for instance. For example"

      We deleted “for instance”.

      Page 26; change 6.506 to 6,506

      Corrected.

      Page 23; "In M. smegmatis, cyanide is a known inhibitor ..." is too long and not easy to understand/follow.

      Good suggestion. We simplified the sentence to “Therefore, the decrease of oxygen consumption in the presence of KCN (Figure 7) revealed…”

      Page 29; "single-turnover reactions could be observed". There are no experiments to support this statement, except the results shown in Figure 7F. I suggest softening the language, as it has been done on page 21. To claim single-turnover, a proper kinetic analysis would be necessary, which is not included in the current manuscript.

      This is true and has been taken care of. See above.

      Figure 1; Indicate mycofactocin dehydrogenase as MftG

      Done.

      Figure 5A; what is the significance of comparing ∆mftG glucose with WT ethanol?

      We agree that, although the difference between the two columns is significant, it does not have any relevant meaning. Therefore, we removed the bracket with the p-value in Panel A.

      Make HdB-Tyl/HdB-tyloxapol usage consistent throughout the document. Likewise, re the usage of mycobacteria/Mycobacteria/Mycobacteria

      Thank you for the valuable hint, we unified the usage throughout the document

    1. Author response:

      Reviewer #1:

      Summary:

      Beyond what is stated in the title of this paper, not much needs to be summarized. eIF2A in HeLa cells promotes translation initiation of neither the main ORFs nor short uORFs under any of the conditions tested.

      Strengths:

      Very comprehensive, in fact, given the huge amount of purely negative data, an admirably comprehensive and well-executed analysis of the factor of interest.

      Weaknesses:

      The study is limited to the HeLa cell line, focusing primarily on KO of eIF2A and neglecting the opposite scenario, higher eIF2A expression which could potentially result in an increase in non-canonical initiation events.

      We thank the reviewer for the positive evaluation. As suggested by the reviewer in the detailed recommendations, we will clarify in the title, abstract and text that our conclusions are limited to HeLa cells. Furthermore, as suggested we will test the effect of eIF2A overexpression on the luciferase reporter constructs, and will upload a revised manuscript.

      Reviewer #2:

      Summary

      Roiuk et al describe a work in which they have investigated the role of eIF2A in translation initiation in mammals without much success. Thus, the manuscript focuses on negative results. Further, the results, while original, are generally not novel, but confirmatory, since related claims have been made before independently in different systems, with the Gaikwad et al study recently published in eLife being the most relevant.

      Despite this, we find this work highly important. This is because of a massive wealth of unreliable information and speculation regarding eIF2A's role in translation, arising from a series of artifacts that began at the moment of eIF2A's discovery. This, in combination with its unfortunate naming (eIF2A is often mixed up with the alpha subunit of eIF2, eIF2S1), has generated widespread confusion among researchers who are not experts in eukaryotic translation initiation. Given this, it is not only justifiable but critical to make independent efforts to clear up this confusion, and I very much appreciate the authors' efforts in this regard.

      Strengths

      The experimental investigation described in this manuscript is thorough, appropriate and convincing.

      Weaknesses

      However, we are not entirely satisfied with the presentation of this work which we think should be improved.

      We thank the reviewer for the positive evaluation. We will revise the manuscript according to the reviewer's suggestions made in the detailed recommendations.

      Reviewer #3:

      Summary:

      This is a valuable study providing solid evidence that the putative non-canonical initiation factor eIF2A has little or no role in the translation of any expressed mRNAs in cultured human (primarily HeLa) cells. Previous studies have implicated eIF2A in GTP-independent recruitment of initiator tRNA to the small (40S) ribosomal subunit, a function analogous to canonical initiation factor eIF2, and in supporting initiation on mRNAs that do not require scanning to select the AUG codon or that contain near-cognate start codons, especially upstream ORFs with non-AUG start codons, and may use the cognate elongator tRNA for initiation. Moreover, the detected functions for eIF2A were limited to, or enhanced by, stress conditions where canonical eIF2 is phosphorylated and inactivated, suggesting that eIF2A provides a back-up function for eIF2 in such stress conditions. CRISPR gene editing was used to construct two different knock-out cell lines that were compared to the parental cell line in a large battery of assays for bulk or gene-specific translation in both unstressed conditions and when cells were treated with inhibitors that induce eIF2 phosphorylation. None of these assays identified any effects of eIF2A KO on translation in unstressed or stressed cells, indicating little or no role for eIF2A as a back-up to eIF2 and in translation initiation at near-cognate start codons, in these cultured cells.

      The study is very thorough and generally well executed, examining bulk translation by puromycin labeling and polysome analysis and translational efficiencies of all expressed mRNAs by ribosome profiling, with extensive utilization of reporters equipped with the 5'UTRs of many different native transcripts to follow up on the limited number of genes whose transcripts showed significant differences in translational efficiencies (TEs) in the profiling experiments. They also looked for differences in translation of uORFs in the profiling data and examined reporters of uORF-containing mRNAs known to be translationally regulated by their uORFs in response to stress, going so far as to monitor peptide production from a uORF itself. The high precision and reproducibility of the replicate measurements instil strong confidence that the myriad of negative results they obtained reflects the lack of eIF2A function in these cells rather than data that would be too noisy to detect small effects of the eIF2A mutations. They also tested and found no evidence for a recent claim that eIF2A localizes to the cytoplasm in stress and exerts a global inhibition of translation. Given the numerous papers that have been published reporting functions of eIF2A in specific and general translational control, this study is important in providing abundant, high-quality data to the contrary, at least in these cultured cells.

      Strengths:

      The paper employed two CRISPR knock-out cell lines and subjected them to a combination of high-quality ribosome profiling experiments, interrogating both main coding sequences and uORFs throughout the translatome, which was complemented by extensive reporter analysis, and cell imaging in cells both unstressed and subjected to conditions of eIF2 phosphorylation, all in an effort to test previous conclusions about eIF2A functioning as an alternative to eIF2.

      Weaknesses:

      There is some question about whether their induction of eIF2 phosphorylation using tunicamycin was extensive enough to state forcefully that eIF2A has little or no role in the translatome when eIF2 function is strongly impaired. Also, similar conclusions regarding the minimal role of eIF2A were reached previously for a different human cell line from a study that also enlisted ribosome profiling under conditions of extensive eIF2 phosphorylation; although that study lacked the extensive use of reporters to confirm or refute the identification by ribosome profiling of a small group of mRNAs regulated by eIF2A during stress.

      We thank the reviewer for the positive evaluation. We will revise the manuscript according to the detailed recommendations. Regarding the two points mentioned here:

      (1) The reason eIF2alpha phosphorylation does not increase appreciably is that, unfortunately, the antibody is very poor. The fact that the Integrated Stress Response (ISR) is induced by our treatment can be seen, for instance, by the fact that ATF4 protein levels increase strongly (in the very same samples where eIF2alpha phosphorylation does not increase much, in Suppl. Fig. 5E). We will strengthen the conclusion that the ISR is indeed activated with additional experiments/data as suggested by the reviewer.

      (2) We agree that our results are in line with results from the previous study mentioned by the reviewer, so we will revise the manuscript to mention this other study more extensively in the discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Review:

      The overall goal of this manuscript is to understand how Notch signaling is activated in specific regions of the endocardium, including the OFT and AVC, that undergo EMT to form the endocardial cushions. Using dofetilide to transiently block circulation in E9.5 mice, the authors show that Notch receptor cleavage still occurs in the valve-forming regions due to mechanical shear stress as Notch ligand expression and oxygen levels are unaffected. The authors go on to show that changes in lipid membrane structure activate mTOR signaling, which causes phosphorylation of PKC and Notch receptor cleavage.

      The strengths of the manuscript include the dual pharmacological and genetic approaches to block blood flow in the mouse, the inclusion of many controls including those for hypoxia, the quality of the imaging, and the clarity of the text. However, several weaknesses were noted surrounding the main claims where the supporting data are incomplete.

      PKC - Notch1 activation:

      (1) Does deletion of Prkce and Prkch affect blood flow, and if so, might that be suppressing Notch1 activation indirectly?

      To address this concern, we performed echocardiography of Prkce<sup>+/-</sup>;Prkch<sup>+/-</sup>, Prkce<sup>-/-</sup>;Prkch<sup>+/-</sup>, and Prkce<sup>+/-</sup>;Prkch<sup>-/-</sup> mouse hearts (Figure 3-supplement figure 2D), which showed no significant effect on heartbeat or blood flow. (Line 308)

      (2) It would be helpful to visualize the expression of prkce and prkch by in situ hybridization in E9.5 embryos.

      We have now added immunofluorescence staining results for both PKCE and PKCH, as shown in Figure 3-supplement figure 2B. In the E9.5 embryonic heart, PKCH is mainly expressed in the endocardium overlying the AV canal and the base of the trabeculae, overlapping with the expression pattern of NICD and pPKC<sup>Ser660</sup>. PKCE is expressed in both endocardium and myocardium. In the endocardium, PKCE is mainly expressed in the region overlying the AV canal. (Line 312-314)

      (2) PMA experiments: Line 223-224: A major concern is related to the conclusion that "blood flow activates Notch in the cushion endocardium via the mTORC2-PKC signaling pathway". To make that claim, the authors show that a pharmacological activation with a potent PKC activator, PMA, rescues NICD levels in the AVC in dofetilide-treated embryos. This claim would also need proof that a lack of blood flow alters the activity of mTORC2 to phosphorylate the targets of PKC phosphorylation. Also, this observation does not explain the link between PKC activity and Notch activation.

      Both AKT Ser473 and PKC Ser660 are well-characterized phosphorylation sites regulated by mTORC2 (Baffi TR et al., mTORC2 controls the activity of PKC and Akt by phosphorylating a conserved TOR interaction motif. Sci Signal. 2021;14.). pAKT<sup>Ser473</sup> is widely used as an indicator of mTORC2 activity. Therefore, the reduced staining intensity of pAKT<sup>Ser473</sup> and pPKC<sup>Ser660</sup> observed in the dofetilide-treated embryos should reflect the reduced activity of their common upstream activator mTORC2. This information is provided in Line 317-321.

      As PMA is a well-characterized specific activator of PKC, we believe the rescue of NICD by PMA could explain the link between PKC activity and Notch activation.

      (3) In addition, the authors hypothesise that shear stress lies upstream of PKC and Notch activation, and that because shear stress is highest at the valve-forming regions, PKC and Notch activity is localised to the valve-forming regions. Since PMA treatment affects the entire endocardium which expresses Notch1, NICD should be seen in areas outside of the AVC in the PMA+dofetilide condition. Please clarify.

      As shown in Figure 3C and Figure 3-supplement figure 2B, pPKC, PKCH and PKCE expression are all confined to the AVC region. This explains why PMA activates NICD specifically in the valve-forming region. This information is added in Line 312-314.

      Lipid Membrane:

      (1) It is not clear how the authors think that the addition of cholesterol changes the lipid membrane structure or alters Cav-1 distribution. Can this be addressed? Does adding cholesterol make the membrane more stiff? Does increased stiffness result from higher shear stress?

      We do not know exactly how the addition of cholesterol alters membrane structure and influences mTORC2-PKC-Notch signaling. As cholesterol is an important component of lipid rafts and caveolae, it is possible that cholesterol enrichment alters the membrane structure so that the lipid raft structure becomes less dependent on shear stress. This hypothesis needs to be tested in further in vitro studies. This information is added to Line 433-436.

      (2) The loss of blood flow apparently affects Cav1 membrane localization and causes a redistribution from the luminal compartment to lateral cell adhesion sites. Cholesterol treatment of dofetilide-treated hearts (lacking blood flow) rescued Cav1 localization to luminal membrane microdomains and rescued NICD expression. It remains unclear how the general addition of cholesterol would result in a rescue of regionalized membrane distribution within the AVC and in high-shear stress areas.

      We do not know the exact mechanism. As stated in our reply to the previous question, future cell-based work is needed to address these important questions. (Line 433-436)

      (3) The authors do not show the entire heart in that rescue treatment condition (cholesterol in dofetilide-treated hearts). Also, there is no quantification of that rescue in Figure 4B. Currently, only overview images of the heart are shown but high-resolution images on a subcellular scale (such as electron microscopy) are needed to resolve and show membrane microdomains of caveolae with Cav1 distribution. This is important because Cav-1 could have functions independent of caveolae.

      In Figure 4C, most panels display a large part of the heart, including the AVC, atrium and ventricle. The images in the third column appeared to be more restricted to the AVC. We have now replaced these images to reveal the AVC and part of the atrium and ventricle.

      The quantification has also been provided in Figure 4C. We also added a new panel of scanning EM of AVC endocardium, showing numerous membrane invaginations on the luminal surface of the endocardial cells. The size of the invaginations ranges from 50 to 100 nm, consistent with the reported size of caveolae. Dofetilide significantly reduced the number of membrane invaginations, which recovered after restoration of blood flow at 5 hours post dofetilide treatment. The reduction of membrane invaginations could also be rescued by ex vivo cholesterol treatment. This information is added to Line 342-349.

      Figure Legends, missing data, and clarity:

      (1) The number of embryos used in each experiment is not clear in the text or figure legends. In general, figure legends are incomplete (for instance in Figure 1).

      Thank you for the reminder. We have now added the numbers of embryos to the figure legends.

      (2) Line 204: The authors refer to unpublished endocardial RNAseq data from E9.5 embryos. These data must be provided with this manuscript if it is referred to in any way in the text.

      The RNAseq data for PKC isoforms are now provided in Figure 3-Figure supplement 2A. (Line 301-302)

      (3) Figure 1 shows Dll4 transcript levels, which do not necessarily correlate with protein levels. It would be important to show quantifications of these patterns as Notch/Dll4 levels are cycling and may vary with time and between different hearts.

      The Dll4 immunostaining in Figure 1B,C does show Dll4 protein, not transcript. The quantification is added in Figure 1-Figure supplement 1C. (Line 215)

      (4) Line 212-214: The authors describe cardiac cushion defects due to the loss of blood flow and refer to some quantifications that are not completely shown in Figure 3. For instance, quantifications for cushion cellularity and cardiac defects at three hours (after the start of treatment?) are missing.

      The formation of the defects is a time-dependent developmental process. To address this concern, we quantified cushion cellularity at 5 hours post dofetilide treatment and showed that cell density was significantly decreased in the dofetilide-treated embryos, although the difference was less pronounced than at E10.5. (Line 256-257)

      (5) Related to Figure 5. The work would be strengthened by quantification of the effects of dofetilide and verapamil on heartbeat at the doses applied. Is the verapamil dosage used here similar to the dose used in the clinic?

      We are grateful for this suggestion. The effect of dofetilide on heartbeat has already been shown in Figure 2A. We have now additionally measured the heartbeat rate of verapamil-treated embryos and provided the results in Figure 5E. For verapamil injection in mice, a single i.p. dose of 15 mg/kg was used, which is equivalent to 53 mg/m<sup>2</sup> body surface. Verapamil is used in the clinic at dosages ranging from 200 to 480 mg/day, equivalent to 3.33 - 8 mg/kg or 117 - 282 mg/m<sup>2</sup> body surface. Therefore, the dosage used in the mouse is not excessively high compared to clinical use. (Line 361-365)
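
      The dose comparison above can be checked with a short calculation. The sketch below is illustrative only: it assumes a 60 kg reference adult (the response does not state the body weight used), and it back-calculates the mg/kg to mg/m<sup>2</sup> factors implied by the quoted numbers rather than asserting any particular published conversion table. All function and variable names are hypothetical.

```python
# Back-of-envelope check of the verapamil dose comparison quoted above.
# ASSUMPTION (not stated in the response): a 60 kg reference adult.
ASSUMED_ADULT_WEIGHT_KG = 60.0


def daily_dose_to_mg_per_kg(dose_mg_per_day, body_weight_kg=ASSUMED_ADULT_WEIGHT_KG):
    """Convert a whole-body daily dose (mg/day) to mg/kg."""
    return dose_mg_per_day / body_weight_kg


def implied_surface_factor(dose_mg_per_m2, dose_mg_per_kg):
    """Return the kg/m^2 factor implied by a quoted (mg/m^2, mg/kg) pair."""
    return dose_mg_per_m2 / dose_mg_per_kg


# Clinical range quoted in the response: 200 to 480 mg/day.
clinic_low = daily_dose_to_mg_per_kg(200)
clinic_high = daily_dose_to_mg_per_kg(480)

# Factors implied by the quoted mouse (15 mg/kg, 53 mg/m^2)
# and human (8 mg/kg, 282 mg/m^2) values.
mouse_factor = implied_surface_factor(53, 15)
human_factor = implied_surface_factor(282, 8)

print(f"clinical range: {clinic_low:.2f} to {clinic_high:.2f} mg/kg")
print(f"implied factor, mouse: {mouse_factor:.2f} kg/m^2")
print(f"implied factor, human: {human_factor:.2f} kg/m^2")
```

      Under the 60 kg assumption, the mg/kg range matches the quoted 3.33 - 8 mg/kg. On a body-surface basis the mouse dose (53 mg/m<sup>2</sup>) falls below the quoted clinical range (117 - 282 mg/m<sup>2</sup>), which is the basis for the statement that the dose is not excessive.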

      Overstated Claims:

      (1) The authors claim that the lipid microstructure/mTORC2/PKC/Notch pathway is responsive to shear stress, rather than other mechanical forces or myocardial function. Their conclusions seem to be extrapolated from various in vitro studies using non-endocardial cells. To solidify this claim, the authors would need additional biomechanical data, which could be obtained via theoretical modelling or using mouse heart valve explants. This issue could also be addressed by the authors simply softening their conclusions.

      We agree with the reviewer’s comment. We have now revised the statement as “Our data support a model that membrane lipid microdomain acts as a shear stress sensor and transduces the mechanical cue to activate intracellular mTORC2-PKC-Notch signaling pathway in the developing endocardium.” (Line 416-418) “It is noteworthy that the methodology used to alter blood flow in this study inevitably affects myocardial contraction. Additional work to uncouple shear stress from other changes in the mechanical properties of the myocardium, with the aid of theoretical modelling or using mouse heart valve explants, is needed to fully characterize the effect of shear stress on mouse endocardial development.” (Line 436-440)

      (2) Line 263-264: In the discussion, the authors conclude that "Strong fluid shear stress in the AVC and OFT promotes the formation of caveolae on the luminal surface of the endocardial cells, which enhances PKCε phosphorylation by mTORC2." This link was shown rather indirectly, rather than by direct evidence, and therefore the conclusion should be softened. For example, the authors could state that their data are consistent with this model.

      We have revised the statement as “Strong fluid shear stress in the AVC and OFT enhances PKC phosphorylation by mTORC2 possibly by maintaining a particular membrane microstructure.” (Line 372-374)

      (3) In the Discussion, it says: "Mammalian embryonic endocardium undergoes extensive EMT to form valve primordia while zebrafish valves are primarily the product of endocardial infolding (Duchemin et al., 2019)." In the paper cited, Duchemin and colleagues described the formation of the zebrafish outflow tract valve. The zebrafish atrioventricular valve primordia is formed via partial EMT through Dll-Notch signaling (Paolini et al. Cell Reports 2021) and the collective cell migration of endocardial cells into the cardiac jelly. Then, a small subset of cells that have migrated into the cardiac jelly give rise to the valve interstitial cells, while the remainder undergo mesenchymal-to-endothelial transition and become endothelial cells that line the sinus of the atrioventricular valve (Chow et al., doi: 10.1371/journal.pbio.3001505). The authors should modify this part of the Discussion and cite the relevant zebrafish literature.

      Thank you for the valuable comments. We have now revised the statement as “Mammalian embryonic endocardium undergoes extensive EMT to form valve primordia while zebrafish atrioventricular valve primordia is formed via partial EMT and the collective cell migration of endocardial cells into the cardiac jelly followed by tissue sheet delamination,” with relevant references added. (Line 411-414)

      Recommendations to the Authors:

      (1) One issue that the authors could address is the organization of figures. There are several cases where positive data that are central to the conclusions are placed in the supplement and should be moved to the main figures. Places where this occurred are listed below:

      - The Tie2 conditional deletion of Dll4 showing retention of NICD in the OFT and AVC regions is highly supportive of the model. The authors should consider moving these data to main Figure 1.

      Thanks for the suggestion. We have reorganized the figure as requested.

      - The ligand expression data in Figure 2- Supplement Figure 1 A is VERY important to the conclusions drawn from the dofetilide treatment. The authors should move these data to main Figure 2.

      The ligand expression data in Figure 2- Supplement Figure 1A are now moved to Figure 2B.

      - In Figure 3A - the area in the field of view should be stated in the Figure (is it the AVC?). Figure 3 - Supplement 1 proximal OFT data should be moved to main Figure 3 as it is central to the conclusions. Negative DA data can be left in the supplement. Again, for Figure 3 - Supplement 1, Staurosporine treatment data should be moved to the main figure as it is positive data that are central to the conclusions.

      Thanks for the suggestion. We have reorganized the figure as requested.

      (2) Antibody used for Twist1 detection is not listed in the resource table.

      The Twist1 antibody was purchased from Abcam; the detailed information is now available in the resource table.

      (3) Missing arrowhead in Figure 4A, last row.

      We apologize for the oversight. The arrowhead has now been added.

      (4) Line 286. "OFT" pasted on the word "endothelium".

      “OFT” is now removed.

      (5) Related to Figure 2C. The fast response of NICD to flow cessation was used as an argument to support post-translational modification. It is not clear why Sox9 and Twist1 expression also responds so quickly.

      Sox9 and Twist1 expression does seem to respond very quickly. Whether additional regulatory pathways, such as Wnt or Vegf signaling, also respond to shear stress needs to be investigated in the future.

      (6) Line 200: The sentence should end with a period.

      Sorry for the oversight. It is now corrected.

      (7) Lines 34 to 35: the authors phrase that Notch is "allowed" to be specifically activated in the AVC and outflow tract by shear stress.

      We have rephrased the statement as “enabling Notch to be specifically activated in AVC and OFT by regional increased shear stress.” (Line 27)

      (8) Lines 96-100: At the end of the introduction, the text is copied from the abstract. New text should be written or summarized in a different way.

      The last sentence of the introduction is now changed to “The results uncovered a new mechanism whereby mechanical force serves as a primary cue for endocardial patterning in mammalian embryonic heart.” (Line 93-95)

      (9) Line 125: The term "agreed with the Dll4 transcript.." should be replaced with a better term like "overlapped" or "was identical with".

      The word “agreed” is now “overlapped”. (Line 219)

      (10) Line 291: "Thus, through these sophisticated mechanisms, the developing mouse hearts may achieve three purposes:"- The English should be adjusted here since it sounds like hearts are aiming to achieve a purpose, which is unlikely what was meant by the authors.

      This sentence is rephrased to “Thus, in the developing mouse hearts: (1) VEGF signaling is reduced to permit endocardial EMT; (2) Dll4 expression is reduced to prevent widespread endocardial Notch activation and make endocardium sensitive to flow; (3) a proper cushion size and shape is maintained by limiting the flanking endocardium to undergo EMT despite physically close to the field of BMP2 derived from AVC myocardium (Figure 6).” (Line 402-406)

    1. Author response:

      The following is the authors’ response to the original reviews.

      The mice crossing scheme is unusual as you have three mice to cross to produce genotypes, while we could understand that it is possible to produce pups of desired genotypes with different mating schemes, such a vague crossing scheme is not desirable and of poor genetics practice.

      We thank the reviewer for this suggestion. Indeed, our scheme is not a representation of the actual breeding scheme but just a brief explanation of lineages used for the acquisition of the triple transgenic mice. We will include the full crossing scheme into the revision.

      We added to the text an explanation that all genotypes used were maintained as homozygotes, and we included the full breeding scheme in supplementary figure S1A.

      It is worth mentioning that single knockouts seem to show a corresponding upregulation of the level of the paralogue kinase, indicating that any lack of phenotypes might be due to feedback compensation, which would be an interesting finding if confirmed; this has not been mentioned.

      We thank the reviewer for raising an important point about the paralog upregulation. Indeed, our data on primary cells (supplementary 1B) suggest the upregulation of CDK19 in CDK8KO and vice versa. We will point this out in the discussion. We plan to examine the data for the testis as soon as more tissues are available.

      We addressed this question by performing an additional western blot (added to the paper as Fig. 2D) and found no paralogue upregulation in testes. To do so, we also produced novel rabbit anti-mouse CDK19 antibodies, described in Materials and Methods.

      The authors should clarify or present the data on where CDK8 and CDK19 as well as CcnC are expressed so as to help the readers understand which tissues both CDK might be functioning in and cause the loss of CcnC.

      Due to the limited sensitivity of single cell sequencing (only ~5,000 transcripts are sequenced out of an average total of 500,000 transcripts per cell, so transcripts expressed at low levels are not sequenced in all cells), it is challenging to firmly establish CDK8/19-positive and -negative tissues from single cell data because both transcripts are minor. This image will be included in the next version.

      In this version we have added staining with CDK8 and CDK19 antibodies on paraffin sections, showing expression in a variety of cells. Additionally, we have analyzed Cdk8/CcnC presence in different testicular cell types by flow cytometry. Both methods show that Cdk8 is expressed not only in spermatogonial stem cells, as was shown in McCleland et al. 2005, but also in some 1n cells, 4n cells and a significant part of cKit<sup>-</sup> 2n cells. We added a corresponding paragraph and figures (2E-K) to the paper. We consider this a more definitive answer to the question than RNA data.

      Furthermore, data for the genitourinary system in single knockouts are very sparse; data are described for fertility in Figure 1H, ploidy, and cell number in Figures 2B and C, plasma testosterone and luteinizing hormone levels in Figures 5C and 5D, and morphology of testis and prostate tissue for single Cdk8 knockout in Supplementary Figure 1C (although in this case the images do not appear very comparable between control and CDK8 KO, thus perhaps wider fields should be shown), but, for example, there is no analysis of different meiotic stages or of gene expression in single knockouts. It is worth mentioning that single knockouts seem to show a corresponding upregulation of the level of the paralogue kinase, indicating that any lack of phenotypes might be due to feedback compensation, which would be an interesting finding if confirmed; this has not been mentioned.

      We agree that a description of the single KO could be beneficial, but we expect no major differences from the WT or Cre-Ert. We found neither histological differences nor changes in cell counts or ratios of cell types. Our ethical committee also has concerns about sacrificing mice without major phenotypic changes and without a well-formulated hypothesis about the observed effects. We plan to add histological pictures to the next version of the article.

      We have updated histological figures with new figures for iDKO and Cre+Tam mice with additional fields of view and better quality staining (2A-B).

      The second major weakness is that the correlation between double knockout and reduced expression of genes involved in steroid hormone biosynthesis is portrayed as a causal mechanism for the phenotypes observed. While this is a possibility, there are no experiments performed to provide evidence that this is the case. Furthermore, there is no evidence showing that CDK8 and/or CDK19 are directly responsible for the transcription of the genes concerned.

      We agree with the reviewer that the effects on CDK8/CDK19/CCNC could lead to the observed transcriptional changes through multiple indirect steps. There are, however, major technical challenges in examining the binding of transcription factors in the tissue, especially in Leydig cells, which are a relatively minor population. We will clarify this in the revision and strengthen this point in the discussion.

      We have added a corresponding explanation in the Discussion: “We hypothesize that all these changes are caused by disruption of testosterone synthesis in Leydig cells, although, at this point, we cannot definitively prove that the affected genes are regulated by CDK8/19 directly.”

      The claim of reproductive defects in the induced double knockout of CDK8/19 resulted from the loss of CCNC via a kinase-independent mechanism is interesting but was not supported by the data presented. While the construction and analysis of the systemic induced knockout model of Cdk8 in Cdk19KO mice is not trivial, the analysis and data are weakened by the systemic effect of Cdk8 loss, making it difficult to separate the systemic effect from the local testis effect.

      We agree with the reviewer that the effects on the testes could be due to the systemic loss of CDK8 rather than to its loss specifically in the testis, and we will clarify this in the revision. We will also clarify that, although our results suggest that the effects of CDK8/19 knockout are kinase-independent and that the loss of Cyclin C is a likely explanation for the kinase independence, we do not claim that it is *the* mechanism.

      In this version we added several caveats indicating that the proposed mechanism is likely, but not the only one possible.

      Also using TAM-treated wild type as control is ok, but a better control will be TAM-treated ERT2-cre; CDK8f/f or TAM-treated ERT2 Cre CDK19/19 KO, so as to minimize the impact from the well-recognized effect of TAM.  

      We used TAM-treated ERT2-cre for most of the experiments and did not observe any major histological or physiological differences from the WT+TAM. We will make sure to present these data in the revision.

      The authors found that Sertoli cells re-entered the cell cycle in the inducible double knockout but stopped short of careful characterization other than increased expression of cell cycle genes.

      Unfortunately, we were not able to perform satisfactory Ki67 staining to address this point.

      Dko should be appropriately named iDKO (induced dKO). We will make the corresponding change.

      We performed necropsy: not the right wording here.

      Colchicine-like apoptotic bodies: what does this mean? Not clear.

      We made the appropriate changes: all DKO were renamed iDKO, “necropsy” was changed to “autopsy”, and the cells are now designated as “apoptotic”.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Given the proprietary claims of the authors ("We have for the first time generated mice with the systemic inducible Cdk8 knockout on the background of Cdk19 constitutive knockout"), it does not appear acceptable and indeed might be misleading, to not describe the overall phenotypes of the mice. Are mice normal size/weight? Does an autopsy reveal anything other than atrophied genital tissue in males? Do the authors find a phenotype in the intestinal epithelium, as previously reported? (N.B. this could potentially clarify a discrepancy in the literature since the loss of the secretory lineages in double knockouts reported by the Firestein lab was not reproduced by intestinal organoid double knockout in the paper by the Fisher lab).

      We have removed the statement “for the first time”, although to the best of our knowledge this is the fact. We did not attempt to describe all the phenotypic effects of the Cdk8/19 knockout in this paper, since some of the phenotypic observations related to mouse weight and behavior varied between the different laboratories involved and require additional analysis. The effect on the urogenital system was by far the most striking histological feature observed, and it was carefully addressed in this paper. Other findings require additional experiments and are outside the scope of this paper; we plan to focus on them later. As suggested by the reviewer, we performed histological analysis of DKO intestines and found the same decrease in Paneth and goblet cell numbers as described by Dannappel et al. We added corresponding figures (Supplemental fig. 1C) to the paper.

      If the authors wish to reinforce their claims about causality of steroidogenic gene expression and phenotype, they could try rescuing the phenotype by treating mice with testosterone.

      As stated in the Discussion, we hypothesize that injection of testosterone would not rescue the phenotype, as androgen receptor signaling is also affected. We would nevertheless like to perform such an experiment, but we were not able to procure testosterone pellets at this time.

      If they wish to claim a direct effect of CDK8/19 on the expression of steroidogenic genes, they could also assess CDK8/19 binding to promoters of the genes analysed by ChIP.

      There are major technical challenges in examining the binding of transcription factors in primary tissue, especially in Leydig cells, a minor population, so we cannot perform such an experiment.

      In order to conclude that their CDK8/19 inhibitor treatment worked, they could show target engagement by cell thermal shift assay, loss of CDK8/19 kinase-dependent gene expression, or loss of CDK8/19 substrate phosphorylation (eg interferon-induced STAT1 S727 phosphorylation) under the conditions used. Alternatively, they could show rescue with a kinase-dead allele.

      As noted in the public comments, we thank the reviewer for raising this concern. The target selectivity and target engagement of the inhibitors used in this study (Senexin B and SNX631-6) have been described in other models and published. CDK8/19 engagement and target selectivity of Senexin B, used in our in vitro studies, have been extensively characterized in cell-based assays (Chen et al., Cells 2019, 8(11), 1413; Zhang et al., J Med Chem. 2022 Feb 24;65(4):3420-3433). Similar characterization has been published for SNX631-6 and its equipotent analog SNX631, which showed drastic antitumor activity when used in vivo at the same dosing regimen as in this paper (Li et al., J Clin Invest. 2024;134(10):e176709). The comparison of the pharmacokinetics data obtained in the present study and the in vitro activity of SNX631-6 in a cell-based assay suggests that the tissue concentrations of this drug should have provided substantial inhibition of Cdk8/19. Unfortunately, there are no known phosphorylation substrates specific for Cdk8/19 that can be used as pharmacodynamic markers. The widely used STAT1 phosphorylation at S727 is exerted not only by CDK8/19 but also by other kinases and shows variable response to CDK8/19 inhibition (Chen et al., Cells 2019, 8(11), 1413). In the revised MS, we have added a Western blot with pSTAT1 S727 staining of WT, 8KO, 19KO and iDKO testes. Cdk8/19 knockout did not decrease and apparently even increased the level of pSTAT1 S727, which demonstrates that this marker of CDK8/19 activity is not suitable for our tissue type. While the evidence that Cdk8/19 kinase inhibition in the testes after in vivo drug treatment does not match the phenotype of iDKO is admittedly indirect, the same result has been obtained in the cell culture studies with Sertoli cells, where the inhibitor concentration (1 µM Senexin B) was much higher than needed for the maximal Cdk8/19 inhibition.

      Finally, I did not find any legends to supplementary figures anywhere.

      We apologize for not including legends for supplementary figures, and will correct that in the next version of the manuscript.

      Additionally, we addressed the question of whether the lipid supply in the testes is sufficient for steroidogenesis. It was possible that steroidogenesis fails because of a lack of cholesterol input, but OilRed staining revealed the opposite: lipid content in iDKO testes is significantly higher than in WT testes. We added corresponding text to the article and the supplementary Fig. S6.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary: 

      This manuscript (Baron, Oviedo et al., 2024) builds on a previous study from the Wiseman lab (Perea, Baron et al., 2023) and describes the identification of novel nucleoside mimetics that activate the HRI branch of the ISR and drive mitochondrial elongation. The authors develop an image processing and analysis pipeline to quantify the effects of these compounds on mitochondrial networks and show that these HRI activators mitigate ionomycin-driven mitochondrial fragmentation. They then show that these compounds rescue mitochondrial morphology defects in patient-derived MFN2 mutant cell lines. 

      Strengths: 

      The identification of new ISR modulators opens new avenues for biological discovery surrounding the interplay between mitochondrial form/function and the ISR, a topic that is of broad interest. It also reinforces the possibility that such compounds might represent new potential therapeutics for certain mitochondrial disorders. The development of a quantitative image analysis pipeline is valuable and has the potential to extract the subtle effects of various treatments on mitochondrial morphology. 

      We thank the reviewer for the positive feedback on our manuscript. We address all of the reviewer’s valuable concerns in the revised submission, as highlighted below. 

      Weaknesses: 

      I have three main concerns.

      First, support for the selectivity of compounds 0357 and 3610 acting downstream of HRI comes from using knockdown ISR kinase cell lines and measuring the fluorescence of ATF4-mApple (Figure 1G and 1H). However, the selectivity of these compounds acting through HRI is not shown for mitochondrial morphology. Is mitochondrial elongation blocked in HRI knockdown cells treated with the compounds? While the ISRIB treatment does block mitochondrial elongation, ISRIB acts downstream of all ISR kinases and doesn't necessarily define selectivity for the HRI branch of the ISR. Additionally, are the effects of these compounds on ATF4 production and mitochondrial elongation blocked in a non-phosphorylatable eIF2alpha mutant? 

      We thank the reviewer for highlighting this point. As indicated by the reviewer, we show that compound-dependent increases in mitochondrial elongation are blocked by co-treatment with ISRIB, indicating that this effect can be attributed to ISR activation. We prefer the use of this highly selective pharmacologic approach to block ISR activation, as opposed to the MEF<sup>A/A</sup> cells, as the use of pharmacologic approaches provides more temporal control over ISR inhibition and can prevent the type of chronic disruption to mitochondria associated with these types of genetic perturbations. However, the reviewer is correct that ISRIB blocks downstream of all ISR kinases, meaning that we cannot explicitly demonstrate that 0357 and 3610 induce mitochondrial elongation downstream of HRI-dependent ISR activation using this tool. Thus, to address this point, we have clarified the discussion of these results to make it clear that our results show that our compounds induce mitochondrial elongation downstream of the ISR, omitting the direct implications of HRI in this phenotype.

      This point of selectivity/specificity of the compounds gets at a semantic stumbling block I encountered in the text where it was often stated "stress-independent activation" of ISR kinases. Nucleoside mimetics are likely a very biologically active class of molecules and are likely driving some level of cell stress independent of a classical ISR, UPR, heat-shock response, or oxidative stress response. 

      A major challenge in defining stress-independent activation of stress-responsive signaling pathways is the fact that the activation of these pathways is often used as a primary marker of cellular stress. While this can be overcome by transcriptome-wide profiling (e.g., RNAseq), the reviewer is correct that our focused profiling of select stress-responsive signaling pathways is insufficient to claim the stress-independent activation of the ISR by our prioritized compounds. To address this, we removed this terminology from the revised submission.  

      Second, it is difficult for me to interpret the data for the quantification of mitochondrial morphology. In the legend for Figure 2, it is stated that "The number of individual measurements for each condition are shown above." Are the individual measurements the number of total cells quantified? If not, how many total cells were analyzed? If the individual measurements are distinct mitochondrial structures that could be quantified why are the n's for each parameter (bounding box, ellipsoid principal axis, and sphericity) so different? Does this mean that for some mitochondria certain parameters were not included in the analysis? For me, it seems more intuitive that each mitochondrial unit should have all three parameters associated with it, but if this isn't the case it needs to be more carefully described why. 

      The number of individual measurements refers to the number of 3D segmentations generated using the “Surfaces” module in Imaris. As the reviewer noted, we expect each surface segmentation to represent a single “mitochondrial unit.” We have now further clarified this in the figure legend. 

      Regarding differences in sample size for each group, we used an outlier test (i.e., the ROUT outlier test in PRISM 10) to remove apparent outliers in our data. Often, these outliers result from errors in the automatic quantification that inaccurately merge two mitochondria into one large segmentation. This explains the discrepancy in the number of measurements made for each experimental group. We have made this point clearer in the Materials and Methods section of the revised manuscript.  

      Third, the impact of these compounds on the physiological function of mitochondria in the MFN2.D414V mutants needs to be measured. Sharma et al., 2021 showed a clear deficit in mitochondrial OCR in MFN2.D414V cells which, if rescued by these compounds, would strengthen the argument that pharmacological ISR kinase activation is a strategy for targeting the functional consequences of the dysregulation of mitochondrial form.

      In this manuscript, we demonstrate that pharmacologic activation of the ISR using 0357 and 3610 rescues mitochondrial morphology in patient fibroblasts expressing the disease-associated MFN2<sup>D414V</sup> mutant. The reviewer is correct that there are other mitochondrial phenotypes linked to the expression of this mutant. We are currently pursuing this question with more potent ISR-activating compounds developed in our laboratory and identified using the HTS platform described in this manuscript. However, this work, which builds on the studies described herein, uses other ISR-activating compounds, which we feel would be best described in subsequent manuscripts that can fully define the activity of these new compounds.  

      Reviewer #2 (Public review): 

      Summary. 

      Mitochondrial dysfunction is associated with a wide spectrum of genetic and age-related diseases. Healthy mitochondria form a dynamic reticular network and constantly fuse, divide, and move. In contrast, dysfunctional mitochondria have altered dynamic properties resulting in fragmentation of the network and more static mitochondria. It has recently been reported that different types of mitochondrial stress or dysfunction activate kinases that control the integrated stress response, including HRI, PERK, and GCN2. Kinase activity results in decreased global translation and increased transcription of stress response genes via ATF4, including genes that encode mitochondrial protein chaperones and proteases (HSP70 and LON). In addition, the ISR kinases regulate other mitochondrial functions including mitochondrial morphology, phospholipid composition, inner membrane organization, and respiratory chain activity. Increased mitochondrial connectivity may be a protective mechanism that could be initiated by pharmacological activation of ISR kinases, as was recently demonstrated for GCN2. 

      A small molecule screening platform was used to identify nucleoside mimetic compounds that activate HRI. These compounds promote mitochondrial elongation and protect against acute mitochondrial fragmentation induced by a calcium ionophore. Mitochondrial connectivity is also increased in patient cells with a dominant mutation in MFN2 by treatment with the compounds.

      Strengths: 

      (1) The screen leverages a well-characterized reporter of the ISR: translation of ATF4-FLuc is activated in response to ER stress or mitochondrial stress. Nucleoside mimetic compounds were screened for activation of the reporter, which resulted in the identification of nine hits. The two most efficacious dose-response tests were chosen for further analysis (0357 and 3610). The authors clearly state that the compounds have low potency. These compounds were specific to the ISR and did not activate the unfolded protein response or the heat shock response. Kinases activated in the ISR were systematically depleted by CRISPRi revealing that the compounds activate HRI.

      (2) The status of the mitochondrial network was assessed with an Imaris analysis pipeline and attributes such as length, sphericity, and ellipsoid principal axis length were quantified. The characteristics of the mitochondrial network in cells treated with the compounds were consistent with increased connectivity. Rigorous controls were included. These changes were attenuated with pharmacological inhibition of the ISR. 

      (3) Treatment of cells with the calcium ionophore results in rapid mitochondrial fragmentation. This was diminished by pre-treatment with 0357 or 3610 and control treatment with thapsigargin and halofuginone 

      (4) Pathogenic mutations in MFN2 result in the neurodegenerative disease Charcot-Marie-Tooth Syndrome Type 2A (CMT2A). Patient cells that express Mfn2-D414V possess fragmented mitochondrial networks and treatment with 0357 or 3610 increased mitochondrial connectivity in these cells.

      We appreciate the reviewer’s positive response to these aspects of our manuscript. We address the reviewer’s valuable comments in the revised submission as highlighted below. 

      Weaknesses: 

      The weakness is the limited analysis of cellular changes following treatment with the compounds. 

      (1) Unclear how 0357 or 3610 alter other aspects of cellular physiology. While this would be satisfying to know, it may be that the authors determined that broad, unbiased experiments such as RNAseq or proteomic analysis are not justified due to the limited translational potential of these specific compounds.

      The reviewer is correct. The low potency of 0357 and 3610 limits the translational potential of these compounds. However, building on the work described herein, we recently identified more potent HRI-activating compounds with higher translational potential. Using RNAseq profiling, we found that these compounds show transcriptome-wide selectivity for the ISR and can promote adaptive remodeling of mitochondrial morphology and function in cellular models of multiple other diseases. These compounds will be further described in subsequent studies that expand on the efforts outlined here demonstrating the potential for pharmacologic HRI activators to promote adaptive mitochondrial remodeling.   

      (2) There are many changes in Mfn2-D414V patient cells including reduced respiratory capacity, reduced mtDNA copy number, and fewer mitochondrial-ER contact sites. These experiments are relatively narrow in scope and quantifying more than mitochondrial structure would reveal if the compounds improve mitochondrial function, as is predicted by their model.

      In this manuscript, we demonstrate that pharmacologic activation of the ISR using 0357 and 3610 rescues mitochondrial morphology in patient fibroblasts expressing the disease-associated MFN2<sup>D414V</sup> mutant. The reviewer is correct that there are other mitochondrial phenotypes linked to the expression of this mutant. We are currently pursuing this question with more potent ISR-activating compounds developed in our laboratory using the HTS platform described in this manuscript. However, this work, which builds on the studies described herein, uses other ISR-activating compounds, which we feel would be best described in subsequent manuscripts that can fully define the activity of these new compounds.  

      Reviewer #3 (Public review):

      Summary: 

      Mitochondrial injury activates eIF2α kinases - PERK, GCN2, HRI, and PKR - which collectively regulate the Integrated Stress Response (ISR) to preserve mitochondrial function and integrity. Previous work has demonstrated that stress-induced and pharmacologic stress-independent ISR activation promotes adaptive mitochondrial elongation via the PERK and GCN2 kinases, respectively. Here, the authors demonstrate that pharmacologic ISR inducers of HRI and GCN2 enhance mitochondrial elongation and suppress mitochondrial fragmentation in two disease models, illustrating the therapeutic potential of pharmacologic ISR activators. Specifically, the authors first used an innovative ISR translational reporter to screen for nucleoside mimetic compounds that induce ISR signaling and identified two compounds, 0357 and 3610, that preferentially activate HRI. Using a mitochondrial-targeted GFP MEF cell line, the authors next determined that these compounds (as well as the GCN2 activator, halofuginone) enhance mitochondrial elongation in an ISR-dependent manner. Moreover, pretreatment of MEFs with these ISR kinase activators suppressed pathological mitochondrial fragmentation caused by a calcium ionophore. Finally, pharmacologic HRI and GCN2 activation were found to preserve mitochondrial morphology in human fibroblasts expressing a pathologic variant in MFN2, a defect that leads to mitochondrial fragmentation and is a cause of Charcot-Marie-Tooth Type 2A disease. 

      Strengths: 

      This well-written manuscript has several notable strengths, including the demonstration of the potential therapeutic benefit of ISR modulation. New chemical entities with which to further interrogate this stress response pathway are also reported. In addition, the authors used an elegant screen to isolate compounds that selectively activate the ISR and identify which of the four kinases was responsible for activation. Special attention was also paid to a thorough evaluation of the effect of their compounds on other stress response pathways (i.e. the UPR, and heat and oxidative stress responses), thereby minimizing the potential for off-target effects. The implementation of automated image analysis rather than manual scoring to quantify mitochondrial elongation is not only practical but also adds to the scientific rigor, as does the complementary use of both the calcium ionophore and MFN2 models to enhance confidence in the broad therapeutic potential of pharmacologic ISR manipulation. 

      We thank the reviewer for their positive response to our manuscript. We address the reviewer’s remaining concerns as outlined below. 

      Weaknesses: 

      The only minor concerns are with regard to effects on cell health and the timing of pharmacological administration. 

      The two compounds described in this manuscript were found not to induce any overt toxicity over a 24 h period in cell culture models. In the revised manuscript, we include data showing that treatment with increasing doses of either 0357 or 3610 does not significantly reduce cellular viability in HEK293 cells (Fig. S1G). 

      With regard to treatments, we include all of the relevant information for the timing and dosage of compound treatment in the revised manuscript. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for Authors)

      (1) Figure S1 "B. ATF4-Gluc activity" -> Fluc. The number of replicates is not consistently stated for each experiment. p-values are not given for D and F. 

      We have changed the legend for Fig. S1B to ATF4-FLuc. We show individual replicates for all experiments for all panels described in this figure, except panels C and G, in the revised Figure S1. We explicitly state the number of replicates in panels C and G in the accompanying figure legend. We have repeated the qPCR described in panels C and F, and statistics are included in the revised manuscript.

      (2) Figure 2 - no p-values for BtdCPU.

      Yes. We found that BtdCPU-dependent increases in mitochondrial fragmentation (described in Fig. 2A-D) were not significant when analyzing all the data included in these figures by Brown-Forsythe and Welch ANOVA test. However, the DMSO and BtdCPU conditions were significantly different when directly compared using a Welch’s t-test (p<0.005). Since the statistics in this manuscript are being analyzed by ANOVA, we decided not to include a significance marker for BtdCPU, as it was not observed in this more stringent test and is not the main focus of this manuscript.  
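      For readers unfamiliar with the pairwise test mentioned above, the Welch's t statistic and its Welch-Satterthwaite degrees of freedom can be sketched as follows. This is a minimal illustration using only the Python standard library; it is not the analysis performed for the manuscript (which was done in PRISM), the function name `welch_t` is hypothetical, and the input values are placeholders rather than measurements from Figure 2.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test.

    Hypothetical helper for illustration; both samples need >= 2 values.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Per-group squared standard errors (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    # Test statistic: difference in means over the combined standard error.
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Placeholder values only (not data from the study).
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(t_stat, dof)
```

      A p-value would then be read from the t distribution with `dof` degrees of freedom. Because this test compares only two groups, it does not account for the other conditions analyzed together in the ANOVA.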

      (3) Figure S4 (Supplement to Figure 5) -> Supplement to Figure 4. 

      We have corrected this error in the revised manuscript. 

      (4) Error in references - duplicated 24 and 46, duplicated 10 and 11.

      This is now corrected in the revised submission.

      Reviewer #2 (Recommendations for the authors): 

      I would love to see an assessment of mitochondrial function and mtDNA in the D414 cells following treatment. 

      As indicated above, we are continuing to probe the impact of more potent HRI-activating compounds in patient-derived cell models expressing disease-relevant MFN2 mutants. Initial experiments suggest that this approach can mitigate additional pathologies beyond deficient elongation in these cells, although we are continuing to pursue these results with our improved HRI-activating compounds. We are excited by these results, but feel that they are best suited for a follow-up manuscript describing these new HRI activators.   

      Reviewer #3 (Recommendations for the authors):

      The only suggestion to broaden the manuscript's impact might be to perform a basic assessment of the impact of pharmaceutical ISR activation on cell viability. Though mitochondrial elongation is often considered a surrogate for mitochondrial health, whether mitochondrial elongation improves cell viability (or not) would be informative. Similarly, the authors did not address the time-dependent effects of the ISR modulators, choosing to focus on the acute rather more chronic outcomes. Finally, does simultaneous (rather than pre-) treatment with an activator and the ionomycin produce similar effects on mitochondrial morphology, especially since therapeutics are typically administered post-injury?

      We now include cell viability experiments showing that the two HRI activators discussed in this manuscript, 0357 and 3610, do not significantly reduce viability in HEK293 cells. This work is included in the revised manuscript (see Fig. S1G). 

      With respect to acute vs. chronic outcomes of ISR activation, as highlighted by the reviewer, we primarily focus this work on defining the impact of acute ISR activation on mitochondrial morphology. As discussed above, we now show that our prioritized ISR-activating compounds 0357 and 3610 do not significantly impact cellular viability over a 24 h timecourse. However, as suggested by the reviewer, the potential implications of chronic pharmacologic ISR activation for mitochondrial biology remain to be further explored.

      We are continuing to address this in subsequent studies using more potent ISR kinase-activating compounds established in our lab. However, we would like to highlight that detrimental phenotypes linked to chronic ISR kinase activation in cell culture do not preclude the translational potential of this approach, as the in vivo PK/PD of these compounds can be controlled to prevent complications arising from chronic pathway activity. We previously demonstrated the potential for controlling compound activity through its PK/PD in our establishment of highly selective activators of other stress-responsive signaling pathways such as the IRE1/XBP1s arm of the UPR (e.g., Madhavan et al (2022) Nat Comm).   

      We appreciate the reviewer’s comments regarding the timing of compound treatment in the ionomycin experiment. Ionomycin works extremely quickly to induce fragmentation (minutes), which would be prior to activation of the ISR induced by these compounds (hours). Thus, co-treatment would lead to fragmentation. It is an interesting question whether co-treatment with ISR activators could rescue this fragmentation as the pathway is activated, but we did not explicitly address this question in this manuscript. However, we do show that pharmacologic GCN2 or HRI activators can rescue mitochondrial morphology in patient fibroblasts expressing an MFN2 mutant, where mitochondria are fragmented, indicating that our approach can restore mitochondrial morphology in this context. We feel that these results, in combination with others described in our manuscript, demonstrate the potential for this approach to mitigate pathologic mitochondrial fragmentation associated with different conditions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work uses transgenic reporter lines to isolate entpd5a+ cells representing classical osteoblasts in the head and non-classical (osterix-) notochordal sheath cells. The authors also include entpd5a- cells, col2a1a+ cells to represent the closely associated cartilage cells. In a combination of ATAC and RNA-Seq analysis, the genome-wide transcriptomic and chromatin status of each cell population is characterized, validating their methodology and providing fundamental insights into the nature of each cell type, especially the less well-studied notochordal sheath cells. Using these data, the authors then turn to a thorough and convincing analysis of the regulatory regions that control the expression of the entpd5a gene in each cell population. Determination of transcriptional activities in developing zebrafish, again combined with ATAC data and expression data of putative regulators, results in a compelling and detailed picture of the regulatory mechanisms governing the expression of this crucial gene.

      Strengths:

      The major strength of this paper is the clever combination of RNA-Seq and ATAC analysis, further combined with functional transcriptional analysis of the regulatory elements of one crucial gene. This results in a very compelling story.

      Weaknesses:

      No major weaknesses were identified, except for all the follow-up experiments that one can think of, but that would be outside of the scope of this paper.

      Reviewer #2 (Public Review):

      Summary:

      Complementary to mammalian models, zebrafish has emerged as a powerful system to study vertebrate development and to serve as a go-to model for many human disorders. All vertebrates share the ancestral capacity to form a skeleton. Teleost fish models have been a key model to understand the foundations of skeletal development and plasticity, pairing with more classical work in amniotes such as the chicken and mouse. However, the genetic foundation of the diversity of skeletal programs in teleosts has been hampered by mapping similarities from amniotes back and not objectively establishing more ancestral states. This is most obvious in systematic, objective analysis of transcriptional regulation and tissue specification in differentiated skeletal tissues. Thus, the molecular events regulating bone-producing cells in teleosts have remained largely elusive. In this study, Petratou et al. leverage spatial experimental delineation of specific skeletal tissues -- that they term 'classical' vs 'non-classical' osteoblasts -- with associated cartilage of the endo/peri-chondrial skeleton and inter-segmental regions of the forming spine during development of the zebrafish, to delineate molecular specification of these cells by current chromatin and transcriptome analysis. The authors further show functional evidence of the utility of these datasets to identify functional enhancer regions delineating entpd5a expression in 'classical' or 'non-classical' osteoblast populations. By integration with paired RNA-seq, they delineate broad patterns of transcriptional regulation of these populations as well as specific details of regional regulation via predictive binding sites within ATACseq profiles. Overall the paper was very well written and provides an essential contribution to the field that will provide a foundation to promote modeling of skeletal development and disease in an evolutionary and developmentally informed manner.

      Strengths:

      Taken together, this study provides a comprehensive resource of ATAC-seq and RNA-seq data that will be very useful for a wide variety of researchers studying skeletal development and bone pathologies. The authors show specificity in the different skeletal lineages and show the utility of the broad datasets for defining regulatory control of gene regulation in these different lineages, providing a foundation for hypothesis testing of not only agents of skeletal change in evolution but also function of genes and variations of unknown significance as it pertains to disease modeling in zebrafish. The paper is excellently written, integrating a complex history and experimental analysis into a useful and coherent whole. The terminology of 'classical' and 'non-classical' will be useful for the community in discussing the biology of skeletal lineages and their regulation.

      Weaknesses:

      Two items arose that were not critical weaknesses but areas for extending the description of methods and integration into the existing data on the role of non-classical osteoblasts and establishment/canalization of this lineage of skeletal cells.

      (1) In reading the text it was unclear how specific the authors' experimental dissection of the head/trunk was in isolating different entpd5a osteoblast populations. Obviously, this was successful given the specificity in DEG of results, however, analysis of contaminating cells/lineages in each population would be useful - e.g. using specific marker genes to assess. The text uses terms such as 'specific to' and 'enriched in' without seemingly grounded meaning of the accuracy of these comments. Is it really specific - e.g. not seen in one or other dataset - or is there some experimental variation in this?

      We thank the reviewer for pointing this out. Given that the separation of head and trunk is done manually, there will be some experimental variability. We have used anatomical hallmarks (cleithrum and swim bladder), and therefore would expect the variability to be small. Regarding classical osteoblasts contaminating trunk tissue, head removal was consistently performed using the aforementioned anatomical hallmarks in a manner that ensures that the cleithrum does not remain in the trunk tissue. In order to alleviate concerns regarding trunk cell populations contaminating cranial populations, and to further clarify our strategy, we add the following statement to the Materials and Methods section: “The procedure does not allow for a complete separation of notochordal non-classical osteoblasts from cranial classical osteoblasts, as the notochord extends into the cranium. However, the amount of sheath cells in that portion of the notochord is negligible, compared both to the number of classical (cranial) osteoblasts in head samples, and to notochord cells isolated in trunk samples.”

      (2) Further, it would be valuable to discuss NSC-specific genes such as calymmin (Peskin 2020) which has species and lineage-specific regulation of non-classical osteoblasts likely being a key mechanistic node for ratcheting centra-specific patterning of the spine in teleost fishes. What are dynamics observed in this gene in datasets between the different populations, especially when compared with paralogues - are there obvious cis-regulatory changes that correlate with the co-option of this gene in the early regulation of non-classical osteoblasts? The addition of this analysis/discussion would anchor discussions of the differential between different osteoblasts lineages in the paper.

      This is an interesting concept and idea, that we will consider in a possible revision or, if requiring substantial additional efforts, in a possible new research line. An excellent starting point for further studies using our datasets.

      Reviewer #3 (Public Review):

      Summary:

      This study characterizes classical and nonclassical osteoblasts as both types were analyzed independently (integrated ATAC-seq and RNAseq). It was found that gene expression in classical and nonclassical osteoblasts is not regulated in the same way. In classical osteoblasts, Dlx family factors seem to play an important role, while Hox family factors are involved in the regulation of spinal ossification by nonclassical osteoblasts. In the second part of the study, the authors focus on the promoter structure of entpd5a. Through the identification of enhancers, they reveal complex modes of regulation of the gene. The authors suggest candidate transcription factors that likely act on the identified enhancer elements. All the results taken together provide comprehensive new insights into the process of bone development, and point to spatio-temporally regulated promoter/enhancer interactions taking place at the entpd5a locus.

      Strengths:

      The authors have succeeded in justifying a sound and consistent buildup of their experiments, and meaningfully integrating the results into the design of each of their follow-up experiments. The data are solid, insightfully presented, and the conclusion valid. This makes this manuscript of great value and interest to those studying (fundamental) skeletal biology.

      Weaknesses:

      The study is solidly constructed, the manuscript is clearly written and the discussion is meaningful - I see no real weaknesses.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor issues that may need to be addressed or detailed:

      Supplementary Figures 1I-J, text page 4, line 24: "photoconversion and imaging": this needs some more detailed description: green fluorescent cells should be actively expressing Kaede, but only if there is a delay between photoconversion and imaging. What is the reason that Supplementary Figure 1F shows mainly green fluorescent cells, contrary to 1G-J?

      In our experiments, we could see new Kaede expression under the control of the entpd5a promoter region within 1.5 hours of photoconversion, as shown in Suppl. Figure 1E-H, suggesting that this time window was sufficient for protein generation. We believe that Suppl. Fig. 1F shows more green fluorescence because of the high rate of transcriptional activity at that stage throughout the notochord progenitor cells. We also attribute this to the relatively small number of cells producing red fluorescence at that stage, as only a few Kaede+ cells were photoconverted at the 15-somite stage (Suppl. Fig. 1E). Therefore, the masking of the green fluorescence by the red is not as pronounced as in G and H, where the red fluorescence resulting from photoconversion right after imaging at 18s and 21s, respectively, significantly overlaps with new green fluorescence. This can be seen in the image as the presence of orange fluorescence in G and H, instead of the clear red shown in E, I and J.

      In addition to this, we would like to point out that in Suppl. Fig. 1I, J the reason that green fluorescence is only detected in the ventral region of the notochord, is because the promoter of entpd5a only remains active in the ventral-most sheath cells at that stage. This is stated in the results section of the main text, first subsection, paragraph 3. The reason for this very interesting, strictly localised expression pattern remains unclear.

      Somewhat intriguing: green fluorescence in Figure 1B, C (osx:GAL4FF) and Supplementary Figure 1C (entpd5a:GAL4FF) in the CNS? Would that be an artefact of the GAL4FF/UAS:GFP system?

      We are confident that the fluorescence pointed out by the reviewer is not an artefact of the GAL4FF/UAS system, for a few reasons. Firstly, osx (Sp7) has been shown to be expressed and to function in the nervous system in mice (Park et al, BBRC, 2011; Elbaz et al, Neuron, 2023). Secondly, not only osx, but also entpd5a can be readily detected in a subset of cranial and spinal neurons in early development using the entpd5a:GAL4FF; UAS:GFP transgenic line (Suppl. Fig 1C). Finally, when establishing transgenic lines with the entpd5a(1.1):GFP construct, expression was almost invariably present in diverse elements of the nervous system, but not in bone (data not shown). This led us to hypothesise that the minimal promoter of entpd5a (and possibly also that of osx) is activated by transcription factors active in the nervous system, and this effect is likely controlled by the surrounding enhancers, but also the genome location. It is unclear at present what the endogenous neural expression of the two genes is like, and we did not further investigate this in this study, as the focus was on the skeleton.

      Figure 2: What exactly is "Corrected Total Cell Fluorescence"? Is it green + red fluorescence?

      We thank the reviewer for pointing out the absence of more information on this. Corrected total cell fluorescence does not correspond to green+ red fluorescence, rather it is calculated as follows for a single channel:

      CTCF = Integrated Density – (Area of selected cell × Mean fluorescence of background readings)

      More details can be found in the following website: https://theolb.readthedocs.io/en/latest/imaging/measuring-cell-fluorescence-using-imagej.html

      We have edited the Materials and Methods section under “Imaging and image analysis” to include the aforementioned information.
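      As a minimal illustration of the CTCF formula given above, the calculation for a single cell in a single channel can be sketched as follows. This is not the analysis code used in the study; the function and parameter names are hypothetical, and the numbers are placeholders standing in for values that would be read from ImageJ measurements.

```python
from statistics import mean


def corrected_total_cell_fluorescence(integrated_density, cell_area, background_means):
    """Compute CTCF for one cell in one channel.

    integrated_density: integrated density of the selected cell
    cell_area: area of the selected cell
    background_means: mean fluorescence of each background reading
    (All names are illustrative assumptions, not from the manuscript.)
    """
    if not background_means:
        raise ValueError("at least one background reading is required")
    # CTCF = Integrated Density - (cell area x mean background fluorescence)
    return integrated_density - cell_area * mean(background_means)


# Placeholder values only (not data from the study).
ctcf = corrected_total_cell_fluorescence(5000.0, 100.0, [10.0, 12.0, 14.0])
print(ctcf)  # 3800.0
```

      Averaging several background readings before subtraction reduces the influence of any single noisy background region on the corrected value.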

      Page 11, line 34: The authors may have missed the recently published "Raman et al., Biomolecules 2024 Vol. 14; doi:10.3390/biom14020139" describing RNA-Seq in 4 dpf osterix+ osteoblasts.

      We thank the reviewer for drawing our attention to the Raman et al publication. The reference has now been added in the manuscript.

      Figure 5A and B: use a higher resolution version to make the numbers and gene names more readable. Figures 5C and 6A could also use a larger font for the text and numbers.

      High resolution files are now included with the revised manuscript, which should significantly help in making figures more easily readable. Although we agree with the reviewer that larger fonts would improve readability, due to the nature of the graphs (very small spaces in some cases, where the numbers would have to fit) this would not be easy to achieve. However, we believe that this issue will be resolved with the availability of higher resolution files. If readability remains a concern, we would be happy to attempt re-organising the graphs to allow for larger fonts.

      Reviewer #2 (Recommendations For The Authors):

      I suggest no further experiments, but do suggest that a few points be clarified.

      In the Discussion, the text "the less evolved osteoblasts of fish and amphibians..." is not accurate. These cells are not less evolved as they represent an independent lineage to tetrapods that have evolved with different stresses for a similar time. However, as teleost fishes and amphibians share characteristics and all share a common ancestor, these signatures represent a putative ancestral state of skeletal differentiation not seen in amniotes, including humans.

      We thank the reviewer for pointing out the unfortunate phrasing. The text has now been modified as follows: “Specifically, the osteoblasts of teleost fish and amphibians, whose characteristics are putatively closer to a more ancestral state of skeletal differentiation compared to amniotes, appear to share gene expression with chondrocytes”.

      The title could potentially be shortened to reach a broader audience by removing the initial clause of 'integration of ATAC and RNA seq' as this is a commonly performed analysis - "Chromatin and transcriptomic signature in classical and non-classical zebrafish osteoblasts indicate mechanisms of ancestral skeletal differentiation" is more descriptive of the findings and not focused on the method.

      We have discussed this internally, but would prefer to retain the current title. The reason is that we would like to see our methodology and datasets used as a platform for further studies, and the current title, in our opinion, facilitates this. With regard to replacing “mechanisms of entpd5a regulation” with “mechanisms of ancestral skeletal differentiation”, we think this would not give an accurate description of our work, which is primarily focused on elucidating entpd5a promoter dynamics.

      All datasets should be made available as soon as possible for use in the field.

      The datasets (raw and processed) are available on the GEO database. The corresponding accession numbers can be found in our data availability statement.

      Minor comments:

      (1) Figure 1A. The labels are missing for grey and light blue structures.

      Together, these structures make up the “notochord sheath”, which comprises the basal lamina (grey), the medial layer of fibrillar collagen (light blue) and the outer layer of loosely arranged matrix (lighter blue). We modified the figure legend to indicate that the three layers all correspond to the notochord sheath.

      (2) Figure 2A. The constructs in the lower part of the panel are not discussed in the legend and seem out of place in terms of data type and analysis.

      We would argue that indicating which non-coding regions and which ATAC peaks were responsible for driving GFP expression in each construct aids in a better understanding of our results. We thank the reviewer for pointing out the lack of mention of these constructs in the figure legend. This issue has now been resolved.

      (3) Be wary of red/green color combinations, especially in the figures where these are juxtaposed with each other.

      We apologise for the use of red/green colour. Although it is not possible for this manuscript to change the colour patterns, we will make sure to avoid the use of these colours in conjunction in the future.

      (4) The use of fish as a term should be classified as teleost fish, as authors are not addressing non-teleost basal ray-finned fishes or the fact that tetrapods are within bony fishes overall.

      This is well spotted, and we have now remedied this by editing the manuscript. Where the term “fish” was used, we now state “teleost fish”.

      (5) Age information is missing in several Figures (e.g. 1D and 2C).

      In some of the figures, space constraints did not allow for including the stage on the figure itself. However, we have made sure that in those cases the stage is incorporated in the figure legend.

      (6) The resolution of several Figures (e.g. Figure 5 and Supplementary Figure 3) is low.

      We address this issue by providing high resolution figures with the revised manuscript.

      (7) In the sentence (top page before Discussion) "The same conclusion was reached upon isolation from these three..", it was unclear what 'upon isolation' referred to.

      We agree with the reviewer that this phrasing is unclear. To enhance clarity, the manuscript now reads as follows: “The same conclusion was reached upon isolation of the DEGs highlighted by our RNA-seq results, from the three aforementioned groups of genes associated with ATAC peaks for each cell population.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study investigated the phosphoryl transfer mechanism of the enzyme adenylate kinase, using SCC-DFTB quantum mechanical/molecular mechanical (QM/MM) simulations, along with kinetic studies exploring the temperature and pH dependence of the enzyme's activity, as well as the effects of various active site mutants. Based on a broad free energy landscape near the transition state, the authors proposed the existence of wide transition states (TS), characterized by the transferring phosphoryl group adopting a meta-phosphate-like geometry with asymmetric bond distances to the nucleophilic and leaving oxygens. In support of this finding, kinetic experiments were conducted with Ca2+ ions at different temperatures and pH, which revealed a reduced entropy of activation and unique pH-dependence of the catalyzed reaction.

      Strengths:

      A combined application of simulation and experiments is a strength.

      Weaknesses:

      The conclusion that the enzyme-catalyzed reaction involves a wide transition state is not sufficiently clarified with some concerns about the determined free energy profiles compared to the experimental estimate. (See Recommendations for the authors.)

      Reviewer #2 (Public Review):

      Summary:

      The authors report results of QM/MM simulations and kinetic measurements for the phosphoryl-transfer step in adenylate kinase. The main assertion of the paper is that a wide transition state ensemble is a key concept in enzyme catalysis as a strategy to circumvent entropic barriers. This assertion is based on observation of a "structurally wide" set of energetically equivalent configurations that lie along the reaction coordinate in QM/MM simulations, together with kinetic measurements that suggest a decrease of the entropy of activation.

      Thank you for your feedback. We answer the reviewer’s questions below and hope that this clarifies them.

      Strengths:

      The study combines theoretical calculations and supporting experiments.

      Weaknesses:

      The current paper hypothesizes a "wide" transition state ensemble as a catalytic strategy and key concept in enzyme catalysis. Overall, it is not clear the degree to which this hypothesis is fully supported by the data. The reasons are as follows:

      (1) Enzyme catalysis reflects a rate enhancement with respect to a baseline reaction in solution. In order to assert that something is part of a catalytic strategy of an enzyme, it would be necessary to demonstrate from simulations that the activation entropy for the baseline reaction is indeed greater and the transition state ensemble less "wide". Alternatively stated, when indicating there is a "wide transition state ensemble" for the enzyme system - one needs to indicate that is with respect to the non-enzymatic reaction. However, these simulations were not performed and the comparisons not demonstrated. The authors state "This chemical step would take about 7000 years without the enzyme" making it impossible to measure; nonetheless, the simulations of the nonenzymatic reaction would be fairly straight forward to perform in order to demonstrate this key concept that is central to the paper. Rather, the authors examine the reaction in the absence of a catalytically important Mg ion.

      Thank you for your thoughtful feedback. QM/MM calculations for uncatalysed phosphoryl-transfer reactions involving either diphosphates or triphosphates have been well documented in the literature, showing narrow and symmetric TSE (Klahn et al., JACS 2006, 128 (47) 15310-15323; Cui Wang et al., JPCB 2015, 119(9), 3720-3726). We added these references to the revised manuscript.

      (2) The observation of a "wide conformational ensemble" is not a quantitative measure of entropy. In order to make a meaningful computational prediction of the entropic contribution to the activation free energy, one would need to perform free energy simulations over a range of temperatures (for the enzymatic and non-enzymatic systems). Such simulations were not performed, and the entropy of activation was thus not quantified by the computational predictions. The authors instead use a wider TS ensemble as a proxy for larger entropy, and miss an opportunity to compare directly to the experimental measurements.

      Although we share the reviewer’s desire to quantify entropies from QM/MM simulations, we agree with discussions in the literature that calculating quantitative entropies by performing QM/MM simulations at different temperatures is not reliable. We therefore felt strongly that we should stay with a qualitative assessment of entropy differences from our simulations. As the reviewer highlighted, our study combines theoretical calculations and experiments. The entropy of activation is well estimated experimentally, given the accuracy of the measured temperature-dependent changes in rate constants for the chemical step. Our computational results agree well with the experimental results and were further validated, over 2 rounds of reviews/revisions, by additional free energy calculation methods (MSMD and US) plus committor analysis.

      Reviewer #3 (Public Review):

      Summary:

      By conducting QM/MM free energy simulations, the authors aimed to characterize the mechanism and transition state for the phosphoryl transfer in adenylate kinase. The qualitative reliability of the QM/MM results has been supported by several interesting experimental kinetic studies. However, the interpretation of the QM/MM results is not well supported by the current calculations.

      Thank you for your feedback. We appreciate the recognition of our experimental validation but understand your concern about the interpretation of our QM/MM results. To address this, we answer the specific questions below and have added clearer explanations of the computational approach, including its limitations. We have also better aligned the QM/MM results with both experimental data and theoretical expectations to strengthen the overall interpretation.

      Strengths:

      The QM/MM free energy simulations have been carefully conducted. The accuracy of the semi-empirical QM/MM results was further supported by DFT/MM calculations, as well as qualitatively by several experimental studies.

      Weaknesses:

      (1) One key issue is the definition of the transition state ensemble. The authors appear to define this by simply considering structures that lie within a given free energy range from the barrier. However, this is not the rigorous definition of transition state ensemble, which should be defined in terms of committor distribution. This is not simply an issue of semantics, since only a rigorous definition allows a fair comparison between different cases - such as the transition state in an enzyme vs in solution, or with and without the metal ion. For a chemical reaction in a complex environment, it is also possible that many other variables (in addition to the breaking and forming P-O bonds) should be considered when one measures the diversity in the conformational ensemble.

      In the revised manuscript, the authors included committor analysis. However, the discussion of the result is very brief. In particular, if we use the common definition of the transition state ensemble (TSE) as those featuring the committor around 0.5, the reaction coordinate of the TSE would span a much narrower range than those listed in Table 1. This point should be carefully addressed.

      The reviewer is right: the TSE is rigorously defined in terms of the committor distribution. We actually calculated the committor distribution for the reaction with and without Mg. We have now added the figure showing the committor distribution for both reactions (Figure 3 – supplement 9). We did not include these results before because the committor distribution histogram would need more points to have a more accurate shape, requiring a high computational cost. We followed the reviewer’s suggestion and updated Table 1 with the values from the committor distribution analysis.

      (2) While the experimental observation that the activation entropy differs significantly with and without the Ca2+ ion is interesting, it is difficult to connect this result with the "wide" transition state ensemble observed in the QM/MM simulations so far. Even without considering the definition of the transition state ensemble mentioned above, it is unlikely that a broader range of P-O distances would explain the substantial difference in the activation entropy measured in the experiment. Since the difference is sufficiently large, it should be possible to compute the value by repeating the free energy simulations at different temperatures, which would lead to a much more direct evaluation of the QM/MM model/result and the interpretation.

      See our answer above about this point.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      One of the remaining issues with this revision is the assertion of the wide transition states in the presence of Mg2+ ion. When discussing the transition state of phosphoryl transfer reactions, it is important to consider their nature as involving both the cleavage and formation of P-O bonds. While these two events can occur in concert with a single transition state, many studies have shown that one event often precedes the other. Sometimes, there is a slight drop in free energy between the two events, forming a transient intermediate. However, due to its very short lifetime, this intermediate state may not be detectable experimentally. Depending on the sequence of events, the transition state or the transient intermediate may exhibit characteristics of a metaphosphate or phosphorane-like species. Based on the DFTB simulation results presented in the paper, it appears that the reaction forms a metaphosphate-like transition state. In the present reaction, since the two oxygen atoms involved in the reaction are very good leaving groups with similar reactivity, it is not surprising to observe the two events near the TS with very similar relative free energies, and therefore, the free energy profile can be very flat near the TS. This is consistent with the statement that "the transferring phosphate can be much closer to the leaving oxygen than the attacking oxygen and vice versa" on page 9. In my opinion, however, this should not be considered as a wide transition state but rather a consequence of the two events occurring very close to each other along the reaction coordinate. This distinction can be considered a semantic issue, and as long as the authors clearly discuss this issue and clarify the meaning of the TS ensemble, the reviewer is okay with that. In its current form, the statement of the wide TS ensemble may lead to a misleading interpretation of the reaction under study.

      An intermediate is clearly defined as a minimum in the free energy landscape. We see no evidence in any of our simulations of a minimum flanked by two transition states, nor do we see any evidence in our NMR relaxation data or crystal structure ensemble refinement. We report our experimental and computational results, so that the reader can directly interpret the free energy landscapes for this system, avoiding semantics due to language ambiguity.

      Second, based on the kinetic study, the free energy of the catalytic reaction is approximately zero. The authors suggest that at pH near 7, the ADP exists as a roughly 50-50 mixture between the singly protonated and fully charged states and consequently, the reaction free energies between the two scenarios cancel each other out. However, this argument is not correct. If [ADP(H)]/[ADP] is close to 1, the two reaction free energies, one with +6 kcal/mol and the other with -6 kcal/mol, imply that the protonation of the products (either ATP or AMP) requires ~12 kcal/mol (i.e., 9 pKa unit shift). Given the symmetric nature of the reaction and the similar pKa values between ATP, ADP versus AMP, such a large shift in the pKa of the product state is not expected, and for the calculated results to be accurate, the pKa shifts in the reactant state versus the product state must be opposite, with a total relative shift of 9 pKa units. This is difficult to understand given the nature of the reaction catalyzed by the adenylate kinase enzyme.
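
      The conversion behind the "~12 kcal/mol, i.e., 9 pKa unit shift" statement can be checked with the standard relation ΔG = RT ln(10) × ΔpKa. The sketch below is only an illustration; the temperature of 298.15 K is an assumption, since the comment does not state one, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def pka_shift_from_free_energy(delta_g_kcal, temperature_k=298.15):
    """Convert a free-energy difference (kcal/mol) into an equivalent pKa shift.

    Uses delta_G = R * T * ln(10) * delta_pKa; the default temperature is an assumption.
    """
    return delta_g_kcal / (R_KCAL * temperature_k * math.log(10))


# +6 and -6 kcal/mol reaction free energies differ by 12 kcal/mol.
shift = pka_shift_from_free_energy(12.0)
print(round(shift, 1))
```

      At room temperature one pKa unit corresponds to roughly 1.36 kcal/mol, which is why a 12 kcal/mol difference maps to a shift of about 9 pKa units.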

      We thank the reviewer for this question, which made us realize that we cannot compare the free energies of our QM/MD simulations with the experimentally determined ADP and ATP/AMP ratios. In the experiment we determine the entire pool of ADP and AMP/ATP bound to the enzyme, but could not distinguish whether the protonated and/or nonprotonated states are contributing to the measured observed rate constants (Kerns, S. et al., 2015). In the present study, we have now discovered that the nonprotonated forms have a lower activation barrier, but the protonated states also contribute to the overall reaction. Therefore, we removed this paragraph from our discussion.

      Minor comments:

      The difference in the free energy barrier determined by the MSMD and umbrella sampling is not negligible. Considering that umbrella sampling is commonly used in this type of research, the MSMD method appears to overestimate the barrier by more than 3 kcal/mol. Would the TS geometries obtained by umbrella sampling be comparable to those obtained by MSMD?

      This is an excellent suggestion, since umbrella sampling is the more accurate method. The TSE from both methods are indeed comparable, and we added new figure panels showing these results to Fig. 4.

      Figure 5 shows that the enthalpy of activation is similar for reactions with and without Ca2+. Do the authors expect the enthalpy of activation to decrease when Ca2+ is replaced by Mg2+ without a significant change in the entropy of activation? Any justification?

      In Kerns, S. et al. (2015), we had experimentally determined the dependence of the observed rate of the P-transfer on the nature of the divalent metal, with Mg2+ being by far superior to the other divalent metals. We proposed that this is mainly an effect on the enthalpy of activation, in that other divalent metals provide poor orbital overlap, in agreement with published work on P-transfer reactions that show selectivity for a specific metal.

      Please provide proper citations for SHAKE and WHAM.

      The citations were added.

      Reviewer #2 (Recommendations For The Authors):

      The authors did not really address in the revised manuscript the main points of the previous review, which included examination of non-enzymatic reaction (via simulation, not measurement) and quantification of the connection between the reported wide TS ensemble and the increase in entropy (by additional simulations). The authors should also add reference to the AM1/d-PhoT model of Nam et al. which is now discussed.

      QM/MM calculations for uncatalysed phosphoryl-transfer reactions involving either diphosphates or triphosphates have been well documented in the literature, showing narrow and symmetric TSE (Klahn et al., JACS 2006, 128 (47) 15310-15323; Cui Wang et al., JPCB 2015, 119(9), 3720-3726). We added these references to the revised manuscript.

      The reference to AM1/d-PhoT model was added.

      Reviewer #3 (Recommendations For The Authors):

      In the revised ms, the authors indeed addressed many of the points raised in the previous round of review. In addition to the issue of TSE and committor mentioned above, another point that needs to be carefully explained is the very significant difference between umbrella sampling results and those in Fig. 1C - especially for the case without Mg2+ - the difference of more than 20 kcal/mol is not something that can be ignored at a qualitative level.

      We thank the reviewer for pointing out that the difference in free energy profiles between umbrella sampling (US) and MSMD, especially in the case without Mg2+, needs to be addressed.

      We believe that the key reason for this difference lies in the methodological approaches of these techniques.

      Umbrella sampling is an equilibrium enhanced sampling method that allows for a balanced and thorough exploration of the free energy landscape, whereas MSMD is a non-equilibrium method whose estimate depends on the averaging scheme used and the number of trajectories. In the present work, the free energy was estimated using an exponential average. This averaging scheme has slow convergence and small variance, and may overestimate the free energy barrier, especially if the barrier is quite high, as seen in the absence of Mg. This factor could explain the significant difference between umbrella sampling and MSMD combined with Jarzynski’s equality.
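
      The exponential average mentioned above is the standard Jarzynski estimator, ΔF ≈ -kT ln⟨exp(-W/kT)⟩, taken over the work values W of the non-equilibrium trajectories. The following is a minimal sketch of that estimator, not the authors' analysis code; the function name and the work values are hypothetical and chosen only to show how the estimate is dominated by the lowest-work trajectories.

```python
import math


def jarzynski_free_energy(work_values, kT):
    """Exponential (Jarzynski) average: -kT * ln( mean( exp(-W/kT) ) ).

    Evaluated with a log-sum-exp shift by the minimum work for numerical stability.
    """
    n = len(work_values)
    w_min = min(work_values)
    shifted_sum = sum(math.exp(-(w - w_min) / kT) for w in work_values)
    return w_min - kT * math.log(shifted_sum / n)


# Hypothetical work values in kcal/mol; kT is about 0.6 kcal/mol near room temperature.
works = [10.0, 12.0, 20.0]
estimate = jarzynski_free_energy(works, kT=0.6)
mean_work = sum(works) / len(works)
print(estimate, mean_work)
```

      The estimate always lies between the smallest work value and the mean work, and it sits close to the smallest one. If the rare low-work trajectories are not sampled, which becomes more likely as the barrier grows, the estimate from a finite number of trajectories is biased upward, consistent with the overestimation described above.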

      We have added new panels to Fig. 4 to compare the TSE from the more accurate umbrella sampling to the MSMD simulations, buttressing the validity of our original findings. We revised the manuscript to discuss the differences between the MSMD and the umbrella sampling free energy profiles.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The work analyzes how centrosomes mature before cell division. A critical aspect is the accumulation of pericentriolar material (PCM) around the centrioles to build competent centrosomes that can organize the mitotic spindle. The present work builds on the idea that the accumulation of PCM is catalyzed either by the centrioles themselves (leading to a constant accumulation rate) or by enzymes activated by the PCM itself (leading to autocatalytic accumulation). These ideas are captured by a previous model derived for PCM accumulation in C. elegans (ref. 8) and are succinctly summarized by Eq. 1. The main addition of the present work is to allow the activated enzymes to diffuse in the cell, so they can also catalyze the accumulation of PCM in other centrosomes (captured by Eqs. 2-4). The authors claim that this helps centrosomes to reach the same size, independent of potential initial mismatches.

      A strength of the paper is the simplicity of the equations, which are reduced to the bare minimum and thus allow a detailed inspection of the physical mechanism. One shortcoming of this approach is that all equations assume that the diffusion of molecules is much faster than any of the reactive time scales, although there is no experimental evidence for this.

      We appreciate the reviewer’s recognition of the strengths of our work. Indeed, the centrosome growth model incorporates multiple timescales corresponding to various reactions, and existing experimental data do not directly provide diffusion constants for the cytosolic proteins. However, we can estimate these diffusion constants using protein mass, based on the Stokes-Einstein relation, and compare the diffusion timescales with the reaction timescales obtained from FRAP analysis. For example, we estimate that the diffusion timescale for centrosomes separated by 5-10 micrometers is much smaller than the reaction timescales deduced from the FRAP experiments. Specifically, for SPD-5, a scaffold protein with a mass of ~150 kDa, the estimated diffusion constant is ~17 µm²/s, using the Stokes-Einstein relation and a reference diffusion constant of ~30 µm²/s for a 30 kDa GFP protein (reference: Bionumbers book). This results in a diffusion timescale of ~1 second for centrosomes 10 µm apart. In contrast, FRAP recovery timescales for SPD-5 in C. elegans embryos are on the order of several minutes, suggesting that scaffold protein binding reactions are much slower than diffusion. Therefore, a reaction-limited model is appropriate for studying PCM self-assembly during centrosome maturation. We have revised the manuscript to clarify this point and to include a discussion of the diffusion and reaction timescales.
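
      The estimate above can be reproduced with a short calculation. The sketch assumes a roughly spherical protein, so that the hydrodynamic radius scales as mass^(1/3) and the Stokes-Einstein diffusion constant as mass^(-1/3), and it assumes the three-dimensional relation t ≈ L²/(6D) for the diffusion timescale; neither assumption is spelled out in the response, and the function names are hypothetical.

```python
def scaled_diffusion_constant(d_ref, mass_ref_kda, mass_kda):
    """Stokes-Einstein scaling for a spherical protein: D ~ 1/radius ~ mass**(-1/3)."""
    return d_ref * (mass_ref_kda / mass_kda) ** (1.0 / 3.0)


def diffusion_time(distance_um, d_um2_per_s, dimensions=3):
    """Time to diffuse a given distance: t = L**2 / (2 * dimensions * D)."""
    return distance_um ** 2 / (2 * dimensions * d_um2_per_s)


# Reference values quoted in the response: ~30 um^2/s for a 30 kDa GFP, SPD-5 at ~150 kDa.
d_spd5 = scaled_diffusion_constant(30.0, 30.0, 150.0)
t_10um = diffusion_time(10.0, d_spd5)
print(round(d_spd5, 1), round(t_10um, 2))
```

      Under these assumptions the calculation gives roughly 17.5 µm²/s and just under 1 s, in line with the quoted ~17 µm²/s and ~1 second, and far shorter than the minutes-long FRAP recovery times.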

      Spatially extended model with diffusion

      Both reviewers pointed out the importance of considering diffusion effects in centrosome size dynamics, and we agree that this is important to explore. We have developed a spatially extended 3D version of the centrosome growth model, incorporating stochastic reactions and diffusion (see Appendix 4). In this model, the system is divided into small reaction volumes (voxels), where reactions depend on local density, and diffusion is modeled as the transport of monomers/building blocks between voxels.

      We find that diffusion can alter the timescales of growth, particularly when the diffusion timescale is comparable to or slower than the reaction timescale, potentially mitigating size inequality by slowing down autocatalysis. However, the main conclusions of the catalytic growth model remain unchanged, showing robust size regulation independent of diffusion constant or centrosome separation (Figure 2—figure supplement 3). Hence, we focused on the effect of subunit diffusion on the autocatalytic growth model. We find that in the presence of diffusion, the size inequality reduces with increasing diffusion timescale, i.e., increasing distance between centrosomes and decreasing diffusion constant (Figure 2—figure supplement 4). However, the lack of robustness in size control in the autocatalytic growth model remains, i.e., the final size difference increases with increasing initial size difference. Notably, in the diffusion-limited regime (very small diffusion or large distances), the growth curve loses its sigmoidal shape, resembling the behavior in the non-autocatalytic limit (Figure 2). These findings are discussed in the revised manuscript.

      Another shortcoming of the paper is that it is not clear what species the authors are investigating and how general the model is. There are huge differences in centrosome maturation and the involved proteins between species. However, this is not mentioned in the abstract or introduction. Moreover, in the main body of the paper, the authors mention C. elegans on pages 2 and 3, but refer to Drosophila on page 4, switching back to C. elegans on page 5, and discuss Drosophila on page 6. This is confusing and looks as if they are cherry-picking elements from various species. The original model in ref. 8 was constructed for C. elegans and it is not clear whether the autocatalytic model is more general than that. In any case, a more thorough discussion of experimental evidence would be helpful.

      We believe one strength of our approach is its applicability across organisms. Our goal in comparing the theoretical model with experimental data from C. elegans and D. melanogaster is to demonstrate that the apparent qualitative differences in centrosome growth across species (see e.g., the extent of size scaling discussed in the section “Cytoplasmic pool depletion regulates centrosome size scaling with cell size”) may arise from the same underlying mechanisms in the theoretical model, albeit with different parameter values. We acknowledge differences in regulatory molecules between species, but the core pathways remain conserved (see, e.g., Raff, Trends in Cell Biology 2019, section: “Molecular Components of the Mitotic Centrosome Scaffold Appear to Have Been Conserved in Evolution from Worms to Humans”). In the revised manuscript, we have expanded the introduction to clarify this point and explain how our theory applies across species. We have also provided a clearer discussion of the experimental systems used throughout the manuscript and the available experimental evidence.

      The authors show convincingly that their model compensates for initial size differences in centrosomes and leads to more similar final sizes. These conclusions rely on numerical simulations, but it is not clear how the parameters listed in Table 1 were chosen and whether they are representative of the real situation. Since all presented models have many parameters, a detailed discussion on how the values were picked is indispensable. Without such a discussion, it is not clear how realistic the drawn conclusions are. Some of this could have been alleviated using a linear stability analysis of the ordinary differential equations from which one could have gotten insight into how the physical parameters affect the tendency to produce equal-sized centrosomes.

      Following the suggestion of the reviewer, we have revised the manuscript to add references and discussions justifying the choice of the parameter values used for the numerical simulations. These references and parameter choices can be found in Table 1 and Table 2, and are also discussed in relevant figure captions and within the manuscript text.

      We thank the reviewer for the excellent suggestion of including linear stability analysis of the ODE models of centrosome growth. We included linear stability analyses of the catalytic and autocatalytic growth models in Appendix 3. Analysis of the catalytic growth model reaffirms the robustness of size equality and the analysis of autocatalytic growth provides an approximate condition of size inequality. We have modified the revised manuscript to discuss these results.

      The authors use the fact that their model stabilizes centrosome size to argue that their model is superior to the previously published one, but I think that this conclusion is not necessarily justified by the presented data. The authors claim that "[...] none of the existing quantitative models can account for robustness in centrosome size equality in the presence of positive feedback." (page 1; similar sentence on page 2). This is not shown convincingly. In fact, ref 8. already addresses this problem (see Fig. 5 in ref. 8) to some extent.

      The linear stability analysis shown in Fig 5 in ref 8 (Zwicker et al, PNAS, 2014) shows that the solutions are stable around the fixed point, and it was inferred from this result that Ostwald ripening can be suppressed by the catalytic activity of the centriole, therefore stabilizing the centrosomes (droplets) against coarsening by Ostwald ripening. However, if a size discrepancy arises from the growth process (e.g., due to autocatalysis), the timescale of relaxation for such a discrepancy is not clear from the above-mentioned result. We show (in Figure 2—figure supplement 3) that for any appreciable amount of positive feedback, the solution moves very slowly around the fixed point (almost like a line attractor) and cannot reach the fixed point in a biologically relevant timescale. Hence the model in ref 8 does not provide a robust mechanism for size control in the presence of autocatalytic growth. We have added this discussion in the Discussion section.

      More importantly, the conclusion seems to largely be based on the analysis shown in Fig. 2A, but the parameters going into this figure are not clear (see the previous paragraph). In particular, the initial size discrepancy of 0.1 µm^3 seems quite large, since it translates to a sphere of a radius of 300 nm. A similarly large initial discrepancy is used on page 3 without any justification. Since the original model itself already showed size stability, a careful quantitative comparison would be necessary.

      We thank the reviewer for the valuable suggestions. The parameters used in Fig. 2A are listed in Table 1 with corresponding references, and we used the parameter values from Zwicker et al. (2014) for rate constants and concentrations.

      The issue of initial size differences between centrosomes is important, but quantitative data on this are not readily available for C. elegans and Drosophila. Centrosomes may differ initially due to disparities in the amount and incorporation rate of PCM between the mother and daughter centrioles. Based on available images and videos (Cabral et al, Dev. Cell, 2019, DOI: https://doi.org/10.1016/j.devcel.2019.06.004), we estimated an initial radius of ~0.5 μm for centrosomes. Accounting for a 5% radius difference would lead to a volume difference of ~0.1 μm^3, which was used in our analysis (Fig. 2A). These differences likely arise from distinct growth conditions of centrosomes containing different centrioles (older mother and newer daughter).
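
      The arithmetic behind this estimate can be checked with a short sketch. It assumes that the larger centrosome has a radius 5% above the 0.5 μm estimate and that both centrosomes are spheres; the function name `sphere_volume` is ours and does not come from the manuscript. Under these assumptions the difference comes out at roughly 0.08 μm^3, which rounds to the quoted ~0.1 μm^3, and a single sphere of volume 0.1 μm^3 has a radius just under 300 nm, consistent with the reviewer's remark.

```python
import math


def sphere_volume(radius_um: float) -> float:
    """Volume (in um^3) of a sphere with the given radius (in um)."""
    return 4.0 / 3.0 * math.pi * radius_um ** 3


r0 = 0.5         # initial radius estimate quoted in the response (um)
rel_diff = 0.05  # 5% radius difference quoted in the response

# Assumption: the larger centrosome has radius r0 * (1 + rel_diff).
v_small = sphere_volume(r0)
v_large = sphere_volume(r0 * (1.0 + rel_diff))
delta_v = v_large - v_small

# Radius of one sphere whose volume is 0.1 um^3 (the reviewer's comparison).
r_equiv = (3.0 * 0.1 / (4.0 * math.pi)) ** (1.0 / 3.0)

print(f"volume difference: {delta_v:.3f} um^3")
print(f"radius of a 0.1 um^3 sphere: {r_equiv * 1000:.0f} nm")
```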

      More importantly, we emphasize that the initial size difference does not qualitatively alter the results presented in Figure 2. We agree that a quantitative analysis will further clarify our conclusions, and we have revised the manuscript accordingly. For example, Figure 2—figure supplement 3 provides a detailed analysis of how the final centrosome size depends on initial size differences across various parameter values. Additionally, Appendix 3 now includes analytical estimates of the onset of size inequality as a function of these parameters.

      The analysis of the size discrepancy relies on stochastic simulations (e.g., mentioned on pages 2 and 4), but all presented equations are deterministic. It's unclear what assumptions go into these stochastic equations, and how they are analyzed or simulated. Most importantly, the noise strength (presumably linked to the number of components) needs to be mentioned. How is this noise strength determined? What are the arguments for this choice? This is particularly crucial since the authors quote quantitative results (e.g., "a negligible difference in steady-state size (∼ 2% of mean size)" on page 4).

      As described in the Methods, we used the exact Gillespie method (Gillespie, JPC, 1977) to simulate the evolution of the stochastic trajectories of the systems, corresponding to the deterministic growth and reaction kinetics outlined in the manuscript. We have expanded the Methods to include further details on the stochastic simulations and refer to Appendix 1, where we describe the chemical master equations governing autocatalytic growth.
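
      For readers unfamiliar with the method, the following is a minimal sketch of a Gillespie simulation for two structures drawing subunits from one shared pool. It is not the authors' code. The reaction scheme (a basal rate `k0`, an autocatalytic rate `k1`, and a loss rate `k_minus`) and all parameter values are illustrative placeholders, not the manuscript's equations or the values in Table 1.

```python
import random


def gillespie_two_centrosomes(n_total=500, k0=0.01, k1=0.0, k_minus=0.05,
                              t_end=200.0, seed=0):
    """Return final subunit counts (n1, n2) of two structures sharing a pool.

    Hypothetical reactions (placeholders, not the manuscript's model):
      pool -> structure i   with propensity (k0 + k1 * n_i) * free
      structure i -> pool   with propensity k_minus * n_i
    """
    rng = random.Random(seed)
    n = [0, 0]
    t = 0.0
    while True:
        free = n_total - n[0] - n[1]
        props = [
            (k0 + k1 * n[0]) * free,  # growth of structure 1
            (k0 + k1 * n[1]) * free,  # growth of structure 2
            k_minus * n[0],           # loss from structure 1
            k_minus * n[1],           # loss from structure 2
        ]
        total = sum(props)
        if total <= 0.0:
            break
        # Waiting time to the next reaction is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Pick one reaction with probability proportional to its propensity;
        # reactions with zero propensity can never be chosen.
        r = rng.random() * total
        cum = 0.0
        chosen = None
        for idx, a in enumerate(props):
            if a <= 0.0:
                continue
            chosen = idx
            cum += a
            if r < cum:
                break
        if chosen < 2:
            n[chosen] += 1
        else:
            n[chosen - 2] -= 1
    return n[0], n[1]


n1, n2 = gillespie_two_centrosomes(seed=1)
print("final sizes (subunits):", n1, n2)
```

      In this sketch the fluctuations arise purely from the discreteness of the reaction events, so the noise strength is set by the pool size `n_total` rather than by a separate noise parameter.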

      The noise strength (fluctuations about the mean size of centrosome) does depend on the total monomer concentration (the pool size), and this may affect size inequality. Similar values of the total monomer concentration were used in the catalytic (0.04 μM) and autocatalytic growth (0.33 μM) simulations. These values for the pool size are similar to previous studies (Zwicker et al, PNAS, 2012) and have been optimized to obtain a good fit with experimental growth curves from C. elegans embryo data.

      To present more quantitative results, we have revised our manuscript to add data showing the effect of pool size on centrosome size inequality (Figure 3 - figure supplement 2). We find the size inequality in catalytic growth to increase with decreasing pool size as the origin of this inequality is the stochastic fluctuation in individual centrosome size. The size inequality (ratio of dv/<V>) in the autocatalytic growth does not depend (strongly) on the pool size (dv and <V> both increase similarly with pool size).

      Moreover, the two sets of testable predictions that are offered at the end of the paper are not very illuminative: The first set of predictions, namely that the model would anticipate an "increase in centrosome size with increasing enzyme concentration, the ability to modify the shape of the sigmoidal growth curve, and the manipulation of centrosome size scaling patterns by perturbing growth rate constants or enzyme concentrations.", are so general that they apply to all models describing centrosome growth. Consequently, these observations do not set the shared enzyme pool apart and are thus not useful to discriminate between models. The second part of the first set of predictions about shifting "size scaling" is potentially more interesting, although I could not discern whether "size scaling" referred to scaling with cell size, total amount of material, or enzymatic activity at the centrioles. The second prediction is potentially also interesting and could be checked directly by analyzing published data of the original model (see Fig. 5 of ref. 8). It is unclear to me why the authors did not attempt this.

      In response to the reviewers' valuable feedback, we have revised the manuscript to include results on potential methods for distinguishing catalytic growth from autocatalytic growth. Since the growth dynamics of a single centrosome do not significantly differ between these two models, it is necessary to experimentally examine the growth dynamics of a centrosome pair under various initial size perturbations. In Figure 3-figure supplement 2, we present theoretical predictions for both catalytic and autocatalytic growth models, illustrating the correlation between initial and final sizes after maturation. The figure demonstrates that the initial size difference and final size difference should be correlated only in autocatalytic growth, and that the relative size inequality decreases with increasing subunit pool size in catalytic growth while it remains almost unchanged in autocatalytic growth. These predictions can be experimentally examined by inducing varying centrosome sizes at the early stage of maturation for different expression levels of the scaffold former proteins.

      A second experimentally testable feature of the catalytic growth model involves sharing of the enzyme between both centrosomes. This could be tested through immunofluorescent staining of the kinase or by constructing a FRET reporter for PLK1 activity, where it can be studied if the active form of the PLK1 is found in the cytoplasm around the centrosomes indicating a shared pool of active enzyme. Additionally, photoactivated localization microscopy could be employed, where fluorescently tagged enzyme can be selectively photoactivated in one centrosome and intensity can be measured at the other centrosome to find the extent of enzyme sharing between the centrosomes.

      We also discuss shifts in centrosome size scaling behavior with cell size by varying parameters of the catalytic growth model (Fig 4). While quantitative analysis of size scaling in Drosophila is currently unavailable, such an investigation could enable us to distinguish the catalytic growth model from other models. We have included this point in the Discussion section.

      “The second prediction is potentially also interesting …” We assume the reviewer is referencing the scenario in Zwicker et al. (ref 8), where differences in centriole activity lead to unequal centrosome sizes. The data in that study represent a case of centrosome growth with variable centriole activity, resulting in size differences in both autocatalytic and catalytic growth models. This differs from our proposed experiment, where we induce unequal centrosome sizes without modifying centriole activity. We have now revised the text to clarify this distinction.

      Taken together, I think the shared enzyme pool is an interesting idea, but the experimental evidence for it is currently lacking. Moreover, the model seems to make few testable predictions that differ from previous models.

      We appreciate the reviewer’s interest in the core idea of our work. As mentioned earlier, we have improved the clarity in model predictions in the revised discussion section. Unfortunately, the lack of publicly available experimental data limits our ability to provide more direct experimental evidence. However, we are hopeful that our theoretical model will inspire future experiments to test these model predictions.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, Banerjee & Banerjee argue that a solely autocatalytic assembly model of the centrosome leads to size inequality. The authors instead propose a catalytic growth model with a shared enzyme pool. Using this model, the authors predict that size control is enzyme-mediated and are able to reproduce various experimental results such as centrosome size scaling with cell size and centrosome growth curves in C. elegans.

      The paper contains interesting results and is well-written and easy to follow/understand.

      We are delighted that the reviewer finds our work interesting, and we appreciate the thoughtful suggestions provided. In response, we have revised the text and figures to incorporate these recommendations. Below, we address each of the reviewer’s comments point by point:

      Suggestions:

      ● In the Introduction, when the authors mention that their "theory is based on recent experiments uncovering the interactions of the molecular components of centrosome assembly" it would be useful to mention what particular interactions these are.

      As the reviewer suggested, we have modified the introduction section to add the experimental observations upon which we build our model.

      ● In the Results and Discussion sections, the authors note various similarities and differences between what is known regarding centrosome formation in C. elegans and Drosophila. It would have been helpful to already make such distinctions in the Introduction (where some phenomena that may be C. elegans specific are implied to hold for centrosomes universally). It would also be helpful to include more comments on the possible implications for other systems in which centrosomes have been studied, such as human, Zebrafish, and Xenopus.

      We thank the reviewer for this suggestion. We have modified the Introduction to motivate the comparative study of centrosome growth in different organisms and draw relevant connections to centrosome growth in other commonly studied organisms like Zebrafish and Xenopus.

      ● For Fig 1.C, the two axes are very close to being the same but are not. It makes the graph a little bit more difficult to interpret than if they were actually the same or distinctly different. It would be more useful to have them on the same scale and just have a legend.

      We have modified Figure 1C in the revised manuscript. The plot now shows the growth of a single centrosome and of a pair of centrosomes on the same y-axis scale.

      ● The authors refer to Equation 1 as resulting from an "active liquid-liquid phase separation", but it is unclear what that means in this context because the rheology of the centrosome does not appear to be relevant.

      We used the term “active liquid-liquid phase separation” simply to refer to a previous model proposed by Zwicker et al (PNAS, 2014) where the underlying process of growth results from liquid-liquid phase separation. We agree with the reviewer that the rheological property of the centrosome is not very relevant in our discussions and we have thus removed the sentence from the revised manuscript to avoid any confusion.

      ● The authors reject the non-cooperative limit of Eq 1 because, even though it leads to size control, it does not give sigmoidal dynamics (Figure 2B). While I appreciate that this is just meant to be illustrative, I still find it to be a weak argument because I would guess a number of different minor tweaks to the model might keep size control while inducing sigmoidal dynamics, such as size-dependent addition or loss rates (which could be due to reactions happening on the surface of the centrosome instead of in its bulk, for example). Is my intuition incorrect? Is there an alternative reason to reject such possible modifications?

      The reviewer raises an interesting point here. However, we disagree with the idea that minor adjustments to the model can produce sigmoidal growth curves while still maintaining size control. In the absence of an external, time-dependent increase in building block concentration (which would lead to an increasing growth rate), achieving sigmoidal growth requires a positive feedback mechanism in the growth rate. This positive feedback alone could introduce size inequality unless shared equally between the centrosomes, as it is in our model of catalytic growth in a shared enzyme pool. The proposed modification involving size-dependent addition or loss rates due to surface assembly/disassembly may result in unequal sizes precisely because of this positive feedback. A similar example is provided in Appendix 1, where assembly and disassembly across the pericentriolar material volume lead to sigmoidal growth but also generate significant size inequality and lack of robustness in size control.

      ● While the inset of Figure 3D is visually convincing, it would be good to include a statistical test for completeness.

      Following the reviewer’s suggestion, we present a statistical analysis in Figure 3 - Figure supplement 2 in the modified manuscript to enhance clarity. We show that the size difference values are uncorrelated (Pearson’s correlation coefficient ~ 0) with the initial size difference indicating the robustness of the size regulation mechanism.
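
      As a sketch of the kind of test described here, Pearson's correlation coefficient between initial and final size differences can be computed as follows. The helper `pearson_r` and the numbers in the usage lines are hypothetical placeholders, not data from the manuscript. A coefficient near 0 would indicate that the final size difference is independent of the initial one, whereas a value near 1 would indicate that initial differences persist.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Placeholder values for illustration only (not measurements or model output).
initial_diff = [0.00, 0.05, 0.10, 0.15, 0.20]
final_diff = [0.00, 0.10, 0.20, 0.30, 0.40]  # final difference tracks the initial one
print(pearson_r(initial_diff, final_diff))
```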

      ● The authors note that the pulse in active enzyme in their model is reminiscent of the Polo kinase pulse observed in Drosophila. Can the authors use these published experimental results to more tightly constrain what parameter regime in their model would be relevant for Drosophila? Can the authors make predictions of how this pulse might vary in other systems such as C. elegans?

      Thank you for the insightful suggestion regarding the use of pulse dynamics in experiments to better constrain the model’s parameter regime. In our revised manuscript, we attempted this analysis; however, the data from Wong et al. (EMBO 2022) for Drosophila are presented as normalized intensity in arbitrary units, rather than as quantitative measures of centrosome size or Polo enzyme concentration. This lack of quantitative data limits our ability to benchmark the model beyond capturing qualitative trends. We thus believe that quantitative measurements of centrosome size and enzyme concentration are necessary to achieve a tighter alignment between model predictions and biological data.

      We discuss the enzyme dynamics in C. elegans in the revised manuscript. We find the enzyme dynamics corresponding to the fitted growth curves of C. elegans centrosomes are distinctly different from the ones observed in Drosophila. Instead of the pulse-like feature, we find a step-like increase in (cytosolic) active enzyme concentration.

      ● The authors mention that the shared enzyme pool is likely not diffusion-limited in C. elegans embryos, but this might change in larger embryos such as Drosophila or Xenopus. It would be interesting for the authors to include a more in-depth discussion of when diffusion will or will not matter, and what the consequence of being in a diffusion-limit regime might be.

      Both the reviewers have pointed out the importance of considering diffusion effects in centrosome size dynamics, and we agree that this is important to explore. We have developed a spatially extended 3D version of the centrosome growth model, incorporating stochastic reactions and diffusion (see Appendix 4). In this model, the system is divided into small reaction volumes (voxels), where reactions depend on local density, and diffusion is modeled as the transport of monomers/building blocks between voxels.

      We find that diffusion can alter the timescales of growth, particularly when the diffusion timescale is comparable to or slower than the reaction timescale, potentially mitigating size inequality by slowing down autocatalysis. However, the main conclusions of the catalytic growth model remain unchanged, showing robust size regulation independent of diffusion constant or centrosome separation (Figure 2—figure supplement 3). Hence, we focused on the effect of subunit diffusion on the autocatalytic growth model. We find that in the presence of diffusion, the size inequality reduces with increasing diffusion timescale, i.e., increasing distance between centrosomes and decreasing diffusion constant (Figure 2—figure supplement 4). However, the lack of robustness in size control in the autocatalytic growth model remains, i.e., the final size difference increases with increasing initial size difference. Notably, in the diffusion-limited regime (very small diffusion or large distances), the growth curve loses its sigmoidal shape, resembling the behavior in the non-autocatalytic limit (Figure 2). These findings are discussed in the revised manuscript.

      ● The authors state "Firstly, our model posits the sharing of the enzyme between both centrosomes. This hypothesis can potentially be experimentally tested through immunofluorescent staining of the kinase or by constructing FRET reporter of PLK1 activity." I don't understand how such experiments would be helpful for determining if enzymes are shared between the two centrosomes. It would be helpful for the authors to elaborate.

      Our results indicate that the centrosome-activated enzyme must be shared for robust regulation of centrosome size equality. If a FRET reporter of the active form of the enzyme (e.g., PLK1) can be constructed, then the localization of the active form of the enzyme may be determined in the cytosol. We propose this based on reports of studying PLK activities in subcellular compartments using FRET as described in Allen & Zhang, BBRC (2006). Such experiments would provide direct evidence of the shared enzyme pool. Following the reviewer’s suggestion, we have modified the description of the FRET-based possible experimental test for the shared enzyme pool hypothesis in the revised manuscript.

      Additionally, we have added another possible experimental test based on photoactivated localization microscopy (PALM), where tagged enzyme can be selectively photoactivated in one centrosome and intensity measured at the other centrosome to indicate whether the enzyme is shared between the centrosomes.

      Recommendations for the authors:

      The manuscript needs to clarify better what species the model describes, how alternative models were rejected, and how the parameters were chosen.

      In the revised manuscript, we have connected the chemical species in our model to those documented in organisms like Drosophila and C. elegans. This connection is detailed in the main text under the Catalytic Growth Model section and summarized in Table 2. We discuss alternative models and our reasons for excluding them in the first results section on autocatalytic growth, with additional details provided in Appendix 1 and the accompanying supplementary figures. The selection of model parameters is addressed in the main text and methods, with references listed in Table 1. We believe that these revisions, along with our point-by-point responses to reviewer comments, comprehensively address all reviewer concerns.

      Reviewer #1 (Recommendations For The Authors):

      I think the style and structure of the paper could be improved on at least two accounts:

      (1) What's the role of the last section ("Multi-component centrosome model reveals the utility of shared catalysis on centrosome size control.")? It seems to simply add another component, keeping the essential structure of the model untouched. Not surprisingly, the qualitative features of the model are preserved and quantitative features are not discussed anyway.

      This model provides a more realistic description of centrosome growth by incorporating the dynamics of the two primary scaffold-forming subunits and their interactions with an enzyme. It is based on the observation that the major interaction pathways among centrosome components are conserved across many organisms (see Raff, Trends in Cell Biology, 2019 and Table 2), typically involving two scaffold-forming proteins and one enzyme that mediates positive feedback between them. These pathways may involve homologous proteins in different species.

      This model allows us to validate the experimentally observed spatial spread of the two subunits, Cnn and Spd-2, in Drosophila. Additionally, we used it to investigate the impact of relaxing the assumption of a shared enzyme pool on size control. Although similar insights could be obtained using a single-component model, the two-component model offers a more biologically relevant framework. We have highlighted these points in the revised manuscript to ensure clarity.

      (2) The very long discussion section is not very helpful. First, it mostly reiterates points already made in the main text. Second, it makes arguments for the choice of modeling (top left column of page 8), which probably should have been made when introducing the model. Third, it introduces new results (lower left column of page 8), which should probably be moved to the main text. Fourth, the interpretation of the model in light of the known biochemistry is useful and should probably be expanded although I think it would be crucial to keep information from different organisms clearly separate (this last point actually holds for the entire manuscript).

      We thank the reviewer for the feedback. We have modified the discussion section to focus more on the interpretation of the results, model predictions and future outlook with possible experiments to validate crucial aspects of the model. We have moved most of the justifications to the main text model description.

      Here are a few additional minor points:

      * page 1: Typo "for for" → "for"

      * Page 8: Typo "to to" → "to"

      We thank the reviewer for the useful recommendations. We have corrected all the typos in the revised manuscript.

      * Why can diffusion be neglected in Eq. 1? This is discussed only very vaguely in the main text (on page 3). Strangely, there is some discussion of this crucial initial step in the discussion section, although the diffusion time of PLK1 is compared to the centrosome growth time there and not the more relevant enzyme-mediated conversion rate or enzyme deactivation rate.

      We now discuss the justification of neglecting diffusion while motivating the model. We have added a more detailed discussion in the Methods section. We estimate the timescale of diffusion for the scaffold formers and the enzyme and compare them with the turnover timescales of the respective proteins Spd-2, Cnn and Polo. We find the proteins to diffuse fast compared to their FRAP recovery timescales indicating reaction timescales to be slower than the timescales of diffusion. Nevertheless, following the reviewer’s suggestion, we have also investigated the effect of diffusion on the growth process in Appendix 4.

      * Page 3: The comparison k_0^+ ≫ k_1^+ is meaningless without specifying the number of subunits n. I even doubt that this condition is the correct one since even if k_0^+ is two orders of magnitude larger than k_1^+, the autocatalytic term can dominate if there are many subunits.

      We thank the reviewer for the insightful comment on the comparison between the growth rates k^+_0 and k^+_1. Indeed, the pool size matters and we have now included a linear stability analysis of the autocatalytic growth equations in Appendix 3 to estimate the condition for size inequality. We have commented on these new findings in the revised manuscript.

      * The Eqs. 2-4 are difficult to follow in my mind. For instance, it is not clear why the variables N_av and N_av^E are introduced when they evidently are equivalent to S_1 and E. It would also help to explicitly mention that V_c is the cell volume. Moreover, do these equations contain any centriolar activity? If so, I could not understand what term mediates this. If not, it might be good to mention this explicitly.

      Following the reviewer’s suggestion, we have modified the equations 2-4 and added the definition of V_c to enhance clarity in the revised manuscript. The centriole activity is given by k^+ in the catalytic model. We now explicitly mention it.

      * Page 4: The observed peak of active enzyme (Fig 3C) is compared to experimental observation of a PLK1 peak at centrosomes in Drosophila (ref. 28). However, if I understand correctly, the peak in the model refers to active enzyme in the entire cell (and the point of the model is that this enzymatic pool is shared everywhere), whereas the experimental measurement quantified the amount of PLK1 at the centrosome (and not the activity of the enzyme). How is the quantity in the model related to the experimental measurements?

      The reviewer is correct in pointing out the difference between the quantities calculated from our model and those measured in the experiment by Wong et al. We have clarified this point in the revised manuscript. We hypothesize that if, in future experiments, the active (phosphorylated) polo can be observed by using a possible FRET reporter of activity then the cytosolic pulse can be observed too. We discuss this point in the revised manuscript.

      * Page 6: The analysis of asymmetry due to differences in centriolar activity has apparently been done for both models (Eq. 1 and Eqs. 2-4), referring to a parameter k_0^+ in both cases. How does this parameter enter in the latter model? More generally, I don't really understand the difference in the two rows in Fig. 5 - is the top row referring to growth driven by centriolar activity while the lower row refers to pure autocatalytic growth? If so, what about the hybrid model where both mechanisms enter? This is particularly relevant, since ref. 8 claims that such a hybrid model explains growth curves of asymmetric centrosomes quantitatively. Along these lines, the analysis of asymmetric growth is quite vague and at most qualitative. Can the models also explain differential growth quantitatively?

      We believe the reviewer’s comment on centrosome size asymmetry may stem from a lack of clarity in our initial explanation. In this section, as shown in Figure 5, we compare the full autocatalytic model (where both k_0^+ and k_1^+ are non-zero) with the catalytic model. The confusion might have arisen due to an unclear definition of centriolar activity in the catalytic growth model, which we have clarified in the revised manuscript. Specifically, we use k^+ in the catalytic model and k_0^+ in the autocatalytic model as indicators of centriolar activity.

      Our findings quantitatively demonstrate that variations in centriole activity can robustly drive size asymmetry in catalytic growth, independent of initial size differences. However, in autocatalytic growth, increased initial size differences make the system more vulnerable to a loss of regulation, as positive feedback can amplify these differences, ultimately influencing the final size asymmetry. Our results do not contradict Zwicker et al. (ref 8); rather, they complement it. We show that size asymmetry in autocatalytic growth is governed by both centriole activity and positive feedback, highlighting that centriole activity alone cannot robustly regulate centrosome size asymmetry within this framework.

      * The code for performing the simulations does not seem to be available

      We have now made the main codes available in a GitHub repository. Link: https://github.com/BanerjeeLab/Centrosome_growth_model

    1. Author response:

      We thank the reviewers for their constructive comments. While we work on a revision that addresses all points raised, we would already like to point out that both reviewers seem to have misunderstood how we reported the percentages of filament types in our reactions. Because we included all picked images in our calculations (including false positives from the picking, as well as damaged, overlapping or otherwise unsuitable filaments), we may have inadvertently given the impression that these filament preparations are not pure. In fact, the opposite is true: 0N3R PAD12 tau and the mixture of 0N3R:0N4R PAD12 tau assemble into highly pure paired helical filaments with the Alzheimer fold. Discarding images is common practice for high-resolution cryo-EM structure determination. Our reported percentages of discarded images (20-30%) are much lower than in typical cryo-EM studies, which is another reflection of the high quality of these samples. The main impurity lies in smaller fractions (~10%) of single protofilaments with the Alzheimer fold. We will make this clearer in our revised manuscript.

    1. Author response:

      (1) discuss the non-native properties of ROCKET and compare CDL binding in native proteins

      ROCKET is indeed a non-native protein with exceptional stability, which makes it immune to mutations with subtle effects on structure or dynamics. We would argue that this is an advantage, allowing us to find the features with the most pronounced impact on CDL-mediated stability. The reviewers are right that there certainly are other structural features which impact CDL binding, which cannot be investigated using ROCKET. This is the reason we then apply our findings to GlpG - to translate back to native systems.

      The CDL binding site geometry that we tested experimentally was derived by Corey et al (Sci Adv 2022) from large-scale computational analysis of native protein structures. Our data adds some basic rules for flexibility, which helped us to identify GlpG as a potentially CDL-regulated protein. Following the reviewers’ suggestion, we will screen the dataset from Corey et al. for experimentally confirmed examples of CDL-mediated stabilization and analyze whether they conform to the rules derived from analysis of ROCKET. In this way, we may be able to assess how general our findings are.

      (2) clarify the limitations of combining MD and nMS

      The reviewers correctly point out that there are differences between the MD and MS data: although the binding Site 1 has nearly 100% occupancy in MD, MS shows that ca 50% of the protein is CDL-free and that not all subunits in the tetramer have a CDL bound. Furthermore, MD shows that aromatic residues are important, but this is not tested by MS. Both points relate to the shortcomings of nMS, which requires desolvation, ionization, and detergent stripping to detect protein-lipid complexes. These processes can potentially affect lipid binding, e.g. by leading to loss of lipids that are not tightly bound. As a result, absolute quantitative comparisons between MD and MS are challenging, and contributions from subtle non-electrostatic interactions involving aromatic residues are difficult to detect. For this reason, we use relative changes in lipid interactions between different ROCKET variants to compare MD and MS data. We will discuss these factors in the revision.

      (3) more detailed investigation of the structure-function relationship in GlpG-CDL complexes

      We use the insights from ROCKET to identify a stabilizing CDL site in GlpG and find that CDL binding switches substrate preference from transmembrane to soluble substrates. We do not verify the binding site with mutagenesis in our study, but the MD and MS data unambiguously indicate that there is only one site, and its location provides a rationale for how CDL affects substrate binding, which is described in the supplementary data.

      We agree that the regulatory effect of CDL on GlpG activity raises a wide range of interesting questions relating to the mechanism of allosteric inhibition, the evolutionary background, and biological implications of E. coli using changes in membrane CDL content to steer GlpG activity. Work in our labs is ongoing to investigate this further, including the mutational analysis suggested by the reviewers, but it is beyond the scope of the current study. We will discuss our rationale for the absence of mutagenesis data in the revision.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      We thank the reviewer for the time and effort in providing very useful comments and suggestions for our manuscript.

      (1) The results do not support the conclusions. The main "selling point" as summarized in the title is that the apoptotic rate of zebrafish motorneurons during development is strikingly low (~2% ) as compared to the much higher estimate (~50%) by previous studies in other systems. The results used to support the conclusion are that only a small percentage (under 2%) of apoptotic cells were found over a large population at a variety of stages 24-120hpf. This is fundamentally flawed logic, as a short-time window measure of percentage cannot represent the percentage in the long term. For example, at any year under 1% of the human population dies, but over 100 years >99% of the starting group will have died. To find the real percentage of motorneurons that died, the motorneurons born at different times must be tracked over the long term or the new motorneuron birth rate must be estimated. A similar argument can be applied to the macrophage results. Here the authors probably want to discuss well-established mechanisms of apoptotic neuron clearance such as by glia and microglia cells.

      We chose the 24-120 hpf time window for two reasons: 1) Previous studies showed that although the time windows of motor neuron death vary among chick (E5-E10), mouse (E11.5-E15.5), rat (E15-E18), and human (11-25 weeks of gestation), they share a common feature: each is the developmental period during which motor neurons make contact with muscle cells. Contact between zebrafish motor neurons and muscle cells occurs before 72 hpf, which falls within our observation window of 24-120 hpf. 2) Zebrafish complete hatching during 48-72 hpf, and most organs form before 72 hpf. More importantly, zebrafish start swimming around 72 hpf, indicating that motor neurons are fully functional at 72 hpf. Thus, we are confident that the 24-120 hpf window covers the period during which motor neurons undergo programmed cell death in early zebrafish development. We have added this information to the revised manuscript.

      We frequently used “early development” in this manuscript to describe our observations. However, we omitted “early” from our title. We have therefore added the keyword “early” to the title of the revised manuscript.

      Previous studies in zebrafish have shown that the production of spinal cord motor neurons largely ceases before 48 hpf, and then the motor neurons remain largely constant until adulthood (doi: 10.1016/j.celrep.2015.09.050; 10.1016/j.devcel.2013.04.012; 10.1007/BF00304606; 10.3389/fcell.2021.640414). Our observation time window covers the major motor neuron production process. Therefore, we believe that neurogenesis will not affect our findings and conclusions.

      We discussed the engulfment of dead motor neurons by other types of cells in the discussion section.

      (2) The transgenic line is perhaps the most meaningful contribution to the field as the work stands. However, the mnx1 promoter is well known for its non-specific activation - while the images suggest the authors' line is good, motor neuron markers should be used to validate the line. This is especially important for assessing this population later as mnx1 may be turned off in mature neurons.

      The mnx1 promoter has been widely used to label motor neurons in transgenic zebrafish. Previous studies have shown that most of the cells labeled in the mnx1 transgenic zebrafish are motor neurons. In this study, we observed that the neuronal cells in our sensor zebrafish formed green cell bodies inside of the spinal cord and extended to the muscle region, which is an important morphological feature of the motor neurons.

      Reviewer 2:

      We thank the reviewer for the time and effort in making very useful comments and suggestions for our manuscript.

      The FRET-based programmed cell death biosensor described in this manuscript could be very useful. However, the authors have not considered what is already known about the development and programmed cell death of zebrafish spinal motor neurons, and potential differences between motor neuron populations innervating different types of muscles in different vertebrate models. Without this context, the application of their new biosensor tool does not provide new insights into zebrafish motor neuron programmed cell death. In addition, the authors have not carried out controls to show the efficacy and specificity of their morpholinos. Nor have they described how they counted dying motor neurons, or why they chose the specific developmental time points they addressed. These issues are addressed more specifically below.

      (1) Lines 12-13: Previous studies in zebrafish showed death of identified spinal motor neurons.

      Line 103: In Figure 2A the cell body in the middle is that of identified motor neuron VaP. VaP death has previously been described in several publications. The cell body on the right of the same panel appears to belong to an interneuron whose axon can be seen extending off to the left in one of the rostrocaudal axon bundles that traverse the spinal cord. Higher-resolution imaging would clarify this.

      Lines 163-164: Is this the absolute number of motor neurons that died? How were the counts done? Were all the motor neurons in every segment counted? There are approximately 30 identifiable VaP motor neurons in each embryo and they have previously been reported to die between 24-36 hpf. So this analysis is likely capturing those cells.

      Our study examined overall motor neuron apoptosis rather than the death of a specific type of motor neuron, so we did not emphasize the death of VaP motor neurons. We agree that the dead motor neurons observed in our manuscript include VaP motor neurons. However, we also observed the death of other types of motor neurons, for the following reasons: 1) VaP primary motor neurons die before 36 hpf, but we found that motor neurons died after 36 hpf and even at 84 hpf (revised Figure 4A). 2) The VaP motor neuron is located alongside the CaP motor neuron, that is, in the caudal region of the motor neuron cluster. Although it is rare, we did observe the death of motor neurons in the rostral region of the motor neuron cluster (revised Figure 2C). 3) Each motor neuron cluster contains at most one VaP motor neuron. Although our data showed that usually one motor neuron died in each motor neuron cluster, we did sometimes observe that more than one motor neuron died in a cluster (revised Figure 2C). We included this information in the revised discussion.

      (2) Lines 82-83: It is published that mnx1 is expressed in at least one type of spinal interneuron derived from the same embryonic domain as motor neurons.

      The mnx1 promoter has been widely used to label motor neurons in transgenic zebrafish. Previous studies have shown that most of the cells labeled in the mnx1 transgenic zebrafish are motor neurons. In this study, we observed that the neuronal cells in our sensor zebrafish formed green cell bodies inside of the spinal cord and extended to the muscle region, which is an important morphological feature of the motor neurons.

      Furthermore, a few of those green cell bodies turned into blue apoptotic bodies inside the spinal cord and changed to blue axons in the muscle regions at the same time, which strongly suggests that those apoptotic neurons are not interneurons. Although the mnx1 promoter might have labeled some interneurons, this will not affect our major finding that only a small portion of motor neurons died during zebrafish early development.

      (3) Lines 161-162: Although this may be the major time window of neurogenesis, there are many more motor neurons in adults than in larvae. Neither of these references describes the increase in motor neuron numbers over this particular time span, so the rationale for this choice is unclear.

      Lines 168-171: It is known that later developing motor neurons are still being generated in the spinal cord at this time, suggesting that if there is a period of programmed cell death similar to that described in chick and mouse, it would likely occur later. In addition, most of the chick and mouse studies were performed on limb-innervating motor neurons, rather than the body wall muscle-innervating motor neurons examined here.

      Lines 237-238: Especially since new motor neurons are still being generated at this time.

      Previous studies have shown that the production of spinal cord motor neurons largely ceases before 48 hpf in zebrafish, and then the motor neurons remain largely constant until adulthood (doi: 10.1016/j.celrep.2015.09.050; 10.1016/j.devcel.2013.04.012; 10.1007/BF00304606; 10.3389/fcell.2021.640414). Our observation time window covers the major motor neuron production process. Therefore, we believe that neurogenesis will not affect our data and conclusions.

      Motor neuron death has been extensively studied in limb-innervating motor neurons in chicks and rodents, because these systems are amenable to operations such as amputation. However, previous studies have shown that this dramatic motor neuron death occurs not only in limb-innervating motor neurons but also in other spinal cord motor neurons (doi: 10.1006/dbio.1999.9413). In our manuscript, we studied naturally occurring motor neuron death in the whole spinal cord during the early stage of zebrafish development.

      (4) Lines 184-187: Previous publications showed that death of VaP is independent of limitations in muscle innervation area, suggesting it is not coupled to muscle-derived neurotrophic factors.

      Lines 328-334: There have been many publications describing appropriate morpholino controls. The authors need to describe their controls and show that they know that the genes they were targeting were downregulated.

      For the morpholinos, we did not confirm the downregulation of the target genes. These morpholino-related data are a minor part of our manuscript and do not affect our major findings. We have removed the neurotrophic factor and morpholino-related data from the revised manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study demonstrates the significant role of secretory leukocyte protease inhibitor (SLPI) in regulating B. burgdorferi-induced periarticular inflammation in mice. They found that SLPI-deficient mice showed significantly higher B. burgdorferi infection burden in ankle joints compared to wild-type controls. This increased infection was accompanied by infiltration of neutrophils and macrophages in periarticular tissues, suggesting SLPI's role in immune regulation. The authors strengthened their findings by demonstrating a direct interaction between SLPI and B. burgdorferi through BASEHIT library screening and FACS analysis. Further investigation of SLPI as a target could lead to valuable clinical applications.

      The conclusions of this paper are mostly well supported by data, but two aspects need attention:

      (1) Cytokine Analysis:

      The serum cytokine/chemokine profile analysis appears without TNF-alpha data. Given TNF-alpha's established role in inflammatory responses, comparing its levels between wild-type and infected B. burgdorferi conditions would provide valuable insight into the inflammatory mechanism.

      (2) Sample Size Concerns:

      While the authors note limitations in obtaining Lyme disease patient samples, the control group is notably smaller than the patient group. This imbalance should either be addressed by including additional healthy controls or explicitly justified in the methodology section.

      We thank the reviewer for the careful review and positive comments.

      (1) We did examine the level of TNF-alpha in both WT and SLPI-/- mice with and without B. burgdorferi infection. At the serum level, using ELISA, we did not observe any significant difference among the four groups. At the gene expression level, using RT-qPCR on the tibiotarsal tissue, we also did not observe any significant differences. Our RT-qPCR result is consistent with the previous microarray study using whole murine joint tissue (DOI: 10.4049/jimmunol.177.11.7930). The microarray study did not show significant changes in TNF-alpha level in C57BL/6 mice following B. burgdorferi infection. The above data suggest that TNF-alpha is not involved in SLPI-regulated immune responses in the murine tibiotarsal tissue following B. burgdorferi infection. A brief discussion will be added, and the above data will be provided as a supplemental figure in the revised manuscript.

      (2) We agree with the reviewer that the control group is smaller than the patient group. Among the archived samples that are available, the number of adult healthy controls is limited. It has been shown that the serum level of SLPI in healthy volunteers is on average about 40 ng/ml (DOI: 10.3389/fimmu.2019.00664 and 10.1097/00003246-200005000-00003). The median level in the healthy controls in our data was 38.92 ng/ml, which is comparable to the previous results. A brief discussion will be added in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Yu and coworkers investigates the potential role of Secretory leukocyte protease inhibitor (SLPI) in Lyme arthritis. They show that, after needle inoculation of the Lyme disease (LD) agent, B. burgdorferi, compared to wild type mice, a SLPI-deficient mouse suffers elevated bacterial burden, joint swelling and inflammation, pro-inflammatory cytokines in the joint, and levels of serum neutrophil elastase (NE). They suggest that SLPI levels of Lyme disease patients are diminished relative to healthy controls. Finally, they find that SLPI may interact directly with B. burgdorferi.

      Strengths:

      Many of these observations are interesting and the use of SLPI-deficient mice is useful (and has not previously been done).

      We appreciate the reviewer’s careful reading and positive comments.

      Weaknesses:

      (a) The known role of SLPI in dampening inflammation and inflammatory damage by inhibition of NE makes the enhanced inflammation in the joint of B. burgdorferi-infected mice a predicted result;

      We agree that the observation of the elevated NE level and the enhanced inflammation is theoretically likely. Indeed, that was the hypothesis that we explored, and often what is theoretically possible does not turn out to occur. In addition, despite the known contribution of neutrophils to the severity of murine Lyme arthritis, the importance of the neutrophil serine proteases and anti-protease has not been specifically studied, and neutrophils secrete many factors. Therefore, our data fill an important gap in the knowledge of murine Lyme arthritis development – and set the stage for the further exploration of this hypothesis in the genesis of human Lyme arthritis.

      (b) The potential contribution of the greater bacterial burden to the enhanced inflammation is not addressed;

      We agree with the reviewer’s viewpoint that the increased infection burden in the tibiotarsal tissue of the infected SLPI-/- mice could contribute to the enhanced inflammation. A brief discussion of this possibility will be added to the revised manuscript.

      (c) The relationship of SLPI binding by B. burgdorferi to the enhanced disease of SLPI-deficient mice is not clear; and

      We agree with the reviewer that we have not shown the importance of the SLPI-B. burgdorferi binding in the development of periarticular inflammation. It is an ongoing project in our lab to identify the SLPI binding partner in B. burgdorferi. Our hypothesis is that SLPI could bind and inhibit an unknown B. burgdorferi virulence factor that contributes to murine Lyme arthritis. We will include the above discussion in the revised manuscript.

      (d) Several methodological aspects of the study are unclear.

      We appreciate the critique and will describe the methods section in greater detail in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      The authors investigated the role of secretory leukocyte protease inhibitors (SLPI) in developing Lyme disease in mice infected with Borrelia burgdorferi. Using a combination of histological, gene expression, and flow cytometry analyses, they demonstrated significantly higher bacterial burden and elevated neutrophil and macrophage infiltration in SLPI-deficient mouse ankle joints. Furthermore, they also showed direct interaction of SLPI with B. burgdorferi, which likely depletes the local environment of SLPI and causes excessive protease activity. These results overall suggest ankle tissue inflammation in B. burgdorferi-infected mice is driven by unchecked protease activity.

      Strengths:

      Utilizing a comprehensive suite of techniques, this is the first study showing the importance of anti-protease-protease balance in the development of periarticular joint inflammation in Lyme disease.

      We greatly appreciate the reviewer’s careful reading and positive comments.

      Weaknesses:

      Due to the limited sample availability, the authors investigated the serum level of SLPI in both Lyme arthritis patients and patients with earlier disease manifestations.

      We agree with the reviewer that it would be ideal to have more samples from Lyme arthritis patients. However, among the available archived samples, samples from Lyme arthritis patients are limited. For the samples from patients with single EM, the symptoms persisted 3-4 months after diagnosis, the same timeframe in which arthritis develops. We will add the above discussion in the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 2, for histological scoring, do they have similar n numbers?

      In panel B, 20 infected WT mice and 19 infected SLPI-/- mice were examined. In panel D, 13 infected WT and SLPI-/- mice were examined. Without infection, WT and SLPI-/- mice do not develop spontaneous arthritis. Due to the slow breeding of the SLPI-/- mice, a small number of uninfected control animals were used.

      (2) In Figure 3, for macrophage population analysis, maybe consider implementing Ly6G-negative gating strategy to prevent neutrophil contamination in macrophage population?

      We appreciate the reviewer’s suggestion. We will analyze the data using the Ly6G-negative gating strategy and provide the result in a supplemental figure. We will compare the results using the two gating strategies in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) The investigators should address the possibility that much of the enhanced inflammatory features of infected SLPI-deficient mice are simply due to the higher bacterial load in the joint.

      We agree with the reviewer’s viewpoint that the increased infection burden in the tibiotarsal tissue of the infected SLPI-/- mice could contribute to the enhanced inflammation. A brief discussion of this possibility will be added to the revised manuscript.

      (2) Fig. 1. (A) There is no statistically significant difference in the bacterial load in the heart or skin, in contrast to the tibiotarsal joint. It would be of interest to know whether other tissues that are routinely sampled to assess the bacterial load, such as injection site, knee, and bladder, also harbored increased bacterial load in SLPI-deficient mice. (B) Heart and joint burden were measured at "21-28" days. The two time points should be analyzed separately rather than pooled.

      (A) We appreciate the reviewer’s suggestion. We agree that examining the infection load in other tissues would be helpful. However, studies of murine Lyme arthritis have predominantly focused on tibiotarsal tissue, which displays the most consistent and prominent swelling and is easy to observe and measure. Thus, we focused on the tibiotarsal joint in our study. (B) We collected the heart and joint tissue approximately 3 weeks post infection, within a 3-day window based on the feasibility and logistics of the laboratory. By “21-28 d”, we meant 21-24 days post infection. We apologize for the mislabeling and will correct it in the revised manuscript, stating approximately 3 weeks in the results and defining approximately 3 weeks as 21-24 days in the methods.

      (3) Fig. 2. (A) The same ambiguity as to the days post-infection as cited above in Point 2B exists in this figure. (B) Panel B: Caliper measurements to assess joint swelling should be utilized rather than visual scoring. (In addition, the legend should make clear that the black circles represent mock-infected mice.)

      (A) The histology scoring and histopathology examination were performed at the same time as heart and joint tissue collection, approximately 3 weeks post infection, within a 3-day window based on the feasibility and logistics of the laboratory. We apologize for the mislabeling and will correct it in the revised manuscript. (B) We appreciate the reviewer’s suggestion. However, our extensive experience is that caliper measurement can alter the assessment of swelling by placing pressure on the joints and does not produce consistent results. Double-blinded scoring was thus performed. Histopathology examination was performed by an independent pathologist; it confirmed the histology score and provided additional measurements.

      (4) Fig. 3. (A) See Point 2B. (B) For Panels C-E, uninfected controls are lacking.

      We apologize for this omission. Uninfected controls will be provided in the revised manuscript.

      (5) Fig. 4. Some LD subjects were sampled multiple times (5 samples from 3 subjects with Lyme arthritis; 13 samples from 4 subjects with EM), and samples from same individuals apparently are treated as biological replicates in the statistical analysis. In contrast, the 5 healthy controls were each sampled only once.

      We agree with the reviewer that the control group is smaller than the patient group. Among the archived samples that are available, the number of adult healthy controls is limited, and each was sampled once. We used these samples to establish the baseline level of SLPI in the serum. It has been shown that the serum level of SLPI in healthy volunteers is on average about 40 ng/ml (DOI: 10.3389/fimmu.2019.00664 and 10.1097/00003246-200005000-00003). The median level in the healthy controls in our data was 38.92 ng/ml, which is comparable to the previous results. A brief discussion will be added in the revised manuscript.

      (6) Fig. 5. (A) Panel A: does binding occur when intact bacteria are used? (B) Panels B, C: Were bacteria probed with PI to indicate binding likely to occur to surface? How many biological replicates were performed for each panel? Is "antibody control" a no SLPI control? What is the blue line?

      Actively growing B. burgdorferi were collected and used for binding assays. We did not permeabilize the bacteria for flow cytometry. Thus, all the binding detected occurs at the bacterial surface. Three biological replicates were performed for each panel. The antibody control is a no-SLPI control. For panel D, the bacteria were stained with Hoechst, which shows the morphology of the bacteria. We apologize for the missing information. A complete and detailed description of Figure 5 will be provided in the revised manuscript.

      (7) Sup Fig. 1. (A) Panel A: Was this experiment performed multiple times? I.e., how many biological replicates? (B) Panel B: Strain should be specified.

      The binding assay to B. burgdorferi B31A was performed two times. In panel B, B. burgdorferi B31A3 was used. We apologize for the missing information. A complete and detailed description will be provided in the revised manuscript. 

      (8) Fig. S2. It is not clear that the condition (20% serum) has any bactericidal activity, so the potential protective activity of SLPI cannot be determined. (Typical serum killing assays in the absence of specific antibody utilized 40% serum.)

      In Fig. S2, panel B, the first two bars (without SLPI, with 20% WT antiserum) showed around 40% viability. This indicates that the 20% WT antiserum has bactericidal activity. Serum was collected from B. burgdorferi-infected WT mice at 21 dpi and should therefore contain polyclonal antibodies against B. burgdorferi.

      Reviewer #3 (Recommendations for the authors):

      It was a pleasure to review! I congratulate the authors on this elegant study. I think the manuscript is very well-written and clearly conveys the research outcomes. I only have minor suggestions to improve the readability of the text.

      We greatly appreciate the reviewer’s recognition of our work.

      Line 92: Please briefly summarize the key results of the study at the end of the introduction section.

      We appreciate the reviewer’s suggestion. A brief summary will be added in the revised manuscript.

      Line 108: Why is the inflammation significantly occurred only in ankle joints of SLPI-I mice? Could you please provide a brief explanation?

      Inflammation may also occur in other joints of the B. burgdorferi-infected SLPI-/- mice, but this has not been studied. Studies of murine Lyme arthritis have predominantly been done in the tibiotarsal tissue, which displays the most prominent swelling and is easy to observe and measure. Thus, we focused on the tibiotarsal joint in our study.

      Line 136: Please also include the gene names in Figure 3.

      We apologize for the omission. Gene names will be included in the revised manuscript.

      Line 181: Please briefly introduce BASEHIT. Why did you use this tool? What are the benefits?

      We appreciate the reviewer’s suggestion. We will provide more background information on BASEHIT in the revised manuscript.

    1. Author response:

      We thank the three Reviewers for the extensive evaluation of our work, which was largely positive and constructive. Prompted by their reviews and the many suggestions, we plan to do additional control experiments and add further data to a revised manuscript in order to improve the statistics and quantitation. Furthermore, we plan to expand the discussion. We agree that a more comprehensive mechanistic framework would be welcome but note that the system is a complex, multicomponent one, which makes this challenging. We plan to expand the work in future follow-up research.

    1. Author response:

      eLife Assessment

      This important study reveals a role for IκBα in the regulation of embryonic stem cell pluripotency. The solid data in mouse embryonic stem cells include separation of function mutations in IκBα to dissect its non-canonical role as a chromatin regulator and its canonical function as NF-κB inhibitor. The conclusions could be strengthened by including better markers of differentiation status and additional controls or orthogonal approaches.

      We are thankful to the two reviewers and editors for their kind feedback and for highlighting the impact of NF-kB-independent IkBa function in stabilizing naïve pluripotency.

      To address the reviewers’ comments, we will perform further analysis of differentiation trajectories, as well as a deeper comparison of the epigenetic features in our IkBa-KO mESCs with the Serum/LIF and 2i/LIF conditions. Moreover, we recognize that some sentences need to be modified to soften our conclusions regarding the block in the naïve state and the global epigenetic effects, as the reviewers pointed out.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study probes the role of the NF-κB inhibitor IκBa in the regulation of pluripotency in mouse embryonic stem cells (mESCs). It follows from previous work that identified a chromatin-specific role for IκBa in the regulation of tissue stem cell differentiation. The work presented here shows that a fraction of IκBa specifically associates with chromatin in pluripotent stem cells. Using three Nfkbia-knockout lines, the authors show that IκBa ablation impairs the exit from pluripotency, with embryoid bodies (an in vitro model of mESC multi-lineage differentiation) still expressing high levels of pluripotency markers after sustained exposure to differentiation signals. The maintenance of aberrant pluripotency gene expression under differentiation conditions is accompanied by pluripotency-associated epigenetic profiles of DNA methylation and histone marks. Using elegant separation of function mutants identified in a separate study, the authors generate versions of IκBa that are either impaired in histone/chromatin binding or NF-κB binding. They show that the provision of the WT IκBa, or the NF-κB-binding mutant can rescue the changes in gene expression driven by loss of IκBa, but the chromatin-binding mutant cannot. Thus the study identifies a chromatin-specific, NF-κB-independent role of IκBa as a regulator of exit from pluripotency.

      Strengths:

      The strengths of the manuscript lie in: (a) the use of several orthogonal assays to support the conclusions on the effects of exit from pluripotency; (b) the use of three independent clonal Nfkbia-KO mESC lines (lacking IκBa), which increase confidence in the conclusions; and (c) the use of separation of function mutants to determine the relative contributions of the chromatin-associated and NF-κB-associated IκBa, which would otherwise be very difficult to unpick.

      Weaknesses:

      In this reviewer's view, the term "differentiation" is used inappropriately in this manuscript. The data showing aberrant expression of pluripotency markers during embryoid body formation are supported by several lines of evidence and are convincing. However, the authors call the phenotype of Nfkbia-KO cells a "differentiation impairment" while the data on differentiation markers are not shown (beyond the fact that H3K4me1, marking poised enhancers, is reduced in genes underlying GO processes associated with differentiation and organ development). Data on differentiation marker expression from the transcriptomic and embryoid body immunofluorescent experiments, for example, should be at hand without the need to conduct many more experiments and would help to support the conclusions of the study or make them more specific. The lack of probing the differentiation versus pluripotency genes may be a missed opportunity in gaining in-depth understanding of the phenotype associated with loss of the chromatin-associated function of IκBa.

      Specific answer to weaknesses for Reviewer 1:

      We have data showing the lack of expression of specific differentiation markers that we will add to the manuscript. Moreover, we will also globally analyse differentiation markers in our transcriptomic data to have a more accurate description of the phenotype.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the role of IκBα in regulating mouse embryonic stem cell (ESC) pluripotency and differentiation. The authors demonstrate that IκBα knockout impairs the exit from the naïve pluripotent state during embryoid body differentiation. Through mechanistic studies using various mutants, they show that IκBα regulates ESC differentiation through chromatin-related functions, independent of the canonical NF-κB pathway.

      Strengths:

      The authors nicely investigate the role of IκBα in pluripotency exit, using embryoid body formation and complementing the phenotypic analysis with a number of genome-wide approaches, including transcriptomic, histone marks deposition, and DNA methylation analyses. Moreover, they generate a first-of-its-kind mutant set that allows them to uncouple IκBα's function in chromatin regulation versus its NF-κB-related functions. This work contributes to our understanding of cellular plasticity and development, potentially interesting a broad audience including developmental biologists, chromatin biology researchers, and cell signaling experts.

      Weaknesses:

      - The study's main limitation is the lack of crucial controls using bona fide naïve cells across key experiments, including DNA methylation analysis, gene expression profiling in embryoid bodies, and histone mark deposition. This omission makes it difficult to evaluate whether the observed changes in IκBα-KO cells truly reflect naïve pluripotency characteristics.

      - Several conclusions in the manuscript require a more measured interpretation. The authors should revise their statements regarding the strength of the pluripotency exit block, the extent of hypomethylation, and the global nature of chromatin changes.

      - From a methodological perspective, the manuscript would benefit from additional orthogonal approaches to strengthen the knockout findings, which may be influenced by clonal expansion of ES cells.

      Overall, this study makes an important contribution to the field. However, the concerns raised regarding controls, data interpretation, and methodology should be addressed to strengthen the manuscript and support the authors' conclusions.

      Specific answer to weaknesses for Reviewer 2:

      - As the reviewer pointed out, we have not performed all the analyses in comparison with cells in 2i/LIF, since our initial study focused on Serum/LIF and differentiation. However, it was the transcriptome analysis in Serum/LIF that showed, by GSEA, that KO cells resembled naïve ES cells in 2i/LIF. We have repeated key experiments with all conditions (Figure 1B, 1D, Figure 3C and 3), but we do not think that repeating all ‘omics’ experiments under 2i/LIF conditions will add important information. Nevertheless, we will analyze chromatin data (DNA methylation and different histone post-translational modifications) from previously published work in 2i/LIF and Serum/LIF and compare them with our IκBα-WT and IκBα-KO mESCs to better confirm the stabilization of ground-state pluripotency in IκBα-KO mESCs under Serum/LIF conditions.

      - We agree to tone down our statements on the strength of the pluripotency exit block, the extent of hypomethylation, and the global nature of chromatin changes. There are many changes in the chromatin that we are trying to better characterize by Hi-C in ongoing studies that are beyond the scope of this manuscript.

      - We have performed studies in 3 different IκBα-KO and IκBα-WT clones. In addition, the reconstitution studies with IκBα separation-of-function (SOF) mutants, which show differential effects after expressing the NF-κB-binding form (IkBaDChrom) or the chromatin-binding form (IkBaDNFkB), also support the robustness of this phenotype.

    1. Author response:

      We thank the three reviewers for their insightful feedback. We look forward to addressing the raised concerns in a revised version of the manuscript. There were a few common themes among the reviews that we will briefly touch upon now, and we will provide more details in the revised manuscript. 

      First, the reviewers asked for the reasoning behind the task ratios we implemented for the different attentional width conditions. The different ratios were selected to be as similar as possible given the size and spacing of our stimuli (aside from the narrowest cue width of one bin, the ratios for the others were 0.67, 0.60, and 0.67). As Figure 1b shows, task accuracy showed small and non-monotonic changes across the three larger cue widths, dissociable from the monotonic pattern seen for the model-estimated width of the attentional field. Furthermore, prior work has indicated that there is a relationship between task difficulty and the overall magnitude of the BOLD response; however, we do not expect this to influence the width of the modulation. How task difficulty influences the BOLD response is an important topic, and we hope that future work will investigate this relationship more directly.

      Second, reviewers expressed interest in the distribution of spatial attention in higher visual areas. In our study we focus only on early visual regions (V1-V3). This was primarily driven by pragmatic considerations, in that we only have retinotopic estimates for our participants in these early visual areas. Our modeling approach is dependent on having access to the population receptive field estimates for all voxels, and while the main experiment was scanned using whole brain coverage, retinotopy was measured in a separate session using a field of view only covering the occipital cortex.  

      Lastly, we appreciate the opportunity to clarify the purpose of the temporal interval analysis. The reviewer is correct in assuming we set out to test how much data is needed to recover the cortical modulation and how dynamic a signal the method can capture. This analysis does show that more data provided more reliable estimates. The more important finding, however, is that the model was still able to recover the location and width of the attentional cue at shorter timescales of as few as two TRs. This has implications for the potential applicability of our approach to paradigms that involve more dynamic adaptation of the attentional field.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This is a comprehensive study that sheds light on how Wag31 functions and localises in mycobacterial cells. A clear link to interactions with CL is shown using microscopy in combination with fluorescent fusion constructs and lipid-specific dyes. Furthermore, studies using mutant versions of Wag31 shed light on the functionalities of each domain in the protein. My concerns/suggestions for the manuscript are minor:

      (1) Ln 130. A better clarification/discussion is required here. It is clear that both depletion and overexpression have an effect on levels of various lipids, but subsequent descriptions show that they affect different classes of lipids.

      We thank the reviewer for the comments. We will improve Ln130 in the manuscript. The lipid classes that are impacted by the depletion of Wag31 differ from those impacted by its overexpression. Wag31 is an adaptor protein that interacts with proteins of the ACCase complex (Meniche et al., 2014; Xu et al., 2014), which synthesize fatty acid precursors, and regulates their activity (Habibi Arejan et al., 2022).

      The varied response to lipid homeostasis could be attributed to a change in the stoichiometry of these interactions with Wag31. While Wag31 depletion would prevent such interactions from occurring and might affect lipid synthesis that directly depends on Wag31-protein partner interactions, its overexpression would lead to promiscuous interactions and a change in the stoichiometry of native interactions, ultimately modulating lipid synthesis pathways.

      (2) The pulldown assays results are interesting, but links are tentative.

      The interactome of Wag31 was identified through the immunoprecipitation of Flag-tagged Wag31 complemented at an integrative locus in a Wag31 mutant background to avoid overexpression artifacts. We used Msm::gfp expressing an integrative copy (at the L5 locus) of FLAG-GFP as a control to subtract non-specific interactions. The experiment was performed in biological triplicates, and interactors that appeared in all replicates were selected for further analysis. Although we identified more than 100 interactors of Wag31, we analyzed only the top 25 hits, with a PSM cut-off ≥18 and unique peptides ≥5. Additionally, two of Wag31's established interactors, AccD5 and Rne, were among the top five hits, thus validating our data.
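
      The selection criteria described above can be sketched in a few lines of Python. This is only an illustration of the stated filters (present in all three replicates, PSM ≥18, unique peptides ≥5, top 25 hits); the `Hit` record, the `in_control` flag for proteins also recovered with the FLAG-GFP control, the ranking by PSM count, and the placeholder protein names and numbers are all assumptions and do not come from the manuscript.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One candidate interactor (hypothetical record layout)."""
    protein: str
    psm: int                   # peptide-spectrum matches
    unique_peptides: int
    replicates_detected: int   # number of biological replicates containing the hit
    in_control: bool           # also recovered with the FLAG-GFP control


def select_interactors(hits, n_replicates=3, min_psm=18, min_unique=5, top_n=25):
    """Apply the thresholds quoted in the response and return the top hits.

    Ranking by PSM count is an assumption made for this sketch.
    """
    kept = [
        h for h in hits
        if not h.in_control
        and h.replicates_detected == n_replicates
        and h.psm >= min_psm
        and h.unique_peptides >= min_unique
    ]
    kept.sort(key=lambda h: h.psm, reverse=True)
    return [h.protein for h in kept[:top_n]]


# Invented toy values, used only to exercise each filter once.
example_hits = [
    Hit("protein_A", 40, 12, 3, False),  # passes every filter
    Hit("protein_B", 17, 9, 3, False),   # fails the PSM cut-off
    Hit("protein_C", 30, 4, 3, False),   # fails the unique-peptide cut-off
    Hit("protein_D", 25, 6, 2, False),   # missing from one replicate
    Hit("protein_E", 22, 7, 3, True),    # also found in the control
    Hit("protein_F", 18, 5, 3, False),   # passes exactly at both thresholds
]
selected = select_interactors(example_hits)
```

      Keeping the thresholds as function arguments makes it easy to check how sensitive the resulting list is to the chosen cut-offs.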

      Though we agree that the interactions can either be direct or through a third partner, the fact that we obtained known interactors of Wag31 makes us believe these interactions are genuine. Moreover, we performed pulldown experiments for validation by mixing E. coli lysates expressing His-Wag31 full-length or truncated protein with M. smegmatis lysates expressing FLAG-tagged interacting proteins. The wash conditions used were quite stringent for these pull-down assays: the wash buffer contained 1% Triton X-100, eliminating all non-specific and indirect interactions. However, we agree that we cannot conclusively state that the interactions are direct without purifying the proteins and performing the experiment. We will describe this caveat in the revised manuscript.

      (3) The authors may perhaps like to rephrase claims of effects on lipid homeostasis, as my understanding is that lipid localisation rather than catabolism/breakdown is affected.

      In this manuscript, we are trying to convey that Wag31 is a spatiotemporal regulator of lipid metabolism. It is a peripheral protein that is hooked to the membrane via Cardiolipin and forms a scaffold at the poles, which helps localize several enzymes involved in lipid metabolism.

      Homeostasis is the process by which an organism maintains a steady state of balance and stability in response to changes. Depletion of Wag31 not only results in delocalisation of lipids in intracellular lipid inclusions but also leads to changes in the levels of various lipid classes. Advances in the field of spatial biology underscore the importance of the native localization of various biological molecules for maintaining the steady state of the cell. Hence, we have used the word “homeostasis” to describe both of the changes observed in lipid metabolism.

      Reviewer #2 (Public review):

      Summary

      Kapoor et al. investigated the role of the mycobacterial protein Wag31 in lipid and peptidoglycan synthesis and sought to delineate the role of the N- and C-terminal domains of Wag31. They demonstrated that modulating Wag31 levels influences lipid homeostasis in M. smegmatis and cardiolipin (CL) localisation in cells. Wag31 was found to preferentially bind CL-containing liposomes, and deleting the N-terminus of the protein significantly decreased this interaction. Novel interactions between Wag31 and proteins involved in lipid metabolism and cell wall synthesis were identified, suggesting that Wag31 recruits proteins to the intracellular membrane domain by direct interaction.

      Strengths:

      (1) The importance of Wag31 in maintaining lipid homeostasis is supported by several lines of evidence.

      (2) The interaction between Wag31 and cardiolipin, and the role of the N-terminus in this interaction was convincingly demonstrated.

      Weaknesses:

      (1) MS experiments provide some evidence for novel protein-protein interactions. However, the pull-down experiments lack a valid negative control.

      We thank the reviewer for the comments. We will include a valid negative control in the experiment. We would choose ~2 mycobacterial proteins that are not a part of our interactome study and perform a similar pull-down experiment with them and a positive control (known interactor of Wag31).

      (2) The role of the N-terminus in the protein-protein interaction has not been ruled out.

      Previously, we attempted to express the N-terminal (1-60 aa) and the C-terminal (60-212 aa) proteins in various mycobacterial shuttle vectors to perform MS/MS experiments. Despite numerous efforts, neither was expressed with the N/C-terminal FLAG tag nor without any tag in episomal or integrative vectors due to the instability of the protein. Eventually, we successfully expressed the C-terminal Wag31 with an N and C-terminal hexa-His tag. However, this expression was not sufficient or stable enough for us to perform Ni affinity pull-down experiments for mass spectrometry.  The N-terminal of Wag31 could not be expressed in M. smegmatis even with N and C-terminal Hexa-His tags.

      To rule out the role of the N-terminal in mediating protein-protein interactions, we plan to attempt to express the N-terminal of Wag31 with N- and C-terminal hexa-His tags in E. coli. If this clone successfully expresses in E. coli, we will perform pull-down experiments as described in Figure 7.

      Reviewer #3 (Public review):

      Summary:

      This manuscript describes the characterization of mycobacterial cytoskeleton protein Wag31, examining its role in orchestrating protein-lipid and protein-protein interactions essential for mycobacterial survival. The most significant finding is that Wag31, which directs polar elongation and maintains the intracellular membrane domain, was revealed to have membrane tethering capabilities.

      Strengths:

      The authors provided a detailed analysis of Wag31 domain architecture, revealing distinct functional roles: the N-terminal domain facilitates lipid binding and membrane tethering, while the C-terminal domain mediates protein-protein interactions. Overall, this study offers a robust and new understanding of Wag31 function.

      Weaknesses:

      The following major concerns should be addressed.

      • Authors use 10-N-Nonyl-acridine orange (NAO) as a marker for cardiolipin localization. However, given that NAO is known to bind to various anionic phospholipids, how do the authors know that what they are seeing is specifically visualizing cardiolipin and not a different anionic phospholipid? For example, phosphatidylinositol is another abundant anionic phospholipid in mycobacterial plasma membrane.

      We thank the reviewer for the comments. Despite its promiscuous binding to other anionic phospholipids, 10-N-Nonyl-acridine orange is widely used to stain Cardiolipin and determine its localisation in bacterial cells and mitochondria of eukaryotes (Garcia Fernandez et al., 2004; Mileykovskaya & Dowhan, 2000; Renner & Weibel, 2011). This is because it has a stronger affinity for Cardiolipin than for other anionic phospholipids, with an affinity constant of 2 × 10⁶ M⁻¹ for Cardiolipin and 7 × 10⁴ M⁻¹ for phosphatidylserine and phosphatidylinositol (Petit et al., 1992). Additionally, no other stain is yet available for detecting Cardiolipin. Our protein-lipid binding assays suggest that Wag31 preferentially binds to Cardiolipin over other anionic phospholipids (Fig. 4b); hence, it is likely that most of the NAO fluorescence redistribution we observe reflects Cardiolipin mislocalization due to altered Wag31 levels, with a smaller contribution coming indirectly from other anionic phospholipids displaced from the membrane by the loss of membrane integrity and the cell shape changes caused by altered Wag31 levels.
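
      For orientation, the two affinity constants quoted from Petit et al. (1992) can be expressed as a simple ratio. The symbols below are introduced only for this illustration, and the calculation assumes that the two constants are directly comparable.

```latex
\[
\frac{K_{\mathrm{CL}}}{K_{\mathrm{PS/PI}}}
  = \frac{2 \times 10^{6}\ \mathrm{M^{-1}}}{7 \times 10^{4}\ \mathrm{M^{-1}}}
  = \frac{200}{7}
  \approx 29
\]
```

      A roughly 29-fold difference in affinity favours Cardiolipin, but the share of the NAO signal that comes from each lipid would also depend on their relative abundance in the membrane, which this ratio does not capture.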

      • Authors' data show that the N-terminal region of Wag31 is important for membrane tethering. The authors' data also show that the N-terminal region is important for sustaining mycobacterial morphology. However, the authors' statement in Line 256 "These results highlight the importance of tethering for sustaining mycobacterial morphology and survival" requires additional proof. It remains possible that the N-terminal region has another unknown activity, and this yet-unknown activity rather than the membrane tethering activity drives the morphological maintenance. Similarly, the N-terminal region is important for lipid homeostasis, but the statement in Line 270, "the maintenance of lipid homeostasis by Wag31 is a consequence of its tethering activity" requires additional proof. The authors should tone down these overstatements or provide additional data to support their claims.

      We agree with the reviewer that there exists a possibility for another function of the N-terminal that may contribute to sustaining mycobacterial physiology and survival. We would revise our statements in the paper to accurately reflect the data. Results shown suggest that the tethering activity of the N-terminal region may contribute to mycobacterial morphology and survival. However, additional functions of this region can’t be ruled out. Similarly, the maintenance of lipid homeostasis by Wag31 may be associated with its tethering activity, although other mechanisms could also contribute to this process. 

      • Authors suggest that Wag31 acts as a scaffold for the IMD (Fig. 8). However, Meniche et al. have shown that MurG as well as GlfT2, two well-characterized IMD proteins, do not colocalize with Wag31 (DivIVA) (https://doi.org/10.1073/pnas.1402158111). IMD proteins are always slightly subpolar while Wag31 is located at the tip of the cell. Therefore, the authors' biochemical data cannot be easily reconciled with microscopic observations in the literature. This raises a question regarding the validity of the protein-protein interactions shown in Figure 7. Since this pull-down assay was conducted by mixing E. coli lysate expressing Wag31 and Msm lysate expressing Wag31 interactors like MurG, it is possible that the interactions are not direct. Authors should interpret their data more cautiously. If authors cannot provide additional data and sufficient justifications, they should avoid proposing a confusing model like Figure 8 that contradicts published observations.

      In the literature, MurG and GlfT2 have been shown to have polar localization (Freeman et al., 2023; Hayashi et al., 2016; Kado et al., 2023), and two groups have shown slightly sub-polar localization of MurG (García-Heredia et al., 2021; Meniche et al., 2014). Additionally, Freeman et al. (2023) showed SepIVA to be a spatio-temporal regulator of MurG. MS/MS analysis of Wag31 immunoprecipitation data yielded both MurG and SepIVA as interactors of Wag31 (Fig. 3). Given that Wag31 also displays polar localisation, it likely associates with the polar MurG. However, since a sub-polar localization of MurG has also been reported, it is possible that they do not interact directly, and another protein mediates their interaction. We will modify the model proposed in Fig. 8 based on the above.

      We agree that, to validate the interactions, we performed pulldown experiments by mixing E. coli lysates expressing His-Wag31 full-length or truncated protein with M. smegmatis lysates expressing FLAG-tagged interacting proteins. The wash conditions used were quite stringent for these pull-down assays: the wash buffer contained 1% Triton X-100, which eliminates all non-specific and indirect interactions. However, we agree that we cannot conclusively state that the interactions are direct without purifying the proteins and performing the experiment. We will describe this caveat in the revised manuscript and propose a model reflecting our results.

      References:

      Freeman, A. H., Tembiwa, K., Brenner, J. R., Chase, M. R., Fortune, S. M., Morita, Y. S., & Boutte, C. C. (2023). Arginine methylation sites on SepIVA help balance elongation and septation in Mycobacterium smegmatis. Mol Microbiol, 119(2), 208-223. https://doi.org/10.1111/mmi.15006

      Garcia Fernandez, M. I., Ceccarelli, D., & Muscatello, U. (2004). Use of the fluorescent dye 10-N-nonyl acridine orange in quantitative and location assays of cardiolipin: a study on different experimental models. Anal Biochem, 328(2), 174-180. https://doi.org/10.1016/j.ab.2004.01.020

      García-Heredia, A., Kado, T., Sein, C. E., Puffal, J., Osman, S. H., Judd, J., Gray, T. A., Morita, Y. S., & Siegrist, M. S. (2021). Membrane-partitioned cell wall synthesis in mycobacteria. eLife, 10. https://doi.org/10.7554/eLife.60263

      Habibi Arejan, N., Ensinck, D., Diacovich, L., Patel, P. B., Quintanilla, S. Y., Emami Saleh, A., Gramajo, H., & Boutte, C. C. (2022). Polar protein Wag31 both activates and inhibits cell wall metabolism at the poles and septum. Front Microbiol, 13, 1085918. https://doi.org/10.3389/fmicb.2022.1085918

      Hayashi, J. M., Luo, C. Y., Mayfield, J. A., Hsu, T., Fukuda, T., Walfield, A. L., Giffen, S. R., Leszyk, J. D., Baer, C. E., Bennion, O. T., Madduri, A., Shaffer, S. A., Aldridge, B. B., Sassetti, C. M., Sandler, S. J., Kinoshita, T., Moody, D. B., & Morita, Y. S. (2016). Spatially distinct and metabolically active membrane domain in mycobacteria. Proc Natl Acad Sci U S A, 113(19), 5400-5405. https://doi.org/10.1073/pnas.1525165113

      Kado, T., Akbary, Z., Motooka, D., Sparks, I. L., Melzer, E. S., Nakamura, S., Rojas, E. R., Morita, Y. S., & Siegrist, M. S. (2023). A cell wall synthase accelerates plasma membrane partitioning in mycobacteria. eLife, 12, e81924. https://doi.org/10.7554/eLife.81924

      Meniche, X., Otten, R., Siegrist, M. S., Baer, C. E., Murphy, K. C., Bertozzi, C. R., & Sassetti, C. M. (2014). Subpolar addition of new cell wall is directed by DivIVA in mycobacteria. Proc Natl Acad Sci U S A, 111(31), E3243-3251. https://doi.org/10.1073/pnas.1402158111

      Mileykovskaya, E., & Dowhan, W. (2000). Visualization of phospholipid domains in Escherichia coli by using the cardiolipin-specific fluorescent dye 10-N-nonyl acridine orange. J Bacteriol, 182(4), 1172-1175. https://doi.org/10.1128/JB.182.4.1172-1175.2000

      Petit, J. M., Maftah, A., Ratinaud, M. H., & Julien, R. (1992). 10N-nonyl acridine orange interacts with cardiolipin and allows the quantification of this phospholipid in isolated mitochondria. Eur J Biochem, 209(1), 267-273. https://doi.org/10.1111/j.1432-1033.1992.tb17285.x

      Renner, L. D., & Weibel, D. B. (2011). Cardiolipin microdomains localize to negatively curved regions of Escherichia coli membranes. Proc Natl Acad Sci U S A, 108(15), 6264-6269. https://doi.org/10.1073/pnas.1015757108

      Xu, W. X., Zhang, L., Mai, J. T., Peng, R. C., Yang, E. Z., Peng, C., & Wang, H. H. (2014). The Wag31 protein interacts with AccA3 and coordinates cell wall lipid permeability and lipophilic drug resistance in Mycobacterium smegmatis. Biochem Biophys Res Commun, 448(3), 255-260. https://doi.org/10.1016/j.bbrc.2014.04.116

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations for the authors):

      I am generally satisfied with the authors' revisions and response to my previous comments. I have amended my previous review.

      We thank Reviewer #1 for his valuable comments and suggestions, which improved this manuscript.

      Thank you for considering the comments in your revised version. I still feel a strong mismatch between the claims of optimal foraging behaviour and the results with little compelling evidence.

      On terminology: MTR means Migration Traffic Rates. The authors responded that in their study, MTR is defined as Movement traffic rates. I have two problems with this definition: i) it creates confusion in the literature on the definition of MTR, ii) a traffic inherently describes a movement, and this pleonasm is not necessary.

      We revised the acronyms in this article, replacing MTR with MoTR to clearly distinguish between Migration Traffic Rate (MTR) and Movement Traffic Rate (MoTR).

      Minimal size of insects: Please detail radar settings (power sent, STC; detection thresholds). These parameters define the minimal size of the detected animals.

      We added the following paragraph to provide additional information regarding the radar's detection capabilities:

      " with decreasing detection probability at increasing altitudes. The detection threshold, defined by the STC setting, was 93 dBm, and the transmit power was 25 kW."

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study asks whether the phenomenon of crossmodal temporal recalibration, i.e. the adjustment of time perception by consistent temporal mismatches across the senses, can be explained by the concept of multisensory causal inference. In particular, they ask whether causal inference explains temporal recalibration better than a model assuming that crossmodal stimuli are always integrated, regardless of how discrepant they are.

      The study is motivated by previous work in the spatial domain, where it has been shown consistently across studies that the use of crossmodal spatial information is explained by the concept of multisensory causal inference. It is also motivated by the observation that the behavioral data showcasing temporal recalibration feature nonlinearities that, by their nature, cannot be explained by a fixed integration model (sometimes also called mandatory fusion).

      To probe this the authors implemented a sophisticated experiment that probed temporal recalibration in several sessions. They then fit the data using the two classes of candidate models and rely on model criteria to provide evidence for their conclusion. The study is sophisticated, conceptually and technically state-of-the-art, and theoretically grounded. The data clearly support the authors’ conclusions.

      I find the conceptual advance somewhat limited. First, by design, the fixed integration model cannot explain data with a nonlinear dependency on multisensory discrepancy, as already explained in many studies on spatial multisensory perception. Hence, it is not surprising that the causal inference model better fits the data.

      We have addressed this comment by including an asynchrony-contingent model, which is capable of predicting the nonlinearity of recalibration effects by employing a heuristic approximation of the causal-inference process (Fig. 3). We also updated the previous competitor model with a more reasonable asynchrony-correction model as the baseline of model comparison, which assumes recalibration aims to restore synchrony whenever the sensory measurement of SOA indicates an asynchrony. The causal-inference model outperformed both models, as indicated by model evidence (Fig. 4A). Furthermore, model predictions show that the causal-inference model more accurately captures recalibration at large SOAs at both the group (Fig. 4B) and the individual levels (Fig. S4).

      Second, and again similar to studies on spatial paradigms, the causal inference model fails to predict the behavioral data for large discrepancies. The model predictions in Figure 5 show the (expected) vanishing recalibration for large delta, while the behavioral data don’t decay to zero. Either the range of tested SOAs is too small to show that both the model and data converge to the same vanishing effect at large SOAs, or the model's formula is not the best for explaining the data. Again, the studies using spatial paradigms have the same problem, but in my view, this poses the most interesting question here.

      We included an additional simulation (Fig. 5B) to show that the causal-inference model can predict non-zero recalibration for long adapter SOAs, especially in observers with a high common-cause prior and low sensory precision. This ability to predict a non-zero recalibration effect even at large SOA, such as 0.7 s, is one key feature of the causal-inference model that distinguishes it from the asynchrony-contingent model.
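
      The intuition behind this response can be illustrated with a generic, textbook-style causal-inference calculation. The sketch below is not the authors' model: it assumes zero-mean Gaussian distributions of the true SOA under a common cause (narrow) and under separate causes (broad), Gaussian measurement noise, and hypothetical parameter values throughout. It only shows that the posterior probability of a common cause at a 0.7 s measured SOA stays well above zero when the common-cause prior is high and sensory precision is low.

```python
import math


def gaussian_pdf(x, variance):
    """Density of a zero-mean Gaussian with the given variance, evaluated at x."""
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def posterior_common_cause(measured_soa, prior_common, sigma_measurement,
                           spread_common=0.05, spread_separate=0.5):
    """Posterior probability that sound and flash share a cause (all units: seconds).

    spread_common and spread_separate are hypothetical standard deviations of
    the true SOA under each causal scenario; sigma_measurement is the standard
    deviation of the sensory noise added to the measurement.
    """
    var_common = spread_common ** 2 + sigma_measurement ** 2
    var_separate = spread_separate ** 2 + sigma_measurement ** 2
    weighted_common = prior_common * gaussian_pdf(measured_soa, var_common)
    weighted_separate = (1.0 - prior_common) * gaussian_pdf(measured_soa, var_separate)
    return weighted_common / (weighted_common + weighted_separate)


# Same 0.7 s measured SOA, three hypothetical observers.
precise_observer = posterior_common_cause(0.7, prior_common=0.5, sigma_measurement=0.05)
noisy_observer = posterior_common_cause(0.7, prior_common=0.5, sigma_measurement=0.3)
noisy_high_prior = posterior_common_cause(0.7, prior_common=0.9, sigma_measurement=0.3)
```

      In any model where the size of the recalibration update scales with this posterior, the precise observer would show essentially no recalibration at 0.7 s, whereas the noisy, high-prior observer would still recalibrate.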

      In my view there is nothing generally wrong with the study, it does extend the 'known' to another type of paradigm. However, it covers little new ground on the conceptual side.

      On that note, the small sample size of n=10 is likely not an issue, but still, it is on the very low end for this type of study.

      This study used a within-subject design, which included 3 phases each repeated in 9 sessions, totaling 13.5 hours per participant. This extensive data collection allows us to better constrain the model for each participant. Our conclusions are based on the different models’ ability to fit individual data.

      Reviewer #2 (Public Review):

      Summary:

      Li et al.’s goal is to understand the mechanisms of audiovisual temporal recalibration. This is an interesting challenge that the brain readily solves in order to compensate for real-world latency differences in the time of arrival of audio/visual signals. To do this they perform a 3-phase recalibration experiment on 9 observers that involves a temporal order judgment (TOJ) pretest and posttest (in which observers are required to judge whether an auditory and visual stimulus were coincident, auditory leading or visual leading) and a conditioning phase in which participants are exposed to a sequence of AV stimuli with a particular temporal disparity. Participants are required to monitor both streams of information for infrequent oddballs, before being tested again in the TOJ, although this time there are 3 conditioning trials for every 1 TOJ trial. Like many previous studies, they demonstrate that conditioning stimuli shift the point of subjective simultaneity (pss) in the direction of the exposure sequence.

      These shifts are modest - maxing out at around -50 ms for auditory leading sequences and slightly less than that for visual leading sequences. Similar effects are observed even for the longest offsets where it seems unlikely listeners would perceive the stimuli as synchronous (and therefore under a causal inference model you might intuitively expect no recalibration, and indeed simulations in Figure 5 seem to predict exactly that which isn't what most of their human observers did). Overall I think their data contribute evidence that a causal inference step is likely included within the process of recalibration.

      Strengths:

      The manuscript performs comprehensive testing over 9 days and 100s of trials and accompanies this with mathematical models to explain the data. The paper is reasonably clearly written and the data appear to support the conclusions.

      Weaknesses:

      While I believe the data contribute evidence that a causal inference step is likely included within the process of recalibration, this to my mind is not a mechanism but might be seen more as a logical checkpoint to determine whether whatever underlying neuronal mechanism actually instantiates the recalibration should be triggered.

      We have addressed this comment by replacing the fixed-update model with an asynchrony-correction model, which assumes that the system first evaluates whether the measurement of SOA is asynchronous, thus indicating a need for recalibration (Fig. 3). If it does, it shifts the audiovisual bias by a proportion of the measured SOA. We additionally included an asynchrony-contingent model, which is capable of replicating the nonlinearity of recalibration effects by a heuristic approximation of the causal-inference process.

      Model comparisons indicate that the causal-inference model of temporal recalibration outperforms both alternative models (Fig. 4A). Furthermore, the model predictions demonstrate that the causal-inference model more accurately captures recalibration at large SOAs at both the group level (Fig. 4B) and individual level (Fig. S4).
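
      The update rule of the asynchrony-correction model, as described in words above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sign convention (measured SOA = physical SOA + bias + noise), the fixed criterion used to decide that a measurement is asynchronous, and all parameter values are hypothetical.

```python
import random


def asynchrony_correction(adapter_soa, n_trials, learning_rate=0.01,
                          criterion=0.05, sigma=0.06, seed=0):
    """Return the audiovisual bias (in seconds) after n_trials of exposure.

    On each trial the observer takes a noisy SOA measurement. If its magnitude
    exceeds the criterion, the measurement counts as asynchronous and the bias
    is shifted by a fixed proportion of the measured SOA, in the direction
    that reduces the measured asynchrony.
    """
    rng = random.Random(seed)
    bias = 0.0
    for _ in range(n_trials):
        measured = adapter_soa + bias + rng.gauss(0.0, sigma)
        if abs(measured) > criterion:
            bias -= learning_rate * measured
    return bias


# Noise-free single trial: a 0.3 s adapter shifts the bias by -0.1 * 0.3.
one_step = asynchrony_correction(0.3, n_trials=1, learning_rate=0.1, sigma=0.0)
# An adapter inside the criterion leaves the bias untouched.
no_update = asynchrony_correction(0.02, n_trials=50, sigma=0.0)
# With many noisy trials the bias moves opposite to the adapter SOA.
long_exposure = asynchrony_correction(0.3, n_trials=250)
```

      Because the update is proportional to the measured SOA, this rule predicts recalibration that keeps growing with adapter SOA, which is the pattern the nonlinear behavioral data argue against.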

      The authors’ causal inference model strongly predicts that there should be no recalibration for stimuli at 0.7 s offset, yet only 3/9 participants appear to show this effect. They note that a significant difference between their design and that of others is the inclusion of longer lags, which are unlikely to originate from the same source, but don’t offer any explanation for this key difference between their data and the predictions of a causal inference model.

      We added further simulations to show that the causal-inference model can predict non-zero recalibration also for longer adapter SOAs, especially in observers with a large common-cause prior (Fig. 5A) and low sensory precision (Fig. 5B). This ability to predict a non-zero recalibration effect even at longer adapter SOAs, such as 0.7 s, is a key feature of the causal-inference model that distinguishes it from the asynchrony-contingent model.

      I’m also not completely convinced that the causal inference model isn’t ‘best’ simply because it has sufficient free parameters to capture the noise in the data. The tested models do not (I think) have equivalent complexity - the causal inference model fits best, but has more parameters with which to fit the data. Moreover, while it fits ‘best’, is it a good model? Figure S6 is useful in this regard but is not completely clear - are the red dots the actual data or the causal inference prediction? This suggests that it does fit the data very well, but is this based on predicting held-out data, or is it just that by having more parameters it can better capture the noise? Similarly, S7 is a potentially useful figure but it's not clear what is data and what are model predictions (what are the differences between each row for each participant; are they two different models or pre-test post-test or data and model prediction?!).

      I'm not an expert on the implementation of such models but my reading of the supplemental methods is that the model is fit using all the data rather than fit and tested on held-out data. This seems problematic.

      We recognize the risk of overfitting with the causal-inference model. We now rely on Bayesian model comparisons, which use model evidence for model selection. This method automatically incorporates a penalty for model complexity through the marginalization over the parameter space (MacKay, 2003).
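
      The complexity penalty that arises from marginalization can be seen in a toy example unrelated to the recalibration data. The sketch below compares a coin model with a fixed probability of 0.5 against a model with a free probability under a uniform prior (an assumed prior chosen so that the integral has a closed form). The function names are hypothetical, and the study itself approximated model evidence with Variational Bayesian Monte Carlo rather than with a closed-form integral.

```python
from math import factorial


def evidence_fixed(k, n):
    """Evidence for a specific sequence with k heads in n flips when p is fixed at 0.5."""
    return 0.5 ** n


def evidence_free(k, n):
    """Evidence for the same sequence when p is free with a uniform prior on [0, 1].

    The integral of p**k * (1 - p)**(n - k) over [0, 1] equals
    k! * (n - k)! / (n + 1)!.
    """
    return factorial(k) * factorial(n - k) / factorial(n + 1)


def bayes_factor_free_vs_fixed(k, n):
    """Values above 1 favor the free-parameter model, values below 1 the fixed model."""
    return evidence_free(k, n) / evidence_fixed(k, n)
```

      With 5 heads in 10 flips, the best-fitting free parameter is 0.5, so both models reach the same maximum likelihood, yet the Bayes factor falls below 1 because the flexible model spreads its prior over many values of p that explain the data poorly. With 9 heads in 10 flips, the extra flexibility is warranted and the Bayes factor exceeds 1.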

      Our design is not suitable for cross-validation because the model-fitting process is computationally intensive and time-consuming. Each fit of the causal-inference model takes approximately 30 hours, and multiple fits with different initial starting points are required to rule out that the parameter estimates correspond to local minima.

      I would have liked to have seen more individual participant data (which is currently in the supplemental materials, albeit in a not very clear manner as discussed above).

      We have revised Supplementary Figures S4-S6 to show additional model predictions of the recalibration effect for individual participants, and participants’ temporal-order judgments are now shown in Supplementary Figure S7. These figures confirm the better performance of the causal-inference model.

      The way that S3 is described in the text (line 141) makes it sound like everyone was in the same direction; however, it is clear that 2/9 listeners show the opposite pattern, and 2 have confidence intervals close to zero (albeit on the -ve side).

      We have revised the text to clarify that the asymmetry occurs in both directions and is idiosyncratic (lines 168-171). We summarized the distribution of the individual asymmetries of the recalibration effect across visual-leading and auditory-leading adapter SOAs in Supplementary Figure S2.

      Reviewer #3 (Public Review):

      Summary:

      Li et al. describe an audiovisual temporal recalibration experiment in which participants perform baseline sessions of ternary order judgments about audiovisual stimulus pairs with various stimulus-onset asynchronies (SOAs). These are followed by adaptation at several adapting SOAs (each on a different day), followed by post-adaptation sessions to assess changes in psychometric functions. The key novelty is the formal specification and application/fit of a causal-inference model for the perception of relative timing, providing simulated predictions for the complete set of psychometric functions both pre and post-adaptation.

      Strengths:

      (1) Formal models are preferable to vague theoretical statements about a process, and prior to this work, certain accounts of temporal recalibration (specifically those that do not rely on a population code) had only qualitative theoretical statements to explain how/why the magnitude of recalibration changes non-linearly with the stimulus-onset asynchrony of the adapter.

      (2) The experiment is appropriate, the methods are well described, and the average model prediction is a fairly good match to the average data (Figure 4). Conclusions may be overstated slightly, but seem to be essentially supported by the data and modelling.

      (3) The work should be impactful. There seems a good chance that this will become the go-to modelling framework for those exploring non-population-code accounts of temporal recalibration (or comparing them with population-code accounts).

      (4) A key issue for the generality of the model, specifically in terms of recalibration asymmetries reported by other authors that are inconsistent with those reported here, is properly acknowledged in the discussion.

      Weaknesses:

      (1) The evidence for the model comes in two forms. First, two trends in the data (non-linearity and asymmetry) are illustrated, and the model is shown to be capable of delivering patterns like these. Second, the model is compared, via AIC, to three other models. However, the main comparison models are clearly not going to fit the data very well, so the fact that the new model fits better does not seem all that compelling. I would suggest that the authors consider a comparison with the atheoretical model they use to first illustrate the data (in Figure 2). This model fits all sessions but with complete freedom to move the bias around (whereas the new model constrains the way bias changes via a principled account). The atheoretical model will obviously fit better, but will have many more free parameters, so a comparison via AIC/BIC or similar should be informative.

      In the revised manuscript, we switched from AIC to Bayesian model selection, which approximates and compares model evidence. This method incorporates a strong penalty for model complexity through marginalization over the parameter space (MacKay, 2003).

      We have addressed this comment by updating the former competitor model into a more reasonable version that induces recalibration only for some measured SOAs and by including another (asynchrony-contingent) model that is capable of predicting the nonlinearity and asymmetry of recalibration (Fig. 3) while heuristically approximating the causal inference computations. The causal-inference model outperformed the asynchrony-contingent model, as indicated by model evidence (Fig. 4A). Furthermore, model predictions show that the causal-inference model more accurately captures recalibration at large SOAs at both the group (Fig. 4B) and the individual level (Fig. S4).

      (2) It does not appear that some key comparisons have been subjected to appropriate inferential statistical tests. Specifically, lines 196-207 - presumably this is the mean (and SD or SE) change in AIC between models across the group of 9 observers. So are these differences actually significant, for example via t-test?

      We statistically compared the models using Bayes factors (Fig. 4A). The model evidence for each model was approximated using Variational Bayesian Monte Carlo. Bayes factors provided strong evidence in support of the causal-inference model relative to the other models.

      (3) The manuscript tends to gloss over the population-code account of temporal recalibration, which can already provide a quantitative account of how the magnitude of recalibration varies with adapter SOA. This could be better acknowledged, and the features a population code may struggle with (asymmetry?) are considered.

      We simulated a population-code model to examine its prediction of the recalibration effect for different adapter SOAs (lines 380–388, Supplement Section 8). The population-code model can predict the nonlinearity of recalibration, i.e., a decreasing recalibration effect as the adapter SOA increases. However, to capture the asymmetry of recalibration effects across auditory-leading and visual-leading adapter stimuli, we would need to assume that the auditory-leading and visual-leading SOAs are represented by neural populations with unequal tuning curves.

      (4) The engagement with relevant past literature seems a little thin. Firstly, papers that have applied causal inference modeling to judgments of relative timing are overlooked (see references below). There should be greater clarity regarding how the modelling here builds on or differs from these previous papers (most obviously in terms of additionally modelling the recalibration process, but other details may vary too). Secondly, there is no discussion of previous findings like that in Fujisaki et al.’s seminal work on recalibration, where the spatial overlap of the audio and visual events didn’t seem to matter (although admittedly this was an N = 2 control experiment). This kind of finding would seem relevant to a causal inference account.

      References:

      Magnotti JF, Ma WJ and Beauchamp MS (2013) Causal inference of asynchronous audiovisual speech. Front. Psychol. 4:798. doi: 10.3389/fpsyg.2013.00798

      Sato, Y. (2021). Comparing Bayesian models for simultaneity judgement with different causal assumptions. J. Math. Psychol., 102, 102521.

      We have revised the Introduction and Discussion to better situate our study within the existing literature. Specifically, we have incorporated the suggested references (lines 66–69) and provided clearer distinctions on how our modeling approach builds on or differs from previous work on causal-inference models, particularly in terms of modeling the recalibration process (lines 75–79). Additionally, we have discussed findings that might contradict the assumptions of the causal-inference model (lines 405–424).

      (5) As a minor point, the model relies on simulation, which may limit its take-up/application by others in the field.

      Upon acceptance, we will publicly share the code for all models (simulation and parameter fitting) to enable researchers to adapt and apply these models to their own data.

      (6) There is little in the way of reassurance regarding the model’s identifiability and recoverability. The authors might for example consider some parameter recovery simulations or similar.

      We conducted a model recovery for each of the six models described in the main text and confirmed that the asynchrony-contingent and causal-inference models are identifiable (Supplement Section 11). Simulations of the asynchrony-correction model were sometimes best fit by causal-inference models, because the latter behaves similarly when the prior of a common cause is set to one.

      We also conducted a parameter recovery for the winning model, the causal-inference model with modality-specific precision (Supplement Section 13).

      Key parameters, including audiovisual bias, amount of auditory latency noise, amount of visual latency noise, criterion, and lapse rate, showed satisfactory recovery performance. The less accurate recovery of  is likely due to a tradeoff with the learning rate.

      (7) I don't recall any statements about open science and the availability of code and data.

      Upon acceptance of the manuscript, all code (simulation and parameter fitting) and data will be made available on OSF and publicly available.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      In addition to the comments below, we would like to offer the following summary based on the discussion between reviewers:

      The major shortcoming of the work is that there should ideally be a bit more evidence to support the model, over and above a demonstration that it captures important trends and beats an account that was already known to be wrong. We suggest you:

      (1) Revise the figure legends (Figure 5 and Figure 6E).

      We revised all figures and figure legends.

      (2) Additionally report model differences in terms of BIC (which will favour the preferred model less under the current analysis);

      We now base the model comparison on Bayesian model selection, which approximates and compares model evidence. This method incorporates a strong penalty for model complexity through marginalization over the parameter space (MacKay, 2003).

      (3) Move to instead fitting the models multiple times in order to get leave-one-out estimates of best-fitting loglikelihood for each left-out data point (and then sum those for the comparison metric).

      Unfortunately, our design is not suitable for cross-validation methods because the model-fitting process is computationally intensive and time-consuming. Each fit of the causal-inference model takes approximately 30 hours, and multiple fits with different initial starting points are required to rule out local minima.

      (4) Offer a comparison with a more convincing model (for example, an atheoretical fit with free parameters for all adapters, as suggested by Reviewer 3).

      We updated the previous competitor model and included an asynchrony-contingent model, which is capable of predicting the nonlinearity of recalibration (Fig. 3). The causal-inference model still outperformed the asynchrony-contingent model (Fig. 4A). Furthermore, model predictions show that only the causal-inference model captures non-zero recalibration effects for long adapter SOAs at both the group level (Fig. 4B) and individual level (Fig. S4).

      Reviewer #1 (Recommendations For The Authors):

      A larger sample size would be better.

      This study used a within-subject design, which included 9 sessions, totaling 13.5 hours per participant. This extensive data collection allows us to better constrain the model for each participant. Our conclusions are based on the different models’ ability to fit individual data rather than on group statistics.

      It would be good to better put the study in the context of spatial ventriloquism, where similar model comparisons have been done over the last ten years and there is a large body of work to connect to.

      We now discuss our model in relation to models of cross-modal spatial recalibration in the Introduction (lines 70–78) and Discussion (lines 324–330).

      Reviewer #2 (Recommendations For The Authors):

      Previous authors (e.g., Yarrow et al.) have described latency-shift and criterion-change models as providing a good fit of experimental data. Did the authors attempt a criterion-shift model in addition to a shift model?

      We have considered criterion-shift variants of our atheoretical recalibration models in Supplement Section 1. To summarize the results, we varied two model assumptions: 1) the use of either a Gaussian or an exponential measurement distribution, and 2) recalibration being implemented either as a shift of bias or a criterion. We fit each model variant separately to the ternary TOJ responses of all sessions. Bayesian model comparisons indicated that the bias-shift model with exponential measurement distributions best captured the data of most participants.
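
      The difference between the two recalibration variants can be sketched for the Gaussian case. The code below is a hypothetical illustration, not the authors' implementation: the function names, the sign convention (positive SOA meaning the visual stimulus leads), and all parameter values are assumptions, and the exponential-measurement variant that won the comparison is not shown.

```python
from math import erf, sqrt


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def ternary_probs(soa, bias, sigma, lower, upper):
    """Response probabilities for a ternary judgment with Gaussian measurement noise.

    The measured asynchrony is assumed normal with mean soa + bias and SD sigma.
    Measurements below `lower` yield "auditory first", measurements above `upper`
    yield "visual first", and anything in between yields "simultaneous".
    """
    mean = soa + bias
    p_a_first = normal_cdf((lower - mean) / sigma)
    p_v_first = 1.0 - normal_cdf((upper - mean) / sigma)
    p_simultaneous = 1.0 - p_a_first - p_v_first
    return p_a_first, p_simultaneous, p_v_first


# Pre-test: no bias, symmetric criteria at -0.1 s and +0.1 s (illustrative values).
baseline = ternary_probs(0.0, bias=0.0, sigma=0.08, lower=-0.1, upper=0.1)
# Bias-shift account: recalibration moves the whole measurement distribution.
bias_shift = ternary_probs(0.0, bias=0.05, sigma=0.08, lower=-0.1, upper=0.1)
# Criterion-shift account: recalibration moves only one decision boundary.
criterion_shift = ternary_probs(0.0, bias=0.0, sigma=0.08, lower=-0.1, upper=0.15)
```

      In this sketch a bias shift changes all three response probabilities at once, whereas moving only the upper criterion leaves the "auditory first" probability untouched and widens the "simultaneous" window on one side. This is the distinction that the model comparison is designed to pick up.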

      Figure 4B - I'm not convinced that the modality-independent uncertainty is anything but a straw man. Models not allowed to be asymmetric do not show asymmetry? (the asymmetry index is irrelevant in the fixed update model as I understand it so it is not surprising the model is identical?).

      We included the assumption that temporal uncertainty might be modality-independent for several reasons. First, there is evidence suggesting that a central mechanism governs the precision of temporal-order judgments (Hirsh & Sherrick, 1961), indicating that precision is primarily limited by a central mechanism rather than the sensory channels themselves. Second, from a modeling perspective, it was necessary to test whether an audio-visual temporal bias alone, i.e., assuming modality-independent uncertainty, could introduce asymmetry across adapter SOAs. Additionally, most previous studies implicitly assumed symmetric likelihoods, i.e., modality-independent latency noise, by fitting cumulative Gaussians to the psychometric curves derived from 2AFC-TOJ tasks (Di Luca et al., 2009; Fujisaki et al., 2004; Harrar & Harris, 2005; Keetels & Vroomen, 2007; Navarra et al., 2005; Tanaka et al., 2011; Vatakis et al., 2007, 2008; Vroomen et al., 2004).

      Why does a zero-SOA adapter shift the PSS towards auditory leading? Is this a consequence of the previous day’s conditioning? It’s not clear from the methods whether all listeners had the same SOA conditioning sequence across days.

      The auditory-leading recalibration effect for an adapter SOA of zero has been consistently reported in previous studies (e.g., Fujisaki et al., 2004; Vroomen et al., 2004). This effect reflects the asymmetry in recalibration, which can be explained by differences across modalities in the noisiness of the latencies (Figure 5C) in combination with audiovisual temporal bias (Figure S8).

      We added details about the order of testing to the Methods section (lines 456–457).

      Reviewer #3 (Recommendations For The Authors):

      Abstract

      “Our results indicate that human observers employ causal-inference-based percepts to recalibrate cross-modal temporal perception” Your results indicate this is plausible. However, this statement (basically repeated at the end of the intro and again in the discussion) is - in my opinion - too strong.

      We have revised the statement as suggested.

      Intro and later

      Within the wider literature on relative timing perception, the temporal order judgement (TOJ) task refers to a task with just two response options. Tasks with three response options, as employed here, are typically referred to as ternary judgments. I would suggest language consistent with the existing literature (or if not, the contrast to standard usage could be clarified).

      Ref: Ulrich, R. (1987). Threshold models of temporal-order judgments evaluated by a ternary response task. Percept. Psychophys., 42, 224-239.

      We revised the term for the task as suggested throughout the manuscript.

      Results, 2.2.2

      “However, temporal precision might not be due to the variability of arrival latency.” Indeed, although there is some recent evidence that it might be.

      Ref: Yarrow, K., Kohl, C, Segasby, T., Kaur Bansal, R., Rowe, P., & Arnold, D.H. Neural-latency noise places limits on human sensitivity to the timing of events. Cognition, 222, 105012 (2022).

      We included the reference as suggested (lines 245–248).

      Methods, 4.3.

      Should there be some information here about the order of adaptation sessions (e.g. random for each observer)?

      We added details about the order of testing to the Methods section (lines 456–457).

      Supplemental material section 1.

      Here, you test whether the changes resulting from recalibration look more like a shift of the entire psychometric function or an expansion of the psychometric function on one side (most straightforwardly compatible with a change of one decision criterion). Fine, but the way you have done this is odd, because you have introduced a further difference in the models (Gaussian vs. exponential latency noise) so that you cannot actually conclude that the trend towards a win for the bias-shift model is simply down to the bias vs. criterion difference. It could just as easily be down to the different shapes of psychometric functions that the two models can predict (with the exponential noise model permitting asymmetry in slopes). There seems to be no reason that this comparison cannot be made entirely within the exponential noise framework (by a very simple reparameterization that focuses on the two boundaries rather than the midpoint and extent of the decision window). Then, you would be focusing entirely on the question of interest. It would also equate model parameters, removing any reliance on asymptotic assumptions being met for AIC.

      We revised our exploration of atheoretical recalibration models. To summarize the results, we varied two model assumptions: 1) the use of either a Gaussian or an exponential measurement distribution, and 2) recalibration being implemented either as a shift of the cross-modal temporal bias or as a shift of the criterion. We fit each model separately to the ternary TOJ responses of all sessions. Bayesian model comparisons indicated that the bias-shift model with exponential measurement distributions best described the data of most participants.

      References

      Di Luca, M., Machulla, T.-K., & Ernst, M. O. (2009). Recalibration of multisensory simultaneity: cross-modal transfer coincides with a change in perceptual latency. Journal of Vision, 9(12), Article 7.

      Fujisaki, W., Shimojo, S., Kashino, M., & Nishida, S. (2004). Recalibration of audiovisual simultaneity. Nature Neuroscience, 7(7), 773–778.

      Harrar, V., & Harris, L. R. (2005). Simultaneity constancy: detecting events with touch and vision. Experimental Brain Research. Experimentelle Hirnforschung. Experimentation Cerebrale, 166(3-4), 465–473.

      Hirsh, I. J., & Sherrick, C. E., Jr. (1961). Perceived order in different sense modalities. Journal of Experimental Psychology, 62(5), 423–432.

      Keetels, M., & Vroomen, J. (2007). No effect of auditory-visual spatial disparity on temporal recalibration. Experimental Brain Research. Experimentelle Hirnforschung. Experimentation Cerebrale, 182(4), 559–565.

      MacKay, D. J. (2003). Information theory, inference and learning algorithms. https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=201b835c3f3a3626ca07be68cc28cf7d286bf8d5

      Navarra, J., Vatakis, A., Zampini, M., Soto-Faraco, S., Humphreys, W., & Spence, C. (2005). Exposure to asynchronous audiovisual speech extends the temporal window for audiovisual integration. Brain Research. Cognitive Brain Research, 25(2), 499–507.

      Tanaka, A., Asakawa, K., & Imai, H. (2011). The change in perceptual synchrony between auditory and visual speech after exposure to asynchronous speech. Neuroreport, 22(14), 684–688.

      Vatakis, A., Navarra, J., Soto-Faraco, S., & Spence, C. (2007). Temporal recalibration during asynchronous audiovisual speech perception. Experimental Brain Research. Experimentelle Hirnforschung. Experimentation Cerebrale, 181(1), 173–181.

      Vatakis, A., Navarra, J., Soto-Faraco, S., & Spence, C. (2008). Audiovisual temporal adaptation of speech: temporal order versus simultaneity judgments. Experimental Brain Research. Experimentelle Hirnforschung. Experimentation Cerebrale, 185(3), 521–529.

      Vroomen, J., Keetels, M., de Gelder, B., & Bertelson, P. (2004). Recalibration of temporal order perception by exposure to audio-visual asynchrony. Brain Research. Cognitive Brain Research, 22(1), 32–35.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Review):

      Summary:

      In this manuscript by Bimbard et al., a new method to perform stable recordings over long periods of time with Neuropixels, as well as the technical details on how the electrodes can be explanted for follow-up reuse, is provided. I think the description of all parts of the method is very clear, and the validation analyses (n of units per day over time, RMS over recording days...) are very convincing. I however missed a stronger emphasis on why this could provide a big impact on the ephys community, by enabling new analyses, new behavior correlation studies, or neurophysiological mechanisms across temporal scales.

      Strengths:

      Open source method. Validation across laboratories. Across species (mice and rats) demonstration of its use and in different behavioral conditions (head-fixed and freely moving).

      Weaknesses:

      Weak emphasis on what can be enabled with this new method that didn't exist before.

      We thank the reviewer for highlighting the limited discussion around scientific impact. Our implant has several advantages which combine to make it much more accessible than previous solutions. This enables a variety of recording configurations that would not have been possible with previous designs, facilitating recordings from a wider range of brain regions, animals, and experimental setups. In short, there are three key advances which we now emphasise in the manuscript:

      Adaptability: The CAD files can be readily adapted to a wide range of configurations (implantation depth, angle, position of headstage, etc.). Labs have already modified the design for their needs, and re-shared with the community (Discussion, Para 5).

      Weight: Because of the lightweight design, experimenters can i) perform complex and demanding freely moving tasks as we exemplify in the manuscript, and ii) implant female and water restricted mice while respecting animal welfare weight limitations (Flexible design, Para 1).

      Cost: At ~$10, our implant is significantly cheaper than published alternatives, which makes it affordable to more labs and means that testing modifications is cost-effective (Discussion, Para 4).

      Reviewer 1 (Recommendations For The Authors):

      - Differences between mice and rats seem very significant. Although this is probably not surprising, I suggest that the authors comment on this to make it clear to anyone trying to use in different species that are not quantified in the main figures.

      The reviewer is correct: there are qualitative differences between mice and rats, particularly with respect to the unit median amplitude. We have added a comment in the discussion to highlight these inter-species variations (Discussion, Para 7).

      - Another comment that would be useful to have would be how to tackle the problem of tracking the same neuron across days. Even if currently impossible, it could be useful to provide discussion along those lines as to where future improvements (either in hardware or software) can be made.

      We thank the reviewer for highlighting this. Figure 5 does show data from tracking the same neuron across days (and even months). We have modified the language to make this clear.

      Reviewer 2 (Public Review):  

      Summary:

      This work by Bimbard et al. introduces a new implant for Neuropixels probes. While Neuropixels probes have critically improved and extended our ability to record the activity of a large number of neurons with high temporal resolution, the use of these expensive devices in chronic experiments has so far been hampered by the difficulty of safely implanting them and, importantly, of explanting and reusing them after the conclusion of the experiment. The authors present a newly designed two-part implant, consisting of a docking and a payload module, that allows for secure implantation and straightforward recovery of the probes. The implant is lightweight, making it amenable for use in mice and rats, and customizable. The authors provide schematics and files for printing of the implants, which can be easily modified and adapted to custom experiments by researchers with little to no design experience. Importantly, the authors demonstrate the successful use of this implant across multiple use cases, in head-fixed and freely moving experiments, in mice and rats, with different versions of Neuropixels probes, and across 8 different labs. Taken together, the presented implants promise to make chronic Neuropixels recordings and long-term studies of neuronal activity significantly easier and attainable for both current and future Neuropixels users.

      Strengths:

      The implants have been successfully tested across 8 different laboratories, in mice and rats, in head-fixed and freely moving conditions, and have been adapted in multiple ways for a number of distinct experiments.

      Implants are easily customizable and the authors provide a straightforward approach for customization across multiple design dimensions even for researchers not experienced in design.

      The authors provide clear and straightforward descriptions of the construction, implantation, and explant of the described implants.

      The split of the implant into a docking and payload module makes reuse even in different experiments (using different docking modules) easy.

      The authors demonstrate that implants can be re-used multiple times and still allow for high-quality recordings.

      The authors show that the chronic implantations allow for the tracking of individual neurons across days and weeks (using additional software tracking solutions), which is critical for a large number of experiments requiring the description of neuronal activity, e.g. throughout learning processes.

      The authors show that implanted animals can even perform complex behavioral tasks, with no apparent reduction in their performance.

      Weaknesses:

      While implanted animals can still perform complex behavioral tasks, the authors describe that the implants may reduce the animals' mobility, as measured by prolonged reaction times. However, the presented data does not allow us to judge whether this effect is specifically due to the presented implant or whether any implant or just tethering of the animals per se would have the same effects.

      The reviewer is correct: some of the differences in mouse reaction time could be due to the tether rather than the implant. As these experiments were also performed in water-restricted female mice with the heavier Neuropixels 1.0 implant, our data represent the maximal impact of the implant, and we have highlighted this point in the revision (Freely behaving animals, Para 2).  

      While the authors make certain comparisons to other, previously published approaches for chronic implantation and re-use of Neuropixels probes, it is hard to make conclusive comparisons and judge the advantages of the current implant. For example, while the authors emphasize that the lower weight of their implant allows them to perform recordings in mice (and is surely advantageous), the previously described, heavier implants they mention (Steinmetz et al., 2021; van Daal et al., 2021), have also been used in mice. Whether the weight difference makes a difference in practice therefore remains somewhat unclear.

      The reviewer is correct: without a direct comparison, we cannot be certain that our smaller, lighter implant improves behavioural results (although this is supported by the literature, e.g. Newman et al, 2023). However, the reduced weight of our implant is critical for several laboratories represented in this manuscript due to animal welfare requirements. Indeed, in van Daal et al the authors “recommend a [mouse] weight of >25 g for implanting Neuropixels 1.0 probes.” This limit precludes using (the vast majority of) female mice, or water-restricted animals. Conversely, our implant can be routinely used with lighter, water-restricted male and female mice. We emphasised this point in the revision (Discussion, Para 2).

      The non-permanent integration of the headstages into the implant, while allowing for the use of the same headstage for multiple animals in parallel, requires repeated connections and does not provide strong protection for the implant. This may especially be an issue for the use in rats, requiring additional protective components as in the presented rat experiments.

      We apologise for not clarifying the various headstage holder options in the manuscript and we have now addressed this in the revision (Freely behaving animals, Para 1&2). Our repository has headstage holder designs (in the XtraModifications/Mouse_FreelyMoving folder). These allow the headstage to be left on the implant, minimizing the number of connections (albeit increasing the weight for the mouse). Indeed, mice recorded while performing the task described in our manuscript had the headstage semi-permanently integrated into the implant, and we now highlight this in the revision (Freely behaving animals, Para 1).

      Reviewer 2 (Recommendations For The Authors): 

      The description of the different versions of the head-stage holders should be more clear, listing also advantages/disadvantages of the different solutions. It would be also useful if the authors could comment on the use of these head-stage holders in rats, since they do not seem to offer much protection.

      We thank the reviewer for this point, and we have added notes to the manuscript to clarify the various advantages of the different headstage holders, and that the headstage can be permanently attached to the implant (Freely behaving animals, Para 1&2). This is the primary advantage of these solutions compared with the minimal implant, at the expense of increasing the implant weight.

      The reviewer’s concerns regarding the lack of protection for implants in rats are well placed, and we now emphasise that these experiments benefited from the additional protection of an external 3D casing, which is likely critical for use in larger animals (Freely behaving animals, Para 1).

      While re-used probes seem to show similar yields across multiple uses (Figure 4C), it seems as if there is a much higher variability of the yield for probes that are used for the first (maybe also second) time. There are probes with much higher than average yields, but it seems none of the re-used probes show such high yields. Is this a real effect? Is this because the high-yield probes happened to have not been used multiple times? Is there an analysis the authors could provide to reduce the concern that yields may generally be lower for re-used probes/that there are no very high yields for re-used probes?

      We understand the reviewer’s concern with respect to Figure 4C. However, the re-use of any given probe was determined only by the experimental needs of the project. It is therefore not possible that there is a relationship between probes selected for re-use and unit-yield. We now specify this in the revised legend of Figure 4C. This variability (and the consistency in yield across uses) likely stems from differences between labs, brain region, and implantation protocol.

      The authors claim that a 'large fraction' of units could be tracked for the entire duration of the experiment (Figure 5A,B). They mention in the discussion that quantification can be found in a different manuscript (van Beest et al., 2023), but this should also be quantified here in at least some more detail, also for other animals in addition to the one mouse which was recorded for ~100 days. What fraction can be held for different durations? What is the average holding time, etc.?

      We agree with the reviewer, and have now added new panels quantifying the probability and reliability of tracking a neuron across days (Figure 5E-F). We also comment on the change in tracking probability across time, and its variability across recordings (Stability, Para 4).

      Reviewer 3 (Public Reviews):

      Summary:

      In this manuscript, Bimbard and colleagues describe a new implant apparatus called "Apollo Implant", which should facilitate recording in freely moving rodents (mice and rats) using Neuropixels probes. The authors collected data from both mice and rats and used 3 different versions of Neuropixels. Multiple labs have already adopted this method, which is impressive. They openly share their CAD designs and surgery protocol to further facilitate the adaptation of their method.

      Strengths:

      Overall, the "Apollo Implant" is easy to use and adapt, as it has been used in other laboratories successfully and custom modifications are already available. The device is reproducible using common 3D printing services and can be easily modified thanks to its CAD design (the video explaining this is extremely helpful). The weight and price are amazing compared to other systems for rigid silicon probes allowing a wide range of use of the "Apollo Implant".

      Weaknesses:

      The "Apollo Implant" can only handle Neuropixels probes. It cannot hold other widely used and commercially available silicon probes. Certain angles and distances are not possible in their current form (distance between probes 1.8 to 4mm, implantation depth 2-6.5 mm, or angle of insertion up to 20 degrees).

      As we now discuss in the manuscript (Discussion, Para 4), one implant accommodating the diversity of the existing probes is beyond the scope of this project. However, because the design is adaptable, groups should be able to modify the current version of the implant to adapt to their electrodes’ size and format (and can highlight any issues in the Github “Discussions” area).

      With Neuropixels, the current range of depths covers practically all trajectories in the mouse brain. In rats, where deeper penetrations may be useful, the experimenter can attach the probe at a lower point in the payload module to expose more of the shank. We now specify this in the Github repository.  

      We have now extended the range of inter-probe distances from a maximum of 4 mm to 6.5 mm. Distances beyond this may be better served by 2 implants, and smaller distances could be achieved by attaching two probes on the same side of the docking module. These points are now specified in the revised manuscript (Flexible design, Para 2).

      Reviewer 3 (Recommendations For The Authors):

      I have only a few questions and suggestions:

      Is it possible to create step-by-step instructions for explantation (similar to Figure-1 with CAD schematics)? You mention that payload holder is attached to a micromanipulator, but it is unclear how this is achieved. How was the payload secured with a screw (which screw)? My understanding is that as you turn the screw in the payload holder, it will grab onto the payload module from both sides, but the screw is not in contact with the payload module, correct? I found the screw type on your GitHub, but it would be great if you could add a bill of materials in a table format, so readers don't have to jump between GitHub and article.

      We have now added a bill of materials to the revised manuscript (Implant design and materials, Para 2), although up-to-date links are still provided on the Github repository due to changing availability.

      What happens if you do a dual probe implant and cannot avoid blood vessels in one or both of the craniotomies due to the pre-defined geometry? Is this a frequent issue? How can you overcome this during the surgery?

      Blood vessels can be difficult to avoid in some cases, but we are typically able to rotate/reposition the probes to solve this issue. In some cases, with 4-shank probes, the blood vessel can be positioned between individual probe shanks. We now detail this in the revised manuscript (Assembly and implantation, Para 3).

      I assume if the head is not aligned (line-332) the probe can break during recovery. Have you experienced this during explantation?

      As we now specify in the manuscript (Explantation, Para 2), we are careful when explanting the probe to avoid this issue, and due to the flexibility of the shanks, it does not appear to be a major concern.

      Why did you remove the UV glue (line 435)? How can you level the skull? I assume you have covered bregma and lambda in the first surgery which can create an uneven surface to measure even after you remove the UV glue.

      We thank the reviewer for highlighting this omission from the methods. We now explain (Implantation, Carandini-Harris laboratory) that the UV-glue is completely removed during the second surgery, and the skull is cleaned and scored. This improves the adhesion of the dental cement, and allows for reliable levelling of the skull.

      In line 112 you mentioned that the number of recorded neurons was stable; however, you found a 3% mean decrease in unit count per day (line 120). Stability is great until day 10 (in Figure 4A), but it deteriorates quickly after that. I think it would help readers if you could add the mean ± SEM of recorded units in the text for days 1-10, days 11-50, and days 51-100 (using the data from Figure 4A).

      We have now added Supplementary Figure 4 to show unit count across bins of days, and a corresponding comment in the text (Stability, Para 2).
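
The binning requested above (mean ± SEM of unit count for days 1-10, 11-50, and 51-100) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `binned_mean_sem` and the example values are hypothetical and are not data from the manuscript.

```python
import math
from statistics import mean, stdev

# Day bins suggested by the reviewer (inclusive bounds).
BINS = [(1, 10), (11, 50), (51, 100)]


def binned_mean_sem(days, counts, bins=BINS):
    """Group unit counts by recording day and return {(lo, hi): (mean, sem, n)}."""
    out = {}
    for lo, hi in bins:
        vals = [c for d, c in zip(days, counts) if lo <= d <= hi]
        if not vals:
            # No recordings fell in this bin.
            out[(lo, hi)] = (float("nan"), float("nan"), 0)
            continue
        # SEM needs at least two samples; otherwise it is undefined.
        sem = stdev(vals) / math.sqrt(len(vals)) if len(vals) > 1 else float("nan")
        out[(lo, hi)] = (mean(vals), sem, len(vals))
    return out


# Hypothetical unit counts, for illustration only.
example_days = [1, 5, 10, 20, 40, 60, 90]
example_counts = [100, 80, 90, 60, 50, 40, 30]

for (lo, hi), (m, s, n) in binned_mean_sem(example_days, example_counts).items():
    print(f"days {lo}-{hi}: {m:.1f} +/- {s:.1f} (n={n})")
```

Fixed bins make the early plateau and the later decline directly comparable across animals, at the cost of hiding day-to-day variation within each bin.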

      A full survey of the probe (Figure 4B) means that you recorded neuronal activity across 4-5000 channels (depending on how many channels were in the brain). While it is clear that a full probe survey can reduce the number of animals needed for a study, it is also clear in this figure that by day 25 you can record ~300 neurons on 4000 channels. It would be great to discuss this in the discussion and give a balanced view of the long-term stability of these recordings.

      Overall, keeping a large number of units for a long time still remains a challenge. Here, we could record on average 85 neurons per bank during the first 10 days, and then only 45 after 50 days. It is important to note that our quantification averages across all banks recorded, including those in a ventricle or partly outside of the brain. Thus, our results represent a lower estimate of the total neurons recorded. Our new Supplementary Figure 4 helps to highlight the diversity of neuron number recorded per animal. Further improvements in surgical techniques and spike sorting will likely improve stability further and we have now added this comment in the manuscript (Stability, Para 2). For example, we observed excellent stability in a mouse where the craniotomy was stabilized with KwikSil (Supplementary Figure 5).

      The RMS value was around 20 µV in some of the recordings, and according to Figure 4G it is around 16 µV on average. Is it safe to accept putative single units with 20 µV amplitudes, when the baseline noise level is this close to the spike peak-to-peak amplitude?

      On average, less than 1% of the units selected using all the other metrics except the amplitude had an amplitude below 30 µV, and 2.6% below 50 µV. Increasing the threshold to 30 µV, or even 50 µV, did not affect the results. We have now added this comment in the Methods (Data processing, Para 3).

      Can you add the waveform and ISIH of the example unit from day 106 to Figure 5?

      We have now added 4 units tracked up to day 106 in Figure 5.  

      Could you move Supplementary Figure 3A to Figure 4? The number of units is more valuable information than the RMS noise level. I understand that you don't have such a nice coverage of all the days as in Figure 3 and 4, but you might be able to group for the first 3 days and the last 3 days (and if data is available, the middle 3 days) as a boxplot. The goal would be for the reader to be able to see whether there is any change in the number of single units over time.

      We agree with the reviewer that the number of units is more valuable. We had included this information in Figure 4A-F, but we have made edits to the text to make it clearer that this is what is being shown. The data from Supplementary Figure 3A are already contained within Figure 4, but in Supplementary Figure 3A they are separated by individual labs.

      Product numbers are missing in multiple places: line-285 (screw), line-288 (screw), line-290 (screw), line-309 (manipulator), line-374 (gold pin and silver wire), line-384 (Mill-Max), line-394 (silver wire), and many more. It would be great if you could add all these details, so people can replicate your protocol.

      We thank the reviewer for highlighting this, and we have added details of screw thread-size and length to relevant parts of the manuscript, although any type of screw can be used. Similarly, other components are non-specific (e.g. multiple silver-wire diameters were used across labs), so we have not included specific product numbers for general consumer items (like screws and silver wires) to avoid indicating that a specific part must be purchased.

      While it is great to see lab-specific methods, I am not sure in their current form it helps to understand the protocol better. The information is conveyed in different ways (I assume these were written by different people), in different orders, and in different depths (some mention probe implant location relative to bregma and midline, some don't). There are many different glues, epoxies, cement, wires, and pins. I would recommend rewriting these methods sections under a unified template, so it is easier to follow.

      We thank the reviewer for this suggestion and we have rewritten this section of the methods accordingly. We now use a template structure to simplify the comparisons between labs: the same template is used for each lab in each section (payload module assembly, implantation, and data acquisition).

      Line-307: why is a skull screw optional for grounding? What did you use for ground and reference if not a ground screw?

      We now specify in the manuscript that during head-fixed experiments, the animal’s headplate can be used for grounding; combined with the internal referencing provided by the Neuropixels probes, this yielded low-noise recordings (Implantation protocol, Methods).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Oleh et al. uses in vitro electrophysiology and compartmental modeling (via NEURON) to investigate the expression and function of HCN channels in mouse L2/3 pyramidal neurons. The authors conclude that L2/3 neurons have developmentally regulated HCN channels, the activation of which can be observed when subjected to large hyperpolarizations. They further conclude via blockade experiments that HCN channels in L2/3 neurons influence cellular excitability and pathway-specific EPSP kinetics, which can be neuromodulated. While the authors perform a wide range of slice physiology experiments, concrete evidence that L2/3 cells express functionally relevant HCN channels is limited. There are serious experimental design caveats and confounds that make drawing strong conclusions from the data difficult. Furthermore, the significance of the findings is generally unclear, given modest effect sizes and a lack of any functional relevance, either directly via in vivo experiments or indirectly via strong HCN-mediated changes in known operations/computations/functions of L2/3 neurons.

      Specific points:

      (1) The interpretability and impact of this manuscript are limited due to numerous methodological issues in experimental design, data collection, and analysis. The authors have not followed best practices in the field, and as such, much of the data is ambiguous and/or weak and does not support their interpretations (detailed below). Additionally, the authors fail to appropriately explain their rationale for many of their choices, making it difficult to understand why they did what they did. Furthermore, many important references appear to be missing, both in terms of contextualizing the work and in terms of approach/method. For example, the authors do not cite Kalmbach et al 2018, which performed a directly comparable set of experiments on HCN channels in L2/3 neurons of both humans and mice. This is an unacceptable omission. Additionally, the authors fail to cite prior literature regarding the specificity or lack thereof of Cs+ in blocking HCN. In describing a result, the authors state "In line with previous reports, we found that L2/3 PCs exhibited an unremarkable amount of sag at 'typical' current commands" but they then fail to cite the previous reports.

      We thank the reviewer for the thorough examination of our manuscript; however, we disagree with many of the raised concerns for several reasons, as detailed here:

      To address the lack of certain citations, we would like to emphasize that in the introduction section, we did initially focus on the several decades-long line of investigation into the HCN channel content of layer 2/3 pyramidal cells (L2/3 PCs), where there has undoubtedly been some controversy as to their functional contribution. We did not explicitly cite papers that claimed to find little or no HCN channels/sag, although this would be a significant list of publications from some excellent investigators, as the methods used may have differed from ours, leading to different interpretations. Simply stated, unless one was explicitly looking for HCN in L2/3 PCs, it might go unobserved. However, we have now addressed this more clearly in the revision:

      Just to take one example: in the publication mentioned by the reviewer (Kalmbach et al 2018), the investigators did not carry out voltage clamp or dynamic clamp recordings, as we did in our work here. Furthermore, the reported input resistance values in the aforementioned paper were far above other reports in mice (Routh et al. 2022, Brandalise et al 2022, Hedrick et al 2012; which were similar to our findings here), suggesting that recordings in Kalmbach were carried out at membrane potentials where HCN activation may be less available (Routh, Brager and Johnston 2022).

      Another reason for some mixed findings in the field is undoubtedly the small or nonexistent sag in L2/3 current clamp recordings (in mice). We also observed a very small sag, which can be explained as follows. The ‘sag’ potential is a biphasic voltage response emerging from a relatively fast passive membrane response and a slower Ih activation. In L2/3 PCs, hyperpolarization-activated currents are apparently faster than previously described, and are located proximally (Figure 2 & Figure 5). Therefore, their recruitment in mouse L2/3 PCs is on a similar timescale to the passive membrane response, resulting in a more monophasic response. We now include a fuller set of citations in the updated introduction section, to highlight the importance of HCN channels in L2/3 PCs in mice (and other species).

      The justification for using cesium (i.e., ‘best practices’) is detailed below.

      (2) A critical experimental concern in the manuscript is the reliance on cesium, a nonspecific blocker, to evaluate HCN channel function. Cesium blocks HCN channels but also acts at potassium channels (and possibly other channels as well). The authors do not acknowledge this or attempt to justify their use of Cs+ and do not cite prior work on this subject. They do not show control experiments demonstrating that the application of Cs+ in their preparation only affects Ih. Additionally, the authors write 1 mM cesium in the text but appear to use 2 mM in the figures. In later experiments, the authors switch to ZD7288, a more commonly used and generally accepted more specific blocker of HCN channels. However, they use a very high concentration, which is also known to produce off-target effects (see Chevaleyre and Castillo, 2002). To make robust conclusions, the authors should have used both blockers (at accepted/conservative concentrations) for all (or at least most) experiments. Using one blocker for some experiments and then another for different experiments is fraught with potential confounds.

      To address the concerns regarding the usage of cesium to block HCN channels, we would like to state that neither cesium nor ZD-7288 is without off-target effects. However, in our case the potential off-target effects of external cesium were deemed less impactful, especially concerning AP firing output experiments. Extracellular cesium has been widely accepted as a blocker of HCN channels (Lau et al. 2010, Wickenden et al. 2009, Rateau and Ropert 2005, Hemond et al. 2009, Yang et al. 2015, Matt et al. 2010). However, it is well known to act on potassium channels as well at higher concentrations, which has been demonstrated with intracellular and extracellular application (Puil et al. 1981, Fleidervish et al. 2008, Williams et al. 1991, 2008).

      Although we initially performed ‘internal’ control experiments to ensure the cesium concentration was unlikely to greatly block voltage gated K+ channels during our recordings, we recognize these were not included in the original manuscript. These are detailed as follows: during our recordings cesium had no significant effect on action potential halfwidth, ruling out substantial blocking of potassium channels, nor did it affect any other aspects of suprathreshold activity (now reported in results, page 4 - line 113). Furthermore, we observed similar effects on passive properties (resting membrane potential, input resistance) following ZD-7288 as with cesium, which we now also updated in our figures (Supplementary Figure 1). We did acknowledge that ZD-7288 is a widely accepted blocker of HCN, and for this reason we carried out some of our experiments using this pharmacological agent instead of cesium.

      On the other hand, ZD-7288 suffers from its own side effects, such as potential effects on sodium channels (Wu et al. 2012) and calcium channels (Sánchez-Alonso et al. 2008, Felix et al. 2003). As our aim was to provide functional evidence for the importance of HCN channels, we initially deemed these potential effects unacceptable in experiments where AP firing output (e.g., in cell-attached experiments) was measured. Nonetheless, in new experiments now included here, we found the effects of ZD and cesium on AP output were similar as shown in new Supplemental Figure 1.

      Many experiments were supported by complementary findings using external cesium and ZD-7288. For example, the effect of ZD-7288 on EPSPs was confirmed by similar synaptic stimulation experiments using cesium. This is important, as synaptic inputs of L2/3 PCs are modulated by both dendritic sodium (Ferrarese et al. 2018) and calcium channels (Landau 2022), therefore the application of ZD-7288 alone may have been difficult to interpret in isolation. We thank the reviewer for bringing up this important point.

      (3) A stronger case could be made that HCN is expressed in the somatic compartment of L2/3 cells if the authors had directly measured HCN-isolated currents with outside-out or nucleated patch recording (with appropriate leak subtraction and pharmacology). Whole-cell voltage-clamp in neurons with axons and/or dendrites does not work. It has been shown to produce erroneous results over and over again in the field due to well-known space clamp problems (see Rall, Spruston, Williams, etc.). The authors could have also included negative controls, such as recordings in neurons that do not express HCN or in HCN-knockout animals. Without these experiments, the authors draw a false equivalency between the effects of cesium and HCN channels, when the outcomes they describe could be driven simply by multiple other cesium-sensitive currents. Distortions are common in these preparations when attempting to study channels (see Williams and Womzy, J Neuro, 2011). In Fig 2h, cesium-sensitive currents look too large and fast to be from HCN currents alone given what the authors have shown in their earlier current clamp data. Furthermore, serious errors in leak subtraction appear to be visible in Supplementary Figure 1c. To claim that these conductances are solely from HCN may be misleading.

      We disagree with the argument that “Whole-cell voltage-clamp in neurons with axons and/or dendrites does not work”. Although this method is not without its confounds (i.e. space clamp), it is still a useful initial measure, as demonstrated countless times in the literature. However, the reviewer is correct that the best approach to establish the somatodendritic distribution of ion channels is by direct somatic and dendritic outside-out patches. Due to the small diameter of L2/3 PC dendrites, to our knowledge these experiments have not yet been carried out for any other ion channel either. Mapping this distribution electrophysiologically may be outside the scope of the current manuscript, but it was hard for us to ignore the sheer size of the Cs+-sensitive hyperpolarizing currents in whole-cell recordings. Thus, we will opt to report this data.

      Also, we should point out that space clamp-related errors manifest in the overestimation of frequency-dependent features, such as activation kinetics, and underestimation of steady-state current amplitudes. The activation time constant of our measured currents is somewhat faster than previously reported, reducing major concerns regarding space clamp errors. Furthermore, we simply do not understand what “too large… to be from HCN currents” means. Our voltage-clamp measured currents are similar to previously reported HCN currents (Meng et al. 2011, Li 2011, Zhao et al. 2019, Yu et al. 2004, Zhang et al. 2008, Spinelli et al. 2018, Craven et al. 2006, Ying et al. 2012, Biel et al. 2009).

      Furthermore, we should point out that our measured currents activated at hyperpolarized voltages, had the same voltage dependence as HCN currents, did not show inactivation, influenced both input resistance and resting membrane potential, and are blocked by low concentration extracellular cesium. Each of these features would point to HCN.

      (4) The authors present current-clamp traces with some sag, a primary indicator of HCN conductance, in Figure 2. However, they do not show example traces with cesium or ZD7288 blockade. Additionally, the normalization of current injected by cellular capacitance and the lack of reporting of input resistance or estimated cellular size makes it difficult to determine how much current is actually needed to observe the sag, which is important for assessing the functional relevance of these channels. The sag ratio in controls also varies significantly without explanation (Figure 6 vs Figure 7). Could this variability be a result of genetically defined subgroups within L2/3? For example, in humans, HCN expression in L2/3 varies from superficial and deep neurons. The authors do not make an effort to investigate this. Regardless of inconsistencies in either current injection or cell type, the sag ratio appears to be rather modest and similar to what has already been reported previously in other papers.

      We thank the reviewer for pointing out that our explanation for the modest sag ratio might not have been sufficient to properly understand why this measurement cannot be applied to layer 2/3 pyramidal cells. Briefly: sag potential emerges from a relatively (compared to Ih) fast passive membrane response and a slower HCN recruitment. The opposing polarity and different timescales of these two mechanisms result in a biphasic response called “sag” potential. However, if the timescale of these two mechanisms is similar, the voltage response is not predicted to be biphasic. We have shown that hyperpolarization-activated currents in our preparations are fast and proximal; therefore, they are recruited during the passive response (see Figure 2g). This means that although a substantial amount of HCN current is activated during hyperpolarization, its activation will not result in substantial sag. Therefore, sag ratio measurement is not necessarily applicable to approximate the HCN content of mouse L2/3 PCs. We would like to emphasize that sag ratio measurements are correct in the case of other cell types (i.e. L5 and CA1 PCs), and our aim is not to discredit the method, but rather to show that it cannot be applied similarly in the case of mouse L2/3 PCs.
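
The timescale argument above can be illustrated with a toy calculation. The sketch below is not the model used in the manuscript: it assumes the voltage response to a hyperpolarizing step is the sum of a passive exponential charging term and an opposing exponential term standing in for Ih recruitment, and the amplitudes and time constants are arbitrary illustrative values. The function name `sag_ratio` is hypothetical.

```python
import math


def sag_ratio(tau_m_ms, tau_h_ms, passive_mv=10.0, ih_mv=4.0,
              t_end_ms=1000.0, dt_ms=0.1):
    """Sag ratio, (peak - steady state) / peak, for a toy hyperpolarizing step.

    Assumed toy response (illustrative only):
        V(t) = -passive_mv * (1 - exp(-t / tau_m)) + ih_mv * (1 - exp(-t / tau_h))
    """
    n_steps = int(t_end_ms / dt_ms)
    peak = 0.0  # most negative deflection seen so far
    v = 0.0
    for i in range(1, n_steps + 1):
        t = i * dt_ms
        v = (-passive_mv * (1.0 - math.exp(-t / tau_m_ms))
             + ih_mv * (1.0 - math.exp(-t / tau_h_ms)))
        peak = min(peak, v)
    steady = v  # value at the end of the step
    return (peak - steady) / peak


# Slow opposing current relative to the membrane: biphasic response with visible sag.
print(f"slow Ih  (tau_m=15 ms, tau_h=100 ms): sag ratio = {sag_ratio(15.0, 100.0):.2f}")
# Opposing current as fast as the membrane: monophasic response, essentially no sag.
print(f"fast Ih  (tau_m=15 ms, tau_h=15 ms):  sag ratio = {sag_ratio(15.0, 15.0):.2f}")
```

In both cases the steady-state deflection is reduced by the same amount, so under these assumptions the opposing current shapes the response equally even when the sag ratio is near zero.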

      Our own measurements, similar to others in the literature, show that L2/3 PCs exhibit modest sag ratios; however, this does not mean that HCN is not relevant. Ih activation in L2/3 PCs does not manifest in a large sag potential but rather in a continuous distortion of steady-state responses (Figure 2b). The reviewer is correct that L2/3 PCs are non-homogenous; therefore, we sampled along the entire L2/3 axis. This yielded some potential variability in our results (i.e., passive properties); yet we did not observe any cells where hyperpolarization-activated/Cs+-sensitive currents could not be resolved. As structural variability of L2/3 cells does result in variability in cellular capacitance, we compensated for this variability by injecting cellular capacitance-normalized currents. Our measured cellular capacitances were in accordance with previously published values, in the range of 50-120 pF. Therefore, the injected currents were not outside frequently used values. Together, we would like to state that whether substantial sag potential is present or not, initial estimates of the HCN content for each L2/3 PC should be treated with caution.

      (5) In the later experiments with ZD7288, the authors measured EPSP half-width at greater distances from the soma. However, they use minimal stimulation to evoke EPSPs at increasingly far distances from the soma. Without controlling for amplitude, the authors cannot easily distinguish between attenuation and spread from dendritic filtering and additional activation and spread from HCN blockade. At a minimum, the authors should share the variability of EPSP amplitude versus the change in EPSP half-width and/or stimulation amplitudes by distance. In general, this kind of experiment yields much clearer results if a more precise local activation of synapses is used, such as dendritic current injection, glutamate uncaging, sucrose puff, or glutamate iontophoresis. There are recording quality concerns here as well: the cell pictured in Figure 3a does not have visible dendritic spines, and a substantial amount of membrane is visible in the recording pipette. These concerns also apply to the similar developmental experiment in 6f-h, where EPSP amplitude is not controlled, and therefore, attenuation and spread by distance cannot be effectively measured. The outcome, that L2/3 cells have dendritic properties that violate cable theory, seems implausible and is more likely a result of variable amplitude by proximity.

      To resolve this issue, we made a supplementary figure showing elicited amplitudes, which showed no significant distance dependence and minimal variability (new Supplementary Figure 6). We thank the reviewer for suggesting an amplitude-halfwidth comparison control, which is now included in the same figure. To address the issue of the non-visible spines, we would like to note that these images were taken at a magnification and laser power too low to resolve them. The presence of dendritic spines was confirmed in every recorded pyramidal cell observed using 2P microscopy at higher magnification.

      We would like to emphasize that although our recordings “seemingly” violated the cable theory, this is only true if we assume a completely passive condition. As shown in our manuscript, cable theory was not violated, as the presence of NMDA receptor boosting explained the observed ‘non-Rallian’ phenomenon.

      (6) Minimal stimulation used for experiments in Figures 3d-i and Figures 4g-h does not resolve the half-width measurement's sensitivity to dendritic filtering, nor does cesium blockade preclude only HCN channel involvement. Example traces should be shown for all conditions in 3h; the example traces shown here do not appear to even be from the same cell. These experiments should be paired (with and without cesium/ZD). The same problem appears in Figure 4, where it is not clear that the authors performed controls and drug conditions on the same cells. 4g also lacks a scale bar, so readers cannot determine how much these measurements are affected by filtering and evoked amplitude variability. Finally, if we are to believe that minimal stimulation is used to evoke responses of single axons with 50% fail rates, NMDA receptor activation should be minimal to begin with. If the authors wish to make this claim, they need to do more precise activation of NMDA-mediated EPSPs and examine the effects of ZD7288 on these responses in the same cell. As the data is presented, it is not possible to draw the conclusion that HCN boosts NMDA-mediated responses in L2/3 neurons.

      As stated in the figure legends, the control and drug application traces are from the same cell, both in Figure 3 and Figure 4, and the scale bar is not included as the amplitudes were normalized for clarity. We have addressed the effects of dendritic filtering above in answer (5), and cesium blockade above in answer (2). To reiterate, dendritic filtering alone cannot explain our observations, and cesium is often a better choice for blocking HCN channels compared to ZD-7288, which blocks sodium channels as well.

      When an excitatory synaptic signal arrives onto a pyramidal cell in typical conditions, neurotransmitter sensitive receptors transmit a synaptic current to the dendritic spine. This dendritic spine is electrically isolated by the high resistance of the spine neck and due to the small membrane surface of the spine, the synaptic current can elicit remarkably large voltage changes. These voltage changes can be large enough to depolarize the spine close to zero millivolts upon even single small inputs (Jayant et al. 2016). Therefore, to state that single inputs arriving to dendritic spines cannot be large enough to recruit NMDA receptor activation is incorrect. This is further exemplified by the substantial literature showing ‘miniature’ NMDA recruitment via stochastic vesicle release alone.

      (7) The quality of recordings included in the dataset has concerning variability: for example, resting membrane potentials vary by >15-20 mV and the AP threshold varies by 20 mV in controls. This is indicative of either a very wide range of genetically distinct cell types that the authors are ignoring or the inclusion of cells that are either unhealthy or have bad seals.

      Although we are aware of the diversity of L2/3 PCs, resolving further layer depth differences is outside the scope of our current manuscript. However, as shown in Kalmbech et al, resting membrane potential can greatly vary (>15-20 mV) in L2/3 PCs depending on distance from pia. We acknowledge that the variance in AP threshold is large and could be due to genetically distinct cell types.

      (8) The authors make no mention of blocking GABAergic signaling, so it must be assumed that it is intact for all experiments. Electrical stimulation can therefore evoke a mixture of excitatory and inhibitory responses, which may well synapse at very different locations, adding to interpretability and variability concerns.

      We thank the reviewer for pointing out our lack of detail regarding the GABAergic signaling blocker SR 95531. We did include this drug in our recordings of (50Hz stim.) signal summation, so GABAergic responses did not contaminate our recordings. We have now included this information in the results section (page 5) and the methods section (page 15).

      (9) The investigation of serotonergic interaction with HCN channels produces modest effect sizes and suffers the same problems as described above.

      We do not agree with the reviewer that a 50% drop in neuronal AP firing responses (Figure 7b) was a modest effect size. Thus, we opted to keep this data in the manuscript.

      (10) The computational modeling is not well described and is not biologically plausible. Persistent and transient K channels are missing. Values for other parameters are not listed. The model does not seem to follow cable theory, which, as described above, is not only implausible but is also not supported by the experimental findings.

      The model was downloaded from the Cell Type Database from the Allen Institute, with only minor modifications including the addition of dendritic HCN channels and NMDA receptors, which were varied along a wide parameter space to find a ‘best fit’ to our observations. These additions were necessary to recapitulate our experimental findings. We agree the model likely does not fully recapitulate all aspects of the dendrites, which, as we hope to convey in this manuscript, are not fully resolved in mouse L2/3 PCs. This is a previously published neuronal model and, despite its potential shortcomings, is one among a handful of open-source neuronal models of a fully reconstructed L2/3 PC.

      Reviewer #2 (Public Review):

      Summary:

      This paper by Olah et al. uncovers a previously unknown role of HCN channels in shaping synaptic inputs to L2/3 cortical neurons. The authors demonstrate using slice electrophysiology and computational modeling that, unlike layer 5 pyramidal neurons, L2/3 neurons have an enrichment of HCN channels in the proximal dendrites. This location provides a locus of neuromodulation for inputs onto the proximal dendrites from L4 without an influence on distal inputs from L1. The authors use pharmacology to demonstrate the effect of HCN channels on NMDA-mediated synaptic inputs from L4. The authors further demonstrate the developmental time course of HCN function in L2/3 pyramidal neurons. Taken together, this is a well-constructed investigation of HCN channel function and the consequences of these channels on synaptic integration in L2/3 pyramidal neurons.

      Strengths:

      The authors use careful, well-constrained experiments using multiple pharmacological agents to assess HCN channel contributions to synaptic integrations. The authors also use a voltage clamp to directly measure the current through HCN channels across developmental ages. The authors also provide supplemental data showing that their observation is consistent across multiple areas of the cerebral cortex.

      Weaknesses:

      The gradient of the HCN channel function is based almost exclusively on changes in EPSP width measured at the soma. While providing strong evidence for the presence of HCN current in L2/3 neurons, there are space clamp issues related to the use of somatic whole-cell voltage clamps that should be considered in the discussion.

      We thank the reviewer for pointing out our careful and well-constrained experiments and for making suggestions. The potential effects of space clamp errors are detailed in the extended explanations under Reviewer 1, Specific points (3).

      Reviewer #3 (Public Review):

      Summary:

      The authors study the function of HCN channels in L2/3 pyramidal neurons, employing somatic whole-cell recordings in acute slices of visual cortex in adult mice and a bevy of technically challenging techniques. Their primary claim is a non-uniform HCN distribution across the dendritic arbor with a greater density closer to the soma (roughly opposite of the gradient found in L5 PT-type neurons). The second major claim is that multiple sources of long-range excitatory input (cortical and thalamic) are differentially affected by the HCN distribution. They further describe an interesting interplay of NMDAR and HCN, serotonergic modulation of HCN, and compare HCN-related properties at 1, 2 and 6 weeks of age. Several results are supported by biophysical simulations.

      Strengths:

      The authors collected data from both male and female mice, at an age (6-10 weeks) that permits comparison with in vivo studies, in sufficient numbers for each condition, and they collected a good number of data points for almost all figure panels. This is all the more positive, considering the demanding nature of multi-electrode recording configurations and pipette-perfusion. The main strength of the study is the question and focus.

      Weaknesses:

      Unfortunately, in its present form, the main claims are not adequately supported by the experimental evidence: primarily because the evidence is indirect and circumstantial, but also because multiple unusual experimental choices (along with poor presentation of results) undermine the reader's confidence. Additionally, the authors overstate the novelty of certain results and fail to cite important related publications. Some of these weaknesses can be addressed by improved analysis and statistics, resolving inconsistent data across figures, reorganizing/improving figure panels, more complete methods, improved citations, and proofreading. In particular, given the emphasis on EPSPs, the primary data (for example EPSPs, overlaid conditions) should be shown much more.

      However, on the experimental side, addressing the reviewer's concerns would require a very substantial additional effort: direct measurement of HCN density at different points in the dendritic arbor and soma; the internal solution chosen here (K-gluconate) is reported to inhibit HCN; bath-applied cesium at the concentrations used blocks multiple potassium channels, i.e. is not selective for HCN (the fact that the more selective blocker ZD7288 was used in a subset of experiments makes the choice of Cs+ as the primary blocker all the more curious); pathway-specific synaptic stimulation, for example via optogenetic activation of specific long-range inputs, to complement / support / verify the layer-specific electrical stimulation.

      We thank the reviewer for their very careful examination of our manuscript and helpful suggestions. We addressed the concerns raised in the review and presented more raw traces in our figures. Although direct dendritic HCN mapping measurements are outside the scope of the current manuscript due to the morphological constraints presented by L2/3 PCs (which explains why no other full dendritic nonlinearity distribution has been described in L2/3 PCs with this method), we nonetheless supplemented our manuscript with additional experiments as suggested. For example, we included the excellent suggestion of pathway-specific optogenetic stimulation to further validate the disparate effect of HCN channels for distal and proximal inputs. We agree that ZD-7288 is a widely accepted blocker of HCN channels. However, the off-target effects on sodium channels may have significantly confounded our measurements of AP output using extracellular stimulation. Therefore, we chose low concentration cesium as the primary blocker for those experiments, but have now validated several other Cs<sup>+</sup>-based results with ZD-7288 as well.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I have some issues that need clarification or correction.

      (1) On page 3, line 90, the authors state "We found that bath application of Cs+ (1mM)..." but the methods and Figure 1 state "2mM Cs+". Please check and correct.

      Correct, typo corrected.

      (2) Related to Cs+ application, the methods state that "CsMeSO4 (2mM) was bath applied..." Is this correct? CsMeSO4 is typically used intracellularly while CsCl is used extracellularly. If so, please justify. If not, please correct.

      It is correct. The justification for not using CsCl selectively extracellularly is that introducing intracellular chloride ions can significantly alter basic biophysical properties, unrelated to the cesium effect. However, no similar distinction has been made for CsMeSO4, which would exclude the use of this drug extracellularly.

      (3) The authors normalize the current injections by cell capacitance (pA/pF). Was this done because there is a significant variance in cell morphology? A bit of justification for why the authors chose to normalize the current injection this way would help. If there is significant variation in cell capacitance across cells (or developmental ages), the authors could also include these data.

      Indeed, we chose to normalize current injection to cellular capacitance due to the markedly different morphology of deep and superficial L2/3 PCs. Deeper L2/3 PCs have a pronounced apical branch, closely resembling other pyramidal cell types such as L5 PCs, while superficial L2/3 PCs lack a thick main apical branch and instead are equipped with multiple, thinner apical dendrites. This morphological variation would yield an inherent bias in several of the reported measurements; therefore, we corrected for it by normalizing current injection to cellular capacitance, similar to our previous recent publications (Olah, Goettemoeller et al., 2022, Goettemoeller et al. 2024, Kumar et al. 2024).
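
      As a rough illustration of this normalization, the sketch below converts a capacitance-normalized step (pA/pF) into the absolute current delivered to a given cell. The function name and the example capacitances are hypothetical and are not taken from the manuscript.

```python
def injected_current_pa(density_pa_per_pf: float, capacitance_pf: float) -> float:
    """Absolute current (pA) for a capacitance-normalized step (pA/pF)."""
    if capacitance_pf <= 0:
        raise ValueError("capacitance must be positive")
    return density_pa_per_pf * capacitance_pf


# Hypothetical capacitances for a compact superficial cell and a larger deep cell.
for label, cm_pf in [("superficial", 70.0), ("deep", 130.0)]:
    print(label, injected_current_pa(-6.0, cm_pf), "pA")
```

      Scaling the step by capacitance means that a larger cell receives proportionally more current, so cells of different size are driven with a comparable current per unit of membrane.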

      (4) On page 15, line 445, the section heading is "PV cell NEURON modeling". Is this a typo? The models are of L2/3 pyramidal neurons, correct?  

      Correct, typo corrected.

      (5) Figures 3F and 3I are plots of the voltage integral for different inputs before and after Cs+. The y-axis label units are "pA*ms". This should be "mV*ms" for a voltage integral.  

      Correct, typo corrected.

      (6) On page 9, line 273, the text reads "Voltage clamp experiments revealed that the rectification of steady-state voltage responses to hyperpolarizing current injection was amplified with 5-CT (Fig. 7c)". Both the text and Figure 7C describe current clamp, not voltage clamp, recordings. Please check and correct.

      Correct, typo corrected.

      (7) Figure 2i looks to be a normalized conductance vs voltage (i.e. activation) plot. The y-axis shows 0-1 but the units are in nS. Is that a coincidence or an error?

      Correct, typo corrected.

      Reviewer #3 (Recommendations For The Authors):

      This is your paper. My comments are my own opinion, I don't expect you to agree or to respond. But I hope that what I wrote below will help you to understand my perspective.

      Please pardon my directness (and sheer volume) in this section - I have a lot of notes/thoughts and hope you may find some of them helpful. My high-level comments are unfortunately rather critical, and in (small) part that is because I encountered too many errors/typos/ambiguities in figures, legend, and text. I expect many would be caught with good proofreading, but uncorrected caused confusion on my part, or an inability to interpret your figures with confidence, given some ambiguity.

      The paper reads a bit like patchwork - likely a result of many "helpful" reviewers who came before me. Consider starting with and focusing on the synaptic findings, expanding the number of figures and panels dedicated to that, showing example traces for all conditions, and giving yourself the space to portray these complex experiments and results. While I'm not a fan of a large number of supplemental figures, I feel you could move the "extra" results to the supplementals to improve the focus and get right to the meat of it.

      For me, the main concern is that the evidence you present for the non-uniform HCN distribution is rather indirect. Ideally, I'd like to see patch recordings from various dendritic locations (as others have done in rats, at least; I'm not sure if L2/3 mice have had such conductance density measurements made in basal and apical dendrites). Otherwise, perhaps optical mapping, either functional or via staining. I also mention some concerns about the choice of internal and cesium. More generally, I want to see more primary data (traces), in particular for the big synaptic findings (non-uniform, L1-vs-L4 differences, NMDAR).

      We thank the reviewer for the helpful suggestions. Indeed, direct patch clamp recording is widely considered to be the best method to identify dendritic ion channel distribution; however, we chose an in silico approach instead, for several reasons. Undoubtedly, one of the main reasons to omit direct dendritic recordings was that, due to the uniquely narrow apical dendrites, this method is extremely challenging, with no previous examples in the literature where isolated dendritic outside-out patch recordings were achieved from this cell type. However, there are theoretical considerations as well. In primates, it has been demonstrated that HCN1 channels are concentrated on dendritic spines (Datta et al., 2023); therefore, direct outside-out recordings are not adequate in these circumstances. In future experiments we could directly target L2/3 PC dendrites for outside-out recordings in order to resolve dendritic nonlinearity distribution, although a cell-attached methodology may be better suited because the HCN biophysical properties are closely regulated by intracellular signaling pathways.

      The introduction and Figures 1 and 2 are not so interesting and not entirely accurate: L2/3 do not have "abundant" HCN, nor is there an actual controversy about whether they have HCN. It's been clear (published) for years that they have about the same as all other non-PT neocortical pyramidal neurons (see e.g. Larkum 2007; Sheets 2011). Your own Figure 1A has a logarithmic scale and shows L2/3 as having the lowest expression (?) of all pyramidals and roughly 10x lower than L5 PT, but the text says "comparable", which is misleading.

      We thank the reviewer for this comment. Although there are sporadic reports in the literature about the HCN content of L2/3 PCs, most of these publications arrive at the same conclusion from the negligible sag potential (as in the mentioned Larkum et al., 2007 publication); namely, that L2/3 PCs do not contain a significant amount of HCN channels. We have shown with voltage and current clamp recordings that this assumption is false, as sag potential is not a reliable indicator of HCN content in L2/3 PCs. With the term “controversial” we aimed to highlight the different conclusions of functional investigations (e.g. Sheets et al., 2011) and sag potential recordings (e.g. Larkum et al., 2007), regarding the importance of HCN channels in L2/3 PCs.

      Non-uniform HCN with distal lower density has already been published for a (rare) pyramidal neuron in CA1 (Bullis 2007), similar to what you found in L2/3, and different from the main CA1 population.

      We thank the reviewer for this suggestion. We have now included the mentioned citation in the introduction section (page 3).

      Express sag as a ratio or percentage, consistently. Figure out why in Figure 7 the average sag ratio is 0.02 while in Fig. S1 it is 0.07 (for V1) - that is a massive difference.

      The calculation of sag ratio is consistent across the manuscript (at -6 pA/pF), except for experiments depicted in Fig. 7, where sag ratio was calculated from -2 pA/pF steps. Explanation below:

      Sag should be measured at a common membrane potential, with each neuron receiving a current pulse appropriate to reach that potential. Your approach of capacitance-based may allow for the same, but it is not clear which responses are used to calculate a single sag value per cell (as in Figure 2d).

      Thank you, we have now included this information in the methods section. Sag potential was measured at the -6 pA/pF step peak voltage, except for Fig. 7 as noted above. We have now included this discrepancy detail in the methods section (page 14). These recordings in Fig. 7 took significantly longer than any other recording in the manuscript, as it took considerable time to reach a steady-state response from 5-CT application. -6 pA/pF is a current injection in the range of 400-800 pA, which proved too severe for continued application in cells after more than an hour of recording. Accordingly, we decided to lower the hyperpolarizing current step in these recordings. The absolute value of sag is thus different in Fig. 7, but nonetheless the 5-CT effect was still significant. Notably, we probably wouldn’t have noticed the small sag in L2/3 here (and thus the entire study), save for the fact that we looked at -6 pA/pF to begin with.
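
      For readers unfamiliar with the measurement, the sketch below shows one common way to express sag as a unitless ratio from a single hyperpolarizing step. The definition used here (rebound from the peak hyperpolarization to steady state, divided by the peak deflection from baseline) is an assumption for illustration. The manuscript's exact formula is not given in this response, and the voltages are hypothetical.

```python
def sag_ratio(baseline_mv: float, peak_mv: float, steady_state_mv: float) -> float:
    """Sag as a fraction of the peak hyperpolarizing deflection.

    Assumed convention: (V_steady - V_peak) / (V_baseline - V_peak),
    where V_peak is the most negative voltage reached during the step.
    """
    deflection = baseline_mv - peak_mv
    if deflection <= 0:
        raise ValueError("peak must be more negative than baseline")
    return (steady_state_mv - peak_mv) / deflection


# Hypothetical step: rest at -70 mV, peak at -100 mV, relaxing to -98 mV.
print(round(sag_ratio(-70.0, -100.0, -98.0), 3))
```

      Because the ratio depends on how far the step hyperpolarizes the cell, values obtained at -2 pA/pF and at -6 pA/pF are not directly comparable, which is consistent with the different absolute sag values noted for Fig. 7.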

      In a paper focused on HCN, I would have liked to see resonance curves in the passive characterization.

      We thank the reviewer for the suggestion. Resonance curves can indeed provide useful insights into the impact of HCN on a cell’s physiological behavior, however, these experiments are outside the scope of our current manuscript as without in vivo recordings, resonance curves do not contribute to the manuscript in our opinion.

      How did you identify L2/3? Did you target cells in L2 or L3 or in the middle, or did you sample across the full layer width for each condition? A quantitative diagram showing where you patched (soma) and where you stimulated (L1, L4) with actual measurements, would be helpful (supplemental perhaps). You mention in the text that some L2/3 don't have a tuft, suggesting some variability in morphology - some info on this would be useful, i.e. since you did fill at least some of the neurons (eg 3A), how similar/different are the dendritic arbors?

      We sampled the entire L2/3 region during our recordings. It has been published that deep and superficial L2/3  PCs are markedly different in their morphology, and a recent publication (Brandelise et al. 2023) has even separated these two subpopulations to broad-tufted and slender tufted pyramidal cells, which receive distinct subcortical inputs. Although this differentiation opens exciting avenues for future research, examining potential layer gradients in our dataset would warrant significantly higher sample numbers and is currently out of the scope of our manuscript.

      Distal vs proximal: this could use more clarification, considering how central it is to your results. What about a synapse on a basal dendrite, but 150 or 200 um from the soma, is that considered proximal? Is the distance to the soma you report measured along the 3D dendrite, along the 2D dendrite, as a straight line to the soma, or just relative to some layers or cortical markers? (I apologize if I missed this).

      We thank the reviewer for pointing out the missing description in the results section. We have amended this oversight (p15).  Furthermore, although deeper L3 PCs have characteristic apical and basal dendritic branches, when recordings were made from more superficial L2 cells, a large portion of their dendrites extended radially, which made their classification ambiguous. Therefore, we did not use “apical” and “basal” terminology in the paper to avoid confusion. Distances were measured along the 3D reconstructed surface of the recovered pyramidal cells. This information is now included in the methods.

      Line 445, "PV cell NEURON modeling" ... hmm. Everyone re-uses methods sections to some degree, but this is not confidence-inspiring, and also not from a proofreading perspective.

      We have corrected the typo.

      It seems that you constructed a new HCN NEURON mechanism when several have been published/reviewed already. Please explain your reasons or at least comment on the differences.

      There are slight differences in our model compared to previously published models. Nevertheless, we took a previously published HCN model as a base (Gasparini et al, 2004), and created our own model to fit our whole-cell voltage clamp recordings.

      Bath-applied Cs+ can change synaptic transmission (in the hippocampus; Chevaleyre 2002). But also ZD7288 has some such effects. Also, see (Harris 1995) for a Cs+ and ZD7288 comparison. As well as (Harris 1994) for more Cs+ side-effects (it broadens APs, etc). Bath-applied blockers may affect both long-range and local synapses in your recordings, via K-channels or perhaps presynaptic HCN (though I am aware of your Fig. 1e). Since you can do intracellular perfusion, you could apply ZD7288 postsynaptically (Sheets 2011), an elegant solution.

      We thank the reviewer for the suggestion. We were aware of the potential presynaptic effects of cesium (i.e., presynaptic Kv or other channel effects) and did measure PPR after cesium application (Fig. 1h), noting no effect. At Cs<sup>+</sup> concentrations used here, we now also include new data in the results showing no effect on somatically recorded AP waveform (i.e., representative of a Kv channel effect). As stated earlier for reviewer 1, we now performed additional experiments using either cesium or ZD-7288 for comparison (e.g., see updated Fig. 1; Supplementary Figure 1; Fig. 3b-e). Intracellular ZD re-perfusion is an elegant solution which we will absolutely consider in future experiments.

      K-Gluconate is reported to inhibit Ih (Velumian 1997), consider at least some control experiments with a different internal for the main synaptic finding - maybe you'll find no big change ...

      We thank the reviewer for the suggestion. Although K-Gluconate can inhibit HCN current, this intracellular solution is often used in the literature to measure this current (Huang & Trussel 2014). We have chosen this intracellular solution to improve recording stability.

      (Biel 2009) is a very comprehensive HCN review, you may find it useful.

      We thank the reviewer for bringing this to our attention, we have now included the citation in the introduction.

      "Hidden" in your title seems too much.

      We changed the title to more accurately describe our findings and removed ‘hidden’.

      While I'm glad you didn't record at room temperature, the choice of 30C seems a bit unfortunate - if you go to the trouble to heat the bath, why not at least 34C, which is reasonably standard as an approximation for physiological temperature?

      We thank the reviewer for pointing this out. The choice of 30C was made to approach physiological temperature levels while preserving the slices for extended amounts of time, which is a standard approach. Future experiments in vivo should be performed to further understand the naturalistic relevance at ~37C.

      Line 506: do you mean "Hz" here? It's not a frequency, is it? I think it's a unitless ratio?

      Correct, we have amended the typo.

      Line 95: you have not shown that HCN is "essential" for "excess" AP firing.

      We have corrected the phrasing, we agree.

      Fig. 2b,c: is this data from a single example neuron, maybe the same neuron as in 2a? Or from all recorded neurons pooled?

      The data is from several recorded cells pooled.

      Fig. 3 (important figure):

      Why did you not use a paired test for panels e and f? You have the same number of neurons for each condition and the expectation is that you record each neuron in control and then in cesium condition, which would be a paired comparison. Or did you record only 1 condition per neuron?

      This figure presents your main finding (in my opinion). You should show examples of the synaptic responses, i.e. raw traces, for each condition and panel, and overlaid in such a way that the reader can immediately see the relevant comparison - it's worth the space it requires.

      We thank the reviewer for the suggestions. Traces are only overlaid in the paper when they come from the same cell. For Fig. 3d-i, EPSPs in every neuron were evoked in 2-3 different locations (i.e., 1-2 ‘L4’ locations for Type-I and Type-II synapses, and one ‘L1’ location in each) with the same stimulation pipette and one pharmacological condition per cell. Therefore, two-sample t-tests were used, since the control and cesium conditions came from separate cells (i.e., separate observations). This was necessary, as we can never assume that the stimulating electrode can return to the same synapse after moving it. We were not comfortable with showing overlaid traces from different cells; however, we did show representative traces from control and the Cs<sup>+</sup> conditions in Fig. 3h. Complementary ZD-7288 experiments can be found in panels b and c, where we did perform within-cell pharmacology (and thus used paired t-tests) from one stimulation area per cell. We hope these complementary experiments increase overall confidence, as neither pharmacological approach is 100% without off-target effects. We have now also included more overlaid traces where appropriate (i.e., Fig. 3b, and in the new Fig. 3k experiments using within-cell pharmacology comparisons). We do realize these complementary approaches could cause confusion to the reader, and have now done our best to make the slightly different approaches in this Figure clearer in the results section.
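
      The distinction between the two designs can be sketched as follows. The paired statistic applies when control and drug values come from the same cells; the two-sample statistic applies when they come from separate cells. The two-sample form shown is Welch's unequal-variance version, chosen here for illustration; the response does not state which variant was used. All numbers are hypothetical half-widths in ms.

```python
import math
import statistics


def paired_t(control, drug):
    """t statistic for within-cell (paired) data; degrees of freedom = n - 1."""
    if len(control) != len(drug):
        raise ValueError("paired data need equal lengths")
    diffs = [d - c for c, d in zip(control, drug)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def welch_t(group_a, group_b):
    """t statistic for independent groups (Welch's unequal-variance form)."""
    var_a = statistics.variance(group_a) / len(group_a)
    var_b = statistics.variance(group_b) / len(group_b)
    return (statistics.mean(group_b) - statistics.mean(group_a)) / math.sqrt(var_a + var_b)


# Hypothetical EPSP half-widths (ms).
control = [10.0, 12.0, 14.0, 16.0]
drug = [12.0, 15.0, 16.0, 19.0]

print("treated as paired (same cells):", paired_t(control, drug))
print("treated as independent (separate cells):", welch_t(control, drug))
```

      With these made-up numbers the paired statistic is much larger, because pairing removes cell-to-cell variability. That gain is only legitimate when each pair truly comes from one cell, which is why the separate-cell Cs<sup>+</sup> data call for a two-sample test.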

      Consider repeating at least some of these critical experiments with ZD7288 instead of Cs+ (and not K-gluc), or even with ZD7288 pipette perfusion, if it's technically feasible here.

      We thank the reviewer for the suggestions. Although many of our recordings using Cs<sup>+</sup> already had complementary experiments (such as synaptic experiments Figure 3e vs Figure 3b), we recognize the need to extend the manuscript with more ZD-7288 experiments. We have now extended Figure 1 with three panels (Figure 1 c,d,e), which recapitulates a fundamental finding, the change in overall excitability upon HCN channel blockade, using ZD-7288 as well.

      Fig. 3a, why show a schematic (and weirdly scaled) stimulating electrode? Don't you have a BF photo showing the actual stimulating electrode, which you could trace to scale or overlay? Could you use this panel to indicate what counts as "distal" and what as "proximal", visually?

      The stimulating electrode was unfortunately not filled with fluorescent materials; therefore, it was not captured during the z-stack.

      Fig. 3b: is the y-axis labeled correctly? A "100% change" would mean a doubling, but based on the data points here I think y=100% means "no change"?

      The scale is labeled correctly, 100% means doubling.

      Fig. 3b, c: again, show traces representing distal and proximal, not just one example (without telling us how far it was). And use those traces to illustrate the half-width measurement, which may be non-trivial.

      We have extended Figure 3b with an inset showing the effect of ZD-7288 on a proximal stimulating site. The legend now includes additional information indicating a stimulating location 28 µm away from the soma in control conditions (black trace) and upon ZD-7288 application (green trace).
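
      Since the reviewer notes that the half-width measurement may be non-trivial, the sketch below shows one simple way to compute it: the full width of the EPSP at half of its peak amplitude, with linear interpolation between samples. This is a generic illustration that assumes a single-peaked, baseline-subtracted trace. It is not the authors' analysis code, and the function name and parameters are hypothetical.

```python
def half_width_ms(trace_mv, dt_ms, baseline_mv=0.0):
    """Full width at half-maximal amplitude of a single-peaked depolarizing trace."""
    rel = [v - baseline_mv for v in trace_mv]
    peak_idx = max(range(len(rel)), key=rel.__getitem__)
    half = rel[peak_idx] / 2.0
    if half <= 0:
        raise ValueError("no depolarizing peak above baseline")

    # Walk left from the peak to the first sample at or below half amplitude.
    i = peak_idx
    while i > 0 and rel[i] > half:
        i -= 1
    # Walk right from the peak likewise.
    j = peak_idx
    while j < len(rel) - 1 and rel[j] > half:
        j += 1
    if rel[i] > half or rel[j] > half:
        raise ValueError("trace does not return to half amplitude")

    # Linear interpolation between the samples bracketing each crossing.
    t_rise = i + (half - rel[i]) / (rel[i + 1] - rel[i])
    t_fall = j - (half - rel[j]) / (rel[j - 1] - rel[j])
    return (t_fall - t_rise) * dt_ms


# Hypothetical triangular "EPSP" sampled every 0.5 ms.
print(half_width_ms([0, 1, 2, 3, 4, 3, 2, 1, 0], dt_ms=0.5))
```

      On real recordings, noise near the half-amplitude level and the choice of baseline window can shift the crossing points, so some smoothing or averaging of sweeps before measurement is usually needed.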

      Line 543, 549: it seems you swapped labels "h" and "i"?

      Typo corrected.

      Fig. 4b: to me, MK-801 only *partially* blocks amplification, but in the text L198 you write "abolish".

      We thank the reviewer for pointing this out. Indeed, there are several other subthreshold mechanisms that are still intact after pipette perfusion, which can cause amplification. We have now clarified this in the text (p7).

      Fig. 4e,f: what is the message? Uniform NMDAR? The red asterisk in (e) is at a proximal/distal ratio of roughly 1. I don't understand the meaning of the asterisk (the legend is too basic) and I'm surprised to see a ratio of 1 as the best fit, and also that the red asterisk is at a dendritic distance of 0 um in (f). This could use more explanation (if you feel it's relevant).

      We thank the reviewer for pointing this out. We have now included a better explanation in the results and figure legend. We have also updated the figure to make it clearer and added model traces in Fig. 4f, which correspond to example data from slices in Fig. 4g (both green). The graph suggests a nonuniform, proximally abundant NMDA distribution. The color coding corresponds to the proximal EPSP halfwidth divided by the distal EPSP halfwidth. It is true that the dendritic distance ‘center’ was best-fit very close to the soma, but also note that the dispersion (distribution) half-width was >150 µm, so there is quite a significant dendritic spread despite the proximal bias prediction. Based on this model, there is likely NMDA spread throughout the entire dendrite, but biased proximally. Naturally, future work will need to map this at the spine level, so this is currently an oversimplification. Nonetheless, a proximal NMDA bias was necessary to recapitulate findings from Fig. 3, and additional slice recordings in Fig. 4 were consistent with this interpretation.

      Fig. 4g: I feel your choice of which traces to overlay is focusing on the wrong question. As the reader, what I want to see here is an overlay of all 4 conditions for one pathway. If this is a sequential recording in a single cell (Cs, Cs+MK801, wash out Cs, MK801), then the overlay would be ideal and need not be scaled. Otherwise, you can scale it. But the L1/L4 comparison does not seem appropriate to me. I find myself trying to imagine what all the dark lines would look like overlaid, and all the light lines overlaid separately. Also, the time axis is missing from this panel. Consider a subtraction of traces (if appropriate).

      In these recordings, all EPSPs were measured using a stimulating electrode that was moved between L1 and L4 (only once, to keep the exact input consistent) to measure the different inputs in a single neuron. In separate sets of experiments, the same method was used but in the presence of Cs<sup>+</sup>, Cs<sup>+</sup> + MK-801, or MK-801 alone. This was the most controlled method in our hands for this type of approach, as drug washouts were either impractical or not possible. Overlaying four traces would have presented a more cluttered image, and sequential four-condition recordings in a single cell were not actually performed experimentally. As our aim was to resolve the proximal-distal halfwidth relationship, we deemed the within-cell L1 vs. L4 comparison appropriate. We have nonetheless added model traces in Fig. 4f, which correspond to example data from slices in Fig. 4g (both green). The bar graphs should also serve to illustrate the input-specific relationship, i.e., that the only time the L1 and L4 EPSP relationship was inverted was in the presence of Cs<sup>+</sup> (green bars) and that this effect was occluded with simultaneous MK-801 in the pipette (red bars).

      Line 579: should "hyperpolarized" be depolarized?

      Corrected

      Fig. 5a: it looks like the HCN density is high in the most basal dendrites (black curve above), then drops towards the soma, then rises again in the apicals (red curve). Is that indeed how the density was modeled? If so, this is completely at odds with the impression I received from reading your text and experimental data - there, "proximal" seems to mean where the L4 axons are, and "distal" seems to mean where the L1 axons are, in other words, high HCN towards the pia and low HCN towards the white matter. But this diagram suggests a biphasic hill-valley-hill distribution of HCN (meaning there is a second "distal" region below the soma). In that case, would the laterally-distant basal dendrites also be considered distal? How does the model implement the distribution - is it 1D, 2D or 3D? As you can probably tell, this figure raised more questions for me and made me wonder why I don't have a better understanding yet of your definitions.

      We thank the reviewer for pointing this out. We agree that our initial cartoon of the parameter fitting procedure was not accurate and should have depicted just a single ‘curve’. We have now simplified it to better demonstrate what the model is testing, and have also made the terms more consistent and accurate. There is no ‘second’ region in the model. We hope the revised panel illustrates this better. We also edited the legend to be clearer. Because the model description in Fig. 4d suffered from similar shortcomings, we modified it accordingly, along with its figure legend.

      Fig. 5b: why is the best fit at a proximal/distal ratio of 1, yet sigma is 50 um?

      Proximal/distal bias on this figure was fitted to 0.985 (prox/distal ratio) as we modeled control conditions, with intact NMDA and HCN channels, which closely approximated the control recording comparisons.

      Fig. 6h, Line 662: "vs CsMeSO4 ... for putative LGN events" The panel shows proximal vs distal, not control vs Cs+. What's going on here?

      Typo corrected.

      Fig. 7e: the ctrl sag ratio here averages 0.02, while in Fig. S1 the average (for V1 and others) is about 0.07.

      Please refer to our answer to the previous question regarding sag ratio measurements. Briefly, recordings with 5-CT application were made using a less severe, -2 pA/pF current injection to test sag responses. This more modest hyperpolarization activated fewer HCN channels; therefore the sag ratio is lower than in the previously reported datapoints.

      We have included this explanation in the methods section (page 14)

      Now here you are using a paired test for this pharmacology, but you didn't previously (see my earlier comments/questions).

      Paired t-tests were used for these experiments because the control and test datapoints came from the same cell. Cells were recorded in control conditions and again after drug application.
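
      As a minimal sketch of why a paired test suits this within-cell design, the statistic can be computed directly from the per-cell differences. The function name `paired_t` and the sag ratio values below are hypothetical and for illustration only; they are not data from the manuscript.

```python
import math
import statistics


def paired_t(control, treated):
    """Return (t statistic, degrees of freedom) for a paired t-test."""
    if len(control) != len(treated):
        raise ValueError("paired samples must have equal length")
    if len(control) < 2:
        raise ValueError("need at least two pairs")
    # Work on within-cell differences, so between-cell variability cancels out.
    diffs = [t - c for c, t in zip(control, treated)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical sag ratios for four cells, before and after drug application.
control = [0.020, 0.030, 0.025, 0.015]
treated = [0.010, 0.018, 0.016, 0.009]
t_stat, dof = paired_t(control, treated)
```

      The statistic would then be compared against a t distribution with `dof` degrees of freedom; that step is left out here to keep the sketch dependency-free.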

      Line 137: single-axon activation: but cortical axons make multi-synaptic contacts, at least for certain types of pre- and post-synaptic neurons, and (e.g. in L5-L5 pairs) those contacts can be distributed across the entire dendritic arbor. In other words, it's possible that when you stimulate in L1, you activate local axons, and the signal could then propagate to multiple synaptic contact locations, some being distal and some proximal. Maybe you have reasons to believe you're able to avoid this?

      We thank the reviewer for this question. Cortical axons often make distributed contacts; however, bottom-up and top-down pathways innervating L2/3 PCs are at least somewhat restricted to L2/3/L4 and L1, respectively (Shen et al. 2022, Sermet et al. 2019). Therefore, given the lack of evidence suggesting a heavily mixed topographical distribution for top-down and bottom-up inputs, we have reason to believe that L1 stimulation will mainly recruit distal inputs, while L4 stimulation will mainly excite proximal dendritic regions. The resolution of our experiments was also improved by the minimal stimulation and visual guidance (subset of experiments) of the stimulation. Furthermore, we have now performed new optogenetic experiments stimulating LGN and LM axons, which have been anatomically defined previously as biased to deeper layers and L1, respectively (Fig. 3j-l), with cesium effects analogous to those in our local electrical stimulation experiments. Future work using varying optogenetic stimulation parameters will expand on this.

      L140: "previous reports" ==> citation needed.

      We have inserted the citation needed.

      L149: "arriving to layer 1"; but I think earlier you noted that some or many L2/3 neurons lack a dendritic tuft; do they all nevertheless have dendrites in L1? Note that cortico-cortical long-range axons still need to pass through all cortical layers on their way up to L1.

      We thank the reviewer for the question. Although the more superficial L2/3 PCs lack a distinct apical tuft, their dendrites reach the pia similarly to those of deeper L2/3 PCs. All of our recorded and post-hoc recovered cells had dendrites in L1, except in cases where they were clearly cut during the slicing procedure; those cells were excluded from the study.

      When you write "L4 axons" or "L4 inputs", do you specifically mean long-range thalamic axons? Or axons from local L4 neurons? What about axons in L4 that originate from L5 pyramidal neurons?

      In the case of ‘L4’ axons, we cannot disambiguate these inputs a priori, as they are both part of the bottom-up pathway and are possibly experimentally indistinguishable. Even with restricted opto LGN stimulation, disynaptic inputs via L4 PCs cannot be completely ruled out under our conditions. On the other hand, the probability of L5 PC axons terminating on L2/3 PCs is exceedingly low (single reported connection out of 1145 potential connections; Hage et al. 2022). We did find two clearly different synaptic subpopulations in L4 (Supp. Fig 3), which it was tempting to classify as one or the other. However, we felt there was not enough evidence in the literature or in our additional optogenetic experiments to classify the source of these different L4 inputs. Thus, we have designated them Type-I or Type-II for now.

      Do you inject more holding current to compensate for the resting membrane potential when Cs+ or ZD7288 is in the bath?

      We thank the reviewer for the question. We did not inject a compensatory current, as we wanted to investigate the dual, physiologically relevant action of HCN channels (George et al. 2009).

      I'd like to see distributions (histograms) of L4 and L1 EPSP amplitudes, under control conditions and ideally also under HCN block.

      We have now extended the manuscript with a supplementary figure (Supplementary Figure 6) to show that EPSP peak was not distance dependent in control conditions, and there was no relationship between peak and halfwidth in our dataset.

      Line 186, custom pipette perfusion: why not use this for internal ZD7288, to make it cell-specific?

      We thank the reviewer for the question, this is a good point. In future work we will consider this when applicable. It is certainly a way to control for bath application confounds in many ways.

      L205: "recapitulate our experimental findings" - which findings do you mean? I think a bit of explanation/referencing would help.

      Corrected.

      Line 210: L4-evoked were narrower than L1-evoked: is this not expected based on filtering?

      We thank the reviewer for pointing this out, the word “Intriguingly” has been omitted.

      Line 231 and 235: "in L5 PCs" should be restricted to L5 PT-type PCs.

      We have corrected this throughout the manuscript.

      Neuromodulation, Fig. 7, L263-282: the neuromodulation finding is interesting. However, a bit like the developmental figure, it feels "tacked on" and the transition feels a bit awkward. I think you may want to discuss/cite more of the existing literature on neuromodulatory interactions with HCN (not just L2/3). Most importantly, what I feel is missing is a connection to your main finding, namely L1 and L4 inputs. Does serotonergic neuromodulation put L1 and L4 back on equal footing, or does it exaggerate the differences?

      We thank the reviewer for the question. We agree with the reviewer that Figure 7 does not give a complete picture about how the adult brain can capitalize on this channel distribution, as our intention was to show that HCN channels are not a stationary feature of L2/3 PC, but a feature which can be regulated developmentally and even in the adult brain via neuromodulation. In other words, the subthreshold NMDA boosting we observed can be gated by HCN, depending on developmental stage and/or neuromodulatory state of the system. We have now added some brief language to better introduce the transition and its relevance to the current study in the results (p8), and discussed the implications in the discussion section of the original manuscript.

      General comment: different types/sources of synapses may have different EPSP kinetics. I feel this is not mentioned/discussed adequately, considering your emphasis on EPSPs/HCN.

      See points above on input-specific synaptic diversity.

      Line 319/320: enriched distal HCN is found in L5 PT-type, not in all L5 PCs.

      Corrected

      L320: CA1 reportedly has a subset of pyramidal neurons that have higher proximal HCN than distal (I gave the citation above). In light of that, I think "unprecedented" is an overstatement.

      Corrected.

      Methods:

      L367: What form of anesthesia was used?

      Amended.

      Which brain areas, and how?

      Amended.

      Why did you first hold slices at 34C, but during recording hold at 30C?

      We held the slices at 34C to accelerate the degradation of superficial damaged parts of the slice, which is in line with currently used acute slice preparation methodologies, regardless of the subsequent recording temperature.

      Pipette resistance/tip size?

      Amended.

      Cell-attached recordings (L385): provide details of recordings. What was the command potential (fixed value, or did you adjust it per neuron by some criteria)?

      Amended.

      What type of stimulating electrode did you use? If glass, what solution is inside, and what tip size?

      We thank the reviewer for pointing these out, the specific points were added to the methods section.

      L392/393: you adjusted the holding (bias) current to sit at -80 mV. What were the range and max values of holding current? Was -80 mV the "raw" potential, or did it account for liquid junction? If you did not account for liquid junction potential, then would -80 in your hands effectively be between -95 and -90 mV? That seems unusually hyperpolarized.

      All cells were held with bias holding currents between -50 pA and 150 pA. To be clear, as mentioned below, we did not change the bias current after any drug applications. We did not correct for liquid junction potential, and cells were ‘held’ with bias current at -80 mV during our recordings 1) because this value was apparently close to the RMP (i.e. little bias current was needed at this voltage on average) (Fig. 2e) and 2) to keep conditions consistent across recordings. The uncorrected -80 mV is in the range of previously reported membrane potential values both in vivo and in vitro (Svoboda et al. 1999, Oswald et al. 2008, Luo et al. 2017), which found the (corrected) RMP to be below -80 mV. Naturally, this will not reflect every in vivo condition completely, and further investigation using naturalistic conditions is warranted in the future.

      Did you adjust the bias current during/after pharmacology?

      Bias current was not adjusted in order to resolve the effect on resting membrane potential.

      L398: sag calculation could use better explanation: how did you combine/analyze multiple steps from a single neuron when calculating sag? Did you choose one level (how) or did you average across step sizes or ...?

      Sag ratio was measured at the -6 pA/pF current step, except for one set of experiments in Fig. 7. The Methods section was amended.
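
      For illustration, a capacitance-normalized step and a sag ratio could be computed as in the sketch below. The sag formula here is one commonly used definition and is an assumption, not necessarily the one in our Methods; the function names and the example cell values (50 pF, -80 mV baseline) are likewise hypothetical.

```python
def step_current_pa(density_pa_per_pf, capacitance_pf):
    """Scale a current density (pA/pF) to an absolute step (pA) for one cell."""
    return density_pa_per_pf * capacitance_pf


def sag_ratio(v_baseline, v_peak, v_steady):
    """Sag ratio under an ASSUMED definition: (steady - peak) / (baseline - peak).

    v_peak is the most hyperpolarized potential during the step and v_steady
    is the potential at the end of the step. A value of 0 means no sag.
    """
    deflection = v_baseline - v_peak
    if deflection <= 0:
        raise ValueError("expected a hyperpolarizing step (v_peak below v_baseline)")
    return (v_steady - v_peak) / deflection


# Hypothetical cell: 50 pF membrane capacitance, held at -80 mV.
current = step_current_pa(-6.0, 50.0)    # absolute step for a -6 pA/pF injection
ratio = sag_ratio(-80.0, -100.0, -99.0)  # 1 mV of sag on a 20 mV deflection
```

      Under this definition, a smaller step such as -2 pA/pF produces a smaller deflection and recruits less HCN current, which is consistent with the lower sag ratios in Fig. 7.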

      L400, 401: 10 uM Alexa-594 or 30 um Alexa-594, which is correct?

      10 µM is correct, typo was corrected

      L445: "PV cell" seems like a typo?

      Typo is corrected.

      L450: "altered", please describe the algorithm or manual process.

      Alterations were made manually.

      L474: NDMA, typo.

      Typo is fixed.

      L474: "were adjusted", again please describe the process.

      Adjustments were made by a grid-search algorithm.
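
      A minimal sketch of such a grid search is given below. It assumes a Gaussian density profile with a ‘center’ and a width, and it fits a synthetic target rather than simulated EPSPs; `gaussian_density`, `grid_search`, the parameter names, and the grid values are all hypothetical and do not reproduce our model code.

```python
import itertools
import math


def gaussian_density(distance_um, center_um, sigma_um):
    """Hypothetical channel density profile along the dendrite (peak = 1)."""
    return math.exp(-0.5 * ((distance_um - center_um) / sigma_um) ** 2)


def grid_search(loss, grid):
    """Evaluate loss on every parameter combination; return (best params, best loss)."""
    names = list(grid)
    best_params, best_loss = None, math.inf
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = loss(**params)
        if current < best_loss:
            best_params, best_loss = params, current
    return best_params, best_loss


# Synthetic target generated from known parameters, so the search can be checked.
distances = [0, 50, 100, 150, 200, 250, 300]
target = [gaussian_density(d, 50, 150) for d in distances]


def loss(center_um, sigma_um):
    # Sum of squared errors between candidate and target profiles.
    return sum(
        (gaussian_density(d, center_um, sigma_um) - t) ** 2
        for d, t in zip(distances, target)
    )


grid = {"center_um": [0, 50, 100, 150], "sigma_um": [50, 100, 150, 200]}
best, err = grid_search(loss, grid)
```

      In practice the loss would compare simulated and recorded EPSP measures rather than two density profiles, and the grid resolution bounds how precisely the parameters can be recovered.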

      Biel, M., Wahl-Schott, C., Michalakis, S., & Zong, X. (2009). Hyperpolarization-activated cation channels: from genes to function. Physiological reviews, 89(3), 847-885. https://journals.physiology.org/doi/full/10.1152/physrev.00029.2008 - (very comprehensive review of HCN)

      Bullis JB, Jones TD, Poolos NP. Reversed somatodendritic I(h) gradient in a class of rat hippocampal neurons with pyramidal morphology. J Physiol. 2007 Mar 1;579(Pt 2):431-43. doi: 10.1113/jphysiol.2006.123836. Epub 2006 Dec 21. PMID: 17185334; PMCID: PMC2075407. https://physoc.onlinelibrary.wiley.com/doi/full/10.1113/jphysiol.2006.123836 - (CA1 subset (PLPs) have a reversed HCN gradient; cell-attached patches, NMDAR)

      Velumian AA, Zhang L, Pennefather P, Carlen PL. Reversible inhibition of IK, IAHP, Ih, and ICa currents by internally applied gluconate in rat hippocampal pyramidal neurones. Pflugers Arch. 1997 Jan;433(3):343-50. doi: 10.1007/s004240050286. PMID: 9064651. https://link.springer.com/article/10.1007/s004240050286 - (K-Gluc internal inhibits HCN)

      Sheets, P. L., Suter, B. A., Kiritani, T., Chan, C. S., Surmeier, D. J., & Shepherd, G. M. (2011). Corticospinal-specific HCN expression in mouse motor cortex: I h-dependent synaptic integration as a candidate microcircuit mechanism involved in motor control. Journal of neurophysiology, 106(5), 2216-2231. https://journals.physiology.org/doi/full/10.1152/jn.00232.2011 - (L2/3 IT have same sag ratio as all other non-PT pyramidals, roughly 5% (vs 20% PT); intracellular ZD7288 used at 10 or 25 um)

      Harris NC, Constanti A. Mechanism of block by ZD 7288 of the hyperpolarization-activated inward rectifying current in guinea pig substantia nigra neurons in vitro. J Neurophysiol. 1995 Dec;74(6):2366-78. doi: 10.1152/jn.1995.74.6.2366. PMID: 8747199. https://journals.physiology.org/doi/abs/10.1152/jn.1995.74.6.2366 - (comparison Cs+ and ZD7288)

      Harris, N. C., Libri, V., & Constanti, A. (1994). Selective blockade of the hyperpolarization-activated cationic current (Ih) in guinea pig substantia nigra pars compacta neurones by a novel bradycardic agent, Zeneca ZM 227189. Neuroscience letters, 176(2), 221-225. https://www.sciencedirect.com/science/article/abs/pii/0304394094900876 - (Cs+ is not HCN-selective; it also broadens APs, reduces the AHP)

      Chevaleyre, V., & Castillo, P. E. (2002). Assessing the role of Ih channels in synaptic transmission and mossy fiber LTP. Proceedings of the National Academy of Sciences, 99(14), 9538-9543. https://pnas.org/doi/abs/10.1073/pnas.142213199 - (Cs+ blocks K channels, increases transmitter release; but also ZD7288 affects synaptic transmission)

      Thank you

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      Early-life adversity or stress can enhance stress susceptibility by causing changes in emotion, cognition, and reward-seeking behaviors. This important manuscript highlights the involvement of lateral amygdala astrocytes in fear generalization and the associated synaptic plasticity, which are parallel to the effects of early life stress. With an elegant combination of behavioral models, morphological and functional assessments using immunostaining, electrophysiology, and viral-mediated loss-of-function approaches, the authors provide solid correlational and causal evidence that is consistent with the hypothesis that early life stress produces neural and behavioral dysfunction via perturbing lateral amygdala astrocytic function.

      We would like to thank the reviewers and editors for taking the time to review our work, and to re-review it now. We are also grateful for this very positive assessment of our work. In this revised manuscript we made a strong effort to address the comments made by all reviewers, providing clarification where required and adding new data to our manuscript to further support our observations.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript asks the question of whether astrocytes contribute to behavioral deficits triggered by early life stress. This question is tested by experiments that monitor the effects of early life stress on anxiety-like behaviors, long-term potentiation in the lateral amygdala, and immunohistochemistry of astrocyte-specific (GFAP, Cx43, GLT-1) and general activity (c-Fos ) markers. Secondarily, astrocyte activity in the lateral amygdala is impaired by viruses that suppress gap-junction coupling or reduce astrocyte Ca2+ followed by behavioral, synaptic plasticity, and c-Fos staining. Early life stress is found to reduce the expression of GFAP and Cx43 and to induce translocation of the glucocorticoid receptor to astrocytic nuclei. Both early life stress and astrocyte manipulations are found to result in the generalization of fear to neutral auditory cues. All of the experiments are done well with appropriate statistics and control groups. The manuscript is very well-written and the data are presented clearly. The authors' conclusion that lateral amygdala astrocytes regulate amygdala-dependent behaviors is strongly supported by the data. However, the extent to which astrocytes contribute to behavioral and neuronal consequences of early life stress remains open to debate.

      Strengths:

      A strong combination of behavioral, electrophysiology, and immunostaining approaches is utilized and possible sex differences in behavioral data are considered. The experiments clearly demonstrate that disruption of astrocyte networks or reduction of astrocyte Ca2+ provokes generalization of fear and impairs long-term potentiation in the lateral amygdala. The provocative finding that astrocyte dysfunction accounts for a subset of behavioral effects of early life stress (e.g. not elevated plus or distance traveled observations) is also perceived as a strength.

      Weaknesses:

      The main weakness is the absence of more direct evidence that behavioral and neuronal plasticity after early life stress can be attributed to astrocytes. It remains unknown what would happen if astrocyte activity were disrupted concurrently with early life stress or if the facilitation of astrocyte Ca2+ would attenuate early life stress outcomes. As is, the only evidence that early life stress involves astrocytes is nuclear translocation of GR and downregulation of GFAP and Cx43 in Figure 3 which may or may not provoke astrocyte Ca2+ or astrocyte network activity changes.

      We would like to thank the reviewer for their constructive feedback on our work. In the revised version we have added new experiments that further support a role of astrocytes in ELS-induced behavioural dysfunction. Specifically, we carried out two-photon calcium imaging in lateral amygdala astrocytes using viral overexpression of membrane tethered GCaMP6f. These experiments revealed a decrease in astrocyte calcium activity following ELS (Figure 4). Interestingly these data also showed an important number of sex differences (Figure 4 - Figure supplement 1).

      These new data allow us to strengthen the link between ELS-induced astrocyte hypofunction and behavioural changes. Indeed, we validated the impact of CalEx on astrocyte calcium activity in the lateral amygdala, again using two-photon microscopy, and show that CalEx resulted in an astrocyte calcium signature that very closely resembled that of ELS, i.e. reduced frequency and amplitude of events (Figure 5 - Figure supplement 2). As such, we feel like these data, while still correlative in nature, strengthen our findings and conclusion that astrocyte dysfunction alone is sufficient to recapitulate the effects of stress on excitability, synaptic function, and behaviour.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Guayasamin et al. show that early-life stress (ELS) can induce a shift in fear generalisation in mice. They took advantage of a fear conditioning paradigm followed by a discrimination test and complemented learning and memory findings with measurements for anxiety-like behaviors. Next, astrocytic dysfunction in the lateral amygdala was investigated at the cellular level by combining staining for c-Fos with astrocyte-related proteins. Changes in excitatory neurotransmission were observed in acute brains slices after ELS suggesting impaired communication between neurons and astrocytes. To confirm the causality of astrocytic-neuronal dysfunction in behavioral changes, viral manipulations were performed in unstressed mice. Occlusion of functional coupling with a dominant negative construct for gap junction connexin 43 or reduction in astrocytic calcium with CalEx mimicked the behavioral changes observed after ELS suggesting that dysfunction of the astrocytic network underlies ELS-induced memory impairments.

      Strengths:

      Overall, this well-written manuscript highlights a key role for astrocytes in regulating stress-induced behavioral and synaptic deficits in the lateral amygdala in the context of ELS. Results are innovative, and methodological approaches relevant to decipher the role of astrocytes in behaviors. As mentioned by the authors, non-neuronal cells are receiving increasing attention in the neuroscience, stress, and psychiatry fields.

      Weaknesses:

      I do have several suggestions and comments to address that I believe will improve the clarity and impact of the work. For example, there is currently a lack of information on the timeline for behavioral experiments, tissue collection, etc.

      We thank the reviewer for their kind comments and constructive feedback on our manuscript. We agree that certain aspects could have been made clearer, and we have revised the manuscript and figures to be more explicit regarding timelines, including the addition of timelines on figures and improved clarity in the text where possible. We have also addressed the private comments provided by the reviewers alluded to in this public review.

      Reviewer #3 (Public Review):

      Summary

      The authors show that ELS induces a number of brain and behavioral changes in the adult lateral amygdala. These changes include enduring astrocytic dysfunction, and inducing astrocytic dysfunction via genetic interventions is sufficient to phenocopy the behavioral and neural phenotypes. This suggests that astrocyte dysfunction may play a causal role in ELS-associated pathologies.

      Strengths:

      A strength is the shift in focus to astrocytes to understand how ELS alters adult behavior.

      Weaknesses:

      The mechanistic links between some of the correlates - altered astrocytic function, changes in neural excitability, and synaptic plasticity in the lateral amygdala and behaviour - are underdeveloped.

      We thank the reviewer for their comments. We are happy that they found our shift in focus towards astrocytes to be a strength of our work. Regarding mechanistic links being underdeveloped, we have attempted to address this by placing more effort into understanding the functional changes in astrocytes and how this relates to behaviour.

      To address this comment we have used two-photon calcium imaging to quantify the impact of ELS on astrocyte calcium activity. As such, the revised manuscript contains several new figures including a detailed characterisation of the effects of ELS on astrocyte calcium activity (Figure 4), including sex differences in naive and the effects of stress (Figure 4 - Figure supplement 1), and an important validation of the impact of CalEx on astrocyte calcium activity. CalEx mirrors the impact of stress on astrocyte calcium activity reducing the frequency and amplitude of individual events (Figure 5 - Figure supplement 2).

      Considering the strong overlap of the effects of ELS and CalEx on synapses, excitability, behaviour, and now astrocyte calcium activity, we hope that this added detail addresses some of the points highlighted by the reviewer.

      Recommendations for the authors:

      The reviewers all agree on one major issue for the authors to address. There is a bit of a lack of mechanistic linking between the astrocyte function and the early life stress and these data are more correlational than causal in nature. This could either be addressed by scaling back the data interpretation and title to be more reflective of the data at hand or if the authors would consider, doing the causal experiment of examining the manipulation of astrocyte activity following early life stress to see if this does influence the phenotype.

      We agree with reviewers on this issue and realise that we have overstated our findings somewhat. As an immediate fix, suggested by reviewers, we have changed the title to more closely align with our data stating that astrocyte dysfunction is “associated with” rather than “induces” as well as adjusting our interpretations.

      In addition to this one major comment, there are a list of minor comments that the authors should consider to improve the manuscript.

      (1) A major caveat is the lack of information on the timeline for behavioral experiments, tissue collection, etc. The authors mention "Mice between ages P45-70' but considering the developmental changes occurring between late adolescence and young adulthood, I recommend adding timelines on all Figures clearly indicating when behavioral tests were performed, and tissue collected for electrophysiology or immunostaining. With corticosterone (CORT) back at baseline at P70 vs a difference observed at P45 was this time point favored? It should be clarified throughout.

      We apologise for the lack of clarity on this and have added more timelines on figures.

      The favoured age range (P45-P70) corresponds to adolescence, a time when latent psychiatric disorders tend to manifest in humans following early-life adversity. We have clarified this choice in the text.

      (2) Given the transient increase in corticosterone levels in early-life stress mice, peaking at P45 and declining to control levels by P70, it would be informative to know whether the reported behavioral and synaptic changes differ within this time window. This may not be doable in the current approach, but this should be addressed nonetheless. Furthermore, it wasn't clear why the increase in blood corticosterone was delayed. Was this expected? How does this relate to earlier work? Wouldn't it be expected to be elevated at P17 (end of ELS period)?

      We agree that this observation was very unexpected. Initially, we expected CORT to be elevated at P17, the end of the ELS period. We believe that the low CORT levels during the ELS paradigm can be attributed to the paradigm coinciding with the stress hyporesponsive period (SHRP), which in rodents lasts until roughly postnatal day 14. During this period, mild stressors fail to elicit CORT responses. Considering that our ELS paradigm lasts from P10 to P17, there is a significant overlap with the SHRP.

      This point is now included in the discussion with several citations regarding this biological phenomenon, as well as other studies that report similar findings to our own, i.e. a delayed increase in blood corticosterone levels following early-life stress.

      (3) It is mentioned that behavioral tests were performed in both sexes with no sex differences observed. Were animals of both sexes also included in other experiments (ephys, immunostaining, blood CORT analysis)? Behavioral outcomes could be the same but underlying biological processes different. This is a topic that should be discussed. Identification of males vs females on graphs would be helpful.

      We apologise for not having provided this data in the previous version of the manuscript. In the revised manuscript we provide analysis of sex differences for our initial behavioural observations (Figure 2 - Figure supplement 1), c-Fos (Figure 2 - Figure supplement 2), for GFAP and Cx43 (Figure 3 - Figure supplement 1), calcium signalling (Figure 4 - Figure supplement 1), and for CalEx and dnCx43 experiments across behaviour (Figure 5 - Figure supplement 4) and c-Fos (Figure 5 - Figure supplement 5).

      (4) How long-lasting are the generalization phenotypes? Do they outlast the transient increase in blood corticosterone? Showing this would provide a more solid foundation for future explorations.

      The reviewers raise a very important point. It remains unclear how long these effects last, and this is something we are keen to address in future studies, with careful experiments designed to test this question explicitly, as well as subsequent questions regarding whether long-lasting effects are due to impaired brain development, emerge due to CORT changes or other changes, or reflect a combination of them all.

      As an aside, an additional manuscript from our lab (Depaauw-Holt et al. 2024 bioRxiv) which uses the same stressor but focuses on distinct brain regions and behaviours uses a prolonged time window in which the effects of stress are readily observable all the way to P90.

      So while we do not provide the answers in this work, it is a really great idea that we would like to follow up subsequently.

      (5) With the ELS-induced change in locomotion, I would recommend presenting open field (center, periphery) and elevated plus maze (open, closed arms) data independently. It could also be interesting to analyze corner time in the open field as well as center time in the elevated plus maze.

      We now provide data for the open field and elevated plus maze as requested. Our findings remain unchanged, but we agree with the reviewer that this way of representing the data is more clear.

      (6) For Figure 2C, the ideal stats would be an ANOVA with CS (+/-) as a within-subject variable and treatment (naive/ELS) as a between-subjects variable. Then the best support for the generalization claim would be a CS x treatment interaction. I encourage the authors to do these stats. I note that this point is mitigated by the discrimination analysis presented in 2D (where they compare naive and ELS groups directly).

      We have carried out the analysis as requested and these data further support the notion of fear generalisation in ELS mice (Figure 2 - Figure supplement 2A, B). Additionally, the analyses are included in a supplementary table. We hope that we have understood correctly, and this figure accurately reflects the reviewer’s suggestion.

      (7) In Figure 2H, why not evaluate c-Fos levels after the discrimination test which is the main behavioral outcome? This statement in the Discussion should be modified if, as per my understanding, c-Fos was measured after the fear paradigm only "We find that both ELS and astrocyte dysfunction both enhance neuronal excitability, assessed by local c-Fos staining in the lateral amygdala following auditory discriminative fear conditioning. One interpretation of these data is that astrocytes might tune engram formation, with astrocyte dysfunction, genetically or after stress, increasing c-Fos expression resulting in a loss of specificity of the memory trace and generalisation of fear.'

      We agree that further evaluation of c-Fos levels following the discrimination test would be insightful. We honestly did not consider this time point in our initial experimental design, as we considered previous reports in the literature that investigated how the numbers of cells recruited to the engram (c-Fos density) could influence memory accuracy at a later time point. As such, investigating c-Fos levels following training was our initial target. We have modified the text to be more explicit in our experimental approach.

      This is nevertheless a fascinating point that we are keen to pursue in future studies.

      (8) Some thoughts on why dnCx43 suppression of astrocyte network activity is less effective at inducing fear generalization than CalEx suppression of astrocyte Ca2+ are warranted. One might predict that both manipulations should result in similar effects, as seen in fEPSP and cFos data in Figure 4.

      We agree that this is an interesting observation. It is puzzling that we did not observe the same behavioural phenotype even though the fEPSP and c-Fos data were the same.

      Nevertheless, we do see increased fear generalisation in both dnCx43 and CalEx. We hypothesise that CalEx had a more profound effect due to the wide range of processes that are presumably affected by reduced astrocyte calcium activity, whereas blocking gap junction channels still leaves a large number of astrocyte functions intact.

      Overall, our conclusion is that behaviour is a more sensitive assay compared to the cellular phenotypes, which highlights the importance of answering these questions from multiple angles.

      (9) Ideally, changes in functional coupling following the dnCx43 manipulation should be shown here (line 169).

      We, unfortunately, did not directly evaluate functional coupling in dnCx43 mice in this manuscript. This would have been a useful experiment, but we rely on our previous data where we extensively characterised this tool (Murphy-Royal et al. 2020 Nat Comms).

      (10) It would be relevant to perform c-Fos staining with markers for astrocytes or neuronal cells. Is an increase in activity expected for both cell types?

      This is a fascinating question, given recent work on this topic showing that astrocytes can indeed express c-Fos and may be recruited into engrams. We analysed our existing tissue and found that astrocytes were indeed labelled with c-Fos following our behavioural conditioning paradigm. Our data align with recent reports, and we demonstrate a small percentage of astrocytes expressing c-Fos (Figure 2 - figure supplement 3). This modest number of astrocytes expressing c-Fos is discussed in the text and placed into the context of very recent papers that have been published since our submission to eLife.

      (11) Were the same mice subjected to behavior analysis as to immunostaining?

      We generated separate cohorts of mice for immunostaining and behaviour, and have made this more clear in the text.

      (12) Language describing learning paradigm. The CS+ (line 73) isn't in itself aversive (and shouldn't be described as such). It acquires that value after pairing with the US (which is aversive).

      We agree that this is poorly worded and have modified the text from “aversive cue” to “conditioned cue”.

      (13) It is hard to appreciate the glucocorticoid receptor translocation with the images provided. Would it be possible to increase magnification or at least, provide small inserts at higher magnification?

      We have re-imaged our brain sections to obtain more detailed images. These are provided in the revised manuscript (Figure 3).

      (14) For the viral injection experiment, for how long is the virus expressed before running behavior/recording/c-Fos staining? Is the age of the tested mice the same as in Figures 1-3, or were they injected at P45 and tested weeks later?

      We age-matched all mice for all experiments and tried to keep our experimental window as tight as possible (P45-70). All mice were injected at P25-30 in order to meet the experimental time window. To be more precise, we have added timelines to all figures.

      (15) A validation of the virus is missing to confirm the reduction of Cx43 expression at mRNA and protein levels when compared to controls. A reference is provided but to my understanding age of the animals might be different.

      Here, I believe the reviewer is referring to dnCx43. In this experiment we used a viral approach to overexpress a non-functional connexin 43 protein (Murphy-Royal et al. 2020 Nat Comms). As such, a PCR or immuno against this protein would be expected to reveal higher expression levels. We have tried to clarify this approach in the text.

      It is true that we did not fully characterise this tool in the lateral amygdala, which would have been useful, but considering our extensive experience with this tool and in its development with our collaborators Baljit Khakh, Randy Stout, and David Spray (see Murphy-Royal et al. 2020), we are confident in these data, despite the limitation of validation in this manuscript.

      (16) Same comment for the CalEx, a validation would be appreciated. Based on Yu et al. could a GCaMP6f virus be more appropriate as control?

      We agree this is an important experiment as our lab has not fully validated this tool in house (compared to dnCx43, which we previously validated).

      Importantly, we now have the capacity to do these experiments. Until very recently our two-photon microscope was not fully functional due to dodgy PMTs sent from the company we purchased our equipment from… Troubleshooting this issue took many months before we were convinced that we were not at fault and that the problem was the equipment.

      As such, mice were injected with both a membrane tethered GCaMP6f under the control of the short GFAP promoter - AAV2/5-gfaABC1D-lck-GCaMP6f and CalEx - AAV2/5-gfaABC1D-hPMCA2w/b-mCherry. Using this approach we were able to record calcium activity from CalEx positive and CalEx negative astrocytes in the same tissue (Figure 5 - figure supplement 2).

      We report that this approach does indeed reduce astrocyte calcium but does not entirely eliminate it. In fact, CalEx expressing astrocytes displayed similar calcium activity dynamics to that we observed following ELS. Together, this further strengthens our rationale to use CalEx in order to mimic the effects of stress on astrocytes, and determine downstream effects on excitability, synapses, and behaviour.

      (17) Have previous studies found ELS--> generalization phenotypes in adulthood? If so, these should be discussed in more detail. If not, perhaps this point can be made more explicit.

      This is a great point. After looking into the literature in more depth, we found an example in which ELS resulted in context fear generalisation in adult rats. This work is cited in the discussion in the context of our findings.

      (18) A paper by Krugers et al (Biol Psychiatry 2020) seems especially relevant (glucocorticoids, fear generalization, engram size) and should be discussed.

      Thank you for bringing this work to our attention. This is certainly important work that we had unfortunately overlooked. We have added a citation and discussed the manuscript Lesuis et al. Biol. Psychiatry 2021, which contains the data discussed in the conference proceeding by Krugers et al. Biol. Psychiatry 2020.

      Additionally, we added another great manuscript by Lesuis et al. recently published in Cell in which they investigated the cellular mechanisms by which acute stress results in fear generalisation via endocannabinoids.

      (19) Minor text revisions are necessary at lines 101 and 264 as well as p.5, line 58: "ratio" and p.10, line 128: "region of interest".

      Thank you for pointing out these typos and errors. We have corrected them.

      Editor's note:

      Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The modeling and experimental work described provide solid evidence that this model is capable of qualitatively predicting alterations to the swing and stance phase durations during locomotion at different speeds on intact or split-belt treadmills, but a revision of the figures to overlay the model predictions with the experimental data would facilitate the assessment of this qualitative agreement. This paper will interest neuroscientists studying vertebrate motor systems, including researchers investigating motor dysfunction after spinal cord injury.

      Figures showing the overlay of the experimental data with the modeling predictions have been included as figure supplements for Figures 5-7. This highlights how accurate the model predictions were.

      Public Reviews:

      Reviewer #1 (Public review):

      We thank the reviewer for the positive evaluation of our paper and emphasizing its strengths in the Summary.

      Weaknesses:

      (1) Could the authors provide a statement in the methods or results to clarify whether there were any changes in synaptic weight or other model parameters of the intact model to ensure locomotor activity in the hemisected model?

      Such a statement has been inserted in Materials and Methods, section “Modeling”. Also, in the 1st paragraph of section “Spinal sensorimotor network architecture and operation after a lateral spinal hemisection”, we stated that no “additional changes or adjustments” were made.

      (2) The authors should remind the reader what the main differences are between state-machine, flexor-driven, and classical half-center regimes (lines 77-79).

      Short explanations/reminders have been inserted (see lines 80-83 of tracked changes document).

      (3) There may be changes in the wiring of spinal locomotor networks after the hemisection. Yet, without applying any sort of plasticity, the model is able to replicate many of the experimental data. Based on what was experimentally replicated or not, what does the model tell us about possible sites of plasticity after hemisection?

      Quantitative correspondence of changes in locomotor characteristics predicted by the model and those obtained experimentally provides additional validation of the model proposed in the preceding paper and used in this paper. This was our ultimate goal. None of the plastic changes during recovery were modeled because of a lack of precise information on these changes. The absence of possible plastic changes may explain the small discrepancies between our simulations and experimental data (see Supplemental Figures that have been added). However, the model only has a simplified description of spinal circuits without motoneurons and without real simulation of leg biomechanics. This limits our analysis or predictions of possible plastic changes within a reasonable degree of speculation. This issue is discussed in section: “Limitations and future directions” in the Discussion. We have also inserted a sentence: “The lack of possible plastic changes in spinal sensorimotor circuits of our model may explain the absence of exact/quantitative correspondences between simulated and experimental data.”

      (4) Why are the durations on the right hemisected (fast) side similar to results in the full spinal transected model (Rybak et al. 2024)? Is it because the left is in slow mode and so there is not much drive from the left side to the right side even though the latter is still receiving supraspinal drive, as opposed to in the full transection model? (lines 202-203).

      This is correct. We have included this explanation in the text (lines 210-211 of tracked changes document).

      (5) There is an error with probability (line 280).

      This typo was corrected.

      Reviewer #2 (Public review):

      This is a nice article that presents interesting findings. One main concern is that I don't think the predictions from the simulation are overlaid on the animal data at any point - I understand the match is qualitative, which is fine, but even that is hard to judge without at least one figure overlaying some of the data.

      We thank the Reviewer for the constructive comments. Figures showing the overlay of the experimental data with the modeling predictions have been included as figure supplements for Figures 5-7. This highlights how accurate the model predictions were.

      Second is that it's not clear how the lateral coupling strengths of the model were trained/set, so it's hard to judge how important this hemi-split-belt paradigm is. The model's predictions match the data qualitatively, which is good; but does the comparison using the hemi-split-belt paradigm not offer any corrections to the model? The discussion points to modeling plasticity after SCI, which could be good, but does that mean the fit here is so good there's no point using the data to refine?

      The model has not been trained or retrained, but was used as it was described in the preceding paper. Quantitative correspondence of changes in locomotor characteristics predicted by the model and those obtained experimentally provides additional validation of the model proposed in the preceding paper and used in this paper. This was our ultimate goal. None of the plastic changes during recovery were modeled because of a lack of precise information on these changes. The absence of possible plastic changes may explain the small discrepancies between our simulations and experimental data (see figure supplements that have been added). However, the model only has a simplified description of spinal circuits without motoneurons and without real simulation of leg biomechanics. This limits our analysis or predictions of possible plastic changes within a reasonable degree of speculation. This issue is discussed in section: “Limitations and future directions” in the Discussion.

      The manuscript is well-written and interesting. The putative neural circuit mechanisms that the model uncovers are great, if they can be tested in an animal somehow.

      We agree and we are considering how we can do this in an animal model.

      Page 2, lines 75-6: Perhaps it belongs in the other paper on the model, but it's surprising that in the section on how the model has been revised to have different regimes of operation as speed increases, there is no reference to a lot of past literature on this idea. Just one example would be Koditschek and Full, 1999 JEB Figure 3, where they talk about exactly this idea, or similarly Holmes et al., 2006 SIAM review Figure 7, but obviously many more have put this forward over the years (Daley and Biewener, etc). It's neat in this model to have it tied down to a detailed neural model that can be compared with the vast cat literature, but the concept of this has been talked about for at least 25+ years. Maybe a review that discusses it should be cited?

      We have revised the Introduction to include the suggested references.

      Page 2, line 88: While it makes sense to think of the sides as supraspinal vs afferent driven, respectively, what is the added insight from having them coupled laterally in this hemisection model? What does that buy you beyond complete transection (both sides no supra) compared with intact?

      We are trying to make one model that could reproduce multiple experimental data in quadrupedal locomotion, including genetic manipulations (silencing/removal) of particular neuron types (and commissural interneurons), as pointed out in the section “Model Description” in the Results. These lateral connections are critical for reproducing and explaining other locomotor behaviors demonstrated experimentally. However, even in this study, these lateral interactions are necessary to maintain left-right coordination and equal left-right frequency (step period) during split-belt locomotion and after hemisection.

      I can see how being able to vary cycle frequencies separately of the two limbs is a good "knob" to vary when perturbing the system in order to refine the model. But there isn't a ton of context explaining how the hemi-section with split belt paradigm is important for refining the model, and therefore the science. Is it somehow importantly related to the new "regimes" of operation versus speed idea for the model?  

      We did not refine the model in this paper. We just used it for new simulations. The predictions strengthen the organization and operation of the model we recently proposed.

      Page 5, line 212: For the predictions from the model, a lot depends on how strong the lateral coupling of the model is, which, in turn, depends on the data the model was trained on. Were the model parameters (especially for lateral coupling of the limbs) trained on data in a context where limbs were pushed out of phase and neuronal connectivity was likely required to bring the limbs back into the same phase relationship? Because if the model had no need for lateral coupling, then it's not so surprising that the hemisected limbs behave like separate limbs, one with supraspinal intact and one without.

      Please see our response above concerning the need for lateral interactions incorporated into the model.

      Page 8, line 360: The discussion of the mechanisms (increased influence of afferents, etc) that the model reveals could be causing the changes is exciting, though I'm not sure if there is an animal model where it can be tested in vivo in a moving animal.

      We agree it may be difficult to test right now but we are considering experimental approaches.

      Page 9, line 395: There are some interesting conclusions that rely on the hemi-split-belt paradigm here.

      We agree with this comment. Thanks.

      Reviewer #2 (Recommendations for the authors):

      Figures: Why aren't there any figures with the simulation results overlaid on the animal data?

      We followed this suggestion. Figures showing the overlay of the experimental data with the modeling predictions have been included as figure supplements.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      A nice study trying to identify the relationship between E. coli O157 from cattle and humans in Alberta, Canada.

      Strengths:

      (1) The combined human and animal sampling is a great foundation for this kind of study.

      (2) Phylogenetic analyses seem to have been carried out in a high-quality fashion.

      Weaknesses:

      I think there may be a problem with the selection of the isolates for the primary analysis. This is what I'm thinking:

      (1) Transmission analyses are strongly influenced by the sampling frame.

      (2) While the authors have randomly selected from their isolate collections, which is fine, the collections themselves are not random.

      (3) The animal isolates are likely to represent a broad swathe of diversity, because of the structured sampling of animal reservoirs undertaken (as I understand it).

      (4) The human isolates are all from clinical cases. Clinical cases of the disease are likely to be closely related to other clinical cases, because of outbreaks (either detected, or undetected), and the high ascertainment rate for serious infections.

      (5) Therefore, taking an equivalent number of animal and clinical isolates, will underestimate the total diversity in the clinical isolates because the sampling of the clinical isolates is less "independent" (in the statistical sense) than sampling from the animal isolates.

      (6) This could lead to over-estimating of transmission from cattle to humans.

      We appreciate the reviewer’s careful thoughts about our sampling strategy. We agree with points (1) and (2), and we have provided additional details on the animal collections as requested (lines 95-101).

      We agree with point (3) in theory but not in fact. As shown in Figure 3, the cattle isolates were very closely related, despite the temporal and geographic breadth of sampling within Alberta. The median SNP distance between cattle sequences was 45 (IQR 36-56), compared to 54 (IQR 43-229) SNPs between human sequences from cases in Alberta during the same years. Additionally, as shown in Figure 2, only clade A and B isolates – clades that diverge substantially from the rest of the tree – were dominated by human cases in Alberta. We have better highlighted this evidence in the revision (lines 234-236 and 247-249).
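      To illustrate how a median and IQR of pairwise SNP distances like those quoted above can be derived, here is a minimal sketch. It is not the pipeline used in the study: the function names `snp_distance` and `summarize_pairwise` are hypothetical, the input is assumed to be a set of equal-length aligned core-genome sequences, sites with ambiguous bases are assumed to be ignored, and the "inclusive" quartile method is one of several reasonable conventions.

```python
from itertools import combinations
from statistics import median, quantiles


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned sites where both sequences have an unambiguous, different base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x in valid and y in valid and x != y
    )


def summarize_pairwise(seqs: dict) -> tuple:
    """Return (median, (Q1, Q3)) of all pairwise SNP distances within one host group."""
    dists = [snp_distance(a, b) for a, b in combinations(seqs.values(), 2)]
    # Quartile convention is an assumption; other methods give slightly different IQRs.
    q1, _, q3 = quantiles(dists, n=4, method="inclusive")
    return median(dists), (q1, q3)


# Toy alignment standing in for one host group (illustrative data only).
toy_cattle = {
    "c1": "ACGTACGT",
    "c2": "ACGTACGA",
    "c3": "ACGAACGA",
    "c4": "TCGAACGA",
}
cattle_median, cattle_iqr = summarize_pairwise(toy_cattle)
```

      Running the same summary separately on the cattle and the human alignments would give the two figures being compared.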

      We agree with the reviewer in point (4) that outbreaks can be an important confounder of phylogenetic inference. This is why we down-sampled outbreaks (based on genetic relatedness, not external designation) in our extended analyses. We did not do this in the primary analysis, because there were no large clusters of identical isolates. Figure 3b shows a limited number of small clusters; however, clustered cattle isolates outnumbered clustered human isolates, suggesting that any bias would be in the opposite direction the reviewer suggests. In the revision, we down-sampled all analyses and, indeed, the proportion of human lineages descending from cattle lineages increased (lines 259-261). Regarding severe cases being oversampled among the clinical isolates, this is absolutely true and a limitation of all studies utilizing public health reporting data. We made this limitation to generalizability clearer in the discussion. However, as noted above, clinical isolates were more variable than cattle isolates, so it does not appear to have heavily biased the analysis (lines 490-495).

      We disagree with the reviewer on point (5). While the bias toward severe cases could make the human isolates less independent, the relative sampling proportions are likely to induce greater distance between clinical isolates than cattle isolates, which is exactly what we observe (see response to point (3) above). Cattle are E. coli O157:H7’s primary reservoir, and humans are incidental hosts not able to sustain infection chains long-term. Not only are the bacteria prevalent among cattle, but cattle are also highly prevalent in Alberta. Thus, even with 89 sampling points, we are still capturing a small proportion of the E. coli O157:H7 in the province. Being able to sample only a small proportion of cattle’s E. coli O157:H7 increases the likelihood of only sampling from the center of the distribution, making extreme cases, such as that shown at the very bottom of the tree in Figure 4, rare and important. In comparison, sampling from human cases constitutes a higher proportion of human infections relative to cattle, and is therefore more representative of the underlying distribution, including extremes. We added this point to the limitations (lines 495-504). As with the clustering above, if anything, this outcome would have biased the study away from identifying cattle as the primary reservoir. Additionally, the relatively small proportion of cattle sampled makes our finding that 15.7% of clinical isolates were within 5 SNPs of a cattle isolate, the distance most commonly used to indicate transmission for E. coli O157:H7, all the more remarkable.

      Because of the aforementioned points, we disagree with the reviewer’s conclusion in point (6). If a bias exists, we believe transmission from cattle to humans is likely underestimated for the reasons given above. Not only do all prior studies indicate ruminants as the primary reservoirs of E. coli O157:H7, and humans as only incidental hosts, our specific data do not support the reviewer’s individual contentions. The results of the sensitivity analysis the reviewer recommended are consistent with the points we outlined above, estimating that 94.3% of human lineages arose from cattle lineages (vs. 88.5% in the primary analysis). We have opted to retain the more conservative estimate of the primary analysis, which includes a more representative number of clinical cases.

      (7) "We hypothesize that the large proportion of disease associated with local transmission systems is a principal cause of Alberta's high E. coli O157:H7 incidence" - this seems a bit tautological. There is a lot of O157 because there's a lot of transmission. What part of the fact it is local means that it is a principal cause of high incidence? It seems that they've observed a high rate of local transmission, but the reasons for this are not apparent, and hence the cause of Alberta's incidence is not apparent. Would a better conclusion not be that "X% of STEC in Alberta is the result of transmission of local variants"? And then, this poses a question for future epi studies of what the transmission pathway is.

      The reviewer is correct, and the suggestion for the direction of future studies was our intent with this statement. We have removed this sentence.

      Reviewer #1 (Recommendations For The Authors):

      (1) To address my concerns about the different sampling frames in humans and animals, I would suggest a sensitivity analysis, using something like the following strategy. Make a phylogeny of all the available genome sequences from humans and cattle from Alberta. Phylogenetically sub-sample the tree, using something like Treemmer (https://github.com/fmenardo/Treemmer), to remove phylogenetically redundant isolates from the same host type. Randomly select 100 human and 100 animal isolates from this non-redundant tree, and re-do your analysis.

      Although we originally down-sampled outbreaks for our analysis of the extended Alberta tree (2007-2019), we had not done this systematically for all analyses. We were not able to use the recommended Treemmer tool, because we did not see a way to incorporate the timing of sequences. Because the objective of our study was to evaluate persistence, we did not want to exclude identical sequences that were separated in time and thus could be indicating persistence. To accomplish this, we developed a utility that allowed us to incorporate the temporality of sequences. Using this utility, we systematically down-sampled all sequences that met the following conditions: 1) within 0-2 SNPs of another sequence and 2) no gaps in sequence set >2 months. The second condition means that for any set of sequences within 0-2 SNPs of one another, there can be no more than 2 months without a sequence from the set. Similar sequences that occur beyond this 2-month cutoff would be considered a separate set for down-sampling. This cutoff was chosen based on the epidemiology of E. coli O157 outbreaks, which are generally either point-source or continuous-source outbreaks. Intermittent outbreaks of a single strain are believed to arise from distinct contamination events and are exactly the type of phenomena we are seeking to identify. We have added details on down-sampling to the Methods (lines 178-180).
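      The two down-sampling conditions described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the utility developed for the study: the function name `downsample` is hypothetical, sequences are assumed to be grouped by single-linkage clustering at the 0-2 SNP threshold, "2 months" is approximated as 61 days, and the earliest isolate of each temporal set is assumed to be the one retained.

```python
from datetime import date


def downsample(collection_dates, snp_dist, max_snps=2, max_gap_days=61):
    """Keep one isolate per set of near-identical sequences with no long time gap.

    collection_dates: dict mapping isolate name -> datetime.date
    snp_dist: callable (name_a, name_b) -> int SNP distance
    """
    names = sorted(collection_dates, key=lambda n: (collection_dates[n], n))

    # Condition 1: group isolates within max_snps of one another
    # (single-linkage via union-find; the linkage rule is an assumption).
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if snp_dist(a, b) <= max_snps:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:  # names are date-ordered, so each cluster stays date-ordered
        clusters.setdefault(find(n), []).append(n)

    # Condition 2: a gap longer than max_gap_days starts a new set,
    # so recurrences separated in time are retained as evidence of persistence.
    kept = []
    for members in clusters.values():
        last_seen = None
        for n in members:
            current = collection_dates[n]
            if last_seen is None or (current - last_seen).days > max_gap_days:
                kept.append(n)
            last_seen = current
    return sorted(kept, key=lambda n: (collection_dates[n], n))


# Toy example (illustrative data only): A, B and C are near-identical,
# but C appears months after B; D is unrelated.
toy_dates = {
    "A": date(2010, 1, 1),
    "B": date(2010, 1, 20),
    "C": date(2010, 6, 1),
    "D": date(2010, 2, 1),
}
toy_pairs = {frozenset("AB"): 1, frozenset("AC"): 0, frozenset("BC"): 2}


def toy_dist(a, b):
    return toy_pairs.get(frozenset((a, b)), 40)


retained = downsample(toy_dates, toy_dist)
```

      In this toy case, B is dropped as redundant with A, while C is retained because it recurs after a gap longer than the cutoff, which is the behaviour the second condition is meant to capture.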

      After down-sampling, our primary analysis included 115 human and 84 cattle isolates. To conduct the recommended sensitivity analysis, we further randomly subsampled the human isolates, selecting 84 to match the number of cattle isolates. As we suggested in our initial response, and contrary to the reviewer’s concern, subsampling in this way accentuated the results, with 94.3% of human lineages inferred as arising from cattle lineages, compared to 88.5% in the primary analysis. This sensitivity analysis also identified 10 of the 11 LPLs identified in the primary analysis. The LPL not identified had 5 isolates in the primary analysis, the minimum for definition as an LPL, and was reduced to 4 isolates through subsampling. This sensitivity analysis is shown in Suppl. Figure S3.

      (2) This is the first time I've seen target diagrams used for SNP distances, I'm not sure of their value compared with histograms. They seem to emphasise the maximum distance, rather than the largest number of isolates. I.e. most isolates are closely related, but the diagram emphasises the small number of divergent ones.

      In using the target diagrams, we sought to emphasize the bimodal distribution of human-to-closest-cattle SNP differences. However, this is still mostly visible in a histogram, so we have replaced the target diagrams with a histogram as suggested (Figure 3).

      (3) L130 - fastqc doesn't trim adapters and read ends, there will be something else like trimmomatic which does.

      The reviewer is correct, and we appreciate them catching this error. Trimmomatic is incorporated into the Shovill pipeline, which was the assembler we used through the Bactopia pipeline. We have updated the Methods to indicate this (lines 142-144).

      (4) I find the flow of the article a bit confusing. You have your primary analysis, but Figure 2, which is a secondary analysis, comes before Figure 3. Which is the primary analysis? For me, primary analysis results should come first, or at least signpost a bit better.

      Figure 2 is not a secondary analysis. It is intended to provide an overview of the isolates used from the phylogenetic perspective, just as the diagram in Figure 1 provides an overview of the isolates by analysis. The secondary analyses are shown in Figures 5-7. We have added a sub-header, “Description of Isolates”, to the section referring to Figure 2, to clarify (line 232).

      (5) Locally persistent lineage definition. What is the rationale for the different criteria signifying locally persistent lineages? There is nothing in some of your criteria e.g. all isolates <30 SNPs from each other, which indicates that it is locally persistent - could have been transmitted to Japan (just to pick a place at random), causing a bunch of cases there, and then come back for all we know. Would that be a locally persistent lineage? Did you use the MCC tree here? That is a sub-sample of your full dataset, I am not sure what exactly you're trying to say with the LPLs, but maybe using a larger dataset would be better? Also, there are lots of STEC genomes available from e.g. UK and USA, by only including a fraction of these, you limit the strength of the inferences you can make about locally persistent lineages unless you know that they don't see the G sub-lineage that you observe.

      The reviewer raises multiple points here. First, regarding our definition of LPLs, it is intended to identify those lineages that pose a threat to populations in the specific geographic area (“local”) for at least 1 year (“persistent”) that are likely to be harbored in local reservoirs. Each of the criteria contributes to this definition.

      (1) A single lineage of the MCC tree with a most recent common ancestor (MRCA) with ≥95% posterior probability: This criterion provides confidence in the given isolates being part of a single, defined lineage. The posterior probability gives the probability that the topology of the tree is accurate, based on the data provided and the chosen model of evolution. In other words, we required at least 95% probability that the lineage was correct, and in practice the posterior probability of the lineages we defined as LPLs was 99.7-100% (we have added this detail to the text, lines 269-270). We also added a sensitivity analysis, shown in Suppl. Figure S4, which shows all sampled trees. We find that the essential structure of the tree around the LPLs we defined is well-supported.

      (2) All isolates ≤30 core SNPs from one another: This criterion limited LPLs to those lineages where the isolates were closely related. We did not want to limit LPLs to those that might define an outbreak, for example using a 5-10 SNP threshold, because the point of the study is to identify lineages that persistently cause disease over longer periods than a normal outbreak. Pathogens evolve over time in their reservoirs, leading to greater SNP distances, and we wanted to allow for this. The U.S. CDC has acknowledged a similar concern for such persistent lineages in its definition of REP strains, which it has defined based on ranges of 13-104 allele differences by cgMLST. Thus, our choice of 30 core SNPs as the threshold is in line with current practice in the emerging science on persistence of enteric pathogens. We have also added a sensitivity analysis examining alternate SNP thresholds, shown in Suppl. Figure S5, which results in clusters of LPLs identified in the primary analysis being grouped into larger lineages. Additionally, in the tree showing our primary analysis (Figure 4), we now note the minimum number of SNPs all isolates within the lineage differ by.

      (3) Contained at least 1 cattle isolate: This criterion increases confidence that the lineage is indeed “local”. Unlike humans, cattle are not known to be routinely infected by imported food products, and they do not make roundtrip journeys to other locations, as humans infected during travel do. Cattle themselves may be imported into Alberta while infected, and cattle in Alberta can be infected by other imported animals. In these cases, if the STEC strains the cattle harbor persist for ≥1 year, they become the type of lineages we are interested in as LPLs, regardless of where they previously came from, because they are now potential persistent sources of infection in Alberta. By including at least one cattle isolate in each LPL, the only way an identified LPL is not actually local is if cattle are imported from the lineage’s reservoir community elsewhere (e.g., in Japan, as the reviewer suggested), the lineage is persisting in that non-Alberta reservoir, and newly infected cattle are imported repeatedly over 1 or more years. This could feasibly explain G(vi)-AB LPL 5 (Figure 4), which is entirely composed of cattle isolates. Indeed, such an explanation would be consistent with the lack of new cases from this LPL after 2015 in the extended analysis (Figure 5). However, for all other LPLs, which contain both cattle and human isolates, for the LPL to not be local, both cattle and human cases would have to be imported from the same non-Alberta reservoir. While this is possible, the probability of such a scenario is low, and it decreases the more isolates are in an LPL. For the average LPL, this means 4 human and 6 cattle cases would need to be imported from a non-Alberta reservoir over several years. Given that our study is only a random sample of the total STEC cases and cattle in Alberta from 2007-2015, these numbers are underestimates of the true absolute number of cases and cattle associated with LPLs that would have to be explained by importation if the LPL were not local. We have added some explanation of the possibility of importation in the Discussion where we discuss the LPL criteria (lines 376-380).

      (4) Contained ≥5 isolates: In concert with criterion 3, this criterion guards against anomalies being counted as LPLs. By requiring at least 5 isolates in an LPL after down-sampling, at least 5 infection events must have occurred from the LPL, reducing the likelihood of importation explaining the LPL and emphasizing more significant LPLs.

      (5) The isolates were collected at sampling events (for cattle) or reported (for humans) over a period of at least 1 year: This criterion defines the persistence aspect of the LPL. In the primary analysis, the LPLs we identified persisted for an average of 8 years, with the shortest persisting for 5 years (these details have been added to the text, lines 268-269). Incorporating the extended analysis, several LPLs persisted for the full 13 years of the study.
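
      Read together, the five criteria act as a conjunctive filter on a candidate lineage. The following is a minimal Python sketch of that reading, not the authors' pipeline: the `Isolate` record, the `is_lpl` function, and the dictionary of pairwise core-SNP distances are hypothetical names and data layouts introduced for illustration only.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Isolate:
    """Hypothetical record for one sequenced isolate."""
    name: str
    host: str    # "cattle" or "human"
    year: float  # sampling (cattle) or report (human) date in decimal years


def is_lpl(isolates, snp_distance, mrca_posterior,
           min_posterior=0.95, max_snps=30, min_isolates=5, min_years=1.0):
    """Return True if a candidate lineage satisfies all five criteria.

    snp_distance maps frozenset({name_a, name_b}) to a core-SNP count.
    mrca_posterior is the posterior probability of the lineage's MRCA node.
    """
    # (1) The lineage must be well supported in the MCC tree.
    if mrca_posterior < min_posterior:
        return False
    # (4) The lineage must contain enough isolates.
    if len(isolates) < min_isolates:
        return False
    # (3) At least one isolate must come from cattle.
    if not any(iso.host == "cattle" for iso in isolates):
        return False
    # (5) Isolates must span at least the minimum time period.
    years = [iso.year for iso in isolates]
    if max(years) - min(years) < min_years:
        return False
    # (2) Every pair of isolates must be within the SNP threshold.
    return all(
        snp_distance[frozenset((a.name, b.name))] <= max_snps
        for a, b in combinations(isolates, 2)
    )
```

      Keeping the thresholds as keyword arguments makes it straightforward to rerun the filter with alternate SNP cut-offs, in the spirit of the sensitivity analysis described under criterion 2.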

      Regarding using additional non-Alberta isolates to help rule out importation, we have expanded the number of U.S. and global isolates included in the importation analysis, over-sampling clade G isolates from the U.S. (Figure 7). As cattle trade is substantially more common with the U.S. than other countries, we felt it most important to focus on the U.S. as a potential source of both imported cattle and human cases. Our results from this analysis show that only 9 of 494 (1.8%) U.S. isolates occurred in the LPLs we defined in the primary analysis, and all occurred after Alberta isolates (lines 313-317). Although we also added more global isolates, we still found that none were associated with the Alberta LPLs.

      (6) Given the importance of sampling for a study like this, some more information on animal sampling studies should be included here.

      We have added details on the cattle sampling to the Methods (lines 95-101).

      (7) L172 - do you mean an MRCA with ≥95% probability of location in Alberta?

      Location in Alberta was not determined from the primary analysis, which defined the LPLs, as only Alberta isolates were included in that analysis. As described above, this criterion meant that we required at least 95% probability that the tree topology at the lineage’s MRCA was correct, and in practice the posterior probability of the lineages we defined as LPLs was 99.7-100%.

      (8) Need a supplementary figure of just clade G from Figure 2.

      We have added a sub-tree diagram of clade G(vi) as Figure 2b.

      Reviewer #2 (Public Review):

      This study identified multiple locally evolving lineages transmitted between cattle and humans persistently associated with E. coli O157:H7 illnesses for up to 13 years. Furthermore, this study mentions a dramatic shift in the local persistent lineages toward strains with the more virulent stx2a-only profile. The authors hypothesized that the large proportion of disease associated with local transmission systems is a principal cause of Alberta's high E. coli O157:H7 incidence. These opinions more effectively explain the role of the cattle reservoir in the dynamics of E. coli O157:H7 human infections.

      (1) The authors acknowledge the possibility of intermediate hosts or environmental reservoirs playing a role in transmission. Further discussion on the potential roles of other animal species commonly found in Alberta (e.g., sheep, goats, swine) could enhance the understanding of the transmission dynamics. Were isolates from these species available for analysis? If not, the authors should clearly state this limitation.

      We have expanded the discussion of other species in Alberta, as suggested, including other livestock, wildlife, and the potential role of birds and flies (lines 353-360). Unfortunately, we did not have sequences available from other species, which we have added to the limitations (lines 487-490).

      (2) The focus on E. coli O157:H7 is understandable given its prominence in Alberta and the availability of historical data. However, a brief discussion on the potential applicability of the findings to non-O157 STEC serogroups, and the limitations therein, would be beneficial. Are there reasons to believe the transmission dynamics would be similar or different for other serogroups?

      We appreciate this comment and have expanded our discussion of relevance to non-O157 STEC (lines 452-460). Other authors have proposed that transmission dynamics differ, and studies of STEC risk factors, including our own, support this. However, there has been very little direct study of non-O157 transmission dynamics and there is even less cross-species genomic and metadata available for non-O157 isolates of concern.

      (3) The authors briefly mention the need for elucidating local transmission systems to inform management strategies. A more detailed discussion on specific public health interventions that could be targeted at the identified LPLs and their potential reservoirs would strengthen the paper's impact.

      We agree with the reviewer that this would be a good addition to the manuscript. The public health implications for control are several and extend to non-STEC reportable zoonotic enteric infections, such as Campylobacter and Salmonella. We have added a discussion of these (lines 460-465, 467-485).

      (4) Understanding the relationship between specific risk factors and E. coli O157:H7 infections is essential for developing effective prevention strategies. Have case-control or cohort studies been conducted to assess the correlation between identified risk factors and the incidence of E. coli O157:H7 infections? What methodologies were employed to control for potential confounders in these studies?

      Yes, there have been several case-control studies of reported cases. Many of these are referenced in the discussion in terms of the contribution of different sources to infection. As risk factors were not the focus of the current study, we believe a thorough discussion of the literature on the aspects of these various studies is beyond our scope. However, we have added some details on the risk factors themselves (lines 72-79).

      (5) The study's findings are noteworthy, particularly in the context of E. coli O157:H7 epidemiology. However, the extent to which these results can be replicated across different temporal and geographical settings remains an open question. It would be constructive for the authors to provide additional data that demonstrate the replication of their sampling and sequencing experiments under varied conditions. This would address concerns regarding the specificity of the observed patterns to the initial study's parameters.

      We appreciate the reviewer’s comment, as we are currently building on this analysis with an American dataset with different types of data available than were used in this study. Aligned with this work, we have added a comment on the adaptation of our method to other settings with different types of data (lines 448-450). We also added a sensitivity analysis to the manuscript simulating a different sampling approach (Suppl. Fig. S3), which should be informative to this question.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments.

      (1) Figure 1: The figure is a critical visual representation of the study's findings and should be given prominent emphasis. It is essential that the key discoveries of the research are clearly depicted and explained in this visual format. The authors should ensure that Figure 1 is detailed and informative enough to stand out as a central piece of the study.

      Figure 1 is the diagram of sample numbers, locations, and corresponding analyses. We assume that the reviewer means to refer to Figure 2. Although the inclusion of >1,200 isolates makes the tree difficult to see in detail, we have made some modifications to make the findings clearer. First, we changed the clade coloration such that the only subclade differentiated is G(vi). We have removed the stx metadata ring to focus attention on the location and species of the isolates, as stx data are described in Table 1. Finally, we have added a sub-tree diagram of clade G(vi), colored by location. This makes clear the large sections of the subclade dominated by isolates from one location or another, and the limited areas where they overlap.

      (2) Figures 2 and 4: While these figures contribute to the presentation of the data, they appear to be somewhat rudimentary in their current form. The lack of detailed annotations regarding the clustering of different strains is a notable omission. I recommend that the authors refine these figures to include comprehensive labeling that clearly delineates the various bacterial clusters. Enhanced graphical representation with clear annotations will aid readers in better understanding the study's findings.

      We appreciate this suggestion. We have remade all trees generated by the BEAST 2 analyses in R, rather than FigTree. This has allowed us to annotate the trees with additional information on the LPLs and we believe provides a clearer picture of each LPL.

      (3) Supplemental Table S1: The supplemental tables are an excellent opportunity to showcase additional data and findings that support the study's conclusions. For Supplemental Table S1, it is recommended that the authors highlight the innovative aspects or novel discoveries presented in this table.

      Suppl. Table S1 shows the modeling specifications and priors used in the analyses. These decisions were not in and of themselves novel. The innovation in our methods is due to the development of the LPLs based on the trees resulting from the analyses detailed in Suppl. Table S1, as well as from the application of these models to E. coli O157:H7 for the first time. However, we understand the reviewer’s point and have emphasized the importance of the results shown in Suppl. Table S2 (lines 391-395).

      (4) Line 35: "We assessed the role of persistent cross-species transmission systems in Alberta's E. coli O157:H7 epidemiology." change to "We assessed the impact of persistent cross-species transmission systems on the epidemiology of E. coli O157:H7 in Alberta."

      We have made this change.

      (5) To facilitate a deeper understanding of the core findings of the manuscript and to enable the development of effective response strategies, I suggest that the authors provide more information regarding the sequencing data used in the study. This information should at least include aspects such as data accessibility and quality control measures.

      We have included a Supplemental Data File that lists all isolates used in the analysis, and the QC measures are detailed in the Methods.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Can a plastic RNN serve as a basis function for learning to estimate value? In previous work this was shown to be the case, with a similar architecture to that proposed here. The learning rule in previous work was back-prop with an objective function that was the TD error function (delta) squared. Such a learning rule is non-local, as the changes in weights within the RNN, and from inputs to the RNN, depend on the weights from the RNN to the output, which estimates value. In addition, these weights themselves change over learning. The main idea in this paper is to examine if replacing the values of these non-local changing weights, used for credit assignment, with random fixed weights can still produce similar results to those obtained with complete bp. This random feedback approach is motivated by a similar approach used for deep feed-forward neural networks.

      This work shows that this random feedback in credit assignment performs well, but not as well as the precise gradient-based approach. When more constraints due to biological plausibility are imposed, performance degrades. These results are not surprising given previous results on random feedback. This work is incomplete because the delay times used were only a few time steps, and it is not clear how well random feedback would operate with longer delays. Additionally, the examples simulated with a single cue and a single reward are overly simplistic and the field should move beyond these exceptionally simple examples.

      Strengths:

      • The authors show that random feedback can approximate well a model trained with detailed credit assignment.

      • The authors simulate several experiments including some with probabilistic reward schedules and show results similar to those obtained with detailed credit assignments as well as in experiments.

      • The paper examines the impact of more biologically realistic learning rules and the results are still quite similar to the detailed back-prop model.

      Weaknesses:

      • The authors also show that an untrained RNN does not perform as well as the trained RNN. However, they never explain what they mean by an untrained RNN. It should be clearly explained. These results are actually surprising. An untrained RNN with enough units and sufficiently large variance of recurrent weights can have a high-dimensionality and generate a complete or nearly complete basis, though not orthonormal (e.g: Rajan&Abbott 2006). It should be possible to use such a basis to learn this simple classical conditioning paradigm. It would be useful to measure the dimensionality of network dynamics, in both trained and untrained RNN's.

      Thank you for pointing out the lack of explanation of the untrained RNN. The untrained RNN in our simulations (except Fig. 6D/6E-gray-dotted) was a randomly initialized RNN (i.e., connection weights were drawn from a pseudo normal distribution) that was used as the initial RNN for training the value-RNNs. As you suggested, the performance of the untrained RNN indeed improved as the number of units increased (Fig. 2J), and at its best it was almost comparable to the best performance of the trained value-RNNs (Fig. 2I). In the revision we will show the dimensionality of the network dynamics (as you have suggested) and the eigenvalue spectrum of the network.
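
      One common way to quantify the dimensionality of network dynamics is the participation ratio of the activity covariance matrix C, PR = (Σλ_i)² / Σλ_i², where λ_i are the eigenvalues of C. The sketch below is only an illustration of that measure under the assumption that it is the one of interest; the response does not specify which measure will be used. It relies on the identities Σλ_i = tr(C) and Σλ_i² = tr(C²) to avoid an explicit eigendecomposition, and the function name is hypothetical.

```python
def participation_ratio(activity):
    """Participation ratio of unit activity.

    activity is a list of time points, each a list of unit activities.
    Requires at least two time points and non-zero total variance.
    """
    n_t = len(activity)
    n_u = len(activity[0])
    means = [sum(row[j] for row in activity) / n_t for j in range(n_u)]
    centered = [[row[j] - means[j] for j in range(n_u)] for row in activity]
    # Sample covariance matrix between units (n_u x n_u).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n_t - 1) for j in range(n_u)]
        for i in range(n_u)
    ]
    trace = sum(cov[i][i] for i in range(n_u))
    # For a symmetric matrix, tr(C^2) equals the sum of squared entries.
    trace_of_square = sum(v * v for row in cov for v in row)
    return trace * trace / trace_of_square
```

      The value ranges from 1 (all variance along a single direction) to the number of units (variance spread equally over orthogonal directions), so it can be compared directly between trained and untrained networks of the same size.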

      • The impact of the article is limited by using a network with discrete time-steps, and only a small number of time steps from stimulus to reward. What is the length of each time step? If it's on the order of the membrane time constant, then a few time steps are only tens of ms. In the classical conditioning experiments typical delays are on the order of hundreds of milliseconds to seconds. Authors should test if random feedback weights work as well for larger time spans. This can be done by simply using a much larger number of time steps.

      Thank you for pointing out this important issue, for which our explanation was lacking and our examination was insufficient. We do not consider that a single time step in our models corresponds to the neuronal membrane time constant. Rather, for the following reasons, we assume that the time step corresponds to several hundreds of milliseconds:

      - We assume that a single RNN unit corresponds to a small population of neurons that intrinsically (for genetic/developmental reasons) share inputs/outputs and are mutually connected via excitatory collaterals.

      - Cortical activity is suggested to be sustained not only by fast synaptic transmission and spiking but also, even predominantly, by slower synaptic neurochemical dynamics (Mongillo et al., 2008, Science "Synaptic Theory of Working Memory" https://www.science.org/doi/10.1126/science.1150769).

      - In line with this theoretical suggestion, previous research examining excitatory interactions between pyramidal cells, to which one of us (the corresponding author Morita) contributed by conducting model fitting (Morishima, Morita, Kubota, Kawaguchi, 2011, J Neurosci, https://www.jneurosci.org/content/31/28/10380), showed that the mean recovery time constant from facilitation for recurrent excitation among one of the two types of cortico-striatal pyramidal cells was around 500 milliseconds.

      If a single time step corresponds to 500 milliseconds, the three time steps from cue to reward in our simulations correspond to 1.5 sec, which matches the delay in the conditioning task used in Schultz et al. 1997 Science. Nevertheless, as you pointed out, it is necessary to examine whether our random feedback models can work for longer delays, and we will examine this in our revision.

      • In the section with more biologically constrained learning rules, while the output weights are restricted to only be positive (as well as the random feedback weights), the recurrent weights and weights from input to RNN are still bi-polar and can change signs during learning. Why is the constraint imposed only on the output weights? It seems reasonable that the whole setup will fail if the recurrent weights were only positive as in such a case most neurons will have very similar dynamics, and the network dimensionality would be very low. However, it is possible that only negative weights might work. It is unclear to me how to justify that bipolar weights that change sign are appropriate for the recurrent connections and inappropriate for the output connections. On the other hand, an RNN with excitatory and inhibitory neurons in which weight signs do not change could possibly work.

      Our explanation and examination of this issue were insufficient; thank you for pointing this out and for your helpful suggestion. In the Discussion (Line 507-510) of the original manuscript, we described "Regarding the connectivity, in our models, recurrent/feed-forward connections could take both positive and negative values. This could be justified because there are both excitatory and inhibitory connections in the cortex and the net connection sign between two units can be positive or negative depending on whether excitation or inhibition exceeds the other." However, we admit that the meaning of this description was not clear, and more explicit modeling will be necessary as you suggested.

      Therefore, in our revision, we will examine models in which inhibitory units (modeling fast-spiking (FS) GABAergic cells) are incorporated and neurons follow Dale’s law.
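
      In modeling terms, Dale's law means that all outgoing weights of a given unit share one sign. One simple way to impose this, sketched below, is to clip the weight matrix after each update. The clipping approach, the `weights[post][pre]` indexing convention, and the function name are assumptions made for illustration; the response does not describe how the constraint will be implemented.

```python
def enforce_dale(weights, is_excitatory):
    """Clip weights so each unit's outgoing connections keep one sign.

    weights[post][pre] is the connection from unit `pre` to unit `post`.
    is_excitatory[pre] is True for excitatory units, False for inhibitory.
    """
    return [
        [
            max(w, 0.0) if is_excitatory[pre] else min(w, 0.0)
            for pre, w in enumerate(row)
        ]
        for row in weights
    ]
```

      Applied after every plasticity step, this keeps excitatory columns non-negative and inhibitory columns non-positive, so weight signs never change during learning.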

      • Like most papers in the field this work assumes a world composed of a single cue. In the real world there are many more cues than rewards, some cues are not associated with any rewards, and some are associated with other rewards or even punishments. In the simplest case, it would be useful to show that this network could actually work if there are additional distractor cues that appear at random either before the CS, or between the CS and US. There are good reasons to believe such distractor cues will be fatal for an untrained RNN, but might work with a trained RNN, either using BPTT or random feedback. Although this assumption is a common flaw in most work in the field, we should no longer ignore these slightly more realistic scenarios.

      Thank you very much for this insightful comment. In our revision, we will examine situations where there are not only reward-associated cues but also randomly appearing distractor cues.

      Reviewer #2 (Public review):

      Summary:

      Tsurumi et al. show that recurrent neural networks can learn state and value representations in simple reinforcement learning tasks when trained with random feedback weights. The traditional method of learning for recurrent networks in such tasks (backpropagation through time) requires feedback weights which are a transposed copy of the feed-forward weights, a biologically implausible assumption. This manuscript builds on previous work regarding "random feedback alignment" and "value-RNNs", and extends them to a reinforcement learning context. The authors also demonstrate that certain non-negative constraints can enforce a "loose alignment" of feedback weights. The authors' results suggest that random feedback may be a powerful tool of learning in biological networks, even in reinforcement learning tasks.

      Strengths:

      The authors describe well the issues regarding biologically plausible learning in recurrent networks and in reinforcement learning tasks. They take care to propose networks which might be implemented in biological systems and compare their proposed learning rules to those already existing in literature. Further, they use small networks on relatively simple tasks, which allows for easier intuition into the learning dynamics.

      Weaknesses:

      The principles discovered by the authors in these smaller networks are not applied to deeper networks or more complicated tasks, so it remains unclear to what degree these methods can scale up, or can be used more generally.

      In our revision, we will examine more biologically realistic models with excitatory and inhibitory units, as well as more complicated tasks with distractor cues. We will also consider whether/how the depth of networks can be increased, though we do not currently have a concrete idea on this last point. Thank you also for giving us the detailed and insightful 'recommendations for authors'. We will also address them in our revision.

      Reviewer #3 (Public review):

      Summary:

      The paper studies learning rules in a simple sigmoidal recurrent neural network setting. The recurrent network has a single layer of 10 to 40 units. It is first confirmed that feedback alignment (FA) can learn a value function in this setting. Then so-called bio-plausible constraints are added: (1) when value weights (readout) are non-negative, (2) when the activity is non-negative (normal sigmoid rather than downscaled between -0.5 and 0.5), (3) when the feedback weights are non-negative, (4) when the learning rule is revised to be monotonic: the weights are not downregulated. In the simple task considered, none of the four biological features appears to completely impair learning.

      Strengths:

      (1) The learning rules are implemented in a low-level fashion of the form: (pre-synaptic-activity) x (post-synaptic-activity) x feedback x RPE. Which is therefore interpretable in terms of measurable quantities in the wet-lab.

      (2) I find that non-negative FA (FA with non negative c and w) is the most valuable theoretical insight of this paper: I understand why the alignment between w and c is automatically better at initialization.

      (3) The task choice is relevant since it connects with experimental settings of reward conditioning with possible plasticity measurements.
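
      The low-level form noted in strength (1) can be written out as a short sketch. The code below follows only the reviewer's verbal summary, (pre-synaptic activity) x (post-synaptic activity) x feedback x RPE, combined with the standard TD error; it is not the manuscript's exact update rule, and the function names, learning rate, and discount factor are hypothetical.

```python
def td_rpe(reward, value_now, value_next, gamma=0.9):
    """Standard temporal-difference reward prediction error."""
    return reward + gamma * value_next - value_now


def random_feedback_update(weights, pre, post, feedback, rpe, lr=0.1):
    """Return updated weights under a four-factor local rule.

    weights[i][j] is the connection from unit j (pre) to unit i (post).
    feedback[i] is the fixed feedback weight onto unit i, standing in for
    the value weight that backprop would use.
    """
    return [
        [w + lr * rpe * feedback[i] * post[i] * pre[j]
         for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]
```

      Setting every entry of `feedback` to the same constant reduces this to the global delta rule raised in weakness (4), which is why that case is a natural control.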

      Weaknesses:

      (4) The task is rather easy, so it's not clear that it really captures the computational gap that exists between FA (gradient-like learning) and a simpler learning rule like a delta rule: RPE x (pre-synaptic) x (post-synaptic). To control for whether the task is too trivial, I suggest adding a control where the vector c is constant c_i=1.

      Thank you for this insightful comment. We have realized that this is actually an issue that needs to be considered from multiple angles. A previous study by one of us (Wärnberg & Kumar, 2023 PNAS) assumed that DA represents a vector error rather than a scalar RPE, and thus homogeneous DA was considered as a negative control because it cannot represent a vector error other than in the direction of (1, 1, ..., 1). In contrast, the present work assumed that DA represents a scalar RPE, and so homogeneous DA (i.e., constant feedback) would not be regarded as a failure mode because it can actually represent a scalar RPE, and FA to the direction of (1, 1, ..., 1) should in fact occur. This FA to (1, 1, ..., 1) may actually be interesting because it means that if the heterogeneity of DA inputs is not large and the feedback is not far from (1, 1, ..., 1), states are learned to be represented in such a way that simple summation of cortical neuronal activity approximates value, thereby potentially explaining why value is often correlated with regional activation (fMRI BOLD signal) of not only striatal but also cortical regions (which I have been considering as an unresolved mystery). On the other hand, the case with constant feedback is the same as the simple delta rule, as you pointed out, and so what could be obtained from the present analyses would be that FA is actually occurring behind the successful operation of such a simple rule. We will make further examinations and considerations on this issue.

      (5) Related to point 3), the main strength of this paper is to draw potential connection with experimental data. It would be good to highlight more concretely the prediction of the theory for experimental findings. (Ideally, what should be observed with non-negative FA that is not expected with FA or a delta rule (constant global feedback) ?).

      In response to this insightful comment, we considered concrete predictions of our models. In the FA model, the feedback vector c and the value-weight vector w are initially in a random (on average orthogonal) relationship and become gradually aligned, whereas in the non-negative model, the vectors c and w are loosely aligned from the beginning. We considered how the vectors c and w can be experimentally measured. Each element of the feedback vector c is multiplied with TD-RPE, modulating the degree of update in each pyramidal cell (more accurately, the pyramidal cell population that corresponds to a single RNN unit). Thus each element of c could be measured as the magnitude of the response of each pyramidal cell to DA stimulation. The element of the value-weight vector w corresponding to a given pyramidal cell could be measured, if the striatal neuron that receives input from that pyramidal cell can be identified (although this is technically demanding), as the magnitude of the response of the striatal neuron to activation of the pyramidal cell.
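
      The contrast between "on average orthogonal" and "loosely aligned from the beginning" can be illustrated numerically. In the sketch below, two independent zero-mean random vectors have a cosine similarity close to 0, whereas two independent non-negative random vectors have a clearly positive one. The uniform sampling ranges, the vector length of 1000, and the fixed seed are assumptions chosen for illustration and do not reproduce the authors' initialization.

```python
import math
import random


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def random_vector(n, low, high, rng):
    """Vector of n independent uniform samples from [low, high]."""
    return [rng.uniform(low, high) for _ in range(n)]


rng = random.Random(0)
n = 1000
# Signed case: stands in for unconstrained feedback and value weights.
signed_cos = cosine(random_vector(n, -1, 1, rng), random_vector(n, -1, 1, rng))
# Non-negative case: both vectors restricted to [0, 1].
nonneg_cos = cosine(random_vector(n, 0, 1, rng), random_vector(n, 0, 1, rng))
```

      For uniform entries on [0, 1] the expected cosine is roughly 0.75 for long vectors, since the mean product of two entries is 1/4 and the mean squared entry is 1/3. With only 10 to 40 units, as in the networks discussed here, individual draws would scatter more widely around these values.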

      Then, the abovementioned predictions can be tested by (i) identifying cortical, striatal, and VTA regions that are connected by the meso-cortico-limbic pathway and the cortico-striatal-VTA pathway, (ii) identifying pairs of cortical pyramidal cells and striatal neurons that are connected, (iii) measuring the responses of identified pyramidal cells to DA stimulation, as well as the responses of identified striatal neurons to activation of the connected pyramidal cells, and (iv) testing whether the DA->pyramidal responses and the pyramidal->striatal responses are associated across pyramidal cells, and whether such associations develop through learning. We will elaborate on this tentative idea, and also other ideas, in our revision.

      (6a) Random feedback with RNNs in RL has been studied in the past, so it is maybe worth giving some insights into how the results and the analyses compare to this previous line of work (for instance in this paper [https://www.nature.com/articles/s41467-020-17236-y]). For instance, I am not very surprised that FA also works for value prediction with TD error. It is also expected from the literature that the RL + RNN + FA setting would scale to tasks that are more complex than the conditioning problem proposed here, so is there a more specific take-home message about non-negative FA? or benefits from this simpler toy task?

      In reply to this suggestion, we will explore how our results compare to the previous studies including the paper [https://www.nature.com/articles/s41467-020-17236-y], and explore benefits of our models. At present, we think of one possible direction. According to our results (Fig. 6E), under the non-negativity constraint, the model with random feedback and monotonic plasticity rule (bioVRNNrf) performed better, on average, than the model with backprop and non-monotonic plasticity rule (revVRNNbp) when the number of units was large, though the difference in the performance was not drastic. We will explore reasons for this, and examine if this also applies to cases with more realistic models, e.g., having separate excitatory and inhibitory units (as suggested by another reviewer).

      (6b) Related to task complexity, it is not clear to me if non-negative value and feedback weights would generally scale to harder tasks. If the task is so simple that a global RPE signal is sufficient to learn (see 4 and 5), then it could be good to extend the task to find a substantial gap between: global RPE, non-negative FA, FA, BP. For a well chosen task, I expect to see a performance gap between any pair of these four learning rules. In the context of the present paper, this would be particularly interesting to study the failure mode of non-negative FA and the cases where it does perform as well as FA.

      In reply to this comment and also another reviewer's comment, we will examine the performance of the different models in more complex tasks, e.g., having distractor cues or longer delays. We will also see whether or not the better performance of bioVRNNrf than revVRNNbp mentioned in the previous point applies to the different tasks.

      (7) I find that the writing could be improved, it mostly feels more technical and difficult than it should. Here are some recommendations:

      (7a) for instance the technical description of the task (CSC) is not fully described and requires background knowledge from other papers, which is not desirable.

      (7b) Also the rationale for the added difficulty with the stochastic reward and new state is not well explained.

      (7c) In the technical description of the results I find that the text dives into descriptive comments of the figures but high-level take home messages would be helpful to guide the reader. I got a bit lost, although I feel that there is probably a lot of depth in these paragraphs.

      Thank you for your helpful suggestions. We will thoroughly revise our writing.

      (8) Related to the writing issue and 5), I wished that "bio-plausibility" was not the only reason to study positive feedback and value weights. Is it possible to develop a bit more specifically what and why this positivity is interesting? Is there an expected finding with non-negative FA both in the model capability? or maybe there is a simpler and crisp take-home message to communicate the experimental predictions to the community would be useful?

      We will make considerations on whether/how the non-negative constraints could have any benefits other than biological plausibility, in particular, in theoretical aspects or applications using neuro-morphic hardware, while we will also elaborate the links to biology and concretize the model's predictions.

    1. Author response:

      Reviewer #1 (Evidence, reproducibility and clarity):

      This is an interesting manuscript where the authors systematically measure rG4 levels in brain samples at different ages of patients affected by AD. To the best of my knowledge this is the first time that BG4 staining is used in this context and the authors provide compelling evidence to show an association with BG4 staining and age or AD progression, which interestingly indicates that such RNA structure might play a role in regulating protein homeostasis as previously speculated. The methods used and the results reported seem robust and reproducible.

      In terms of the conclusions, however, I think that there are 2 main things that need addressing prior to publication:

      (1) Usually in BG4 staining experiments, to ensure that the signal detected is genuinely due to rG4, an RNase treatment experiment is performed. This does not have to be extended to all the samples presented, but having a couple of controls where the authors observe loss of staining upon RNase treatment will be key to ensure with confidence that rG4s are detected under the experimental conditions. This is particularly relevant for these brain tissue samples, where BG4 staining has never been performed before.

      With what is now known about RNA rG4s and the recent reconciliation of the controversy on rG4 formation (Kharel, Nature Communications 2023), this experiment is no longer strictly required for demonstration of rG4 formation. Despite this change, we did attempt this experiment at the reviewer’s suggestion, but the controls were not successful, suggesting it may not be feasible with our fixing and staining conditions. That said, we agree that despite the G4 staining appearing primarily outside the nucleus, it would be helpful to have some direct indication of whether we were observing primarily RNA or DNA G4s, and so we performed an alternate experiment to determine this.

      In our previous submission, we had performed ribosomal RNA staining (Figure S7), and the staining patterns were similar to that of BG4, especially the punctate pattern near the nuclei. Therefore, we directly asked whether the BG4 was largely binding to rRNA and have now shown the resulting co-stain in Figure 3b. These results show that at least a large amount of the BG4 staining does arise from rG4s in ribosomes. At high magnification, we observe that the BG4 stains a subset of the ribosomes, consistent with previous observations of high rG4 levels in ribosomes both in vitro and in cells (Mestre-Fos, 2019 J Mol Biol, Mestre-Fos 2019 PLoS One, Mestre-Fos 2020 J Biol Chem), but this had never been demonstrated in tissue. This experiment has therefore both answered the primary question of whether we are primarily observing rG4s, as well as provided more detailed information on the cellular sublocalization of rG4 formation, and provided the first evidence of rG4 formation on ribosomes in tissue.

      (2) The authors have an association between rG4-formation and age/disease progression. They also observe distribution dependency of this, which is great. However, this is still an association which does not allow the model to be supported. This is not something that can be fixed with an easy experiment and it is what it is, but my point is that the narrative of the manuscript should be more fair and reflect the fact that, although interesting, what the authors are observing is a simple correlation. They should still go ahead and propose a model for it, but they should be more balanced in the conclusion and do not imply that this evidence is sufficient to demonstrate the proposed model. It is absolutely fine to refer to the literature and comment on the fact that similar observations have been reported and this is in line with those, but still this is not an ultimate demonstration.

      We agree that these are correlative studies (of necessity when studying human tissue), but recent experiments have shown that rG4s affect the aggregation of Tau in vitro – and we have now better clarified this in the text itself. We have now also been more careful in drawing causative conclusions as shown in the revised text.

      Minor point:

      (3) rG4s themselves have been shown to generate aggregates in ALS models in the absence of any protein (Ragueso et al. Nat Commun 2023). I think this is also important in light of my comment on the model: it could well be that these rG4s are causing aggregates themselves that act as nucleation points for the proteins, as reported in the paper I mentioned. Providing a broader and more unbiased view of the current literature on the topic would be fair, rather than focusing on reports more in line with the model proposed.

      We agree and have modified the discussion and added a broader context, including the Ragueso report described above.

      Reviewer #1 (Significance):

      This is a significant novel study, as per my comments above. I believe that such a study will be of impact in the G4 and neurodegenerative fields. Provided that the authors can address the criticisms above, I strongly believe that this manuscript would be of value to the scientific community. The main strength is the novelty of the study (never done before); the main weakness is the lack of the RNase control at the moment and the slight overinterpretation of the findings (see comments above).

      Reviewer #2 (Evidence, reproducibility and clarity):

      RNA guanine-rich G-quadruplexes (rG4s) are non-canonical higher order nucleic acid structures that can form under physiological conditions. Interestingly, cellular stress is positively correlated with rG4 induction. In this study, the authors examined human hippocampal postmortem tissue for the formation of rG4s in aging and Alzheimer Disease (AD). rG4 immunostaining strongly increased in the hippocampus with both age and with AD severity. 21 cases were used in this study (age range 30-92). This immunostaining co-localized with hyper-phosphorylated tau immunostaining in neurons. The BG4 staining levels were also impacted by APOE status. rG4 structure was previously found to drive tau aggregation. Based on these observations, the authors propose a model of neurodegeneration in which chronic rG4 formation drives proteostasis collapse.

      This model is interesting, and would explain different observations (e.g., RNA is present in AD aggregates and rG4s can enhance protein oligomerization and tau aggregation).

      Main issue:

      There is indeed a positive correlation between Braak stage severity and BG4 staining, but this correlation is relatively weak and borderline significant (R = 0.52, p value = 0.028). This is probably the main limitation of this study, which should be clearly acknowledged (together with a reminder that "correlation is not causality").

      We believe that we had not explained this clearly enough in the text (based on the reviewer’s comment), as the correlation mentioned by the Reviewer was for the CA4 region only, and not the OML, which was substantially more correlated and statistically significant (Spearman R= 0.72, p = 0.00086). As a result, we believe this was a miscommunication that is rectified by the revised text:

      “In the OML, plotting BG4 percent area versus Braak stage demonstrated a strong correlation (Spearman R= 0.72) with highly significantly increased BG4 staining with higher Braak stages (p = 0.00086) (Fig. 2b).”
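      For readers unfamiliar with the statistic quoted above, the following is a minimal sketch of how a Spearman rank correlation can be computed when one variable is an ordinal stage with many ties (as with Braak stage). The function names are hypothetical and are not the authors' analysis code; the sketch assumes tied values receive their average rank, and it does not compute a p value.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

      Called as `spearman_rho(stages, percent_area)` with one entry per case, this returns a value between -1 and 1; both argument names are placeholders for illustration.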

      Related to this, there is no clear justification to exclude the four individuals in Fig 1d (without them R increases to 0.78). Please remove this statement. On the other hand, the difference based on APOE status is more striking.

      We did not mean to imply that deleting these outliers was correct, but merely were demonstrating that they were in fact outliers. To avoid this misinterpretation, we have now deleted the sentence in the Figure 1d caption mentioning the outliers.

      Minor suggestions

      - "BG4 immunostaining was in many cases localized in the cytoplasm near the nucleus in a punctate pattern". Define "many"

      This is seen in nearly every cell. The text has been revised accordingly, and the puncta are now identified as ribosomes containing rG4s using the rRNA antibody (Fig. 3b).

      - Specify that MABE917 corresponds to the specific single-chain version of the BG4 antibody

      Yes, this is correct, and this clarification has been added to the manuscript.

      - Define PMI, Braak, CERAD (add a list of acronyms or insert these definitions in Fig 1b legend)

      These definitions have all been added when they first appear.

      - Fig 3: scale bar legend missing (50 micrometers?)

      This has been added, and the reviewer was correct that it was 50 micrometers.

      - Supplementary data Table 1: indicate target for all antibodies

      The target for each antibody has been added to supplementary Table 1.

      - Supplementary data Table 2: why give ages with different levels of precision? (e.g. 90.15 vs 63)

      We apologize for this oversight and have altered the ages to the same precision (whole years) in the figure.

      - Supplementary data Fig 1 X-axis legend: add "(nm)" after wavelength. Sequence can also be added in the legend. Why this one? Max/Min Wavelengths in the figure do not match indications in the experimental part. Not sure if that part is actually relevant for this study.

      The CD spectrum in Sup Fig 1 is the sequence that had previously been shown to aid in tau aggregation seeding, but had not been suspected by those authors to be a quadruplex. So we tested that here and showed it is a quadruplex, as described at the end of the introduction. We have added wording to the figure legend to clarify where its corresponding description in the main text can be found. We have also checked and corrected the wavelength and units.

      - Supplementary data Fig 7: Which ribosomal antibody was used?

      The details of this antibody have now been added to Supplementary Table 2 which lists all the antibodies used.

      Reviewer #2 (Significance):

      Provide a link between Alzheimer disease and RNA G-quadruplexes.

      Reviewer #3 (Evidence, reproducibility and clarity):

      This study investigated the formation of RNA G quadruplexes (rG4) in aging and AD in human hippocampal postmortem tissue. The rG4 immunostaining in the hippocampus increases strongly with age and with the severity of AD. Furthermore, rG4 is present in neurons with an accumulation of phosphorylated tau immunostaining.

      Major comments

      (1) The method used in this study is primarily immunostaining of BG4, and the results cannot be considered correct without additional data from more multifaceted analyses (biochemical analysis, RNA expression analysis, etc.).

      We respectfully disagree with the Reviewer’s assessment of the value of these experiments. The most relevant biochemical experiments at the cellular and molecular level showing the role of G4s in aggregation in general and Tau in particular have been done and are referenced in the text. The results here stand on their own and are highly novel and significant, as evaluated by both of the other reviewers. There has been no previous work demonstrating the presence of rG4s in human brain – either in controls or in patients with AD. AD is a complex condition that only occurs spontaneously in the human brain and no other species; because of this complexity, novel aspects are best first studied in human brain tissue using the methods employed here.

      (2) Overall, the quality of the stained images is poor, and detailed quantitative analysis using further high quality data is essential to support the authors' conclusions.

      We have again looked at our images and they are not of poor quality; they are confocal images taken at the recommended resolution of the confocal microscope. It is possible the poor quality came from pdf compression by the manuscript submission portal, which is beyond our control as they were uploaded at high resolution. These data were quantified by scientists who were blinded to the diagnosis of each case. The level of description on the detailed quantification is higher than we have observed in similar studies. We therefore disagree with the reviewer’s conclusion.

      Reviewer #3 (Significance):

      Overall, this study is not a deeply analyzed study. In addition, the authors of this study need further understanding regarding G4.

      It is also unclear why the reviewer believes that we do not have sufficient understanding of G4s, and would request that the reviewer instead provides specific comments regarding what is lacking in terms of knowledge on G4s, as we respectfully disagree with this judgement of our knowledge-base (see other G4 papers from the Horowitz lab, Begeman, 2020, Litberg 2023, Son, 2023 referenced below).

      Litberg TJ, Sannapureddi RKR, Huang Z, Son A, Sathyamoorthy B, Horowitz S. Why are G-quadruplexes good at preventing protein aggregation? RNA Biol. 2023 Jan;20(1):495-509. doi: 10.1080/15476286.2023.2228572.

      Son A, Huizar Cabral V, Huang Z, Litberg TJ, Horowitz S. G-quadruplexes rescuing protein folding. Proc Natl Acad Sci U S A. 2023 May 16;120(20):e2216308120. doi: 10.1073/pnas.2216308120.

      Guzman BB, Son A, Litberg TJ, Huang Z, Dominguez, Horowitz S. Emerging Roles for G-Quadruplexes in Proteostasis. FEBS J. 2022. doi: 10.1111/febs.16608.

      Begeman A, Son A, Litberg TJ, Wroblewski TH, Gehring T, Huizar Cabral V, Bourne J, Xuan Z, Horowitz S. G-Quadruplexes Act as Sequence Dependent Protein Chaperones. EMBO Reports. 2020 Sep 18;e49735. doi: 10.15252/embr.201949735.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The use of antalarmin, a selective CRF1 receptor antagonist, prevents the deficits in sociability in (acutely) morphine-treated males, but not in females. In addition, cell-attached experiments show a rescue to control levels of the morphine-induced increased firing in PVN neurons from morphine-treated males. Similar results are obtained in CRF receptor 1-/- male mice, confirming the involvement of CRF receptor 1-mediated signaling in both sociability deficits and neuronal firing changes in morphine-treated male mice.

      Strengths:

      The experiments and analyses appear to be performed to a high standard, and the manuscript is well written and the data clearly presented. The main finding, that CRF-receptor plays a role in sociability deficits occurring after acute morphine administration, is an important contribution to the field.

      Weaknesses:

      The link between the effect of pharmacological and genetic modulation of CRF 1 receptor on sociability and on PVN neuronal firing, is less well supported by the data presented. No evidence of causality is provided.

      Major points:

      (1) The results of behavioral tests and the neural substrate are purely correlative. To establish causality, it would be important to selectively delete or re-express the CRF1 receptor sequence in the PVN. Re-expressing the CRF1 receptor in the PVN of male mice and testing them for social behavior and for neuronal firing would be the easier step in this direction.

      We agree with this comment and have acknowledged that further studies, such as genetic or pharmacological inactivation of CRF<sub>1</sub> receptors selectively in the paraventricular nucleus of the hypothalamus (PVN), are warranted to address this issue (page 17, line 25 to page 18, line 1).

      We would also like to mention that our manuscript title intentionally presented our findings separately without implying causality. Our idea was simply to pair the behavioral data to neural activity within a network of interest, i.e., the PVN CRF-oxytocin (OXY)/arginine-vasopressin (AVP) network, which is thought to play a critical role at the interface of substance use disorders and social behavior. Accordingly, we previously reported that genetic CRF<sub>2</sub> receptor deficiency reliably eliminated sociability deficits and hypothalamic OXY and AVP expression induced by cocaine withdrawal (Morisot et al., 2018). Thus, the present manuscript reliably shows that CRF<sub>1</sub> receptor-mediated effects of acute morphine administration upon social behavior are consistently mirrored by neural activity changes within the PVN, and particularly within its OXY<sup>+</sup>/AVP<sup>+</sup> neuronal populations. In addition, we demonstrate that the latter effects are sex-linked, which is in line with previous reports of sex-biased CRF<sub>1</sub> receptor roles in rodents (Rosinger et al., 2019; Valentino et al., 2013) and humans (Roy et al., 2018; Weber et al., 2016).

      (2) It would be interesting to discuss the relationship between morphine dose and CRF1 receptor expression.

      We are not aware of studies reporting CRF<sub>1</sub> receptor expression following acute morphine administration. However, repeated heroin self-administration was shown to increase CRF<sub>1</sub> receptor expression in the ventral tegmental area (VTA). We have mentioned the latter study in the present revised version of our manuscript at page 18, lines 1-2.

      (3) It would be important to show the expression levels of CRF1 receptors in PVN neurons in controls and morphine-treated mice, both males and females.

      We agree with this reviewer comment and, in the present version of the manuscript, have mentioned that examination of CRF<sub>1</sub> receptor expression in the PVN might help to understand the brain mechanisms underlying morphine effects upon social behavior (page 18, lines 2-6). Moreover, at page 15, lines 11-19 we have mentioned studies showing higher levels of the CRF<sub>1</sub> receptor in the PVN of adult (2 months) and old (20-24 months) male mice, as compared to adult and old female mice (Rosinger et al., 2019). Thus, differences in PVN CRF<sub>1</sub> receptor expression between male and female mice might underlie the sex-linked effects of CRF<sub>1</sub> receptor antagonism by antalarmin reported in our manuscript.

      (4) It would be important to discuss the mechanisms by which CRF1 receptor controls the firing frequency of AVP+/OXY+ neurons in the PVN of male mice.

      Using the in situ hybridization technique, studies reported relatively low expression of the CRF<sub>1</sub> receptor in the PVN (Van Pett et al., 2000). However, more recent studies using genetic approaches identified a substantial population of CRF<sub>1</sub> receptor-expressing neurons within the PVN (Jiang et al., 2019, 2018). These CRF<sub>1</sub> receptor-expressing neurons are believed to respond to local CRF release and likely form bidirectional connections with both CRF and OXY+/AVP+ neurons (Jiang et al., 2019, 2018). Thus, one proposed mechanism of action is that morphine increases intra-PVN release of CRF, which may act on intra-PVN CRF<sub>1</sub> receptor-expressing neurons. The latter neurons might in turn influence the activity of PVN OXY+/AVP+ neurons, which largely project to the VTA and the bed nucleus of the stria terminalis (BNST) to modulate social behavior. Within this framework, pharmacological or genetic inactivation of CRF<sub>1</sub> receptors might deregulate the activity of intra-PVN CRF-OXY/AVP interactions and thus interfere with opiate-induced social behavior deficits. In particular, the latter phenomenon might be more pronounced in male mice since they express more CRF<sub>1</sub> receptor-positive neurons in the PVN, as compared to female mice (Rosinger et al., 2019). The putative mechanisms of action described herein are also mentioned at page 16, line 12 to page 17, line 7 of the present revised version of the manuscript.

      Minor points:

      (1) The phase of the estrous cycles in which females are analyzed for both behavior and electrophysiology should be stated.

      The normal estrous cycle of laboratory mice is 4-5 days in length, and it is divided into four phases (proestrus, estrus, metestrus and diestrus). The three-chamber experiments were generally carried out over a 5-day period, thus spanning the entire estrous cycle. In particular, on each test day approximately the same number of mice was assigned to each experimental group. Thus, within each group the number of female mice tested in each phase of the estrous cycle was likely similar. Moreover, except for firing frequency displayed by vehicle/morphine-treated mice, female and male mice showed similar variability in results, indicating a marginal role for the estrous cycle in the spread of data. We would also like to mention relatively recent studies indicating no significant difference over different phases of the estrous cycle in the social interaction test as well as in anxiety-like and anhedonia-like behavioral tests in C57BL/6J female mice (Zhao et al., 2021). Similar findings were also reported by other authors, who found no difference across the diestrus and estrus phases of the estrous cycle in C57BL/6J female mice tested in behavioral assays of anxiety-like, depression-like and social interaction behavior (Zeng et al., 2023).

      A paragraph has been added to page 20, lines 1-9 of the present version of the manuscript to explain why we did not monitor the estrous cycle in female mice.

      (2) It would be important to show the statistical analysis between sexes.

      Following this reviewer comment, we examined the sociability ratio results by a three-way ANOVA with sex (males vs. females), pretreatment (vehicle vs. antalarmin) and treatment (saline vs. morphine) as between-subjects factors. The latter analysis revealed a near-significant sex X pretreatment X treatment interaction effect (F<sub>1,53</sub>=3.287, P=0.075), which did not allow for post-hoc individual group comparisons. Nevertheless, Newman-Keuls post-hoc comparisons revealed that male mice treated with antalarmin/morphine showed higher sociability ratio than female mice treated with antalarmin/morphine (P<0.05). The latter statistical results have been added to the present revised version of the manuscript at page 7, lines 2-8.

      We also examined neuronal firing frequency by a three-way ANOVA with sex (males vs. females), pretreatment (vehicle vs. antalarmin) and treatment (saline vs. morphine) as between-subjects factors. Analysis of firing frequency of all of the recorded cells in C57BL/6J mice revealed a sex X pretreatment X treatment interaction effect (F<sub>1,195</sub>=4.765, P<0.05). Newman-Keuls post-hoc individual group comparisons revealed that male mice treated with vehicle/morphine showed higher firing frequency than all other male and female groups (P<0.0005). Moreover, male mice treated with antalarmin/morphine showed lower firing frequency than male mice treated with vehicle/morphine (P<0.0005). In clear contrast, female mice treated with antalarmin/morphine did not differ from female mice treated with vehicle/morphine (P=0.914). The latter statistical results have been added to the present revised version of the manuscript at page 8, lines 4-12. Finally, similar results were obtained following the three-way ANOVA (sex X pretreatment X treatment) of firing frequency recorded in the subset of neurons co-expressing OXY and AVP (data not shown).

      Thus, sex-linked responses to morphine were detected also by three-way ANOVAs including sex as a variable. However, in the revised version of the manuscript we did not include novel figures combining the two sexes because it would have been largely redundant with the figures already reported, especially with Fig. 1D, Fig. 1G, Fig. 2B and Fig. 2D.
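      To make concrete what a sex X pretreatment X treatment interaction measures in a 2 x 2 x 2 design such as the one described above, here is a minimal sketch that computes the interaction contrast from cell means. The cell means are invented placeholders, not data from the manuscript, and the function names are hypothetical. The sketch illustrates only the contrast itself, not the F test or the Newman-Keuls comparisons.

```python
def morphine_effect(cells, sex, pretreatment):
    """Morphine minus saline mean within one sex and one pretreatment."""
    return cells[(sex, pretreatment, "morphine")] - cells[(sex, pretreatment, "saline")]


def two_way_contrast(cells, sex):
    """How much antalarmin changes the morphine effect within one sex."""
    return morphine_effect(cells, sex, "antalarmin") - morphine_effect(cells, sex, "vehicle")


def three_way_contrast(cells):
    """Difference between sexes in the pretreatment x treatment contrast."""
    return two_way_contrast(cells, "male") - two_way_contrast(cells, "female")


# Hypothetical cell means (arbitrary units), chosen only for illustration.
example_cells = {
    ("male", "vehicle", "saline"): 2.0,
    ("male", "vehicle", "morphine"): 5.0,
    ("male", "antalarmin", "saline"): 2.0,
    ("male", "antalarmin", "morphine"): 2.0,
    ("female", "vehicle", "saline"): 2.0,
    ("female", "vehicle", "morphine"): 3.0,
    ("female", "antalarmin", "saline"): 2.0,
    ("female", "antalarmin", "morphine"): 3.0,
}
```

      A nonzero value of `three_way_contrast(example_cells)` means the pretreatment changes the morphine effect differently in the two sexes, which is the pattern a three-way interaction term tests for.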

      Reviewer #2 (Public review):

      This manuscript reports a series of studies that sought to identify a biological basis for morphine-induced social deficits. This goal has important translational implications and is, at present, incompletely understood in the field. The extant literature points to changes in periventricular CRF and oxytocin neurons as critical substrates for morphine to alter social behavior. The experiments utilize mice, administered morphine prior to a sociability assay. Both male and female mice show reduced sociability in this procedure. Pretreatment with the CRF1 receptor antagonist, antalarmin, clearly abolished the morphine effect in males, and the data are compelling. Consistently, CRF1-/- male mice appeared to be spared of the effect of morphine (while wild-type and het mice had reduced sociability). The same experiment was reported as non-feasible in females due to the effect of dose on exploratory behavior per se. Seeking a neural correlate of the behavioral pharmacology, acute cell-attached recordings of PVN neurons were made in acute slices from mice pretreated with morphine or antalarmin. Morphine increased firing frequencies, and both antalarmin and CRF1-/- mice were spared of this effect. Increasing confidence that this is a CRF1 mediated effect, there is a gene deletion dose effect where hets had an intermediate response to morphine. In general, these experiments are well-designed and sufficiently powered to support the authors' inferences. A final experiment repeated the cell-attached recordings with later immunohistochemical verification of the recorded cells as oxytocin or vasopressin positive. Here the data are more nuanced. The majority of sampled cells were positive for both oxytocin and vasopressin. In cells obtained from males, morphine pretreatment increased firing in this population and was CRF1 dependent; however, in females the effect of morphine was more modest, without sensitivity to CRF1. Given that only ~8 cells were immunoreactive for oxytocin alone, it may be premature to attribute the changes in behavior and physiology strictly to oxytocinergic neurons.

      In sum, the data provide convincing behavioral pharmacological evidence and a regional (and possibly cellular) correlation of these effects suggesting that morphine leads to sociality deficits via CRF interacting with oxytocin in the hypothalamus. While this hypothesis remains plausible, the current data do not go so far as directly testing this mechanism in a site or cell-specific way.

      We agree with this reviewer’s comment and acknowledge that further studies are needed to better understand the neural substrates of CRF<sub>1</sub> receptor-mediated sociability deficits induced by morphine. This has been mentioned at page 17, line 25 to page 18, line 6 of the present revised version of the manuscript.

      With regard to the presentation of these data and their interpretation, the manuscript does not sufficiently draw a clear link between mu-opioid receptors, their action on CRF neurons of the PVN, and the synaptic connectivity to oxytocin neurons. Importantly, sex, cell, and site-specific variations in the CRF are well established (see Valentino & Bangasser) yet these are not reviewed nor are hypotheses regarding sex differences articulated at the outset. The manuscript would have more impact on the field if the implications of the sex-specific effects evident here were incorporated into a larger literature.

      At page 15, line 19 to page 16, line 2 of the present version of the manuscript, we have mentioned prior studies reporting differences in CRF<sub>1</sub> receptor signaling or cellular compartmentalization between male and female rodents (Bangasser et al., 2013, 2010). However, the latter studies were conducted in cortical or locus coeruleus brain tissues. Thus, more studies are needed to examine CRF<sub>1</sub> receptor signaling or cellular compartmentalization in the PVN and their relationship to the sex-linked results reported in our manuscript.

      With regards to the model proposed in the discussion, it seems that there is an assumption that ip morphine or antalarmin have specific effects on the PVN and that these mediate behavior - but this is impossible to assume and there are many meaningful alternatives (for example, both MOR and CRF modulation of the raphe or accumbens are worth exploration).

      We focused our discussion on PVN OXY/AVP systems because our electrophysiology studies examined neurons expressing OXY and/or AVP in this brain area. However, we understand that other brain areas/systems might mediate the effect of systemic administration of the CRF<sub>1</sub> receptor antagonist antalarmin or whole-body genetic disruption of the CRF<sub>1</sub> receptor upon morphine-induced social behavior deficits. For this reason, at page 16, line 12 to page 17, line 7 of the present version of the manuscript we have mentioned the possible involvement of BNST OXY or VTA dopamine systems in the CRF<sub>1</sub> receptor-mediated social behavior effects of morphine reported herein. Indeed, literature suggests important CRF-OXY and CRF-dopamine interactions in the BNST and the VTA, which might be relevant to the expression of social behavior. Nevertheless, to date the involvement of these brain system interactions in social behavior alterations induced by substances of abuse remains to be elucidated.

      While it is up to the authors to conduct additional studies, a demonstration that the physiology findings are in fact specific to the PVN would greatly increase confidence that the pharmacology is localized here. Similarly, direct infusion of antalarmin to the PVN, or cell-specific manipulation of OT neurons (OT-cre mice with inhibitory dreadds) combined with morphine pre-exposure would really tie the correlative data together for a strong mechanistic interpretation.

      We agree with this reviewer’s comment that the suggested experiments would greatly increase the understanding of the brain mechanisms underlying the social behavior deficits induced by opiate substances. We have acknowledged this at page 17, line 25 to page 18, line 6.

      Because the work is framed as informing a clinical problem, the discussion might have increased impact if the authors describe how the acute effects of CRF1 antagonists and morphine might change as a result of repeated use or withdrawal.

      Prior studies reported behavioral and neuroendocrine (hypothalamus-pituitary-adrenal axis) effects of chronic systemic administration of CRF<sub>1</sub> receptor antagonists, such as R121919 and antalarmin (Ayala et al., 2004; Dong et al., 2018). However, to our knowledge, no studies have directly compared the behavioral effects of acute vs. repeated administration of CRF<sub>1</sub> receptor antagonists. We previously reported that acute administration of antalarmin increased the expression of somatic opiate withdrawal in mice, indicating that this compound is effective following withdrawal from repeated morphine administration (Papaleo et al., 2007). Nevertheless, further studies are needed to specifically address this reviewer’s comment.

      Reviewer #3 (Public review):

      Summary:

      In the current manuscript, Piccin et al. identify a role for CRF type 1 receptors in morphine-induced social deficits using a 3-chamber social interaction task in mice. They demonstrate that pre-treatment with a CRFR1 antagonist blocks morphine-induced social deficits in male, but not female, mice, and this is associated with the CRFR1 antagonist blocking morphine-induced increases in PVN neuronal excitability in male but not female mice. They followed up by using a transgenic CRFR1 knockout mouse line. CRFR1 genetic deletion also blocked morphine-induced social deficits, similar to the pharmacological approach, in male mice. This was also associated with morphine-induced increases in PVN neuronal excitability being blocked in CRFR1 knockout mice. Interestingly they found that the pharmacological antagonism of the CRFR1 specifically blocked morphine-induced increases in oxytocin/AVP neurons in the PVN in male mice.

      Strengths:

      The authors used both male and female mice where possible and the studies were fairly well controlled. The authors provided sufficient methodological detail and detailed statistical information. They also examined measures of locomotion in all of the behavioral tasks to separate changes in sociability from overall changes in locomotion. The experiments were well thought out and well controlled. The use of both the pharmacological and genetic approaches provides converging lines of evidence for the role of CRFR1 in morphine-induced social deficits. Additionally, they have identified the PVN as a potential site of action for these CRFR1 effects.

      Weaknesses:

      While the authors included both sexes they analyzed them independently. This was done for simplicity's sake as they have multiple measures but there are several measures where the number of factors is reduced and the inclusion of sex as a factor would be possible.

      Please see our response above to the same comment made by Reviewer 1.

      Additionally, single doses of both the CRFR1 antagonist and morphine are used within an experiment without justification for the doses. In fact, a lower dose of morphine was needed for the genetic CRFR1 mouse line. This would suggest that the dose of morphine being used is likely causing some aversion that may be more present in the females, as they have lower overall time in the ROI areas of both the object and the mouse following morphine exposure.

      The morphine dose was chosen based on our prior study showing that morphine (2.5 mg/kg) impaired sociability in male and female C57BL/6J mice, without affecting locomotor activity (Piccin et al., 2022). Also, the antalarmin dose (20 mg/kg) and the route of administration (per os) were chosen based on our prior studies demonstrating behavioral effects of this CRF<sub>1</sub> receptor antagonist administered per os (Contarino et al., 2017; Ingallinesi et al., 2012; Piccin and Contarino, 2020). This is now mentioned in the “materials and methods” section of the present revised version of the manuscript at page 23, lines 6-13. We also agree with this reviewer that female mice seemed more sensitive to morphine than male mice. Indeed, during the habituation phase of the three-chamber test, female mice treated with morphine (2.5 mg/kg) spent less time in the ROIs containing the empty wire cages, as compared to saline-treated female mice (Fig. 1E). However, morphine did not affect locomotor activity in female mice (Fig. S1B), suggesting that social approach and ambulation are independent.

      As for the discussion, the authors do not sufficiently address why CRFR1 has an effect in males but not females and what might be driving that difference, or why male and female mice have different distribution of PVN cell types during the recordings.

      At page 15, line 11 to page 16, line 2, we have mentioned possible mechanisms that might underlie the sex-linked results reported in our manuscript. Moreover, at page 16, lines 6-9 we have mentioned a seminal review reporting sex-linked expression of PVN OXY and AVP in a variety of animal species that is similar to the present results. Nevertheless, as mentioned in the “discussion” section, further studies are needed to elucidate the neural substrates underlying sex-linked effects of opiate substances upon social behavior.

      Additionally, the authors attribute their effect to CRF and CRFR1 within the PVN but do not consider the role of extrahypothalamic CRF and CRFR1. While the PVN does contain the largest density of CRF neurons there are other CRF neurons, notably in the central amygdala and BNST, that have been shown to play important roles in the impact of stress on drug-related behavior. This also holds true for the expression of CRFR1 in other regions of the brain, including the VTA, which is important for drug-related behavior and social behavior. The treatments used in the current manuscript were systemic or brain-wide deletion of CRFR1. Therefore, the authors should consider that the effects could be outside the PVN.

      Although the present data suggest a role for PVN CRF<sub>1</sub>-OXY circuits, we are aware that they do not support a direct link between behavior and PVN CRF<sub>1</sub> receptors. Thus, at page 16, line 12 to page 17, line 7 of the present version of the manuscript we have mentioned some studies showing a role for PVN OXY, BNST OXY or VTA dopamine systems in social behavior. Interestingly, the latter brain systems are thought to interact with the CRF system. However, more studies are warranted to understand the implication of CRF-OXY or CRF-dopamine interactions in social behavior deficits induced by substances of abuse.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I commend the authors on crafting a well-written and clear manuscript with excellent figures. Furthermore, the data analysis and rigor are quite high. I have a few suggestions in the order they appear in the manuscript:

      The introduction has a number of abrupt transitions. For example, the sentence beginning with "Besides," in paragraph 2 jumps from CRF to oxytocin and vasopressin without a transition or justification. In all, vasopressin may be better removed from the introduction. There is sufficient evidence in the literature to support the CRF-OT circuit that might mediate behavioral pharmacology and this should be clearly described in the introduction.

      We have added a sentence at page 3, lines 22-23 to introduce possible interactions of the CRF system with other brain systems implicated in social behavior. Also, in the “introduction” section both OXY and AVP systems are mentioned because our electrophysiology studies examined the effect of morphine upon the activity of OXY- and AVP-positive neurons.

      Our interest in the PVN CRF-OXY/AVP network also stems from previous findings from our laboratory showing that genetic inactivation of the CRF<sub>2</sub> receptor eliminated both sociability deficits and increased hypothalamic OXY and AVP expression associated with long-term cocaine withdrawal in male mice (Morisot et al., 2018). Moreover, evidence suggests the implication of AVP systems in opiate effects. In particular, pharmacological antagonism of AVP-V1b receptors decreased the acquisition of morphine-induced conditioned place preference in male C57BL/6N mice housed with morphine-treated mice (Bates et al., 2018).

      Throughout the manuscript, it seems that there is an assumption that ip morphine or antalarmin have specific effects on the PVN and that these mediate behavior - this is impossible to assume and there are many meaningful alternatives (for example, both MOR and CRF modulation of the raphe or accumbens are worth exploration). While it is up to the authors to conduct additional studies, a demonstration that the physiology findings are in fact specific to the PVN would greatly increase confidence that the pharmacology is localized here. Similarly, direct infusion of antalarmin to the PVN, or cell-specific manipulation of OT neurons (OT-cre mice with inhibitory dreadds) combined with morphine pre-exposure would really tie the correlative data together for a strong mechanistic interpretation.

      We agree that the suggested experiments would greatly increase the understanding of the brain mechanisms underlying the social behavior deficits induced by opiate substances. This has been acknowledged at page 17, line 25 to page 18, line 6 of the present version of the manuscript.

      Also in the introduction, the reference to shank3b mice is not the most direct evidence of oxytocin involvement in sociability. It may be helpful to point reviewers to studies with direct manipulation of these populations (Grinevich group, for example).

      At page 4, lines 4-6 of the “introduction” section, we have added a sentence to mention a seminal paper by the Grinevich group demonstrating an important role for OXY-expressing PVN parvocellular neurons in social behavior (Tang et al., 2020). Moreover, at page 4, lines 8-10 we have mentioned a recent study showing that targeted chemogenetic silencing of PVN OXY neurons in male rats impaired short- and long-term social recognition memory (Thirtamara Rajamani et al., 2024).

      It would be helpful in the figures to indicate which panels contain male or female data.

      The sex of the mice is mentioned above each panel of the main and supplemental figures, except for the studies with CRF<sub>1</sub> receptor-deficient mice wherein only experiments carried out with male mice were illustrated. In the latter case, the sex (male) of the mice is mentioned in the related legend.

      The discussion itself departs from the central data in a few ways - the passages suggesting that morphine produces a stress response and that CRF1 antagonists would block the stress state are highly speculative (although testable). The manuscript would have more impact if the sex-specific effects and alternative hypotheses were enhanced in the discussion.

      At page 16, line 12 to page 17, line 7 of the “discussion” section, we have suggested that interaction of the CRF system with other brain systems implicated in social behavior (i.e., OXY, dopamine) might underlie the sex-linked CRF<sub>1</sub> receptor-mediated effects of morphine reported in our manuscript. Also, at page 15, line 19 to page 16, line 2 we have mentioned studies showing sex-linked CRF<sub>1</sub> receptor signaling and cellular compartmentalization that might be relevant to the present findings. Finally, to further support the notion of morphine-induced PVN CRF activity, at page 15, lines 4-6 we have mentioned a study suggesting that activation of presynaptic mu-opioid receptors located on PVN GABA terminals might reduce GABA release (and related inhibitory effects) onto PVN CRF neurons (Wamsteeker Cusulin et al., 2013). Nevertheless, we believe that more work is needed to better understand the role for the CRF<sub>1</sub> receptor in opiate-induced stress responses and activity of OXY and dopamine systems implicated in social behavior.

      Reviewer #3 (Recommendations for the authors):

      (1) You should provide justification for the doses selected for treatments and the route of administration for the CRFR1 antagonist, especially for females.

      This has been added at page 23, lines 6-13 of the present version of the manuscript. In particular, the doses and routes of administration for morphine and antalarmin used in the present study were chosen based on previous work from our laboratory. Indeed, the intraperitoneal administration of morphine (2.5 mg/kg) impaired social behavior in male and female mice, without affecting locomotor activity (Piccin et al., 2022). Moreover, the oral route of administration for antalarmin was chosen for its translational relevance, as it could be easily employed in clinical trials assessing the therapeutic value of pharmacological CRF<sub>1</sub> receptor antagonists.

      (2) For the electrophysiology data you should include the number of cells per animal that were obtained. It appears that fewer cells from more females were obtained than in males and so the distribution of individual animals to the overall variance may be different between males and females.

      The number of cells examined and animals used in the electrophysiology experiments are reported above each panel of the related Figures 2, 3 and 4 as well as in the supplementary tables S1B and S1C. Overall, the number of cells examined in male and female mice was quite similar. Also, the number of male and female mice used was comparable. Standard errors of the mean (SEM) were quite similar across the different male and female groups (Fig. 2B and 2D), except for vehicle/morphine-treated male mice. Indeed, in the latter group a considerable number of cells displayed elevated firing responses to morphine, which accounted for the higher spread of the data. Accordingly, as mentioned above, the three-way ANOVA with sex (males vs. females), pretreatment (vehicle vs. antalarmin) and treatment (saline vs. morphine) as between-subjects factors revealed that male mice treated with vehicle/morphine showed higher firing frequency than all other male and female groups (P<0.0005). Finally, a similar pattern of firing frequency was observed also in neurons co-expressing OXY and AVP, wherein vehicle/morphine-treated male mice displayed higher SEM, as compared to all other male and female groups (Fig. 4C and 4F). Thus, except for vehicle/morphine-treated mice, distribution of the firing frequency data did not seem to be linked to the sex of the animal.

      (3) You should consider using a nested analysis for the slice electrophysiology data as that is more appropriate.

      We thank the reviewer for this suggestion. However, after careful consideration, we have decided to keep the current statistical analyses. In particular, given the relatively low variability of our data, we believe that the use of parametric ANOVA tests is appropriate. Moreover, additional details supporting our choice are provided just above in our response to the comment #2.
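
      To make concrete what a nested view of such data would change, the following is a minimal sketch, not the analysis used in the manuscript. All animal identifiers and firing-frequency values are hypothetical and serve illustration only. It compares the mean and SEM obtained when each cell is treated as an independent observation with those obtained when cells are first averaged within each animal.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Hypothetical (animal_id, firing frequency in Hz) pairs.
# These values are illustrative and do NOT come from the manuscript.
cells = [
    ("m1", 4.0), ("m1", 5.0), ("m1", 6.0),
    ("m2", 1.0), ("m2", 2.0),
    ("m3", 3.0), ("m3", 3.0), ("m3", 3.0),
]

def sem(values):
    """Standard error of the mean (sample SD divided by sqrt(n))."""
    return stdev(values) / sqrt(len(values))

def animal_means(records):
    """Average the cells recorded from each animal."""
    by_animal = defaultdict(list)
    for animal, hz in records:
        by_animal[animal].append(hz)
    return {animal: mean(values) for animal, values in by_animal.items()}

# Cell-level summary: every cell counts as one observation.
cell_values = [hz for _, hz in cells]
cell_level = (mean(cell_values), sem(cell_values), len(cell_values))

# Animal-level summary: every animal counts as one observation.
per_animal = animal_means(cells)
animal_values = list(per_animal.values())
animal_level = (mean(animal_values), sem(animal_values), len(animal_values))

print("cell level   (mean, SEM, n):", cell_level)
print("animal level (mean, SEM, n):", animal_level)
```

      In this toy data set the animal-level SEM is larger than the cell-level SEM because the number of independent units drops from eight cells to three animals. A full nested or mixed-effects model would go further by estimating between-animal and within-animal variance separately.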

      (4) While it makes sense to not want to directly compare male and female data that results in needing to run a 4-way ANOVA, there are many measures, such as sociability, firing rate, etc., that if including sex as a factor would result in running a 3-way ANOVA and would allow for direct comparison of male and female mice.

      Please see our response above to the same comment made by Reviewer 1. Notably, the results of our new statistical analyses including sex as a variable further support sex-linked effects of the CRF<sub>1</sub> receptor antagonist antalarmin upon morphine-induced sociability deficits and PVN neuronal firing. Nevertheless, we would like to keep the figures illustrating our findings as they are, since this makes the observed sex-linked results easy to detect. Finally, we hope that this reviewer agrees with our choice, which is consistent with the wording of the title (i.e., “in male mice”).

      (5) There are grammatical and phrasing issues throughout the manuscript and the manuscript would benefit from additional thorough editing.

      We appreciate this reviewer’s feedback. Thus, upon revising, we have carefully edited the manuscript with regard to possible grammatical and phrasing errors. We hope that our changes have made the manuscript clearer in order to facilitate readability by the audience.

      (6) The discussion should be edited to include consideration of an explanation for the presence of the effect in male, but not female, mice more clearly. The discussion should also include some discussion as to why the distribution of cell types used in the electrophysiology recordings was different between males and females and whether the distribution of CRFR1 is different between males and females. Lastly, the authors need to include consideration of extrahypothalamic CRF and CRFR1 as a possible explanation for their effects. While they have PVN neuron recordings, the treatments that they used are brain-wide, and therefore it is possible that the critical actions of CRFR1 are outside the PVN.

      At page 15, line 11 to page 16, line 2 of the “discussion” section, we have suggested several mechanisms that might underlie the sex-linked behavioral and brain effects of CRF<sub>1</sub> receptor antagonism reported in our manuscript. With regard to the distribution of cell types examined in the electrophysiology studies, at page 16, lines 6-9 we have mentioned a seminal review reporting sex-linked expression of PVN OXY and AVP in a variety of animal species that is similar to our results. Moreover, at page 18, lines 2-6 we mentioned that more studies are needed to examine PVN CRF<sub>1</sub> receptor expression in male and female animals, an issue that is still poorly understood. Finally, at page 16, line 12 to page 17, line 7 of the “discussion” section we also suggest that CRF<sub>1</sub> receptor-expressing brain areas other than the PVN, such as the BNST or the VTA, might contribute to the sex-linked effects of morphine reported in our manuscript. Thus, in agreement with this reviewer’s suggestion, in the present version of the manuscript we have further emphasized the possible implication of CRF<sub>1</sub> receptor-expressing extrahypothalamic brain areas in social behavior deficits induced by opiate substances.

      References

      Ayala AR, Pushkas J, Higley JD, Ronsaville D, Gold PW, Chrousos GP, Pacak K, Calis KA, Gerald M, Lindell S, Rice KC, Cizza G. 2004. Behavioral, adrenal, and sympathetic responses to long-term administration of an oral corticotropin-releasing hormone receptor antagonist in a primate stress paradigm. J Clin Endocrinol Metab 89:5729–5737. doi:10.1210/jc.2003-032170

      Bangasser DA, Curtis A, Reyes BAS, Bethea TT, Parastatidis I, Ischiropoulos H, Van Bockstaele EJ, Valentino RJ. 2010. Sex differences in corticotropin-releasing factor receptor signaling and trafficking: potential role in female vulnerability to stress-related psychopathology. Mol Psychiatry 15:877, 896–904. doi:10.1038/mp.2010.66

      Bangasser DA, Reyes BAS, Piel D, Garachh V, Zhang X-Y, Plona ZM, Van Bockstaele EJ, Beck SG, Valentino RJ. 2013. Increased vulnerability of the brain norepinephrine system of females to corticotropin-releasing factor overexpression. Mol Psychiatry 18:166–173. doi:10.1038/mp.2012.24

      Bates MLS, Hofford RS, Emery MA, Wellman PJ, Eitan S. 2018. The role of the vasopressin system and dopamine D1 receptors in the effects of social housing condition on morphine reward. Drug Alcohol Depend 188:113–118. doi:10.1016/j.drugalcdep.2018.03.021

      Contarino A, Kitchener P, Vallée M, Papaleo F, Piazza P-V. 2017. CRF1 receptor-deficiency increases cocaine reward. Neuropharmacology 117:41–48. doi:10.1016/j.neuropharm.2017.01.024

      Dong H, Keegan JM, Hong E, Gallardo C, Montalvo-Ortiz J, Wang B, Rice KC, Csernansky J. 2018. Corticotrophin releasing factor receptor 1 antagonists prevent chronic stress-induced behavioral changes and synapse loss in aged rats. Psychoneuroendocrinology 90:92–101. doi:10.1016/j.psyneuen.2018.02.013

      Ingallinesi M, Rouibi K, Le Moine C, Papaleo F, Contarino A. 2012. CRF2 receptor-deficiency eliminates opiate withdrawal distress without impairing stress coping. Mol Psychiatry 17:1283–1294. doi:10.1038/mp.2011.119

      Jiang Z, Rajamanickam S, Justice NJ. 2019. CRF signaling between neurons in the paraventricular nucleus of the hypothalamus (PVN) coordinates stress responses. Neurobiol Stress 11:100192. doi:10.1016/j.ynstr.2019.100192

      Jiang Z, Rajamanickam S, Justice NJ. 2018. Local Corticotropin-Releasing Factor Signaling in the Hypothalamic Paraventricular Nucleus. J Neurosci 38:1874–1890. doi:10.1523/JNEUROSCI.1492-17.2017

      Morisot N, Monier R, Le Moine C, Millan MJ, Contarino A. 2018. Corticotropin-releasing factor receptor 2-deficiency eliminates social behaviour deficits and vulnerability induced by cocaine. Br J Pharmacol 175:1504–1518. doi:10.1111/bph.14159

      Papaleo F, Kitchener P, Contarino A. 2007. Disruption of the CRF/CRF1 receptor stress system exacerbates the somatic signs of opiate withdrawal. Neuron 53:577–589. doi:10.1016/j.neuron.2007.01.022

      Piccin A, Contarino A. 2020. Sex-linked roles of the CRF1 and the CRF2 receptor in social behavior. J Neurosci Res 98:1561–1574. doi:10.1002/jnr.24629

      Piccin A, Courtand G, Contarino A. 2022. Morphine reduces the interest for natural rewards. Psychopharmacology (Berl) 239:2407–2419. doi:10.1007/s00213-022-06131-7

      Rosinger ZJ, Jacobskind JS, De Guzman RM, Justice NJ, Zuloaga DG. 2019. A sexually dimorphic distribution of corticotropin-releasing factor receptor 1 in the paraventricular hypothalamus. Neuroscience 409:195–203. doi:10.1016/j.neuroscience.2019.04.045

      Roy A, Laas K, Kurrikoff T, Reif A, Veidebaum T, Lesch K-P, Harro J. 2018. Family environment interacts with CRHR1 rs17689918 to predict mental health and behavioral outcomes. Prog Neuropsychopharmacol Biol Psychiatry 86:45–51. doi:10.1016/j.pnpbp.2018.05.004

      Tang Y, Benusiglio D, Lefevre A, Hilfiger L, Althammer F, Bludau A, Hagiwara D, Baudon A, Darbon P, Schimmer J, Kirchner MK, Roy RK, Wang S, Eliava M, Wagner S, Oberhuber M, Conzelmann KK, Schwarz M, Stern JE, Leng G, Neumann ID, Charlet A, Grinevich V. 2020. Social touch promotes interfemale communication via activation of parvocellular oxytocin neurons. Nat Neurosci 23:1125–1137. doi:10.1038/s41593-020-0674-y

      Thirtamara Rajamani K, Barbier M, Lefevre A, Niblo K, Cordero N, Netser S, Grinevich V, Wagner S, Harony-Nicolas H. 2024. Oxytocin activity in the paraventricular and supramammillary nuclei of the hypothalamus is essential for social recognition memory in rats. Mol Psychiatry 29:412–424. doi:10.1038/s41380-023-02336-0

      Valentino RJ, Van Bockstaele E, Bangasser D. 2013. Sex-specific cell signaling: the corticotropin-releasing factor receptor model. Trends Pharmacol Sci 34:437–444. doi:10.1016/j.tips.2013.06.004

      Van Pett K, Viau V, Bittencourt JC, Chan RK, Li HY, Arias C, Prins GS, Perrin M, Vale W, Sawchenko PE. 2000. Distribution of mRNAs encoding CRF receptors in brain and pituitary of rat and mouse. J Comp Neurol 428:191–212. doi:10.1002/1096-9861(20001211)428:2<191::aid-cne1>3.0.co;2-u

      Wamsteeker Cusulin JI, Füzesi T, Inoue W, Bains JS. 2013. Glucocorticoid feedback uncovers retrograde opioid signaling at hypothalamic synapses. Nat Neurosci 16:596–604. doi:10.1038/nn.3374

      Weber H, Richter J, Straube B, Lueken U, Domschke K, Schartner C, Klauke B, Baumann C, Pané-Farré C, Jacob CP, Scholz C-J, Zwanzger P, Lang T, Fehm L, Jansen A, Konrad C, Fydrich T, Wittmann A, Pfleiderer B, Ströhle A, Gerlach AL, Alpers GW, Arolt V, Pauli P, Wittchen H-U, Kent L, Hamm A, Kircher T, Deckert J, Reif A. 2016. Allelic variation in CRHR1 predisposes to panic disorder: evidence for biased fear processing. Mol Psychiatry 21:813–822. doi:10.1038/mp.2015.125

      Zeng P-Y, Tsai Y-H, Lee C-L, Ma Y-K, Kuo T-H. 2023. Minimal influence of estrous cycle on studies of female mouse behaviors. Front Mol Neurosci 16:1146109. doi:10.3389/fnmol.2023.1146109

      Zhao W, Li Q, Ma Y, Wang Z, Fan B, Zhai X, Hu M, Wang Q, Zhang M, Zhang C, Qin Y, Sha S, Gan Z, Ye F, Xia Y, Zhang G, Yang L, Zou S, Xu Z, Xia S, Yu Y, Abdul M, Yang J-X, Cao J-L, Zhou F, Zhang H. 2021. Behaviors Related to Psychiatric Disorders and Pain Perception in C57BL/6J Mice During Different Phases of Estrous Cycle. Front Neurosci 15:650793. doi:10.3389/fnins.2021.650793

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Comment 1: This manuscript from Clayton and co-authors, entitled “Mechanism of dimer selectivity and binding cooperativity of BRAF inhibitors”, aims to clarify the molecular mechanism of BRAF dimer selectivity. Indeed, first-generation BRAF inhibitors, targeting monomeric BRAF<sup>V600E</sup>, are ineffective in treating resistant dimeric BRAF isoforms. Here, the authors employed molecular dynamics simulations to study the conformational dynamics of monomeric and dimeric BRAF, in the presence and absence of inhibitors. Multi-microsecond MD simulations showed an inward shift of the αC helix in the BRAF<sup>V600E</sup> mutant dimer. This helped in identifying a hydrogen bond between the inhibitors and the BRAF residue Glu501 as critical for dimer compatibility. The stability of the aforementioned interaction seems to be important to distinguish between dimer-selective and equipotent inhibitors.

      The study is overall valuable and robust. The authors used the recently developed particle mesh Ewald constant pH molecular dynamics, a state-of-the-art method, to investigate the correct histidine protonation considering the dynamics of the protein. Then, multi-microsecond simulations showed differences in the flexibility of the αC helix and DFG motif. The dimerization restricts the αC position in the inward conformation, in agreement with the result that dimer-compatible inhibitors can stabilize the αC-in state. Noteworthy, the MD simulations were used to study the interactions between the inhibitors and the protein, suggesting a critical role for a hydrogen bond with Glu501. Finally, simulations of a mixed state of BRAF (one protomer bound to the inhibitor and the other apo) indicate that the ability to stabilize the inward αC state of the apo protomer could be at the basis of the positive cooperativity of PHI1.

      We thank the reviewer for the positive evaluation of our work.

      Comment 2a: Regarding the analyses of the mixed state simulations, the DFG dihedral probability densities for the apo protomer (Fig. 5a right) are highly overlapping. It is not convincing that a slight shift can support the conclusion that the binding in one protomer is enough to shift the DFG motif outward allosterically. Moreover, the DFG dihedral time-series for the apo protomer (Supplementary Figure 9) clearly shows that the measured quantities are affected by significant fluctuations and poor consistency between the three replicates. The apo protomer of the mixed state simulations could be affected by the same problem that the authors pointed out in the case of the apo dimer simulations, where the amount of sampling is insufficient to model the DFG-out/-in transition properly.

      While the reviewer is correct that there are large fluctuations in the DFG pseudo dihedral over the course of the apo simulations, these fluctuations occur primarily in the first 2 µs of the simulations, which were removed from our analysis. The reviewer is also correct that these simulations do not sufficiently model the DFG-out/-in transition; however, a full transition is not necessary for our analysis, as we are only interested in the shift of the DFG pseudo dihedral. As to the reviewer’s comment on the overlapping DFG distributions, we agree that the difference is very subtle. We revised the text.

      On page 9, second paragraph from the bottom:

      “While PHI1 or LY binding clearly perturbs the αC helix of the opposite apo protomer, the effect on the DFG conformation is less clear when comparing the DFG dihedral distribution of the apo protomer in the PHI1 or LY-mixed dimer with that of the apo dimer (blue, orange, and grey, Figure 5a right). All three distributions are broad, covering a range of 160-330°. It appears that, relative to the apo dimer, the DFG of the apo protomer in the PHI1-mixed dimer is slightly shifted to the right, whereas that of the LY-mixed dimer is slightly shifted to the left; however, these differences are very subtle and warrant further investigation in future studies.”
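
      As an illustration of how such a shift could be quantified, the following is a minimal sketch, not the analysis pipeline used in the manuscript. It assumes a time series of (time in µs, pseudo dihedral in degrees) pairs, discards the first 2 µs as described in the response above, and compares circular means between two trajectories. The function names and all numerical values are hypothetical.

```python
import math

def equilibrated(series, discard_us=2.0):
    """Drop frames before `discard_us` and wrap angles into [0, 360)."""
    return [angle % 360.0 for t, angle in series if t >= discard_us]

def circular_mean_deg(angles_deg):
    """Mean direction of a set of angles, returned in [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

def shift_deg(a, b):
    """Signed smallest rotation from angle b to angle a, in (-180, 180]."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d

# Hypothetical (time in µs, DFG pseudo dihedral in degrees) samples.
# These values are illustrative and do NOT come from the simulations.
apo_dimer = [(0.5, 100.0), (1.5, 120.0), (2.5, 240.0), (3.5, 260.0)]
mixed_apo_protomer = [(0.5, 90.0), (2.5, 250.0), (3.5, 270.0)]

apo_mean = circular_mean_deg(equilibrated(apo_dimer))
mixed_mean = circular_mean_deg(equilibrated(mixed_apo_protomer))
shift = shift_deg(mixed_mean, apo_mean)

print(f"apo dimer mean: {apo_mean:.1f} deg")
print(f"mixed-dimer apo protomer mean: {mixed_mean:.1f} deg")
print(f"shift: {shift:+.1f} deg")
```

      A circular mean is used here because dihedral angles are periodic. For distributions as broad as those described above, a single summary angle is only a rough guide, and comparing the full histograms across replicates remains more informative.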

      Comment 2b: There is similar concern with the Lys483-Glu501 salt bridge measured for the apo protomers of the mixed simulations. As it can be observed from the probabilities bar plot (Fig. 5a middle), the standard deviation is too high to support a significant role for this interaction in the allosteric modulation of the apo protomer.

      As for the salt bridge, the fluctuation in the apo dimer and LY-mixed dimer is indeed large, and together with the lower average probability suggests that the salt bridge is weaker, which is consistent with the αC helix moving outward. To clarify this, we revised the text.

      On page 9, second paragraph from the bottom:

      “Consistent with the inward shift of the αC helix, the Glu501–Lys483 salt bridge has a lower average probability and a larger fluctuation in the apo dimer and the apo protomer of the LY-mixed dimer, as compared to the apo protomer of the PHI1-mixed dimer.”

      Reviewer #2 (Public review):

      Comment 1: The authors employ molecular dynamics simulations to understand the selectivity of FDA approved inhibitors within dimeric and monomeric BRAF species. Through these comprehensive simulations, they shed light on the selectivity of BRAF inhibitors by delineating the main structural changes occurring during dimerization and inhibitor action. Notably, they identify the two pivotal elements in this process: the movement and conformational changes involving the alpha-C helix and the formation of a hydrogen bond involving the Glu-501 residue. These findings find support in the analyses of various structures crystallized from dimers and co-crystallized monomers in the presence of inhibitors. The elucidation of this mechanism holds significant potential for advancing our understanding of kinase signalling and the development of future BRAF inhibitor drugs.

      The authors employ a diverse array of computational techniques to characterize the binding sites and interactions between inhibitors and the active site of BRAF in both dimeric and monomeric forms. They combine traditional and advanced molecular dynamics simulation techniques such as CpHMD (all-atom continuous constant pH molecular dynamics) to provide mechanistic explanations. Additionally, the paper introduces methods for identifying and characterizing the formation of the hydrogen bond involving the Glu501 residue without the need for extensive molecular dynamics simulations. This approach facilitates the rapid identification of future BRAF inhibitor candidates.

      We thank the reviewer for the positive evaluation of our work.

      Comment 2: Although the use of molecular dynamics yields crucial structural insights and outlines a mechanism to elucidate dimer selectivity and cooperativity in these systems, the authors could consider adoption of free energy methods to estimate the values of hydrogen bond energies and hydrophobic interactions, thereby enhancing the depth of their analysis.

      As mentioned in our previous response, current free energy methods are capable of giving accurate estimates of the relative binding free energies of similar ligands; however, accurate calculations of the absolute free energies of hydrogen bond and hydrophobic interactions are not feasible yet. Thus, we decided not to pursue the calculations.

      Reviewer #1 (Recommendations to author):

      Comment 1: It would be useful to cite all supplementary figures in the main text (where relevant). In the present version, only Supplementary Figures 2, 3, and 4 are cited in the main text.

      This was an oversight; supplementary figures 5 through 9 are now cited in the text, to point to the time-series of the quantity discussed. We note that supplementary figures 10 and 11 show the time-series of the root mean squared deviation (RMSD) of each protomer in all monomeric and dimeric simulations; these quantities are not discussed in the manuscript but are provided for further insight.

      Comment 2: It is unclear whether the present data could support a direct involvement of the DFG movement in the allosteric mechanism proposed. The same argument applies to the Lys483-Glu501 interaction in the apo protomer of the mixed state simulations. The current simulation data could only support a different stabilization of the αC-helix position. The authors should either remove/tone down the claim or extend the simulations to sample a “converged” distribution of the DFG dihedral and the Lys483-Glu501 salt bridge of the apo protomers.

      We agree that the DFG change in the apo protomer of the PHI1-mixed dimer is very subtle (see our response and revision to comment 2); however, the allosteric involvement of DFG is clearly demonstrated in Figure 5 (right panel in 5a and 5b). We compare three states: apo protomer in the mixed dimer, PHI1-bound protomer in the mixed dimer, and holo dimer (i.e., with two PHI1). Binding of the first PHI1 restricts the DFG conformation to the larger DFG dihedrals (blue curves in the top and bottom right panels). This effect (DFG outward and more restricted) is even stronger when the second PHI1 binds, locking the DFG in both protomers to a narrow dihedral range of 270–330° (green and blue curves in Figure 5b, right panel). These are allosteric effects, demonstrating that binding of the second PHI1 induces a conformational change of the DFG in the first protomer. This is why in Figure 6, the DFG of the PHI1-bound protomer in the mixed dimer is labeled as “almost out”, while the DFG in the holo dimer is labeled as “fully out”.

      The effect of the second PHI1 on the DFG of the first protomer is consistent with its effect on the αC helix position, in which case the second PHI1 induces an inward movement of the αC of the first protomer (illustrated as “fully in” in the schematic Figure 6). Through the αC movement, the salt-bridge strength is affected, as we discussed in our response and revision to Reviewer’s comment 2a. To clarify these points, we revised the discussion of Figure 5. We made the x-axis range of the DFG dihedral distributions the same between the top and bottom panels in Figure 5. To remove the claim of a priming effect on DFG, we revised Figure 6.

      Page 10, Figure 5:

      we made the x axis range of the DFG dihedral distributions on the top and bottom panels the same to facilitate comparison.

      Page 11, second and third paragraphs:

      “Consistent with the change in the DFG conformation between the holo (two inhibitor) and apo dimers (Figure 3c,3f), DFG is rigidified upon binding of the first inhibitor, as evident from the narrower DFG dihedral distribution of the PHI1 or LY-bound protomer in the mixed protomer (Figure 5b right) compared to the apo protomer in the mixed dimer (Figure 5a right). Importantly, the DFG dihedral is right shifted in the occupied vs. apo protomer, demonstrating that the inhibitor pushes the DFG outward.”

      “Consistent with the effect of the second PHI1 on the αC position of the first PHI1-bound protomer, binding of the second PHI1 shifts the peak of the DFG distribution for both protomers further outward, as shown by the 30° larger DFG pseudo dihedral in the holo dimer relative to the mixed dimer (green and blue in Figure 5b right; Supplementary Figures 6,9). In contrast, there is no significant difference in the DFG pseudo dihedral between the LY-mixed and holo dimers. These data suggest that while the binding of the first PHI1 pushes the DFG outward, binding of the second PHI1 has an allosteric effect, shifting the DFG of the opposite protomer further outward.”

      On page 12, the last paragraph of Conclusion, we remove the claim of the priming effect for DFG:

      “The first PHI1 binding in the BRAF<sup>V600E</sup> dimer restricts the motion of the αC helix and DFG, shifting them slightly inward and outward, respectively (Figure 6, bottom right panel). Intriguingly, the first PHI1 binding primes the apo protomer by making the αC more favorable for binding, i.e., shifting the αC inward (Figure 6, bottom right panel). Importantly, upon binding the second PHI1, the αC helix is shifted further inward and the DFG is shifted further outward in both protomers.”

      On page 13, Figure 6:

      we removed the label “slightly outward” for DFG.

      Comment 3: An alternative approach could be using enhanced sampling methods to enhance the diffusion along these coordinates.

      We thank the reviewer for bringing up this point. While the allostery and cooperativity effects are apparent from our simulation data, we agree that enhanced sampling methods in principle could be used to further converge the conformational sampling; however, these approaches face significant challenges. First, the BRAF dimer is weakly associated, with the αC helix forming a part of the dimer interface. Enhanced sampling of the αC helix would likely result in dimer dissociation. On the other hand, simply using RMSD as a reaction coordinate or progress variable would not necessarily enhance the motion of the αC helix, DFG, or activation loop, which are all coupled. Second, our extensive simulations of a monomer kinase with metadynamics demonstrated that the kinase conformation becomes distorted when a biasing potential is placed to enhance the motion of DFG. This is likely because the other parts of the protein do not have enough time to relax to accommodate the conformational change. To our knowledge, this aspect has not been discussed in the current metadynamics literature, which focuses on the free energy differences and (local) conformational changes along the reaction coordinate. To clarify these points, we added a discussion.

      Page 6, end of the first paragraph:

      “We note that enhanced sampling methods were not used due to several challenges. First, the BRAF dimer is weakly associated, with αC helix forming a part of the dimer interface (Figure 1a). Enhanced sampling (particularly of αC helix) would likely lead to dimer dissociation. Second, biased sampling methods such as metadynamics may lead to unrealistic conformational states due to the slow relaxation of some parts of the protein to accommodate the conformational change directed by the reaction coordinate. For example, our unpublished metadynamics simulations of a monomer kinase showed that enhancing the DFG conformational change resulted in distortion of the kinase structure.”

      We thank the reviewers again for their valuable comments. We believe our revision has further elevated the quality of the manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors test the "OHC-fluid-pump" hypothesis by assaying the rates of kainic acid dispersal both in quiet and in cochleae stimulated by sounds of different levels and spectral content. The main result is that sound (and thus, presumably, OHC contractions and expansions) results in faster transport along the duct. OHC involvement is corroborated using salicylate, which yielded results similar to silence. Especially interesting is the fact that some stimuli (e.g., tones) seem to provide better/faster pumping than others (e.g., noise), ostensibly due to the phase profile of the resulting cochlear traveling-wave response.

      Strengths:

      The experiments appear well controlled and the results are novel and interesting. Some elegant cochlear modeling that includes coupling between the organ of Corti and the surrounding fluid as well as advective flow supports the proposed mechanism.

      The current limitations and future directions of the study, including possible experimental tests, extensions of the modeling work, and practical applications to drug delivery, are thoughtfully discussed.

      Weaknesses:

      Although the authors provide compelling evidence that OHC motility can usefully pump fluid, their claim (last sentence of the Abstract) that wideband OHC motility (i.e., motility in the "tail" region of the traveling wave) evolved for the purposes of circulating fluid (rather than emerging, say, as a happy by-product of OHC motility that evolved for other reasons) seems too strong.

      We adjusted our tone to be less assertive.

      Our measurements and simulations coherently suggest that active outer hair cells in the tail region of cochlear traveling waves drive cochlear fluid circulation.

      Reviewer #2 (Public review):

      Although recent cochlear micromechanical measurements in living animals have shown that outer hair cells drive broadband vibration of the reticular lamina, the role of this vibration in cochlear fluid circulation remains unknown. The authors hypothesized that motile outer hair cells may facilitate cochlear fluid circulation. To test this hypothesis, they investigated the effects of acoustic stimuli and salicylate, an outer hair cell motility blocker, on kainic acid-induced changes in the cochlear nucleus activities. The results demonstrated that acoustic stimuli reduced the latency of the kainic acid effect, with low-frequency tones being more effective than broadband noise. Salicylate reduced the effect of acoustic stimuli on kainic acid-induced changes. The authors also developed a computational model to provide a physical framework for interpreting experimental results. Their combined experimental and simulated results indicate that broadband outer hair cell action serves to drive cochlear fluid circulation.

      The major strengths of this study lie in its high significance and the synergistic use of electrophysiological recording of the cochlear nucleus responses alongside computational modeling. Cochlear outer hair cells have long been believed to be responsible for the exceptional sensitivity, sharp tuning, and huge dynamic range of mammalian hearing. However, recent observations of the broadband reticular lamina vibration contradict the widely accepted view of frequency-specific cochlear amplification. Furthermore, there is currently no effective noninvasive method to deliver drugs or genes to the cochlea, a crucial need for treating sensorineural hearing loss, one of the most common auditory disorders. This study addresses these important questions by observing outer hair cells' roles in the cochlear transport of kainic acid. The well-established electrophysiological method used to record cochlear nucleus responses produced valuable new data, and the custom-developed computational model greatly enhanced the interpretation of the experimental results.

      The authors successfully tested their hypothesis, with both the experimental and modeling results supporting the conclusion that active outer hair cells can enhance cochlear fluid circulation in the living cochlea.

      The findings from this study can potentially be applied for treating sensorineural hearing loss and advance our understanding of how outer hair cells contribute to cochlear amplification and normal hearing.

      Reviewer #3 (Public review):

      Summary:

      This study reveals that sound exposure enhances drug delivery to the cochlea through the nonselective action of outer hair cells. The efficiency of sound-facilitated drug delivery is reduced when outer hair cell motility is inhibited. Additionally, low-frequency tones were found to be more effective than broadband noise for targeting substances to the cochlear apex. Computational model simulations support these findings.

      Strengths:

      The study provides compelling evidence that the broad action of outer hair cells is crucial for cochlear fluid circulation, offering a novel perspective on their function beyond frequency-selective amplification. Furthermore, these results could offer potential strategies for targeting and optimizing drug delivery throughout the cochlear spiral.

      Weaknesses:

      The primary weakness of this paper lies in the surgical procedure used for drug administration through the round window. Opening the cochlea can alter intracochlear pressure and disrupt the traveling wave from sound, a key factor influencing outer hair cell activity. However, the authors do not provide sufficient details on how they managed this issue during surgery. Additionally, the introduction section needs further development to better explain the background and emphasize the significance of the work.

      Comments on revisions:

      Thank you for addressing the comments and concerns. The author has responded to all points thoroughly and clarified them well. However, please include the key points from the responses to the comments (Introduction ((3), (5)) and Results ((5)) into the manuscript. While the explanations in the response letter are reasonable, the current descriptions in the manuscript may limit the reader's understanding. Expanding on these points in the Introduction, Results, or Discussion sections would enhance clarity and comprehensiveness.

      Introduction (3): As inner-ear fluid homeostasis is maintained locally, longitudinal electro-chemical gradients, including the endocochlear potential, may vary along the cochlear length (Schulte and Schmiedt 1992; Sadanaga and Morimitsu 1995; Hirose and Liberman 2003).

      Introduction (5): We do not want to distract the readers from the primary message by discussing different drug delivery methods into the inner ear. This paper is regarding active outer hair cells’ new role as the title suggests. An extensive discussion of drug delivery can confuse the theme of this work.

      Results (5): High frequencies were not tested because they would not affect drug delivery to the apex of the cochlea (i.e., the traveling waves stop near the CF location.)

    1. Author response:

      We thank the three anonymous reviewers who took the time to read and evaluate our work. We look forward to submitting a revised version of the manuscript that addresses their comments.

      We agree with the reviewers that missing genes and incomplete genome assemblies can be challenges when trying to make interspecies comparisons in a complex and repetitive region like the MHC. Our revised manuscript will include more discussion of this topic, and we look forward to future work on this region that considers the next generation of complete telomere-to-telomere genomes with long-read sequencing.

      Repeating this analysis with other gene families—immune and non-immune—is a great idea. While outside of the scope of this work, this will provide many opportunities for comparison and help tease apart the features that make this family unique.

      We also point readers to our companion paper, Ancient Trans-Species Polymorphism at the Major Histocompatibility Complex in Primates, which tackles different (but related) questions about long-term balancing selection in the primate MHC and also summarizes relevant past work in the area. This second paper addresses some questions raised by reviewers here.

    1. Author response:

      Reviewer #1 (Public review):

      The authors present their new bioinformatic tool called TEKRABber, and use it to correlate expression between KRAB ZNFs and TEs across different brain tissues, and across species. While the aims of the authors are clear and there would be significant interest from other researchers in the field for a program that can do such correlative gene expression analysis across individual genomes and species, the presented approach and work display significant shortcomings. In the current state of the analysis pipeline, the biases and shortcomings mentioned below, for which I have seen no proof that they are accounted for by the authors, are severely impacting the presented results and conclusions. It is therefore essential that the points below are addressed, involving significant changes in the TEKRABber program as well as the analysis pipeline, to prevent the identification of false positive and negative signals, that would severely affect the conclusions one can raise about the analysis.

      Thank you very much for the insightful review of our manuscript.

      My main concerns are provided below:

      (1) One important shortcoming of the biocomputational approach is that most TEs are not actually expressed, and others (Alus) are not a proxy of the activity of the TE class at all. I will explain: While specific TE classes can act as (species-specific) promoters for genes (such as LTRs) or are expressed as TE derived transcripts (LINEs, SVAs), the majority of other older TE classes do not have such behavior and are either neutral to the genome or may have some enhancer activity (as mapped in the program they refer to, 'TEffectR'). A big focus is on Alus, but Alus contribute to a transcriptome in a different way too: They often become part of transcripts due to alternative splicing. As such, the presence of Alu derived transcripts is not a proxy for the expression/activity of the Alu class, but rather a result of some Alus being part of gene transcripts (see also next point). The bottom line is that the TEKRABber software/approach is heavily prone to picking up both false positives (TEs being part of transcribed loci) and false negatives (TEs not producing any transcripts at all), which has a big implication for how reads from TEs as done in this study should be interpreted: The TE expression used to correlate the KRAB ZNF expression is simply not representing the species-specific influences of TEs that the authors are after.

      With the strategy as described, a lot of TE expression is misinterpreted: TEs can be part of gene-derived transcripts due to alternative splicing (often happens for Alus) or as a result of the TE being present in an inefficiently spliced out intron (happens a lot) which leads to TE-derived reads as a result of that TE being part of that intron, rather than that TE being actively expressed. As a result, the data as analysed is not reliably indicating the expression of TEs (as the authors intend to) and should be filtered for any reads that are coming from the above scenarios: These reads have nothing to do with KRAB ZNF control, and are not representing actively expressed TEs and therefore should be removed. Given that from my lab's experience in the brain (and other) tissues, the proportion of RNA sequencing reads that are actually derived from active TEs is a stark minority compared to reads derived from TEs that happen to be in any of the many transcribed loci, applying this filtering is expected to have a huge impact on the results and conclusions of this study.

      We sincerely thank the reviewer for highlighting the potential issues of false positives and negatives in TE quantification. The reviewer provided valuable examples of how different TE classes, such as Alus, LTRs, LINEs, and SVAs, exhibit distinct behaviors in the genome. To our knowledge, specific tools like ERVmap (Tokuyama et al., 2018), which annotates ERVs, and LtrDetector (Joseph et al., 2019), which uses k-mer distributions to quantify LTRs, could indeed enhance precision by treating specific TE classes individually. We acknowledge that such approaches may yield more accurate results and appreciate the suggestion.

      In our study, we used TEtranscripts (Jin et al., 2015) prior to TEKRABber. TEtranscripts applies the Expectation Maximization (EM) algorithm to assign ambiguous reads in the following steps. Uniquely mapped reads are first assigned to genes, and reads overlapping genes and TEs are assigned to TEs only if they do not uniquely match an annotated gene. The remaining ambiguous reads are distributed based on EM iterations. While this approach may not be as specialized as the latest tools for specific TE classes, it provides a general overview of TE activity. TEtranscripts outputs subfamily-level TE expression data, which we used as input for TEKRABber to perform downstream analyses such as differential expression and correlation studies.
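
      The final step of this description, distributing ambiguous reads by EM, can be illustrated with a toy loop. This is a minimal sketch under stated assumptions and not TEtranscripts' actual implementation: the function name `em_assign`, the input layout, the pseudocount, and the fixed iteration count are all hypothetical choices made for illustration, and the earlier gene-versus-TE assignment rules are not modelled.

```python
from collections import defaultdict


def em_assign(unique_counts, ambiguous_reads, n_iter=50):
    """Toy EM: share multi-mapped reads among candidate TE subfamilies.

    unique_counts   -- dict: subfamily -> number of uniquely assigned reads
    ambiguous_reads -- list of tuples; each tuple lists the candidate
                       subfamilies of one ambiguous read
    Returns a dict: subfamily -> estimated (fractional) read count.
    """
    subfamilies = set(unique_counts)
    for candidates in ambiguous_reads:
        subfamilies.update(candidates)

    # Start from the unique counts plus a tiny pseudocount (assumption)
    # so that no candidate begins with zero weight.
    abundance = {s: unique_counts.get(s, 0) + 1e-6 for s in subfamilies}

    for _ in range(n_iter):
        expected = defaultdict(float)
        # E-step: split each ambiguous read in proportion to the
        # current abundance estimates of its candidates.
        for candidates in ambiguous_reads:
            total = sum(abundance[s] for s in candidates)
            for s in candidates:
                expected[s] += abundance[s] / total
        # M-step: unique reads plus the expected share of ambiguous reads.
        abundance = {s: unique_counts.get(s, 0) + expected[s]
                     for s in subfamilies}
    return abundance


if __name__ == "__main__":
    counts = em_assign({"AluY": 30, "L1HS": 10}, [("AluY", "L1HS")] * 4)
    for name in sorted(counts):
        print(name, round(counts[name], 3))
```

      In this toy example the four ambiguous reads are split in proportion to the unique evidence, so the total read count is preserved while the better-supported subfamily receives the larger share.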

      We understand the importance of adapting tools to specific research objectives, including focusing on particular TE classes. TEKRABber is designed not to refine TE quantification at the mapping stage but to flexibly handle outputs from various TE quantification tools. It accepts raw TE counts as input in the form of dataframes, enabling diverse analytical pipelines. In the revised version of our manuscript, we will emphasize this distinction in the discussion and provide examples of how TEKRABber can integrate with other tools to enhance specificity and accuracy.

      (2) Another potential problem that I don't see addressed is that due to the high level of similarity of the many hundreds of KRAB ZNF genes in primates and the reads derived from them, and the inaccurate annotations of many KZNFs in non-human genomes, the expression data derived from RNA-seq datasets cannot be simply used to plot KZNF expression values, without significant work and manual curation to safeguard proper cross species ortholog-annotation: The work of Thomas and Schneider (2011) has studied this in great detail but genome-assemblies of non-human primates tend to be highly inaccurate in appointing the right ortholog of human ZNF genes. The problem becomes even bigger when RNA-sequencing reads are analyzed: RNA-sequencing reads from a human ZNF that emerged in great apes by duplication from an older parental gene (we have a decent number of those in the human genome) may be mapped to that older parental gene in Macaque genome: So, the expression of human-specific ZNF-B, that derived from the parental ZNF-A, is likely to be compared in their DESeq to the expression of ZNF-A in Macaque RNA-seq data. In other words, without a significant amount of manual curation, the DE-seq analysis is prone to lead to false comparisons which make the strategy and KRABber software approach described highly biased and unreliable.

      There is no doubt that there are differences in expression and activity of KRAB-ZNFs and TEs respectively that may have had important evolutionary consequences. However, because all of the network analyses in this paper rely on the analyses of RNA-seq data and the processing through the TE-KRABber software with the shortcomings and potential biases that I mentioned above, I need to emphasize that the results and conclusions are likely to be significantly different if the appropriate measures are taken to get more accurate and curated TE and KRAB ZNF expression data.

      We thank the reviewer for raising the important issue of accurately annotating the expanded repertoire of KRAB-ZNFs in primates, particularly the challenges of cross-species orthology and potential biases in RNA-seq data analysis. Indeed, we have also addressed this challenge in some of our previous papers (Nowick et al., 2010, Nowick et al., 2011 and Jovanovic et al., 2021).

      In the revised manuscript, we will include more details about our two-step strategy to ensure accurate KRAB-ZNF ortholog assignments. First, we employed the Gene Order Conservation (GOC) score from Ensembl BioMart as a primary filter, selecting only one-to-one orthologs with a GOC score above 75% across primates. This threshold, recommended in Ensembl’s ortholog quality control guidelines, ensures high-confidence orthology relationships (http://www.ensembl.org/info/genome/compara/Ortholog_qc_manual.html#goc).
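
      The first filtering step can be sketched as follows. This is a minimal illustration, assuming the ortholog table has already been exported to a list of dictionaries; the field names (`gene_id`, `orthology_type`, `goc_score`), the label `one2one`, and the example records are hypothetical placeholders rather than actual Ensembl BioMart attribute names or real genes. The optional `validated_ids` argument stands in for an independently curated ortholog list.

```python
def filter_orthologs(rows, min_goc=75, validated_ids=None):
    """Keep one-to-one orthologs with a GOC score above `min_goc`.

    rows          -- iterable of dicts with hypothetical keys
                     'gene_id', 'orthology_type', 'goc_score'
    validated_ids -- optional set of gene IDs from an independent
                     curation; if given, only these IDs are retained
    """
    kept = []
    for row in rows:
        if row["orthology_type"] != "one2one":
            continue  # drop one-to-many and many-to-many relationships
        goc = row.get("goc_score")
        if goc is None or goc <= min_goc:
            continue  # drop missing or low gene-order conservation
        if validated_ids is not None and row["gene_id"] not in validated_ids:
            continue  # drop genes absent from the curated list
        kept.append(row)
    return kept


if __name__ == "__main__":
    table = [
        {"gene_id": "GENE_A", "orthology_type": "one2one", "goc_score": 100},
        {"gene_id": "GENE_B", "orthology_type": "one2many", "goc_score": 100},
        {"gene_id": "GENE_C", "orthology_type": "one2one", "goc_score": 50},
        {"gene_id": "GENE_D", "orthology_type": "one2one", "goc_score": None},
    ]
    print([r["gene_id"] for r in filter_orthologs(table)])
```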

      Second, we incorporated data from Jovanovic et al. (2021), which independently validated KRAB-ZNF orthologs across 27 primate genomes. This additional layer of validation allowed us to refine our dataset, resulting in the identification of 337 orthologous KRAB-ZNFs for differential expression analysis (Figure S2).

      We acknowledge that different annotation methods or criteria may for some genes yield variations in the identified orthologs. However, we believe that this combination provides a robust starting point for addressing the challenges raised, while we remain open to additional refinements in future analyses.

      (3) The association with certain variations in ZNF genes with neurological disorders such as AD, as reported in the introduction is not entirely convincing without further functional support. Such associations could merely happen by chance, given the high number of ZNF genes in the human genome and the high chance that variations in these loci happen to associate with certain disease-associated traits. So using these associations as an argument that changes in TEs and KRAB ZNF networks are important for diseases like AD should be used with much more caution.

      There are a number of papers where KRAB ZNF and TE expression are analysed in parallel in human brain tissues. So the novelty of that aspect of the presented study may be limited.

      We fully acknowledge the concern that, given the large number of KRAB-ZNFs and their inherent variability, some associations with AD or other neurological disorders could occur by chance. This highlights the importance of additional functional studies to validate the causal role of KRAB-ZNF and TE interactions in disease contexts. While previous studies have indeed analyzed KRAB-ZNF and TE expression in human brain tissues, our study seeks to expand on this foundation by incorporating interspecies comparisons across primates. This approach enabled us to identify TE:KRAB-ZNF pairs that are uniquely present in healthy human brains, which may provide insights into their potential evolutionary significance and relevance to diseases like AD.

      In addition to analyzing RNA-seq data (GSE127898 and syn5550404), we have cross-validated our findings using ChIP-exo data for 159 KRAB-ZNF proteins and their TE binding regions in human (Imbeault et al., 2017). This allowed us to identify specific binding events between KRAB-ZNF and TE pairs, providing further support for the observed associations. We agree with the reviewer that additional experimental validations, such as functional studies, are critical to further establish the role of KRAB-ZNF and TE networks in AD. We hope that future research can build upon our findings to explore these associations in greater detail.

      Reviewer #2 (Public review):

      Summary:

      The aim was to decipher the regulatory networks of KRAB-ZNFs and TEs that have changed during human brain evolution and in Alzheimer's disease.

      Strengths:

      This solid study presents a valuable analysis and successfully confirms previous assumptions, but also goes beyond the current state of the art.

      Weaknesses:

      The design of the analysis needs to be slightly modified and a more in-depth analysis of the positive correlation cases would be beneficial. Some of the conclusions need to be reinterpreted.

      We sincerely thank the reviewer for the thoughtful summary, positive evaluation of our study, and constructive feedback. We appreciate the recognition of the strengths in our analysis and the valuable suggestions for improving its design and interpretation.

      We would like to briefly comment on the suggested modifications to the design here, and will provide a detailed point-by-point review later with our revised manuscript.

      The reviewer recommended considering a more recent timepoint, such as less than 25 million years ago (mya), to define the "evolutionary young group" of KRAB-ZNF genes and TEs when discussing the arms-race theory. This is indeed a valuable perspective, as the TE repressing functions by KRAB-ZNF proteins may have evolved more recently than the split between Old World Monkeys (OWM) and New World Monkeys (NWM) at 44.2 mya we used.

      Our rationale for selecting 44.2 mya is based on certain primate-specific TEs such as the Alu subfamilies, which emerged after the rise of Simiiformes and have been used in phylogenetic studies (Xing et al., 2007 and Williams et al., 2010). This timeframe allowed us to investigate the potential co-evolution of KRAB-ZNFs and TEs in species that emerged after the OWM-NWM split (e.g., human, chimpanzee, bonobos, and macaques used for this study). However, focusing only on KRAB-ZNFs and TEs younger than 25 million years would limit the analysis to just 9 KRAB-ZNFs and 92 TEs expressed in our datasets. While we will not conduct a reanalysis using this more recent timepoint, we will integrate the recommendation into the discussion section of the revised manuscript.

      Furthermore, we greatly appreciate the reviewer's detailed insights and suggestions for refining specific descriptions and interpretations in our manuscript. We will address these points in the revised version to ensure the content is presented with greater precision and clarity.

      Once again, we thank both reviewers for their valuable feedback, which provides significant input for strengthening our study.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors present an interesting study using RL and Bayesian modelling to examine differences in learning rate adaptation in conditions of high and low volatility and noise respectively. Through "lesioning" an optimal Bayesian model, they reveal that apparently a suboptimal adaptation of learning rates results from incorrectly detecting volatility in the environment when it is not in fact present.

      Strengths:

      The experimental task used is cleverly designed and does a good job of manipulating both volatility and noise. The modelling approach takes an interesting and creative approach to understanding the source of apparently suboptimal adaptation of learning rates to noise, through carefully "lesioning" an optimal Bayesian model to determine which components are responsible for this behaviour.

      We thank the reviewer for this assessment.

      Weaknesses:

      The study has few substantial weaknesses; the data and modelling both appear robust and informative, and it tackles an interesting question. The model space could potentially have been expanded, particularly with regard to the inclusion of alternative strategies such as those that estimate latent states and adapt learning accordingly.

      We thank the reviewer for this suggestion. We agree that it would be interesting to assess the ability of alternative models to reproduce the sub-optimal choices of participants in this study. The Bayesian Observer Model described in the paper is a form of Hierarchical Gaussian Filter, so we will assess the performance of a different class of models that are able to track uncertainty: RL-based models that are able to capture changes of uncertainty (the Kalman filter, and the model described by Cochran and Cisler, Plos Comp Biol 2019). We will assess the ability of the models to recapitulate the core behaviour of participants (in terms of learning rate adaptation) and, if possible, assess their ability to account for the pupillometry response.
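
      To illustrate how a model of this class adapts its learning rate to uncertainty, a one-dimensional Kalman filter can be sketched as below. This is a generic textbook formulation, not the model fitted in the study; the function name, parameter names, and example values are assumptions chosen for illustration.

```python
def kalman_learning_rates(outcomes, process_var, obs_var,
                          init_mean=0.0, init_var=1.0):
    """Track a drifting mean with a scalar Kalman filter.

    process_var -- variance of the random walk of the hidden mean
                   (plays the role of volatility)
    obs_var     -- variance of the observation noise
    Returns (estimates, learning_rates), one entry per outcome.
    """
    mean, var = init_mean, init_var
    estimates, rates = [], []
    for y in outcomes:
        var += process_var            # predict: drift inflates uncertainty
        gain = var / (var + obs_var)  # Kalman gain acts as the learning rate
        mean += gain * (y - mean)     # move toward the prediction error
        var *= (1.0 - gain)           # observation reduces uncertainty
        estimates.append(mean)
        rates.append(gain)
    return estimates, rates


if __name__ == "__main__":
    data = [0.5] * 50
    _, volatile = kalman_learning_rates(data, process_var=0.1, obs_var=1.0)
    _, stable = kalman_learning_rates(data, process_var=0.01, obs_var=1.0)
    _, noisy = kalman_learning_rates(data, process_var=0.1, obs_var=4.0)
    print(volatile[-1] > stable[-1], noisy[-1] < volatile[-1])
```

      In this formulation the gain rises with the assumed volatility and falls with the assumed noise, which is the normative pattern of learning rate adaptation against which participant behaviour is compared.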

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors aimed to investigate how humans learn and adapt their behavior in dynamic environments characterized by two distinct types of uncertainty: volatility (systematic changes in outcomes) and noise (random variability in outcomes). Specifically, they sought to understand how participants adjust their learning rates in response to changes in these forms of uncertainty.

      To achieve this, the authors employed a two-step approach:

      (1) Reinforcement Learning (RL) Model: They first used an RL model to fit participants' behavior, revealing that the learning rate was context-dependent. In other words, it varied based on the levels of volatility and noise. However, the RL model showed that participants misattributed noise as volatility, leading to higher learning rates in noisy conditions, where the optimal strategy would be to be less sensitive to random fluctuations.

      (2) Bayesian Observer Model (BOM): To better account for this context dependency, they introduced a Bayesian Observer Model (BOM), which models how an ideal Bayesian learner would update their beliefs about environmental uncertainty. They found that a degraded version of the BOM, where the agent had a coarser representation of noise compared to volatility, best fit the participants' behavior. This suggested that participants were not fully distinguishing between noise and volatility, instead treating noise as volatility and adjusting their learning rates accordingly.

      The authors also aimed to use pupillometry data (measuring pupil dilation) as a physiological marker to arbitrate between models and understand how participants' internal representations of uncertainty influenced both their behavior and physiological responses. Their objective was to explore whether the BOM could explain not just behavioral choices but also these physiological responses, thereby providing stronger evidence for the model's validity.

      Overall, the study sought to reconcile approximate rationality in human learning by showing that participants still follow a Bayesian-like learning process, but with simplified internal models that lead to suboptimal decisions in noisy environments.

      Strengths:

      The generative model presented in the study is both innovative and insightful. The authors first employ a Reinforcement Learning (RL) model to fit participants' behavior, revealing that the learning rate is context-dependent; specifically, it varies based on the levels of volatility and noise in the task. They then introduce a Bayesian Observer Model (BOM) to account for this context dependency, ultimately finding that a degraded BOM - in which the agent has a coarser representation of noise compared to volatility - provides the best fit for the participants' behavior. This suggests that participants do not fully distinguish between noise and volatility, leading to the misattribution of noise as volatility. Consequently, participants adopt higher learning rates even in noisy contexts, where an optimal strategy would involve being less sensitive to new information (i.e., using lower learning rates). This finding highlights a rational but approximate learning process, as described in the paper.

      We thank the reviewer for their assessment of the paper.

      Weaknesses:

      While the RL and Bayesian models both successfully predict behavior, it remains unclear how to fully reconcile the two approaches. The RL model captures behavior in terms of a fixed or context-dependent learning rate, while the BOM provides a more nuanced account with dynamic updates based on volatility and noise. Both models can predict actions when fit appropriately, but the pupillometry data offers a promising avenue to arbitrate between the models. However, the current study does not provide a direct comparison between the RL framework and the Bayesian model in terms of how well they explain the pupillometry data. It would be valuable to see whether the RL model can also account for physiological markers of learning, such as pupil responses, or if the BOM offers a unique advantage in this regard. A comparison of the two models using pupillometry data could strengthen the argument for the BOM's superiority, as currently, the possibility that RL models could explain the physiological data remains unexplored.

      We thank the reviewer for this suggestion. In the current version of the paper, we use an extremely simple reinforcement learning model solely to measure the learning rate in each task block (as this is the key behavioural metric we are interested in). As the reviewer highlights, this simple model doesn’t estimate uncertainty or adapt to it. Given this, we don’t think we can directly compare this model to the Bayesian Observer Model. For example, in the current analysis of the pupillometry data we classify individual trials based on the BOM’s estimate of uncertainty and show that participants adapt their learning rate as expected to the reclassified trials; this analysis would not be possible with our current RL model. However, there are more complex RL-based models that do estimate uncertainty (as discussed above in response to Reviewer #1) and so may be more directly compared to the BOM. We will attempt to apply these models to our task data and describe their ability to account for participant behaviour and physiological response, as suggested by the Reviewer.
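
      For readers unfamiliar with this class of model, the following is a minimal sketch of a fixed-learning-rate (delta-rule) learner and of one way a per-block learning rate could be estimated. It is an illustration only and not the authors' implementation: the function names, the grid search, and the assumption that the learner's belief is fit to trial-by-trial reports by squared error are all hypothetical.

```python
def delta_rule_beliefs(outcomes, learning_rate, initial_belief=0.5):
    """Return the belief after each trial under a fixed learning rate."""
    belief = initial_belief
    beliefs = []
    for outcome in outcomes:
        # Move the belief toward the outcome by a fraction of the prediction error.
        belief += learning_rate * (outcome - belief)
        beliefs.append(belief)
    return beliefs


def fit_learning_rate(outcomes, reports):
    """Grid-search the learning rate that minimises squared error to reports.

    Assumption: 'reports' are trial-by-trial estimates on the same scale as
    the model's belief. A real analysis would typically fit choices instead.
    """
    grid = [i / 100 for i in range(1, 101)]

    def squared_error(rate):
        predicted = delta_rule_beliefs(outcomes, rate)
        return sum((p - r) ** 2 for p, r in zip(predicted, reports))

    return min(grid, key=squared_error)


outcomes = [1, 0, 1, 1, 0, 1]                # hypothetical binary outcomes
reports = delta_rule_beliefs(outcomes, 0.3)  # simulated reports, true rate 0.3
print(fit_learning_rate(outcomes, reports))
```

      Note that nothing in this learner represents uncertainty: the learning rate is a single free parameter per block, which is why trial-wise uncertainty estimates cannot be read out of it.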

      The model comparison between the Bayesian Observer Model and the self-defined degraded internal model could be further enhanced. Since different assumptions about the internal model's structure lead to varying levels of model complexity, using a formal criterion such as Bayesian Information Criterion (BIC) or Akaike Information Criterion (AIC) would allow for a more rigorous comparison of model fit. Including such comparisons would ensure that the degraded BOM is not simply favored due to its flexibility or higher complexity, but rather because it genuinely captures the participants' behavioral and physiological data better than alternative models. This would also help address concerns about overfitting and provide a clearer justification for using the degraded BOM over other potential models.

      Thank you, we will add this.
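
      As a reminder of what such a comparison involves, the sketch below computes AIC and BIC from a fitted model's maximised log-likelihood using the standard definitions. The model labels, log-likelihoods, parameter counts, and trial count are hypothetical placeholders and are not values from the study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits: model_b fits slightly better but uses more parameters.
fits = {
    "model_a": {"log_likelihood": -520.0, "n_params": 4},
    "model_b": {"log_likelihood": -518.5, "n_params": 7},
}
n_trials = 400  # hypothetical number of observations

for name, fit in fits.items():
    print(
        name,
        round(aic(fit["log_likelihood"], fit["n_params"]), 2),
        round(bic(fit["log_likelihood"], fit["n_params"], n_trials), 2),
    )
```

      Lower values indicate the preferred model under either criterion. BIC penalises additional parameters more heavily than AIC once the number of observations exceeds about eight.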

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The study dissects distinct pools of diacylglycerol (DAG), continuing a line of research on the central concept that there is a major lipid metabolism DAG pool in cells, but also a smaller signaling DAG pool. It tests the hypothesis that the second pool is regulated by Dip2, which influences Pkc1 signaling. The group shows that stressed yeast increase the specific DAG species C36:0 and C36:1, and proposes that this promotes Pkc1 activation via Pkc1 binding C36:0. The study also examines how perturbing the lipid metabolism DAG pool via various deletions, such as lro1, dga1, and pah1 deletion, impacts DAG and stress signaling. Overall this is an interesting study that adds new data on how different DAG pools influence cellular signaling.

      Strengths:

      The study nicely combined lipidomic profiling with stress signaling biochemistry and yeast growth assays.

      We thank the reviewer for finding this study of interest and appreciating our multi-pronged approach to testing our hypothesis that a distinct pool of DAGs regulated by Dip2 activates PKC signalling.

      Weaknesses:

      One suggestion to improve the study is to examine the spatial organization of Dip2 within cells, and how this impacts its ability to modulate DAG pools. Dip2 has previously been proposed to function at mitochondria-vacuole contacts (Mondal 2022). Examining how Dip2 localization is impacted when different DAG pools are manipulated, such as by deleting Pah1 (also suggested to work at yeast contact sites such as the nucleus-vacuole junction), or by deleting Lro1 or Dga1, would broaden the scope of the study.

      We thank the reviewer for the valuable suggestions regarding the spatial organization of Dip2 in cells under the influence of different DAG pools. As suggested, we will probe the localization of Dip2 in the absence of Pah1. We will also trace the localization of Dip2 in LRO1 and DGA1 deletion strains, where bulk DAGs accumulate, and present the data in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors use yeast genetics, lipidomic and biochemical approaches to demonstrate that the DAG isoforms (36:0 and 36:1) can specifically activate PKC. Further, these DAG isoforms originate from PI and PI(4,5)P2. The authors propose that Psi1-Plc1-Dip2 functions to maintain a normal level of specific DAG species to modulate PKC signalling.

      Strengths:

      Data from yeast genetics are clear and strong. The concept is potentially interesting and novel.

      We would like to thank the reviewer for the positive comments on our work. We are happy to know that the reviewer finds the study novel and interesting.

      Weaknesses:

      More evidence is needed to support the central hypothesis. The authors may consider the following:

      (1) Figure 2: the authors should show/examine C36:1 DAG. Also, some structural evidence would be highly useful here. What is the structural basis for the assertion that the PKC C1 domain can only be activated by C36:0/1 DAG but not other DAGs? This is a critical conclusion of this work and clear evidence is needed.

      We agree with the reviewer that the activation of PKC by C36:0 and C36:1 DAGs is a critical conclusion of our work. While we understand that there is no obvious structural explanation as to how the DAG-binding C1 domain of PKC attains its acyl chain specificity for DAGs, our conclusion that yeast Pkc1 is selective for C36:0 and C36:1 DAGs is supported by a combination of robust in vitro and in vivo data:

      1. In Vitro Evidence: The liposome binding assays demonstrate that the Pkc1 C1 domain only binds the selective DAG and does not interact with bulk DAGs.

      2. In Vivo Evidence: Lipidomic analyses of wild-type cells subjected to cell wall stress reveal increased levels of C36:0 and C36:1 DAGs, while levels of bulk DAGs remain unaffected. This clearly parallels the Dip2 knockout scenario in which the levels of the same set of DAGs go up and Pkc1 gets hyperactivated.

      These findings collectively indicate that Pkc1 neither binds nor is activated by bulk DAGs, reinforcing its specificity for C36:0 and C36:1 DAGs. It is also further corroborated by DGA1 and LRO1 knockouts wherein the increase of the bulk DAGs does not result in a significant increase in Pkc1 signalling.

      Moreover, elucidating the structural basis of this selectivity would require a specific DAG-bound C1 domain structure of Pkc1, which is difficult owing to the flexibility of the longer acyl chains present in C36:0 and C36:1 DAGs. Furthermore, capturing the full-length Pkc1 structure that might provide deeper insights has been challenging for several other groups for a long time. Additionally, we believe that the DAG selectivity by Pkc1 is more of a membrane-associated phenomenon wherein these DAGs might create a specific microdomain or a particular curvature which are required for Pkc1’s ability to bind DAG followed by activation. Investigating this would require extensive structural and biophysical studies, which are beyond the scope of the current work but are planned for future research.

      (2) Does Dip2 colocalize with Plc1 or Pkc1? Does Dip2 reach the plasma membrane upon Plc activation?

      Thank you for your questions regarding the colocalization and potential translocation of Dip2 upon Plc1 or Pkc1 activation.

      In the wild-type scenario, Dip2 does not colocalize with Pkc1. Dip2 predominantly localizes to the mitochondria and mitochondria-vacuole contact sites, while Pkc1 is found in the cytosol, plasma membrane and bud site. Moreover, the localization of Plc1 has not yet been studied in yeast and therefore we currently lack data on the colocalisation of Dip2 and Plc1.

      However, to investigate whether Dip2 translocates to the plasma membrane under conditions requiring Plc1 or Pkc1 activation, we plan to probe the localization of Dip2 under cell wall stress conditions. This would provide a better understanding of the spatial crosstalk between Dip2 and Pkc1. We will include the results in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review): 

      In this manuscript, the role of orexin receptors in dopamine transmission is studied. It extends previous findings suggesting an interplay of these two systems in regulating behaviour by first characterising the expression of orexin receptors in the midbrain and then disrupting orexin transmission in dopaminergic neurons by deleting its predominant receptor, OX1R (Ox1R fl/fl, Dat-Cre tg/wt mice). Electrophysiological and calcium imaging data suggest that orexin A acutely and directly stimulates SN and VTA dopaminergic neurons, but does not seem to induce c-Fos expression. Behavioural effects of depleting OX1R from dopaminergic neurons include enhanced novelty-induced locomotion and exploration, relative to littermate controls (Ox1R fl/fl, Dat-Cre wt/wt). However, no difference between groups is observed in tests that measure reward processing, anxiety, and energy homeostasis. To test whether depletion of OX1R alters overall orexin-triggered activation across the brain, PET imaging is used in OX1R∆DAT knockout and control mice. This analysis reveals that several regions show a higher neuronal activation after orexin injection in OX1R∆DAT mice, but the authors focus their follow-up study on the dorsal bed nucleus of the stria terminalis (BNST) and lateral paragigantocellular nucleus (LPGi). Dopaminergic inputs and expression of dopamine receptors type-1 and -2 (DRD1 & DRD2) are assessed and compared to control, demonstrating a moderate decrease of DRD1 and DRD2 expression in BNST of OX1R∆DAT mice and unaltered expression of DRD2, with absence of DRD1 expression, in LPGi of both groups. Overall, this study is valuable for the information it provides on orexin receptor expression and function on behaviour and for the new tools it generated for the specific study of this receptor in dopaminergic circuits.

      Strengths: 

      The use of a transgenic line that lacks OX1R in dopamine-transporter expressing neurons is a strong approach to dissect the direct role of orexin in modulating dopamine signalling in the brain. The battery of behavioural assays to study this line provides a valuable source of information for researchers interested in the role of orexin in animal physiology. 

      We thank the reviewer for summarizing the importance and significance of our study. 

      Weaknesses: 

      This study falls short in providing evidence for an anatomical substrate of the altered behaviour observed in mice lacking orexin receptor subtype 1 in dopaminergic neurons. How orexin transmission in dopaminergic neurons regulates the expression of postsynaptic dopamine receptors (as observed in BNST of OX1R<sup>∆DAT</sup> mice) is an intriguing question poorly discussed. Whether disruption of orexin activity alters dopamine release in target areas is an important point not addressed. 

      We identified dopaminergic fibers and dopamine receptors in the dBNST and LPGi, suggesting an anatomical basis for dopamine neurons to regulate neural activity and receptor expression levels in these areas. PET imaging and c-Fos staining revealed that Ox1R signaling in dopaminergic cells regulates neuronal activity in dBNST and LPGi. The expression levels of Th were unchanged in both regions. Dopamine receptor 2 (DRD2), but not DRD1, is expressed in LPGi. The deletion of Ox1R in DAT-expressing cells did not affect DRD2 expression in LPGi. The expression levels of DRD1 and DRD2 were decreased or showed a tendency to decrease in dBNST.

      We included the comments in the discussion in this revised manuscript (lines 308-312): ‘The expression levels of Th were not altered in dBNST or LPGi by Ox1R deletion in dopaminergic neurons. It remains unclear whether dopamine release is affected in these regions. It is possible that either the dopaminergic regulation of neuronal activity or the changes in dopamine release could lead to the decreased expression of dopamine receptors in dBNST.’

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript examines expression of orexin receptors in midbrain - with a focus on dopamine neurons - and uses several fairly sophisticated manipulation techniques to explore the role of this peptide neurotransmitter in reward-related behaviors. Specifically, in situ hybridization is used to show that dopamine neurons predominantly express the orexin receptor 1 subtype, and the authors then go on to delete this receptor in dopamine transporter-expressing neurons using a transgenic strategy. Ex vivo calcium imaging of midbrain neurons is used to show that, in the absence of this receptor, orexin is no longer able to excite dopamine neurons of the substantia nigra.

      The authors proceed to use this same model to study the effect of orexin receptor 1 deletion on a series of behavioral tests, namely, novelty-induced locomotion and exploration, anxiety-related behavior, preference for sweet solutions, cocaine-induced conditioned place preference, and energy metabolism. Of these, the most consistent effects are seen in the tests of novelty-induced locomotion and exploration in which the mice with orexin 1 receptor deletion are observed to show greater levels of exploration, relative to wild-type, when placed in a novel environment, an effect that is augmented after icv administration of orexin. 

      In the final part of the paper, the authors use PET imaging to compare brain-wide activity patterns in the mutant mice compared to wildtype. They find differences in several areas both under control conditions (i.e., after injection of saline) as well as after injection of orexin. They focus in on changes in dorsal bed nucleus of stria terminalis (dBNST) and the lateral paragigantocellular nucleus (LPGi) and perform analysis of the dopaminergic projections to these areas. They provide anatomical evidence that these regions are innervated by dopamine fibers from midbrain, are activated by orexin in control, but not mutant mice, and that dopamine receptors are present. Thus, they argue these anatomical data support the hypothesis that behavioral effects of orexin receptor 1 deletion in dopamine neurons are due to changes in dopamine signaling in these areas.

      Strengths: 

      Understanding how orexin interacts with the dopamine system is an important question and this paper contains several novel findings along these lines. Specifically:

      (1) Distribution of orexin receptor subtypes in VTA and SN is explored thoroughly.

      (2) Use of the genetic model that knocks out a specific orexin receptor subtype from dopamine-transporter-expressing neurons is a useful model and helps to narrow down the behavioral significance of this interaction.

      (3) PET studies showing how central administration of orexin evokes dopamine release across the brain is intriguing, especially that two key areas are pursued - BNST and LPGi - where the dopamine projection is not as well described/understood. 

      We thank the reviewer for summarizing the importance and significance of our study. 

      Weaknesses: 

      The role of the orexin-dopamine interaction is not explored in enough detail. The manuscript presents several related findings, but the combination of anatomy and manipulation studies do not quite tell a cogent story. Ideally, one would like to see the authors focus on a specific behavioral parameter and show that one of their final target areas (dBNST or LPGi) was responsible or at least correlated with this behavioral readout. 

      We agree that exploring the orexin-dopamine interactions in more detail and focusing on the behavioral impact of their final target areas (e.g., dBNST or LPGi), would provide valuable data. While we are very interested in pursuing these studies, the aim of the present manuscript is to provide an overview of the behavioral roles of orexin-dopamine interaction and to propose some promising downstream pathways in a relatively broad and systematic manner. 

      In many places in the Results, insufficient explanation and statistical reporting are provided. Throughout the Results - especially in the section on behavior although not restricted to this part - statements are made without statistical tests presented to back up the claims, e.g., "Compared to controls, Ox1R<sup>ΔDAT</sup> mice did not show significant changes in spontaneous locomotor activity in home cages" (L143) and "In a hole-board test, female Ox1RΔDAT mice showed increased nose pokes into the holes in early (1st and 2nd) sessions compared to control mice" (L151). In other places, ANOVAs are mentioned but full results including main effects and interactions are not described in detail, e.g., in F3-S3, only a single p-value is presented and it is difficult to know if this is the interaction term or a post hoc test (L205). These and all other statements need statistics included in the text as support. Addition of these statistical details was also requested by the editor.

      We submitted all our source data as Excel spreadsheets to eLife during our first-round revision, and the full statistics, such as main effects and interactions, are presented alongside the source data in the respective spreadsheets. We thank the reviewer for pointing out our lack of clarity in the manuscript. In this revised manuscript, we included the statistical details of ANOVAs mentioned above in the figure legends. In the figure legends, we also explained that the full statistics were provided alongside the source data in the supplementary materials.

      In the presentation of reward processing this is particularly important as no statistical tests are shown to demonstrate that controls show a cocaine-induced preference or a sucrose preference. Here, one option would be to perform one-sample t-tests showing that the data were different to zero (no preference). As it is, the claim that "Both of the control and Ox1RΔDAT groups showed a preference for cocaine injection" is not yet statistically supported. 

      We thank the reviewer for the suggestions. We have added the one-sample t-test results in this revised manuscript (Figure 2–figure supplement 4, lines 171 - 183). 
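
      The test suggested by the reviewer can be sketched as follows. This is a generic illustration of a one-sample t-test against a null value of zero (no preference), not the authors' analysis code; the preference scores are hypothetical, and the function name is an assumption made for this example.

```python
import math
import statistics


def one_sample_t(values, null_mean=0.0):
    """Return (t statistic, degrees of freedom) for H0: mean == null_mean."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)          # sample SD (n - 1 denominator)
    standard_error = sd / math.sqrt(n)
    return (mean - null_mean) / standard_error, n - 1


# Hypothetical preference scores (e.g. post-test minus pre-test time, seconds).
preference_scores = [35.0, 60.0, 12.0, 48.0, 25.0, 51.0]
t_stat, df = one_sample_t(preference_scores)
print(f"t({df}) = {t_stat:.3f}")
```

      The p-value is then obtained from the t distribution with the reported degrees of freedom; in practice a library routine such as `scipy.stats.ttest_1samp` returns the statistic and p-value together.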

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors): 

      Can the authors comment on overlap between DAT and Ox1R in brain areas outside VTA/SN? Is there any? 

      We only focused on the expression patterns of orexin receptors in VTA/SN, and we did not examine other brain regions. Additionally, little is known from the literature about the expression of Ox1R in DAT-expressing cells in brain areas outside VTA/SN. Further analysis is necessary to answer this question. We have added the comment in our discussion (lines 243 - 344).

      For the Ca2+ imaging experiment, it is unclear to me why the authors do not show all the neurons (almost 160 in total) and just select 5 neurons to show for each condition. 

      Heat maps of all recorded neurons are now shown in Figure 1—figure supplement 4.

      There are other claims that still require a statistical justification to be included in addition to the passages on behavior mentioned above, e.g., "Increasing the orexin A concentration to 300 nM further increased [Ca2+]i" (L118). 

      Authors should ensure that all such claims are either presented with a statistical test or are phrased differently, e.g. "Visual inspection of data suggested that there was a further increase...". In addition, when an ANOVA is conducted, full results including main effects and interactions should be described. 

      We now emphasize that 100 nM orexin A was already sufficient to significantly increase [Ca<sup>2+</sup>]i levels (lines 117 - 118).

      We submitted all our source data as Excel spreadsheets to eLife during our first-round revision, and the full statistics, such as main effects and interactions, are presented alongside the source data in the respective spreadsheets. For clarity, we chose to include only the key statistical information in the main text and figures. We thank the reviewer for pointing this out. In this revised manuscript, we have emphasized in each figure legend: ‘Source data and full statistics are provided in the supplementary materials’.

      Typos in figure captions  

      F2-S1 - spontanous 

      F3-S2 - intrest 

      We apologize for the typos. We have corrected them in this revised manuscript.

      Editor's note: 

      Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05. 

      We submitted all our source data as Excel spreadsheets to eLife during our first-round revision, and the full statistics, such as test statistics, df and 95% confidence intervals, are presented alongside the source data in the respective spreadsheets. We thank the editor for the note. In this revised manuscript, we have included more statistical information in the main text and figure legends (see our response to reviewer #2). In the figure legends, we also explained that the full statistics were provided alongside the source data in the supplementary materials. In addition, we uploaded the source data and full statistics to bioRxiv before uploading this revised manuscript to eLife.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study sought to reveal the potential roles of m6A RNA methylation in gene dosage regulatory mechanisms, particularly in the context of aneuploid genomes in Drosophila. Specifically, this work looked at the relationships between the expression of m6A regulatory factors, RNA methylation status, classical and inverse dosage effects, and dosage compensation. Using RNA sequencing and m6A mapping experiments, an in-depth analysis was performed to reveal changes in m6A status and expression changes across multiple aneuploid Drosophila models. The authors propose that m6A methylation regulates MOF and, in turn, deposition of H4K16Ac, critical regulators of gene dosage in the context of genomic imbalance.

      Strengths:

      This study seeks to address an interesting question with respect to gene dosage regulation and the possible roles of m6A in that process. Previous work has linked m6A to X-inactivation in humans through the Xist lncRNA, and to the regulation of the Sxl in flies. This study seeks to broaden that understanding beyond these specific contexts to more broadly understand how m6A impacts imbalanced genomes in other contexts.

      Weaknesses:

      The methods being used particularly for analysis of m6A at both the bulk and transcript-specific level are not sufficiently specific or quantitative to be able to confidently draw the conclusions the authors seek to make. MeRIP m6A mapping experiments can be very valuable, but differential methylation is difficult to assess when changes are small (as they often are, in this study but also m6A studies more broadly). For instance, based on the data presented and the methods described, it is not clear that the statement that "expression levels at m6A sites in aneuploidies are significantly higher than that in wildtype" is supported. MeRIP experiments are not quantitative, and since there are far fewer peaks in aneuploidies, it stands to reason that more antibody binding sites may be available to enrich those fewer peaks to a larger extent. But based on the data as presented (figure 2D) this conclusion was drawn from RPKM in IP samples, which may not fully account for changing transcript abundances in absolute (expression level changes) and relative (proportion of transcripts in input RNA sample) terms.

      Methylated RNA immunoprecipitation followed by sequencing (MeRIP-seq) is a commonly used strategy for genome-wide mapping of m6A modification. This method uses an anti-m6A antibody to immunoprecipitate RNA fragments, which results in selective enrichment of methylated RNA. The RNA fragments are then subjected to deep sequencing, and the regions enriched in the immunoprecipitate relative to input samples are identified as m6A peaks using a peak calling algorithm. We identified m6A peaks in different samples with the exomePeak2 program and determined common m6A peaks for each genotype based on the intersection of biological replicates. Figure 2D shows the RPM values of m6A peaks in MeRIP samples for each genotype, indicating that the levels of reads in the m6A peak regions were significantly higher in the aneuploid IP samples than in wildtypes. When the enrichment of IP samples relative to Input samples (RPM.IP/RPM.Input) was taken into account, the values for all three aneuploidies were still significantly higher than those of the wildtypes (Mann-Whitney U test, p-values < 0.001). This analysis does not concern changes in transcript abundance; rather, from the MeRIP perspective, it shows that relatively more m6A-modified reads map to the m6A peaks in aneuploidies than in wildtypes. We hope this provides a possible explanation for the observation that the changes in the number of m6A peaks are not consistent with the overall m6A abundance trend. We have added the IP/Input results to the main text and revised the description in the manuscript to make it more precise and to reduce possible misunderstandings.
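
      The two quantities referred to above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: the function names, the optional pseudocount, and all read counts are hypothetical, and the U statistic is computed by direct pairwise comparison rather than by the routine the authors used.

```python
def rpm(read_count, total_mapped_reads):
    """Reads per million mapped reads for one peak region."""
    return read_count * 1_000_000 / total_mapped_reads


def ip_over_input(ip_reads, ip_total, input_reads, input_total, pseudocount=1.0):
    """Enrichment of a peak as RPM.IP / RPM.Input.

    Assumption: a pseudocount is added to both read counts so that peaks
    with zero input reads do not cause a division by zero.
    """
    ip_rpm = rpm(ip_reads + pseudocount, ip_total)
    input_rpm = rpm(input_reads + pseudocount, input_total)
    return ip_rpm / input_rpm


def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: count of pairs with a > b, ties count 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical per-peak enrichment values for two genotypes.
aneuploid = [ip_over_input(r, 20_000_000, 40, 25_000_000) for r in (300, 420, 510)]
wildtype = [ip_over_input(r, 20_000_000, 40, 25_000_000) for r in (150, 200, 260)]
print(mann_whitney_u(aneuploid, wildtype))
```

      Dividing by the input RPM is what separates antibody enrichment from differences in transcript abundance, which is the distinction at issue in the reviewer's comment. A p-value for the U statistic would normally come from a library routine such as `scipy.stats.mannwhitneyu`.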

      The bulk-level m6A measurements as performed here also cannot effectively support these conclusions, as they are measured in total RNA. The focus of the work is mRNA m6A regulators, but m6A levels measured from total RNA samples will not reflect mRNA m6A levels as there are other abundant RNAs that contain m6A (including rRNA). As a result, conclusions about mRNA m6A levels from these measurements are not supported.

      According to published articles, m6A levels of mRNA or total RNA can be detected by different methods (such as mass spectrometry, 2D thin-layer chromatography, etc.) in Drosophila cells or tissues [1-3]. We used the EpiQuik m6A RNA Methylation Quantification Kit, which is suitable for detecting m6A methylation status directly using total RNA isolated from any species such as mammals, plants, fungi, bacteria, and viruses. This kit has previously been used by researchers to detect the m6A/A ratio in total RNA [4, 5] or purified mRNA [6] from different species. Our pre-experiments showed that the enrichment of mRNA from total RNA did not appear to significantly affect the results of the detection of m6A levels.

      To verify our conclusion, we extracted and purified mRNA from the heads of control and MSL2 transgenic Drosophila. mRNA was isolated from total RNA using the Dynabeads mRNA purification kit (Invitrogen, Carlsbad, CA, USA, 61006). m6A modification was more abundant on mRNA than on total RNA (Figure 7E,F; Figure 7—figure supplement 1G,H). Relative to control Drosophila, the changes in m6A abundance in MSL2 transgenic Drosophila were largely the same for mRNA and total RNA, which supports the conclusions in our manuscript. In MSL2 knockdown Drosophila, the m6A modification levels on mRNA mirrored those observed on total RNA, exhibiting a significant downregulation (Figure 7E; Figure 7—figure supplement 1G). The only difference is that no substantial difference in m6A abundance on mRNA was detected between MSL2-overexpressing females and control Drosophila (Figure 7F; Figure 7—figure supplement 1H). This suggests that m6A modification in RNA types other than mRNA (e.g., lncRNA, rRNA) is not necessarily meaningless, and this is a direction for future research. We will also add a discussion of this issue to the manuscript.

      (1) Lence T, et al. (2016) m6A modulates neuronal functions and sex determination in Drosophila. Nature 540(7632):242-247.

      (2) Haussmann IU, et al. (2016) m(6)A potentiates Sxl alternative pre-mRNA splicing for robust Drosophila sex determination. Nature 540(7632):301-304.

      (3) Kan L, et al. (2017) The m(6)A pathway facilitates sex determination in Drosophila. Nat Commun 8:15737.

      (4) Zhu C, et al. (2023) RNA Methylome Reveals the m(6)A-mediated Regulation of Flavor Metabolites in Tea Leaves under Solar-withering. Genomics Proteomics Bioinformatics 21(4):769-787.

      (5) Song H, et al. (2021) METTL3-mediated m(6)A RNA methylation promotes the anti-tumour immunity of natural killer cells. Nat Commun 12(1):5522.

      (6) Yin H, et al. (2021) RNA m6A methylation orchestrates cancer growth and metastasis via macrophage reprogramming. Nat Commun 12(1):1394.

      Reviewer #2 (Public Review):

      Summary:

      The authors have tested the effects of partial- or whole-chromosome aneuploidy on the m6A RNA modification in Drosophila. The data reveal that overall m6A levels trend up but that the number of sites found by meRIP-seq trend down, which seems to suggest that aneuploidy causes a subset of sites to become hyper-methylated. Subsequent bioinformatic analysis of other published datasets establish correlations between the activity of the H4K16 acetyltransferase dosage compensation complex (DCC) and the expression of m6A components and m6A abundance, suggesting that DCC and m6A can act in a feedback loop on each other. Overall, this paper uses bioinformatic trends to generate a candidate model of feedback between DCC and m6A. It would be improved by functional studies that validate the effect in vivo.

      Strengths:

      • Thorough bioinformatic analysis of their data.

      • Incorporation of other published datasets that enhance scope and rigor.

      • Finds trends that suggest that a chromosome counting mechanism can control m6A, as fits with pub data that the Sxl mRNA is m6A modified in XX females and not XY males.

      • Suggests this counting mechanism may be due to the effect of chromatin-dependent effects on the expression of m6A components.

      Weaknesses:

      • The linkage between H4K16 machinery and m6A is indirect and based on bioinformatic trends with little follow-up to test the mechanistic bases of these trends.

      Western blots were performed to detect H4K16Ac in Ythdc1 knockdown Drosophila and control Drosophila. Quantitative analysis demonstrated that H4K16Ac levels changed significantly in Ythdc1 knockdown Drosophila. Combined with the results of polytene chromosome immunostaining in third instar larvae, we found that Ythdc1 affects the levels of H4K16Ac in a tissue- and developmental stage-specific manner. This specificity may be associated with the nonuniformity and heterogeneity of RNA m6A modification characteristics, encompassing the tissue specificity, the developmental specificity, the different numbers of m6A sites in one transcript, the different proportions of methylated transcripts, et cetera [1-3].

      In addition, we found a set of ChIP-seq data (GSE109901) of H4K16ac in female and male Drosophila larvae in a public database, and analyzed whether H4K16ac is directly associated with m6A regulator genes. ChIP-seq is a standard method to study transcription factor binding and histone modification by using efficient and specific antibodies for immunoprecipitation. The results showed that there were H4K16ac peaks at the 5' region of the gene encoding the m6A reader Ythdc1 in both males and females. In addition, most of the genomic sites where the other m6A regulator genes are located are acetylated at H4K16 in both sexes, except that Ime4 shows sexual dimorphism and only contains an H4K16ac peak in females. These results indicate that the m6A regulator genes themselves are acetylated at H4K16, so there is a direct relationship between H4K16ac and m6A regulators. We have added these contents to the text.

      Our analysis of experimental outcomes and public sequencing data has shed light on the interaction of the m6A reader protein Ythdc1 with H4K16Ac. We appreciate your interest in the complex interplay between H4K16Ac and m6A modifications. We acknowledge the intricacy of this interaction and concur that it merits further investigation, potentially supported by additional experiments.

      The current manuscript focuses mainly on the role of RNA m6A modification in genomes experiencing imbalance, and we intend to explore this complex interplay in subsequent work.

      (1) Meyer, K. D., et al. (2012). Comprehensive analysis of mRNA methylation reveals enrichment in 3' UTRs and near stop codons. Cell, 149(7), 1635-1646.

      (2) Meyer, K. D., & Jaffrey, S. R. (2014). The dynamic epitranscriptome: N6-methyladenosine and gene expression control. Nature Reviews: Molecular Cell Biology, 15(5), 313-326.

      (3) Zaccara, S., Ries, R. J., & Jaffrey, S. R. (2019). Reading, writing and erasing mRNA methylation. Nature Reviews: Molecular Cell Biology, 20(10), 608-624.

      • The paper lacks sufficient in vivo validation of the effects of DCC alleles on m6A and vice versa. For example, is the Ythdc1 genomic locus a direct target of the DCC component Msl-2? (see Figure 7).

      In order to study whether the Ythdc1 genomic locus is a direct target of a DCC component, we first analyzed a published MSL2 ChIP-seq dataset from Drosophila (GSE58768). Since MSL2 is only expressed in males under normal conditions, this dataset is from male Drosophila. According to the results, the majority (99.1%) of MSL2 peaks are located on the X chromosome, while MSL2 peaks on other chromosomes are few. This is consistent with the fact that MSL2 is enriched on the X chromosome in male Drosophila [1, 2]. The Ythdc1 gene is located on chromosome 3L, and there is no MSL2 peak near it. Similarly, the other m6A regulator genes are not X-linked, and none has an MSL2 peak. We then analyzed the MOF ChIP-seq data (GSE58768) of male Drosophila. We found that 61.6% of MOF peaks were located on the X chromosome, which was also expected [3, 4]. Although there are more MOF peaks than MSL2 peaks on autosomes, MOF peaks are absent from the autosomal m6A regulator genes. Therefore, at present, there is no evidence that the gene loci of m6A regulators are direct targets of the DCC components MSL2 and MOF, which may be because most MSL2 and MOF are tethered to the X chromosome by the MSL complex under physiological conditions. Whether there are other direct or indirect interactions between Ythdc1 and MSL2 is an issue worthy of further study.

      (1) Bashaw GJ & Baker BS (1995) The msl-2 dosage compensation gene of Drosophila encodes a putative DNA-binding protein whose expression is sex specifically regulated by Sex-lethal. Development 121(10):3245-3258.

      (2) Kelley RL, et al. (1995) Expression of msl-2 causes assembly of dosage compensation regulators on the X chromosomes and female lethality in Drosophila. Cell 81(6):867-877.

      (3) Kind J, et al. (2008) Genome-wide analysis reveals MOF as a key regulator of dosage compensation and gene expression in Drosophila. Cell 133(5):813-828.

      (4) Conrad T, et al. (2012) The MOF chromobarrel domain controls genome-wide H4K16 acetylation and spreading of the MSL complex. Dev Cell 22(3):610-624.
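The peak-distribution check described in this response (what fraction of ChIP-seq peaks fall on the X chromosome, and whether any peak overlaps a given gene locus) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy peak coordinates, and the locus coordinates below are all hypothetical and chosen only to show the logic.

```python
from collections import Counter


def fraction_on_chromosome(peaks, chrom="chrX"):
    """Return the fraction of peaks located on `chrom`.

    `peaks` is a list of (chromosome, start, end) tuples, as would be
    read from a BED-like peak file.
    """
    counts = Counter(p[0] for p in peaks)
    total = sum(counts.values())
    return counts[chrom] / total if total else 0.0


def peaks_overlapping(peaks, chrom, start, end):
    """Return the peaks overlapping the half-open interval [start, end) on `chrom`."""
    return [p for p in peaks if p[0] == chrom and p[1] < end and p[2] > start]


# Hypothetical toy peaks, not real ChIP-seq data.
toy_peaks = [
    ("chrX", 100, 400),
    ("chrX", 1_000, 1_500),
    ("chrX", 9_000, 9_300),
    ("chr3L", 50_000, 50_400),
]

# Hypothetical coordinates standing in for an autosomal gene locus.
locus = ("chr3L", 10_000, 12_000)

x_fraction = fraction_on_chromosome(toy_peaks)
hits = peaks_overlapping(toy_peaks, *locus)
```

With real data, the peak list would come from the published peak calls and the locus from the genome annotation; an empty `hits` list corresponds to the "no peak near the gene" observation.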

      Quite a bit of technical detail is omitted from the main text, making it difficult for the reader to interpret outcomes.

      (1) Please add the tissues to the labels in Figure 1D.

      Figure 1D shows the subcellular localization of FISH probe signals in Drosophila embryos. Arrowheads indicate the foci of probe signals. The corresponding tissue types are (1) blastoderm nuclei; (2) yolk plasm and pole cells; (3) brain and midgut; (4) salivary gland and midgut; (5) blastoderm nuclei and yolk cortex; (6) blastoderm nuclei and pole cells; (7) blastoderm nuclei and yolk cortex; (8) germ band. We have added these to the manuscript.

      (2) In the main text, please provide detail on the source tissues used for meRIP; was it whole larvae? adult heads? Most published datasets are from S2 cells or adult heads and comparing m6A across tissues and developmental stages could introduce quite a bit of variability, even in wt samples. This issue seems to be what the authors discuss in lines 197-199.

      In this article, the material used for MeRIP-seq was whole third instar larvae. Because trisomy 2L and metafemale Drosophila die before developing into adults, it was not possible to use adult heads for MeRIP-seq analysis of aneuploidy. For the other experiments described here, m6A abundance was measured using whole larvae or adult heads; the material used for RT-qPCR analysis was whole larvae, larval brains, or adult heads; and Drosophila embryos at different developmental stages were used for fluorescence in situ hybridization (FISH) experiments. We now provide a detailed description of the experimental material for each assay in the manuscript.

      (3) In the main text, please identify the technique used to measure "total m6A/A" in Fig 2A. I assume it is mass spec.

      We used the EpiQuik m6A RNA Methylation Quantification Kit (Colorimetric) (Epigentek, NY, USA, Cat # P-9005) to measure the m6A/A ratio in RNA samples. This commercially available kit quantifies m6A RNA methylation using a colorimetric assay, and is suitable for detecting m6A methylation status directly in total RNA isolated from any species, including mammals, plants, fungi, bacteria, and viruses.

      (4) Line 190-191: the text describes annotating m6A sites by "nearest gene" which is confusing. The sites are mapped in RNAs, so the authors must unambiguously know the identity of the gene/transcript, right?

      When m6A peaks are annotated using the R package ChIPseeker, the output includes two items: "genomic annotation" and "nearest gene annotation". "Genomic annotation" indicates which genomic feature the peak is assigned to, such as 5’UTR, 3’UTR, or exon. "Nearest gene annotation" indicates which specific gene/transcript the peak is matched to. We modified the description in the main text to make it easier to understand.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      While I believe this study aims to address a very interesting question and demonstrates intriguing evidence suggesting a role for m6A in unbalanced genomes, technical limitations in the methods being used limited my confidence in the overall conclusions. In addition, some of the analyses seemed to distract a bit from the main question of the work, which made thoroughly reading and reviewing the work challenging at times due to the length and lack of cohesion. Some specific points and suggestions are detailed below.

      (1) Some specific points/recommendations for the bulk m6A measurements: for Figure 2A, the authors refer to m6A/A ratio in the text, but based on the methods section and axis labels in Figure 2A (as well as other figures), it may represent m6A% in total RNA. The authors should just clarify which one it is and make the text and figures consistent. The methods description also seems to specify that m6A is quantified in total RNA, and yet the factors being discussed (Ime4, Ythdc1, etc) are associated with m6A in mRNA. Since m6A is present in non-mRNAs (including highly abundant rRNAs), m6A analysis of total RNA may be masking some of the effects due to the relatively low abundance of mRNA relative to rRNA. It is possible that the above point contributes to the discrepancy between the overall m6A abundance in aneuploidies and the changing methylase expression levels (which does seem to correlate better with m6A sequencing data). On a related note, though the authors suggest in Figures 7E and F that m6A level changes are different in males and females, the levels and trends of m6A% in these panels seem quite similar, and the presence or absence of statistical significance seems driven by higher variation (larger error bars) in the measurements in 7F (and again effects may be masked if total RNA is being quantified). This may be a very addressable issue, as m6A analysis of mRNA-enriched samples should be feasible, and in fact, may show clearer changes to better support the authors' conclusions.

      Thank you for your helpful comments.

      As suggested, the abundance of m6A on mRNA was measured (Figure 7E, F). Total RNA was extracted from the heads of control and MSL2 transgenic Drosophila, and mRNA was isolated using the Dynabeads mRNA purification kit (Invitrogen, Carlsbad, CA, USA, 61006). Approximately 300-600 ng of mRNA can be purified from 40 μg of total RNA (200-300 heads per sample). We used the EpiQuik m6A RNA Methylation Quantification Kit (Colorimetric) (Epigentek, NY, USA, Cat # P-9005) to measure the abundance of m6A in mRNA samples (200 ng). The results obtained by this method represent the m6A/A ratio (%), which is also written as m6A% in the user guide of the kit. We made corresponding revisions in the main text and figures to make them consistent.

      The results show a higher abundance of m6A on mRNA than on total RNA, which contains other RNA types such as lncRNA and rRNA in addition to mRNA (Figure 7E,F; Figure 7—figure supplement 1G,H). In MSL2 knockdown Drosophila, the m6A levels on mRNA mirrored those observed on total RNA, showing a significant downregulation (Figure 7E; Figure 7—figure supplement 1G). In contrast, no substantial difference in m6A abundance on mRNA was detected between MSL2 overexpressed Drosophila and control Drosophila (Figure 7F; Figure 7—figure supplement 1H). The differences in m6A abundance between males and females were not statistically significant (Figure 7E,F), and we have revised the manuscript accordingly.

      (2) The analyses in Figures 5 and 6 describe a lot of different comparisons derived from these datasets, and while there seem to be many interesting new hypotheses to be tested, the authors do not make any definitive conclusions from these analyses. These figures also seem to diverge a bit from the main conclusion of the work, and from this reviewer's perspective made it more difficult to read and review the work. Overall streamlining the narrative may help readers appreciate the main conclusions of the work (though this is of course up to the author's discretion).

      As indicated in Figure 5, the results demonstrated a sexually dimorphic role of m6A modification in the regulation of gene expression in aneuploid Drosophila, suggesting its potential involvement in the gene regulatory network through interactions with dosage-sensitive regulators. Furthermore, Figure 6 illustrated the intricate interplay between RNA m6A modification, gene expression, and alternative splicing under genomic imbalance, with RNA splicing being more intimately associated with m6A methylation than gene transcription itself.

      This manuscript also discussed the correlation between methylation status and classical dosage effects, dosage compensation effects, and inverse dosage effects. We have initially demonstrated that RNA m6A methylation could influence dosage-dependent gene regulation via multiple avenues, such as interactions with dosage-sensitive modifiers, alternative splicing mechanisms, the MSL complex, and other related processes. Indeed, our study primarily utilizes m6A methylated RNA immunoprecipitation sequencing (MeRIP-Seq) to comprehensively investigate the role of RNA m6A modification in genomes experiencing imbalance. We agree that more specific and in-depth research on these factors will be instrumental in elucidating the precise mechanisms by which m6A modification regulates expression in unbalanced genomes, which we acknowledge as a significant avenue for our future research.

      We are grateful for your suggestions and, should it be necessary, we can shorten the manuscript by removing or condensing parts of the data analysis and description to give the central theme more prominence.

      Reviewer #2 (Recommendations For The Authors):

      Overall, please provide enough technical detail in the main text so that the reader understands what was done, and does not have to repeatedly dig into figure legends and materials and methods to understand each data statement.

      Thank you for your suggestions. We have added some technical details to the manuscript and made some modifications as suggested.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      How reconsolidation works - particularly in humans - remains largely unknown. With an elegant, 3-day design, combining fMRI and psychopharmacology, the authors provide evidence for a certain role for noradrenaline in the reconsolidation of memory for neutral stimuli. All memory tasks were performed in the context of fMRI scanning, with additional resting-state acquisitions performed before and after recall testing on Day 2. On Day 1, 3 groups of healthy participants encoded word-picture associates (with pictures being either scenes or objects) and then performed an immediate cued recall task to presentation of the word (answering is the word old or new, and whether it was paired with a scene or an object). On Day 2, the cued recall task was repeated using half of the stimulus set words encoded on Day 1 (only old words were presented, with subjects required to indicate prior scene vs object pairing). This test was immediately preceded by the oral administration of placebo, cortisol, or yohimbine (to raise noradrenaline levels) depending on group assignment. On Day 3, all words presented on Day 1 were presented. As expected, on Day 3, memory was significantly enhanced for associations that were cued and successfully retrieved on Day 2 compared to uncued associations. However, for associative d', there was no Cued × Group interaction nor a main effect of Group, i.e., on the standard measure of memory performance, post-retrieval drug presence on Day 2 did not affect memory reconsolidation. As further evidence for a null result, fMRI univariate analyses showed no Cued × Group interactions in whole-brain or ROI activity.

      Strengths:

      There are some aspects of this study that I find impressive. The study is well-designed and the fMRI analysis methodology is innovative and sound. The authors have made meticulous and thorough physiological measurements, and assays of mood, throughout the experiment. By doing so, they have overcome, to a considerable extent, the difficulties inherent in the timing of human oral drug delivery in reconsolidation tasks, where it is difficult to have the drug present in the immediate recall period without affecting recall itself. This is beautifully shown in Figure 3. I also think that having some neurobiological assay of memory reactivation when studying reconsolidation in humans is critical, and the authors provide this. While multi-voxel patterns of hemodynamic responses are, in my view, very difficult to equate with an "engram", these patterns do have something to do with memory.

      We thank the reviewer for considering aspects of our work impressive, the study to be well-designed, and the methodology to be innovative and sound.

      Weaknesses:

      I have major issues regarding the behavioral results and the framing of the manuscript.

      (1) To arrive at group differences in memory performance, the authors performed median splitting of Day 3 trials by short and long reaction times during memory cueing on Day 2, as they took this as a putative measure of high/low levels of memory reactivation. Associative category hits on Day 3 showed a Group by Day 2 Reaction time (short, long) interaction, with post-hocs showing (according to the text) worse memory for short Day 2 RTs in the Yohimbine group. These post-hocs should be corrected for multiple comparisons, as the result is not what would be predicted (see point 2). My primary issue here is that we are not given RT data for each group, nor is the median splitting procedure described in the methods. Was this across all groups, or within groups? Are short RTs in the yohimbine group any different from short RTs in the other two groups? Unfortunately, we are not given Day 2 picture category memory levels or reaction times for each group. This is relevant because (as given in Supplemental Table S1) memory performance (d´) for the Yohimbine group on Day 1 immediate testing is (roughly speaking) 20% lower than the other 2 groups (independently of whether the pairs will be presented again the following day). I appreciate that this is not significant in a group x performance ANOVA but how does this relate to later memory performance? What were the group-specific RTs on Day 1? So, before the reader goes into the fMRI results, there are questions regarding the supposed drug-induced changes in behavior. Indeed, in the discussion, there is repeated mention of subsequent memory impairment produced by yohimbine but the nature of the impairment is not clear.

      Thank you for the opportunity to clarify these important issues.

      Reaction times are well established proxies (correlates) of memory strength and memory confidence in previous research, as they reflect cognitive processes involved in retrieving information. Faster reaction times indicate stronger mnemonic evidence and higher confidence in the accuracy of a memory decision, while slower responses suggest weaker evidence and decision uncertainty or doubt. This relationship is supported by an extensive literature (e.g., Starns 2021; Robinson et al., 1997; Ratcliff & Murdock, 1976; amongst others). Importantly, distinguishing between high and low confidence choices in a memory task serves the purpose of differentiating between particularly strong memory evidence (e.g., in associative cued recall, when remembering is particularly vivid) and weaker memory evidence. Separating low from high confidence responses based on participants’ reaction times was especially important in the current analyses, because previous research demonstrates that reaction times during cued recall tasks inversely correlate with hippocampal involvement (Heinbockel et al., 2024; Gagnon et al. 2019) and that stress-effects on human memory may be particularly pronounced for high-confidence memories (Gagnon et al., 2019).

      In response to Reviewer 1’s comments, we have elaborated on our rationale for the distinction between short and long reaction times in the introduction, results, and methods. Please see page 4, lines 144 to 148:

      “We distinguished between responses with short and long reaction times indicative of high and low confidence responses because previous research showed that reaction times are inversely correlated with hippocampal memory involvement(58-60) and memory strength(61,62), and that high confidence memories associated with short reaction times may be particularly sensitive to stress effects(63).”

      On page 13, lines 520 to 523:

      “Reaction times in the Day 2 Memory cueing task revealed a trial-specific gradient in reactivation strength. Thus, we turned to single-trial analyses, differentiating Day 3 trials by short and long reaction times during memory cueing on Day 2 (median split), indicative of high vs. low memory confidence(58–60) and hippocampal reactivation(26,63).”

      And on page 26, lines 1046 to 1053:

      “Reaction times serve as a proxy for memory confidence and memory strength, with faster responses reflecting higher confidence/strength and slower responses suggesting greater uncertainty/weaker memory. The association between reaction times and memory confidence has been established by previous research(58–60), suggesting that the distinction between high from low confidence responses differentiates vividly recalled associations from decisions based on weaker memory evidence. Reaction times are further linked to hippocampal activity during recall tasks(26,53), and stress effects on memory are particularly pronounced for high-confidence memories(53).”

      With respect to behavioral data reporting, we agree that the critical median-split procedure was not sufficiently clear in the original manuscript. We elaborate on this important aspect of the analysis now on page 26, lines 1053 to 1057:

      “We conducted a median-split within each participant to categorize trials as fast vs. slow reaction time trials during Day 2 memory cueing. We conducted this split on the participant- and not group-level because there is substantial inter-individual variability in overall reaction times. This approach also results in an equal number of trials in the low and high confidence conditions.”
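The within-participant median split described above can be sketched as follows. This is an illustrative sketch only, not the authors' analysis code: the function name, the toy reaction times, and the tie-breaking rule (trials at or below the participant's median count as "fast") are assumptions. With an odd trial count or tied values, the two halves would not be exactly equal in size.

```python
from collections import defaultdict
from statistics import median


def median_split(trials):
    """Label each trial 'fast' or 'slow' relative to its own participant's median RT.

    `trials` is a list of (participant_id, reaction_time) tuples.
    Assumption: trials at or below the median are labelled 'fast'.
    """
    by_participant = defaultdict(list)
    for participant, rt in trials:
        by_participant[participant].append(rt)
    # One cutoff per participant, so overall speed differences between
    # participants do not determine the labels.
    cutoffs = {p: median(rts) for p, rts in by_participant.items()}
    return [(p, rt, "fast" if rt <= cutoffs[p] else "slow") for p, rt in trials]


# Hypothetical reaction times in seconds; participant s2 is slower overall.
toy_trials = [
    ("s1", 0.8), ("s1", 1.2), ("s1", 0.9), ("s1", 1.5),
    ("s2", 2.0), ("s2", 2.6), ("s2", 1.8), ("s2", 3.0),
]

labelled = median_split(toy_trials)
labels = [label for _, _, label in labelled]
```

In the toy data, a 2.0 s response from the slower participant is labelled "fast" although it is slower than every response from the other participant, which is the reason given for splitting at the participant level rather than the group level.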

      We completely agree that the relevant post-hoc test should be corrected for multiple comparisons. Please note that all reported post-hoc tests had already been Bonferroni-corrected. We now clarify this by explicitly referring to corrected p-values (P<sub>corr</sub>) and indicate in the methods that P<sub>corr</sub> refers to Bonferroni-corrected p-values (please see page 25, lines 1036 to 1038).
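For readers unfamiliar with the correction, a Bonferroni-corrected p-value is the raw p-value multiplied by the number of comparisons in the family, capped at 1. A minimal sketch follows; the function name and the example p-values are hypothetical and are not taken from the manuscript.

```python
def bonferroni(p_values):
    """Return Bonferroni-corrected p-values for one family of comparisons."""
    m = len(p_values)
    # Multiply each raw p-value by the number of tests and cap at 1.0.
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values from three post-hoc comparisons.
raw_p = [0.01, 0.04, 0.30]
corrected_p = bonferroni(raw_p)
```

Here the second comparison would be significant at the 0.05 level before correction but not after it, which is why it matters whether reported post-hoc p-values are raw or corrected.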

      We further agree that for a comprehensive overview of the behaviour in terms of memory performance and RTs, these data need to be provided for each group and experimental day. Therefore, we now extended Supplementary Table S1 to include descriptive indices of memory performance (hits, dprime) and RTs for each group for each day. Moreover, we now report ANOVAs for reaction times for each of the experimental days in the main text.

      The ANOVA for Day 1 is now reported on page 6, lines 200 to 204: “To test for potential group differences in reaction times for correctly remembered associations on Day 1, we fit a linear model including the factors Group and Cueing. Critically, we did not observe a significant Group x Cueing interaction, suggesting no RT difference between groups for later cued and not cued items (F(2,58) = 1.41, P = .258, η<sup>2</sup> = 0.01; Supplemental Table S1).”

      The ANOVA for Day 2 is now reported on page 7, lines 243 to 248: “To test for potential group differences in reaction times for correctly remembered associations on Day 2, we fit a linear model including the factors Group and Reaction time (slow/fast) following the subject specific median split. The model did not reveal any main effect or interaction including the factor Group (all Ps > .535; Supplemental Table S1), indicating that there was no RT difference between groups, nor between low and high RT trials in the groups.”

      The ANOVA for Day 3 is reported on page 13, lines 487 to 494: “To test for potential group differences in reaction times for correctly remembered associations on Day 3, we fit a linear model including the factors Group and Cueing. This model did not reveal any main effect or interaction including the factor Group (all Ps > .267), indicating that there was no average RT difference between groups. As expected, we observed a main effect of the factor Cueing, indicating a significant difference of reaction times across groups between trials that were successfully cued and those not cued on Day 2 (F(2,58) = 153.07, P < .001, η<sup>2</sup> = 0.22; Supplemental Table S1).”

      (2) The authors should be clearer as to what their original hypotheses were, and why they did the experiment. Despite being a complex literature, I would have thought the hypotheses would be reconsolidation impairment by cortisol and enhancement by yohimbine. Here it is relevant to point out that - only when the reader gets to the Methods section - there is mention of a paper published by this group in 2024. In this publication, the authors used the same study design but administered a stress manipulation after Day 2 cued recall, instead of a pharmacological one. They did not find a difference in associative hit rate between stress and control groups, but - similar to the current manuscript - reported that post-retrieval stress disrupts subsequent remembering (Day 3 performance) depending on neural memory reinstatement during reactivation (specifically driven by the hippocampus and its correlation with neocortical areas).

      Instead of using these results, and other human studies, to motivate the current work, reference is made to a recent animal study: Line 169 "Building on recent findings in rodents (Khalaf et al. 2018), we hypothesized that the effects of post-retrieval noradrenergic and glucocorticoid activation would critically depend on the reinstatement of the neural event representation during retrieval". It is difficult to follow that a rodent study using contextual fear conditioning and examining single neuron activity to remote fear recall and extinction would be relevant enough to motivate a hypothesis for a human psychopharmacological study on emotionally neutral paired associates.

      We agree that our recent publication utilizing a very similar experimental design including three days is highly relevant in the context of the current study and we now refer to this recent study earlier in our manuscript. Please see page 3, lines 89 to 94:  

      “Recently, we showed a detrimental impact of post-retrieval stress on subsequent memory that was contingent upon reinstatement dynamics in the Hippocampus, VTC and PCC during memory reactivation(26). While this study provided initial insights into the potential brain mechanisms involved in the effects of post-retrieval stress on subsequent memory, the underlying neuroendocrine mechanisms remained elusive.”

      Moreover, we explicitly state our hypothesis regarding the neural mechanism, with reference to our recent work, on page 5, lines 166 to 169:

      “Building on our recent findings in humans(26) as well as current insights from rodents(47), we hypothesized that the effects of post-retrieval noradrenergic and glucocorticoid activation would critically depend on the reinstatement of the neural event representation during retrieval.”

      Concerning the potential direction of the effects of post-retrieval cortisol and noradrenaline, the literature is indeed mixed with partially contradicting results, which made it, in our view, difficult to derive a clear hypothesis of potentially opposite effects of cortisol and yohimbine. We summarize the relevant evidence in the introduction on pages 3 to 4, lines 100 to 113:

      “Some studies, using emotional recognition memory or fear conditioning in healthy humans, suggest enhancing effects of post-retrieval glucocorticoids on subsequent memory(30,31). However, rodent studies on neutral recognition memory(21), fear conditioning(32), as well as evidence from humans on episodic recognition memory(33) report impairing effects of glucocorticoid receptor activation on post-retrieval memory dynamics. For noradrenaline, post-retrieval blockade of noradrenergic activity impairs putative reconsolidation or future memory accessibility in human fear conditioning(34), as well as drug (alcohol) memory(35) and spatial memory in rodents(36). However, this effect is not consistently observed in human studies on fear conditioning(40), speaking anxiety(37), inhibitory avoidance(39), traumatic mental imagination (PTSD patients)(38), and might depend on the arousal state of the individual(21) or the exact timing of drug administration as suggested by studies in humans(41) and rodents(42). Thus, while there is evidence that glucocorticoid and noradrenergic activation after retrieval can affect subsequent memory, the direction of these effects remains elusive.”

      In addition to these reviewer comments and in response to the eLife assessment, we would like to emphasize that the present findings are in our view not only relevant for a subfield but may be of considerable interest for researchers from various fields, beyond experimental memory research, including Neurobiology, Psychiatry, Clinical Psychology, Educational Psychology, or Law Psychology. We highlight the relevance of the topic and our findings now more explicitly in the introduction and discussion. Please see page 3:

      “The dynamics of memory after retrieval, whether through reconsolidation of the original trace or interference with retrieval-related traces, have fundamental implications for educational settings, eyewitness testimony, or mental disorders(5,11,12). In clinical contexts, post-retrieval changes of memory might offer a unique opportunity to retrospectively modify or render less accessible unwanted memories, such as those associated with posttraumatic stress disorder (PTSD) or anxiety disorders(13–15). Given these potential far reaching implications, understanding the mechanisms underlying post-retrieval dynamics of memory is essential.”

      On page 17:

      “Upon their retrieval, memories can become sensitive to modification(1,2). Such post-retrieval changes in memory may be fundamental for adaptation to volatile environments and have critical implications for eyewitness testimony, clinical or educational contexts(5,11–15). Yet, the brain mechanisms involved in the dynamics of memory after retrieval are largely unknown, especially in humans.”

      And on page 19:

      “Beyond their theoretical relevance, these findings may have relevant implications for attempts to employ post-retrieval manipulations to modify unwanted memories in anxiety disorders or PTSD(97,98). Specifically, the present findings suggest that such interventions may be particularly promising if combined with cognitive or brain stimulation techniques ensuring a sufficient memory reactivation.”

      Reviewer #1 (Recommendations for the authors):

      (1) Related to major issue 2 in the Public Review. In the introduction, it would be helpful to be specific about the type of memory being probed in the different studies referenced (episodic vs conditioning). For the former, please make it clear whether stimuli to be remembered were emotional or neutral, and for which stimulus class drug effects were observed. This is particularly important given that in the first paragraph, you describe memory reactivation in the context of traumatic memories via mention of PTSD. It would also be helpful to know to which species you refer. For example, in line 115, "timing of drug administration..." a rodent and a human study are cited.

      We completely agree that these aspects are important. We have therefore rewritten the corresponding paragraph in the introduction to clarify the type of memory probed, the emotionality of the stimuli and the species tested. Please see pages 3 to 4, lines 100 to 113:

      “Some studies, using emotional recognition memory or fear conditioning in healthy humans, suggest enhancing effects of post-retrieval glucocorticoids on subsequent memory(30,31). However, rodent studies on neutral recognition memory(21), fear conditioning(32), as well as evidence from humans on episodic recognition memory(33) report impairing effects of glucocorticoid receptor activation on post-retrieval memory dynamics. For noradrenaline, post-retrieval blockade of noradrenergic activity impairs putative reconsolidation or future memory accessibility in human fear conditioning(34), as well as drug (alcohol) memory(35) and spatial memory in rodents(36). However, this effect is not consistently observed in human studies on fear conditioning(40), speaking anxiety(37), inhibitory avoidance(39), traumatic mental imagination (PTSD patients)(38), and might depend on the arousal state of the individual(21) or the exact timing of drug administration as suggested by studies in humans(41) and rodents(42). Thus, while there is evidence that glucocorticoid and noradrenergic activation after retrieval can affect subsequent memory, the direction of these effects remains elusive.”

      (2) The Bos 2014 reference appears incorrect. I think you mean the Frontiers paper of the same year.

      Thank you for noticing this mistake, which has been corrected.

      (3) Line 734 "The study employed a fully crossed, placebo-controlled, double-blind, between-subjects design". What is a fully crossed design?

      A fully-crossed design refers to studies in which all possible combinations of multiple between-subjects factors are implemented. However, because the factor reactivation/cueing was manipulated within-subject in the present study and there is only one between-subjects factor (group/drug), “fully-crossed” may be misleading here. We removed it from the manuscript.

      (4) Supplemental Table S3. Are these ordered in terms of significance? A t- or Z-value for each cluster (either of the peak or a summed value) would be helpful.

      We agree that the ordering of the clusters was not clearly described. In the revised Supplemental Table S3, we have now added a column with the cluster-peak specific T-values and added an explanation in the table caption: “Depicted clusters are ordered by cluster-peak T-values.”

      (5) Please provide the requested memory performance and reaction time data, and relevant group comparisons.

      In response to general comment #1 above, we now provide all relevant accuracy and reaction time data for all groups and experimental days in the revised Supplemental Table S1. Moreover, we now report the relevant group comparisons in the main text on page 6, lines 200 to 204, on page 7, lines 243 to 248, and on page 13, lines 487 to 494.

      (6) Please rewrite the introduction with specific hypotheses, mention your recent results published in Science Advances, and attend to suggestions made in the first comment above.

      We have rewritten parts of the introduction to make the link to our recent publication clearer and to clarify the types of memories and species tested, as suggested by the reviewer (please see pages 3 to 4, lines 100 to 113). Moreover, we explicitly state our hypothesis regarding the neural mechanism on page 5, lines 166 to 169:

      “Building on our recent findings in humans(26) as well as current insights from rodents(47), we hypothesized that the effects of post-retrieval noradrenergic and glucocorticoid activation would critically depend on the reinstatement of the neural event representation during retrieval.”

      In terms of the direction of the potential cortisol and yohimbine effects, we have elaborated on the relevant literature, which in our view does not allow a clear prediction regarding the nature of the drug effects. We have made this explicit by stating that “… while there is evidence that glucocorticoid and noradrenergic activation after retrieval can affect subsequent memory, the direction of these effects remains elusive.” (please see page 4, lines 111 to 113). It would be, in our view, inappropriate to retrospectively add another, more specific “hypothesis”.

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to investigate how noradrenergic and glucocorticoid activity after retrieval influence subsequent memory recall with a 24-hour interval, by using a controlled three-day fMRI study involving pharmacological manipulation. They found that noradrenergic activity after retrieval selectively impairs subsequent memory recall, depending on hippocampal and cortical reactivation during retrieval.

      Overall, there are several significant strengths of this well-written manuscript.

      Strengths:

      (1) The study is methodologically rigorous, employing a well-structured three-day experimental design that includes fMRI imaging, pharmacological interventions, and controlled memory tests.

      (2) The use of pharmacological agents (i.e., hydrocortisone and yohimbine) to manipulate glucocorticoid and noradrenergic activity is a significant strength.

      (3) The clear distinction between online and offline neural reactivation using MVPA and RSA approaches provides valuable insights into how memory dynamics are influenced by noradrenergic and glucocorticoid activity distinctly.

      We thank the reviewer for these very positive and encouraging remarks.

      Weaknesses:

      (1) One potential limitation is the reliance on distinct pharmacodynamics of hydrocortisone and yohimbine, which may complicate the interpretation of the results.

      We agree that the pharmacodynamics of hydrocortisone and yohimbine are different. However, we took these pharmacodynamics into account when designing the experiment and have made an effort to accurately track the indicators for noradrenergic arousal and glucocorticoids across the experiment. As shown in Figure 2, these indicators confirm that both drugs are active within the time window of approximately 40-90 minutes after reactivation. This time window corresponds to the proposed reconsolidation window, which is assumed to open around 10 minutes post-reactivation and to remain open for a few hours (approximately 90 minutes; Monfils & Holmes, 2018; Lee et al., 2017; Monfils et al., 2009).

      We have now acknowledged the distinct pharmacodynamics of hydrocortisone and yohimbine on page 21, lines 845 to 847: “We note that yohimbine and hydrocortisone follow distinct pharmacodynamics(104,105), yet selected the administration timing to ensure that both substances are active within the relevant post-retrieval time window.”

      In the results section, on page 11, lines 437 to 439, we further emphasize this differential dynamic: “Our data demonstrate that, despite the distinct pharmacodynamics of CORT and YOH, both substances are active within the time window that is critical for potential reconsolidation effects(3,4,43).”

      (2) Related to the point above, individual differences in pharmacological responses, physiological and cortisol measures may contribute to memory recall on Day 3.

      The administered drugs elicit a pronounced adrenergic and glucocorticoid response, respectively. Specifically, the cortisol levels reached by 20 mg of hydrocortisone correspond to those observed after a significant stressor exposure. Moreover, individual variation in stress system activation following drug intake tends to be less pronounced than in response to a natural stressor. Nevertheless, we fully agree that individual factors, such as metabolism or body weight, can influence the drug's action.

      We therefore re-analysed the reported Day 3 models, now including individual measures of baseline-to-peak changes in cortisol and systolic blood pressure, respectively. We report these additional analyses in the supplement and refer the interested reader to these analyses on page 15, lines 580 to 586:

      “As individual factors, such as metabolism or body weight, can influence the drug's action, we ran an additional analysis in which we included individual (baseline-to-peak) differences in salivary cortisol and (systolic) blood pressure, respectively. This analysis did not show any group by baseline-to-peak difference interaction suggesting that the observed memory effects were mainly driven by the pharmacological intervention group per se and less by individual variation in responses to the drug (see Supplemental Results).”

      And in the Supplemental Results:

      “To account for individual differences in cortisol responses after pill intake, we fit additional GLMMs predicting Day 3 subsequent memory of cued and correct trials including the factors Individual baseline-to-peak cortisol and Group. Doing so allowed us to account for variation in Day 3 performance, which might have resulted from within-group variation in cortisol responses, in particular in the CORT group. Importantly, none of the models predicting Day 3 memory performance by Day 2 cortisol-increase and Group, median-split RTs (high/low), hippocampal activity and RTs, or hippocampal activity and VTC category reinstatement revealed a significant Group × baseline-to-peak cortisol interaction (all Ps > .122). These results suggest that inter-individual differences in cortisol responses did not have a significant impact on subsequent memory, beyond the influence of group per se. The same analyses were repeated for systolic blood pressure employing GLMMs predicting Day 3 subsequent memory of cued and correct trials including the factors Individual baseline-to-peak systolic blood pressure and Group to account for variation in Day 3 performance, which might have resulted from within-group variation in blood pressure response, in particular in the YOH group. While the model predicting Day 3 memory performance revealed a significant Individual baseline-to-peak systolic blood pressure × Group × median-split RTs (high/low) interaction (β = -0.05 ± 0.02, z = -2.04, P = .041, R<sup>2</sup><sub>conditional</sub> = 0.01), post-hoc slope tests, however, did not show any significant difference between groups (all P<sub>Corr</sub> > .329). The remaining models including hippocampal activity and RTs, or hippocampal activity and VTC category reinstatement did not reveal a significant Group × Individual baseline-to-peak systolic blood pressure interaction (all Ps > .101). These results suggest that inter-individual differences in systolic blood pressure responses did not have a significant impact on subsequent memory, beyond the influence of group per se.”

      Although we acknowledge that our study may not have been sufficiently powered for an analysis of individual differences, these data suggest that our memory effects were mainly driven by the pharmacological intervention group per se and less by individual variation in responses. It is to be noted, however, that all participants of the respective groups showed a pronounced increase in cortisol concentrations (on average > 1000% in the CORT group) and autonomic arousal (on average > 10% in the YOH group), respectively. These increases appeared to be sufficient to drive the observed memory effects, irrespective of some individual variation in the magnitude of the response.
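      The baseline-to-peak measure referred to above can be illustrated with a minimal sketch. It assumes that the baseline is the first sample and that the peak is the highest later sample; the function name and the example values are hypothetical and are not taken from the study data.

```python
def baseline_to_peak(samples, baseline_index=0):
    """Return (absolute change, percent change) from baseline to the peak.

    Assumption: the peak is the highest value measured after the baseline sample.
    """
    baseline = samples[baseline_index]
    post_baseline = samples[baseline_index + 1:]
    if not post_baseline:
        raise ValueError("need at least one sample after baseline")
    if baseline <= 0:
        raise ValueError("baseline must be positive to compute a percent change")
    change = max(post_baseline) - baseline
    return change, 100.0 * change / baseline


# Made-up salivary cortisol values for one participant (arbitrary units).
cortisol = [5.0, 20.0, 60.0, 45.0]
absolute_change, percent_change = baseline_to_peak(cortisol)
print(absolute_change, percent_change)
```

      A per-participant value of this kind could then enter a model as a covariate alongside Group, which is the general idea behind the supplemental analysis described above.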

      (3) The median-splitting approach for reaction times and hippocampal activity should be better justified.

      Reaction times are well-established proxies (correlates) of memory strength and memory confidence in previous research, as they reflect cognitive processes involved in retrieving information. Faster reaction times indicate stronger mnemonic evidence and higher confidence in the accuracy of a memory decision, while slower responses suggest weaker evidence and decision uncertainty or doubt. This relationship is supported by an extensive literature (e.g., Starns 2021; Robinson et al., 1997; Ratcliff & Murdock, 1976; amongst others). Importantly, distinguishing between high and low confidence choices in a memory task serves the purpose of differentiating between particularly strong memory evidence (e.g., in associative cued recall, when remembering is particularly vivid) and weaker memory evidence. Separating low from high confidence responses based on participants’ reaction times was especially important in the current analyses, because previous research demonstrates that reaction times during cued recall tasks inversely correlate with hippocampal involvement (Heinbockel et al., 2024; Gagnon et al., 2019) and that stress effects on human memory may be particularly pronounced for high-confidence memories (Gagnon et al., 2019).

      In response to the Reviewer comments, we have elaborated on our rationale for the distinction between short and long reaction times in the introduction, results, and methods. Please see page 4, lines 144 to 148:

      “We distinguished between responses with short and long reaction times indicative of high and low confidence responses because previous research showed that reaction times are inversely correlated with hippocampal memory involvement(58–60) and memory strength(61,62), and that high confidence memories associated with short reaction times may be particularly sensitive to stress effects(63).”

      On page 13, lines 520 to 523:

      “Reaction times in the Day 2 Memory cueing task revealed a trial-specific gradient in reactivation strength. Thus, we turned to single-trial analyses, differentiating Day 3 trials by short and long reaction times during memory cueing on Day 2 (median split), indicative of high vs. low memory confidence(58–60) and hippocampal reactivation(26,63).”

      And on page 26, lines 1046 to 1053:

      “Reaction times serve as a proxy for memory confidence and memory strength, with faster responses reflecting higher confidence/strength and slower responses suggesting greater uncertainty/weaker memory. The association between reaction times and memory confidence has been established by previous research(58–60), suggesting that the distinction between high from low confidence responses differentiates vividly recalled associations from decisions based on weaker memory evidence. Reaction times are further linked to hippocampal activity during recall tasks(26,53), and stress effects on memory are particularly pronounced for high-confidence memories(53).”

      We agree that the critical median-split procedure was not sufficiently clear in the original manuscript. We elaborate on this important aspect of the analysis now on page 26, lines 1053 to 1057:

      “We conducted a median-split within each participant to categorize trials as slow vs. fast reaction time trials during Day 2 memory cueing. We chose to conduct this split on the participant- and not group-level because there is substantial inter-individual variability in overall reaction times and to retain an equal number of trials in the low and high confidence conditions.”
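      A within-participant median split of this kind can be sketched as follows. This is an illustrative example only, not the authors' analysis code: the dictionary keys, the function name, and the reaction times are assumptions, and trials equal to the median are assigned to the fast bin by convention.

```python
from collections import defaultdict
from statistics import median


def median_split_by_participant(trials):
    """Label each trial 'fast' or 'slow' relative to its own participant's median RT.

    `trials` is a list of dicts with the hypothetical keys 'participant' and 'rt'.
    """
    rts_by_participant = defaultdict(list)
    for trial in trials:
        rts_by_participant[trial["participant"]].append(trial["rt"])

    # One cutoff per participant, so a generally slow responder still
    # contributes trials to both bins.
    cutoffs = {p: median(rts) for p, rts in rts_by_participant.items()}

    return [
        {**trial,
         "speed": "fast" if trial["rt"] <= cutoffs[trial["participant"]] else "slow"}
        for trial in trials
    ]


# Made-up reaction times in seconds for two participants.
trials = [
    {"participant": "A", "rt": 0.8}, {"participant": "A", "rt": 1.2},
    {"participant": "A", "rt": 1.0}, {"participant": "A", "rt": 1.6},
    {"participant": "B", "rt": 2.0}, {"participant": "B", "rt": 2.4},
    {"participant": "B", "rt": 3.0}, {"participant": "B", "rt": 2.2},
]
for row in median_split_by_participant(trials):
    print(row)
```

      In this toy example every trial of participant B is slower than every trial of participant A, so a group-level cutoff would place all of B's trials in the slow bin. Splitting within participant keeps both bins populated for each person, although tied or odd-numbered trial counts can still make the bins slightly unequal.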

      In addition to these reviewer comments and in response to the eLife assessment, we would like to emphasize that the present findings are in our view not only relevant for a subfield but may be of considerable interest for researchers from various fields, beyond experimental memory research, including Neurobiology, Psychiatry, Clinical Psychology, Educational Psychology, or Law Psychology. We highlight the relevance of the topic and our findings now more explicitly in the introduction and discussion. Please see page 3:

      “The dynamics of memory after retrieval, whether through reconsolidation of the original trace or interference with retrieval-related traces, have fundamental implications for educational settings, eyewitness testimony, or mental disorders(5,11,12). In clinical contexts, post-retrieval changes of memory might offer a unique opportunity to retrospectively modify or render less accessible unwanted memories, such as those associated with posttraumatic stress disorder (PTSD) or anxiety disorders(13–15). Given these potential far reaching implications, understanding the mechanisms underlying post-retrieval dynamics of memory is essential.”

      On page 17:

      “Upon their retrieval, memories can become sensitive to modification(1,2). Such post-retrieval changes in memory may be fundamental for adaptation to volatile environments and have critical implications for eyewitness testimony, clinical or educational contexts(5,11–15). Yet, the brain mechanisms involved in the dynamics of memory after retrieval are largely unknown, especially in humans.”

      And on page 19:

      “Beyond their theoretical relevance, these findings may have relevant implications for attempts to employ post-retrieval manipulations to modify unwanted memories in anxiety disorders or PTSD(97,98). Specifically, the present findings suggest that such interventions may be particularly promising if combined with cognitive or brain stimulation techniques ensuring a sufficient memory reactivation.”

      Reviewer #2 (Recommendations for the authors):

      My comments and/or questions for the authors to improve this well-written manuscript.

      (1) This study identifies the modulatory role of the hippocampus and VTC in the effects of norepinephrine on subsequent memory. Are there functional interactions between these ROIs and other brain regions that could be wise to consider for a more comprehensive understanding of the underlying neural mechanisms?

      We agree that functional interactions of hippocampus and VTC and other regions that were active during Day 2 memory cueing are relevant for our understanding of the underlying mechanisms. We therefore now performed connectivity analyses using general psycho-physiological interaction analysis (gPPI; as implemented in SPM) and report the results of this analysis on page 16, lines 635 to 644, and added Supplemental Table S4 including gPPI statistics.

      “We conducted general psycho-physiological interaction analysis (gPPI) analyses on the Day 2 memory cueing task (remembered – forgotten), which revealed that successful cueing was accompanied by significant functional connectivity between the left hippocampus, VTC, PCC and MPFC (see Supplemental Table S4). However, using these connectivity estimates to predict Day 3 subsequent memory performance (dprime) via regression did not reveal any significant Group × Connectivity interactions, indicating that the pharmacological manipulation (i.e. noradrenergic stimulation) did not modulate subsequent memory based on functional connectivity during memory cueing (all P<sub>Corr</sub> > .228). The same pattern of results was observed when including single trial beta estimates from multiple ROIs during memory cueing to predict Day 3 memory (all interaction effects P<sub>Corr</sub> > .288).”
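      For readers unfamiliar with the sensitivity index (dprime) mentioned above, a minimal sketch of the standard signal-detection calculation follows. The log-linear correction used here to avoid hit or false-alarm rates of exactly 0 or 1 is an assumption for illustration; the response does not state which correction, if any, was applied.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Assumption: a log-linear correction (add 0.5 to each count, 1 to each total)
    keeps both rates strictly between 0 and 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    false_alarm_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Made-up counts for one participant.
print(d_prime(hits=18, misses=2, false_alarms=3, correct_rejections=17))
```

      Equal hit and false-alarm rates give a value of zero, and larger values indicate better discrimination between old and new items.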

      (2) In theory, noradrenergic activity would have a profound impact on activity in widespread brain regions that are closely related to memory function. It would be interesting to know other possible effects beyond the hippocampus and VTC.

      We agree and included in our analysis additional ROIs beyond the HC and VTC; we now report these explorative results on page 16, lines 616 to 633:

      “Beyond hippocampal and VTC activity during memory cueing (Day 2), we exploratively reanalysed the GLMMs predicting Day 3 memory performance including the PCC, which was relevant during memory cueing in the current study and in our previous work(26). Predicting Day 3 memory performance by the factors Group and Single trial beta activity during memory cueing in the PCC did not reveal a significant interaction (P<sub>Corr</sub> = 1); adding the factor Reaction time to the model also did not result in a significant interaction (P<sub>Corr</sub> = 1). We also included the Medial Prefrontal Cortex (MPFC) to predict Day 3 memory performance, as the MPFC has been shown to be sensitive to noradrenergic modulation in previous work(75). Predicting Day 3 memory performance by the factors Group and Single trial beta activity during memory cueing in the MPFC did not reveal a significant interaction (P<sub>Corr</sub> = 1); adding the factor Reaction time to the model also did not result in a significant interaction (P<sub>Corr</sub> = 1), which indicates that the MPFC was not modulated by either pharmacological intervention. Finally, we investigated memory cueing from all remaining ROIs that were significantly activated during the Day 2 memory cueing task (Day 2 whole-brain analysis; correct-incorrect; Supplemental Table S3). We again fit GLMMs predicting Day 3 memory performance by the factors Group and Single trial beta activity during memory cueing. Again, we did not observe any significant interaction effect in any of the ROIs (all interaction P<sub>Corr</sub> > .060) and these results did not change when adding the factor Reaction time to the respective models (all P<sub>Corr</sub> > .075).”

      (3) There are substantial individual differences in pharmacological responses, physiological and cortisol measures, as shown in Figure 3A&B. If such individual differences are taken into account, are there any potential effects on subsequent recall on Day 3 pertaining to the hydrocortisone group?

      In response to this comment (and the General comment #1 of this reviewer), we now re-analyzed the respective models including individual measures of baseline-to-peak cortisol and systolic blood pressure.

      We re-analysed the reported Day 3 models, now including individual measures of baseline-to-peak changes in cortisol and systolic blood pressure, respectively. We report these additional analyses in the supplement and refer the interested reader to these analyses on page 15, lines 580 to 586:

      “As individual factors, such as metabolism or body weight, can influence the drug's action, we ran an additional analysis in which we included individual (baseline-to-peak) differences in salivary cortisol and (systolic) blood pressure, respectively. This analysis did not show any group by baseline-to-peak difference interaction suggesting that the observed memory effects were mainly driven by the pharmacological intervention group per se and less by individual variation in responses to the drug (see Supplemental Results).”

      And in the Supplemental Results:

      “To account for individual differences in cortisol responses after pill intake, we fit additional GLMMs predicting Day 3 subsequent memory of cued and correct trials including the factors Individual baseline-to-peak cortisol and Group. Doing so allowed us to account for variation in Day 3 performance, which might have resulted from within-group variation in cortisol responses, in particular in the CORT group. Importantly, none of the models predicting Day 3 memory performance by Day 2 cortisol-increase and Group, median-split RTs (high/low), hippocampal activity and RTs, or hippocampal activity and VTC category reinstatement revealed a significant Group × baseline-to-peak cortisol interaction (all Ps > .122). These results suggest that inter-individual differences in cortisol responses did not have a significant impact on subsequent memory, beyond the influence of group per se. The same analyses were repeated for systolic blood pressure employing GLMMs predicting Day 3 subsequent memory of cued and correct trials including the factors Individual baseline-to-peak systolic blood pressure and Group to account for variation in Day 3 performance, which might have resulted from within-group variation in blood pressure response, in particular in the YOH group. While the model predicting Day 3 memory performance revealed a significant Individual baseline-to-peak systolic blood pressure × Group × median-split RTs (high/low) interaction (β = -0.05 ± 0.02, z = -2.04, P = .041, R<sup>2</sup><sub>conditional</sub> = 0.01), post-hoc slope tests, however, did not show any significant difference between groups (all P<sub>Corr</sub> > .329). The remaining models including hippocampal activity and RTs, or hippocampal activity and VTC category reinstatement did not reveal a significant Group × Individual baseline-to-peak systolic blood pressure interaction (all Ps > .101). These results suggest that inter-individual differences in systolic blood pressure responses did not have a significant impact on subsequent memory, beyond the influence of group per se.”

      (4) The median-splitting approach for reaction times and hippocampal activity should be better justified.

      Reaction times are well-established proxies (correlates) of memory strength and memory confidence in previous research, as they reflect cognitive processes involved in retrieving information. Faster reaction times indicate stronger mnemonic evidence and higher confidence in the accuracy of a memory decision, while slower responses suggest weaker evidence and decision uncertainty or doubt. This relationship is supported by an extensive literature (e.g., Starns 2021; Robinson et al., 1997; Ratcliff & Murdock, 1976; amongst others). Importantly, distinguishing between high and low confidence choices in a memory task serves the purpose of differentiating between particularly strong memory evidence (e.g., in associative cued recall, when remembering is particularly vivid) and weaker memory evidence. Separating low from high confidence responses based on participants’ reaction times was especially important in the current analyses, because previous research demonstrates that reaction times during cued recall tasks inversely correlate with hippocampal involvement (Heinbockel et al., 2024; Gagnon et al., 2019) and that stress effects on human memory may be particularly pronounced for high-confidence memories (Gagnon et al., 2019).

      In response to the Reviewer comments, we have elaborated on our rationale for the distinction between short and long reaction times in the introduction, results, and methods. Please see page 4, lines 144 to 148:

      “We distinguished between responses with short and long reaction times indicative of high and low confidence responses because previous research showed that reaction times are inversely correlated with hippocampal memory involvement(58–60) and memory strength(61,62), and that high confidence memories associated with short reaction times may be particularly sensitive to stress effects(63).”

      On page 13, lines 520 to 523:

      “Reaction times in the Day 2 Memory cueing task revealed a trial-specific gradient in reactivation strength. Thus, we turned to single-trial analyses, differentiating Day 3 trials by short and long reaction times during memory cueing on Day 2 (median split), indicative of high vs. low memory confidence(58–60) and hippocampal reactivation(26,63).”

      And on page 26, lines 1046 to 1053:

      “Reaction times serve as a proxy for memory confidence and memory strength, with faster responses reflecting higher confidence/strength and slower responses suggesting greater uncertainty/weaker memory. The association between reaction times and memory confidence has been established by previous research(58–60), suggesting that the distinction between high from low confidence responses differentiates vividly recalled associations from decisions based on weaker memory evidence. Reaction times are further linked to hippocampal activity during recall tasks(26,53), and stress effects on memory are particularly pronounced for high-confidence memories(53).”

      Minor comments:

      (5) Please include the full names of key abbreviations in the figure legends, such as "ass.cat.hit" and among others.

      We now include the full names of key abbreviations in all figure legends (e.g., ass.cat.hit = associative category hit).

      (6) Please introduce various metrics used in the study to aid readers in better understanding the measurements they utilized.

      We agree that various measures that were included in our analyses had not been described clearly enough before, especially concerning the multivariate analyses. We therefore added short explanations across the results section.

      Page 8, lines 279 to 280: “Classifier accuracy is derived from the sum of correct predictions the trained classifier made in the test-set, relative to the total amount of predictions.”

      Page 8, lines 290 to 292: “Neural reinstatement reflects the extent to which a neural activity pattern (i.e., for objects) that was present during encoding is reactivated during retrieval (e.g., memory cueing).”

      Page 8, lines 299 to 301: “The logits here reflect the log-transformed trial-wise probability of a pattern either representing a scene or an object.”

      Page 10, lines 378 to 380: “Beyond category-level reinstatement, we assessed event-level memory trace reinstatement from initial encoding (Day 1) to memory cueing (Day 2), via RSA, correlating neural patterns in each region (hippocampus, VTC, and PCC) across days.”
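      The metrics introduced in these passages can be made concrete with a small sketch. It is illustrative only and rests on assumptions not stated in the response: the logit is taken to be the log-odds of the classifier probability, pattern similarity is taken to be a Pearson correlation between two activity vectors, and all function names and values are hypothetical.

```python
import math


def classifier_accuracy(predicted, actual):
    """Share of test-set predictions that match the true labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def logit(probability):
    """Log-odds of a probability (assumed form of the log transform)."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return math.log(probability / (1.0 - probability))


def pattern_similarity(pattern_a, pattern_b):
    """Pearson correlation between two activity patterns (e.g., Day 1 vs. Day 2)."""
    n = len(pattern_a)
    mean_a = sum(pattern_a) / n
    mean_b = sum(pattern_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(pattern_a, pattern_b))
    var_a = sum((a - mean_a) ** 2 for a in pattern_a)
    var_b = sum((b - mean_b) ** 2 for b in pattern_b)
    return cov / math.sqrt(var_a * var_b)


# Made-up example values.
print(classifier_accuracy(["scene", "object", "scene", "object"],
                          ["scene", "object", "object", "object"]))
print(logit(0.5))
print(pattern_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

      A probability of 0.5 maps to a logit of zero, meaning the classifier has no preference between the two categories, while correlations closer to 1 indicate stronger reinstatement of the encoding pattern.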

      (7) Please explain what the different colors represent in Figures 5B and 5C to avoid confusion. It would be good to indicate significant differences in the figures if applicable.

      We have now added line legends to the figure and revised the caption to clarify what exactly is depicted. We also added asterisks to mark significant differences.

      References:

      Monfils, M. H., Cowansage, K. K., Klann, E., & LeDoux, J. E. (2009). Extinction-reconsolidation boundaries: key to persistent attenuation of fear memories. Science, 324(5929), 951-955.

      Monfils, M. H., & Holmes, E. A. (2018). Memory boundaries: opening a window inspired by reconsolidation to treat anxiety, trauma-related, and addiction disorders. The Lancet Psychiatry, 5(12), 1032-1042.

      Lee, J. L. C., Nader, K., & Schiller, D. (2017). An update on memory reconsolidation updating. Trends in Cognitive Sciences, 21, 531-545.

      Radley, J. J., Williams, B., & Sawchenko, P. E. (2008). Noradrenergic innervation of the dorsal medial prefrontal cortex modulates hypothalamo-pituitary-adrenal responses to acute emotional stress. Journal of Neuroscience, 28(22), 5806-5816.

      Heinbockel, H., Wagner, A. D., & Schwabe, L. (2024). Post-retrieval stress impairs subsequent memory depending on hippocampal memory trace reinstatement during reactivation. Science Advances, 10(18), eadm7504.

    1. Author response:

      Reviewer #1 (Public review):

      We thank Reviewer #1 for their thoughtful assessment. We especially agree that AVI-4206 will be a valuable tool to help understand the host immune response to viral infection.

      Reviewer #2 (Public review):

      We thank Reviewer #2 for their comments and will address PARP9/14 selectivity with in vitro experiments and alignments/modeling. For ADP-ribosylation of PARP14, we will attempt experiments patterned after Kar et al, EMBO Journal, 2024, but note that detection of ADPr by IF and western has been relatively inconsistent and detection-reagent dependent in our hands. Regardless of the outcome, we will expand the discussion of the prior literature on this point.

      Reviewer #3 (Public review):

      We thank Reviewer #3 for their comments, especially noting that we had the “chutzpah” to go for the in vivo experiment. We share the concern about potential off target effects, which is why we prioritized so many selectivity experiments prior to testing. Ongoing chemistry efforts are focused on developing next-generation inhibitors that are orally bioavailable, but this work is in its early stages.

    1. Author response:

      We thank the reviewers for the thoughtful comments, and we hope to address these issues in a future revision. We will clarify that chaos only serves to generate barcodes, and show that once they are formed and assigned the memory mechanism is stable to initial conditions. We will also clarify the model's assumptions and its connections to indexing theory and to experimental results.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Review: 

      Summary: 

      The authors present a new application of the high-content image-based morphological profiling Cell Painting (CP) to single cell type classification in heterogeneous induced pluripotent stem cell-derived mixed neural cultures. Machine learning models were trained to classify single cell types according to either "engineered" features derived from the image or from the raw CP multiplexed image. The authors systematically evaluated experimental (e.g., cell density, cell types, fluorescent channels) and computational (e.g., different models, different cell regions) parameters and convincingly demonstrated that focusing on the nucleus and its surroundings provides sufficient information for robust and accurate cell type classification. Models that were trained on mono-cultures (i.e., containing a single cell type) could generalize for cell type prediction in mixed co-cultures, and could describe intermediate states of the maturation process of iPSC-derived neural progenitors to differentiated neurons.

      Strengths:

      Automatically identifying single cell types in heterogeneous mixed cell populations holds great promise to characterize mixed cell populations and to discover new rules of spatial organization and cell-cell communication. Although the current manuscript focuses on the application of quality control of iPSC cultures, the same approach can be extended to a wealth of other applications including in depth study of the spatial context. The simple and high-content assay democratizes use and enables adoption by other labs.

      The manuscript is supported by comprehensive experimental and computational validations that raises the bar beyond the current state of the art in the field of highcontent phenotyping and makes this manuscript especially compelling. These include (i) Explicitly assessing replication biases (batch effects); (ii) Direct comparison of featurebased (a la cell profiling) versus deep-learning-based classification (which is not trivial/obvious for the application of cell profiling); (iii) Systematic assessment of the contribution of each fluorescent channel; (iv) Evaluation of cell-density dependency; (v) explicit examination of mistakes in classification; (vi) Evaluating the performance of different spatial contexts around the cell/nucleus; (vii) generalization of models trained on cultures containing a single cell type (mono-cultures) to mixed co-cultures; (viii) application to multiple classification tasks.

      Comments on latest version:

      I have consulted with Reviewer #3 and both of us were impressed by the revised manuscript, especially by the clear and convincing evidence regarding the nucleocentric model's use of the nuclear periphery and its benefit for the case of dense cultures. However, there are two issues that are incompletely addressed (see below). Pending their resolution, the "strength of evidence" has been elevated to "compelling".

      First, the analysis of the patch size does not clearly indicate that the 12-18 µm range is a critical factor (Fig. 4E). On the contrary, the performance seems not to be very sensitive to the patch size, which is actually a desired property for a method. Still, Fig. 4B convincingly shows that the nucleocentric model is not sensitive to the culture density, while the other models are. Thus, the authors can adjust their text to say that the nucleocentric approach is not sensitive to the patch size and that the patch size is selected to capture the nucleus and some margins around it, making it less prone to segmentation errors in dense cultures.

      We agree that there is a significant tolerance to different patch sizes, and have therefore reformulated the conclusion as suggested in the results and the discussion sections (pages 10 and 16). As very large patch sizes (>40 µm) do increase the variability of the predictions and the imbalance between recall and precision, we have left this observation in the results section, as it also motivates using smaller patch sizes.

      Second, the GitHub does not contain sufficient information to reproduce the analysis. Its current state is sparse with documentation that would make reproducing the work difficult. What versions of the software were used? Where should data be downloaded? The README contains references to many different argparse CLI arguments, but sparse details on what these arguments actually are, and which parameters the authors used to perform their analyses. Links to images are broken. Ideally, all of these details would be present, and the authors would include a step-by-step tutorial on how to reproduce their work. Fixing this will lead to an "exceptional" strength of evidence.

      We have added additional information to the GitHub to increase the reproducibility of the analysis.  

      • The README now contains additional documentation and more extensive explanations. A flowchart has been added, making the dataflow and order of analyses clearer.

      • The accompanying dataset is 20GB in size and can be downloaded as a .zip-file from https://figshare.com/articles/dataset/Nucleocentric-Profiling/27141441?file=49522557. This file contains 2x480 raw images and a layout file.  

      • The software versions used are listed in Table 4 of the manuscript. To increase reproducibility, a Conda environment file (.yaml) has been added to the GitHub repository. Installing this environment provides the correct package versions.

      • For each script, the README now gives a short description of every argument: its meaning, whether it is required or optional, and its default setting.

      • A step-by-step tutorial on the use of the test dataset has been included. This tutorial includes the arguments used to run the code from the command line terminal.

      Recommendations for the authors:

      There is no reference in the text to Fig. 2D or to Fig. 3C.

      This has been changed. Text referring to Fig. 2D has been added to the manuscript on page 6, and the reference to Fig. 3C has been included on page 8.

    1. Author response:

      We thank the reviewers for the constructive suggestions made in the Public Reviews and the Recommendations to Authors. We intend to address these comments in a revised manuscript as follows:

      (1) We will revise the text according to the reviewer suggestions with regard to specific RBM20-dependent mRNAs and provide more detailed explanations in the results and discussion.

      (2) We will upload higher resolution images of several figures (resolution had been reduced to achieve lower file sizes) to address the comment regarding “data quality”.

      (3) We will include data on eCLIP control experiments.

      (4) We will add information on replication and new data for the western blot analysis.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Previous experimental studies demonstrated that membrane association drives avidity for several potent broadly HIV-neutralizing antibodies and its loss dramatically reduces neutralization. In this study, the authors present a tour de force analysis of molecular dynamics (MD) simulations that demonstrate how several HIV-neutralizing membrane-proximal external region (MPER)-targeting antibodies associate with a model lipid bilayer.

      First, the authors compared how three MPER antibodies, 4E10, PGZL1, and 10E8, associated with model membranes, constructed with two lipid compositions similar to native viral membranes. They found that the related antibodies 4E10 and PGZL1 strongly associate with a phospholipid near heavy chain loop 1, consistent with prior crystallographic studies. They also discovered that a previously unappreciated framework region between loops 2-3 in the 4E10/PGZL1 heavy chain contributes to membrane association. Simulations of 10E8, an antibody from a different lineage, revealed several differences from published X-ray structures. Namely, a phosphatidylcholine binding site was offset and includes significant interaction with a nearby framework region. The revised manuscript demonstrates that these lipid interactions are robust to alterations in membrane composition and rigidity. However, it does not address the reverse: that phospholipids known experimentally not to associate with these antibodies (if any such lipids exist) also fail to interact in MD simulations.

      Next, the authors simulate another MPER-targeting antibody, LN01, with a model HIV membrane either containing or missing an MPER antigen fragment within. Of note, LN01 inserts more deeply into the membrane when the MPER antigen is present, supporting an energy balance between the lowest energy conformations of LN01, MPER, and the complex. These simulations recapitulate lipid binding interactions solved in published crystallographic studies but also lead to the discovery of a novel lipid binding site the authors term the "Loading Site", which could guide future experiments with this antibody.

      The authors next established coarse-grained (CG) MD simulations of the various antibodies with model membranes to study membrane embedding. These simulations facilitated greater sampling of different initial antibody geometries relative to membrane. These CG simulations, which cannot resolve atomistic interactions, are nonetheless compelling because negative controls (ab 13h11, BSA) that should not associate with membrane indeed sample significantly less membrane.

      Distinct geometries derived from CG simulations were then used to initialize all-atom MD simulations to study insertion in finer detail (e.g., phospholipid association), which largely recapitulate their earlier results, albeit with more unbiased sampling. The multiscale model of an initial CG study with broad geometric sampling, followed by all-atom MD, provides a generalized framework for such simulations.

      Finally, the authors construct velocity pulling simulations to estimate the energetics of antibody membrane embedding. Using the multiscale modelling workflow to achieve greater geometric sampling, they demonstrate that their model reliably predicts lower association energetics for known mutations in 4E10 that disrupt lipid binding. However, the model does have limitations, namely in its ability to predict more subtle changes along a lineage: intermediate mutations that reduce lipid binding are indistinguishable from mutations that completely ablate lipid association. Thus, while large/binary differences in lipid affinity might be predictable, the use of this method as a generative model is likely more limited.

      The MD simulations conducted throughout are rigorous and the analyses are extensive, creative, and biologically inspired. Overall, these analyses provide an important mechanistic characterization of how broadly neutralizing antibodies associate with lipids proximal to membrane-associated epitopes to drive neutralization.

      Reviewer #2 (Public review):

      In this study, Maillie et al. have carried out a set of multiscale molecular dynamics simulations to investigate the interactions between the viral membrane and four broadly neutralizing antibodies that target the membrane-proximal external region (MPER) of the HIV-1 envelope trimer. The simulations recapitulated in several cases the binding sites of lipid head groups that were observed experimentally by X-ray crystallography, as well as some new binding sites. These binding sites were further validated using a structural bioinformatics approach. Finally, steered molecular dynamics was used to measure the binding strength between the membrane and variants of the 4E10 and PGZL1 antibodies.

      The use of multiscale MD simulations allows for a detailed exploration of the system at different time and length scales. The combination of MD simulations and structural bioinformatics provides a comprehensive approach to validate the identified binding sites. Finally, the steered MD simulations offer quantitative insights into the binding strength between the membrane and bnAbs.

      While the simulations and analyses provide qualitative insights into the binding interactions, they do not offer a quantitative assessment of energetics. The coarse-grained simulations exhibit artifacts and thus require careful analysis.

      This study contributes to a deeper understanding of the molecular mechanisms underlying bnAb recognition of the HIV-1 envelope. The insights gained from this work could inform the design of more potent and broadly neutralizing antibodies.

      Recommendations for the authors:

      Reviewing Editor:

      We recommend the authors remove the figure and section related to bnAb LN01, perform additional analysis (e.g., further expanding on the differences in antibody binding in the presence or absence of antigen), and present this as a separate manuscript in a follow-up study.

      We consider the analysis of a bnAb with a transmembrane antigen, and of LN01 specifically, essential to the manuscript, and the results novel. The study of LN01 provides many insights distinct from those for the other MPER bnAbs in this study. We agree that further characterization of LN01 and of bnAbs with transmembrane antigen or full-length Env is intriguing and necessary for a full mechanistic understanding of lipid-associated antibodies. The LN01 section in this paper is novel in the field and presents preliminary evidence motivating further work, which we agree is beyond the scope of this already long and detailed study.

      Reviewer #1 (Recommendations for the authors):

      I appreciate the degree to which the authors responded to my previous points raised in the private review, including edits where I might have missed something in the manuscript or relevant literature. I imagine such a point-by-point response was quite onerous. Thank you also for balancing presentation/clarity with content/rigor considering the large information content of this manuscript; in silico results are inherently hard to present given the delicate balance between rigorous validation and novel information content. I apologize if I repeat points raised and addressed previously and commend the authors on their revised study, which is much improved in clarity; any additional revisions are of course entirely at your discretion.

      "...now having more diversity in lipid headgroup chemistries" references the wrong figure-it should be: Figure 2-figure supplement 2A-C. The incorrect figure is also referenced again several sentences down: "...relevant CDR and framework surface loops..."

      Thank you for pointing out this error. We have corrected figure references.

      "One shared conformational difference observed for these bnAbs the higher cholesterol bilayers was slightly more extensive and broader interaction profiles as well as modestly deeper embedding of the relevant CDR and framework surfaces loops" please rephrase

      Thank you for this suggestion.  We rephrased this for improved clarity and flow. 

      "These results bolster the feasibility for using all-atom MD as an in silico platform to explore differential phospholipid affinity at these sites (i.e., specificity studies) and influence on antibody preferred conformation as membrane composition and lipid chemistry are systematically varied" Please tone down these speculations-you have demonstrated that simulations are robust to different headgroup chemistries but have not provided evidence for the exclusion of lipids that are known not to associate with these antibodies.

      We rephrased this speculation to highlight the potential of this application. We also emphasize future studies that would be required to achieve this application in the following sentence.

      “These results motivate use of all-atom MD as an in silico approach for exploring differential phospholipid affinity at these sites…”

      Figure 2A: Specify which PDB entry corresponds to the displayed crystal structures in the main figure or caption.

      We clarified these PDB entries in the figure caption. 

      Check reference formatting in supplemental figures when generating VOR.

      I am not sure how relevant this might be to the claims of Figure 2-figure supplement 3, but AlphaFold3-based phospholigand docking might provide an additional orthogonal approach if relevant ligand(s) are available for such analysis (particularly for the newly proposed 10E8 POPC complex).

      Thank you for this suggestion. AI/ML-based prediction methods like AF3 and RoseTTAFold All-Atom (RFAA) are interesting new methods that have emerged since our initial submission. We have decided that these experiments are beyond the scope of this already long and detailed study. We have added a sentence suggesting use of these methods in future work.

      "We next studied bnAb LN01 to interrogate differences" --> this transition still reads a bit unclear. Why shift gears and change antibodies? Also, while you do go into its interactions both +/- antigen, there's no lead into the simulation initialization with and without antigen to guide the reader into the comparisons you will draw in the figure. Also, the order of information presentation is a bit strange, where the rationale for choosing a single monomeric helix is brought up in the middle of the paragraph instead of at the beginning of the section. In the next paragraph, it goes back to the initialization of the membrane composition again, which feels a bit disorganized-I do appreciate the unique challenge of having to weave through so much quality data! In fact, if you were to conduct simulations of membrane + antigen vs. membrane + LN01 vs. membrane + LN01 + antigen, I am tempted to say that this could be removed from this manuscript and flow better as a paper in and of itself.

      We thank the reviewer for the suggestion to improve the writing style. We feel this section adds a lot of value to the manuscript, so we have kept it in the paper and improved the transition as well as the rationale.

      We chose to study the additional antibody LN01 and the monomeric MPER-TM antigen conformation because existing structural evidence was available, so no additional creative model building was needed. This rationale has been updated in the new text.

      We changed the order of information as suggested, moving the rationale for the antigen fragment earlier in the paragraph, followed by the background on the lipid sites from the crystal structure, which leads into the simulation set-up. We clarified in the opening sentence of the paragraph that the simulation initialization was similar for systems with and without antigen.

      "previously observed snorkeling and hydration of TM Arg686" --> Is this R696 (numbering could be different based on the particular Env)?

      Thank you for noting this typo, we have corrected the numbering.

      Potential font color issue with Figure 3-Figure supplement 1 B and part of A text; this could be fixed in typesetting.

      The discussion reads very well. Is it possible to direct antibody maturation, even in an engineered context, towards membrane affinity without increasing immunogenic polyreactivity? This is mentioned very briefly and cited with ref 36, but I would be interested in the author's thoughts on this topic.

      We thank the reviewer for the insightful idea to explore in future work. Our conclusion alludes to the possibility of artificially evolving membrane affinity studied by MD, as done in vitro by Nieva and co-workers. Because of their hypothetical nature, we have chosen not to elaborate on those ideas in this manuscript.

      Reviewer #2 (Recommendations for the authors):

      To ensure reproducibility and facilitate further research, the authors should publicly deposit the code for running the MD simulations and analyses (e.g., on GitHub) along with the underlying data used in the study (e.g., on Zenodo.org).

      We appreciate the consideration for open-source code and analysis. Representative code and simulation trajectories were uploaded to the following repositories:

      https://github.com/cmaillie98/mper_bnAbs.git

      https://zenodo.org/records/13830877

      —-

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Previous experimental studies demonstrated that membrane association drives avidity for several potent broadly HIV-neutralizing antibodies and its loss dramatically reduces neutralization. In this study, the authors present a tour de force analysis of molecular dynamics (MD) simulations that demonstrate how several HIV-neutralizing membrane-proximal external region (MPER)-targeting antibodies associate with a model lipid bilayer.

      First, the authors compared how three MPER antibodies, 4E10, PGZL1, and 10E8, associated with model membranes, constructed with a lipid composition similar to the native virion. They found that the related antibodies 4E10 and PGZL1 strongly associate with a phospholipid near heavy chain loop 1, consistent with prior crystallographic studies. They also discovered that a previously unappreciated framework region between loops 2-3 in the 4E10/PGZL1 heavy chain contributes to membrane association. Simulations of 10E8, an antibody from a different lineage, revealed several differences from published X-ray structures. Namely, a phosphatidylcholine binding site was offset and includes significant interaction with a nearby framework region.

      Next, the authors simulate another MPER-targeting antibody, LN01, with a model HIV membrane either containing or missing an MPER antigen fragment within. Of note, LN01 inserts more deeply into the membrane when the MPER antigen is present, supporting an energy balance between the lowest energy conformations of LN01, MPER, and the complex. Additional contacts and conformational restraints imposed by ectodomain regions of the envelope glycoprotein, however, remain unaddressed; the size of such simulations likely runs into technical limitations including sampling and compute time.

      The authors next established coarse-grained (CG) MD simulations of the various antibodies with model membranes to study membrane embedding. These simulations facilitated greater sampling of different initial antibody geometries relative to membrane. Distinct geometries derived from CG simulations were then used to initialize all-atom MD simulations to study insertion in finer detail (e.g., phospholipid association), which largely recapitulate their earlier results, albeit with more unbiased sampling. The multiscale model of an initial CG study with broad geometric sampling, followed by all-atom MD, provides a generalized framework for such simulations.

      Finally, the authors construct velocity pulling simulations to estimate the energetics of antibody membrane embedding. Using the multiscale modelling workflow to achieve greater geometric sampling, they demonstrate that their model reliably predicts lower association energetics for known mutations in 4E10 that disrupt lipid binding. However, the model does have limitations, namely in its ability to predict more subtle changes along a lineage: intermediate mutations that reduce lipid binding are indistinguishable from mutations that completely ablate lipid association. Thus, while large/binary differences in lipid affinity might be predictable, the use of this method as a generative model is likely more limited.

      The MD simulations conducted throughout are rigorous and the analyses are extensive. However, given the large amount of data presented within the manuscript, the text would benefit from clearer subsections that delineate discrete mechanistic discoveries, particularly for experimentalists interested in antibody discovery and design. One area the paper does not address involves the polyreactivity associated with membrane-binding antibodies; MD simulations and/or pulling velocity experiments with model membranes of different compositions, with and without model antigens, would be needed. Finally, given the challenges in initializing these simulations and their limitations, the text regarding their generalized use for discovery, rather than mechanism, could be toned down.

      Overall, these analyses provide an important mechanistic characterization of how broadly neutralizing antibodies associate with lipids proximal to membrane-associated epitopes to drive neutralization.

      Reviewer #2 (Public Review):

      In this study, Maillie et al. have carried out a set of multiscale molecular dynamics simulations to investigate the interactions between the viral membrane and four broadly neutralizing antibodies that target the membrane-proximal external region (MPER) of the HIV-1 envelope trimer. The simulations recapitulated in several cases the binding sites of lipid head groups that were observed experimentally by X-ray crystallography, as well as some new binding sites. These binding sites were further validated using a structural bioinformatics approach. Finally, steered molecular dynamics was used to measure the binding strength between the membrane and variants of the 4E10 and PGZL1 antibodies.

      The conclusions from the paper are mostly well supported by the simulations; however, they remain very descriptive, and the key findings should be better described and validated. In particular:

      It has been shown that the lipid composition of HIV membrane is rich in cholesterol [1], which accounts for almost 50% molar ratio. The authors use a very different composition and should therefore provide a reference. It has been shown for 4E10 that the change in lipid composition affects dynamics of the binding. The robustness of the results to changes of the lipid composition should also be reported.

      The real advantage of the multiscale approach (coarse-grained (CG) simulation followed by a back-mapped all-atom simulation) remains unclear. In most cases, the binding mode in the CG simulations seems to be an artifact.

      The results reported in this study should be better compared to available experimental data. For example how does the approach angle compare to cryo-EM structure of the bnAbs engaging with the MPER region, e.g. [2-3]? How do these results from this study compare to previous molecular dynamics studies, e.g.[4-5]?

      References

      (1) Brügger, Britta, et al. "The HIV lipidome: a raft with an unusual composition." Proceedings of the National Academy of Sciences 103.8 (2006): 2641-2646.

      (2) Rantalainen, Kimmo, et al. "HIV-1 envelope and MPER antibody structures in lipid assemblies." Cell Reports 31.4 (2020).

      (3) Yang, Shuang, et al. "Dynamic HIV-1 spike motion creates vulnerability for its membrane-bound tripod to antibody attack." Nature Communications 13.1 (2022): 6393.

      (4) Carravilla, Pablo, et al. "The bilayer collective properties govern the interaction of an HIV-1 antibody with the viral membrane." Biophysical Journal 118.1 (2020): 44-56.

      (5) Pinto, Dora, et al. "Structural basis for broad HIV-1 neutralization by the MPER-specific human broadly neutralizing antibody LN01." Cell Host & Microbe 26.5 (2019): 623-637.

      Considering reviewer suggestions, we slightly reorganized the results section into specific sub-sections with headings and changed the order in which key results were presented to make the subsequent analysis more accessible for readers. Supplemental materials were redistributed into eLife format, with each supplemental item grouped under a corresponding main figure. Many slight modifications of detail were made to figures (mostly supplemental items) without changing their character, such as clearer axis labels or revised annotations within panels.

      The major additions within the results sections based on the reviews were:

      (1) An expanded comparison of our simulation analyses with previous simulations and with existing cryo-EM structural evidence for MPER antibodies’ membrane orientation in the context of full-length antigen, resulting in new supplemental figure panels.

      (2) New atomistic simulations of 10E8, PGZL1, and 4E10 evaluating the phospholipid binding predictions in a different lipid composition more closely modeling HIV membranes.

      Minor edits to the analyses and interpretations include:

      (1) Outlining the geometric components contributing to variance in substates after clustering the atomistic 10E8, 4E10, and PGZL1 simulations.

      (2) Better defining the variance and durability of membrane interactions within and across systems in the coarse grain methods section.

      (3) Removed interpretations in the original results sections regarding polyreactivity and energetics for MPER bnAbs that were not explicitly supported by data.   

      (4) More context on the prevalence of bnAb loop geometries in the structural informatics section.

      (5) Rationale for the choice of the continuous helix MPER-TM conformation in LN01-antigen conformations, and citations to previous gp41 TM simulations.

      (6) Removed language on the novelty of the coarse grain and steered pulling simulations as newly developed approaches; tempering the potential discriminating power and applications of those approaches, in light of their limitations.

      The discussion was revised to provide more novel context of the results within the field, including discussing the direct relevance of the simulation methods for evaluating immune tolerance mechanisms and for antibody engineering. We have shared custom scripts used for molecular dynamics analysis on GitHub (https://github.com/cmaillie98/mper_bnAbs.git) and uploaded trajectories to a public repository hosted on Zenodo (https://zenodo.org/records/13830877).

      Recommendations for the authors:

      Below, I provide an extensive list of minor edits associated with the text and figures for the authors to consider. I provide these with the hope of increasing the accessibility of the manuscript to broader audiences but leave changes to the discretion of the authors.

      Text/clarity

      Figure 1 main text

      The main text discussing Figure 1 is disorganized, making the analysis difficult to follow. I would suggest the following: moving the sentence, "4E10 and PGZL1 are structurally homologous" immediately after the paragraph discussing the simulation initiation. Then, add a sentence that directly compares the experimental affinity, neutralization, and polyreactivity of 4E10 and PGZL1 (later, an unintroduced idea pops up, "These patterns may in part explain 4E10's greater polyreactivity"). Next, lead into the discussion of the MD simulation data with something to the effect of: "Given these similarities, we first compared mechanisms of membrane insertion between 4E10 and PGZL1 to bolster confidence in our predictions". Later, the sentence "Across 4E10 and PGZL1 simulations, the bound lipid phosphates"

      We thank the reviewer for the suggestion and we have restructured the beginning of the results to implement this style: to first introduce then discuss the comparative PGZL1 & 4E10 results, i.e. Figure 1 plus associated supplements.

      In the background and the introduction text leading up to Figure 1, CDR-H3 is discussed at length, however, the first figure focuses almost entirely on how CDR-H1 coordinates a lipid phosphate headgroup. Are there experimental mutations in this loop that do not affect affinity (e.g., to a soluble gp41 peptide), but do affect neutralization (like the WAWA mutation for CDR-H3, discussed later)?

      We have altered the Introduction (para 2) and Results (4E10/PGZL1 sub-section) to give a more balanced discussion of CDRs H1 & H3. That includes referencing experimental data addressing the reviewer’s question: a PGZL1 clone, H4K3, in which mutations to CDR-H1 were introduced and shown to have minimal impact on affinity to MPER peptide via ELISA and BLI, although those mutant bnAbs had significantly reduced neutralization efficacy (PMC6879610).

      The sentence "These phospholipid binding events were highly stable, typically persisting for hundreds of nanoseconds" should be moved down to immediately precede, "[However], in a PGZL1 simulation, we observed a". This would be a good place for a paragraph break following, "Thus, these bnABs constitutively", since this block of text is very long.

      Similarly, the sentence and parts of the section, "Likewise, the interactions coordinating the lipid phosphate oxygens at CDR-H1" more appropriately belong immediately before or after the sentence, "Our simulations uncover the CDR-lipid interactions that are the most feasible".

      Thank you for the detailed guidance in reorganizing the Figure 1 results.  We followed the advice to directly compare 4E10 and PGZL1 results separately from 10E8, moving those sections of text appropriately.  New paragraph breaks were added to improve accessibility and flow of concepts throughout the Results.

      In the sentence, "our simulations uncover CDR-lipid interactions that are the most feasible and biologically relevant in the context of a full [HIV] lipid bilayer... validation to which of the many possible ions" --> have you confidently determined lipid binding and positioning outside of the site validated in figure 1? Which site(s) are these referencing? The next two sentences then introduce two new ideas on the loop backbone stability then lead into lipid exchange, which is a bit jarring.

      We have adjusted the language concerning the putative ions/lipids electron density across the many PGZL1 and 4E10 crystal structures, and additionally make the explicit point that we confidently determined the lack of lipid binding outside of the site focused on in Figure 1.

      “… both bnAbs showed strong hotspots for a lipid phosphate bound within the CDR-H1 loops, with minimal phospholipid or cholesterol ordering around the proteins elsewhere.  The simulated lipid phosphates bound within CDR-H1 have exceptional overlap with electron densities and atomic details of modelled headgroups from respective lipid-soaked co-crystal structures…”

      Figure 2 main text

      "We similarly investigated bnAb 10E8" - Please make this a separate subheader, the block text is very long up to this point.

      Thank you for the suggestion. We introduced a sub-header to separate work on 10E8 all-atom simulations.

      "we observed a POPC complexed with... modelled as headgroup phosphoglycerol anions..." - please cite the references within the text.

      Thank you for pointing out this missing reference, we added the appropriate reference.

      "One striking and novel observation" - please remove the phrase "striking" throughout, for following best practices in scientific writing (PMC10212555)-this is generally well-done throughout.

      We removed “striking” from our text per your suggestion.

      "This CDR-L1 site highlights... (>500 fold) across HIV strains" - How much do R29 and Y32 also contribute to antigen binding and the conformation of this loop? These mutants also decreased Kd by approximately 20X, and based on the co-crystal structure with the TM antigen (PDB: 4XCC), seem to play a more direct role in antigen contact. Additionally, these residues should be highlighted on a figure, otherwise it's difficult to understand why they are important for membrane association.

      We thank the reviewer for deep engagement with these supporting experimental details.  The R29A+Y32A 10E8 mutant referenced in the text showed only a 4-fold Kd increase, a modest change for an SPR binding experiment, whereas the R29E+Y32E 10E8 mutant resulted in a 40x Kd increase, the “20x” the reviewer refers to.  Both 10E8 mutants showed similarly drastic reductions in breadth and potency, of over 2 orders of magnitude on average.

      These mutated CDR-L1 residues are not directly involved in antigen contact and adopt the same loop helix conformation when antigen is bound.  A minor impact on antigen binding affinity could be due to altered pre-organization of CDR loops upon losing interactions from the Tyr & Arg sidechains - particularly Tyr31 in contact with CDR-H3.

      As per the suggestion, a more clearly annotated figure panel denoting these sidechains has been added to Figure 2-Figure Supplement 1 for 10E8 analysis.

      "Structural searches querying... identified between 10^5 and 2*10^6..." - why is this value represented as such a large range? Does this depend on the parameters used for analysis? Please clarify.

      Additionally, how prevalent are any random loop conformations compared to the ones you searched? It's otherwise difficult to attribute number of occurrences within the 2 A cutoff to biological significance, as this number is not put in context.

      We appreciate the reviewer’s comment to contextualize the range and relative frequency of the bnAb loop conformations.  RMSD and loop length are the key parameters, which can be controlled by searching reference loops of similar length.  The main point of the backbone-level searching is simply to show that the bnAb loops are not particularly rare when compared with loops of similar length.

      We did as was suggested and added comparison to random loops of the same length to the main text, including a new Supplementary Table 4.   

      “…identified between 10^5 and 2∙10^6 geometrically similar sub-segments within natural proteins (<2 Å RMSD)40, reflecting they are relatively prevalent (not rare) in the protein universe, comparing well with frequency of other surface loops of similar length in antibodies (Supplementary Table 3).”

      "We next examined the geometries" could start after its own new subheading. Moreover, while there's an emphasis on tilt for neutralization, there is not a figure clearly modelling the proposed Env tilt compared to the relatively planar bilayer. It would be helpful to have an additional panel somewhere that shows the orientation of the antibody (e.g., a representative pose) in the simulations relative to an appropriately curved membrane, Env, the binding conformation of the antibody to Env, and apo Env, given the tilting observed in PMID: 32348769 and theorized in PMC5338832. What additional conformational changes or tilting need to occur between the antibodies and Env to accomplish binding to their respective epitopes?

      Thank you for outlining an interesting element to consider in our analysis of a multi-step binding mechanism for MPER antibodies. We added additional figure panels in the supplement to outline the similarities and differences between our simulations and Fabs with the inferred membranes in cryo-EM experiments of full-length HIV Env.  The simulated Fabs’ angles are very similar with only minor tilting to match the cryo-EM antibody-membrane geometries. 

      We added Figure 1-figure supplement 1A & Figure 2-figure supplement 2A, and altered the text to reflect this:

      “The primary difference is Env-bound Fabs in cryo-EM adopt slightly more shallow approach angles (~15°) relative to the bilayer normal.  The simulated bnAbs in isolation prefer orientations slightly more upright, but presenting CDRs at approximately the same depth and orientation.  Thus, these bnAbs appear pre-disposed in their membrane surface conformations, needing only a minor tilt to form the membrane-antibody-antigen neutralization complex.”

      Env tilt dynamics and membrane curvature of natural virions may reconcile some of these differences.  Recent in situ tomography of full-length Env in pseudo-virions corroborates our approximation of flat bilayers over the short length scales around Env.

      The sentence "we next examined the geometries" mentions "potential energy cost, if any, for reorienting...". However, there's no further discussions of geometry or energy cost within this section. Please rephrase, or move this figure to main and increase discussion associated with the various conformational ensembles, their geometry, and their phospholipid association.

      As the reviewer highlights, the unbiased simulations and our analysis do not explicitly evaluate energetics.  We removed this phrase, and now only allude to the minimal energy barrier between the similar geometric conformations, relative to the tilting & access requirements for antigen binding mechanism.

      “The apparent barrier for re-orientation is likely much less energetically constraining than shielding glycans and accessibility of MPER”

      ".. describing the spectrum of surface-bound conformations" cites the wrong figure.

      Thank you for noticing this error; we corrected the figure reference to (Figure 2-figure supplement 4).

      Please comment on the significance of how global clustering (Fig. S5A-C) was similar for 4E10 and PGZL1, but different for 10E8 (e.g., blue, orange, and yellow clusters for 4E10 and PGZL1 versus cyan, red, and green clusters for 10E8). As the cyan cluster seems to be much closer in Euclidian space to the 4E10/PGZL1 clusters, it might warrant additional analysis. What do these clusters represent in terms of structure/conformation? How do these clusters differ in membrane insertion as in (A)?

      We are grateful that you identified analysis in the geometric clustering section that may be of interest to other readers. We have added an additional supplementary table (Table 2) detailing the CDR loop membrane insertion and global Fab angles that describe each cluster, to demonstrate their similarities and differences.  We also better describe how global clustering was similar for 4E10 and PGZL1, but different for 10E8, in the relevant results section.

      The cyan cluster is not close in structure to the 4E10/PGZL1 clusters.  We note the original figure panel had an error.  The updated Figure 2-supplement 4B shows the correct Euclidian distance hierarchy with an early split between the 4E10/PGZL1 and 10E8 clusters.

      Figure 3 main text

      The start of this section, "We next studied bnAb LN01...", is a good place for a new subheader.

      We have added an additional subheader here: Antigen influence on membrane bound conformations and lipid binding sites for LN01

      There should be a sentence in the main text defining the replicate setup and production MD run time. Is the apo and complex based on a published structure? How do you embed the MPER? Is the apo structure docked to membrane like in 4E10? The MD setup could also be better delineated within the methods.

      The first two paragraphs in this section have been updated to clarify the relevant simulations configuration and Fab membrane docking prediction details. 

      The procedure was the same for predicting an initial membrane insertion, albeit now we use the LN01-TM complex, and the calculation accounts for the membrane burial of the TM domain and MPER fragment.  As mentioned, LN01 is predicted to insert its CDR loops similarly with or without the TM-MPER fragment.  The geometry differs from PGZL1/4E10 and 10E8, as noted in the text.

      Please comment on the oligomerization state of the antigen used in the MD simulation: how does the simulation differ from a crossed MPER as observed in an MPER antibody-bound Env cryo-EM structure (PMID: 32348769), a three-helix bundle (PMC7210310), or single transmembrane helix (PMC6121722)? How does the model MPER monomer embed in the membrane compared to simulations with a trimeric MPER (PMC6035291, PMID: 33882664)-namely, key arginine residues such as R696?

      We thank the reviewer for pointing out critical underlying rationale for modeling this TM-MPER-LN01 complex which we have corrected in the revised draft. The range of potential conformations and display of MPER based on TM domain organization could easily be its own paper – we in fact have a manuscript in preparation on the topic.  

      The updated text expands the rationale for choosing the monomeric uninterrupted helix form of the MPER-TM model antigen (para 1 of LN01 section). The alternative conformations we did not explore are called out, with references provided by the reviewer.

      The discussion now qualifies that the MPER presentation is likely oversimplified here, noting MPER display in the full-length Env trimer will vary in different conformational states or membrane environments. However, the only cryo-EM structures of full-length Env with TM domains resolved have this continuous helix MPER-TM conformation, seen both within crossing TM dimers and in dissociated TM monomers.

      Are there additional analyses that can validate the dynamics of the MPER monomer in the membrane and relative to LN01? Such as key contacts you would expect to maintain over the duration of the MD simulation?

      We also expanded the description of this TM domain’s behavior and dynamics (tilt, orientation, Arg696 snorkeling, and complex with LN01) to provide a clearer picture of the simulation results, which align with past MD of the gp41 TM domain as a monomer (para 2 of LN01 section).  As well, we noted key LN01-MPER contacts that were maintained.

      How does the model MPER modulate membrane properties like lipid density and lipid proximities near LN01?

      We checked and didn’t notice differences for the types of lipids (chol, etc) proximal to the MPER-TM or the CDR loops versus the bulk lipid bilayer distributions.  Due to the already long & detailed nature of this manuscript, we elect not to include discussion on this topic.

      Supplemental figure 1H-I would be better positioned as a figure 3-associated supplemental figure.

      We rearranged to follow the eLife format and have paired supplemental panels with their most relevant main figures.

      Figure 3F/H reference a "loading site" but this site is defined much later in the text, which was confusing.

      Thank you for pointing out this source of confusion, we rearranged our discussion to reflect the order in which we present data in figures.

      What evidence suggests that lipids "quickly exchange from the Loading site into the X-ray site by diffusion"? I do not gather this from Figure S1H/I.

      We have rearranged the loading site and X-ray site RMSD maps in Figure 3-Figure supplement 1 to better illustrate how a lipid exchanges between these sites.

      Figure 4 main text

      The authors assert that in the CG simulations, restraints, "[maintain] Fab tertiary and quaternary structure". However, backbone RMSD does not directly assert this claim-an additional analysis of the key interfacial residues between chains, or geometric analysis between the chains, would better support this claim.

      Thank you for raising this point.  We rephrased to add that the major sidechain contacts between heavy and light chain persist, in addition to backbone RMSD, to describe how these Fabs stably maintain the fold in CG representation.

      In several cases, CG models sample and then dissociate from the membrane. In the text, the authors mention, "course-grained models can distinguishing unfavorable and favorable membrane-bound conformations". Is there a particular orientation that causes/favors membrane association and dissociation? This analysis could look at conformations immediately preceding association and dissociation to give clues as to what orientation(s) favor each state.

      Thank you for suggesting this interesting analysis.  Clustering analysis of associated states are presented in Figure 5, Figure 5-Figure Supplement 1, and Figure 6, which show all CDR and framework loop directed insertion.  This feature is currently described in the main text.  

      We did not find a strong correlation of specific orientations with “pre-dissociation” states or ineffective non-inserting “scanning” events.  We revised the key sentence to reflect the major takeaway: non-CDR alternative conformations did not insert, and most conformations with CDRs inserted in a different manner than in the all-atom simulations were also prone to dissociate:

      “Given that non-CDR directed and alternative CDR-embedded orientations readily dissociate, we conclude that coarse-grained models can distinguish unfavorable and favorable membrane-bound conformations to an extent that provides utility for characterizing antibody-bilayer interaction mechanisms.”

      Figure 6 main text

      "For 4E10, trajectories initiated from all three geometries..." only two geometries are shown for each antibody. Please include all three on the plot.

      The plots include markers for all three geometries for 4E10, highlighted with stars or letters on the density plots of angles sampled (Figure 6B,C).

      "Aligning a full-length IgG... unlikely that two Fabs simultaneously..." Are there theoretical conformations in which two Fabs could simultaneously associate with membrane? If this was physiological or could be designed rationally, could an antibody benefit further from avidity?

      Our modeling suggests the theoretical conformations having two Fabs on the membrane are infeasible.  It’s even less likely multiple Env antigens could be engaged by one IgG.  We have revised the text to express this more clearly.

      Figure 7 main text

      "An intermediate... showed a modest reduction in affinity..." what affinity does PGZL1 have for this antigen?

      The preceding sentence provides this information: “Mature PGZL1 has relatively high affinity to the MPER epitope peptide (Kd = 10 nM) and demonstrates great breadth and potency, neutralizing 84% of a 130 strain panel”

      Figures

      Figure 1

      It would be helpful to have an additional panel at the top of this figure further zoomed out showing the orientation of the antibody (e.g., a representative pose) in the simulations relative to an appropriately curved membrane, Env, the binding conformation of the antibody to Env, and apo Env, given the tilting observed in PMID: 32348769 and theorized in PMC5338832. What additional conformational changes or tilting need to occur between the antibodies and Env to accomplish binding to their respective epitopes?

      Thank you for the suggestion to include this analysis.  We have added text reflecting this information and made new supplemental panels that compare simulated 4E10 and 10E8 Fab conformations to cryo-EM density maps of Fabs bound to full-length HIV Env (Figure 1-figure supplement 1A & Figure 2-figure supplement 2A).

      In Figure 1, space permitting, it would be helpful to annotate the distances between the phosphates and side chains (similarly, for Figure S1A).

      To avoid overloading the main figure panels with text, those relevant distances are listed in the Methods section.  Those distances are used to define the “bound” lipid phosphate state.  Generally, we note the interactions are within hydrogen bonding distance.

      Annotating "Replicate 1" and "Replicate 2" on the left side of Figure 1C/D would make this figure immediately intuitive.

      We have added these labels.

      Figure caption 1C: Please clarify the threshold/definition of a contact used to binarize "bound" versus "unbound" (for example, "mean distance cutoff of 2A between the phosphate oxygen and the COM of CDR-H1") [on further reading of the methods section, this criterion is quite involved and might benefit from: a sentence that includes "see methods"]. Additionally, C could use a sentence explaining the bar such as in E, "Phosphate binding is mapped to above each MD trajectory" Please define FR-H3 in the figure caption for E/F.

      We have added these details to the figure caption.

      Because Figure 1 is aggregated simulation time, it would be helpful to also represent the data as individual replicates or incorporate this information to calculate standard deviations/statistics (e.g., 1 microsecond max using the replicates to compute a standard deviation).

      We believe the current quantification & display of data via sharing all trajectories is sufficient to convey the major point of how often each CDR-phospholipid binding site is occupied.  Further tracking and statistics of inter-atomic distances will likely be too tedious & add minimal value. There are some dynamics of the phosphate oxygens between the polar groups within the CDR site, but our “bound” state definitions sufficiently describe whether the key participating interactions are made.

      Figure 2

      For A, it would be helpful to annotate the yellow and blue mesh on the figure itself.

      We have defined the orange phosphate and blue choline densities.

      Also, where are R29 and Y32 relative to this site? In the X-ray panels, Y38 is not shown, and the box delineating the zoom-in is almost imperceptible.

      Thank you for this suggestion to include those amino acids, which are referenced in the text as critical sites where mutation impacts function. To clarify, Y32 is the PDB numbering for residue Y38 in IMGT numbering. We have added a panel to Figure 2-Figure Supplement 1 with a cartoon graphic of the 10E8 loop groove with sidechains, annotating R29 and Y38, staying consistent with our use of IMGT numbering in the manuscript.

      Figure 3

      It might read clearer to have "LN01+MPER-TM" and "LN01-Apo" in the middle of A/B and C/D, respectively, and a dotted line delineating the left and right side of the figure panels.

      We have added these details to the figure for clarity for readers.

      It would be helpful to show some critical interactions that are discussed in the text, such as the salt bridge with K31, by labeling these on the figure (e.g., in E-H).

      We drafted figure panels with dashed lines to indicate those key interactions.  However, they became almost imperceptible and overloaded with annotations that distracted from the overall details.  For K31, the interaction occurs in LN01 crystal structures readers can refer to.

      Why are axes cut off for J?

      We corrected this.

      Please re-define K/L plots as in Figure 1, and explain abbreviations.

      We updated the figure caption to reflect these changes.

      Figure 4

      The caption for panel A states that the Fab begins in solvent 1-2 nm above the bilayer, but the main text states 0.5-2 nm.

      We have reconciled this difference and listed the correct distances: 0.5-2 nm.

      Please label the y-axis as "Replicate" for relevant figure panels so that they are more immediately interpretable.

      This label has been added.

      A legend with "membrane-associated" and "non-associated" within the figure would be helpful. Additionally, the average percent membrane associated, with a standard deviation, should be shown (Similar to 1C, albeit with the statistics).

      This legend has been added.  We also added the additional statistical metrics requested to strengthen our analysis.

      The text references "10, 14, and 12 extended insertion events" for the three antibody-based simulations. How do you define "extended insertion events"? Would breaking this into average insertion time and standard deviation better highlight the association differences between MPER antibodies and controls, in addition to the variability due to difference random initialization?

      We thank the reviewer for the insightful suggestion on how to better organize quantitative analysis to support the method. Supplemental Table 3 includes these numbers.

      Figure 5

      The analysis in Fig. S6C could be included here as a main figure.

      The drafted revised figure adding S6C to Figure 5 contained too much information.  Likewise, moving panel S6C separated it from the parent clustering data of S6B, so we decided to keep these figures separated.  The S6 figure is now Figure 5-figure supplement 1.

      Figure 6

      Please annotate membrane insertion on E as %.

      These are phosphate binding RMSD/occupancy vs time.  The panels are now too small to annotate by %.  The qualitative presentation is sufficient at this stage.  The quantitative % are listed in-line within text when relevant to support assertions made. 

      Please use the figure caption to explain why certain clusters (e.g., 10E8 cluster A, artifact, Fig. S6E) are not included in panel E.

      We have added this information in the figure caption.

      Figure 7

      Please show all points on the box and whisker plots (panels E and F), and perform appropriate statistical tests to see if means are significantly different (these are mentioned in the text, but should be annotated on the graph and mentioned within the figure caption).

      We have changed these plots to show all data points along with relevant statistical comparisons. The figure captions describe unpaired t-test statistical tests used.
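
      For readers unfamiliar with the comparison named in the captions, the unpaired t statistic can be sketched as below. This is a minimal illustration only: the source does not state whether the pooled-variance (Student) or unequal-variance (Welch) form was used, so the equal-variance form is an assumption here, and the function name `unpaired_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(a, b):
    """Student's unpaired t statistic and degrees of freedom.

    Assumes the two groups share a common variance (pooled form);
    this choice is an assumption, not something stated in the source.
    """
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    # Difference of means divided by its standard error.
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df
```

      The function returns only the statistic and its degrees of freedom; converting these to a p-value requires the t distribution, which a statistics package would normally supply.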

      Figure S1

      G, H, and I do not belong here-they should be moved to accompany their relevant text section, which associates with Figure 3. It would be helpful to associate this with Figure 3 in the eLife format, "Figure 3-Supplemental Figure 1" or its equivalent.

      It's very difficult to distinguish the green and blue circles on panel G.

      We darkened the shading and added an outline for better visualization.

      Subfigure I is missing a caption, could be included with H: "(H,I) Additional replicates for LN01+TM (H) and LN01 (I)".

      We corrected this as suggested.

      Why is H only 3 simulations and not 4? Does it not have a lipid in the x-ray site? Also, the caption states "(top, green)" and "(bottom, cyan)", but the green vs. cyan figures are organized on the left and right. Additional labels within the figure would help make this more intuitive.

      If the point of H and I is to illustrate that POPC exchanges between the X-ray and loading sites, this is unclear from the figure. Consider clarifying these figures.

      Thank you for describing the confusion in this figure, we have added labels to clarify.

      Figure S2 (panels split between revised Figure 4 associated figure supplements)

      The LN01 figures should likely follow later so that they can associate with Figure 3, despite being a similar analysis.

      We corrected supplements to eLife format so supplements are associated with relevant main figures.

      Figure S3 (panels split between revised Figure 1 & 2 associated figure supplements)

      As hydrophobicity is discussed as a driving factor for residue insertion, it would be helpful to have a rolling hydrophobicity chart underneath each plot to make this claim obvious.

      We prefer the current format, due to the worry of having too much information in these already data-rich panels.  As well, some residues are not apolar yet are deeply inserted.

      Figure S4 (panels split between revised Figure 1 & 2 associated figure supplements)

      It would be helpful to label the relevant loops on these figures.

      We have labeled loops for clarity.

      Do any of these loops have minor contacts with Env in the structure?

      The 4E10 and PGZL1 CDR-H1 loops do not directly contact the MPER peptides bound in crystal structures.

      FRL-3 and CDR-H1 in 10E8 do not contact the MPER peptide antigen component based on x-ray crystal structures.

      Do motif contacts with lipid involve minor contacts with additional loops other than those displayed in this figure?

      The phosphate-loop interactions in motifs used as query bait here are mediated solely by the backbone and side chain interactions of the loops displayed. We visually inspected most matches and did not see any “consensus” additional peripheral interactions common across each potential instance in the unrelated proteins.  The supplied Supplemental Table 2 contains the information if a reader wanted to conduct a detailed search. 

      Why is there such a difference between the loop conformation adopted in the X-ray structure and that in the MD simulation, and why does this lead to the large observed differences in ligand-binding structure matches?

      We thank the reviewer for carefully noting our error in labeling of CDR loop and framework region input queries. We revised the labeling to clarify the issue.

      There is minimal structural difference between the loops in X-ray and MD.

      Figure S5 (Figure 2-Figure supplement 4)

      This figure is not colorblind friendly-it would be helpful to change to such a palette as the data are interesting, but uninterpretable to some.

      We have left this figure the same.

      "Susbstates" - "Substates"

      Corrected, thank you.

      Panel B is uninterpretable-please break the axis so that the Euclidian distances can be represented accurately but the histograms can be interpreted.

      We have adjusted axis for this plot to better illustrate the cluster thresholds.

      The clusters in D-H should be analyzed in greater depth. What is the structural relevance of these clusters other than differences in phospholipid occupancy in (I)? Snapshots of representative poses for each cluster could help clarify these differences.

      We have adjusted the text to describe the geometric differences between those clusters that result in their different, exceptionally lower propensities for forming the key phospholipid interaction.

      The figure caption should make it clear that 3 μS of aggregate simulation time is being used here instead of 4 μS to start with unique tilt initializations. E.g., "unique starting membrane-bound conformations (0 degrees, -15 degrees, 15 degrees initialization relative to the docked pose)". Further, why was the particular 0-degree replicate chosen while the other was thrown out? Or was this information averaged? Why is the full 4 μS then used for D-I?

      We thank the reviewer for noting these details.  We didn’t want to bias the differential between 10E8 and 4E10/PGZL1 by including the replicate simulations.  The analysis was mainly intended to achieve more coarse resolution distinction between 10E8 and the similar PGZL1/4E10.  

      In the subsequent clustering of individual bnAb simulation groups, the replicate 0-degree simulations had sufficiently different geometric sampling and unique lipid binding behavior that we thought they should be used (4 μs total) to achieve finer conformational resolution for each bnAb.

      Figure S6 (now Figure 5-Figure Supplement 1)

      Please label the CDRs in C and provide a color key like in other figures. Also, please label the y-axes. This figure could move to main below 5B with the clusters "A,B,C" labeled on 5B.

      We have added the axes labels and color key legend.  We retained a minimal CDR loop labeling scheme for the higher-throughput interaction profiles here, where colored sections on the residue axes denote CDR loop regions.

      Figure S7 (Figure 7 Figure Supplement 1)

      Panels A and B would likely read better if swapped.

      We have swapped these panels for a better flow.

      For panel C, please display mean and standard deviation, and compare these values with an appropriate statistical test.

      This is already displayed in the main figure; we have removed it from the supplement.

      For E and F, please clarify from which trajectory(s) you are extracting this conformation from. Are these the global mean/representative poses? How do they compare to other geometrically distinct clusters?

      The requested information was added to the supplemental figure caption.  These are phosphate-bound frames selected from 2 distinct time points in the 0-degree tilt replicates for both 4E10 and 10E8, representing at least 2 distinct macroscopic substates differing in global light chain and heavy chain orientation towards the membrane.

      Table S2 (now Supplementary Table 3)

      Please add details for the 13h11 simulation.

      Additionally, please add average contact time and their standard deviation to the table, rather than just the aggregated total time. This will highlight the variability associated with the random initializations of each simulation.

      We have added the details for 13h11 and the requested analysis (average aggregated time ± standard deviation and average time per association event ± standard deviation) to supplement our summary statistics for this method.
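
      As an illustration of how the two summary statistics named above could be derived, the sketch below takes a per-frame membrane-associated flag for each replicate and reports aggregated contact time per replicate and time per association event, each as mean and standard deviation. It is a hedged example under stated assumptions, not the analysis code used for the table: the boolean-trace input format, the frame spacing `dt`, and the function names `event_durations` and `summarize` are all hypothetical.

```python
from statistics import mean, stdev


def event_durations(associated, dt):
    """Durations of each contiguous run of associated (truthy) frames.

    `associated` is a per-frame sequence of flags and `dt` is the time
    between frames; both are assumed inputs for this sketch.
    """
    durations = []
    run = 0
    for frame in associated:
        if frame:
            run += 1
        elif run:
            durations.append(run * dt)
            run = 0
    if run:  # close an event that lasts until the final frame
        durations.append(run * dt)
    return durations


def _mean_sd(values):
    # Sample standard deviation needs at least two values.
    sd = stdev(values) if len(values) > 1 else 0.0
    return mean(values), sd


def summarize(replicates, dt):
    """Mean and SD of aggregated time per replicate and of time per event."""
    per_replicate = [event_durations(r, dt) for r in replicates]
    totals = [sum(d) for d in per_replicate]
    events = [d for durations in per_replicate for d in durations]
    return {"aggregated": _mean_sd(totals), "per_event": _mean_sd(events)}
```

      Reporting both quantities separates variability between random initializations (aggregated time per replicate) from variability in how long individual association events last.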

      Reviewer #2 (Recommendations For The Authors):

      (1) The structure of the manuscript should be improved. For example, almost half of the introduction (three paragraphs) summarize the results. I found it hard to navigate all the data and specific interactions described in the result section. Furthermore, the claims at the end of several sections seem unsupported. Especially for the generalization of the approach. This should be moved to the discussion section. The discussion is pretty general and does not provide much context to the results presented in this study.

      We have significantly reorganized the results section to improve the flow of the manuscript and accessibility for readers, especially the first sections of all-atom simulations. We also removed claims not directly supported by data from our results, and expanded on some of these concepts in the discussion to provide more context for the results.

      (2) The author should cite more rigorously previous work and refrain from using the term "develop" to describe the simple use of a well established method. E.g. Several studies have investigated membrane protein interactions e.g. [1], membrane protein-bilayer self-assembly [2], steered molecular dynamics [3], etc.

      Thank you for identifying relevant work for the simulations that set precedent for our novel application to antibody-membrane interactions.  We have removed language about development of simulation methods from the text and now better reference the precedent simulation methods used here.

      (3) Have the authors considered estimating the PMF by combining the steered MD simulation through the application of Jarzynski's equality?

      We performed preliminary PMF calculations for Fab-membrane binding, but found that they required upward of 40 µs to reach convergence. Steered simulations focused on a key lipid may be easier.

      Although PMFs are beyond the scope of this work, we added proposals for them, and allusions to their utility, as next steps toward more rigorous quantification of Fab-membrane interactions.
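
      For context, Jarzynski's equality relates the free energy difference ΔF to an exponential average over nonequilibrium work values W from repeated pulls: exp(-ΔF/kT) = <exp(-W/kT)>. The sketch below is a minimal illustration of that estimator only, not the authors' protocol. The function name and the work values are hypothetical, and a PMF along a pulling coordinate would apply the same average at each position along the path.

```python
import math


def jarzynski_free_energy(work_values, kT):
    """Estimate dF = -kT * ln(mean(exp(-W / kT))) from work values."""
    if not work_values:
        raise ValueError("need at least one work value")
    scaled = [-w / kT for w in work_values]
    # Shift by the largest exponent (log-sum-exp) for numerical stability.
    shift = max(scaled)
    total = sum(math.exp(s - shift) for s in scaled)
    log_mean = shift + math.log(total / len(scaled))
    return -kT * log_mean


# Hypothetical work values (kJ/mol) from repeated pulls; not data from the study.
example_work = [12.0, 15.5, 11.2, 14.1]
kT = 2.494  # kJ/mol at roughly 300 K
delta_f = jarzynski_free_energy(example_work, kT)
```

      Because the exponential average is dominated by rare low-work trajectories, the estimate converges slowly when the work distribution is broad, which is one reason such calculations can be expensive.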

      Minor

      (4) The term "integrative modeling" is usually used for computational pipelines which incorporate experimental data. Multiscale modeling would be more appropriate for this study.

      We altered descriptions throughout the manuscript to reflect this comment.

      (5) Units to report the force in the steered molecular dynamics are incorrect. They should be 98.

      We changed axes and results to correctly report this unit.

      (6) Labels for axes of several graphs are missing.

      We added labels to all axes of graphs, except for a few where stacked labels can be easily interpreted to save space and reduce complexity in figures.

      (7) Figure 3 K & L is this really < 1% of total? The term "total" should also be clarified.

      Thank you for pointing this out. We corrected the % labels so that the axes run from 0-100%, and we clarified "total" in the figure caption.

      (8) The font size in figures should be uniformized.

      This suggestion has been applied.

      (9) Time needed for steered MD should be reported in CPUh and not hours (page 17).

      We removed comments on explicit time measurements for our simulations.

      (10) The version of the Martini force field is missing from the methods section.

      We used Martini 2.6 and added this to the methods.

      References

      (1) Prunotto, Alessio, et al. "Molecular bases of the membrane association mechanism potentiating antibiotic resistance by New Delhi metallo-β-lactamase 1." ACS infectious diseases 6.10 (2020): 2719-2731.

      (2) Scott, Kathryn A., et al. "Coarse-grained MD simulations of membrane protein-bilayer self-assembly." Structure 16.4 (2008): 621-630.

      (3) Izrailev, S., et al. "Computational molecular dynamics: challenges, methods, ideas. Chapter 1. Steered molecular dynamics." (1997).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors explore a novel mechanism linking aging to chromosome mis-segregation and aneuploidy in yeast cells. They reveal that, in old yeast mother cells, chromosome loss occurs through asymmetric partitioning of chromosomes to daughter cells, a process coupled with the inheritance of an old Spindle Pole Body. Remarkably, the authors identify that remodeling of the nuclear pore complex (NPC), specifically the displacement of its nuclear basket, triggers these asymmetric segregation events. This disruption also leads to the leakage of unspliced pre-mRNAs into the cytoplasm, highlighting a breakdown in RNA quality control. Through genetic manipulation, the study demonstrates that removing introns from key chromosome segregation genes is sufficient to prevent chromosome loss in aged cells. Moreover, promoting pre-mRNA leakage in young cells mimics the chromosome mis-segregation observed in old cells, providing further evidence for the critical role of nuclear envelope integrity and RNA processing in aging-related genome instability.

      Strengths:

      The findings presented are not only intriguing but also well-supported by robust experimental data, highlighting a previously unrecognized connection between nuclear envelope integrity, RNA processing, and genome stability in aging cells, deepening our understanding of the molecular basis of chromosome loss in aging.

      We thank the reviewer for this very positive assessment of our work.

      Weaknesses:

      Further analysis of yeast aging data from microfluidic experiments will provide important information about the dynamic features and prevalence of the key aging phenotypes, e.g. pre-mRNA leakage and chromosome loss, reported in this work.

      We thank the reviewer for raising this point, which we will indeed address in the revised version of the manuscript. In short, chromosome loss is an abrupt, late event in the lifespan of the cells. Its prevalence is more complex to assess and will require measuring the correlated loss rates of several chromosomes concomitantly. The prevalence of the pre-mRNA leakage phenotype is easier to assess, and we will provide data about this in the revised manuscript as well. Our data show that the prevalence is quite high (well above 50%), even if not every cell is affected.

      In addition, a discussion would be needed to clarify the relationship between "chromosome loss" in this study and "genomic missegregation" reported previously in yeast aging.

      The genomic missegregation mentioned by the reviewer is a process distinct from the chromosome loss that we report. Genomic missegregation is characterized by the entry of both SPBs and all the chromosomes into the daughter cell compartment (PMID: 31714209). We do observe these events in our movies as well. In contrast, the chromosome loss phenotype takes place under proper elongation of the spindle and proper segregation of the two SPBs between mother and bud, as shown in Figure 2 of the manuscript. In our movies, chromosome loss is at least threefold more frequent (for a single chromosome) than full genome missegregation. Furthermore, whereas chromosome loss is alleviated by the removal of the introns of MCM21, NBL1 and GLC7, genomic missegregation is not.

      Nevertheless, we thank the reviewer for bringing up the possible confusion between the two phenotypes.  We will explain and illustrate the difference between the two processes in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors make the interesting discovery of increased chromosome non-disjunction in aging yeast mother cells. The phenotype is quite striking and well supported with solid experimental evidence. This is quite significant to a haploid cell (as used here): loss of an essential chromosome leads to death soon thereafter. The authors then work to tie this phenotype to other age-associated phenotypes that have been previously characterized: accumulation of extrachromosomal rDNA circles that then correlate with compromised nuclear pore export functions, which correlates with "leaky" pores that permit unspliced mRNA messages to be inappropriately exported to the cytoplasm. They then infer that three intron-containing mRNAs that encode proteins involved in resolving sister chromatid separation during mitosis are unspliced in this age-associated defect and thus lead to the non-disjunction problem.

      Strengths: The discovery of age-associated chromosome non-disjunction is interesting, and it is demonstrated in a convincing fashion with "classic" microscopy-based single cell fluorescent chromosome assays that are appropriate and seem robust. The correlation of this phenotype with other age-associated phenotypes, specifically extrachromosomal rDNA circles and nuclear pore dysfunction, is supported by in vivo genetic manipulations that have been well-characterized in the past.

      In addition, the application of the single cell mRNA splicing defect reporter showed very convincingly that general mRNA splicing is compromised in aged cells. Such a pleiotropic event certainly has big implications.

      We thank the reviewer for this assessment of our work. To avoid confusion, we would like to stress, however, that our data do not show that splicing per se is defective in old cells. We only show that unspliced mRNAs tend to leak out of the nucleus of old cells.

      Weaknesses:

      The biggest weakness is "connecting all the dots" of causality and linking the splicing defect to chromosome disjunction. I commend the authors for making a valiant effort in this regard, but there are many caveats to this interpretation. While the "triple intron" removal suppressed the non-disjunction defect in aged cells, this could simply be a kinetic fix, where a slowdown in the relevant aspects of mitosis could give the cell time to resolve the syntelic attachment of the chromatids.

      The possibility that intron removal leads to a kinetic fix is an interesting idea that we will address in the revised manuscript. So far we have not observed that removing these introns slows down mitosis, but we will test the idea by doing precise measurements.

      To this point, I note that the intron-less version of GLC7, which produces the most dramatic suppression of the three genes, is reported by one of the authors to have a slow growth rate (Parenteau et al, 2008 - https://doi.org/10.1091/mbc.e07-12-1254).

      The reviewer is right: removing the intron of GLC7 reduces the expression levels of the gene product (PMID: 16816425) to about 50% of the original value and causes a slow growth phenotype. However, the cells revert fairly rapidly through duplication of the GLC7 gene. As a consequence, neither the GLC7-∆i nor the 3x∆i mutant strains show noticeable growth phenotypes by spot assays. We will document these findings and provide a measurement of the growth rate of the mutant strain in the revised manuscript.

      In addition, the lifespan curve containing the 3∆i in Figure 5E has a very unusual shape, suggesting a growth problem/"sickness" in this strain.

      To be accurate, the strain plotted in Figure 5E is not the 3x∆i triple mutant strain but the 3x∆i mlp1∆ quadruple mutant strain. The 3x∆i triple mutant strain is plotted in Figure 4D and the shape of its curve is similar to that of the wild type cells. The strain in Figure 5E is indeed sick, due to the removal of the nuclear basket. However, the 3x∆i mutations partially rescue the replicative lifespan shortening due to the mlp1∆ mutation (see text). Illustrating the fact that the 3x∆i mutant strain is not particularly sick, it shows a prolonged lifespan and a fairly standard aging curve.

      Lastly, the Herculean effort to perform FISH of the introns in the cytoplasm is quite literally at the statistical limit of this assay. The data were not as robust as the other assays employed through this study. The data show either "no" signal for the young cells or a signal of 0, 1, or 2 FISH foci in the aged cells. In a Poisson distribution, which this follows, it is difficult to distinguish between these differences.

      This is correct; this experiment was not the easiest in the manuscript. However, despite the limitations of the assay, the data presented in Figure 6B are quite clear. 300 cells aged by MEP were analysed, divided into three cohorts of 100 each, and the distribution of foci (nuclear vs cytoplasmic) in these aged cells was compared to the distribution in three cohorts of young cells. For all 3 aged cohorts, over 70% of the visible foci were cytoplasmic, while in the young cells, this figure was around 3%. A t-test was conducted to compare these frequencies between young and old cells (Figure 6B). The difference is highly significant. The reviewer refers to Supplementary Figure 4, where we were simply asking i) whether the signal is lost in cells lacking the intron of GLC7 (the answer is unambiguously yes) and ii) what the general number of dots per cell is in young and old wild type cells (without distinguishing between nuclear and cytoplasmic). The information to be taken from this last quantification is indeed that there is no clearly distinguishable difference between these two populations of cells. In other words, the reason why there are more dots in the cytoplasm of the old cells in Figure 6B is not that the old cells have many more dots in general. We hope that these clarifications help in understanding the data better. We will make sure that this is clearer in the revised manuscript.
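
      The cohort-level comparison described above can be sketched as follows. This is only an illustration: the per-cohort fractions are hypothetical placeholders chosen to be consistent with "over 70%" and "around 3%", and the use of Welch's unequal-variance t-test is an assumption, since the response does not state which variant was applied.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    # Squared standard error of each group mean.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical fraction of cytoplasmic foci in each cohort of 100 cells.
old_cohorts = [0.72, 0.75, 0.78]
young_cohorts = [0.02, 0.03, 0.04]
t_stat, dof = welch_t(old_cohorts, young_cohorts)
```

      Treating each cohort as one observation compares between-cohort variability, rather than the per-cell foci counts whose small values make the Poisson noise discussed by the reviewer hard to resolve.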

      Reviewer #3 (Public review):

      Summary:

      Mirkovic et al explore the cause underlying development of aneuploidy during aging. This paper provides a compelling insight into the basis of chromosome missegregation in aged cells, tying this phenomenon to the established Nuclear Pore Complex architecture remodeling that occurs with aging across a large span of diverse organisms. The authors first establish that aged mother cells exhibit aberrant error correction during mitosis. As extrachromosomal rDNA circles (ERCs) are known to increase with age and lead to NPC dysfunction that can result in leakage of unspliced pre-mRNAs, Mirkovic et al search for intron-containing genes in yeast that may be underlying chromosome missegregation, identifying three genes in the Aurora B-dependent error correction pathway: MCM21, NBL1, and GLC7. Interestingly, intron-less mutants in these genes suppress chromosome loss in aged cells, with a significant impact observed when all three introns were deleted (3x∆i). The 3x∆i mutant also suppresses the increased chromosome loss resulting from nuclear basket destabilization in a mlp1∆ mutant. The authors then directly test if aged cells do exhibit aberrant mRNA export, using RNA FISH to identify that old cells indeed leak intron-containing pre-mRNA into the cytoplasm, as well as a reporter assay to demonstrate translation of leaked pre-mRNA, and that this is suppressed in cells producing fewer ERCs. Mutants causing increased pre-mRNA leakage are sufficient to induce chromosome missegregation, which is suppressed by the 3x∆i.

      Strengths:

      The finding that deleting the introns of 3 genes in the Aurora B pathway can suppress age-related chromosome missegregation is highly compelling. Additionally, the rationale behind the various experiments in this paper is well-reasoned and clearly explained.

      We thank the reviewer for their very positive assessment of our work.

      Weaknesses:

      In some cases, controls for experiments were not presented or were depicted in other figures.

      We are sorry about this confusion. We will improve our presentation of the controls and make sure that they are shown again each time they are relevant (we wanted to limit the cases of replotting the same controls several times). We will also add those that are missing (such as those mentioned by reviewer 2, see above).

      High variability was seen in chromosome loss data, leading to large error bars.

      We thank the reviewer for this comment. The variance in those two figures (3A and 5D) comes from the suboptimal plotting of these data. This will be corrected in the revised version of the manuscript.

      The text could have been more polished.

      Thank you for this comment. We will go through the manuscript again in detail.