  1. Jun 2025
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Formins are complex proteins with multiple effects on actin filament assembly, including nucleation, capping with processive elongation, and bundling. Determining which of these activities is important for a given biological process and normal cellular function is a major challenge.

      Here, the authors study the formin FHOD3L, which is essential for normal sarcomere assembly in muscle cells. They identify point mutants of FHOD3L in which formin nucleation and elongation/bundling activities are functionally separated. Expression of these mutants in neonatal rat ventricular myocytes shows that the control of actin filament elongation by formin is the major activity required for the normal assembly of functional sarcomeres.

      Strengths:

      The strength of this work is to combine sensitive biochemical assays with excellent work in neonatal rat ventricular myocytes. This combination of approaches is highly effective for analyzing the function of proteins with multiple activities in vitro.

      Weaknesses:

      FHOD3L does not seem to be the easiest formin to study because of its relatively weak nucleation activity and the short duration of capping events. This difficulty imposes rigorous biochemical analysis and careful interpretation of the data, which should be improved in this work.

      We thank the reviewer for their praise and appreciation of our work. Indeed, FHOD3L is a challenging formin to work with.

      Important points are raised here and below regarding the brief elongation events we reported. As suggested, we performed more rigorous analysis of the data and present it in the revised manuscript. We now report that from 45 dim regions analyzed, in three independent experiments with wild type FHOD3L, we detected 40 bursts. (The remaining five could be formin falling off too quickly to detect or the dim spots could be regions of inhomogeneity in intensity, not due to formin.) For comparison to the presented data with FHOD3L-CT, we analyzed the filaments in TIRF assays with no formin present. As the reviewers point out, inhomogeneities in filament intensity are normal. Thus, we examined any dim spots for pauses and/or bursts. As is now reported in Figure 2G,H, the velocity of growth of these dim spots is indistinguishable from the velocity of the rest of the filament. We acknowledge that our numbers may not be perfectly accurate due to the noise in our system, but we believe that the difference between a 3-4 fold increase and no change in rate is substantial and convincing.

      We also determined the number of dim spots per length of filament. We found a higher frequency when FHOD3L-CT or FHOD3S-CT was present vs no formin, as now shown in Figure 2 – supplements 1G and 2E.

      We were asked about the pauses we observe before bursts of elongation and how we know they are functionally relevant. The short answer is that we do not know. We reported them because they were so common: Of the 40 bursts, pauses preceded the burst in 38 cases. We cannot rule out that this pause reflects an interaction with the surface but might expect the frequency to be lower if it were. We revise the text to make our conclusions about pauses more circumspect.
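      To give a feel for how much uncertainty attaches to the counts quoted above (40 bursts in 45 dim regions, 38 of 40 bursts preceded by a pause), the sketch below computes an approximate 95% interval for each proportion. The Wilson score interval is an assumption chosen for illustration; the response does not state that any interval was computed, and the function name is hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the response text.
burst_low, burst_high = wilson_interval(40, 45)   # dim regions that showed a burst
pause_low, pause_high = wilson_interval(38, 40)   # bursts preceded by a pause

print(f"bursts per dim region: {40 / 45:.2f} ({burst_low:.2f} to {burst_high:.2f})")
print(f"bursts preceded by a pause: {38 / 40:.2f} ({pause_low:.2f} to {pause_high:.2f})")
```

      With samples this small the intervals are wide, which is consistent with the cautious wording about pauses.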

      We are convinced that the brief dim events we observed in the presence of FHOD3L-CT, in fact, reflect formin-mediated elongation and worked hard to improve their presentation, in addition to the added analysis. We include new kymographs, including examples from FHOD3L, FHOD3S, K1193L, and actin alone. We hope that the reviewers are also convinced.

      This does not preclude our interest in the microfluidics and two-color assays, which will be pursued in the future. We have reached out to a colleague who is set up to repeat these measurements with microfluidics-assisted TIRF. The noise should be greatly reduced and the system is also optimal for directly visualizing labeled FHOD3, as suggested. We expect these experimental approaches will provide additional insights.

      Reviewer #2 (Public review):

      This article elucidates the biochemical and cellular mechanisms by which the FHOD-family of formins, particularly FHOD3, contributes to sarcomere formation and contractility in cardiomyocytes. Formins are mainly known to nucleate and elongate actin filaments, with certain family members also exhibiting capping, severing, and bundling activities. Although FHOD3 has been well-established as essential for sarcomere assembly in cardiomyocytes, its precise biochemical functions and contributions to actin dynamics remain poorly understood.

      In this study, the authors combine in vitro biochemical assays with cellular experiments to dissect FHOD3's roles in actin assembly and sarcomere formation. They demonstrate that FHOD3 nucleates actin filaments and acts as a transient elongator, pausing elongation after an initial burst of filament growth. Using separation-of-function mutants, they show that FHOD3's elongation activity - rather than its nucleation, capping, or bundling capabilities - is key for its sarcomeric function.

      The experiments have been conducted rigorously and well-analyzed, and the paper is clearly written. The data presented support the authors' conclusions. I appreciate the detailed description and rationale behind the FHOD3 constructs used in this study.

      We are happy to hear that others find the paper to be clearly written and well described.

      However, I was somewhat surprised and a bit disappointed that while the authors conducted single-color TIRF experiments to observe the effects of FHOD3 on single filaments, they did not use fluorescently labeled FHOD3 to directly visualize its behavior. Incorporating such experiments would significantly strengthen their conclusions regarding FHOD3's bursts of elongation interspersed with capping activity. While I understand this might require a few additional weeks of experiments, these data would add considerable value by directly testing the proposed mechanism.

      We appreciate the suggestion and hope to incorporate a two-color approach soon. As noted, FHOD3L is not always easy to work with and we do not have a functional labeled copy of the protein at this time.

      There is a typo in the word "required" in line number 30. The authors also use fit data to extract parameters in several panels (e.g., Figures 2b, 2d, 3a, and 3b). While these fit functions may be intuitive to actin experts, explicitly describing the fit functions in the figure legends or methods would greatly benefit the broader readership.

      Thank you for these comments. We updated the indicated figures and described the analysis in greater detail.

      Reviewer #3 (Public review):

      Valencia et al. aim to elucidate the biochemical and cellular mechanisms through which the human formin FHOD3 drives sarcomere assembly in cardiomyocytes. To do so, they combined rigorous in vitro biochemical assays with comprehensive in vivo characterizations, evaluating two wild-type FHOD3 isoforms and two function-separating mutants. Surprisingly, they found that both wild-type FHOD3 isoforms can nucleate new actin filaments, as well as elongate existing actin filaments in conjunction with profilin following barbed-end capping. This is in addition to FHOD3's proposed role as an actin bundler. Next, the authors asked whether FHOD3L promotes sarcomere assembly in cardiomyocytes through its activity in actin nucleation or rather elongation. With two function-separating mutants, the authors evaluated the numbers and morphology of sarcomeres, as well as their ability to beat and generate cardiac rhythm. The authors found that while the wild-type FHOD3L and the K1193L mutant can rescue sarcomere morphology and physiology, the GS-FH1 mutant fails to do so. Given that in GS-FH1 mainly elongation activity is compromised, the authors concluded that the elongation activity of FHOD3 is essential for its role in sarcomere assembly in cardiomyocytes, while its nucleator activity is dispensable. Overall, this important study provided a broadened view on the biochemical activities of FHOD3, and a pioneering view on a possible cellular mechanism of how FHOD3L drives sarcomere assembly. If further validated, this can lead to new mechanistic models of sarcomere assembly and potentially new therapeutic targets of cardiomyopathy.

      The conclusions of this paper are mostly well supported by the comprehensive biochemical analyses performed by the authors. However, the sarcomere assembly defect phenotype in the GS-FH1 rescue condition requires further investigation, as the extremely low level of GS-FH1 signal in transfected cells in Figure 6A may reflect a failure of actin-binding by this construct in vivo, rather than its inability to drive elongation. Though the authors do show in Figure 6 that GS-FH1 can bind to normal-looking sarcomeres when they are present, this may be due to a lack of siRNA activity in these cells, such that endogenous FHOD3L is still present. In this possible scenario, GS-FH1 may dimerize with endogenous FHOD3L. The authors should demonstrate that GS-FH1 alone can indeed interact with existing actin filaments in vivo. While this has been clearly demonstrated in vitro, given the more complex biochemical environment in vivo where additional unknown binding partners may present, cautions should be made when extrapolating findings from the former to the latter.

      The reviewer is concerned about the low protein levels in the GS-FH1 rescue experiments as reflected in the HA fluorescence intensity distributions shown in Fig. 5 Supplement 2A. While the scenario proposed could explain our observations with the GS-FH1 rescues, it is quite complex. Nor does the scenario preclude the conclusion that the FH1 domain is critical. We agree that the observed sarcomeres are likely to be residual in cells with incomplete RNAi. We now include the image of a cell that is still full of sarcomeres and note that the GS-FH1 is expressed at a relatively high level and striated throughout the cell. We interpret this as evidence that GS-FH1 is stable when suitable binding sites are available. We cannot exclude that there is more GS-FH1 because there was more endogenous FHOD3L with which to heterodimerize. If the GS-FH1 heterodimer were simply poisoning the wild type protein, we would not expect it to be bound correctly to sarcomeres. If, instead, heterodimers have some activity, it seems far from sufficient to rescue sarcomere formation, suggesting that two functional FH1 domains are critical.

      Furthermore, we do not see evidence of correlation between protein levels and rescue at the level present in these cells (addressed below). Unfortunately, the proposed IP to test whether FHOD3L binds actin in vivo would only potentially report on filament side binding (both direct and indirect). It would not address whether the GS-FH1 mutant functions as a nucleator, elongator, bundler and/or capping protein in vivo.

      The critical question that we can address is whether the phenotype is due to low protein levels, assuming the protein present is functional, or due to loss of elongation activity by FHOD3L. To address this question, we returned to our data.

      First, we plotted the distributions of the intensities of the cells we analyzed further, in addition to the automated readout of all of the cells in the dish (Fig. 4 supplement 1). These cells were selected randomly and, as should be the case, the distributions of their intensities agree well with the original distributions for the three different rescue constructs: FHOD3L, K1193L, and GS-FH1 (Fig. 6 supplement 1). We then asked whether there was any correlation in HA intensities with the sarcomere metrics. As seen in our pilot data, no correlation is evident in any of the three cases across the range of intensities we collected (400 – 2700 a.u.) (old Fig. 6 supplement C,D,E). We now replace the data from pilot experiments with analysis of HA intensities and sarcomere metrics from the data sets included in the paper (new Fig. 6 supplement 1). Again, little to no correlation was observed (the single highest r-squared value is 0.2 and the remaining eight values are less than or equal to 0.08).
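      For readers less familiar with the r-squared values quoted above, the following minimal sketch shows how such a value can be computed for one construct and one sarcomere metric. It assumes a simple linear (Pearson) correlation, which the response does not explicitly specify, and the per-cell numbers are hypothetical placeholders rather than data from the study.

```python
def r_squared(x, y):
    """Squared Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy ** 2 / (sxx * syy)


# Hypothetical per-cell values, for illustration only (not from the paper).
ha_intensity_au = [420, 610, 880, 1300, 1900, 2600]
sarcomere_number = [31, 28, 35, 27, 33, 30]

print(f"r-squared = {r_squared(ha_intensity_au, sarcomere_number):.3f}")
```

      An r-squared near zero means that HA intensity explains almost none of the cell-to-cell variation in the metric, which is the pattern described in the text.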

      To address more specifically whether low HA fluorescence intensity is likely to reflect sufficient protein levels to build sarcomeres, we re-examined two data sets from the FHOD3L WT rescue data. We found that, by chance, the first replicate of data from the wild type rescue has a comparable intensity distribution to that of the GS-FH1 rescues (580 +/- 261 / cell vs. 548 +/- 105 / cell). In addition, we collected all of the data from cells with intensity levels <720, designed to mimic the distribution of the GS-FH1 cells (Fig. 6 supplement 3). We then compared the sarcomere metrics (sarcomere number, sarcomere length, sarcomere width) between the full data set and the two low intensity subsets:

      • Sarcomere number is the only non-normal metric. We therefore used the Mann-Whitney U test, which shows no difference among the three WT distributions.

      • We compared Z-line lengths by one-way ANOVA and Tukey's post hoc tests, again finding no significant difference for all distributions.

      • Sarcomere length shows a weakly significant difference (p=0.038) between the whole WT data set and bio rep 1, but no difference between the whole WT data set and the HA<720 group.

      Thus, cells expressing wild type FHOD3L at levels comparable to those detected in GS-FH1 mutant rescues are fully rescued. Based on these findings, we conclude that the expression levels in the GS-FH1 rescues are high enough to rescue the FHOD3 knockdown, supporting our conclusion that the defect is due to loss of elongation activity. We have added this analysis and discussion to the revised manuscript.
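      The subset comparison described above can be sketched as follows: select the cells below the HA threshold of 720, then compare a sarcomere metric between that subset and the full data set with a Mann-Whitney U test. This is an illustrative, standard-library implementation using the normal approximation with a tie correction, not the authors' analysis pipeline. The cell values are hypothetical placeholders, and for small samples an exact test would be preferable.

```python
import math


def mann_whitney_u(a, b):
    """Return (U, two-sided p) via the normal approximation with tie correction."""
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b], key=lambda t: t[0])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = rank_sum_a - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical (HA intensity, sarcomere number) pairs, for illustration only.
cells = [(450, 30), (520, 27), (610, 33), (700, 29), (950, 31), (1400, 28), (2100, 32)]
HA_THRESHOLD = 720  # threshold quoted in the response

all_counts = [count for _, count in cells]
low_counts = [count for ha, count in cells if ha < HA_THRESHOLD]
u_stat, p_value = mann_whitney_u(all_counts, low_counts)
print(f"U = {u_stat}, p = {p_value:.3f}")
```

      Note that the low-intensity subset is drawn from the full data set, so the two samples are not independent; the test here only mirrors the comparison as described.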

      Recommendations for the authors:

      Reviewing Editor Comments:

      You will see that the 3 reviewers are very positive about your work and appreciate the elegant combination of biochemical assays and functional tests in cardiomyocytes. We've had a long discussion with them and we all agree that two experiments deserve further effort to make the conclusions of your paper more convincing.

      Thank you.

      The first experiment is the TIRF elongation assay, where the two biochemist Reviewers remain doubtful that these short events are really due to the presence of a formin at the end of the filament. One of them suggests that two-color imaging with a labeled formin should clearly prove this point.

      We agree that the elongation assays can be improved. Given the similarity of processivity of FHOD3L, FHOD3S and Drosophila FhodA (measured by a distinct method), we are inclined to believe the measurements. However, the reviewer raises an excellent point about the accuracy of the measurements given the resolution (and noise) of the data. We are interested in the two-color imaging assay but do not believe it will necessarily simplify the analysis. We suspect that Fhod spends more time at/near the barbed end than is apparent based on elongation rates. The fact that we see repeated events on individual filaments at such low concentrations of FHOD3L (0.1 nM) supports this idea. Otherwise, the likelihood of FHOD3L finding barbed ends so often is really quite low.

      We will return to these experiments, using alternate methods, curious to see what else we learn. In the meantime, we conducted more thorough analysis, including controls, and improved visualization of example traces. Data for elongation analysis and kymographs were acquired with JFilament. We stretched the x-axis (time) in kymographs for FHOD3L-CT (Fig. 2F), FHOD3S-CT (Fig. 2, supplement 2C), FHOD3L-CT K1193L (Fig. 3, supplement 1A), and actin alone (Fig. 2G), and highlighted regions of analysis. The slopes for these regions, separated based on intensity, were obtained from fits to the data in KaleidaGraph. The fits are offset from the data such that they do not obscure the filaments, and the corresponding rates are given. Two observations give us confidence that the events are real: we never see fast dim regions when FHOD3 is not present (Fig. 2H), and the frequency of dim events is markedly increased when it is present (Fig. 2-supplements 1G and 2E). We acknowledge in the text that the precise values of the short events may be inaccurate due to the resolution of our experiments. We hope the reviewers are convinced by the improved analysis.
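      The slope comparison described above can be illustrated with a short sketch: fit a straight line to filament-end position versus time within each highlighted region, then compare the slope of a dim region with that of the neighboring bright region. The fits in the response were done in KaleidaGraph; the ordinary least-squares function and the trace values below are hypothetical stand-ins used only to show the arithmetic.

```python
def fit_slope(times_s, positions_um):
    """Ordinary least-squares slope of position (µm) against time (s)."""
    n = len(times_s)
    if n != len(positions_um) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_t = sum(times_s) / n
    mean_p = sum(positions_um) / n
    stt = sum((t - mean_t) ** 2 for t in times_s)
    if stt == 0:
        raise ValueError("all time points are identical")
    stp = sum((t - mean_t) * (p - mean_p) for t, p in zip(times_s, positions_um))
    return stp / stt


# Hypothetical kymograph segments, for illustration only (not measured data).
bright_t, bright_x = [0, 10, 20, 30], [0.0, 0.5, 1.0, 1.5]   # bright region
dim_t, dim_x = [30, 32, 34], [1.5, 1.8, 2.1]                 # short dim region

bright_rate = fit_slope(bright_t, bright_x)
dim_rate = fit_slope(dim_t, dim_x)
print(f"bright: {bright_rate:.3f} µm/s, dim: {dim_rate:.3f} µm/s, "
      f"ratio: {dim_rate / bright_rate:.1f}")
```

      Because a dim region may span only a few frames, its fitted slope rests on very few points, which is why the interpretation leans on the fold difference rather than on the absolute rates.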

      The second experiment is the sarcomere assembly defect phenotype in the GS-FH1 rescue condition. This requires further investigation, as the extremely low level of GS-FH1 signal in transfected cells in Figure 6A may reflect a failure of actin-binding/nucleation in vivo, rather than its inability to elongate F-actin. Although you show that GS-FH1 can bind to sarcomeres when they are present, this may be due to a lack of siRNA activity in these cells, such that endogenous FHOD3L is still present. In this possible scenario, GS-FH1 could dimerize with endogenous FHOD3L.

      We agree that the sarcomeres we see are likely to be residual and could reflect some remaining endogenous FHOD3. The reviewers are concerned about the low protein levels in the GS-FH1 rescues. First, we do not agree that the levels are “extremely” low. Through careful analysis, we established that 3xHA-FHOD3L intensities between 300 and 3000 a.u./µm² were sufficient for full rescue. The mean for the GS-FH1 experiments is 533 +/- 93, which is well within this range. Furthermore, we did not observe correlation between sarcomere number, length, or width and HA intensity over the full range collected for wild type FHOD3L or within the GS-FH1 data. We previously showed pilot data but now show correlation analysis for every analyzed cell (Fig. 4 – figure supplement 1 D-F). We conducted this analysis on all of the mutant rescue experiments (Fig. 6-supplement 1). Finally, we identified two subpopulations of the wild type rescue data. One is all of the cells with HA intensity < 720, which gives a distribution of mean 545 +/- 85. The second set is the first biological replicate of wild type rescue, which has a distribution of mean 560 +/- 160. Again, correlation analysis shows little relationship between HA levels and sarcomere metrics. Nevertheless, we show intensity level matched images in Fig 6, as opposed to images reflecting average intensities.

      The critical question remains whether the phenotype is due to low protein levels or due to loss of elongation by FHOD3L. Notably, we now show a cell that is full of sarcomeres and has relatively high FHOD3L levels as well, consistent with available binding sites stabilizing mutant protein but not ruling out heterodimerization (Fig. 6 – figure supplement 2C). Others have expressed mutant FHOD3L in a wild type background in mice. They observed poisoning, consistent with heterodimerization. Thus, it is possible that, as suggested, the FHOD3L-GSFH1 detected in sarcomeres is in fact heterodimerized with residual endogenous FHOD3L. In this case, we would still conclude that the protein is not functional enough to rescue, supporting a role for the FH1 domain.

      In the future, we plan to perform experiments with compromised, but not inactive, FH1 domains, as we discuss in the paper.

      We hope that you will find these comments useful.

      Yes, the comments were thoughtful and helped us write a better paper. Thank you.

      Reviewer #1 (Recommendations for the authors):

      Some experiments should be described and analyzed more carefully. This lack of clarity calls into question the interpretation of some experiments. Overall, this study is not yet as convincing as it should be.

      Main recommendations:

      (1) Formin elongation phases in the TIRF experiment are not convincing. They are rare and it is difficult to see any significant difference between the control movie without FHOD3L-CT and the movie with FHOD3L-CT. Filaments assembled in the absence of FHOD3L-CT also show some fluorescence inhomogeneity (which is normal), and measurements of formin elongation rates and capping times are not convincing (for example, the kymograph of the control profilin-actin situation in Figure 2F also shows a fast elongation phase on the right).

      Please see response above. We conducted more thorough analysis and created improved visualizations. We hope the data are more convincing now.

      It is also difficult to understand how an accurate measurement can be made from these noisy kymographs, and the method section should explain that precisely.

      This is a valid point. We added details of analysis to the methods section and we discuss the fact that the measurements are at the limit of our resolution in the paper. For our interpretation, we rely on the large (~3-fold) difference in elongation rate more than on the specific rate values.

      One of the problems is that these events are too transient to quantify well with noisy data. I noticed that the formin concentration used in these movies is quite low (0.1 nM FHOD3L-CT). Is there a reason for this? Is it possible to increase the formin concentration to increase the number of formin capping/elongation events and provide more convincing movies?

      We acknowledge that the data are noisy. We felt that it was necessary to perform experiments with filaments only tethered at one end, leaving the growing end free. We did so, in part, because when we did experiments with biotinylated actin to anchor the filaments down, we observed pauses in the absence of formin. Ultimately, we compromised, using anchored seeds and a relatively low concentration of NEM-myosin to decrease motion of the actin filaments.

      The experiments were performed with such low FHOD3L-CT because it was a potent nucleator in TIRF assays, making data analysis nearly impossible with more formin present. FHOD3S-CT and FHOD3L-CT K1193L behaved somewhat differently between these experiments and we were able to perform them with 1 nM formin.

      Not seeing formin at the tip of the filaments is an additional difficulty because we do not know if these pauses occur because formin is stuck to the coverslips (which could very well happen with these sticky proteins) or freely bound at the end of a filament as the text suggests. Is there any argument in favor of one scenario over the other?

      This will be an important experiment. As described above, we suspect that Fhod spends more time at/near the barbed end than is apparent based on elongation data. The fact that we see repeated events on individual filaments at such low concentrations of FHOD3L (0.1 nM) supports this idea. Otherwise, the likelihood of FHOD3L finding barbed ends so often is really quite low. In order to address the question about the cause of pauses, we reviewed our data, finding that 38 of 40 bursts were preceded by pauses. We do, however, discuss that we cannot rule out non-specific interactions with the surface.

      (2) Pyrene elongation assays in the presence of profilin are actually more convincing to test the elongation ability of formins. However, such an assay is not presented for all mutants. It should be.

      While we agree to some extent with this comment, we did not include the pyrene data for all of the mutants because the shapes of the curves were even more complicated than those seen with wild type FHOD3L-CT, rendering them uninterpretable.

      (3) Some experiments (e.g. in Figure 2E) are performed with yeast profilin, while others (e.g. in Figure 2F) are performed with human profilin. Obviously, both profilins could modulate formin activity differently and the side-by-side interpretation of both experiments is difficult. Could the authors stick to human profilin for all experiments?

      We used to always perform pyrene assays with yeast profilin because it was known to be insensitive to pyrene. These data were collected before we realized that the affinity of human profilin for actin is so high that we could probably do everything with this profilin. We have compared the two profilins for other formins, e.g. Delphilin, Capu, and did not observe detectable differences.

      Minor recommendations:

      (1) The pyrene assays with the light blue colored curve choice are not ideal. I have difficulties seeing some of the curves.

      Thank you. We added symbols to a subset of the traces to make them more visible.

      (2) In the same curves, I can't understand what the +3.75 and 0.078 numbers mean. Could these results be plotted in a clearer way?

      These values are the lowest concentrations in the range tests. They were labeled in light blue with a black outline, matching the corresponding curves, for visibility. We added symbols and changed the color of the numbering for improved visibility/understanding.

      (3) In Figure 2D, is the Kd of I1163A really determined only from 2 experimental data points?

      Of course not. We now show the figure with extended axes in Fig. 2 - figure supplement 1C.

      (4) In Figure 2C, the shape of the curves suggests that this is not a pure capping assay, but a mix of capping and nucleation. It's not dramatic but could lead to an under-estimation of the capping efficiency.

      We agree with the reviewer that the complicated shapes confound interpretation. Our analysis is based on the earliest slopes, in part, for this reason. We added discussion of this complication to the text.

      Reviewer #3 (Recommendations for the authors):

      Suggestions for additional experiments:

      (1) To evaluate whether GS-FH1 alone can indeed interact with existing actin filaments in vivo, the authors may consider performing immunoprecipitation assays with GS-FH1 extracted from rescued NRVMs.

      An IP of GS-FH1 from cells could show actin filament side binding but, unfortunately, will not provide any information about filament end binding, which is of much greater interest.

      It will be helpful to show phalloidin staining in GS-FH1 rescues in a similar manner as in Figure 6-supplement 1, panel B, and compare that with mock rescue in Figure 4 panel D. It will be essential to prove this prior to concluding that actin elongation activity is essential for sarcomere assembly.

      This is an excellent suggestion. We now include images of phalloidin stained cells from both K1193L and GS-FH1 rescues (Fig. 6A’ – supplement 2A,B). We were intrigued to see small actin puncta that were sometimes aligned. We speculate that these could be pre-premyofibrils and suggest that this is further evidence that the GS-FH1 protein is not completely unstable.

      (2) Prior to sarcomere assembly, α-actinin is known to form short bundles with actin filaments (I-Z-I complex) without clearly defined periodicity. This semi-ordered state then transforms into the more ordered sarcomeres with periodic spacing. It will be valuable to show the phalloidin staining in addition to the α-actinin IF consistently across all conditions. This may lead to further insights into the defects of sarcomere assembly. Along the same vein, higher magnification images showcasing several sarcomeres will help the readers evaluate these defects.

      We agree that there are additional valuable measurements to be made. In order to favor synchronized contraction, we plated the cells at too high a density to reliably identify I-Z-I complexes. We have included some zoomed-in images of the phalloidin staining.

      Recommendations for improving the writing:

      The authors mentioned the interaction between cardiac MyBP-C and FHOD3L as essential for the localization of FHOD3L to the C-line of the sarcomere. Can they discuss whether this interaction is important for the role of FHOD3L in sarcomere assembly? If so, how?

      This is a very interesting question that we cannot answer at this time.

      Minor corrections to the text and figures:

      In the legend of Figure 2-Figure Supplement 1, the labels of (F) and (E) are swapped.

      Thank you for catching this.

    1. Author response:

      eLife Assessment

      This useful study presents Altair-LSFM, a solid and well-documented implementation of a light-sheet fluorescence microscope (LSFM) designed for accessibility and cost reduction. While the approach offers strengths such as the use of custom-machined baseplates and detailed assembly instructions, its overall impact is limited by the lack of live-cell imaging capabilities and the absence of a clear, quantitative comparison to existing LSFM platforms. As such, although technically competent, the broader utility and uptake of this system by the community may be limited.

      We thank the reviewers and editors for their thoughtful evaluation of our work and for recognizing the technical strengths of the Altair-LSFM platform, including the custom-machined baseplates and detailed documentation provided to support accessibility and reproducibility. We respectfully disagree, however, with the assessment that the system lacks live-cell imaging capabilities. We are fully confident in the system’s suitability for live-cell applications and will demonstrate this by including representative live-cell imaging data in the revised manuscript, along with detailed instructions for implementing environment control. Moreover, we will expand our discussion to include a broader, more quantitative comparison to existing LSFM platforms—highlighting trade-offs in cost, performance, and accessibility—to better contextualize Altair’s utility and adaptability across diverse research settings.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The article presents the details of the high-resolution light-sheet microscopy system developed by the group. In addition to presenting the technical details of the system, its resolution has been characterized and its functionality demonstrated by visualizing subcellular structures in a biological sample.

      Strengths:

      (1) The article includes extensive supplementary material that complements the information in the main article.

      (2) However, in some sections, the information provided is somewhat superficial.

      Our goal was to make the supplemental content as comprehensive and useful as possible. In addition to the materials provided with the manuscript, our intention is for the online documentation (available at thedeanlab.github.io/altair) to serve as a living resource that evolves in response to user feedback. For this reason, we are especially interested in identifying and expanding any sections that are perceived as superficial, and we would greatly appreciate the reviewer’s guidance on which areas would benefit from further elaboration.

      Weaknesses:

      (1) Although a comparison is made with other light-sheet microscopy systems, the presented system does not represent a significant advance over existing systems. It uses high numerical aperture objectives and Gaussian beams, achieving resolution close to theoretical after deconvolution. The main advantage of the presented system is its ease of construction, thanks to the design of a perforated base plate.

      We appreciate the reviewer’s assessment and the opportunity to clarify our intent. Our primary goal was not to introduce new optical functionality beyond that of existing high-performance light-sheet systems, but rather to reduce the barrier to entry for non-specialist labs.

      (2) Using similar objectives (Nikon 25x and Thorlabs 20x), the results obtained are similar to those of the LLSM system (using a Gaussian beam without laser modulation). However, the article does not mention the difficulties of mounting the sample in the implemented configuration.

      We agree that there are practical challenges associated with handling 5 mm diameter coverslips. However, the Nikon 25x can readily be replaced by a Zeiss W Plan-Apochromat 20x/1.0 objective, which eliminates the need for the 5 mm coverslip[1]. In the revised manuscript, we will more explicitly detail the practical challenges in handling a 5 mm coverslip and mention the alternative detection objective.

      (3) The authors present a low-cost, open-source system. Although they provide open source code for the software (navigate), the use of proprietary electronics (ASI, NI, etc.) makes the system relatively expensive. Its low cost is not justified.

      We understand the reviewer’s concern regarding the use of proprietary control hardware such as the ASI Tiger Controller and NI data acquisition cards. While lower-cost alternatives for analog and digital control (e.g., microcontroller-based systems) do exist, our choice was intentional. By relying on a unified and professionally supported platform, we minimize the complexity of sourcing, configuring, and integrating components from disparate vendors—each of which would otherwise demand specialized technical expertise. Moreover, in future releases, we aim to further streamline the system by eliminating the need for the NI card, consolidating all optoelectronic control through the ASI Tiger Controller. This approach allows users to purchase a fully assembled and pre-configured system that can be operational with minimal effort.

      It is worth noting that the ASI components are not the primary cost driver. The full set—including XYZ and focusing stages, a filter wheel, a tube lens, the Tiger Controller, and basic optomechanical adapters—costs approximately $27,000, or ~18% of the total system cost. Additional cost reductions are possible. For example, replacing the motorized sample positioning and focusing stages with manual alternatives could reduce the cost by ~$12,000. However, this would eliminate key functionality such as autofocusing, 3D tiling, and multi-position acquisition. Open-source mechanical platforms such as OpenFlexure could in principle be adapted, but they would require custom assembly and would need to be integrated into our control software. Similarly, the filter wheel could be omitted in favor of a multi-band emission filter, reducing the cost by ~$5,000. However, this comes at the expense of increased spectral crosstalk, often necessitating spectral unmixing. An industrial CMOS camera—such as the Ximea MU196CR-ON, recently demonstrated in a Direct View Oblique Plane Microscopy configuration[2]—could substitute for the sCMOS cameras typically used in high-end imaging. However, these industrial sensors often exhibit higher noise floors and lower dynamic range, limiting sensitivity for low-signal imaging applications.
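      The cost figures quoted above can be checked with a short calculation. The sketch below uses only the dollar amounts stated in this response; the variable and function names are illustrative, and the camera substitution is left out because no saving is quoted for it.

```python
# Figures quoted in the response above (USD). Names are illustrative only.
TOTAL_SYSTEM_COST = 150_000
ASI_COMPONENTS = 27_000

# Quoted approximate savings for two of the cost-reduction options.
SAVINGS = {
    "manual stages instead of motorized stages": 12_000,
    "multi-band emission filter instead of filter wheel": 5_000,
}


def share_of_total(part, total=TOTAL_SYSTEM_COST):
    """Return `part` as a percentage of `total`."""
    return 100.0 * part / total


def reduced_cost(savings, total=TOTAL_SYSTEM_COST):
    """Return the system cost after applying every listed saving."""
    return total - sum(savings.values())


asi_share = share_of_total(ASI_COMPONENTS)
cost_after = reduced_cost(SAVINGS)

print(f"ASI components: {asi_share:.0f}% of total system cost")
print(f"Cost with both reductions applied: ${cost_after:,}")
```

      Each saving carries the functional trade-off described above, so the reduced figure is a lower bound on cost for a less capable configuration rather than a like-for-like alternative.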

      While a $150,000 system represents a significant investment, we consider it relatively cost-effective in the context of advanced light-sheet microscopy. For comparison, commercially available systems with similar optical performance—such as LLSM systems from 3i or Zeiss—are several-fold more expensive.

      (4) The fibroblast images provided are of exceptional quality. However, these are fixed samples. The system lacks the necessary elements for monitoring cells in vivo, such as temperature or pH control.

      We thank the reviewer for their positive comment regarding the quality of our fibroblast images. As noted, the current manuscript focuses on the optical design and performance characterization of the system, using fixed specimens to validate resolution and imaging stability. We acknowledge the importance of environmental control for live-cell imaging. Temperature regulation is routinely implemented in our lab using flexible adhesive heating elements paired with a power supply and PID controller. For pH stabilization in systems that lack a 5% CO<sub>2</sub> atmosphere, we typically supplement the imaging medium with 10–25 mM HEPES buffer. In the revised manuscript, we will introduce a modified sample chamber capable of maintaining user-specified temperatures, along with detailed assembly instructions. We will also include representative live-cell imaging data to demonstrate the feasibility of in vitro imaging using this system.

      Reviewer #2 (Public review):

      Summary:

      The authors present Altair-LSFM (Light Sheet Fluorescence Microscope), a high-resolution, open-source microscope, that is relatively easy to align and construct and achieves sub-cellular resolution. The authors developed this microscope to fill a perceived need that current open-source systems are primarily designed for large specimens and lack sub-cellular resolution or are difficult to construct and align, and are not stable. While commercial alternatives exist that offer sub-cellular resolution, they are expensive. The authors' manuscript centers around comparisons to the highly successful lattice light-sheet microscope, including the choice of detection and excitation objectives. The authors thus claim that there remains a critical need for high-resolution, economical, and easy-to-implement LSFM systems.

      Strengths:

      The authors succeed in their goals of implementing a relatively low-cost (~ USD 150K) open-source microscope that is easy to align. The ease of alignment rests on using custom-designed baseplates with dowel pins for precise positioning of optics based on computer analysis of opto-mechanical tolerances, as well as the optical path design. They simplify the excitation optics over Lattice light-sheet microscopes by using a Gaussian beam for illumination while maintaining lateral and axial resolutions of 235 and 350 nm across a 260-um field of view after deconvolution. In doing so they rest on foundational principles of optical microscopy that what matters for lateral resolution is the numerical aperture of the detection objective and proper sampling of the image field on to the detection, and the axial resolution depends on the thickness of the light-sheet when it is thinner than the depth of field of the detection objective. This concept has unfortunately not been completely clear to users of high-resolution light-sheet microscopes and is thus a valuable demonstration. The microscope is controlled by an open-source software, Navigate, developed by the authors, and it is thus foreseeable that different versions of this system could be implemented depending on experimental needs while maintaining easy alignment and low cost. They demonstrate system performance successfully by characterizing their sheet, point-spread function, and visualization of sub-cellular structures in mammalian cells, including microtubules, actin filaments, nuclei, and the Golgi apparatus.

      We thank the reviewer for their thoughtful summary of our work. We are pleased that the foundational optical principles, design rationale, and emphasis on accessibility came through clearly. We agree that the approach used to construct the microscope is highly modular, and we anticipate that these design principles will serve as the basis for additional system variants tailored to specific biological samples and experimental contexts. To support this, we provide all Zemax simulations and CAD files openly on our GitHub repository, enabling advanced users to build upon our design and create new functional variants of the Altair system.

      Weaknesses:

      There is a fixation on comparison to the first-generation lattice light-sheet microscope, which has evolved significantly since then:

      (1) The authors claim that commercial lattice light-sheet microscopes (LLSM) are "complex, expensive, and alignment intensive". I believe this sentence applies to the open-source version of LLSM, which was made available for wide dissemination. Since then, a commercial solution has been provided by 3i, which is now being used in multiple cores and labs but does require routine alignments. However, Zeiss has also released a commercial turn-key system, which, while expensive, is stable, and the complexity does not interfere with the experience of the user. In general, though, statements on ease of use and stability might be considered anecdotal and may not belong in a scientific article if they are unreferenced or unsupported by data.

      The referee is correct that our comparisons reference the original LLSM design, which was simultaneously disseminated as an open-source platform and commercialized by 3i. While we acknowledge that newer variants of LLSM have been developed—including systems incorporating adaptive optics[3] and the MOSAIC platform (which remains unpublished)—the original implementation remains the most widely described and cited in the literature. It is therefore the most appropriate point of comparison for contextualizing Altair’s performance, complexity, and accessibility. Importantly, this version of LLSM is far from obsolete; it continues to be one of the most commonly used imaging systems at Janelia Research Campus’s Advanced Imaging Center.

      We acknowledge that the more recent commercial implementation by Zeiss has addressed several of the practical limitations associated with the original design. In particular, we agree that the Zeiss Lattice Lightsheet 7 system, which integrates a meniscus lens to facilitate oblique imaging through a coverslip, offers a user-friendly experience, albeit with a modest tradeoff in resolution (reported deskewed resolution: 330 nm × 330 nm × 500–1000 nm).

      While we recognize that statements on usability and stability can be subjective, one objective proxy for system complexity is the number of optical elements that require precise alignment during assembly. The original LLSM setup includes approximately 29 optical components that must each be carefully positioned laterally, angularly, and coaxially along the optical path. In contrast, the first-generation Altair system contains only 9 such elements. By this metric, Altair is considerably simpler to assemble and align, supporting our overarching goal of making high-resolution light-sheet imaging more accessible to non-specialist laboratories. In the revised manuscript, we will clarify the scope of our comparison and provide more precise language about what we mean by complexity (e.g., number of optical elements needed to align).

      (2) One of the major limitations of the first-generation LLSM was the use of a 5 mm coverslip, which was a hindrance for many users. However, the Zeiss system elegantly solves this problem, and so does Oblique Plane Microscopy (OPM), while the Altair-LSFM retains this feature, which may dissuade widespread adoption. This limitation and how it may be overcome in future iterations is not discussed.

      We agree that the use of 5 mm diameter coverslips, while enabling high-NA imaging in the current Altair-LSFM configuration, may serve as an inconvenience for many users. We will discuss this more explicitly in the revised manuscript. Specifically, we note that changing the detection objective is sufficient to eliminate the need for a 5 mm coverslip. For example, as demonstrated in Moore et al., Lab Chip 2021, pairing the Zeiss W Plan-Apochromat 20x/1.0 objective with the Thorlabs TL20X-MPL allows imaging beyond the physical surfaces of both objectives, removing the constraint imposed by small-format coverslips[1]. In the revised manuscript, we will propose this modification as a straightforward path for increasing compatibility with more conventional sample mounting formats.

      (3) Further, on the point of sample flexibility, all generations of the LLSM, and by the nature of its design, the OPM, can accommodate live-cell imaging with temperature, gas, and humidity control. It is unclear how this would be implemented with the current sample chamber. This limitation would severely limit use cases for cell biologists, for which this microscope is designed. There is no discussion on this limitation or how it may be overcome in future iterations.

      We appreciate the reviewer’s emphasis on the importance of environmental control for live-cell imaging applications. It is worth noting that the original LLSM design, including the system commercialized by 3i, provided temperature control only, without integrated gas or humidity regulation. Despite this, it has been successfully used by a wide range of scientists to generate important biological insights.

      We agree that both OPM and the Zeiss implementation of LLSM offer clear advantages in terms of environmental control, as we previously discussed in detail in Sapoznik et al., eLife, 2020[4]. However, assembly of high numerical aperture OPM systems is highly technical, and no open-source variant of OPM delivers sub-cellular scale resolution yet.

      (4) The authors' comparison to LLSM is constrained to the "square" lattice, which, as they point out, is the most used optical lattice (though this also might be considered anecdotal). The LLSM original design, however, goes far beyond the square lattice, including hexagonal lattices, the ability to do structured illumination, and greater flexibility in general in terms of light-sheet tuning for different experimental needs, as well as not being limited to just sample scanning. Thus, the Altair-LSFM cannot compare to the original LLSM in terms of versatility, even if comparisons to the resolution provided by the square lattice are fair.

      We thank the reviewer for this comment. It is true that our discussion focused primarily on the square lattice implementation of LLSM. While this could be viewed as a subset of the system’s broader capabilities, we chose this focus intentionally, as the square lattice remains by far the most commonly used variant in practice. Even in the original LLSM publication, 16 out of 20 figure subpanels utilized the square lattice, with only one panel each representing the hexagonal lattice in SIM mode, a standard Bessel beam in incoherent SIM mode, a hex lattice in dithered mode, and a single Bessel in dithered mode. This usage pattern largely reflects the operational simplicity of the square lattice: it minimizes sidelobe growth and enables more straightforward alignment and data processing compared to hexagonal or structured illumination modes.

      In 2019, we performed an exhaustive accounting of published illumination modes in LLSM and found that the SIM mode had only been used in two additional peer-reviewed publications at that time. We will consider updating this table in the revised manuscript and will expand our discussion to acknowledge the broader flexibility of the LLSM platform—including its capacity for structured illumination and alternative light-sheet geometries. However, we will also emphasize that, despite these advanced capabilities, the square lattice remains the dominant mode used by the community and therefore serves as a fair and practical benchmark for comparison.

      (5) There is no demonstration of the system's live-imaging capabilities or temporal resolution, which is the main advantage of existing light-sheet systems.

      In the revised manuscript, we will include a demonstration of live-cell imaging to directly validate the system’s suitability for dynamic biological applications. We will also characterize the temporal resolution of the system. As a sample-scanning microscope, the imaging speed is primarily limited by the performance of the Z-piezo stage. For simplicity and reduced optoelectronic complexity, we currently power the piezo through the ASI Tiger Controller. We will expand the supplementary material to describe the design criteria behind this choice, including potential trade-offs, and provide data quantifying the achievable volume rates under typical operating conditions.

      While the microscope is well designed and completely open source, it will require experience with optics, electronics, and microscopy to implement and align properly. Experience with custom machining or soliciting a machine shop is also necessary. Thus, in my opinion, it is unlikely to be implemented by a lab that has zero prior experience with custom optics or can hire someone who does. Altair-LSFM may not be as easily adaptable or implementable as the authors describe or perceive in any lab that is interested, even if they can afford it. The authors indicate they will offer "workshops," but this does not necessarily remove the barrier to entry or lower it, perhaps as significantly as the authors describe.

      We appreciate the reviewer’s perspective and agree that building any high-performance custom microscope—Altair-LSFM included—requires a baseline familiarity with optics and instrumentation. Our goal is not to eliminate this requirement entirely, but to significantly reduce the technical and logistical barriers that typically accompany custom light-sheet microscope construction.

      Importantly, no machining experience or in-house fabrication capabilities are required—users can simply submit provided design files and specifications directly to the vendor. We will make this process as straightforward as possible by supplying detailed instructions, recommended materials, and vendor-ready files. Additionally, we draw encouragement from the success of related efforts such as mesoSPIM, which has seen over 30 successful implementations worldwide using a similar model of exhaustive online documentation, open-source control software, and community support through user meetings and workshops.

      We recognize that documentation alone is not always sufficient, and we are committed to further lowering barriers to adoption. To this end, we are actively working with commercial vendors to streamline procurement and reduce the logistical burden on end users. Additionally, Altair-LSFM is supported by a Biomedical Technology Development and Dissemination (BTDD) grant, which provides dedicated resources for hosting workshops, offering real-time community support, and generating supplementary materials such as narrated video tutorials. We will expand our discussion in the revised manuscript to better acknowledge these implementation challenges and outline our ongoing strategies for supporting a broad and diverse user base.

      There is a claim that this design is easily adaptable. However, the requirement of custom-machined baseplates and in silico optimization of the optical path basically means that each new instrument is a new design, even if the Navigate software can be used. It is unclear how Altair-LSFM demonstrates a modular design that reduces times from conception to optimization compared to previous implementations.

      We appreciate the reviewer’s comment and agree that our language regarding adaptability may have been too strong. It was not our intention to suggest that the system can be easily modified without prior experience. Meaningful adaptations of the optical or mechanical design would require users to have expertise in optical layout, optomechanical design, and alignment.

      That said, for labs with sufficient expertise, we aim to facilitate such modifications by providing comprehensive resources—including detailed Zemax simulations, CAD models, and alignment documentation. These materials are intended to reduce the development burden for those seeking to customize the platform for specific experimental needs.

      In the revised manuscript, we will clarify this point and explicitly state in the discussion what technical expertise is required to modify the system. We will also revise our language around adaptability to better reflect the intended audience and realistic scope of customization.

      Reviewer #3 (Public review):

      Summary:

      This manuscript introduces a high-resolution, open-source light-sheet fluorescence microscope optimized for sub-cellular imaging.

      The system is designed for ease of assembly and use, incorporating a custom-machined baseplate and in silico optimized optical paths to ensure robust alignment and performance. The authors demonstrate lateral and axial resolutions of ~235 nm and ~350 nm after deconvolution, enabling imaging of sub-diffraction structures in mammalian cells.

      The important feature of the microscope is the clever and elegant adaptation of simple Gaussian beams, smart beam shaping, galvo pivoting, and high-NA objectives to ensure a uniform thin light-sheet of around 400 nm in thickness over a 266-micron-wide field of view, pushing the axial resolution of the system beyond the usual diffraction-limited tradeoffs of light-sheet fluorescence microscopy.

      Compelling validation using fluorescent beads and multicolor cellular imaging highlights the system's performance and accessibility. Moreover, a very extensive and comprehensive manual of operation is provided in the form of supplementary materials. This provides a DIY blueprint for researchers who want to implement such a system.

      Strengths:

      (1) Strong and accessible technical innovation: With an elegant combination of beam shaping and optical modelling, the authors provide a high-resolution light-sheet system that overcomes the classical light-sheet tradeoff limit of a thin light-sheet and a small field of view. In addition, the integration of in silico modelling with a custom-machined baseplate is very practical and allows for ease of alignment procedures. Combining these features with the solid and super-extensive guide provided in the supplementary information, this provides a protocol for replicating the microscope in any other lab.

      (2) Impeccable optical performance and ease of mounting of samples: The system takes advantage of the same sample-holding method seen already in other implementations, but reduces the optical complexity. At the same time, the authors claim to achieve similar lateral and axial resolution to lattice light-sheet microscopy (although without a direct comparison; see below in the "weaknesses" section). The optical characterization of the system is comprehensive and well-detailed. Additionally, the authors validate the system by imaging sub-cellular structures in mammalian cells.

      (3) Transparency and comprehensiveness of documentation and resources: A very detailed protocol provides detailed documentation about the setup, the optical modeling, and the total cost.

      Weaknesses:

      (1) Limited quantitative comparisons: Although some qualitative comparison with previously published systems (diSPIM, lattice light-sheet) is provided throughout the manuscript, some side-by-side comparison would be of great benefit for the manuscript, even in the form of a theoretical simulation. While having a direct imaging comparison would be ideal, it's understandable that this goes beyond the interest of the paper; however, a table referencing image quality parameters (taken from the literature), such as signal-to-noise ratio, light-sheet thickness, and resolutions, would really enhance the features of the setup presented. Moreover, based also on the necessity for optical simplification, an additional comment on the importance/difference of dual objective/single objective light-sheet systems could really benefit the discussion.

      In the revised manuscript, we will expand our discussion to include a broader range of light-sheet microscope designs and imaging modes, including both single- and dual-objective configurations. We agree that highlighting the trade-offs between these approaches—such as working distance, sample geometry constraints, and alignment complexity—will enhance the overall context and utility of the manuscript.

      To further aid comparison, we will include a summary table referencing key image quality parameters such as lateral and axial resolution, and illumination beam NA for Altair-LSFM. Where available, we will reference values from published work—such as the axial resolution reported in Valm et al. (Nature, 2017)—to provide a clearer benchmark. Because such comparisons can be technically nuanced, especially when comparing across systems with different geometries and sample mounting constraints, we will also include a supplementary note outlining the assumptions and limitations of these comparisons.

      (2) Limitation to a fixed sample: In the manuscript, there is no mention of incubation temperature, CO₂ regulation, humidity control, or possible integration of commercial environmental control systems. This is a major limitation for an imaging technique that owes its popularity to fast, volumetric, live-cell imaging of biological samples.

      We thank the reviewer for highlighting this important consideration. In the revised manuscript, we will provide a detailed description of how temperature control can be implemented using flexible adhesive heating elements, a power supply, and a PID controller. Step-by-step assembly instructions and recommended components will be included to facilitate adoption by users interested in live-cell imaging. We also note that most light-sheet microscopy systems capable of sub-cellular resolution—including the original LLSM design, diSPIM, and ASLM—typically do not incorporate integrated CO<sub>2</sub> or humidity control. These systems often rely on HEPES-buffered media to maintain pH stability, which is generally sufficient for short- to intermediate-term imaging. While full environmental control may be necessary for extended time-lapse studies, it is not a prerequisite for high-resolution volumetric imaging in many applications. Nonetheless, we will include a discussion of the challenges associated with adding CO<sub>2</sub> and humidity control to open or semi-enclosed architectures like Altair-LSFM, and outline potential future paths for integration with commercial incubation systems.

      (3) System cost and data storage cost: While the system presented has the advantage of being open-source, it remains relatively expensive (considering the 150k without laser source and optical table, for example). The manuscript could benefit from a more direct comparison of the performance/cost ratio of existing systems, considering academic settings with budgets that most of the time would not allow for expensive architectures. Moreover, it would also be beneficial to discuss the adaptability of the system, in case a 30k objective could not be feasible. Will this system work with different optics (with the obvious limitations coming with the lower NA objective)? This could be an interesting point of discussion. Adaptability of the system in case of lower budgets or more cost-effective choices, depending on the needs.

      We thank the reviewer for raising this important point. First, we would like to clarify that the quoted $150k cost estimate includes the optical table and laser source. We apologize for any confusion and will communicate this more effectively in the revised manuscript.

      We agree that adaptability is a key concern, especially in academic settings with limited budgets. The detection path can be readily altered depending on experimental needs and cost constraints. For example, in our discussion of alternatives to the 5 mm coverslip geometry, we will describe how switching to a Zeiss W Plan-Apochromat 20x/1.0 in combination with a compatible excitation objective allows high-resolution imaging while accommodating more conventional sample formats. We will expand this to include cost-effective alternatives as well.

      We will also expand our discussion on cost-reduction strategies and the associated trade-offs. These include replacing motorized stages with manual ones, omitting the filter wheel in favor of a multi-band emission filter, or using industrial-grade cameras in place of scientific CMOS detectors. While each change entails some loss in functionality or sensitivity, such modifications allow users to tailor the system to their specific budget and application.

      Finally, we recognize the challenge in communicating exact costs of commercial systems due to variability in configuration and pricing. Nonetheless, we will include approximate figures where possible and note that comparable commercial systems—such as LLSM platforms from 3i and Zeiss—are several-fold more expensive than the system presented here.

      Last, not much is said about the need for data storage. Light-sheet microscopy's bottleneck is the creation of increasingly large datasets, and it could be beneficial to discuss more about the storage needs and the quantity of data generated.

      Data storage is indeed a critical consideration in light-sheet microscopy. In the revised manuscript, we will provide a note outlining typical volume dimensions for live-cell imaging experiments along with the associated data overhead. This will include estimates for voxel counts, bit depth, time-lapse acquisitions, and multi-channel datasets to help users anticipate storage needs. We will also briefly discuss strategies for managing large datasets, file types and compression formats.
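      As a rough illustration of the kind of estimate described above, the uncompressed size of a dataset can be approximated from its voxel count, bit depth, channel count, and number of time points. The sketch below is a minimal back-of-the-envelope calculation; the function name and every acquisition parameter in the example are hypothetical and are not taken from the manuscript.

```python
def dataset_size_bytes(nx, ny, nz, bit_depth=16, channels=1, timepoints=1):
    """Uncompressed size in bytes of a multi-channel time-lapse volume.

    Each voxel is assumed to be stored in whole bytes, so a 12-bit
    camera value still occupies 2 bytes on disk.
    """
    bytes_per_voxel = (bit_depth + 7) // 8  # round up to whole bytes
    voxels_per_volume = nx * ny * nz
    return voxels_per_volume * bytes_per_voxel * channels * timepoints


# Hypothetical acquisition: 2048 x 2048 pixel frames, 200 planes per volume,
# 16-bit data, 2 channels, 100 time points.
example_bytes = dataset_size_bytes(2048, 2048, 200, bit_depth=16,
                                   channels=2, timepoints=100)
example_gb = example_bytes / 1e9  # decimal gigabytes

print(f"Uncompressed dataset size: {example_gb:.1f} GB")
```

      Cropping the camera region of interest and applying lossless compression would reduce these numbers, so the result is best read as an upper bound for planning storage.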

      Conclusion:

      Altair-LSFM represents a well-engineered and accessible light-sheet system that addresses a longstanding need for high-resolution, reproducible, and affordable sub-cellular light-sheet imaging. While some aspects (comparative benchmarking and validation, limitation to fixed samples) would benefit from further development, the manuscript makes a compelling case for Altair-LSFM as a valuable contribution to the open microscopy scientific community.

      References

      (1) Moore, R. P. et al. A multi-functional microfluidic device compatible with widefield and light sheet microscopy. Lab Chip 22, 136-147 (2021). https://doi.org/10.1039/d1lc00600b

      (2) Lamb, J. R., Mestre, M. C., Lancaster, M. & Manton, J. D. Direct-view oblique plane microscopy. Optica 12, 469-472 (2025). https://doi.org/10.1364/OPTICA.558420

      (3) Liu, T. L. et al. Observing the cell in its native state: Imaging subcellular dynamics in multicellular organisms. Science 360 (2018). https://doi.org/10.1126/science.aaq1392

      (4) Sapoznik, E. et al. A versatile oblique plane microscope for large-scale and high-resolution imaging of subcellular dynamics. eLife 9 (2020). https://doi.org/10.7554/eLife.57681

      (5) Huisken, J. & Stainier, D. Y. Even fluorescence excitation by multidirectional selective plane illumination microscopy (mSPIM). Opt Lett 32, 2608-2610 (2007). https://doi.org/10.1364/ol.32.002608

      (6) Ricci, P. et al. Removing striping artifacts in light-sheet fluorescence microscopy: a review. Prog Biophys Mol Biol 168, 52-65 (2022). https://doi.org/10.1016/j.pbiomolbio.2021.07.003

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Mollá-Albaladejo et al. investigate the neurons downstream of Gr64f and Gr66a, called G2Ns. They identify downstream neurons using trans-Tango labeling with RFP and then perform bulk RNA-seq on the RFP-sorted cells. Gene expression is up- or downregulated between the cell populations and between fed and starved states. They specifically identify Leucokinin as a neuropeptide that is upregulated in starved Gr66a cells. Leucokinin cells, identified by a GAL4 line, indeed show higher expression when starved, especially in the SEZ. Furthermore, Leucokinin cells colocalize with the trans-Tango signal from downstream neurons of both GRs. This connection is confirmed with GRASP. According to EM data, Leucokinin cells in the SEZ receive a lot of input and connect to many downstream neurons. In behavior experiments performed with flies lacking Leucokinin neurons, flies show reduced responsiveness to sugar and bitter mixtures when starved. The authors suggest that Leucokinin neurons integrate bitter and sugar tastes and that their output is modified by a hunger state.

      Strengths:

      The authors use a multitude of tools to identify SELK neurons as downstream of taste sensory neurons and as starvation-sensitive cells. This study provides an example of how genetic labeling, RNA-seq, and EM analysis can be combined to investigate neural circuits.

      Weaknesses:

      The authors do not show a functional connection between sensory neurons and SELK neurons. Additionally, data from RNA seq, anatomical studies, and EM analysis are sometimes contradictory in terms of connectivity. GRASP signal is not foolproof that cells are synaptically connected.

      We appreciate the reviewer’s comments. Unfortunately, we have not successfully demonstrated a functional response of SELK neurons using in vivo calcium imaging with UAS-GCaMP7 (we tried the f, m, and s versions), primarily due to challenges in obtaining stable signals. We stimulated GRNs using sucrose, caffeine, or a mixture of both; although the concentrations were high, they may still have been insufficient to induce a detectable response.

      Regarding GRASP, we acknowledge its limitations as a standalone technique for establishing genuine synaptic connections between neurons, as some signals may reflect false positives resulting from the mere proximity of the candidate neurons. To strengthen our findings, we complemented these results by demonstrating the positive colocalization of the Leucokinin antibody signal over the Gr66a-Gal4>trans-TANGO and Gr64f-Gal4>trans-TANGO (Figure 4), confirming that Leucokinin neurons are indeed postsynaptic to both sweet and bitter GRNs. Moreover, we incorporated BacTrace data to highlight the direct connectivity between sweet and bitter GRNs (now Figure 5E).

      In the revised manuscript, we have introduced the active-GRASP technique (Macpherson et al., 2015). In this version of GRASP, the presynaptic half of GFP (GFP 1-10) is fused to synaptobrevin, which becomes accessible in the membrane of the presynaptic neuron within the synaptic cleft upon presynaptic stimulation (in our case, by stimulating the sweet Gr64f<sup>GRNs</sup> with sucrose and the bitter Gr66a<sup>GRNs</sup> with caffeine). Utilizing this technique, we successfully demonstrated (see new Figure 5B and 5D) that when presented with water, no signal was detected in the Gr66a-LexA, Lk-Gal4 > active-GRASP, or Gr64f-LexA, Lk-Gal4 > active-GRASP transgene flies. However, in the presence of caffeine, Gr66a-LexA, Lk-Gal4 > active-GRASP transgene flies exhibited a clear signal in the SEZ, and similarly, sucrose presentation to Gr64f-LexA, Lk-Gal4 > active-GRASP transgene flies yielded a detectable signal. The results obtained from active-GRASP provide additional evidence supporting the connectivity between SELK neurons and both Gr64f<sup>GRNs</sup> and Gr66a<sup>GRNs</sup>, further indicating the functional connectivity of the GRNs and SELK neurons.

      The authors describe a behavioral phenotype when flies are starved, however, they do not use a specific driver for the described cell type, thus they should also tone down their claims.

      We agree with the reviewer that the Lk-Gal4 driver line used labels SELK, LHLK, and ABLK neurons. The behavior examined in this paper, the Proboscis Extension Response (PER), measures the initiation of feeding. Although the neural circuit involved in this behavior is primarily confined to the SEZ where SELK neurons are located, we cannot rule out the possibility that other Lk neurons may also play a role in the process. To restrict expression of the Tetanus Toxin, we have utilized the tsh-Gal80 (Clyne et al., 2008) transgene in combination with the Lk-Gal4>UAS-TNT and Lk-Gal4>UAS-TNT<sup>imp</sup> constructs to prevent the expression of the Tetanus Toxin in ABLK neurons, thereby restricting its expression to the SELK and LHLK neurons in the central brain. The new results (Sup Figure 7A) indicate that ABLK neurons do not play a role in integrating sweet and bitter information. However, we acknowledge the reviewer's point that we are still silencing LHLK neurons, so we have adjusted our claims to align more closely with our data.

      Generally, the authors do not provide a big advancement to the field and some of the results are contradictory with previous publications.

      We believe our work does not contradict previous findings, nor does it invalidate the role of ABLK neurons in water homeostasis or the role of LHLK neurons in regulating sleep via starvation. We provide additional information on the possible role of SELK neurons in integrating gustatory information. The location of SELK neurons in the SEZ suggests that they may play a role in feeding behavior, and we have demonstrated that these neurons are indeed involved in integrating gustatory information to influence feeding decisions. We consider we have contributed by highlighting a new role for the Leucokinin neuropeptide in feeding behavior.

      Reviewer #2 (Public review):

      Summary:

      A core task of the brain is processing sensory cues from the environment. The neural mechanisms of how sensory information is transmitted from peripheral sense organs to subsequent processing in defined brain centers remain an important topic in neuroscience. The taste system hereby assesses the palatability of food by evaluating the chemical composition and nutrient content while integrating the current need for energy by assessing the satiation level of the organism. The current manuscript provides insights into the early circuits of gustatory coding using the fruit fly as a model. By combining trans-tango and FACS-based bulk RNAseq to assess the target neurons of sweet sensing (using Gr64f-Gal4) and bitter sensing (using Gr66a-Gal4) in a first set of experiments, the authors investigate genes that are differentially expressed or co-expressed in normal and starved conditions. With a focus on neuropeptides and neurotransmitters, different expressions in the different conditions were assessed, resulting in the identification of Leucokinin as a potentially interesting gene. The notion is further supported by RNAseq of Lk-Gal4>mCD8:GFP sorted cells and immunostainings. GRASP and BacTrace experiments further support that the two Lk-expressing cells in the SEZ should indeed be postsynaptic to both types of sensories. Using EM-based connectomics data (based on a previous publication by Engert et al.), the authors also look for downstream targets of the bitter versus sweet gustatory neurons to identify the Lk-neurons. Based on the morphology they identify candidates and further depict the potential downstream neurons in the connectome, which appears largely in agreement with GRASP experiments. Finally, silencing the Lk-neurons shows an increased PER response in starved flies (when combined with bitter compounds) as well as increased feeding in a FlyPad assay.

      Strengths:

      Overall this is an intriguing manuscript, which provides insight into the organization of 2nd order gustatory neurons. It specifically provides strong evidence for the Lk-neurons as a target of sweet and bitter GRNs and provides evidence for their role in regulating sweet vs bitter-based behavioral responses. Particularly the integration of different techniques and datasets in an elegant fashion is a strong side of the manuscript. Moreover to put the known LK-neurons into the context of 2nd order gustatory signalling is strengthening the knowledge about this pathway.

      Weaknesses:

      I do not see any major weakness in the current manuscript. Novelty is to some degree lessened by the fact, that the RNAseq approach did not identify new neurons but rather put the known LK-neurons as major findings. Similarly, the final behavioral section is not very deep and to some degree corroborates the previous publication by the Keene and Nässel labs - that said, the model they propose is indeed novel (but lacks depth in analyses; e.g. there is no physiology that would support the modulation of Lk neurons by either type of GRN). The connectomic section appears a bit out of place and after reading it it's not really clear what one should make of the potential downstream neurons (particularly since the Lk-receptor expression has been previously analyzed); here it might have been interesting to address if/how Lk-neurons may signal directly via a classical neurotransmitter (an information that might be found easily in the adult brain single-cell data).

      We thank the reviewer for the comment. Indeed, we attempted in vivo Ca imaging but were unsuccessful. We have rewritten the connectomic section to better integrate it with the rest of the text and have reanalyzed the data obtained. We considered gathering data from the single-cell adult dataset, but this dataset includes the entire adult fly brain, encompassing SELK and LHLK neurons, making it impossible to differentiate between the two types of Lk neurons. Any further analysis will require transcriptomic analysis of SELK via scRNAseq under the different metabolic conditions tested in this study.

      Reviewer #3 (Public review):

      Summary:

      To make feeding decisions, animals need to process three types of information: positive cues like sweetness, negative cues like bitterness, and internal states such as hunger or satiety. This study aims to identify where the information is integrated into the fruit fly brain. The authors applied RNA sequencing on second-order gustatory neurons responsible for sweet and bitter processing, under fed and starved conditions. The sequencing data reveal significant changes in gene expression across sweet vs. bitter pathways and fed vs. starved states. The authors focus on the neuropeptide Leucokinin (Lk), whose expression is dependent on the starvation state. They identify a pair of neurons, named SELK neurons, which express Lk and receive direct input from both sweet and bitter gustatory neurons. These SELK neurons are ideal candidates to integrate gustatory and internal state information. Behavioral experiments show that blocking these neurons in starved flies alters their tolerance to bitter substances during feeding.

      Strengths:

      (1) The study employs a well-designed approach, targeting specific neuronal populations, which is more efficient and precise compared to traditional large-scale genetic screening methods.

      (2) The RNAseq results provide valuable data that can be utilized in future studies to explore other molecules beyond Lk.

      (3) The identification of SELK neurons offers a promising avenue for future research into how these neurons integrate conflicting gustatory signals and internal state information.

      Weaknesses:

      (1) Unfortunately, due to technical challenges, the authors were unable to directly image the functional activity of SELK neurons.

      (2) In the behavioral experiments, tetanus toxin was used to block SELK neurons. Since these neurons may release multiple neurotransmitters or neuropeptides, the results do not specifically demonstrate that Leucokinin (Lk) is the critical factor, as suggested in Figure 8. To address this, I recommend using RNAi to inhibit Lk expression in SELK neurons and comparing the outcomes to wild-type controls via the PER assay.

      We appreciate the reviewer's comments and suggestions. As noted, Tetanus Toxin silences the neuron’s activity, affecting the functioning of various neurotransmitters and neuropeptides released by the targeted neuron. In response to the reviewer's recommendation, we employed an RNAi line specifically designed to silence Leucokinin production in Lk-expressing neurons.

      The results presented in Supplementary Figure 7B demonstrate that knocking down Leucokinin in Lk neurons significantly reduces the flies' tolerance to caffeine in sweet food.

      It is crucial to highlight that the sucrose concentration used in Figure 7C was 50mM, whereas in Supplementary Figure 7B, it was increased to 100mM. This adjustment was necessary because the Lk-Gal4, UAS-RNAi, and Lk-Gal4>UAS-RNAi transgenic lines exhibited reduced sensitivity to sucrose compared to the Lk-Gal4>UAS-TNT or Lk-Gal4>UAS-TNT<sup>imp</sup> lines. We aimed to establish a sucrose concentration that would elicit a 50% Proboscis Extension Response (PER) without adding any other compound, thereby allowing us to evaluate the additional effect of caffeine in the food.

      However, according to the data derived from the connectome, SELK neurons might be cholinergic, and this neurotransmitter might also be involved in controlling the behavior of the flies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      To get more evidence for connections between sensory cells and SELK neurons, could the authors also analyze a second available EM data set? Would setting a different threshold (>5 synapses) reveal connections to both sensories? Comparisons between SELK in- and outputs from EM data and Tango labeling also seem to differ quite a lot based on provided images - can the authors count cell bodies in the stainings? Further proof would be to provide functional imaging data that shows that SELK neurons respond to sugar and bitter compounds.

      In this study, we utilized the recently published EM dataset for the Drosophila central brain connectome (Dorkenwald et al., 2024; Flywire.ai). Changing the number of synapses affects the counts of pre- and postsynaptic neurons. We set a threshold of more than five synapses, as recommended by Flywire, to avoid false positives (Dorkenwald et al., 2024). This threshold has been widely used in recent papers (Engert et al., 2022; Shiu et al., 2022; Walker et al., 2025).

      The neuron counts in the connectomic data differ from those in the trans- and retro-TANGO experiments. In our initial trans-TANGO experiment, which labeled postsynaptic neurons in the Gr64f-Gal4 and Gr66a-Gal4 transgenic lines, we counted the labeled neurons (see Supplementary Figure 1C) and observed considerable variability between different brains. Due to anticipated variability, we did not count the labeled neurons from trans-TANGO and retro-TANGO techniques in the Leucokinin neurons. Furthermore, neither technique labels all postsynaptic or presynaptic neurons, respectively. A recent study on the retro-TANGO technique (Sorkac et al., 2023) found a minimum threshold: the presynaptic neuron must form a certain number of synapses with the neuron of interest to be adequately labeled. According to this paper, the established threshold is 17 synapses. It is likely that the trans-TANGO technique also has a threshold relating to the number of labeled neurons, contingent on the synapse count. This would explain the discrepancy between the two results.

      Unfortunately, we have not been able to provide functional data pointing to the activation of SELK neurons by sucrose or caffeine. However, our active-GRASP data indicates that the connectivity between Gr64f<sup>GRNs</sup> and Gr66a<sup>GRNs</sup> with SELK neurons is present and functional.

      How many Leucokinin-positive cells are in the SEZ? Does the RNA-seq data provide further information about the SELK neurons? Potential receptor candidates for how they integrate hunger signals? AMPKa was described to be required in LHLK neurons.

      There are two SELK neurons in the SEZ. Due to the nature of our bulk RNA sequencing (RNAseq), we cannot link any additional gene expressions detected in our transcriptomic analysis specifically to the SELK neurons regarding the integration of various signaling processes. Furthermore, the single-cell RNA sequencing (scRNAseq) data available from the Drosophila brain, as reported by Li et al. (2022), does not allow accurate differentiation between SELK and LHLK neurons. To understand how these neurons integrate both metabolic and sensory information, it is crucial to conduct a focused RNAseq study specifically on the SELK neurons. This targeted analysis would provide the insights needed to better elucidate their functional roles. However, according to the data derived from the connectome, SELK neurons might be cholinergic, and this neurotransmitter might also be involved in controlling the behavior of the flies.

      According to previous studies (Yurgel et al., 2019), the Lk-GAL4 line is also expressed in the VNC, thus the authors could make use of the tsh-GAL80 tool to clean up the line. This study also performed GCaMP imaging in fed and 24h starved animals in SELK and couldn't find a difference, can the authors explain this discrepancy?

      We thank the reviewer for this suggestion. We have now added a new piece of data using the tsh-Gal80 transgene in our PER experiments (Supplementary Figure 7A). Blocking the expression of TNT in the ABLK neurons does not affect the main conclusion of the behavioral results. As stated previously, we were unable to obtain in vivo Ca imaging responses in SELK neurons upon exposure to sucrose, caffeine, or mixtures of sucrose and caffeine. We do not believe this is a discrepancy with previous works like Yurgel et al., 2019. It is likely that we faced technical issues regarding expression stability and that the stimulation was possibly too weak to detect changes in GFP levels.

      Reviewer #2 (Recommendations for the authors):

      As mentioned above I do not have any major comments on the manuscript, but there are a few points that I feel should be considered:

      (1) The identification of the Lk-candidate neurons in the connectome remains a bit mysterious. In the method sections, this reads as follows: "manual and visual criteria were applied to identify the neurons of interest". a) What precisely was done to get to the candidates? b) Are there alternative candidates that may be Lk-neurons? c) How would another neuron affect the conclusion of the downstream analysis?

      We thank the reviewer for this comment. We have now modified and added new information in the connectomic section, reinforcing our conclusions and correcting the results obtained.

      Our GRASP, BacTrace, and immunohistochemistry experiments pointed to SELK neurons as postsynaptic to both Gr64f<sup>GRNs</sup> (sweet) and Gr66a<sup>GRNs</sup> (bitter). To identify which neurons in the connectome could be the SELK neurons, we utilized a previously described set of GRNs already identified in the connectome (Shiu et al., 2022). We extracted all postsynaptic neurons to the sweet and bitter GRNs identified and intersected both datasets, retaining only those candidate hits receiving simultaneous input from sweet and bitter GRNs. This process yielded a total of 333 hits. Through visual inspection, we discarded all hits that were merely neuronal fragments or neurons that clearly were not our candidates. We narrowed the list down to a final set of 17 candidate neurons whose arborization was located in the SEZ. We reduced the candidates to two final entries from this list: ID 720575940623529610 (GNG.276) and ID 720575940630808827 (GNG.685). The GNG.276 neuron had a counterpart in the SEZ identified as GNG.246. Both of these neurons were annotated as DNg70 in the Flywire database. GNG.685 had a counterpart identified as GNG.595, and these two neurons were classified as DNg68. In both cases, the neuronal candidates, DNg70 and DNg68, were classified as descending neurons, a characteristic of previously described SELK neurons (Nässel et al., 2021). In our initial analysis, published in bioRxiv and submitted for review, we identified DNg70 as potentially the SELK neurons based solely on the morphology of the neurons via visual inspection. However, we have since employed a better method to determine which candidate is more likely to be the SELK neurons, concluding that DNg68, rather than DNg70, represents the SELK neurons. Briefly, we performed an immunohistochemistry for GFP in the Lk-Gal4>UAS-CD8:GFP flies. We aligned the resulting image to a Drosophila reference brain (JRC2018U) using the CMTK Registration plugin in ImageJ. The resulting image was skeletonized using the Simple Neurite Tracer plugin in ImageJ and later uploaded to the Flywire Gateway platform to compare the structure of the aligned and skeletonized SELK neurons to our candidates. This comparison clearly indicated that the DNg68 neurons are the best candidates for representing the SELK neurons, rather than DNg70. We have updated the text, Figure 6, and Supplementary Figure 6 to reflect the new results. These new results do not alter the conclusions of the paper.
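
      The first step of this candidate search (collecting the postsynaptic partners of each GRN population and intersecting them) can be illustrated with a minimal sketch. It assumes the connectome has been exported as a plain list of (presynaptic ID, postsynaptic ID, synapse count) tuples and that the more-than-five-synapses threshold is applied per neuron pair; the neuron names, counts, and function names below are hypothetical and are not taken from the Flywire dataset or from the authors' analysis code.

```python
# Toy edge list: (presynaptic neuron, postsynaptic neuron, synapse count).
# All identifiers and counts are made up for illustration.
EDGES = [
    ("sweet_GRN_a", "cand_1", 9),
    ("sweet_GRN_b", "cand_2", 6),
    ("sweet_GRN_a", "cand_3", 2),   # below threshold, ignored
    ("bitter_GRN_a", "cand_1", 8),
    ("bitter_GRN_a", "cand_3", 10),
]

def downstream(edges, sources, min_synapses=5):
    """Neurons receiving more than `min_synapses` synapses from any neuron in `sources`."""
    return {post for pre, post, n in edges if pre in sources and n > min_synapses}

def shared_targets(edges, sweet, bitter, min_synapses=5):
    """Neurons postsynaptic to both the sweet and the bitter GRN populations."""
    return downstream(edges, sweet, min_synapses) & downstream(edges, bitter, min_synapses)

SWEET = {"sweet_GRN_a", "sweet_GRN_b"}
BITTER = {"bitter_GRN_a"}

candidates = shared_targets(EDGES, SWEET, BITTER)              # threshold of > 5 synapses
relaxed = shared_targets(EDGES, SWEET, BITTER, min_synapses=1)  # a more permissive threshold
```

      In this toy example, relaxing the threshold adds a second shared target, which illustrates why the choice of synapse threshold changes the number of candidate neurons.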

      (2) In the transcriptomic experiments it seems that the raw transcripts are reported, rather than normalised data. Why?

      All transcriptomic data is normalized. In Figure 1, the differential expression was calculated using DESeq2 normalized counts. In Figure 2, Transcripts Per Million (TPM) were calculated using the Salmon package and normalized for the gene length.
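
      For readers unfamiliar with TPM, the normalization can be sketched as follows. This is the generic textbook definition (counts divided by transcript length, then rescaled so the values sum to one million), not Salmon's implementation; the gene names, counts, and lengths are made up.

```python
def tpm(counts, lengths_bp):
    """Transcripts Per Million from read counts and transcript lengths (base pairs)."""
    # Length-normalized rate for each gene.
    rates = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    total = sum(rates.values())
    # Rescale so that all values sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}

# Hypothetical example data.
counts = {"geneA": 100, "geneB": 200, "geneC": 300}
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 1000}
values = tpm(counts, lengths)
```

      Here geneA and geneB receive the same TPM although geneB has twice the reads, because geneB is also twice as long.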

      (3) The expression of nAChRbeta1 in the transcriptomic data is rather striking. However, this remains currently not addressed: is this expression real?

      We have not confirmed the upregulation or downregulation in gene expression for any gene other than Leucokinin, which is our main interest. We found the presence of nAChRbeta1 interesting, as GRNs are cholinergic (Jaeger et al., 2018), suggesting that it would make sense to find cholinergic receptors in G2Ns. However, it is possible that these receptors are expressed in all G2Ns and serve as a common means of communication.

      (4) The description of the behavioural experiments in the results section is rather brief. I had a hard time following it since the genotypes are not repeated nor is it stated what is different in the experimental group vs control (but instead simply what changes in the experimental group, in a rather discussion-like fashion).

      We thank the reviewer for the comment. We have rewritten this section to improve its clarity.

      (5) If I understand the genetics for the behavioural experiments correctly it addresses the entire Lk-Gal4 expressing population, thus it is not possible to describe the role of the two SEZ neurons, but rather Lk-Gal4 neurons. This should be clarified.

      We thank the reviewer for this comment. Indeed, the Lk-Gal4 driver we used drives expression in all Leucokinin neurons, making it impossible to distinguish between the SELK, LHLK, or ABLK neurons. We have added a new piece of behavioral data by using the tsh-Gal80 transgene to prevent the expression of TNT in the ABLK neurons (Supplementary Figure 7A), but still we cannot distinguish between SELK and LHLK. We have rewritten the text to clarify this fact.

      Reviewer #3 (Recommendations for the authors):

      Overall, the manuscript is well-written, I only have one minor suggestion for improvement. In Figure 8C, please clarify the use of TNT to block Lk release.

      We thank the reviewer for the comment. We have clarified the use of TNT in the text.

      References

      Clyne, J. D. & Miesenböck, G. Sex-Specific Control and Tuning of the Pattern Generator for Courtship Song in Drosophila. Cell 133, 354–363 (2008).

      Dorkenwald, S. et al. Neuronal wiring diagram of an adult brain. Nature 634, 124–138 (2024).

      Engert, S., Sterne, G. R., Bock, D. D. & Scott, K. Drosophila gustatory projections are segregated by taste modality and connectivity. eLife 11, e78110 (2022).

      Jaeger, A. H. et al. A complex peripheral code for salt taste in Drosophila. eLife 7, e37167 (2018).

      Macpherson, L. J. et al. Dynamic labelling of neural connections in multiple colours by trans-synaptic fluorescence complementation. Nat Commun 6, 10024 (2015).

      Nässel, D. R. Leucokinin and Associated Neuropeptides Regulate Multiple Aspects of Physiology and Behavior in Drosophila. Int J Mol Sci 22, 1940 (2021).

      Shiu, P. K., Sterne, G. R., Engert, S., Dickson, B. J. & Scott, K. Taste quality and hunger interactions in a feeding sensorimotor circuit. eLife 11, e79887 (2022).

      Walker, S. R., Peña-Garcia, M. & Devineni, A. V. Connectomic analysis of taste circuits in Drosophila. Sci. Rep. 15, 5278 (2025).

    1. Author response:

      Reviewer #1:

      As this code was developed for use with a 4096 electrode array, it is important to be aware of double-counting neurons across the many electrodes. I understand that there are ways within the code to ensure that this does not happen, but care must be taken in two key areas. Firstly, action potentials traveling down axons will exhibit a triphasic waveform that is different from the biphasic waveform that appears near the cell body, but these two signals will still be from the same neuron (for example, see Litke et al., 2004 "What does the eye tell the brain: Development of a System for the Large-Scale Recording of Retinal Output Activity"; figure 14). I did not see anything that would directly address this situation, so it might be something for you to consider in updated versions of the code.

      We thank the reviewer for this insightful comment. We agree that signals from the same neuron may be collected by adjacent channels. To address this concern in our software, we plan to add a routine to SpikeMAP that allows users to discard nearby channels where spike count correlations exceed a pre-determined threshold. Because there is no ground truth to map individual cells to specific channels on the hd-MEA, a statistical approach is warranted.
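
      A minimal sketch of such a routine is given below. It assumes binned spike counts per channel and (row, column) electrode coordinates; the function names, the Pearson correlation measure, the 0.9 threshold, and the rule of keeping the channel with more spikes are all illustrative assumptions rather than the SpikeMAP implementation.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def flag_duplicates(spike_counts, positions, max_dist=1, r_max=0.9):
    """Return channels to discard because a nearby channel carries nearly the same spike train.

    spike_counts: {channel: [spike count per time bin]}
    positions:    {channel: (row, col)} on the electrode grid
    """
    discard = set()
    channels = sorted(spike_counts)
    for i, a in enumerate(channels):
        for b in channels[i + 1:]:
            if a in discard or b in discard:
                continue
            (ra, ca), (rb, cb) = positions[a], positions[b]
            # Only compare channels that are neighbors on the grid.
            if max(abs(ra - rb), abs(ca - cb)) > max_dist:
                continue
            if pearson(spike_counts[a], spike_counts[b]) > r_max:
                # Assumption: keep the channel with more spikes, drop the other.
                weaker = a if sum(spike_counts[a]) < sum(spike_counts[b]) else b
                discard.add(weaker)
    return discard

# Hypothetical data: channels 0 and 1 are adjacent and almost identical,
# channel 3 matches channel 0 but is far away on the array.
counts = {0: [5, 0, 3, 8, 1], 1: [4, 0, 2, 7, 1], 2: [0, 6, 1, 0, 5], 3: [5, 0, 3, 8, 1]}
coords = {0: (0, 0), 1: (0, 1), 2: (0, 2), 3: (5, 5)}
to_discard = flag_duplicates(counts, coords)
```

      Restricting the comparison to neighboring channels keeps distant but synchronously firing neurons from being mistaken for duplicates.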

      Secondly, spike shapes are known to change when firing rates are high, like in bursting neurons (Harris, K.D., Hirase, H., Leinekugel, X., Henze, D.A. & Buzsáki, G. Temporal interaction between single spikes and complex spike bursts in hippocampal pyramidal cells. Neuron 32, 141-149 (2001)). I did not see this addressed in the present version of the manuscript.

      This is a valid concern. To ensure that firing rates are relatively constant over the duration of a recording, we will plot average spike rates using rolling windows of a fixed duration. We expect that population firing rates will remain relatively stable across the duration of recordings.
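
      The rolling-window estimate could look like the sketch below, which assumes spike times in seconds and returns one rate per window for later plotting. The window and step durations, the function name, and the example spike times are placeholders, not values from the manuscript.

```python
def rolling_rate(spike_times_s, duration_s, window_s=60.0, step_s=30.0):
    """Mean firing rate (Hz) in sliding windows; returns a list of (window_start_s, rate_hz)."""
    if duration_s < window_s:
        return []
    n_windows = int((duration_s - window_s) / step_s) + 1
    rates = []
    for k in range(n_windows):
        start = k * step_s
        stop = start + window_s
        # Count spikes falling in the half-open window [start, stop).
        n_spikes = sum(1 for t in spike_times_s if start <= t < stop)
        rates.append((start, n_spikes / window_s))
    return rates

# Hypothetical spike train over a 12 s recording, 4 s non-overlapping windows.
spikes = [0.5, 1.5, 2.5, 3.5, 10.5]
profile = rolling_rate(spikes, duration_s=12.0, window_s=4.0, step_s=4.0)
```

      A roughly flat profile across windows would support the expectation of stable firing rates, whereas a window with a markedly higher rate would mark a period in which burst-related changes in spike shape deserve a closer look.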

      Another area for possible improvement would be to build on the excellent validation experiments you have already conducted with parvalbumin interneurons. Although it would take more work, similar experiments could be conducted for somatostatin and vasoactive intestinal peptide neurons against a background of excitatory neurons. These may have different spike profiles, but your success in distinguishing them can only be known if you validate against ground truth, like you did for the PV interneurons.

      We agree that further cycles of experiments could be performed with SOM, VIP, and other neuronal subtypes, and we hope that researchers will take advantage of SpikeMAP too. We will clarify this possibility in the Discussion section of the manuscript.

      Reviewer #2:

      Summary:

      While I find that the paper is nicely written and easy to follow, I find that the algorithmic part of the paper is not really new and should have been more carefully compared to existing solutions. While the GT recordings to assess the possibilities of a spike sorting tool to distinguish properly between excitatory and inhibitory neurons are interesting, spikeMAP does not seem to bring anything new to state-of-the-art solutions, and/or, at least, it would deserve to be properly benchmarked. I would suggest that the authors perform a more intensive comparison with existing spike sorters.

      We thank the reviewer for this comment. As detailed in Table 1, SpikeMAP is the only method that performs E/I sorting on large-scale multielectrodes, hence a comparison to competing methods is not currently possible. That being said, many of the pre-processing steps of SpikeMAP (Figure 1) involve methods that are already well-established in the literature and available under different packages. To highlight the contribution of our work more clearly and facilitate the adoption of SpikeMAP, we plan to provide a “modular” portion of SpikeMAP that is specialized in performing E/I sorting and can be added to the pipeline of other packages such as KiloSort. This modularized version of the code will be shared freely along with the more complete version already available.

      Weaknesses:

      (1) The global workflow of spikeMAP, described in Figure 1, seems to be very similar to that of Hilgen et al. 2017 (10.1016/j.celrep.2017.02.038). Therefore, the first question is what is the rationale of reinventing the wheel, and not using tools that are doing something very similar (as mentioned by the authors themselves). I have a hard time, in general, believing that spikeMAP has something particularly special, given its Methods, compared to state-of-the-art spike sorters.

      We agree with the reviewer that there are indeed similarities between our work and the Hilgen et al. paper. However, while the latter employs optogenetics to stimulate neurons on a large-scale array, their technique does not specifically target inhibitory (e.g., PV) neurons as described in our work. We will clarify our paper accordingly.

      This is why, at the very least, the title of the paper is misleading, because it lets the reader think that the core of the paper will be about a new spike sorting pipeline. If this is the main message the authors want to convey, then I think that numerous validations/benchmarks are missing to assess first how good spikeMAP is, with reference to spike sorting in general, before deciding if this is indeed the right tool to discriminate excitatory vs inhibitory cells. The GT validation, while interesting, is not enough to entirely validate the paper. The details are a bit too scarce for me, or would deserve to be better explained (see other comments after).

      The title of our work will be edited to make it clear that while elements of the pipeline are well-established and available from other packages, we are the first to extend this pipeline to E/I sorting on large-scale arrays.

      (2) Regarding the putative location of the spikes, it has been shown that the center of mass, while easy to compute, is not the most accurate solution [Scopin et al, 2024, 10.1016/j.jneumeth.2024.110297]. For example, it has an intrinsic bias for finding positions within the boundaries of the electrodes, while some other methods, such as monopolar triangulation or grid-based convolution, might have better performances. Can the authors comment on the choice of the Center of Mass as a unique way to triangulate the sources?

      We agree with the reviewer and will point out limits of the center-of-mass algorithm based on the article of Scopin et al (2024). Further, we will augment the existing code library to include monopolar triangulation or grid-based convolution as options available to end-users.
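
      The bias the reviewer describes follows directly from the definition of the center of mass, as the sketch below illustrates. It assumes that each spike is localized by weighting electrode positions with the absolute peak amplitude recorded on each channel; the amplitudes, coordinates, and function name are made up and do not reflect the actual electrode geometry or the SpikeMAP code.

```python
def center_of_mass(amplitudes, positions):
    """Amplitude-weighted mean position of a spike.

    amplitudes: {channel: peak amplitude} (sign is ignored)
    positions:  {channel: (x, y)} electrode coordinates
    """
    weights = {ch: abs(a) for ch, a in amplitudes.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("all amplitudes are zero; position is undefined")
    x = sum(w * positions[ch][0] for ch, w in weights.items()) / total
    y = sum(w * positions[ch][1] for ch, w in weights.items()) / total
    return x, y

# Hypothetical spike seen on two electrodes, 42 units apart along x.
amps = {0: -30.0, 1: -10.0}
coords = {0: (0.0, 0.0), 1: (42.0, 0.0)}
estimate = center_of_mass(amps, coords)
```

      Because the weights are non-negative, the estimate is a convex combination of electrode positions and can never fall outside the area spanned by the contributing electrodes, whatever the true source location. Methods such as monopolar triangulation fit a source model instead and are not constrained in this way.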

      (3) Still in Figure 1, I am not sure I really see the point of Spline Interpolation. I see the point of such a smoothing, but the authors should demonstrate that it has a key impact on the distinction of Excitatory vs. Inhibitory cells. What is special about the value of 90kHz for a signal recorded at 18kHz? What is the gain with spline enhancement compared to without? Does such a value depend on the sampling rate, or is it a global optimum found by the authors?

      We will clarify these points. Specifically, the value of 90kHz was chosen because it provided a reasonable temporal characterization of spikes; this value, however, can be adjusted within the software based on user preference.
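
      Going from 18 kHz to 90 kHz corresponds to a five-fold upsampling of each spike waveform. The sketch below shows one way this can be done with a natural cubic spline on uniformly spaced samples. The choice of a natural spline, the function names, and the example waveform are assumptions made for illustration; the manuscript does not specify the spline variant used by SpikeMAP.

```python
def natural_spline_second_derivs(y):
    """Second derivatives of the natural cubic spline through uniformly spaced samples y."""
    n = len(y)
    m2 = [0.0] * n                      # natural boundary: zero curvature at both ends
    if n < 3:
        return m2
    m = n - 2                           # number of interior unknowns
    rhs = [6.0 * (y[i - 1] - 2.0 * y[i] + y[i + 1]) for i in range(1, n - 1)]
    # Thomas algorithm for the tridiagonal system with rows (1, 4, 1).
    c_prime = [0.0] * m
    d_prime = [0.0] * m
    c_prime[0] = 1.0 / 4.0
    d_prime[0] = rhs[0] / 4.0
    for k in range(1, m):
        denom = 4.0 - c_prime[k - 1]
        c_prime[k] = 1.0 / denom
        d_prime[k] = (rhs[k] - d_prime[k - 1]) / denom
    m2[m] = d_prime[m - 1]
    for k in range(m - 2, -1, -1):
        m2[k + 1] = d_prime[k] - c_prime[k] * m2[k + 2]
    return m2

def upsample(y, factor=5):
    """Evaluate the spline at `factor` points per original sample interval."""
    m2 = natural_spline_second_derivs(y)
    out = []
    for i in range(len(y) - 1):
        for j in range(factor):
            t = j / factor
            out.append(
                m2[i] * (1 - t) ** 3 / 6 + m2[i + 1] * t ** 3 / 6
                + (y[i] - m2[i] / 6) * (1 - t) + (y[i + 1] - m2[i + 1] / 6) * t
            )
    out.append(float(y[-1]))
    return out

# Hypothetical spike waveform sampled at 18 kHz, upsampled five-fold (90 kHz).
wave = [0.0, -5.0, -40.0, -10.0, 8.0, 3.0, 0.0]
up = upsample(wave, factor=5)
```

      The upsampled trace passes through every original sample, so the interpolation adds temporal resolution for features such as trough-to-peak width without altering the recorded values. Whether this changes the E/I classification would still need to be shown empirically, as the reviewer requests.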

      (4) Figure 2 is not really clear, especially panel B. The choice of the time scale for the B panel might not be the most appropriate, and the legend filtered/unfiltered with a dot is not clear to me in Bii.

      We will re-check Fig. 2B, which seems to have a rendering error, likely due to conversion from its original format.

      In panel E, the authors are making two clusters with PCA projections on single waveforms. Does this mean that the PCA is only applied to the main waveforms, i.e. the ones obtained where the amplitudes are peaking the most? This is not really clear from the methods, but if this is the case, then this approach is a bit simplistic and does not really match state-of-the-art solutions. Spike waveforms are quite often, especially with such high-density arrays, covering multiple channels at once, and thus the extracellular patterns triggered by the single units on the MEA are spatio-temporal motifs occurring on several channels. This is why, in modern spike sorters, the information in a local neighbourhood is often kept to be projected, via PCA, on the lower-dimensional space before clustering. Information on a single channel only might not be informative enough to disambiguate sources. Can the authors comment on that, and what is the exact spatial resolution of the 3Brain device? The way the authors are performing the SVD should be clarified in the methods section. Is it on a single channel, and/or on multiple channels in a local neighbourhood?

      Here, the reviewer is suggesting that it may be better to perform PCA on several channels at once, since spikes can occur on several channels at the same time. To address this concern, a small routine will be written that allows users to choose how many nearby channels are included in the PCA.

      (5) About the isolation of the single units, here again, I think the manuscript lacks some technical details. The authors are saying that they are using a k-means cluster analysis with k=2. This means that the authors are explicitly looking for 2 clusters per electrode? If so, this is a really strong assumption that should not be held in the context of spike sorting, because, since it is a blind source separation technique, one cannot pre-determine in advance how many sources are present in the vicinity of a given electrode. While the illustration in Figure 2E is ok, there is no guarantee that one cannot find more clusters, so why this choice of k=2? Again, this is why most modern spike sorting pipelines do not rely on k-means, to avoid any hard-coded number of clusters. Can the authors comment on that?

      It is true that k=2 is a pre-determined choice in our software. In practice, we found that k>2 leads to poorly defined clusters. However, we will ensure that this parameter can be adjusted in the software. Furthermore, if the user chooses not to pre-define this value, we will provide the option to use a Calinski-Harabasz criterion to select k.
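      To make the proposed selection rule concrete, here is a minimal sketch of the Calinski-Harabasz index, the ratio of between-cluster to within-cluster dispersion, each normalized by its degrees of freedom. The function names and the toy data are assumptions for illustration. The candidate labelings would in practice come from running k-means at each k, which is not shown here.

```python
def calinski_harabasz(points, labels):
    """CH index = [B / (k - 1)] / [W / (n - k)] for a given labeling."""
    n = len(points)
    dim = len(points[0])
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("index is only defined for 2 <= k < n")
    grand = [sum(p[d] for p in points) / n for d in range(dim)]
    between = 0.0
    within = 0.0
    for members in clusters.values():
        mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum((mean[d] - grand[d]) ** 2 for d in range(dim))
        within += sum((p[d] - mean[d]) ** 2 for p in members for d in range(dim))
    return (between / (k - 1)) / (within / (n - k))


def select_k(points, labelings):
    """Pick the k whose labeling has the highest CH index.

    labelings : dict mapping k -> list of cluster labels (e.g. from k-means)
    """
    return max(labelings, key=lambda k: calinski_harabasz(points, labelings[k]))


# Toy 2-D "PCA scores": two well-separated groups of four points each.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
candidates = {
    2: [0, 0, 0, 0, 1, 1, 1, 1],
    3: [0, 0, 1, 1, 2, 2, 2, 2],  # splits the first group unnecessarily
}
best_k = select_k(pts, candidates)
```

      One caveat follows directly from the formula: the index is undefined for k = 1, so on its own it cannot indicate that an electrode records a single unit.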

      (6) I'm surprised by the linear decay of the maximal amplitude as a function of the distance from the soma, as shown in Figure 2H. Is it really what should be expected? Based on the properties of the extracellular media, shouldn't we expect a power law for the decay of the amplitude? This is strange that up to 100um away from the soma, the max amplitude only dropped from 260 to 240 uV. Can the authors comment on that? It would be interesting to plot that for all neurons recorded, in a normed manner V/max(V) as function of distances, to see what the curve looks like.

      We share the reviewer’s concern and will add results that include a population of neurons to assess the robustness of this phenomenon.

      (7) In Figure 3A, it seems that the total number of cells is rather low for such a large number of electrodes. What are the quality criteria that are used to keep these cells? Did the authors exclude some cells from the analysis, and if yes, what are the quality criteria that are used to keep cells? If no criteria are used (because none are mentioned in the Methods), then how come so few cells are detected, and can the authors convince us that these neurons are indeed "clean" units (RPVs, SNRs, ...)?

      We applied stringent criteria to exclude cells, and we will revise the main text to be clear about these criteria, which include a minimum spike rate and the use of LDA to separate out PCA clusters. For the cells that were retained, we will include SNR estimates.
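      The two unit-quality measures the reviewer mentions (SNR and refractory period violations) could be computed along the following lines. This is a hedged sketch, not the criteria used in the manuscript: the SNR definition (peak amplitude over a MAD-based noise estimate), the 2 ms refractory window, and all function names are assumptions chosen for the example.

```python
import statistics


def noise_sigma_mad(trace):
    """Robust noise estimate: median absolute deviation scaled by 0.6745."""
    med = statistics.median(trace)
    mad = statistics.median([abs(v - med) for v in trace])
    return mad / 0.6745


def unit_snr(mean_waveform, noise_trace):
    """Peak absolute amplitude of the mean waveform over the noise sigma."""
    return max(abs(v) for v in mean_waveform) / noise_sigma_mad(noise_trace)


def rpv_fraction(spike_times_s, refractory_s=0.002):
    """Fraction of inter-spike intervals shorter than the refractory window."""
    times = sorted(spike_times_s)
    isis = [b - a for a, b in zip(times, times[1:])]
    if not isis:
        return 0.0
    return sum(isi < refractory_s for isi in isis) / len(isis)


# Made-up numbers purely to exercise the functions.
snr_value = unit_snr([0.0, -10.0, 3.0], [-1.0, 1.0, -1.0, 1.0, 0.0])
rpv_value = rpv_fraction([0.0, 0.001, 0.1, 0.2])
```

      Reporting both values per retained unit would let readers judge how "clean" the units are under whatever thresholds the authors ultimately adopt.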

      (8) Still in Figure 3A, it looks like there is a bias to find inhibitory cells at the borders, since they do not appear to be uniformly distributed over the MEA. Can the authors comment on that? What would be the explanation for such a behaviour? It would be interesting to see some macroscopic quantities on Excitatory/Inhibitory cells, such as mean firing rates, averaged SNRs... Because again, in Figure 3C, it is not clear to me that the firing rates of inhibitory cells are higher than Excitatory ones, whilst they should be in theory.       

      We will include a comparison of firing rates for E and I neurons. It is possible that I cells are located at the border of the MEA due to the site of injections of the viral vector, and not because of an anatomical clustering of I cells per se. We will clarify the text accordingly.

      (9) For Figure 3 in general, I would have performed an exhaustive comparison of putative cells found by spikeMAP and other sorters. More precisely, I think that to prove the point that spikeMAP is indeed bringing something new to the field of spike sorting, the authors should have compared the performances of various spike sorters to discriminate Exc vs Inh cells based on their ground truth recordings. For example, either using Kilosort [Pachitariu et al, 2024, 10.1038/s41592-024-02232-7], or some other sorters that might be working with such large high-density data [Yger et al, 2018, 10.7554/eLife.34518].

      As mentioned previously, Kilosort and related approaches do not address the problem of E/I identification (see Table 1). However, they do have pre-processing steps in common with SpikeMAP. We will add some specific comparison points – for instance, the use of k-means and PCA (which is more common across packages) and the use of cubic spline interpolation (which is less common). Further, we will provide a stand-alone E/I sorting module that can be added to the pipeline of other packages, so that users can use this functionality without having to migrate their entire analysis.

      (10) Figure 4 has a big issue, and I guess the panels A and B should be redrawn. I don't understand what the red rectangle is displaying.

      We apologize for this issue. It seems there was a rendering problem when converting the figure from its original format. We will address this issue in the revised version of the manuscript.

      (11) I understand that Figure 4 is only one example, but I have a hard time understanding from the manuscript how many slices/mice were used to obtain the GT data? I guess the manuscript could be enhanced by turning the data into an open-access dataset, but then some clarification is needed. How many flashes/animals/slices are we talking about? Maybe this should be illustrated in Figure 4, if this figure is devoted to the introduction of the GT data.

      We will mention how many flashes/animals/slices were employed in the GT data and provide open access to these data.

      (12) While there is no doubt that GT data as the ones recorded here by the authors are the most interesting data from a validation point of view, the pretty low yield of such experiments should not discourage the use of artificially generated recordings such as the ones made in [Buccino et al, 2020, 10.1007/s12021-020-09467-7] or even recently in [Laquitaine et al, 2024, 10.1101/2024.12.04.626805v1]. In these papers, the authors have putative waveforms/firing rate patterns for excitatory and inhibitory cells, and thus, the authors could test how good they are in discriminating the two subtypes.

      We thank the reviewer for the suggestion that SpikeMAP could be tested on artificially generated spike trains and will add the citation of the two papers mentioned. We hope future efforts will employ SpikeMAP on both synthetic and experimental data to explore the neural dynamics of E and I neurons in healthy and pathological circuits of the brain.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The Authors investigated the anatomical features of the excitatory synaptic boutons in layer 1 of the human temporal neocortex. They examined the size of the synapse, the macular or the perforated appearance and the size of the synaptic active zone, the number and volume of the mitochondria, the number of the synaptic and the dense core vesicles, also differentiating between the readily releasable, the recycling and the resting pool of synaptic vesicles. The coverage of the synapse by astrocytic processes was also assessed, and all the above parameters were compared to other layers of the human temporal neocortex. The Authors conclude that the subcellular morphology of the layer 1 synapses is suitable for the functions of the neocortical layer, i.e. the synaptic integration within the cortical column. The low glial coverage of the synapses might allow the glutamate spillover from the synapses enhancing synaptic crosstalk within this cortical layer.

      Strengths:

      The strengths of this paper are the abundant and very precious data about the fine structure of the human neocortical layer 1. Quantitative electron microscopy data (especially that derived from the human brain) are very valuable, since this is highly time- and energy-consuming work. The techniques used to obtain the data, as well as the analyses and the statistics performed by the Authors, are all solid, strengthen this manuscript, and support the conclusions drawn in the discussion.

      Comments on latest version:

      The third version of this paper has been substantially improved. The English is significantly better; there are only a few paragraphs and sentences that are hard to understand (see my comments and suggestions below). Almost all of my suggestions were incorporated.

      We thank the reviewer for the comments and have incorporated the suggestions into the latest version of the manuscript.

      Remaining minor concerns:

      About epileptic and non-epileptic (non-affected) tissue. I am aware that temporal lobe neocortical tissue derived from epileptic patients is regarded as non-affected by many groups, and that it is quite similar to the cortex of non-epileptic (tumour) patients in its electrophysiological properties and synaptic physiology. But please note that one paper you cited did not use samples from epileptic patients, but only tissue from non-epileptic tumor patients (Molnár et al. PLOS 2008).

      When you look deeper and make a thorough comparison of tissues derived from epileptic and non-epileptic patients, there are differences in the fine structure, as well as in several electrophysiological features. See, for example, Tóth et al., J Physiol, 2018, where a higher density of excitatory synapses was found in L2 of neocortical samples derived from epileptic patients compared to non-epileptic (tumor) patients. Furthermore, the appearance of population bursts is similar, but their occurrence is more frequent and their amplitude is higher in tissue from epileptic compared to non-epileptic patients. So, I still cannot agree that the temporal neocortex of epileptic patients with the seizure focus in the hippocampus would be non-affected. Therefore, I suggested using the term biopsy tissue.

      We thank the reviewer for this comment on the use of non-epileptic tissue by other groups. We are also aware that Molnár et al. (2008) worked with tumor tissue.

      It is still not emphasized in the first paragraph of the Discussion, that only excitatory axon terminals were investigated.

      We now mention in the first paragraph of the Discussion that only excitatory synaptic boutons were investigated.

      The text in the Results and the Discussion are somewhat inconsistent.

      The last two paragraphs of the Results section end with several sentences which should be part of the discussion, such as line 328: This finding strongly supports multivesicular release... or line 344: --- pointing towards a layer-specific regulation of the putative RRP. Moreover, the results suggest that... and line 370: ... it is most likely... Please, correct this.

      We disagree with the reviewer on these points because these sentences summarize the findings.

      The first paragraph of the Discussion summarizes the quantitative EM work and gives one conclusion about the astrocytic coverage. This last sentence is inconsistent with the other parts of the paragraph. I would either write that "astrocytic coverage was also investigated" (or something similar), or move this sentence to the paragraph which discusses the astrocytic coverage.

      Results line 180-183. "Special connections" between astrocytic processes and synaptic boutons are mentioned, but not shown. Either show these (but then prove with staining!), or leave out this paragraph.

      We deleted this paragraph as suggested.

      Reviewer #2 (Public review):

      Summary:

      The study of Rollenhagen et al examines the ultrastructural features of Layer 1 of human temporal cortex. The tissue was derived from drug-resistant epileptic patients undergoing surgery, and was selected as being further from the epilepsy focus, and as such considered to be non-epileptic. The analysis included 4 patients differing in age, sex, medication and onset of epilepsy. The manuscript is a follow-on to 3 previous publications from the same authors on different layers of the temporal cortex:

      Layer 4 - Yakoubi et al 2019 eLife

      Layer 5 - Yakoubi et al 2019 Cerebral Cortex,

      Layer 6 - Schmuhl-Giesen et al 2022 Cerebral Cortex

      They find that the L1 synaptic boutons mainly have a single active zone and a very large pool of synaptic vesicles, and are mostly devoid of astrocytic coverage.

      Strengths:

      The MS is well written and easy to read. The Results section gives a detailed set of figures showing many morphological parameters of synaptic boutons and surrounding glial elements. The authors provide comparative data of all the layers examined by them so far in the Discussion. Given that anatomical data in human brain are still very limited, the current MS has substantial relevance. The work appears to be generally well done, and the EM and EM tomography images are of very good quality. The analysis is clear and precise.

      Weaknesses:

      The authors made all the corrections required and answered all of my concerns, included additional data sets, and clarified statements where needed.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor suggestions:

      Synaptic density, lines 189-193. If you say "comparatively" high, then compare to something (cite your own work for the other layers, and give the approximate values for the other layers). Same in line 194: comparably high to what? Other option: say "relatively high".

      We corrected the sentences as suggested by the reviewer.

      Line 206: When present, mitochondria (comma missing)

      Corrected as suggested by the reviewer.

      Line 265: Dot is missing at the end of the sentence (after Shapira et al. 2003)

      Corrected as suggested by the reviewer.

      Lines 300-301: Check the English for this sentence: significant difference BETWEEN TWO sublaminae and not significant difference for both sublaminae.

      Corrected as suggested by the reviewer.

      Lines 304-305: Check the sentence, please, it is not understandable without the text in parenthesis.

      Corrected as suggested by the reviewer.

      Line 354 Dot missing at the end of the sentence (after Figure 6A, B)

      Corrected as suggested by the reviewer.

      Line 354-358: Please rephrase this sentence (too complicated, not understandable). I do not understand why results of the L4, L5, L6 are described here. What does it mean "Astrocytes and their fine processes formed a relatively dense, but a comparably loose network within the neuropil in L1"? Dense or loose?

      In the experiment measuring the volume fraction of astrocytic processes (Figure 6C), all six cortical layers were analyzed, thus we compared the values obtained for L1 with the results for L4, L5 and L6. For more clarity, we rephrased the sentence: “Astrocytes and their fine processes formed a relatively dense network in L4 and L5, but a comparably loose one within the neuropil in L1…” We also rephrased other sentences in this paragraph (as also suggested below).

      Lines 359-369: Please rephrase this paragraph. The sentences are too complicated, have too many parentheses, and are not understandable. I suggest to write first how many synapses were examined in L1 and L4, then how many of them were on spine and on dendrites (either n or %). Then give the values how many (n or %) of them were "tripartite synapses", out of spine synapses and of dendritic synapses in both layers. How many of them were partially covered in both layers. Please, write the data in a systematic way. The best would be to give the values in a table as well. This way it will be more understandable (now, it is chaotic, hard to follow).

      We rephrased the paragraph and added a new table (3).

      Line 383: Dot missing from the end of the sentence.

      Corrected as suggested by the reviewer.

      Line 436: Reconsider "comparably low compared to". What does "comparably" mean in this case? The whole paragraph is hard to understand; please check and review it for improvements to the use of English, or use ChatGPT to check it.

      We corrected the sentence according to the reviewer’s suggestion.

      Line 487: Same thing again: "The comparably largest size of the RP in L1 when compared..." What would you like to say with "comparably"? Check the meaning of this word in a dictionary, please. I have the feeling that you are using this word instead of "relatively".

      Corrected as suggested by the reviewer.

      Line 488 "and TO that found for L4 and L5 in rodents..."

      Corrected as suggested by the reviewer.

      Line 493-495: Same again, comparably when compared, correct, please.

      Corrected as suggested by the reviewer.

      Supplemental figures: Now I do understand why Hu-01 and Hu-02 are twice, and I think, 3 patients were examined for L1a and three for L1b. But which side is which on the subfigures? Left side (Hu-01, 02 03) was used for L1a, or L1b? Could you write this in the legend, or mark on the figure (at least at one subfigure), please?

      We added a comment for clarity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Concerning the grounding in experimental phenomenology, it would be beneficial to identify specific experiments to strengthen the model. In particular, what evidence supports reversible beta cell inactivation? This could potentially be tested in mice, for instance, by using an inducible beta cell reporter, treating the animals with high glucose levels, and then measuring the phenotype of the marked cells. Such experiments, if they exist, would make the motivation for the model more compelling.

      There is some direct evidence of reversible beta cell inactivation in rodent / in vitro models. We had already mentioned this in the discussion, but we have added some text emphasizing / clarifying the role of this evidence (lines 359–362).

      Others have also argued that some analyses of insulin treatment in conventional T2D, which has a stronger effect in patients with higher glucose before treatment, provide indirect evidence of reversal of glucotoxicity. We have also mentioned this in the revised paper (lines 284–285).

      For quantitative experiments, the authors should be more specific about the features of beta cell dysfunction in KPD. Does the dysfunction manifest in fasting glucose, glycemic responses, or both? Is there a “pre-KPD” condition? What is known about the disease’s timescale?

      The answers to some of these questions are not entirely clear: patients present with very high glucose, and thus must be treated immediately. Due to a lack of antecedent data, it is not entirely clear what the pre-KPD condition is, but there is some evidence that KPD is at least not preceded by diabetes symptoms. This point is already noted in the introduction of the paper and Table 1. However, we have added a small note clarifying that this does not rule out mild hyperglycemia, as in prediabetes (and indeed, as our model might predict) (lines 76–77). Similarly, due to the necessity of immediate insulin treatment, it is not clear from existing data whether the disorder manifests more strongly in fasting glucose or glucose response, although it is likely in both. (We might infer this since continuous insulin treatment does not produce fasting hypoglycemia, and the complete lack of insulin response to glucose shortly after presentation should produce a strong effect in glycemic response.) We believe our existing description of KPD lists all of the relevant timescales; however, we have also slightly clarified this description in response to the first referee’s comments (lines 66–73, 83).

      The authors should also consider whether their model could apply to other conditions besides KPD. For example, the phenomenology seems similar to the “honeymoon” phase of T1D. Making a strong case for the model in this scenario would be fascinating.

      This is an excellent idea, which had not occurred to us. We have briefly discussed this possibility in the revision (lines 281–291), but plan to analyze it in more detail in a future manuscript.

      Reviewer #1 (Recommendations for the author):

      Whenever simulation results are presented, parameter values should be specified right there in the figure captions.

      We have added the values of glucotoxicity parameters to the caption of Figure 2. In other figures, we have explicitly mentioned which panel of Figure 2 the parameters are taken from. Description of the non-glucotoxicity parameters is a bit cumbersome (there are a lot of them, but our model of fast dynamics is slightly different from Topp et al. so it does not suffice to simply say we took their parameters) so we have referred the reader to the Materials and Methods for those.

      I was confused by the language in Figure 4. Could the authors clarify whether they argue that: (1) the observed KPD behaviour is the result of the system switching from one stable state to another when perturbed with high glucose intake? (2) the observed KPD behaviour is the result of one of the steady states disappearing with high glucose intake?

      What we mean to say is that during a period of high sugar intake or exogenous insulin treatment, one of the fixed points is temporarily removed: it is still a fixed point of the “normal” dynamics, but not a fixed point of the dynamics with the external condition added. When glucose (insulin) intake is high enough, only the low (high)-β fixed point is present, so under either of these conditions the dynamics flow toward the one remaining fixed point. When the external influx of glucose/insulin is turned off, both fixed points are present again, but if the dynamics have moved sufficiently far during the external forcing, the system settles at the other fixed point rather than the one it started from. We have edited the text to make this clearer (lines 153–185). Do note, however, that in response to both referees’ comments (see below), Figures 3 and 4 have been replaced with more illuminating ones. This specific point is now addressed by the new Figure 3.

      The adaptation of the prefactor ‘c’ was confusing to me. I think I understood it in the end, but it sounded like, “here’s a complication, but we don’t explain it because it doesn’t really matter”. I think the authors can explain this better (or potentially leave out the complication with ‘c’ altogether?).

      Indeed, the existence of an adaptation mechanism is important for our overall picture of diabetes pathogenesis, but not for many of our analyses, which assume prediabetes. Nonetheless, we agree that the current explanation of its role is confusing because of its vagueness. We have elaborated the explanation of the type of dynamics we assume for c, adding an equation for its dynamics to the “Model” section of the Materials and methods, explained in lines 456–465. We have also amended Figure 1 to note this compensation.
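      The switching mechanism described in this response can be illustrated with a deliberately generic toy system. The sketch below is not the model from the manuscript: the cubic drift, the state variable x, the forcing strength, and the durations are all assumptions chosen only so that the unforced system has two stable fixed points (x = 0 and x = 1) separated by an unstable one (x = 0.5), and so that the forcing removes the lower fixed point while it is switched on.

```python
def drift(x, forcing):
    """Toy bistable drift. With forcing = 0: stable at 0 and 1, unstable at 0.5.

    The unforced drift has a minimum of about -0.048 between 0 and 0.5,
    so any constant forcing above that value removes the fixed point near 0.
    """
    return x * (x - 0.5) * (1.0 - x) + forcing


def simulate(x0, pulse_duration, pulse_strength=0.1,
             relax_duration=50.0, dt=0.01):
    """Forward-Euler integration: forced phase, then unforced relaxation."""
    x = x0
    for _ in range(round(pulse_duration / dt)):
        x += dt * drift(x, pulse_strength)   # external forcing switched on
    for _ in range(round(relax_duration / dt)):
        x += dt * drift(x, 0.0)              # forcing off, both fixed points back
    return x


# Same start, same forcing strength; only the duration of the forcing differs.
brief_end = simulate(0.0, pulse_duration=2.0)       # relaxes back to x = 0
sustained_end = simulate(0.0, pulse_duration=20.0)  # settles at x = 1
```

      In this toy setting, a brief perturbation leaves the state in the original basin of attraction, whereas a sustained one carries it past the unstable point, so the state stays at the other fixed point after the forcing ends. This is the qualitative behavior the response attributes to sustained high glucose intake or insulin treatment.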

      I expect the main impact of this work will be to get clinical practitioners and biomedical researchers interested in the intermediate timescale dynamics of β-cells and take seriously the possibility that reversible inactive states might exist. But this impact will only be achieved when the results are clearly and easily understandable by an audience that is not familiar with mathematical modelling. I personally found it difficult to understand what I was supposed to see in the figures at first glance. Yes, the subtle points are indeed explained in the figure captions, but it might be advantageous to make the points visually so clear that a caption is barely needed. For example, when claiming that a change in parameters leads to bistability, why not plot the steady state values as a function of that parameter instead of showing curves from which one has to infer a steady state?

      I would advise the authors to reconsider their visual presentation by, e.g., presenting the figures to clinical practitioners or biomedical researchers with just a caption title to test whether such an audience can decipher the point of the figure! This is of course merely a personal suggestion that the authors may decide to ignore. I am making this suggestion only because I believe in the quality of this work and that improving the clarity of the figures and the ease with which one can understand the main points would potentially lead to a much larger impact on the presented results.

      This is a very good point. We have made several changes. Firstly, we have added smaller panels showing the dynamics of β to Figure 2; previously, the reader had to infer what was happening to β from G(t). Secondly, we have completely replaced the two figures showing dβ/dt, and requiring the reader to infer the fixed points of β, with bifurcation diagrams that simply show the fixed points of G and β. The new figures show through bifurcation diagrams how there are multiple fixed points in KPD, how glucose or insulin treatment force the switching of fixed points, and how the presence of bistability depends on the rate of glucotoxicity. (These new figures are Fig. 3–5 in the revised manuscript.)

      Could the authors explicitly point out what could be learned from their work for the clinic? At the moment treatment consists of giving insulin to patients. If I understand correctly, nothing about the current treatment would change if the model is correct. Is there maybe something more subtle that could be relevant to devising an optimal treatment for KPD patients?

      This is another very good point. We have added a new figure (Fig. 7) in our results section showing how this model, or one like it, can be analyzed to suggest an insulin treatment schedule (once parameters for an individual patient can be measured), and added some discussion of this point (lines 224–240) as well as lifestyle changes our model might suggest for KPD patients to the discussion (lines 413–425).

      Similarly, could the authors explicitly point out how their model could be experimentally tested? For example, are the functions f(G) and g(G) experimentally accessible? Related to that, presumably the shape of those functions matters to reproduce the observed behaviour. Could the authors comment on that / analyze how reproducing the observed behaviour puts constraints on the shape of the used functions and chosen parameter values?

      g(G) has not been carefully measured in cellular data; however, it could be in more quantitative versions of existing experiments. Further, our model indeed requires some general features for the forms of f(G) and g(G) to produce KPD-like phenomena. We have added some comment on this to the discussion section of the revised manuscript (lines 367–372).

      Could the authors explicitly spell out which parameters they think differ between individual KPD patients, and which parameters differ between KPD patients and ‘regular’ type 2 diabetics?

      In general, we expect all parameters to vary both among KPD patients and between KPD and “conventional” T2D. The primary parameter determining whether KPD or conventional T2D is seen, however, is the ratio kIN/kRE. We have elaborated on both these points in the revised manuscript (lines 186–192, 250–257).

      I was confused about the timescale of remission. At one point the authors write “KPD patients can often achieve partial remission: after a few weeks or months of treatment with insulin” but later the authors state that “the duration of the remission varies from 6 months to 10 years”.

      The former timescale is the typical time needed to achieve remission. After remission is reached, however, it may or may not last: patients may experience a relapse, where their condition worsens and they again require insulin. We have edited the text to clarify this distinction (lines 66–73).

      When the authors talk about intermediate timescales in the main text could they specify an actual unit of time, such as days, weeks, or months as it would relate to the rate constants in their model for those transitions?

      We have done so (lines 86–87, figure 1 caption, figure 2 caption). Getting KPD-like behavior requires (at high glucose) the deactivation process to be somewhat faster than the reactivation process, so the relevant scales are between weeks (reactivation) and days (deactivation at high G).

      The authors state “Our simple model of β-cell adaptation also neglects the known hyperglycemia-induced leftward shift in the insulin secretion curve (f(G) in Eq. (2))”. This seems an important consideration. Could the authors comment on why they did not model this shift, and/or explicitly discuss how including it is expected to change the model dynamics?

      We agree that this process seems potentially relevant, as it seems to happen on a relatively fast timescale compared to glucose-induced β-cell death. It is, however, not so well characterized quantitatively that including it is a simple matter of putting in known values—we would be making assumptions that would complicate the interpretation of our results.

      It is clear that this effect will need to be considered when quantitatively modelling real patient data. However, it is also straightforward to argue that this effect by itself cannot produce KPD-like symptoms, and will only tend to reduce the rate of glucotoxicity necessary to produce bistability. We have added a discussion of this in the revisions (lines 307–315). We have also, in general, expanded the discussion of the effects that each neglected detail we have mentioned is expected to have (lines 292–315).

      The authors end with a statement that their results may “contribute to explanation of other observations that involve rapid onset or remission of diabetes-like phenomena, such as during pregnancy or for patients on very low calorie diets.” Could the authors spell out exactly how their model potentially relates to these phenomena?

      Our thinking is that, even when another direct cause, such as loss of insulin resistance, is implicated in reversal of diabetes, some portion of the effect may be explained by reversal of glucotoxicity. This is indeed at this point just a hypothesis, but we have expanded on it briefly in the revision. (Lines 281–291.)

      Minor typos:

      In Figure 2.D the last zero of 200 on the axis was cut off.

      Line 359 - there is a missing word “in the analysis”.

      We have fixed these typos, thanks.

      Reviewer #2 (Recommendations for the author):

      The manuscript could be significantly improved in two key areas: the presentation of the analysis, and the relation with experimental phenomenology.

      Regarding the analysis presentation, the figures could be substantially enhanced with minimal effort from the authors. At present, they are sparse, lack legends, and offer only basic analysis. The authors should consider presenting, for example, a bifurcation diagram for beta cell mass and fasting glucose levels as a function of kIN, and how insulin sensitivity and average meal intake modulate this relationship. The goal should be to present clear, testable predictions in an intuitive manner. Currently, the specific testable predictions of the model are unclear.

      The response to this question is copied from the responses to related questions from the first referee.

      This is a very good point. We have made several changes. Firstly, we have added smaller panels showing the dynamics of β to Figure 2; previously, the reader had to infer what was happening to β from G(t). Secondly, we have completely replaced the two figures showing dβ/dt, and requiring the reader to infer the fixed points of β, with bifurcation diagrams that simply show the fixed points of G and β. The new figures show through bifurcation diagrams how there are multiple fixed points in KPD, how glucose or insulin treatment force the switching of fixed points, and how the presence of bistability depends on the rate of glucotoxicity. We have also supplemented our phase diagram that shows the effects of SI and the total beta cell population with bifurcation diagrams showing β as SI and βTOT are varied. (These new figures are Fig. 3–5 in the present manuscript.) Finally, we have added another figure analyzing the model’s predictions for the optimal insulin treatment and the resulting time needed to achieve remission (Fig. 7).

    1. Author response:

      Reviewer #1 (Public review):

      The manuscript titled "The distinct role of human PIT in attention control" by Huang et al. investigates the role of the human posterior inferotemporal cortex (hPIT) in spatial attention. Using fMRI experiments and resting-state connectivity analyses, the authors present compelling evidence that hPIT is not merely an object-processing area, but also functions as an attentional priority map, integrating both top-down and bottom-up attentional processes. This challenges the traditional view that attentional control is localized primarily in frontoparietal networks.

      The manuscript is strong and of high potential interest to the cognitive neuroscience community. Below, I raise questions and suggestions to help with the reliability, methodology, and interpretation of the findings.

      Thank you for a nice summary of the key points of our study. Below you will find our responses to your questions.

      (1) The authors argue that hPIT satisfies the criteria for a priority map, but a clearer justification would strengthen this claim. For example, how does hPIT meet all four widely recognized criteria, such as spatial selectivity, attentional modulation, feature invariance, and input integration, when compared to classical regions such as LIP or FEF? A more systematic summary of how hPIT meets these benchmarks would be helpful. Additionally, to what extent are the observed attentional modulations in hPIT independent of general task difficulty or behavioral performance?

      Great suggestions! For the first suggestion, we will include a clearer justification in the revised manuscript. For the second one, all participants received task practice prior to scanning, and task accuracy exceeded 90% (we will explicitly report the accuracy rate in revision), suggesting the tasks were not overly demanding. Although ceiling effects limit the interpretability of behavioral-performance correlations, we argue that higher task demands would likely require greater attentional effort, leading to stronger modulation in hPIT, which aligns with our findings when we manipulated the attentional load.

      (2) The authors report that hPIT modulation is invariant to stimulus category, but there appear to be subtle category-related effects in the data. Were the face, scene, and scrambled images matched not only in terms of luminance and spatial frequency, but also in terms of factors such as semantic familiarity and emotional salience? This may influence attentional engagement and bias interpretation.

      The response of hPIT is generally insensitive to stimulus category; however, the reviewer is correct in noticing that attentional modulation in hPIT is slightly stronger for faces than for scenes and scrambled images. Although the faces used in the task had neutral expressions and the scene pictures were also neutral, it is indeed possible that semantic familiarity or emotional salience contributes to the subtle category-related effects in the results of Experiment 3. This point will be noted in the revised manuscript.

      (3) The result that attentional load modulates hPIT is important and adds depth to the main conclusions. However, some clarifications would help with the interpretation. For example, were there observable individual differences in the strength of attentional modulation? How consistent were these effects across participants?

      Yes, individual differences exist. In the revised manuscript, we will include individual subject data points in Figure 6B.

      (4) The resting-state data reveal strong connections between hPIT and both dorsal and ventral attention networks. However, the analysis is correlational. Are there any complementary insights from task-based functional connectivity or latency analyses that support a directional flow of information involving hPIT? In addition, do the authors interpret hPIT primarily as a convergence hub receiving input from both DAN and VAN, or as a potential control node capable of influencing activity in these networks? Also, were there any notable differences between hemispheres in either the connectivity patterns or attentional modulation?

      We agree that, in addition to resting-state connectivity, task-based functional connectivity analyses could provide further information about whether hPIT serves as a convergence node or a control hub. Although fMRI data are not well suited to inferring the direction of information flow because of their low temporal resolution, we will conduct task-based functional connectivity analyses.

      We also observed modest hemispheric asymmetries in connectivity—for instance, both left and right hPIT showed stronger connectivity with right-hemisphere attention nodes. This will be described in the revised supplement.

      (5) A few additional questions arise regarding the anatomical characteristics of hPIT: How consistent were its location and size across participants? Were there any cases where hPIT could not be reliably defined? Given the proximity of hPIT to FFA and LOp, how was overlap avoided in ROI definition? Were the functional boundaries confirmed using independent contrasts?

      The size and location of hPIT are generally consistent across subjects, as shown in Supplementary Figure 1. The consistency is also supported by Figure 4C. The hPIT is defined by conjunction maps across three tasks and then manually delineated to avoid overlapping voxels with FFA and LOp. The FFA was defined using an independent contrast (Exp3 contrast [face-scene]) and the LOp location was defined by anatomical parcellation (Glasser et al., 2016).

      Reviewer #2 (Public review):

      Summary

      This study investigates the role of the human posterior inferotemporal cortex (hPIT) in attentional control, proposing that hPIT serves as an attentional priority map that integrates both top-down (endogenous) and bottom-up (exogenous) attentional processes. The authors conducted three types of fMRI experiments and collected resting-state data from 15 participants. In Experiment 1, using three different spatial attention tasks, they identified the hPIT region and demonstrated that this area is modulated by attention across tasks. In Experiment 2, by manipulating the presence or absence of visual stimuli, they showed that hPIT exhibits strong attentional modulation in both conditions, suggesting its involvement in both bottom-up and top-down attention. Experiment 3 examined the sensitivity of hPIT to stimulus features and attentional load, revealing that hPIT is insensitive to stimulus category but responsive to task load - further supporting its role as an attentional priority map. Finally, resting-state functional connectivity analyses showed that hPIT is connected to both dorsal and ventral attention networks, suggesting its potential role as a bridge between the two systems. These findings extend prior work on monkey PITd and provide new insights into the integration of endogenous and exogenous attention.

      Strengths

      (1) The study is innovative in its use of specially designed spatial attention tasks to localize and validate hPIT, and in exploring the region's role in integrating both endogenous and exogenous attention, as prior works focus primarily on its involvement in endogenous attention.

      (2) The authors provided very comprehensive experiment designs with clear figures and detailed descriptions.

      (3) A broad range of analyses was conducted to support the hypothesis that hPIT functions as an attentional priority map -- including experiments of attentional modulation under both top-down and bottom-up conditions, sensitivity to stimulus features and task load, and resting-state functional connectivity. These analyses showed consistent results.

      (4) Multiple appropriate statistical analyses - including t-tests, ANOVAs, and post-hoc tests - were conducted, and the results are clearly reported.

      Thank you for a nice summary of the key points and strengths of our study.

      Weaknesses

      (1) The sample size is relatively small (n = 15), and inter-subject variability is big in Figures 5 and 6, as seen in the spread of individual data points and error bars. The analysis of attention-modulated voxel map intersections appears to be influenced by multiple outliers.

      We agree that the sample size (n = 15) is not ideal, and we acknowledge that some data points in Figures 5 and 6 appear to be potential outliers. However, according to conventional outlier detection criteria, all data points are within three standard deviations of the group mean and were therefore retained for analysis. Moreover, the attention-modulated voxel intersection map shown in Figure 4C is insensitive to outliers, because the intersection map plotted is based on the number of subjects.
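      The three-standard-deviation criterion mentioned above can be illustrated with a minimal sketch. This is not the analysis code used in the study; the function name `flag_outliers`, the use of the sample standard deviation, and the example values are assumptions made purely for illustration.

```python
import statistics

def flag_outliers(values, n_sd=3.0):
    """Return the values lying more than n_sd sample SDs from the mean.

    Hypothetical helper: the threshold of 3 SDs follows the criterion
    described in the text; the choice of sample SD is an assumption.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    if sd == 0:
        return []  # all values identical, nothing can be flagged
    return [v for v in values if abs(v - mean) > n_sd * sd]

# Made-up numbers, not data from the study.
evenly_spread = [1, 2, 3, 4, 5]
one_extreme = [10] * 19 + [100]

print(flag_outliers(evenly_spread))
print(flag_outliers(one_extreme))
```

      Under this rule a data point is retained unless its distance from the group mean exceeds three times the standard deviation, which is why visually extreme points can still be kept in a small sample.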

      (2) The authors acknowledge important limitations, including the lack of exploration of feature-based attention and the temporal constraints inherent to fMRI.

      Yes, we hope to address these limitations in future studies.

      (3) Prior research has established that regions such as the prefrontal cortex (PFC) and posterior parietal cortex (PPC) are involved in both endogenous and exogenous attention and have been proposed as attentional priority maps. It remains unclear what is uniquely contributed by hPIT, how it functionally interacts with these classical attentional hubs, and whether its role is complementary or redundant. The study would benefit from more direct comparisons with these regions.

      In this study, we defined the ROI based on the intersection across three different types of spatial attention tasks, and the hPIT stands out in showing spatial attentional modulation across tasks. This could be due to the weak lateralized responses in PFC/PPC. To evaluate whether a region qualifies as a priority map, we applied four criteria (as mentioned in the Introduction). While dorsal and ventral attention network (DAN and VAN) regions can be considered important components of the priority map system, our findings suggest that among the regions tested, hPIT meets all four criteria. In Experiment 2, we included regions such as VFC (as part of PFC) and IPS (as part of PPC), and our findings suggest these areas are more involved in top-down attention. We agree with the reviewer’s suggestion and will perform additional analysis on PPC and PFC.

      (4) The functional connectivity analysis is only performed on resting-state data, and this approach does not capture context-dependent interactions. Task-based data analysis can provide stronger evidence.

      We acknowledge that resting-state FC is limited in assessing task-specific communication. To further investigate the role of hPIT, we plan to conduct task-based functional connectivity analyses.

      (5) The study does not report whether attentional modulation in hPIT is consistent across the two hemispheres. A comparison of hemispheric effects could provide important insight into lateralization and inter-individual variability, especially given the bilateral localization of hPIT.

      We thank the reviewer for this suggestion. hPIT was localized bilaterally using the same intersection-based method in Experiment 1. We have now performed additional analysis and found that, in Experiment 3, the difference in attentional modulation between high and low load conditions was significant in the right hPIT but not in the left. This result will be reported in the revised manuscript.

    1. Author response:

      Below, we will address point by point any and all concerns of the reviewers.

      Reviewer #1:

      There are no major concerns, but some material could be added for clarity and to make the work more accessible to a more general scientific audience.

      We will add text for clarity and to make the work more accessible to a general audience per this comment and similar suggestions of the other reviewers.

      (1.1) A figure clearly showing the habituation protocol and the use of the dishabituators would be a good addition, even if the procedure has been done before and is cited. There can always be readers who are seeing this for the first time.

      We do think this is a good idea as the time scales of the experiment will be clearly marked as well and we plan to generate one in the revised manuscript.

      (1.2) It would also be nice to comment on other ways dishabituation can happen (for example, when the stimulus is removed for a short time and returns) and what their time scales are.

      If the stimulus is withheld, spontaneous recovery occurs, a process distinct from dishabituation and worth exploring on its own. In a previous publication (Semelidou et al. eLife 2018;7:e39569), we have shown that in this habituation paradigm, with 4 min exposure either to the aversive Octanol or to the attractive Ethyl Acetate, spontaneous recovery occurs 6 minutes or more after the habituated stimulus is withheld. This contrasts with the immediate effect of the single dishabituating stimulus, delivered for a few seconds at the end of exposure to the habituator. Given that, per Thompson (Neurobiol Learn Mem. 2009), spontaneous recovery is a characteristic of habituation, we will work this point into the text.

      (1.3) And more generally, the paper could perhaps improve by making a stronger case for why the results are important not just for flies but for neuroscience in general.

      Thank you for the encouragement. We will try to rationally generalize our findings.

      Reviewer #2:

      (2.1) However, the claim that this represents a fundamental difference between homosensory and heterosensory pathways for dishabituation is overstated.

      We had no intention of stating more than the fact that footshock and yeast odor dishabituators relay these stimuli to the mushroom bodies via distinct dopaminergic neurons, hence differentiating distinct dishabituating stimuli via the mechanosensory (footshock) and olfactory (yeast odor) modalities as they engage the mushroom bodies. As the reviewer suggests we will use more measured and specific language to state the above.

      (2.2) The introductory section does not adequately present current broad models for habituation and dishabituation.

      This was not done intentionally, but rather because we aimed at a less extended introductory section, and ostensibly this resulted in a brief and possibly inadequate presentation of current habituation models. We will present a much more detailed introduction to habituation and dishabituation models in the revised manuscript (also see reply to point 3.5 below).

      (2.3) There are many different time scales, even for Drosophila olfactory habituation. These, as well as potential underlying mechanistic differences, need to be acknowledged; any claim should be specifically qualified for the time scales being studied here.

      We understand and appreciate the reviewer's point and its significance. We will address it both in the revised text and in the paradigm figure we will add as stated above (point 1.1), where the time scales will be explicitly included and emphasized.

      (2.4) Additionally, there are several unclear, vague, and inaccurate sections and statements. A more careful, precise, and considered presentation of current views, as well as more measured claims of the impact of the findings, would substantially enhance my enthusiasm.

      We will of course address these concerns, though identification of the specific offending passages would help us address them thoroughly. As stated above, we will incorporate current views in the introduction and when discussing our results and their impact.

      Reviewer #3:

      (3.1) The key issue is that the main concepts of this manuscript appear to be based on a misunderstanding/misinterpretation of the literature. As the authors set out to settle the debate "whether the novel dishabituating stimulus elicits sensitization of the habituated circuits, or it engages distinct neuronal routes to bypass habituation reinstating the naïve response", it seems that the authors based their investigation on the premise that "sensitization" is mediated by a facilitatory process within the S-R pathway, and "dishabituation" by a facilitatory process outside the S-R pathway. This is not the status quo in the field, particularly with the prevailing theory like the Dual-Process Theory.

      We appreciate the reviewer’s comment and the opportunity to clarify the conceptual framework of our work. Our intention was in fact to test the Groves and Thompson hypothesis (Neurobiol Learn Mem. 2009) in our olfactory habituation system. As such, dishabituation could have been the result of a facilitatory process within the S-R pathway, or of mechanisms outside of it. Our experimental design allowed us to distinguish between these possibilities, and our results clearly show that dishabituation involves circuitry outside the S-R pathway. We do thank the reviewer for pointing out that we have not clearly articulated this intention, and we will take care to communicate it effectively in the revised manuscript.

      (3.2) The original version of Dual-Process Theory (Groves and Thompson 1970, but also see Thompson 2008, Neurobiol Learn Mem) already hypothesized that habituation happens within the specific S-R pathway, and sensitization occurs separately in an "organism-wide" state system that modulates the output of all S-R pathways.

      As mentioned above, we are aware of the Dual-Process hypothesis. In fact, our data demonstrate that activity outside the olfactory S-R pathway, engaging novel neuronal circuits, mediates dishabituation. Unlike habituation, these circuits mediating dishabituation include, at minimum, the mushroom bodies, the dopaminergic system and the APL neurons. In our view this does not support the “organism-wide state” system, but rather particular circuits that, in agreement with the Groves and Thompson hypothesis, are outside the S-R pathway and modulate its behavioral output. We will work these concepts into the discussion section of the revised manuscript.

      (3.3) Dishabituation is recognized by the Dual-Process Theory as sensitization (organism-wide facilitation) manifested on top of existing habituation (depressed S-R pathway). This notion has been supported by a wide range of studies, including cat spinal cord reflex (e.g. Spencer et al. 1966) and work in Aplysia on heterosynaptic facilitation for both sensitization and dishabituation. Therefore, simply showing that the newly identified facilitatory pathways are outside the S-R habituation pathway is insufficient to demonstrate dishabituation.

      We respectfully disagree with the concluding sentence here. In all of our experiments, we observe a clear recovery of olfactory avoidance after exposure to the footshock, or yeast odor dishabituators. Moreover, the dishabituators are emulated by (photo)activation of particular neuronal circuits and the recovery of olfactory avoidance is blocked when these circuits are silenced. Regardless of whether this recovery is classified as dishabituation via sensitization or another facilitatory process, the key point is that the habituated response is reliably reinstated contingent upon the dishabituating stimulus. We believe this meets the established criteria for dishabituation.

      (3.4) As behavioral facilitation of a habituated response can be achieved by dishabituating (specific recovery of the S-R pathway) and/or superimposed sensitizing (organism-wide) processes, dishabituation and sensitization of this olfactory response must be first dissociated; however, the study provided no evidence for the dissociation. Without this piece of evidence, the claim of this paper that the newly identified pathways mediate dishabituation is not fully supported.

      We agree with the reviewer that we have not provided specific evidence dissociating dishabituation and sensitization of the particular olfactory response beyond the evidence implicating particular circuitry in the outcome of facilitation of the olfactory response.

      It should be noted that upon photoactivation of the implicated circuits in naïve flies, we do not observe enhanced octanol avoidance, suggesting that activation of these circuits alone does not induce sensitization. Moreover, our results show that neither footshock nor yeast odor drives an organism-wide sensitization, as silencing specific circuits was sufficient to block dishabituation, something that would not be expected if a global sensitization process were responsible for reinstating the olfactory response.

      Nonetheless, we will also attempt to dissociate sensitization from dishabituation using mutants previously reported to be deficient in sensitization (Duerr and Quinn, PNAS 1982), assuming these mutants retain normal olfactory habituation. We will also try sensitization protocols in the case of within-modal dishabituation to further clarify the underlying mechanisms. In principle, this includes using diluted Octanol as the habituating stimulus and attempting dishabituation with concentrated octanol.

      (3.5) The literature review of this manuscript has some discrepancies. In the introduction, the authors wrote "initial studies in Aplysia were consistent with the "dual-process theory" (Groves and Thompson 1979), where response recovery due to dishabituation appeared to result from sensitization superimposed on habituation, thus driving reversal of the attenuated response (Carew, Castellucci et al. 1971, Hochner, Klein et al. 1986, Marcus, Nolen et al. 1988, Ghirardi, Braha et al. 1992, Cohen, Kaplan et al. 1997, Antonov, Kandel et al. 1999, Hawkins, Cohen et al. 2006)." Hochner 1986 and Marcus 1988 in fact indicated otherwise. Hochner 1986 suggests that dishabituation and sensitization involve different molecular processes, while Marcus 1988 showed that dishabituation and sensitization have different behavioral characteristics. Therefore, the authors' statement is not supported by the cited literature.

      We are grateful to the reviewer for pointing out these significant discrepancies, a consequence of multiple rounds of edits followed by our own oversight. These publications, which are important for this manuscript, will be referenced properly in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This manuscript describes the role of PRDM16 in modulating BMP response during choroid plexus (ChP) development. The authors combine PRDM16 knockout mice and cultured PRDM16 KO primary neural stem cells (NSCs) to determine the interactions between BMP signaling and PRDM16 in ChP differentiation.

      They show PRDM16 KO affects ChP development in vivo and BMP4 response in vitro. They determine genes regulated by BMP and PRDM16 by ChIP-seq or CUT&TAG for PRDM16, pSMAD1/5/8, and SMAD4. They then measure gene activity in primary NSCs through H3K4me3 and find more genes are co-repressed than co-activated by BMP signaling and PRDM16. They focus on the 31 genes found to be co-repressed by BMP and PRDM16. Wnt7b is in this set and the authors then provide evidence that PRDM16 and BMP signaling together repress Wnt activity in the developing choroid plexus.

      Strengths:

      Understanding context-dependent responses to cell signals during development is an important problem. The authors use a powerful combination of in vivo and in vitro systems to dissect how PRDM16 may modulate BMP response in early brain development.

      We thank the reviewer for the thoughtful summary and positive feedback. We appreciate the recognition of our integrative in vivo and in vitro approach. We're glad the reviewer found our findings on context-dependent gene regulation and developmental signaling valuable.

      Main weaknesses of the experimental setup:

      (1) Because the authors state that primary NSCs cultured in vitro lose endogenous Prdm16 expression, they drive expression by a constitutive promoter. However, this means the expression levels are very different from endogenous levels (as explicitly shown in Supplementary Figure 2B) and the effect of many transcription factors is strongly dose-dependent, likely creating differences between the PRDM16-dependent transcriptional response in the in vitro system and in vivo.

      We acknowledge that our in vitro experiments may not ideally replicate the in vivo situation, a common limitation of such experiments; our primary aim was to explore the molecular relationship between PRDM16 and BMP signaling in gene regulation. Such molecular investigations are challenging to conduct using in vivo tissues. In vitro NSCs treated with BMP4 have been used as a model to investigate NSC proliferation and quiescence in previous studies (e.g., Helena Mira, 2010; Marlen Knobloch, 2017). Crucially, to ensure the relevance of our in vitro findings to the in vivo context, we confirmed that cultured cells could indeed be induced into quiescence by BMP4, and that this induction required the presence of PRDM16. Furthermore, upon identifying target genes co-regulated by PRDM16 and SMADs, we validated PRDM16's regulatory role on a subset of these genes in the developing Choroid Plexus (ChP) (Fig. 7 and Suppl.Fig7-8). Only by combining evidence from both in vitro and in vivo experiments could we confidently conclude that PRDM16 serves as an essential co-factor for BMP signaling in restricting NSC proliferation.

      (2) It seems that the authors compare Prdm16_KO cells to Prdm16 WT cells overexpressing flag_Prdm16. Aside from the possible expression of endogenous Prdm16, other cell differences may have arisen between these cell lines. A properly controlled experiment would compare Prdm16_KO ctrl (possibly infected with a control vector without Prdm16) to Prdm16_KO_E (i.e. the Prdm16_KO cells with and without Prdm16 overexpression.)

      We agree that Prdm16 KO cells carrying the Prdm16-expressing vector would be a good comparison with those with KO_vector. However, despite more than 10 attempts with various optimization conditions, we were unable to establish a viable cell line after infecting Prdm16 KO cells with the Prdm16-expressing vector. The overall survival rate for primary NSCs after viral infection is low, and we observed that KO cells were particularly sensitive to infection treatment when the viral vector was large (the Prdm16 ORF is more than 3kb).

      As an alternative to assess vector effects, we instead included two other control cell lines, wt and KO cells infected with the 3xNLS_Flag-tag viral vector, and presented the results in supplementary Fig 2. When we compared the responses of the four lines (wt, KO, wt infected with the Flag vector, KO infected with the Flag vector) to the addition and removal of BMP4, we confirmed that the viral infection itself has no significant impact on the responses of these cells to these treatments in terms of cell proliferation and Ttr induction.

      Given that wt cells and the KO cells, with or without viral backbone infection behave quite similarly in terms of cell proliferation, we speculate that even if we were successful in obtaining a cell line with Prdm16-expressing vector in the KO cells, it may not exhibit substantial differences compared to wt cells infected with Prdm16-expressing vector.

      Other experimental weaknesses that make the evidence less convincing:

      (1) The authors show in Figure 2E that Ttr is not upregulated by BMP4 in PRDM16_KO NSCs. Does this appear inconsistent with the presence of Ttr expression in the PRDM16_KO brain in Figure1C?

      The reviewer’s point is that there was no significant increase in Ttr expression in Prdm16_KO cells after BMP4 treatment (Fig. 2E), yet residual Ttr mRNA signal remained in the Prdm16 mutant ChP (Fig. 1C). We think the difference lies in the measurable level of Ttr expression induced by BMP4 in NSC culture versus that in the ChP. This is based on our immunostaining experiment, in which we tried to detect Ttr using a Ttr antibody. This antibody could not detect the Ttr protein in BMP4-treated Prdm16_expressing NSCs but clearly showed Ttr signal in the wt ChP. This means that although Ttr expression can be significantly increased by BMP4 in vitro to a level measurable by RT-qPCR, its absolute quantity even in the Prdm16_expressing condition is much lower than that in vivo. Our results in Fig 1C and Fig 2E, as well as Fig 7B, all consistently showed that Prdm16 depletion significantly reduced Ttr expression both in vitro and in vivo.

      (2) Figure 3: The authors use H3K4me3 to measure gene activity. This is however, very indirect, with bulk RNA-seq providing the most direct readout and polymerase binding (ChIP-seq) another more direct readout. Transcription can be regulated without expected changes in histone methylation, see e.g. papers from Josh Brickman. They verify their H3K4me3 predictions with qPCR for a select number of genes, all related to the kinetochore, but it is not clear why these genes were picked, and one could worry whether these are representative.

      H3K4me3 has been widely used as an indicator of active transcription and is a mark for cell identity genes. It has also been demonstrated that H3K4me3 has a direct function in regulating transcription at the step of RNA Pol II pause release. As stated in the text, there are advantages and disadvantages to using H3K4me3 compared with RNA-seq. RNA-seq profiles all gene products, which are affected by transcription as well as RNA stability and turnover. In contrast, H3K4me3 levels at gene promoters reflect transcriptional activity. In our case, we aimed to identify differential gene expression between proliferation and quiescence states. The transition between these two states is fast and dynamic. RNA-seq may fail to identify functionally relevant genes and is more likely to produce false-positive and false-negative results. Therefore, we chose H3K4me3 profiling.

      We agree that transcription may change without histone methylation changes. This may cause an under-estimation of the number of changed genes between the conditions. 

      We validated 7 out of 31 genes (Wnt7b, Id3, Mybl2, Spc24, Spc25, Ndc80 and Nuf2). We chose these genes based on two criteria: 1) their function is implicated in cell proliferation and cell-cycle regulation based on gene ontology analysis; 2) their gene products are detectable in the developing ChP based on the scRNA-seq data. Three of these genes (Wnt7b, Id3, Mybl2) are not related to the kinetochore. We now clarify this description in the revised text.

      (3) Line 256: The overlap of 31 genes between 184 BMP-repressed genes and 240 PRDM16-repressed genes seems quite small.

      This result indicates that in addition to co-repressing cell-cycle genes, BMP and PRDM16 have independent functions. For example, it was reported that BMP regulates neuronal and astrocyte differentiation (Katada, S. 2021), while our previous work demonstrated that Prdm16 controls temporal identity of NSCs (He, L. 2021).

      (4) The Wnt7b H3K4me3 track in Fig. 3G is not discussed in the text but it shows H3K4me3 high in _KO and low in _E regardless of BMP4. This seems to contradict the heatmap of H3K4me3 in Figure 3E which shows H3K4me3 high in _E no BMP4 and low in _E BMP4 while omitting _KO no BMP4. Meanwhile CDKN1A, the other gene shown in 3G, is missing from 3E.

      The track in Fig 3G shows the absolute signal of H3K4me3 after mapping the sequencing reads to the genome and normalizing them to library size. Comparing the signal in Prdm16_E with BMP4 and that in Prdm16_E without BMP4, the one with BMP4 has a lower peak. The same trend can be seen for the pair of Prdm16_KO cells with or without BMP4. The heatmap in Fig. 3E shows the relative level of H3K4me3 in three conditions. The Prdm16_E cells with BMP4 have the lowest level, while the other two conditions (Prdm16_KO with BMP4 and Prdm16_E without BMP4) display higher levels. These two graphs show a consistent trend of H3K4me3 changes at the Wnt7b promoter across these conditions. Figure 3E only includes genes that are co-repressed by PRDM16 and BMP. CDKN1A’s H3K4me3 signals are consistent between the conditions, and thus it is not a PRDM16- or BMP-regulated gene. We use it as a negative control.

      (5) The authors use PRDM16 CUT&TAG on dissected dorsal midline tissues to determine if their 31 identified PRDM16-BMP4 co-repressed genes are regulated directly by PRDM16 in vivo. By manual inspection, they find that "most" of these show a PRDM16 peak. How many is most? If using the same parameters for determining peaks, how many genes in an appropriately chosen negative control set of genes would show peaks? Can the authors rigorously establish the statistical significance of this observation? And why wasn't the same experiment performed on the NSCs in which the other experiments are done so one can directly compare the results? Instead, as far as I could tell, there is only ChIP-qPCR for two genes in NSCs in Supplementary Figure 4D.

      In our text, we indicated the genes containing PRDM16 binding peaks in the figures and described them as “Text in black in Fig. 6A and Supplementary Fig. 5A”. We will add the precise number “25 of these genes” in the main text to clarify it. We used the BMP-only repressed genes (184 - 31 = 153 genes, excluding those co-repressed by PRDM16 and BMP4) as a negative control set. By computationally determining the nearest TSS to each PRDM16 peak, we identified 24/31 co-repressed genes and 84/153 BMP-only-repressed genes containing PRDM16 peaks in the E12.5 ChP data. Fisher’s Exact Test comparing the two proportions yields a P-value of 0.015.
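      The comparison described above can be sketched as a Fisher's exact test on a 2x2 table. This is a minimal illustration, assuming the table is built directly from the counts stated in the response (24 of 31 co-repressed genes and 84 of 153 BMP-only-repressed genes with a PRDM16 peak). The function name fisher_exact_2x2 is hypothetical, and the response does not state whether a one-sided or two-sided test was used, so both are computed.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the table [[a, b], [c, d]].

    Returns (odds_ratio, p_one_sided_greater, p_two_sided).
    Hypothetical helper for illustration; not the authors' code.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # One-sided: tables with at least as many top-left counts as observed.
    p_greater = sum(prob(x) for x in range(a, hi + 1))
    # Two-sided: tables no more probable than the observed one.
    p_two = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    odds = (a * d) / (b * c) if b * c else float("inf")
    return odds, min(1.0, p_greater), min(1.0, p_two)


# Counts taken from the response text: genes with / without a PRDM16 peak.
co_repressed = (24, 31 - 24)      # PRDM16-BMP4 co-repressed genes
bmp_only = (84, 153 - 84)         # BMP-only-repressed control genes

odds, p_one, p_two = fisher_exact_2x2(*co_repressed, *bmp_only)
print(f"odds ratio = {odds:.2f}")
print(f"one-sided P = {p_one:.4f}, two-sided P = {p_two:.4f}")
```

      The one-sided value tests enrichment of PRDM16 peaks among co-repressed genes specifically, while the two-sided value tests for any difference between the two proportions.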

      We are confused by the second part of the comment “And why wasn't the same experiment performed on the NSCs in which the other experiments are done so one can directly compare the results? Instead, as far as I could tell, there is only ChIP-qPCR for two genes in NSCs in Supplementary Figure 4D.” If the reviewer meant why we didn’t sequence the material from sequential ChIP or validate more target genes, the reason is the limitation of the material. Sequential ChIP requires a large quantity of antibody and yields little material, barely sufficient for a few qPCR reactions after the second round of IP. This yield was far below the minimum required for library construction. The PRDM16 antibody was a gift, and the quantity we had was very limited. We made considerable efforts to optimize all available commercial antibodies in ChIP and Cut&Tag, but none of them worked in these assays.

      (6) In comparing RNA in situ between WT and PRDM16 KO in Figure 7, the authors state they use the Wnt2b signal to identify the border between CH and neocortex. However, the Wnt2b signal is shown in grey and it is impossible for this reviewer to see clear Wnt2b expression or where the boundaries are in Figure 7A. The authors also do not show where they placed the boundaries in their analysis. Furthermore, Figure 7B only shows insets for one of the regions being compared making it difficult to see differences from the other region. Finally, the authors do not show an example of their spot segmentation to judge whether their spot counting is reliable. Overall, this makes it difficult to judge whether the quantification in Figure 7C can be trusted.

      In the revised manuscript we have included an individual channel of Wnt2b and marked the boundaries. We also provide full-view images and examples of spot segmentation in the new Supplementary Figure 8.

      (7) The correlation between mKi67 and Axin2 in Figure 7 is interesting but does not convincingly show that Wnt downstream of PRDM16 and BMP is responsible for the increased proliferation in PRDM16 mutants.

      We agree that this result (the correlation between mKi67 and Axin2) alone only suggests that Wnt signaling is related to the proliferation defect in the Prdm16 mutant, and does not necessarily mean that Wnt is downstream of PRDM16 and BMP. Our conclusion is backed up by two additional lines of evidence: the Cut&Tag data, in which PRDM16 binds to regulatory regions of Wnt7b and Wnt3a; and the finding that BMP and PRDM16 co-repress Wnt7b in vitro.

      An ideal result would be that down-regulating Wnt signaling in the Prdm16 mutant rescues the Prdm16 mutant phenotype. Such an experiment is technically challenging. Wnt plays diverse and essential roles in NSC regulation, and one would need to use a cell type- and stage-specific tool to down-regulate Wnt in the background of Prdm16 mutation. Moreover, Wnt genes are not the only targets regulated by PRDM16 in these cells, and downregulating Wnt may not be sufficient to rescue the phenotype.

      Weaknesses of the presentation:

      Overall, the manuscript is not easy to read. This can cause confusion.

      We have revised the text to improve clarity.

      Reviewer #1 (Recommendations for the authors):

      (1) Overall, the manuscript is not easy to read. Here are some causes of confusion for which the presentation could be cleaned up:

      We are grateful for the reviewer’s suggestion. In the revised manuscript, we have made efforts to improve the clarity of the text.

      (a) Part of the first section is confusing in that some statements seem contradictory, in particular:

      "there is no overall patterning defect of ChP and CH in the Prdm16 mutant" (line 125)

      "Prdm16 depletion disrupted the transition from neural progenitors into ChP epithelia" (line 144)

      It would be helpful if the authors could reformulate this more clearly.

      We modified the text to clarify that while the BMP-patterned domain is not affected, the transition of NSCs into ChP epithelial cells is compromised in the Prdm16 mutant.

      (b) Flag_PRDM16, PRDM16_expressing, PRDM16_E, PRDM16 OE all seem to refer to the same PRDM16 overexpressing cells, which is very confusing. The authors should use consistent naming. Moreover, it would be good if they renamed these all to PRDM16_OE to indicate expression is not endogenous but driven by a constitutive promoter.

      We appreciate the comment and agree that the use of multiple terms to refer to the same PRDM16-overexpressing condition was confusing. Our original intention in using Prdm16_E was to distinguish cells expressing PRDM16 from the two other groups: wild-type cells and Prdm16_KO cells, which both lack PRDM16 protein expression. However, we acknowledge that Prdm16_E could be misinterpreted as indicating expression from the endogenous Prdm16 promoter. To avoid this confusion and ensure consistency, we have now standardized the terminology and refer to this condition as Prdm16_OE, indicating Flag-tagged PRDM16 expression driven by a constitutive promoter.

      (c) Line 179 states "generated a cell line by infecting Prdm16_KO cells with the same viral vector, expressing 3xNSL_Flag". Do the authors mean 3xNLS_Flag_Prdm16, so these are the Prdm16_KO_E cells by the notation suggested above? Or is this a control vector with Flag only? The following paragraph refers to Supplementary Figure 2C-F where the same construct is called KO_CDH, suggesting this was an empty CDH vector, without Flag, or Prdm16. This is confusing.

      We appreciate the reviewer’s careful reading and helpful comment. We acknowledge the confusion caused by the inconsistent terminology. To clarify: in line 179, we intended to describe an attempt to generate a Prdm16_KO cell line expressing 3xNLS_Flag_Prdm16, not a control vector with Flag only. However, despite repeated attempts, we were unable to establish this line due to low viral efficiency and the vulnerability of Prdm16_KO cells to infection with the large construct. Therefore, these cells were not included in the subsequent analyses.

      The term KO_CDH refers to Prdm16_KO cells infected with the empty CDH control vector, which lacks both Flag and Prdm16. This is the line used in the experiments shown in Supplementary Fig. 2C–F. We have revised the text throughout the manuscript to ensure consistent use of terminology and to avoid this confusion.

      (2) The introductory statements on lines 53-54 could use more references.

      Thanks for the suggestion. We have now included more references.

      (3) It would be helpful if all structures described in the introduction and first section were annotated in Figure 1, or otherwise, if a cartoon were included. For example, the cortical hem, and fourth ventricle.

      Thanks for the suggestion. We have now indicated the structures, ChP, CH and the fourth ventricle, in the images in Figure 1 and Supplementary Figure 1.

      (4) In line 115, "as previously shown.." - to keep the paper self-contained a figure illustrating the genetics of the KO allele would be helpful.

      Thanks for the suggestion. We have now included an illustration of the Prdm16 cGT allele in Figure 1B.

      (5) In Figure 1D as costain for a ChP marker would be helpful because it is hard to identify morphologically in the Prdm16 KO.

      We apologize for the lack of clarity. The KO allele contains a b-geo reporter driven by the endogenous Prdm16 promoter. The samples were co-stained for EdU, b-Gal and DAPI. To distinguish the ChP domain from the CH, we used the presence of b-Gal as a marker. We indicated this in the figure legend and have now also clarified it in the revised text.

      (6) The details in Figure 1E are hard to see, a zoomed-in inset would help.

      A zoomed-in inset is now included in the figure.

      (7) Supplementary Figure 2A does not convincingly show that PRDM16 protein is undetectable since endogenous expression may be very low compared to the overexpression PRDM16_E cells so if the contrast is scaled together it could appear black like the KO.

      We appreciate the reviewer’s point and have carefully considered this concern. We concluded that PRDM16 protein is effectively undetectable in cultured wild-type NSCs based on direct comparison with brain tissue. Both cultured NSCs and brain sections were processed under similar immunostaining and imaging conditions. While PRDM16 showed robust and specific nuclear localization in embryonic brain sections (Fig. 1B and Supplementary Fig. 1A), only a small subset of cultured NSCs exhibited PRDM16 signal, primarily in the cytoplasm (middle panel of Fig. 2A). This stark contrast supports our conclusion that endogenous PRDM16 protein is either absent or significantly downregulated in vitro. Because of this limitation, we turned to over-expressing Prdm16 in NSC culture using a constitutive promoter. 

      (9) Line 182 "Following the washout step" - no such step had been described, maybe replace by "After washout of BMP".

      Yes, we have revised the text.

      (8) Line 214: "indicating a modest level" - what defines modest? Compared to what? Why is a few thousand moderate rather than low? Does it go to zero with inhibitors for pathways?

      Here a modest level means a level lower than that after adding BMP4. To clarify this, we revised the description to “indicating endogenous levels of …”

      (9) The way qPCR data are displayed makes it difficult to appreciate the magnitude of changes, e.g. in Supplementary Figure 2B where a gap is introduced on the scale. Displaying log fold change / relative CT values would be more informative.

      We used a segmented Y-axis in Supplementary Figure 2B because the Prdm16 overexpression samples exhibited much higher expression levels than the other conditions. In response to this suggestion, we explored alternative ways to present the result, including plotting log-transformed values and log fold changes. However, these methods did not enhance the clarity of the differences; in fact, log scaling made the magnitude of change appear less apparent. To address this, we now present the overexpression samples in a separate graph, thereby eliminating the need for a broken Y-axis and improving the overall readability of the data.

      (10) Writing out "3 days" instead of 3D in Figure 2A would improve clarity. It would be good if the used time interval is repeated in other figures throughout the paper so it is still clear the comparison is between 0 and 3 days.

      We have changed “3D” to “3 days”. All BMP4 treatments in this study were 3 days.

      (11) Line 290: "we found that over 50% of SMAD4 and pSMAD1/5/8 binding peaks were consistent in Prdm16_E and Prdm16_KO cells, indicating that deletion of Prdm16 does not affect the general genomic binding ability of these proteins" - this only makes sense to state with appropriate controls because 50% seems like a big difference, what is the sample to sample variability for the same condition? Moreover, the next paragraph seems to contradict this, ending with "This result suggests that SMAD binding to these sites depends on PRDM16". The authors should probably clarify the writing.

      We appreciate the reviewer’s comment and agree that clarification was needed. Our point was that SMAD4 and pSMAD1/5/8 retain the ability to bind DNA broadly in the Prdm16 KO cells, with more than half of the original binding sites still occupied. This suggests that deletion of Prdm16 does not globally impair SMAD genomic binding. However, our primary interest lies in the subset of sites that show differential SMAD binding between wt and Prdm16 KO conditions, as these are likely to be PRDM16-dependent.

      In the following paragraph, we focused specifically on describing SMAD and PRDM16 co-bound sites. At these loci, SMAD4 and pSMAD1/5/8 showed reduced enrichment in the absence of PRDM16, suggesting PRDM16 facilitates SMAD binding at these particular regions. We have revised the text in the manuscript to more clearly distinguish between global SMAD binding and PRDM16-dependent sites.

      (12) Much more convincing than ChIP-qPCR for c-FOS for two loci in Figures 5F-G would be a global analysis of c-FOS ChIP-seq data.

      We agree that a global c-FOS ChIP-seq analysis would provide a more comprehensive view of c-FOS binding patterns. However, the primary focus of this study is the interaction between BMP signaling and PRDM16. The enrichment of AP-1 motifs at ectopic SMAD4 binding sites was an unexpected finding, which we validated using c-FOS ChIP-qPCR at selected loci. While a genome-wide analysis would be valuable, it falls beyond the current scope. We agree that future studies exploring the interplay among SMAD4/pSMAD, PRDM16, and AP-1 will be important and informative.

      (13) Figure 6A is hard to read. A heatmap would make it much easier to see differences in expression. Furthermore, if the point is to see the difference between ChP and CH, why not combine the different subclusters belonging to those structures? Finally, why are there 28 genes total when it is said the authors are evaluating a list of 31 genes and also displaying 6 genes that are not expressed (so the difference isn't that unexpressed genes are omitted)?

      For the scRNA-seq data, we chose violin plots because they display both gene expression levels and the number of cells that express each gene. However, we agree that the labels in Figure 6A were too small and difficult to read. We have revised the figure by increasing the font size and moving genes with low expression to Supplementary Figure 5A. Figure 6A includes the 17 more highly expressed genes together with three markers, and Supplementary Figure 5A contains the 13 lowly expressed genes. One gene, Mrtfb, is missing from the scRNA-seq data and is thus not included. We have revised the description of the result in the main text and figure legends.

      Reviewer #2 (Public review):

      Summary:

      This article investigates the role of PRDM16 in regulating cell proliferation and differentiation during choroid plexus (ChP) development in mice. The study finds that PRDM16 acts as a corepressor in the BMP signaling pathway, which is crucial for ChP formation.

      The key findings of the study are:

      (1) PRDM16 promotes cell cycle exit in neural epithelial cells at the ChP primordium.

      (2) PRDM16 and BMP signaling work together to induce neural stem cell (NSC) quiescence in vitro.

      (3) BMP signaling and PRDM16 cooperatively repress proliferation genes.

      (4) PRDM16 assists genomic binding of SMAD4 and pSMAD1/5/8.

      (5) Genes co-regulated by SMADs and PRDM16 in NSCs are repressed in the developing ChP.

      (6) PRDM16 represses Wnt7b and Wnt activity in the developing ChP.

      (7) Levels of Wnt activity correlate with cell proliferation in the developing ChP and CH.

      In summary, this study identifies PRDM16 as a key regulator of the balance between BMP and Wnt signaling during ChP development. PRDM16 facilitates the repressive function of BMP signaling on cell proliferation while simultaneously suppressing Wnt signaling. This interplay between signaling pathways and PRDM16 is essential for the proper specification and differentiation of ChP epithelial cells. This study provides new insights into the molecular mechanisms governing ChP development and may have implications for understanding the pathogenesis of ChP tumors and other related diseases.

      Strengths:

      (1) Combining in vitro and in vivo experiments to provide a comprehensive understanding of PRDM16 function in ChP development.

      (2) Uses of a variety of techniques, including immunostaining, RNA in situ hybridization, RT-qPCR, CUT&Tag, ChIP-seq, and SCRINSHOT.

      (3) Identifying a novel role for PRDM16 in regulating the balance between BMP and Wnt signaling.

      (4) Providing a mechanistic explanation for how PRDM16 enhances the repressive function of BMP signaling. The identification of SMAD palindromic motifs as preferred binding sites for the SMAD/PRDM16 complex suggests a specific mechanism for PRDM16-mediated gene repression.

      (5) Highlighting the potential clinical relevance of PRDM16 in the context of ChP tumors and other related diseases. By demonstrating the crucial role of PRDM16 in controlling ChP development, the study suggests that dysregulation of PRDM16 may contribute to the pathogenesis of these conditions.

      We thank the reviewer for the thorough and thoughtful summary of our study. We’re glad the key findings and significance of our work were clearly conveyed, particularly regarding the role of PRDM16 in coordinating BMP and Wnt signaling during ChP development. We also appreciate the recognition of our integrated approach and the potential implications for understanding ChP-related diseases.

      Weaknesses:

      (1) Limited investigation of the mechanism controlling PRDM16 protein stability and nuclear localization in vivo. The study observed that PRDM16 protein became nearly undetectable in NSCs cultured in vitro, despite high mRNA levels. While the authors speculate that post-translational modifications might regulate PRDM16 in NSCs similar to brown adipocytes, further investigation is needed to confirm this and understand the precise mechanism controlling PRDM16 protein levels in vivo.

      While the mechanisms controlling PRDM16 protein stability and nuclear localization in the developing brain are interesting, the scope of this paper is revealing the function of PRDM16 in the choroid plexus and its interaction with BMP signaling. We will be happy to pursue this direction in our next study.

      (2) Reliance on overexpression of PRDM16 in NSC cultures. To study PRDM16 function in vitro, the authors used a lentiviral construct to constitutively express PRDM16 in NSCs. While this approach allowed them to overcome the issue of low PRDM16 protein levels in vitro, it is important to consider that overexpressing PRDM16 may not fully recapitulate its physiological role in regulating gene expression and cell behavior.

      As stated above, we acknowledge that findings from cultured NSCs may not directly apply to ChP cells in vivo. We are cautious with our statements. The cell culture work was aimed at identifying potential mechanisms by which PRDM16 and SMADs interact to regulate gene expression, as well as target genes co-regulated by these factors. We expect that not all targets from cell culture are regulated by PRDM16 and SMADs in the ChP, so we validated expression changes of several target genes in the developing ChP and have now included the new data in Fig. 7 and Supplementary Fig. 7. Out of the 31 genes identified from cultured cells, four cell cycle regulators, including Wnt7b, Id3, Spc24/25/Nuf2 and Mybl2, showed de-repression in the Prdm16 mutant ChP. These genes can be relevant downstream genes in the ChP, and other target genes may be cortical NSC-specific or less dependent on Prdm16 in vivo.

      (3) Lack of direct evidence for AP1 as the co-factor responsible for SMAD relocation in the absence of PRDM16. While the study identified the AP1 motif as enriched in SMAD binding sites in Prdm16 knockout cells, they only provided ChIP-qPCR validation for c-FOS binding at two specific loci (Wnt7b and Id3). Further investigation is needed to confirm the direct interaction between AP1 and SMAD proteins in the absence of PRDM16 and to rule out other potential co-factors.

      We agree that the finding of the AP1 motif enriched at the PRDM16 and SMAD co-binding regions in Prdm16 KO cells can only indirectly suggest AP1 as a co-factor for SMAD relocation. That is why we used ChIP-qPCR to examine the presence of c-FOS at these sites. Although we only validated two targets, the result confirms that c-FOS binds to the sites only in the Prdm16 KO cells but not in Prdm16_expressing cells, suggesting AP1 is a co-factor. Our results cannot rule out the presence of other co-factors.

      Reviewer #2 (Recommendations for the authors):

      Minor typo: [7, page 3] "sicne" should be "since".

      We appreciate the reviewer’s careful reading. We have now corrected the typo and revised parts of the text to improve clarity.

      Reviewer #3 (Public review):

      Summary:

      Bone morphogenetic protein (BMP) signaling instructs multiple processes during development including cell proliferation and differentiation. The authors set out to understand the role of PRDM16 in these various functions of BMP signaling. They find that PRDM16 and BMP co-operate to repress stem cell proliferation by regulating the genomic distribution of BMP pathway transcription factors. They additionally show that PRDM16 impacts choroid plexus epithelial cell specification. The authors provide evidence for a regulatory circuit (constituting of BMP, PRDM16, and Wnt) that influences stem cell proliferation/differentiation.

      Strengths:

      I find the topics studied by the authors in this study of general interest to the field, the experiments well-controlled and the analysis in the paper sound.

      We thank the reviewer for their positive feedback and thoughtful summary. We appreciate the recognition of our efforts to define the role of PRDM16 in BMP signaling and stem cell regulation, as well as the soundness of our experimental design and analysis.

      Weaknesses:

      I have no major scientific concerns. I have some minor recommendations that will help improve the paper (regarding the discussion).

      We have revised the discussion according to the suggestions.

      Reviewer #3 (Recommendations for the authors):

      Specific minor recommendations:

      Page 18. Line 526: In a footnote, the authors point out a recent report which in parallel was investigating the link between PRDM16 and SMAD4. There is substantial non-overlap between these two papers. To aid the reader, I would encourage the authors to discuss that paper in the discussion section of the manuscript itself, highlighting any similarities/differences in the topic/results.

      Thanks for the suggestion. We have now included the comparison in the discussion. One conclusion is consistent between our study and this publication: PRDM16 functions as a co-repressor of SMAD4. However, the mechanisms are different. Our data suggest a model in which PRDM16 facilitates SMAD4/pSMAD binding to repress proliferation genes under high BMP conditions. In contrast, the other report suggests that SMAD4 steadily binds to the Prdm16 promoter and switches regulatory functions depending on the co-factors. Together with PRDM16, SMAD4 represses gene expression, while with SMAD3 in response to high levels of TGF-b1, it activates gene expression. These differences could be due to different signaling (BMP versus TGF-b), contexts (NSCs versus pancreatic cancers) etc.

      Page 3. Line 65: typo 'since'

      We appreciate the reviewer’s careful reading. We have now corrected the typo and revised the text to improve clarity.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript describes a series of experiments documenting trophic egg production in a species of harvester ant, Pogonomyrmex rugosus. In brief, queens are the primary trophic egg producers, there is seasonality and periodicity to trophic egg production, trophic eggs differ in many basic dimensions and contents relative to reproductive eggs, and diets supplemented with trophic eggs had an effect on the queen/worker ratio produced (increasing worker production).

      The manuscript is very well prepared and the methods are sufficient. The outcomes are interesting and help fill gaps in knowledge, both on ants as well as insects, more generally. More context could enrich the study and flow could be improved.

      We thank the reviewer for these comments. We agree that the paper would benefit from more context. We have therefore greatly extended the introduction.

      Reviewer #2 (Public Review):

      The manuscript by Genzoni et al. provides evidence that trophic eggs laid by the queen in the ant Pogonomyrmex rugosus have an inhibitory effect on queen development. The authors also compare a number of features of trophic eggs, including protein, DNA, RNA, and miRNA content, to reproductive eggs. To support their argument that trophic eggs have an inhibitory effect on queen development, the authors show that trophic eggs have a lower content of protein, triglycerides, glycogen, and glucose than reproductive eggs, and that their miRNA distributions are different relative to reproductive eggs. Although the finding of an inhibitory influence of trophic eggs on queen development is indeed arresting, the egg cross-fostering experiment that supports this finding can be effectively boiled down to a single figure (Figure 6). The rest of the data are supplementary and correlative in nature (and can be combined), especially the miRNA differences shown between trophic and reproductive eggs. This means that the authors have not yet identified the mechanism through which the inhibitory effect on queen development is occurring. To this reviewer, this finding is more appropriate as a short report and not a research article. A full research article would be warranted if the authors had identified the mechanism underlying the inhibitory effect on queen development. Furthermore, the article is written poorly and lacks much background information necessary for the general reader to properly evaluate the robustness of the conclusions and to appreciate the significance of the findings.

      We thank the reviewer for these comments. We agree that the paper would benefit by having more background information and more discussion. We have followed this advice in the revision.

      Reviewer #3 (Public Review):

      In "Trophic eggs affect caste determination in the ant Pogonomyrmex rugosus" Genzoni et al. probe a fundamental question in sociobiology, what are the molecular and developmental processes governing caste determination? In many social insect lineages, caste determination is a major ontogenetic milestone that establishes the discrete queen and worker life histories that make up the fundamental units of their colonies. Over the last century, mechanisms of caste determination, particularly regulators of caste during development, have remained relatively elusive. Here, Genzoni et al. discovered an unexpected role for trophic eggs in suppressing queen development - where bi-potential larvae fed trophic eggs become significantly more likely to develop into workers instead of gynes (new queens). These results are unexpected, and potentially paradigm-shifting, given that previously trophic eggs have been hypothesized to evolve to act as an additional intracolony resource for colonies in potentially competitive environments or during specific times in colony ontogeny (colony foundation), where additional food sources independent of foraging would be beneficial. While the evidence and methods used are compelling (e.g., the sequence of reproductive vs. trophic egg deposition by single queens, which highlights that the production of trophic eggs is tightly regulated), the connective tissue linking many experiments is missing and the downstream mechanism is speculative (e.g., whether miRNA, proteins, triglycerides, glycogen levels in trophic eggs is what suppresses queen development). Overall, this research elevates the importance of trophic eggs in regulating queen and worker development but how this is achieved remains unknown.

      We thank the reviewer for these comments and agree that future work should focus on identifying the substances in trophic eggs that are responsible for caste determination.  

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Introduction:

      The context for this study is insufficiently developed in the introduction - it would be nice to have a more detailed survey of what is known about trophic eggs in insects, especially social insects. The end of the introduction nicely sets up the hypothesis through the prior work described by Helms Cahan et al. (2011) where they found JH supplementation increased trophic egg production and also increased worker size. I think that the introduction could give more context about egg production in Pogonomyrmex and other ants, including what is known about worker reproduction. For example, Suni et al. 2007 and Smith et al. 2007 both describe the absence of male production by workers in two different harvester ants. Workers tend to have underdeveloped ovaries when in the presence of the queen. Other species of ants are known to have worker reproduction seemingly for the purpose of nutrition (see Heinze and Hölldobler 1995 and subsequent studies on Crematogaster smithi). Because some ants, including Pogonomyrmex, lack trophallaxis, it has been hypothesized that they distribute nutrients throughout the nest via trophic eggs as is seen in at least one other ant (Gobin and Ito 2000). Interestingly, Smith and Suarez (2009) speculated that the difference in nutrition of developing sexual versus worker larvae (as seen in their pupal stable isotope values) was due to trophic egg provisioning - they predicted the opposite as was found in this study, but their prediction was in line with that of Helms Cahan et al. (2011). This is all to say that there is a lot of context that could go into developing the ideas tested in this paper that is completely overlooked. The inclusion of more of what is known already would greatly enrich the introduction.

      We agree that it would be useful to provide a larger context to the study. We now provide more information on the life history of ants and explain under what situations queens and workers may produce trophic eggs. We also mention that some ants, such as Crematogaster smithi, have a special caste of “large workers” which are morphologically intermediate between winged queens and small workers and appear to be specialized in the production of unfertilized eggs. We now also mention the study of Gobin and Ito (2000), in which the authors show that trophic eggs may play an important role in food distribution within the colony, in particular in species where trophallaxis is rare or absent.

      Methods:

      L49: What lineage is represented in the colonies used? The collection location is near where both dependent-lineage (genetic caste determining) P. rugosus and "H" lineage exist. This is important to know. Further, depending on what these are, the authors should note whether this has relevance to the study. Not mentioning genetic caste determination in a paper that examines caste determination is problematic.

      This is a good point. We now state at the very beginning of the Material and Method section that the queens were collected in populations known not to have dependent-lineage (genetic caste determining) mechanisms of caste determination.

      L63 and throughout: It would be more efficient to have a paragraph that cites R (must be done) and RStudio once as the tool for all analyses. It also seems that most model construction and testing was done using lme4 - so just lay this out once instead of over and over.

      We agree and have updated the manuscript accordingly.

      L95: 'lenght' needs to be 'length' in the formula.

      Thanks, corrected.

      L151: A PCA was used but not described in the methods. This should be covered here. And while a Mantel test is used, I might consider a permANOVA as this more intuitively (for me, at least) goes along with the PCA.

      We added the PCA description in the Material and Method section.

      Results:

      I love Fig. 3! Super cool.

      Thanks for this positive comment.

      Discussion:

      It would be good to have more on egg cannibalism. This is reasonably well-studied and could be good extra context.

      We have added a paragraph in the discussion to mention that egg cannibalism is ubiquitous in ants.

      Supp Table 1: P. badius is missing and citations are incorrectly attributed to P. barbatus.

      P. badius was present in the Table but not listed with the other Pogonomyrmex species. For some genera, the species were also not listed in alphabetical order. This has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      Comments on introduction:

      The introduction is missing information about caste determination in ants generally and Pogonomyrmex rugosus specifically. This is important because some colonies of Pogonomyrmex rugosus have been shown to undergo genetic caste determination, in which case the main result would be rendered insignificant. What is the evidence that caste determination in the lineages/colonies used is largely environmentally influenced and in what contexts/environmental factors? All of this should be made clear.

      This is a good point. We have expanded the introduction to discuss previous work on caste determination in Pogonomyrmex species with environmental caste determination and now also provide evidence at the beginning of the Material and Method section that the two populations studied do not have a system of genetic caste determination.

      Line 32 and throughout the paper: What is meant exactly by 'reproductive eggs'? Are these eggs that develop specifically into reproductives (i.e., queens/males) or all eggs that are non-trophic? If the latter, then it is best to refer to these eggs as 'viable' in order to prevent confusion.

      We agree and have updated the manuscript accordingly.

      Figure 1/Supp Table 1: It is surprising how few species are known to lay trophic eggs. Do the authors think this is an informative representation of the distribution of trophic egg production across subfamilies, or due to lack of study? Furthermore, the branches show ant subfamilies, not families. What does the question mark indicate? Also, the information in the table next to the phylogeny is not easy to understand. Having in the branches that information, in categories, shown in color for example, could be better and more informative. Finally, having the 'none' column with only one entry is confusing - discuss that only one species has been shown to definitely not lay trophic eggs in the text, but it does not add much to the figure.

      Trophic eggs are probably very common in ants, but this has not been very well studied. We added a sentence in the manuscript to make this clear.

      Thanks for noticing the family/subfamily error. This has been corrected in Figure 1 and Supplementary Table 1.

      The question mark indicates uncertainty about whether queens also contribute to the production of trophic eggs in one species (Lasius niger). We have now added information on that in the Figure legend.

      We agree with the reviewer that it would be easier to have the information on whether queens and workers produce trophic eggs on the branches of the tree. However, having the information on the branches would suggest that the “trait” evolved on this part of the tree. As we do not know exactly when worker or queen production of trophic eggs evolved, we prefer to keep the figure as it is.

      Finally, we have also removed the 'none' column from the figure, as suggested by the reviewer, and discuss in the manuscript the fact that the absence of trophic eggs has been reported in only one ant species (Amblyopone silvestrii: Masuko 2003).

      Comments on materials and methods:

      Why did they settle on three trophic eggs per larva for their experimental setup?

      We used three trophic eggs because under natural conditions 50-65% of the eggs are trophic. The ratio of trophic eggs to viable eggs (larvae) was thus similar to that under natural conditions.

      Line 50: In what kind of setup were the ants kept? Plaster nests? Plastic boxes? Tubes? Was the setup dry or moist? I think this information is important to know in the context of trophic eggs.

      We now explain that colonies were maintained in plastic boxes with water tubes.

      Line 60: Were all the 43 queens isolated only once, or multiple times?

      Each of the 43 queens was isolated for 8 hours every day for 2 weeks, once before and once after hibernation (so they were isolated multiple times). We have changed the text to make clear that this was done for each of the 43 queens.

      Could isolating the queen away from workers/brood have had an effect on the type of eggs laid?

      This cannot be completely ruled out. However, it is possible to reliably determine the proportion of viable and trophic eggs only by isolating queens. Importantly, the main aim of these experiments was not to precisely determine the proportion of viable and trophic eggs, but to show that this proportion changes before and after hibernation and that queens do not lay viable and trophic eggs in a random sequence.

      Since it was established that only queens lay trophic eggs why was the isolation necessary?

      Yes, this was necessary because eggs are fragile and very difficult to collect in colonies with workers (as soon as eggs are laid they are piled up, and as soon as we disturb the nest, a worker takes them all and runs away with them). Moreover, it is possible that workers preferentially eat one type of egg, which would have required removing eggs as soon as the queens laid them. This would have been a huge disturbance for the colonies.

      Line 61: Is this hibernation natural or lab induced? What is the purpose of it? How long was the hibernation and at what temperature? Where are the references for the requirement of a diapause and its length?

      The hibernation was lab induced. We hibernated the queens because we previously showed that hibernation is important to trigger the production of gynes in P. rugosus colonies in the laboratory (Schwander et al 2008; Libbrecht et al 2013). Hibernation conditions were as described in Libbrecht et al (2013).  

      Line 73: If the queen is disturbed several times for three weeks, which effect does it have on its egg-laying rate and on the eggs laid? Were the eggs equally distributed in time in the recipient colonies with and without trophic eggs to avoid possible effects?

      It is difficult to say what effect the disturbance had on the number and type of eggs laid. But again, our aim was not to precisely determine these values but to determine whether there was an effect of hibernation on the proportion of trophic eggs. The recipient colonies with and without trophic eggs were formed in exactly the same way. No viable eggs were introduced into these colonies, and all first instar larvae were introduced in the same way, at the same time, and with random assignment. We have clarified this in the Material and Method section.

      Line 77: Before placing the freshly hatched larvae in recipient colonies, how long were the recipient colonies kept without eggs and how long were they fed before giving the eggs? Were they kept long enough without the queen to avoid possible effects of trophic eggs, or too long so that their behavior changed?

      The recipient colonies were created 7 to 10 days before receiving the first larvae and were fed ad libitum with grass seeds, flies and honey water from the beginning. Trophic eggs that would have been left over from the source colony should have been eaten within the first few days after creating the recipient colonies. However, even if some trophic eggs would have remained, this would not influence our conclusion that trophic eggs influence caste fate, given the fully randomized nature of our treatments and the considerable number of independent replicates. The same applies to potential changes in worker behavior following their isolation from the queen.

      Line 77: Is it known at what stage caste determination occurs in this species? Here first instar larvae were given trophic eggs or not. Does caste-determination occur at the first instar stage? If not, what effect could providing trophic eggs at other stages have on caste-determination?

      A previous study showed that there is a maternal effect on caste determination in the focal species (Schwander et al 2008). The mechanism underlying this maternal effect was hypothesized to be differential maternal provisioning of viable eggs. However, as we detail in the discussion, the new data presented in our study suggest that the mechanism is in fact a difference in the abundance of trophic eggs laid by queens. There is currently no information on when exactly caste determination occurs during development.

      Comments on results:

      Line 65: How does investigating the order of eggs laid help to "inform on the mechanisms of oogenesis"?

      We agree that the aim was not to study the mechanism of oogenesis. We have changed this sentence accordingly: “To assess whether viable and trophic eggs were laid in a random order, or whether eggs of a given type were laid in clusters, we isolated 11 queens for 10 hours, eight times over three weeks, and collected every hour the eggs laid”

      Figure 2: There is no description/discussion of data shown in panels B, C, E, and F in the main text.

      We have added information in the main text that while viable eggs showed embryonic development at 25 and 65 hours (Fig. 2 B, C), there was no such development for trophic eggs (Fig. 2 E, F).

      Line 172: Please explain hibernation details and its significance on colony development/life cycle.

      We have added this information in the Material and Method section.

      Figure 6: How is B plotted? How could 0% of gynes have 100% survival?

      The survival is given for the larvae without considering caste. We have changed the X axis of panel B and reworded the Figure legend to clarify this.

      Is reduced DNA content just an outcome of reduced cell number within trophic eggs, i.e., was this a difference in cell type or cell number? Or is it some other adaptive reason?

      It is likely to be due to a reduction in cell number (trophic eggs have maternal DNA in the chorion, while viable eggs have in addition the cells from the developing zygote) but we do not have data to make this point.

      Is there a logical sequence to the sequence of egg production? The authors showed that the sequence is non-random, but can they identify in what way? What would the biological significance be?

      We could not identify a logical sequence. Plausibly, the production of the two types of eggs implies some changes in the metabolic processes during egg production resulting in queens producing batches of either viable or trophic eggs. This would be an interesting question to study, but this is beyond the scope of this paper.

      Figure 6b is difficult to follow, and more generally, legends for all figures can be made clearer and more easy to follow.

      We agree. We have now improved the legends of Fig 6B and the other figures.

      Lines 172-174: "The percentage of eggs that were trophic was higher before hibernation...than after. This higher percentage was due to a reduced number of reproductive eggs, the number of trophic eggs laid remained stable" - are these data shown? It would be nice to see how the total egg-laying rate changes after hibernation. Also, is the proportion of trophic eggs laid similar between individual queens?

      No, the data were not shown, and our data are not strong enough to make this point. We have therefore removed the sentence “This higher percentage was due to a reduced number of reproductive eggs, the number of trophic eggs laid remained stable” from the manuscript.

      Figure 6B: Do several colonies produce 100% gynes despite receiving trophic eggs? It would be interesting if the authors discussed why this might occur (e.g., the larvae are already fully determined to be queens and not responsive to whatever signal is in the trophic eggs).

      The reviewer is correct that 4 colonies produced 100% gynes despite receiving trophic eggs. However, the number of individuals produced in these four colonies was small (2,1,2,1, see supplementary Table 2). So, it is likely that it is just by chance that these colonies produced only gynes.

      Figure 5: Why a separation by "size distribution variation of miRNA"? What is the relevance of looking at size distributions as opposed to levels?

      We did that because there are many different miRNA species, reflected by the fact that there is not just one size peak but multiple ones. This is why we looked at the size distribution.

      Figure 2: The image of the viable embryo is not clear. If possible, redo the viable to show better quality images.

      Unfortunately, we no longer have colonies in the laboratory, so this is not possible.

      Comments on discussion:

      Lines 236-247: Can an explanation be provided as to why the effect of trophic eggs in P. rugosus is the opposite of those observed by studies referenced in this section? Could P. rugosus have any life history traits that might explain this observation?

      In the two mentioned studies there were other factors that co-varied with variation in the quantity of trophic eggs. We mentioned that and suggested that it would be useful to conduct experimental manipulation of the quantity of trophic eggs in the Argentine ant and P. barbatus (the two species where an effect of trophic eggs had been suggested).

      The discussion should include implications and future research of the discovery.

      We made some suggestions for experiments that should be performed in the future.

      The conclusion paragraph is too short and does not represent what was discussed.

      We added two sentences at the end of the paragraph to make suggestions of future studies that could be performed.

      Lines 231 to 247: Drastically reduce and move this whole part to the introduction to substantiate the assumption that trophic eggs play a nutritional role.

      We moved most of this paragraph to the introduction, as suggested by the reviewer.

      Reviewer #3 (Recommendations For The Authors):

      I would like to commend the authors on their study. The main findings of the paper are individually solid and provide novel insight into caste determination and the nature of trophic eggs. However, the inferences made from much of the data and connections between independent lines of evidence often extend too far and are unsubstantiated.

      We thank the reviewer for the positive comment. We made many changes in the manuscript to improve the discussion of our results.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript submission by Zhao et al. entitled, "Cardiac neurons expressing a glucagon-like receptor mediate cardiac arrhythmia induced by high-fat diet in Drosophila" the authors assert that cardiac arrhythmias in Drosophila on a high-fat diet are due in part to adipokinetic hormone (Akh) signaling activation. High-fat diet induces Akh secretion from activated endocrine neurons, which activate AkhR in posterior cardiac neurons. Silencing or deletion of Akh or AkhR blocks arrhythmia in Drosophila on a high-fat diet. Elimination of one of two AkhR-expressing cardiac neurons results in arrhythmia similar to a high-fat diet.

      Strengths:

      The authors propose a novel mechanism for high-fat diet-induced arrhythmia utilizing the Akh signaling pathway that signals to cardiac neurons.

      Weaknesses:

      Major comments:

      (1) The authors state, "Arrhythmic pathology is rooted in the cardiac conduction system." This assertion is incorrect as a blanket statement on arrhythmias. There are certain arrhythmias that have been attributable to the conduction system, such as bradycardic rhythms, heart block, sinus node reentry, inappropriate sinus tachycardia, AV nodal reentrant tachycardia, bundle branch reentry, fascicular ventricular tachycardia, or idiopathic ventricular fibrillation to name a few. However the etiological mechanism of many atrial and ventricular arrhythmias, such as atrial fibrillation or substrate-based ventricular tachycardia, are not rooted in the conduction system. The introduction should be revised to reflect a clear focus (away from?) on atrial fibrillation (AF). In addition, AF susceptibility is known to be modulated by autonomic tone, which is topically relevant (irrelevant?) to this manuscript.

      Thank you for the helpful comment. We rephrased the sentence as “Arrhythmic pathology is often rooted in the cardiac conduction system”.

      (2) The authors state that "HFD led to increased heartbeat and an irregular rhythm." In representative examples shown, HFD resulted in pauses, slower heart rate, and increased irregularity in rhythm but not consistently increased heart rate (Figures 1B, 3A, and 4C). Based on the cited work by Ocorr et al (https://doi.org/10.1073/pnas.0609278104), Drosophila heart rate is highly variable with periods of fast and slow rates, which the authors attributed to neuronal and hormonal inputs. Ocorr et al then describe the use of "semi-intact" flies to remove autonomic input to normalize heart rate. Were semi-intact flies used? If not, how was heart rate variability controlled? And how was heart rate "increase" quantified in high-fat diet compared to normal-fat diet? Lastly, how does one measure "arrhythmia" when there is so much heart rate variability in normal intact flies?

      We also observed that fly heart rate is highly variable with periods of fast and slow rates. To control heart rate variability, Ocorr et al. used semi-intact flies to record the heartbeat (https://doi.org/10.1073/pnas.0609278104). We consider it a rigorous method to get highly consistent results with high-quality videos/images. Since our work focuses on the neuronal inputs to the heart, we did not use the semi-intact method. Our concern is that the dissection is likely to disrupt the neuronal processes. Using OCT, we recorded the heartbeat of intact flies in an 8 s time window, when the heartbeat was relatively stable. The different groups of flies, which were fed on a high-fat diet or a normal-fat diet, were recorded using the same method. Thus, we could compare the differences in heart rate.

      (3) The authors state, "to test whether the HFD-induced increase in Akh in the APC affects APC neuron activity, we used CaLexA (https://doi.org/10.3109/01677063.2011.642910)." According to the reference, CaLexA is a tool to map active neurons and would not indicate, as the authors state, whether Akh affects APC neuron activity specifically. It is equally possible that APC neurons may be activated by HFD and produce more Akh. Please clarify this language.

      Thank you for clarifying the calcium reporter, CaLexA. We rephrased this sentence to “to test whether HFD affects APC neuron activity, we used CaLexA”.

      (4) Are the AkhR+ neurons parasympathetic or sympathetic? Please provide additional experimentation that characterizes these neurons. The AkhR+ neurons appear to be anti-arrhythmic. Please expand the discussion to include a working hypothesis of the overall findings on Akh, AkhR, and AkhR+ neurons.

      Noyes et al. showed that Akh treatment increases heartbeat (Noyes, B. E., F. N. Katz, and M. H. Schaffer. 1995. “Identification and Expression of the Drosophila Adipokinetic Hormone Gene.” Molecular and Cellular Endocrinology 109 (2): 133–41.), suggesting that AkhR+ neurons are sympathetic. We showed that high-fat diet induced Akh expression and secretion, which led to stimulation of AkhR+ neurons and increased heart rate, supporting the sympathetic role of the AkhR+ neurons. Additional explanation of the sympathetic and anti-arrhythmic role of Akh, AkhR, and AkhR+ neurons was added to the discussion.

      (5) The authors state, "Heart function is dependent on glucose as an energy source." However, the heart's main energy source is fatty acids with minimal use of glucose (doi: 10.1016/j.cbpa.2006.09.014). Glucose becomes more utilized by cardiomyocytes under heart failure conditions. Please amend/revise this statement.

      Thank you for pointing this out and providing the reference. We rephrased this sentence as: “Heart function is dependent on continuous ATP production. Cardiac ATP in Drosophila might come from fatty acids, glucose, and lactate (Kodde et al., 2007), as well as trehalose.”

      Reviewer #2 (Public Review):

      This manuscript explores mechanisms underlying heart contractility problems in metabolic disease using Drosophila as a model. They confirm, as others have demonstrated, that a high-fat diet (HFD) induces cardiac problems in flies. They showed that a high-fat diet increased Akh mRNA levels and calcium levels in the Akh-producing cells (APC), suggesting there is increased production and release of this hormone in a HFD context. When they knock down Akh production in the APCs using RNAi they see that cardiac contractility problems are abolished. They similarly show that levels of the Akh receptor (Akhr) are increased on a HFD and that loss of Akhr also rescues contractility problems on a HFD.

      One highlight of the paper was the identification of a pair of neurons that express a receptor for the metabolic hormone Akh, and showing initial data that these neurons innervate the cardiac muscle. They then overexpress cell death gene reaper (rpr) in all Akhr-positive cells with Akhr-GAL4 and see that cardiac contractility becomes abnormal.

      However, this paper contains several findings that have been reported elsewhere and it contains key flaws in both experimental design and data interpretation. There is some rationale for doing the experiments, and the data and images are of good quality. However, others have shown that HFD induces cardiac contractility problems (Birse 2010), that Akh mRNA levels are changed with HFD (Liao 2021) that Akh modulates cardiac rhythms (Noyes 1995), so Figures 1-4 are largely a confirmation of what is already known. This limits the overall magnitude of the advances presented in these figures. Overall, the stated concerns limit the impact of the manuscript in advancing our understanding of heart contractility.

      We thank the reviewer for the positive comments and appreciate the instructive suggestions. Birse 2010 (PMID: 21035763) was cited in our manuscript. Liao 2021 showed that Akh mRNA levels are changed with HFD. We added the reference to the revised manuscript and modified the text as: “Consistent with previous work (Liao et al., 2020), we showed that the expression of Akh was significantly up-regulated in the flies fed a HFD, compared to NFD-fed flies (Figure 2B)”. Our qPCR verified Liao’s results. On top of this, we investigated the calcium levels in the Akh-producing cells (APCs) and showed elevated calcium levels in the APCs in HFD-fed flies. In the revised version, we added more data to show that Akh protein levels were increased with HFD (Figure 2E-F). In line with Noyes' discovery, which showed that Akh injection caused cardioacceleration in prepupae, we showed that genetic manipulation of Akh expression affected heartbeat in the adults.

      Reviewer #3 (Public Review):

      Zhao et al. provide new insights into the mechanism by which a high-fat diet (HFD) induces cardiac arrhythmia employing Drosophila as a model. HFD induces cardiac arrhythmia in both mammals and Drosophila. Both glucagon and its functional equivalent in Drosophila Akh are known to induce arrhythmia. The study demonstrates that Akh mRNA levels are increased by HFD and both Akh and its receptor are necessary for high-fat diet-induced cardiac arrhythmia, elucidating a novel link. Notably, Zhao et al. identify a pair of AKH receptor-expressing neurons located at the posterior of the heart tube. Interestingly, these neurons innervate the heart muscle and form synaptic connections, implying their roles in controlling the heart muscle. The study presented by Zhao et al. is intriguing, and the rigorous characterization of the AKH receptor-expressing neurons would significantly enhance our understanding of the molecular mechanism underlying HFD-induced cardiac arrhythmia.

      Many experiments presented in the manuscript are appropriate for supporting the conclusions while additional controls and precise quantifications should help strengthen the authors' arguments. The key results obtained by loss of Akh (or AkhR) and genetic elimination of the identified AkhR-expressing cardiac neurons do not reconcile, complicating the overall interpretation.

      It is intriguing to see an increase in Akh mRNA levels in HFD-fed animals. This is a key result for linking HFD-induced arrhythmia to Akh. Thus, demonstrating that HFD also increases the Akh protein levels and Akh is secreted more should significantly strengthen the manuscript.

      Thank you for the positive comments and the instructive suggestions. We performed immunostaining to show that Akh protein levels increased, which is consistent with elevated Akh mRNA expression in HFD-fed flies. The data was added to Figure 2, panels E and F. Akh secretion from the APCs is regulated by APC activity (https://doi.org/10.1038/s41586-019-1675-4). We used a calcium reporter CaLexA (https://doi.org/10.3109/01677063.2011.642910) to monitor APC activity and showed that HFD increased APC activity (Figure 2, C-D).

      The experiments employing an AkhR null allele nicely demonstrate its requirement for HFD-induced cardiac arrhythmia. Depletion of Akh in Akh-expressing cells recapitulates the consequence of AkhR knockout, supporting that both Akh and its receptor are required for HFD-induced cardiac arrhythmia. Given that RNAi is associated with off-target effects and some RNAi reagents do not work, testing multiple independent RNAi lines is the standard procedure. It is also important to show the on-target effect of the RNAi reagents used in the study.

      Indeed, RNAi approaches can suffer from off-target effects. For Akh experiments, we used an RNAi line, BL_34960, which was generated using artificial microRNA-based shRNA (DOI: 10.1038/nmeth.1592). In comparison to long-hairpin constructs, shRNA constructs are expected to be advantageous, e.g., more efficient and with minimized off-target effects. We performed immunostaining to determine Akh-Gal4>UAS-Akh-RNAi efficiency. We showed that anti-Akh fluorescence diminished in Akh-Gal4>UAS-Akh-RNAi APCs. The data was added to Figure 3-figure supplement 1.

      The most exciting result is the identification of AkhR-expressing neurons located at the posterior part of the heart tube (ACNs). The authors attempted to determine the function of ACNs by expressing rpr with AkhR-GAL4, which would induce cell death in all AkhR-expressing cells, including ACNs. The experiments presented in Figure 6 are not straightforward to interpret. Moreover, the conclusion contradicts the main hypothesis that elevated Akh is the basis of HFD-induced arrhythmia. The results suggest the importance of AkhR-expressing cells for normal heartbeat. However, elimination of Akh or AkhR restores normal rhythm in HFD-fed animals, suggesting that Akh and AkhR are not important for maintaining normal rhythms. If Akh signaling in ACNs is key for HFD-induced arrhythmia, genetic elimination of ACNs should leave normal rhythm unaltered and rescue the HFD-induced arrhythmia. An important caveat is that the experiments do not test the specific role of ACNs. ACNs should be just a small part of the cells expressing AkhR. The experiments presented in Figure 6 cannot justify the authors' conclusion. Specific manipulation of ACNs will significantly improve the study. Moreover, the main hypothesis suggests that HFD may alter the activity of ACNs in a manner dependent on Akh and AkhR. Testing how HFD changes calcium, possibly by CaLexA (Figure 2) and/or GCaMP, in wild-type and AkhR mutants could be a way to connect ACNs to HFD-induced arrhythmia. Moreover, optogenetic manipulation of ACNs will allow for specific manipulation of ACNs, which is crucial for studying the specific role of ACNs in controlling cardiac rhythms.

      Thank you for the insightful comments. We have been trying to find a way to target only the AkhR neurons using split-Gal4. So far, this has not been successful. Akh/AkhR signaling is expected to play a key role in the ACNs; however, we cannot rule out the possibility that ACNs also receive signals other than Akh in the modulation of heartbeat.

      Interestingly, expressing rpr with AkhR-GAL4 was insufficient to eliminate both ACNs. It is not clear why it didn't eliminate both ACNs. Given the incomplete penetrance, appropriate quantifications should be helpful. Additionally, the impact on other AkhR-expressing cells should be assessed. Adding more copies of UAS-rpr, AkhR-GAL4, or both may eliminate all ACNs and other AkhR-expressing cells. The authors could also try UAS-hid instead of UAS-rpr.

      We added more data to show that AkhR+ neurons are positive in anti-Akh staining, indicating the AkhR+ neurons indeed receive Akh.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Typo in line 765: "increased Akh section into the circulation." Section should be secretion.

      Thank you for finding the typo. We changed section to secretion.

      Reviewer #2 (Recommendations For The Authors):

      One interesting extension to our knowledge in Figures 3 & 4 is that loss of Akhr and loss of Akh both block the cardiac contractility defects that accompany a HFD. The main concern I have with the Akh finding is that the authors use only a GAL4 control and no UAS alone control. Metabolic phenotypes often show strain-specific effects, so to make conclusions it is essential that the authors include a UAS alone control alongside the other genotypes to be sure it does not rescue the cardiac contractility defects that accompany a HFD by itself.

      I am interested in the authors' identification of a pair of Akhr-positive neurons that innervate the cardiac muscle. I am not aware of any other studies identifying these neurons, or revealing their function. The contents of Figure 5 therefore represent the largest advance in the study. However, the characterization of these neurons is very superficial, and a lot more work to understand their regulation and function in a HFD context is needed to make conclusions about their role in any HFD-induced cardiac contractility problems. Or to determine how Akh influences the function of these specific neurons in an HFD context.

      The reason I say this is that the authors ablate all Akhr-positive cells in Figure 6 and show that this disturbs normal cardiac contractility. While studies on the one pair of Akhr-positive neurons would be really interesting, ablating all Akhr-positive cells, which includes the fat and many other cell types in the fly, is not a scientifically rigorous approach to answering this question. As a result, the authors are only able to make the claim that ablating many cell types throughout the animal disrupts cardiac contractility, which does not advance our understanding of mechanisms underlying heart contractility problems. In addition, because the experiments they designed did not test whether it was Akh binding to Akhr on those neurons that regulate cardiac contractility problems in a HFD context, their experiments do not support their model in Figure 7.

      The authors also make conclusions that are fairly speculative around Line 231 when describing their model in Figure 7. These claims are simply not supported by the data they present and must be removed. For example, the authors have not identified an endocrine-heart axis, they simply showed that changes in Akh can influence the heart, but this is not necessarily a direct effect on a specific cell type. They do not show data that Akh binds the newly identified Akhr-positive neuron pair to mediate the effects of HFD-induced contractility defects - they just ablate all Akhr-positive cells (fat, neurons, and other types) and show cardiac defects. If those neurons did mediate the abnormal cardiac rhythm promoted by Akh, then ablating those neurons (and not a large number of additional tissues) should rescue HFD-induced heart defects just like reducing Akhr or Akh did (but this is the opposite of what they see). Overall, concerns with experimental design, data interpretation, and relatively few findings that aren't reported elsewhere reduce the impact of this paper.

      We appreciate the positive comments and helpful suggestions. Indeed, it is important to get clean genetic access to the cardiac neurons. We intended to use the split-Gal4 system to target the AkhR cardiac neurons and tried to build a split-Gal4 driver, AkhR-p65.AD. Two rounds of injection were carried out. However, we did not recover a transgenic line.

      In the revised version, we performed immunostaining using Akh antibodies to show that anti-Akh fluorescence was observed in AkhR neurons (Figure 5-figure supplement 1), indicating an endocrine-heart axis.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Duilio M. Potenza et al. explores the role of Arginase II in cardiac aging, majorly using whole-body arg-ii knock-out mice. In this work, the authors have found that Arg-II exerts non-cell-autonomous effects on aging cardiomyocytes, fibroblasts, and endothelial cells mediated by IL-1b from aging macrophages. The authors have used arg II KO mice and an in vitro culture system to study the role of Arg II. The authors have also reported the cell-autonomous effect of Arg-II through mitochondrial ROS in fibroblasts that contribute to cardiac aging. These findings are sufficiently novel in cardiac aging and provide interesting insights. While the phenotypic data seems strong, the mechanistic details are unclear. How Arg II regulates the IL-1b and modulates cardiac aging is still being determined. The authors still need to determine whether Arg II in fibroblasts and endothelial contributes to cardiac fibrosis and cell death. This study also lacks a comprehensive understanding of the pathways modulated by Arg II to regulate cardiac aging.

      We sincerely appreciate the valuable feedback provided by the reviewer. It's gratifying to hear that our work provided novel information on the role of arginase-II in cardiac aging, which is a complex process involving various cell types and mechanisms. We have devoted considerable effort to performing new experiments to address the reviewer's comments and to delineate more detailed mechanisms of Arg-II in cardiac aging. Please see our specific answers to each of the reviewers' points below.

      Strengths:

      This study provides interesting information on the role of Arg II in cardiac aging.

      The phenotypic data in the arg II KO mice is convincing, and the authors have assessed most of the aging-related changes.

      The data is supported by an in vitro cell culture system.

      We appreciate this reviewer’s positive assessment on the strength of our study.

      Weaknesses:

      The manuscript needs more mechanistic details on how Arg II regulates IL-1b and modulates cardiac aging.

      We made great effort and have performed new experiments in a human monocyte cell line (THP1) in which iNOS is not expressed and not inducible by LPS and the arg-ii gene was knocked out by CRISPR technology. Moreover, murine bone-marrow-derived macrophages in which the inos gene was ablated were also used for this purpose. We found that in the human THP1 monocytes in which Arg-II but not iNOS is induced by LPS (100 ng/mL for 24 hours) (Suppl. Fig. 6A), mRNA and protein levels of IL-1b precursor are markedly reduced in arg-ii knockout THP1<sup>arg-ii<sup>-/-</sup></sup> as compared to the THP1<sup>wt</sup> cells (Suppl. Fig. 6B and 6C), further confirming that Arg-II promotes IL-1b production as also shown in RAW264.7 macrophages (Suppl. Fig. 5A and 5C). Moreover, in the mouse bone-marrow-derived macrophages, LPS-induced IL-1b production is inhibited by inos deficiency (BMDM<sup>inos-/-</sup> vs BMDM<sup>wt</sup>) (Suppl. Fig. 6D and 6E), while Arg-II levels are slightly enhanced in the BMDM<sup>inos-/-</sup> cells (Suppl. Fig. 6D and 6F). Altogether, these results suggest that iNOS slightly reduces Arg-II expression. Arg-II and iNOS can be upregulated by LPS independently. Both Arg-II and iNOS are required for IL-1b production upon LPS stimulation, as illustrated in Suppl. Fig. 6G. For detailed results and discussion, please see the answers to comment points 2 and 6 raised by this reviewer.

      The authors used whole-body KO mice, and the role of macrophages in cardiac aging is not studied in this model. A macrophage-specific arg II Ko would be a better model.

      We fully agree with this comment of the reviewer. Unfortunately, a macrophage-specific arg-ii knockout animal model is not yet available. Future research should develop the macrophage-specific arg-ii<sup>-/-</sup> mouse model to confirm this conclusion with aging animals. Since Arg-II is also expressed in fibroblasts and endothelial cells and exerts cell-autonomous and paracrine functions, aging mouse models with conditional arg-ii knockout in the specific cell types would be the next step to elucidate the cell-specific function of Arg-II in cardiac aging. We have pointed out this aspect for future research on page 19, lines 2 to 6.

      Experiments need to validate the deficiency of Arg II in cardiomyocytes.

      As pointed out by this reviewer in comment point 10, Arg-II was previously reported to be expressed in isolated cardiomyocytes from rats (PMID: 16537391). Unfortunately, negative controls, i.e., arg-ii<sup>-/-</sup> samples, were not included in that study to rule out possible background signals. We made great effort to investigate whether Arg-II is present in cardiomyocytes from different species including mice, rats and humans and have included old arg-ii<sup>-/-</sup> mouse samples as a negative control. This allows us to validate the antibody specificity and background noise beyond any reasonable doubt. The new experiments in Suppl. Fig. 4 confirm the specificity of the antibody against Arg-II in old mouse kidney, which is known to express Arg-II in the S3 proximal tubular cells (Huang J, et al. 2021). To exclude possible species-specific differences in Arg-II expression in cardiomyocytes, aged mouse and rat heart tissues were used for cellular localization of Arg-II by confocal immunofluorescence staining. As shown in Suppl. Fig. 4B and 4C, both species show Arg-II expression only in non-cardiomyocytes (cells between striated cardiomyocytes) (red arrows) but not in striated cardiomyocytes. Even in the rat myocardial infarction tissues, Arg-II was not found in cardiomyocytes but in endocardium cells (Suppl. Fig. 4B). In isolated cardiomyocytes exposed to hypoxia, a well-known strong stimulus for Arg-II protein levels, no Arg-II signals could be detected, while in fibroblasts from the same animals, elevated Arg-II levels under hypoxia are demonstrated (Fig. 5B). Furthermore, even RT-qPCR could not detect arg-ii mRNA in cardiomyocytes, only in non-cardiomyocytes (Fig. 5C). Altogether, these results demonstrate that Arg-II is not expressed, or is expressed at negligible levels, in cardiomyocytes but is expressed in non-cardiomyocytes. These new experiments with rat heart are included in the method section on page 20, the 1st paragraph. The results are described on page 7, the 1st paragraph, and discussed on page 12, the 2nd paragraph. Legend to Suppl. Fig. 4 is included in the file “Suppl. figure legend_R”.

      The authors have never investigated the possibility of NO involvement in this mice model.

      As mentioned above, we made great effort and have performed new experiments in a human monocyte cell line (THP1) in which iNOS is not expressed and not inducible by LPS and the arg-ii gene was knocked out by CRISPR technology. Moreover, murine bone-marrow-derived macrophages in which the inos gene was ablated were also used for this purpose. The results show that Arg-II and iNOS can be upregulated by LPS independently of each other and that iNOS slightly reduces Arg-II expression. However, both Arg-II and iNOS are required for IL-1b production upon LPS stimulation. For detailed results and discussion, please see the answers to comment points 2 and 6 raised by this reviewer.

      A co-culture system would be appropriate to understand the non-cell-autonomous functions of macrophages.

      We appreciate the suggestion by this reviewer regarding the co-culture system to test the non-cell-autonomous role of Arg-II. We think that our current model, which involves treating cells with conditioned media, is a well-established and effective method for demonstrating the non-cell-autonomous role of Arg-II. This approach allows us to observe the effects of Arg-II on surrounding cells through the factors present in the conditioned media released from macrophages. The co-culture system could be considered if the released factor in the conditioned medium were not stable. This is, however, not the case. Therefore, we are confident that our experimental model with conditioned medium is sufficient to demonstrate a paracrine effect of cell-cell interaction (please also see the answer to comment point 16).

      The Myocardial infarction data shown in the mice model may not be directly linked to cardiac aging.

      As we have introduced and discussed in the manuscript, aging is a predominant risk factor for cardiovascular disease (CVD). Studies in experimental animal models and in humans provide evidence that the aging heart is more vulnerable to stressors such as ischemia/reperfusion injury and myocardial infarction than the heart of young individuals. Even in the heart of apparently healthy individuals of old age, chronic inflammation, cardiomyocyte senescence, cell apoptosis, interstitial/perivascular tissue fibrosis, endothelial dysfunction and endothelial-mesenchymal transition (EndMT), and cardiac dysfunction with either preserved or reduced ejection fraction are observed. Our study aims to investigate the role of Arg-II in the cardiac aging phenotype and in age-associated cardiac vulnerability to stressors. Therefore, cardiac functional changes and myocardial infarction in response to ischemia/reperfusion injury are suitable surrogate parameters for this purpose.

      Reviewer #2 (Public Review):

      Summary:

      The results from this study demonstrated a cell-specific role of mitochondrial enzyme arginase-II (Arg-II) in heart aging and revealed a non-cell-autonomous effect of Arg-II on cardiomyocytes, fibroblasts, and endothelial cells through the crosstalk with macrophages via inflammatory factors, such as by IL-1b, as well as a cell-autonomous effect of Arg-II through mtROS in fibroblasts contributing to cardiac aging phenotype. These findings highlight the significance of non-cardiomyocytes in the heart and bring new insights into the understanding of pathologies of cardiac aging. It also provides new evidence for the development of therapeutic strategies, such as targeting the ArgII activation in macrophages.

      We're grateful for the reviewer's positive feedback, acknowledging the significant findings of our study on the role of arginase-II (Arg-II) in cardiac aging. We appreciate this reviewer’s insight into the therapeutic potential of targeting Arg-II activation in macrophages and are excited about the implications for future interventions in age-related cardiac pathologies. Thank you for recognizing the importance of our work in advancing our understanding of cardiac aging and potential therapeutic strategies.

      Strengths:

      This study targets an important clinical challenge, and the results are interesting and innovative. The experimental design is rigorous, the results are solid, and the representation is clear. The conclusion is logical and justified.

      We thank this reviewer for the positive comment.

      Weaknesses:

      The discussion could be extended a little bit to improve the realm of the knowledge related to this study.

      We appreciate this comment and have added and revised our discussion on this aspect accordingly at the end of the discussion section on page 19.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have several critical concerns, specifically about the mechanism of how Arg-II plays a role in cardiac aging.

      My major concerns are:

      (1) The authors have shown non-cell-autonomous effects on aging cardiomyocytes, fibroblasts, and endothelial cells mediated by IL-1b from aging macrophages. A macrophage-specific Arg-II knock-out mouse model is a suitable and necessary control to establish claims.

      We fully agree with this comment of the reviewer. Unfortunately, a macrophage-specific arg-ii knockout animal model is not yet available. Future research should develop the macrophage-specific arg-ii<sup>-/-</sup> mouse model to confirm this conclusion with aging animals. Since Arg-II is also expressed in fibroblasts and endothelial cells and exerts cell-autonomous and paracrine functions, aging mouse models with conditional arg-ii knockout in the specific cell types would be the next step to elucidate the cell-specific function of Arg-II in cardiac aging. We have pointed out this aspect for future research on page 19, lines 2 to 6.

      (2) This study suggests that Arg-II exerts its effect through IL-1b in cardiac ageing. However, all experiments performed to demonstrate the link between ArgII and IL-1β are correlative at best. The underlying molecular mechanism, including transcription factors involved in the regulation of IL-1β by arg-ii, has not been demonstrated.

      We sincerely appreciate this reviewer’s comment on this aspect! To make it clear, a causal role of Arg-II in promoting IL-1β production in macrophages is evidenced by the experimental results showing that old arg-ii<sup>-/-</sup> mouse heart has lower IL-1β levels than the age-matched wt mouse heart (Fig. 6A to 6D). We further showed that the cellular IL-1β protein levels and release are reduced in old arg-ii<sup>-/-</sup> mouse splenic macrophages as compared to the wt cells (Fig. 7A, 7C, and 7D). This result is further confirmed with the mouse macrophage cell line RAW264.7 (Suppl. Fig. 5A and Suppl. Fig. 5C), in which we demonstrate that silencing arg-ii reduces IL-1β levels stimulated with LPS.

      Following this reviewer’s comment (see comment point 6), we made further effort to investigate the possible involvement of iNOS in Arg-II-regulated IL-1β production in macrophages stimulated with LPS. We performed new experiments in a human monocyte cell line (THP1) in which iNOS is not expressed and not inducible by LPS and the arg-ii gene was knocked out by CRISPR technology.

      Moreover, murine bone-marrow-derived macrophages in which the inos gene was ablated were also used for this purpose. We found that in the human THP1 monocytes in which Arg-II but not iNOS is induced by LPS (100 ng/mL for 24 hours) (Suppl. Fig. 6A), mRNA and protein levels of IL-1b are markedly reduced in arg-ii knockout THP1<sup>arg-ii<sup>-/-</sup></sup> as compared to the THP1<sup>wt</sup> cells (Suppl. Fig. 6B and 6C), further confirming that Arg-II promotes IL-1b production as also shown in RAW264.7 macrophages (Suppl. Fig. 5A and 5C). The results suggest that Arg-II promotes IL-1b production independently of iNOS. Moreover, the role of iNOS in IL-1b production was also studied in the mouse bone-marrow-derived macrophages in which the inos gene is ablated. The results demonstrate that LPS-induced IL-1b production is inhibited by inos deficiency (BMDM<sup>inos-/-</sup> vs BMDM<sup>wt</sup>) (Suppl. Fig. 6D and 6E), while Arg-II levels are slightly enhanced in the BMDM<sup>inos-/-</sup> cells (Suppl. Fig. 6D and 6F). Since arginase and iNOS share the same metabolic substrate L-arginine, <sup>inos-/-</sup> is expected to increase IL-1b production. This is, however, not the case. A strong inhibition of IL-1β production in <sup>inos-/-</sup> macrophages is observed. These results imply that iNOS promotes IL-1β production independently of Arg-II and that the inhibiting effect of inos deficiency on IL-1β is dominant and able to counteract Arg-II’s stimulating effect on IL-1β production. Hence, our results demonstrate that Arg-II promotes IL-1β production in macrophages independently of iNOS. Altogether, these results suggest that iNOS slightly reduces Arg-II expression. Arg-II and iNOS can be upregulated by LPS independently. Both Arg-II and iNOS are required for IL-1b production upon LPS stimulation (this concept is illustrated in Suppl. Fig. 6G). The new results are described on page 8, the last paragraph and page 9, the 1st paragraph, and presented in Suppl. Fig. 6. The legend to Suppl. Fig. 6 is described in the file “Supplementary figure legend-R”. The related experimental methods are updated on page 23, the last two paragraphs and page 26, the last paragraph. The results are discussed on page 14, the last paragraph and page 15, the first two paragraphs.

      (3) Figure 2: The authors have not validated the whole-body Arg-II knock-out mice for arg-ii ablation.

      Thanks for pointing out this missing information! We have added the information regarding genotyping of the mice in the method section on page 20, first paragraph. Moreover, Fig. 5C also confirms the genotyping of the non-cardiomyocyte cells isolated from wt and arg-ii<sup>-/-</sup> animals.

      (4) It is unclear why the authors have chosen to focus on IL-1β specifically, among other pro-inflammatory cytokines that were also downregulated in Arg-II-/- mice as demonstrated in Fig. 2A-D.

      We appreciate the reviewer's question, which provides an opportunity to delve deeper into our findings. In our investigation, we observed that aging is accompanied by elevated levels of various proinflammatory markers. Intriguingly, our data revealed that tnf-α remained unaffected by the ablation of arg-ii during aging in the heart tissues, while Il-1β showed a significant reduction in arg-ii<sup>-/-</sup> animals compared to age-matched wild-type (wt) mice (Fig. 2). Mcp1 is, however, a chemoattractant for macrophages and F4-80 serves as a pan marker for macrophages. Moreover, our previous studies demonstrate a relationship between Arg-II and IL-1β in vascular disease and obesity and in age-associated renal and pulmonary fibrosis. Finally, IL-1β has been shown to play a causal role in patients with coronary atherosclerotic heart disease, as shown by the CANTOS trials. Therefore, we have focused on IL-1β in this study. We have now explained and strengthened this aspect in the manuscript on page 7, the last two lines and page 8, the 1st paragraph as follows:

      “Taking into account that our previous studies demonstrated a relationship of Arg-II and IL-1β in vascular disease and obesity (Ming et al., 2012) and in age-associated organ fibrosis such as renal and pulmonary fibrosis (Huang et al., 2021; Zhu et al., 2023), and IL-1β has been shown to play a causal role in patients with coronary atherosclerotic heart disease as shown by CANTOS trials (Ridker et al., 2017), we therefore focused on the role of IL-1β in crosstalk between macrophages and cardiac cells such as cardiomyocytes, fibroblasts and endothelial cells”.

      (5) Although macrophages are shown to be involved in cardiac ageing in the arg-ii mouse model, the authors have not estimated macrophage infiltration and expression of inflammatory or senescence markers in the hearts of these mice.

      Thank you very much for raising this important point! Taking the comments of the reviewer into account, we have performed new experiments, i.e., multiple immunofluorescent staining to analyze the infiltrated (CCR2<sup>+</sup>/F4-80<sup>+</sup>) and resident (LYVE1<sup>+</sup>/F4-80<sup>+</sup>) macrophage populations and to investigate to what extent Arg-II affects the infiltrated and resident macrophage populations in the aging heart and whether this is regulated by arg-ii<sup>-/-</sup>. The results show an age-associated increase in the numbers of F4/80<sup>+</sup> cells in the wt mouse heart, which is reduced in the age-matched arg-ii<sup>-/-</sup> animals (Fig. 2G). This result is in accordance with the result of f4/80 gene expression shown in Fig. 2A, demonstrating that arg-ii gene ablation reduces macrophage accumulation in the aging heart. Interestingly, resident macrophages as characterized by LYVE1<sup>+</sup>/F4-80<sup>+</sup> cells (Fig. 2E and 2H) are predominant in the aging heart as compared to the infiltrated CCR2<sup>+</sup>/F4-80<sup>+</sup> cells (Fig. 2F and 2I). The increase in both LYVE1<sup>+</sup>/F4-80<sup>+</sup> and CCR2<sup>+</sup>/F4-80<sup>+</sup> macrophages in the aging heart is reduced in arg-ii<sup>-/-</sup> mice (Fig. 2E, 2F, 2H, and 2I). These new results are described on page 6, the 1st paragraph, presented in Fig. 2E to 2I, and discussed on page 13, the 2nd paragraph. The legend to Fig. 2 is revised. The method for this additional experiment is included on page 22, the 1st paragraph.

      Moreover, the age-associated accumulation of senescent cells, as demonstrated by p16<sup>ink4</sup>-positive cells, is significantly reduced in arg-ii<sup>-/-</sup> animals. This new result is incorporated in Fig. 1 as Fig. 1G and 1H and described and discussed on page 5, the 2nd paragraph and page 14, the second-to-last sentence of the 1st paragraph. The method of p16<sup>ink4</sup> staining is included in the method section on page 22, the 1st paragraph, line 7. The legend to Fig. 1 is revised accordingly.

      (6) Previously, Arg-II has been reported to serve a crucial role in ageing associated with reduced contractile function in rat hearts by regulating Nitric Oxide Synthase (PMID: 22160208). Elevated NO and superoxide have been shown to play crucial roles in the etiology of cardiovascular diseases (PMID: 24180388). Therefore, it is important to assess whether Nitric Oxide (NO) is involved in the aging-related phenotype in this mouse model.

      Following the reviewer's suggestion, we conducted new experiments to investigate the role of nitric oxide (NO) in the context of the effect of Arg-II-induced IL-1b production in macrophages. We have addressed this question in the response to the comment point 2.

      (7) Based on the results demonstrated in the study, ablation of Arg-II can be expected to cause a reduction in inflammation-associated phenotypes throughout the body at the multi-organ level. The observed improved cardiac phenotype could be an outcome of whole-body Arg-II ablation. It would be fruitful to develop a cardiac-specific Arg-II knockout mouse model to establish the role of Arg-II in the heart, independent of other organ systems.

      We agree with the comment of the reviewer on this point. Unfortunately, as explained above (see point 1), it is currently not possible for us to perform the requested experiments, due to the lack of a cardiac-specific arg-ii knockout mouse model. Moreover, such an approach is complicated by the absence of Arg-II in cardiomyocytes and the expression of Arg-II in multiple cell types including endothelial cells, fibroblasts and macrophages of different origin (resident and monocyte-derived infiltrating cells). It is thus difficult to generate a cardiac-specific gene knockout mouse. One should investigate the roles of cell-specific Arg-II in cardiac aging by generating cell-specific arg-ii<sup>-/-</sup> mice. We very much appreciate this important aspect and have discussed the issue on page 19, lines 2 to 6.

      (8) Contrary to the findings in this paper, Arg-II has previously been reported to be essential for IL-10-mediated downregulation of pro-inflammatory cytokines, including IL-1β (PMID: 33674584).

      Thank you very much for mentioning this study! We have now thoroughly discussed the controversies as follows on page 15, the last paragraph and page 16, the 1st paragraph:

      “It is of note that a study reported that Arg-II is required for IL-10 mediated-inhibition of IL-1b in mouse BMDM upon LPS stimulation (Dowling et al., 2021), which suggests an anti-inflammatory function of Arg-II. The results of our present study, however, demonstrate that LPS enhances Arg-II and IL-1b levels in macrophages and knockout or silencing Arg-II reduces IL-1b production and release, demonstrating a pro-inflammatory effect of Arg-II. Our findings are supported by the study from another group, which shows decreased pro-inflammatory cytokine production including IL-6 and IL-1b in arg-ii<sup>-/-</sup> BMDM most likely through suppression of NFkB pathway, since arg-ii<sup>-/-</sup> BMDM reveals decreased activation of NFkB and IL-1b levels upon LPS stimulation (Uchida et al., 2023). Most importantly, our previous study also showed that re-introducing arg-ii gene back to the arg-ii<sup>-/-</sup> macrophages markedly enhances LPS-stimulated pro-inflammatory cytokine production (Ming et al., 2012), providing further evidence for a pro-inflammatory role of arg-ii under LPS stimulation. In support of this conclusion, chronic inflammatory diseases such as atherosclerosis and type 2 diabetes (Ming et al., 2012), inflammaging in lung (Zhu et al., 2023), kidney (Huang et al., 2021) and pancreas (Xiong, Yepuri, Necetin, et al., 2017) of aged animals or acute organ injury such as acute ischemic/reperfusion or cisplatin-induced renal injury are reduced in the arg-ii<sup>-/-</sup> mice (Uchida et al., 2023). The discrepant findings between these studies and that with IL-10 may implicate dichotomous functions of Arg-II in macrophages, depending on the experimental context or conditions. Nevertheless, our results strongly implicate a pro-inflammatory role of Arg-II in macrophages in the inflammaging in aging heart”.

      (9) The authors have only performed immunofluorescence-based experiments to show fibrotic and apoptotic phenotypes throughout this study. To verify these findings, we suggest that they additionally perform RT-PCR or western blotting analysis for fibrotic markers and apoptotic markers.

      The fibrotic aspect was analyzed not only by microscopy but also by using a quantitative biochemical assay, namely hydroxyproline content assessment. Hydroxyproline is a major component of collagen and largely restricted to collagen. Therefore, the measurement of hydroxyproline levels can be used as an indicator of collagen content, as previously investigated in the lung (Zhu et al., 2023). We have also measured collagen gene expression by RT-qPCR as suggested by the reviewer and found an age-related decline of collagen mRNA expression levels in both wt and arg-ii<sup>-/-</sup> mice, suggesting that the age-associated cardiac fibrosis and its prevention in arg-ii<sup>-/-</sup> mice are due to alterations of translational and/or post-translational regulations, including collagen synthesis and/or degradation. The results are in accordance with those reported by other studies published in the literature. We have pointed out this aspect on page 5, the 2nd paragraph:

      “The increased cardiac fibrosis in aging is however, associated with decreased mRNA levels of collagen-Ia (col-Ia) and collagen-IIIa (col-IIIa), the major isoforms of pre-collagen in the heart (Suppl. Fig. 2A and 2B), which is a well-known phenomenon in cardiac fibrotic remodelling (Besse et al., 1994; Horn et al., 2016). The results demonstrate that age-associated cardiac fibrosis and prevention in arg-ii<sup>-/-</sup> mice is due to alterations of translational and/or post-translational regulations including collagen synthesis and/or degradation”.

      The results are presented in Suppl. Fig. 2, legend to Suppl. Fig. 2 is included in the file “Suppl. figure legend_R”. Suppl. table 2 for primers is revised accordingly.

      We did not use additional markers to perform apoptotic assays with whole heart, since Fig. 3 shows good evidence that aging is associated with increased apoptotic cells in the heart and that these are significantly reduced in the arg-ii<sup>-/-</sup> mice. The reduction of TUNEL-positive (apoptotic) cells in aged arg-ii<sup>-/-</sup> mice is mainly due to a decrease in apoptotic cardiomyocytes. With the histological analysis, the apoptotic cell types can be well analysed. Moreover, biochemical assays for apoptosis such as caspase-3 cleavage with whole heart tissues cannot distinguish apoptotic cell types and may not be sensitive enough for the aging heart, due to the relatively low numbers of apoptotic cells in the aging heart as compared to the myocardial infarct model.

      (10) Figure 4: arg-ii has previously been reported to be expressed in rat cardiomyocytes (PMID: 16537391). We strongly suggest the authors verify the expression of Arg-II via immunostaining in isolated cardiomyocytes (using published protocols), and by using multiple different cardiomyocyte-specific markers for colocalization studies to prove the lack of arg-ii expression beyond a reasonable doubt.

      As pointed out by this reviewer, Arg-II was previously reported to be expressed in isolated cardiomyocytes from rats (PMID: 16537391). Unfortunately, negative controls, i.e., arg-ii<sup>-/-</sup> samples, were not included in that study to exclude possible background signals. We made a great effort to investigate whether Arg-II is present in cardiomyocytes from different species, including mice, rats and humans, and have included old arg-ii<sup>-/-</sup> mouse samples as a negative control. This allows us to validate the antibody specificity and to assess background noise beyond any reasonable doubt. The new experiments in Suppl. Fig. 4 confirm the specificity of the antibody against Arg-II in old mouse kidney, which is known to express Arg-II in the S3 proximal tubular cells (Huang J, et al. 2021). To exclude possible species-specific differences in Arg-II expression in cardiomyocytes, aged mouse and rat heart tissues were used for cellular localization of Arg-II by confocal immunofluorescence staining. As shown in Suppl. Fig. 4B and 4C, both species show Arg-II expression only in non-cardiomyocytes (cells between striated cardiomyocytes) (red arrows) but not in striated cardiomyocytes. Even in the rat myocardial infarction tissues, Arg-II was not found in cardiomyocytes but in endocardium cells (Suppl. Fig. 4B). In isolated cardiomyocytes exposed to hypoxia, a well-known strong stimulus for Arg-II protein levels, no Arg-II signal could be detected, while in fibroblasts from the same animals, elevated Arg-II levels under hypoxia are demonstrated (Fig. 5B). Furthermore, RT-qPCR could not detect arg-ii mRNA in cardiomyocytes but did detect it in non-cardiomyocytes (Fig. 5C). Altogether, these results demonstrate that Arg-II is not expressed, or is expressed at negligible levels, in cardiomyocytes but is expressed in non-cardiomyocytes. These new experiments with rat heart are included in the method section on page 20, the 1st paragraph. The results are described on page 7, the 1st paragraph, and discussed on page 12, the 2nd paragraph. Legend to Suppl. Fig. 4 is included in the file “Suppl. figure legend_R”.

      (11) Figure 6G: It may be worthwhile to supplement arg-ii<sup>-/-</sup> old cells with IL-1beta to see if there is an increase in TUNEL-positive cells.

      IL-1b is a well-known pro-inflammatory cytokine that causes apoptosis in various cell types including cardiomyocytes (Shen Y., et al., Tex Heart Inst J. 2015;42:109–116. doi: 10.14503/THIJ-14-4254; Liu Z. et al., Cardiovasc Diabetol 2015;14,125. doi: 10.1186/s12933-015-0288-y; Li Z., et al., Sci Adv 2020;6:eaay0589. doi: 10.1126/sciadv.aay0589). We very much appreciate the reviewer's interesting idea of investigating the apoptotic responses of cardiomyocytes from arg-ii<sup>-/-</sup> mice to IL-1b. We agree that cardiomyocytes from wt and arg-ii<sup>-/-</sup> mice may react differently to IL-1b, although the cardiomyocytes do not express Arg-II, as demonstrated in our present study. If this is true, it must be due to non-cell autonomous effects of a different aging microenvironment in the heart or to epigenetic modulation of the myocytes. We find this a very interesting aspect that requires further extensive investigation. Since our current study focuses on the effect of wt and arg-ii<sup>-/-</sup> macrophages on cardiomyocytes and non-cardiomyocytes, we prefer not to include this suggested aspect in our manuscript and would like to explore it in a follow-up study.

      (12) Figures 4-9: It would be interesting to see if the effect of ArgII in cardiac ageing is gender-specific. It is recommended to include experimental data with male mice in addition to the results demonstrated in female mice.

      As pointed out in the manuscript, we have focused on female mice because the age-associated increase in arg-ii expression is more pronounced in females than in males (Fig. 1A). As suggested by this reviewer, we performed additional experiments investigating the effects of arg-ii deficiency in male mice during aging, focusing on pathophysiological outcomes of ischemia/reperfusion (I/R) injury in ex vivo experiments. The ex vivo functional experiments with the Langendorff system were performed in aged male mice (see Suppl. Fig. 9). Following I/R injury, wt male mice display reduced left ventricular developed pressure (LVDP), as well as reduced inotropic and lusitropic states (expressed as dP/dt max and dP/dt min, respectively). As previously reported (Murphy et al., 2007), we also found that old male mice are more prone to I/R injury than age-matched female animals. Specifically, 15 minutes of ischemia are enough to significantly affect left ventricular contractile function in the male mice (Suppl. Fig. 9). In contrast, age-matched old female mice are relatively resistant to I/R injury, and at least 20 min of ischemia are necessary to induce a significant impairment of contractile function (Fig. 10). Similar to females, the post-I/R recovery of cardiac function is also significantly improved in the male arg-ii<sup>-/-</sup> mice as compared to age-matched wt animals. In addition to functional recovery, myocardial infarction upon I/R injury, as assessed by triphenyl tetrazolium chloride (TTC) staining, is significantly reduced in the age-matched male arg-ii<sup>-/-</sup> animals (Suppl. Fig. 9C and 9D). Altogether, these results reveal a role for Arg-II in heart function impairment during aging in both genders, with a higher vulnerability to stress in the males. These new results are presented in Suppl. Fig. 9 and described on page 10, the last paragraph, and page 11. The results are discussed on page 18, the 2nd paragraph, as follows:

      “The fact that aged females have higher Arg-II but are more resistant to I/R injury seems contradictory to the detrimental effect of Arg-II in I/R injury. It is presumable that cardiac vulnerability to injuries stressors depends on multiple factors/mechanisms in aging. Other factors/mechanisms associated with sex may prevail and determine the higher sensitivity of male heart to I/R injury, which requires further investigation. Nevertheless, the results of our study show that Arg-II plays a role in cardiac I/R injury also in males”.

      The information on the experimental methods in the male animals is included on page 20, the last paragraph and page 21, the 1st paragraph. Legend to Suppl. Fig. 9 is included in the file “Suppl. figure legend_R”.

      (13) Figure 6G: cardiomyocytes from wild-type mice, when treated with macrophages, show 0% TUNEL-positive cells. Since it is unlikely to obtain no TUNEL staining in a cell population, there may be an experimental or analytical error.

      This is now Fig. 7F and 7G. The result is due to our specific experimental procedure. After tissue digestion, cardiomyocytes were plated on laminin-coated dishes. Laminin promotes the adhesion of surviving cells. Following plating, we conducted a thorough washing process to remove damaged and partially adherent cells. This step ensures that only well-shaped, viable, and strongly adherent cells remain as bioassay cells. These “healthy” cells are then selected for the experiments. The apoptotic cells are removed by the washing, which explains the high viability of the bioassay cells. We have added this detailed information in the method section on page 24, the 2nd paragraph.

      (14) Figure 7J: Please assess whether arg-ii depletion also affects the mtROS phenotype.

      Following the suggestion of this reviewer, we performed new experiments which show that human cardiac fibroblasts (HCFs) exposed to hypoxia (1% O<sub>2</sub>, 48 hours), a known physiological trigger of Arg-II up-regulation, exhibit increased mtROS generation, which involves Arg-II (new Fig. 8M to 8P). We found that the Arg-II protein level and mtROS (assessed by mitoSOX staining) were both enhanced, accompanied by increased levels of HIF1α (Fig. 8M). Moreover, mito-TEMPO pre-incubation reduces mtROS, confirming the mitochondrial origin of the ROS. Silencing of arg-ii with rAd-mediated shRNA significantly reduces mtROS levels, demonstrating a role of Arg-II in the production of mitochondrial ROS in cardiac fibroblasts (Fig. 8M to 8P). We have included these results on page 9, the last paragraph, and discussed them on page 17, the 1st paragraph. The related method is described on page 26, the 2nd paragraph. Legend to Fig. 8 is updated on page 32.

      (15) Figure 8A-E: The authors have treated human-origin endothelial cells with mice-origin macrophage-conditioned media. It would be more suitable to treat the endothelial cells with human-origin macrophage-conditioned media.

      We acknowledge the concern regarding the use of mouse-origin macrophage-conditioned media on human-origin endothelial cells. Of note, biological cross-reactivity of cytokines from one species on cells from a different species has been reported in the literature. A fairly strict threshold of 60% amino acid identity has been observed, above which cytokines tend to cross-react, and statistically, cytokines cross-react more often as their percentage of amino acid identity increases (Scheerlinck JPY. Functional and structural comparison of cytokines in different species. Vet Immunol Immunopathol. 1999; 72:39-44. https://doi.org/10.1016/S0165-2427(99)00115-4). Taking IL-1b as an example, the 17.5 kDa mature mouse and human IL-1b share 92% aa sequence identity, suggesting a high cross-reactivity. Indeed, human IL-1b has shown biological cross-reactivity in mouse cells (Ledesma E., et al. Interleukin-1 beta (IL-1β) induces tumor necrosis factor alpha (TNF-α) expression on mouse myeloid multipotent cell line 32D cl3 and inhibits their proliferation. Cytokine. 2004; 26:66-72. https://doi.org/10.1016/j.cyto.2003.12.009). Moreover, our results also support the reported cross-reactivity between human and mouse IL-1b: the CM from mouse macrophages indeed showed biological function in human endothelial cells. The observed effects of the conditioned media from aged wild-type macrophages on endothelial cells were specifically mediated through IL-1β. This conclusion is supported by our data showing that the upregulation induced by the conditioned media was significantly reduced by the addition of an IL-1β receptor blocker.

      (16) The co-culture system would be more interesting to test the non-cell autonomous role of Arg II.

      We appreciate the suggestion by this reviewer regarding the co-culture system to test the non-cell autonomous role of Arg-II. We believe that our current model, which involves treating cells with conditioned media, is a well-established and effective method for demonstrating the non-cell autonomous role of Arg-II. This approach allows us to observe the effects of Arg-II on surrounding cells through the factors present in the conditioned media. A co-culture system would be worth considering if the released factor in the conditioned medium were unstable; this is, however, not the case. We are therefore confident that our conditioned-medium model is sufficient to demonstrate a paracrine effect of cell-cell interaction.

      Reviewer #2 (Recommendations For The Authors):

      Some minor comments may be considered to improve the realm of the knowledge related to this study.

      We appreciate this comment and have added and revised our discussion on this aspect accordingly at the end of the discussion section on page 19, the last 6 lines.

      (1) The current study showed strong evidence demonstrating the key role of cardiac macrophages in pathologies of cardiac aging, particularly, the macrophages (MФ) from the circulating blood (hematogenous). It is known that the heart is among the minority of organs in which substantial numbers of yolk-sac MФ persist in adulthood and play a crucial role in maintaining cardiac function. Thus, the adult mammalian heart contains two separate and discrete cardiac MФ subgroups, i.e., the resident MФs originated from yolk sac-derived progenitors and the hematogenous MФs recruited from circulating blood monocytes. These two subtypes of MФs may play distinctive roles in the aging heart and the response to cardiac injury. The author could extend the discussion on the possibility of the resident MФs in aging hearts, which could be further investigated in the future.

      We appreciate the suggestion and agree that it provides valuable insight into the study. Taking the comments of reviewer 1 into account, we have performed new experiments, i.e., co-immunostaining to analyze the infiltrated (CCR2<sup>+</sup>/F4-80<sup>+</sup>) and resident (LYVE1<sup>+</sup>/F4-80<sup>+</sup>) macrophage populations and to investigate to what extent Arg-II affects these populations in the aging heart. We found that, in line with the gene expression of f4/80, immunofluorescence staining reveals an age-associated increase in the number of F4/80<sup>+</sup> cells in the wt mouse heart, which is reduced in the age-matched arg-ii<sup>-/-</sup> animals (Fig. 2E, F, G), demonstrating that arg-ii gene ablation reduces macrophage accumulation in the aging heart. Interestingly, resident macrophages, characterized as LYVE1<sup>+</sup>/F4-80<sup>+</sup> cells (Fig. 2E and 2H), are predominant in the aging heart as compared to the infiltrated CCR2<sup>+</sup>/F4-80<sup>+</sup> cells (Fig. 2F and 2I). The increase in both LYVE1<sup>+</sup>/F4-80<sup>+</sup> and CCR2<sup>+</sup>/F4-80<sup>+</sup> macrophages in the aging heart is reduced in arg-ii<sup>-/-</sup> mice (Fig. 2E, 2F, 2H, and 2I). These new results are described on page 6, the 1st paragraph, presented in Fig. 2E to 2I, and discussed on page 13, the 2nd paragraph. The legend to Fig. 2 is revised. The method for this additional experiment is included on page 22, the 1st paragraph.

      (2) It would be beneficial to the readers if the author could provide some explanation about why ArgII could not be detected in VSMCs in the mouse heart and the species difference between humans and mice. In addition, the author may provide an assumption on the possibility that there may also be a cross-talk between macrophages and VSMCs in the aging heart. A little bit more explanation in the Discussion will be helpful.

      We acknowledge and appreciate the suggestion and have discussed these points on page 19 as follows:

      “In this context, another interesting aspect is the cross-talk between macrophages and vascular SMC in the aging heart. In our present study, we could not detect Arg-II in vascular SMC of mouse heart but in that of human heart. This could be due to the difference in species-specific Arg-II expression in the heart or related to the disease conditions in human heart which is harvested from patients with cardiovascular diseases. Indeed, in the apoe<sup>-/-</sup> mouse atherosclerosis model, aortic SMCs do express Arg-II (Xiong et al., 2013). It is interesting to note that rodents hardly develop atherosclerosis as compared to humans. Whether this could be partly contributed by the different expression of Arg-II in vascular SMC between rodents and humans requires further investigation. In our present study, the aspect of the cross-talk between macrophages and vascular SMC is not studied. Since the crosstalk between macrophages and vascular SMC has been implicated in the context of atherogenesis as reviewed (Gong et al., 2025), further work shall investigate whether Arg-II expressing macrophages could interact with vascular SMC in the coronary arteries in the heart and contribute to the development of coronary artery disease and/or vascular remodelling and the underlying mechanisms“.

      (3) Please clarify the arrows in Figure 9C that indicate the infarct area in each splicing section from one heart.

      The arrows in Figure 9C (now Fig. 10C) are indeed utilized to indicate the sections displaying the infarcted area within each splicing section from one heart. We have explained the arrow in the figure legend (now Fig. 10 and also new Suppl. Fig. 9).

    1. Author response:

      Our response aims to address the following:

      The lack of pleiotropy is an unconfirmable assumption of MR, and the addition of those models is therefore quite important, as this is a primary weakness of the MR approach. Given that concern, I read the sensitivity analyses using pleiotropy-robust models as the main result, and in that case, they can't test their hypotheses as these models do not show a BMI instrumental variable association. The other weakness, which might be remedied, is that the power of the tests here is not described. When a hypothesis is tested with an under-powered model, the apparent lack of association could be due to inadequate sample size rather than a true null. Typically, when a statistically significant association is reported, power concerns are discounted as long as the study is not so small as to create spurious findings. That is the case with their primary BMI instrumental variable model - they find an association so we can presume it was adequately powered. But the primary models they share are not the pleiotropy-robust methods MR-Egger, weighted median, and weighted mode. The tests for these models are null, and that could mean a couple of things: (1) the original primary significant association between the BMI genetic instrument was due to pleiotropy, and they therefore don't have a robust model to explore the effects of the tobacco genetic instrument. (2) The power for the sensitivity analysis models (the pleiotropy-robust methods) is inadequate, and the authors share no discussion about the relative power of the different MR approaches. If they do have adequate power, then again, there is no need to explore the tobacco instrument.

      We would like to highlight that post-hoc power calculations are often considered redundant since the statistical power estimated for an observed association is directly related to its p-value[1]. In other words, the uncertainty of the association is already reflected in its 95% confidence interval. However, we understand power calculations may still be of interest to the reader, so we will incorporate them in the revised manuscript.
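
      The link between post-hoc power and the p-value can be illustrated with a minimal sketch. It assumes a two-sided z-test and treats the observed effect as if it were the true effect; the function name `post_hoc_power` and its parameters are hypothetical and not taken from the manuscript.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and SD 1


def post_hoc_power(p_value: float, alpha: float = 0.05) -> float:
    """Observed power of a two-sided z-test, computed from the p-value alone.

    Assumption: the observed effect is treated as the true effect.
    p_value must lie in (0, 1].
    """
    # Absolute z-statistic implied by the two-sided p-value.
    z_observed = _STD_NORMAL.inv_cdf(1 - p_value / 2)
    # Critical value for the chosen significance level.
    z_critical = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Probability of exceeding the critical value in either tail.
    return (_STD_NORMAL.cdf(z_observed - z_critical)
            + _STD_NORMAL.cdf(-z_observed - z_critical))


for p in (0.5, 0.05, 0.01):
    print(f"p = {p}: post-hoc power = {post_hoc_power(p):.3f}")
```

      Under these assumptions the function takes no input other than the p-value and the significance level, so the computed power cannot carry information beyond what the p-value already conveys. A p-value equal to alpha maps to a power of roughly 50%.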

      The reason we use inverse variance weighted (IVW) Mendelian randomization (MR) to obtain our main results rather than the pleiotropy-robust methods mentioned by the reviewer/editors (i.e., MR-Egger, weighted median and weighted mode) is that the former has greater statistical power than the latter[2]. Hence, instead of focussing on the statistical significance of the pleiotropy-robust analyses, we consider it is of more value to compare the consistency of the effect sizes and direction of the effect estimates across methods. Any evidence of such consistency increases our confidence in our main findings, since each method relies on different assumptions. As we cannot be sure about the presence and nature of horizontal pleiotropy, it is useful to compare results across methods even though they are not equally powered. It is true that our results for the genetically predicted effects of body mass index (BMI) on the risk of head and neck cancer (HNC) differ across methods. This is precisely what led us to question the validity of our main finding (suggesting a positive effect of BMI on HNC risk). We will clarify this in the discussion section of the revised manuscript as advised.
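
      The contrast between IVW and a pleiotropy-robust method can be sketched from per-variant summary statistics. The code below is an illustrative assumption rather than the analysis pipeline used in the study: the function names and the toy numbers are made up, the IVW standard error is the simple fixed-effect form that ignores uncertainty in the variant-exposure associations, and only point estimates are returned for MR-Egger.

```python
from math import sqrt


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW: weighted regression of beta_y on beta_x through the origin."""
    weights = [1.0 / s ** 2 for s in se_y]
    numerator = sum(w * bx * by for w, bx, by in zip(weights, beta_x, beta_y))
    denominator = sum(w * bx ** 2 for w, bx in zip(weights, beta_x))
    return numerator / denominator, sqrt(1.0 / denominator)


def egger_estimate(beta_x, beta_y, se_y):
    """MR-Egger point estimates: the same weighted regression with an intercept."""
    # Orient every variant so that its association with the exposure is positive.
    xs = [abs(bx) for bx in beta_x]
    ys = [by if bx >= 0 else -by for bx, by in zip(beta_x, beta_y)]
    weights = [1.0 / s ** 2 for s in se_y]
    total = sum(weights)
    x_mean = sum(w * x for w, x in zip(weights, xs)) / total
    y_mean = sum(w * y for w, y in zip(weights, ys)) / total
    sxy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, xs))
    slope = sxy / sxx                      # causal effect estimate
    intercept = y_mean - slope * x_mean    # average directional pleiotropy
    return slope, intercept


# Made-up summary statistics: true effect 0.5 plus a constant pleiotropic effect of 0.1.
beta_x = [0.1, 0.2, 0.3, 0.4]
beta_y = [0.5 * bx + 0.1 for bx in beta_x]
se_y = [0.05, 0.05, 0.05, 0.05]

ivw_beta, ivw_se = ivw_estimate(beta_x, beta_y, se_y)
egger_slope, egger_intercept = egger_estimate(beta_x, beta_y, se_y)
print(f"IVW: {ivw_beta:.3f} (SE {ivw_se:.3f})")
print(f"MR-Egger slope: {egger_slope:.3f}, intercept: {egger_intercept:.3f}")
```

      In this toy example the directional pleiotropy pulls the IVW estimate up to about 0.83, while the MR-Egger slope recovers 0.5 and its intercept absorbs the 0.1 offset. The price of estimating that extra intercept is a less precise slope, which is why comparing the direction and size of estimates across methods can be more informative than comparing their significance.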

      We understand that the reviewer/editors are concerned that we do not have a robust model to explore the role of tobacco consumption in the link between BMI and HNC. However, we have a different perspective on the matter. If indeed, the main IVW finding for BMI and HNC is due to pleiotropy (since some of the pleiotropy-robust methods suggest conflicting results), then the IVW multivariable MR method is a way to explore the potential source of this bias[3]. We were particularly interested in exploring the role of smoking in the observed association because smoking and adiposity are known to influence each other [4-9] and share a genetic basis[10, 11].

      References:

      (1) Heinsberg LW, Weeks DE: Post hoc power is not informative. Genet Epidemiol 2022, 46(7):390-394.

      (2) Burgess S, Butterworth A, Thompson SG: Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol 2013, 37(7):658-665.

      (3) Burgess S, Davey Smith G, Davies NM, Dudbridge F, Gill D, Glymour MM, Hartwig FP, Kutalik Z, Holmes MV, Minelli C et al: Guidelines for performing Mendelian randomization investigations: update for summer 2023. Wellcome Open Res 2019, 4:186.

      (4) Morris RW, Taylor AE, Fluharty ME, Bjorngaard JH, Asvold BO, Elvestad Gabrielsen M, Campbell A, Marioni R, Kumari M, Korhonen T et al: Heavier smoking may lead to a relative increase in waist circumference: evidence for a causal relationship from a Mendelian randomisation meta-analysis. The CARTA consortium. BMJ Open 2015, 5(8):e008808.

      (5) Taylor AE, Morris RW, Fluharty ME, Bjorngaard JH, Asvold BO, Gabrielsen ME, Campbell A, Marioni R, Kumari M, Hallfors J et al: Stratification by smoking status reveals an association of CHRNA5-A3-B4 genotype with body mass index in never smokers. PLoS Genet 2014, 10(12):e1004799.

      (6) Taylor AE, Richmond RC, Palviainen T, Loukola A, Wootton RE, Kaprio J, Relton CL, Davey Smith G, Munafo MR: The effect of body mass index on smoking behaviour and nicotine metabolism: a Mendelian randomization study. Hum Mol Genet 2019, 28(8):1322-1330.

      (7) Asvold BO, Bjorngaard JH, Carslake D, Gabrielsen ME, Skorpen F, Smith GD, Romundstad PR: Causal associations of tobacco smoking with cardiovascular risk factors: a Mendelian randomization analysis of the HUNT Study in Norway. Int J Epidemiol 2014, 43(5):1458-1470.

      (8) Carreras-Torres R, Johansson M, Haycock PC, Relton CL, Davey Smith G, Brennan P, Martin RM: Role of obesity in smoking behaviour: Mendelian randomisation study in UK Biobank. BMJ 2018, 361:k1767.

      (9) Freathy RM, Kazeem GR, Morris RW, Johnson PC, Paternoster L, Ebrahim S, Hattersley AT, Hill A, Hingorani AD, Holst C et al: Genetic variation at CHRNA5-CHRNA3-CHRNB4 interacts with smoking status to influence body mass index. Int J Epidemiol 2011, 40(6):1617-1628.

      (10) Thorgeirsson TE, Gudbjartsson DF, Sulem P, Besenbacher S, Styrkarsdottir U, Thorleifsson G, Walters GB, Consortium TAG, Oxford GSKC, consortium E et al: A common biological basis of obesity and nicotine addiction. Transl Psychiatry 2013, 3(10):e308.

      (11) Wills AG, Hopfer C: Phenotypic and genetic relationship between BMI and cigarette smoking in a sample of UK adults. Addict Behav 2019, 89:98-103.

    1. Author response:

      The following is the authors’ response to the previous reviews

      In response to Reviewer #1, we have replaced the original images in Figure 1A with new immunofluorescence data showing matched DAPI staining density between control and AD patient samples. We also have updated the PINK1 staining images of mouse brain sections in Figure 1C to eliminate potential non-specific signals. These revisions provide clearer evidence supporting our conclusions about PINK1/pUb’s role in neurodegeneration.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      In this beautiful paper the authors examined the role and function of NR2F2 in testis development and more specifically in fetal Leydig cell (FLC) development. It is well known by now that FLC develop from interstitial steroidogenic progenitors at around E12.5 and are crucial for testosterone and INSL3 production during embryonic development, which in turn shapes the internal and external genitalia of the male. Indeed, lack of testosterone or INSL3 is known to cause DSD as well as undescended testis, also termed cryptorchidism. The authors first characterized the expression pattern of the NR2F2 protein during testis development and then used two cKO systems of NR2F2, namely the Wt1-creERT2 and the Nr5a1-cre, to explore the phenotype of loss of NR2F2. They found in both cases that mice present with undescended testis and a major reduction in FLC numbers. They show that NR2F2 has no effect on the amount and expression of the progenitor cells but that in its absence, there are fewer FLC and they are immature.

      The effect of NR2F2 is cell autonomous and does not seem to affect other signalling pathways implemented in Leydig cell development as the DHH, PDGFRA and the NOTCH pathway.

      Overall, this paper is excellent, very well written, fluent and clear. The data is well presented, and all the controls and statistics are in place. I think this paper will be of great interest to the field and paves the way for several interesting follow up studies as stated in the discussion

      Reviewer #2 (Public review):

      The major conclusion of the manuscript is expressed in the title: "NR2F2 is required in the embryonic testis for Fetal Leydig Cell development" and also at the end of the introduction and all along the result part. All the authors' assertions are supported by very clear and statistically validated results from ISH, IHC, precise cell counting and gene expression levels by qPCR. The authors used two different conditional Nr2f2 gene ablation systems that demonstrate the same effects at the FLC level. They also showed that the haplo-insufficiency of Wt1 in the first system (knock-in Wt1-cre-ERT2) aggravated the situation in FLC differentiation by disturbing the differentiation of Sertoli cells and their secretion of pro-FLC factors, which had a confounding effect and encouraged them to use the second system. This demonstrates the great rigor with which the authors interpreted the results. In conclusion, all authors' claims and conclusions are justified by their high-quality results.

      Recommendations for the authors:

      We thank the reviewers for their comments which have improved and strengthened our manuscript. Please see our responses to specific comments below in blue.

      Reviewer #1 (Recommendations for the authors):

      I have several small comments:

      (1) There has recently been a preprint from the Yao lab about the role of NR2F2 in steroidogenic cells (https://www.biorxiv.org/content/10.1101/2024.09.16.613312v1). They performed cKO of NR2F2 using the Wt1creERT2 and found similar results. You should present and discuss this paper in light of your results.

      Estermann et al. report a very similar phenotype of FLC hypoplasia in an independent mouse model of Nr2f2 conditional mutation. We now refer to this article in the discussion of our manuscript as suggested.

      (2) In the introduction I think it is important to mention that the steroidogenic progenitors are derived from Wnt5a positive cells (https://pubmed.ncbi.nlm.nih.gov/35705036/).

      We have mentioned this point in the introduction as suggested.

      (3) In both models you show a decrease in the number of FLC (60% or 40%) and yet they both present with undescended testis. It is important to discuss the fact that there is no need for a complete ablation of testosterone and INSL3 in order to get cryptorchidism.

      We have mentioned this point in the discussion as suggested.

      The fact that you get only partial reduction in FLC is likely due to redundancy with additional factors, possibly the ARX like you stated in the discussion and it will be interesting to explore that in the future but is beyond the scope of the current paper.

      We agree with the reviewer; this question could be addressed by analyzing Arx,Nr2f2 double mutants.

      (4) In page 8 line 11 you mention data not shown- not sure if this is allowed in the journal .

      The data is now shown in Figure S5A as suggested.

      (5) In Figure 2- it will be good if you add a schematic model of the mouse strains used as well as the experimental and control mice next to the Tam scheme. Similar scheme should be in figure 3 for Nr5a1-cre.

      We have modified Figures 2 and 3 as suggested.

      (6) There is a clear and pronounced effect of the testis cords number and size. It will be good if you could qualify testis cord numbers/ diameter in the mutants even if you do not follow in detail the effect on Sertoli cells

      We have quantified testis cord number and area in E14.5 Control and Wt1<sup>CreERT2/+</sup>; Nr2f2<sup>flox/flox</sup> testes. These data are now shown in Figure S2M.

      (7) It will be good to present the undescended testis in the Wt1-cre model in figure 2 and not in the supp figure

      The data is now shown in Figure 2H-I as suggested.

      (8) Please add labelling of the testis, kidney, bladder, vas deferens in figure 3 N+O and in the Wt1-cre model

      We have added the labels in Figures 2 and 3 as suggested.

      (9) In figure 5 which present both models- it will be good to use the scheme I suggested before to highlight which results refer to which ko model.

      We have modified Figure 5 as suggested.

      Reviewer #2 (Recommendations for the authors):  

      The work presented in this manuscript gave me food for thought. I have always been intrigued by the fact that of the large number of interstitial cells in the testis, a minority differentiate into mature androgen-producing Leydig cells. In other words, how is the number of functional steroidogenic cells defined from a large pool of progenitor cells (ARX and NR2F2 positive ones)? This may have a link with the levels of androgens produced (a kind of feedback control) or the effectiveness of these androgens on the target tissues (i.e.: as spermatogenesis efficiency in adults). In addition, there must be specific signals (probably linked to gonadotropins) that induce the recruitment of Leydig cells from the progenitor pool. Perhaps the genetic models generated in this study could help to address these questions. I leave it to the authors to judge.

      We agree with the reviewer. How NR2F2 (and other factors) integrate extrinsic cues to regulate the recruitment of a subset of interstitial steroidogenic progenitors along the Leydig cell differentiation pathway is a fascinating question beyond the scope of this work.

      In addition to this reflection, I propose a few minor modifications likely to improve the quality of the manuscript:

      (1) Page 3, lane 3: I suggest to replace "growth" by "differentiation"

      We have modified the text as suggested.

      (2) Page 3, lane 4: the "scrotum" is missing in the parenthesis. Please add it before "and penis"

      We have modified the text as suggested.

      (3) Page 5, lanes 21-24: kidney hypoplasia is also evident on Fig S2H (stated in the figure legend). It could be also mentioned in this sentence and it implies "...that NR2F2 function is required for testicular and kidney development."

      We have modified the text as suggested.

      (4) Page 5, lanes 28-30. In addition to the reduction in the number of HSD3B-positive cells, HSD3B staining seems clearly more faint in mutant FLC (Fig 2M) compared to adrenal cells on the same section or FLC in control gonads. This fits well with other results on the level of steroidogenic enzymes (Fig 2O) and those presented thereafter (Fig S4 I-J and Fig 5). Perhaps the author could mention this fact.

      We have modified the text as suggested in the results section “NR2F2 is required for FLC maturation” (Page 8).

      (5) Page 5, lanes 31-34: testicular descent is hugely sensible to INSL3 in the mouse (by contrast with other species where androgens seem to be more critical). I was wondering if you can check a better phenotypic marker for the absence (or reduction) of androgens like the differentiation of epididymides by HE staining or the anogenital distance at birth.

      We have measured the anogenital distance at P0 and P1 as suggested and have included the corresponding graph in Fig. S3P.

      (6) Page 8, lines 21-22: "HSD3B positive FLC were smaller and more elongated". It is clear in Fig 5F but not evident in Fig 5D. Could the authors propose another image?

      We have modified Figure 5 as suggested and now provide another example of HSD3B positive FLCs in a Nr5a1Cre; Nr2f2<sup>flox/flox</sup> mutant gonad (Fig. 5D) and the corresponding control littermate (Fig. 5C).

      (7) Page 14, line 12: "(arrow in I)" should be "(arrow in H)"

      We have modified the text as suggested. Please note that ACTA2 expression is now shown in Figure S2 G-H.

      (8) Page 15, line 6: "Arrows indicate NR5A1 positive FLC". There is no arrow in Fig 4C,D, only a kind of scale bar on the enlargement shown in C.

      We have modified Figure 4 as suggested.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This paper provides a computational model of a synthetic task in which an agent needs to find a trajectory to a rewarding goal in a 2D-grid world, in which certain grid blocks incur a punishment. In a completely unrelated setup without explicit rewards, they then provide a model that explains data from an approach-avoidance experiment in which an agent needs to decide whether to approach or withdraw from a jellyfish in order to avoid a pain stimulus. Both models include components that are labelled as Pavlovian; hence the authors argue that their data show that the brain uses a Pavlovian fear system in complex navigational and approach-avoid decisions.

      Thanks to the reviewer’s comments, we have now added the following text to our Discussion section (Lines 290-302):

      “When it comes to our experiments, both the simulation and VR experiment models are related and derived from the same theoretical framework maintaining an algebraic mapping. They differ only in task-specific adaptations, i.e., they differ in action sets and in temporal difference learning rules: multi-step decisions in the grid world vs. the Rescorla-Wagner rule for single-step decisions in the VR task. This is also true for Dayan et al. [2006], who bridge Pavlovian bias in a Go-No Go task (negative auto-maintenance pecking task) and a grid world task. A further minor difference between the simulation and VR experiment models is the use of a baseline bias in the human experiment's RL and the RLDDM model, where we also model reaction times with drift rates, which is not a behaviour often simulated in the grid world simulations. As mentioned previously, we use the grid world tasks for didactic purposes, similar to Dayan et al. [2006] and common to test-beds for algorithms in reinforcement learning [Sutton et al., 1998]. The main focus of our work is on Pavlovian fear bias in safe exploration and learning, rather than on its role in complex navigational decisions. Future work can focus on capturing more sophisticated safe behaviours, such as escapes [Evans et al., 2019, Sporrer et al., 2023] and model-based planning, which span different aspects of the threat-imminence continuum [Mobbs et al., 2020].”

      In the first setup, they simulate a model in which a component they label as Pavlovian learns about punishment in each grid block, whereas a Q-learner learns about the optimal path to the goal, using a scalar loss function for rewards and punishments. Pavlovian and Q-learning components are then weighed at each step to produce an action. Unsurprisingly, the authors find that including the Pavlovian component in the model reduces the cumulative punishment incurred, and this increases as the weight of the Pavlovian system increases. The paper does not explore to what extent increasing the punishment loss (while keeping reward loss constant) would lead to the same outcomes with a simpler model architecture, so any claim that the Pavlovian component is required for such a result is not justified by the modelling. 

      Thanks to the reviewer’s comments, we have now added the following text to our Discussion section (Line 303-313):

      “In our simulation experiments, we assume the coexistence of the Pavlovian fear system and the instrumental system to demonstrate the emergent safety-efficiency trade-off from their interaction. It is possible that similar behaviours could be modelled using an instrumental system alone, with higher punishment sensitivity; therefore we do not argue for the necessity of the Pavlovian fear system here. Instead, the Pavlovian fear system itself could be a potential biologically plausible implementation of punishment sensitivity. Unlike punishment sensitivity (scaling of the punishments), which has not been robustly mapped to neural substrates in fMRI studies, the neural substrates for the Pavlovian fear system are well known (e.g., the limbic loop and amygdala, further see Supplementary Fig. 16). Additionally, the Pavlovian fear system provides a separate punishment memory that cannot be erased by greater rewards, as in [Elfwing and Seymour, 2017, Wang et al., 2018]. This fundamental point can be observed in our simple T-maze simulations, where the Pavlovian fear system encourages avoidance behaviour and the agent chooses the smaller reward instead of the greater reward.”
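
      As a rough illustration of the contrast drawn in this passage, the sketch below combines instrumental action values with a Pavlovian punishment expectation through a fixed linear weight ω and then applies a softmax. The mixing rule, the names `softmax` and `action_propensities`, the withdrawal mask, and all numbers are assumptions made for illustration only; they are not taken from the manuscript's equations.

```python
import numpy as np

def softmax(x, beta=1.0):
    """Numerically stable softmax with inverse temperature beta."""
    x = np.asarray(x, dtype=float)
    z = beta * (x - x.max())
    e = np.exp(z)
    return e / e.sum()

def action_propensities(q_values, pavlovian_value, withdrawal_mask, omega):
    """Hypothetical linear mix of instrumental values and a Pavlovian bias.

    q_values        : instrumental value of each action
    pavlovian_value : punishment expectation of the current state (<= 0)
    withdrawal_mask : 1 for actions treated as withdrawal, 0 otherwise
    omega           : fixed weight in [0, 1] given to the Pavlovian system
    """
    q = np.asarray(q_values, dtype=float)
    mask = np.asarray(withdrawal_mask, dtype=float)
    # Withdrawal actions get a bonus that grows with expected punishment.
    bias = -pavlovian_value * mask
    return (1.0 - omega) * q + omega * bias

# Illustrative numbers: action 0 = approach, action 1 = withdraw.
q = [1.0, 0.2]
v_p = -2.0
mask = [0, 1]
p_instrumental = softmax(action_propensities(q, v_p, mask, omega=0.0))
p_biased = softmax(action_propensities(q, v_p, mask, omega=0.8))
```

      Under these assumed numbers, the purely instrumental agent prefers to approach, while the heavily weighted Pavlovian agent prefers to withdraw even though approaching carries the larger instrumental value.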

      In the second setup, an agent learns about punishments alone. "Pavlovian biases" have previously been demonstrated in this task (i.e. an overavoidance when the correct decision is to approach). The authors explore several models (all of which are dissimilar to the ones used in the first setup) to account for the Pavlovian biases. 

      Thanks to the reviewer’s comments, we have now added a paragraph in our Discussion section (Line 290-302) explaining the similarity of our models and their integrated interpretation. We hope this addresses the reviewer’s concerns.

      Strengths: 

      Overall, the modelling exercises are interesting and relevant and incrementally expand the space of existing models. 

      Weaknesses: 

      I find the conclusions misleading, as they are not supported by the data. 

      First, the similarity between the models used in the two setups appears to be more semantic than computational or biological. So it is unclear to me how the results can be integrated. 

      Thanks to the reviewer’s comments, we have now added a paragraph in our Discussion section (Line 290-302 onwards) explaining the similarity of our models and their integrated interpretation. We hope this addresses the reviewer’s concerns.

      Secondly, the authors do not show "a computational advantage to maintaining a specific fear memory during exploratory decision-making" (as they claim in the abstract). Making such a claim would require showing an advantage in the first place. For the first setup, the simulation results will likely be replicated by a simple Q-learning model when scaling up the loss incurred for punishments, in which case the more complex model architecture would not confer an advantage. The second setup, in contrast, is so excessively artificial that even if a particular model conferred an advantage here, this is highly unlikely to translate into any real-world advantage for a biological agent. The experimental setup was developed to demonstrate the existence of Pavlovian biases, but it is not designed to conclusively investigate how they come about. In a nutshell, who in their right mind would touch a stinging jellyfish 88 times in a short period of time, as the subjects do on average in this task? Furthermore, in which real-life environment does withdrawal from a jellyfish lead to a sting, as in this task? 

      Crucially, simplistic models such as the present ones can easily solve specifically designed lab tasks with low dimensionality but they will fail in higher-dimensional settings. Biological behaviour in the face of threat is utterly complex and goes far beyond simplistic fight-flight-freeze distinctions (Evans et al., 2019). It would take a leap of faith to assume that human decision-making can be broken down into oversimplified sub-tasks of this sort (and if that were the case, this would require a meta-controller arbitrating the systems for all the sub-tasks, and this meta-controller would then struggle with the dimensionality). 

      Thanks to the reviewer’s comments, we have now mentioned this point in Lines 299-302.

      On the face of it, the VR task provides higher "ecological validity" than previous screen-based tasks. However, in fact, it is only the visual stimulation that differs from a standard screen-based task, whereas the action space is exactly the same. As such, the benefit of VR does not become apparent, and its full potential is foregone. 

      If the authors are convinced that their model can - then data from naturalistic approach-avoidance VR tasks is publicly available, e.g. (Sporrer et al., 2023), so this should be rather easy to prove or disprove. In summary, I am doubtful that the models have any relevance for real-life human decision-making. 

      Finally, the authors seem to make much broader claims that their models can solve safety-efficiency dilemmas. However, a combination of a Pavlovian bias and an instrumental learner (study 1) via a fixed linear weighting does not seem to be "safe" in any strict sense. This will lead to the agent making decisions leading to death when the promised reward is large enough (outside perhaps a very specific region of the parameter space). Would it not be more helpful to prune the decision tree according to a fixed threshold (Huys et al., 2012)? So, in a way, the model is useful for avoiding cumulatively excessive pain but not instantaneous destruction. As such, it is not clear what real-life situation is modelled here. 

      We hope our additions to the Discussion section, from Line 290 to Line 313 address the reviewer’s concerns.  

      A final caveat regarding Study 1 is the use of a PH associability term as a surrogate for uncertainty. The authors argue that this term provides a good fit to fear-conditioned SCR but that is only true in comparison to simpler RW-type models. Literature using a broader model space suggests that a formal account of uncertainty could fit this conditioned response even better (Tzovara et al., 2018). 

      We have now added a line discussing this. (Line 356-358)

      “Future work could also use a formal account of uncertainty which could fit the fear-conditioned skin-conductance response better than Pearce-Hall associability [Tzovara et al., 2018].”

      Reviewer #2 (Public review): 

      Summary: 

      The authors tested the efficiency of a model combining Pavlovian fear valuation and instrumental valuation. This model is amenable to many behavioral decision and learning setups - some of which have been or will be designed to test differences in patients with mental disorders (e.g., anxiety disorder, OCD, etc.). 

      Strengths: 

      (1) Simplicity of the model which can at the same time model rather complex environments. 

      (2) Introduction of a flexible omega parameter. 

      (3) Direct application to a rather advanced VR task. 

      (4) The paper is extremely well written. It was a joy to read. 

      Weaknesses: 

      Almost none! In very few cases, the explanations could be a bit better. 

      Thank you, we have added further explanations in the Discussion section. We have further improved the writing in the Abstract, Introduction, and Methods sections, taking into account recommendations from reviewers #2 and #3.

      Reviewer #2 (Recommendations for the authors): 

      (1) Why is there no flexible omega in Figures 3B and 3C? Did I miss this? 

      Thank you. We have now added additional text to explain our motivation in Experiment 2, which only varies the fixed omega and omits the flexible omega (Lines 136-140).

      “In this set of results, we wish to qualitatively tease apart the role of a Pavlovian bias in shaping and sculpting the instrumental value and also provide more insight into the resulting safety-efficiency trade-off. Having shown the benefits of a flexible ω in the previous section, here we only vary the fixed ω to illustrate the effect of a constant bias and are not concerned with the flexible bias in this experiment.”

      We encourage the reader to consider this akin to an additional study that will explain how a Pavlovian bias to withdraw can play a role in avoiding punishments similar to that of punishment sensitivity. This is particularly important as we do have neural correlates for Pavlovian biases but lack a clear neural correlate for punishment sensitivity so far, as mentioned in our new additions to the Discussion section (Lines 303-313).

      (2) The introduction of the flexible omega and the PAL agent in the results is a bit sudden. Some more details are needed to understand this during the first read of this passage. 

      We thank reviewer #2 for bringing this to our notice. We have attempted to refine the passage by including sentences such as:

      “The standard (rational) reinforcement learning system is modelled as the instrumental learning system. The additional Pavlovian fear system biases the withdrawal actions to aid in safe exploration, in line with our hypothesis.”

      “Both systems learn using a basic temporal difference updating rule (or in instances, its special case, the Rescorla-Wagner rule)”

      “We implement the flexible ω using Pearce-Hall associability (see equation 15 in Methods). The Pearce-Hall associability maintains a running average of absolute temporal difference errors (δ) as per equation 14. This acts as a crude but easy-to-compute metric for outcome uncertainty which gates the influence of the Pavlovian fear system, in line with our hypothesis. This implies that the higher the outcome uncertainty, as is the case in early exploration, the more cautious our agent will be, resulting in safer exploration.”
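
      The idea of gating the Pavlovian weight by a running average of absolute temporal difference errors can be sketched as follows. This is a minimal sketch under stated assumptions: the exponential-average form, the learning rate `eta`, the scaling constant `kappa`, and the clipping of ω to [0, 1] are illustrative choices, and the function names are hypothetical; the manuscript's equations 14 and 15 may differ in detail.

```python
def update_associability(alpha, td_error, eta=0.1):
    """Running (exponential) average of absolute TD errors."""
    return (1.0 - eta) * alpha + eta * abs(td_error)

def flexible_omega(alpha, kappa=1.0):
    """Assumed mapping from associability to a Pavlovian weight in [0, 1]."""
    return min(1.0, max(0.0, kappa * alpha))

# Illustrative sequence: large surprises early on, none later.
alpha = 0.0
omegas = []
for delta in [-1.0, -1.0, -0.8, 0.1, 0.0, 0.0, 0.0]:
    alpha = update_associability(alpha, delta, eta=0.5)
    omegas.append(flexible_omega(alpha))
```

      With this assumed sequence, ω rises while outcomes are surprising and decays once prediction errors vanish, so the Pavlovian influence is strongest during early exploration.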

      (3) In my view, the possibility of modeling moving predators is extremely interesting. I would include Figure 8D and the corresponding explanation in the main text. 

      Response with revision: We thank the reviewer for finding our simulation on moving predators extremely interesting. Unfortunately, since our instrumental system is not model-based, and in particular does not explicitly model the predator dynamics, our simulation might not be a very accurate representation of real moving predator environments. As pointed out by Reviewer #1, perhaps several systems other than Pavlovian fear responses are necessary for safe behaviour in such environments and we hope to address these in future studies. Thanks again for taking an interest in our simulations.

      (4) The VR experiment should be mentioned more clearly in the abstract and the introduction. It should be mentioned a bit more clearly why VR was helpful and why the authors did not use a simple bird's eye grid world task. 

      I cannot assess the RLDDM and I did not check the code. 

      Thank you, we have now mentioned the VR experiment more clearly in the abstract and the introduction. We also now further mention that the VR experiment “builds upon previous Go-No Go studies studying Pavlovian-Instrumental transfer (Guitart-Masip et al, 2012; Cavanagh et al, 2013). The virtual-reality approach confers a greater ecological validity and the immersive nature may contribute to better fear conditioning, making it easier to distinguish the aversive components.”

      A bird’s eye grid world may not invoke a strong withdrawal response, as seen in these immersive approach-withdrawal tasks where we can clearly distinguish a Pavlovian fear-based withdrawal response. We did include immersive VR maze results in the supplementary materials, but future work is needed to isolate the different systems at play in such a complex behaviour.

      Reviewer #3 (Public review): 

      Summary: 

      This paper aims to address the problem of exploring potentially rewarding environments that contain danger, based on the assumption that an independent Pavlovian fear learning system can help guide an agent during exploratory behaviour such that it avoids severe danger. This is important given that otherwise later gains seem to outweigh early threats, and agents may end up putting themselves in danger when it is advisable not to do so. 

      The authors develop a computational model of exploratory behaviour that accounts for both instrumental and Pavlovian influences, combining the two according to uncertainty in the rewards. The result is that Pavlovian avoidance has a greater influence when the agent is uncertain about rewards. 

      Strengths: 

      The study does a thorough job of testing this model using both simulations and data from human participants performing an avoidance task. Simulations demonstrate that the model can produce "safe" behaviour, where the agent may not necessarily achieve the highest possible reward but ensures that losses are limited. Interestingly, the model appears to describe human avoidance behaviour in a task that tests for Pavlovian avoidance influences better than a model that doesn't adapt the balance between Pavlovian and instrumental based on uncertainty. The methods are robust, and generally, there is little to criticise about the study. 

      Weaknesses: 

      The extent of the testing in human participants is fairly limited but goes far enough to demonstrate that the model can account for human behaviour in an exemplar task. There are, however, some elements of the model that are unrealistic (for example, the fact that pre-training is required to select actions with a Pavlovian bias would require the agent to explore the environment initially and encounter a vast amount of danger in order to learn how to avoid the danger later). The description of the models is also a little difficult to parse. 

      Thank you, we have now attempted to clarify these points in the Discussion section by adding the following text (Lines 313-321):

      “We next discuss the plausibility of pre-training to select the hardwired actions. In the human experiment, the withdrawal action is straightforwardly biased, as noted, while in the grid world, we assume a hardwired encoding of withdrawal actions for each state/grid. This innate encoding of withdrawal actions could be represented in the dPAG [Kim et al., 2013]. We implement this bias using pre-training, which we assume would be a product of evolution. Alternatively, this could be interpreted as deriving from an appropriate value initialization where the gradient over initialized values determines the action bias. Such aversive value initialization, driving avoidance of novel and threatening stimuli, has been observed in the tail of the striatum in mice, which is hypothesised to function as a Pavlovian fear/threat learning system [Menegas et al., 2018].”

      Reviewer #3 (Recommendations for the authors): 

      I have relatively little to suggest, as in my view the paper is robust, thorough, and creative, and does enough to support the primary argument being made at the most fundamental level. My suggestions for improvement are as follows: 

      (1) Some aspects of the model are potentially unrealistic (as described in the public review), and the paper may benefit from some discussion of these issues or attempts to make the model more realistic - i.e., to what extent is this plausible in explaining more complex avoidance behaviour? Primarily, the fact that pre-training is required to identify actions subject to Pavlovian bias seems unlikely to be effective in real-world situations - is there a better way to achieve this in cases where there isn't necessarily an instinctual Pavlovian response? 

      Thank you, we agree that the advantage of Pavlovian bias is restricted to the bias/instinctual Pavlovian response conferred by evolution. Future work is needed to model more complex avoidance behaviour such as escapes. We hope to have made this more clear with our edits to the Discussion (Lines 299-302) in our response to Reviewer #1’s comments, specifically:

      “The main focus of our work is on Pavlovian fear bias in safe exploration and learning, rather than on its role in complex navigational decisions. Future work can focus on capturing more sophisticated safe behaviours, such as escapes [Evans et al., 2019, Sporrer et al., 2023] and model-based planning, which span different aspects of the threat-imminence continuum [Mobbs et al., 2020].”

      (2) The description of the model in the method can be a little hard to follow and would benefit from further explanation of certain parameters. In general, it would be good to ensure that all terms mentioned in equations are described clearly in the text (for example, in Equation1 it isn't clear what k refers to). 

      Thank you, we have now added further information on all of the parameters in Equation 1 and overall improved the writing of the Methods section, for instance by using a time subscript to reduce confusion when introducing the parameters. We use the standard notation of the Sutton and Barto textbook. k refers to the timesteps into the future, and is now explained better in the Methods section.

      (3) Another point of clarification in Equation 1 - does the policy account for the Pavlovian influence or is this purely instrumental? 

      Thank you, Equation 1 is purely instrumental. We have now specifically mentioned this. The Pavlovian influence follows later. They are combined into propensities for action as per equations 11-13.

      (4) I was curious whether similar outcomes could be achieved by more complex instrumental models without the need for Pavlovian influences. For example, could different risk-sensitive decision rules (e.g., conditional value at risk) that rely only on the instrumental system afford safe behaviour without the need for an additional Pavlovian system? 

      Thank you for your comment. Yes, CVaR can achieve safe exploration/cautious behaviour in choices similar to Pavlovian avoidance learning. However, we think the two differ in the following ways:

      (1) CVaR provides the correct solution to the wrong problem (objective that only maximises the lower tail of the distribution of outcomes)

      (2) Pavlovian bias provides the wrong solution to the right problem (normative objective, but a Pavlovian bias which may be a vestige of evolution)

      Here we use the “wrong problem, wrong solution, wrong environment” categorisation terminology from Huys et al. 2015.

      Huys, Q. J., Guitart-Masip, M., Dolan, R. J., & Dayan, P. (2015). Decision-theoretic psychiatry. Clinical Psychological Science, 3(3), 400-421.

      Secondly, we find an effect of Pavlovian bias on reaction times: approach responses slow down and withdrawal responses speed up. We do not think this is best explained by a CVaR-type model, and exploring this is a direction for future work. We think such model-based methods are slower to compute, whereas a Pavlovian withdrawal bias is a quicker response.

      We have now included this in brief in Lines 280-288.
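
      For readers unfamiliar with conditional value at risk, the following minimal sketch shows what an objective focused on the lower tail of the outcome distribution looks like in practice. The function name `empirical_cvar`, the tail fraction, and the two outcome samples are hypothetical and chosen only to illustrate the idea; none of them come from the manuscript.

```python
import numpy as np

def empirical_cvar(outcomes, alpha=0.2):
    """Mean of the worst alpha-fraction of sampled outcomes (lower tail)."""
    x = np.sort(np.asarray(outcomes, dtype=float))
    k = max(1, int(np.ceil(alpha * x.size)))  # number of tail samples
    return x[:k].mean()

# Hypothetical returns of two options: a modest safe one and a risky one
# with a higher mean but a rare large punishment.
safe = [0.0, 0.1, 0.2, 0.1, 0.0, 0.2, 0.1, 0.1, 0.2, 0.0]
risky = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, -5.0]

mean_safe, mean_risky = np.mean(safe), np.mean(risky)
cvar_safe, cvar_risky = empirical_cvar(safe), empirical_cvar(risky)
```

      In this toy case an expected-value maximiser would pick the risky option, whereas a CVaR-based rule would pick the safe one, because it scores each option by its worst outcomes only.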

      (5) Figure 5 would benefit from a clearer caption as it is not necessarily clear from the current one that the left panels refer to choices and the right panels to reaction times. 

      Thank you, we have improved the caption for Fig. 5.

      (6) It would be good to include some indication of the quality of the model fits for the human behavioural study (i.e., diagnostics such as R-hat) to ensure that differences in model fit between models are not due to convergence issues with different models. This would be especially helpful for the RLDDM models as these can be difficult to fit successfully.

      Thank you, we observed that all Rhat values were strictly less than 1.05 (most parameters were less than 1.01 and generally close to 1), indicating that the models converged. We have now added this line to the results (Line 246-248).
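
      As background on the diagnostic discussed here, the sketch below computes the classic Gelman-Rubin potential scale reduction factor for one parameter from several chains. It is an illustrative assumption, not the procedure used in the manuscript: modern samplers typically report refined split or rank-normalised variants, and the simulated chains, the function name `gelman_rubin_rhat`, and the 1.05 threshold check are for demonstration only.

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Classic (non-split) R-hat for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    chain_means = chains.mean(axis=1)
    between = n * chain_means.var(ddof=1)      # between-chain variance B
    within = chains.var(axis=1, ddof=1).mean() # within-chain variance W
    var_hat = (n - 1) / n * within + between / n
    return np.sqrt(var_hat / within)

rng = np.random.default_rng(0)
# Four simulated chains that sample the same distribution.
mixed = rng.normal(0.0, 1.0, size=(4, 1000))
# The same chains, but two of them shifted to a different region.
stuck = mixed + np.array([[0.0], [0.0], [3.0], [3.0]])

rhat_mixed = gelman_rubin_rhat(mixed)
rhat_stuck = gelman_rubin_rhat(stuck)
```

      Chains that agree give a value close to 1, while chains that settle in different regions give a value well above commonly used thresholds such as 1.05.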

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Azlan et al. identified a novel maternal factor called Sakura that is required for proper oogenesis in Drosophila. They showed that Sakura is specifically expressed in the female germline cells. Consistent with its expression pattern, Sakura functioned autonomously in germline cells to ensure proper oogenesis. In Sakura KO flies, germline cells were lost during early oogenesis and often became tumorous before degenerating by apoptosis. In these tumorous germ cells, piRNA production was defective and many transposons were derepressed. Interestingly, Smad signaling, a critical signaling pathway for GSC maintenance, was abolished in sakura KO germline stem cells, resulting in ectopic expression of Bam in whole germline cells in the tumorous germline. A recent study reported that Bam acts together with the deubiquitinase Otu to stabilize Cyc A. In the absence of sakura, Cyc A was upregulated in tumorous germline cells in the germarium. Furthermore, the authors showed that Sakura co-immunoprecipitated Otu in ovarian extracts. A series of in vitro assays suggested that the Otu (1-339 aa) and Sakura (1-49 aa) are sufficient for their direct interaction. Finally, the authors demonstrated that the loss of otu phenocopies the loss of sakura, supporting their idea that Sakura plays a role in germ cell maintenance and differentiation through interaction with Otu during oogenesis.

      Strengths:

      To my knowledge, this is the first characterization of the role of the CG14545 gene. Each experiment seems to be well-designed and adequately controlled.

      Weaknesses:

      However, the conclusions from each experiment are somewhat separate, and the functional relationships between Sakura's functions are not well established. In other words, although the loss of Sakura in the germline causes pleiotropic effects, the cause-and-effect relationships between the individual defects remain unclear.

      Reviewer #2 (Public review):

      In this study, the authors identified CG14545 (and named it Sakura) as a key gene essential for Drosophila oogenesis. Genetic analyses revealed that Sakura is vital for both oogenesis progression and ultimate female fertility, playing a central role in the renewal and differentiation of germline stem cells (GSCs).

      The absence of Sakura disrupts the Dpp/BMP signaling pathway, resulting in abnormal bam gene expression, which impairs GSC differentiation and leads to GSC loss. Additionally, Sakura is critical for maintaining normal levels of piRNAs. Also, the authors convincingly demonstrate that Sakura physically interacts with Otu, identifying the specific domains necessary for this interaction, suggesting a cooperative role in germline regulation. Importantly, the loss of otu produces similar defects to those observed in Sakura mutants, highlighting their functional collaboration.

      The authors provide compelling evidence that Sakura is a critical regulator of germ cell fate, maintenance, and differentiation in Drosophila. This regulatory role is mediated through the modulation of pMad and Bam expression. However, the phenotypes observed in the germarium appear to stem from reduced pMad levels, which subsequently trigger premature and ectopic expression of Bam. This aberrant Bam expression could lead to increased CycA levels and altered transcriptional regulation, impacting piRNA expression. Given Sakura's role in pMad expression, it would be insightful to investigate whether overexpression of Mad or pMad could mitigate these phenotypic defects (UAS-Mad line is available at Bloomington Drosophila Stock Center).

      As suggested by reviewer 1, we tested whether overexpression of Mad could rescue or mitigate the phenotypic defects caused by loss of sakura, by using nos-Gal4-VP16 > UASp-Mad-GFP in the background of sakura<sup>null</sup>. As shown in Fig S11, we did not observe any mitigation of defects.

      We then also tested whether expressing a constitutively active form of Tkv could mitigate the defects, by using UAS-Dcr2, NGT-Gal4 > UASp-tkv.Q235D in the background of sakura<sup>RNAi</sup>. As shown in Fig S12, we did not observe any mitigation of defects by this approach either.

      A major concern is the overstated role of Sakura in regulating Orb. The data does not reveal mislocalized Orb; rather, a mislocalized oocyte and cytoskeletal breakdown, which may be secondary consequences of defects in oocyte polarity and structure rather than direct misregulation of Orb. The conclusion that Sakura is necessary for Orb localization is not supported by the data. Orb still localizes to the oocyte until about stage 6. In the later stage, it looks like the cytoskeleton is broken down and the oocyte is not positioned properly, however, there is still Orb localization in the ~8-stage egg chamber in the oocyte. This phenotype points towards a defect in the transport of Orb and possibly all other factors that need to localize to the oocyte due to cytoskeletal breakdown, not Orb regulation directly. While this result is very interesting it needs further evaluation on the underlying mechanism. For example, the decrease in E-cadherin levels leads to a similar phenotype and Bam is known to regulate E-cadherin expression. Is Bam expressed in these later knockdowns?

      We examined Bam and DE-Cadherin expression in later RNAi knockdowns driven by TOsk-Gal4. As shown in Fig S9, Bam was not expressed in these later knockdowns compared with controls. DE-Cadherin staining suggested a disorganized structure in late-stage egg chambers.

      We agree that we overstated the role of Sakura in regulating Orb in the initial manuscript. We revised the text to avoid this overstatement.

      The manuscript would benefit from a more balanced interpretation of the data concerning Sakura's role in Orb regulation. Furthermore, a more expanded discussion on Sakura's potential role in pMad regulation is needed. For example, since Otu and Bam are involved in translational regulation, do the authors think that Mad is not translated and therefore it is the reason for less pMad? Currently the discussion presents just a summary of the results and not an extension of possible interpretation discussed in context of present literature.

      We changed the text to avoid overstating a role of Sakura in regulating Orb localization.

      Based on our newly added results showing that transgenic overexpression of Mad could not rescue or mitigate the phenotypic defects of the sakura<sup>null</sup> mutant (Fig S11), we do not think that reduced translation of Mad is the reason for the lower pMad levels.

      Reviewer #3 (Public review):

      In this very thorough study, the authors characterize the function of a novel Drosophila gene, which they name Sakura. They start with the observation that sakura expression is predicted to be highly enriched in the ovary and they generate an anti-sakura antibody, a line with a GFP-tagged sakura transgene, and a sakura null allele to investigate sakura localization and function directly. They confirm the prediction that it is primarily expressed in the ovary and, specifically, that it is expressed in germ cells, and find that about 2/3 of the mutants lack germ cells completely and the remaining have tumorous ovaries. Further investigation reveals that Sakura is required for piRNA-mediated repression of transposons in germ cells. They also find evidence that sakura is important for germ cell specification during development and germline stem cell maintenance during adulthood. However, despite the role of sakura in maintaining germline stem cells, they find that sakura mutant germ cells also fail to differentiate properly such that mutant germline stem cell clones have an increased number of "GSC-like" cells. They attribute this phenotype to a failure in the repression of Bam by dpp signaling. Lastly, they demonstrate that sakura physically interacts with otu and that sakura and otu mutants have similar germ cell phenotypes. Overall, this study helps to advance the field by providing a characterization of a novel gene that is required for oogenesis. The data are generally high-quality and the new lines and reagents they generated will be useful for the field. However, there are some weaknesses and I would recommend that they address the comments in the Recommendations for the authors section below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      General Comments:

      (1) The gene nomenclature: As mentioned in the text, Sakura means cherry blossom and is one of the national flowers of Japan. I am not sure whether the phenotype of the CG14545 mutant is related to Sakura or not. I would like to suggest the authors reconsider the naming.

      The striking phenotype of the sakura mutant is tumorous and germless ovarioles. The tumorous phenotype, with many round fusomes in the germarium visualized by anti-Hts staining, looks to us like cherry blossoms in bloom. The germless phenotype reminds us of falling cherry blossoms, especially considering that the ratio of the tumorous phenotype decreases and that of the germless phenotype increases with fly age. Furthermore, “Sakura” symbolizes birth and renewal in Japanese culture (the last author of this manuscript is Japanese). Our findings indicate that the gene sakura is involved in regulating the renewal and differentiation of GSCs (which leads to birth). These are the reasons for the naming, which we would like to keep.

      (2) In many of the microscopic photographs in the figures, especially for the merged confocal images, the resolution looks low, and the images appear blurred, making it difficult to judge the authors' claims. Also, the Alpha Fold structure in Figure 10A requires higher contrast images. The magnification of the images is often inadequate (e.g. Figures 3A, 3B, 5E, 7A, etc). The authors should take high-magnification images separately for the germarium and several different stages of the egg chambers and lay out the figures.

      We are very sorry for the low-resolution images. This was caused when the original PDF file with high-resolution images was compressed in order to meet the small file size limit in the eLife submission portal. In the revised submission, we used high-resolution images.

      Specific Comments

      (1) How Sakura can cooperate with Otu remains unanswered. Sakura does not regulate deubiquitinase activity in vitro. Both sakura and otu appear to be involved in the Dpp-Smad signaling pathway and in the spatial control of Bam expression in the germarium, whereas Otu has been reported to act in concert with Bam to deubiquitinate and stabilize Cyc A for proper cystoblast differentiation. Therefore, it is plausible that the stabilization of Cyc A in the Sakura mutant is an indirect consequence of Bam misexpression and independent of the Sakura-Otu interaction. The authors may need to provide much deeper insight into the mechanism by which Sakura plays roles in these seemingly separable steps to orchestrate germ cell maintenance and differentiation during early oogenesis.

      Yes, it is possible that the stabilization of CycA in the sakura mutant is an indirect consequence of Bam misexpression and independent of the Sakura-Otu interaction. To test the significance and role of the Sakura-Otu interaction, we have attempted to identify Sakura point mutants that lose interaction with Otu. Had such point mutants been obtained, we planned to test whether their transgenic expression could rescue the phenotypes of the sakura mutant as the wild-type transgene did. However, after designing over 30 point mutants and testing their interaction with Otu, we have not yet obtained such a mutant version of Sakura. We will continue these efforts, but this is beyond the scope of the current study. We hope to address this important point in future studies.

      (2) Figure 3A and Figure 4: The authors show that piRNA production is abolished in Sakura KO ovaries. It is known that piRNA amplification (the ping-pong cycle) occurs in the Vasa-positive perinuclear nuage in nurse cells. Is the nuage normally formed in the absence of Sakura? The authors provide high-magnification images in the germarium expressing Vas-GFP. How does Sakura, and possibly Otu, contribute to piRNA production? Are the defects a direct or indirect consequence of the loss of Sakura?

      We provided higher-magnification images of germaria expressing Vasa-EGFP in the sakura mutant background (Fig 3A and 3B). Nuage formation does not seem to be dysregulated in the sakura mutant. Currently, we do not know whether the piRNA defects are a direct or indirect consequence of the loss of Sakura. This question cannot be answered easily. We hope to address this in future studies.

      (3) Figure 7 and Figure 12: The authors showed that Dpp-Smad signaling was abolished in Sakura KO germline cells. The same defects were also observed in otu mutant ovaries (Figure 12B). How does the Sakura-Otu axis contribute to the Dpp-Smad pathway in the germline?

      As we mentioned in the response to comment (1), we attempted to test the significance and role of the Sakura-Otu interaction, including in the Dpp-Smad pathway in the germline, but we have not yet been able to obtain loss-of-interaction mutant(s) of Sakura. We hope to address this in future studies.

      (4) Figure 9 and Fig 10: The authors raised antibodies against both Sakura and Otu, but their specificities were not provided. For Western blot data, the authors should provide whole gel images as source data files. Also, the authors argue that the Otu band they observed corresponds to the 98-kDa isoform (lines 302-304). The molecular weight on the Western blot alone would be insufficient to support this argument.

      When we submitted the initial manuscript, we also submitted original, uncropped, and unmodified whole Western blot images for all gel images to eLife, as requested. We did the same for this revised submission. We believe eLife makes all those files available for readers to download.

      In the newly added Fig S13B, we used very young 2-5 hours ovaries and 3-7 days ovaries. The 2-5 hours ovaries contain mostly pre-differentiated germ cells. Older ovaries (3-7 days in our case) contain all 14 stages of oogenesis, and later stages predominate in whole-ovary lysates.

      As reported in previous literature (Sass et al. 1995), we detected a higher abundance of the 104 kDa Otu isoform than the 98 kDa isoform in 2-5 hours ovaries, and predominantly the 98 kDa isoform in 3-7 days ovaries (Fig S13B). These results confirmed that the major Otu isoform detected in our Western blots, all of which used older ovaries except for the 2-5 hours ovaries in Fig S13B, is the 98 kDa isoform.

      (5) Otu has been reported to regulate ovo and Sxl in the female germline. Is Sakura involved in their regulation?

      We examined the alternative splicing pattern of sxl in sakura mutant ovaries. As shown in Fig S6, we detected the male-specific isoform of sxl RNA and a reduced level of the female-specific sxl isoform in sakura mutant ovaries. Thus, Sakura seems to be involved in sxl splicing in the female germline, although further studies will be needed to understand whether Sakura has a direct or indirect role here.

      (6) Lines 443-447: The GSC loss phenotype in piwi mutant ovaries is thought to occur in a somatic cell-autonomous manner: both piwi-mutant germline clones and germline-specific piwi knockdown do not show the GSC-loss phenotype. In contrast, the authors provide compelling evidence that Sakura functions in the germline. Therefore, the Piwi-mediated GSC maintenance pathway is likely to be independent of the Sakura-Otu axis.

      We changed the text accordingly.

      Reviewer #2 (Recommendations for the authors):

      Overall, this is a cleanly written manuscript, with some sentences/sections that are confusing the way they are constructed (i.e. Line 37-38, 334, section on Flp/FRT experiments).

      We rewrote those sections to avoid confusion.

      Comment for all merged image data: the quality of the merged images is very poor - the individual channels are better but should also be reprocessed for more resolved image data sets. Also, it would be helpful to have boundaries drawn in an individual panel to identify the regions of the germarium, as cartooned in Figure S1A (which should be brought into Figure 1). F-actin or Vsg staining would have helped throughout the manuscript to enhance the visualization of described phenotypes.

      We are very sorry for the low-resolution images. This was caused when the original PDF file with high-resolution images was compressed in order to meet the small file size limit in the eLife submission portal. In the revised submission, we used high-resolution images.

      We outlined the germarium in Fig 1E.

      We brought the former FigS1 into Fig 1A.

      We provided Phalloidin (F-Actin) staining images in Fig S7.

      All p-values seem off. I recommend running the data through the student t-test again.

      We used Student's t-test to calculate the p-values and confirmed that they are correct. We do not understand why the reviewer thinks all p-values seem off.

      In the original manuscript, as we mentioned in each figure legend, we used an asterisk (*) to indicate a p-value <0.05, without distinguishing whether it was <0.001, <0.01, or <0.05.

      Perhaps Reviewer 2 is suggesting that we use ***, **, and * to indicate p-values of <0.001, <0.01, and <0.05, respectively? If so, we have now followed Reviewer 2's suggestion.

      Figure 1

      (1) Within the text, C is mentioned before A.

      We updated the text and now mention Fig 1A before Fig 1C.

      (2) B should be the supplemental figure.

      We moved the former Fig 1B to Supplemental Figure 1.

      (3) C - How were the different egg chamber stages selected in the WB? Naming them 'oocytes' is deceiving. Recommend labeling them as 'egg chambers', since an oocyte is claimed to be just the one-cell of that cyst.

      We changed the labeling to egg chambers.

      (4) Is the antibody not detecting Sakura in IF? There is no mention of this anywhere in the manuscript.

      While our Sakura antibody detects Sakura in IF, it seems to detect some other proteins as well. Since we have a Sakura-EGFP fly strain (which fully rescues sakura<sup>null</sup> phenotypes) to examine Sakura expression and localization without such non-specific signal issues, we relied on Sakura-EGFP rather than anti-Sakura antibodies for IF.

      (5) Expand on the reliance of the sakura-EGFP fly line. Does this overexpression cause any phenotypes?

      sakura-EGFP does not cause any phenotypes in the background of sakura[+/+] and sakura[+/-].

      (6) Line 95 "as shown below" is not clear that it's referencing panel D.

      We now reference Fig 1D.

      (7) Re: Figures 1 E and F. There is no mention of Hts or Vasa proteins in the text. "Sakura-EGFP was not expressed in somatic cells such as terminal filament, cap cells, escort cells, or follicle cells (Figure 1E). In the egg chamber, Sakura-EGFP was detected in the cytoplasm of nurse cells and was enriched in developing oocytes (Figure 1F)". Outline these areas or label these structures/sites in the images. The color of Merge labels is confusing as the blue is not easily seen.

      We now mention Hts and Vasa in the text. We labeled the structures/sites in the images and updated the color labeling.

      Figure 2

      (1) Entire figure is not essential to be a main figure, but rather supplemental.

      We do not agree with the reviewer. The female fertility assay data, in which the sakura null mutant exhibits a strikingly strong phenotype that was completely rescued by our Sakura-EGFP transgene, are very important, and we would like to present them in a main figure.

      (2) 2A- one star (*) significance does not seem correct for the presented values between 0 and 100+.

      In the original manuscript, as we mentioned in each figure legend, we used an asterisk (*) to indicate a p-value <0.05, without distinguishing whether it was <0.001, <0.01, or <0.05.

      Perhaps Reviewer 2 is suggesting that we use ***, **, and * to indicate p-values of <0.001, <0.01, and <0.05, respectively? If so, we have now followed Reviewer 2's suggestion.

      (3) 2C images are extremely low quality. Should be presented as bigger panels.

      We are very sorry for the low-resolution images. This was caused when the original PDF file with high-resolution images was compressed in order to meet the small file size limit in the eLife submission portal. In the revised submission, we used high-resolution images. We also present them as bigger panels.

      Figure 3

      (1) "We observed that some sakura<sup>null/null</sup> ovarioles were devoid of germ cells ("germless"), while others retained germ cells (Fig 3A)" What is described is hard to see. Must have a zoomed-in panel.

      We provided zoomed-in panels in Fig 3B.

      (2) C - The control doesn't seem to match. Must zoom in.

      We provided a matched control and also zoomed in.

      (3) For clarity, separate the tumorous and germless images.

      In the new image, only one tumorous and one germless ovariole are shown, with clear labeling and outlines, for clarity.

      (4) Use arrows to help clearly indicate the changes that occur. As they are presented, they are difficult to see.

      We updated all the panels to enhance clarity.

      (5) Line 158 seems like a strong statement since it could be indirect.

      We softened the statement.

      Figure 4

      (1) Line 188-189 - Conclusion is an overstatement.

      We softened the statement.

      (2) Is the piRNA reduction due to a change in transcription? Or a direct effect by Sakura?

      We do not know the answers to these questions. We hope to address these in future studies.

      Figure 5

      (1) D - It might make more sense if this graph showed % instead of the numbers.

      We did not understand the reviewer’s point. We think using numbers, not %, makes more sense.

      (2) Line 213 - explain why RNAi 2 was chosen when RNAi 1 looks stronger.

      The fly stock of RNAi line 2 is much healthier than that of RNAi line 1 (even without being driven by Gal4), for unknown reasons. We were concerned that RNAi line 1 might carry an unwanted genetic background, so we chose RNAi line 2 to avoid this issue.

      (3) In Line 218 there's an extra parenthesis after the PGC acronym.

      We corrected the error.

      (4) TOsk-Gal4 fly is not in the Methods section.

      We mentioned TOsk-Gal4 in the Methods.

      Figure 6:

      (1) The FLP-FRT section must be rewritten.

      We rewrote the FLP-FRT section.

      (2) A - include statistics.

      We included statistics using the chi-square test.

      (3) B - is not recalled in the Results text.

      We now refer to Fig 6B in the text.

      (4) Line 232 references Figure 3, but not a specific panel.

      We now refer to Fig 3A, 3C, 3D, and 3E in the text.

      Figure 7/8 - can go to Supplemental.

      We moved Fig 8 to the supplement. However, we think the Fig 7 data are important, and we would therefore like to present them as a main figure.

      (1) There should be CycA expression in the control during the first 4 divisions.

      Yes, CycA expression is observed in the control during the first 4 divisions, although it is much weaker than in the sakura<sup>null</sup> clone.

      (2) Helpful to add the dotted lines to delineate (A) as well.

      We added a dotted outline for the germarium in Fig 7A.

      (3) Line 263 CycA is miswritten as CyA.

      We corrected the typo.

      Figure 9

      (1) Otu antibody control?

      We validated the Otu antibody in the newly added Fig 10C and Fig S13A.

      (2) Which Sakura-EGFP line was used? sakura het. or null background? This isn't mentioned in the text, nor legend.

      We used Sakura-EGFP in the background of sakura[+/+]. We added this information in the methods and figure legend.

      (3) C - Why the switch to S2 cells? Not able to use the Otu antibody in the IP of ovaries?

      We can use the Otu antibody for IP from ovaries. However, in the anti-Sakura Western blot after anti-Otu IP, the light chain bands of the Otu antibody overlap with the Sakura band. We therefore switched to S2 cells, where we could avoid this issue by using an epitope tag.

      Figure 10

      (1) A- The resolution of images of the ribbon protein structure is poor.

      We are very sorry for the low-resolution images. This was caused when the original PDF file with high-resolution images was compressed in order to meet the small file size limit in the eLife submission portal. In the revised submission, we used high-resolution images.

      (2) A table summarizing the interactions between domains would help bring clarity to the data presented.

      We added a table summarizing the fragment interaction results.

      (3) Some images would be nice here to show that the truncations no longer colocalize.

      We did not understand the reviewer’s point. In our study, we have not shown any colocalization of Sakura and Otu in S2 cells or in ovaries, even for the full-length proteins, except that both are enriched in developing oocytes in egg chambers.

      Figure 12

      (1) A - control and RNAi lines do not match.

      We provided matched images.

      (2) In general, since for Sakura, only its binding to Otu was identified and since they phenocopy each other, doesn't most of the characterization of Sakura just look at Otu phenotypes? Does Sakura knockdown affect Otu localization or expression level (and vice versa)?

      We tested this by Western blot (Fig S15) and IF (Fig 12). Sakura knockdown did not decrease the Otu protein level, and Otu knockdown did not decrease the Sakura protein level (Fig S15). In the sakura<sup>null</sup> clone, the Otu level was not notably affected (Fig 12), but Otu lost its localization to the posterior position within egg chambers.

      Figure S6

      (1) It is Luciferase, not Lucifarase.

      We corrected the typo.

      Reviewer #3 (Recommendations for the authors):

      (1) It is interesting that germless and tumorous phenotypes coexist in the same population of flies. Additional consideration of these essentially opposite phenotypes would significantly strengthen the study. For example, do they co-exist within the same fly and are the tumorous ovarioles present in newly eclosed flies or do they develop with age? The data in Figure 8 show that bam knockdown partially suppresses the germless phenotype. What effect does it have on the tumorous phenotype? Is transposon expression involved in either phenotype? Do Sakura mutant germline stem cell clones overgrow relative to wild-type cells in the same ovariole? Does sakura RNAi driven by NGT-Gal4 only cause germless ovaries or does it also cause tumorous phenotypes? What happens if the knockdown of Sakura is restricted to adulthood with a Gal80ts? It may not be necessary to answer all of these questions, but more insight into how these two phenotypes can be caused by loss of sakura would be helpful.

      We performed new experiments to answer these questions.

      do they co-exist within the same fly and are the tumorous ovarioles present in newly eclosed flies or do they develop with age?

      Tumorous and germless ovarioles coexist in the same fly (in the same ovary). Tumorous ovarioles are present in very young (0-1 day old) flies, including newly eclosed flies (Fig S5). The ratio of germless ovarioles increases and that of tumorous ovarioles decreases with age (Fig S5).

      The data in Figure 8 show that bam knockdown partially suppresses the germless phenotype. What effect does it have on the tumorous phenotype?

      The effect of bam knockdown on the tumorous phenotype is shown in Fig S10. bam knockdown increased the ratio of tumorous ovarioles and the number of GSC-like cells.

      Is transposon expression involved in either phenotype?

      Because our transposon-piRNA reporter uses the germline-specific nos promoter, it is expressed only in germline cells, so we cannot examine it in germless ovarioles.

      Do Sakura mutant germline stem cell clones overgrow relative to wild-type cells in the same ovariole?

      Yes, Sakura mutant GSC clones overgrow. Please compare Fig 6C and Fig S8.

      Does sakura RNAi driven by NGT-Gal4 only cause germless ovaries or does it also cause tumorous phenotypes?

      Fig S10 and Fig S12 show the ovariole phenotypes of sakura RNAi driven by NGT-Gal4. It causes both germless and tumorous phenotypes.

      What happens if the knockdown of Sakura is restricted to adulthood with a Gal80ts?

      Our mosaic clones were induced at the adult stage, so we already have data on adulthood-specific loss of function. Gal80ts does not work well with nos-Gal4.

      (2) The idea that the excessive bam expression in tumorous ovaries is due to a failure of bam repression by dpp signaling is not well-supported by the data. Dpp signaling is activated in a very narrow region immediately adjacent to the niche but the images in Figure 7A show bam expression in cells that are very far away from the niche. Thus, it seems more likely to be due to a failure to turn bam expression off at the 16-cell stage than to a failure to keep it off in the niche region. To determine whether bam repression in the niche region is impaired, it would be important to examine cells adjacent to the niche directly at a higher magnification than is shown in Figure 7A.

      We provided higher magnification images of cells adjacent to the niche in new Fig 7A.

      We found that cells adjacent to the niche also express Bam-GFP.

      That said, we agree with the reviewer. A failure to turn bam expression off at the 16-cell stage may be an additional or even the main cause of bam misexpression in the sakura mutant. We added this to the Discussion.

      (3) In addition, several minor comments should be addressed:

      a. Does anti-Sakura work for immunofluorescence?

      While our Sakura antibody detects Sakura in IF, it seems to detect some other proteins as well. Since we have a Sakura-EGFP fly strain to examine Sakura expression and localization without such non-specific signal issues, we relied on Sakura-EGFP rather than anti-Sakura antibodies.

      b. Please provide insets to show the phenotypes indicated by the different color stars in Figure 3C more clearly.

      We provided new, higher-magnification images to show the phenotypes more clearly.

      c. Please indicate the frequency of the expression patterns shown in Figure 4D (do all ovarioles in each genotype show those patterns or is there variable penetrance?).

      We indicated the frequency.

      d. An image showing TOskGal4 driving a fluorophore should be provided so that readers can see which cells express Gal4 with this driver combination.

      This has already been shown in ElMaghraby et al., GENETICS, 2022, 220(1), iyab179, so we did not repeat the same experiment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Mallimadugula et al. combined Molecular Dynamics (MD) simulations, thiol-labeling experiments, and RNA-binding assays to study and compare the RNA-binding behavior of the Interferon Inhibitory Domain (IID) from Viral Protein 35 (VP35) of Zaire ebolavirus, Reston ebolavirus, and Marburg marburgvirus. Although the structures and sequences of these viruses are similar, the authors suggest that differences in RNA binding stem from variations in their intrinsic dynamics, particularly the opening of a cryptic pocket. More precisely, the dynamics of this pocket may influence whether the IID binds to RNA blunt ends or the RNA backbone.

      Overall, the authors present important findings to reveal how the intrinsic dynamics of proteins can influence their binding to molecules and, hence, their functions. They have used extensive biased simulations to characterize the opening of a pocket which was not clearly seen in experimental results - at least when the proteins were in their unbound forms. Biochemical assays further validated theoretical results and linked them to RNA binding modes. Thus, with the combination of biochemical assays and state-of-the-art Molecular Dynamics simulations, these results are clearly compelling.

      Strengths:

      The use of extensive Adaptive Sampling combined with biochemical assays clearly points to the opening of the Interferon Inhibitory Domain (IID) as a factor for RNA binding. This type of approach is especially useful to assess how protein dynamics can affect its function.

      Weaknesses:

      Although a connection between the cryptic pocket dynamics and RNA binding mode is proposed, the precise molecular mechanism linking pocket opening to RNA binding still remains unclear.

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to determine whether a cryptic pocket in the VP35 protein of Zaire ebolavirus has a functional role in RNA binding and, by extension, in immune evasion. They sought to address whether this pocket could be an effective therapeutic target resistant to evolutionary evasion by studying its role in dsRNA binding among different filovirus VP35 homologs. Through simulations and experiments, they demonstrated that cryptic pocket dynamics modulate the RNA binding modes, directly influencing how VP35 variants block RIG-I and MDA5-mediated immune responses.

      The authors successfully achieved their aim, showing that the cryptic pocket is not a random structural feature but rather an allosteric regulator of dsRNA binding. Their results not only explain functional differences in VP35 homologs despite their structural similarity but also suggest that targeting this cryptic pocket may offer a viable strategy for drug development with reduced risk of resistance.

      This work represents a significant advance in the field of viral immunoevasion and therapeutic targeting of traditionally "undruggable" protein features. By demonstrating the functional relevance of cryptic pockets, the study challenges long-standing assumptions and provides a compelling basis for exploring new drug discovery strategies targeting these previously overlooked regions.

      Strengths:

      The combination of molecular simulations and experimental approaches is a major strength, enabling the authors to connect structural dynamics with functional outcomes. The use of homologous VP35 proteins from different filoviruses strengthens the study's generality, and the incorporation of point mutations adds mechanistic depth. Furthermore, the ability to reconcile functional differences that could not be explained by crystal structures alone highlights the utility of dynamic studies in uncovering hidden allosteric features.

      Weaknesses:

      While the methodology is robust, certain limitations should be acknowledged. For example, the study would benefit from a more detailed quantitative analysis of how specific mutations impact RNA binding and cryptic pocket dynamics, as this could provide greater mechanistic insight. This study would also benefit from providing a clear rationale for the selection of the amber03 force field and considering the inclusion of volume-based approaches for pocket analysis. Such revisions will strengthen the robustness and impact of the study.

      Reviewer #3 (Public review):

      Summary:

      The authors suggest a mechanism that explains the preference of viral protein 35 (VP35) homologs to bind the backbone of double-stranded RNA versus blunt ends. These preferences have a biological impact in terms of the ability of different viruses to escape the immune response of the host.

      The proposed mechanism involves the existence of a cryptic pocket, where VP35 binds the blunt ends of dsRNA when the cryptic pocket is closed and preferentially binds the RNA double-stranded backbone when the pocket is open.

      The authors performed MD simulations, thiol labelling experiments, and fluorescence polarization assays, and introduced point mutations, to support their hypothesis.

      Strengths:

      This is a genuinely interesting scientific question, which is approached through multiple complementary experiments as well as extensive MD simulations. Moreover, structural biology studies focused on RNA-protein interactions are particularly rare, highlighting the importance of further research in this area.

      Weaknesses:

      - Sequence similarity between Zaire and Reston ebolavirus (94% similarity) explains their similar behaviour in simulations and experimental assays. Marburg instead is a more distant homolog (~80% similarity relative to Ebola/Zaire). This difference in sequence and structure can explain the propensities, without the need to involve the existence of a cryptic pocket.

      - No real evidence for the presence of a cryptic pocket is presented, but rather a distance probability distribution between two residues obtained from extensive MD simulations. It would be interesting to characterise the modelled RNA-protein interface in more detail.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Before assessing the overall quality and significance of this work, this reviewer needs to specify the context of this review. This reviewer's expertise lies in biased and unbiased molecular dynamics simulations and structural biology. Hence, while this reviewer can overall understand the results for thiol-labeling and RNA-binding assays, this review will not assess the quality of these biochemical assays and will mainly focus on the modelling results.

      Overall, the authors present important findings to reveal how the intrinsic dynamics of proteins can influence their binding to molecules and, hence, their functions. They have used extensive biased simulations to characterize the opening of a pocket which was not clearly seen in experimental results - at least when the proteins were in their unbound forms. Biochemical assays further validated theoretical results and linked them to RNA binding modes. Thus, with the combination of biochemical assays and state-of-the-art Molecular Dynamics simulations, these results are clearly compelling.

      Beyond the clear qualities of this work, I would like to mention a few points that may help to better contextualize and rationalize the results presented here.

      - First, both the introduction and discussion sections seem relatively condensed. Extending them to, for example, better describe the methodological context and discuss the methodological limitations and potential future developments related to biased simulations may help the reader get a better idea of the significance of this work.

      - The authors presented 3 homologs in this study: IIDs of Reston, Zaire, and Marburg viruses. While Zaire and Reston are relatively similar in terms of sequence (Figure S1), the sequences clearly differ between Marburg and the two other viruses. Can the authors indicate a similarity/identity score for each sequence alignment and extend Figure S1 to really compare the Marburg sequence with Reston and Zaire? Can they also discuss how these differences may impact the comparison of the three IIDs? This may also help the reader to understand why the authors sometimes compare the three viruses and sometimes focus only on comparing Zaire and Reston.

      We would like to thank the reviewer for raising this point, and we agree that additional details about the sequence comparison provide more context for the choices of substitutions we made. Therefore, we have updated Fig S1 to include a detailed pairwise comparison of all the IID sequences, including the percentage sequence similarity and identity. We have also added the following sentences to the results section where we first introduced the substitutions between Zaire and Reston IIDs:

      “While the sequence of Marburg IID differs significantly from Reston and Zaire IIDs with a sequence identity of 42% and 45% respectively (Fig S1), the sequences of Reston and Zaire IID are 88% identical and 94% similar. Particularly, substitutions between these homologs are all distal to the RNA-binding interfaces and all the residues known to make contacts with dsRNA from structural studies are identical. Therefore, we reasoned that comparing these two homologs would help us identify minimal substitutions that control pocket opening probability and allow us to study its effect on dsRNA binding with minimal perturbation of other factors.”
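
      As a rough illustration of how pairwise identity and similarity percentages of this kind can be computed from an existing alignment, a minimal Python sketch follows. It is not the authors' procedure: the function name, the residue similarity groups, and the choice of denominator (aligned, non-gap columns) are assumptions made for the example, and the toy sequences are not VP35 sequences. Published values depend on the alignment tool and scoring matrix used.

```python
# Hypothetical helper, not taken from the manuscript: percent identity and
# similarity between two sequences that have ALREADY been aligned
# (equal length, '-' marks a gap).

# Assumed, illustrative similarity groups; real tools usually derive
# similarity from a substitution matrix such as BLOSUM62 instead.
SIMILAR_GROUPS = [
    set("ILVM"), set("FWY"), set("KRH"), set("DE"),
    set("ST"), set("NQ"), set("AG"),
]


def identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) over non-gap columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    identical = similar = compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # columns containing a gap are skipped
        compared += 1
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1

    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


if __name__ == "__main__":
    # Toy aligned sequences, for illustration only.
    print(identity_and_similarity("ACDK", "ACER"))
```

      Because similar substitutions are counted on top of identical positions, the similarity percentage is never lower than the identity percentage, which matches the ordering of the figures quoted above (88% identical, 94% similar).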

      - In this work, the authors mentioned the cryptic pocket but only illustrated the opening of this pocket by using a simple distance between residues (Figure 2) and the SASA of one cysteine (Figure 3). In previous work done by the authors (Cruz et al., Nature Communications, 2022), they better characterized the residues involved in RNA binding and in forming the cryptic pocket. Thus, would it be possible to better describe this cryptic pocket (residues involved, volume, etc.) and better explain how, structurally speaking, it can affect the RNA binding mode (blunt ends vs backbone)?

      We thank the reviewer for pointing out the need for clarification on the residues involved in RNA binding and pocket opening and the mechanism linking them. We have performed the CARDS analysis on Reston and Marburg IID simulations as we had done on Zaire IID simulations in Cruz et al, 2022. The results are shown in Fig S3 and discussed in the main text in the first results section.

      - As a counter-example, the authors used C315 for SASA calculation and thiol labeling (Figure 3). This cysteine is mainly buried, as seen by SASA for Reston and Marburg and by thiol labelling (Figure 3 E,G,H). Would it be possible to also get thiol labeling rates for Cysteine 264 in Reston and its equivalent, to see a case where the residue is solvent exposed?

      We have shown the SASA for C264 from the simulations in Fig S4 and the thiol labeling rates for all 4 cysteines in Reston IID in Fig S6. Comparing these rates to the rates of all 4 cysteines obtained for Zaire IID (Fig 4 in Cruz et al., 2022), we observe that the rates for C264, which is expected to be exposed, are significantly faster than those of C315, which is largely buried in all variants.

      - I strongly support here the will of the authors to share their data by depositing them in an OSF repository. These data help this reviewer to assess some of the results produced by the authors and help to better understand the dynamics of their respective systems. I have just a few comments that need to be addressed regarding these data:

        o While there are data for WT Reston and Marburg, there is no data for Zaire. Is this because these data correspond to the previous work (Cruz et al. 2022) (in this case, it would be good to make this clear in the main text) or is it an omission?

        o There is no center.xtc file in the Marburg-MSM directory.

        o There is no protmasses.pdb in the Reston-MSM directory.

      - In general, if possible, it would be good to use the same name for each type of file presented in each directory to help a potential user understand a bit more how to use these data.

      - If possible, adding a bit more metadata and explanation on the OSF webpage would be very beneficial to help find these data. To help in this direction, the authors may have a look at the guidelines presented at the end of this article: https://elifesciences.org/articles/90061

      We thank the reviewer for pointing out the omissions from the OSF repository. We have added the missing files and followed a uniform naming convention. We have also added documentation in the metadata section of the OSF repository to help others use the data.  

      Indeed, the simulation data used for Zaire IID is available on the OSF repository corresponding to Cruz et al. 2022 at https://osf.io/5pg2a. We have also clarified this in the data availability section of the main text.  

      Minor point:

      In Figure 2, there is a slight bump for the 225-295 distance around 1 nm for Reston. Can the authors comment on it? As these results are based on long AS, do the authors think this population is significant, even if very small?

      Comparing the probability distributions obtained from bootstrapping the frames used to calculate the MSM equilibrium probabilities (Revised Fig1), we observe that the bump in the Reston IID distribution persists in all bootstraps, indicating that it might indeed be significant. This is also consistent with our observation that cysteine 296 does get fully labeled in our thiol labeling experiments, albeit significantly more slowly than in the other homologs.

      Reviewer #2 (Recommendations for the authors):

      I recommend that the authors implement moderate revisions prior to the publication of this research article, addressing the identified weaknesses (see below).

      The authors should provide a rationale for their selection of the amber03 force field (Duan et al., JCTC 24, 1999-2012, 2003) for molecular dynamics simulations, particularly given the availability of more recent and optimized versions of the AMBER force fields. These newer force fields may offer improved parameterization for biomolecular systems, potentially enhancing the accuracy and reliability of the simulation results.

      We chose the Amber03 force field because it has performed well in much of our past work, including the original prediction of the cryptic pocket that we study in this manuscript. The results presented in this manuscript also demonstrate the predictive power of Amber03.

      Additionally, while the authors utilized solvent-accessible surface area (SASA) for cryptic pocket analysis, volume-based approaches may be more suitable for this purpose. Several studies (e.g., Sztain et al. J. Chem. Inf. Model. 2021, 61, 7, 3495-3501) have demonstrated the utility of volume analysis in identifying and characterizing cryptic pockets. The authors could consider incorporating such methodologies to provide a more comprehensive assessment of pocket dynamics.

      The authors propose that the cryptic pocket is not merely a random structural feature but functions as an allosteric regulator of dsRNA binding. To further substantiate this claim, an in-depth analysis of this allosteric effect using for instance network analysis could significantly enhance the study. Such an approach could identify key residues and interaction networks within the protein that mediate the allosteric regulation. This type of mechanistic insight would not only provide a stronger theoretical framework but also offer valuable information for the rational design of therapeutic interventions targeting the cryptic pocket.  

      We thank the reviewer for pointing out the need for clarification on the molecular mechanism linking the opening of the cryptic pocket to RNA binding. We have performed the CARDS analysis on Reston and Marburg IID simulations as was done on Zaire IID simulations in Cruz et al, 2022. The results are shown in Fig S3 and discussed in the main text in the first results section. Briefly, we do find a community (blue) comprising the pocket residues in Reston and Marburg IIDs as we did in Zaire. Similarly, we find that many of the RNA binding residues fall into the orange and green communities as in Zaire. However, there are differences in exactly which residues are clustered into which of these two communities. There are also differences in how strongly connected these communities are in the three homologs. Therefore, while we can conclude that pocket residues likely have varying influence on the RNA binding residues in the homologs, it is hard to say exactly what that variation is from this analysis alone.  

      Reviewer #3 (Recommendations for the authors):

      - MD simulations: All simulations were initialised from the 3 crystal structures, is that correct? In all cases, dsRNA was not included in the simulations, right? Were crystallographic Mg ions in the vicinity of the binding site included? These are known to influence structural dynamics to a large extent.

      All simulations were indeed initialized using only protein atoms from the crystal structures 3FKE, 4GHL, and 3L2A. Therefore, crystallographic Mg ions were not included in the simulations. However, we do agree with the reviewer and think that the effect of parameters such as salt concentration, specifically Mg ions which are known to be important for the stability of dsRNA, on the pocket opening equilibrium merits detailed study in future work.

      - Figure 2: Would it be possible to perform e.g. a block error analysis and show the statistical errors of the distributions?

      We agree that showing the statistical variation in the MSM equilibrium probabilities is important for comparing the different distributions. Therefore, we have updated Figs 2 and 5 to show the distributions obtained from MSMs constructed using 100 and 10 random samples of the data respectively to indicate the extent of the statistical variability in the MSM construction.  
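
      To make the idea of resampling-based error bars on a distance distribution concrete, here is a simplified Python sketch. It is only an analogy to the procedure described above: it resamples whole trajectories with replacement and re-histograms the pooled distances with equal frame weights, whereas the response describes rebuilding MSMs from random samples of the data. The function names, bin edges, and toy distances are assumptions for illustration.

```python
import random


def histogram_density(values, edges):
    """Normalized histogram (density) of values over the given bin edges."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        for k in range(n_bins):
            is_last = k == n_bins - 1
            # Bins are half-open [lo, hi); the last bin also includes its upper edge.
            if edges[k] <= v < edges[k + 1] or (is_last and v == edges[-1]):
                counts[k] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin range")
    return [c / (total * (edges[k + 1] - edges[k])) for k, c in enumerate(counts)]


def bootstrap_distribution(trajectories, edges, n_boot=100, seed=0):
    """Per-bin mean and standard deviation over bootstrap resamples.

    Each resample draws len(trajectories) trajectories with replacement,
    pools their frames, and recomputes the density. No MSM reweighting is
    applied here (an assumption made to keep the sketch short).
    """
    rng = random.Random(seed)
    boots = []
    for _ in range(n_boot):
        resampled = rng.choices(trajectories, k=len(trajectories))
        pooled = [d for traj in resampled for d in traj]
        boots.append(histogram_density(pooled, edges))

    n_bins = len(edges) - 1
    mean = [sum(b[k] for b in boots) / n_boot for k in range(n_bins)]
    std = [
        (sum((b[k] - mean[k]) ** 2 for b in boots) / n_boot) ** 0.5
        for k in range(n_bins)
    ]
    return mean, std


if __name__ == "__main__":
    # Toy inter-residue distances in nm, one list per trajectory (made up).
    toy = [[0.50, 0.55, 0.60], [0.60, 0.65], [1.00, 1.05, 0.95], [0.50, 0.52]]
    edges = [0.4, 0.6, 0.8, 1.0, 1.2]
    mean, std = bootstrap_distribution(toy, edges, n_boot=100)
    print(mean)
    print(std)
```

      A feature such as a small secondary population can then be judged by whether its bin density stays clearly above zero across resamples, which is the kind of persistence argument made for the Reston IID distribution.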

      - More detailed structural biology experiments (such as NMR or HDX-MS) could potentially shed more light on the differential behaviour of the three different homologs, providing more evidence for the presence of the cryptic pocket.

      We agree that NMR and HDX-MS are powerful means to study dynamics and are actively exploring these approaches for our future work.

    1. Author response:

      Reviewer #1:

      We appreciate the Reviewer's positive feedback on the strengths of our study.

      The timescales of the peptide recognition and unbinding process are much longer than what can be sampled from unbiased simulations. Therefore, the proposed mechanism of recognition should only be considered a hypothesis based on the results presented here. For example, peptides that do not dissociate within one one-microsecond MD simulation are considered to be stable binders. However, they may not have a viable way to bind to the narrow protein cleft in the first place.

      We thank the Reviewer for this valuable feedback. We agree with the Reviewer. Our work on the IRE1 cLD activation mechanism is focused on generating hypotheses of the binding mechanism driven by MD simulations. We recognize the limitations in defining a stable binder due to the time scales sampled. However, our primary focus was to sample and characterize a possible binding pose in the center of the cLD dimer. We will contextualize our statements about stable binders and limit our claims to stating that the protein-peptide complex is stable within 1 μs-long simulations. However, we believe that our finding that the cLD dimer groove is not able to accommodate peptides is solid, as the steric impediment described is present in all our replicas, both with and without peptides, in a cumulative sampling time of 72 μs. Additionally, we will include a plot showing the distribution of groove width across all replicas.

      Oftentimes, representative structures sampled from MD simulation are used to draw conclusions (e.g., Figure 4 about the role of R161 mutation in binding affinity). This is not appropriate, as one unbinding event being observed or not observed in a microsecond-long trajectory does not provide sufficient information about the binding strength or the free energy difference.

      We thank the Reviewer for the insightful comment. As explained in the previous point, we believe that our simulations provide useful hypotheses, and we agree that we do not currently have data to comment on binding affinity. We will, therefore, remove all references to this term. We are aware of the limitations due to the timescale and agree that these limitations cannot be overcome with standard equilibrium simulations. To address these limitations, we plan to use orthogonal methods, namely MM/PB(GB)SA calculations for calculating binding free energies from existing trajectories (as performed by https://doi.org/10.1021/acs.jcim.4c00975). We will add predictions of all the peptides using AlphaFold 3, to confirm the binding region.

      Reviewer #2:

      We thank the Reviewer for their positive feedback.

      Improving presentation to include more computational details.

      We thank the Reviewer for raising this critical point. We agree that the manuscript is tailored for a biology audience, as the data are particularly relevant for that community. Nevertheless, we also understand the importance of providing sufficient methodological detail for computational readers. We will add appropriate computational information in the main text.

      More quantitative analysis in addition to visual structures.

      We will add an uncertainty estimate for the HDX calculations using bootstrapping and include additional information on bond distances for Y161. We will also incorporate time-series data showing the distance of the peptide from the groove across all replicas.

      Reviewer #3:

      We appreciate the Reviewer's positive feedback on our work.

      A potential weakness of the study is the usage of equilibrium (unbiased) molecular dynamics simulations so that processes and conformational changes on the microsecond time scale can be probed. Furthermore, there can be inaccuracies and biases in the description of unfolded peptides and protein segments due to the protein force fields. Here, it should be noted that the authors do acknowledge these possible limitations of their study in the conclusions.

      We appreciate the Reviewer's thoughtful comment. As noted in our response to Reviewer 1, we plan to address the concern about sampling by applying orthogonal methods. We agree with the Reviewer that some form of enhanced sampling is necessary if we want to assess binding in a more quantitative way, e.g., via free energy calculations. However, we also realize that applying any enhanced sampling scheme to our system is very challenging, given its large size and the complex peptide-protein interactions, which are not easily captured in a few collective variables. After a careful assessment and some preliminary tests, we decided that estimating free energies using enhanced sampling would necessitate a separate paper due to both the conceptual complexity of the project and the size of the necessary sampling campaign.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We wanted to clarify Reviewer #1’s latest comment in the last round of review, “Furthermore, the referee appreciates that the authors have echoed the concern regarding the limited statistical robustness of the observed scrambling events.” We appreciate the follow up information provided from Reviewer #1 that their comment is specifically about the low count alternative pathway events that we view at the dimer interface, and not the statistics of the manuscript overall as they believe that “the study presents a statistically rigorous analysis of lipid scrambling events across multiple structures and conformations (Reviewer #1)”. We agree with the Reviewer and acknowledge that overall our coarse-grained study represents the most comprehensive single manuscript of the entire TMEM16 family to date.


      The following is the authors’ response to the original reviews.

      Public Review:

      Reviewer #1 (Public review):

      Summary:

      The manuscript investigates lipid scrambling mechanisms across TMEM16 family members using coarse-grained molecular dynamics (MD) simulations. While the study presents a statistically rigorous analysis of lipid scrambling events across multiple structures and conformations, several critical issues undermine its novelty, impact, and alignment with experimental observations.

      Critical issues:

      (1) Lack of Novelty:

      The phenomenon of lipid scrambling via an open hydrophilic groove is already well-established in the literature, including through atomistic MD simulations. The authors themselves acknowledge this fact in their introduction and discussion. By employing coarse-grained simulations, the study essentially reiterates previously known findings with limited additional mechanistic insight. The repeated observation of scrambling occurring predominantly via the groove does not offer significant advancement beyond prior work.

      We agree with the reviewer’s statement regarding the lack of novelty when it comes to our observations of scrambling in the groove of open Ca2+-bound TMEM16 structures. However, we feel that the inclusion of closed structures in this study, which attempts to address the yet unanswered question of how scrambling by TMEM16s occurs in the absence of Ca2+, offers new observations for the field. In our study we specifically address to what extent the induced membrane deformation, which has been theorized to help lipids cross the bilayer, especially in the absence of Ca2+, contributes to the rate of scrambling (see references 36, 59, and 66). There are also several TMEM16F structures solved under activating conditions (bound to Ca2+ and in the presence of PIP2) which feature structural rearrangements to TM6 that may be indicative of an open state (PDB 6P48) and had not been tested in simulations. We show that these structures do not scramble and thereby present evidence against an out-of-the-groove scrambling mechanism for these states. Although we find a handful of examples of lipids being scrambled by Ca2+-free structures of TMEM16 scramblases, none of our simulations suggest that these events are related to the degree of deformation.

      (2) Redundancy Across Systems:

      The manuscript explores multiple TMEM16 family members in activating and non-activating conformations, but the conclusions remain largely confirmatory. The extensive dataset generated through coarse-grained MD simulations primarily reinforces established mechanistic models rather than uncovering fundamentally new insights. The effort, while statistically robust, feels excessive given the incremental nature of the findings.

      Again, we agree with the reviewer’s statement that our results largely confirm those published by other groups and our own. We think there is however value in comparing the scrambling competence of these TMEM16 structures in a consistent manner in a single study to reduce inconsistencies that may be introduced by different simulation methods, parameters, environmental variables such as lipid composition as used in other published works of single family members. The consistency across our simulations and high number of observed scrambling events have allowed us to confirm that the mechanism of scrambling is shared by multiple family members and relies most obviously on groove dilation.

      (3) Discrepancy with Experimental Observations:

      The use of coarse-grained simulations introduces inherent limitations in accurately representing lipid scrambling dynamics at the atomistic level. Experimental studies have highlighted nuances in lipid permeation that are not fully captured by coarse-grained models. This discrepancy raises questions about the biological relevance of the reported scrambling events, especially those occurring outside the canonical groove.

      We thank the reviewer for bringing up the possible inaccuracies introduced by coarse graining our simulations. This is also a concern for us, and we address this issue extensively in our discussion. As the reviewer pointed out above, our CG simulations have largely confirmed existing evidence in the field which we think speaks well to the transferability of observations from atomistic simulations to the coarse-grained level of detail. We have made both qualitative and quantitative comparisons between atomistic and coarse-grained simulations of nhTMEM16 and TMEM16F (Figure 1, Figure 4-figure supplement 1, Figure 4-figure supplement 5) showing the two methods give similar answers for where lipids interact with the protein, including outside of the canonical groove. We do not dispute the possible discrepancy between our simulations and experiment, but our goal is to share new nuanced ideas for the predicted TMEM16 scrambling mechanism that we hope will be tested by future experimental studies.

      (4) Alternative Scrambling Sites:

      The manuscript reports scrambling events at the dimer-dimer interface as a novel mechanism. While this observation is intriguing, it is not explored in sufficient detail to establish its functional significance. Furthermore, the low frequency of these events (relative to groove-mediated scrambling) suggests they may be artifacts of the simulation model rather than biologically meaningful pathways.

      We agree with the reviewer that our observed number of scrambling events in the dimer interface is too low to present it as strong evidence for it being the alternative mechanism for Ca2+-independent scrambling. This will require additional experiments and computational studies which we plan to do in future research. However, we are less certain that these are artifacts of the coarse-grained simulation system as we observed a similar event in an atomistic simulation of TMEM16F.

      Conclusion:

      Overall, while the study is technically sound and presents a large dataset of lipid scrambling events across multiple TMEM16 structures, it falls short in terms of novelty and mechanistic advancement. The findings are largely confirmatory and do not bridge the gap between coarse-grained simulations and experimental observations. Future efforts should focus on resolving these limitations, possibly through atomistic simulations or experimental validation of the alternative scrambling pathways.

      Reviewer #2 (Public review):

      Summary:

      Stephens et al. present a comprehensive study of TMEM16-members via coarse-grained MD simulations (CGMD). They particularly focus on the scramblase ability of these proteins and aim to characterize the "energetics of scrambling". Through their simulations, the authors interestingly relate protein conformational states to the membrane's thickness and link those to the scrambling ability of TMEM members, measured as the trespassing tendency of lipids across leaflets. They validate their simulation with a direct qualitative comparison with Cryo-EM maps.

      Strengths:

      The study demonstrates an efficient use of CGMD simulations to explore lipid scrambling across various TMEM16 family members. By leveraging this approach, the authors are able to bypass some of the sampling limitations inherent in all-atom simulations, providing a more comprehensive and high-throughput analysis of lipid scrambling. Their comparison of different protein conformations, including open and closed groove states, presents a detailed exploration of how structural features influence scrambling activity, adding significant value to the field. A key contribution of this study is the finding that groove dilation plays a central role in lipid scrambling. The authors observe that for scrambling-competent TMEM16 structures, there is substantial membrane thinning and groove widening. The open Ca2+-bound nhTMEM16 structure (PDB ID 4WIS) was identified as the fastest scrambler in their simulations, with scrambling rates as high as 24.4 ± 5.2 events per μs. This structure also shows significant membrane thinning (up to 18 Å), which supports the hypothesis that groove dilation lowers the energetic barrier for lipid translocation, facilitating scrambling.

      The study also establishes a correlation between structural features and scrambling competence, though analyses often lack statistical robustness and quantitative comparisons. The simulations differentiate between open and closed conformations of TMEM16 structures, with open-groove structures exhibiting increased scrambling activity, while closed-groove structures do not. This finding aligns with previous research suggesting that the structural dynamics of the groove are critical for scrambling. Furthermore, the authors explore how the physical dimensions of the groove qualitatively correlate with observed scrambling rates. For example, TMEM16K induces increased membrane thinning in its open form, suggesting that membrane properties, along with structural features, play a role in modulating scrambling activity.

      Another significant finding is the concept of "out-of-the-groove" scrambling, where lipid translocation occurs outside the protein's groove. This observation introduces the possibility of alternate scrambling mechanisms that do not follow the traditional "credit-card model" of groove-mediated lipid scrambling. In their simulations, the authors note that these out-of-the-groove events predominantly occur at the dimer interface between TM3 and TM10, especially in mammalian TMEM16 structures. While these events were not observed in fungal TMEM16s, they may provide insight into Ca2+-independent scrambling mechanisms, as they do not require groove opening.

      Weaknesses:

      A significant challenge of the study is the discrepancy between the scrambling rates observed in CGMD simulations and those reported experimentally. Despite the authors' claim that the rates are in line with experiment, the observed differences can mean large energetic discrepancies in describing scrambling (larger than a 1 kT barrier in reality). For instance, the authors report scrambling rates of 10.7 events per μs for TMEM16F and 24.4 events per μs for nhTMEM16, which are several orders of magnitude faster than experimental rates. While the authors suggest that this discrepancy could be due to the Martini 3 force field's faster diffusion dynamics, this explanation does not fully account for the large difference in rates. A more thorough discussion of how the choice of force field and simulation parameters influences the results, and how these discrepancies can be reconciled with experimental data, would strengthen the conclusions. Likewise, rate calculations in the study are based on 10 μs simulations, while experimental scrambling occurs over seconds. This timescale discrepancy limits the study's accuracy, as the simulations may not capture rare or slow scrambling events that are observed experimentally and therefore might underestimate the kinetics of scrambling. It is, however, important to recognize that it is hard (borderline unachievable) to pinpoint reasonable kinetics for systems like this using the currently available computational power and force field accuracy. The faster diffusion in simulations may lead to overestimated scrambling rates, making the simulation results less comparable to real-world observations. I would therefore read the findings qualitatively rather than quantitatively. An interesting observation is the asymmetry in the scrambling rates of the two monomers. Since MARTINI is known to be limited in correctly sampling protein dynamics, the authors, in order to preserve the fold, have applied a strong (500 kJ mol-1 nm-2) elastic network. However, I am wondering how the ENM applies across the dimer and whether any asymmetry can be noticed in the application of restraints for each monomer and at the dimer interface. How might this have biased the asymmetry in the scrambling rates observed between the monomers? Is this artificially obtained from restraining the initial structure, or is the asymmetry somehow gatekeeping the scrambling mechanism so that it occurs mainly across a single monomer? Answering this question would have far-reaching implications for better describing the mechanism of scrambling.

      The main aim of our computational survey was to directly compare all relevant published TMEM16 structures in both open and closed states using the Martini 3 CGMD force field. Our standardized simulation and analysis protocol allowed us to quantitatively compare scrambling rates across the TMEM16 family, something that has never been done before. We do acknowledge that direct comparison between simulated versus experimental scrambling rates is complicated and is best to be interpreted qualitatively. In line with other reports (e.g., Li et al, PNAS 2024), lipid scrambling in CGMD is 2-3 orders of magnitude faster than typical experimental findings. In the CG simulation field, these increased dynamics due to the smoother energy landscape are a well known phenomenon. In our view, this is a valuable trade-off for being able to capture statistically robust scrambling dynamics and gain mechanistic understanding in the first place, since these are currently challenging to obtain otherwise. For example, with all-atom MD it would have been near-impossible to conclude that groove openness and high scrambling rates are closely related, simply because one would only measure a handful of scrambling events in (at most) a handful of structures.
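
      To connect a rate discrepancy to the barrier language used in the review, the following back-of-the-envelope conversion may help. It assumes a simple Arrhenius/Eyring-type rate law with the same prefactor in simulation and experiment, which is an idealization: part of the coarse-grained speed-up is usually attributed to faster effective dynamics (the prefactor) rather than to a lower barrier, so the numbers below are best read as an upper bound on the effective barrier difference, not as a result from the manuscript.

```latex
% Assumption: k = A \exp(-\Delta G^{\ddagger} / k_B T), with identical A for
% simulation (sim) and experiment (exp).
\[
  \frac{k_{\mathrm{sim}}}{k_{\mathrm{exp}}}
    = \exp\!\left(\frac{\Delta G^{\ddagger}_{\mathrm{exp}} - \Delta G^{\ddagger}_{\mathrm{sim}}}{k_B T}\right)
  \quad\Longrightarrow\quad
  \Delta\Delta G^{\ddagger}
    = \Delta G^{\ddagger}_{\mathrm{exp}} - \Delta G^{\ddagger}_{\mathrm{sim}}
    = k_B T \,\ln\frac{k_{\mathrm{sim}}}{k_{\mathrm{exp}}}
\]
% For a speed-up of 2 to 3 orders of magnitude:
\[
  \frac{k_{\mathrm{sim}}}{k_{\mathrm{exp}}} = 10^{2}
    \;\Rightarrow\; \Delta\Delta G^{\ddagger} \approx 4.6\, k_B T,
  \qquad
  \frac{k_{\mathrm{sim}}}{k_{\mathrm{exp}}} = 10^{3}
    \;\Rightarrow\; \Delta\Delta G^{\ddagger} \approx 6.9\, k_B T
\]
```

      Under these assumptions, a 2-3 order of magnitude speed-up is equivalent to an effective barrier that is roughly 5-7 kBT lower than in experiment, which is why the rates are better compared across structures within the same model than against absolute experimental values.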

      Considering the elastic network: the reviewer is correct in that the elastic network restrains the overall structure to the experimental conformation. This is necessary because the Martini 3 force field does not accurately model changes in secondary (and tertiary) structure. In fact, by retaining the structural information from the experimental structures, we argue that the elastic network helped us arrive at the conclusion that groove openness is the major contributing factor in determining a protein’s scrambling rate. This is best exemplified by the asymmetric X-ray structure of TMEM16K (5OC9), in which the groove of one subunit is more dilated than the other. In our simulation, this information was stored in the elastic network, yielding a 4x higher rate in the open groove than in the closed groove, within the same trajectory.

      Notably, the manuscript does not explore the impact of membrane composition on scrambling rates. While the authors use a specific lipid composition (DOPC) in their simulations, they acknowledge that membrane composition can influence scrambling activity. However, the study does not explore how different lipids or membrane environments or varying membrane curvature and tension, could alter scrambling behaviour. I appreciate that this might have been beyond the scope of this particular paper and the authors plan to further chase these questions, as this work sets a strong protocol for this study. Contextualizing scrambling in the context of membrane composition is particularly relevant since the authors note that TMEM16K's scrambling rate increases tenfold in thinner membranes, suggesting that lipid-specific or membrane-thickness-dependent effects could play a role.

      Considering different membrane compositions: for this study, we chose to keep the membranes as simple as possible. We opted for pure DOPC membranes, because DOPC (1) has negligible intrinsic curvature, (2) forms fluid membranes, and (3) was used previously by others (Li et al, PNAS 2024). As mentioned by the reviewer, we believe our current study defines a good, standardized protocol and solid baseline for future efforts looking into the additional effects of membrane composition, tension, and curvature that could all affect TMEM16-mediated lipid scrambling.

      Reviewer #3 (Public review):

      Strengths:

      The strength of this study emerges from a comparative analysis of multiple structural starting points and understanding global/local motions of the protein with respect to lipid movement. Although the protein is well-studied, both experimentally and computationally, the understanding of conformational events in different family members, especially membrane thickness less compared to fungal scramblases offers good insights.

      We appreciate the reviewer recognizing the value of the comparative study. In addition to valuable insights from previous experimental and computational work, we hope to put forward a unifying framework that highlights various TMEM16 structural features and membrane properties that underlie scrambling function.

      Weaknesses:

      The weakness of the work is to fully reconcile with experimental evidence of Ca²⁺-independent scrambling rates observed in prior studies, but this part is also challenging using coarse-grain molecular simulations. Previous reports have identified lipid crossing, packing defects, and other associated events, so it is difficult to place this paper in that context. However, the absence of validation leaves certain claims, like alternative scrambling pathways, speculative.

      Answer: It is generally difficult to quantitatively compare bulk measurements of scrambling phenomena with simulation results. The advantage of simulations is to directly observe the transient scrambling events at a spatial and temporal resolution that is currently unattainable for experiments. The current experimental evidence for the precise mechanism of Ca2+-independent scrambling is still under debate. We therefore hope to leverage the strength of MD and statistical rigor of coarse-grained simulations to generate testable hypotheses for further structural, biochemical, and computational studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The findings are largely confirmatory and do not bridge the gap between coarse-grained simulations and experimental observations. Future efforts should focus on resolving these limitations, possibly through atomistic simulations or experimental validation of the alternative scrambling pathways.

      While we agree with what the reviewer may be hinting at regarding limitations of coarse-grained MD simulations, we believe that our study holds much more merit than this comment suggests. We have provided something that has yet to be done in the field: a comprehensive study that directly compares the scrambling rates of multiple TMEM16 family members in different conformations using identical simulation conditions. Our work clearly shows that a sufficiently dilated groove is the major structural feature that enables robust scrambling for all TMEM16 scramblase members with solved structures. While all TMEM16s cause significant distortion and thinning of the membrane, we assert that the extreme thinning observed around open grooves is significantly enhanced by the lipid scrambling itself as the two leaflets merge through lipid exchange. We saw no evidence that membrane thinning/distortion alone, in the absence of an open groove, could support scrambling at the rates observed under activating conditions or even the low rates observed in Ca2+-independent scrambling. Moreover, our handful of observations of scrambling events outside of the groove, which has not yet been reported in any study, opens an exciting new direction for studying alternative scrambling mechanisms. That said, we are currently following up on many of the observations reported here such as: scrambling events outside the groove, the kinetics of scrambling, the possibility that lipids line the groove of non-scramblers like TMEM16A, etc. This is being done experimentally with our collaborators through site-directed mutagenesis and with all-atom MD in our lab. Unfortunately, it is well beyond the scope of the current study to include all of this in the current paper.

      Reviewer #2 (Recommendations for the authors):

      Major comments and questions:

      (1) Line 214 and Figure 1- Figure Supplement 1: why have you only compared the final frame of the trajectory to the cryo-EM structure? Even if these comparisons are qualitative, they should be representative of the entire trajectory, not a single frame.

      We thank the reviewer for this suggestion and replaced the single-frame snapshots in Figure 1-figure supplement 1 with ensemble-averaged headgroup densities. The overall agreement between membrane shapes in CGMD and cryo-EM was not affected by this change.

      (2) Lines 228-231: You comment 'Residues in this site on nhTMEM16 and TMEMF also seem to play a role in scrambling but the mechanism by which they do so is unclear.' This is something you could attempt to quantify in the simulations by calculating the correlation between scrambling and protein-membrane interactions/contacts in this site. Can you speculate on a mechanism that might be a contributing factor?

      We probed the correlation between these residues and scrambling lipids, as suggested by the reviewer, and interestingly not all scrambling lipids interact with these residues. Yet there is strong lipid density in this vicinity (see insets in Figure 1 and Figure 4-figure supplement 2). These observations lead us to suspect these residues impact scrambling indirectly through influencing the conformation of the protein or flexibility and shape of the membrane. This interpretation fits with mutagenesis studies highlighting a role for these residues in scrambling (see refs 59, 62, and 67). Specifically, Falzone et al. 2022 (ref 59) suggested that they may thin the membrane near the groove, but this has not been tested via structure determination and a detailed model of how they impact scrambling is missing. We could address this question with in silico mutations; however, CG simulation is not an appropriate method to study large scale protein dynamics, and AA simulations are likely best, but beyond the scope of this paper.

      (3) Lines 240-245 and Figure 1B: This section discusses the coupling between membrane distortions and the sinusoidal curve around the protein, however, Figure 1B only shows snapshots of the membrane distortions. Is it possible to understand how these two collective variables are correlated quantitatively (as opposed to the current qualitative analysis)?

      We believe that it may be possible to quantitatively capture these two key features of the membrane, as we did previously with nhTMEM16 using our continuum elasticity-based model of the membrane (Bethel and Grabe 2016). Our model agreed with all-atom MD surfaces to within ~1 Å, hence showing good quantitative agreement throughout the entire membrane. However, we doubt that we could distill the essence of our model down to a simple functional relationship between the sinusoidal wave and pinching, which is what we think the reviewer is asking for. Rather, we believe that the large-scale sinusoidal distortion (collective variable 1) and pinching/distortion (collective variable 2) near the groove arise from the interplay of the specific protein surface chemistry for each protein (patterning of polar and non-polar residues) and the membrane. This is why we chose to simply report the distinct patterns that the family members impose on the surrounding membrane, which we think is fascinating. Specifically, Fig. 1B shows that different TMEM16 family members distort the membrane in different ways. Most notably, fungal TMEM16s feature a more pronounced sinusoidal deformation, whereas the mammalian members primarily produce local pinching. Then, in Fig. 3A we show that the thinning at the groove happens in all structures and is more pronounced in open, scrambling-competent conformations. In other words, proteins can show very strong thinning (e.g. TMEM16K, 5OC9) even though the membrane generally remains flat.

      (4) Lines 257-258: Authors comment that TMEM16A lacks scramblase activity yet can achieve a fully lipid-lined groove (note the typo - should be lipid-lined, not lipid-line). Is a fully lipid-lined groove a prerequisite for scramblase activity? Are lipid-lined grooves the only requirement for scramblase activity? Could the authors clarify exactly what the prerequisite for scramblase activity is to avoid any confusion; this will be useful for later descriptions (i.e. line 295) where scrambling competence is again referred to. Additionally, the associated figure panel (Figure 1D) shows a snapshot of this finding but lacks any statistical quantifications - is a fully lipid-lined groove a single event? Perhaps the additional analyses, such as the groove-lipid contacts, may be useful here.

      The definition of lipid scrambling is that a lipid fully transitions from one membrane leaflet to the other. While a single lipid could transition through the groove on its own, it is well documented in both atomistic and CG MD simulations, that lipid scrambling typically happens through a lipid-lined groove, as shown in Fig. 1A-B. The lipids tend to form strong choline-to-phosphate interactions with nearest neighbors that make this energetically favorable. That said, lipid-lined grooves are not sufficient for robust scrambling, which is what we show in Fig. 1D where the non-scrambler TMEM16A did in fact feature a lipid-lined groove. As suggested, we performed contact analysis and found that residue K645 on TM6 in the middle of the groove contacts lipids in 9.2% of the simulation frames.

      To get a better understanding of how populated the TM4-TM6 pathway is with lipids across all simulated structures, we determined for every simulation frame how many headgroup beads resided in the groove. This indicates that the ion-conductive state of TMEM16A (5OYB*, Fig. 1D) only had 1 lipid in the pathway, on average, meaning that the configuration shown in Fig. 1D is indeed exceptional. As a reference, our strongest scrambler, nhTMEM16 (4WIS), had an average of 2.8 lipids in the groove. We added a table containing the means and standard deviations that resulted from this analysis as Figure 1-Table supplement 1.
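
      The per-frame counting described above can be sketched as follows. This is only a minimal illustration and not the analysis code used in the study: the groove is approximated by a hypothetical axis-aligned box, and the bead coordinates are made-up toy values.

```python
from statistics import mean, pstdev


def in_groove(bead, box):
    """Return True if a headgroup bead (x, y, z) lies inside the box.

    `box` is ((xmin, xmax), (ymin, ymax), (zmin, zmax)). A box is only a
    stand-in for whatever groove definition a real analysis would use.
    """
    return all(lo <= c <= hi for c, (lo, hi) in zip(bead, box))


def groove_occupancy(frames, box):
    """Count the headgroup beads inside the groove for every frame."""
    return [sum(in_groove(b, box) for b in frame) for frame in frames]


# Toy trajectory: three frames, each a list of headgroup bead positions (Å).
box = ((-5.0, 5.0), (-5.0, 5.0), (-15.0, 15.0))
frames = [
    [(0.0, 1.0, 3.0), (20.0, 0.0, 0.0)],    # one bead inside
    [(0.0, 1.0, 3.0), (2.0, -1.0, -8.0)],   # two beads inside
    [(30.0, 1.0, 3.0), (20.0, 0.0, 0.0)],   # no bead inside
]
counts = groove_occupancy(frames, box)
print(counts, mean(counts), pstdev(counts))
```

      Reporting the mean together with the standard deviation, as in the table mentioned above, shows whether a fully lipid-lined snapshot is typical or a rare excursion.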

      (5) Lines 295-298 : The scrambling rates of the Ca²⁺-bound and Ca²⁺-free structures fall within overlapping error margins, it becomes difficult to definitively state that Ca²⁺ binding significantly enhances scrambling activity. This undermines the claim that the Ca²⁺-bound structure is the strongest scrambler. The authors should conduct statistical analyses to determine if the difference between the two conditions is statistically significant.

      In contrast to the reviewer’s comment, we do not claim that Ca2+-binding itself enhances lipid scrambling. Instead, what we show is that WT structures that are solved in an open conformation (all of which are Ca2+-bound, except 6QM6) are robust scramblers. For nhTMEM16, we did not observe any scrambling events for the closed-groove proteins, making further statistical analysis redundant.

      (6) The authors claim that the scrambling rates derived from their MD simulations are in "excellent agreement" with experimental findings (lines 294-295), despite significant discrepancy between simulated and experimentally measured rates. For example, the simulated rate of 24.4 ± 5.2 events/µs for the open, Ca²⁺-bound fungal nhTMEM16 (PDB ID 4WIS) corresponds to approximately 24 million events per second, which is vastly higher than experimental rates. Experimental studies have reported scrambling rate constants of ~0.003 s⁻¹ for TMEM16 family members in the absence of Ca²⁺, measured under physiological conditions (https://doi.org/10.1038/s41467-019-11753-1 ). Even with Ca²⁺ activation, scrambling rates remain several orders of magnitude lower than the rates observed in simulations. Moreover, this highlights a larger problem: lipid scrambling rates occur over timescales that are not captured by these simulations. While the authors allude to these discrepancies (lines 605-606), they should be emphasised in the text, as opposed to the table caption. These should also be reconducted to differences between the membrane compositions of different studies.

      We agree with the spirit of the reviewer’s comment, and because of that, we were very careful not to claim that we reproduce experimental scrambling rates, just that the trends (scrambling-competent, or not) are correct. On lines 294-295, we actually said that the scrambling rates in our simulations excellently agree with “the presumed scrambling competence of each experimental structure”, which is true. 

      As explained extensively in the discussion section of our paper (and by many others), direct comparison between MD (e.g., Martini 3, but also atomistic force fields) dynamics and experimental measurements is challenging. The primary goal of our paper is to quantify and compare the scrambling capacity of different TMEM16 family members and different states, within a CGMD context.

      That said, we agree with the reviewer that we may have missed rare or long-timescale events (as is the case in any MD experiment) and added this point to the discussion.

      (7) To address these discrepancies, the authors should: i) emphasize that simulated rates serve as qualitative indicators of scrambling competence rather than absolute values comparable to experimental findings and ii) discuss potential reasons for the divergence, such as simulation timescale limitations or lipid bilayer compositions that may favor scrambling and force field inaccuracies.

      Please see our answer to question 6. Within the context of our CGMD survey, we confidently call our results quantitative. However, we agree with the reviewer that comparison with experimental scrambling rates is qualitative and should be interpreted with caution. To reflect this, we rewrote the first sentence of the relevant paragraph in the discussion section.

      (8) Line 310: Can the authors provide a rationale as to why one monomer has a wider groove than the other? Perhaps a contact analysis could be useful. See the comment above about ENM.

      The simulation of Ca2+-bound TMEM16K was initiated from an asymmetric X-ray structure in which chain B features a more dilated groove than chain A (PDB 5OC9). The backbones of TM4 and TM6 in the closed groove (A) are close enough together to be directly interconnected by the elastic network. In contrast, TM4 and TM6 in the more dilated subunit (B) are not restricted by the elastic network and, as a consequence, display some “breathing” behavior (Fig. 3B and Fig. 3-Suppl. 6A), giving rise to a ~4x higher scrambling rate. We explicitly added the word “X-ray” and the PDB ID to the sentence to emphasize that the asymmetry stems from the original experimental structure.
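
      To illustrate why a distance-based elastic network can restrain one groove but not the other, here is a minimal sketch. It assumes the network simply connects backbone beads that start closer than a cutoff. The cutoff value, the one-bead-per-helix geometry, and the function name are illustrative assumptions, not parameters taken from the study.

```python
from itertools import combinations
from math import dist


def elastic_pairs(backbone, cutoff):
    """Return index pairs of backbone beads closer than `cutoff`.

    Each returned pair would receive a harmonic restraint at its starting
    distance; pairs beyond the cutoff are left unrestrained.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(backbone), 2)
        if dist(a, b) < cutoff
    ]


CUTOFF = 9.0  # Å; illustrative value only, not taken from the study

# One bead standing in for TM4 and one for TM6 in each subunit (toy values).
closed = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]    # helices 5 Å apart
dilated = [(0.0, 0.0, 0.0), (12.0, 0.0, 0.0)]  # helices 12 Å apart

print(elastic_pairs(closed, CUTOFF))   # TM4 and TM6 are tied together
print(elastic_pairs(dilated, CUTOFF))  # no restraint across the groove
```

      In this toy picture the closed groove is held shut by a cross-groove restraint, while the dilated groove has none and is free to fluctuate.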

      When answering this question, we also corrected a mislabeled chain identifier in Fig. 2-Suppl. 3A: it read ‘chain A’ in the original manuscript but is actually ‘chain B’.

      (9) Line 312: Authors speculate that increased groove width likely accounts for increased scrambling rates. For statistical significance, authors should attempt to correlate scrambling rates and groove width over the simulation period.

      The Reviewer is referring to our description of the scrambling rates we measured for TMEM16K, where we noted that the groove with the higher scrambling rate is, on average, wider than the groove of the opposite subunit, which stays below 6 Å. We do not suggest that the correlation between scrambling and groove width is continuous, as the Reviewer may have interpreted from our original submission. Rather, we think it is a binary outcome: lipids cannot easily enter narrow grooves (< 6 Å), so scrambling can only occur once this threshold is reached, at which point it occurs at a near-constant rate. We showed this for 4 different family members in the original Fig. 3B, where scrambling events (black dots) were much more likely during, or right after, groove dilation to distances > 6 Å.
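
      The threshold picture described here can be made concrete with a short sketch that sorts scrambling events by whether the groove was dilated during, or shortly before, the event. The `lag` window, the function name, and all numbers are hypothetical choices for illustration and are not taken from the manuscript.

```python
def events_near_open_groove(widths, event_frames, threshold=6.0, lag=1):
    """Split scrambling events by whether the groove was open.

    An event counts as "open" if the TM4/TM6 distance exceeded `threshold`
    in the event frame or in any of the `lag` frames before it.
    """
    open_events, closed_events = [], []
    for f in event_frames:
        window = widths[max(0, f - lag): f + 1]
        if max(window) > threshold:
            open_events.append(f)
        else:
            closed_events.append(f)
    return open_events, closed_events


widths = [4.8, 5.1, 6.9, 7.2, 5.5, 4.9, 5.0]  # Å per frame (toy values)
events = [2, 3, 4, 6]                         # frames with a scrambling event

open_events, closed_events = events_near_open_groove(widths, events)
print(open_events, closed_events)
```

      Here the event in frame 4 is still counted as "open" because the groove was dilated one frame earlier, which mirrors the "during, or right after" wording.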

      (10) Line 359: Authors have plotted the minimum distance between residues TM4 and TM6 in Fig. 3A/B, claiming that a wide groove is required for scrambling. Upon closer examination, it is clear that several of these distributions overlap, reducing the statistical significance of these claims. Statistical tests (i.e. KS-tests) should be performed to determine whether the differences in distributions are significant.

      The Reviewer appears to be asking for a statistical test between the six distance distributions represented by the data in Fig. 3A for the scrambling competent structures (6QP6*, 8B8J, 6QM6, 7RXG, 4WIS, 5OC9), and we think this is being asked because it is believed that we are making a claim that the greater the distance, the greater the scrambling rate. If we have interpreted this comment correctly, we are not making this claim. Rather, we are simply stating that we only observe robust scrambling when the groove width regularly separates beyond 6 Å. The full distance distributions can now be found in Figure 3-figure supplement 6B, and we agree there is significant overlap between some of these distributions. However, the distinguishing characteristic of the 6 distributions from scrambling competent proteins is that they all access large distances, while the others do not. Notably, TMEM16F proteins (6QP6*, 8B8J) are below the 6 Å threshold on average, but they have wide standard deviations and spend well over ¼ of their time in the permissive regime (the upper error bar in the whisker plots in Fig. 3A is the 75% boundary).
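
      A small numerical sketch shows how a distribution can have a mean below the threshold while still spending more than a quarter of the time above it. The distances are invented toy values, and `permissive_fraction` is a hypothetical helper name.

```python
from statistics import mean, quantiles


def permissive_fraction(widths, threshold=6.0):
    """Fraction of frames in which the groove is wider than `threshold`."""
    return sum(w > threshold for w in widths) / len(widths)


# Toy TM4/TM6 minimum distances (Å), one value per frame.
widths = [4.5, 5.0, 5.2, 5.5, 5.8, 6.1, 6.5, 7.0]

avg = mean(widths)
q1, median, q3 = quantiles(widths, n=4)  # quartile cut points
frac = permissive_fraction(widths)

print(f"mean={avg:.2f} Å, upper quartile={q3:.2f} Å, fraction open={frac:.3f}")
```

      In this toy series the mean sits below 6 Å, yet the upper quartile lies above it, the same situation described for the TMEM16F structures.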

      (11) Line 363-364: The authors state that all TMEM16 structures thin the membrane. Could the authors include a description of how membrane thinning is calculated, for instance, is the entire membrane considered, or is thinning calculated on a membrane patch close to the protein? Do membrane patches closer to the transmembrane protein increase or decrease thickness due to hydrophobic packing interactions? The latter question is of particular concern since Martini3 has been shown to induce local thinning of the membrane close to transmembrane helices, yielding thicknesses 2-3 Å thinner than those reported experimentally (https://doi.org/10.1016/j.cplett.2023.140436). This could be an important consideration in the authors' comparison to the bulk membrane thickness (line 364). Finally, how is the 'bulk membrane thickness' measured (i.e., from the CG simulations, from AA simulations, or from experiments)?

      Regarding the calculation of thinning and bulk membrane thickness: as described in the Methods section “Quantification of membrane deformations”, the minimal membrane thickness, or thinning, is defined as the shortest distance between any two points from the interpolated upper and lower leaflet surfaces constructed using the glycerol beads (GL1 and GL2). Bulk membrane thickness is calculated by taking the vertical distance between the averaged glycerol surfaces at the membrane edge.
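
      The two definitions above can be sketched in a few lines. The surfaces are represented as plain lists of (x, y, z) points, and the brute-force search and toy coordinates are assumptions made for illustration. The interpolation of the glycerol-bead surfaces is not reproduced here.

```python
from math import dist
from statistics import mean


def minimal_thickness(upper, lower):
    """Shortest distance between any upper-surface and lower-surface point."""
    return min(dist(u, l) for u in upper for l in lower)


def bulk_thickness(upper_edge, lower_edge):
    """Vertical distance between the mean heights of the two edge regions."""
    return mean(p[2] for p in upper_edge) - mean(p[2] for p in lower_edge)


# Toy surfaces (Å): pinched near x = 0, flat towards the membrane edge.
upper = [(0.0, 0.0, 6.0), (10.0, 0.0, 15.0), (20.0, 0.0, 15.0)]
lower = [(0.0, 0.0, -6.0), (10.0, 0.0, -15.0), (20.0, 0.0, -15.0)]

thinning = minimal_thickness(upper, lower)
bulk = bulk_thickness(upper[-1:], lower[-1:])  # last point = membrane edge
print(thinning, bulk)
```

      Note that the minimal thickness is a point-to-point distance and need not be vertical, whereas the bulk value is a purely vertical difference.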

      The concern of localized membrane deformation due to force field artifacts is well-founded. However, the sinusoidal deformations shown here are much greater than 2-3 Å Martini3 imperfections, and they extend for up to 10 Å radially away from the protein into the bulk membrane (see Figure 3-figure supplement 1-5 for more of a description). Most importantly, the sinusoidal wave patterns set up by the proteins are very similar to those described in the previous continuum calculation and all-atom MD for nhTMEM16 (https://www.pnas.org/doi/full/10.1073/pnas.1607574113).

      (12) Line 374: The authors state a 'positive correlation' between membrane thinning/groove opening and scrambling rates. To support this claim, the authors should report the correlation coefficients.

      We have removed any discussion concerning correlations between the magnitude of the scrambling rate and the degree of membrane thinning/groove opening. Rather we simply state that opening beyond a threshold distance is required for robust scrambling, as shown in our analysis in Fig. 3A.

      Concerning the relation between thinning and scrambling: Instantaneous membrane thinning is poorly defined (because it is governed by fluctuations of single lipids), and therefore difficult to correlate with the timing of individual scrambling events in a meaningful way.  Moreover, as we state later in that same section, “we argue that the extremely thin membranes are likely correlated with groove opening, rather than being an independent contributing factor to lipid scrambling”.

      (13) Line 396: It is stated that TMEM16A is not a scramblase but the simulating scrambling activity is not zero. How can you be sure that you are monitoring the correct collective variable if you are getting a false positive with respect to experiments?

      We only observe 2 scrambling events in 10 µs, which is a very small rate compared to the scrambling-competent states. In a previous large Martini CG simulation survey that inspired our protocol (Li et al, PNAS 2024), the authors employed a 1 event/µs cut-off to distinguish scramblers from non-scramblers. Hence, they would have called TMEM16A a non-scrambler as well. We expect that false positives in this context might be an artifact of the CG force field, or it could be that TMEM16A can scramble but too slowly to be experimentally detected. Regarding the collective variable for lipid flipping, it is correct, and we know that this lipid actually flipped.
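
      The cut-off logic amounts to a one-line rate comparison, sketched below. The function names are hypothetical, and treating a rate exactly at the cut-off as "scrambler" is an assumption. The only requirement is that the trajectory length and the cut-off use the same time unit.

```python
def scrambling_rate(n_events, duration):
    """Events per unit of simulated time."""
    return n_events / duration


def is_scrambler(n_events, duration, cutoff=1.0):
    """Apply a rate cut-off expressed in events per unit of `duration`."""
    return scrambling_rate(n_events, duration) >= cutoff


# Two events in a trajectory of length 10, cut-off of 1 event per time unit.
print(scrambling_rate(2, 10))  # 0.2
print(is_scrambler(2, 10))     # False
```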

      (14) Line 402: Distance distributions for the electrostatic interactions between E633 and K645 should be included in the manuscript. This is also the case for the interactions between E843-K850 (lines 491-492).

      Our description of interactions between lipid headgroups and E633 and K645 in TMEM16A (5OYB*) is based on qualitative observations of the MD trajectory, and we highlight an example of this interaction in Figure 3-video 4. The video clearly shows that the lipid headgroups in the center of the groove orient themselves such that the phosphate bead (red) rests just above K645 (blue) and at other times the choline bead (blue) rests just below E633 (red). We do not think an additional plot with the distance distributions between lipids and these residues will add to our understanding of how lipids interact with residues in the TMEM16A pore.

      We made a similar qualitative observation for the interaction between the POPC choline and E843 and between the POPC phosphate and K850 while watching the AAMD simulation trajectory of TMEM16F (PDB ID 6QP6). Given that this was a single observation, and that the same interaction does not appear in the CG simulation of the same structure (see simulation snapshots in Figure 4-figure supplement 5), we do not think additional analysis would add significantly to our understanding of which residues may stabilize lipids in the dimer interface.

      (15) Lines 450-451: 'As the groove opens, water is exposed to the membrane core and lipid headgroups insert themselves into the water-filled groove to bridge the leaflets.' Is this a qualitative observation? Could the authors report the correlation between groove dilation and the number of water permeation events?

      Yes, this is qualitative, and it sketches the order of events during scrambling, and we revised the main text starting at line 450 to indicate this. As illustrated by the density isosurfaces in Appendix 1-Figure 2A, the amount of water found in the closed versus open grooves is striking – there is a significant flood of water that connects the upper and lower solutions upon groove opening. Moreover, Appendix 1-Figure 2B shows much greater water permeation for open structures (4WIS, 7RXG, 5OC9, 8B8J, …) compared to closed structures (6QMB, 6QMA, 8B8Q, and many of the non-labeled data in the figure that all have closed grooves and near 0 water permeation). A notable exception is TMEM16A (7ZK3*8), which has water permeation but a closed groove and little-to-no lipid scrambling.

      Minor Comments:

      (1) Inconsistent use of '10' and 'ten' throughout.

      We would like to kindly point out that we did not find examples of inconsistent use.

      (2) Line 32: 'TM6 along with 3, 4 and 5...' should be 'TM6 along with TM3, TM4 and TM5...'. Same in line 142. Naming should stay consistent.

      Changes are reflected in the updated manuscript.

      (3) Line 141: do you mean traverse (i.e. to travel across)? Or transverse (i.e. to extend across the membrane)?

      This is a typo. We meant “traverse”. Thanks for pointing it out.

      (4) Line 142: 'greasy' should be 'strongly hydrophobic'.

      Changes are reflected in the updated manuscript.

      (5) Line 143-144: "credit card mechanism" requires quotation marks.

      Changes are reflected in the updated manuscript.

      (6) Line 144: state if Nectria haematococca is mammalian or fungal, this is not obvious for all readers.

      Changes are reflected in the updated manuscript.

      (7) Line 147-148: Is TMEM16A/TMEM16K fungal or mammalian? What was the residue before the mutation and which residue is mutated? Perhaps the nomenclature should read as TMEM16X10Y where X=the residue prior to the mutation, 10 is a placeholder for the residue number that is mutated and Y=the new residue following mutation.

      “TMEM16” is the protein family. “A” denotes the specific homolog rather than a residue.

      (8) Lines 157-158: same as 10, it is unclear if these are fungal or mammalian.

      Clarifications added.

      (9) Line 184: "...CGMD simulation" should be "...CGMD simulations".

      Changes made.

      (10) Line 191-192: It would help to create a table of all of the mutants (including if they are mammalian or fungal) summarizing the salt concentrations, lipid and detergent environments, the presence of modulators/activators, etc.

      We added this information to Appendix 1-Table 1 in the supplemental information. We did not specify NaCl concentrations, because all experimental procedures used standard physiological values for this (100-150 mM).

      (11) Line 210: inconsistencies with 'CG' and 'coarse-grain'.

      Changes made.

      (12) Figure 1 caption: '...totaling ~2μs (B)...' is missing the fullstop after 2μs.

      Changes made.

      (13) Figure 1B: it may be useful to label where the Ca2+ ion binds or include a schematic.

      We updated Fig. 1A to illustrate where Ca2+ binds.

      (14) Line 311: Are these mean distances? The authors should add standard deviations.

      Yes, they are. We added the standard deviations to the text.

      (15) Line 321-322: Perhaps a schematic in Figure 2 would be useful to visualize the structural features described here.

      We would kindly refer interested readers to reference [60].

      (16) Line 377: '...are likely a correlate of groove opening...' should read as: '...are likely correlated to groove opening...'.

      Thank you for pointing it out. Changes made.

      (17) Line 398: the '...empirically determined 6Å threshold for scrambling.' Was this determined from the simulations or from experiments? What does "empirically" mean here? Please state this.

      This value was determined from the simulations. Based on our analysis of the correlation between scrambling rate and groove dilation, we found that the minimal TM4/6 distance of 6 Å can distinguish between the high and low activity scramblers. The exact numerical value is somewhat arbitrary as there is a range of values around 6 Å that serve to distinguish scramblers from non-scramblers.

      (18) Figure 4: This figure should be labelled as A, B, C and D, with the figure caption updated accordingly.

      We updated Figure 4 and its caption.

      Reviewer #3 (Recommendations for Authors):

      The authors must do additional simulations to further validate their claim with different lipids and further substantiate dimer interface independent of Ca2+ ions.

      Thank you for the suggestion. We completely agree that studying scrambling in the context of a diverse lipid environment is an exciting area to explore. We are indeed actively working on a project along similar lines. We decided not to include that study because we think the additional discussion involved would be excessive for the current manuscript. We do, however, look forward to publishing our findings in a separate manuscript in the near future. In terms of Ca2+-independent scrambling, we are planning mutagenesis studies with our experimental collaborator that target the residues we identified along the dimer interface.

      Since calcium ions are critical for the stability of these structures, authors should show that they were placed throughout the simulations consistently.

      As stated in the Methods section “Coarse-grained system preparation and simulation detail”, all Ca2+ ions were manually placed into the coarse-grained structure at the start of the simulation, at the positions they occupy in the experimental structure, and were harmonically bonded to adjacent acidic residues for the duration of the simulation. We have also added a label to Fig 1A to indicate where the two Ca2+ ions are located.

      The comparison with experimental structures should be consistent with complete simulation, and not the last structure of the trajectory. Depending on the conformational variability, this might be misleading.

      We agree and updated Fig. 1-supplement figure 1 accordingly. The overall agreement between membrane shapes in CGMD and cryo-EM was not affected by this change.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Review:

      Reviewer #1 (Public review):

      Summary:

      Meteorin proteins were initially described as secreted neurotrophic factors. In this manuscript, Eggeler et al. demonstrate a novel role for Meteorins in establishing left-right axis formation in the zebrafish embryo. The authors generated null mutations in each of the three zebrafish meteorin genes - metrn, metrnla, and metrnlab. Triple mutant embryos displayed phenotypes strongly associated with left-right defects such as heart looping and visceral organ placement, and disrupted expression of Nodal-responsive genes, as did single mutants for metrn and metrnla. The authors then go on to demonstrate that these defects in left-right asymmetry are likely due to defects in Kupffer's Vesicle and the progenitor dorsal forerunner cells including impaired lumen formation and reduced fluid flow, reduced clustering among DFCs, impaired DFC migration, mislocalization of apical proteins ZO-1 and aPKC, and detachment of DFCs from the EVL. Notably, the authors found that expression of marker genes sox32 and sox17 were not affected, suggesting Meteorins are required for DFC/KV morphogenesis but not necessarily fate specification. Finally, the authors show genetic interaction between Meteorins and integrin receptors, which were previously implicated in left-right patterning. In a supplemental figure, the manuscript also presents data showing expression of meteorin genes around the chick Hensen's node, suggesting that the left-right patterning functions may be conserved among vertebrates.

      Strengths:

      Strengths of this study include the generation of a triple mutant line that targets all known zebrafish meteorin family members. The experiments presented in this study were rigorous, especially with respect to quantification and statistical analysis.

      Weaknesses:

      Although the authors convincingly demonstrate a role for Meteorins in zebrafish left-right patterning, data supporting a conserved role in other vertebrates is compelling but limited to one supplemental figure.

      We thank the reviewer for their thoughtful summary of our study and for highlighting the strengths of our work, including the generation of the triple mutant line and the rigor of our experimental design and quantitative analyses. We also appreciate the constructive feedback regarding the limited functional data supporting the conservation of Meteorin function in other vertebrates. We agree that this is an important aspect that could be further explored. While functional studies in additional species are beyond the current scope, we will consider such experiments in future work.

      We would like to highlight the phylogenetic analysis of Meteorin proteins we have already performed and included in the manuscript (Fig. S7D), which illustrates the evolutionary conservation of this protein family and supports the possibility of a conserved role in left-right patterning.

      Additionally, we have expanded the methods and discussion to include: (1) details on zebrafish viability in contrast to reported embryonic lethality in metrn mutant mice, (2) the background strains used in our study, (3) observed variability in DFC number and potential batch effects and (4) clarification of our 'convergence ratio' quantification approach.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors describe their study on the role of meteorins in establishing the left-right organizer. The left-right organizer is a transient organ in vertebrate embryos in which rotating cilia cause a fluid flow that breaks the left-right symmetry and coordinates lateralization of internal organs such as gut and heart. In zebrafish, the left-right organizer (also named Kupffer's vesicle) is formed by dorsal forerunner cells, but very little is known about how dorsal forerunner cells coalesce and form this ciliated vesicle in the embryo. The authors mutated the three meteorin-coding genes in zebrafish and observed that mutations in each one of these cause laterality defects, with the strongest defects observed in the triple mutant. Loss of meteorins affects the expression of nodal genes, which play essential roles in establishing organ laterality. Meteorins are widely expressed in developing embryos, and expression in lateral plate mesoderm and dorsal forerunner cells was observed. The meteorin triple mutant embryos display defects in the migration and clustering of the dorsal forerunner cells, impairing Kupffer's vesicle formation and cilia rotation. Finally, the authors show that meteorins genetically interact with integrins.

      Strengths:

      - These authors went through the lengthy process of generating triple mutants affecting all three meteorin genes. This provides robust genetic evidence on the role of meteorins in establishing organ laterality and avoids the difficulty of interpreting results that would otherwise be confounded by redundant functions of meteorins.

      - The use of live imaging on triple mutants is appreciated.

      - High-quality imaging of dorsal forerunner cells to quantify cell migration and its relation to Kupffer's vesicle formation.

      Weaknesses:

      - Lack of a model for how meteorins regulate dorsal forerunner cell migration.

      - Only genetic data to suggest a link between meteorins and integrins.

      - Besides its role in DFC migration, meteorins may also play a more direct role in regulating Nodal signaling, which is not addressed here.

      We appreciate the recognition of the strengths of our study, particularly the generation of the triple meteorin mutants and the use of high-resolution imaging to quantify DFC behavior and Kupffer’s vesicle formation—both of which were central to providing robust evidence for Meteorins' role in left-right patterning.

      We also value the reviewer’s comments on areas that need further exploration, including the need for a mechanistic model explaining how Meteorins regulate DFC migration, the genetic interaction with integrins, and the potential direct involvement of Meteorins in Nodal signaling.

      We agree that deeper mechanistic insights would strengthen the study. While our findings suggest that Meteorins influence DFC migration and clustering through integrin pathways, a detailed mechanistic dissection, particularly regarding the yet unidentified Meteorin receptor, lies beyond the current scope. However, we consider this a key aspect for future research and have discussed it further in the revised discussion section.

      In response to the reviewer’s suggestions, we have expanded the discussion to address the limitations of the current data linking Meteorins and integrins, including relevant citations to studies that implicate integrins in similar contexts. Additionally, we have added a more detailed discussion of the potential for Meteorins to directly influence Nodal signaling, and we cite a relevant study to support this possibility.

      Once again, we thank the reviewer for their insightful and constructive comments. These points raise important directions for future investigation that will further advance our understanding of Meteorin function in left-right axis formation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In the Results section (p. 9), the authors state, "...a reduced ZO-1 enrichment at the apical junctions of triplMUT GFP-positive DFCs could be detected." However, in Fig. 4F-G, the areas of ZO-1 enrichment indicated by arrowheads appear quite far from the DFCs themselves, making it unclear if these ZO-1-enriched areas are apical DFC junctions (as stated in the text) or instead are part of the EVL. Is it possible to include an additional cell membrane marker or other landmarks? In addition, the differences in ZO-1 accumulation between mutants and WT appear relatively modest. Is it possible to provide quantification of this effect?

      We appreciate the reviewer’s request for additional stainings and further clarification. We would like to highlight that the requested quantification of ZO-1 accumulation, including statistical analysis, is already provided in Fig. S5E.

      In mouse, loss of Meteorin is embryonic lethal yet the zebrafish triple mutants are viable. Could the authors discuss this discrepancy?

      We have expanded the discussion to address this point, suggesting that species-specific differences in compensatory mechanisms may explain the observed differences in viability. We would like to reiterate that while one study has reported embryonic lethality in metrn mutant mice, this specific mouse line has not been further investigated in any recent publications. Additionally, in collaboration with the lab of Alain Chédotal, we generated independent metrn and metrnl mutant mouse lines, which did not exhibit the phenotype described in the previously mentioned study.

      It has been reported that TL and AB strains exhibit variable numbers of DFCs and thus laterality defects (Moreno-Ayala et al., 2021, Cell Reports 34(2):108606). Would it be possible for the authors to report background strains used in this study and those used to generate the meteorin knock-outs?

      We appreciate the comment highlighting the importance of specifying the background strains used in our study. We have now included this information in the methods section, detailing the zebrafish strains utilized throughout our experiments.

      For statistical analysis, would it be possible for the authors to report the number of clutches examined to control for batch effects (especially given the wide variability in DFC numbers as noted above)?

      For further clarification, we have now included an additional explanation of the number of clutches examined in the methods section.

      In the Methods section (p. 19), the description of how the convergence ratio was computed was somewhat unclear. Could the authors provide a citation or include a diagram/schematic?

      We have revised the Methods section to provide a clearer definition of the convergence ratio and have included a schematic (Fig. 4D) to illustrate how it was calculated.

      Reviewer #2 (Recommendations for the authors):

      - Meteorins are widely expressed in the embryo. Can the authors comment on whether meteorin expression is required in the dorsal forerunner cells (DFCs) or in other cells? This could be addressed by knockdown experiments in DFCs as described by others (PMID: 15716348)

      We thank the reviewer for this important comment. In our study, we have shown that Meteorins are not required for the identity of DFCs, as several DFC-specific markers remain expressed in the respective cells within the meteorin mutant background (see Fig. S4).

      - In fig1d and 1e the authors use heterotaxy to describe visceral organ placement. The embryo shown in 1d seems to display situs inversus instead of heterotaxy, which is defined as discordance in organ position. The authors should clarify this.

      We agree with the reviewer and have revised the figures and figure legends to clarify the distinction between situs inversus and heterotaxy.

      - In Fig. 2 the authors show that nodal pathway genes are reduced, suggesting reduced Nodal signaling. How do they explain this, given that loss of cilia rotation generally leads to randomization of Nodal signaling but not a reduction in signaling?

      Following this suggestion, we have now added a further discussion on the possibility that Meteorins could directly regulate Nodal signaling in addition to their role in DFC migration and have cited a relevant study.

      - Reduced Nodal signaling in the LPM leads to organ laterality defects. Most anterior tissues like the heart are more sensitive to perturbation in Nodal signaling in the LPM compared to more posterior organs like gut (see also PMID: 25684355). Since in triple mutants the position of the heart is more affected than the position of the visceral organs this suggests that meteorins play an additional role in Nodal signaling in the LPM. As others have shown that meteorins regulate nodal activity (PMID: 24558432), the authors should address this further.

      As described above, we have now added a further discussion on the possibility that Meteorins could directly regulate Nodal signaling in addition to their role in DFC migration and have cited a relevant study. Further investigation into a possible direct role of Meteorins in Nodal signaling will be pursued in future work.

      - The term 'convergence ratio' is not clearly described and confusing as convergence is also used for the movement of LPM cells towards the midline.

      As noted in response to Reviewer #1, we have revised the Methods section and included a schematic in Fig. 4D to better explain this parameter.

      We are grateful for the thoughtful critiques from both reviewers, which have been very constructive and improved the clarity of our study. We believe that the revisions we have made address the concerns raised, and we look forward to your evaluation of our revised manuscript.

    1. Author response:

      Reviewer #1 (Public Review):

      In this manuscript, Tran et al. investigate the interaction between BICC1 and ADPKD genes in renal cystogenesis. Using biochemical approaches, they reveal a physical association between Bicc1 and PC1 or PC2 and identify the motifs in each protein required for binding. Through genetic analyses, they demonstrate that Bicc1 inactivation synergizes with Pkd1 or Pkd2 inactivation to exacerbate PKD-associated phenotypes in Xenopus embryos and potentially in mouse models. Furthermore, by analyzing a large cohort of PKD patients, the authors identify compound BICC1 variants alongside PKD1 or PKD2 variants in trans, as well as homozygous BICC1 variants in patients with early-onset and severe disease presentation. They also show that these BICC1 variants repress PC2 expression in cultured cells.

      Overall, the concept that BICC1 variants modify PKD severity is plausible, the data are robust, and the conclusions are largely supported. However, several aspects of the study require clarification and discussion:

      (1) The authors devote significant effort to characterizing the physical interaction between Bicc1 and Pkd2. However, the study does not examine or discuss how this interaction relates to Bicc1's well-established role in posttranscriptional regulation of Pkd2 mRNA stability and translation efficiency.

      The reviewer is correct that the present study has not addressed the downstream consequences of this interaction, considering that Bicc1 is a posttranscriptional regulator of Pkd2 (and potentially Pkd1). We think that the Bicc1/Pkd1/Pkd2 complex retains Bicc1 in the cytoplasm and thus restricts its participation in posttranscriptional regulation. As we do not yet have experimental data to support this model, we have not included it in the manuscript. We will, however, update the discussion of the manuscript to further elaborate on the potential mechanism of the Bicc1/Pkd1/Pkd2 complex.

      (2) Bicc1 inactivation appears to downregulate Pkd1 expression, yet it remains unclear whether Bicc1 regulates Pkd1 through direct interaction or by antagonizing miR-17, as observed in Pkd2 regulation. This should be further examined or discussed.

      This is a very interesting comment. The group of Vishal Patel published that PKD1 is regulated by a miR-17 binding site in its 3’UTR (PMID: 35965273). We, however, have not evaluated whether BICC1 participates in this regulation. A definitive answer would require us to utilize some of the mice described in the above reference, which is beyond the scope of this manuscript. We will, however, revise the discussion to elaborate on this potential mechanism.

      (3) The evidence supporting Bicc1 and ADPKD gene cooperativity, particularly with Pkd1, in mouse models is not entirely convincing, likely due to substantial variability and the aggressive nature of Bpk/Bpk mice. Increasing the number of animals or using a milder Bicc1 strain, such as jcpk heterozygotes, could help substantiate the genetic interaction.

      We initially performed the analysis using the complete Bicc1 knockout that we previously reported (PMID 20215348), focusing on compound heterozygotes. Yet, as with the Pkd1/Pkd2 compound heterozygotes (PMID 12140187), no cyst development was observed by the time we sacrificed the mice at P21. Our strain is similar to the above-mentioned jcpk, which is characterized by a short, abnormal transcript thought to result in a null allele (PMID: 12682776). We thank the reviewer for pointing us to the reference showing that heterozygous mice develop glomerular cysts as adults (PMID: 7723240). This suggestion is an interesting idea that we will investigate. In general, we agree with the reviewer that a better understanding of the contribution of Bicc1 to the adult PKD phenotype will be critical. To this end, we are currently generating a floxed allele of Bicc1 that will allow us to address the cooperativity in the adult kidney when, for example, crossed to the Pkd1<sup>RC/RC</sup> mice. Yet, these experiments are unfortunately beyond the scope of this manuscript.

      Reviewer #2 (Public Review):

      Tran and colleagues report evidence supporting the expected yet undemonstrated interaction between the Pkd1 and Pkd2 gene products Pc1 and Pc2 and the Bicc1 protein in vitro, in mice, and collaterally, in Xenopus and HEK293T cells. The authors go on to convincingly identify two large and non-overlapping regions of the Bicc1 protein important for each interaction and to perform gene dosage experiments in mice that suggest that Bicc1 loss of function may compound with Pkd1 and Pkd2 decreased function, resulting in PKD-like renal phenotypes of different severity. These results led to examining a cohort of very early onset PKD patients to find three instances of co-existing mutations in PKD1 (or PKD2) and BICC1. Finally, preliminary transcriptomics of edited lines gave variable and subtle differences that align with the theme that Bicc1 may contribute to the PKD defects, yet are mechanistically inconclusive.

      These results are potentially interesting, despite the limitation, also recognized by the authors, that BICC1 mutations seem exceedingly rare in PKD patients and may not "significantly contribute to the mutational load in ADPKD or ARPKD". The manuscript has several intrinsic limitations that must be addressed.

      As mentioned above, the study was designed to explore whether there is an interaction between BICC1 and PKD1/PKD2 and whether this interaction is functionally important. How this translates into clinical relevance will require additional studies (and we have addressed this in the discussion of the manuscript).

      The manuscript contains factual errors, imprecisions, and language ambiguities. This has the effect of making this reviewer wonder how thorough the research reported and analyses have been.

      We respectfully disagree with the reviewer on the latter interpretation. The study was performed with rigor. We have carefully assessed the critiques raised by the reviewer. Most of the criticisms raised by the reviewer will be easily addressed in the revised version of the manuscript. Yet, none of the critiques raised by the reviewer seems to directly impact the overall interpretation of the data.

      Reviewer #3 (Public Review):

      Summary:

      This study investigates the role of BICC1 in the regulation of PKD1 and PKD2 and its impact on cystogenesis in ADPKD. By utilizing co-IP and functional assays, the authors demonstrate physical, functional, and regulatory interactions between these three proteins.

      Strengths:

      (1) The scientific principles and methodology adopted in this study are excellent, logical, and reveal important insights into the molecular basis of cystogenesis.

      (2) The functional studies in animal models provide tantalizing data that may lead to a further understanding and may consequently lead to the ultimate goal of finding a molecular therapy for this incurable condition.

      (3) In describing the patients from the Arab cohort, the authors have provided excellent human data for further investigation in large ADPKD cohorts. Even though there was no patient material available, such as HUREC, the authors have studied the effects of BICC1 mutations and demonstrated its functional importance in a Xenopus model.

      Weaknesses:

      This is a well-conducted study and could have been even more impactful if primary patient material was available to the authors. A further study in HUREC cells investigating the critical regulatory role of BICC1 and potential interaction with mir-17 may yet lead to a modifiable therapeutic target.

      This is an excellent suggestion. We agree with the reviewer that it would have been interesting to analyze HUREC material from the affected patients. Unfortunately, besides DNA and the phenotypic analysis described in the manuscript, neither human tissue nor primary patient-derived cells were collected before the two patients with the BICC1 p.Ser240Pro mutation passed away. To address this missing link, we have, as a first pass, generated HEK293T cells carrying the BICC1 p.Ser240Pro variant. While these admittedly are not kidney epithelial cells, they do show a reduced level of PC2 expression. These data are shown in the manuscript. We have not yet addressed how this relates to its crosstalk with miR-17.

      Conclusion:

      The authors achieve their aims. The results reliably demonstrate the physical and functional interaction between BICC1 and PKD1/PKD2 genes and their products.

      The impact is hopefully going to be manifold:

      (1) Progressing the understanding of the regulation of the expression of PKD1/PKD2 genes.

      (2) Defining the role of BICC1 in the mir/PKD1/2 complex should be the next step in the quest for a modifiable therapeutic target.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Filamentous fungi are established workhorses in biotechnology, with Aspergillus oryzae as a prominent example with a thousand-year history. Still, the cell biology and biochemical properties of the production strains are not well understood. The paper of the Takeshita group describes the change in nuclear numbers and correlates it to different production capacities. They used microfluidic devices to directly correlate production with nuclear numbers. In addition, they used microdissection to understand expression profile changes and found an increase in ribosomes. The analysis of two genes involved in cell volume control in S. pombe did not reveal conclusive answers to explain the phenomenon. It appears that it is a multi-trait phenotype. Finally, they identified SNPs in many industrial strains and tried to correlate them to the capability of increasing their nuclear numbers.

      The methods used in the paper range from high-quality cell biology, Raman spectroscopy, to atomic force and electron microscopy, and from laser microdissection to the use of microfluidic devices to study individual hyphae.

      This is a very interesting, biotechnologically relevant paper with the application of excellent cell biology. I have only minor suggestions for improvement.

      We sincerely appreciate your fair and positive evaluation of our work. Thank you for your suggestions for improvement. We respond to each of them appropriately.

      Reviewer #2 (Public review):

      Summary:

      In the study presented by Itani and colleagues, it is shown that some strains of Aspergillus oryzae - especially those used industrially for the production of sake and soy sauce - develop hyphae with a significantly increased number of nuclei and cell volume over time. These thick hyphae are formed by branching from normal hyphae and grow faster and therefore dominate the colonies. The number of nuclei positively correlates with the thicker hyphae and also the amount of secreted enzymes. The addition of nutrients such as yeast extract or certain amino acids enhanced this effect. Genome and transcriptome analyses identified genes, including rseA, that are associated with the increased number of nuclei and enzyme production. The authors conclude from their data that glycosyltransferases, calcium channels, and the TOR regulatory cascade are involved in the regulation of cell volume and number of nuclei. Thicker hyphae and an increased number of nuclei were also observed in high-production strains of other industrially used fungi such as Trichoderma reesei and Penicillium chrysogenum, leading to the hypothesis that the mentioned phenotypes are characteristic of production strains, which is of significant interest for fungal biotechnology.

      Strengths:

      The study is very comprehensive and involves the application of diverse state-of-the-art cell biological, biochemical, and genetic methods. Overall, the data are properly controlled and analyzed, figures and movies are of excellent quality.

      The results are particularly interesting with regard to the elucidation of molecular mechanisms that regulate the size of fungal hyphae and their number of nuclei. For this, the authors have discovered a very good model: (regular) strains with a low number of nuclei and strains with a high number of nuclei. Also, the results can be expected to be of interest for the further optimization of industrially relevant filamentous fungi.

      Weaknesses:

      There are only a few open questions concerning the activity of the many nuclei in production strains (active versus inactive), their number of chromosomes (haploid/diploid), and whether hyper-branching always leads to propagation of nuclei.

      We are very grateful for your recognition of our findings, the proposed model, and their significance for future applications. We are grateful for the questions, which contribute to a more accurate understanding.

      Our responses to each are provided below. Necessary experiments are in progress.

      Reviewer #3 (Public review):

      Summary:

      The authors seek to determine the underlying traits that support the exceptional capacity of Aspergillus oryzae to secrete enzymes and heterologous proteins. To do so, they leverage the availability of multiple domesticated isolates of A. oryzae along with other Aspergillus species to perform comparative imaging and genomic analysis.

      Strengths:

      The strength of this study lies in the use of multifaceted approaches to identify significant differences in hyphal morphology that correlate with enzyme secretion, which is then followed by the use of genomics to identify candidate functions that underlie these differences.

      Weaknesses:

      There are aspects of the methods that would benefit from the inclusion of more detail on how experiments were performed and data interpreted.

      Overall, the authors have achieved their aims in that they are able to clearly document the presence of two distinct hyphal forms in A. oryzae and other Aspergillus species, and to correlate the presence of the thicker, rapidly growing form with enhanced enzyme secretion. The image analysis is convincing. The discovery that the addition of yeast extract and specific amino acids can stimulate the formation of the novel hyphal form is also notable. Although the conclusions are generally supported by the results, this is perhaps less so for the genetic analysis as it remains unclear how direct the role of RseA and the calcium transporters might be in supporting the formation of the thicker hyphae.

      The results presented here will impact the field. The complexity of hyphal morphology and how it affects secretion is not well understood despite the importance of these processes for the fungal lifestyle. In addition, the description of approaches that can be used to facilitate the study of these different hyphal forms (i.e., stimulation using yeast extract or specific amino acids) will benefit future efforts to understand the molecular basis of their formation.

      We are very grateful for your fair and thoughtful evaluation of our work. We agree that the genetic analysis in the latter part is relatively weaker compared to the imaging analysis in the first half. Rather than a single mutation causing a dramatic phenotypic change, we believe that the accumulation of various mutations through breeding leads to the observed phenotype, making it difficult to clearly demonstrate causality. Since transcriptome and SNP analyses have revealed key pathways and phenotypes, it would be gratifying if these insights could contribute to future applications utilizing filamentous fungi.

    1. Author Response:

      We sincerely thank the reviewers and the editorial team for their thoughtful and constructive evaluation of our manuscript. We are very pleased that both reviewers and the Reviewing Editor found the work to be compelling and of interest to the community studying membrane-associated condensates. Below we outline our planned revisions in response to the public reviews.

      Reviewer #1

      We appreciate Reviewer #1’s positive evaluation of the study’s significance and the utility of our theoretical framework.

      1. Understandably, the authors used one system to test their theory (ZO-1). However, to establish a theoretical framework, this is sufficient.

      Response: We acknowledge this limitation. While we agree that additional systems would strengthen the generality of our theory, we note that the focus of this work is to introduce and validate a theoretical framework. As the reviewer notes, this is sufficient for establishing the framework. Nonetheless, we are open to further collaborations or future studies to test the model with other systems.

      Reviewer #2

      We are grateful for Reviewer #2’s detailed comments and will address each of the points as follows:

      1. In the theoretical section, what has previously been known, compared to which equations are new, should be made more clear.

      Response: We will revise the theory section to clearly distinguish previously established formulations from novel contributions.

      1. Some assumptions in the model are made purely for convenience and without sufficient accompanying physical justification. E.g., the authors should justify, on physical grounds, why binding rate effects are/could be larger than the other fluxes.

      Response: We will expand the discussion to provide key physical justification, especially to explain why binding rate effects are/could be larger than the other fluxes.

      1. I feel that further mechanistic explanation as to why bulk phase separation widens the regime of surface phase separation is warranted.

      Response: We will elaborate on the mechanism underlying this coupling.

      1. The major advantage of the non-dilute theory as compared with a best parameterized dilute (or homogenous) theory requires further clarification/evidence with respect to capturing the experimental data.

      Response: We will clarify this comparison more explicitly and highlight how the non-dilute model captures key nonlinear behaviors and concentration-dependent adsorption phenomena that the dilute model fails to reproduce.

      1. Discrete (particle-based) molecular modelling could help to delineate the quantitative improvements that the non-dilute theory has over the previous state-of-the-art. Also, this could help test theoretical statements regarding the roles of bulk-phase separation, which were not explored experimentally.

      Response: We appreciate the suggestion and agree that such modeling would be valuable. However, this is beyond the scope of the current study. We will add a discussion on how discrete simulations could be used to further test our theory in future work.

      1. Discussion of the caveats and limitations of the theory and modelling is missing from the text.

      Response: We will add a paragraph outlining caveats and limitations of the modelling.

      We believe these changes will significantly improve the clarity and impact of our manuscript, and we thank the reviewers again for their valuable input.

    1. Author response:

      We thank the reviewers for their thoughtful and constructive feedback. As the reviewers noted, dissecting the contributions of Gtr1/2 and Pib2 to TORC1 signaling across diverse nutrient states is a technically and conceptually challenging problem. Indeed, many of the issues raised—including the interpretation of non-canonical TORC1 readouts (e.g., Rps6, Par32), the influence of strain auxotrophy and media composition, and the limitations of phosphoproteomic analysis performed under a single growth condition—underscore the challenges of working with the TORC1 signaling system.

      In response to the reviewers’ comments, we have undertaken a broader and more systematic analysis of TORC1 regulation across defined nitrogen transitions, building directly on the signaling framework established in Figures 6 and 8 of this manuscript. This work, which includes expanded phosphoproteomic profiling and the use of refined genetic tools, supports and extends the key conclusions of Cecil et. al. Specifically, it reinforces the existence of a Pib2-dependent TORC1 output under nitrogen-limited conditions and further clarifies the physiological relevance of the intermediate TORC1 activity state. Due to the scope and depth of this expanded work, we are reporting those findings in a separate publication. Nonetheless, we view the data presented here as a key foundational step in establishing a non-redundant framework for Gtr1/2- and Pib2-dependent control of TORC1.

      We have therefore made minor changes to the manuscript to clarify our use of different growth media and to temper our conclusions where appropriate. These changes, together with the context of ongoing work, should reinforce the value of Cecil et al. in advancing our understanding of TORC1 and nutrient signaling in eukaryotes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work Jeong and colleagues focus on exploring the role of the acyltransferase ZDHHC9 in myelinating OLs in particular in the palmitoylation of several myelin proteins. After confirming the specific enrichment of the Zdhhc9 transcript in mouse and human OLs, the authors examine the subcellular localization of the protein in vitro and observed that in comparison with other isoforms, ZDHHC9 localizes at OLs cell bodies and at discrete puncta in the processes. These observations (Figures 1 and 2) led the authors to hypothesize that ZDHHC9 plays an important role in myelination. No gross changes were detected in OL development in Zdhhc9 KO mice and analyses from P28 Zdhhc9 KO mice crossed with Mobp-EGFP reporter mice did not show changes in EGFP+ OL differentiation (Figure 3).

      However, and given the observed subcellular localization of ZDHHC9 in OL processes (Figure 2) and the observation that the percentage of unmyelinated axons is increased in Zdhhc9 KO (Figure 6), early time points to examine the differentiated pools of OLs and their capacity to extend processes/contact axons need to be considered.

      We appreciate this point, but due to the order in which experiments were performed, the ZDHHC9 KO mouse colony that we maintained after initial submission of this work contains homozygous MOBP-EGFP, but not the mT/mG transgene that would be most optimal for the proposed experiment. We hope the reviewer appreciates that it would take considerable time and effort regarding mouse breeding to cross out the MOBP and add back the mT/mG. We nonetheless appreciate the importance of the point raised and therefore examined an earlier developmental time point (P21, 3 weeks) to quantify OLs and NG2+ OPCs. In our updated Fig 3C1-C3, we use Mobp-EGFP mice to show that Zdhhc9 KO does not significantly affect the number of EGFP+ OLs at this time point in the cortex, corpus callosum and spinal cord. We also show that in corpus callosum, Zdhhc9 KO does not significantly affect the number of NG2+ OPCs at this earlier time point (Fig 3D, E). Furthermore, immunostaining to detect BCAS1, a marker of pre-mature OLs, also revealed no qualitative difference with ZDHHC9 loss at P21. We show representative images from these BCAS1 experiments in an updated Fig S3. While these new experiments do not address the morphology of OLs in Zdhhc9 KO, they do provide further evidence that deficits in myelination in young Zdhhc9 KO mice (Figure 6) are not likely due to gross differences in OPC or OL numbers during development.

      Maturation of OL in Zdhhc9 KO was examined by crossing Zdhhc9 KO with Pdgfra-CreER;R26-EGFP and tracking the newly EGFP-labelled OPCs after tamoxifen administration. No changes in the numbers of EGFP+ OL were detected. The authors concluded that the loss of ZDHHC9 does not alter oligodendrogenesis in either the young or mature CNS. The authors observed defects in Zdhhc9 KO OL protrusions that they attributed to abnormal OL membrane expansion (Fig 4 and 5). Can they show evidence for this?

      This is an important point, and we appreciate the opportunity to explain the reasoning behind our initial statement more fully, while noting that other explanations are possible. Fig 5B (an Imaris-assisted reconstruction using the EGFP cell fill/morphology marker) highlights large spheroid-like distensions along OL processes. We reason that these spheroids are enclosed by the OL lipid membrane because if the membrane were ruptured, the EGFP signal would likely diffuse. This in turn suggests that the caliber of the OL process at the position of the spheroid is grossly abnormal i.e. the membrane has hyper-expanded. Given that OL membrane growth during myelination extends in two directions, i.e., spiral growth to the axonal surface and longitudinal growth along the axon, it is possible that spheroid-like structures are formed by uneven myelin growth. We recognize that we cannot yet conclude whether and how spheroid formation might be linked to the myelination deficit that we observe in Zdhhc9 KO mice. However, defining the subcellular mechanism for spheroid formation may provide further insights into this issue. We have therefore largely retained the original statement but have added the reasoning above to our revised Discussion.

      The authors report that Zdhhc9 KO primary and secondary branches in OL were longer, some contained spheroid-like swellings and the OL protrusion complexity was higher. However, these data are partially contradictory to what they show in OL differentiation experiments in vitro (Fig 7). There is also no evidence for increased membrane expansion in Zdhhc9 knockdown myelin forming cells in culture. How to reconcile this?

      We appreciate the reviewer’s interest in this issue. Several non-mutually exclusive factors could account for the differences in OL morphology in vitro versus in vivo caused by Zdhhc9 loss. First, morphology in vivo may well be influenced by the axons and/or other extrinsic components around each OL that are not present in our primary cultures. Second, OL growth in vivo is highly 3-dimensional, whereas growth in culture is largely 2-dimensional – it may be difficult to support formation of spheroids (by definition, a 3-dimensional structure) in the latter situation. Finally, Zdhhc9 is absent in vivo from the beginning of development until the time points examined, whereas in our cultured OL experiments, Zdhhc9 shRNA is virally delivered to OPC cultures at DIV2 and likely acutely affects Zdhhc9 expression predominantly in committed OLs (following the switch to differentiation medium at DIV3). These differences may also affect the ability of other PATs or, potentially, palmitoylation-independent subcellular processes, to compensate for Zdhhc9 loss. We have more fully explained these points in our revised Discussion. 

      Reviewer #2 (Public Review):

      This study provides an in-depth exploration of the impact of X-linked ZDHHC9 gene mutations on cognitive deficits and epilepsy, with a particular focus on the expression and function of ZDHHC9 in myelin-forming oligodendrocytes (OLs). These findings offer crucial insights into understanding ZDHHC9-related X-linked intellectual disability (XLID) and shed light on the regulatory mechanisms of palmitoylation in myelination. The experimental design and analysis of results are convincing, providing a valuable reference for further research in this field. However, upon careful review, I believe the article still needs further improvement and supplementation in the following aspects:

      (1) Regarding the subcellular localization experiment of ZDHHC9 mutants in OL, it is currently limited to in vitro cultured OL, lacking validation in vivo OL or myelin sheath. Additionally, it is necessary to investigate whether the abnormal subcellular localization of ZDHHC9 mutants affects their enzyme activity and palmitoylation modification of substrate proteins.

      This is an important point but is technically challenging to address in vivo as it would likely require delivery of AAV to express ZDHHC9wt and XLID mutants specifically in OLs, preferably in the absence of endogenous ZDHHC9. We hope the reviewers would agree that this experiment is beyond the scope of the current study. However, we did compare the ability of ZDHHC9wt and XLID mutants to palmitoylate MBP, and to autopalmitoylate (sometimes used as a surrogate measure of PAT activity) in transfected heterologous cells. Although we recognize that this over-expression system is less physiological than a native OL, it has the benefit of being able to readily compare transfected wt vs mutant forms of ZDHHC9 with minimal contribution from endogenous ZDHHC9. Intriguingly, using this system, we found that autopalmitoylation activity of the XLID ZDHHC9-P150S mutant does not differ significantly from that of ZDHHC9wt, and that this mutant is still capable of palmitoylating MBP. Moreover, the R96W mutant, while impaired in autopalmitoylation, still palmitoylated MBP approximately 50% as effectively as ZDHHC9wt in our cell-based assay. These findings suggest that ZDHHC9-P150S and, probably, ZDHHC9-R96W mutants might still be able to palmitoylate substrates in OLs if they were properly localized. This possibility in turn suggests that impaired subcellular targeting in addition to, or instead of, impaired catalytic activity, may be a key factor in certain cases of ZDHHC9-associated XLID. We have expanded our Figure 8 (new panels 8E-G) to show these additional experiments and have summarized the conclusions above in our revised Discussion. We thank the reviewer for suggesting that we further investigate this issue.

      (2) The experimental period (P21+21 days) using genetic labeling to track the development of myelinating cells may not be long enough. It is recommended to extend the observation time and analyze at more time points to more comprehensively reflect the impact of Zdhhc9 KO.

      We appreciate this point from the reviewer but, regrettably, we did not maintain the Pdgfra-CreER; R26-EGFP; Zdhhc9 KO mouse line and hope the reviewer appreciates that it would take considerable time and effort to rederive this line and then perform the suggested extended time course experiments. However, we note for the reviewer that our preliminary studies did not reveal any effect of Zdhhc9 KO on the number of MOBP-EGFP+ OLs in 6-month-old mice (not shown), consistent with a model in which Zdhhc9 loss does not affect OPC-OL commitment per se.

      (3) The author speculates that Zdhhc9 may regulate myelination by affecting the membrane localization of specific myelin proteins, but lacks direct experimental evidence to support this. It is suggested to detect the expression and distribution of relevant proteins in the myelin of Zdhhc9 KO mice.

      We share the reviewer’s interest in this point but realized that it is more technically challenging to address than might be initially thought. The main protein we would implicate and seek to test is MBP, but we already found that there is no gross change in MBP distribution in vivo in Zdhhc9 KO mice (Fig 3A). However, an anti-MBP antibody recognizes all forms of MBP, not just the specific splice variants whose palmitoylation is affected by ZDHHC9 loss. Specifically assessing nanoscale distribution of these splice variants would require a way (e.g. anti-MBP splice form-specific antibodies that are compatible with immuno-EM) to distinguish these variants from other, non-palmitoylated forms of MBP. Although such an antibody could be an important tool, we hope the reviewers would agree that developing and characterizing such a reagent is beyond the scope of the current study.

      We do, however, note that the lack of gross change in MBP distribution and levels in Zdhhc9 KO mice is consistent with the relatively mild phenotype of these mice, compared with shiverer (shi/shi) mice, in which MBP is completely lost. In shiverer, CNS compact myelin is almost absent (PMID: 671037; PMID: 88695; PMID: 460693) and, as the name suggests, mice display a shivering gait, and exhibit seizures and early death. In contrast, Zdhhc9 mice show only subtle behavioral deficits (PMID: 29944857). These differences are all consistent with a model in which Zdhhc9 KO mice, despite their significantly reduced MBP palmitoylation (Fig 8), have grossly normal distribution and levels of MBP when all splice variants are assessed (Fig 3, Fig 8). It is not inconceivable that Zdhhc9 KO mice have a nanoscale change in the distribution of MBP, particularly of specific palmitoylated splice variants, within myelin that profoundly affects myelin ultrastructure, without grossly altering MBP distribution. However, an alternative and not mutually exclusive possibility is that aberrant palmitoylation of other Zdhhc9 substrates accounts for, or contributes to, the abnormalities in myelin at the ultrastructural level. Addressing this issue would require a multi-pronged approach, not just to assess palmitoylation and distribution of such proteins in Zdhhc9 KO, but also to test whether they are direct Zdhhc9 substrates, in order to rule out indirect effects. We hope reviewers would agree that this is best left to a separate study. However, in our revised Discussion we now summarize what can be inferred regarding Zdhhc9-dependent effects on total and splice variant-specific distribution and levels of MBP.

      (4) Although the article mentions the association of Zdhhc9 with intellectual disabilities, it does not involve behavioral analysis of Zdhhc9 KO mice. It is recommended to supplement some behavioral experimental data to support the important role of Zdhhc9 in maintaining normal cognitive function, enhancing the clinical relevance of the article.

      We appreciate this point from the reviewer. The behavior of the same ZDHHC9 KO mouse line that we used was reported in PMID: 31747610 and in PMID: 29944857. In the former study, Zdhhc9 KO mice were reported to display seizures reminiscent of phenotypes in human patients with ZDHHC9 mutation. The latter study assessed performance of Zdhhc9 KO mice in several tasks that test cognitive function. Specifically, the KO mice were reported to display “altered behaviour in the open-field test, elevated plus maze and acoustic startle test that is consistent with a reduced anxiety level; a reduced hang time in the hanging wire test that suggests underlying hypotonia but which may also be linked to reduced anxiety [and] deficits in the Morris water maze test of hippocampal-dependent spatial learning and memory.” We have incorporated these findings in our revised Discussion, where we summarize how these phenotypes are common, not just to human patients with ZDHHC9 mutation, but also to other human neurodevelopmental conditions and mouse models in which ID is a common feature.

      (5) For the abnormal myelination observed in Zdhhc9 KO mice, including unmyelinated large-diameter axons and excessively myelinated small-diameter axons, the article lacks in-depth research and explanation on the exact mechanism and mode of action of ZDHHC9 in regulating myelination.

      We share the reviewer’s interest in this point but again note that gaining definitive insights into this issue is far from trivial. Convincing evidence of a causative mechanism would require an exhaustive identification of ZDHHC9 in vivo substrates, followed by point mutation of substrate palmitoylation site(s) to determine the extent to which palmitoylation of such protein(s) phenocopies ZDHHC9 loss. Nonetheless, it is possible to break this question down and to summarize what we do and do not know. For example, our experiments in cultured OLs show that ZDHHC9 loss causes cell-autonomous deficits in morphological maturation of these cells. We also know that ZDHHC9 loss results in impaired palmitoylation of MBP, a direct substrate for ZDHHC9. Moreover, loss of ZDHHC9 at Golgi outposts in OLs (a phenotype observed with several XLID-associated mutant forms of ZDHHC9, even those with no significant loss of catalytic activity) correlates with intellectual disability. Together, these findings are consistent with a model in which ZDHHC9 action at OL Golgi outposts is critical for normal myelination. However, it is yet to be determined whether the key substrates of ZDHHC9 include MBP, other palmitoyl-proteins that are key constituents of CNS myelin, or proteins whose palmitoylation is important for myelin protein trafficking and targeting. Another non-mutually exclusive possibility is that ZDHHC9 acts at Golgi outposts but indirectly, for example to drive the expression of myelin protein genes. Future experiments, including but not limited to palmitoyl-proteomics in ZDHHC9 (OL-specific) KO mice, will be needed to provide more definitive insights into this issue. We have expanded our Discussion of links between ZDHHC9 mutation and impaired myelination to summarize the above points.

      (6) The function of ZDHHC9 in OL may be related to the Golgi apparatus, but its exact role in these structures is still unclear. It is suggested to discuss in more detail the role of ZDHHC9 in the Golgi apparatus in the discussion section.

      We appreciate this point, which we considered as related to point (5) above. In our revised Discussion we highlight how ZDHHC9 action at Golgi outposts may involve direct palmitoylation of myelin proteins, palmitoylation of proteins that direct myelin proteins to the myelin membrane and/or activation of gene expression programs that serve to drive myelination. We further note that these possibilities are not mutually exclusive.

      (7) More experimental support and in-depth research are needed on the detailed mechanism of how ZDHHC9 and Golga7 cooperatively regulate MBP palmitoylation, and how this decrease in palmitoylation level leads to myelination defects.

      This is another important point – our new experiments suggest that, although some XLID mutations markedly affect ZDHHC9’s ability to palmitoylate MBP, others do not, yet all of the mutant forms fail to localize to Golgi outposts. These findings are consistent with a model in which the subcellular location at which ZDHHC9 palmitoylates MBP, and potentially other substrates, is critical for normal myelination. Interestingly, despite their marked differences in basal catalytic activity (as assessed by autopalmitoylation), wt and all XLID forms of ZDHHC9 appear to show enhanced activity (measured by both auto- and MBP palmitoylation) in the presence of Golga7, suggesting that the association with Golga7 (which also localizes to Golgi outposts) is central to ZDHHC9 activity. This model is also highly consistent with the biased expression of Golga7 in OLs, compared to other CNS cell types (Fig 1E, 1F). Moreover, XLID-associated mutant forms of ZDHHC9 also show reduced protein stability and are impaired in their ability to form complexes with Golga7 (also known as Golgi Complex Protein 16kDa; GCP16; PMID: 37035671). Failure of ZDHHC9 XLID mutants to localize to Golgi outposts may thus be due to aberrant trafficking of mutant ZDHHC9 per se, but may also involve impaired association/stabilization of ZDHHC9/Golga7 complexes at these locations. Again, it is possible that either or both of these mechanisms, which are not mutually exclusive, contribute to impaired MBP palmitoylation and/or myelination deficits. We summarize these points in our revised Discussion.

      In summary, it is recommended that the authors address the above issues through additional experiments and improved discussions to further strengthen the credibility and clinical relevance of the article.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):

      No gross changes were detected in OL development in Zdhhc9 KO mice and analyses from P28 Zdhhc9 KO mice crossed with Mobp-EGFP reporter mice did not show changes in EGFP+ OL differentiation (Figure 3). However, and given the observed subcellular localization of ZDHHC9 in OL processes (Figure 2) and the observation that the percentage of unmyelinated axons is increased in Zdhhc9 KO (Figure 6), ***early time points to examine the differentiated pools of OLs and their capacity to extend processes/contact axons need to be considered***.

      We appreciate this point, but due to the order in which experiments were performed, the ZDHHC9 KO mouse colony that we maintained after initial submission of this work contains homozygous MOBP-EGFP, but not the mT/mG transgene that would be most optimal for the proposed experiment. We hope the reviewer appreciates that it would take considerable time and effort regarding mouse breeding to cross out the MOBP and add back the mT/mG. We nonetheless appreciate the importance of the point raised and therefore examined an earlier developmental time point (P21, 3 weeks) to quantify OLs and NG2+ OPCs. In our updated Fig 3C1-C3, we use Mobp-EGFP mice to show that Zdhhc9 KO does not significantly affect the number of EGFP+ OLs at this time point in the cortex, corpus callosum and spinal cord. We also show that in corpus callosum, Zdhhc9 KO does not significantly affect the number of NG2+ OPCs at this earlier time point (Fig 3D, E). Furthermore, immunostaining to detect BCAS1, a marker of pre-mature OLs, also revealed no qualitative difference with ZDHHC9 loss at P21. We show representative images from these BCAS1 experiments in an updated Fig S3. While these new experiments do not address the morphology of OLs in Zdhhc9 KO, they do provide further evidence that deficits in myelination in young Zdhhc9 KO mice (Figure 6) are not likely due to gross differences in OPC or OL numbers during development.

      The authors observed defects in Zdhhc9 KO OL protrusions that they attributed to abnormal OL membrane expansion (Fig 4 and 5). Can they show evidence for this?

      This is an important point, and we appreciate the opportunity to explain the reasoning behind our initial statement more fully, while noting that other explanations are possible. Fig 5B (an Imaris-assisted reconstruction using the EGFP cell fill/morphology marker) highlights large spheroid-like distensions along OL processes. We reason that these spheroids are enclosed by the OL lipid membrane because if the membrane were ruptured, the EGFP signal would likely diffuse. This in turn suggests that the caliber of the OL process at the position of the spheroid is grossly abnormal i.e. the membrane has hyper-expanded. Given that OL membrane growth during myelination extends in two directions, i.e., spiral growth to the axonal surface and longitudinal growth along the axon, it is possible that spheroid-like structures are formed by uneven myelin growth. We recognize that we cannot yet conclude whether and how spheroid formation might be linked to the myelination deficit that we observe in Zdhhc9 KO mice. However, defining the subcellular mechanism for spheroid formation may provide further insights into this issue. We have therefore largely retained the original statement but have added the reasoning above to our revised Discussion.

      The authors report that Zdhhc9 KO primary and secondary branches in OL were longer, some contained spheroid-like swellings and the OL protrusion complexity was higher. However, these data are partially contradictory to what they show in OL differentiation experiments in vitro (Fig 7). There is also no evidence for increased membrane expansion in Zdhhc9 knockdown myelin forming cells in culture. How do they reconcile these different findings?

      We appreciate the reviewer’s interest in this issue. Several non-mutually exclusive factors could account for the differences in OL morphology in vitro versus in vivo caused by Zdhhc9 loss. First, morphology in vivo may well be influenced by the axons and/or other extrinsic components around each OL that are not present in our primary cultures. Second, OL growth in vivo is highly 3-dimensional, whereas growth in culture is largely 2-dimensional – it may be difficult to support formation of spheroids (by definition, a 3-dimensional structure) in the latter situation. Finally, Zdhhc9 is absent in vivo from the beginning of development until the time points examined, whereas in our cultured OL experiments, Zdhhc9 shRNA is virally delivered to OPC cultures at DIV2 and likely acutely affects Zdhhc9 expression predominantly in committed OLs (following the switch to differentiation medium at DIV3). These differences may also affect the ability of other PATs or, potentially, palmitoylation-independent subcellular processes, to compensate for Zdhhc9 loss. We have more fully explained these points in our revised Discussion. 

      Page 7: "The OL processes in this culture condition correspond to large lipid-rich membranous sheets that form spiral membrane expansion on axons in vivo (49)." At which stage are authors referring to? OL processes are extended in culture before membrane formation and this is not clear here. In a 3-days differentiation culture, most OLs have not yet formed a myelin sheath (eg., Figure 2 in Zuchero et al., 2015, Dev Cell).

      We appreciate the reviewer highlighting this point. We first note that our oligodendrocyte (OL) culture conditions differ from the immunopanning method used by Zuchero et al., 2015 (original reference (Emery and Dugas, 2013)), which may affect the time course and progression of OL process elaboration and/or myelin sheath formation. We further note that in our cultures most EGFP+ processes are also MBP+ at the time point examined (strictly 3 days plus 9 hours post-differentiation). It thus seems likely that these MBP+ structures largely correspond to the MBP+ wrapping sheaths that occur in vivo, so we have therefore retained our original statement but have added this further explanation.

      Minor: Figure 6 (Legend): Time points should be indicated throughout the panels.

      We have added this information as requested.

      Reviewer #2 (Recommendations For The Authors):

      (1) Regarding the subcellular localization experiment of ZDHHC9 mutants in OL, it is currently limited to in vitro cultured OL, lacking validation in vivo OL or myelin sheath. Additionally, it is necessary to investigate whether the abnormal subcellular localization of ZDHHC9 mutants affects their enzyme activity and palmitoylation modification of substrate proteins.

      We thank the reviewer for raising this point. New data in our revised Figure 8 compares autopalmitoylation (sometimes used as a surrogate measure of PAT activity) of ZDHHC9wt and XLID mutants, and their ability to palmitoylate MBP in transfected cells. Intriguingly, we found that autopalmitoylation activity of the ZDHHC9-P150S mutant does not differ significantly from that of ZDHHC9wt, and that this mutant is still capable of palmitoylating MBP. Moreover, the R96W mutant, while impaired in autopalmitoylation, still palmitoylated MBP approximately 50% as effectively as ZDHHC9wt in our cell-based assay. These findings suggest that ZDHHC9-P150S and, probably, ZDHHC9-R96W mutants might still be able to palmitoylate substrates in OLs if they were properly localized. This possibility in turn suggests that impaired subcellular targeting in addition to, or instead of, impaired catalytic activity, may be a key factor in certain cases of ZDHHC9-associated XLID. We have expanded our Figure 8 to show these new experiments and have summarized the conclusions above in our revised Discussion. We thank the reviewer for suggesting that we further investigate this issue.

      (2) The experimental period (P21+21 days) using genetic labeling to track the development of myelinating cells may not be long enough. It is recommended to extend the observation time and analyze at more time points to more comprehensively reflect the impact of Zdhhc9 KO.

      We appreciate this point from the reviewer but, regrettably, we did not maintain the Pdgfra-CreER; R26-EGFP; Zdhhc9 KO mouse line and hope the reviewer appreciates that it would take considerable time and effort to rederive this line and then perform the suggested extended time course experiments. However, we note for the reviewer that our preliminary studies did not reveal any effect of Zdhhc9 KO on the number of MOBP-EGFP+ OLs in 6-month-old mice (not shown), consistent with a model in which Zdhhc9 loss does not affect OPC-OL commitment per se.

      (3) The author speculates that Zdhhc9 may regulate myelination by affecting the membrane localization of specific myelin proteins, but lacks direct experimental evidence to support this. It is suggested to detect the expression and distribution of relevant proteins in the myelin of Zdhhc9 KO mice.

      We share the reviewer’s interest in this point but realized that it is more technically challenging to address than might be initially thought. The main protein we would implicate and seek to test is MBP, but we already found that there is no gross change in MBP distribution in vivo in Zdhhc9 KO mice (Fig 3A). However, an anti-MBP antibody recognizes all forms of MBP, not just the specific splice variants whose palmitoylation is affected by ZDHHC9 loss. Specifically assessing nanoscale distribution of these splice variants would require a way (e.g. an anti-MBP splice form-specific antibody that is compatible with immuno-EM) to distinguish these variants from other, non-palmitoylated forms of MBP. Although such an antibody could be an important tool, we hope the reviewers would agree that developing and characterizing such a reagent is beyond the scope of the current study.

      We do, however, note that the lack of gross change in MBP distribution and levels in Zdhhc9 KO mice is consistent with the relatively mild phenotype of these mice, compared with shiverer (shi/shi) mice, in which MBP is completely lost. In shiverer, CNS compact myelin is almost absent (PMID: 671037; PMID: 88695; PMID: 460693) and, as the name suggests, mice display a shivering gait, and exhibit seizures and early death. In contrast, Zdhhc9 mice show only subtle behavioral deficits (PMID: 29944857). These differences are all consistent with a model in which Zdhhc9 KO mice, despite their significantly reduced MBP palmitoylation (Fig 8), have grossly normal distribution and levels of MBP when all splice variants are assessed (Fig 3, Fig 8). It is not inconceivable that Zdhhc9 KO mice have a nanoscale change in the distribution of MBP, particularly of specific palmitoylated splice variants, within myelin that profoundly affects myelin ultrastructure, without grossly altering MBP distribution. However, an alternative and not mutually exclusive possibility is that aberrant palmitoylation of other Zdhhc9 substrates accounts for, or contributes to, the abnormalities in myelin at the ultrastructural level. Addressing this issue would require a multi-pronged approach, not just to assess palmitoylation and distribution of such proteins in Zdhhc9 KO, but also to test whether they are direct Zdhhc9 substrates, in order to rule out indirect effects. We hope reviewers would agree that this is best left to a separate study. However, in our revised Discussion we now summarize what can be inferred regarding Zdhhc9-dependent effects on total and splice variant-specific distribution and levels of MBP.

      (4) Although the article mentions the association of Zdhhc9 with intellectual disabilities, it does not involve behavioral analysis of Zdhhc9 KO mice. It is recommended to supplement some behavioral experimental data to support the important role of Zdhhc9 in maintaining normal cognitive function, enhancing the clinical relevance of the article.

      We appreciate this point from the reviewer. The behavior of the same ZDHHC9 KO mouse line that we used was reported in PMID: 31747610 and in PMID: 29944857. In the former study, Zdhhc9 KO mice were reported to display seizures reminiscent of phenotypes in human patients with ZDHHC9 mutation. The latter study assessed performance of Zdhhc9 KO mice in several tasks that test cognitive function. Specifically, the KO mice were reported to display “altered behaviour in the open-field test, elevated plus maze and acoustic startle test that is consistent with a reduced anxiety level; a reduced hang time in the hanging wire test that suggests underlying hypotonia but which may also be linked to reduced anxiety [and] deficits in the Morris water maze test of hippocampal-dependent spatial learning and memory.” We have incorporated these findings in our revised Discussion, where we summarize how these phenotypes are common, not just to human patients with ZDHHC9 mutation, but also to other human neurodevelopmental conditions and mouse models in which ID is a common feature.

      (5) For the abnormal myelination observed in Zdhhc9 KO mice, including unmyelinated large-diameter axons and excessively myelinated small-diameter axons, the article lacks in-depth research and explanation on the exact mechanism and mode of action of ZDHHC9 in regulating myelination.

      We share the reviewer’s interest in this point but again note that gaining definitive insights into this issue is far from trivial. Convincing evidence of a causative mechanism would require an exhaustive identification of ZDHHC9 in vivo substrates, followed by point mutation of substrate palmitoylation site(s) to determine the extent to which loss of palmitoylation of such protein(s) phenocopies ZDHHC9 loss. Nonetheless, it is possible to break this question down and to summarize what we do and do not know. For example, our experiments in cultured OLs show that ZDHHC9 loss causes cell-autonomous deficits in morphological maturation of these cells. We also know that ZDHHC9 loss results in impaired palmitoylation of MBP, a direct substrate for ZDHHC9. Moreover, loss of ZDHHC9 at Golgi outposts in OLs (a phenotype observed with several XLID-associated mutant forms of ZDHHC9, even those with no significant loss of catalytic activity) correlates with intellectual disability. Together, these findings are consistent with a model in which ZDHHC9 action at OL Golgi outposts is critical for normal myelination. However, it is yet to be determined whether the key substrates of ZDHHC9 include MBP, other palmitoyl-proteins that are key constituents of CNS myelin, or proteins whose palmitoylation is important for myelin protein trafficking and targeting. Another non-mutually exclusive possibility is that ZDHHC9 acts at Golgi outposts but indirectly, for example to drive the expression of myelin protein genes. Future experiments, including but not limited to palmitoyl-proteomics in ZDHHC9 (OL-specific) KO mice, will be needed to provide more definitive insights into this issue. We have expanded our Discussion of links between ZDHHC9 mutation and impaired myelination to summarize the above points.

      (6) The function of ZDHHC9 in OL may be related to the Golgi apparatus, but its exact role in these structures is still unclear. It is suggested to discuss in more detail the role of ZDHHC9 in the Golgi apparatus in the discussion section.

      We appreciate this point, which we considered as related to point (5) above. In our revised Discussion we highlight how ZDHHC9 action at Golgi outposts may involve direct palmitoylation of myelin proteins, palmitoylation of proteins that direct myelin proteins to the myelin membrane and/or activation of gene expression programs that serve to drive myelination. We further note that these possibilities are not mutually exclusive.

      (7) More experimental support and in-depth research are needed on the detailed mechanism of how ZDHHC9 and Golga7 cooperatively regulate MBP palmitoylation, and how this decrease in palmitoylation level leads to myelination defects.

      This is another important point. Our new experiments suggest that, although some XLID mutations markedly affect ZDHHC9’s ability to palmitoylate MBP, others do not, yet all of the mutant forms fail to localize to Golgi outposts. These findings are consistent with a model in which the subcellular location at which ZDHHC9 palmitoylates MBP, and potentially other substrates, is critical for normal myelination. Interestingly, despite their marked differences in basal catalytic activity (as assessed by autopalmitoylation), wt and all XLID forms of ZDHHC9 appear to show enhanced activity (measured by both auto- and MBP palmitoylation) in the presence of Golga7, suggesting that the association with Golga7 (which also localizes to Golgi outposts) is central to ZDHHC9 activity. This model is also highly consistent with the biased expression of Golga7 in OLs, compared to other CNS cell types (Fig 1E, 1F). Moreover, XLID-associated mutant forms of ZDHHC9 also show reduced protein stability and are impaired in their ability to form complexes with Golga7 (also known as Golgi Complex Protein 16kDa; GCP16; PMID: 37035671). Failure of ZDHHC9 XLID mutants to localize to Golgi outposts may thus be due to aberrant trafficking of mutant ZDHHC9 per se, but may also involve impaired association/stabilization of ZDHHC9/Golga7 complexes at these locations. Again, it is possible that either or both of these mechanisms, which are not mutually exclusive, contribute to impaired MBP palmitoylation and/or myelination deficits. We summarize these points in our revised Discussion.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      This manuscript determines how PA28g, a proteasome regulator that is overexpressed in tumors, and C1QBP, a mitochondrial protein for maintaining oxidative phosphorylation that plays a role in tumor progression, interact in tumor cells to promote their growth, migration and invasion. Evidence for the interaction and its impact on mitochondrial form and function was provided although it is not particularly strong.

      The revised manuscript corrected mislabeled data in figures and provides more details in figure legends. Misleading sentences and typos were corrected. However, key experiments that were suggested in previous reviews were not done, such as making point mutations to disrupt the protein interactions and assess the consequence on protein stability and function. Results from these experiments are critical to determine whether the major conclusions are fully supported by the data.

      The second revision of the manuscript included the proximity ligation data to support the PA28g-C1QBP interaction in cells. However, the method and data were not described in sufficient detail for readers to understand. The revision also includes the structural models of the PA28g-C1QBP complex predicted by AlphaFold. However, the method and data were not described with details for readers to understand how this structural modeling was done, what is the quality of the resulting models, and the physical nature of the protein-protein interaction such as what kind of the non-covalent interactions exist in the interface of the protein complexes. Furthermore, while the interactions mediated by the protein fragments were tested by pull-down experiments, the interactions mediated by the three residues were not tested by mutagenesis and pull-down experiments. In summary, the revision was improved, but further improvement is needed.

      Thank you very much for your comments.

      (1) Based on your suggestion, we predicted the possible interaction sites using AlphaFold 3 and found that mutations in amino acids 76 and 78 of C1QBP affect the interaction with PA28γ (Revised Appendix Figure 1J). Subsequently, a pull-down experiment showed that mutating the amino acids at these two sites (T76A, G78N) decreased the amount of C1QBP bound to PA28γ (Revised Figure 1J). These results confirm that PA28γ interacts with C1QBP in a manner dependent on the N-terminus of C1QBP. These findings are now included in the Results section of the revised manuscript: “In addition, we employed AlphaFold 3 to perform energy minimization and predict hydrogen bonds between the C1QBP N-terminus (amino acids 1-167) and the PA28γ protein interaction region. The results suggest that the T76 and G78 residues of C1QBP may be key contributors to the interaction. Consistently, coimmunoprecipitation analysis demonstrated that mutations at these sites (C1QBP(T76A) and C1QBP(G78N)) significantly reduced the binding ability to PA28γ (Fig. 1J and Appendix Fig. 1J)”. We believe this additional validation strengthens the robustness of our findings.

      (2) According to your suggestion, we have added a description of the results of PLA in the figure legend (Revised Figure 1C) and the method of PLA in the appendix file (Revised Appendix file, Part “Proximity Ligation Assay”). The revised text reads as follows: (C) PLA image of UM1 cells shows the interaction between C1QBP and PA28γ in both cytoplasm and nucleus (red fluorescence).

      (3) In the light of your suggestion, we have enriched the description of AlphaFold 3 analysis in the appendix file (Revised Appendix file, Page 10-11). The revised text reads as follows:

      “Prediction and Analysis of Protein Interactions

      Protein Sequence Retrieval and Structure Prediction

      The protein sequences of C1QBP and PA28γ were obtained from the AlphaFold Protein Structure Database. Structural predictions of the protein-protein interaction between C1QBP and PA28γ were conducted using AlphaFold 3. The pLDDT (predicted local distance difference test) values were utilized to assess the confidence of the predicted models. Models with a pLDDT score above 70 were considered confident, while those with a score above 90 were categorized as very high confidence. These values were annotated in the figures to indicate the reliability of the structural predictions.”
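
      The confidence thresholds quoted above can be written out as a small rule. The following Python sketch is illustrative only and is not taken from the authors' pipeline: the function name `classify_plddt` and the label used for scores of 70 or below are assumptions, since the quoted text defines only the two upper categories.

```python
def classify_plddt(score: float) -> str:
    """Map a pLDDT score (0-100) to the confidence label quoted in the text.

    Only the "> 70" and "> 90" cut-offs come from the quoted methods text;
    the label for lower scores is a placeholder chosen for this sketch.
    """
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"pLDDT must lie between 0 and 100, got {score}")
    if score > 90:
        return "very high confidence"
    if score > 70:
        return "confident"
    return "below confidence threshold"  # assumed label, not from the source


if __name__ == "__main__":
    # Hypothetical scores, used only to exercise each branch.
    for value in (95.2, 78.0, 55.5):
        print(value, classify_plddt(value))
```

      Using strict inequalities mirrors the wording "above 70" and "above 90"; a score of exactly 70 or 90 therefore falls into the lower category in this sketch.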

      “Protein Preparation and Structure Optimization

      The best-scored model for the C1QBP-PA28γ interaction predicted by AlphaFold 3 was selected for further analysis. The model was imported into MOE 2022 (Molecular Operating Environment) software for protein preparation. This process included the removal of water molecules and other heteroatoms, followed by the addition of hydrogen atoms to the structure. This step was essential for optimizing the protein’s 3D conformation and ensuring the correctness of the protonation states at physiological pH.”

      “Energy Minimization and Hydrogen Bond Prediction

      The protein structure was subjected to energy minimization using the Amber10:EHT (Extended Hückel Theory) force field, with R-Field 1:80 settings to refine the model’s geometry. The minimization process was performed to optimize the protein’s internal energy and ensure stable conformation, followed by calculation of hydrogen bond interactions. The interaction energies and hydrogen bonds were analyzed to identify potential binding sites and stabilize the predicted protein-protein complex.”

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Astrocytes are known to express neuroligins 1-3. Within neurons, these cell adhesion molecules perform important roles in synapse formation and function. Within astrocytes, a significant role for neuroligin 2 in determining excitatory synapse formation and astrocyte morphology was shown in 2017. However, there has been no assessment of what happens to synapses or astrocyte morphology when all three major forms of neuroligins within astrocytes (isoforms 1-3) are deleted using a well-characterized, astrocyte-specific, and inducible Cre line. By using such selective mouse genetic methods, the authors here show that neuroligin 1-3 expression in astrocytes is not consequential for synapse function or for astrocyte morphology. They reach these conclusions with careful experiments employing quantitative western blot analyses, imaging and electrophysiology. They also characterize the specificity of the Cre line they used. Overall, this is a very clear and strong paper that is supported by rigorous experiments. The discussion considers the findings carefully in relation to past work. This paper is of high importance, because it now raises the fundamental question of exactly what neuroligins 1-3 are actually doing in astrocytes. In addition, it enriches our understanding of the mechanisms by which astrocytes participate in synapse formation and function. The paper is very clear, well written and well illustrated with raw and average data.

      Comments on revisions:

      My previous comments have been addressed. I have no additional points to make and congratulate the authors.

      Thank you for your acceptance.

      Reviewer #2 (Public Review):

      In the present manuscript, Golf et al. investigate the consequences of astrocyte-specific deletion of Neuroligin (Nlgn) family cell adhesion proteins on synapse structure and function in the brain. Decades of prior research had shown that Neuroligins mediate their effects at synapses through their role in the postsynaptic compartment of neurons and their transsynaptic interaction with presynaptic Neurexins. More recently, it was proposed for the first time that Neuroligins expressed by astrocytes can also bind to presynaptic Neurexins to regulate synaptogenesis (Stogsdill et al. 2017, Nature). However, several aspects of the model proposed by Stogsdill et al. on astrocytic Neuroligin function conflict with prior evidence on the role of Neuroligins at synapses, prompting Golf et al. to further investigate astrocytic Neuroligin function in the current study. Using postnatal conditional deletion of Nlgn1-3 specifically from astrocytes in mice, Golf et al. show that virtually no changes in the expression of synaptic proteins or in the properties of synaptic transmission at either excitatory or inhibitory synapses are observed. Moreover, no alterations in the morphology of astrocytes themselves were found. To further extend this finding, the authors additionally analyzed human neurons co-cultured with mouse glia lacking expression of Nlgn1-4. No difference in excitatory synaptic transmission was observed between neurons cultured in the presence of wildtype vs. Nlgn1-4 conditional knockout glia. The authors conclude that while Neuroligins are indeed expressed in astrocytes and are hence likely to play some role there, this role does not include any direct consequences on synaptic structure and function, in direct contrast to the model proposed by Stogsdill et al.

      Overall, this is a strong study that addresses a fundamental and highly relevant question in the field of synaptic neuroscience. Neuroligins are not only key regulators of synaptic function, they have also been linked to numerous psychiatric and neurodevelopmental disorders, highlighting the need to precisely define their mechanisms of action. The authors take a wide range of approaches to convincingly demonstrate that under their experimental conditions, Nlgn1-3 are efficiently deleted from astrocytes in vivo, and that this deletion does not lead to major alterations in the levels of synaptic proteins or in synaptic transmission at excitatory or inhibitory synapses, or in the morphology of astrocytes. While the co-culture experiments are somewhat more difficult to interpret due to lack of a control for the effect of wildtype mouse astrocytes on human neurons, they are also consistent with the notion that deletion of Nlgn1-4 from astrocytes has no consequences for the function of excitatory synapses. Together, the data from this study provide compelling and important evidence that, whatever the role of astrocytic Neuroligins may be, they do not contribute substantially to synapse formation or function under the conditions investigated.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors have fully addressed my concerns, and have in particular conducted a very elegant and compelling analysis of the degree of deletion of astrocytic Nlgn1-3/4 in their models. This greatly strengthens the main claims of their study and the fundamental nature of their conclusions for the field of synapse biology.

      I am somewhat less convinced by the newly added experiment to investigate deletion of Nlgns1-4 from glia in glia-neuron co-cultures. The authors provide no evidence to show that either WT or cKO glia have any effect on synapse formation or function in human neurons, and therefore, the current lack of a difference could equally result from the fact that both WT and cKO glia were non-functional altogether. The authors cite two studies to state that human neurons do not form synapses in the absence of astrocytes, Zhang et al. 2013 and Huang et al. 2017, but neither seem to be listed in the references (unless Zhang et al. 2014 was meant), making it difficult to assess the relevance of these data. However, since the data on astrocytic Nlgn1-3 deletion in vivo are compelling on their own, I do not see the co-culture experiment as essential for the main conclusions of the study.

      Minor comment:

      Please add the information on the strain background of the mice to the methods section of the manuscript. Strain background can have a significant impact on many aspects of neuronal function, and this information is therefore essential for the interpretation of potential differences to other studies.

      We deeply apologize for forgetting to include the two important references mentioned by the reviewer in the reference list. We understand that the reviewer as a result could not assess the validity of our statement that co-culture of glia is required for efficient synapse formation by human neurons that are induced from ES or iPS cells. Note that this conclusion does not postulate that all synapse formation requires glia, since the cited papers demonstrate that human neurons induced by our protocol still form scarce synapses without glia. This observation has been confirmed in many different experiments that were performed after the data presented in the cited papers. As a result of this extensive prior documentation that human neurons produced by forced expression of Ngn2 require coculture of glia for efficient synapse formation, we do not feel that we need to repeat this basic characterization of our culture system again to validate multiple previous papers and hope the reviewer will concur. We have additionally added the relevant mouse strain information to the methods section.

    1. Author response:

      Reviewer #1:

      Point 1

      Not many weaknesses, but probably validation at more enhancers could have made the paper stronger.

      We experimentally validated two sets of enhancers from two distinct tissues and observed similar effects. While this supports the idea that the TEAD-tissue-specific TF interaction we observe is not restricted to a single tissue, we agree that testing additional enhancers from a third tissue would strengthen our conclusions. We will acknowledge in the discussion that including a third tissue could provide additional support for the generality of our findings.

      Reviewer #2:

      Point 1

      The authors propose a mechanism of a TF trio (TEAD - CHD4 - tissue-specific TFs). However, only one validation experiment checked CHD4. CHD4 binding was not mentioned at all in the other cases.

      Indeed, CHD4 binding was experimentally validated at only one enhancer. This was a deliberate decision based on two key considerations:

      (1) Consistent functional response across enhancers: We tested multiple enhancers (n = 8) for functional response to the TEAD+YAP and GATA4/6 combination. All enhancers tested exhibited the same trend: attenuation of GATA-mediated activation upon co-expression of TEAD or TEAD/YAP. This consistent pattern supports a shared mechanism across these elements.

      (2) Substantial prior evidence supporting CHD4 recruitment by both GATA4 and YAP: Specifically, CHD4 recruitment by GATA4 has been described in the context of cardiovascular development[1], and CHD4 can also be recruited by the TEAD coactivator YAP[2]. Furthermore, published genomic occupancy data from embryonic heart tissue show widespread co-binding of GATA4, TEAD, and CHD4[1,3], including at most of the cardiac enhancers we functionally tested (4 out of 5).

      Given the consistent enhancer responses and the supporting literature and genomic data indicating TEAD-CHD4 co-occupancy, we chose to validate CHD4 binding at a representative enhancer as a proof of concept.

      We will clarify this rationale in the revised manuscript to better address this concern.

      Reviewer #2:

      Point 2

      The authors integrated E12.5 TEAD binding with E11.5 acetylation data, and it would be important to show that this experimental approach is valid or otherwise qualify its limitations.

      We will provide additional evidence in support of this approach in the revised manuscript or alternatively acknowledge its limitations.

      Reviewer #2:

      Point 3

      Motif co-occurrence analysis was extended to claiming TF interactions without further validation.

      We thank the reviewer for pointing out this important distinction. We reviewed the manuscript and identified seven instances where TF interactions were mentioned. Four of these correctly refer to previously established protein-protein interactions. For the remaining instances, we will adjust the wording to reflect the level of evidence, e.g., by describing combinatorial binding based on motif co-occurrence rather than implying direct interaction.

      Reviewer #3:

      Point 1

      Much of this manuscript focuses on confirming transcription factor relationships that have been reported previously. For example, it is well known that GATA4 interacts with MEF2 in the ventricle. There are limited new or unexpected associations discussed and tested.

      We thank the reviewer for this important observation and see the recurrence of known interactions, such as GATA4-MEF2, not as a drawback, but as an important validation of our methodology.

      The identification of novel TF-TF combinations was geared toward uncovering shared regulatory principles across diverse human developmental tissues. While analysing 13 heterogeneous embryonic tissues introduced limitations, such as cellular complexity that may obscure rare interactions, it also allowed the identification of robust, recurrent patterns across tissues. Indeed, using this approach, we identified the widespread combinatorial effect of TEAD in partnership with lineage-specific TFs, which is explored more in depth in the manuscript.

      Another main goal of the study was to develop and demonstrate a generalizable strategy for identifying combinatorial TF binding patterns that underlie tissue-specific gene regulation. Given the inherent heterogeneity of the embryonic organs analysed, the approach is naturally biased toward recovering the most prevalent, and often well-characterized, TF combinations. While we fully acknowledge this limitation, we believe that the ability to robustly recover well-established TF partnerships across multiple organs provides a valuable proof of concept. The next step will be to apply this strategy to single-cell RNA datasets, in order to define TF relationships at higher resolution, for example, resolving associations down to specific family members that cooperate within distinct lineages or cell types, and identifying less frequent or underrepresented TF-TF relationships.

      In this context, we believe that our strategy has successfully highlighted shared enhancer logic and offers a framework for future high-resolution dissection of TF cooperativity at the single-cell level. The rationale for analysing heterogeneous tissues, along with its limitations, will be addressed in the revised version.

      Reviewer #3:

      Point 2

      Embryonic tissues are highly heterogeneous, limiting the utility of the bulk ChIP-seq employed in these analyses. Does the cellular heterogeneity explain the discrepancy between TEAD binding and histone acetylation? Similarly, how does conservation between species affect the TF predictions?

      We thank the reviewer for raising these important points. We acknowledge the limitations of using bulk ChIP-seq data in the context of complex embryonic tissues (see also previous point). We cannot exclude that the discrepancy between TEAD binding and histone acetylation is an effect of cellular heterogeneity. Indeed, we mention in the results “Our ventricle-specific enhancers were sampled at a single time point and likely represent enhancers that are selectively active in different cell types and developmental stages, given the heterogeneity of cell types in the ventricle”. The limitation of bulk ChIP-seq will be addressed in the discussion. In the specific case of the enhancers selected for validation, the binding site sequences are conserved between species, suggesting that the cis-regulatory activity is likely to be similar in both.

      Reviewer #3:

      Point 3

      Some of the interpretations should also be fleshed out a bit more to clarify the advantage of the analyses presented here. For example, if Gata4 and Foxa2 transcripts are expressed during different stages of development, then it's likely that (as stated by the authors) these motifs are not used during the same stage of development. But examining the flanking regions wasn't necessary to make that statement. This type of conclusion seems tangential to the benefit of this analysis, which is to understand which TFs work together in a single organ at a single time point.

      We appreciate the reviewer’s comment and the opportunity to clarify our interpretation. The reviewer refers to the finding that GATA4 and FOXA2 motifs are flanked by different sets of motifs in liver enhancers, suggesting that these TFs operate within distinct regulatory contexts.

      Our aim was not to state that GATA4 and FOXA2 do not function simultaneously—this can indeed be inferred from their non-overlapping expression patterns. Rather, we intended to highlight the potential of our approach, even when applied to bulk data, to resolve distinct regulatory modules that may act in different subpopulations of cells or developmental windows within the same tissue.

      We will revise the relevant section of the manuscript to make this interpretative point clearer.

      Reviewer #3:

      Point 4

      This manuscript hinges on luciferase assays whose results can be difficult to translate to complex gene regulation networks. Many motifs are often clustered together, which makes designing experiments at endogenous loci important in studies such as this one.

      We agree with the Reviewer that luciferase assays represent an oversimplified model of gene regulation and do not fully capture the complexity of endogenous regulatory networks. We will explicitly acknowledge this limitation in the discussion.

      Mutagenesis of TEAD and tissue-specific TF motifs at endogenous loci would provide more conclusive evidence. However, our goal was to test the generality of the TEAD effect across multiple enhancers and tissues. Despite its limitations, a luciferase-based assay was the most feasible approach, as an endogenous strategy would not have allowed us to assess a broader set of enhancers efficiently. Additionally, the presence of recurrent motifs and the potential functional redundancy among enhancers targeting the same gene can complicate the interpretation of single-locus perturbations.

      References

      (1) Robbe ZL, Shi W, Wasson LK, Scialdone AP, Wilczewski CM, Sheng X, et al. CHD4 is recruited by GATA4 and NKX2-5 to repress noncardiac gene programs in the developing heart. Genes Dev. 2022 Apr 1;36(7–8):468–82.

      (2) Kim M, Kim T, Johnson RL, Lim DS. Transcriptional Co-repressor Function of the Hippo Pathway Transducers YAP and TAZ. Cell Rep. 2015 Apr;11(2):270–82.

      (3) Akerberg BN, Gu F, VanDusen NJ, Zhang X, Dong R, Li K, et al. A reference map of murine cardiac transcription factor chromatin occupancy identifies dynamic and conserved enhancers. Nat Commun. 2019 Oct 28;10(1):4907.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Strengths: 

      The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: 

      (1) a large set of behavioral attributes, 

      (2) with inter-individual variability, that are 

      (3) stable over time. 

      A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings and extends the experiments from temporal stability to examining the correlation of locomotion features between different contexts.

      The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

      We thank the reviewer for his exceptionally kind assessment of our work!

      Weaknesses: 

      The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. 

      We have now uploaded a high-resolution PDF to GitHub: https://github.com/LinneweberLab/Mathejczyk_2024_eLife_Individuality/blob/main/S8.pdf, and this is also mentioned in the figure legend for Fig. S8.

      Why were five or so parameters selected from the full set? How were these selected? 

      The five parameters (% of time walked, walking speed, vector strength, angular velocity, and centrophobicity) were selected because they describe key aspects of the investigated behaviors that can be compared directly across assays. Importantly, several parameters we typically use (e.g., Linneweber et al., 2020) cannot be applied under certain conditions, such as darkness or the absence of visual cues. Furthermore, these five parameters encompass three critical aspects of navigation across standard visual behavioral arenas: (1) The “exploration” category is characterized by parameters describing the fly’s activity. (2) Parameters related to “attention” reflect heightened responses to visual cues, but unlike commonly used metrics such as angle or stripe deviations (e.g., Coulomb, 2012; Linneweber et al., 2020), they can also be measured in the absence of visual cues and are therefore suitable for cross-assay comparisons. (3) The parameter “centrophobicity,” used as a potential indicator of anxiety, is conceptually linked to the open-field test in mice, where the ratio of wall-to-open-field activity is frequently calculated as a measurement of anxiety (see, for example, Carter, Sheh, 2015, chapter 2. https://www.sciencedirect.com/book/9780128005118/guide-to-researchtechniques-in-neuroscience). Admittedly, this view is frequently challenged in mice, but it has a long history, which is why we use it.

      Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset? 

      As noted above, we only included a subset of parameters in our final analysis, as many were unsuitable for comparison across assays. These excluded parameters still provide valuable assay-specific information, which is important for relating our results to previous publications.

      The correlation analysis is used to establish stability between assays. For temporal retesting, "stability" is certainly the appropriate word, but between contexts, it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency". 

      Thank you for this suggestion. During the preparation of the manuscript, we indeed frequently alternated between the terms “stability” and “consistency,” and decided to go with “stability” as the only descriptor to keep it simple. We now fully agree with the reviewer’s argument and have replaced “stability” with “consistency” throughout the current version of the manuscript in order to increase clarity and coherence.

      The parameters are considered one by one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability' and analyses of single-parameter variability stability.

      We agree with the reviewer that a multivariate analysis adds clear advantages in terms of statistical power, in addition to our chosen approach. On the one hand, we believe that the simplicity of our initial analysis, both for correlational and mean data, makes it easy for readers to understand and reproduce our results. While preparing the previous version of the manuscript, we were skeptical, since more complex analyses often involve numerous choices, which can complicate reproducibility. For instance, a recent study in personality psychology (Paul et al., 2024) highlighted the risks of “forking paths” in statistical analysis, showing that certain choices of statistical methods could even reverse findings, a concern mitigated by our simple, straightforward approach. Still, in preparing this revised version of the manuscript, we accepted the reviewer’s advice and reanalyzed the data using a generalized linear model. This analysis nicely recapitulates our initial findings and is now summarized in a single figure (Fig. 9).

      The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23 °C and 32 °C are correlated by 0.263, which corresponds to an R² of 0.069, i.e. just 7% of the 32 °C variance is predictable by the 23 °C variance. Is it fair to say that a 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R²) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?

      We agree that this is an important question. Our paper clearly demonstrates that individuality always plays a role in decision-making (and, in this context, any behavioral output can be considered a decision). However, the non-linear relationship between certain situations and the individual’s behavior often reduces the predictive value (or correlation) across contexts, sometimes quite drastically.

      For instance, temperature has a relatively linear effect on certain behavioral parameters, leading to predictable changes across individuals. As a result, correlations across temperature conditions are often similar to those observed across time within the same situation. In contrast, this predictability diminishes when comparing conditions like the presence or absence of visual stimuli, the use of different arenas, or different modalities.

      For this reason, we believe that significance remains the best indicator for describing how measurable individuality persists, even across vastly different situations.
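
      The conversion between a correlation coefficient and the share of variance it accounts for, which the comment above relies on, can be checked with a few lines of Python. This is only an arithmetic sketch: the two r values are the ones quoted in the comment, while the function name and labels are ours.

```python
# Convert the Pearson r values quoted in the review into shared variance (R^2).

def shared_variance(r: float) -> float:
    """Fraction of variance in one measure linearly predictable from the other."""
    return r * r

quoted = [
    ("% time walked, 23 °C vs 32 °C", 0.263),
    ("vector strength, Buridan vs Y-maze", -0.197),
]

for label, r in quoted:
    r2 = shared_variance(r)
    # The sign of r is lost on squaring; only the strength of the linear link remains.
    print(f"{label}: r = {r:+.3f}, R^2 = {r2:.3f}, unexplained = {1 - r2:.1%}")
```

      Squaring makes small coefficients look smaller still, which is why a statistically significant r can coexist with a large unexplained share of the variance.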

      The authors describe a dissociation between inter-group differences and interindividual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining the correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?  

      We thank the reviewer for this suggestion, and we have now addressed this point. To account for slope effects, we have now introduced in-group ranks for our linear model computation (see Fig. 9). 
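
      As an illustration of the in-group rank idea, the sketch below ranks each condition separately and then correlates the ranks (which is equivalent to a Spearman correlation). It is not the analysis code behind Fig. 9; the walking-speed values are hypothetical and the helper names are ours.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical walking speeds of the same five flies in two conditions.
speed_23c = [4.0, 5.0, 6.5, 7.0, 9.0]
speed_32c = [9.0, 10.5, 16.0, 30.0, 31.0]

raw_r = pearson(speed_23c, speed_32c)
rank_r = pearson(average_ranks(speed_23c), average_ranks(speed_32c))
print(f"raw r = {raw_r:.3f}, in-group rank r = {rank_r:.3f}")
```

      Because ranks are computed within each condition, a shift or nonlinear stretch of the group distribution between conditions leaves the rank correlation unchanged as long as the ordering of individuals is preserved.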

      What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general and with regard to these specific parameters? Is the increased walking speed at higher temperatures necessarily due to an increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

      We agree that grouping our parameters into traits like exploration, attention, and anxiety always includes subjective decisions. The classification into these three categories is even considered partially controversial in the mouse-specific literature, which uses the term “anxiety” in similar experiments (see, for example, Carter, Sheh, 2015, chapter 2; https://www.sciencedirect.com/book/9780128005118/guide-to-research-techniques-in-neuroscience). Nevertheless, we believe that readers greatly benefit from these categories, since they make it easier to understand (beyond mathematical correlations) which aspects of the flies’ individuality can be considered consistent across situations. Furthermore, these categories serve as a bridge to compare insights from very distinct models.

      The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

      We assume the reviewer is referring to Figure 3a. The detailed experimental protocol can be found in the Materials and Methods section under Setup 2: IndyTrax Multi-Arena Platform. We have now clarified this in the mentioned figure legend.

      Using the current single-correlation analysis approach, the aims would benefit from rewording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

      The reviewer raises an important point about hierarchies within the concept of animal individuality or personality. We agree that this is best addressed by first focusing on single behavioral traits/parameters and then integrating several trait properties into a cohesive concept of animal personality (holistic individuality). To ensure consistency throughout the text, we have now thoroughly reviewed the entire manuscript to clearly distinguish between single-parameter variability stability/consistency and holistic individuality/personality.

      The study presents a bounty of new technology to study visually guided behaviors. The GitHub link to the software was not available. To verify the successful transfer of open hardware and open-software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

      We uploaded all code and materials to GitHub as soon as we received the reviewers’ comments. All files and materials can be accessed at https://github.com/LinneweberLab/Mathejczyk_2024_eLife_Individuality, which is now frequently mentioned throughout the revised manuscript.

      The study discusses a number of interesting, stimulating ideas about inter-individual variability, and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

      While the current work does not present any mechanistic analysis of inter-individual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms. 

      We thank the reviewer again for the extensive and constructive feedback.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

      Strengths: 

      The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great and I'm sure other folks will be interested in using and adapting it to their own needs.

      We thank the reviewer for highlighting the strengths of our study.

      Weaknesses/Limitations: 

      I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting and temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low-risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context. 

      We agree with the reviewer that the definition of environmental context can differ between fields and that behavioral context is defined differently, particularly in ecology. Nevertheless, we highlight that our alterations of environmental context are highly stereotypic, well-defined, and free from interpretation: we only modified what we stated in the experimental description, without designing a specific situation that might again be perceived differently by different individuals. For example, comparing a context with a predator to one without might result in a binary response, because one fraction of the tested individuals might perceive the predator in the predator situation while the other fraction does not.

      The analytical framework in terms of statistical methods is lacking. It appears as though the authors used correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The paper would be a lot stronger, and my guess is, much more streamlined if the authors employ hierarchical mixed models to analyse these data; these models could capture and estimate differences in individual behavior across time and situations simultaneously. Along with this, it's currently unclear whether and how any statistical inference was performed. Right now, it appears as though any results describing how individuality changes across situations are largely descriptive (i.e. a visual comparison of the strengths of the correlation coefficients?).

      The reviewer raises an important point, also raised by reviewer #1. On one hand, we agree with both reviewers that a more aggregated analysis has clear advantages like more statistical power and has the potential to streamline our manuscript, which is why we added such an analysis (see below). On the other hand, we would also like to defend the initial approach we took, since we think that the simplicity of the analysis for both correlational and mean data is easy to understand and reproduce. More complex analyses necessarily include the selection of a specific statistical toolbox by the experimenters and based on these decisions, different analyses become less comparable and more and more complicated to reproduce, unless the entire decision tree is flawlessly documented. For instance, a recent personality psychology paper investigated the relationship between statistical paths within the decision tree (forking analysis) and their results, leading to very surprising results (Paul et al., 2024), since some paths even reversed their findings. Such a variance in conclusions is hardly possible with the rather simplistic and easily reproducible analysis we performed. One of the major strengths of our study is the simple experimental design, allowing for rather simple and easy to understand analyses.

      We nevertheless took the reviewer’s advice very seriously and reanalyzed the data using a generalized linear model, which largely recapitulated the findings of our previously performed “low-tech” analysis in a single figure (Fig. 9).

      Another pretty major weakness is that right now, I can't find any explicit mention of how many flies were used and whether they were re-used across situations. Some sort of overall schematic showing exactly how many measurements were made in which rigs and with which flies would be very beneficial. 

      We apologize for this inconvenience. A detailed overview of male and female sample sizes is listed next to the supplemental boxplots (e.g., Fig. S6). Apparently, this was not visible enough. Therefore, we have now also uniformly added the sample sizes to the main figure legends.

      I don't necessarily doubt the robustness of the results and my guess is that the authors' interpretations would remain the same, but a more appropriate modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation.

      As described above, we have now added the suggested analyses. We hope that the reviewer will appreciate the new Fig. 9, which, in our opinion, largely confirms our previous findings using a more appropriate generalized linear modelling framework.

      Reviewer #3 (Public Review): 

      This manuscript is a continuation of past work by the last author where they looked at stochasticity in developmental processes leading to inter-individual behavioural differences. In that work, the focus was on a specific behaviour under specific conditions while probing the neural basis of the variability. In this work, the authors set out to describe in detail how stable the individuality of animal behaviours is in the context of various external and internal influences. They identify a few behaviours to monitor (read outs of attention, exploration, and 'anxiety'); some external stimuli (temperature, contrast, nature of visual cues, and spatial environment); and two internal states (walking and flying).

      They then use high-throughput behavioural arenas - most of which they have built and made plans available for others to replicate - to quantify and compare combinations of these behaviours, stimuli, and internal states. This detailed analysis reveals that:

      (1) Many individualistic behaviours remain stable over the course of many days. 

      (2) That some of these (walking speed) remain stable over changing visual cues. Others (walking speed and centrophobicity) remain stable at different temperatures.

      (3) All the behaviours they tested failed to remain stable over the spatially varying environment (arena shape).

      (4) Only angular velocity (a readout of attention) remains stable across varying internal states (walking and flying).

      Thus, the authors conclude that there is a hierarchy in the influence of external stimuli and internal states on the stability of individual behaviours.

      The manuscript is a technical feat with the authors having built many new high-throughput assays. The number of animals is large and many variables have been tested - different types of behavioural paradigms, flying vs walking, varying visual stimuli, and different temperatures among others. 

      We thank the reviewer for this extraordinarily kind assessment of our work!

      Recommendations for the authors:  

      Reviewing Editor (Recommendations For The Authors): 

      While appreciating the effort and quality of the work that went into this manuscript, the reviewers identified a few key points that would greatly benefit this work.

      (1) Statistical methods adopted. The dataset produced through this work is large, with multiple conditions and comparisons that can be made to infer parameters that both define and affect the individualistic behaviour of an animal. Hierarchical mixed models would be a more appropriate approach to handle such datasets and infer statistically the influence of different parameters on behaviours. We recommend that the authors take this approach in the analyses of their data.

      (2) Brevity in the text. We urge the authors to take advantage of eLife's flexible template and take care to elaborate on the text in the results section, the methods adopted, the legends, and the guides to the legends embedded in the main text. The findings are likely to be of interest to a broad audience, and the writing currently targets the specialist.

      Reviewer #2 (Recommendations For The Authors): 

      I want to start by saying this seems like a really cool study! It's an impressive amount of work and addressing a pretty basic question that is interesting (at least I think so!)

      We thank the reviewer again for this assessment!

      That said, I would really strongly recommend the authors embrace using mixed/hierarchical models to analyze their data. They're producing some really impressive data and just doing Pearson correlation coefficients across time points and situations is very clunky and actually losing out on a lot of information. The most up-to-date, state-of-the-art are mixed models - these models can handle very complex (or not so complex) random structures which can estimate variance and importantly, covariance, in individual intercepts both over time and across situations. I actually think this could add some really cool insights into the data and allow you to characterize the patterns you're seeing in far more detail. It's datasets exactly like this that are tailor-made for these complex variance partitioning models! 

      As mentioned before, we have now adopted a more appropriate GLM-based data analysis (see above).
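
      For readers unfamiliar with variance partitioning, the simplest quantity of this kind is repeatability (the intraclass correlation), i.e. the share of total variance attributable to differences between individuals. The sketch below estimates it from a one-way ANOVA on a balanced design. It is a minimal illustration under our own assumptions, not the GLM reported in Fig. 9 and not a full mixed model; the function name and the example numbers are hypothetical.

```python
def repeatability(measurements):
    """One-way ANOVA intraclass correlation for a balanced design.

    measurements: list with one entry per individual, each entry a list of
    k repeated measurements of the same behavioral parameter.
    """
    n = len(measurements)
    k = len(measurements[0])
    if n < 2 or k < 2 or any(len(m) != k for m in measurements):
        raise ValueError("need >= 2 individuals with the same number (>= 2) of repeats")

    grand_mean = sum(sum(m) for m in measurements) / (n * k)
    means = [sum(m) / k for m in measurements]

    # Mean squares between individuals and within individuals.
    ms_between = k * sum((mu - grand_mean) ** 2 for mu in means) / (n - 1)
    ms_within = sum(
        (v - mu) ** 2 for m, mu in zip(measurements, means) for v in m
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical example: three flies, each measured twice.
consistent = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
print(repeatability(consistent))
```

      A mixed model generalizes this idea by estimating the between-individual variance (and covariances across situations) while also accounting for fixed effects such as temperature or arena type.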

      Regardless of which statistical methods you decide to use, please explicitly state in your methods exactly what analyses you did. That is completely lacking now and was a bit frustrating. As such, it's completely unclear whether or how statistical inference was performed. How did you do the behavioral clustering? 

      We apologize that these points were not clearly represented in the previous version of the manuscript. We have now significantly extended the methods section to include a separate paragraph on the statistical methods used, in order to address this critique and hope that the revised version is clear now.

      Also, I could not for the life of me figure out how many flies had been measured. Were they reused across the situation? Or not?

      We reused the same flies across situations whenever possible. However, having one fly experience all assays consecutively was not feasible due to their fragility. Instead, individual flies were exposed to at least 2 of the 3 groups of assays used here: the IndyTrax setup, the Buridan arenas and variants thereof, and the virtual arenas. Hence, we have compared flies across entirely different setups, but the number of times flies can be retested is limited (otherwise, sample sizes would drop over time, and the flies would have gone through too many experimental alterations). To make this clearer, we have elaborated on this point in the main text and added group sample sizes to the figure legends.

      What are these "groups" and "populations" that are referred to in the results (e.g. lines 384, 391, 409)?

      We apologize for using these two terms somewhat interchangeably without proper introduction/distinction. We have now made this clearer at the beginning of the results in the main text by focusing on the term ‘group’. By ‘group’ we refer to all individuals tested in the same situation, over which averages are computed. Sample sizes in the figure legends now indicate group/population sizes to make this clearer.

      Some of the rationale for the development of the behavioral rigs would have actually been nice to include in the intro, rather than in the results.

      This rationale is introduced at the beginning of the last paragraph of the introduction. We hope that this now becomes clear in the revised version of the manuscript.

      Reviewer #3 (Recommendations For The Authors): 

      This manuscript would do well to take advantage of eLife's flexible word limit. I sense that it has been written in brevity for a different journal but I would urge the authors to revisit this and unpack the language here - in the text, in the figure legends, in references to the figures within the text. The way it's currently written, though not misleading, will only speak to the super-specialist or the super-invested :). But the findings are nice, and it would be nice to tailor it to a broader audience.

      We appreciate this suggestion. Initially, we were hoping that we had described our results as clearly and briefly as possible. We apologize if that was not always the case. The comments and requests of all three reviewers have now led to a series of additions to both main text and methods, leading to a significantly expanded manuscript. We hope that these additions are helpful for the general, non-expert audience.

    1. Author response:

      The following is the authors’ response to the original reviews

      Overview of changes in the revision

      We thank the reviewers for the very helpful comments and have extensively revised the paper. We provide point-by-point responses below and here briefly highlight the major changes:

      (1) We expanded the discussion of the relevant literature in children and adults.

      (2) We improved the contextualization of our experimental design within previous reinforcement studies in both cognitive and motor domains highlighting the interplay between the two.

      (3) We reorganized the primary and supplementary results to better communicate the findings of the studies.

      (4) The modeling has been significantly revised and extended. We now formally compare 31 noise-based models and one value-based model and this led to a different model from the original being the preferred model. This has to a large extent cleaned up the modeling results. The preferred model is a special case (with no exploration after success) of the model proposed in Therrien et al. (2018). We also provide examples of individual fits of the model, fit all four tasks and show group fits for all, examine fits vs. data for the clamp phases by age, provide measures of relative and absolute goodness of fit, and examine how the optimal level of exploration varies with motor noise.

      Reviewer #1 (Public review):

      Summary:

      Here the authors address how reinforcement-based sensorimotor adaptation changes throughout development. To address this question, they collected many participants in ages that ranged from small children (3 years old) to adulthood (18+ years old). The authors used four experiments to manipulate whether binary and positive reinforcement was provided probabilistically (e.g., 30 or 50%) versus deterministically (e.g., 100%), and continuous (infinite possible locations) versus discrete (binned possible locations) when the probability of reinforcement varied along the span of a large redundant target. The authors found that both movement variability and the extent of adaptation changed with age.

      Thank you for reviewing our work. One note of clarification: this work focuses on reinforcement-based learning throughout development, not on sensorimotor adaptation. The four tasks presented in this work are completed with veridical trajectory feedback (no perturbation).

      The goal is to understand how children at different ages adjust their movements in response to reward feedback. We now explain this distinction on line 35.

      Strengths:

      The major strength of the paper is the number of participants collected (n = 385). The authors also answer their primary question, that reinforcement-based sensorimotor adaptation changes throughout development, which was shown by utilizing established experimental designs and computational modelling.

      Thank you.

      Weaknesses:

      Potential concerns involve inconsistent findings with secondary analyses, current assumptions that impact both interpretation and computational modelling, and a lack of clearly stated hypotheses.

      (1) Multiple regression and Mediation Analyses.

      The challenge with these secondary analyses is that:

      (a) The results are inconsistent between Experiments 1 and 2, and the analysis was not performed for Experiments 3 and 4,

      (b) The authors used a two-stage procedure of using multiple regression to determine what variables to use for the mediation analysis, and

      (c) The authors already have a trial-by-trial model that is arguably more insightful.

      Given this, some suggested changes are to:

      (a) Perform the mediation analysis with all the possible variables (i.e., not informed by multiple regression) to see if the results are consistent.

      (b) Move the regression/mediation analysis to Supplementary, since it is slightly distracting given current inconsistencies and that the trial-by-trial model is arguably more insightful.

      Based on these comments, we have chosen to remove the multiple regression and mediation analyses. We agree that they were distracting and that the trial-by-trial model allows for differentiation of motor noise from exploration variability in the learning block.

      (2) Variability for different phases and model assumptions:

      A nice feature of the experimental design is the use of success and failure clamps. These clamped phases, along with baseline, are useful because they can provide insights into the partitioning of motor and exploratory noise. Based on the assumptions of the model, the success clamp would only reflect variability due to motor noise (excludes variability due to exploratory noise and any variability due to updates in reach aim). Thus, it is reasonable to expect that the success clamps would have lower variability than the failure clamps (which it obviously does in Figure 6), and presumably baseline (which provides success and failure feedback, thus would contain motor noise and likely some exploratory noise).

      However, in Figure 6, one visually observes greater variability during the success clamp (where it is assumed variability only comes from motor noise) compared to baseline, where variability would come from:

      (a) Motor noise.

      (b) Likely some exploratory noise since there were some failures.

      (c) Updates in reach aim.

      Thanks for this comment. It made us realize that some of our terminology was unintentionally misleading. Reaching to discrete targets in the Baseline block was done to a) determine if participants could move successfully to targets that are the same width as the 100% reward zone in the continuous targets and b) determine if there are age dependent changes in movement precision. We now realize that the term Baseline Variability was misleading and should really be called Baseline Precision.

      This is an important distinction that bears on this reviewer's comment. In clamp trials, participants move to continuous targets. In baseline, participants move to discrete targets presented at different locations. Clamp Variability cannot be directly compared to Baseline Precision because they are qualitatively different. Since the target changes on each baseline trial, we would not expect updating of desired reach (the target is the desired reach) and there is therefore no updating of reach based on success or failure. The SD we calculate over baseline trials is the endpoint variability of the reach locations relative to the target centers. In success clamp, there are no targets so the task is qualitatively different.

      We have updated the text to clarify terminology, expand upon our operational definitions, and motivate the distinct role of the baseline block in our task paradigm (line 674).

      Given the comment above, can the authors please:

      (a) Statistically compare movement variability between the baseline, success clamp, and failure clamp phases.

      Given our explanation in the previous point we don't think that comparing baseline to the clamp makes sense as the trials are qualitatively different.

      (b) The authors have examined how their model predicts variability during success clamps and failure clamps, but can they also please show predictions for baseline (similar to that of Cashaback et al., 2019; Supplementary B, which alternatively used a no feedback baseline)?

      Again, we do not think it makes sense to predict the baseline which as we mention above has discrete targets compared to the continuous targets in the learning phase.

      (c) Can the authors show whether participants updated their aim towards their last successful reach during the success clamp? This would be a particularly insightful analysis of model assumptions.

      We have now compared 31 models (see full details in the next response), which include the 7 models in Roth et al. (2023). Several of these model variants have updating even after success (with so-called planning noise). We also now fit the model to the data that includes the clamp phases (we can't easily fit to the success clamp alone as there are only 10 trials). We find that the preferred model is one that does not include updating after success.

      (d) Different sources of movement variability have been proposed in the literature, as have different related models. One possibility is that the nervous system has knowledge of 'planned (noise)' movement variability that is always present, irrespective of success (van Beers, R.J. (2009). Motor learning is optimally tuned to the properties of motor noise. Neuron, 63(3), 406-417). The authors have used slightly different variations of their model in the past. Roth et al (2023) directly compared several different plausible models with various combinations of motor, planned, and exploratory noise (Roth A, 2023, "Reinforcement-based processes actively regulate motor exploration along redundant solution manifolds." Proceedings of the Royal Society B 290: 20231475: see Supplemental). Their best-fit model seems similar to the one the authors propose here, but the current paper has the added benefit of the success and failure clamps to tease the different potential models apart. In light of the results of a), b), and c), the authors are encouraged to provide a paragraph on how their model relates to the various sources of movement variability and other models proposed in the literature.

      Thank you for this. We realized that the models presented in Roth et al. (2023) as well as in other papers, are all special cases of a more general model. Moreover, in total there are 30 possible variants of the full model so we have now fit all 31 models to our larger datasets and performed model selection (Results and Methods). All the models can be efficiently fit by Kalman smoother to the actual data (rather than to summary statistics which has sometimes been done). For model selection, we fit only the 100 learning trials and chose the preferred model based on BIC on the children's data (Figure 5—figure Supplement 1). After selecting the preferred model we then refit this model to all trials including the clamps so as to obtain the best parameter estimates.
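      To make the model selection step concrete, the following is a minimal sketch of choosing among candidate models by BIC. It assumes each candidate has already been fit and yields a maximized log-likelihood; the function names, model names, and numbers are hypothetical placeholders for illustration and are not results from the study.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def preferred_model(fits: dict, n_obs: int) -> str:
    """Return the model name with the lowest BIC.

    `fits` maps a model name to (maximized log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda name: bic(fits[name][0], fits[name][1], n_obs))


# Placeholder values for illustration only (hypothetical, not from the paper).
example_fits = {
    "motor_noise_only": (-320.0, 1),
    "motor_plus_exploration": (-300.0, 2),
    "motor_plus_exploration_plus_learning_rate": (-299.5, 3),
}

# n_obs=100 mirrors the 100 learning trials mentioned above.
best = preferred_model(example_fits, n_obs=100)
```

      In this toy case the extra learning-rate parameter improves the likelihood too little to offset its BIC penalty, so the two-parameter model is selected.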

      The preferred model was the same whether we combined the continuous and discrete probabilistic data or examined each task separately, either for only the children or for the children and adults combined. The preferred model is a special case (no exploration after success) of the one proposed in Therrien et al. (2018) and has exploration variability (after failure) and motor noise, with full updating with exploration variability (if any) after success. This model differs from the model in the original submission, which included a partial update of the desired reach after exploration; this was considered the learning rate. The current model suggests a unity learning rate.
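      One way to read the preferred model is as a simple trial-by-trial simulation. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names, the reward zone in `in_zone`, and the choice to start without exploration on the first trial are all hypothetical.

```python
import random


def in_zone(reach, rng):
    """Hypothetical deterministic reward rule: success inside a fixed zone."""
    return abs(reach - 0.5) < 0.1


def simulate_reaches(n_trials, sigma_motor, sigma_explore, reward_fn, start=0.0, seed=0):
    """Simulate reaches with exploration only after failure and a unity learning rate."""
    rng = random.Random(seed)
    aim = start
    prev_success = True  # assumption: no exploration on the very first trial
    reaches, outcomes = [], []
    for _ in range(n_trials):
        # Exploration variability is added only after a failed trial.
        explore = 0.0 if prev_success else rng.gauss(0.0, sigma_explore)
        # Motor noise is present on every trial.
        reach = aim + explore + rng.gauss(0.0, sigma_motor)
        success = reward_fn(reach, rng)
        if success:
            # Unity learning rate: the aim absorbs the full exploration (if any).
            aim += explore
        reaches.append(reach)
        outcomes.append(success)
        prev_success = success
    return reaches, outcomes


reaches, outcomes = simulate_reaches(100, 0.05, 0.1, in_zone, start=0.0, seed=1)
```

      Motor noise is never incorporated into the aim here, which is what lets the two noise sources be distinguished in principle.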

      In addition, as suggested by another reviewer, we also fit a value-based model which we adapted from the model described in Giron et al. (2023). This model was not preferred.

      We have added a paragraph to the Discussion highlighting different sources of variability and links to our model comparison.

      (e) line 155. Why would the success clamp be composed of both motor and exploratory noise? Please clarify in the text

      This sentence was written to refer to clamps in general and not just success clamps. However, in the revision this sentence seemed unnecessary so we have removed it.

      (3) Hypotheses:

      The introduction did not have any hypotheses of development and reinforcement, despite the discussion above setting up potential hypotheses. Did the authors have any hypotheses related to why they might expect age to change motor noise, exploratory noise, and learning rates? If so, what would the experimental behaviour look like to confirm these hypotheses? Currently, the manuscript reads more as an exploratory study, which is certainly fine if true, it should just be explicitly stated in the introduction. Note: on line 144, this is a prediction, not a hypothesis. Line 225: this idea could be sharpened. I believe the authors are speaking to the idea of having more explicit knowledge of action-target pairings changing behaviour.

      We have included our hypotheses and predictions at two points in the paper. In the introduction we modified the text to:

      "We hypothesized that children's reinforcement learning abilities would improve with age, and depend on the developmental trajectory of exploration variability, learning rate (how much people adjust their reach after success), and motor noise (here defined as all sources of noise associated with movement, including sensory noise, memory noise, and motor noise). We think that these factors depend on the developmental progression of neural circuits that contribute to reinforcement learning abilities (Raznahan et al., 2014; Nelson et al., 2000; Schultz, 1998)."

      In the Results we modified the sentence to:

      "We predicted that discrete targets could increase exploration by encouraging children to move to a different target after failure."

      Reviewer #2 (Public review):

      Summary:

      In this study, Hill and colleagues use a novel reinforcement-based motor learning task ("RML"), asking how aspects of RML change over the course of development from toddler years through adolescence. Multiple versions of the RML task were used in different samples, which varied on two dimensions: whether the reward probability of a given hand movement direction was deterministic or probabilistic, and whether the solution space had continuous reach targets or discrete reach targets. Using analyses of both raw behavioral data and model fits, the authors report four main results: First, developmental improvements reflected 3 clear changes, including increases in exploration, an increase in the RL learning rate, and a reduction of intrinsic motor noise. Second, changes to the task that made it discrete and/or deterministic both rescued performance in the youngest age groups, suggesting that observed deficits could be linked to continuous/probabilistic learning settings. Overall, the results shed light on how RML changes throughout human development, and the modeling characterizes the specific learning deficits seen in the youngest ages.

      Strengths:

      (1) This impressive work addresses an understudied subfield of motor control/psychology - the developmental trajectory of motor learning. It is thus timely and will interest many researchers.

      (2) The task, analysis, and modeling methods are very strong. The empirical findings are rather clear and compelling, and the analysis approaches are convincing. Thus, at the empirical level, this study has very few weaknesses.

      (3) The large sample sizes and in-lab replications further reflect the laudable rigor of the study.

      (4) The main and supplemental figures are clear and concise.

      Thank you.

      Weaknesses:

      (1) Framing.

      One weakness of the current paper is the framing, namely w/r/t what can be considered "cognitive" versus "non-cognitive" ("procedural?") here. In the Intro, for example, it is stated that there are specific features of RML tasks that deviate from cognitive tasks. This is of course true in terms of having a continuous choice space and motor noise, but spatially correlated reward functions are not a unique feature of motor learning (see e.g. Giron et al., 2023, NHB). Given the result here that simplifying the spatial memory demands of the task greatly improved learning for the youngest cohort, it is hard to say whether the task is truly getting at a motor learning process or more generic cognitive capacities for spatial learning, working memory, and hypothesis testing. This is not a logical problem with the design, as spatial reasoning and working memory are intrinsically tied to motor learning. However, I think the framing of the study could be revised to focus in on what the authors truly think is motor about the task versus more general psychological mechanisms. Indeed, it may be the case that deficits in motor learning in young children are mostly about cognitive factors, which is still an interesting result!

      Thank you for these comments on the framing of our study. We now clearly acknowledge that all motor tasks have cognitive components (new paragraph at line 65). We also explain why we think our tasks have features not present in typical cognitive tasks.

      (2) Links to other scholarship.

      If I'm not mistaken, a common observation in studies of the development of reinforcement learning is a decrease in exploration over development (e.g., Nussenbaum and Hartley, 2019; Giron et al., 2023; Schulz et al., 2019); this contrasts with the current results, which instead show an increase. It would be nice to see a more direct discussion of previous findings showing decreases in exploration over development, and why the current study deviates from that. It could also be useful for the authors to bring in concepts of different types of exploration (e.g. "directed" vs "random"), in their interpretations and potentially in their modeling.

      We recognize that our results differ from prior work. The optimal exploration pattern differs from task to task. We now discuss that exploration is not one size fits all; its benefits vary depending upon the task. We have added the following paragraphs to the Discussion section:

      "One major finding from this study is that exploration variability increases with age. Some other studies of development have shown that exploration can decrease with age indicating that adults explore less compared to children (Schulz et al., 2019; Meder et al., 2021; Giron et al., 2023). We believe the divergence between our work and these previous findings is largely due to the experimental design of our study and the role of motor noise. In the paradigm used initially by Schulz et al. (2019) and replicated in different age groups by Meder et al. (2021) and Giron et al. (2023), participants push buttons on a two-dimensional grid to reveal continuous-valued rewards that are spatially correlated. Participants are unaware that there is a maximum reward available and therefore children may continue to explore to reduce uncertainty if they have difficulty evaluating whether they have reached a maximum. In our task by contrast, participants are given binary reward and told that there is a region in which reaches will always be rewarded. Motor noise is an additional factor which plays a key role in our reaching task but minimal if any role in the discretized grid task. As we show in simulations of our task, as motor noise goes down (as it is known to do through development) the optimal amount of exploration goes up (see Figure 7—figure Supplement 2 and Appendix 1). Therefore, the behavior of our participants is rational in terms of increasing exploration as motor noise decreases.

      A key result in our study is that exploration in our task reflects sensitivity to failure. Older children make larger adjustments after failure compared to younger children to find the highly rewarded zone more quickly. Dhawale et al. (2017) discuss the different contexts in which a participant may explore versus exploit (i.e., stick at the same position). Exploration is beneficial when reward is low as this indicates that the current solution is no longer ideal, and the participant should search for a better solution. Konrad et al. (2025) have recently shown this behavior in a real-world throwing task where 6 to 12 year old children increased throwing variability after missed trials and minimized variability after successful trials. This has also been shown in a postural motor control task where participants were more variable after non-rewarded trials compared to rewarded trials (Van Mastrigt et al., 2020). In general, these studies suggest that the optimal amount of exploration is dependent on the specifics of the task."

      (3) Modeling.

      First, I may have missed something, but it is unclear to me if the model is actually accounting for the gradient of rewards (e.g., if I get a probabilistic reward moving at 45°, but then don't get one at 40°, I should be more likely to try 50° next than 35°). I couldn't tell from the current equations if this was the case, or if exploration was essentially "unsigned," nor if the multiple-trials-back regression analysis would truly capture signed behavior. If the model is sensitive to the gradient, it would be nice if this was more clear in the Methods. If not, it would be interesting to have a model that does "function approximation" of the task space, and see if that improves the fit or explains developmental changes.

      The model we use (similar to Roth et al. (2023) and Therrien et al. (2016, 2018)) does not model the gradient. Exploration is always zero-mean Gaussian. As suggested by the reviewer, we now also fit a value-based model (described starting at line 810) which we adapted from the model presented in Giron et al. (2023). We show that the exploration and noise-based model is preferred over the value-based model.

      The multiple-trials-back regression was unsigned as the intent was to look at the magnitude and not the direction of the change in movement. We have decided to remove this analysis from the manuscript as it was a source of confusion and secondary analysis that did not add substantially to the findings of these studies.

      Second, I am curious if the current modeling approach could incorporate a kind of "action hysteresis" (aka perseveration), such that regardless of previous outcomes, the same action is biased to be repeated (or, based on parameter settings, avoided).

      In some sense, the learning rate in the model in the original submission is highly related to perseveration. For example, if the learning rate is 0, then there is complete perseveration as you simply repeat the same desired movement. If the rate is 1, there is no perseveration, and values in between reflect different amounts of perseveration. Therefore, it is not easy to separate learning rate from perseveration. Adding perseveration as another parameter would likely make it and the learning rate unidentifiable. However, we now compare 31 models, and those that have a non-unity learning rate are not preferred, suggesting there is little perseveration.
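      The link between learning rate and perseveration can be seen in a one-line update rule. This is a hedged sketch of a partial update of the desired reach after a rewarded exploratory trial; the function name `update_aim` and its arguments are hypothetical and chosen only for illustration.

```python
def update_aim(aim, exploration, success, learning_rate):
    """Update the desired reach after a trial.

    learning_rate = 0 keeps the same desired movement (complete perseveration);
    learning_rate = 1 shifts the aim by the full exploration (no perseveration).
    """
    if not success:
        return aim  # no update after failure
    return aim + learning_rate * exploration


# The same rewarded exploratory step of 0.1 under three learning rates.
aim_perseverate = update_aim(0.2, 0.1, True, 0.0)
aim_partial = update_aim(0.2, 0.1, True, 0.5)
aim_full = update_aim(0.2, 0.1, True, 1.0)
```

      Because a separate perseveration term would act on the same quantity as the learning rate, the two would be difficult to tell apart from behavior alone.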

      (4) Psychological mechanisms. There is a line of work that shows that when children and adults perform RL tasks they use a combination of working memory and trial-by-trial incremental learning processes (e.g., Master et al., 2020; Collins and Frank 2012). Thus, the observed increase in the learning rate over development could in theory reflect improvements in instrumental learning, working memory, or both. Could it be that older participants are better at remembering their recent movements in short-term memory (Hadjiosif et al., 2023; Hillman et al., 2024)?

      We agree that cognitive processes, such as working memory or visuospatial processing, play a role in our task and describe cognitive elements of our task in the introduction (new paragraph at line 65). However, the sensorimotor model we fit to the data does a good job of explaining the variation across age, which suggests that age-dependent cognitive processes probably play a smaller role.

      Reviewer #3 (Public review):

      Summary:

      The study investigates reinforcement learning across the lifespan with a large sample of participants recruited for an online game. It finds that children gradually develop their abilities to learn reward probability, possibly hindered by their immature spatial processing and probabilistic reasoning abilities. Motor noise, reinforcement learning rate, and exploration after a failure all contribute to children's subpar performance.

      Strengths:

      (1) The paradigm is novel because it requires continuous movement to indicate people's choices, as opposed to discrete actions in previous studies.

      (2) A large sample of participants were recruited.

      (3) The model-based analysis provides further insights into the development of reinforcement learning ability.

      Thank you.

      Weaknesses:

      (1) The adequacy of model-based analysis is questionable, given the current presentation and some inconsistency in the results.

      Thank you for raising this concern. We have substantially revised the model from our first submission. We now compare 31 noise-based models and 1 value-based model and fit all of the tasks with the preferred model. We perform model selection using the two tasks with the largest datasets to identify the preferred model. From the preferred model, we found the parameter fits for each individual dataset and simulated the trial by trial behavior allowing comparison between all four tasks. We now show examples of individual fits as well as provide a measure of goodness of fit. The expansion of our modeling approach has resolved inconsistencies and sharpened the conclusions drawn from our model.

      (2) The task should not be labeled as reinforcement motor learning, as it is not about learning a motor skill or adapting to sensorimotor perturbations. It is a classical reinforcement learning paradigm.

      We now make it clear that our reinforcement learning task has both motor and cognitive demands, but does not fall entirely within one of these domains. We use the term motor learning because it captures the fact that participants maximize reward by making different movements, corrupted by motor noise, to unmarked locations on a continuous target zone. When we look at previous publications, it is clear that our task is similar to those that also refer to this as reinforcement motor learning: Cashaback et al. (2019) (reaching task using a robotic arm in adults), Van Mastrigt et al. (2020) (weight shifting task in adults), and Konrad et al. (2025) (real-world throwing task in children). All of these tasks involve trial-by-trial learning through reinforcement to make the movement that is most effective for a given situation. We feel it is important to link our work to these previous studies and prefer to preserve the terminology of reinforcement motor learning.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Thank you for this summary. Rather than repeat the extended text from the responses to the reviewers here, we point the Editor to the appropriate reviewer responses for each issue raised.

      The reviewers and editors have rated the significance of the findings in your manuscript as "Valuable" and the strength of evidence as "Solid" (see eLife evaluation). A consultancy discussion session to integrate the public reviews and recommendations per reviewer (listed below) has resulted in key recommendations for increasing the significance and strength of evidence:

      To increase the Significance of the findings, please consider the following:

      (1) Address and reframe the paper around whether the task is truly getting at a motor learning process or more generic cognitive decision-making capacities such as spatial memory, reward processing, and hypothesis testing.

      We have revised the paper to address the comments on the framing of our work. Please see responses to the public review comments of Reviewers #2 and #3.

      (2) It would be beneficial to specify the differences between traditional reinforcement algorithms (i.e., using softmax functions to explore, and build representations of state-action-reward) and the reinforcement learning models used here (i.e., explore with movement variability, update reach aim towards the last successful action), and compare present findings to previous cognitive reinforcement learning studies in children.

      Please see response to the public review comments of Reviewer #1 in which we explain the expansion of our modeling approach to fit a value-based model as well as 31 other noise-based models. In our response to the public review comments of Reviewer #2, we comment on our expanded discussion of how our findings compare with previous cognitive reinforcement learning studies.

      To move the "Strength of Evidence" to "Convincing", please consider doing the following:

      (1) Address some apparently inconsistent and unrealistic values of motor noise, exploration noise, and learning rate shown for individual participants (e.g., Figure 5b; see comments reviewers 1 and take the following additional steps: plotting r squares for individual participants, discussing whether individual values of the fitted parameters are plausible and whether model parameters in each age group can extrapolate to the two clamp conditions and baselines.

      We have substantially updated our modeling approach. Now that we compare 31 noise-based models, the preferred model does not show any inconsistent or unrealistic values (see response to Reviewer #3). Additionally, we now show example individual fits and provide both relative and absolute goodness of fit (see response to Reviewer #3).

      (2) Relatedly, to further justify if model assumptions are met, it would be valuable to show that the current learning model fits the data better than alternative models presented in the literature by the authors themselves and by others (reviewer 1). This could include alternative development models that formalise the proposed explanations for age-related change: poor spatial memory, reward/outcome processing, and exploration strategies (reviewer 2).

      Please see response to public review comments of Reviewer #1 in which we explain that we have now fit a value-based model as well as 31 other noise-based models providing a comparison of previous models as well as novel models. This led to a slightly different model being preferred over the model in the original submission (updated model has a learning rate of unity). These models span many of the processes previously proposed for such tasks. We feel that 32 models span a reasonable amount of space and do not believe we have the power to include memory issues or heuristic exploration strategies in the model.

      (3) Perform the mediation analysis with all the possible variables (i.e., not informed by multiple regression) to see if the results are more consistent across studies and with the current approach (see comments reviewer 1).

      Please see response to public review comments of Reviewer #1. We chose to focus only on the model based analysis because it allowed us to distinguish between exploration variability and motor noise.

      Please see below for further specific recommendations from each reviewer.

      Reviewer #1 (Recommendations for the author):

      (1) In general, there should be more discussion and contextualization of other binary reinforcement tasks used in the motor literature. For example, work from Jeroen Smeets, Katinka van der Kooij, and Joseph Galea.

      Thank you for this comment. We have edited the Introduction to better contextualize our work within the reinforcement motor learning literature (see line 67 and line 83).

      (2) Line 32. Very minor. This sentence is fine, but perhaps could be slightly improved. “select a location along a continuous and infinite set of possible options (anywhere along the span of the bridge)"

      Thank you for this comment. We have edited the sentence to reflect this suggestion.

      (3) Line 57. To avoid some confusion in successive sentences: Perhaps, "Both children over 12 and adolescents...".

      Thank you for this comment. We have edited the sentence to reflect this suggestion.

      (4) Line 80. This is arguably not a mechanistic model, since it is likely not capturing the reward/reinforcement machinery used by the nervous system, such as updating the expected value using reward prediction errors/dopamine. That said, this phenomenological model, and other similar models in the field, do very well to capture behaviour with a very simple set of explore and update rules.

      We use mechanistic in the standard use in modeling, as in Levenstein et al. (2023), for example. The contrast is not with neural modeling, but with normative modeling, in which one develops a model to optimize a function (or descriptive models as to what a system is trying to achieve). In mechanistic modeling one proposes a mechanism, and this can be at a state-space level (as in our case) or a neural level (as suggested by the reviewer), but both are considered mechanistic, just at different levels. Quoting Levenstein "... mechanistic models, in which complex processes are summarized in schematic or conceptual structures that represent general properties of components and their interactions, are also commonly used." We now reference the Levenstein paper to clarify what we mean by mechanistic.

      (5) Figure 1. It would be useful to state that the x-axis in Figure 1 is in normalized units, depending on the device.

      Thank you for this comment. We have added a description of the x-axis units to the Fig. 1 caption.

      (6) Were there differences in behaviour for these different devices? e.g., how different was motor noise for the mouse, trackpad, and touchscreen?

      Thank you for this question. We did not find a significant effect of device on learning or precision in the baseline block. We have added these one way ANOVA results for each task in Supplementary Table 1.

      (7) Line 98. Please state that participants received reinforcement feedback during baseline.

      Thank you for this comment. We have updated the text to specify that participants receive reward feedback during the baseline block.

      (8) Line 99. Did the distance from the last baseline trial influence whether the participant learned or did not learn? For example, would it place them too far from the peak success location such that it impacted learning?

      Thank you for this question. We looked at whether the position of movement on the last baseline block trial was correlated with the first movement position in the learning block. We did not find any correlations between these positions for any of the tasks. Interestingly, we found that the majority of participants move to the center of the workspace on the first trial of the learning block for all tasks (either in the presence of the novel continuous target scene or the presentation of 7 targets all at once). We do not think that the last movement in the baseline block "primed" the participant for the location of the success zone in the learning block. We have added the following sentence to the Results section:

      "Note that the reach location for the first learning trial was not affected by (correlated with) the target position on the last baseline trial (p > 0.3 for both children and adults, separately)."

      (9) The term learning distance could be improved. Perhaps use distance from target.

      Thank you for this comment. We appreciate that learning distance defined with 0 as the best value is counter intuitive. We have changed the language to be "distance from target" as the learning metric.

      (10) Line 188. This equation is correct, but to estimate what the standard deviation by the distribution of changes in reach position is more involved. Not sure if the authors carried out this full procedure, which is described in Cashaback et al., 2019; Supplemental 2.

      There appears to be no Supplemental 2 in the referenced paper, so we assume the reviewer is referring to Supplemental B, which deals with a shuffling procedure to examine lag-1 correlations.

      In our tasks, we are limited to only 9 trials to analyze in each clamp phase so do not feel a shuffling analysis is warranted. In these blocks, we are not trying to 'estimate what the standard deviation by the distribution of changes in reach position' but instead are calculating the standard deviation of the reach locations and comparing the model fit (for which the reviewer says the formula is correct) with the data. We are unclear what additional steps the reviewer is suggesting. In our updated model analysis, we fit the data including the clamp phases for better parameter estimation. We use simulations to estimate the s.d. in the clamp phase (as we ensure in simulations that the data do not fall outside the workspace), so the previous analytic formulas, which were only an approximation, are no longer used.
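      As an illustration of estimating the clamp-phase standard deviation by simulation rather than by an analytic formula, the sketch below draws many short clamp blocks and averages their sample standard deviations. It rests on several assumptions that are not stated in the response: reaches are clipped to a normalized workspace of [0, 1], the aim stays fixed during the clamp, and all names and parameter values are hypothetical.

```python
import random
import statistics


def clamp_sd(n_trials, sigma_motor, sigma_explore, clamp_success,
             aim=0.5, lo=0.0, hi=1.0, n_sims=2000, seed=0):
    """Mean sample s.d. of reach locations over simulated clamp blocks."""
    rng = random.Random(seed)
    sds = []
    for _ in range(n_sims):
        reaches = []
        for _ in range(n_trials):
            # Success clamp: no exploration. Failure clamp: explore on every trial.
            explore = 0.0 if clamp_success else rng.gauss(0.0, sigma_explore)
            reach = aim + explore + rng.gauss(0.0, sigma_motor)
            # Assumption: keep simulated reaches inside the workspace by clipping.
            reaches.append(min(hi, max(lo, reach)))
        sds.append(statistics.stdev(reaches))
    return statistics.mean(sds)


sd_success = clamp_sd(9, sigma_motor=0.02, sigma_explore=0.05, clamp_success=True)
sd_failure = clamp_sd(9, sigma_motor=0.02, sigma_explore=0.05, clamp_success=False)
```

      When the noise is large relative to the workspace, clipping shrinks the simulated spread below the unconstrained value, which is one reason a closed-form expression would only be approximate.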

      (11) Line 197-199. Having done the demo task, it is somewhat surprising that a 3-year-old could understand these instructions (whose comprehension can be very different from even a 5-year old).

      Thank you for raising this concern. We recognize that the younger participants likely have different comprehension levels compared to older participants. However, we believe that the majority of even the youngest participants were able to sufficiently understand the goal of the task to move in a way to get the video clip to play. We intentionally designed the tasks to be simple such that the only instructions the child needed to understand were that the goal was to get the video clip to play as much as possible and the video clip played based on their movement. Though the majority of younger children struggled to learn well on the probabilistic tasks, they were able to learn well on the deterministic tasks where the task instructions were virtually identical with the exception of how many places in the workspace could gain reward. On the continuous probabilistic task, we did have a small number (n = 3) of 3 to 5 year olds who exhibited more mature learning ability which gives us confidence that the younger children were able to understand the task goal.

      (12) Line 497: Can the authors please report the F-score and p-value separately for each of these one-way ANOVA (the device is of particular interest here).

      Thank you for this request. We have added a supplementary table (Supplementary Table 1) with the results of these ANOVAs.

      (13) Past work has discussed how motivation influences learning, which is a function of success rate (van der Kooij, K., in 't Veld, L., & Hennink, T. (2021). Motivation as a function of success frequency. Motivation and Emotion, 45, 759-768.). Can the authors please discuss how that may change throughout development?

      Thank you for this comment. While motivation most probably plays a role in learning, in particular in a game environment, this was outside the direct focus of this work and not something that our studies were designed to test. We have added the following sentence to the Discussion section to address this comment:

      "We also recognize that other processes, such as memory and motivation, could affect performance on these tasks; however, our study was not designed to test these processes directly and future work would benefit from exploring these other components more explicitly."

      (14) Supplement 6. This analysis is somewhat incomplete because it does not consider success.

      Pekny and colleagues (2015) looked at 3 trials back but considered both success and reward. However, their analysis has issues since successive time points are not i.i.d., and spurious relationships can arise. This issue is brought up by Dhawale (Dhawale, A. K., Miyamoto, Y. R., Smith, M. A., & Ölveczky, B. P. (2019). Adaptive regulation of motor variability. Current Biology, 29(21), 3551-3562.). Perhaps it is best to remove this analysis from the paper.

      Thank you for this comment. We have decided to remove this secondary analysis from the paper as it was a source of confusion and did not add to the understanding and interpretation of our behavioral results.

      Reviewer #2 (Recommendations for the author):

      (1) The path length ratio analyses in the supplemental are interesting but are not mentioned in the main paper. I think it would be helpful to mention these as they are somewhat dramatic effects.

      Thank you for this comment. Path length ratios are defined in the Methods and results are briefly summarized in the Results section with a point to the supplementary figures. We have updated the text to more explicitly report the age related differences in path length ratios.

      (2) The second to last paragraph of the intro could use a sentence motivating the use of the different task features (deterministic/probabilistic and discrete/continuous).

      Thank you for this comment. We have added an additional motivating sentence to the introduction.

      Reviewer #3 (Recommendations for the author):

      The paper labeled the task as one for reinforcement motor learning, which is not quite appropriate in my opinion. Motor learning typically refers to either skill learning or motor adaptation, the former for improving speed-accuracy tradeoffs in a certain (often new) motor skill task and the latter for accommodating some sensorimotor perturbations for an existing motor skill task. The gaming task here is for neither. It is more like a decision-making task with a slight contribution to motor execution, i.e., motor noise. I would recommend the authors label the learning as reinforcement learning instead of reinforcement motor learning.

      Thank you for this comment. As noted in the response to the public review comments, we agree that this task has components of classical reinforcement learning (i.e. responding to a binary reward) but we specifically designed it to require the learning of a movement within a novel game environment. We have added a new paragraph to the introduction where we acknowledge the interplay between cognitive and motor mechanisms while also underscoring the features in our task that we think are not present in typical cognitive tasks.

      My major concern is whether the model adequately captures subjects' behavior and whether we can conclude with confidence from model fitting. Motor noise, exploration noise, and learning rate, which fit individual learning patterns (Figure 5b), show some quite unrealistic values. For example, some subjects have nearly zero motor noise and a 100% learning rate.

      We have now compared 31 models and the preferred model is different from the one in the first submission. The parameter fits of the new model do not saturate in any way and appear reasonable to us. The updates to the model analysis have addressed the concern of previously seen unrealistic values in the prior draft.

      Currently, the paper does not report the fitting quality for individual subjects. It is good to have an exemplary subject's fit shown, too. My guess is that the r-squared would be quite low for this type of data. Still, given that the children's data is noisier, it might be good to use the adult data to show how good the fitting can be (individual fits, r squares, whether the fitted parameters make sense, whether it can extrapolate to the two clamp phases). Indeed, the reliability of model fitting affects how we should view the age effect of these model parameters.

      We now show fits to individual subjects. However, because the fit uses a Kalman smoother, it reproduces the data exactly by generating its best estimate of motor noise and exploration variability on each trial. In that sense R<sup>2</sup> is always 1 and is therefore not informative.

      While the BIC analysis with the other model variants provides a relative goodness of fit, it is not straightforward to provide an absolute goodness of fit such as standard R<sup>2</sup> for a feedforward simulation of the model given the parameters (rather than the output of the Kalman smoother). There are two problems. First, there is no single model output. Each time the model is simulated with the fit parameters it produces a different output (due to motor noise, exploration variability and reward stochasticity). Second, the model is not meant to reproduce the actual motor noise, exploration variability and reward stochasticity of a trial. For example, the model could fit pure Gaussian motor noise across trials (for a poor learner) by accurately fitting the standard deviation of motor noise but would not be expected to actually match each data point, so it would have a traditional R<sup>2</sup> of 0.

      To provide an overall goodness of fit we have to reduce the noise component. To do so, we examined the traditional R<sup>2</sup> between the average of all the children's data and the average simulation of the model (from the median of 1000 simulations per participant) so as to reduce the stochastic variation. The results for the continuous probabilistic and discrete probabilistic tasks are R<sup>2</sup> of 0.41 and 0.72, respectively.
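
      The averaging procedure described above can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function names, the definition of R<sup>2</sup> as 1 minus the ratio of residual to total sum of squares, and the toy "learner" in the demo are all assumptions made for the example.

```python
import random
import statistics


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_residual / SS_total."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def trialwise_mean(rows):
    """Average several equal-length trial series, trial by trial."""
    return [statistics.fmean(column) for column in zip(*rows)]


def group_level_r_squared(data, simulations):
    """data[p][t]: observed value for participant p on trial t.
    simulations[p][s][t]: simulated value for participant p, run s, trial t."""
    # Collapse each participant's simulation runs to a trial-wise median,
    # which damps run-to-run stochastic variation.
    per_participant = [
        [statistics.median(column) for column in zip(*runs)]
        for runs in simulations
    ]
    # Compare the across-participant average of the data with the
    # across-participant average of the per-participant medians.
    return r_squared(trialwise_mean(data), trialwise_mean(per_participant))


if __name__ == "__main__":
    rng = random.Random(0)
    n_participants, n_trials, n_sims = 30, 100, 200

    def simulate():
        # Hypothetical learner: linear drift toward a target plus Gaussian noise.
        return [t / n_trials + rng.gauss(0.0, 0.5) for t in range(n_trials)]

    data = [simulate() for _ in range(n_participants)]
    sims = [[simulate() for _ in range(n_sims)] for _ in range(n_participants)]
    print("one simulated run vs. one participant:", r_squared(data[0], sims[0][0]))
    print("group average vs. averaged simulations:", group_level_r_squared(data, sims))
```

      In this toy setting a single simulated run is a poor trial-by-trial predictor of a single participant even though both come from the same generative process, whereas the averaged comparison recovers the shared trend. That is the motivation for reporting the group-level value.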

      Note that variability in the "success clamp" does not change across ages (Figure 4C) and does not contribute to the learning effect (Figure 4F). However, it is regarded as reflecting motor noise (Figure 5C), which then decreases over age from the model fitting (Figure 5B). How do we reconcile these contradictions? Again, this calls the model fitting into question.

      For the success clamp, we only have 9 trials to calculate variability, which limits our power to detect significance with age. In contrast, the model uses all 120 trials to estimate motor noise. There is a downward trend with age in the behavioral data, which we now show overlaid on the fits of the model for both probabilistic conditions (Figure 5—figure Supplement 4 and Figure 6—figure Supplement 4). These show a reasonable match, and although the variance explained is 16 and 56% (we limit to 9 trials so as to match the fail clamp), the correlations are 0.52 and 0.78, suggesting we have a reasonable relation, although there may be other small sources of variability not captured in the model.

      Figure 5C: it appears one bivariate outlier contributes a lot to the overall significant correlation here for the "success clamp".

      After removing that point, the correlation in the original Fig. 5C remained significant, and we feel the plots mentioned in the previous point add useful information on this issue. With the new model this figure has changed.

      It is still a concern that the young children did not understand the instructions. Nine 3-to-8 children (out of 48) were better explained by the noisy only model than the full model. In contrast, ten of the rest of the participants (out of 98) were better explained by the noisy-only model. It appears that there is a higher percentage of the "young" children who didn't get the instruction than the older ones.

      Thank you for this comment. We did take participant comprehension of the task into consideration during the task design. We specifically designed it so that the instructions were simple and straight forward. The child simply needs to understand the underlying goal to make the video clip play as often as possible and that they must move the penguin to certain positions to get it to play. By having a very simple task goal, we are able to test a naturalistic response to reinforcement in the absence of an explicit strategy in a task suited even for young children.

      We used the updated reinforcement learning model to assess whether an individual's performance is consistent with understanding the task. In the case of a child who does not understand the task, we expect that they simply have motor noise on their reach, and crucially, that they would not explore more after failure, nor update their reach after success. Therefore, we used a likelihood ratio test to examine whether the preferred model was significantly better at explaining each participant's data compared to the model variant which had only motor noise (Model 1). Focusing on only the youngest children (age 3-5), this analysis showed that 43, 59, 65 and 86% of children (out of N = 21, 22, 20 and 21) for the continuous probabilistic, discrete probabilistic, continuous deterministic, and discrete deterministic conditions, respectively, were better fit with the preferred model, indicating non-zero exploration after failure. In the 3-5 year old group for the discrete deterministic condition, 18 out of 21 had performance better fit by the preferred model, suggesting this age group understands the basic task of moving in different directions to find a rewarding location.

      The reduced numbers fit by the preferred model for the other conditions likely reflect differences in the task conditions (continuous and/or probabilistic) rather than a lack of understanding of the goal of the task. We include this analysis as a new subsection at the end of the Results.
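
      For readers unfamiliar with the per-participant likelihood ratio test described above, a minimal sketch is given here. It is not the authors' implementation. It assumes that the motor-noise-only model is nested in the preferred model and that the test statistic is referred to a chi-square distribution with degrees of freedom equal to the number of extra parameters. The function names and the log-likelihood values in the demo are hypothetical, and the number of extra parameters is not stated in this response.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_r x^(r - 1/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    phi = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return math.erfc(root / math.sqrt(2.0)) + 2.0 * phi * total


def likelihood_ratio_test(loglik_reduced, loglik_full, extra_params):
    """Return (statistic, p_value) for a nested-model comparison."""
    # The full model cannot fit worse than the nested one at the optimum;
    # clip at zero to guard against small optimizer errors.
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return statistic, chi2_sf(statistic, extra_params)


if __name__ == "__main__":
    # Hypothetical log-likelihoods for one participant, with 2 extra parameters assumed.
    stat, p = likelihood_ratio_test(loglik_reduced=-310.2, loglik_full=-301.7, extra_params=2)
    print(f"statistic = {stat:.2f}, p = {p:.4g}")
```

      A participant would then be counted as "better fit by the preferred model" when the p-value falls below the chosen threshold. The threshold itself is not specified here.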

      Supplementary Figure 2: the first panel should belong to a 3-year-old not a 5-year-old? How are these panels organized? This is kind of confusing.

      Thank you for this comment. Figure 2—figure Supplement 1 and Figure 2—figure Supplement 2 are arranged with devices in the columns and a sample from each age bin in the rows. For example in Figure 2—figure Supplement 1, column 1, row 1 is a mouse using participant age 3 to 5 years old while column 3, row 2 is a touch screen using participant age 6 to 8 years old. We have edited the labeling on both figures to make the arrangement of the data more clear.

      Line 222: make this a complete sentence.

      This sentence has been edited to a complete sentence.

      Line 331: grammar.

      This sentence has been edited for grammar.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This article investigates the phenotype of macrophages with a pathogenic role in arthritis, particularly focusing on arthritis induced by immune checkpoint inhibitor (ICI) therapy. 

      Building on prior data from monocyte-macrophage coculture with fibroblasts, the authors hypothesized a unique role for the combined actions of prostaglandin PGE2 and TNF. The authors studied this combined state using an in vitro model with macrophages derived from monocytes of healthy donors. They complemented this with single-cell transcriptomic and epigenetic data from patients with ICI-RA, specifically, macrophages sorted out of synovial fluid and tissue samples. The study addressed critical questions regarding the regulation of PGE2 and TNF: Are their actions co-regulated or antagonistic? How do they interact with IFN-γ in shaping macrophage responses? 

      This study is the first to specifically investigate a macrophage subset responsive to the PGE2 and TNF combination in the context of ICI-RA, describes a new and easily reproducible in vitro model, and studies the role of IFNgamma regulation of this particular Mф subset. 

      Strengths: 

      Methodological quality: The authors employed a robust combination of approaches, including validation of bulk RNA-seq findings through complementary methods. The methods description is excellent and allows for reproducible research. Importantly, the authors compared their in vitro model with ex vivo single-cell data, demonstrating that their model accurately reflects the molecular mechanisms driving the pathogenicity of this macrophage subset. 

      Weaknesses: 

      Introduction: The introduction lacks a paragraph providing an overview of ICI-induced arthritis pathogenesis and a comparison with other types of arthritis. Including this would help contextualize the study for a broader audience.

      Thank you for this suggestion; we have added a paragraph on ICI-arthritis to the Introduction (pg. 4, middle paragraph).

      Results Section: At the beginning of the results section, the experimental setup should be described in greater detail to make an easier transition into the results for the reader, rather than relying just on references to Figure 1 captions.

      We have clarified the experimental setup (pg. 5).  

      There is insufficient comparison between single-cell RNA-seq data from ICI-induced arthritis and previously published single-cell RA datasets. Such a comparison may include DEGs and GSEA, pathway analysis comparison for similar subsets of cells. Ideally, an integration with previous datasets with RA-tissue-derived primary monocytes would allow for a direct comparison of subsets and their transcriptomic features.

      We thank the Reviewer for this suggestion, which has increased the impact of our data and analysis. A computationally rigorous representation mapping approach showed that ICI-arthritis myeloid subsets predominantly mapped onto 4 previously defined RA subsets including IL-1β+ cells. This result was corroborated using a complementary data integration approach. Analysis of (TNF + PGE)-induced gene sets (TP signatures) in ICI-arthritis myeloid cells projected onto the RA subsets using the AUCell package showed elevated TP gene expression in similar ICI-arthritis and RA monocytic cell subsets. We also found mutually exclusive expression of TP and IFN signatures in distinct RA and ICI-arthritis myeloid cell subsets, which supports that the opposing cross-regulation between IFN-γ and PGE2 pathways that we identified in vitro also functions similarly in vivo. This analysis is shown in the new Fig. 3, described on pg. 7, and discussed on pp. 13-14.

      While it's understandable that arthritis samples are limited in numbers and myeloid cell numbers, it would still be interesting to see the results of PGE2+TNF in vitro stimulation on the primary RA or ICI-RA macrophages. It would be valuable to see RNA-Seq signatures of patient cell reactivation in comparison to primary stimulation of healthy donor-derived monocytes.

      We agree that this would be interesting but given limited samples and distribution of samples amongst many studies and investigators this is beyond the scope of the current study.  

      Discussion: Prior single-cell studies of RA and RA macrophage subpopulations from 2019, 2020, 2023 publications deserve more discussion. A thorough comparison with these datasets would place the study in a broader scientific context. 

      Creating an integrated RA myeloid cell atlas that combines ICI-RA data into the RA landscape would be ideal to add value to the field. 

      As one of the next research goals, TNF blockade data in RA and ICI-RA patients would be interesting to add to such an integrated atlas. Combining responders and non-responders to TNF blockade would help to understand patient stratification with the myeloid pathogenic phenotypes. It would be great to read the authors' opinion on this in the Discussion section. 

      Please see our response to point 3 above. This point is addressed in Fig. 3, pg. 7, and pp. 13-14, which includes a discussion of responders and nonresponders and patient stratification.  

      Conclusion: The authors demonstrated that while PGE2 maintains the inflammatory profile of macrophages, it also induces a distinct phenotype in simultaneous PGE2 and TNF treatment. The study of this specific subset in single-cell data from ICI-RA patients sheds light on the pathogenic mechanisms underlying this condition, however, how it compares with conventional RA is not clear from the manuscript. 

      Given the substantial incidence of ICI-induced autoimmune arthritis, understanding the unique macrophage subsets involved for future targeting them therapeutically is an important challenge. The findings are significant for immunologists, cancer researchers, and specialists in autoimmune diseases, making the study relevant to a broad scientific audience. 

      Reviewer #2 (Public review): 

      Summary/Significance of the findings: 

      The authors have done a great job by extensively carrying out transcriptomic and epigenomic analyses in the primary human/mouse monocytes/macrophages to investigate TNF-PGE2 (TP) crosstalk and their regulation by IFN-γ in the Rheumatoid arthritis (RA) synovial macrophages. They proposed that TP induces inflammatory genes via a novel regulatory axis whereby IFN-γ and PGE2 oppose each other to determine the balance between two distinct TNF-induced inflammatory gene expression programs relevant to RA and ICI-arthritis. 

      Strengths: 

      The authors have done a great job on RT-qPCR analysis of gene expression in primary human monocytes stimulated with TNF and showing the selective agonists of PGE2 receptors EP2 and EP4 that signal predominantly via cAMP. They have beautifully shown IFN-γ opposes the effects of PGE2 on TNF-induced gene expression. They found that TP signature genes are activated by cooperation of PGE2-induced AP-1, CEBP, and NR4A with TNF-induced NF-κB activity. On the other hand, they found that IFN-γ suppressed induction of AP-1, CEBP, and NR4A activity to ablate induction of IL-1, Notch, and neutrophil chemokine genes but promoted expression of distinct inflammatory genes such as TNF and T cell chemokines like CXCL10 indicating that TP induces inflammatory genes via IFN-γ in the RA and ICI-arthritis.

      Weaknesses: 

      (1) The authors carried out most of the assays in the monocytes/macrophages. How do APCs such as dendritic cells behave in response to this TP treatment at similar dosing?

      We agree that this is an interesting topic, especially as TNF + PGE2 is one of the standard methods of maturing in vitro generated human DCs and promoting antigen-presenting function. As DC maturation is quite different from monocyte activation, this would represent a new study and is beyond the scope of the current manuscript. We have instead added a paragraph to the discussion (pg. 12) and cited the literature on DC maturation by TNF + PGE2, including one of our older papers (PMID: 18678606; 2008).

      (2) The authors studied 3h and 24h post-treatment transcriptomic and epigenomic. What happens to TP induce inflammatory genes post-treatment 12h, 36h, 48h, 72h. It is critical to see the upregulated/downregulated genes get normalised or stay the same throughout the innate immune response.

      We now clarify that subsets of inducible genes showed distinct kinetics of induction with transient expression at 3 hr versus sustained expression over the 24 hr stimulation period as shown in Supplementary Fig. 1 (pg. 5).

      (3) The authors showed IL1-axis in response to the TP-treatment. Do other cytokine axes get modulated? If yes, then how do they cooperate to reduce/induce inflammatory responses along this proposed axis?

      This is an interesting question, which we approached using a combination of pathway analysis and targeted inspection of pathways important in the pathogenesis of RA, which is the inflammatory condition most relevant for this study. In addition to genes in the IL-1-NF-κB core inflammatory pathway, pathway analysis of genes induced by TP co-stimulation showed enrichment of genes related to leukocyte chemotaxis, in particular neutrophil migration. Accordingly, TP co-stimulation increased expression of CSF3, which plays a key role in mobilizing neutrophils from the bone marrow, and major neutrophil chemokines CXCL1, CXCL2, CXCL3 and CXCL5 that recruit neutrophils to sites of inflammation including in inflammatory arthritis. Analysis of the late response to TNF similarly showed enrichment of genes important in chemotaxis, and suppression of genes in the cholesterol biosynthetic pathway, which we and others have previously linked to IFN responses. Targeted inspection of genes in additional pathways implicated in RA pathogenesis showed increased expression of genes in the Notch pathway. We believe that these pathways work together with the IL-1 pathway to increase immune cell recruitment and activation in inflammatory responses; these results are described on pp. 5-6 and are incorporated into Figures 1, 2 and Supplementary Fig. 2.

      Overall, the data looks good and acceptable but I need to confirm the above-mentioned criticisms. 

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors):   

      The discussion section of the manuscript claims: "In this study, we utilized transcriptomics to demonstrate a 'TNF + PGE2' (TP) signature in RA and ICI-arthritis IL-1β+ synovial macrophages." This statement is misleading, as no new transcriptomic data from RA synovial samples were generated in this study. To support such a claim, the authors would need to compare primary monocytes or macrophages from RA patients using bulk RNA-seq or single-cell RNA-seq. Based on the current data, the comparison is limited to bulk RNA-seq findings from the authors' in vitro model and prior monocyte-fibroblast coculture studies.

      We have modified the abstract and discussion (pg. 10) to reflect that we have compared an in vitro generated TP signature with gene expression in previously identified RA macrophage subsets.

    1. Author response:

      [The following is the authors’ response to the original reviews.]

      We extend our sincere thanks to the editor, referees for eLife, and other commentators who have written evaluations of this manuscript, either in whole or in part. Sources of these comments were highly varied, including within the bioRxiv preprint server, social media (including many comments received on X/Twitter and some YouTube presentations and interviews), comments made by colleagues to journalists, and also some reviews of the work published in other academic journals. Some of these are formal and referenced with citations. Others were informal but nonetheless expressed perspectives that helped enable us to revise the manuscript with the inclusion of broader perspectives than the formal review process. It is beyond the scope of this summary to list every one of these, which have often been brought to the attention of different coauthors, but we begin by acknowledging the very wide array of peer and public commentary that have contributed to this work. The reaction speaks to a broad interest in open discussion and review of preprints. 

      As we compiled this summary of changes to the manuscript, we recognized that many colleagues made comments about the process of preprint dissemination and evaluation rather than the data or analyses in the manuscript. Addressing such comments is outside the scope of this revised manuscript, but we do feel that a broader discussion of these comments would be valuable in another venue. Many commentators have expressed confusion about the eLife system of evaluation of preprints, which differs from the editorial acceptance or rejection practiced in most academic journals. As authors in many different nations, in varied fields, and in varied career stages, we ourselves are still working to understand how the academic publication landscape is changing, and how best to prepare work for new models of evaluation and dissemination. 

      The manuscript and coauthor list reflect an interdisciplinary collaboration. Analyses presented in the manuscript come from a wide range of scientific disciplines. These range from skeletal inventory, morphology, and description, spatial taphonomy, analysis of bone fracture patterns and bone surface modifications, sedimentology, geochemistry, and traditional survey and mapping. The manuscript additionally draws upon a large number of previous studies of the Rising Star cave system and the Dinaledi Subsystem, which have shaped our current work. No analysis within any one area of research stands alone within this body of work: all are interpreted in conjunction with the outcomes of other analyses and data from other areas of research. Any single analysis in isolation might be consistent with many different hypotheses for the formation of sediments and disposition of the skeletal remains. But testing a hypothesis requires considering all data in combination and not leaving out data that do not fit the hypothesis. We highlight this general principle at the outset because a number of the comments from referees and outside specialists have presented alternative hypotheses that may arguably be consistent with one kind of analysis that we have presented, while seeming to overlook other analyses, data, or previous work that exclude these alternatives. In our revision, we have expanded all sections describing results to consider not only the results of each analysis, but how the combination of data from different kinds of analysis relate to hypotheses for the deposition and subsequent history of the Homo naledi remains. We address some specific examples and how we have responded to these in our summary of changes below. 

      General organization:

      The referee and editor comments are mostly general and not line-by-line questions, and we have compiled them and treated them as a group in this summary of changes, except where specifically noted. 

      The editorial comments on the previous version included the suggestion that the manuscript should be reorganized to test “natural” (i.e. noncultural) hypotheses for the situations that we examine. The editorial comment suggested this as a “null hypothesis” testing approach. Some outside comments also viewed noncultural deposition as a null hypothesis to be rejected. We do not concur that noncultural processes should be construed as a null hypothesis, as we discuss further below. However, because of the clear editorial opinion we elected to revise the manuscript to make more explicit how the data and analyses test noncultural depositional hypotheses first, followed by testing of cultural hypotheses. This reorganization means that the revised manuscript now examines each hypothesis separately in turn. 

      Taking this approach resulted in a substantial reorganization of the “Results” section of the manuscript. The “Results” section now begins with summaries of analyses and data conducted on material from each excavation area. After the presentation of data and analyses from each area, we then present a separate section for each of several hypotheses for the disposition and sedimentary context of the remains. These hypotheses include deposition of bodies upon a talus (as hypothesized in some previous work), slow sedimentary burial on a cave floor or within a natural depression, rapid burial by gravity-driven slumping, and burial of naturally mummified remains. We then include sections to test the hypothesis of primary cultural burial and secondary cultural burial. This approach adds substantial length to the Results. While some elements may be repeated across sections, we do consider the new version to be easier to take piece by piece for a reader trying to understand how each hypothesis relates to the evidence. 

      The Results section includes analyses on several different excavation areas within the Dinaledi Subsystem. Each of these presents somewhat different patterns of data. We conceived of this manuscript combining these distinct areas because each of them provides information about the formation history of the Homo naledi-associated sediments and the deposition of the Homo naledi remains. Together they speak more strongly than separately. In the previous version of the manuscript, two areas of excavation were considered in detail (Dinaledi Feature 1 and the Hill Antechamber Feature), with a third area (the Puzzle Box area) included only in the Discussion and with reference to prior work. We now describe the new work undertaken after the 2013-2014 excavations in more detail. This includes an overview of areas in the Hill Antechamber and Dinaledi Chamber that have not yielded substantial H. naledi remains and that thereby help contextualize the spatial concentration of H. naledi skeletal material. The most substantial change in the data presented is a much expanded reanalysis of the Puzzle Box area. This reanalysis provides greater clarity on how previously published descriptions relate to the new evidence. The reanalysis also provides the data to integrate the detailed information on bone identification, fragmentation, and spatial taphonomy from this area with the new excavation results from the other areas.

      In addition to Results, the reorganization also affected the manuscript’s Introduction section. Where the previous version led directly from a brief review of Pleistocene burial into the description of the results, this revised manuscript now includes a review of previous studies of the Rising Star cave system. This review directly addresses referee comments that express some hesitation to accept previous results concerning the structure and formation of sediments, the accessibility of the Dinaledi Subsystem, the geochronological setting of the H. naledi remains, and the relation of the Dinaledi Subsystem to nearby cave areas. Some parts of this overview are further expanded in the Supplementary Information to enable readers to dive more deeply into the previous literature on the site formation and geological configuration of the Rising Star cave system without needing to digest the entirety of the cited sources. 

      The Discussion section of the revised manuscript is differentiated from Results and focuses on several areas where the evidence presented in this study may benefit from greater context. One new section addresses hypothesis testing and parsimony for Pleistocene burial evidence, which we address at greater length in this summary below. The majority of the Discussion concerns the criteria for recognizing evidence for burial as applied in other studies. In this research we employ a minimal definition but other researchers have applied varied criteria. We consider whether these other criteria have relevance in light of our observations and whether they are essential to the recognition of burial evidence more broadly. 

      Vocabulary:

      We introduce the term “cultural burial” in this revised manuscript to refer to the burial of dead bodies as a mortuary practice. “Burial” as an unmodified term may refer to the passive covering of remains by sedimentary processes. Use of the term “intentional burial” would raise the question of interpreting intent, which we do not presume based on the evidence presented in this research. The relevant question in this case is whether the process of burial reflects repeated behavior by a group. As we received input from various colleagues it became clear that burial itself is a highly loaded term. In particular there is a common assumption within the literature and among professionals that burial must by definition be symbolic. We do not take any position on that question in this manuscript, and it is our hope that the term “cultural burial” may focus the conversation around the extent that the behavioral evidence is repeated and patterned. 

      Sedimentology and geochemistry of Dinaledi Feature 1:

      Reviewer 4 provided detailed comments on the sedimentological and geochemical context that we report in the manuscript. One outside review (Foecke et al. 2024) included some of the points raised by reviewer 4, and additionally addressed the reporting of geochemical and sedimentological data in previous work that we cite. 

      To address these comments we have revised the sedimentary context and micromorphology of sediments associated with Dinaledi Feature 1. In the new text we demonstrate the lack of microstratigraphy (supported by grain size analysis) in the unlithified mud clast breccia (UMCB), while such a microstratigraphy is observed in the laminated orange-red mudstones (LORM) that contribute clasts to the UMCB. Thus, we emphasize the presence and importance of a laterally continuous layer of LORM nature occurring at a level that appears to be the maximum depth of fossil occurrence. This layer is severely broken under extensive accumulation of fossils such as Feature 1 and only evidenced by abundant LORM clasts within and around the fossils. 

      We have completely reworked the geochemical context associated with Feature 1 following the comments of reviewer 4. We described the variations and trends observed in the major oxides separately from trace and rare-earth elements. We used Harker variation plots to assess relationships between these element groups with CaO and Zn, followed by principal component analysis of all elements analyzed. The new geochemical analysis clearly shows that Feature 1 is associated with localized trace element signatures that exist in the sediments only in association with the fossil bones, which suggests a lack of postdepositional mobilization of the fossils and sediments. We additionally have included a fuller description of XRF methods.

      To clarify the relation of all results to the features described in this study, we removed the geochemical and sedimentological samples from other sites within the Dinaledi Subsystem. These localities within the fissure network represent only surface collection of sediment, as no excavation results are available from those sites to allow for comparison in the context of assessing evidence of burial. These were initially included for comparison, but have now been removed to avoid confusion.  

      Micromorphology of sediments:

      Some referees (1, 3, and 4) and other commentators (including Martinón-Torres et al. 2024) have suggested that the previous manuscript was deficient due to an insufficient inclusion of micromorphological analysis of sediments. Because these commentators have emphasized this kind of evidence as particularly important, we review here what we have included and how our revision has addressed this comment. Previous work in the Dinaledi Chamber (Dirks et al., 2015; 2017) included thin section illustrations and analysis of sediment facies, including sediments in direct association with H. naledi remains within the Puzzle Box area. The previous work by Wiersma and coworkers (2020) used micromorphological analysis as one of several approaches to test the formation history of Unit 3 sediments in the Dinaledi Subsystem, leading to the interpretation of autobrecciation of earlier Unit 1 sediment. In the previous version of this manuscript we provided citations to this earlier work. The previous manuscript also provided new thin section illustrations of Unit 3 sediment near Dinaledi Feature 1 to place the disrupted layer of orange sediment (now designated the laminated orange silty mudstone unit) into context. 

      In the new revised manuscript we have added to this information in three ways. First, as noted above in response to reviewer 4, we have revised and added to our discussion of micromorphology within and adjacent to the Dinaledi Feature 1. Second, we have included more discussion in the Supplementary Information of previous descriptions of sediment facies and associated thin section analysis, with illustrations from prior work (CC-BY licensed) brought into this paper as supplementary figures, so that readers can examine these without following the citations. Third, we have included Figure 10 in the manuscript which includes six panels with microtomographic sections from the Hill Antechamber Feature. This figure illustrates the consistency of sub-unit 3b sediment in direct contact with H. naledi skeletal material, including anatomically associated skeletal elements, with previous analyses that demonstrate the angular outlines and chaotic orientations of LORM clasts. It also shows density contrasts of sediment in immediate contact with some skeletal elements, the loose texture of this sediment with air-filled voids, and apparent invertebrate burrowing activity. To our knowledge this is the first application of microtomography to sediment structure in association with a Pleistocene burial feature. 

      To forestall possible comments that the revised manuscript does not sufficiently employ micromorphological observations, or that any one particular approach to micromorphology is the standard, we present here some context from related studies of evidence from other research groups working at varied sites in Africa, Europe, and Asia. Hodgkins et al. (2021) noted: “Only a handful of micromorphological studies have been conducted on human burials and even fewer have been conducted on suspected burials from Paleolithic or hunter-gatherer contexts.” In that study, one supplementary figure with four photomicrographs of thin sections of sediments was presented. Interpretation of the evidence for a burial pit by Hodgkins et al. (2021) noted the more open microstructure of sediment but otherwise did not rely upon the thin section data in characterizing the sediments associated with grave fill. Martinón-Torres et al. (2021) included one Extended Data figure illustrating thin sections of sediments and bone, with two panels showing sediments (the remainder showing bone histology). The micromorphological analysis presented in the supplementary information of that paper was restricted to description of two microfacies associated with the proposed “pit” in that study. That study did carry out microCT scanning of the partially-prepared skeletal remains but did not report any sediment analysis from the microtomographic results. Maloney et al. (2022) reported no micromorphological or thin section analysis. Pomeroy et al. (2020a) included one illustration of a thin section; this study may be regarded as a preliminary account rather than a full description of the work undertaken. Goldberg et al. (2017) analyzed the geoarchaeology of the Roc de Marsal deposits in which possible burial-associated sediments had been fully excavated in the 1960s, providing new morphological assessments of sediment facies; the supplementary information to this work included five scans (not microscans) of sediment thin sections and no microphotographs. Fewlass et al. (2023) presented no thin section or micromorphological illustrations or methods. In summary of this research, we note that in one case micromorphological study provided observations that contributed to the evidence for a pit, in other cases micromorphological data did not test this hypothesis, and many researchers do not apply micromorphological techniques in their particular contexts.

      Sediment micromorphology is a growing area of research and may have much to provide to the understanding of ancient burial evidence as its standards continue to develop (Pomeroy et al. 2020b). In particular, microtomographic analysis of sediments, as we have initiated in this study, may open new horizons that are not possible with more destructive thin-section preparation. In this manuscript, the thin section data reveal valuable evidence about the disruption of sediment structure by features within the Dinaledi Chamber, and microtomographic analysis further documents that the Hill Antechamber Feature reflects similar processes, in addition to possible post-burial diagenesis and invertebrate activity. Following up in detail on these processes will require further analysis outside the scope of this manuscript.

      Access into the Dinaledi Subsystem:

      Reviewer 1 emphasizes the difficulty of access into the Dinaledi Subsystem as a reason why the burial hypothesis is not parsimonious. Similar comments have been made by several outside commentators who question whether past accessibility into the Dinaledi Subsystem may at one time have been substantially different from the situation documented in previous work. Several pieces of evidence are relevant to these questions and we have included some discussion of them in the Introduction, and additionally include a section in the Supplementary Information (“Entrances to the cave system”) to provide additional context for these questions. Homo naledi remains are found not only within the Dinaledi Subsystem but also in other parts of the cave system including the Lesedi Chamber, which is similarly difficult for non-expert cavers to access. The body plan, mass, and specific morphology of H. naledi suggest that this species would be vastly more suited to moving and climbing within narrow underground passages than living people. On this basis it is not unparsimonious to suggest that the evidence resulted from H. naledi activity within these spaces. We note that the accessibility of the subsystem is not strictly relevant to the hypothesis of cultural burial, although the location of the remains does inform the overall context which may reflect a selection of a location perceived as special in some way. 

      Stuffing bodies down the entry to the subsystem:

      Reviewer 3 suggests that one explanation for the emplacement of articulated remains at the top of the sloping floor of the Hill Antechamber is that bodies were “stuffed” into the chute that comprises the entry point of the subsystem and passively buried by additional accumulation of remains. This was one hypothesis presented in earlier work (Dirks et al. 2015) and considered there as a minimal explanation because it did not entail the entry of H. naledi individuals into the subsystem. The further exploration (Elliott et al. 2021) and ongoing survey work, as well as this manuscript, have all resulted in data that reject this hypothesis. The revised manuscript includes a section in the Results (“Deposition upon a talus with passive burial”) that examines this hypothesis in light of the data.

      Recognition of pits:

      Referees 3 and 4 and several additional commentators have emphasized that the recognition of pit features is necessary to the hypothesis of burial, and questioned whether the data presented in the manuscript were sufficient to demonstrate that pits were present. We have revised the manuscript in several ways to clarify how all the different kinds of evidence from the subsystem test the hypothesis that pits were present. This includes the presentation of a minimal definition of burial to include a pit dug by hominins, criteria for recognizing that a pit was present, and an evaluation of the evidence in each case to make clear how the evidence relates to the presence of a pit and subsequent infill. As referee 3 notes, it can be challenging to recognize a pit when sediment is relatively homogeneous. This point was emphasized in the review by Pomeroy and coworkers (2020b), who reflected on the difficulty of seeing evidence for shallow pits constructed by hominins, and we have cited this in the main text. As a result, the evidence for pits has been a recurrent topic of debate for most Pleistocene burial sites. However, in addition to the sedimentological and contextual evidence in the cases we describe, the current version also reflects upon other possible mechanisms for the accumulation of bones or bodies. The data show that the sedimentary fill associated with the H. naledi remains in the cases we examine could not have passively accumulated slowly and is not indicative of mass movement by slumping or other high-energy flow. To further put these results into context, we added a section to the Discussion that briefly reviews prior work on distinguishing pits in Pleistocene burial contexts, including the substantial number of sites with accepted burial evidence for which no evidence of a pit is present.

      Extent of articulation and anatomical association:

      We have added significantly greater detail to the descriptions of articulated remains and orientation of remains in order to describe more specifically the configuration of the skeletal material. We also provide 14 figures in main text (13 of them new) to illustrate the configuration of skeletal remains in our data. For the Puzzle Box area, this now includes substantial evidence on the individuation of skeletal fragments, which enables us to illustrate the spatial configuration of remains associated with the DH7 partial skeleton, as well as the spatial position of fragments refitted as part of the DH1, DH2, DH3, and DH4 crania. For Dinaledi Feature 1 and the Hill Antechamber Feature we now provide figures that key skeletal parts as identified, including material that is unexcavated where possible, and a skeletal part representation figure for elements excavated from Dinaledi Feature 1. 

      Archaeothanatology:

      Reviewer 2 suggests that a greater focus on the archaeothanatology literature would be helpful to the analysis, with specific reference to the sequence of joint disarticulation, the collapse of sediment and remains into voids created by decomposition, and associated fragmentation of the remains. In the revised manuscript we have provided additional analysis of the Hill Antechamber Feature with this approach in mind. This includes greater detail and illustration of our current hypothesis for individuation of elements. We now discuss a hypothesis of body disposition, describe the persistent joints and articulation of elements, and examine likely decomposition scenarios associated with these remains. Additionally, we expand our description and illustration of the orientation of remains and degree of anatomical association and articulation within Dinaledi Feature 1. For this feature and for the Hill Antechamber Feature we have revised the text to describe how fracturing and crushing patterns are consistent with downward pressure from overlying sediment and material. In these features, postdepositional fracturing occurred subsequent to the decomposition of soft tissue and partial loss of organic integrity of the bone. We also indicate that the loss by postdepositional processes of most long bone epiphyses, vertebral bodies, and other portions of the skeleton less rich in cortical bone, poses a challenge for testing the anatomical associations of the remaining elements. This is a primary reason why we have taken a conservative approach to identification of elements and possible associations. 

      A further aspect of the site revealed by our analysis is the selective reworking of sediments within the Puzzle Box area subsequent to the primary deposition of some bodies. The skeletal evidence from this area includes body parts with elements in anatomical association or articulation, juxtaposed closely with bone fragments at varied pitch and orientation. This complexity of events evidenced within this area is a challenge for approaches that have been developed primarily based on comparative data from single-burial situations. In these discussions we deepen our use of references as suggested by the referee.   

      Burial positions:

      Reviewer 2 further suggests that illustrations of hypothesized burial positions would be valuable. We recognize that a hypothesized burial position may be an appealing illustration, and that some recent studies have created such illustrations in the context of their scientific articles. However, such illustrations generally include a great deal of speculation and artistic imagination, and tend to have an emotive character. We have added more discussion to the manuscript of possible primary disposition in the case of the Hill Antechamber Feature as discussed above. We have not created new illustrations of hypothesized burial positions for this revision.

      Carnivore involvement:

      Referee 1 suggests that the manuscript should provide further consideration of whether carnivore activity may have introduced bones or bodies into the cave system. The reorganized Introduction now includes a review of previous work, with an expanded discussion in the Supplementary Information (“Hypotheses tested in previous work”). This includes a review of literature on the topic of carnivore accumulation and the evidence from the Dinaledi and Lesedi Chambers that rejects this hypothesis.

      Water transport and mud:

      The eLife referees broadly accepted previous work showing that water inundation or mass flow of water-saturated sediment did not occur within the history of Unit 2 and 3 sediments, including those associated with H. naledi remains. However several outside commentators did refer specifically to water flow or mud flow as a mechanism for slumping of deposits and possible sedimentary covering of the remains. To address these comments we have added a section to the Supplementary Information (“Description of the sedimentary deposits of the Dinaledi Subsystem”) that reviews previous work on the sedimentary units and formation processes documented in this area. We also include a subsection specifically discussing the term “mud” as used in the description of the sedimentology within the system, as this term has clearly been confusing for nonspecialists who have read and commented on the work. We appreciate the referees’ attention to the previous work and its terminology.

      Redescription of areas of the cave system:

      Reviewer 1 suggests that a detailed reanalysis of all portions of the cave system in and around the Dinaledi Subsystem is warranted to reject the hypothesis that bodies entered the space passively and were scattered from the floor by natural (i.e. noncultural) processes. The referee suggests that National Geographic could help us with these efforts. To address this comment we have made several changes to the manuscript. As noted above, we have added material in Supplementary Information to review the geochronology of the Dinaledi Subsystem and nearby Dragon’s Back Chamber, together with a discussion of the connections between these spaces. 

      Most directly in response to this comment we provide additional documentation of the possibility of movement of bodies or body parts by gravity within the subsystem itself. This includes detailed floor maps based on photogrammetry and LIDAR measurement, where these are physically possible, presented in Figures 2 and 3. In some parts of the subsystem the necessary equipment cannot be used due to the extremely confined spaces, and for these areas our maps are based on traditional survey methods. In addition to plan maps we have included a figure showing the elevation of the subsystem floor in a cross-section that includes key excavation areas, showing their relative elevation. All figures that illustrate excavation areas are now keyed to their location with reference to a subsystem plan. These data have been provided in previous publications but the visualization in the revised manuscript should make the relationship of areas clear for readers. The Introduction now includes text that discusses the configuration of the Hill Antechamber, Dinaledi Chamber, and nearby areas, and also discusses the instances in which gravity-driven movement may be possible, at the same time reviewing that gravity-driven movement from the entry point of the subsystem to most of the localities with hominin skeletal remains is not possible. 

      Within the Results, we have added a section on the relationship of features to their surroundings in order to assist readers in understanding the context of these bone-bearing areas and the evidence this context brings to the hypothesis in question. We have also included within this new section a discussion of the discrete nature of these features, a question that has been raised by outside commentators. 

      Passive sedimentation upon a cave floor or within a natural depression:

      Reviewer 3 suggests that the situation in the Dinaledi Subsystem may be similar to a European cave where a cave bear skeleton might remain articulated on a cave floor (or we can add, within a hollow for hibernation), later to be covered in sediment. The reviewer suggests that articulation is therefore no evidence of burial, and suggests that further documentation of disarticulation processes is essential to demonstrating the processes that buried the remains. We concur that articulation by itself is not sufficient evidence of cultural burial. To address this comment we have included a section in the Results that tests the hypothesis that bodies were exposed upon the cave floor or within a natural depression. To a considerable degree, additional data about disarticulation processes subsequent to deposition are provided in our reanalysis of the Puzzle Box area, including evidence for selective reworking of material after burial. 

      Postdepositional movement and floor drains:

      Reviewer 3 notes that previous work has suggested that subsurface floor drains may have caused some postdepositional movement of skeletal remains. The hypothesis of postdepositional slumping or downslope movement has also been discussed by some external commentators (including Martinón-Torres et al. 2024). We have addressed this question in several places within the revised manuscript. As we now review, previous discussion of floor drains attempted to explain the subvertical orientation of many skeletal elements excavated from the Puzzle Box area. The arrangement of these bones reflects reworking as described in our previous work, and without considering the possibility of reworking by hominins, one mechanism that conceivably might cause reworking was downward movement of sediments into subsurface drains. Further exploration and mapping, combined with additional excavation into the sediments beneath the Puzzle Box area provided more information relevant to this hypothesis. In particular this evidence shows that subsurface drains cannot explain the arrangement of skeletal material observed within the Puzzle Box area. As now discussed in the text, the reworking is selective and initiated from above rather than below. This is best explained by hominin activity subsequent to burial. 

      In a new section of the Results we discuss slumping as a hypothesis for the deposition of the remains. This includes discussion of downslope movement within the Hill Antechamber and the idea that floor drains may have been a mechanism for sediment reworking in and around the Puzzle Box area and Dinaledi Feature 1. As described in this section the evidence does not support these hypotheses. 

      Hypothesis testing and parsimony:

      Referees 1 and 3 and the editorial guidance all suggested that a more appropriate presentation would adopt a null hypothesis and test it. The specific suggestion that the null hypothesis should be a natural sedimentary process of deposition was provided not only by these reviewers but also by some outside commentators. To address this comment, we have edited the manuscript in two ways. The first is the addition of a section to the Discussion that specifically discusses hypothesis testing and parsimony as related to Pleistocene evidence of cultural burial. This includes a brief synopsis of recent disciplinary conversations and citation of work by other groups of authors, none of whom adopted this “null hypothesis” approach in their published work. 

      As we now describe in the manuscript, previous work on the Dinaledi evidence never assumed any role for H. naledi in the burial of remains. Reading the reviewer reports caused us to realize that this previous work had followed exactly the “null hypothesis” approach that some suggested we follow. By following this null hypothesis approach, we neglected a valuable avenue of investigation. In retrospect, we see how this approach impeded us from understanding the pattern of evidence within the Puzzle Box area. Thus in the revised manuscript we have mentioned this history within the Discussion and also presented more of the background to our previous work in the Introduction. Hopefully by including this discussion of these issues, the manuscript will broaden conversation about the relation of parsimony to these issues. 

      Language and presentation style:

      Reviewer 4 criticizes our presentation, suggesting that the text “gives the impression that a hypothesis was formulated before data were collected.” Other outside commentators have mentioned this notion also, including Martinón-Torres et al. (2024) who suggest that the study began from a preferred hypothesis and gathered data to support it. The accurate communication of results and hypotheses in a scientific article is a broader issue than this one study. Preferences about presentation style vary across fields of study as well as across languages. We do not regret using plain language where possible. In any study that combines data and methods from different scientific disciplines, the use of plain language is particularly important to avoid misunderstandings where terms may mean different things in different fields. 

      The essential question raised by these comments is whether it is appropriate to present the results of a study in terms of the hypothesis that is best supported. As noted above, we read carefully many recent studies of Pleistocene burial evidence. We note that in each of these studies that concluded that burial is the best hypothesis, the authors framed their results in the same way as our previous manuscript: an introduction that briefly reviews background evidence for treatment of the dead, a presentation of results focused on how each analysis supports the hypothesis of burial for the case, and then in some (but not all) cases discussion of why some alternative hypotheses could be rejected. We do not infer from this that these other studies started from a presupposition and collected data only to confirm it. Rather, this is a simple matter of presentation style. 

      The alternative to this approach is to present an exhaustive list of possible hypotheses and to describe how the data relate to each of them, at the end selecting the best. This is the approach that we have followed in the revised manuscript, as described above under the direction of the reviewer and editorial guidance. This approach has the advantage of bringing together evidence in different combinations to show how each data point rejects some hypotheses while supporting others. It has the disadvantage of length and repetition. 

      Possible artifact:

      We have chosen to keep the description of the possible artifact associated with the Hill Antechamber Feature in the Supplementary Information. We do this while acknowledging that this is against the opinion of reviewer 4, who felt the description should be removed unless the object in question is fully excavated and physically analyzed. The previous version of the manuscript did not rely upon the stone as positive evidence of grave goods or symbolic content, and it noted that the data do not test whether the possible artifact was placed or was intentionally modified. However this did not satisfy reviewer 4, and some outside commentators likewise asserted that the object must be a “geofact” and that it should be removed. 

      We have three arguments against this line of thinking. First, we do not omit data from our reporting. Whether Homo naledi shaped the rock or not, used it as a tool or not, whether the rock was placed with the body or not, it is unquestionably there. Omitting this one object from the report would be simply dishonest. Second, the data on this rock are at 16 micron resolution. While physical inspection of its surface may eventually reveal trace evidence and will enable better characterization of the raw material, no mode of surface scanning will produce better evidence about the object’s shape. Third, the position of this possible artifact within the feature provides significant information about the deposition of the skeletal material and associated sediments. The pitch, orientation, and position of the stone are not consistent with slow deposition but are consistent with the hypothesis that the surrounding sediment was rapidly emplaced at the same time as the articulated elements less than 2 cm away.

      In the current version, we have redoubled our efforts to provide information about the position and shape of this stone while not presupposing the intentionality of its shape or placement. We add here that the attitude expressed by referee 4 and other commentators, if followed at other sites, would certainly lead to the loss or underreporting of evidence, which we are trying to avoid.  

      Consistency versus variability of behavior:

      As described in the revised manuscript, different features within the Dinaledi Subsystem exhibit some shared characteristics. At the same time, they vary in positioning, representation of individuals and extent of commingling. Other localities within the subsystem and broader cave system present different evidence. Some commentators have questioned whether the patterning is consistent with a single common explanation, or whether multiple explanations are necessary. To address this line of questioning, we have added several elements to the manuscript. We created a new section on secondary cultural burial, discussing whether any of the situations may reflect this practice. In the Discussion, we briefly review the ways in which the different features support the involvement of H. naledi without interpreting anything about the intentionality or meaning of the behavior. We further added a section to the Discussion to consider whether variation among the features reflects variation in mortuary practices by H. naledi. One aspect of this section briefly cites variation in the location and treatment of skeletal remains at other sites with evidence of burial. 

      Grave goods:

      Some commentators have argued that grave goods are a necessary criterion for recognizing evidence of ancient burial. We added a section to the Discussion to review evidence of grave goods at other Pleistocene sites where burial is accepted. 

      References:

      • Dirks, P. H., Berger, L. R., Roberts, E. M., Kramers, J. D., Hawks, J., Randolph-Quinney, P. S., Elliott, M., Musiba, C. M., Churchill, S. E., de Ruiter, D. J., Schmid, P., Backwell, L. R., Belyanin, G. A., Boshoff, P., Hunter, K. L., Feuerriegel, E. M., Gurtov, A., Harrison, J. du G., Hunter, R., … Tucker, S. (2015). Geological and taphonomic context for the new hominin species Homo naledi from the Dinaledi Chamber, South Africa. eLife, 4, e09561. https://doi.org/10.7554/eLife.09561

      • Dirks, P. H., Roberts, E. M., Hilbert-Wolf, H., Kramers, J. D., Hawks, J., Dosseto, A., Duval, M., Elliott, M., Evans, M., Grün, R., Hellstrom, J., Herries, A. I., Joannes-Boyau, R., Makhubela, T. V., Placzek, C. J., Robbins, J., Spandler, C., Wiersma, J., Woodhead, J., & Berger, L. R. (2017). The age of Homo naledi and associated sediments in the Rising Star Cave, South Africa. eLife, 6, e24231. https://doi.org/10.7554/eLife.24231

      • Elliott, M., Makhubela, T., Brophy, J., Churchill, S., Peixotto, B., Feuerriegel, E., Morris, H., Van Rooyen, D., Ramalepa, M., Tsikoane, M., Kruger, A., Spandler, C., Kramers, J., Roberts, E., Dirks, P., Hawks, J., & Berger, L. R. (2021). Expanded Explorations of the Dinaledi Subsystem, Rising Star Cave System, South Africa. PaleoAnthropology, 2021(1), 15–22. https://doi.org/10.48738/2021.iss1.68

      • Fewlass, H., Zavala, E. I., Fagault, Y., Tuna, T., Bard, E., Hublin, J.-J., Hajdinjak, M., & Wilczyński, J. (2023). Chronological and genetic analysis of an Upper Palaeolithic female infant burial from Borsuka Cave, Poland. iScience, 26(12). https://doi.org/10.1016/j.isci.2023.108283

      • Foecke, Kimberly K., Queffelec, Alain, & Pickering, Robyn. (n.d.). No Sedimentological Evidence for Deliberate Burial by Homo naledi – A Case Study Highlighting the Need for Best Practices in Geochemical Studies Within Archaeology and Paleoanthropology. PaleoAnthropology, 2024. https://doi.org/10.48738/202x.issx.xxx

      • Goldberg, P., Aldeias, V., Dibble, H., McPherron, S., Sandgathe, D., & Turq, A. (2017). Testing the Roc de Marsal Neandertal “Burial” with Geoarchaeology. Archaeological and Anthropological Sciences, 9(6), 1005–1015. https://doi.org/10.1007/s12520-013-0163-2

      • Maloney, T. R., Dilkes-Hall, I. E., Vlok, M., Oktaviana, A. A., Setiawan, P., Priyatno, A. A. D., Ririmasse, M., Geria, I. M., Effendy, M. A. R., Istiawan, B., Atmoko, F. T., Adhityatama, S., Moffat, I., Joannes-Boyau, R., Brumm, A., & Aubert, M. (2022). Surgical amputation of a limb 31,000 years ago in Borneo. Nature, 609(7927), 547–551. https://doi.org/10.1038/s41586-022-05160-8

      • Martinón-Torres, M., d’Errico, F., Santos, E., Álvaro Gallo, A., Amano, N., Archer, W., Armitage, S. J., Arsuaga, J. L., Bermúdez de Castro, J. M., Blinkhorn, J., Crowther, A., Douka, K., Dubernet, S., Faulkner, P., Fernández-Colón, P., Kourampas, N., González García, J., Larreina, D., Le Bourdonnec, F.-X., … Petraglia, M. D. (2021). Earliest known human burial in Africa. Nature, 593(7857), Article 7857. https://doi.org/10.1038/s41586-021-03457-8

      • Martinón-Torres, M., Garate, D., Herries, A. I. R., & Petraglia, M. D. (2023). No scientific evidence that Homo naledi buried their dead and produced rock art. Journal of Human Evolution, 103464. https://doi.org/10.1016/j.jhevol.2023.103464

      • Pomeroy, E., Bennett, P., Hunt, C. O., Reynolds, T., Farr, L., Frouin, M., Holman, J., Lane, R., French, C., & Barker, G. (2020a). New Neanderthal remains associated with the ‘flower burial’ at Shanidar Cave. Antiquity, 94(373), 11–26. https://doi.org/10.15184/aqy.2019.207

      • Pomeroy, E., Hunt, C. O., Reynolds, T., Abdulmutalb, D., Asouti, E., Bennett, P., Bosch, M., Burke, A., Farr, L., Foley, R., French, C., Frumkin, A., Goldberg, P., Hill, E., Kabukcu, C., Lahr, M. M., Lane, R., Marean, C., Maureille, B., … Barker, G. (2020b). Issues of theory and method in the analysis of Paleolithic mortuary behavior: A view from Shanidar Cave. Evolutionary Anthropology: Issues, News, and Reviews, 29(5), 263–279. https://doi.org/10.1002/evan.21854

      • Robbins, J. L., Dirks, P. H. G. M., Roberts, E. M., Kramers, J. D., Makhubela, T. V., Hilbert-Wolf, H. L., Elliott, M., Wiersma, J. P., Placzek, C. J., Evans, M., & Berger, L. R. (2021). Providing context to the Homo naledi fossils: Constraints from flowstones on the age of sediment deposits in Rising Star Cave, South Africa. Chemical Geology, 567, 120108. https://doi.org/10.1016/j.chemgeo.2021.120108

      • Wiersma, J. P., Roberts, E. M., & Dirks, P. H. G. M. (2020). Formation of mud clast breccias and the process of sedimentary autobrecciation in the hominin-bearing (Homo naledi) Rising Star Cave system, South Africa. Sedimentology, 67(2), 897–919. https://doi.org/10.1111/sed.12666

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work maps the synaptic connectivity between the inputs and outputs of the song premotor nucleus HVC in zebra finches to understand how sensory (auditory) and motor circuits interact to coordinate song production and learning. The authors optimized an AAV-based optogenetic technique to manipulate inputs from specific auditory areas one at a time, and recorded synaptic activity with whole-cell recordings in slice preparations, identifying each neuron's projection target by retrograde tracing. This thorough and detailed analysis provides compelling evidence of synaptic connections from 4 major auditory inputs (3 forebrain regions and 1 thalamic region) onto three classes of projection neurons in HVC; all areas provide monosynaptic excitatory inputs and polysynaptic inhibitory inputs, but the proportion of connections onto each class of projection neuron varied. They also find specific reciprocal connections between mMAN and Av. Taken together, the authors provide a map of the synaptic connections between intercortical sensory and motor areas that are suggested to be involved in zebra finch song production and learning.

      Strengths:

      The authors optimized optogenetic tools with eGtACR1 by using AAV, which allows them to manipulate synaptic inputs in a projection-specific manner in zebra finches. They also identify HVC cell types based on projection area. With their technical advance and thorough experiments, they provide a detailed map of synaptic connections.

      Weaknesses:

      Because the study was done in brain slices, the functional implications of the synaptic connectivity are limited. In particular, as all the experiments were done in adult preparations, there could be a gap when discussing functions in developmental song learning.

      We thank the reviewer for their appreciation of our work. Although we agree that there can be limitations to brain slice preparations, the approaches used here for synaptic connectivity mapping are well-designed to identify long-range synaptic connectivity patterns. Optogenetic stimulation of axon terminals in brain slices does not require intact axons and works well when axons are cut, allowing identification of all inputs expressing optogenetic channels from afferent regions. Terminal stimulation in slices yields stable post-synaptic responses for hours without rundown, assuring that polysynaptic and monosynaptic connections can be reliably identified in our brain slices. Additionally, conducting similar types of experiments in vivo can run into important limitations. First, the extent of TTX and 4-AP diffusion, which is necessary for identification of long-range monosynaptic connections, can be difficult to verify in vivo, potentially confounding identification of monosynaptic connectivity. Second, conducting whole-cell patch-clamp experiments in vivo, particularly in deeper brain regions, is technically challenging, and would limit the number of cells that can be patched and increase the number of animals needed.

      We agree that there may well be important differences between adult connectivity and connectivity patterns in the juvenile brain. Indeed, learning and experience during development almost certainly shape connectivity patterns and these patterns of connectivity may change incrementally and/or dynamically during development. Ultimately, adult connectivity patterns are the result of changes in the brain that accrue over development. Given that this is the first study mapping long-range connectivity of HVC input-output pathways, we reasoned that the adult connectivity would provide a critical reference allowing future studies to map different stages of juvenile connectivity and the changes in connectivity driven by milestones like forming a tutor song memory, sensorimotor learning, and song crystallization.

      In this revision we worked to better highlight the points raised above and thank the reviewer for their comments.

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes synaptic connectivity in the songbird cortex's four main classes of sensory neuron afferents onto three known classes of projection neurons of the pre-motor cortical region HVC. HVC is a region associated with the generation of learned bird songs. Investigators here use all male zebra finches to examine the functional anatomy of this region using patch clamp methods combined with optogenetic activation of select neuronal groups.

      Strengths:

      The quality of the recordings is extremely high and the quantity of data is on a very significant scale; this will certainly aid the field.

      Weaknesses:

      The authors could make the figures a little easier to navigate. Most of the figures use actual anatomical images, but it would be nice to have these linked with a zebra finch atlas in more of a cartoon format accompanying each fluorescence image. Additionally, for the most part, figures showing the labeling lack scale bar values (in µm). These should be added to the figures, not just shown in the legends.

      The authors could make it clear in the abstract that this is all male zebra finches - perhaps this is obvious given the bird song focus, but it should be stated. The number of recordings from each neuron class and the overall number of birds employed should be clearly stated in the methods (this is in the figures, but it should say n=birds or cells as appropriate).

      The authors should consider sharing the actual electrophysiology records as data.

      We thank the reviewer for their assessment of our research and suggestions. We have implemented many of these suggestions and provide details in our response to their specific Recommendations. Additionally, we are organizing our data and will make it publicly available with the version of record.

      Reviewer #3 (Public review):

      Nucleus HVC is critical both for song production as well as learning and arguably, sitting at the top of the song control system, is the most critical node in this circuit, receiving a multitude of inputs and sending precisely timed commands that determine the temporal structure of song. The complexity of this structure and its underlying organization seem to become more apparent with each experimental manipulation, and yet our understanding of the underlying circuit organization remains relatively poor. In this study, Trusel and Roberts use classic whole-cell patch clamp techniques in brain slices coupled with optogenetic stimulation of select inputs to provide a careful characterization and quantification of synaptic inputs into HVC. By identifying individual projection neurons using retrograde tracer injections combined with pharmacological manipulations, they classify monosynaptic inputs onto each of the three main classes of glutamatergic projection neurons in HVC (RA-, Area X- and Av-projecting neurons). This study is remarkable in the amount of information that it generates, and the tremendous labor involved for each experiment, from the expression of opsins in each of the target inputs (Uva, NIf, mMAN, and Av), the retrograde labelling of each type of projection neuron, and ultimately the optical stimulation of infected axons while recording from identified projection neurons. Taken together, this study makes an important contribution to increasing our identification, and ultimately understanding, of the basic synaptic elements that make up the circuit organization of HVC, and how external inputs, which we know to be critical for song production and learning, contribute to the intrinsic computations within this critical circuit.

      This study is impressive in its scope, rigorous in its implementation, and thoughtful regarding its limitations. The manuscript is well-written, and I appreciate the clarity with which the authors use our latest understanding of the evolutionary origins of this circuit to place these studies within a larger context and their relevance to the study of vocal control, including human speech. My comments are minor and primarily about legibility, clarification of certain manipulations, and organization of some of the summary figures.

      We thank the reviewer for their thoughtful assessment of our research.

      Recommendations for the authors:

      The following recommendations were considered by all reviewers to be important to incorporate for improving this paper:

      (1) Clarify the site of viral injection and the possibility of labeling other structures

      a) Show images of viral injection sites.

      We provide a representative image of viral expression for each pathway studied in this manuscript. Please see panel A in Figures 2-3 and 5-6 showing our viral expression in Uva, NIf, mMAN, and Av respectively.  

      b) Include in discussion caveats that the virus may spread beyond the boundaries of structures (e.g. especially injections into NIf could spread into Field L).

      For each HVC afferent nucleus we have now included a sentence in the Results describing the possible spread of viral infection into surrounding structures. We have also expanded the image from the Av section to include NIf, to showcase the lack of viral expression in NIf (see Fig. 6A).

      (2) Clarify the logic and precise methods of the TTX and 4-AP experiments

      a) Please see the detailed issue raised by Reviewer 3, Major Point 1 below.

      Application of TTX and 4AP is the gold standard for opsin-assisted synaptic circuit interrogation, pioneered by the Svoboda lab in 2009 (Petreanu, Mao et al. 2009) and widely used to assess monosynaptic connectivity in multiple brain circuits, as summarized in a recent review (Linders, Supiot et al. 2022). We now better describe the logic of this approach in the second paragraph of the Results section and cite the first description of this method from the Svoboda lab and a recent review weighing this method against other optogenetic methods for tracing synaptic connections in the brain.

      (3) Include caveats in discussion

      a) Note that there may be other inputs to HVC that were not examined in this study (e.g. CMM, Field L)

      In our original manuscript we did state “Although a complete description of HVC circuitry will require the examination of other potential inputs (i.e. RA<sub>HVC</sub> PNs, A11 glutamatergic neurons (Roberts, Klein et al. 2008, Ben-Tov, Duarte et al. 2023)) and a characterization of interneuron synaptic connectivity, here we provide a map of the synaptic connections between the 4 best described afferents to HVC and its 3 populations of projection neurons” in the last paragraph of the Discussion. We have now edited this sentence to include the projection from NCM to HVC and cited Louder et al., 2024.

      We have extensively mapped input pathways to HVC, and consistent with Vates (Vates, Broome et al. 1996) we have not found evidence that Field L projects to HVC. Rather, it projects to the shelf region outside of HVC. Consistent with this, we do not see retrogradely labeled neurons in Field L following tracer injections confined to HVC (see Fig. 3G). Additionally, we find that CM projections to HVC arise from the nucleus Avalanche (Roberts, Hisey et al. 2017), which we specifically examine in this study. We do not dispute that there may be other pathways projecting to HVC that will need to be examined in the future, including known projections from neuromodulatory regions and RA, from developmentally restricted pathway(s) like NCM (Louder, Kuroda et al. 2024), and from yet unidentified pathways.

      b) Also note that birds in this study were adults and that some inputs to HVC likely to be important for learning may recede during development (e.g. Louder et al, 2024).

      In the second to last paragraph of the Discussion we now state: While our opsin-assisted circuit mapping provides us with a new level of insight into HVC synaptic circuitry, there are limitations to this research that should be considered. All circuit mapping in this study was carried out in brain slices from adult male zebra finches. Future studies will be needed to examine how this adult connectivity pattern relates to patterns of connectivity in juveniles during sensory or sensorimotor phases of vocal learning and connectivity patterns in female birds.   

      (4) Consider cosmetic changes to figures as suggested by Reviewers 2-3 below.

      We thank the reviewers for their suggestions and have implemented the changes as best we can.

      (5) Address all minor issues raised below.

      Reviewer #1 (Recommendations for the authors):

      I see this study is well designed to answer the authors' specific question, mapping synaptic auditory-motor connections within HVC. Their experiments with advanced techniques of projection-specific optogenetic manipulation of synaptic inputs and retrograde identification of projection areas revealed input-output combination selective synaptic mapping.

      As I found this study advanced our knowledge with the compelling dataset, I have only some minor comments here.

      (1) One technical concern is that we don't see how tightly the virus infection was focused on the target area and whether we can ignore the effect of synaptic connectivity from surrounding areas. As the amount of virus they injected is large (1.5 µl) and the target areas are small, we assume the virus might spread to the surrounding area, such as Field L, which also projects to HVC, when targeting NIf. While I think the majority of the projections were from their target areas, it would be better to mention the possibility of projections from surrounding areas (and to show images with larger fields of view).

      We share the reviewer's concern about the specificity of viral expression. For this reason, we included sample images of the viral expression in each target area (panel A in Fig. 2,3,5,6). We have now also included a sentence at the beginning of each subsection of the Results to describe how we ensured interpretability of the results. The areas surrounding Uva and mMAN are not known to project to HVC. Possible cross-infection is an issue for Av and NIf, and we checked each bird’s injection site to ensure that eGtACR1+ cells were not visible in the unintended HVC-projecting areas.

      As mentioned in our response to the public comment, consistent with Vates (Vates, Broome et al. 1996) we do not see evidence that Field L projects directly to HVC (see Fig. 3G).

      (2) Another technical concern is the damage to axonal projections. While I understand the authors stimulated axonal terminals, axonal projections were assumed to be cut and their ability to release neurotransmitters would be reduced, especially after long-term survival or repeated stimulation. Mentioning whether projection pathways were within their 230 µm-thick slice (this probably depends on the input site) and the effect of axonal cuts would be helpful.

      We agree that slice electrophysiology has limitations. However, we disagree with the claim of reduced reliability or stability of the evoked response. We and others find that repeated electrical and optogenetic terminal stimulation in slices can yield stable post-synaptic responses for tens of minutes and even hours (Bliss and Gardner-Medwin 1973, Bliss and Lomo 1973, Liu, Kurotani et al. 2004, Pastalkova, Serrano et al. 2006, Xu, Yu et al. 2009, Trusel, Cavaccini et al. 2015, Trusel, Nuno-Perez et al. 2019). Indeed, long-term synaptic plasticity experiments in most preparations and across brain areas rely on such stability of the presynaptic machinery for synaptic release, despite axons being severed from their parent soma. Our assumption is that the vast majority, if not all, connections between axon terminals and their cell body in the afferent regions have been cut in our preparations. Nonetheless, the diversity of outcomes we report (currents returning after TTX+4AP or not, depending on the specific combination of input and HVC-PN class) is consistent with the robustness of the synaptic interrogation method.

      (3) While I understand this study focused on 4 major input areas and the authors provide good pictures of synaptic HVC connections from those areas, HVC has been reported to receive auditory inputs from other areas as well (CMM, Field L, etc.). It is worth mentioning that there are other auditory inputs and would be interesting to discuss coordination with the inputs from other areas.

      We have extensively mapped input pathways to HVC, and consistent with Vates (Vates, Broome et al. 1996) we have not found evidence that Field L projects to HVC. Rather, it projects to the shelf region outside of HVC. Consistent with this, we do not see retrogradely labeled neurons in Field L following tracer injections confined to HVC (see Fig. 3G). Additionally, we find that CM projections to HVC arise from the nucleus Avalanche (Roberts, Hisey et al. 2017), which we specifically examine in this study. We do not dispute that there may be other pathways projecting to HVC that will need to be examined in the future, including known projections from neuromodulatory regions and RA, from developmentally restricted pathway(s) like NCM (Louder, Kuroda et al. 2024), and from yet unidentified pathways.

      (4) The HVC local neuronal connections have been reported to be modified, and a recent study revealed transient auditory inputs into HVC during the song learning period. The authors discuss the functions of HVC synaptic connections in song learning (the title also refers to synaptic connections for song learning); however, the experiments were done in adults and the authors do not discuss the possibility of different synaptic connection mapping in juveniles during the song learning period. Mentioning the neuronal activities and connectivity changes during song learning is important. Also, it would be helpful for the readers to discuss the potential differences between juveniles and adults if they want to discuss the functions of song learning.

      We now mention in the Discussion that this is an important caveat of our research and that future studies will be needed to examine how these adult connectivity patterns relate to connectivity patterns in juveniles during sensory or sensorimotor phases of vocal learning and connectivity patterns in female birds. Nonetheless, the title and abstract cite song learning because it is important for the broader public to understand that at least some of these afferent brain regions play an essential role in song learning (Foster and Bottjer 2001, Roberts, Gobes et al. 2012, Roberts, Hisey et al. 2017, Zhao, Garcia-Oscos et al. 2019, Koparkar, Warren et al. 2024).

      Reviewer #2 (Recommendations for the authors):

      The work is very detailed and will be an important resource to those working in the field. The recordings are of a high quality and lots of information is included, such as measures of response kinetics and amplitude, and pharmacological confirmation of excitatory and inhibitory synaptic responses. In general, I feel the quality is extremely high and the quantity of data is on a very significant, exhaustive scale that will certainly aid the field. I have come to this conclusion as a non-zebra-finch person, but I feel the connection information shown will be of benefit given its high quality.

      Figure 7 is a nice way of showing the overall organization. Optional suggestion, consider highlighting anything in Figure 7 that results in a new understanding of the song system as compared to previous work on anatomy and function.

      We thank the reviewer for the kind comments about our research. We have highlighted our newly found connection between mMAN and Av; all the connections onto the HVC PNs in Panel B are also newly identified in this study.

      Reviewer #3 (Recommendations for the authors):

      Major points

      (1) Clarification regarding methods for determining monosynaptic events:

      One of the manipulations that I struggled the most with was the use of TTX + 4AP to isolate monosynaptic events. Not being as familiar with the use of optically based photostimulation of axons to release transmitter locally, I was initially confused by statements such as "we found that oEPSC returned after application of TTX+4AP". This might be clear to someone performing these manipulations, but a bit more clarification would be helpful. Should I assume that an existing monosynaptic EPSC would be masked by co-occurring polysynaptic IPSCs which disappear following application of TTX + 4AP, thereby unmasking the monosynaptic EPSC and causing the EPSC to "return"? I am not sure that word works. Continuing my confusion with these experiments, I am unsure how this cocktail of drugs is added, if it is even added as a cocktail, which is what I initially assumed. The methods and the results are not so clear on whether the drugs are added in sequence and why, and whether traces are recorded after the addition of both drugs or recorded for TTX and then again for TTX + 4AP. Finally, looking at the traces in the experimental figures (e.g. Figures 2F, 3F, 5F, and 6F), it is difficult to see what is being shown, at least for me. First, the authors need to describe better in the results why they stimulate twice in short succession and why they seem to use the response to the second pulse (unless I am mistaken) to measure the monosynaptic event. Second, I was confused by the traces (which are very small) in the presence of TTX. I would have expected to see a response if there was a monosynaptic EPSC but I only seem to see a flat line.

      The confusion that I list above might be due in part to my ignorance, but it is important in these types of papers not to assume too much expertise if you want readers with a less sophisticated understanding of synaptic physiology to understand the data. In other words, a little bit more clarity and hand-holding would be welcome.

      We understand the reviewer’s confusion about the methodology. In voltage clamp, the amplifier injects current through the electrode to hold the membrane voltage at -70 mV, which is near the equilibrium potential for Cl-; the only synaptic current evoked by light stimulation is therefore due to cation influx, mainly through AMPA receptors (see Fig. 1). Co-occurring polysynaptic IPSCs would therefore not be visible. We examine those by holding the membrane voltage at +10 mV (see Fig. 1). TTX application blocks voltage-dependent Na+ channels and therefore stops all action potential-dependent neurotransmission. We show the traces recorded in TTX to demonstrate that the currents we recorded prior to TTX application were of synaptic origin, and not due to accidental expression of opsin in the patched cell. This also ensures that any current visible after 4AP application is due to monosynaptic transmission and not to a failure of TTX application.

      After recording with light stimulation in TTX, we then add 4AP, which is a blocker of presynaptic K+ channels. This prevents the repolarization of the terminals that would occur in response to opsin-mediated local depolarization. 4AP application, therefore, allows local opsin-driven depolarizations to reach the threshold for Ca2+-dependent vesicle docking and release. This procedure selectively reveals or unmasks the monosynaptic currents because any non-monosynaptically connected neuron would still need voltage-dependent Na+ channels to effectively produce indirect neurotransmission onto the patched cell. Application of TTX and 4AP is the gold standard for opsin-assisted synaptic circuit interrogation, pioneered by the Svoboda lab in 2009 and widely used to assess monosynaptic connectivity in multiple brain circuits, as summarized in a recent review (Linders et al., 2022). We now include 2 more sentences near the beginning of the Results to clarify this process and directly point to the Linders review for researchers wanting a deeper explanation of this technique.

      The double stimulation is unrelated to our testing of monosynaptic connections. We originally conducted the experiments by delivering 2 pulses of light separated by 50 ms, a common way to examine the paired-pulse ratio (PPR), a physiological measure used to probe synapses for short-term plasticity and release probability. However, through discussions with colleagues we realized that the slow decay time of eGtACR1 may complicate interpretation of the response to the second light pulse. Thus, we elected not to report these results and indicated this in the Methods section: “We calculated the paired-pulse ratio (PPR) as the amplitude of the second peak divided by the amplitude of the first peak elicited by the twin stimuli, however due to slow kinetics of eGtACR1 the results would be difficult to interpret, and therefore we are not currently reporting them.”
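      For readers less familiar with the measure, the PPR definition quoted above can be written out as a minimal sketch. The function name and the example amplitudes are hypothetical and chosen only for illustration; they are not values from this study, which does not report PPR.

```python
def paired_pulse_ratio(first_peak_pa: float, second_peak_pa: float) -> float:
    """Amplitude of the second evoked peak divided by the amplitude of the first.

    Amplitudes are taken as magnitudes, so inward (negative) currents
    recorded at negative holding potentials give a positive ratio.
    """
    if first_peak_pa == 0:
        raise ValueError("first peak amplitude must be non-zero")
    return abs(second_peak_pa) / abs(first_peak_pa)


# Hypothetical amplitudes in pA (illustration only, not data from the study).
example_ppr = paired_pulse_ratio(-200.0, -150.0)
print(example_ppr)
```

      A ratio below 1 is conventionally read as paired-pulse depression and a ratio above 1 as facilitation, which is why a slowly decaying opsin current that contaminates the second peak would make the ratio hard to interpret.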

      (2) Suggestions for improving summary figures:

      Summary Figure 1a: The circuit diagram (schematic to the right of 1a) is OK but I initially found it a bit difficult to interpret. For example, it is not clear why pink RA projecting neurons don't reach as far to the right as X or Av projecting neurons, suggesting that they are not really projection neurons. Also, the big question marks in the intermediate zone are not entirely intuitive. It seems there might be a better way of representing this. It might also be worth stating in the figure legend that the interconnectivity patterns shown in the figure between PNs in HVC are based on specific prior studies.

      We thank the reviewer for the constructive criticism. We have modified the figure to extend the RA projection line and mentioned in the figure legend that connectivity between PNs is based on prior studies.

      Summary Figure 1a: I am not sure I love this figure. There are a few minor issues. First, there are too many browns [Nif/AV and mMAN] which makes it more challenging to clearly disambiguate the different projections. Second, it is unclear why this figure does not represent projections from RA to HVC. My biggest concern with this figure is that it oversimplifies some of the findings. From the figure, one gets the impression that Uva only projects to RA-PNs and that Av only projects to X-PNs even though the authors show connections to other PNs. With the small sample size in this current study for each projection and each PN type, one really cannot rule out that these "minority" projections are not important. I, therefore, suggest that the authors qualitatively represent the strength/probability of connections by weighting with thickness of afferent connections.

      We assume the reviewer is commenting on our summary figure panel 7B. We agree with the referee that this is a simplified representation of our findings. We had indeed indicated in the legend that this was just a “Schematic of the HVC afferent connectivity map resulting from the present work” and that “For conceptualization purposes, afferent connectivity to HVC-PNs is shown only when the rate of monosynaptic connectivity reaches 50% of neurons examined”. We have added a title to highlight that this is but a simplification. We have now adjusted the colors to make the figure easier to follow. Based on the reviewer's critique we searched for a better method for summarizing the complex connectivity patterns described in this research. We settled on a Sankey diagram of connectivity. This is now Figure 7C. In this diagram, we are able to show the proportion of connections from each input pathway onto each class of neuron and whether these connections are poly- or monosynaptic. We find this to be a straightforward way of displaying all of the connectivity patterns identified in our figure 2-3 and 4-5, and we look forward to learning whether the reviewers find this a useful way of illustrating our findings.

      Minor points:

      (1) Line 50 - typo - song circuits.

      Thank you for catching this.

      (2) Line 106 - 111 - The findings suggest that 100% of Uva projections onto HVC-RA neurons are monosynaptic. However, because the authors only tested 6 neurons, their statements that their findings are so different from other studies should be somewhat tempered, since these other studies (e.g. Moll et al.) looked at 251 neurons in HVC and sampling bias could still somewhat explain the difference.

      We observed oEPSCs in 43 of 51 (84.3%) HVC-RA neurons recorded (mean rise time = 2.4 ms) and monosynaptic connections onto 100% of the HVC-RA neurons tested (n = 6). Moll et al. combined electrical stimulation of Uva with two-photon calcium imaging (GCaMP6s) of putative HVC-RA neurons (n = 251 neurons). We should note that these are putative HVC-RA neurons because they were not visually identified using retrograde tracing or some other molecular handle. They report that only ~16% of HVC-RA neurons showed reliable calcium responses following Uva stimulation. Although the experiments by Moll et al. are technically impressive, calcium imaging is an insensitive technique for measuring post-synaptic responses, particularly subthreshold responses, when compared to whole-cell patch-clamp recordings. This approach cannot identify monosynaptic connections and is likely sensitive only to suprathreshold activity, which probably relies on recruitment of other polysynaptic inputs onto the neurons in HVC. Furthermore, as indicated in the Discussion, our opsin-mediated synaptic interrogation recruits any eGtACR1+ Uva terminal in the slice and is therefore highly likely to reveal any existing connections.

      A limitation of whole-cell patch-clamp recording is that it is a laborious, low-throughput technique. Future experiments using better imaging approaches, like voltage imaging, may be able to weigh in on differences between what we report here using whole-cell patch-clamp recordings from visually identified HVC-RA neurons combined with optogenetic manipulations of Uva terminals and the calcium imaging results reported by Moll. Nonetheless, whole-cell patch-clamp recordings combined with optogenetic manipulations are likely to remain the most sensitive method for identifying synaptic connectivity.
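      As an illustration of the sample-size point raised by the reviewer, the sketch below recomputes the reported response rate (43 of 51) and shows how much uncertainty a 6-of-6 result carries. It is not an analysis from the manuscript: the choice of an exact (Clopper-Pearson) 95% interval is our assumption, and the helper names are hypothetical. When every tested cell is positive, the lower bound of that interval has the closed form (alpha/2)^(1/n).

```python
def percent(successes: int, total: int) -> float:
    """Observed proportion expressed as a percentage."""
    return 100.0 * successes / total


def exact_lower_bound_all_successes(n: int, alpha: float = 0.05) -> float:
    """Lower limit of the two-sided Clopper-Pearson interval when all n
    trials are successes: the p that solves p**n = alpha / 2."""
    if n <= 0:
        raise ValueError("n must be positive")
    return (alpha / 2.0) ** (1.0 / n)


# Counts reported in the response above.
print(round(percent(43, 51), 1))                     # oEPSCs in 43 of 51 cells
print(round(exact_lower_bound_all_successes(6), 3))  # 6 of 6 monosynaptic
```

      Under these assumptions, 6 of 6 is compatible with a true monosynaptic connection rate as low as roughly 54%, which is wide but still well above the ~16% responsive fraction cited from Moll et al.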

      (3) Figure 2G - the significance of white circles is not clear.

      The figure legend indicates that the white circles mark the position of “retrogradely labeled HVC-projecting neurons in Uva (cyan, white circles)” to facilitate identification of colocalization with the in-situ markers.

      (4) Line 135 - Cardin et al. (J. Neurophys. 2004) is the first to show that song production does not require NIf.

      We thank the reviewer for pointing this out, and we have cited this important study.

      (5) Line 183 - This is a confusing sentence because I initially thought that mMAN-mMANHVC PNs was a category!

      We replaced the dash with a colon.

      (6) Figure 4d could use some arrows to identify what is shown. It is assumed that the box represents mMAN. Should it be assumed that Av is not in the plane of this section? If not, this should be stated in the legend. It is also unclear where the anterograde projections are. Is this the dark highway that goes from the box to the dorsal surface? If yes, this should be indicated, but it should also be made clear why the projections go in both the dorsal and the ventral directions.

      The inset, as indicated by the lines around it, is a magnification of the terminal fields in Av. We added an explanation of the inset.

      (7) Discussion. In the introduction, the authors mention projections from RA to HVC but never end up studying them in the current manuscript which seems like a missed opportunity and perhaps even a weakness of the study. In the discussion, it would certainly be good for the authors to at least discuss the possible significance of these projections and perhaps why they decided not to study them.

      We thank the reviewer for the comment. Unfortunately, we couldn’t reliably evoke interpretable currents from RA, and we elected to publish the current version of the paper with these 4 major inputs. Nonetheless, we have indicated in the Introduction and in the Discussion that more inputs (e.g. RA, A11, NCM) remain to be evaluated. 

      (8) Line 622 - Is this reference incomplete?

      We thank the reviewer. We have corrected the reference.

      • Ben-Tov, M., F. Duarte and R. Mooney (2023). "A neural hub for holistic courtship displays." Curr Biol 33(9): 1640-1653 e1645.

      • Bliss, T. V. and A. R. Gardner-Medwin (1973). "Long-lasting potentiation of synaptic transmission in the dentate area of the unanaestetized rabbit following stimulation of the perforant path." J Physiol 232(2): 357-374.

      • Bliss, T. V. and T. Lomo (1973). "Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path." J Physiol 232(2): 331-356.

      • Foster, E. F. and S. W. Bottjer (2001). "Lesions of a telencephalic nucleus in male zebra finches: Influences on vocal behavior in juveniles and adults." J Neurobiol 46(2): 142-165.

      • Koparkar, A., T. L. Warren, J. D. Charlesworth, S. Shin, M. S. Brainard and L. Veit (2024). "Lesions in a songbird vocal circuit increase variability in song syntax." Elife 13.

      • Linders, L. E., L. F. Supiot, W. Du, R. D'Angelo, R. A. H. Adan, D. Riga and F. J. Meye (2022). "Studying Synaptic Connectivity and Strength with Optogenetics and Patch-Clamp Electrophysiology." Int J Mol Sci 23(19).

      • Liu, H. N., T. Kurotani, M. Ren, K. Yamada, Y. Yoshimura and Y. Komatsu (2004). "Presynaptic activity and Ca2+ entry are required for the maintenance of NMDA receptor-independent LTP at visual cortical excitatory synapses." J Neurophysiol 92(2): 1077-1087.

      • Louder, M. I. M., M. Kuroda, D. Taniguchi, J. A. Komorowska-Muller, Y. Morohashi, M. Takahashi, M. Sanchez-Valpuesta, K. Wada, Y. Okada, H. Hioki and Y. Yazaki-Sugiyama (2024). "Transient sensorimotor projections in the developmental song learning period." Cell Rep 43(5): 114196.

      • Pastalkova, E., P. Serrano, D. Pinkhasova, E. Wallace, A. A. Fenton and T. C. Sacktor (2006). "Storage of spatial information by the maintenance mechanism of LTP." Science 313(5790): 1141-1144.

      • Petreanu, L., T. Mao, S. M. Sternson and K. Svoboda (2009). "The subcellular organization of neocortical excitatory connections." Nature 457(7233): 1142-1145.

      • Roberts, T. F., S. M. Gobes, M. Murugan, B. P. Olveczky and R. Mooney (2012). "Motor circuits are required to encode a sensory model for imitative learning." Nat Neurosci 15(10): 1454-1459.

      • Roberts, T. F., E. Hisey, M. Tanaka, M. G. Kearney, G. Chattree, C. F. Yang, N. M. Shah and R. Mooney (2017). "Identification of a motor-to-auditory pathway important for vocal learning." Nat Neurosci 20(7): 978-986.

      • Roberts, T. F., M. E. Klein, M. F. Kubke, J. M. Wild and R. Mooney (2008). "Telencephalic neurons monosynaptically link brainstem and forebrain premotor networks necessary for song." J Neurosci 28(13): 3479-3489.

      • Trusel, M., A. Cavaccini, M. Gritti, B. Greco, P. P. Saintot, C. Nazzaro, M. Cerovic, I. Morella, R. Brambilla and R. Tonini (2015). "Coordinated Regulation of Synaptic Plasticity at Striatopallidal and Striatonigral Neurons Orchestrates Motor Control." Cell Rep 13(7): 1353-1365.

      • Trusel, M., A. Nuno-Perez, S. Lecca, H. Harada, A. L. Lalive, M. Congiu, K. Takemoto, T. Takahashi, F. Ferraguti and M. Mameli (2019). "Punishment-Predictive Cues Guide Avoidance through Potentiation of Hypothalamus-to-Habenula Synapses." Neuron 102(1): 120-127.e124.

      • Vates, G. E., B. M. Broome, C. V. Mello and F. Nottebohm (1996). "Auditory pathways of caudal telencephalon and their relation to the song system of adult male zebra finches." Journal of Comparative Neurology 366(4): 613-642.

      • Xu, T., X. Yu, A. J. Perlik, W. F. Tobin, J. A. Zweig, K. Tennant, T. Jones and Y. Zuo (2009). "Rapid formation and selective stabilization of synapses for enduring motor memories." Nature 462(7275): 915-919.

      • Zhao, W., F. Garcia-Oscos, D. Dinh and T. F. Roberts (2019). "Inception of memories that guide vocal learning in the songbird." Science 366: 83 - 89.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Wang et al., recorded concurrent EEG-fMRI in 107 participants during nocturnal NREM sleep to investigate brain activity and connectivity related to slow oscillations (SO), sleep spindles, and in particular their co-occurrence. The authors found SO-spindle coupling to be correlated with increased thalamic and hippocampal activity, and with increased functional connectivity from the hippocampus to the thalamus and from the thalamus to the neocortex, especially the medial prefrontal cortex (mPFC). They concluded the brain-wide activation pattern to resemble episodic memory processing, but to be dissociated from task-related processing and suggest that the thalamus plays a crucial role in coordinating the hippocampal-cortical dialogue during sleep.

      The paper offers an impressively large and highly valuable dataset that provides the opportunity for gaining important new insights into the network substrate involved in SOs, spindles, and their coupling. However, the paper does unfortunately not exploit the full potential of this dataset with the analyses currently provided, and the interpretation of the results is often not backed up by the results presented. I have the following specific comments.

      Thank you for your thoughtful and constructive feedback. We greatly appreciate your recognition of the strengths of our dataset and findings. Below, we address your specific comments and provide responses to each point you raised to ensure our methods and results are as transparent and comprehensible as possible. We hope these revisions address your comments and further strengthen our manuscript.

      (1) The introduction is lacking sufficient review of the already existing literature on EEG-fMRI during sleep and the BOLD-correlates of slow oscillations and spindles in particular (Laufs et al., 2007; Schabus et al., 2007; Horovitz et al., 2008; Laufs, 2008; Czisch et al., 2009; Picchioni et al., 2010; Spoormaker et al., 2010; Caporro et al., 2011; Bergmann et al., 2012; Hale et al., 2016; Fogel et al., 2017; Moehlman et al., 2018; Ilhan-Bayrakci et al., 2022). The few studies mentioned are not discussed in terms of the methods used or insights gained.

      We acknowledge the need for a more comprehensive review of prior EEG-fMRI studies investigating BOLD correlates of slow oscillations and spindles. However, not all of these articles concern sleep SOs or spindles. Several (Hale et al., 2016; Horovitz et al., 2008; Laufs, 2008; Laufs, Walker, & Lund, 2007; Spoormaker et al., 2010) mainly focus on EEG-fMRI methodology, sleep stages, or brain networks, which are not the focus of our study. Thank you again for your attention to the comprehensiveness of our literature review. We have expanded the Introduction to include a more detailed discussion of the existing literature, ensuring that the contributions of previous EEG-fMRI sleep studies are adequately acknowledged.

      Introduction, Page 4 Lines 62-76

      “Investigating these sleep-related neural processes in humans is challenging because it requires tracking transient sleep rhythms while simultaneously assessing their widespread brain activation. Recent advances in simultaneous EEG-fMRI techniques provide a unique opportunity to explore these processes. EEG allows for precise event-based detection of neural signal, while fMRI provides insight into the broader spatial patterns of brain activation and functional connectivity (Horovitz et al., 2008; Huang et al., 2024; Laufs, 2008; Laufs, Walker, & Lund, 2007; Schabus et al., 2007; Spoormaker et al., 2010). Previous EEG-fMRI studies on sleep have focused on classifying sleep stages or examining the neural correlates of specific waves (Bergmann et al., 2012; Caporro et al., 2012; Czisch et al., 2009; Fogel et al., 2017; Hale et al., 2016; Ilhan-Bayrakcı et al., 2022; Moehlman et al., 2019; Picchioni et al., 2011). These studies have generally reported that slow oscillations are associated with widespread cortical and subcortical BOLD changes, whereas spindles elicit activation in the thalamus, as well as in several cortical and paralimbic regions. Although these findings provide valuable insights into the BOLD correlates of sleep rhythms, they often do not employ sophisticated temporal modeling (Huang et al., 2024), to capture the dynamic interactions between different oscillatory events, e.g., the coupling between SOs and spindles.”

      (2) The paper falls short in discussing the specific insights gained into the neurobiological substrate of the investigated slow oscillations, spindles, and their interactions. The validity of the inverse inference approach ("Open ended cognitive state decoding"), assuming certain cognitive functions to be related to these oscillations because of the brain regions/networks activated in temporal association with these events, is debatable at best. It is also unclear why eventually only episodic memory processing-like brain-wide activation is discussed further, despite the activity of 16 of 50 feature terms from the NeuroSynth v3 dataset were significant (episodic memory, declarative memory, working memory, task representation, language, learning, faces, visuospatial processing, category recognition, cognitive control, reading, cued attention, inhibition, and action).

      Thank you for pointing this out, particularly regarding the use of inverse inference approaches such as “open-ended cognitive state decoding.” Given the concerns about the indirectness of this approach, we decided to remove its related content and results from Figure 3 in the main text and include it in Supplementary Figure 7. We will refocus the main text on direct neurobiological insights gained from our EEG-fMRI analyses, particularly emphasizing the hippocampal-thalamocortical network dynamics underlying SO-spindle coupling, and we will acknowledge the exploratory nature of these findings and highlight their limitations.

      Discussion, Page 17-18 Lines 323-332

      “To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      (3) Hippocampal activation during SO-spindles is stated as a main hypothesis of the paper - for good reasons - however, other regions (e.g., several cortical as well as thalamic) would be equally expected given the known origin of both oscillations and the existing sleep-EEG-fMRI literature. However, this focus on the hippocampus contrasts with the focus on investigating the key role of the thalamus instead in the Results section.

      We appreciate your insight regarding the relative emphasis on hippocampal and thalamic activation in our study. We recognize that the manuscript may currently present an inconsistency between our initial hypothesis and the main focus of the results. To address this concern, we will ensure that our Introduction and Discussion sections explicitly discuss both regions, highlighting the complementary roles of the hippocampus (memory processing and reactivation) and the thalamus (spindle generation and cortico-hippocampal coordination) in SO-spindle dynamics.

      Introduction, Page 5 Lines 87-103

      “To address this gap, our study investigates brain-wide activation and functional connectivity patterns associated with SO-spindle coupling, and employs a cognitive state decoding approach (Margulies et al., 2016; Yarkoni et al., 2011)—albeit indirectly—to infer potential cognitive functions. In the current study, we used simultaneous EEG-fMRI recordings during nocturnal naps (detailed sleep staging results are provided in the Methods and Table S1) in 107 participants. Although directly detecting hippocampal ripples using scalp EEG or fMRI is challenging, we expected that hippocampal activation in fMRI would coincide with SO-spindle coupling detected by EEG, given that SOs, spindles, and ripples frequently co-occur during NREM sleep. We also anticipated a critical role of the thalamus, particularly thalamic spindles, in coordinating hippocampal-cortical communication.

      We found significant coupling between SOs and spindles during NREM sleep (N2/3), with spindle peaks occurring slightly before the SO peak. This coupling was associated with increased activation in both the thalamus and hippocampus, with functional connectivity patterns suggesting thalamic coordination of hippocampal-cortical communication. These findings highlight the key role of the thalamus in coordinating hippocampal-cortical interactions during human sleep and provide new insights into the neural mechanisms underlying sleep-dependent brain communication. A deeper understanding of these mechanisms may contribute to future neuromodulation approaches aimed at enhancing sleep-dependent cognitive function and treating sleep-related disorders.”

      Discussion, Page 16-17 Lines 292-307

      “When modeling the timing of these sleep rhythms in the fMRI, we observed hippocampal activation selectively during SO-spindle events. This suggests the possibility of triple coupling (SOs–spindles–ripples), even though our scalp EEG was not sufficiently sensitive to detect hippocampal ripples—key markers of memory replay (Buzsáki, 2015). Recent iEEG evidence indicates that ripples often co-occur with both spindles (Ngo, Fell, & Staresina, 2020) and SOs (Staresina et al., 2015; Staresina et al., 2023). Therefore, the hippocampal involvement during SO-spindle events in our study may reflect memory replay from the hippocampus, propagated via thalamic spindles to distributed cortical regions.

      The thalamus, known to generate spindles (Halassa et al., 2011), plays a key role in producing and coordinating sleep rhythms (Coulon, Budde, & Pape, 2012; Crunelli et al., 2018), while the hippocampus is found essential for memory consolidation (Buzsáki, 2015; Diba & Buzsáki, 2007; Singh, Norman, & Schapiro, 2022). The increased hippocampal and thalamic activity, along with strengthened connectivity between these regions and the mPFC during SO-spindle events, underscores a hippocampal-thalamic-neocortical information flow. This aligns with recent findings suggesting the thalamus orchestrates neocortical oscillations during sleep (Schreiner et al., 2022). The thalamus and hippocampus thus appear central to memory consolidation during sleep, guiding information transfer to the neocortex, e.g., mPFC.”

      (4) The study included an impressive number of 107 subjects. It is surprising though that only 31 subjects had to be excluded under these difficult recording conditions, especially since no adaptation night was performed. Since only subjects were excluded who slept less than 10 min (or had excessive head movements) there are likely several datasets included with comparably short durations and only a small number of SOs and spindles and even less combined SO-spindle events. A comprehensive table should be provided (supplement) including for each subject (included and excluded) the duration of included NREM sleep, number of SOs, spindles, and SO+spindle events. Also, some descriptive statistics (mean/SD/range) would be helpful.

      We appreciate your recognition of our sample size and the challenges associated with simultaneous EEG-fMRI sleep recordings. We acknowledge the importance of transparently reporting individual subject data, particularly regarding sleep duration and the number of detected SOs, spindles, and SO-spindle events. To address this, we will provide comprehensive tables in the supplementary materials, containing descriptive information about sleep-related characteristics (Table S1), as well as detailed information about sleep waves at each sleep stage for all 107 subjects (Table S2-S4), listing for each subject: (1) duration of each sleep stage; (2) number of detected SOs; (3) number of detected spindles; (4) number of detected SO-spindle coupling events; (5) density of detected SOs; (6) density of detected spindles; (7) density of detected SO-spindle coupling events.
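
      The density columns listed above can be sketched as follows. This is a minimal illustration only: it assumes density means events per minute of the corresponding sleep stage (the response does not state the unit), and the field names and example numbers are hypothetical, not values from Tables S1-S4.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StageSummary:
    """Per-subject counts for one sleep stage (field names are hypothetical)."""
    duration_min: float  # minutes spent in this stage
    n_so: int            # detected slow oscillations
    n_spindle: int       # detected spindles
    n_coupled: int       # detected SO-spindle coupling events


def densities(summary: StageSummary) -> Optional[Dict[str, float]]:
    """Return events per minute, or None when the stage was never reached."""
    if summary.duration_min <= 0:
        # No time in this stage: density is undefined rather than zero.
        return None
    return {
        "so_per_min": summary.n_so / summary.duration_min,
        "spindle_per_min": summary.n_spindle / summary.duration_min,
        "coupled_per_min": summary.n_coupled / summary.duration_min,
    }


# Illustrative, made-up numbers for one subject's N2/N3 sleep.
example = StageSummary(duration_min=40.0, n_so=120, n_spindle=200, n_coupled=30)
example_density = densities(example)
```

      Returning `None` for a zero-duration stage keeps subjects who never reached that stage distinguishable from subjects who reached it but produced no events.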

      However, most of the excluded participants either could not fall asleep or slept too briefly to have any NREM sleep period, so NREM sleep duration and SO, spindle, and coupling counts could not be computed for them.

      Supplementary Materials, Page 42-54, Table S1-S4

      (5) Was the 20-channel head coil dedicated for EEG-fMRI measurements? How were the electrode cables guided through/out of the head coil? Usually, the 64-channel head coil is used for EEG-fMRI measurements in a Siemens PRISMA 3T scanner, which has a cable duct at the back that allows to guide the cables straight out of the head coil (to minimize MR-related artifacts). The choice for the 20-channel head coil should be motivated. Photos of the recording setup would also be helpful.

      Thank you for your comment regarding our choice of the 20-channel head coil for EEG-fMRI measurements. We acknowledge that the 64-channel head coil is commonly used in Siemens PRISMA 3T scanners; however, the 20-channel coil was selected due to specific practical and technical considerations in our study. In particular, the 20-channel head coil was compatible with our EEG system and ensured sufficient signal-to-noise ratio (SNR) for both EEG and fMRI acquisition. The EEG electrode cables were guided through the lateral and posterior openings of the head coil, secured with foam padding to reduce motion and minimize MR-related artifacts. Moreover, given the extended nature of nocturnal sleep recordings, the 20-channel coil allowed us to maintain participant comfort while still achieving high-quality simultaneous EEG-fMRI data.

      We have made this clearer in the revised manuscript. 

      Methods, Page 20 Lines 385-392

      “All MRI data were acquired using a 20-channel head coil on a research-dedicated 3-Tesla Siemens Magnetom Prisma MRI scanner. Earplugs and cushions were provided for noise protection and head motion restriction. We chose the 20-channel head coil because it was compatible with our EEG system and ensured sufficient signal-to-noise ratio (SNR) for both EEG and fMRI acquisition. The EEG electrode cables were guided through the lateral and posterior openings of the head coil, secured with foam padding to reduce motion and minimize MR-related artifacts. Moreover, given the extended nature of nocturnal sleep recordings, the 20-channel coil helped maintain participant comfort while still achieving high-quality simultaneous EEG-fMRI data.”

      (6) Was the EEG sampling synchronized to the MR scanner (gradient system) clock (the 10 MHz signal; not referring to the volume TTL triggers here)? This is a requirement for stable gradient artifact shape over time and thus accurate gradient noise removal.

      Thank you for raising this important point. We confirm that the EEG sampling was synchronized to the MR scanner’s 10 MHz gradient system clock, ensuring a stable gradient artifact shape over time and enabling accurate artifact removal. This synchronization was achieved using the standard clock synchronization interface of the EEG amplifier, minimizing timing jitter and drift. As a result, the gradient artifact waveform remained stable across volumes, allowing for more effective artifact correction during preprocessing. We appreciate your attention to this critical aspect of EEG-fMRI data acquisition.

      We have made this clearer in the revised manuscript. 

      Methods, Page 19-20 Lines 371-383

      “EEG was recorded simultaneously with fMRI data using an MR-compatible EEG amplifier system (BrainAmps MR-Plus, Brain Products, Germany), along with a specialized electrode cap. The recording was done using 64 channels in the international 10/20 system, with the reference channel positioned at FCz. In order to adhere to polysomnography (PSG) recording standards, six electrodes were removed from the EEG cap: one for electrocardiogram (ECG) recording, two for electrooculogram (EOG) recording, and three for electromyogram (EMG) recording. EEG data was recorded at a sample rate of 5000 Hz, the resistance of the reference and ground channels was kept below 10 kΩ, and the resistance of the other channels was kept below 20 kΩ. To synchronize the EEG and fMRI recordings, the BrainVision recording software (BrainProducts, Germany) was utilized to capture triggers from the MRI scanner. The EEG sampling was synchronized to the MR scanner’s 10 MHz gradient system clock, ensuring a stable gradient artifact shape over time and enabling accurate artifact removal. This was achieved via the standard clock synchronization interface of the EEG amplifier, minimizing timing jitter and drift.”

      (7) The TR is quite long and the voxel size is quite large in comparison to state-of-the-art EPI sequences. What was the rationale behind choosing a sequence with relatively low temporal and spatial resolution?

      We acknowledge that our chosen TR and voxel size are relatively long and large compared to state-of-the-art EPI sequences. This decision was made to optimize the signal-to-noise ratio (SNR) and reduce susceptibility-related distortions, which are particularly critical in EEG-fMRI sleep studies where head motion and physiological noise can be substantial. A longer TR allowed us to sample whole-brain activity with sufficient coverage, while a larger voxel size helped enhance BOLD sensitivity and minimize partial volume effects in deep brain structures such as the thalamus and hippocampus, which are key regions of interest in our study. We appreciate your concern and hope this clarification provides sufficient rationale for our sequence parameters.

      We have made this clearer in the revised manuscript. 

      Methods, Page 20-21 Lines 398-408

      “Then, the “sleep” session began after the participants were instructed to try and fall asleep. For the functional scans, whole-brain images were acquired using k-space and steady-state T2*-weighted gradient echo-planar imaging (EPI) sequence that is sensitive to the BOLD contrast. This measures local magnetic changes caused by changes in blood oxygenation that accompany neural activity (sequence specification: 33 slices in interleaved ascending order, TR = 2000 ms, TE = 30 ms, voxel size = 3.5 × 3.5 × 4.2 mm³, FA = 90°, matrix = 64 × 64, gap = 0.7 mm). A relatively long TR and larger voxel size were chosen to optimize SNR and reduce susceptibility-related distortions, which are critical in EEG-fMRI sleep studies where head motion and physiological noise can be substantial. The longer TR allowed whole-brain coverage with sufficient temporal resolution, while the larger voxel size helped enhance BOLD sensitivity and minimize partial volume effects in deep brain structures (e.g., the thalamus and hippocampus), which are key regions of interest in this study.”
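
      For readers comparing these parameters with other protocols, the quantities they imply can be worked out directly. The sketch below uses only numbers quoted in the Methods excerpts (matrix, voxel size, TR, and the 5000 Hz EEG sampling rate); the variable names are illustrative, and the in-plane field of view assumes square voxels tiling the full 64 × 64 matrix.

```python
# Values quoted in the Methods excerpts.
MATRIX = 64                    # in-plane matrix size (64 x 64)
VOXEL_MM = (3.5, 3.5, 4.2)     # voxel dimensions in mm
TR_MS = 2000                   # repetition time in ms
EEG_RATE_HZ = 5000             # EEG sampling rate in Hz

# In-plane field of view: matrix size times in-plane voxel edge.
fov_mm = MATRIX * VOXEL_MM[0]

# Volume of a single voxel in cubic millimetres (approximately 51.45).
voxel_volume_mm3 = VOXEL_MM[0] * VOXEL_MM[1] * VOXEL_MM[2]

# Number of EEG samples recorded during one fMRI volume.
eeg_samples_per_volume = EEG_RATE_HZ * TR_MS // 1000

# Highest BOLD fluctuation frequency resolvable at this TR (Nyquist limit).
bold_nyquist_hz = 1.0 / (2.0 * TR_MS / 1000.0)
```

      With a TR of 2 s, each fMRI volume spans 10,000 EEG samples, so event timing comes from the EEG and the BOLD signal contributes the spatial pattern.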

      (8) The anatomically defined ROIs are quite large. It should be elaborated on how this might reduce sensitivity to sleep rhythm-specific activity within sub-regions, especially for the thalamus, which has distinct nuclei involved in sleep functions.

      We appreciate your insight regarding the use of anatomically defined ROIs and their potential limitations in detecting sleep rhythm-specific activity within sub-regions, particularly in the thalamus. Given the distinct functional roles of thalamic nuclei in sleep processes, we acknowledge that using a single, large thalamic ROI may reduce sensitivity to localized activity patterns. To address this, we will discuss this limitation in the revised manuscript, acknowledging that our approach prioritizes whole-structure effects but may not fully capture nucleus-specific contributions.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (9) The study reports SO & spindle amplitudes & densities, as well as SO+spindle coupling, to be larger during N2/3 sleep compared to N1 and REM sleep, which is trivial but can be seen as a sanity check of the data. However, the amount of SOs and spindles reported for N1 and REM sleep is concerning, as per definition there should be hardly any (if SOs or spindles occur in N1 it becomes by definition N2, and the interval between spindles has to be considerably large in REM to still be scored as such). Thus, on the one hand, the report of these comparisons takes too much space in the main manuscript as it is trivial, but on the other hand, it raises concerns about the validity of the scoring.

      We appreciate your concern regarding the reported presence of SOs and spindles in N1 and REM sleep and its potential implications. Our methods for detecting SOs, spindles, and their coupling were originally designed only for N2 and N3 sleep data, based on the characteristics of those data, and they are widely recognized and used in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because the SO and spindle detection methods are percentile-based, they will always detect a certain number of events when applied to data from other stages (N1 and REM), and how these events differ from those detected in N2/N3 remains unclear. We will explain the reasons for these results in the Methods section and emphasize that they are used only as sanity checks.

      Methods, Page 25 Lines 515-524

      “We note that the above methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).”
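
      The point about percentile-based detection can be illustrated with a small sketch. It is not the authors' detection pipeline: the 75th-percentile cutoff, the amplitude distributions, and the function names are all assumptions chosen for illustration, and real detectors also apply band-pass filtering and duration criteria. The sketch only shows that a threshold defined relative to the data flags the same share of candidates whatever the absolute amplitudes are.

```python
import math
import random


def percentile_threshold(values, pct):
    """Return the value at the given percentile (nearest-rank method)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def detect_events(amplitudes, pct=75.0):
    """Keep candidates whose amplitude exceeds the data-driven threshold."""
    threshold = percentile_threshold(amplitudes, pct)
    return [a for a in amplitudes if a > threshold]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic candidate amplitudes (arbitrary units): large in an
# "N2/N3-like" stage, small in a "REM-like" stage.
n23_like = [rng.gauss(80.0, 20.0) for _ in range(1000)]
rem_like = [rng.gauss(15.0, 5.0) for _ in range(1000)]

n23_events = detect_events(n23_like)
rem_events = detect_events(rem_like)
```

      Both stages yield the same number of detections even though the REM-like threshold is far lower, which is why the counts in N1 and REM cannot be read as evidence of genuine SOs or spindles.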

      (10) Why was electrode F3 used to quantify the occurrence of SOs and spindles? Why not a midline frontal electrode like Fz (or a number of frontal electrodes for SOs) and Cz (or a number of centroparietal electrodes) for spindles to be closer to their maximum topography?

      We appreciate your suggestion regarding electrode selection for SO and spindle quantification. Our choice of F3 was primarily based on previous studies (Massimini et al., 2004; Molle et al., 2011), where bilateral frontal electrodes are commonly used for detecting SOs and spindles. Additionally, we considered the impact of MRI-related noise and, after a comprehensive evaluation, determined that F3 provided an optimal balance between signal quality and artifact minimization. We also acknowledge that alternative electrode choices, such as Fz for SOs and Cz for spindles, could provide additional insights into their topographical distributions.

      (11) Functional connectivity (hippocampus -> thalamus -> cortex (mPFC)) is reported to be increased during SO-spindle coupling and interpreted as evidence for coordination of hippocampo-neocortical communication likely by thalamic spindles. However, functional connectivity was only analysed during coupled SO+spindle events, not during isolated SOs or isolated spindles. Without the direct comparison of the connectivity patterns between these three events, it remains unclear whether this is specific for coupled SO+spindle events or rather associated with one or both of the other isolated events. The PPIs need to be conducted for those isolated events as well and compared statistically to the coupled events.

      We appreciate your critical perspective on our functional connectivity analysis and the interpretation of hippocampus-thalamus-cortex (mPFC) interactions during SO-spindle coupling. We acknowledge that, in the original analysis, functional connectivity was only examined during coupled SO-spindle events, without direct comparison to isolated SOs or isolated spindles. To address this concern, we have conducted PPI analyses for all three ROIs (hippocampus, thalamus, mPFC) and all three event types (SO-spindle couplings, isolated SOs, and isolated spindles). Our results indicate that neither isolated SOs nor isolated spindles yielded significant connectivity changes in any of the three ROIs, as none survived multiple comparison correction. This suggests that the observed connectivity increase is specific to SO-spindle coupling, rather than being independently driven by either SOs or spindles alone.

      Results, Page 14 Lines 248-255

      “Crucially, the interaction between FC and SO-spindle coupling revealed that only the functional connectivity of hippocampus -> thalamus (ROI analysis, t(106) = 1.86, p = 0.0328) and thalamus -> mPFC (ROI analysis, t(106) = 1.98, p = 0.0251) significantly increased during SO-spindle coupling, with no significant changes in all other pathways (Fig. 4e). We also conducted PPI analyses for the other two events (SOs and spindles), and neither yielded significant connectivity changes in the three ROIs, as all failed to survive whole-brain FWE correction at the cluster level (p < 0.05). Together, these findings suggest that the thalamus, likely via spindles, coordinates hippocampal-cortical communication selectively during SO-spindle coupling, but not isolated SOs or spindle events alone.”
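
      For readers unfamiliar with PPI, the core of the method is an interaction regressor formed from a seed time series and an event regressor. The sketch below is a simplified, generic illustration and not the authors' implementation: it omits the deconvolution and haemodynamic convolution steps used in standard fMRI packages, and the function names and toy values are hypothetical.

```python
def mean_center(values):
    """Subtract the mean so main effects and interaction are less confounded."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def ppi_regressors(seed_signal, event_regressor):
    """Return (physiological, psychological, interaction) design columns.

    seed_signal: time series of the seed ROI (e.g., hippocampus).
    event_regressor: per-volume indicator of the event type of interest.
    """
    if len(seed_signal) != len(event_regressor):
        raise ValueError("seed and event regressors must have equal length")
    physiological = mean_center(seed_signal)
    psychological = mean_center(event_regressor)
    # The PPI term: does seed-target coupling change when events occur?
    interaction = [p * q for p, q in zip(physiological, psychological)]
    return physiological, psychological, interaction


# Toy example: four volumes, events present in volumes 2 and 4.
phys, psych, inter = ppi_regressors([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0])
```

      In a full analysis, the target region's signal is regressed on all three columns, and the coefficient on the interaction column is the connectivity change tested for each event type.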

      (12) The limited temporal resolution of fMRI does indeed not allow for easily distinguishing between fMRI activation patterns related to SO-up- vs. SO-down-states. For this, one could try to extract the amplitudes of SO-up- and SO-down-states separately for each SO event and model them as two separate parametric modulators (with the risk of collinearity as they are likely correlated).

      We appreciate your insightful comment regarding the challenge of distinguishing fMRI activation patterns related to SO-up vs. SO-down states due to the limited temporal resolution of fMRI. While our current analysis does not differentiate between these two phases, we acknowledge that separately modeling SO-up and SO-down states using parametric modulators could provide a more refined understanding of their distinct neural correlates. However, as you note, this approach carries the risk of collinearity, and the two amplitudes are indeed highly correlated across all subjects in our data (r = 0.98). Future studies could use high-temporal-resolution techniques to separate the two states. Because implementing this is beyond the scope of the current study, we will acknowledge this limitation in the Discussion section.
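
      To make the collinearity concern concrete: for two regressors with Pearson correlation r, the variance inflation factor is 1 / (1 - r^2), so r = 0.98 corresponds to a VIF of roughly 25. The sketch below shows this arithmetic and a check on synthetic amplitudes. The function names and the synthetic data are assumptions for illustration, and the VIF of the actual design would depend on the convolved regressors rather than on the raw amplitudes.

```python
import numpy as np

def vif_from_r(r):
    """Variance inflation factor for a pair of regressors with correlation r."""
    return 1.0 / (1.0 - r ** 2)

def modulator_collinearity(peak_amp, trough_amp):
    """Return (r, VIF) for two per-event amplitude vectors."""
    peak = np.asarray(peak_amp, dtype=float)
    trough = np.asarray(trough_amp, dtype=float)
    r = float(np.corrcoef(peak, trough)[0, 1])
    return r, vif_from_r(r)

# VIF implied by the correlation reported in the response.
print("VIF at r = 0.98:", vif_from_r(0.98))

# Synthetic per-event amplitudes (hypothetical values, in microvolts).
rng = np.random.default_rng(0)
peak = rng.uniform(10.0, 40.0, size=200)
trough = -(1.5 * peak + rng.normal(0.0, 1.0, size=200))
r, vif = modulator_collinearity(peak, trough)
print("synthetic r:", r, "VIF:", vif)
```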

      Discussion, Page 17 Lines 308-322

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.”

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (13) L327: "It is likely that our findings of diminished DMN activity reflect brain activity during the SO DOWN-state, as this state consistently shows higher amplitude compared to the UP-state within subjects, which is why we modelled the SO trough as its onset in the fMRI analysis." This conclusion is not justified as the fact that SO down-states are larger in amplitude does not mean their impact on the BOLD response is larger.

      We appreciate your concern regarding our interpretation of diminished DMN activity as reflecting the SO DOWN-state. We acknowledge that the original wording was somewhat misleading. Our interpretation is that the effect could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SO events was obtained if modelled at the SO peak instead). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from the DOWN-state. We will make this clear in the Discussion section.

      Discussion, Page 17 Lines 308-322

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.”

      (14) Line 77: "In the current study, while directly capturing hippocampal ripples with scalp EEG or fMRI is difficult, we expect to observe hippocampal activation in fMRI whenever SOs-spindles coupling is detected by EEG, if SOs- spindles-ripples triple coupling occurs during human NREM sleep". Not all SO-spindle events are associated with ripples (Staresina et al., 2015), but hippocampal activation may also be expected based on the occurrence of spindles alone (Bergmann et al., 2012).

      We appreciate your clarification regarding the relationship between SO-spindle coupling and hippocampal ripples. We acknowledge that not all SO-spindle events are necessarily accompanied by ripples (Staresina et al., 2015). However, previous research indicates that hippocampal ripples are significantly more likely to occur during SO-spindle coupling events. This suggests that while ripple occurrence is not guaranteed, SO-spindle coupling creates a favorable network state for ripple generation and potential hippocampal activation. To ensure accuracy, we will delete this misleading sentence from the Introduction and acknowledge in the Discussion that our results cannot conclusively demonstrate the triple coupling of SOs, spindles, and hippocampal ripples.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      Reviewer #2 (Public review):

      In this study, Wang and colleagues aimed to explore brain-wide activation patterns associated with NREM sleep oscillations, including slow oscillations (SOs), spindles, and SO-spindle coupling events. Their findings reveal that SO-spindle events corresponded with increased activation in both the thalamus and hippocampus. Additionally, they observed that SO-spindle coupling was linked to heightened functional connectivity from the hippocampus to the thalamus, and from the thalamus to the medial prefrontal cortex, three key regions involved in memory consolidation and episodic memory processes.

      This study's findings are timely and highly relevant to the field. The authors' extensive data collection, involving 107 participants sleeping in an fMRI while undergoing simultaneous EEG recording, deserves special recognition. If shared, this unique dataset could lead to further valuable insights. While the conclusions of the data seem overall well supported by the data, some aspects with regard to the detection of sleep oscillations need clarification.

      The authors report that coupled SO-spindle events were most frequent during NREM sleep (2.46 ± 0.06 events/min), but they also observed a surprisingly high occurrence of these events during N1 and REM sleep (2.23 ± 0.09 and 2.32 ± 0.09 events/min, respectively), where SO-spindle coupling would not typically be expected. Combined with the relatively modest SO amplitudes reported (~25 µV, whereas >75 µV would be expected when using mastoids as reference electrodes), this raises the possibility that the parameters used for event detection may not have been conservative enough - or that sleep staging was inaccurately performed. This issue could present a significant challenge, as the fMRI findings are largely dependent on the reliability of these detected events.

      Thank you very much for your thorough and encouraging review. We appreciate your recognition of the significance and relevance of our study and dataset, particularly in highlighting how simultaneous EEG-fMRI recordings can provide complementary insights into the temporal dynamics of neural oscillations and their associated spatial activation patterns during sleep. In the sections that follow, we address each of your comments in detail. We have revised the text and conducted additional analyses wherever possible to strengthen our argument and clarify our methodological choices. We believe these revisions improve the clarity and rigor of our work, and we thank you for helping us refine it.

      We appreciate your insightful comments regarding the detection of sleep oscillations. Our methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM. We will acknowledge the reasons for these results in the Methods section and emphasize that they are used only for sanity checks.
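
      The point that a percentile-based criterion returns events in any stage can be illustrated with a short sketch. It assumes a simplified detector that keeps candidate waves whose amplitude exceeds a given percentile of all candidates within the same stage. The function name, the 75th-percentile default, and the synthetic amplitude distributions are assumptions for the example and are not taken from the study's detection code.

```python
import numpy as np

def select_by_percentile(amplitudes, percentile=75.0):
    """Boolean mask of candidates whose amplitude exceeds the stage-wise percentile."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    threshold = np.percentile(amplitudes, percentile)
    return amplitudes > threshold

# Two hypothetical stages with very different amplitude scales.
rng = np.random.default_rng(1)
n2n3_like = rng.gamma(shape=2.0, scale=20.0, size=1000)  # larger waves
rem_like = rng.gamma(shape=2.0, scale=4.0, size=1000)    # much smaller waves

kept_n2n3 = int(select_by_percentile(n2n3_like).sum())
kept_rem = int(select_by_percentile(rem_like).sum())
print(kept_n2n3, kept_rem)
```

      Because the threshold is relative to each stage's own distribution, the same fraction of candidates is retained in both cases even though the absolute amplitudes differ by a factor of about five. This is why event counts in N1 and REM under the same operational definition do not by themselves establish that those events are physiologically comparable to N2/N3 events.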

      Regarding the reported SO amplitudes (~25 µV), during preprocessing, we applied the Signal Space Projection (SSP) method to more effectively remove MRI gradient artifacts and cardiac pulse noise. While this approach enhances data quality, it also reduces overall signal power, leading to systematically lower reported amplitudes. Despite this, the SOs we detected in NREM sleep (especially N2/N3) remain physiologically meaningful and are consistent with previous fMRI studies using similar artifact removal techniques. We appreciate your careful evaluation and valuable suggestions.

      In addition, we will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics (Table S1), as well as detailed information about sleep waves at each sleep stage for all 107 subjects (Table S2-S4), listing for each subject: (1) duration of each sleep stage; (2) number of detected SOs; (3) number of detected spindles; (4) number of detected SO-spindle coupling events; (5) density of detected SOs; (6) density of detected spindles; (7) density of detected SO-spindle coupling events.

      Methods, Page 25 Lines 515-524

      “We note that the above methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).”

      Supplementary Materials, Page 42-54, Table S1-S4

      Reviewer #3 (Public review):

      Summary:

      Wang et al. examined the brain activity patterns during sleep, especially when locked to canonical sleep rhythms such as SOs, spindles, and their coupling. Analyzing data from a large sample, the authors found significant coupling between spindles and SOs, particularly during the upstate of the SO. Moreover, the authors examined the patterns of whole-brain activity locked to these sleep rhythms. To understand the functional significance of these brain activities, the authors further conducted open-ended cognitive state decoding and found that a variety of cognitive processes may be involved during SO-spindle coupling and during other sleep events. The authors next conducted functional connectivity analyses and found enhanced connectivity between the hippocampus, the thalamus, and the medial PFC. These results reinforced the theoretical model of sleep-dependent memory consolidation, such that SO-spindle coupling is conducive to systems-level memory reactivation and consolidation.

      Strengths:

      There are obvious strengths in this work, including the large sample size, state-of-the-art neuroimaging and neural oscillation analyses, and the richness of results.

      Weaknesses:

      Despite these strengths and the insights gained, there are weaknesses in the design, the analyses, and inferences.

      Thank you for your detailed and thoughtful review of our manuscript. We are delighted that you recognize our advanced neuroimaging and neural oscillation analyses, the richness of our results, and the large sample size. In the following sections, we provide detailed responses to each of your comments. We have revised the text and conducted additional analyses to strengthen our arguments and clarify our methodological choices. We believe these revisions enhance the clarity and rigor of our work, and we sincerely appreciate your thoughtful feedback in helping us refine the manuscript.

      (1) A repeating statement in the manuscript is that brain activity could indicate memory reactivation and thus consolidation. This is indeed a highly relevant question that could be informed by the current data/results. However, an inherent weakness of the design is that there is no memory task before and after sleep. Thus, it is difficult (if not impossible) to make a strong argument linking SO/spindle/coupling-locked brain activity with memory reactivation or consolidation.

      We appreciate your suggestion regarding the lack of a pre- and post-sleep memory task in our study design. We acknowledge that, in the absence of behavioral measures, it is hard to directly link SO-spindle coupling to memory consolidation in an outcome-driven manner. Our interpretation is instead based on the well-established role of these oscillations in memory processes, as demonstrated in previous studies. We sincerely appreciate this feedback and will adjust our Discussion accordingly to reflect a more precise interpretation of our findings.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (2) Relatedly, to understand the functional implications of the sleep rhythm-locked brain activity, the authors employed the "open-ended cognitive state decoding" method. While this method is interesting, it is rather indirect given that there were no behavioral indices in the manuscript. Thus, discussions based on these analyses are speculative at best. Please either tone down the language or find additional evidence to support these claims.

      Moreover, the results from this method are difficult to understand. Figure 3e showed that for all three types of sleep events (SO, spindle, SO-spindle), the same mental states (e.g., working memory, episodic memory, declarative memory) showed opposite directions of activation (left and right panels showed negative and positive activation, respectively). How to interpret these conflicting results? This ambiguity is also reflected by the term used: declarative memory and episodic memories are both indexed in the results. Yet these two processes can be largely overlapped. So which specific memory processes do these brain activity patterns reflect? The Discussion shall discuss these results and the limitations of this method.

      We appreciate your critical assessment of the open-ended cognitive state decoding method and its interpretational challenges. Given the concerns about the indirectness of this approach, we have moved the related content and results from Figure 3 in the main text to Supplementary Figure 7.

      Due to the complexity of memory-related processes, we acknowledge that distinguishing between episodic and declarative memory based solely on this approach is not straightforward. We will revise the Supplementary Materials to explicitly discuss these limitations and clarify that our findings do not isolate specific cognitive processes but rather suggest general associations with memory-related networks.

      Discussion, Page 17-18 Lines 323-332

      “To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      (3) The coupling strength is somehow inconsistent with prior results (Hahn et al., 2020, eLife, Helfrich et al., 2018, Neuron). Specifically, Helfrich et al. showed that among young adults, the spindle is coupled to the peak of the SO. Here, the authors reported that the spindles were coupled to down-to-up transitions of SO and before the SO peak. It is possible that participants' age may influence the coupling (see Helfrich et al., 2018). Please discuss the findings in the context of previous research on SO-spindle coupling.

      We appreciate your concern regarding the temporal characteristics of SO-spindle coupling. We acknowledge that the SO-spindle coupling phase results in our study are not identical to those reported by Hahn et al. (2020) and Helfrich et al. (2018). However, these differences may arise from slight variations in event detection parameters, which can influence the precise phase estimation of coupling. Notably, Hahn et al. (2020) also reported slight discrepancies in their group-level coupling phase results, highlighting that methodological differences can contribute to variability across studies. Furthermore, our findings are consistent with those of Schreiner et al. (2021), further supporting the robustness of our observations.

      That said, we acknowledge that our original description of SO-spindle coupling as occurring at the "transition from the lower state to the upper state" was not entirely precise. The -π/2 phase represents the true transition point, while our observed coupling phase is actually closer to the SO peak rather than strictly at the transition. We will revise this statement in the manuscript to ensure clarity and accuracy in describing the coupling phase.  
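
      The distinction between the transition point and the peak can be illustrated by computing a circular mean of spindle-peak phases and comparing its distance to the two reference phases. The sketch below assumes a cosine-like phase convention in which 0 is the SO peak and -π/2 is the down-to-up transition, as described above. The synthetic phases (centred at -0.4 rad) and the function names are hypothetical and do not reproduce the study's data.

```python
import numpy as np

def circular_mean(phases):
    """Mean direction of a set of angles, in radians within (-pi, pi]."""
    phases = np.asarray(phases, dtype=float)
    return float(np.angle(np.mean(np.exp(1j * phases))))

def circular_distance(a, b):
    """Absolute angular distance between two phases, in [0, pi]."""
    return float(abs(np.angle(np.exp(1j * (a - b)))))

def closer_to_peak_than_transition(phase):
    """True if phase is nearer the SO peak (0) than the transition (-pi/2)."""
    return circular_distance(phase, 0.0) < circular_distance(phase, -np.pi / 2)

# Hypothetical spindle-peak phases clustered shortly before the SO peak.
rng = np.random.default_rng(2)
phases = rng.vonmises(mu=-0.4, kappa=4.0, size=500)

mean_phase = circular_mean(phases)
print("mean coupling phase (rad):", mean_phase)
print("closer to peak than transition:", closer_to_peak_than_transition(mean_phase))
```

      A mean phase that is negative but small in magnitude falls on the rising flank shortly before the peak, which is the situation the revised wording is meant to describe.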

      Discussion, Page 16 Lines 283-291

      “Our data provide insights into the neurobiological underpinnings of these sleep rhythms. SOs, originating mainly in neocortical areas such as the mPFC, alternate between DOWN- and UP-states. The thalamus generates sleep spindles, which in turn couple with SOs. Our finding that spindle peaks consistently occurred slightly before the UP-state peak of SOs (in 83 out of 107 participants), concurs with prior studies, including Schreiner et al. (2021). Yet it differs from some results suggesting spindles might peak right at the SO UP-state (Hahn et al., 2020; Helfrich et al., 2018). Such discrepancies could arise from differences in detection algorithms, participant age (Helfrich et al., 2018), or subtle variations in cortical-thalamic timing. Nonetheless, these results underscore the importance of coordinated SO-spindle interplay in supporting sleep-dependent processes.”

      (4) The discussion is rather superficial with only two pages, without delving into many important arguments regarding the possible functional significance of these results. For example, the author wrote, "This internal processing contrasts with the brain patterns associated with external tasks, such as working memory." There are no references to working memory, and no explanation of why WM is considered an external task even though working memory operations can be internal. Similarly, for the interesting results on SO and reduced DMN activity, the authors wrote "The DMN is typically active during wakeful rest and is associated with self-referential processes like mind-wandering, daydreaming, and task representation (Yeshurun, Nguyen, & Hasson, 2021). Its reduced activity during SOs may signal a shift towards endogenous processes such as memory consolidation." This argument is flawed. DMN is active during self-referential processing and mind-wandering, i.e., when the brain shifts from external stimuli processing to internal mental processing. During sleep, endogenous memory reactivation and consolidation are also part of the internal mental processing given the lack of external environmental stimulation. So why would DMN activity be reduced during SO or during memory consolidation? Were there differences in DMN activity between SO and SO-spindle coupling events?

      We appreciate your concerns regarding the brevity of the discussion and the need for clearer theoretical arguments. We will expand this section to provide more in-depth interpretations of our findings in the context of prior literature. Regarding working memory (WM), we acknowledge that our phrasing was ambiguous. We will modify this statement in the Discussion section.

      For the SO-related reduction in DMN activity, we recognize the need for a more precise explanation. This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state.

      To address your final question, we have conducted an additional post hoc comparison of DMN activity between isolated SOs and SO-spindle coupling events. Our results indicate that DMN activation during SOs was significantly lower than during SO-spindle coupling (t(106) = -4.17, p < 1e-4). This suggests that SO-spindle coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. We appreciate your constructive feedback and will integrate these expanded analyses and discussions into our revised manuscript.

      Results, Page 11 Lines 199-208

      “Spindles were correlated with positive activation in the thalamus (ROI analysis, t(106) = 15.39, p < 1e-4), the anterior cingulate cortex (ACC), and the putamen, alongside deactivation in the DMN (Fig. 3c). Notably, SO-spindle coupling was linked to significant activation in both the thalamus (ROI analysis, t(106) = 3.38, p = 0.0005) and the hippocampus (ROI analysis, t(106) = 2.50, p = 0.0070, Fig. 3d). However, no decrease in DMN activity was found during SO-spindle coupling, and DMN activity during SO was significantly lower than during coupling (ROI analysis, t(106) = -4.17, p < 1e-4). For more detailed activation patterns, see Table S5-S7. We also varied the threshold used to detect SO events to assess its effect on hippocampal activation during SO-spindle coupling and observed that hippocampal activation remained significant when the percentile thresholds for SO detection ranged between 71% and 80% (see Fig. S6).”

      Discussion, Page 17-18 Lines 308-332

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.

      To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      Recommendations for the authors:

      Reviewing Editor Comment:

      The reviewers think that you are working on a relevant and important topic. They are praising the large sample size used in the study. The reviewers are not all in line regarding the overall significance of the findings, but they all agree the paper would strongly benefit from some extra work, as all reviewers raise various critical points that need serious consideration.

      We appreciate your recognition of the relevance and importance of our study, as well as your acknowledgment of the large sample size as a strength of our work. We understand that there are differing perspectives regarding the overall significance of our findings, and we value the constructive critiques provided. We are committed to addressing the key concerns raised by all reviewers, including refining our analyses, clarifying our interpretations, and incorporating additional discussions to strengthen the manuscript. Below, we address your specific recommendations and provide responses to each point you raised to ensure our methods and results are as transparent and comprehensible as possible. We believe that these revisions will significantly enhance the rigor and impact of our study, and we sincerely appreciate your thoughtful feedback in helping us improve our work.

      Reviewer #1 (Recommendations for the authors):

      (1) The phrase "overnight sleep" suggests an entire night, while these were rather "nocturnal naps". Please rephrase.

      Response: Thank you for pointing this out. We have revised the phrasing in our manuscript to "nocturnal naps" instead of "overnight sleep" to more accurately reflect the duration of the sleep recordings.

      (2) Sleep staging results (macroscopic sleep architecture) should be provided in more detail (at least min and % of the different sleep stages, sleep onset latency, total sleep duration, total recording duration), at least mean/SD/range.

      Thank you for this suggestion. We will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics. This information will help provide a clearer overview of the macroscopic sleep architecture in our dataset.

      Reviewer #2 (Recommendations for the authors):

      In order to allow for a better estimation of the reliability of the detected sleep events, please:

      (1) Provide densities and absolute numbers of all detected SOs and spindles (N1, NREM, and REM sleep).

      Thank you for pointing this out. We will provide comprehensive tables in the supplementary materials containing detailed information about sleep waves at each sleep stage for all 107 subjects (Table S2-S4), listing for each subject: 1) duration of each sleep stage; 2) number of detected SOs; 3) number of detected spindles; 4) number of detected SO-spindle coupling events; 5) density of detected SOs; 6) density of detected spindles; 7) density of detected SO-spindle coupling events.

      Supplementary Materials, Page 43-54, Table S2-S4

      (2) Show ERPs for all detected SOs and spindles (per sleep stage).

      Thank you for the suggestion. We will provide ERPs for all detected SOs and spindles, separated by sleep stage (N1, N2&N3, and REM) in supplementary Fig. S2-S4. These ERP waveforms will help illustrate the characteristic temporal profiles of SOs and spindles across different sleep stages.

      Methods, Page 25, Line 525-532

      “Event-related potentials (ERP) analysis. After completing the detection of each sleep rhythm event, we performed ERP analyses for SOs, spindles, and coupling events in different sleep stages. Specifically, for SO events, we took the trough of the DOWN-state of each SO as the zero-time point, then extracted data in a [-2 s to 2 s] window from the broadband (0.1–30 Hz) EEG and used [-2 s to -0.5 s] for baseline correction; the results were then averaged across 107 subjects (see Fig. S2a). For spindle events, we used the peak of each spindle as the zero-time point and applied the same data extraction window and baseline correction before averaging across 107 subjects (see Fig. S2b). Finally, for SO-spindle coupling events, we followed the same procedure used for SO events (see Fig. 2a, Figs. S3–S4).”
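
      The epoching and baseline-correction steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the function names `extract_epoch` and `erp`, the sampling-rate argument `sfreq`, and the choice to skip events too close to the recording edges are assumptions. Only the [-2 s, 2 s] window and the [-2 s, -0.5 s] baseline come from the Methods text.

```python
from statistics import fmean


def extract_epoch(signal, event_idx, sfreq, tmin=-2.0, tmax=2.0, base=(-2.0, -0.5)):
    """Cut one epoch around event_idx and subtract the mean of the baseline window.

    signal is a list of samples, event_idx the sample index of the zero-time
    point (e.g. SO trough or spindle peak), sfreq the sampling rate in Hz.
    """
    start = event_idx + round(tmin * sfreq)
    stop = event_idx + round(tmax * sfreq) + 1  # +1 so that tmax is included
    if start < 0 or stop > len(signal):
        return None  # assumption: drop events too close to the recording edge
    epoch = signal[start:stop]
    # Baseline window expressed as sample offsets within the epoch.
    b0 = round((base[0] - tmin) * sfreq)
    b1 = round((base[1] - tmin) * sfreq) + 1
    baseline = fmean(epoch[b0:b1])
    return [v - baseline for v in epoch]


def erp(signal, event_indices, sfreq, **kwargs):
    """Average the baseline-corrected epochs of one recording across events."""
    epochs = [
        e
        for i in event_indices
        if (e := extract_epoch(signal, i, sfreq, **kwargs)) is not None
    ]
    if not epochs:
        return []
    # zip(*epochs) groups the samples that share the same latency.
    return [fmean(samples) for samples in zip(*epochs)]
```

      Averaging the per-subject output of `erp` across subjects would then give the grand average. The event indices themselves would come from the detection step, which is not shown here.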

      (3) Provide detailed info concerning sleep characteristics (time spent in each sleep stage etc.).

      Thank you for this suggestion. As in the response above, we will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics.

      Supplementary Materials, Page 42, Table S1 (same as above)

      (4) What would happen if more stringent parameters were used for event detection? Would the authors still observe a significant number of SO spindles during N1 and REM? Would this affect the fMRI-related results?

      Thank you for this suggestion. Our methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).

      Furthermore, to explore the impact of this on our fMRI results, we conducted an additional sensitivity analysis by applying different detection parameters for SOs. Specifically, we adjusted the amplitude percentile threshold for SO detection (the parameter that has the greatest impact on the results). We used the hippocampal activation value during N2&N3 stage SO-spindle coupling as an anchor value and found that when the parameters gradually became stricter, the results were similar to or even better than the current results. However, when we continued to increase the threshold, the effect began to gradually decrease until the threshold was increased to 80%, and the results were no longer significant. This indicates that our results are robust within a specific range of parameters, but as the threshold increases, the number of trials decreases, ultimately weakening the statistical power of the fMRI analysis.
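
      The trade-off described above, where a stricter percentile threshold leaves fewer events, can be illustrated with a small sketch. It is a hedged illustration under stated assumptions and not the detection algorithm used in the study: `percentile`, `select_events`, and `sweep` are hypothetical helper names, the candidate amplitudes are synthetic, and the percentile values are chosen only for demonstration.

```python
import math


def percentile(values, p):
    """Linear-interpolation percentile (0 <= p <= 100) of a non-empty list."""
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def select_events(amplitudes, p):
    """Indices of candidate events whose amplitude exceeds the p-th percentile."""
    threshold = percentile(amplitudes, p)
    return [i for i, a in enumerate(amplitudes) if a > threshold]


def sweep(amplitudes, percentiles):
    """Number of retained events for each percentile threshold."""
    return {p: len(select_events(amplitudes, p)) for p in percentiles}


# Synthetic candidate amplitudes (arbitrary units), for illustration only.
candidates = [float(a) for a in range(1, 101)]
counts = sweep(candidates, [75, 80, 90])
```

      Because the threshold is relative to the amplitude distribution of the data it is applied to, rescaling all amplitudes leaves the counts unchanged. This is why such a criterion returns some events in any sleep stage, and why raising the percentile reduces the number of trials available to the fMRI model.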

      Thank you again for your suggestions on sleep rhythm event detection. We will add the results in Supplementary and revise our manuscript accordingly.

      Results, Page 11, Line 199-208

      “Spindles were correlated with positive activation in the thalamus (ROI analysis, t(106) = 15.39, p < 1e-4), the anterior cingulate cortex (ACC), and the putamen, alongside deactivation in the DMN (Fig. 3c). Notably, SO-spindle coupling was linked to significant activation in both the thalamus (ROI analysis, t(106) = 3.38, p = 0.0005) and the hippocampus (ROI analysis, t(106) = 2.50, p = 0.0070, Fig. 3d). However, no decrease in DMN activity was found during SO-spindle coupling, and DMN activity during SO was significantly lower than during coupling (ROI analysis, t(106) = -4.17, p < 1e-4). For more detailed activation patterns, see Table S5-S7. We also varied the threshold used to detect SO events to assess its effect on hippocampal activation during SO-spindle coupling and observed that hippocampal activation remained significant when the percentile thresholds for SO detection ranged between 71% and 80% (see Fig. S6).”

      Finally, we sincerely thank you all again for your thoughtful and constructive feedback. Your insights have been invaluable in refining our analyses, strengthening our interpretations, and improving the clarity and rigor of our manuscript. We appreciate the time and effort you have dedicated to reviewing our work, and we are grateful for the opportunity to enhance our study based on your recommendations.

      References:

      Bergmann, T. O., Mölle, M., Diedrichs, J., Born, J., & Siebner, H. R. (2012). Sleep spindle-related reactivation of category-specific cortical regions after learning face-scene associations. NeuroImage, 59(3), 2733-2742. 

      Buzsáki, G. (2015). Hippocampal sharp wave‐ripple: A cognitive biomarker for episodic memory and planning. Hippocampus, 25(10), 1073-1188. 

      Caporro, M., Haneef, Z., Yeh, H. J., Lenartowicz, A., Buttinelli, C., Parvizi, J., & Stern, J. M. (2012). Functional MRI of sleep spindles and K-complexes. Clinical neurophysiology, 123(2), 303-309. 

      Coulon, P., Budde, T., & Pape, H.-C. (2012). The sleep relay—the role of the thalamus in central and decentral sleep regulation. Pflügers Archiv-European Journal of Physiology, 463, 53-71. 

      Crunelli, V., Lőrincz, M. L., Connelly, W. M., David, F., Hughes, S. W., Lambert, R. C., Leresche, N., & Errington, A. C. (2018). Dual function of thalamic low-vigilance state oscillations: rhythm-regulation and plasticity. Nature Reviews Neuroscience, 19(2), 107-118. 

      Czisch, M., Wehrle, R., Stiegler, A., Peters, H., Andrade, K., Holsboer, F., & Sämann, P. G. (2009). Acoustic oddball during NREM sleep: a combined EEG/fMRI study. PloS one, 4(8), e6749. 

      Diba, K., & Buzsáki, G. (2007). Forward and reverse hippocampal place-cell sequences during ripples. Nature Neuroscience, 10(10), 1241. 

      Diekelmann, S., & Born, J. (2010). The memory function of sleep. Nature Reviews Neuroscience, 11(2), 114-126. 

      Fogel, S., Albouy, G., King, B. R., Lungu, O., Vien, C., Bore, A., Pinsard, B., Benali, H., Carrier, J., & Doyon, J. (2017). Reactivation or transformation? Motor memory consolidation associated with cerebral activation time-locked to sleep spindles. PloS one, 12(4), e0174755. 

      Hahn, M. A., Heib, D., Schabus, M., Hoedlmoser, K., & Helfrich, R. F. (2020). Slow oscillation-spindle coupling predicts enhanced memory formation from childhood to adolescence. Elife, 9, e53730. 

      Halassa, M. M., Siegle, J. H., Ritt, J. T., Ting, J. T., Feng, G., & Moore, C. I. (2011). Selective optical drive of thalamic reticular nucleus generates thalamic bursts and cortical spindles. Nature Neuroscience, 14(9), 1118-1120. 

      Hale, J. R., White, T. P., Mayhew, S. D., Wilson, R. S., Rollings, D. T., Khalsa, S., Arvanitis, T. N., & Bagshaw, A. P. (2016). Altered thalamocortical and intra-thalamic functional connectivity during light sleep compared with wake. NeuroImage, 125, 657-667. 

      Helfrich, R. F., Lendner, J. D., Mander, B. A., Guillen, H., Paff, M., Mnatsakanyan, L., Vadera, S., Walker, M. P., Lin, J. J., & Knight, R. T. (2019). Bidirectional prefrontal-hippocampal dynamics organize information transfer during sleep in humans. Nature Communications, 10(1), 3572. 

      Helfrich, R. F., Mander, B. A., Jagust, W. J., Knight, R. T., & Walker, M. P. (2018). Old brains come uncoupled in sleep: slow wave-spindle synchrony, brain atrophy, and forgetting. Neuron, 97(1), 221-230. e224. 

      Horovitz, S. G., Fukunaga, M., de Zwart, J. A., van Gelderen, P., Fulton, S. C., Balkin, T. J., & Duyn, J. H. (2008). Low frequency BOLD fluctuations during resting wakefulness and light sleep: A simultaneous EEG‐fMRI study. Human brain mapping, 29(6), 671-682. 

      Huang, Q., Xiao, Z., Yu, Q., Luo, Y., Xu, J., Qu, Y., Dolan, R., Behrens, T., & Liu, Y. (2024). Replay-triggered brain-wide activation in humans. Nature Communications, 15(1), 7185. 

      Ilhan-Bayrakcı, M., Cabral-Calderin, Y., Bergmann, T. O., Tüscher, O., & Stroh, A. (2022). Individual slow wave events give rise to macroscopic fMRI signatures and drive the strength of the BOLD signal in human resting-state EEG-fMRI recordings. Cerebral Cortex, 32(21), 4782-4796. 

      Laufs, H. (2008). Endogenous brain oscillations and related networks detected by surface EEG‐combined fMRI. Human brain mapping, 29(7), 762-769. 

      Laufs, H., Walker, M. C., & Lund, T. E. (2007). ‘Brain activation and hypothalamic functional connectivity during human non-rapid eye movement sleep: an EEG/fMRI study’—its limitations and an alternative approach. Brain, 130(7), e75. 

      Margulies, D. S., Ghosh, S. S., Goulas, A., Falkiewicz, M., Huntenburg, J. M., Langs, G., Bezgin, G., Eickhoff, S. B., Castellanos, F. X., & Petrides, M. (2016). Situating the default-mode network along a principal gradient of macroscale cortical organization. Proceedings of the National Academy of Sciences, 113(44), 12574-12579. 

      Massimini, M., Huber, R., Ferrarelli, F., Hill, S., & Tononi, G. (2004). The sleep slow oscillation as a traveling wave. Journal of Neuroscience, 24(31), 6862-6870. 

      Moehlman, T. M., de Zwart, J. A., Chappel-Farley, M. G., Liu, X., McClain, I. B., Chang, C., Mandelkow, H., Özbay, P. S., Johnson, N. L., & Bieber, R. E. (2019). All-night functional magnetic resonance imaging sleep studies. Journal of neuroscience methods, 316, 83-98. 

      Mölle, M., Bergmann, T. O., Marshall, L., & Born, J. (2011). Fast and slow spindles during the sleep slow oscillation: disparate coalescence and engagement in memory processing. Sleep, 34(10), 1411-1421.

      Ngo, H.-V., Fell, J., & Staresina, B. (2020). Sleep spindles mediate hippocampal-neocortical coupling during long-duration ripples. Elife, 9, e57011. 

      Picchioni, D., Horovitz, S. G., Fukunaga, M., Carr, W. S., Meltzer, J. A., Balkin, T. J., Duyn, J. H., & Braun, A. R. (2011). Infraslow EEG oscillations organize large-scale cortical– subcortical interactions during sleep: a combined EEG/fMRI study. Brain research, 1374, 63-72. 

      Schabus, M., Dang-Vu, T. T., Albouy, G., Balteau, E., Boly, M., Carrier, J., Darsaud, A., Degueldre, C., Desseilles, M., & Gais, S. (2007). Hemodynamic cerebral correlates of sleep spindles during human non-rapid eye movement sleep. Proceedings of the National Academy of Sciences, 104(32), 13164-13169. 

      Schreiner, T., Kaufmann, E., Noachtar, S., Mehrkens, J.-H., & Staudigl, T. (2022). The human thalamus orchestrates neocortical oscillations during NREM sleep. Nature communications, 13(1), 5231. 

      Schreiner, T., Petzka, M., Staudigl, T., & Staresina, B. P. (2021). Endogenous memory reactivation during sleep in humans is clocked by slow oscillation-spindle complexes. Nature Communications, 12(1), 3112. 

      Singh, D., Norman, K. A., & Schapiro, A. C. (2022). A model of autonomous interactions between hippocampus and neocortex driving sleep-dependent memory consolidation. Proceedings of the National Academy of Sciences, 119(44), e2123432119. 

      Spoormaker, V. I., Schröter, M. S., Gleiser, P. M., Andrade, K. C., Dresler, M., Wehrle, R., Sämann, P. G., & Czisch, M. (2010). Development of a large-scale functional brain network during human non-rapid eye movement sleep. Journal of Neuroscience, 30(34), 11379-11387. 

      Staresina, B. P., Bergmann, T. O., Bonnefond, M., van der Meij, R., Jensen, O., Deuker, L., Elger, C. E., Axmacher, N., & Fell, J. (2015). Hierarchical nesting of slow oscillations, spindles and ripples in the human hippocampus during sleep. Nature Neuroscience, 18(11), 1679-1686. 

      Staresina, B. P., Niediek, J., Borger, V., Surges, R., & Mormann, F. (2023). How coupled slow oscillations, spindles and ripples coordinate neuronal processing and communication during human sleep. Nature Neuroscience, 1-9. 

      Yarkoni, T., Poldrack, R. A., Nichols, T. E., Van Essen, D. C., & Wager, T. D. (2011). Large-scale automated synthesis of human functional neuroimaging data. Nature methods, 8(8), 665-670. 

      Yeshurun, Y., Nguyen, M., & Hasson, U. (2021). The default mode network: where the idiosyncratic self meets the shared social world. Nature Reviews Neuroscience, 1-12.

    1. Author response:

      The following is the authors’ response to the original reviews

      Main revision made to the manuscript

      The main revision made to the manuscript is to reconcile our findings with the line attractor model. The revision is based on Reviewer 1’s comment on reinterpreting our results as a superposition of an attractor model with fast timescale dynamics. We expanded our analysis regime to the start of a trial and characterized the overall within-trial dynamics to reinterpret our findings.

      We first acknowledge that our results are not in contradiction with evidence integration on a line attractor. As pointed out by the reviewers, our finding that the integration of reward outcome explains the reversal probability activity x_rev (Figure 3) is compatible with the line attractor model. However, the reward integration equation is an algebraic relation and does not characterize the dynamics of reversal probability activity. A closer analysis of the neural dynamics is therefore needed to assess the feasibility of the line attractor.

      In the revised manuscript, we show that x_rev exhibits two different activity modes (Figure 4). First, x_rev has substantial non-stationary dynamics during a trial, and this non-stationary activity is incompatible with the line attractor model, as claimed in the original manuscript. Second, we present new results showing that x_rev is stationary (i.e., constant in time) and stable (i.e., contracting) at the start of a trial. These two properties of x_rev indicate that it behaves as a point attractor at the start of a trial, which is compatible with the line attractor model.
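
      The two properties named above, stationarity and contraction, can be checked on a sampled one-dimensional trajectory with simple finite differences. The sketch below is one possible operationalization and not necessarily the procedure used in the manuscript: the function names, the tolerance `tol`, and the criterion that the distance between two trajectories never grows are all assumptions made for illustration.

```python
def time_derivative(x, dt):
    """Forward finite-difference estimate of dx/dt for a sampled 1-D trajectory."""
    return [(b - a) / dt for a, b in zip(x, x[1:])]


def is_stationary(x, dt, tol):
    """True if |dx/dt| stays below tol at every sample of the window."""
    return all(abs(v) < tol for v in time_derivative(x, dt))


def is_contracting(x_a, x_b):
    """True if the distance between two trajectories never increases
    and is smaller at the end of the window than at its start."""
    d = [abs(a - b) for a, b in zip(x_a, x_b)]
    never_grows = all(later <= earlier for earlier, later in zip(d, d[1:]))
    return never_grows and d[-1] < d[0]
```

      Under this reading, a window at the start of a trial would pass both checks, whereas the within-trial window would fail `is_stationary` because the time derivative departs from zero.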

      We further analyze how the two activity modes are linked (Figure 4, Support vector regression). We show that the non-stationary activity is predictable from the stationary activity if the underlying dynamics can be inferred. In other words, the non-stationary activity during a trial is generated by an underlying dynamics with the initial condition provided by the stationary state at the start of trial.

      These results suggest an extension of the line attractor model where an attractor state at the start of a trial provides an initial condition from which non-stationary activity is generated during a trial by an underlying dynamics associated with task-related behavior (Figure 4, Augmented model). 

      The separability of non-stationary trajectories (Figure 5 and 6) is a property of the non-stationary dynamics that allows separable points in the initial stationary state to remain separable during a trial, thus making it possible to represent distinct probabilistic values in non-stationary activity.

      This revised interpretation of our results (1) retains our original claim that the non-stationary dynamics during a trial is incompatible with the line attractor model and (2) introduces an attractor state at the start of a trial, which is compatible with the line attractor model. Our analysis shows that the two activity modes are linked by an underlying dynamics, and the attractor state serves as the initial state that launches the non-stationary activity.

      Responses to the Public Reviews:

      Reviewer # 1:

      (1) To provide a better explanation of the reversal learning task and network training method, we added a detailed description of the RNN and monkey task structure (Result Section 1), included a schematic of target outputs (Figure 1B), explained the rationale behind using an inhibitory network model (Method Section 1), and explained the supervised RNN training scheme (Result Section 1). This information can also be found in the Methods.

      (2) Our understanding is that the augmented model discussed in the previous page is aligned with the model suggested by Reviewer 1: “a curved line attractor, with faster timescale dynamics superimposed on this structure”. It is likely that the “fast” non-stationary activity observed during the trial is driven by task-related behavior, thus is transient. For instance, we do not observe such non-stationary activity in the inter-trial-interval when the task-related behavior is absent. For this reason, the non-stationary trajectories were not considered to be part of the attractor. Instead, they are transient activity generated by the underlying neural dynamics associated with task-related behavior. We believe such characterization of faster timescale dynamics is consistent with Reviewer 1’s view and wanted to clarify that there are two different activity modes.

      (3) We appreciate the comment by Reviewer 1 and Reviewer 2 that TDR may be limited in isolating the neural subspace of interest. Our study presents what could be learned from TDR, but TDR is by no means the only way to interpret the neural data. Applying other methods for isolating task-related neural activity is left for future work.

      We would appreciate it if the reviewers could share thoughts on what other alternative methods could better isolate the reversal probability activity.

      Reviewer # 2:

      (1) (i) We respectfully disagree with Reviewer 2’s comment that “no action is required to be performed by neurons in the RNN”. In our network setup, the output of the RNN learns to choose a sign (+ or -), as Reviewer 2 pointed out, to make a choice. This is how the RNN takes an action. It is unclear to us what Reviewer 2 intended by “action” and how reaching a target value (not just taking a sign) would make a significant difference in how the network performs the task.

      (ii) From Reviewer 2’s comment that “no intervening behavior is thus performed by neurons”, we noticed that the term “intervening behavior” has caused confusion. It refers to task-related behavior, such as making choices or receiving reward, that the subject must perform across trials before reversing its preferred choice. These are the behaviors that intervene before the reversal of the preferred choice. To clarify its meaning, in the revised manuscript, we changed the term to “task-related behavior” and put it in context. For example, in the Introduction we state that “However, during a trial, task-related behavior, such as making decisions or receiving feedback, produced …”

      (iii) As pointed out by Reviewer 2, the lack of a fixation period in the RNN could lead to differences between the neural dynamics of the RNN and the PFC, especially at the start of a trial. We address this issue in Result Section 4, where we analyze the stationary activity at the start of a trial. We find that fixing the choice output to zero before making a choice promotes stationary activity and makes the RNN activity more similar to the PFC activity.

      Reviewer #3:

      (1) (i) In the previous study (Figure 1 in [Bartolo and Averbeck ‘20]), it was shown that neural activity can predict the behavioral reversal trial. This is the reason we examined the neural activity in the trials centered at the behavioral reversal trial. We explained in Result Section 2 that we followed this line of analysis in our study.

      (ii) We would like to emphasize that the main point of Figures 4 and 5 is to show the separability of neural trajectories: the entire trajectory shifts without overlapping. It is not obvious that high-dimensional neural population activity from two trials should remain separated when their activities are compressed into a one-dimensional subspace. The one-dimensional activities can easily collide since they are compressed into a low-dimensional space. We revised the manuscript to bring out these points. We added an opening paragraph that discusses separability of trajectories and revised the main text to bring out the findings on separability.
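
      To make the notion of separability concrete, the following sketch projects two high-dimensional trajectories onto a single axis and tests whether one stays on the same side of the other at every time point. It is an illustrative assumption about how such a check could be written, not the analysis code of the manuscript: `project`, `are_separable`, and the `margin` parameter are hypothetical names.

```python
def project(trajectory, axis):
    """Project each population state of a trajectory onto a 1-D axis (dot product)."""
    return [sum(r * w for r, w in zip(state, axis)) for state in trajectory]


def are_separable(traj_a, traj_b, axis, margin=0.0):
    """True if, after projection, one trajectory stays strictly above the
    other (by more than margin) at every time point, i.e. they never cross."""
    pa, pb = project(traj_a, axis), project(traj_b, axis)
    diffs = [a - b for a, b in zip(pa, pb)]
    return all(d > margin for d in diffs) or all(d < -margin for d in diffs)
```

      Two trajectories that differ only along dimensions orthogonal to `axis` would collapse onto each other under this projection, which is why separability in one dimension is not guaranteed by separation in the full population space.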

      (iii) We agree with Reviewer 3 that it would be interesting to look at what happens in other subspaces of neural activity that are not related to reversal probability and to characterize how different neural subspaces interact with each other. However, the focus of this paper was the reversal probability activity, and we consider these questions out of the scope of the current paper. We point out that, using the same dataset, neural activity related to other experimental variables was analyzed in other papers [Bartolo and Averbeck ’20; Tang, Bartolo and Averbeck ’21].

      (2) (i) In the revised manuscript, we added an explanation of the rationale behind choosing an inhibitory network as a simplified model for the balanced state. In brief, a network with strong inhibitory recurrent connections and strong excitatory external input operates in the balanced state, as in the standard excitatory-inhibitory network. We included references that studied this inhibitory network. We also explained the technical reason (GPU memory) for choosing the inhibitory model.

      (ii) We thank the reviewer for pointing out that the original manuscript did not mention how the feedback and cue were initialized. They were random vectors sampled from a Gaussian distribution. We added this information in the revised manuscript. In our opinion, it is common to use random external inputs for training RNNs, as it is a priori unclear how to choose them. In fact, it is possible to analyze the effects of random feedback on the one-dimensional x_rev dynamics by projecting the random feedback vector onto the reversal probability vector. This is shown in Figure 4F.

      (iii) We agree that it would be more natural to train the RNN to solve the task without using the Bayesian model. We point out this issue in the Discussion in the revised manuscript.

      Recommendations for the authors:

      Reviewer #1:

      (1) My understanding of network training was that a Bayesian ideal observer signaled target output based on previous reward outcomes. However, the authors never mention that networks are trained by supervised learning in the main text until the last paragraph of the discussion. There is no mention that there was an offset in the target based on the behavior of the monkeys in the main text. These are really important things to consider in the context of the network solution after training. I couldn't actually find any figure that presents the target output for the network. Did I miss something key here?

      In Result Section 1, we added a paragraph that describes in detail how the RNN is trained. We explained that the network is first simulated and then the choice outputs and reward outcomes are fed into the Bayesian model to infer the scheduled reversal trial. A few trials are added to the inferred reversal trial to obtain the behavioral reversal trial, as found in a previous study [Bartolo and Averbeck ‘20]. Then the network weights are updated by backpropagation-through-time via supervised learning. 

      In the original manuscript, the target output for the network was described in Methods Section 2.5, Step 4. To make this information readily accessible, we added a schematic in Figure 1B that shows the scheduled, inferred and behavioral reversal trials. It also shows how the target choice outputs are defined. They switch abruptly at the behavioral reversal trial.
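
      The chain from inferred reversal trial to behavioral reversal trial to target outputs can be sketched as follows. This is a simplified illustration under explicit assumptions and not the Bayesian model or training code of the manuscript: the flat prior over reversal trials, the reward probability `p_high = 0.7`, the coding of choices as +1 or -1, the `offset` value, and all function names are hypothetical.

```python
import math


def infer_reversal_trial(choices, rewards, p_high=0.7):
    """MAP estimate of the reversal trial under a flat prior (illustrative model).

    choices[t] is +1 or -1 and rewards[t] is 1 or 0. Before the reversal,
    option +1 is rewarded with probability p_high and option -1 with
    1 - p_high; from the reversal trial onward the two are swapped.
    """
    n = len(choices)
    best_r, best_ll = 0, -math.inf
    for r in range(n + 1):  # r = first trial under the reversed schedule
        ll = 0.0
        for t, (c, rew) in enumerate(zip(choices, rewards)):
            better_option = +1 if t < r else -1
            p = p_high if c == better_option else 1.0 - p_high
            ll += math.log(p if rew else 1.0 - p)
        if ll > best_ll:
            best_r, best_ll = r, ll
    return best_r


def behavioral_reversal_trial(inferred_trial, offset, n_trials):
    """Shift the inferred reversal by a few trials, clipped to the block length."""
    return min(inferred_trial + offset, n_trials)


def target_choices(n_trials, reversal_trial, initial_choice=+1):
    """Target sign per trial: initial_choice before reversal_trial, flipped after."""
    return [
        initial_choice if t < reversal_trial else -initial_choice
        for t in range(n_trials)
    ]
```

      In a supervised setup of this kind, the list returned by `target_choices` would serve as the per-trial target for the choice output, with the abrupt sign change placed at the behavioral rather than the scheduled reversal trial.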

      (2) The role of block structure in the task is an important consideration. What are the statistics of block switches? The authors say on average the reversals are every 36 trials, but also say there are random block switches. The reviewer's notes suggest that both the networks and monkeys may be learning about the typical duration of blocks, which could influence their expectations of reversals. This aspect of the task design should be explained more thoroughly and considered in the context of Figure 1E and 5 results.

      We provided a more detailed description of the reversal learning task in Result Section 1. We clarified that (1) a task is completed by executing a block of a fixed number of trials and (2) the reversal of the reward schedule occurs at a random trial around the middle of the block. The differences in the number of trials per block that the RNNs (36) and the monkeys (80) perform are also explained. We also pointed out the differences in how the reversal trial is randomly sampled.

      However, it is unclear what Reviewer 1 meant by random block switches. Our reversal learning task is completed when a block of a fixed number of trials is executed. Reversal of the reward schedule occurs only once, on a randomly selected trial in the block, and the reversed reward schedule is maintained until the end of the block. This differs from other versions of reversal learning in which the reward schedule switches multiple times across trials. We clarified this point in Result Section 1.
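
      The block structure described above, a fixed number of trials with a single reversal near the middle, can be written down in a short sketch. The block length of 36 trials for the RNNs is taken from the text. Everything else is an assumption for illustration: uniform sampling within `half_width` trials of the midpoint, the reward probability `p_high`, and the function names.

```python
import random


def sample_reversal_trial(n_trials, half_width, rng):
    """Pick one reversal trial uniformly within +/- half_width of the block midpoint."""
    mid = n_trials // 2
    return rng.randint(mid - half_width, mid + half_width)


def reward_schedule(n_trials, reversal_trial, initial_best=+1):
    """Better option on each trial: flips once at reversal_trial and stays flipped."""
    return [
        initial_best if t < reversal_trial else -initial_best
        for t in range(n_trials)
    ]


def draw_reward(choice, best_option, rng, p_high=0.7):
    """Stochastic reward: probability p_high for the better option, 1 - p_high otherwise."""
    p = p_high if choice == best_option else 1.0 - p_high
    return 1 if rng.random() < p else 0


rng = random.Random(0)  # seeded generator so the sketch is reproducible
reversal = sample_reversal_trial(36, 5, rng)
schedule = reward_schedule(36, reversal)
```

      By construction the schedule contains exactly one switch per block, in contrast with task variants in which the reward schedule switches several times.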

      (3) The relationship between the supervised learning approach used in the RNNs and reinforcement learning was confused in the discussion. "Although RNNs in our study were trained via supervised learning, animals learn a reversal-learning task from reward feedback, making it into a reinforcement learning (RL) problem." This is fundamentally not true. In the case of this work, the outcome of the previous trial updates the target output, rather than the trial and error type learning as is typical in reinforcement learning. Networks are not learning by reinforcement learning and this statement is confusing.

      We agree with Reviewer 1’s comment that the statement in the original manuscript is confusing. Our intention was to point out that our study used supervised learning, which differs from how animals learn by reinforcement learning in real life. We revised the sentence in the Discussion as follows:

      “The RNNs in our study were trained via supervised learning. However, in real life, animals learn a reversal learning task via reinforcement learning (RL), i.e., learn the task from reward outcomes.”

      (4) The distinction between line attractors and the dynamic trajectories described by the authors deserves further investigation. A significant concern arises from the authors' use of targeted dimensionality reduction (TDR), a form of regression, to identify the axis determining reversal probability. While this approach can reveal interesting patterns in the data, it may not necessarily isolate the dimension along which the RNN computes reversal probability. This limitation could lead to misinterpretation of the underlying neural dynamics.

      a) This manuscript cites work described in "Prefrontal cortex as a meta-reinforcement learning system," which examined a similar task. In that study, the authors identified a v-shaped curve in the principal component space of network states, representing the probability of choosing left or right.

      Importantly, this curve is topologically equivalent to a line and likely represents a line attractor. However, regressing against reversal probability in such a case would show that a single principal component (PC2) directly correlates with reversal probability.

      b) The dynamics observed in the current study bear a striking resemblance to this structure, with the addition of intervening loops in the network state corresponding to within-trial state evolution. Crucially, these observations do not preclude the existence of a line attractor. Instead, they may reflect the network's need to produce fast timescale dynamics within each trial, superimposed on the slower dynamics of the line attractor.

      c) This alternative interpretation suggests that reward signals could function as inputs that shift the network state along the line attractor, with information being maintained across trials. The fast "intervening behaviors" observed by the authors could represent faster timescale dynamics occurring on top of the underlying line attractor dynamics, without erasing the accumulated evidence for reversals.

      d) Given these considerations, the authors' conclusion that their results are better described by separable dynamic trajectories rather than fixed points on a line attractor may be premature. The observed dynamics could potentially be reconciled with a more nuanced understanding of line attractor models, where the attractor itself may be curved and coexist with faster timescale dynamics.

      We appreciate the insightful comments on (1) the similarity of the work by Wang et al ’18 with our findings and (2) an alternative interpretation that augments the line attractor with fast timescale dynamics. 

      (1) We added a discussion of the work by Wang et al ’18 in Result Section 2 to point out the similarity of their findings in the principal component space with ours in the x_rev and x_choice space. We commented that such network dynamics could emerge when learning to perform the reversal learning task, regardless of the training scheme.

      We also mention that the RL approach in Wang et al ’18 does not consider within-trial dynamics, therefore lacks the non-stationary activity observed during the trial in the PFC of monkeys and our trained RNNs.

      (2) We revised our original manuscript substantially to reconcile the line attractor model with the non-stationary activity observed during a trial.

      Here are the highlights of the revised interpretation of the PFC and the RNN network activity:

      - The dynamics of x_rev consist of two activity modes, i.e., stationary activity at the start of a trial and non-stationary activity during the trial. A schematic of the augmented model that reconciles the two activity modes is shown in Figure 4A. Analyses of the time derivative (dx_rev / dt) and the contractivity of the stationary state are shown in Figure 4B,C to demonstrate the two activity modes.

      - We discuss in Result Section 4 main text that the stationary activity is consistent with the line attractor model, but the non-stationary activity deviates from the model. 

      - The two activity modes are linked dynamically. There is an underlying dynamics that can map the stationary state to the non-stationary trajectory. This is shown by predicting the nonstationary trajectory with the stationary state using a support vector regression model. The prediction results are shown in Figure 4D,E,F.

      - We discuss in Result Section 4 an extension of the standard line attractor model: points on the line attractor can serve as initial states that launch non-stationary activity associated with task-related behavior.

      - The separability of neural trajectories presented in Result Section 5 is framed as a property of the non-stationary dynamics associated with task-related behavior.
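
      The distinction between the two activity modes listed above can be illustrated with a finite-difference estimate of dx_rev / dt. The following is a minimal sketch and not the analysis code used in the manuscript: the synthetic trace, the 10 ms sampling step, the window boundaries and the function names are all assumptions chosen for illustration.

```python
import math

def finite_difference(x, dt):
    """Forward-difference estimate of dx/dt for a uniformly sampled trace."""
    return [(x[i + 1] - x[i]) / dt for i in range(len(x) - 1)]

def mean_abs(values):
    """Mean absolute value, used here as a simple measure of speed."""
    return sum(abs(v) for v in values) / len(values)

# Hypothetical trace (assumption): flat for 20 samples (hold period),
# followed by a transient bump lasting 50 samples (within-trial activity).
dt = 0.01  # assumed sampling step in seconds
hold = [0.3] * 20
trial = [0.3 + math.sin(math.pi * k / 50) for k in range(1, 51)]
x_rev = hold + trial

dx = finite_difference(x_rev, dt)
hold_speed = mean_abs(dx[:19])   # differences between hold samples only
trial_speed = mean_abs(dx[19:])  # differences once the trial has started
```

      In this toy trace the derivative vanishes during the hold window and is clearly non-zero during the trial, which is the qualitative signature of stationary versus non-stationary activity.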

      To strengthen their claims, the authors should:

      (1) Provide a more detailed description of their RNN training paradigm and task structure, including clear illustrations of target outputs.

      (2) Discuss how their findings relate to and potentially extend previous work on similar tasks, particularly addressing the similarities and differences with the v-shaped state organization observed in reinforcement learning contexts. (https://www.nature.com/articles/s41593-018-0147-8 Figure1).

      (3) Explore whether their results could be consistent with a curved line attractor model, rather than treating line attractors and dynamic trajectories as mutually exclusive alternatives.

      Our response to these three comments is described above.

      Addressing these points would significantly enhance the impact of the study and provide a more nuanced understanding of how reversal probabilities are represented in neural circuits.

      In conclusion, while this study provides interesting insights into the neural representation of reversal probability, there are several areas where the methodology and interpretations could be refined.

      Additional Minor Concerns:

      (1) Network Training and Reversal Timing: The authors mention that the network was trained to switch after a reversal to match animal behavior, stating "Maximum a Posterior (MAP) of the reversal probability converges a few trials past the MAP estimate." More explanation of how this training strategy relates to actual animal behavior would enhance the reader's understanding of the meaning of the model's similarity to animal behavior in Figure 1.

      In Method Section 2.5, we described how our observation that the running estimate of MAP converges a few trials after the actual MAP is analogous to the animal’s reversal behavior.

      “This observation can be interpreted as follows. If a subject performing the reversal learning task employs the ideal observer model to detect the trial at which reward schedule is reversed, the subject can infer the reversal of reward schedule a few trials past the actual reversal and then switch its preferred choice. This delay in behavioral reversal, relative to the reversal of reward schedule, is analogous to the monkeys switching their preferred choice a few trials after the reversal of reward schedule.”

      In Step 4, we also mentioned that the target choice outputs are defined based on our observation in Step 3.

      “We used the observation from Step 3 to define target choice outputs that switch abruptly a few trials after the reversal of reward schedule, denoted as $t^*$ in the following. An example of target outputs are shown in Fig.\,\ref{fig_behavior}B.”
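
      The inference described in Steps 3 and 4 can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the reward probability `p_high = 0.7`, the convention that option 0 is the initial high-value option, and all function names are assumptions introduced here. A uniform prior over the reversal trial is used, as mentioned elsewhere in the response.

```python
import math

def reversal_posterior(choices, rewards, n_trials, p_high=0.7):
    """Posterior over the reversal trial r under a uniform prior.

    choices[t] and rewards[t] are 0 or 1 for each observed trial t.
    Hypothesis r means option 0 is high-value on trials t < r and
    option 1 is high-value on trials t >= r (assumed convention).
    """
    log_like = []
    for r in range(n_trials + 1):
        ll = 0.0
        for t, (choice, reward) in enumerate(zip(choices, rewards)):
            high_option = 0 if t < r else 1
            p_reward = p_high if choice == high_option else 1.0 - p_high
            ll += math.log(p_reward if reward else 1.0 - p_reward)
        log_like.append(ll)
    # Uniform prior: the posterior is the normalized likelihood.
    peak = max(log_like)
    weights = [math.exp(v - peak) for v in log_like]
    total = sum(weights)
    return [w / total for w in weights]

def map_reversal_trial(posterior):
    """Index of the most probable reversal trial (first one on ties)."""
    return max(range(len(posterior)), key=posterior.__getitem__)

def running_map(choices, rewards, n_trials, p_high=0.7):
    """MAP estimate recomputed after each trial from the data seen so far."""
    return [
        map_reversal_trial(
            reversal_posterior(choices[:t], rewards[:t], n_trials, p_high)
        )
        for t in range(1, len(choices) + 1)
    ]

# Hypothetical block: option 0 is always chosen, rewarded on the first
# six trials and unrewarded on the last six.
choices = [0] * 12
rewards = [1] * 6 + [0] * 6
posterior = reversal_posterior(choices, rewards, n_trials=12)
map_trial = map_reversal_trial(posterior)
estimates = running_map(choices, rewards, n_trials=12)
```

      With stochastic rewards, the running estimate would be expected to settle only after several post-reversal outcomes have accumulated, which is the delay that the target choice outputs are designed to reproduce.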

      (2) How is the network simulated in step 1 of training? Is it just randomly initialized? What defines this network structure?

      The initial state at the start of a block was random. We think the initial state is less relevant, as the external inputs (i.e., cue and feedback) are strong and drive the network dynamics. We mentioned this setup and observation in Step 1 of training.

      “Step 1. Simulate the network starting from a random initial state, apply the external inputs, i.e., cue and feedback inputs, at each trial and store the network choices and reward outcomes at all the trials in a block. The network dynamics is driven by the external inputs applied periodically over the trials.”

      (3) Clarification on Learning Approach: More description of the approach in the main text would be beneficial. The statement "Here, we trained RNNs that learned from a Bayesian inference model to mimic the behavioral strategies of monkeys performing the reversal learning task [2, 4]" is somewhat confusing, as the model isn't directly fit to monkey data. A more detailed explanation of how the Bayesian inference model relates to monkey behavior and how it's used in RNN training would improve clarity.

      We described the learning approach in more detail, but also tried to be concise without going into technical details.

      We revised the sentence in Introduction as follows:

      “We sought to train RNNs to mimic the behavioral strategies of monkeys performing the reversal learning task. Previous studies \cite{costa2015reversal, bartolo2020prefrontal} have shown that a Bayesian inference model can capture a key aspect of the monkey's behavioral strategy, i.e., adhere to the preferred choice until the reversal of reward is detected and then switch abruptly. We trained the RNNs to replicate this behavioral strategy by training them on target behaviors generated from the Bayesian model.”

      We also added a paragraph in Result Section 1 that explains in detail how the training approach works.

      (4) In Figure 1B, it would be helpful to show the target output.

      We added a figure in Fig1B that shows a schematic of how the target output is generated.

      (5) An important point to consider is that a line attractor can be curved while still being topologically equivalent to a line. This nuance makes Figure 4A somewhat difficult to interpret. It might be helpful to discuss how the observed dynamics relate to potentially curved line attractors, which could provide a more nuanced understanding of the neural representations.

      As discussed above, we interpret the “curved” activity during the trial as non-stationary activity. We do not think this non-stationary activity would be characterized as an attractor. An attractor is (1) a minimal set of states that is (2) invariant under the dynamics and (3) attracting when perturbed into its neighborhood [Strogatz, Nonlinear dynamics and chaos]. If we consider the autonomous system without the behavior-related external input as the base system, then the non-stationary states could satisfy (2) and (3) but not (1), so they are not part of the attractor. If we include the behavior-related external input in the autonomous dynamics, then it may be possible that the non-stationary trajectories are part of the attractor. We adopted the former interpretation, as the behavior-related inputs are external and transient.

      (6) The results of the perturbation experiments seem to follow necessarily from the way x_rev was defined. It would be valuable to clarify if there's more to these results than what appears to be a direct consequence of the definition, or if there are subtleties in the experimental design or analysis that aren't immediately apparent.

      The neural activity x_rev is correlated to the reversal probability, but it is unclear if the activity in this neural subspace is causally linked to behavioral variables, such as choice output. We added this explanation at the beginning of Results Section 7 to clarify the reason for performing the perturbation experiments.

      “The neural activity $x_{rev}$ is obtained by identifying a neural subspace correlated to reversal probability. However, it remains to be shown if activity within this neural subspace is causally linked to behavioral variables, such as choice output.”

      Reviewer #2:

      Below is a list of things I have found difficult to understand, and been puzzled/concerned about while reading the manuscript:

      (1) It would be nice to say a bit more about the dataset that has been used for PFC analysis, e.g. number of neurons used and in what conditions is Figure 2A obtained (one has to go to supplementary to get the reference).

      We added information about the PFC dataset in the opening paragraph of Result Section 2 to provide an overview of what type of neural data we’ve analyzed. It includes information about the number of recorded neurons, recording method and spike binning process.

      (2) It would be nice to give more detail about the monkey task and better explain its trial structure.

      In Result Section 1 we added a description of the overall task structure (and how it differs from other versions of the reversal learning task), the RNN / monkey trial structure, and the differences between the RNN and monkey tasks.

      (3) In the introduction it is mentioned that during the hold period, the probability of reversal is represented. Where does this statement come from?

      The fact that neural activity during a hold period, i.e., fixation period before presenting the target images, encodes the probability of reversal was demonstrated in a previous study (Bartolo and Averbeck ’20). 

      We realize that our intention was to state that, during the hold period, the reversal probability activity is stationary, as in the line attractor model, rather than to emphasize that the probability of reversal is represented during this period. We revised the sentence to convey this message. In addition, we revised the entire paragraph to reinterpret our findings: there are two activity modes, where the stationary activity is consistent with the line attractor model but the non-stationary activity deviates from it.

      (4) "Around the behavioral reversal trial, reversal probabilities were represented by a family of rank-ordered trajectories that shifted monotonically". This sentence is confusing and hard to understand.

      Thank you for pointing this out. We rewrote the paragraph to reflect our revised interpretation. This sentence was removed, as it can be considered part of the result on separable trajectories.

      (5) For clarity, in the first section, when it is written that "The reversal behavior of trained RNNs was similar to the monkey's behavior on the same task" it would be nice to be more precise, that this is to be expected given the strategy used to train the network.

      We removed this sentence as it makes a blanket statement. Instead, we compared the behavioral outputs of the RNNs and the monkeys one by one.

      We added a sentence in Result Section 1 stating that the RNNs’ abrupt behavioral reversal is expected, as they are trained to mimic the target choice outputs of the Bayesian model.

      “Such abrupt reversal behavior was expected as the RNNs were trained to mimic the target outputs of the Bayesian inference model.”

      (6) What is the value of tau used in eq (1), and how does it compare to trial duration?

      We stated the value of the time constant tau in Eq (1) and also discussed in Result Section 1 that tau = 20 ms is much shorter than the trial duration of 500 ms; thus, the persistent behavior seen in trained RNNs is due to learning.
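
      The gap between the two timescales can be made concrete with a short calculation. The sketch below assumes that, without input or recurrent drive, a unit in Eq (1) relaxes as a standard leaky integrator, tau * dx/dt = -x; the exact form of Eq (1) is not reproduced in this response, so this is an assumption, as are the variable names.

```python
import math

tau_ms = 20.0     # time constant stated in the response
trial_ms = 500.0  # trial duration stated in the response

# Assumption: passive relaxation follows x(t) = x(0) * exp(-t / tau).
n_time_constants = trial_ms / tau_ms
remaining_fraction = math.exp(-n_time_constants)
```

      A trial spans 25 time constants, so under this assumption the passive trace of an initial state is negligible by the end of a single trial, and any activity that persists across trials has to be sustained by the learned recurrent dynamics.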

      (7) It would be nice to expand around the notion of « temporally flexible representation » to help readers grasp what this means.

      Instead of stating that the separable dynamic trajectories have a “temporally flexible representation”, we break down in what sense they are temporally flexible: separable dynamic trajectories can accommodate the effects that task-related behavior has on generating non-stationary neural dynamics.

      “In sum, our results show that, in a probabilistic reversal learning task, recurrent neural networks encode reversal probability by adopting, not only stationary states as in a line attractor, but also separable dynamic trajectories that can represent distinct probabilistic values while accommodating non-stationary dynamics associated with task-related behavior.”

      Reviewer #3:

      (1) Data:

      It would be useful to describe the experimental task, recording setup, and analyses in much more detail - both in the text and in the methods. What part of PFC are the recordings from? How many neurons were recorded over how many sessions? Which other papers have they been used in? All of these things are important for the reader to know, but are not listed anywhere. There are also some inconsistencies, with the main text e.g. listing the 'typical block length' as 36 trials, and the methods listing the block length as 24 trials (if this is a difference between the biological data and RNN, that should be more explicit and motivated).

      We provided a more detailed description of the monkey experimental task and PFC recordings in Result Section 1. We also added a new section in Methods 2.1 to describe the monkey experiment.

      The experimental analyses should be explained in more detail in the methods. There is e.g. no detailed description of the analysis in Figure 6F.

      We added a new section in Methods 6 to describe how the residual PFC activity is computed. It also describes the RNN perturbation experiments.

      Finally, it would be useful for more analyses of monkey behaviour and performance, either in the main text or supplementary figures.

      We did not pursue this comment as it is unclear how additional behavioral analyses would improve the manuscript.

      (2) Model:

      When fitting the network, 'step 1' of training in 2.3 seems superfluous. The posterior update from getting a reward at A is the same as that from not getting a reward at B (and vice versa), and it is therefore completely independent of the network choice. The reversal trial can therefore be inferred without ever simulating the network, simply by generating a sample of which trials have the 'good' option being rewarded and which trials have the 'bad' option being rewarded.

      We respectfully disagree with Reviewer 3’s comment that the reversal trial can be inferred without ever simulating the network. The only way for the network to know about the underlying reward schedule is to perform the task by itself. By simulating the network, it can sample the options and the reward outcomes. 

      Our understanding is that Reviewer 3 described a strategy that a human would use to perform this task. Our goal was to train the RNN to perform the task.

      Do the blocks always start with choice A being optimal? Is everything similar if the network is trained with a variable initial rewarded option? E.g. in Fig 6, would you see the appropriate swap in the effect of the perturbation on choice probability if choice B was initially optimal?

      Thank you for pointing out that the initial high-value option can be random. When setting up the reward schedule, the initial high-value option was chosen randomly from two choice outputs and, at the scheduled reversal, it was switched to the other option. We did not describe this in the original manuscript.

      We added a description in Training Scheme Step 4 that the initial high-value option is selected randomly. This is also explained in Result Section 1, where we give an overview of the RNN training procedure.

      (3) Content:

      It is rarely explained what the error bars represent (e.g. Figures 3B, 4C, ...) - this should be clear in all figures.

      We added that the error bars represent the standard error of the mean.

      Figure 2A: this colour scheme is not great. There are abrupt colour changes both before and after the 'reversal' trial, and both of the extremes are hard to see.

      We changed the color scheme to contrast pre- and post-reversal trials without the abrupt color change.

      Figure 3E/F: how is prediction accuracy defined?

      We added that the prediction accuracy is based on Pearson correlation.
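
      For readers unfamiliar with this accuracy measure, a minimal sketch of the Pearson correlation between predicted and observed values is given below. The function name and the plain-Python implementation are illustrative assumptions and are not taken from the authors' code.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical usage: compare a predicted trajectory with the observed one.
observed = [0.1, 0.4, 0.9, 1.3]
predicted = [0.2, 0.5, 0.8, 1.4]
accuracy = pearson_r(observed, predicted)
```

      Because the measure is invariant to scale and offset, it scores how well the shape of the predicted trajectory follows the observed one rather than the absolute error.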

      Figure 4B: why focus on the derivative of the dynamics? The subsequent plots looking at the actual trajectories are much easier to understand. Also - what is 'relative trial' relative to?

      The derivative was analyzed to demonstrate stationarity or non-stationarity of the neural activity. We think it will be clearer in the revised manuscript that the derivative allows us to characterize those two activity modes.

      Relative trial number indicates the trial position relative to the behavioral reversal trial. We added this description to the figures where “relative trial” is used.

      Figure 4C: what do these analyses look like if you match the trial numbers for the shift in trajectories? As it is now, there will presumably be more rewarded trials early and late in each block, and more unrewarded trials around the reversal point. Does this introduce biases in the analysis? A related question is (i) why the black lines are different in the top and bottom plots, and (ii) why the ends of the black lines are discontinuous with the beginnings of the red/blue lines.

      We could not understand what Reviewer 3 was asking in this comment. It would help if Reviewer 3 could clarify the following question:

      “Figure 4C: what do these analyses look like if you match the trial numbers for the shift in trajectories?”

      Question (i): We wanted to look at how the trajectory shifts in the subsequent trial if a reward is or is not received in the current trial. The top panel analyzed all the trials in which the subsequent trial did not receive a reward. The bottom panel analyzed all the trials in which the subsequent trial received a reward. So, the trials analyzed in the top and bottom panels are different, and the black lines (x_rev of “current” trial) in the top and bottom panels are different.

      Question (ii): Black line is from the preceding trial of the red/blue lines, so if trials are designed to be continuous with the inter-trial-interval, then black and red/blue should be continuous. However, in the monkey experiment, the inter-trial-intervals were variable, so the end of current trial does not match with the start of next trial. The neural trajectories presented in the manuscript did not include the activity in this inter-trial-interval.

      Figure 6C: are the individual dots different RNNs? Claiming that there is a decrease in Delta x_choice for a v_+ stimulation is very misleading.

      Yes, individual dots are different RNN perturbations. We added an explanation of the dots in the Figure 7C caption.

      We agree with the comment that \Delta x_choice did not decrease. This sentence was removed. Instead, we revised the manuscript to state that x_choice for v_+ stimulation was smaller than the x_choice for v_- stimulation. We performed a KS test to confirm statistical significance.
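
      As an illustration of the comparison described above, the two-sample KS statistic is the largest gap between the empirical cumulative distributions of the two groups. The sketch below computes only the statistic, not a p-value, and the function name and example values are hypothetical rather than taken from the manuscript.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic (maximum ECDF difference)."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The ECDFs only change at observed values, so checking those suffices.
    for value in sorted(set(a) | set(b)):
        ecdf_a = bisect_right(a, value) / len(a)
        ecdf_b = bisect_right(b, value) / len(b)
        largest_gap = max(largest_gap, abs(ecdf_a - ecdf_b))
    return largest_gap

# Hypothetical x_choice values under the two stimulation directions.
x_choice_plus = [0.10, 0.15, 0.20, 0.22]
x_choice_minus = [0.30, 0.35, 0.41, 0.50]
d_stat = ks_statistic(x_choice_plus, x_choice_minus)
```

      A statistic close to 1 indicates nearly non-overlapping distributions, while a value near 0 indicates that the two groups are indistinguishable by this measure.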

      Discussion: "...exhibited behaviour consistent with an ideal Bayesian observer, as found in our study". The RNN was explicitly trained to reproduce an ideal Bayesian observer, so this can only really be considered an assumption (not a result) in the present study.

      We agree that the statement in the original manuscript is inaccurate. It was revised to reflect that, in the other study, behavior outputs similar to a Bayesian observer emerged by simply learning to do the task, instead of directly mimicking the outputs of a Bayesian observer as done in our study.

      “Authors showed that trained RNNs exhibited behavior outputs consistent with an ideal Bayesian observer without explicitly learning from the Bayesian observer. This finding shows that the behavioral strategies of monkeys could emerge by simply learning to do the task, instead of directly mimicking the outputs of Bayesian observer as done in our study.”

      Methods: Would the results differ if your Bayesian observer model used the true prior (i.e. the reversal happens in the middle 10 trials) rather than a uniform prior? Given the extensive literature on prior effects on animal behaviour, it is reasonable to expect that monkeys incorporate some non-uniform prior over the reversal point.

      Thank you for pointing out the non-uniform prior. We haven’t conducted this analysis, but would guess that the convergence to the posterior distribution would be faster. We’d have to perform further analysis, which is out of the scope of this paper, to investigate whether the posterior distribution would be different from what we obtained with a uniform prior.

      Making the code available would make the work more transparent and useful to the community.

      The code is available in the following GitHub repository: https://github.com/chrismkkim/LearnToReverse

    1. Author response:

      Reviewer #1 (Public review):

      This study investigates the sex determination mechanism in the clonal ant Ooceraea biroi, focusing on a candidate complementary sex determination (CSD) locus, one of the key mechanisms supporting haplodiploid sex determination in hymenopteran insects. Using whole genome sequencing, the authors analyze diploid females and the rarely occurring diploid males of O. biroi, identifying a 46 kb candidate region that is consistently heterozygous in females and predominantly homozygous in diploid males. This region shows elevated genetic diversity, as expected under balancing selection. The study also reports the presence of an lncRNA near this heterozygous region, which, though only distantly related in sequence, resembles the ANTSR lncRNA involved in female development in the Argentine ant, Linepithema humile (Pan et al. 2024). Together, these findings suggest a potentially conserved sex determination mechanism across ant species. However, while the analyses are well conducted and the paper is clearly written, the insights are largely incremental. The central conclusion - that the sex determination locus is conserved in ants - was already proposed and experimentally supported by Pan et al. (2024), who included O. biroi among the studied species and validated the locus's functional role in the Argentine ant. The present study thus largely reiterates existing findings without providing novel conceptual or experimental advances.

      Although it is true that Pan et al., 2024 demonstrated (in Figure 4 of their paper) that the synteny of the region flanking ANTSR is conserved across aculeate Hymenoptera (including O. biroi), Reviewer 1’s claim that that paper provides experimental support for the hypothesis that the sex determination locus is conserved in ants is inaccurate. Pan et al., 2024 only performed experimental work in a single ant species (Linepithema humile) and merely compared reference genomes of multiple species to show synteny of the region, rather than functionally mapping or characterizing these regions.

      Other comments:

      The mapping is based on a very small sample size: 19 females and 16 diploid males, and these all derive from a single clonal line. This implies a rather high probability for false-positive inference. In combination with the fact that only 11 out of the 16 genotyped males are actually homozygous at the candidate locus, I think a more careful interpretation regarding the role of the mapped region in sex determination would be appropriate. The main argument supporting the role of the candidate region in sex determination is based on the putative homology with the lncRNA involved in sex determination in the Argentine ant, but this argument was made in a previous study (as mentioned above).

      Our main argument supporting the role of the candidate region in sex determination is not based on putative homology with the lncRNA in L. humile. Instead, our main argument comes from our genetic mapping (in Fig. 2), and the elevated nucleotide diversity within the identified region (Fig. 4). Additionally, we highlight that multiple genes within our mapped region are homologous to those in mapped sex determining regions in both L. humile and Vollenhovia emeryi, possibly including the lncRNA.

      In response to the Reviewer’s assertion that the mapping is based on a small sample size from a single clonal line, we want to highlight that we used all diploid males available to us. Although the primary shortcoming of a small sample size is to increase the probability of a false negative, small sample sizes can also produce false positives. We used two approaches to explore the statistical robustness of our conclusions. First, we generated a null distribution by randomly shuffling sex labels within colonies and calculating the probability of observing our CSD index values by chance (shown in Fig. 2). Second, we directly tested the association between homozygosity and sex using Fisher’s Exact Test (shown in Supplementary Fig. S2). In both cases, the association of the candidate locus with sex was statistically significant after multiple-testing correction using the Benjamini-Hochberg False Discovery Rate. These approaches are clearly described in the “CSD Index Mapping” section of the Methods.
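
      The second approach can be sketched with the headline counts quoted in the review (19 heterozygous females; 11 of 16 diploid males homozygous). This is an illustration under stated assumptions and not the authors' pipeline: the per-locus tables actually tested are described in their Methods, the function names are hypothetical, and the additional p-values passed to the correction step are made up solely to show how the Benjamini-Hochberg adjustment works.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(k):
        # Hypergeometric probability of k counts in the top-left cell.
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum all tables that are at least as unlikely as the observed one.
    total = sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Rows: diploid males, females. Columns: homozygous, heterozygous.
p_locus = fisher_exact_two_sided(11, 5, 0, 19)

# Hypothetical p-values for other windows, included only for illustration.
q_values = benjamini_hochberg([p_locus, 0.20, 0.50, 0.80])
```

      Under these assumed counts the candidate window remains the only one with a small adjusted p-value, which mirrors the logic of the correction described in the response without reproducing the authors' genome-wide analysis.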

      We also note that, because complementary sex determination loci are expected to evolve under balancing selection, our finding that the mapped region exhibits a peak of nucleotide diversity lends orthogonal support to the notion that the mapped locus is indeed a complementary sex determination locus.

      The fourth paragraph of the results and the sixth paragraph of the discussion are devoted to explaining the possible reasons why only 11/16 genotyped males are homozygous in the mapped region. The revised manuscript will include an additional sentence (in what will be lines 384-388) in this paragraph that includes the possible explanation that this locus is, in fact, a false positive, while also emphasizing that we find this possibility to be unlikely given our multiple lines of evidence.

      In response to Reviewer 1’s suggestion that we carefully interpret the role of the mapped region in sex determination, we highlight our careful wording choices, nearly always referring to the mapped locus as a “candidate sex determination locus” in the title and throughout the manuscript. For consistency, the revised manuscript version will change the second results subheading from “The O. biroi CSD locus is homologous to another ant sex determination locus but not to honeybee csd” to “O. biroi’s candidate CSD locus is homologous to another ant sex determination locus but not to honeybee csd,” and will add the word “candidate” in what will be line 320 at the beginning of the Discussion, and will change “putative” to “candidate” in what will be line 426 at the end of the Discussion.

      In the abstract, it is stated that CSD loci have been mapped in honeybees and two ant species, but we know little about their evolutionary history. But CSD candidate loci were also mapped in a wasp with multi-locus CSD (study cited in the introduction). This wasp is also parthenogenetic via central fusion automixis and produces diploid males. This is a very similar situation to the present study and should be referenced and discussed accordingly, particularly since the authors make the interesting suggestion that their ant also has multi-locus CSD and neither the wasp nor the ant has tra homologs in the CSD candidate regions. Also, is there any homology to the CSD candidate regions in the wasp species and the studied ant?

      In response to Reviewer 1’s suggestion that we reference the (Matthey-Doret et al. 2019) study in the context of diploid males being produced via losses of heterozygosity during asexual reproduction, the revised manuscript will include the following sentence: “Therefore, if O. biroi uses CSD, diploid males might result from losses of heterozygosity at sex determination loci (Fig. 1C), similar to what is thought to occur in other asexual Hymenoptera that produce diploid males (Rabeling and Kronauer 2012; Matthey-Doret et al. 2019).”

      We note, however, that in their 2019 study, Matthey-Doret et al. did not directly test the hypothesis that diploid males result from losses of heterozygosity at CSD loci during asexual reproduction, because the diploid males they used for their mapping study came from inbred crosses in a sexual population of that species.

      We address this further below, but we want to emphasize that we do not intend to argue that O. biroi has multiple CSD loci. Instead, we suggest that the presence of additional, undetected CSD loci is one possible explanation for the absence of diploid males from any clonal line other than clonal line A. In response to Reviewer 1’s suggestion that we reference the (Matthey-Doret et al. 2019) study in the context of multilocus CSD, the revised manuscript version will include the following additional sentence in the fifth paragraph of the discussion: “Multi-locus CSD has been suggested to limit the extent of diploid male production in asexual species under some circumstances (Vorburger 2013; Matthey-Doret et al. 2019).”

      Regarding Reviewer 1’s question about homology between the putative CSD loci from the (Matthey-Doret et al. 2019) study and O. biroi, we note that there is no homology. The revised manuscript version will have an additional Supplementary Table (which will be the new Supplementary Table S3) that will report the results of this homology search. The revised manuscript will also include the following additional sentence in the Results: “We found no homology between the genes within the O. biroi CSD index peak and any of the genes within the putative L. fabarum CSD loci (Supplementary Table S3).”

      The authors used different clonal lines of O. biroi to investigate whether heterozygosity at the mapped CSD locus is required for female development in all clonal lines of O. biroi (L187-196). However, given the described parthenogenesis mechanism in this species conserves heterozygosity, additional females that are heterozygous are not very informative here. Indeed, one would need diploid males in these other clonal lines as well (but such males have not yet been found) to make any inference regarding this locus in other lines.

      We agree that a full mapping study including diploid males from all clonal lines would be preferable, but as stated earlier in that same paragraph, we have only found diploid males from clonal line A. We stand behind our modest claim that “Females from all six clonal lines were heterozygous at the CSD index peak, consistent with its putative role as a CSD locus in all O. biroi.” In the revised manuscript version, this sentence (in what will be lines 199-201) will be changed slightly in response to a reviewer comment below: “All females from all six clonal lines (including 26 diploid females from clonal line B) were heterozygous at the CSD index peak, consistent with its putative role as a CSD locus in all O. biroi.”

      Reviewer #2 (Public review):

      The manuscript by Lacy et al. is well written, with a clear and compelling introduction that effectively conveys the significance of the study. The methods are appropriate and well-executed, and the results, both in the main text and supplementary materials, are presented in a clear and detailed manner. The authors interpret their findings with appropriate caution.

      This work makes a valuable contribution to our understanding of the evolution of complementary sex determination (CSD) in ants. In particular, it provides important evidence for the ancient origin of a non-coding locus implicated in sex determination, and shows that, remarkably, this sex locus is conserved even in an ant species with a non-canonical reproductive system that typically does not produce males. I found this to be an excellent and well-rounded study, carefully analyzed and well contextualized.

      That said, I do have a few minor comments, primarily concerning the discussion of the potential 'ghost' CSD locus. While the authors acknowledge (line 367) that they currently have no data to distinguish among the alternative hypotheses, I found the evidence for an additional CSD locus presented in the results (lines 261-302) somewhat limited and at times a bit difficult to follow. I wonder whether further clarification or supporting evidence could already be extracted from the existing data. Specifically:

      We agree with Reviewer 2 that the evidence for a second CSD locus is limited. In fact, we do not intend to advocate for there being a second locus, but we suggest that a second CSD locus is one possible explanation for the absence of diploid males outside of clonal line A. In our initial version, we intentionally conveyed this ambiguity by titling this section “O. biroi may have one or multiple sex determination loci.” However, we now see that this leads to undue emphasis on the possibility of a second locus. In the revised manuscript, we will split this into two separate sections: “Diploid male production differs across O. biroi clonal lines” and “O. biroi lacks a tra-containing CSD locus.”

      (1) Line 268: I doubt the relevance of comparing the proportion of diploid males among all males between lines A and B to infer the presence of additional CSD loci. Since the mechanisms producing these two types of males differ, it might be more appropriate to compare the proportion of diploid males among all diploid offspring. This ratio has been used in previous studies on CSD in Hymenoptera to estimate the number of sex loci (see, for example, Cook 1993, de Boer et al. 2008, 2012, Ma et al. 2013, and Chen et al., 2021). The exact method might not be applicable to clonal raider ants, but I think comparing the percentage of diploid males among the total number of (diploid) offspring produced between the two lineages might be a better argument for a difference in CSD loci number.

      We want to re-emphasize here that we do not wish to advocate for there being two CSD loci in O. biroi. Rather, we want to explain that this is one possible explanation for the apparent absence of diploid males outside of clonal line A. We hope that the modifications to the manuscript described in the previous response help to clarify this.

      Reviewer 2 is correct that comparing the number of diploid males to diploid females does not apply to clonal raider ants. This is because males are vanishingly rare among the vast numbers of females produced. We do not count how many females are produced in laboratory stock colonies, and males are sampled opportunistically. Therefore, we cannot report exact numbers. However, we will add the following sentence to the revised manuscript: “Despite the fact that we maintain more colonies of clonal line B than of clonal line A in the lab, all the diploid males we detected came from clonal line A.”

      (2) If line B indeed carries an additional CSD locus, one would expect that some females could be homozygous at the ANTSR locus but still viable, being heterozygous only at the other locus. Do the authors detect any females in line B that are homozygous at the ANTSR locus? If so, this would support the existence of an additional, functionally independent CSD locus.

      We thank the reviewer for this suggestion, and again we emphasize that we do not want to argue in favor of multiple CSD loci. We just want to introduce it as one possible explanation for the absence of diploid males outside of clonal line A.

      The 26 sequenced diploid females from clonal line B are all heterozygous at the mapped locus, and the revised manuscript will clarify this in what will be lines 199-201. Previously, only six of those diploid females were included in Supplementary Table S2, and that will be modified accordingly.

      (3) Line 281: The description of the two tra-containing CSD loci as "conserved" between Vollenhovia and the honey bee may be misleading. It suggests shared ancestry, whereas the honey bee csd gene is known to have arisen via a relatively recent gene duplication from fem/tra (10.1038/nature07052). It would be more accurate to refer to this similarity as a case of convergent evolution rather than conservation.

      In the sentence that Reviewer 2 refers to, we are representing the assertion made in the (Miyakawa and Mikheyev 2015) paper in which, regarding their mapping of a candidate CSD locus that contains two linked tra homologs, they write in the abstract: “these data support the prediction that the same CSD mechanism has indeed been conserved for over 100 million years.” In that same paper, Miyakawa and Mikheyev write in the discussion section: “As ants and bees diverged more than 100 million years ago, sex determination in honey bees and V. emeryi is probably homologous and has been conserved for at least this long.”

      As noted by Reviewer 2, this appears to conflict with a previously advanced hypothesis: because fem and csd were found in Apis mellifera, Apis cerana, and Apis dorsata, but only fem was found in Melipona compressipes, Bombus terrestris, and Nasonia vitripennis, the csd gene evolved after the honeybee (Apis) lineage diverged from other bees (Hasselmann et al. 2008). However, it remains possible that the csd gene evolved after ants and bees diverged from N. vitripennis, but before the divergence of ants and bees, and then was subsequently lost in B. terrestris and M. compressipes. This view was previously put forward based on bioinformatic identification of putative orthologs of csd and fem in bumblebees and in ants [(Schmieder et al. 2012), see also (Privman et al. 2013)]. However, subsequent work disagreed and argued that the duplications of tra found in ants and in bumblebees represented convergent evolution rather than homology (Koch et al. 2014). Distinguishing between these possibilities will be aided by additional sex determination locus mapping studies and functional dissection of the underlying molecular mechanisms in diverse Aculeata.

      Distinguishing between these competing hypotheses is beyond the scope of our paper, but the revised manuscript will include additional text to incorporate some of this nuance. We will include these modified lines below:

      “A second QTL region identified in V. emeryi (V.emeryiCsdQTL1) contains two closely linked tra homologs, similar to the closely linked honeybee tra homologs, csd and fem (Miyakawa and Mikheyev 2015). This, along with the discovery of duplicated tra homologs that undergo concerted evolution in bumblebees and ants (Schmieder et al. 2012; Privman et al. 2013) has led to the hypothesis that the function of tra homologs as CSD loci is conserved with the csd-containing region of honeybees (Schmieder et al. 2012; Miyakawa and Mikheyev 2015). However, other work has suggested that tra duplications occurred independently in honeybees, bumblebees, and ants (Hasselmann et al. 2008; Koch et al. 2014), and it remains to be demonstrated that either of these tra homologs acts as a primary CSD signal in V. emeryi.”

      (4) Finally, since the authors successfully identified multiple alleles of the first CSD locus using previously sequenced haploid males, I wonder whether they also observed comparable allelic diversity at the candidate second CSD locus. This would provide useful supporting evidence for its functional relevance.

      As is already addressed in the final paragraph of the results and in Supplementary Fig. S4, there is no peak of nucleotide diversity in any of the regions homologous to V.emeryiQTL1, which is the tra-containing candidate sex determination locus (Miyakawa and Mikheyev 2015). In the revised manuscript, the relevant lines will be 307-310. We want to restate that we do not propose that there is a second candidate CSD locus in O. biroi, but we simply raise the possibility that multi-locus CSD *might* explain the absence of diploid males from clonal lines other than clonal line A (as one of several alternative possibilities).

      Overall, these are relatively minor points in the context of a strong manuscript, but I believe addressing them would improve the clarity and robustness of the authors' conclusions.

      Reviewer #3 (Public review):

      Summary:

      The sex determination mechanism governed by the complementary sex determination (CSD) locus is one of the mechanisms that support the haplodiploid sex determination system evolved in hymenopteran insects. While many ant species are believed to possess a CSD locus, it has only been specifically identified in two species. The authors analyzed diploid females and the rarely occurring diploid males of the clonal ant Ooceraea biroi and identified a 46 kb CSD candidate region that is consistently heterozygous in females and predominantly homozygous in males. This region was found to be homologous to the CSD locus reported in distantly related ants. In the Argentine ant, Linepithema humile, the CSD locus overlaps with an lncRNA (ANTSR) that is essential for female development and is associated with the heterozygous region (Pan et al. 2024). Similarly, an lncRNA is encoded near the heterozygous region within the CSD candidate region of O. biroi. Although this lncRNA shares low sequence similarity with ANTSR, its potential functional involvement in sex determination is suggested. Based on these findings, the authors propose that the heterozygous region and the adjacent lncRNA in O. biroi may trigger female development via a mechanism similar to that of L. humile. They further suggest that the molecular mechanisms of sex determination involving the CSD locus in ants have been highly conserved for approximately 112 million years. This study is one of the few to identify a CSD candidate region in ants and is particularly noteworthy as the first to do so in a parthenogenetic species.

      Strengths:

      (1) The CSD candidate region was found to be homologous to the CSD locus reported in distantly related ant species, enhancing the significance of the findings.

      (2) Identifying the CSD candidate region in a parthenogenetic species like O. biroi is a notable achievement and adds novelty to the research.

      Weaknesses

      (1) Functional validation of the lncRNA's role is lacking, and further investigation through knockout or knockdown experiments is necessary to confirm its involvement in sex determination.

      See response below.

      (2) The claim that the lncRNA is essential for female development appears to reiterate findings already proposed by Pan et al. (2024), which may reduce the novelty of the study.

      We do not claim that the lncRNA is essential for female development in O. biroi, but simply mention the possibility that, as in L. humile, it is somehow involved in sex determination. We do not have any functional evidence for this, so this is purely based on its genomic position immediately adjacent to our mapped candidate region. We agree with the reviewer that the study by Pan et al. (2024) decreases the novelty of our findings. Another way of looking at this is that our study supports and bolsters previous findings by partially replicating the results in a different species.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors have used full-length single-cell sequencing on a sorted population of human fetal retina to delineate expression patterns associated with the progression of progenitors to rod and cone photoreceptors. They find that rod and cone precursors contain a mix of rod/cone determinants, with a bias in both amounts and isoform balance likely deciding the ultimate cell fate. Markers of early rod/cone hybrids are clarified, and a gradient of lncRNAs is uncovered in maturing cones. Comparison of early rods and cones exposes an enriched MYCN regulon, as well as expression of SYK, which may contribute to tumor initiation in RB1 deficient cone precursors.

      Strengths:

      (1) The insight into how cone and rod transcripts are mixed together at first is important and clarifies a long-standing notion in the field.

      (2) The discovery of distinct active vs inactive mRNA isoforms for rod and cone determinants is crucial to understanding how cells make the decision to form one or the other cell type. This is only really possible with full-length scRNAseq analysis.

      (3) New markers of subpopulations are also uncovered, such as CHRNA1 in rod/cone hybrids that seem to give rise to either rods or cones.

      (4) Regulon analyses provide insight into key transcription factor programs linked to rod or cone fates.

      (5) The gradient of lncRNAs in maturing cones is novel, and while the functional significance is unclear, it opens up a new line of questioning around photoreceptor maturation.

      (6) The finding that SYK mRNA is naturally expressed in cone precursors is novel, as previously it was assumed that SYK expression required epigenetic rewiring in tumors.

      We thank the reviewer for describing the study’s strengths, reflecting the major conclusions of the initially submitted manuscript. However, based on new analyses, including the requested analyses of other scRNA-seq datasets, our revision clarifies that:

      -  related to point (1), cone and rod transcripts do not appear to be mixed together at first (i.e., in immediately post-mitotic immature cone and rod precursors) but appear to be coexpressed in subsequent cone and rod precursor stages; and 

      - related to point (3), CHRNA1 appears to mark immature cone precursors that are distinct from the maturing cone and rod precursors that co-express cone- and rod-related RNAs (despite the similar UMAP positions of the two populations in our dataset). 

      Weaknesses:

      (1) The writing is very difficult to follow. The nomenclature is confusing and there are contradictory statements that need to be clarified.

      (2) The drug data is not enough to conclude that SYK inhibition is sufficient to prevent the division of RB1 null cone precursors. Drugs are never completely specific so validation is critical to make the conclusion drawn in the paper.

      We thank the reviewer for noting these important issues. Accordingly, in the revised manuscript:

      (1) We improve the writing and clarify the nomenclature and contradictory statements, particularly those noted in the Reviewer’s Recommendations for Authors. 

      (2) We scale back claims related to the role of SYK in the cone precursor response to RB1 loss, with wording changes in the Abstract, Results, and Discussion, which now recognize that the inhibitor studies only support the possibility that cone-intrinsic SYK expression contributes to retinoblastoma initiation, as detailed in our responses to Reviewer’s Recommendations for Authors. We agree and now mention that genetic perturbation of SYK is required to prove its role.  

      Reviewer #2 (Public review):

      Summary:

      The authors used deep full-length single-cell sequencing to study human photoreceptor development, with a particular emphasis on the characteristics of photoreceptors that may contribute to retinoblastoma.

      Strengths:

      This single-cell study captures gene regulation in photoreceptors across different developmental stages, defining post-mitotic cone and rod populations by highlighting their unique gene expression profiles through analyses such as RNA velocity and SCENIC. By leveraging full-length sequencing data, the study identifies differentially expressed isoforms of NRL and THRB in L/M cone and rod precursors, illustrating the dynamic gene regulation involved in photoreceptor fate commitment. Additionally, the authors performed high-resolution clustering to explore markers defining developing photoreceptors across the fovea and peripheral retina, particularly characterizing SYK's role in the proliferative response of cones in the RB loss background. The study provides an in-depth analysis of developing human photoreceptors, with the authors conducting thorough analyses using full-length single-cell RNA sequencing. The strength of the study lies in its design, which integrates single-cell full-length RNA-seq, long-read RNA-seq, and follow-up histological and functional experiments to provide compelling evidence supporting their conclusions. The model of cell type-dependent splicing for NRL and THRB is particularly intriguing. Moreover, the potential involvement of the SYK and MYC pathways with RB in cone progenitor cells aligns with previous literature, offering additional insights into RB development.

      We thank the reviewer for summarizing the main findings and noting the compelling support for the conclusions, the intriguing cell type-dependent splicing of rod and cone lineage factors, and the insights into retinoblastoma development.  

      Weaknesses:

      The manuscript feels somewhat unfocused, with a lack of a strong connection between the analysis of developing photoreceptors, which constitutes the bulk of the manuscript, and the discussion on retinoblastoma. Additionally, given the recent publication of several single-cell studies on the developing human retina, it is important for the authors to cross-validate their findings and adjust their statements where appropriate.

      We agree that the manuscript covers a range of topics resulting from the full-length scRNA-seq analyses and concur that some studies of developing photoreceptors were not well connected to retinoblastoma. However, we also note that the connection to retinoblastoma is emphasized in several places in the Introduction and throughout the manuscript and was a significant motivation for pursuing the analyses. We suggest that it was valuable to highlight how deep, full-length scRNA-seq of developing retina provides insights into retinoblastoma, including i) the similar biased expression of NRL transcript isoforms in cone precursors and RB tumors, ii) the cone precursors’ co-expression of rod- and cone-related genes such as NR2E3 and GNAT2, which may explain similar co-expression in RB cells, and iii) the expression of SYK in early cones and RB cells. While the earlier version had mainly highlighted point (iii), the revised Discussion further refers to points (i) and (ii) as described further in the response to the Reviewer’s Recommendations for Authors.

      We address the Reviewer’s request to cross-validate our findings with those of other single-cell studies of developing human retina by relating the different photoreceptor-related cell populations identified in our study to those characterized by Zuo et al (PMID 39117640), which was specifically highlighted by the reviewer and is especially useful for such cross-validation given the extraordinarily large ~ 220,000 cell dataset covering a wide range of retinal ages (pcw 8–23) and spatiotemporally stratified by macular or peripheral retina location. Relevant analyses of the Zuo et al dataset are shown in Supplementary Figures S3G-H, S10B, S11A-F, and S13A,B. 

      Reviewer #3 (Public review):

      Summary:

      The authors use high-depth, full-length scRNA-Seq analysis of fetal human retina to identify novel regulators of photoreceptor specification and retinoblastoma progression.

      Strengths:

      The use of high-depth, full-length scRNA-Seq to identify functionally important alternatively spliced variants of transcription factors controlling photoreceptor subtype specification, and identification of SYK as a potential mediator of RB1-dependent cell cycle reentry in immature cone photoreceptors.

      Human developing fetal retinal tissue samples were collected between 13-19 gestational weeks and this provides a substantially higher depth of sequencing coverage, thereby identifying both rare transcripts and alternative splice forms, and thereby representing an important advance over previous droplet-based scRNA-Seq studies of human retinal development.

      Weaknesses:

      The weaknesses identified are relatively minor. This is a technically strong and thorough study, that is broadly useful to investigators studying retinal development and retinoblastoma.

      We thank the reviewer for describing the strengths of the study. Our revision addresses the concerns raised separately in the Reviewer’s Recommendations for Authors, as detailed in the responses below.  

      Recommendations for the authors:

      Reviewing Editor Comments:

      The reviewers have completed their reviews. Generally, they note that your work is important and that the evidence is generally convincing. The reviewers are in general agreement that the paper adds to the field. The findings of rod/cone fate determination at a very early stage are intriguing. Generally, the paper would benefit from clarifications in the writing and figures. Experimentally, the paper would benefit from validation of the drug data, for example using RNAi or another assay. Alternatively, the authors could note the caveats of the drug experiments and describe how they could be improved. In terms of analysis, the paper would be improved by additional comparisons of the authors' data to previously published datasets.

      We thank the reviewing editor for this summary. As described in the individual reviewer responses, we clarify the writing and figures and provide comparisons to previously published datasets (in particular, the large snRNA-seq dataset of Zuo et al., 2024; PMID 39117640). With regard to the drug (i.e., SYK inhibitor) studies, we opted to provide caveats and describe the need for genetic approaches to validate the role of SYK, owing to the infeasibility of completing genetic perturbation experiments in the appropriate timeframe. We are grateful for the opportunity to present our findings with appropriate caveats.

      Reviewer #1 (Recommendations for the authors):

      Shayler et al. cell-sort human progenitor/rod/cone populations and then perform full-length single-cell RNA-seq to expose features that distinguish paths towards rods or cones. They initially distinguish progenitors (RPCs), immature photoreceptor precursors (iPRPs), long/medium wavelength (LM) cones, late-LM cones, short wavelength (S) cones, early rods (ER) and late rods (LR), which exhibit distinct transcription factor regulons (Figures 1, 2). These data expose expected and novel enriched genes, and support the notion that S cones are a default state lacking expression of rod (NRL) or cone (THRB) determinants but retaining expression of generic photoreceptor drivers (CRX/OTX2/NEUROD1 regulons). They identify changes in regulon activity, such as increasing NRL activity from iPRP to ER to LR, but decreasing from iPRP to cones, or increasing RAX/ISL2/THRB regulon activity from iPRP to LM cones, but decreasing from iPRP to S cones or rods.

      They report co-expression of rod/cone determinants in LM and ER clusters, and the ratios are in the expected directions (NRL > THRB or RXRG in ER). A novel insight from the FL seq is that there are differing variants generated in each cell population. Full-length NRL (FL-NRL) predominates in the rod path, whereas truncated NRL (Tr-NRL) does so in the cone path, then similar (but opposite) findings are presented for THRB (Fig 3, 4), whereas isoforms are not a feature of RXRG expression, just the higher expression in cones.

      The authors then further subcluster and perform RNA velocity to uncover decision points in the tree (Figure 5). They identify two photoreceptor precursor streams, the Transitional Rods (TRs) that provide one source for rod maturation and (reusing the name from the initial clustering) iPRPs that form cones, but also provide a second route to rods. TR cells closest to RPCs (immediately post-mitotic) have higher levels of the rod determinant NR2E3 and NRL, whereas the higher resolution iPRPs near RPCs lack NR2E3 and have higher levels of ONECUT1, THRB, and GNAT2, a cone bias. These distinct rod-biased TR and cone-biased high-resolution iPRPs were not evident in published scRNAseq with 3′ end-counting (i.e. not FL seq). Regulon analysis confirmed higher NRL activity in TR cells, with higher THRB activity in high-resolution iPRP cells.

      Many of the more mature high-resolution iPRPs show combinations of rod (GNAT1, NR2E3) and cone (GNAT2, THRB) paths as well as both NRL and THRB regulons, but with a bias towards cone-ness (Figure 6). Combined FISH/immunofluorescence in fetal retina uncovers cone-biased RXRG-protein-high/NR2E3-protein-absent cone-fated cells that nevertheless expressed NR2E3 mRNA. Thus early cone-biased iPRP cells express rod gene mRNA, implying a rod-cone hybrid in early photoreceptor development. The authors refer to these as "bridge region iPRP cells".

      In Figure 7, they identify CHRNA1 as the most specific marker of these bridge cells (overlapping with ATOH7 and DLL3, previously linked to cone-biased precursors), and FISH shows it is expressed in rod-biased NRL protein-positive and cone-biased RXRG protein-positive cones at fetal week 12.

      Figure 8 outlines the graded expression of various lncRNAs during cone maturation, a novel pattern.

      Finally (Figure 9), the authors identify differential genes expressed in early rods (ER cluster from Figure 1) vs early cones (LM cluster, excluding the most mature opsin+ cells), revealing high levels of MYCN targets in cones. They also find SYK expression in cones. SYK was previously linked to retinoblastoma, so intrinsic expression may predispose cone precursors to transformation upon RB loss. They finish by showing that a SYK inhibitor blocks the proliferation of dividing RB1 knockdown cone precursors in the human fetal retina.

      Overall, the authors have uncovered interesting patterns of biased expression in cone/rod developmental paths, especially relating to the isoform differences for NRL and THRB which add a new layer to our understanding of this fate choice. The analyses also imply that very soon after RPCs exit the cell cycle, they generate post-mitotic precursors biased towards a rod or cone fate, that carry varying proportions of mixed rod/cone determinants and other rod/cone marker genes. They also introduce new markers that may tag key populations of cells that precede the final rod/cone choice (e.g. CHRNA1), catalogue a new lncRNA gradient in cone maturation, and provide insight into potential genes that may contribute to retinoblastoma initiation, like SYK, due to intrinsic expression in cone precursors. However, as detailed below, the text needs to be improved considerably, and overinterpretations need to be moderated, removed, or tested more rigorously with extra data.

      Major Comments

      The manuscript is very difficult to follow. The nomenclature is at times torturous, and the description of hybrid rod/cone cells is confusing in many aspects.

      (1) A single term, iPRP, is used to refer to an initial low-resolution cluster, and then to a subset of that cluster later in the paper.

      We agree that using immature photoreceptor precursor (iPRP) for both high-resolution and low-resolution clusters was confusing. We kept this name for the low-resolution cluster (which includes both immature cone and immature rod precursors), renamed the high-resolution iPRP cluster immature cone precursors (iCPs), and renamed their transitional rod (TR) counterparts immature rod precursors (iRPs). These designations are based on:

      - the biased expression of THRB, ONECUT1, and the THRB regulon in iCPs (Fig. 5D,E);

      - the biased expression of NRL, NR2E3, and the NRL regulon in iRPs (Fig. 5D,E);

      - the partially distinct iCP and iRP UMAP positions (Figure 5C); and 

      - the evidence of similar immature cone versus rod precursor populations in the Zuo et al 3’ snRNA-seq dataset, as noted below and described in two new paragraphs starting at the bottom of p. 12.

      (2) To complicate matters further, the reader needs to understand the subset within the iPRP referred to as bridge cells, and we are told at one point that the earliest iPRPs lack NR2E3, then that they later co-express NR2E3, and while the authors may be referring to protein and RNA, it serves to further confuse an already difficult to follow distinction. I had to read and re-read the iPRP data many times, but it never really became totally clear.

      We agree that the description of the high-resolution iPRP (now “iCP”) subsets was unclear, although our further analyses of a large 3’ snRNA-seq dataset in Figure S11 support the impression given in the original manuscript that the earliest iCPs lack NR2E3 and then later co-express NR2E3 while the earliest iRPs lack THRB and then later express THRB. As described in new text in the Two post-mitotic immature photoreceptor precursor populations section (starting on line 7 of p. 13):

      When considering only the main cone and rod precursor UMAP regions, early (pcw 8 – 13) cone precursors expressed THRB and lacked NR2E3 (Figure S11D,E, blue arrows), while early (pcw 10 – 15) rod precursors expressed NR2E3 and lacked THRB (Figure S11D,E, red arrows), similar to RPC-localized iCPs and iRPs in our study (Figure 5D).

      Next, as summarized in new text in the Early cone and rod precursors with rod- and cone-related RNA co-expression section (new paragraph at top of p. 16):

      Thus, a 3’ snRNA-seq analysis confirmed the initial production of immature photoreceptor precursors with either L/M cone-precursor-specific THRB or rod-precursor-specific NR2E3 expression, followed by lower-level co-expression of their counterparts, NR2E3 in cone precursors and THRB in rod precursors. However, in the Zuo et al. analyses, the co-expression was first observed in well-separated UMAP regions, as opposed to a region that bridges the early cone and early rod populations in our UMAP plots. These findings are consistent with the notion that cone- and rod-related RNA co-expression begins in already fate-determined cone and rod precursors, and that such precursors aberrantly intermixed in our UMAP bridge region due to their insufficient representation in our dataset.  

      Importantly, and as noted in our ‘Public response’ to Reviewer 1, “CHRNA1 appears to mark immature cone precursors that are distinct from the maturing cone and rod precursors that co-express cone- and rod-related RNAs (despite the similar UMAP positions of the two populations in our dataset).” In support of this notion, the immature cone precursors expressing CHRNA1 and other populations did not overlap in UMAP space in the Zuo et al dataset. We hope the new text cited above along with other changes will significantly clarify the observations.

      (3) The term "cone/rod precursor" shows up late in the paper (page 12), but it was clear (was it not?) much earlier in this manuscript that cone and rod genes are co-expressed because of the coexpressed NRL and THRB isoforms in Figures 3/4.

      We thank the reviewer for noting that the differential NRL and THRB isoform expression already implies that cone and rod genes are co-expressed. However, as we now state, the co-expression of RNAs encoding an additional cone marker (GNAT2) and rod markers (GNAT1, NR2E3) was 

      “suggestive of a proposed hybrid cone/rod precursor state more extensive than implied by the coexpression of different THRB and NRL isoforms” (first paragraph of “Early cone and rod …” section on p. 14; new text underlined). 

      (4) The (incorrect) impression given later in the manuscript is that the rod/cone transcript mixture applies to just a subset of the iPRP cells, or maybe just the bridge cells (writing is not clear), but actually, neither of those is correct as the more abundant and more mature LM and ER populations analyzed earlier coexpress NRL and THRB mRNAs (Figures 2, 3). Overall, the authors need to vastly improve the writing, simplify/clarify the nomenclature, and better label figures to match the text and help the reader follow more easily and clearly. As it stands, it is, at best, obtuse, and at worst, totally confusing.

      We thank the reviewer for bringing the extent of the confusing terminology and wording to our attention. We revised the terminology (as in our response to point 1) and extensively revised the text.  We also performed similar analyses of the Zuo et al. data (as described in more detail in our response to Reviewer 2), which clarifies the distinct status of cells with the “rod/cone transcript mixture” and cells co-expressing early cone and rod precursor markers.  

      To more clearly describe data related to cells with rod- and cone-related RNA co-expression, we divided the former Figure 6 into two figures, with Figure 6 now showing the cone- and rod-related RNA co-expression inferred from scRNA-seq and Figure 7 showing GNAT2 and NR2E3 co-expression in FISH analyses of human retina plus a new schematic in the new panel 7E.

      To separate the conceptually distinct analyses of cone and rod related RNA co-expression and the expression of early photoreceptor precursor markers (which were both found in the so-called bridge region – yet now recognized to be different subpopulations), we separated the analyses of the early photoreceptor precursor markers to form a new section, “Developmental expression of photoreceptor precursor markers and fate determinants,” starting on p. 16. 

      Additionally, we further review the findings and their implications in four revised Discussion paragraphs (starting at the bottom of p. 23).

      (5) The data showing that overexpressing Tr-NRL in murine NIH3T3 fibroblasts blocks FL-NRL function is presented at the end of page 7 and in Figure 3G. Subsequent analysis two paragraphs and two figures later (end page 8, Figure 5C + supp figs) reveal that Tr-NRL protein is not detectable in retinoblastoma cells which derive from cone precursors cells and express Tr-NRL mRNA, and the protein is also not detected upon lentiviral expression of Tr-NRL in human fetal retinal explants, suggesting it is unstable or not translated. It would be preferable to have the 3T3 data and retinoblastoma/explant data juxtaposed. E.g. they could present the latter, then show the 3T3 that even if it were expressed (e.g. briefly) it would interfere with FL-NRL. The current order and spacing are somewhat confusing.

      We thank the reviewer for this suggestion and moved the description of the luciferase assays to follow the retinoblastoma and explant data and switched the order of Figure panels 3G and 3H.  

      (6) On page 15, regarding early rod vs early cone gene expression, the authors state: "although MYCN mRNA was not detected....", yet on the volcano plot in Figure S14A MYCN is one of the marked genes that is higher in cones than rods, meaning it was detected, and a couple of sentences later: "Concordantly, the LM cluster had increased MYCN RNA". The text is thus confusing.

      With respect, we note that the original text read, “although MYC RNA was not detected,” which related to a statement in the previous sentence that the gene ontology analysis identified “MYC targets.” However, given that this distinction is subtle and may be difficult for readers to recognize, we revised the text (now on p. 19) to more clearly describe expression of MYCN (but not MYC) as follows:

      “The upregulation of MYC target genes was of interest given that many MYC target genes are also targets of MYCN, that MYCN protein is highly expressed in maturing (ARR3+) cone precursors but not in NRL+ rods (Figure 10A), and that MYCN is critical to the cone precursor proliferative response to pRB loss[8–10]. Indeed, whereas MYC RNA was not detected, the LM cone cluster had increased MYCN RNA …”

      (7) The authors state that the SYK drug is "highly specific". They provide no evidence, but no drug is 100% specific, and it is possible that off-target hits are important for the drug phenotype. This data should be removed or validated by co-targeting the SYK gene along with RB1.

      We agree that our data only show the potential for SYK to contribute to the cone proliferative response; however, we believe the inhibitor study retains value in that a negative result (no effect of the SYK inhibitor) would disprove its potential involvement. To reflect this, we changed wording related to this experiment as follows:

      In the Abstract, we changed:

      (1) “SYK, which contributed to the early cone precursors’ proliferative response to RB1 loss” To: “SYK, which was implicated in the early cone precursors’ proliferative response to RB1 loss.”  

      (2) “These findings reveal … and a role for early cone-precursor-intrinsic SYK expression.” To:  “These findings reveal … and suggest a role for early cone-precursor-intrinsic SYK expression.”

      In the last paragraph of the Results, we changed:

      (1) “To determine if SYK contributes…” To:  “To determine if SYK might contribute…”

      (2) “the highly specific SYK inhibitor” To:  “the selective SYK inhibitor”  

      (3)  “indicating that cone precursor intrinsic SYK activity is critical to the proliferative response” To: “consistent with the notion that cone precursor intrinsic SYK activity contributes to the proliferative response.”

      In the Results, we added a final sentence: 

      “However, given potential SYK inhibitor off-target effects, validation of the role of SYK in retinoblastoma initiation will require genetic ablation studies.”

      In the Discussion (2nd-to-last paragraph), we changed: 

      “SYK inhibition impaired pRB-depleted cone precursor cell cycle entry, implying that native SYK expression rather than de novo induction contributes to the cone precursors’ initial proliferation.” To: “…the pRB-depleted cone precursors’ sensitivity to a SYK inhibitor suggests that native SYK expression rather than de novo induction contributes to the cone precursors’ initial proliferation, although genetic ablation of SYK is needed to confirm this notion.”

      In the Discussion last sentence, we changed:

      “enabled the identification of developmental stage-specific cone precursor features that underlie retinoblastoma predisposition.” To: “enabled the identification of developmental stage-specific cone precursor features that are associated with the cone precursors’ predisposition to form retinoblastoma tumors.”

      Minor/Typos

      Figure 7 legend, H should be D.

      We corrected the figure legend (now related to Figure 8).

      Reviewer #2 (Recommendations for the authors):

      (1) The author should take advantage of recently published human fetal retina data, such as PMID:39117640, which includes a larger dataset of cells that could help validate the findings. Consequently, statements like "To our knowledge, this is the first indication of two immediately post-mitotic photoreceptor precursor populations with cone versus rod-biased gene expression" may need to be revised.

      We thank the reviewer for noting the evidence of distinct immediately post-mitotic rod and cone populations published by others after we submitted our manuscript. In response, we omitted the sentence mentioned and extensively cross-checked our results including:

      - comparison of our early versus late cone and rod maturation states to the cone and rod precursor versus cone and rod states identified by Zuo et al. (new paragraph on the top half of p. 6 and new figure panels S3G,H);

      - detection of distinct immediately post-mitotic versus later cone and rod precursor populations (two new paragraphs on pp. 12-13 and new Figures S10B and S11A-E); 

      - identification of cone and rod precursor populations that co-express cone and rod marker genes (two new paragraphs starting at the bottom of p. 15 and new Figures S11D-F);

      - comparison of expression patterns of immature cone precursor (iCP) marker genes in our and the Zuo et al. dataset (new paragraph on top half of p. 17 and new Figure S13).

      We also compare the cell states discerned in our study and the Zuo et al. study in a new Discussion paragraph (bottom of p. 23) and new Figure S17.

      (2) The data generated comes from dissociated cells, which inherently lack spatial context. Additionally, it is unclear whether the dataset represents a pool of retinas from multiple developmental stages, and if so, whether the developmental stage is known for each cell profiled. If this information is available, the authors should examine the distribution of developmental stages on the UMAP and trajectory analysis as part of the quality control process. 

      We thank the reviewer for highlighting the importance of spatial context and developmental stage. 

      Related to whether the dataset represents a pool of retinae from multiple developmental stages, the different cell numbers examined at each time point are indicated in Figure S1A. To draw the readers’ attention to this detail, Figure S1A is now cited in the first sentence of the Results. 

      Related to the age-related cell distributions in UMAP plots, the distribution of cells from each retina and age was (and is) shown in Fig. S1F. In addition, we now highlight the age distributions by segregating the FW13, FW15-17, and FW17-18-19 UMAP positions in the new Figure 1C. We describe the rod temporal changes in a new sentence at the top of  p. 5:

      “Few rods were detected at FW13, whereas both early and late rods were detected from FW15-19 (Figure 1C), corroborating prior reports [15,20].”  

      We describe the cone temporal changes and note the likely greater discrimination of cell state changes that would be afforded by separately analyzing macula versus peripheral retina at each age in a new sentence at the bottom of p. 5:

      “L/M cone precursors from different age retinae occupied different UMAP regions, suggesting age-related differences in L/M cone precursor maturation (Figure 1C).”

      Moreover, they should assess whether different developmental stages impact gene expression and isoform ratios. It is well established that cone and rod progenitors typically emerge at different developmental times and in distinct regions of the retina, with minimal physical overlap. Grouping progenitor cells based solely on their UMAP positioning may lead to an oversimplified interpretation of the data.

      (2a) We agree that different developmental stages may impact gene expression and isoform ratios, and evaluated stages primarily based on established Louvain clustering rather than UMAP position. However, we also used UMAP position to segregate so-called RPC-localized and non-RPC-localized iCPs and iRPs, as well as to characterize the bridge region iCP sub-populations. In the revision, we examine whether cell groups defined by UMAP positions helped to identify transcriptomically distinct populations and further examine the spatiotemporal gene expression patterns of the same genes in the Zuo et al. 3’ snRNA-seq dataset.

      (2b) Related to analyses of immediately post-mitotic iRPs and iCPs, the new Figure S10A expands the violin plots first shown in Figure 5D to compare gene expression in RPC-localized versus non-RPC-localized iCPs and iRPs and subsequent cone and rod precursor clusters (also presented in response to Reviewer 3). The new Figure S10C shows a similar analysis of UMAP region-specific regulon activities. These figures support the idea that there are only subtle UMAP region-related differences in the expression of the selected genes and regulons.

      To further evaluate early cone and rod precursors, we compared expression patterns in our cluster- and UMAP-defined cell groups to those of the spatiotemporally defined cell groups in the Zuo et al. 3’ snRNA-seq study. The results revealed similar expression timing of the genes examined, although the cluster assignments of a subset of cells were brought into question, especially the assigned rod precursors at pcw 10 and 13, as shown in new Figures S10B (grey columns) and S11, and as described in two new paragraphs starting near the bottom of p. 12.

      (2c) Related to analyses of iCPs in the so-called bridge region, our analyses of the Zuo et al. dataset helped distinguish early cone and rod precursor populations (expressing early markers such as ATOH7 and CHRNA1) from the later stages exhibiting rod- and cone-related gene co-expression, which had intermixed in the UMAP bridge region in our dataset. Further parsing of early cone precursor marker spatiotemporal expression revealed intriguing differences as now described in the second half of a new paragraph at the top of p. 17, as follows:

      “Also, different iCP markers had different spatiotemporal expression: CHRNA1 and ATOH7 were most prominent in peripheral retina with ATOH7 strongest at pcw 10 and CHRNA1 strongest at pcw 13; CTC-378H22.2 was prominently expressed from pcw 10-13 in both the macula and the periphery; and DLL3 and ONECUT1 showed the earliest, strongest, and broadest expression (Figure S13B). The distinct patterns suggest spatiotemporally distinct roles for these factors in cone precursor differentiation.”

      (3) I would commend the authors for performing a validation experiment via RNA in situ to validate some of the findings. However, drawing conclusions from analyzing a small number of cells can still be dangerous. Furthermore, it is not entirely clear how the subclustering is done. Some cells change cell type identities in the high-resolution plot. For example, some iPRP cells from the low-resolution plots in Figure 1 are assigned as TR in high-resolution plots in Figure 5.

      The authors should provide justification on the identities of RPC localized iPRP and TR.

      Comparison of their data with other publicly available data should strengthen their annotation.

      We agree that drawing conclusions from scRNA-seq or in situ hybridization analysis of a small number of cells can be dangerous and have followed the reviewer’s suggestion to compare our data with other publicly available data, focusing on the 3’ snRNA-seq of Zuo et al. given its large size and extensive annotation. Our analysis of  the Zuo et al. dataset helped clarify cell identities by segregating cone and rod precursors with similar gene expression properties in distinct UMAP regions. However, we noted that the clustering of early cone and rod precursors likely gave numerous mis-assigned cells (as noted in response 2b above and shown in the new Figure S11). It would appear that insights may be derived from the combination of relatively shallow sequencing of a high number of cells and deep sequencing of substantially fewer cells. 

      Related to how subclustering was done, the Methods state, “A nearest-neighbors graph was constructed from the PCA embedding and clusters were identified using a Louvain algorithm at low and high resolutions (0.4 and 1.6)[70],” citing the Blondel et al. reference for the Louvain clustering algorithm used in the Seurat package. To clarify this, the results text was revised such that it now indicates the levels used to cluster at low resolution (0.4, p. 4, 2nd paragraph) and at high resolution (1.6, top of p. 11).

      Related to the assignment of some iPRP cells from the low-resolution plots in Figure 1 to the TR cluster (now called the ‘iRP’ cluster) in the high-resolution plots in Figure 5, we suggest that this is consistent with Louvain clustering, which does not follow a single dendrogram hierarchy.

      The justification for referring to these groups as RPC-localized iCPs and iRPs relates to their biased gene and regulon expression in Fig. 5D and 5E, as stated on p. 12: 

      “In the RPC-localized region, iCPs had higher ONECUT1, THRB, and GNAT2, whereas iRPs trended towards higher NRL and NR2E3 (p= 0.19, p=0.054, respectively).”

      (4) Late-stage LM5 cluster Figure 9 is not defined anywhere in previous figures, in which LM clusters only range from 1 to 4. The inconsistency in cluster identification should be addressed.

      We revised the text related to this as follows: 

      “Indeed, our scRNA-seq analyses revealed that SYK RNA expression increased from the iCP stage through cluster LM4, in contrast to its minimal expression in rods (Figure 10E).  Moreover, SYK expression was abolished in the five-cell group with properties of late maturing cones (characterized in Figure 1E), here displayed separately from the other LM4 cells and designated LM5 (Figure 10E).”  (p. 19-20)

      (5) Syk inhibitor has been shown to be involved in RB cell survival in previous studies. The manuscript seems to abruptly make the connection between the single-cell data to RB in the last figure. The title and abstract should not distract from the bulk of the manuscript focusing on the rod and cone development, or the manuscript should make more connection to retinoblastoma.

      We appreciate the reviewer’s concern that the title may seem to over-emphasize the connection to retinoblastoma based solely on the SYK inhibitor studies. However, we suggest the title also emphasizes the identification and characterization of early human photoreceptor states, per se, and that there are a number of important connections beyond the SYK studies that could warrant the mention of cell-state-specific retinoblastoma-related features in the title.

      Most importantly, a prior concern with the cone cell-of-origin theory was that retinoblastoma cells express RNAs thought to mark retinal cell types other than cones, especially rods. The evidence presented here, that cone precursors also express the rod-related genes, helps resolve this issue. The issue is noted numerous times in the manuscript, as follows:

      In the Introduction, we write:

      “However, retinoblastoma cells also express rod lineage factor NRL RNAs, which – along with other evidence – suggested a heretofore unexplained connection between rod gene expression and retinoblastoma development[12,13]. Improved discrimination of early photoreceptor states is needed to determine if co-expression of rod- and cone-related genes is adopted during tumorigenesis or reflects the co-expression of such genes in the retinoblastoma cell of origin.” (bottom, p. 2)

      And:

      “In this study, we sought to further define the transcriptomic underpinnings of human  photoreceptor development and their relationship to retinoblastoma tumorigenesis.” (last paragraph, p. 3)

      The Discussion also alluded to this issue and in the revised Discussion, we aimed to make the connection clearer.  We previously ended the 3rd-to-last paragraph with,  

      “iPRP [now iCP] and early LM cone precursors’ expression of NR2E3 and NRL RNAs suggest that their presence in retinoblastomas[12,13] reflects their normal expression in the L/M cone precursor cells of origin.” 

      We now separate and elaborate on this point in a new paragraph as follows: 

      “Our characterization of cone and rod-related RNA co-expression may help resolve questions about the retinoblastoma cell of origin. Past studies suggested that retinoblastoma cells co-express RNAs associated with rods, cones, or other retinal cells due to a loss of lineage fidelity[12]. However, the early L/M cone precursors’ expression of NR2E3 and NRL RNAs suggest that their presence in retinoblastomas[12,13] reflects their normal expression in the L/M cone precursor cells of origin. This idea is further supported by the retinoblastoma cells’ preferential expression of cone-enriched NRL transcript isoforms (Figure S5B).” (middle of p. 24)

      Based on the above, we elected to retain the title.

      Minor comments:

      (1) It is difficult to see the orange and magenta colors in the Fig 3E RNA-FISH image. The colors should be changed, or the contrast threshold needs to be adjusted to make the puncta stand out more.

      We re-assigned colors, with red for FL-NRL puncta and green for Tr-NRL puncta. 

      (2) Figure 5C on page 8 should be corrected to Supplementary Figure 5C.

      We thank the reviewer for noting this error and changed the figure citation.

      Reviewer #3 (Recommendations for the authors):

      (1) Minor concerns

      a. Abbreviation of some words needs to be included, example: FW. 

      We now provide abbreviation definitions for FW and others throughout the manuscript.  

      b. Cat # does not match the 'key resource table' for many reagents/kits. Some examples are: CD133-PE mentioned on Page # 22 on # 71, SMART-Seq V4 Ultra Low Input RNA Kit and SMARTer Ultra Low RNA Kit for the Fluidigm C1 System on Page # 22 on # 77, Nextera XT DNA Library preparation kit on Page # 23 on # 77.

      We thank the reviewer for noting these discrepancies. We have now checked all catalog numbers and made corrections as needed.

      c. Cat # and brand name of a few reagents & kits are missing and not mentioned either in methods or in key resource table or both. Eg: FBS, Insulin, Glutamine, Penicillin, Streptomycin, HBSS, Quant-iT PicoGreen dsDNA assay, Nextera XT DNA Library Preparation Kit, 5' PCR Primer II A with CloneAmp HiFi PCR Premix.

      Catalog numbers and brand names are now provided for the tissue culture and related reagents within the methods text and for kits in the Key Resources Table. Additional descriptions of the primers used for re-amplification and RACE were added to the Methods (p. 28-29).

      d. Spell and grammar check is needed throughout the manuscript. Example: In Page # 46 RXRγlo is misspelled as RXRlo.

      Spelling and grammar checks were performed throughout the manuscript.

      (2) Methods & Key Resource table.

      a. In Page # 21, IRB# needs to be stated.      

      The IRB protocols have been added, now at top of p. 26.

      b. In Page # 21, Did the authors dissociate retinae in ice-cold phosphate-buffered saline or papain?   

      The relevant sentence was corrected to “dissected while submerged in ice-cold phosphate-buffered saline (PBS) and dissociated as described[10].” (p. 26)

      c. In Page # 21, How did the authors count or enumerate the cell count? Provide the details.

      We now state, “… a 10 µl volume was combined with 10 µl trypan blue and counted using a hemocytometer” (top of p. 27)
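
      For readers who wish to reproduce the count, the arithmetic behind a trypan blue hemocytometer count can be sketched as follows. This is a minimal illustration, assuming a standard improved Neubauer chamber in which each large (1 mm × 1 mm) square holds 0.1 µl; the chamber type, the function name `cells_per_ml`, and the example counts are assumptions and are not taken from the manuscript.

```python
def cells_per_ml(square_counts, sample_ul=10.0, trypan_ul=10.0):
    """Estimate cells per ml from hemocytometer counts.

    square_counts: trypan-blue-excluding (live) cell counts, one per
        large 1 mm x 1 mm square (hypothetical values in the example).
    sample_ul, trypan_ul: volumes mixed before loading; the 10 + 10 µl
        mix described in the text gives a dilution factor of 2.
    Assumes each large square holds 0.1 µl (1e-4 ml), as in a standard
    improved Neubauer chamber.
    """
    if not square_counts:
        raise ValueError("need at least one counted square")
    mean_count = sum(square_counts) / len(square_counts)
    dilution = (sample_ul + trypan_ul) / sample_ul
    # 1 / 1e-4 ml per square = 1e4 squares' worth of volume per ml
    return mean_count * dilution * 1e4


if __name__ == "__main__":
    # Hypothetical counts from four large squares
    print(cells_per_ml([48, 52, 50, 50]))
```

      With equal volumes of suspension and trypan blue, the dilution factor is 2, so a mean of 50 live cells per large square would correspond to roughly one million cells per ml under these assumptions.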

      d. Why did the authors choose to specifically use only 8 cells for cDNA preparation in Page # 22? State the reason and provide the details.

      The reasons for using 8 cells (to prevent evaporation and to manually transfer one slide-worth of droplets to one strip of PCR tubes) and additional single cell collection details are now provided as follows (new text underlined): 

      “Single cells were sorted on a BD FACSAria I at 4°C using 100 µm nozzle in single-cell mode into each of eight 1.2 µl lysis buffer droplets on parafilm-covered glass slides, with droplets positioned over pre-defined marks … .  Upon collection of eight cells per slide, droplets were transferred to individual low-retention PCR tubes (eight tubes per strip) (Bioplastics K69901, B57801) pre-cooled on ice to minimize evaporation. The process was repeated with a fresh piece of parafilm for up to 12 rounds to collect 96 cells.” (p. 27, new text underlined)

      e. Key resource table does not include several resources used in this study. Example - NR2E3 antibody.

      We added the NR2E3 antibody and checked for other omissions.

      (3) Results & Figures & Figure Legends

      a. Regulon-defined RPC and photoreceptor precursor states

      i. On page # 4, 1st paragraph - Clarify the sentence 'Exclusion of all cells with <100,000 cells read and 18 cells.........Ensembl transcripts inferred'. Did the authors use 18 cells or 18FW retinae? 

      The sentence was changed to:

      “After sequencing, we excluded all cells with <100,000 read counts and 18 cells expressing one or more markers of retinal ganglion, amacrine, and/or horizontal cells (POU4F1, POU4F2, POU4F3, TFAP2A, TFAP2B, ISL1) and concurrently lacking photoreceptor lineage marker OTX2. This yielded 794 single cells with averages of 3,750,417 uniquely aligned reads, 8,278 genes detected, and 20,343 Ensembl transcripts inferred (Figure S1A-C).” (p. 4, new words underlined)
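
      The two exclusion criteria in the revised sentence can be expressed as a simple per-cell filter. The sketch below is illustrative only: the function name `keep_cell`, the dictionary input format, and the detection threshold of zero are assumptions, and it encodes one reading of the sentence, namely that a cell is removed for off-target marker expression only when it also lacks OTX2.

```python
MIN_READS = 100_000
# Retinal ganglion, amacrine, and horizontal cell markers named in the text
OFF_TARGET_MARKERS = ("POU4F1", "POU4F2", "POU4F3", "TFAP2A", "TFAP2B", "ISL1")
LINEAGE_MARKER = "OTX2"


def keep_cell(read_count, expression, detect_threshold=0.0):
    """Return True if a cell passes both exclusion criteria.

    read_count: total read count for the cell.
    expression: dict mapping gene symbol to an expression value
        (units and threshold are hypothetical).
    """
    def expressed(gene):
        return expression.get(gene, 0.0) > detect_threshold

    # Criterion 1: drop cells with fewer than 100,000 reads
    if read_count < MIN_READS:
        return False
    # Criterion 2: drop cells that express one or more off-target
    # markers while lacking the photoreceptor lineage marker OTX2
    has_off_target = any(expressed(g) for g in OFF_TARGET_MARKERS)
    if has_off_target and not expressed(LINEAGE_MARKER):
        return False
    return True


if __name__ == "__main__":
    print(keep_cell(250_000, {"POU4F2": 3.1}))               # excluded
    print(keep_cell(250_000, {"POU4F2": 3.1, "OTX2": 1.2}))  # retained
```

      Under this reading, a cell that expresses an off-target marker together with OTX2 is retained, which matches the "concurrently lacking" wording of the revised sentence.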

      To clarify that 18 retinae were used, the first sentence of the Results was revised as follows:

      “To interrogate transcriptomic changes during human photoreceptor development, dissociated RPCs and photoreceptor precursors were FACS-enriched from 18 retinae, ages FW13-19 …” (p. 4).

      Why did the authors 'exclude cells lacking photoreceptor lineage marker OTX2' from analysis especially when the purpose here was to choose photoreceptor precursor states & further results in the next paragraph clearly state that 5 clusters were comprised of cells with OTX2 and CRX expression. This is confusing.

      We apologize for the imprecise diction. We divided the evidently confusing sentence into two sentences to more clearly indicate that we removed cells that did not express OTX2, as in the first response to the previous question.

      ii. In Page # 5, the authors reported the number of cell populations (363 large and 5 distal) identified in the THRB+ L/M-cone cluster. What were the # of cell populations identified in the remaining 5 clusters of the UMAP space?

      We added the cell numbers in each group to Fig. 1B. We corrected the large LM group to 366 cells (p. 5) and note 371 LM cells, which includes the five distal cells, in Figure 1B.

      b. Differential expression of NRL and THRB isoforms in rod and cone precursors

      i. In Figure 3B, the authors compare and show the presence of 5 different NRL isoforms for all the 6 clusters that were defined in 3A. However, in the results, the ENST# of just 2 highly assigned transcript isoforms is given. What are the annotated names of the three other isoforms which are shown in 3B? Please explain in the Results.

      As requested, we now annotate the remaining isoforms as encoding full-length or truncated NRL in Fig. 3B and show isoform structures in new Supplementary Figure S4B.  We also refer to each transcript isoform in the Results (p. 7, last paragraph) and similarly evaluate all isoforms in RB31 cells (Fig. S5B).

      ii. What does the Mean FPM in the y-axis of Fig 3C refer to?

      Mean FPM represents mean read counts (fragments per million, FPM) for each position across Ensembl NRL exons for each cluster, as now stated in the 6th line of the Fig. 3 legend.
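
      To make the y-axis quantity concrete, one way to compute a per-position mean FPM for a cluster is sketched below. This is a sketch under stated assumptions rather than the pipeline used for Figure 3C: it assumes that each cell's per-position counts are first scaled by that cell's total fragments (in millions) and then averaged across the cells of the cluster, and the function name `mean_fpm` and the input layout are hypothetical.

```python
def mean_fpm(coverage_by_cell, total_fragments_by_cell):
    """Per-position mean FPM across the cells of one cluster.

    coverage_by_cell: dict mapping cell ID to a list of raw fragment
        counts, one per exon position (all lists the same length).
    total_fragments_by_cell: dict mapping cell ID to that cell's total
        aligned fragments, used for the per-million scaling.
    """
    cells = list(coverage_by_cell)
    if not cells:
        raise ValueError("cluster has no cells")
    n_pos = len(coverage_by_cell[cells[0]])
    totals = [0.0] * n_pos
    for cell in cells:
        # Scale raw counts to fragments per million for this cell
        scale = 1e6 / total_fragments_by_cell[cell]
        for i, count in enumerate(coverage_by_cell[cell]):
            totals[i] += count * scale
    # Average the scaled values over the cells in the cluster
    return [t / len(cells) for t in totals]


if __name__ == "__main__":
    coverage = {"cell_a": [2, 4], "cell_b": [1, 0]}        # hypothetical
    totals = {"cell_a": 2_000_000, "cell_b": 1_000_000}    # hypothetical
    print(mean_fpm(coverage, totals))
```

      Scaling before averaging keeps deeply sequenced cells from dominating the cluster mean; other orders of operation are possible and would give somewhat different values.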

      iii. A clear explanation of the results for Figures 3E-3F is missing.

      We revised the text to more clearly describe the experiment as follows:

      “The cone cells’ higher proportional expression of Tr-NRL first exon sequences was validated by RNA fluorescence in situ hybridization (FISH) of FW16 fetal retina in which NRL immunofluorescence was used to identify rod precursors, RXRg immunofluorescence was used to identify cone precursors, and FISH probes specific to truncated Tr-NRL exon 1T or FL-NRL exons 1 and 2 were used to assess Tr-NRL and FL-NRL expression (Figure 3E,F).” (p. 8, new text underlined).

      c. Two post-mitotic photoreceptor precursor populations

      i. Although deep-sequencing and SCENIC analysis clarified the identities of four RPC-localized clusters as MG, RPC, and iPRP indicative of cone-bias and TR indicative of rod-bias. It would be interesting to see the discriminating determinant between the TR and ER by SCENIC and deep-sequencing gene expression violin/box plots.

      We agree it is of interest to see the discriminating determinant between the TR [now termed iRP] and ER clusters by SCENIC and deep-sequencing gene expression violin/box plots. We now provide this information for selected genes and regulons of interest in the new Supplementary Figures S10A and S10C, along with a similar comparison between the prior high-resolution iPRP (now termed iCP) cluster and the first high-resolution LM cluster, LM1, as described for gene expression on p. 12:

      “Notably, THRB and GNAT2 expression did not significantly change while ONECUT1 declined in the subsequent non-RPC-localized iCP and LM1 stages, whereas NR2E3 and NRL dramatically increased on transitioning to the ER state (Figure S10A).”

      And as described for regulon activities on pp. 13-14:

      “Finally, activities of the cone-specific THRB and ISL2 regulons, the rod-specific NRL regulon, and the pan-photoreceptor LHX3, OTX2, CRX, and NEUROD1 regulons increased to varying extents on transitioning from the immature iCP or iRP states to the early-maturing LM1 or ER states (Figure S10C).”

      We also show expression of the same genes for spatiotemporally grouped cells from the Zuo et al. dataset in the new Figure S10B, which displays a similar pattern (apart from the possibly mixed pcw 10 and pcw 13 designated rod precursors).

      d. Early cone precursors with cone- and rod-related RNA expression

      i. On page #12, the last paragraph where the authors explain the multiplex RNA FISH results of RXRγ and NR2E3 by citing Figure S8E. However, in Fig S8E, the authors used NRL to identify the rods. Please clarify which one of the rod markers was used to perform RNA FISH?

      Figure S8E (where NRL was used as a rod marker) was cited to remind readers that RXRg has low expression in rods and high expression in cones, rather than to describe the results of this multiplex FISH section. To avoid confusion on this point, Figure S8E is now cited using “(as earlier shown in Figure S8E).” With this issue clarified, we expect the markers used in the FISH + IF analysis will be clear from the revised explanation, 

      “… we examined GNAT2 and NR2E3 RNA co-expression in RXRg+ cone precursors in the outermost NBL and in RXRg+ rod precursors in the middle NBL … .” (p. 14-15).

      To provide further clarity, we provide a diagram of the FISH probes, protein markers, and expression patterns in the new Figure 7E.

      ii. The Y-axis of Fig 6G-6H needs to be labelled.

      The axes have been re-labeled from “Nb of cells” to “Number of RXRg+ outermost NBL cells in each region” (original Fig. 6G, now Fig. 7C) and “Number of RXRg+ middle NBL cells in each region” (original Fig. 6H, now Fig. 7D).

      iii. The legends of Figures 6G and 6H are unclear. In the Figure 6G legend, the authors indicate 'all cells are NR2E3 protein-'. Does that imply the yellow and green bars alone? Similarly, clarify the Figure 6H legend, what does the dark and light magenta refer to? What does the light magenta color referring to NR2E3+/ NR2E3- and the dark magenta color referring to NR2E3+/ NR2E3+ indicate? 

      We regret the insufficient clarity. We revised the Fig. 6G (now Fig. 7C) key, which now reads

      “All outermost NBL cells are NR2E3 protein-negative.” We added to the figure legend for panels 7C,D “(n.b., italics are used for RNAs, non-italics for proteins).” The new scheme in Figure 7E shows the RNAs in italics and the proteins in non-italics. We hope these changes will clarify when RNA or protein are represented in each histogram category.

      Overall, the results (on page # 13) reflecting Figures 6E-6H & Figure S11 are confusing and difficult to understand. Clear descriptions and explanations are needed.

      We revised this Results section, now in the paragraph on p. 14, as follows:

      -  We now refer to the bar colors in Figures 7C and 7D that support each statement. 

      -  We provide an illustration of the findings in Figure 7E.

      iv. Previously published literature has shown that cells of the inner NBL are RXRγ+ ganglion cells. So, how were these RXRγ+ ganglion cells in the inner NBL discriminated during multiplex RNA FISH (in Fig 6E-6H and in Fig S11)?

      We thank the reviewer for requesting this clarification. We agree that “inner NBL” is the incorrect term for the region in which we examined RXRg+ photoreceptor precursors, as this could include RXRγ+ nascent RGCs. We now clarify that 

      “we examined GNAT2 and NR2E3 RNA co-expression in RXRg+ cone precursors in the outermost NBL and in RXRg+ rod precursors in the middle NBL … .” (p. 14-15)

      We further state,

      “Limiting our analysis to the outer and middle NBL allowed us to disregard RXRγ+ retinal ganglion cells in the retinal ganglion cell layer or inner NBL” (top of p. 15).

      Figure 7E is provided to further aid the reader in understanding the positions examined, and the legend states “RXRg+ retinal ganglion cells in the inner NBL and ganglion cell layer not shown.”

      v. In Figure 6E, what marker does each color cell correspond to?

      In this figure (now panel 7A), we declined to provide the color key since the image is not sufficiently enlarged to visualize the IF and FISH signals. The figure is provided solely to document the regions analyzed and readers are now referred to “see Figure S12 for IF + FISH images” (2nd line, p. 15), where the marker colors are indicated.

      vi. In Figure S11 & 6E, Protein and RNA transcript color of NR2E3, GNAT2 are hard to distinguish. Usage of other colors is recommended.  

      We appreciate the reviewer’s concern related to the colors (in the now redesignated Figure S12 and 7A); however, we feel this issue is largely mitigated by our use of arrows to point to the cells needed to illustrate the proposed concepts in Figure S12B. All quantitation was performed by examining each color channel separately to ensure correct attribution, which is now mentioned in the Methods (2nd-to-last line of Quantitation of FISH section, p. 35).

      vii. 

      With due respect, we suggest that labeling each box (now in Figure 8B) would make the figure rather busy and make it difficult to infer the main point, which is that boxed regions were examined at various distances from the center (denoted by the “C” and “0 mm”), with distances periodically indicated. We suggest that the addition of such markers would not improve and might worsen the figure for most readers.

      e. An early L/M cone trajectory marked by successive lncRNA expression

      i. In Figure 8C - color-coded labelling of LM1-4 clusters is recommended.

      We note Fig. 8C (now 9C) is intended to use color to display the pseudotemporal positions of each cell. We recognize that an additional plot with the pseudotime line imposed on LM subcluster colors could provide some insights, yet we are unaware of available software for this and are unable to develop such software at present. To enable readers to obtain a visual impression of the pseudotime vs subcluster positions, we now refer the reader to Figure 5A in the revised figure legend, as follows:  (“The pseudotime trajectory may be related to LM1-LM4 subcluster distributions in Figure 5A.”).

      ii. In Figure 8G - what does the horizontal color-coded bar below the lncRNAs name refer to? These bars are similar in all four graphs of the 8G figure.

      As stated in the Fig. 8G (now 9G) legend, “Colored bars mark lncRNA expression regions as described in the text.”  We revised the text to more clearly identify the color code. (p. 18-19)   

      f. Cone intrinsic SYK contributions to the proliferative response to pRB loss

      i. In Fig 9F - The expression of ARR3+ cells (indicated by the green arrow in FW18) is poorly or rarely seen in the peripheral retina.

      We thank the reviewer for finding this oversight. In panel 9F (now 10F), we removed the green arrows from the cells in the periphery, which are ARR3- due to the immaturity of cones in this region. 

      ii. In Figure 9F - Did the authors stain the FW16 retina with ARR3?

      Unfortunately, we did not stain the FW16 retina for ARR3 in this instance.

      iii. Inclusion of DAPI staining for Fig 9F is recommended to justify the ONL & INL in the images.

      We regret that we are unable to merge the DAPI in this instance due to the way in which the original staining was imaged.  A more detailed analysis corroborating and extending the current results is in progress. 

      iv. Immunostaining images for Figure 9G are missing & are required to be included. What does shSCR in Fig 9G refer to?

      We now provide representative immunostaining images below the panel (now 10G). The legend was updated: “Bottom: Example of Ki67, YFP, and RXRg co-immunostaining with DAPI+ nuclei (yellow outlines). Arrows: Ki67+, YFP+, RXRg+ nuclei.”  The revised legend now notes that shSCR refers to the scrambled control shRNA.

      v. For Figure 9H - Is the presence and loss of SYK activity consistent with all the subpopulations (S & LM) of early maturing and matured cones?

      We appreciate the reviewer’s question and interest (relating to the redesignated Figure 10H); however, we have not yet completed a comprehensive evaluation of SYK expression in all the subpopulations (S & LM) of early maturing and matured cones and will reserve such data for a subsequent study. We suggest that this information is not critical to the study’s major conclusions.

      vi. Figure 9A is not explained in the results. Why were MYCN proteins assessed along with ARR3 and NRL? What does this imply?

      We thank the reviewer for noting that this figure (now Figure 10A) was not clearly described. 

      As per the response to Reviewer 1, point 6, the text now states,

      “The upregulation of MYC target genes was of interest given that many MYC target genes are also MYCN targets, that MYCN protein is highly expressed in maturing (ARR3+) cone precursors but not in NRL+ rods (Figure 10A), and that MYCN is critical to the cone precursor proliferative response to pRB loss [8–10].” (middle, p. 19, new text underlined).

      Hence, the figure demonstrates the cone cell specificity of high MYCN protein. This is further noted in the Fig. 10A legend: “A. Immunofluorescent staining shows high MYCN in ARR3+ cones but not in NRL+ rods in FW18 retina.”

    1. Author response:

      Reviewer #1 (Public review):

      Functional lateralization between the right and left hemispheres is reported widely in animal taxa, including humans. However, it remains largely speculative as to whether the lateralized brains have a cognitive gain or a sort of fitness advantage. In the present study, by making use of the advantages of domestic chicks as a model, the authors are successful in revealing that the lateralized brain is advantageous in the number sense, in which numerosity is associated with spatial arrangements of items. Behavioral evidence is strong enough to support their arguments. Brain lateralization was manipulated by light exposure during the terminal phase of incubation, and the left-to-right numerical representation appeared when the distance between items gave a reliable spatial cue. The light-exposure induced lateralization, though quite unique in avian species, together with the lack of intense inter-hemispheric direct connections (such as the corpus callosum in the mammalian cerebrum), was critical for the successful analysis in this study. Specification of the responsible neural substrates in the presumed right hemisphere is expected in future research. Comparable experimental manipulation in the mammalian brain must be developed to address this general question (functional significance of brain laterality) is also expected.

      We sincerely appreciate the Reviewer's insightful feedback and his/her recognition of the key contributions of our study.

      Reviewer #2 (Public review):

      Summary:

      This is the first study to show how a L-R bias in the relationship between numerical magnitude and space depends on brain lateralisation, and moreover, how is modulated by in ovo conditions.

      Strengths:

      Novel methodology for investigating the innateness and neural basis of an L-R bias in the relationship between number and space.

      We would like to thank the Reviewer for their valuable feedback and for highlighting the key contributions of our study.

      Weaknesses:

      I would query the way the experiment was contextualised. They ask whether culture or innate pre-wiring determines the 'left-to-right orientation of the MNL [mental number line]'.

      We thank the Reviewer for raising this point, which has allowed us to provide a more detailed explanation of this aspect. Rather than framing the left-to-right orientation of the mental number line (MNL) as exclusively determined by either cultural influences or innate pre-wiring, our study highlights the role of environmental stimulation. Specifically, prenatal light exposure can shape hemispheric specialization, which in turn contributes to spatial biases in numerical processing. Please see lines 115-118.

      The term, 'Mental Number Line' is an inference from experimental tasks. One of the first experimental demonstrations of a preference or bias for small numbers in the left of space and larger numbers in the right of space, was more carefully described as the spatial-numerical association of response codes - the SNARC effect (Dehaene, S., Bossini, S., & Giraux, P. (1993). The mental representation of parity and numerical magnitude. Journal of Experimental Psychology: General, 122, 371-396).

      We have refined our description of the MNL and SNARC effect to ensure conceptual accuracy in the revised manuscript; please see lines 53-59.

      This has meant that the background to the study is confusing. First, the authors note, correctly, that many other creatures, including insects, can show this bias, though in none of these has neural lateralisation been shown to be a cause. Second, their clever experiment shows that an experimental manipulation creates the bias. If it were innate and common to other species, the experimental manipulation shouldn't matter. There would always be an LR bias. Third, they seem to be asserting that humans have a left-to-right (L-R) MNL. This is highly contentious, and in some studies, reading direction affects it, as the original study by Dehaene et al showed; and in others, task affects direction (e.g. Bachtold, D., Baumüller, M., & Brugger, P. (1998). Stimulus-response compatibility in representational space. Neuropsychologia, 36, 731-735, not cited). Moreover, a very careful study of adult humans, found no L-R bias (Karolis, V., Iuculano, T., & Butterworth, B. (2011), not cited, Mapping numerical magnitudes along the right lines: Differentiating between scale and bias. Journal of Experimental Psychology: General, 140(4), 693-706). Indeed, Rugani et al claim, incorrectly, that the L-R bias was first reported by Galton in 1880. There are two errors here: first, Galton was reporting what he called 'visualised numerals', which are typically referred to now as 'number forms' - spontaneous and habitual conscious visual representations - not an inference from a number line task. Second, Galton reported right-to-left, circular, and vertical visualised numerals, and no simple left-to-right examples (Galton, F. (1880). Visualised numerals. Nature, 21, 252-256.). So in fact did Bertillon, J. (1880). De la vision des nombres. La Nature, 378, 196-198, and more recently Seron, X., Pesenti, M., Noël, M.-P., Deloche, G., & Cornet, J.-A. (1992). Images of numbers, or "When 98 is upper left and 6 sky blue". Cognition, 44, 159-196, and Tang, J., Ward, J., & Butterworth, B. (2008). Number forms in the brain. Journal of Cognitive Neuroscience, 20(9), 1547-1556.

      We sincerely appreciate the opportunity to discuss numerical spatialization in greater detail. We have clarified that an innate predisposition to spatialize numerosity does not necessarily exclude the influence of environmental stimulation and experience. We have proposed an integrative perspective, incorporating both cultural and innate factors, suggesting that numerical spatialization originates from neural foundations while remaining flexible and modifiable by experience and contextual influences. Please see lines 69–75.

      We have incorporated the Reviewer’s suggestions and cited all the recommended papers; please see lines 47–75.

      If the authors are committed to chicks' MN Line they should test a series of numbers showing that the bias to the left is greater for 2 and 3 than for 4, etc. 

      What does all this mean? I think that the paper should be shorn of its misleading contextualisation, including the term 'Mental Number Line'. The authors also speculate, usefully, on why chicks and other species might have a L-R bias. I don't think the speculations are convincing, but at least if there is an evolutionary basis for the bias, it should at least be discussed.

      In the revised version of the manuscript, we have adopted the term Spatial Numerical Association (SNA). We thank the Reviewer for this valuable comment.

      We appreciated the Reviewer’s suggestion regarding the evolutionary basis of lateralization and have included considerations of its relevance in chicks and other species; please see lines 143-151 and 381-386.

      This paper is very interesting with its focus on why the L-R bias exists, and where and why it does not.

      We wish to thank the Reviewer again for his/her work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      (1) In several instances the paper does not address apparent inconsistencies between the prior literature and the findings. For example, the first main finding is that recalled items have more differentiated lateral temporal cortex representations within lists than not recalled items. This seems to be the opposite of the prediction from temporal context models that are used to motivate the paper-context models would predict that greater contextual similarity within a list should lead to greater memory through enhanced temporal clustering in recall. This is what El-Kalliny et al (2019) found, using a highly similar design (free recall, intracranial recordings from the lateral temporal lobe). The authors never address this contradiction in any depth to reconcile it with the previous literature and with the motivating theoretical model. 

      Figure 2 supports the findings from El-Kalliny and colleagues because it shows the relationship of each list item relative to the first item (El-Kalliny et al. 2019). Items encoded adjacent to SP1 show the highest spectral similarity, supporting the idea of overlapping context predicted by the Temporal Context Model. However, our figure characterizes how increasing inter-item distance affects spectral similarity. It shows that two items successfully recalled from temporally distant serial positions show reduced spectral similarity. These findings align with the predictions of the Temporal Context Model because two temporally distant items would lack significant contextual overlap and therefore would have more distinct spectral representations.

      El-Kalliny and colleagues use a similar experimental set-up; however, they define drift differently. They identified patients with a tendency to temporally cluster and observed that those patients tend to drift less between temporally clustered items; however, they do not specify drift relative to a constant serial position as we do in our analysis. They define drift as the spectral change between two adjacent items, which is a relative measure between any two items rather than a measure in relation to a fixed point like SP1. Finally, our analysis focuses only on gamma activity, while El-Kalliny and colleagues identified drift across a much broader set of frequency bands.

      (2) The way that the authors conduct the analysis of medial parietal neural similarity at boundaries leads to results that cannot be conclusively interpreted. The authors report enhanced similarity across lists for the first item in each list, which they interpret as reflecting a qualitatively distinct boundary signal. However, this finding can readily be explained by contextual drift if one assumes that whatever happens at the start of each list is similar or identical across lists (for example, a get ready prompt or reminder of instructions). The authors do not include analyses to rule this out, which undermines one of the main findings. 

      Extensions of the temporal context model (Lohnas et al. 2015) predict context at the beginning of a list will be most similar to the end of the prior list. The theory assumes a single-context state, consisting of a recency-weighted average of prior items, that is updated, even across different encoding periods.

      However, our results show that a boundary item representation is most similar to the prior list's first item rather than its last item. Our results conflict with the extension of TCM because the shared similarity of boundary items suggests the context state for the first item in the list is not a recency-weighted average of the items presented immediately prior. The same boundary-sensitive signal is not present in other regions, namely the hippocampus and lateral temporal cortex. Those regions do not show similarity between items at the beginning of each list.

      Our main conclusion from these data was that the medial parietal lobe activity seems to be specifically sensitive to task boundaries, defined by the first event or the get ready prompt, while other regions are not.
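      The contrast drawn in this response, between a recency-weighted context state and a boundary signal shared by first-list items, can be illustrated with a minimal sketch. It assumes a deliberately simplified drift rule (context = rho × previous context + beta × current item, with one-hot item vectors); the function name, the value of rho, and the list length of 12 are illustrative assumptions, and this is not the published model of Lohnas et al. (2015) nor the analysis used in the study.

```python
import math


def drift_contexts(n_items, rho):
    """Return the unit-length context vector after each of n_items items.

    Simplified drift rule (an assumption for illustration):
        context_t = rho * context_{t-1} + beta * item_t
    where each item is a one-hot vector on its own axis and
    beta = sqrt(1 - rho^2) keeps the context at unit length.
    """
    beta = math.sqrt(1.0 - rho ** 2)
    context = [0.0] * (n_items + 1)
    context[-1] = 1.0  # arbitrary pre-experiment context on a spare axis
    states = []
    for t in range(n_items):
        context = [rho * c for c in context]  # decay the old context
        context[t] += beta                    # blend in the current item
        states.append(context)
    return states


def dot(u, v):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))


LIST_LENGTH = 12  # hypothetical list length
states = drift_contexts(2 * LIST_LENGTH, rho=0.9)

first_of_list2 = states[LIST_LENGTH]
sim_to_prior_last = dot(first_of_list2, states[LIST_LENGTH - 1])
sim_to_prior_first = dot(first_of_list2, states[0])
```

      Under pure drift of this kind, the first item of a list is always closer to the last item of the preceding list than to its first item, which is the opposite of the pattern reported for the medial parietal lobe.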

      (3) Although several previous studies have linked hippocampal fMRI and electrophysiological activity at event boundaries with memory performance, the authors do not find similar relationships between hippocampal activity, event boundaries, and memory. There are potential explanations for why this might be the case, including the distinction between item vs. associative memory, which has been a prominent feature of previous work examining this question. However, the authors do not address these potential explanations (or others) to explain their findings' divergence from prior work. This makes it difficult to interpret and to draw conclusions from the data about the hippocampus' mechanistic role in forming event memories.

      The following text was added and revised in the discussion to discuss hippocampal activity shown in our results and its lack of sensitivity to boundaries.  

      “Spectral activity in the medial parietal lobe aligned closely with boundaries. Drift between item pairs seemed to reset at each boundary, leading to renewed similarity after each boundary. This observation aligns with previous work suggesting boundaries reset temporal context. In the temporal cortex, our findings extend prior studies which suggest the temporal lobe may play a role in associating adjacently presented items (Yaffe et al. 2014, El-Kalliny et al. 2019). We found items encoded in distant serial positions, but within the same list, drifted significantly more than items from adjacent serial positions (Figure 2C). Consistent with the predictions of the temporal context model, the reduced similarity between distant items may reflect reduced contextual overlap proportional to the time elapsed between them. However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with significant work (Ben-Yakov et al. 2018, Ezzyat et al. 2014; Griffiths et al. 2020) which shows hippocampal sensitivity to event-boundaries. One interpretation would be that boundary representations in the hippocampus are quite sparse and represented by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al. 2020). While the sparse representations may not be detectable in gamma activity, perhaps it suggests drift in these regions represents a more abstract set of contextual features accumulated from multiple brain regions.”

      (4) There is a similar absence of interpretation with respect to the previous literature for the data showing enhanced boundary-related similarity in the medial parietal cortex. The authors’ interpretation seems to be that they have identified a boundary-specific signal that reflects a large and abrupt change in context, however, another plausible interpretation is that enhanced similarity in the medial parietal cortex is related to a representation of a schema for the task structure that has been acquired across repeated instances. 

      We agree our results could suggest the MPL creates a generalized situational model or schematic of the task. Unfortunately, our behavioral task does not allow us to differentiate between these ideas and pure boundary representation. However, given boundaries are a component in defining situational models, we chose to interpret our results conservatively as a form of boundary representation.  

      (5) The authors do not directly compare their model to other models that could explain how variability in neural activity predicts memory. One example is the neural fatigue hypothesis, which the authors mention, however there are no analyses or data to suggest that their data is better fit by a boundary/contextual drift mechanism as opposed to neural fatigue. 

      The study by Lohnas and colleagues does find that HFA was greater for recalled items but does not describe a serial-position-specific trend (Lohnas et al. 2020). For our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items to recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in first list items, when neural fatigue is presumably least present.

      In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.

      (6) P2. Line 65 cites Polyn et al (2009b) as an example where ‘random’ boundary insertions improve subsequent memory. However, the boundaries in that study always occurred at the same serial position and were therefore completely predictable and not random.

      The citation was removed from the corresponding sentence.

      (7) P2. Line 74 cites Pu et al. (2022) as an example of medial temporal lobe ‘regional activity’ showing sensitivity to event boundaries; however, this paper reported behavioral and computational modeling results and did not include measurement of neural activity. 

      The citation was removed from the corresponding sentence.

      (8) P.3 Line 117, Hseih et al (2014) and Hseih and Ranganath (2015) are cited as evidence that ‘spectral’ relatedness decreases as a function of distance, but neither of these studies examined ‘spectral’ activity (fMRI univariate and multivariate). The manuscript would benefit from a careful review and updating of how the prior literature is cited, which will increase the impact of the findings for readers. 

      The text has been updated to reflect this distinction by modifying the statement to:  “Previous work consistent with temporal context models suggests neural pattern similarity reduces as a function of distance between related memories.”

      (9) Several previous studies have found hippocampal activity at event boundaries correlates with memory performance (Ben-Yakov et al 2011, 2018; Baldassano et al 2017), yet here the authors do not find evidence for hippocampal activity at event boundaries related to memory. Does this difference reflect something important about how the hippocampus vs. medial parietal cortex vs. lateral temporal cortex contribute to memory formation? Currently, there is not much discussion about how to interpret the differences between brain regions. Previous work has suggested that hippocampal pattern similarity at event boundaries specifically supports associative memory across events (Ezzyat & Davachi, 2014; Griffiths & Fuentemilla, 2020; Heusser et al., 2016), which may help explain their findings. In any case the authors could increase the impact of their paper by further situating their findings within the previous literature. 

      We would not suggest there is no boundary-related activity in the hippocampus. Similar to an earlier point made by the reviewer, to clarify our interpretation of regional differences, the following text has been added to the discussion.  

      “However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with significant work (Ezzyat and Davachi, 2014; Griffiths and Fuentemilla, 2020) which shows hippocampal sensitivity to event-boundaries. One interpretation would be that boundary representations in the hippocampus are quite sparse and represented by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al 2020). While the sparse representations may not be detectable in gamma activity, perhaps it suggests drift in these regions represents a more abstract set of contextual features accumulated from multiple brain regions (Baldassano et al. 2017).”

      (10) The authors mention neural fatigue as an alternative theory to explain the primacy effect (Serruya et al., 2014), however there are no analyses or data to suggest that their data is better fit by a boundary mechanism as opposed to neural fatigue. Previous studies have shown that gamma activity in the hippocampus changes with serial position and with encoding history (Serruya et al 2014; Lohnas et al 2020). Here, the authors could compare the reported pattern similarity results to control analyses that replicate this prior work, which would strengthen their argument that there is unique information at boundaries that is distinct from a neural fatigue signal. 

      The serial position effects reported by Serruya and colleagues consist of decreasing HFA with increasing serial position in the MTL, lateral temporal cortex, and prefrontal cortex (Serruya et al. 2014). Despite their findings, we do not observe a strong boundary effect in those regions (see Supp Fig 3 a,b). The lack of a boundary effect in regions where HFA is selectively increased for primacy items suggests the global neural fatigue model does not account for our results.

      Notably, the authors do not characterize HFA trends in the MPL. Nevertheless, their findings do not rule out the possibility of a boundary effect driving the HFA. We demonstrate boundary-relevant HFA only in the MPL but not in other regions. In addition, we show a correlation between SP1 recalls and boundary representation strength, as well as a conserved similarity of multiple boundary-adjacent items.  

      Next, the neural fatigue study by Lohnas and colleagues does find that HFA was greater for recalled items but does not describe a serial-position-specific trend (Lohnas et al. 2020). For our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items to recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in the first list items, when neural fatigue is presumably least present.

      In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.

      (11) For the analyses that examine cross-list similarity (e.g. the medial parietal analysis in Figure 3), how did the authors choose the number of lists over which similarity was calculated? Was the selection of this free parameter cross-validated to ensure that it is not overfitting the data? Given that there were 25 lists per session, using the three succeeding lists seems arbitrary. Why not use every list across the whole session? 

      Given the volume of data, number of patients, and computational time available at our facility, we extended the analysis as far as we could to characterize the observed trend.

      (12) P4. Line 155 says that Figure 3C shows example subject data, but it looks like it is actually Figure 3D. 

      The text was updated to reference the correct figure.

      (13) The t-tests on P.4 Line 159 have two sets of degrees of freedom but should only have one. 

      The t-tests described by Figure 3B represent the mean parameter estimate of the predictor for boundary proximity contrasted by region for all item pairs. The statistical test in this case was an unpaired t-test between parameter estimates for patients with electrodes in each of the regions. The numbers within parentheses represent the sample size, or number of subjects, contributing electrodes to each region.
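      To make the relationship between the two sample sizes and the single degrees-of-freedom value concrete, the following is a minimal sketch of an unpaired (pooled-variance) t-test. The pooled-variance form is an assumption, since the response does not state whether a Student or Welch test was used, and the parameter estimates below are hypothetical numbers, not data from the study.

```python
import math
from statistics import mean, variance


def unpaired_t(group_a, group_b):
    """Pooled-variance unpaired t-test.

    Returns (t, df), where df = n_a + n_b - 2 is the single
    degrees-of-freedom value derived from the two sample sizes.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    t = (mean(group_a) - mean(group_b)) / std_err
    return t, df


# Hypothetical per-subject parameter estimates for two regions
# with different numbers of contributing subjects.
region_a = [1.0, 2.0, 3.0, 4.0]        # n = 4 subjects
region_b = [2.0, 3.0, 4.0, 5.0, 6.0]   # n = 5 subjects

t_stat, dof = unpaired_t(region_a, region_b)
```

      Reporting the statistic as t(df) with the per-region subject counts stated separately would avoid the ambiguity the reviewer noted.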

      Reviewer 2:

      (1) Because this is not a traditional event boundary study, the data are not ideally positioned to demonstrate boundary specific effects. In a typical study investigating event boundary effects, a series of stimuli are presented and within that series occurs an event boundary – for instance, a change in background color. The power of this design is that all aspects between stimuli are strictly controlled – in particular, the timing – meaning that the only difference between boundary-bridging items is the boundary itself. The current study was not designed in this manner, thus it is not possible to fully control for effects of time or that multiple boundaries occur between study lists (study to distractor, distractor to recall, recall to study). Each list in a free recall study can be considered its own “mini” experiment such that the same mechanisms should theoretically be recruited across any/all lists. There are multiple possible processes engaged at the start of a free recall study list which may not be specific to event boundaries per se. For example, and as cited by the authors, neural fatigue/attentional decline (and concurrent gamma power decline) may account for serial position effects. Thus, SP1 on all lists will be similar by virtue of the fact that attention/gamma decrease across serial position, which may or may not be a boundaryspecific effect. In an extreme example, the analyses currently reported could be performed on an independent dataset with the same design (e.g. 12 word delayed free recall) and such analyses could potentially reveal high similarity between SP1-list1 in the current study and SP1-list1 in the second dataset, effects which could not be specifically attributed to boundaries.

      The neural fatigue study by Lohnas and colleagues does find that HFA was greater for recalled items but does not describe a serial-position-specific trend (Lohnas et al. 2020). For our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items to recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in the first list items, when neural fatigue is presumably least present.

      In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.

      (2) Comparisons of recalled "pairs" do not account for the lag between those items during study or recall, which based on retrieved context theory and prior findings (e.g. Manning et al., 2011), should modulate similarity between item representations. Although the GLM will capture a linear trend, it will not reveal serial position specific effects. It appears that the betas reported for the SP12 analyses are driven by the fact that similarity with SP12 generally increases across serial position, rather than a specific effect of "high similarity to SP12 in adjacent lists" (Page 5, excluding perhaps the comparison with list x+1). It is also unclear how the SP12 similarity analyses support the statement that "end-list items are represented more distinctly, or less similarly, to all succeeding items" (Page 5). It is not clear how the authors account for the fact that the same participants do not contribute equally to all ROIs or if the effects are consistent if only participants who have electrodes in all ROIs are included.

      In our study, all pairs are defined by the lag between a reference and target item. The results in Figure 3 show the similarity between each serial position in relation to SP1; Figure 4 shows lag between each serial position relative to SP2 and 3; and Figure 5 shows lag relative to SP12. Each statistical model accounts for the lag by ordering the data by increased inter-item distance. Further, our definition of lag is significantly more rigorous than that used by Manning and colleagues. Our similarity results for Figures 3-5 characterize the change in similarity relative to a constant reference point, such as SP1, rather than a relative reference point, such as +1 lag, which aggregates similarity between pairs such as SP1 to SP2 with SP4 to SP5, which may be recalled via different memory mechanisms.

      In Figure 5, we agree that your characterization, ‘similarity with SP12 generally increases across serial position’, is a more accurate description of the trend. The text has been updated to reflect this by changing the interpretation to “later serial positions in adjacent lists shared a gradually increasing similarity to SP12.”

      Next, we clarify the statement "end-list items are represented more distinctly, or less similarly, to all succeeding items". When recalling SP12, the subsequent items recalled exhibit significantly lower similarity to SP12 (see Figure 5D, pink). Consequently, the spectral representation of successfully recalled end-list items appears more distinct from later items in similar serial positions. This stands in contrast to our observations illustrated in Figures 3 and 4, where successfully recalled start-list items demonstrate greater similarity to later items in similar serial positions.

      (3) The authors use the term "perceptual" boundary which is confusing. First, "perceptual boundary" seems to be a specific subset of the broader term "event boundary," and it is unclear why/how the current study is investigating "perceptual" boundaries specifically. Second and relatedly, the current study does not have a sole "perceptual" boundary (as discussed in point 1 above), it is really a combination of perceptual and conceptual since the task is changing (from recalling the words in the previous list to studying the words in the current list OR studying the words in the current list to solving math problems in the current list) in addition to changes in stimulus presentation. 

      We agree with the statement that ‘perceptual’ as a modifier to the boundaries described here does not add significant information. Therefore, we have removed all reference to perceptual boundaries.

      (4) Although the results show that item-item similarity in the gamma band decreases across serial position, it is unclear how the present findings further describe "how gamma activity facilitates contextual associations" (Page 5). As mentioned in point 1 above, such effects could be driven by attentional declines across serial position -- and a concurrent decline in gamma power -- which may be unrelated to, and actually potentially impair, the formation of contextual associations, given evidence from the literature that increased gamma power facilitates binding processes.

      We agree that our study does not elucidate a mechanistic relationship between gamma power and contextual associations. The referenced sentence has been changed to: “how gamma activity is associated with context”.

      Please see our response to point 1 above. In addition, studies demonstrating decreasing gamma power with increasing serial position focus primarily on the MTL, lateral temporal cortex and prefrontal cortex (Serruya et al. 2012). Despite their findings, we do not observe a strong boundary effect in those regions (see Supp Fig 3 a,b). The lack of boundary effect in regions where HFA is selectively increased for primacy items suggests the global attentional decline or neural fatigue model does not account for our results.

      Notably, HFA trends in the MPL are poorly described. Further, gamma power decline does not rule out the possibility of a boundary effect driving the HFA. We demonstrate boundary-relevant HFA only in the MPL but not in other regions. In addition, we show a correlation between SP1 recalls and boundary representation strength, as well as a conserved similarity of multiple boundary-adjacent items.

      (5) Some of the logic and interpretations are inconsistent with the literature. For example, the authors state that "The temporal context model (TCM) suggests that gradual drift in item similarity provides context information to support recovery of individual items" however, this does not seem like an accurate characterization of TCM. According to TCM, context is a recency-weighted average of previous experience. Context "drifts" insofar as information is added to/removed from context. Context drift thus influences item similarity -- it is not that item similarity itself drifts, but that any change in item-item similarity is due to context drift. 

      The current findings do not appear at odds with the conceptualization of drift and context in the current version of the context maintenance and retrieval model. Furthermore, the context representation is posited to include information beyond basic item representations. Two items, regardless of their temporal distance, can be associated with similar contexts if related information is included in both context representations, as predicted and shown for multiple forms of relatedness including semantic relatedness (Manning & Kahana, 2012) and task relatedness (Polyn et al., 2012).

      We revised the sentence and encompassing paragraph to describe the temporal context model more accurately and emphasize how our findings align with the stated version of CMR. The revised text is below:  

      “Next, we asked how gamma spectral activity reflects contextual association between items. In the medial parietal lobe, we observed recurring similarity between items distant in time but adjacent to boundaries. This pattern suggests spectral activity may carry information about an item's relationship to a boundary. These observations align with the Context Maintenance and Retrieval model which extends the predictions of TCM to encompass broader relationships among items. Our results demonstrate boundaries as an important aspect of context and specify the spectral and regional properties of these boundary-related contextual features.”

      (6) Lohnas et al. (2020) Neural fatigue influences memory encoding in the human hippocampus, Neuropsychologia, should be cited when discussing neural fatigue

      Thank you for your suggestion. The citation has been added to the text.

      (7) A within-list, not an across list, similarity analysis should be used to test the interpretation that end-of-list items are more distinct than other list items.

      We believe this recommendation refers to the following line in our text: “These findings suggest end-list items are represented more distinctly, or less similarly, to all succeeding items.” Our statement compares list x, SP12 to all succeeding items (in list x+1, x+2, etc.). Therefore, this statement refers to items in the next lists which is why we performed an across list analysis rather than within-list one.

      (8) It is unclear why it is necessary to use PCA to estimate similarity between items.

      PCA was used to reduce the dimensionality of the time-frequency matrix for the gamma band. This technique allowed us to compare predominant trends in gamma between items. In addition, we added a figure showing 3 example subjects in Figure 3 – supplementary figure 2D to show unique time-frequency components contribute to signal reconstructed from the PCs for each subject. Therefore, the boundary representation may be represented differently for each patient.
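
      As a rough illustration of the dimensionality-reduction step described above, the following sketch projects flattened gamma time-frequency matrices onto their leading principal components and then compares items by cosine similarity. It is not the authors' pipeline: the data are random, and the array sizes, the number of components, and the function names `pca_reduce` and `cosine_similarity_matrix` are assumptions chosen for illustration.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project row vectors onto their top principal components (via SVD)."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)
    return unit @ unit.T

# Hypothetical data: 24 items, each a flattened (frequencies x time bins)
# gamma power matrix. Sizes are placeholders, not values from the study.
rng = np.random.default_rng(0)
n_items, n_freqs, n_times = 24, 10, 16
items = rng.normal(size=(n_items, n_freqs * n_times))

scores = pca_reduce(items, n_components=5)      # items x components
similarity = cosine_similarity_matrix(scores)   # items x items
```

      Varying `n_components` in a sketch like this is one way to check whether a similarity trend depends on how many components are retained.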

      (9) Lags are listed as -4, 4 (Page 8), however with a list length of 12, possible lags should be -11, 11.

      The listed parenthetical statement ‘(-4 to 4)’ referred to Figure 1 where Lag CRP is shown for transitions from -4 to 4. However, we did calculate lag CRP for all possible transitions. Therefore, the referenced phrase was changed to: “Lagged CRP was calculated for all possible transitions (-11 to 11).”
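
      For readers unfamiliar with the measure, a minimal sketch of a lag-CRP computation over all possible transitions is given below. It assumes recall sequences are lists of 1-based serial positions with repeats and intrusions already removed; the function name `lag_crp` and this input format are assumptions for illustration, not the authors' code.

```python
import numpy as np

def lag_crp(recall_lists, list_length):
    """Conditional response probability for each lag in -(L-1)..(L-1).

    recall_lists: iterable of recall sequences, each a list of 1-based
    serial positions in output order (repeats and intrusions removed).
    """
    lags = np.arange(-(list_length - 1), list_length)
    actual = np.zeros(lags.size)
    possible = np.zeros(lags.size)
    offset = list_length - 1  # maps a lag to its array index

    for recalls in recall_lists:
        recalled = set()
        for current, nxt in zip(recalls[:-1], recalls[1:]):
            recalled.add(current)
            # Every not-yet-recalled item was an available transition target.
            for candidate in range(1, list_length + 1):
                if candidate not in recalled:
                    possible[candidate - current + offset] += 1
            actual[nxt - current + offset] += 1

    # Lags that were never available are reported as NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        crp = np.where(possible > 0, actual / possible, np.nan)
    return lags, crp
```

      With a list length of 12, the returned `lags` array runs from -11 to 11, matching the range stated in the revised sentence.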

      (10) Hsieh et al. 2014 and Hsieh & Ranganath (2015) are fMRI studies and as such, do not support the statement "Previous work consistent with temporal context models suggests spectral relatedness reduces as a function of distance between words" (Page 3). 

      The statement has been revised to: “Previous work consistent with temporal context models suggests neural pattern similarity reduces as a function of distance between related memories.”

      (11) Although statistically one can measure "How item-item similarity is affected by recollection" (Page 3), this is logically backwards, given that similarity during study necessarily precedes performance during free recall. Additionally, it is erroneous to assume that recalled words are "recollected" without additional measurements (e.g. Mickes et al. (2013) Rethinking familiarity: Remember/Know judgments in free recall, JML).

      The statement was changed to “item-item similarity is affected based on successful recall” given recollection cannot be determined in our paradigm.

      Reviewer 3:

      (1) My primary confusion in the current version of this paper is that the analyses don't seem to directly compare the two proposed models illustrated in Fig 1B, i.e. the temporal context model (with smooth drifts between items, including across lists) versus the boundary model (with similarities across all lists for items near boundaries). After examining smooth drift in the within-list analysis (Fig 2), the across-list analyses (Figs 3-5) use a model with two predictors (boundary proximity and list distance), neither of which is a smoothly drifting context. Therefore there does not appear to be a quantitative analysis supporting the conclusion that in lateral temporal cortex "drift exhibits a relationship with elapsed time regardless of the presences of intervening boundaries" (lines 272-3).

      We could not use a smoothly drifting regressor due to its collinearity with any model of boundary similarity. Therefore, we chose our two regressors: boundary proximity, which models intra-list changes in similarity and list distance, which models a stepwise decrease in similarity from adjacent lists.

      However, we agree that the presented data do not directly support the claim that drift in the lateral temporal cortex is independent of intervening boundaries. Therefore, we amended the statement to: “We found successfully recalled items encoded in distant serial positions drifted significantly more than items from adjacent serial positions (Figure 2C). Consistent with the predictions of the temporal context model, the reduced similarity between distant items may reflect reduced contextual overlap proportional to the time elapsed between them.”

      (2) The feature representation used for the neural response to each item is a gamma power time-frequency matrix. This makes it unclear what characteristics of the neural response are driving the observed similarity effects. It appears that a simple overall scaling of the response after boundaries (stronger responses to initial items during the beginning portion of the 1.6s time window) would lead to the increased cosine similarity between initial items, but wouldn't necessarily reflect meaningful differences in the neural representation or context of these items.

      Our study aims to draw the connection between the neural response after boundaries with neural representation and context of these items. Prior studies (Manning et al. 2011, El Kalliny et al. 2017) have interpreted similarity in neural spectra as a memory relevant phenomenon. We use very similar methods to perform our analysis.  

      In addition, we compare the fit of our boundary similarity model to behavioral performance to show increased boundary representation correlates with improved boundary item recall.

      While our study does not specify which time-frequency components underlie the increased similarity, we do limit our analysis to the gamma band. Traditional analyses include log-scaled, broadband time-frequency data (e.g. 3-100 Hz) from which we specify the relevance of a much narrower spectral band.

      Finally, we tried to study which time–frequency components contributed to the increased similarity, but it varied greatly between patients (see Figure 3 – supplementary figure 2D). Hence, we opted to use principal component analyses to compare the features showing the most variation for each given participant. This added analytical step allows us to detect boundary effects across patients despite individual variability in boundary representation.

      (3) The specific form of the boundary proximity models is not well justified. For initial items, a model of e^(1-d) is used (with d being serial position), but it is not stated how the falloff scale of this model was selected (as opposed to e.g. e^((1-d)/2)). For final items, a different model of d/#items is used, which seems to have a somewhat different interpretation (about drift between boundaries, rather than an effect specific to items near a final boundary). The schematic in Fig 1B appears to show a hypothesis which is not tested, with symmetric effects at initial and final boundaries.

      The boundary proximity models were chosen empirically. Our model was intended to quantify a decreasing relationship across many patients. We acknowledge the constants and variables may not definitively describe underlying neural processes.  

      For start- and end-list boundaries, we used different models because primacy and recency effects are unique phenomena. Primacy memory is classically thought to arise from rehearsal during the encoding time (Polyn et al. 2009, Lohnas et al. 2015). Alternatively, recency memory is thought to arise from strong contextual cues of recency items during recall due to their temporal proximity. Therefore, we have a limited basis on which to assume their spectral representation in relation to task boundaries would be symmetric.
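
      The two boundary proximity forms named in the reviewer's comment, e^(1-d) for initial items and d/#items for final items, can be written out as regressors alongside a list distance term. The sketch below fits them to noise-free toy data with ordinary least squares. The coefficients, the grid of serial positions and list distances, and all function names are assumptions for illustration and do not reproduce the authors' model or results.

```python
import numpy as np

def start_boundary_proximity(serial_positions):
    """e^(1 - d): equals 1 at serial position 1 and decays for later items."""
    d = np.asarray(serial_positions, dtype=float)
    return np.exp(1.0 - d)

def end_boundary_proximity(serial_positions, n_items):
    """d / n_items: rises linearly to 1 at the final serial position."""
    d = np.asarray(serial_positions, dtype=float)
    return d / n_items

def fit_similarity_model(similarity, proximity, list_distance):
    """Least-squares fit of [intercept, boundary proximity, list distance]."""
    X = np.column_stack([np.ones(len(similarity)), proximity, list_distance])
    beta, *_ = np.linalg.lstsq(X, similarity, rcond=None)
    return beta

# Toy grid: every serial position (1..12) at list distances 1..3.
n_items = 12
positions, distances = np.meshgrid(np.arange(1, n_items + 1), np.arange(1, 4))
positions, distances = positions.ravel(), distances.ravel()

proximity = start_boundary_proximity(positions)
# Made-up generating coefficients, used only to show the fit recovers them.
toy_similarity = 0.2 + 0.5 * proximity - 0.05 * distances
beta = fit_similarity_model(toy_similarity, proximity, distances)
```

      Changing the falloff scale, for example using e^((1-d)/2) as the reviewer mentions, only requires editing `start_boundary_proximity`.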

      (4) The main text description of Fig 2 only describes drift effects in lateral temporal cortex, but Fig 2 - supplement 1 shows that there is also drift and a significant subsequent memory effect in the other two ROIs as well. There is not a significant memory x drift slope interaction in these regions; are the authors arguing that the lack of this interaction (different drift rates for remembered versus forgotten items) is critical for interpreting the roles of lateral temporal cortex versus medial parietal and hippocampal regions?

      Yes. Fig 2- Supplement 1 shows that drift occurs in both the HC and MPL. However, the interaction term is not significant, which suggests that the rate of drift between recalled and non-recalled items is not significantly different.  

      In contrast, Fig 2C shows that recalled pairs drift at a higher rate than non-recalled pairs. For the LTC, the interaction term is negative and statistically significant. This suggests successfully encoded item pairs encoded far apart share more distinct spectral representations, specifically in the LTC. These findings lead to our interpretation in the discussion that “elevated drift rate might allow the representations of recalled items to remain distinct but ordered in memory.”

      (5) The parameter fits for the "list distance" regressor are not shown or analyzed, though they do appear to be important for the observed similarity structure (e.g. Fig 3E). I would interpret this regressor as also being "boundary-related" in the sense that it assumes discrete changes in similarity at boundaries.

      Parameter fits for the ‘list distance’ regressor are now shown in the supplementary portion of Figures 3 and Figure 5. The difference between regions is non-significant.

      (6) To make strong claims about temporal context versus boundary models as implied by Fig 1B, these two regressors should be fit within the same model to explain across-list similarity. The temporal context model could be based on the number of intervening items (as in Fig 1B) or actual time elapsed between items. The relationship between the smoothly drifting temporal context model and the discretely-jumping list distance models should also be clarified.

      We could not use a smoothly drifting regressor due to its collinearity with any model of boundary similarity. A model which included a ‘temporal context regressor’ would not be able to account for the presence of a boundary effect and would not allow us to demonstrate a boundary representation in the presence of drift. Therefore, we chose our two regressors: boundary proximity, which models intra-list changes in similarity and list distance, which models a stepwise decrease in similarity from adjacent lists. These regressors allow the model to differentiate between intra-list changes (the boundary regressor) versus inter-list changes (the list distance regressor).
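
      One common way to quantify collinearity of this kind is a variance inflation factor (VIF). The sketch below computes it for a hypothetical smooth drift regressor, defined here as the number of study items elapsed since SP1 of the reference list, against the boundary proximity and list distance regressors. The drift definition ignores distractor and recall periods, and the grid of list distances is a toy layout; both are assumptions, so the resulting value is illustrative only and is not a result from the study.

```python
import numpy as np

def variance_inflation_factor(target, others):
    """VIF of `target` given the other regressors (an intercept is added)."""
    X = np.column_stack([np.ones(len(target)), others])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    residual = target - X @ beta
    r_squared = 1.0 - residual.var() / target.var()
    return 1.0 / (1.0 - r_squared)

# Toy layout: serial positions 1..12 at list distances 1..3 from the reference.
n_items = 12
positions, distances = np.meshgrid(np.arange(1, n_items + 1), np.arange(1, 4))
positions = positions.ravel().astype(float)
distances = distances.ravel().astype(float)

boundary_proximity = np.exp(1.0 - positions)
# Hypothetical smooth drift term: items elapsed since SP1 of the reference list.
elapsed_items = n_items * distances + (positions - 1.0)

vif = variance_inflation_factor(
    elapsed_items, np.column_stack([boundary_proximity, distances])
)
drift_vs_list_corr = np.corrcoef(elapsed_items, distances)[0, 1]
```

      A large VIF means the drift term is mostly explained by the other two regressors, which makes their coefficients hard to estimate separately in a single model.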

      (7) The features of the time-frequency matrix that are driving similarity between events could be visualized to provide a better understanding of the boundary-related signals. The analysis could also be re-run with reduced versions of the feature space in order to determine the critical components of this signal; for example, responses could be averaged across time to examine only differences across frequencies, or across frequencies to examine purely temporal changes across the 1.6 second window.

      Figure 3 – supplementary figure 2 A-C has been added to show varying the number of principal components (PCs) does not change the trend of boundary sensitivity in the MPL. In addition, we included 3 example subjects in Figure 3 – supplementary figure 2D to show unique time-frequency components contribute to signal reconstructed from the PCs for each subject. Therefore, the boundary representation may be represented differently for each patient.

      (8) If the authors are considering a space of multiple models as "boundary proximity models" (e.g. linear models and exponential models with different scale factors), this should be part of the model-fitting process rather than a single model being selected post hoc.

      We agree with the reviewer’s suggestion that the ideal way to fit a model to the trend would be to use a model-fitting process. However, due to limited computational resources, we were not able to perform it given the size of our dataset.

      (9) The interpretation of region differences in the results in Fig 2 and Fig 2 - supplement 1 should be clarified. 

      In discussion, we have added the following text to clarify our interpretation of the regional differences shown in the mentioned figures.  

      “However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with significant work (Ezzyat and Davachi, 2014; Griffiths and Fuentemilla, 2020) which shows hippocampal sensitivity to event-boundaries. One interpretation would be that boundary representations in the hippocampus are quite sparse and represented by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al 2018). While the sparse representations may not be detectable in gamma activity, perhaps it suggests drift in these regions represents a more abstract set of contextual features accumulated from multiple brain regions (Baldassano et al. 2017).”

      (10) Whether there are significant fits for the list distance regressor, and whether these fits vary across regions, could be stated. The list distance regressor could also be directly compared (in the same model) to a temporal-context regressor, which predicts graded changes in similarity between items rather than the discrete changes between lists.

      We have added parameter fits for the ‘list distance’ regressor in the supplementary portion of Figures 3 and Figure 5. The difference between regions is non-significant. Therefore, our results show very similar stepwise decrease in similarity across lists between regions (list distance regressor; Figure 3 —supplementary figure 1B).

      We could not compare these parameters to a separate model which includes a smoothly drifting ‘temporal-context’ regressor due to the regressor’s collinearity with any representation of boundary. See our response to Reviewer 3, comment 6.

      (11) The authors should clarify their interpretation of the results, and whether they are proposing a tweak to the temporal context model or a substantially different organizational system. 

      In the discussion we include the following statements to clarify what we suggest regarding the temporal context model.

      “Our findings suggest a broader scope of contextual association than just prior items, where temporal proximity as well as task structure in the form of boundaries, play intertwined roles in contextual construction. Our data therefore have implications for updated iterations of the temporal context model incorporating (perhaps) specific terms for boundary information. This may in turn provide a more systematic prediction of primacy effects in behavioral data.”  

      (12) Minor typos and corrections: 

      52: using -> use 

      108: patients -> patients'

      156: list -> lists

      The list distance plot is described as "pink" in Fig 3 and Fig 5 - supplement 1, but appears gray in the figures.

      Each of these corrections has been made in the text.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their very constructive and helpful comments on the previous version of this manuscript. They have focused on some important issues and have raised many valuable questions that we expect to answer as research begins on these markings. As has been often the case with preprints, a number of experts beyond the four reviewers and editor have provided comments, questions, and suggestions, and we have taken these on board in our revision of the manuscript. In particular, Martinón-Torres et al. (2024) focused several comments upon this manuscript and raise some points that were not considered by the reviewers, and so we discuss those points here in addition to the reviewer comments.

      Some of us have been engaged in other aspects of the possible cultural activities of Homo naledi. After the discovery of these markings we considered it indefensible to publish further research on the activity of H. naledi within this part of the cave system without making readers aware that the H. naledi skeletal remains occur in a spatial context near markings on cave walls. Of course, the presence of markings leaves many questions open. A spatial context does not answer all questions about the temporal context. The situation of the Dinaledi Subsystem does entail some constraints that would not apply to markings within a more open cave or rock wall, and we discuss those in the text.

      We find ourselves in agreement with most of the reviewers on many points. As reflected by several of the reviewers, and most pointedly in the remarks by reviewer 1, the purpose of this preprint is a preliminary report on the observation of the markings in a very distinctive location. This initial report is an essential step to enable further research to move forward. That research requires careful planning due to the difficulty of working within the Dinaledi Subsystem where the markings are located. This pattern of initial publication followed by more detailed study is common with observations of rock art and other markings identified in South Africa and elsewhere. We appreciate that the reviewers have understood the role of this initial study in that process of research.

      Because of this, the revised manuscript represents relatively minimal changes, and all those at the advice of reviewers. Many thanks to all the reviewers for noting various typographic errors, missed references and other issues that we have done our best to fix in the revised manuscript.

      Expertise of authors. Reviewer 4 mentions that the expertise of the authors does not include previous publication history on the identification of rock art, and other reviewers briefly comment that experts in this area would enhance the description. AF does have several publications on ancient engravings and other markings; LRB has geological training and field experience with rock art. Notwithstanding this, we do take on board the advice to include a wider array of subject experts in this research, and this is already underway.

      Image enhancement. We appreciate the suggestions of some reviewers for possible strategies to use software filters to bring out details that may not be obvious even with our cross-polarization lighting and filtering. These are great ideas to try. In this manuscript we thought that going very far into software editing or image enhancement might be perceived by some readers as excessive manipulation, particularly in an age of AI. In future work we will experiment with the suggested approaches. 

      Natural weathering. In the process of review and commentary by experts and the public there has been broad acceptance that many of the markings illustrated in this paper are artificial and not a product of natural weathering of the dolomite rock. We deeply appreciate this. At the same time, we accept the comments from reviewers that some markings may be difficult to differentiate from natural weathering, and that some natural features that were elaborated or altered may be among the markings we recognize. On pages 3 and 4 we present a description of the process of natural subaerial weathering of dolomite, which we have rooted in several references as well as our own observations of the natural weathering visible on dolomite cave walls in the Rising Star cave system. This includes other cave walls within the Dinaledi Subsystem. We discuss the “elephant skin” patterning of natural dolomite surface weathering, how that patterning emerges, and how that differs from the markings that are the subject of this manuscript.

      Animal claw marks. Martinón-Torres et al. 2024 accept that some of the markings illustrated on Panel A are artificial, but they offer the hypothesis that some of those markings may be consistent with claw marks from carnivores or other mammals. They provide a photo of claw marks within a limestone cave in Europe to illustrate this point. On pages 5 and 6 of the revised manuscript we discuss the hypothesis of claw marks. We discuss the presence of animals in southern Africa that may dig in caves or mark surfaces. However, the key aspect of the Malmani dolomite caves is that the hardness of dolomitic limestone rock is much greater than that of many of the limestone caves in other regions such as Europe and Australia, where claw marks have been noted in rock walls. As we discuss, we have not been able to find evidence of claw marks within the dolomite host bedrock of caves in this region, although carnivores, porcupines, and other animals dig into the soft sediments within and around caves. The form of the markings themselves also counter-indicates the hypothesis that they are claw marks.

      Recent manufacture. One comment that occurs within the reviews and from other readers of the preprint is that recent human visitors to the cave, either in historic or recent prehistoric times, may have made these marks. We discuss this hypothesis on page 6 of the revised manuscript. The simple answer is that no evidence suggests that any human groups were in the Dinaledi Subsystem between the presence of H. naledi and the entry of explorers within the last 25 years. The list of all explorers and scientific visitors to have entered this portion of the cave system is presented in a table. We can attest that these people did not make the marks. More generally, such marks have not been known to be made by cavers in other contexts within southern Africa.

      Panels B and C. We have limited the text related to these areas, other than indicating that we have observed them. The analysis of these areas and quantification of artificial lines does not match what we have done for the Panel A area and we leave these for future work. 

      Presence of modern humans. We have observed no evidence of modern humans or other hominin populations within the Dinaledi Subsystem, other than H. naledi. Several reviewers raise the question of whether the absence of evidence is evidence of absence of modern humans in this area. This is connected by two of the reviewers to the observation that the investigation of other caves in recent years has shown that markings or paintings were sometimes made by different groups over tens of thousands of years, in some cases including both Neanderthals and modern humans. We have decided it is best for us not to attempt to prove a negative. It is simple enough to say that there is no evidence for modern humans in this area, while there is abundant evidence of H. naledi there.

      Association with H. naledi. Reviewer 2 made an incisive point that the previous version contained some text that appeared contradictory: on the one hand we argued that modern humans were not present in the subsystem due to the absence of evidence of them, yet we accepted that H. naledi may have been present for a longer time than currently established by geochronological methods.

      We appreciate this comment because it helped us to think through the way to describe the context and spatial association of these markings and the skeletal remains, and how it may relate to their timeline. Other reviewers also raised similar questions, whether the context by itself demonstrates an association with H. naledi. We have revised the text, in particular on pages 5 and 7, to simply state that we accept as the most parsimonious alternative at present the hypothesis that the engravings were made by H. naledi, which is the only hominin known to be present in this space.

      Age of H. naledi in the system. At one place in the previous manuscript we indicated that we cannot establish that H. naledi was only active in the cave system within the constraints of the maximum and minimum ages for the Dinaledi Subsystem skeletal remains (viz., 335 ka – 241 ka), because some localities with skeletal material are undated. We have adjusted this paragraph on page 7 to be clear that we are discussing this only to acknowledge uncertainty about the full range of H. naledi use of the cave system.

      Geochronological methods. Several reviewers discuss the issue of geochronology as applied to these markings. This is an area of future investigation for us after the publication of this initial report. As some reviewers note, the prospects for successful placement of these engraved features and other markings with geochronological methods depend on factors that we cannot predict without very high-resolution investigation of the surfaces. We have included greater discussion of the challenges of geochronological placement of engravings on page 6, including more references to previous work on this topic. We also briefly note the ethical problems that may arise as we go further with potentially invasive, destructive or contact studies of these engravings, which must be carefully considered not just by us but by the entire academy.

      Title. Some reviewers suggested that the title should be rephrased because this paper does not use chronological methods to derive date constraints for the markings. We have rephrased the title to reflect less certainty while hopefully retaining the clear hypothesis discussed in the paper.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The present study aims to associate reproduction with age-related disease as support of the antagonistic pleiotropy hypothesis of ageing predominantly using Mendelian Randomization. The authors found evidence that early-life reproductive success is associated with advanced ageing.

      Strengths:

      Large sample size. Many analyses.

      Weaknesses:

      Still a number of doubts with regard to some of the results and their interpretation.

      Reviewer #1 (Recommendations for the authors):

      Thank you for the opportunity to review a revised version.

      I still have serious doubts with regard to a number of datasets presented. For example, the results on essential hypertension and cervical cancer show very small effect sizes, but according to the authors still reach the level of statistical significance. This is unlikely to be accurate. For MR analyses, this is nearly impossible. The analyses of these data and the statistical analysis need to be checked for errors and repeated. While BOLT-LMM might not be relevant here, there might be other things happening here. The authors should therefore always interpret the results also with regard to the observed effect sizes instead of only looking at the p-values (0.999 means that there is a 0.1% lower risk).

      Thank you for your suggestions. We have updated the results for essential hypertension, GAD, and cervical cancer in results, figures, and supplemental tables (lines 65-89, Figure 1, Tables S3-S4).

      Reviewer #2 (Public review):

      Summary:

      The authors present an interesting paper where they test the antagonistic pleiotropy theory. Based on this theory they hypothesize that genetic variants associated with later onset of age at menarche and age at first birth may have a positive effect on a multitude of health outcomes later in life, such as epigenetic aging and prevalence of chronic diseases. Using a Mendelian randomization and colocalization approach, the authors show that SNPs associated with later age at menarche are associated with delayed aging measurements, such as slower epigenetic aging and reduced facial aging and a lower risk of chronic diseases, such as type 2 diabetes and hypertension. Moreover, they identify 128 fertility-related SNPs that associate with age-related outcomes and they identified BMI as a mediating factor for disease risk, discussing this finding in the context of evolutionary theory.

      Strengths:

      The major strength of this manuscript is that it addresses the antagonistic pleiotropy theory in aging. Aging theories are not frequently empirically tested although this is highly necessary. The work is therefore relevant for the aging field as well as beyond this field, as the antagonistic pleiotropy theory addresses the link between fitness (early life health and reproduction) and aging.

      The authors addressed the remarks on the previous version very well. Addressing the two points below would further increase the quality of the manuscript.

      (1) In the previous version the authors mentioned that their results are also consistent with the disposable soma theory: "These results are also consistent with the disposable soma theory that suggests aging as an outcome tradeoff between an organism's investment in reproduction and somatic maintenance and repair."

      Although the antagonistic pleiotropy and disposable soma theories describe different mechanisms, both provide frameworks for understanding how genes linked to fertility influence health. The antagonistic pleiotropy theory posits that genes enhancing fertility early in life may have detrimental effects later. In contrast, the disposable soma theory suggests that energy allocation involves a trade-off, where investment in fertility comes at the expense of somatic maintenance, potentially leading to poorer health in later life.

      To strengthen the manuscript, a discussion section should be added to clarify the overlap and distinctions between these two evolutionary theories and suggest directions for future research in disentangling their specific mechanisms.

      Thank you for your suggestions to clarify the overlap and distinctions between the antagonistic pleiotropy and disposable soma theories. While our primary focus is on the antagonistic pleiotropy framework, we acknowledge that the disposable soma theory also provides a relevant perspective on the trade-offs between reproduction and somatic maintenance.

      To address this, we have expanded the discussion section to highlight how both theories contribute to our understanding of the relationship between fertility-related traits and aging-related health outcomes. We also suggested potential future research directions, such as integrating genetic data with biomarkers of somatic maintenance to further explore the mechanisms underlying these trade-offs (lines 213-223).

      (2) In response to the question of why the authors did not include age at menopause in addition to the already included age at first childbirth and age at menarche, the following explanation was provided: "Our manuscript focuses on the antagonistic pleiotropy theory, which posits an inherent trade-off in natural selection, where genes beneficial for early survival and reproduction (like menarche and childbirth) may have costly consequences later. So, we only included age at menarche and age at first childbirth as exposures in our research."

      It remains, however, unclear why genes beneficial for early survival and reproduction would be reflected only in age at menarche and age at first childbirth, but not in age at menopause. While age at menarche marks the onset of fertility, age at menopause signifies its end. Since evolutionary selection acts directly until reproduction is no longer possible (though indirect evolutionary pressures persist beyond this point), the inclusion of additional fertility-related measures could have strengthened the analysis. A more detailed justification for focusing exclusively on age at menarche and first childbirth would enhance the clarity and rigor of the manuscript.

      Thank you for your question regarding the age at menopause in our analysis. Our decision was based on the theoretical framework of antagonistic pleiotropy, which emphasizes early-life reproductive advantages that may have trade-offs later in life. Age at menarche and age at first childbirth are direct markers of early reproductive investment, which align closely with this framework.

      While age at menopause marks the cessation of reproductive capability, its evolutionary role is distinct. The selective pressures acting on menopause are complex and may involve post-reproductive contributions rather than direct reproductive fitness benefits. Moreover, the genetic architecture of menopause may be influenced by different biological pathways compared to early reproductive traits.

      Nonetheless, we acknowledge that including age at menopause could provide additional insights into reproductive aging. Several papers (1,2) have already been published on age at menopause and age-related outcomes, including diabetes, AD, osteoporosis, cancers, and cardiovascular diseases.

      Reviewing Editor (Recommendations for the authors):

      Above/below you will find the remaining comments from the reviewers. One of the main issues remaining is that some of the data seems to be incorrectly analysed and some of the findings may not be correct. To clarify this a lot more, I asked the reviewer for some details and received the following:

      - In Figure 1B one of their main outcomes is "age of menopause", but they report the data as an odds ratio. This is not correct and should be fixed (it seems the authors can run the right analysis, but just reported it with the wrong heading in the figure). This likely also applies to the outcome "facial aging". Also the heading in Figure 1A should be Beta instead of OR.

      We have updated the figures to ensure that the beta values of continuous outcomes and odds ratio values of categorical outcomes are presented in Figure 1.
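
      The distinction between beta values and odds ratios raised above can be illustrated with a minimal sketch. It assumes a binary outcome whose MR estimate is reported on the log-odds scale with a standard error; the function names and the numbers are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds estimate and its standard error into an OR with a Wald CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def percent_change_in_odds(odds_ratio):
    """Express an OR as a percent change in odds (negative means lower odds)."""
    return (odds_ratio - 1.0) * 100.0


# Hypothetical numbers for illustration only; they are not results from the study.
or_value, ci_low, ci_high = odds_ratio_with_ci(beta=-0.02, se=0.01)
print(f"OR = {or_value:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
print(f"Change in odds: {percent_change_in_odds(or_value):.1f}%")
```

      For continuous outcomes the beta is reported directly, without exponentiation, which is why the figure headings need to differ between the two outcome types.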

      - With essential hypertension, GAD and cervical cancer, the estimates are so small that they need to re-review their results. The current MR analysis is not sufficiently powered to have such small confidence intervals. Essential hypertension was based on data from UK Biobank; although I was unable to find what program was used to generate the GWAS results, I strongly suspect this was also BOLT-LMM. Same for cervical cancer. Both datasets used familial-related samples, so they are very likely derived with BOLT-LMM.

      I hope this will help to solve this issue.

      According to the published paper, the GWAS for gastrointestinal or abdominal disease (GAD) (GWAS ID: ebi-a-GCST90038597) was generated with BOLT-LMM. According to the MRC IEU UK Biobank GWAS pipeline (versions 1 and 2), the GWAS for essential hypertension (GWAS ID: ukb-b-12493) and cervical cancer (GWAS ID: ukb-b-8777) were also generated with BOLT-LMM. We have updated the MR analysis results and figures (lines 65-89, Figure 1, Tables S3-S4) as well as the subsequent IPA analysis (lines 106-162 and 255-280, Figures 2-3).

      (1) Magnus, M. C., Borges, M. C., Fraser, A. & Lawlor, D. A. Identifying potential causal effects of age at menopause: a Mendelian randomization phenome-wide association study. Eur J Epidemiol 37, 971-982 (2022). https://doi.org:10.1007/s10654-022-00903-3

      (2) Zhang, X., Huangfu, Z. & Wang, S. Review of mendelian randomization studies on age at natural menopause. Front Endocrinol (Lausanne) 14, 1234324 (2023). https://doi.org:10.3389/fendo.2023.1234324

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public Review): 

      This manuscript presents insights into biased signaling in GPCRs, namely cannabinoid receptors. Biased signaling is of broad interest in general, and cannabinoid signaling is particularly relevant for understanding the impact of new drugs that target this receptor. Mechanistic insight from work like this could enable new approaches to mitigate the public health impact of new psychoactive drugs. Towards that end, this manuscript seeks to understand how new psychoactive substances (NPS, e.g. MDMB-FUBINACA) elicit more signaling through β-arrestin than classical cannabinoids (e.g. HU-210). The authors use an interesting combination of simulations and machine learning.

      We thank the reviewer for the comments. We have provided point-by-point responses to the reviewer’s comments below and incorporated the suggestions in our revised manuscript. Modified parts of the manuscript are highlighted in yellow.

      Comments:

      (1) The caption for Figure 3 doesn't explain the color scheme, so it's not obvious what the start and end states of the ligand are. 

      We thank the reviewer for pointing this out. We have added the color scheme to the figure caption.

      (2) For the metadynamics simulations were multiple Gaussian heights/widths tried to see what, if any, impact that has on the unbinding pathway? That would be useful to help ensure all the relevant pathways were explored.  

      We thank the reviewer for the suggestion. We agree that the Gaussian height and width may affect the unbinding pathway. However, we would like to point out that we used the well-tempered version of metadynamics, in which the effective Gaussian height decreases as bias deposition progresses. We therefore expect the Gaussian height and width to have minimal impact on the unbinding pathway. To address the reviewer's suggestion, we conducted additional well-tempered metadynamics simulations varying key parameters, namely the bias height, bias factor, and deposition rate, all of which can influence the sampled space. The values originally used in the paper for bias height, bias factor, and deposition rate are 0.4 kcal/mol, 15, and 1/5 ps<sup>-1</sup>, respectively. We explored different values for these parameters and projected the sampled space on top of the previously sampled region (Figure S4). The new simulations sample a similar unbinding pathway in the extracellular direction and explore a similar space in the binding pocket.
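
      The decay of the effective Gaussian height described above can be sketched with the standard well-tempered rescaling, in which the deposited height shrinks exponentially with the bias already accumulated at the current collective-variable value. This is an illustration only, not the simulation code used in the study: the temperature of 300 K and the accumulated-bias values are assumptions, while the initial height (0.4 kcal/mol) and bias factor (15) are the values quoted in the response.

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def well_tempered_height(initial_height, accumulated_bias, bias_factor, temperature):
    """Effective Gaussian height in well-tempered metadynamics.

    height = h0 * exp(-V / (kB * dT)), with dT = (bias_factor - 1) * T.
    Heights and bias are in kcal/mol; bias_factor must be greater than 1.
    """
    delta_t = (bias_factor - 1.0) * temperature
    return initial_height * math.exp(-accumulated_bias / (KB_KCAL_PER_MOL_K * delta_t))


# Assumed temperature (300 K) and hypothetical accumulated-bias values.
for bias in (0.0, 5.0, 10.0, 20.0):
    height = well_tempered_height(0.4, bias, bias_factor=15.0, temperature=300.0)
    print(f"accumulated bias {bias:5.1f} kcal/mol -> height {height:.4f} kcal/mol")
```

      A larger bias factor slows this decay, which is one reason the bias factor is varied alongside the initial height when checking the robustness of the sampled pathway.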

      Results and Discussion (Page 10)

      “We also performed unbinding simulations using well-tempered metadynamics parameters (bias height, bias deposition rate and bias factor) to confirm the existence of alternative pathways (Figure S4). However, the simulations show that ligands follow the similar pathway for all metadynamics runs.”

      (3) It would be nice to acknowledge previous applications of metadynamics+MSMs and (separately) TRAM, such as the Simulation of spontaneous G protein activation... (Sun et al. eLife 2018) and Estimation of binding rates and affinities... (Ge and Voelz JCP 2022). 

      We appreciate the reviewer's feedback. We have incorporated additional citations of studies demonstrating the use of TRAM as an estimator for both kinetics and thermodynamics (e.g. Ligand binding: Ge, Y. and Voelz, V.A., JCP, 2022[1]; Peptide-protein binding kinetics: Paul, F. et al., Nat. Commun., 2017[2], Ge, Y. et al., JCIM, 2021[3]). Additionally, we have included references to studies where biased simulations were initially used to explore the conformational space, and the results were then employed to seed unbiased simulations for building a Markov state model (Metadynamics: Sun, X. et al., eLife, 2018[4]; Umbrella Sampling: Abella, J. R. et al., PNAS, 2020[5]; Replica Exchange: Paul, F. et al., Nat. Commun., 2017[2]).

      (4) What is KL divergence analysis between macrostates? I know KL divergence compares probability distributions, but it is not clear what distributions are being compared. 

      We apologize for the confusion. The K-L divergence analysis was performed on the probability distributions of the inverse distances between residue pairs from any two macrostates. Each macrostate was represented by 1000 frames selected in proportion to the TRAM stationary density, and all possible pairwise inverse distances were calculated for each frame. Although K-L divergence is inherently asymmetric, we symmetrized the measurement by averaging the divergences computed in the two directions. The per-residue K-L divergence, shown in the main figures as a color and thickness gradient, was calculated by summing the divergences of all pairs involving that residue. We have included a detailed discussion of K-L divergence in the Methods section and have modified the Results section to add a brief discussion of the K-L divergence methodology.
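
      The procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' in-house code: the histogram binning, the pseudocount used to avoid empty bins, the function names, and the synthetic inverse distances are all assumptions made for the example.

```python
import numpy as np


def symmetric_kl(samples_a, samples_b, bins=30, pseudocount=1e-6):
    """Average of KL(P||Q) and KL(Q||P) for two 1D samples, using shared histogram bins."""
    lo = min(samples_a.min(), samples_b.min())
    hi = max(samples_a.max(), samples_b.max())
    p, _ = np.histogram(samples_a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(samples_b, bins=bins, range=(lo, hi))
    p = (p + pseudocount) / (p + pseudocount).sum()
    q = (q + pseudocount) / (q + pseudocount).sum()
    return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))


def per_residue_kl(inv_dist_a, inv_dist_b, pairs, n_residues):
    """Sum the pairwise divergences over every pair that involves each residue.

    inv_dist_a, inv_dist_b: arrays of shape (n_frames, n_pairs), one per macrostate.
    pairs: list of (i, j) residue indices matching the columns.
    """
    totals = np.zeros(n_residues)
    for column, (i, j) in enumerate(pairs):
        divergence = symmetric_kl(inv_dist_a[:, column], inv_dist_b[:, column])
        totals[i] += divergence
        totals[j] += divergence
    return totals


# Synthetic data: 1000 frames per macrostate, three residues, three pairs.
rng = np.random.default_rng(0)
pairs = [(0, 1), (0, 2), (1, 2)]
state_a = rng.normal(0.20, 0.02, size=(1000, 3))
state_b = rng.normal(0.20, 0.02, size=(1000, 3))
state_b[:, 0] += 0.05  # only the (0, 1) pair differs between the two macrostates
print(per_residue_kl(state_a, state_b, pairs, n_residues=3))
```

      In this toy case residues 0 and 1 receive the largest per-residue values because they share the only pair whose distribution shifts between the two states.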

      Results and Discussion (Page 15)

      “We further performed Kullback-Leibler divergence (K-L divergence) analysis between inverse distance of residue pairs of two macrostates to highlight the protein region that undergoes high conformational change with ligand movement.”

      Methods (Page 33)

      “Kullback–Leibler divergence (K-L divergence) analysis was performed to show the structural differences in protein conformations in different macrostates[4,114]. In this study, this technique was used to calculate the difference in the pairwise inverse distance distributions between macrostates. Each macrostate was represented by 1000 frames that were selected proportional to their TRAM weighted probabilities. Although K-L divergence is an asymmetric measurement, for this study, we used a symmetric version of the K-L divergence by taking the average between two macrostates. Per residue contribution of K-L divergence was calculated by taking the sum of all the pairwise distances corresponding to that residue. This analysis was performed by in-house Python code.”

      (5) I suggest being more careful with the language of universality. It can be "supported" but "showing" or "proving" it is universal would require looking at all possible chemicals in the class.

      We thank the reviewer for the suggestion. In response, we have revised the manuscript to ensure that the language reflects that our findings are based on observations from a limited set of ligands, namely one NPS and one classical cannabinoid. We have replaced references to ligand groups (such as NPS or classical cannabinoid) with the specific ligand names (such as MDMB-FUBINACA or HU-210) to avoid claims of universality and prevent any potential confusion.

      Results and Discussion (Page 19)

      “In this work, we trained the network with the NPS (MDMB-FUBINACA), and classical cannabinoid (HU-210) bound unbiased trajectories (Method Section). Here, we compared the allosteric interaction weights between the binding pocket and the NPxxY motif, which is involved in triad interaction formation. Results show that each binding pocket residue in the MDMB-FUBINACA bound ensemble shows higher allosteric weights with the NPxxY motif, indicating larger dynamic interactions between the NPxxY motif and binding pocket residues (Figure S9). The probability of triad formation was estimated to observe the effect of the difference in allosteric control. TRAM weighted probability calculation showed that MDMB-FUBINACA bound CB1 has a higher probability of triad formation (Figure 8A). Comparison of the pairwise interaction of the triad residues shows that interaction between Y397<sup>7.53</sup>-T210<sup>3.46</sup> is relatively more stable in case of MDMB-FUBINACA bound CB1, while other two interactions have similar behavior for both systems (Figures S10A, S10B, and S10C). Therefore, higher interaction between Y397<sup>7.53</sup> and T210<sup>3.46</sup> in MDMB-FUBINACA bound receptor causes the triad interaction to be more probable.

      Furthermore, we also compared TM6 movement for both ligand-bound ensembles, which is another activation metric involved in both G-protein and β-arrestin binding. Comparison of TM6 distance from the DRY motif of TM3 shows similar distribution for HU-210 and MDMB-FUBINACA (Figure 8B). These observations support that NPS binding causes higher β-arrestin signaling by allosterically controlling triad interaction formation.”

      Reviewer #2 (Public Review): 

      Summary: 

      The investigation provides computational as well as biochemical insights into the (un)binding mechanisms of a pair of psychoactive substances into cannabinoid receptors. A combination of molecular dynamics simulation and a set of state-of-the-art statistical post-processing techniques were employed to exploit GPCR-ligand dynamics.

      Strengths: 

      The strength of the manuscript lies in the usage and comparison of TRAM as well as Markov state modelling (MSM) for investigating ligand binding kinetics and thermodynamics. Usually, MSMs have been more commonly used for this purpose. But as the authors have pointed out, implicit in the usage of MSMs lies the assumption of detailed balance, which would not hold true for many cases especially those with skewed binding affinities. In this regard, the authors' usage of TRAM which harnesses both biased and unbiased simulations for extracting the same, provides a more appropriate way out.

      Weaknesses: 

      (1) While the authors have used TRAM (by citing MSM to be inadequate in these cases), the thermodynamic comparisons of both techniques provide similar values. In this case, one would wonder what advantage TRAM would hold in this particular case. 

      We thank the reviewer for the comment. While we agree that the thermodynamic comparisons between MSM and TRAM provide similar values in this instance, we would like to emphasize the underlying reasoning behind our choice of TRAM.

      MSM can struggle to accurately estimate thermodynamic and kinetic properties in cases where local state reversibility (detailed balance) is not easily achieved with unbiased sampling. This is especially relevant in ligand unbinding processes, which often involve overcoming high free energy barriers. TRAM, by incorporating biased simulation data (such as umbrella sampling) in addition to unbiased data, can better achieve local reversibility and provide more robust estimates when unbiased sampling is insufficient.

      The similarity in thermodynamic estimates between MSM and TRAM in our study can be attributed to the relatively long unbiased sampling period (> 100 µs) employed. With sufficient sampling, MSM can approach detailed balance, leading to results comparable to those from TRAM. However, as we demonstrated in our manuscript (Figure 4D), when the amount of unbiased sampling is reduced, the uncertainties in both the thermodynamics and kinetics estimates increase significantly for MSM compared to TRAM. Thus, while MSM and TRAM perform similarly under the conditions of extensive sampling, TRAM's advantage lies in its robustness when unbiased sampling is limited or difficult to achieve. 
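
      The dependence of MSM thermodynamics on sampled transitions can be made concrete with a small sketch. It builds a reversible two-state model from a discrete trajectory by symmetrizing the transition counts, which is a deliberately simple way to enforce detailed balance and not the maximum-likelihood estimator used by MSM packages. It does not implement TRAM and is not the authors' pipeline; the synthetic trajectory and function names are assumptions.

```python
import numpy as np


def count_matrix(dtraj, n_states, lag):
    """Count transitions i -> j observed at the given lag time (in frames)."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1
    return counts


def reversible_msm(counts, kT=1.0):
    """Naive reversible MSM: symmetrize counts, then row-normalize.

    Returns the transition matrix, stationary distribution, and free
    energies in units of kT (shifted so the minimum is zero).
    Every state must appear in at least one counted transition.
    """
    sym = counts + counts.T
    row_sums = sym.sum(axis=1)
    transition = sym / row_sums[:, None]
    stationary = row_sums / row_sums.sum()
    free_energy = -kT * np.log(stationary)
    return transition, stationary, free_energy - free_energy.min()


# Synthetic two-state trajectory generated from an assumed transition matrix.
rng = np.random.default_rng(0)
true_T = np.array([[0.95, 0.05], [0.10, 0.90]])
states = [0]
for _ in range(5000):
    states.append(rng.choice(2, p=true_T[states[-1]]))
dtraj = np.array(states)

T, pi, F = reversible_msm(count_matrix(dtraj, n_states=2, lag=1))
print("stationary distribution:", pi)
print("free energies (kT):", F)
```

      Because every estimate here comes from observed transition counts, rarely sampled transitions (such as crossing a high unbinding barrier) make the stationary distribution and free energies noisier, which is the regime in which adding biased data is expected to help.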

      (2) The initiation of unbiased simulations from previously run biased metadynamics simulations would almost surely introduce hysteresis in the analysis. The authors need to address these issues. 

      We thank the reviewer for the comment. We acknowledge that biased simulations could potentially introduce hysteresis or result in the identification of unphysical pathways. However, we believe this issue is mitigated by using well-tempered metadynamics, which gradually deposits a decaying bias. This approach enables the simulation to explore orthogonal directions of collective variable (CV) space, reducing the likelihood of hysteresis effects (Invernizzi, M. and Parrinello, M., JCTC, 2019[6]).

      Furthermore, there is precedent for using metadynamics-derived pathways to initiate unbiased simulations for constructing Markov State Models (MSMs). This methodology has been successfully applied in studying G-protein activation (Sun, X. et al., eLife, 2018[4]).

      Additional support for our observation can be found in two independent binding/unbinding studies of ligands from cannabinoid receptors, which discovered a similar pathway using different CVs (Saleh, et al., Angew. Chem., 2018[7]; Hua, T. et al., Cell, 2020[8]).

      (3) The choice of ligands in the current work seems very forced and none of the results compare directly with any experimental data. An ideal case would have been to use the seminal D.E. Shaw research paper on GPCR/ligand binding as a benchmark and then show how TRAM, using much lesser biased simulation times, would fare against the experimental kinetics or even unbiased simulated kinetics of the previous report 

      We would like to address the reviewer's concerns regarding the choice of ligands, lack of direct experimental comparison, and the use of TRAM, and clarify our rationale point by point:

      Ligand Choice: The ligands selected for this study were chosen for their relevance and well-characterized binding properties. MDMB-FUBINACA is a well-known NPS ligand with documented binding properties, and it is still the only NPS ligand with an experimentally determined CB1-bound structure (Krishna Kumar, K. et al., Cell, 2019[9]). Similarly, the classical cannabinoid used in this study (HU-210) has established binding characteristics and is one of the earliest known synthetic classical cannabinoids. These ligands therefore serve as representative compounds within their respective categories, making them suitable for our comparative analysis.

      Experimental Comparison: We have indeed compared our simulation results to experimental data, focusing in particular on binding free energies. In the Results section, we show that the relative binding free energy estimated from our simulations aligns closely with the experimentally measured values. Additionally, the absolute binding energy estimates are within ~3 kcal/mol of the experimental values.

      TRAM Performance: TRAM-estimated free energies and rates have been benchmarked against experimental data in various studies in addition to ours (Peptide-protein binding: Paul, F. et al., Nat. Commun., 2017[2]; Ligand unbinding: Wu, H. et al., PNAS, 2016[10]). As the primary goal of this study is to compare ligand unbinding mechanisms, we believe benchmarking against other datasets, such as the D.E. Shaw GPCR/ligand binding paper, is not essential for this work.

      (4) The method section of the manuscript seems to suggest all the simulations were started from a docked structure. This casts doubt on the reliability of the kinetics derived from these simulations that were spawned from docked structure, instead of any crystallographic pose. Ideally, the authors should have been more careful in choosing the ligands in this work based on the availability of the crystallographic structures. 

      We thank the reviewer for the comment. We would like to clarify that we indeed used an experimentally derived pose for one of the ligands (MDMB-FUBINACA) as the cryo-EM structure of MDMB-FUBINACA bound to the protein was available (PDB ID: 6N4B) (Krishna Kumar K. et al., Cell, 2019[9]). However, as the cryo-EM structure had missing loops, we modeled these regions using Rosetta. We apologize for this confusion and have modified our method section to make this point clearer. 

      Regarding HU-210, we acknowledge that a crystallographic or cryo-EM structure for this specific ligand was not available. We selected HU-210 because it is the most commonly used example of a classical cannabinoid in the literature, with extensively studied thermodynamic properties. Importantly, our docking results for HU-210 align closely with previously experimentally determined poses for other classical cannabinoids (Figure S11) and replicate key polar interactions, such as those with S383<sup>7.39</sup>, which are characteristic of this class of compounds.

      System Preparation (Page 22)

      “Modeling of this membrane-proximal region was also performed using the Remodel protocol of Rosetta loop modeling. A distance constraint is added during this modeling step between C98<sup>N−term</sup> and C107<sup>N−term</sup> to create the disulfide bond between the residues. [74,76]

      As the cryo-EM structure of MDMB-FUBINACA was known, the ligand coordinates of MDMB-FUBINACA were added to the modeled PDB structure. The “Ligand Reader & Modeler” module of CHARMM-GUI was used for ligand (e.g., MDMB-FUBINACA) parameterization using the CHARMM General Force Field (CGenFF).[77]”

      (5) The last part of using a machine learning-based approach to analyze allosteric interaction seems to be very much forced, as there are numerous distance-based more traditional precedent analyses that do a fair job of identifying an allosteric job. 

      We thank the reviewer for the valuable comment. The neural relational inference (NRI) method, which leverages a variational autoencoder (VAE) architecture, attempts to reconstruct the conformation (X) at time t + τ based on the conformation at time t. In doing so, it captures the non-linear dynamic correlations between residues in the VAE latent space. We chose this method because it does not rely on specific metrics such as distance or angle, making it potentially more robust in predicting allosteric effects between the binding pocket residues and the NPxxY motif.

      In response to the reviewer's suggestion, we have also performed a more traditional allosteric analysis by calculating the mutual information between the binding pocket residues and the NPxxY motif. Mutual information was computed based on the backbone dihedral angles, as this provides a metric that is independent of the relative distances between residues. Our results indicate that the mutual information between the binding pocket residues and the NPxxY motif is indeed higher for the NPS binding simulation (Figure S11).

      Method

      Mutual information calculation

      Mutual information was calculated on the same trajectory data as the NRI analysis. The Python package MDEntropy was used to estimate the mutual information between the backbone dihedral angles of two residues.

      Results and Discussion (Page 21)

      “To further validate our observations, we estimated allosteric weights between the binding pocket and the NPxxY motif by calculating mutual information between residue movements. Mutual information analysis reaffirms that allosteric weights between these residues are indeed higher for the MDMB-FUBINACA bound ensemble (Figure S11).”

      Mutual Information Estimation (Page 37)

      “Mutual information between dynamics of residue pairs was computed based on the backbone dihedral angles, as this provides a metric that is independent of the relative distances between residues. The calculations were done on the same trajectory data as the NRI analysis. The Python package MDEntropy was used for estimating mutual information between backbone dihedral angles of two residues.[124]”
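
      The quantity being estimated can be illustrated with a plug-in histogram estimator of the mutual information between two dihedral-angle time series. This sketch uses NumPy only and does not reproduce the MDEntropy API or its estimator; the bin count, the function name, and the synthetic angles are assumptions. Angles are expected in radians within [-π, π].

```python
import numpy as np


def dihedral_mutual_information(angles_a, angles_b, bins=24):
    """Plug-in mutual information (in nats) between two dihedral time series."""
    edges = np.linspace(-np.pi, np.pi, bins + 1)
    joint, _, _ = np.histogram2d(angles_a, angles_b, bins=[edges, edges])
    joint = joint / joint.sum()
    marginal_a = joint.sum(axis=1, keepdims=True)  # shape (bins, 1)
    marginal_b = joint.sum(axis=0, keepdims=True)  # shape (1, bins)
    product = marginal_a @ marginal_b              # independent reference distribution
    occupied = joint > 0
    return float(np.sum(joint[occupied] * np.log(joint[occupied] / product[occupied])))


# Synthetic dihedral series: one coupled pair and one independent pair.
rng = np.random.default_rng(0)
phi = rng.uniform(-np.pi, np.pi, 5000)
coupled = np.clip(phi + rng.normal(0.0, 0.2, 5000), -np.pi, np.pi)
independent = rng.uniform(-np.pi, np.pi, 5000)

print("coupled:", dihedral_mutual_information(phi, coupled))
print("independent:", dihedral_mutual_information(phi, independent))
```

      The plug-in estimator is biased upward for finite samples, so the independent pair gives a small positive value rather than exactly zero; comparisons between ensembles are most meaningful when the number of frames and bins is the same for both.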

      (6) While getting busy with the methodological details of TRAM vs MSM, the manuscript fails to share with sufficient clarity what the distinctive features of two ligand binding mechanisms are. 

      We thank the reviewer for the insightful comment. In the manuscript, we discussed that the overall ligand (un)binding pathways are indeed similar for both ligands. Therefore, they interact with similar residues during the unbinding process. However, we have focused on two key differences in unbinding mechanism between the two ligands:

      (1) MDMB-FUBINACA exhibits two distinct unbinding mechanisms. In one, the linked portion of the ligand exits the receptor first. In the other, the ligand rotates within the pocket, allowing the tail portion to exit first. By contrast, for HU-210, we observe only a single unbinding mechanism, in which the benzopyran ring leads the ligand out of the receptor. We have highlighted these differences in Figures 6 and 7 and discussed the intermediate states that appear along these different unbinding mechanisms. To further clarify these differences, we have added arrows in the free energy landscapes to highlight these distinct pathways.

      (2) In the bound state, a significant difference is observed in the interaction profiles. HU-210, a classical cannabinoid, forms strong polar interactions with TM7, while MDMB-FUBINACA shows weaker polar interactions with this region.

      We have discussed these differences in the Results and Discussion section (Page 13-18) & conclusion section (Page 23-24).

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors): 

      (1) The authors should choose at least one case where the ligand's crystallographic pose is known and show how TRAM works in comparison to MSM or experimental report. 

      We thank the reviewer for the comment. We have used the experimentally determined cryo-EM pose for one of the ligands (i.e. MDMB-FUBINACA). We have modified the manuscript to avoid confusion. (Please refer to the response to comment 4 of reviewer 2.)

      (2) The authors should consider existing traditional methods that are used to detect allostery and compare their machine-learning-based approach to show its relevance. 

      We appreciate the reviewer’s comment. We have performed the traditional analysis by calculating mutual information between residue dynamics, and we have shown that it agrees with the machine-learning-based NRI calculation. (Please refer to the response to comment 5 of reviewer 2.)

      (3) Figure 3 doesn't provide a guide on the pathway of ligand. Without a proper arrow, it is difficult to surmise what is the start and end of the pathway. The figures should be improved. 

      We appreciate the reviewer’s suggestion. In response, we have revised Figure 3 to clearly indicate the ligand’s unbinding pathway by adding directional arrows and labeling the bound pose. Additionally, we have updated the figure caption to better clarify the color scheme used in the illustration. 

      (4) The Figure 5 presentation of free energetics has a very similar shape for the two ligands. More clarity is required on how these two ligands are different. 

      We thank the reviewer for the comment. While the overall shapes of the free energy profiles for the two ligands are indeed similar, this is expected as both ligands dissociate from the same pocket and follow a comparable pathway. However, key differences in their unbinding mechanisms arise due to variations in the ligand motion within the pocket. Specifically, the intermediate metastable minima in the free energy landscapes reflect these differences. For instance, in the NPS unbinding free energy landscape, the intermediate metastable state I1 corresponds to a conformation where the NPS ligand maintains a polar interaction with TM7, while the tail of the ligand has shifted away from TM5. This intermediate state is absent in the classical cannabinoid unbinding pathway, where no equivalent conformation appears in the landscape.  

      (6) Page 30: TICA is wrongly expressed as 'Time-independent component analysis'. It is not a time-independent process. Rather it is 'Time structured independent component analysis'. 

      We thank the reviewer for pointing this out. TICA should be expressed as Time-lagged independent component analysis or Time-structure independent component analysis. We have used the first expression and modified the manuscript accordingly.  

      (7) The manuscript's MSM theory part is quite well-known which can be removed and appropriate papers can be cited. 

      We thank the reviewer for the comment. We have removed the theory discussion of MSM and cited relevant papers.

      “Markov State Model

      Markov state model (MSM) is used to estimate the thermodynamics and kinetics from the unbiased simulation.[56,91] MSM characterizes a dynamic process using the transition probability matrix and estimates its relevant thermodynamics and kinetic properties from the eigendecomposition of this matrix. This matrix is usually calculated using either maximum likelihood or Bayesian approach.[56,97] The prevalence of MSM as a post-processing technique for MD simulations was due to its reliance on only local equilibration of MD trajectories to predict the global equilibrium properties.[92,93] Hence, MSM can combine information from distinct short trajectories, which can only attain the local equilibrium.[94–96]  

      The following steps are taken for the practical implementation of the MSM from the MD data. [4,17,98–100]”
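
      The core estimation step summarised in this passage can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes the trajectories have already been discretised into integer state labels, it uses the plain row-normalised (non-reversible maximum likelihood) estimator, and the function names are hypothetical.

```python
import numpy as np

def estimate_msm(dtrajs, n_states, lag=1):
    """Row-normalised transition matrix from several discrete trajectories."""
    counts = np.zeros((n_states, n_states))
    for traj in dtrajs:
        traj = np.asarray(traj, dtype=int)
        # Count transitions i -> j separated by `lag` frames; short,
        # independent trajectories simply add to the same count matrix.
        np.add.at(counts, (traj[:-lag], traj[lag:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every state needs at least one outgoing transition")
    return counts / row_sums

def stationary_distribution(T):
    """Equilibrium populations: left eigenvector of T with eigenvalue 1."""
    eigvals, eigvecs = np.linalg.eig(T.T)
    lead = np.argmax(eigvals.real)
    pi = np.abs(eigvecs[:, lead].real)
    return pi / pi.sum()

def implied_timescales(T, lag=1):
    """Relaxation timescales (in frames) from the non-trivial eigenvalues."""
    eigvals = np.sort(np.linalg.eigvals(T).real)[::-1]
    return -lag / np.log(np.abs(eigvals[1:]))
```

      The stationary distribution gives the thermodynamics (free energies follow from the negative logarithm of the populations), while the remaining eigenvalues give the kinetics. Production analyses typically use reversible or Bayesian estimators from a dedicated MSM package instead of this bare count normalisation.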

      (8) A proper VAMP score-based analysis should be provided to show confidence in MSM's clustering metric and other hyperparameters. 

      We thank the reviewer for the recommendation. VAMP-2 score-based analysis is discussed in the Methods section. We estimated the VAMP-2 score of MSMs built with different numbers of clusters and input TIC dimensions (Figure S15). The model with the best VAMP-2 score was selected for comparison with the TRAM result.
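
      For readers unfamiliar with the score, the sketch below shows one common way a VAMP-2 value can be computed from time-lagged feature pairs. It is an assumption-laden illustration, not the procedure behind Figure S15: the function names are hypothetical, the features are mean-centred, and the trivial constant mode is left out (conventions differ on whether it is counted).

```python
import numpy as np

def _inverse_sqrt(cov, eps=1e-10):
    """Inverse matrix square root of a symmetric covariance matrix."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    keep = eigvals > eps  # drop numerically singular directions
    return eigvecs[:, keep] @ np.diag(eigvals[keep] ** -0.5) @ eigvecs[:, keep].T

def vamp2_score(x0, xt):
    """Sum of squared singular values of the whitened time-lagged covariance.

    x0 and xt are (n_frames, n_features) arrays holding the features at
    time t and at time t + lag, respectively."""
    x0 = x0 - x0.mean(axis=0)
    xt = xt - xt.mean(axis=0)
    n = len(x0)
    c00 = x0.T @ x0 / n   # instantaneous covariance at time t
    ctt = xt.T @ xt / n   # instantaneous covariance at time t + lag
    c0t = x0.T @ xt / n   # time-lagged cross-covariance
    koopman = _inverse_sqrt(c00) @ c0t @ _inverse_sqrt(ctt)
    return float(np.sum(np.linalg.svd(koopman, compute_uv=False) ** 2))
```

      For a single featurised trajectory `features`, the pairs would be `vamp2_score(features[:-lag], features[lag:])`. Higher scores indicate that the features capture more of the slow dynamics; when comparing hyperparameters the score is usually evaluated on held-out data to avoid rewarding overfitting.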

    1. Author response:

      We thank the reviewers for the valuable and constructive reviews. Thanks to these, we believe the article will be considerably improved. We have organized our response to address points that are relevant to both reviewers first, after which we address the unique concerns of each individual reviewer separately. We briefly paraphrase each concern and provide comments for clarification, outlining the precise changes that we will make to the text.

      Common Concerns (Reviewer 1 & Reviewer 2):

      Can you clarify how NREM and REM sleep relate to the oneirogen hypothesis?

      Within the submission draft we tried to stay agnostic as to whether mechanistically similar replay events occur during NREM or REM sleep; however, upon a more thorough literature review, we think that there is moderately greater evidence in favor of Wake-Sleep-type replay occurring during REM sleep which is related to classical psychedelic drug mechanisms of action.

      First, we should clarify that replay has been observed during both REM and NREM sleep, and dreams have been documented during both sleep stages, though the characteristics of dreams differ across stages, with NREM dreams being more closely tied to recent episodic experience and REM dreams being more bizarre/hallucinatory (see Stickgold et al., 2001 for a review). Replay during sleep has been studied most thoroughly during NREM sharp-wave ripple events, in which significant cortical-hippocampal coupling has been observed (Ji & Wilson, 2007). However, it is critical to note that the quantification methods used to identify replay events in the hippocampal literature usually focus on identifying what we term ‘episodic replay,’ which involves a near-identical recapitulation of neural trajectories that were recently experienced during waking experimental recordings (Tingley & Peyrache, 2020). In contrast, our model focuses on ‘generative replay,’ where one expects only a statistically similar reproduction of neural activity, without any particular bias towards recent or experimentally controlled experience. This latter form of replay may look closer to the ‘reactivation’ observed in cortex by many studies (e.g. Nguyen et al., 2024), where correlation structures of neural activity similar to those observed during stimulus-driven experience are recapitulated. Under experimental conditions in which an animal is experiencing highly stereotyped activity repeatedly, over extended periods of time, these two forms of replay may be difficult to dissociate.

      Interestingly, though NREM replay has been shown to couple hippocampal and cortical activity, a similar study in waking animals administered psychedelics found hippocampal replay without any obvious coupling to cortical activity (Domenico et al., 2021). This could be because the coupling was not strong enough to produce full trajectories in the cortex (psychedelic administration did not increase ‘alpha’ enough), in which case a causal manipulation of apical/basal influence in the cortex may be necessary to observe the increased coupling. Alternatively, as Reviewer 1 noted, it may be that psychedelics induce a form of hippocampus-decoupled replay, as one would expect from the REM stage of a recently proposed complementary learning systems model (Singh et al., 2022).

      Evidence in favor of a similarity between the mechanism of action of classical psychedelics and the mechanism of action of memory consolidation/learning during REM sleep is actually quite strong. In particular, studies have shown that REM sleep increases the activity of soma-targeting parvalbumin (PV) interneurons and decreases the activity of apical dendrite-targeting somatostatin (SOM) interneurons (Niethard et al., 2021), that this shift in balance is controlled by higher-order thalamic nuclei, and that this shift in balance is critical for the synaptic consolidation of both monocular deprivation effects in early visual cortex (Zhou et al., 2020) and auditory fear conditioning in the dorsal prefrontal cortex (Aime et al., 2022). These last studies were not discussed in the present manuscript; we will add them, in addition to a more nuanced description of the evidence connecting our model to NREM and REM replay.

      Can you explain how synaptic plasticity induced by psychedelics within your model relates to learning at a behavioral level?

      While the Wake-Sleep algorithm is a useful model for unsupervised statistical learning, it is not a model of reward or fear-based conditioning, which likely occur via different mechanisms in the brain (e.g. dopamine-dependent reinforcement learning or serotonin-dependent emotional learning). The Wake-Sleep algorithm is a ‘normative plasticity algorithm,’ that connects synaptic plasticity to the formation of structured neural representations, but it is not the case that all synaptic plasticity induced by psychedelic administration within our model should induce beneficial learning effects. According to the Wake-Sleep algorithm, plasticity at apical synapses is enhanced during the Wake phase, and plasticity at basal synapses is enhanced during the Sleep phase; under the oneirogen hypothesis, hallucinatory conditions (increased ‘alpha’) cause an increase in plasticity at both apical and basal sites. Because neural activity is in a fundamentally aberrant state when ‘alpha’ is increased, there are no theoretical guarantees that plasticity will improve performance on any objective: psychedelic-induced plasticity within our model could perhaps better be thought of as ‘noise’ that may have a positive or negative effect depending on the context.

      In particular, such ‘noise’ may be beneficial for individuals or networks whose synapses have become locked in a suboptimal local minimum. The addition of large amounts of random plasticity could allow a system to extricate itself from such local minima over subsequent learning (or with careful selection of stimuli during psychedelic experience), similar to simulated annealing optimization approaches. If our model were fully validated, this view of psychedelic-induced plasticity as ‘noise’ could have relevance for efforts to alleviate the adverse effects of PTSD, early life trauma, or sensory deprivation; it may also provide a cautionary note against repeated use of psychedelic drugs within a short time frame, as the plasticity changes induced by psychedelic administration under our model are not guaranteed to be good or useful in and of themselves without subsequent re-learning and compensation.

      We should also note that we have deliberately avoided connecting the oneirogen hypothesis model to fear extinction experimental results that have been observed through recordings of the hippocampus or the amygdala (Bombardi & Giovanni, 2013; Jiang et al., 2009; Kelly et al., 2024; Tiwari et al., 2024). Both regions receive extensive innervation directly from serotonergic synapses originating in the dorsal raphe nucleus, which have been shown to play an important role in emotional learning (Lesch & Waider, 2012); because classical psychedelics may play a more direct role in modulating this serotonergic innervation, it is possible that fear conditioning results (in addition to the anxiolytic effects of psychedelics) cannot be attributed to a shift in balance between apical and basal synapses induced by psychedelic administration. We will provide a more detailed review of these results in the text, as well as more clarity regarding their relation to our model.

      Reviewer 1 Concerns:

      Is it reasonable to assign a scalar parameter ‘alpha’ to the effects of classical psychedelics? And is your proposed mechanism of action unique to classical psychedelics? E.g. Could this idea also apply to kappa opioid agonists, ketamine, or the neural mechanisms of hallucination disorders?

      We will clarify that within our model ‘alpha’ is a parameter that reflects the balance between apical and basal synapses in determining the activity of neurons in the network. For the sake of simplicity we used a single ‘alpha’ parameter, but realistically, each neuron would have its own ‘alpha’ parameter, and different layers or individual neurons could be affected differentially by the administration of any particular drug; therefore, our scalar ‘alpha’ value can be thought of as a mean parameter for all neurons, disregarding heterogeneity across individual neurons.
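
      The role of ‘alpha’ described here can be made concrete with a small sketch. It assumes, purely for illustration, that ‘alpha’ acts as a convex weight between basal and apical drive followed by a tanh nonlinearity; the function name and this exact functional form are hypothetical and may differ from the model in the manuscript.

```python
import numpy as np

def mixed_activity(basal_drive, apical_drive, alpha):
    """Unit activity as an alpha-weighted mix of basal and apical input.

    `alpha` may be a scalar (one value shared by all neurons) or an array
    with one entry per neuron (heterogeneous drug sensitivity)."""
    alpha = np.clip(alpha, 0.0, 1.0)
    # alpha = 0: purely basal (bottom-up) drive; alpha = 1: purely apical.
    return np.tanh((1.0 - alpha) * basal_drive + alpha * apical_drive)

basal = np.array([1.0, -2.0, 0.5])
apical = np.array([0.5, 3.0, -1.0])
per_neuron_alpha = np.array([0.1, 0.4, 0.7])

heterogeneous = mixed_activity(basal, apical, per_neuron_alpha)
# The scalar simplification replaces each neuron's alpha by the population mean.
homogeneous = mixed_activity(basal, apical, per_neuron_alpha.mean())
```

      Under this assumed form, the scalar version is an approximation to the heterogeneous one: the two agree exactly only when every neuron shares the same ‘alpha’.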

      There are many different mechanisms that could theoretically affect this ‘alpha’ parameter, including: 5-HT2a receptor agonism, kappa opioid receptor binding, ketamine administration, or possibly the effects of genetic mutations underlying the pathophysiology of complex developmental hallucination disorders. We focused exclusively on 5-HT2a receptor agonism for this study because the mechanism is comparatively simple and extensively characterized, but similar mechanisms may well be responsible for the hallucinatory symptoms of a variety of drugs and disorders.

      Can you clarify the role of 5-HT2a receptor expression on interneurons within your model?

      While we mostly focused on the effects of 5-HT2a receptors on the apical dendrites of pyramidal neurons, these receptors are also expressed on soma-targeting parvalbumin (PV) interneurons. This expression on PV interneurons is consistent with our proposed psychedelic mechanism of action, because it could lead to a coordinated decrease in the influence of somatic and proximal dendritic inputs while increasing the influence of apical dendritic inputs. We will elaborate on this point, and will move the discussion earlier in the text.

      Discussions of indigenous use of psychedelics over millennia may amount to over-romanticization.

      We will take great care to conduct a more thorough literature review to reevaluate our statement regarding indigenous psychedelic use (including the citations you suggested), and will either provide a more careful statement or remove this discussion from our introduction entirely, as it has little bearing on the rest of the text. The Ethics Statement will also be modified accordingly.

      You isolate the 5-HT2a agonism as the mechanism of action underlying ‘alpha’ in your model, but there exist 5-HT2a agonists that do not have hallucinatory effects (e.g. lisuride). How do you explain this?

      Lisuride has much-reduced hallucinatory effects compared to other psychedelic drugs at clinical doses (though it does indeed induce hallucinations at high doses; Marona-Lewicka et al., 2002), and we should note that serotonin (5-HT) itself is pervasive in the cortex without inducing hallucinatory effects during natural function. Similarly, MDMA is a partial agonist for 5-HT2a receptors, but it has much-reduced perceptual hallucination effects relative to classical psychedelics (Green et al., 2003) in addition to many other effects not induced by classical psychedelics.

      Therefore, while we argue that 5-HT2a agonism induces an increase in influence of apical dendritic compartments and a decrease in influence of basal/somatic compartments, and that this change induces hallucinations, we also note that there are many other factors that control whether or not hallucinations are ultimately produced, so that not all 5-HT2a agonists are hallucinogenic. We will discuss two such factors in our revision: 5-HT receptor binding affinity and cellular membrane permeability.

      Importantly, many 5-HT2a receptor agonists are also 5-HT1a receptor agonists (e.g. serotonin itself and lisuride), while MDMA has also been shown to increase serotonin, norepinephrine, and dopamine release (Green et al., 2003). While 5-HT2a receptor agonism has been shown to reduce sensory stimulus responses (Michaiel et al., 2019), 5-HT1a receptor agonism inhibits spontaneous cortical activity (Azimi et al., 2020); thus one might expect the net effect of administering serotonin or a nonselective 5-HT receptor agonist to be widespread inhibition of a circuit, as has been observed in visual cortex (Azimi et al., 2020). Therefore, selective 5-HT2a agonism is critical for the induction of hallucinations according to our model, though any intervention that jointly excites pyramidal neurons’ apical dendrites and inhibits their basal/somatic compartments across a broad enough area of cortex would be predicted to have a similar effect. Lisuride has a much higher binding affinity for 5-HT1a receptors than, for instance, LSD (Marona-Lewicka et al., 2002).

      Secondly, it has recently been shown that both the head-twitch effect (a coarse behavioral readout of hallucinations in animals) and the plasticity effects of psychedelics are abolished when administering 5-HT2a agonists that are impermeable to the cellular membrane because of high polarity, and that these effects can be rescued by temporarily rendering the cellular membrane permeable (Vargas et al., 2023). This suggests that the critical hallucinatory effects of psychedelics (apical excitation according to our model) may be mediated by intracellular 5-HT2a receptors. Notably, serotonin itself is not membrane permeable in the cortex.

      Therefore, either of these two properties could play a role in whether a given 5-HT2a agonist induces hallucinatory effects. We will provide a considerably extended discussion of these nuances in our revision.

      Your model proposes that an increase in top-down influence on neural activity underlies the hallucinatory effects of psychedelics. How do you explain experimental results that show increases in bottom-up functional connectivity (either from early sensory areas or the thalamus)?

      Firstly, we should note that our proposed increase in top-down influence is a causal, biophysical property, not necessarily a statistical/correlative one. As such, we will stress that the best way to test our model is via direct intervention in cortical microcircuitry, as opposed to correlative approaches taken by most fMRI studies, which have shown mixed results with regard to this particular question. Correlative approaches can be misleading due to dense recurrent coupling in the system, and due to the coarse temporal and spatial resolution provided by noninvasive recording technologies (changes in statistical/functional connectivity do not necessarily correspond to changes in causal/mechanistic connectivity, i.e. correlation does not imply causation).

      There are two experimental results that appear to contradict our hypothesis and that deserve special consideration in our revision. The first shows an increase in directional thalamic influence on the distributed cortical networks after psychedelic administration (Preller et al., 2018). To explain this, we note that this study does not distinguish between lower-order sensory thalamic nuclei (e.g. the lateral and medial geniculate nuclei receiving visual and auditory stimuli respectively) and the higher-order thalamic nuclei that participate in thalamocortical connectivity loops (Whyte et al., 2024). Subsequent more fine-grained studies have noted an increase in influence of higher-order thalamic nuclei on the cortex (Pizzi et al., 2023; Gaddis et al., 2022), and in fact extensive causal intervention research has shown that classical psychedelics (and 5-HT2a agonism) decrease the influence of incoming sensory stimuli on the activity of early sensory cortical areas, indicating decoupling from the sensory thalamus (Evarts et al., 1955; Azimi et al., 2020; Michaiel et al., 2019). The increased influence of higher-order thalamic nuclei is consistent with both the cortico-striatal-thalamo-cortical (CSTC) model of psychedelic action and the oneirogen hypothesis, since higher-order thalamic inputs modulate the apical dendrites of pyramidal neurons in cortex (Whyte et al., 2024).

      The second experimental result notes that DMT induces traveling waves during resting state activity that propagate from early visual cortex to deeper cortical layers (Alamia et al., 2020). There are several possibilities that could explain this phenomenon: 1) it could be due to the aforementioned difficulties associated with directed functional connectivity analyses, 2) it could be due to a possible high binding affinity for DMT in the visual cortex relative to other brain areas, or 3) it could be due to increases in apical influence on activity caused by local recurrent connectivity within the visual cortex which, in the absence of sensory input, could lead to propagation of neural activity from the visual cortex to the rest of the brain. This last possibility is closest to the model proposed by Ermentrout & Cowan (1979), which we believe would be best explained within our framework by a topographically connected recurrent network architecture trained on video data; this is a potentially fruitful direction for future research.

      Shouldn’t the hallucinations generated by your model look more ‘psychedelic,’ like those produced by the DeepDream algorithm?

      We believe that the differences in hallucination visualization quality between our algorithm and DeepDream are mostly due to differences in the scale and power of the models used across these two studies. We are confident that with more resources (and potentially theoretical innovations to improve the Wake-Sleep algorithm’s performance) the produced hallucination visualizations could become more realistic, but we believe this falls outside the scope of the present study.

      We note that more powerful generative models trained with backpropagation are able to produce surreal images of comparable quality (Rezende et al., 2014; Goodfellow et al., 2020; Vahdat & Kautz, 2020), though these have not yet been used as a model of psychedelic hallucinations. However, the DeepDream model operates on top of large pretrained image processing models, and does not provide a biologically mechanistic/testable interpretation of its hallucination effects. When training smaller models with a local synaptic plasticity rule (as opposed to backpropagation), the hallucination effects are less visually striking due to the reduced quality of our trained generative model, though they are still strongly tied to the statistics of sensory inputs, as quantified by our correlation similarity metric (Fig. 5b). We will provide a more detailed explanation of this phenomenon when we discuss our model limitations in our revised manuscript.

      Your model assumes domination by entirely bottom-up activity during the ‘wake’ phase, and domination entirely by top-down activity during ‘sleep,’ despite experimental evidence indicating that a mixture of top-down and bottom-up inputs influence neural activity during both stages in the brain. How do you explain this?

      Our use of the Wake-Sleep algorithm, in which top-down inputs (Sleep) or bottom-up inputs (Wake) dominate network activity, is an over-simplification made within our model for computational and theoretical reasons. Models that receive a mixture of top-down and bottom-up inputs during ‘Wake’ activity do exist (in particular the closely related Boltzmann machine (Ackley et al., 1985)), but these models are considerably more computationally costly to train due to a need to run extensive recurrent network relaxation dynamics for each input stimulus. Further, these models do not generalize as cleanly to processing temporal inputs. For this reason, we focused on the Wake-Sleep algorithm, at the cost of some biological realism, though we note that our model should certainly be extended to support mixed apical-basal waking regimes. We will make sure to discuss this in our ‘Model Limitations’ section.

      Your model proposes that 5-HT2a agonism enhances glutamatergic transmission, but this is not true in the hippocampus, which shows decreases in glutamate after psychedelic administration.

      We should note that our model suggests only compartment-specific increases in glutamatergic transmission; as such, our model does not predict any particular directionality for measures of glutamatergic transmission that include signaling at both apical and basal compartments in aggregate, as was measured in the provided study (Mason et al., 2020).

      You claim that your model is consistent with the Entropic Brain theory, but you report increases in variance, not entropy. In fact, it has been shown that variance decreases while entropy increases under psychedelic administration. How do you explain this discrepancy?

      Unfortunately, ‘entropy’ and ‘variance’ are heavily overloaded terms in the noninvasive imaging literature, and the particularities of the method employed can exert a strong influence on the reported effects. The reduction in variance reported by (Carhart-Harris et al., 2016) is a very particular measure: they are reporting the variance of resting state synchronous activity, averaged across a functional subnetwork that spans many voxels; as such, the reduction in variance in this case is a reduction in broad, synchronous activity. We do not have any resting state synchronous activity in our network due to the simplified nature of our model (particularly an absence of recurrent temporal dynamics), so we see no reduction in variance in our model due to these effects.
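
      The distinction drawn here between variance of a shared, synchronous signal and variability across units can be illustrated with a toy simulation. Everything in this sketch is hypothetical (the gains, the sinusoidal shared signal, the function names) and it is not a reanalysis of any cited dataset; it only shows that the two statistics can move in opposite directions.

```python
import numpy as np

def mean_signal_variance(activity):
    """Variance over time of the signal averaged across units (columns)."""
    return float(activity.mean(axis=1).var())

def residual_variability(activity):
    """Variance remaining after subtracting the across-unit mean at each time."""
    residual = activity - activity.mean(axis=1, keepdims=True)
    return float(residual.var())

def toy_network(shared_gain, private_gain, n_time=2000, n_units=50, seed=0):
    """Units driven by one shared oscillation plus independent private noise."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 20.0 * np.pi, n_time)
    shared = np.sin(t)[:, None]                      # synchronous component
    private = rng.normal(size=(n_time, n_units))     # unit-specific component
    return shared_gain * shared + private_gain * private

baseline = toy_network(shared_gain=1.0, private_gain=0.5)
altered = toy_network(shared_gain=0.3, private_gain=1.0)
```

      In this toy setting, weakening the shared oscillation lowers `mean_signal_variance` even while `residual_variability` rises, so a reported "decrease in variance" and a reported "increase in variability" need not conflict if they refer to different statistics.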

      Other studies estimate ‘entropy’ or network state disorder via three different methods that we have been able to identify. 1) (Carhart-Harris et al., 2014) uses a different measure of variance: in this case, they subtract out synchronous activity within functional subnetworks, and calculate variability across units in the network. This measure reports increases in variance (Fig. 6), and is the closest measure to the one we employ in this study. 2) (Lebedev et al., 2016) uses sample entropy, which is a measure of temporal sequence predictability. It is specifically designed to disregard highly predictable signals, and so one might imagine that it is a measure that is robust to shared synchronous activity (e.g. resting state oscillations). 3) (Mediano et al., 2024) uses Lempel-Ziv complexity, which is, similar to sample entropy, a measure of sequence diversity; in this case the signal is binarized before calculation, which makes this method considerably different from ours. All three of the preceding methods report increases in sequence diversity, in agreement with our quantification method. Our strongest explanation for why the variance calculation in (Carhart-Harris et al., 2016) produces a variance reduction is therefore a reduction in low-rank synchronous activity in subnetworks during resting state.
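
      To make the notion of a binarized sequence-diversity measure concrete, here is a minimal sketch of a Lempel-Ziv-style phrase count. It is an assumption for illustration only: it uses median thresholding and simple LZ78 dictionary parsing, which may differ from the specific Lempel-Ziv variant and preprocessing used in the cited work, and the function names are hypothetical.

```python
import numpy as np

def binarize(signal):
    """Map a real-valued signal to a '0'/'1' string by thresholding at its median."""
    signal = np.asarray(signal, dtype=float)
    threshold = np.median(signal)
    return "".join("1" if value > threshold else "0" for value in signal)

def lz_phrase_count(symbols):
    """Number of phrases in an LZ78-style parsing of a symbol string.

    The string is scanned left to right and cut each time the current
    phrase has not been seen before; more phrases means more diversity."""
    phrases = set()
    current = ""
    for symbol in symbols:
        current += symbol
        if current not in phrases:
            phrases.add(current)
            current = ""
    # A trailing, already-seen fragment still counts as one final phrase.
    return len(phrases) + (1 if current else 0)
```

      A constant or strongly periodic signal yields few phrases, whereas an irregular signal yields many. Because binarization discards amplitude information, such a measure can behave quite differently from a variance computed on the raw activity, which is the contrast drawn in the paragraph above.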

      As for whether the entropy increase is meaningful: we share Reviewer 1’s concern that increases in entropy could simply be due to a higher degree of cognitive engagement during resting state recordings, due to the presence of sensory hallucinations or due to an inability to fall asleep. This could explain why entropy increases are much more minimal relative to non-hallucinating conditions during audiovisual task performance (Siegel et al., 2024; Mediano et al., 2024). However, we can say that our model is consistent with the Entropic Brain Theory without including any form of ‘cognitive processing’: we observe increases in variability during resting state in our model, but we observe highly similar distributions of activity when averaging over a wide variety of sensory stimulus presentations (Fig. 5b-c). This is because variability in our model is not due to unstructured noise: it corresponds to an exploration of network states that would ordinarily be visited by some stimulus. Therefore, when averaging across a wide variety of stimuli, the distribution of network states under hallucinating or non-hallucinating conditions should be highly similar.

      One final point of clarification: here we are distinguishing Entropic Brain Theory from the REBUS model–the oneirogen hypothesis is consistent with the increase in entropy observed experimentally, but in our model this entropy increase is not due to increased influence of bottom-up inputs (it is due instead to an increase in top-down influence). Therefore, one could view the oneirogen hypothesis as consistent with EBT, but inconsistent with REBUS.

      You relate your plasticity rule to behavioral-timescale plasticity (BTSP) in the hippocampus, but plasticity has been shown to be reduced in the hippocampus after psychedelic administration. Could you elaborate on this connection?

      When we were establishing a connection between our ‘Wake-Sleep’ plasticity rule and BTSP learning, the intended connection was exclusively to the mathematical form of the plasticity rule, in which activity in the apical dendrites of pyramidal neurons functions as an instructive signal for plasticity in basal synapses (and vice versa): we will clarify this in the text. Similarly, we point out that such a plasticity rule tends to result in correlated tuning between apical and basal dendritic compartments, which has been observed in hippocampus and cortex: this is intended as a sanity check of our mapping of the Wake-Sleep algorithm to cortical microcircuitry, and has limited further bearing on the effects of psychedelics specifically.

      Reduction in plasticity in the hippocampus after psychedelic administration could be due to a complementary learning systems-type model, in which the hippocampus becomes partly decoupled from the cortex during REM sleep (Singh et al., 2022); were this to be the case, it would not be incompatible with our model, which is mostly focused on the cortex. Notably, potentiating 5-HT2a receptors in the ventral hippocampus does not induce the head-twitch response, though it does produce anxiolytic effects (Tiwari et al., 2024), indicating that the hallucinatory and anxiolytic effects of classical psychedelics may be partly decoupled.

      Reviewer 2 Concerns:

      Could you provide visualizations of the ‘ripple’ phenomenon that you’re referring to?

      We will do this! For now, you can get a decent understanding of what the ‘ripple effect’ looks like from the ‘eyes closed’ hallucination condition for networks trained on CIFAR10 (Fig. 2d). The ripple effect that we are referring to is very similar, except it is superimposed on a naturalistic image under ordinary viewing conditions; to give a higher quality visualization of the ripple phenomenon itself, we will subtract out the static contribution of the image itself, leaving only the ripple phenomenon.

      Could you provide a more nuanced description of alternative roles for top-down feedback, beyond being used exclusively for learning as depicted in your model?

      For the sake of simplicity, we only treat top-down inputs in our model as a source of an instructive teaching signal, the originator of generative replay events during the Sleep phase, and as the mechanism of hallucination generation. However, as discussed in a response to a previous question, in the cortex pyramidal neurons receive and respond to a mixture of top-down and bottom-up processing.

      There are a variety of theories for what role top-down inputs could play in determining network activity. To name several, top-down input could function as: 1) a denoising/pattern completion signal (Kadkhodaie & Simoncelli, 2021), 2) a feedback control signal (Podlaski & Machens, 2020), 3) an attention signal (Lindsay, 2020), 4) ordinary inputs for dynamic recurrent processing that play no specialized role distinct from bottom-up or lateral inputs except to provide inputs from higher-order association areas or other sensory modalities (Kar et al., 2019; Tugsbayar et al., 2025). Though our model does not include these features, they are perfectly consistent with our approach.

      In particular, denoising/pattern completion signals in the predictive coding framework (closely related to the Wake-Sleep algorithm) also play a role as an instructive learning signal (Salvatori et al., 2021); and top-down control signals can play a similar role in some models (Gilra & Gerstner, 2017; Meulemans et al., 2021). Thus, options 1 and 2 are heavily overlapping with our approach, and are a natural consequence of many biologically plausible learning algorithms that minimize a variational free energy loss (Rao & Ballard, 1997; Ackley et al., 1985). Similarly, top-down attentional signals can exist alongside top-down learning signals, and some models have argued that such signals can be heavily overlapping or mutually interchangeable (Roelfsema & van Ooyen, 2005). Lastly, generic recurrent connectivity (from any source) can be incorporated into the Wake-Sleep algorithm (Dayan & Hinton, 1996), though we avoided doing this in the present study due to an absence of empirical architecture exploration in the literature and the computational complexity associated with training on time series data.

      To conclude, there are a variety of alternative functions proposed for top-down inputs onto pyramidal neurons in the cortex, and we view these additional features as mutually compatible with our approach; for simplicity we did not include them in our model, but we believe that these features are unlikely to interfere with our testable predictions or empirical results.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their positive and constructive comments on the manuscript. In the revised manuscript we addressed these comments, which we believe have improved the quality of our work.

      In summary:

      (1) We acknowledge the reviewer's suggestion to incorporate open-source segmentation and tracking functionalities, increasing its accessibility to a wider user base; however, these additions fall outside the primary scope of our current work, which is to provide an analytical framework for IVM data after segmentation and tracking. Developing open-source segmentation and tracking tools represents a substantial undertaking in its own right, which has been comprehensively explored in other studies (e.g. https://doi.org/10.4049/jimmunol.2100811; https://doi.org/10.7554/eLife.60547; https://doi.org/10.1016/j.media.2022.102358; https://doi.org/10.1038/s41592024-02295-6 - now cited in our revised manuscript). 

      In our analyses, we used data processed with Imaris, a commercial software that, despite its limitations, is widely used by the intravital microscopy community due to its user-friendly platform for 3D image visualization and analysis. Nevertheless, recognizing the need for compatibility with tracking data from various pipelines, we have modified our tool to accept other data formats, such as those generated by open-source Fiji plugins like TrackMate, MTrackJ, ManualTracking (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input). These updates are available in our GitHub repository and are described in the revised manuscript. 

      (2) We appreciate reviewer #3's suggestion to incorporate additional features into our analytical pipeline. In response, we have already updated the GitHub repository to allow users to input and select which features (dynamic, morphological, or spatial) they wish to include in the analysis (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#feature-selection). In the revised manuscript, we highlighted this new functionality and provided examples using alternative datasets to demonstrate the application of these features.

      (3)  We appreciate the constructive feedback of reviewers #1 and #2 regarding the statistical analysis and interpretation of the data presented in Figures 3 and 4. We understand the importance of clarity and rigor in data analysis and presentation, and we addressed the concerns raised in the revised version of the manuscript.

      (4) We appreciate reviewer #1's suggestion regarding the inclusion of demo data, as we believe it would greatly enhance the usability of our pipeline. We acknowledge that this was an oversight on our part. To address this, we have now added demos to our GitHub repository (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler/tree/BEHAV3D_TP-v2.0/demo_datasets). In the revised manuscript, we referenced this addition and present new figures with examples of these demos processing different IVM datasets (2D/3D, different tumors and healthy tissues). Additionally, we have provided processed DMG IVM movie samples in an imaging repository.

      (5) Finally, we made some small changes to the manuscript based on the reviewers’ feedback.

      Below we provide a point-by-point response to the reviewers’ comments

      Reviewer #1 (Public review):

      Comment #1: A key limitation of the pipeline is that it does not overcome the main challenges and bottlenecks associated with processing and extracting quantitative cellular data from timelapse and longitudinal intravital images. This includes correcting breathing-induced movement artifacts, automated registration of longitudinal images taken over days/weeks, and accurate, automated segmentation and tracking of individual cells over time. Indeed, there are currently no standardised computational methods available for IVM data processing and analysis, with most laboratories relying on custom-built solutions or manual methods. This isn't made explicit in the manuscript early on (described below), and the researchers rely on expensive software packages such as IMARIS for image processing and data extraction to feed the required parameters into their pipeline. This limitation unfortunately reduces the likely impact of BEHAV3D-TP on the IVM field. 

      As highlighted above, the tool does not facilitate the extraction of quantitative kinetic cellular parameters (e.g. speed, directionality, persistence, and displacement) from intravital images. Indeed, to use the tool researchers must first extract dynamic cellular parameters from their IVM datasets, requiring access to expensive software (e.g. IMARIS as used here) and/or above-average computational expertise to develop and use custom-made open-source solutions. This limitation is not made explicit or discussed in the text.

      We acknowledge the reviewer's suggestion to incorporate open-source segmentation and tracking functionalities, increasing its accessibility to a wider user base; however, these additions fall outside the primary scope of our current work and represent a substantial undertaking in their own right. Several studies (e.g., Diego Ulisse Pizzagalli et al., J Immunol (2022); Aby Joseph et al., eLife (2020); Molina-Moreno et al., Medical Image Analysis (2022); Hidalgo-Cenalmor et al., Nat Methods (2024); Ershov et al., Nat Methods (2022)) have comprehensively addressed these topics, and we now reference them in the revised manuscript to provide readers with relevant background.

      The objective of our manuscript is not to develop a complete segmentation or tracking pipeline but rather to introduce an analytical framework capable of extracting enhanced insights from the data generated by existing tools. This goal arises from our observations of the field: despite significant investment in image processing, researchers often rely on simplistic approaches, such as averaging single parameters across conditions, which can obscure tumor heterogeneity and spatial behavioral dynamics within the tumor microenvironment.

      Our current tool focuses on providing this much-needed analytical capability. For our analysis we used Imaris, a widely utilized software in the intravital microscopy (IVM) community, known for its intuitive 3D visualization and analysis platform despite certain limitations. 

      In our own literature search of recent IVM studies published by leading laboratories in high-impact journals, we found that close to half used Imaris, while the remainder primarily relied on manual workflows with Fiji plugins. Thus, we consider it valuable to offer a pipeline compatible with such commonly used software, given its prevalence in the field.

      However, following the suggestion of the reviewer, and to enhance the tool’s flexibility and compatibility, we have expanded the pipeline to accept data formats generated by open-source Fiji plugins, such as TrackMate, MTrackJ, and ManualTracking. These updates are detailed in the revised manuscript and are implemented in our GitHub repository (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input ), where we also provide several demos using TrackMate and Imaris processed data. This addition demonstrates our tool's capability to integrate with segmented and tracked datasets from diverse platforms, increasing its applicability to a broader range of researchers using both commercial and open-source pipelines.

      Comment #2: The number of cells (e.g. per behavioural cluster), and the number of independent mice, represented in each result figure, is not included in the figure legends and are difficult to ascertain from the methods.

      We appreciate the reviewer's constructive feedback regarding the clarity of the number and type of replicates used in our analyses. In the revised manuscript, we have included detailed replicate information, including the number of independent mice, in each figure legend to ensure transparency. Regarding the number of cells, we have indicated the total number of processed cells in the Figure 2b legend (953 cells). Additionally, we have now included figures (Sup Fig 4c, Sup Fig 5e-g, Fig 5c,e, Sup Fig 6 c,d) for each cluster, where individual dots represent the individual cell tracks, with color indicating the position and shape indicating individual mice.

      Comment #3: The data used to test the pipeline in this manuscript is currently not available, making it difficult to assess its usability. It would be important to include this for researchers to use as a 'training dataset'.

      As stated above, we acknowledge that this was an oversight on our part and thank the reviewer for pointing this out. To address this, we have now added demo data to our GitHub repository. In the revised manuscript we have referenced this addition in the Data availability section. Since we now also include processing with Fiji, we provide 4 demo datasets (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler/tree/main/demo_datasets): one processed with Imaris in 3D; one processed with CellPose 2.0 and TrackMate in 2D; one processed with µSAM and TrackMate in 3D; and one manually processed with MTrackJ in 2D. Moreover, we now provide Imaris-processed DMG IVM movie samples in an open-source repository.

      Comment #4: Precisely how the BEHAV3D-TP large-scale phenotyping module can map large-scale spatial phenotyping data generated using LSR-3D imaging data and Cytomap to 3D intravital imaging movies is unclear. Further details in the text and methods would be beneficial to aid understanding.

      We appreciate the reviewer’s comment and in the revised manuscript we have now provided details in the methods section “Tumor large-scale spatial phenotyping with Cytomap” to clarify how the BEHAV3D-TP module maps LSR-3D and Cytomap data to 3D intravital imaging movies:

      “To map the assigned regions onto IVM movies, a 3D image of the cluster distribution within the tumor was generated and exported for each sample (Figure Supplement 5a). Next, regions within the IVM movies were visually matched to the corresponding regions identified by the Large-Scale Phenotyping module of Cytomap (Figure 3c). For each mouse, at least one or two representative positions per matched region type were selected, cropped, and analyzed to assess tumor cell behavior, following the previously described cell tracking methodology (Imaris Cell tracking).”

      Moreover, we updated Figure 3 c to further clarify these steps.

      Comment #5: The analysis provides only preliminary evidence in support of the authors' conclusions on DMG cell migratory behaviours and their relationship with components of the tumour microenvironment. Conclusions should therefore be tempered in the absence of additional experiments and controls. 

      We appreciate the reviewer’s comment and acknowledge that our conclusions should be tempered due to the preliminary nature of our evidence. In the revised version of the manuscript we have revised our conclusions accordingly and emphasize the necessity for additional experiments and controls to further validate our findings on DMG cell migratory behaviors and their relationship with the tumor microenvironment.

      In discussion: “While our findings suggest that microenvironmental factors may influence tumor cell migration, further studies will be necessary to establish causal relationships. Additional experimental validation, such as macrophage ablation experiments, could help clarify the specific contributions of these factors.”

      Reviewer #1 (Recommendations for the authors): 

      (1) To test the ability of the pipeline to identify relevant patterns of migratory behaviours additional 'control' experiments would be helpful e.g. comparing non-invasive vs invasive tumour cell lines, artificially controlling migratory behaviours of cells such as implanting beads soaked in factors that would attract/repel cells? 

      (2) Does the pipeline work well for a variety of cell types/contexts? e.g. can it identify and cluster more subtle migratory behaviours such as non-tumour cells during tissue development or regeneration conditions? 

      We appreciate the reviewer’s valuable suggestions. In the revised manuscript, we have included additional examples demonstrating the capability of our pipeline to investigate heterogeneous cell behavior across two additional experimental setups:

      (1) We have now evaluated our BEHAV3D TP heterogeneity module using IVM data from breast cancer cell lines with varying migratory capacities (DOI: 10.1016/j.yexcr.2019.04.009). In these datasets, our pipeline extends beyond predefined characteristics based solely on speed, enabling the identification of distinct cell populations. Notably, our analysis reveals that the breast cancer lines exhibit different proportions of different migratory behaviors such as Fast, Intermediate, Very slow and Static (Supplementary Fig 1).

      (2) We have now evaluated our BEHAV3D TP heterogeneity module using IVM data from healthy breast epithelial cells (DOI: 10.1016/j.celrep.2024.115073), where we identify distinct morphodynamic epithelial cell populations in the terminal end bud of the mammary gland that have a distinct distribution among hormone receptor (HR)+ and HR- terminal end bud cells.

      (3) To support biological conclusions could the authors show that ablating tumour-associated macrophages or vasculature alters the migratory patterns of nearby tumour cells? 

      We appreciate the reviewer's suggestion regarding the potential effects of ablating tumor-associated macrophages or vasculature on the migratory patterns of nearby tumor cells. While these experiments would functionally validate the observations made by our method, we would like to clarify that the primary focus of our study was the development and application of computational tools for behavioral analysis; we therefore consider a deeper investigation of the biology behind our observations to be outside the scope of the current study. However, as mentioned previously, we have carefully tempered our conclusions to acknowledge the limitations of our current study. In the revised manuscript, we explicitly highlight that experiments involving the ablation of tumor-associated macrophages or vasculature would be crucial for further understanding the biological relevance of our findings.

      Minor corrections to text: 

      (4) Line 63 - are references formatted correctly?

      Thank you for pointing out this error. We have corrected it in the revised manuscript.

      (5) Lines 161 -162 - 'intravitally imaged' used twice in a sentence.

      Thank you for pointing out the typo. We have corrected it in the revised manuscript.

      Reviewer #2 (Public review):

      Comment#1: The strength of democratizing this kind of analysis is undercut by the reliance upon Imaris for segmentation, so it would be nice if this was changed to an open-source option for track generation.

      As noted in our previous response to Reviewer #1, we would like to point out that although Imaris is a commercial software, it is widely used in the intravital microscopy community due to its user-friendly interface. We conducted a literature review to evaluate this aspect and below we include references from leading laboratories in the IVM field that utilize Imaris. One of its key advantages, which we also utilized, is semi-automated data tracking that allows for manual corrections in 3D—a process that can be more challenging in other open-source software with less effective data visualization.

      However, we recognize that enhancing our pipeline's compatibility with open-source options is important. To this end, we have updated our tool to support 2D and 3D data formats generated by open-source Fiji plugins like TrackMate, MTrackJ, and ManualTracking, improving compatibility with various segmentation and tracking pipelines (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input). In the revised manuscript, we describe the new functionality and demonstrate the operation of the BEHAV3D-TP heterogeneity module across various IVM datasets, processed in both 2D and 3D with different processing pipelines (Supplementary Fig 1-3). This includes CellPose 2.0 and the novel 'Segment Anything' model, followed by TrackMate tracking, applied to both tumor and healthy IVM data. Moreover, we have developed a new web application that integrates morphological and tracking information from Segment Anything segmentation and TrackMate tracking, depicted in Supplementary Fig 3 a (https://morphotrack-merger.streamlit.app/). Additionally, we have updated the introduction to better clarify the scope of our study and include references to existing image processing solutions.

      Comment#2: The main issue is with the interpretation of the biological data in Figure 3 where ANOVA was used to analyse the proportional distribution of different clusters. Firstly the n is not listed so it is unclear if this represents an n of 3 where each mouse is an individual or whether each track is being treated as a test unit. If the latter this is seriously flawed as these tracks can't be treated as independent. Also, a more appropriate test would be something like a Chi-squared test or Fisher's exact test. Also, no error bars are included on the stacked bar graphs making interpretation impossible. Ultimately this is severely flawed and also appears to show very small differences which may be statistically different but may not represent biologically important findings. This would need further study.

      We appreciate the reviewer’s insightful comments regarding the interpretation of the biological data in Figure 3. 

      To clarify, each imaged position is considered an independent biological replicate (n = 18 from a total of 6 mice). We acknowledge that the description of the statistical methods and the experimental units was not sufficiently clear in the previous version. In our original submission, we used an ANOVA to test whether the proportion of each behavioral cluster differed across the tumor microenvironment regions. Post hoc pairwise comparisons were performed using Tukey’s test, with the results shown in Supplementary Figure 2d (currently Fig 3d). However, we agree with the reviewer that this approach may be misleading when paired with stacked bar plots that lack error bars, as it can obscure individual variability and does not explicitly represent statistical uncertainty.

      In the revised manuscript, we present the data as boxplots with individual data points, where each dot represents an imaged position, and the shape corresponds to a specific mouse. In Figure 3 d the y-axis displays the normalized percentage of each cluster across TME regions, expressed as z-scores. This normalization corrects for inter-mouse variability and facilitates a comparison of the relative distribution of clusters across TME regions, independent of the overall abundance differences between mice. We performed an ANOVA with Tukey's post hoc test for each individual behavioral cluster to assess differences across TME regions. Additionally, for transparency, in Supplementary Figure 5 d we provide the raw percentage values. The legends provide the number of positions and mice included in the analysis. 
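
      The normalization described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes that z-scores are computed within each mouse for a single behavioral cluster, and it computes only the one-way ANOVA F statistic across regions (the Tukey post hoc step is omitted). The function names, the row schema, and the example values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_within_mouse(rows):
    """Add a z-score to each row, computed against all positions of the same mouse.

    rows: list of dicts with keys "mouse", "region", "pct" (hypothetical schema),
    where "pct" is the percentage of one behavioral cluster at one imaged position.
    """
    by_mouse = defaultdict(list)
    for row in rows:
        by_mouse[row["mouse"]].append(row["pct"])

    scored = []
    for row in rows:
        values = by_mouse[row["mouse"]]
        mu, sd = mean(values), pstdev(values)
        # A mouse with identical values at every position carries no relative signal.
        z = 0.0 if sd == 0 else (row["pct"] - mu) / sd
        scored.append({**row, "z": z})
    return scored


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a dict of name -> list of values."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up example: percentage of one cluster at positions from two mice.
rows = [
    {"mouse": "m1", "region": "A", "pct": 10.0},
    {"mouse": "m1", "region": "B", "pct": 20.0},
    {"mouse": "m1", "region": "C", "pct": 30.0},
    {"mouse": "m2", "region": "A", "pct": 40.0},
    {"mouse": "m2", "region": "B", "pct": 50.0},
    {"mouse": "m2", "region": "C", "pct": 75.0},
]
z_by_region = defaultdict(list)
for row in zscore_within_mouse(rows):
    z_by_region[row["region"]].append(row["z"])
print(one_way_anova_f(z_by_region))
```

      In this sketch the second mouse has a higher overall abundance of the cluster, but after within-mouse z-scoring both mice contribute comparable relative values to each region, which is the effect the normalization is meant to achieve.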

      Comment#3:  Figure 4 has similar statistical issues in that the n is not listed and, again, it is unclear whether they are treating each cell track as independent which, again, would be inappropriate. The best practice for this type of data would be the use of super plots as outlined in Lord et al. (2020) JCI - SuperPlots: Communicating reproducibility and variability in cell biology.

      We appreciate the reviewer’s comments and suggestions regarding Figure 4. In this case, because we are comparing the overall features of the behavioral clusters, each individual cell is treated as a unit. In the revised manuscript, we have clarified this point in the figure legend and incorporated plots in Figure 4c and 4e, indicating the mouse and imaging position each data point originates from. This enhances the visualization of reproducibility and variability in our data, demonstrating that the results are consistent across multiple mice and positions and are not driven by a single mouse or imaging position.

      Comment#4: The main issue that this raises is that the large-scale phenotyping module and the heterogeneity module appear designed to produce these statistical analyses that are used in these figures and, if they are based on the assumption that each track is independent, then this will produce inappropriate analyses as a default.

      We appreciate the reviewer’s comment, although we are unclear about the specific concern being raised. To clarify, in our large-scale phenotyping analysis, each position is assigned to a TME niche based on the CytoMAP analysis and the workflow outlined in Figure 3c. Multiple positions are imaged per mouse. For each position, we measure the proportion of tumor cells exhibiting a specific behavioral phenotype, and these proportions are subsequently used for statistical analysis (Figure 3 d). 
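
      The position-level aggregation described above can be sketched as follows. This is an illustrative example under stated assumptions, not the BEHAV3D-TP code: the input is assumed to be one (mouse, position, cluster) tuple per cell track, and the function name and example labels are hypothetical. The point it shows is that tracks are collapsed to one proportion per imaged position before any statistics are run, so the position, not the track, is the test unit.

```python
from collections import Counter, defaultdict


def cluster_proportions(tracks):
    """Collapse per-track cluster labels into per-position cluster fractions.

    tracks: iterable of (mouse, position, cluster) tuples, one per cell track.
    Returns {(mouse, position): {cluster: fraction of tracks in that cluster}}.
    """
    counts = defaultdict(Counter)
    for mouse, position, cluster in tracks:
        counts[(mouse, position)][cluster] += 1

    return {
        key: {cluster: n / sum(counter.values()) for cluster, n in counter.items()}
        for key, counter in counts.items()
    }


# Made-up example: five tracks from two imaged positions.
tracks = [
    ("m1", "p1", "Static"),
    ("m1", "p1", "Static"),
    ("m1", "p1", "Invading"),
    ("m1", "p1", "Invading"),
    ("m2", "p1", "Static"),
]
for key, fractions in cluster_proportions(tracks).items():
    print(key, fractions)
```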

      In contrast, in Supplementary Fig. 5e-g, we treat each cell track as an individual unit, grouping them by their assigned large-scale region. Here, we assess whether differences between regions can be detected using a conventional single-feature analysis—a more traditional approach. However, we find that this method loses important behavioral patterns and distinctions that BEHAV3D-TP captures.

      We hope that this explanation, along with the modifications made to the figures and figure legends, provides greater clarity.  

      Reviewer #3 (Public review):

      Comment #1: The most challenging task of analyzing 3D time-lapse imaging data is to accurately segment and track the individual cells in 3D over a long time duration. BEHAV3D Tumor Profiler did not provide any new advancement in this regard, and instead relies on commercial software, Imaris, for this critical step. Imaris is known to have a very high error rate when used for analyzing 3D time-lapse data. In the Methods section, the authors themselves stated that "Tumor cell tracks were manually corrected to ensure accurate tracking". Based on our own experience of using Imaris, such manual correction is tedious and often required for every time step of the movie. Therefore, Imaris is not a satisfactory tool for analyzing 3D time-lapse data. Moreover, Imaris is expensive and many research labs probably can't afford to buy it. The fact that BEHAV3D Tumor Profiler critically depends on the faulty ImarisTrack module makes it unclear whether the BEHAV3D tool or the results are reliable.

      If the authors want to "democratize the analysis of heterogeneous cancer cell behaviors", they should perform image segmentation and tracking using open-source codes (e.g., Cellpose, StarDist & 3DCellTracker) and not rely on the expensive and inaccurate ImarisTrack Module for the image analysis step of BEHAV3D.

      We appreciate the reviewer’s comments on the challenges of segmenting and tracking individual cells in 3D time-lapse imaging data. As mentioned previously (please refer to comment #1 to reviewer #1), our primary focus is to develop an analytical tool for comprehensive data analysis rather than developing tools for image processing. However, to enhance accessibility, we have updated our tool to support data formats from open-source Fiji plugins, such as TrackMate, which will benefit users without access to commercial software (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input). In Supplementary Figures 1, 2, and 3, we present IVM data from different sources, processed using three distinct methods: MTrackJ (Supplementary Fig. 1), Cellpose + TrackMate (Supplementary Fig. 2), and µSAM + TrackMate (Supplementary Fig. 3). The latter two represent state-of-the-art deep learning approaches.

      On the other hand, while we recognize the limitations of Imaris, it remains widely used in the intravital microscopy community due to its user-friendly interface for 3D visualization and semi-automated segmentation capabilities. Since no perfect tracking method currently exists, we initially utilized Imaris for its ability to allow manual correction of faulty tracks, ensuring the reliability of our results. This approach was not only widely used (see above) but was also the best available option when we began our analysis, allowing us to obtain accurate results efficiently.

      In the revised manuscript, we clarify the scope of our study and provide information on both Imaris and alternative processing options to strengthen the reliability of our findings:

      In introduction: “While significant efforts have been made to develop open-source segmentation and tracking tools for live imaging data, including IVM[22–27], fewer tools exist for the unbiased analysis of tumor dynamics. One major barrier is that implementing such analytical methods often requires substantial computational expertise, limiting accessibility for many biomedical researchers conducting IVM experiments. To bridge this gap, we present BEHAV3D Tumor Profiler (BEHAV3D-TP), a robust, user-friendly tool that allows researchers to extract meaningful insights from dynamic cellular behaviors without requiring advanced programming skills.”

      In the Methods, we now describe not only the Imaris processing pipeline, but also the µSAM segmentation pipelines, and we reference CellPose IVM processing; both are combined with TrackMate for tracking. Additionally, to integrate morphological information from µSAM with tracking data from TrackMate, we developed a web tool to merge the outputs from both processing steps: https://morphotrack-merger.streamlit.app/

      Comment #2: The authors developed a "Heterogeneity module" to extract distinctive tumor migratory phenotypes from the cell tracks quantified by Imaris. The cell tracks of the individual tumor cells are all quite short, indicating relatively low motility of the tumor cells. It's unclear whether such short migratory tracks are sufficient to warrant the PCA analysis to identify the 7 distinctive migratory phenotypes shown in Figure 2d. It's also unclear whether these 7 migratory phenotypes correspond to unique functional phenotypes.  

      For the 7 distinctive motility clusters, the authors should provide a more detailed analysis of the differences between them. It's unclear whether the difference in retreating, slow retreating, erratic, static, slow, slow invading, and invading correspond to functional difference of the tumor cells.

      While some tumor cells exhibit limited motility, indicated by short tracks, others demonstrate significant migratory capabilities (Figure 2 Invading and Retreating cells). This variability in tumor cell behavior is a central focus of our analysis, and our tool is specifically designed to identify and distinguish these differences. Our PCA analysis effectively captures this variability, as illustrated in Figure 2 d-f. It differentiates between cells exhibiting varying degrees of migratory behavior, including both highly and less migratory phenotypes, as well as their directionality relative to the tumor core and the persistence of their movements. Thus, we believe that our approach provides valuable insights into the distinct migratory phenotypes within the tumor microenvironment. 
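
      To make the kinds of track-level features discussed above concrete, here is a minimal Python sketch. It is not the BEHAV3D-TP feature set: the definitions of persistence (net displacement divided by path length) and of radial displacement relative to a tumor core point are common conventions assumed here for illustration, and the function name, parameter names, and example coordinates are hypothetical.

```python
import math


def track_features(points, dt, core):
    """Compute simple motility features for one cell track.

    points: list of (x, y, z) coordinates ordered in time (at least two).
    dt: time between consecutive frames.
    core: (x, y, z) reference point standing in for the tumor core (assumption).
    """
    if len(points) < 2:
        raise ValueError("a track needs at least two time points")

    steps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    path_length = sum(steps)
    net_displacement = math.dist(points[0], points[-1])
    duration = dt * len(steps)

    return {
        "mean_speed": path_length / duration,
        "net_displacement": net_displacement,
        # 1.0 for a perfectly straight track, near 0 for a track that returns to its start.
        "persistence": net_displacement / path_length if path_length > 0 else 0.0,
        # Positive when the cell ends farther from the core than it started.
        "radial_displacement": math.dist(points[-1], core) - math.dist(points[0], core),
    }


# Made-up example: a straight track moving away from a core placed at the origin.
print(track_features([(0, 0, 0), (3, 4, 0), (6, 8, 0)], dt=1.0, core=(0, 0, 0)))
```

      Features of this kind, computed per track, are the sort of input that a dimensionality reduction and clustering step could then operate on; which features are actually used is a choice left to the pipeline and its users.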

      While our current manuscript does not provide explicit evidence linking each motility cluster to functional differences among the tumor cells, it is important to note that the state of the field supports the idea that cell dynamics can predict cell states and phenotypes. Research conducted by ourselves (Dekkers, Alieva et al., Nat Biotech, 2023) and others, such as Crainiciuc et al. (Nature, 2022) and Freckmann et al. (Nat Comm, 2022), has shown that variations in cell motility patterns are indicative of underlying functional characteristics. For instance, cell morphodynamic features have been shown to reflect differences in cell types, T cell targeting states (Dekkers, Alieva et al., Nat Biotech, 2023), immune cell types (Crainiciuc et al., Nature, 2022), tumor metastatic potential, and drug resistance states (Freckmann et al., Nat Comm, 2022). In the revised manuscript, we have referenced relevant studies to underscore the biological significance of these behaviors. By doing so, we hope to clarify the potential implications of our findings and strengthen the overall narrative of our research:

      In discussion: “While our current study does not provide direct functional validation of the distinct motility clusters identified, existing literature strongly supports the notion that cell dynamics can serve as a proxy for functional states and phenotypic heterogeneity. Prior work, including studies by our group[19,66]  as well as Crainiciuc et al.[35] and Freckmann et al.[20], has demonstrated that variations in cell motility patterns can reflect underlying functional characteristics. Specifically, cell morpho-dynamic features have been shown to correlate with differences in cell type identity, T-cell engagement, metastatic potential, and drug resistance states. This growing body of evidence suggests that tumor cell behavior, as captured by BEHAV3D-TP, may serve as a predictive tool for deciphering functional tumor heterogeneity. Future studies integrating transcriptomic or proteomic profiling of motility-defined subpopulations could further elucidate the biological significance of these behavioral phenotypes.”

      Comment #3: Using only motility to classify tumor cell behaviours in the tumor microenvironment (TME) is probably not sufficient to capture the tumor cell difference. There are also other non-tumor cell types in the TME. If the authors aim to develop a computational tool that can elucidate tumor cell behaviors in the TME, they should consider other tumor cell features, e.g., morphology, proliferation state, and tumor cell interaction with other cell types, e.g., fibroblasts and distinct immune cells.

      The authors should expand the scale of tumor behavior features to classify the tumor phenotype clusters, e.g., to include tumor morphology, proliferation state, and tumor cell interaction with other TME cell types.

      We believe that using dynamic features alone is sufficient to capture differences in tumor behavior, as demonstrated by our results in Figure 2. However, we appreciate the reviewer’s suggestion to consider additional features, such as cell morphology, to fine-tune our analyses. To this end, we have adapted our pipeline to be compatible with any dynamic, morphologic or spatial features present in the data. In the revised manuscript we showcase this new addition with the analyses of two new datasets: 2D IVM data from healthy epithelial breast cells (Supplementary Fig 2) and 3D IVM data from adult gliomas (Supplementary Fig 3). These analyses identified cells with specific morphodynamic characteristics, which exhibited distinct kinetic behaviors or spatial distributions.

      However, we would like to point out that not all features may provide informative insights and that a wide range of features can instead introduce biologically irrelevant noise, making interpretation more challenging. For instance, in 3D microscopy, the z-axis resolution is typically lower, which can lead to artifacts like elongation in that direction. Adding morphological features that capture this may skew the analysis. Therefore, we believe that incorporating additional features should be approached with caution. We clarify these considerations in the revised manuscript to better guide users in utilizing our computational tool effectively:

      In discussion: “In addition to motility-based classification, features such as tumor cell morphology, proliferation state, and interactions with the tumor microenvironment can further refine tumor phenotyping. BEHAV3D-TP allows for the selection of diverse feature types, supporting datasets that include both dynamic, morphological and spatial parameters. However, we recognize that expanding the feature set may introduce biologically irrelevant noise, particularly in 3D microscopy data where limited z-axis resolution can lead to morphological artifacts. This highlights the potential need in the future to include unbiased feature selection strategies, such as bootstrapping methods[67], to ensure the identification of meaningful and biologically relevant parameters. Careful consideration of these aspects is key to maximizing the interpretability and predictive value of analyses performed with BEHAV3D-TP.”

      Comment #4: The authors have already published two papers on BEHAV3D [Alieva M et al. Nat Protoc. 2024 Jul;19(7): 2052-2084; Dekkers JF, et al. Nat Biotechnol. 2023 Jan;41(1):60-69]. Although the previous two papers used BEHAV3D to analyze T cells, the basic pipeline and computational steps are similar, in particular regarding cell segmentation and tracking. The addition of a "Heterogeneity module" based on PCA analysis does not make a significant advancement in terms of image analysis and quantification.

      We want to emphasize that we have no intention of duplicating our previous publications. In this manuscript, we have consistently cited our foundational papers, where BEHAV3D was first developed for T cell migratory analysis in in vitro settings. In the introduction, we clearly state that our earlier work inspired us to adopt a similar approach for analyzing cell behavior in intravital microscopy (IVM) data, addressing the specific needs and complexities of analyzing tumor cell behaviors in the tumor microenvironment.

      Importantly, our new work provides several key advancements: 1) a pipeline specifically adapted for intravital microscopy (IVM) data; 2) integration of spatial characteristics from both large-scale and small-scale phenotyping; and 3) a zero-code approach designed to empower researchers without coding skills to effectively utilize the tool. We believe that these enhancements represent meaningful progress in the analysis of cell behaviors within the tumor microenvironment which will be valuable for the IVM community. We ensure that these points are clearly articulated in the revised manuscript:

      In introduction: “In line with this concept of characterizing cellular dynamic properties for cell classification, we have previously developed an analytical platform termed BEHAV3D[19,21] allowing to perform behavioral phenotyping of engineered T cells targeting cancer. While BEHAV3D was initially developed to analyze T cell migratory behavior under controlled in vitro conditions, we sought to expand its application to investigate tumor cell behaviors in IVM data, where the complexity of the TME presents distinct analytical challenges. This manuscript builds on our foundational work but represents a significant advancement by adapting the pipeline specifically for IVM datasets.”

      Reviewer #3 (Recommendations for the authors): 

      (1) If the authors want to "democratize the analysis of heterogeneous cancer cell behaviors", they should perform image segmentation and tracking using open-source codes (e.g., Cellpose, Stardisk & 3DCellTracker) and not rely on the expensive and inaccurate ImarisTrack Module for the image analysis step of BEHAV3D. 

      We thank the reviewer for this recommendation and as stated above we recognize that enhancing our pipeline's compatibility with open-source options is important. To this end, we have updated our tool to support data formats generated by open-source Fiji plugins like TrackMate, MTrackJ, and ManualTracking, improving compatibility with various segmentation and tracking pipelines (https://github.com/imAIgeneDream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input ). In the revised manuscript, we detail this new functionality and demonstrate the operation of the BEHAV3D-TP heterogeneity module using an example dataset of glioma tumors.

      Additionally, we have updated the introduction to better clarify the scope of our study (See comment #1 from Reviewer #3) and include references to existing image processing solutions.

      (2) For the 7 distinctive motility clusters, the authors should provide a more detailed analysis of the differences between them. It's unclear whether the difference in retreating, slow retreating, erratic, static, slow, slow invading, and invading correspond to functional difference of the tumor cells. 

      As noted in the comment above, the revised manuscript now incorporates references to relevant literature that support our understanding that behavioral differences among cells are driven by their underlying functional differences (See comment #2 from Reviewer #3). Additionally, we would like to point to Figure 2d and Supplementary Fig 4 c that provide evidence of the functional distinctions between the identified clusters.

      (3) The authors should expand the scale of tumor behavior features to classify the tumor phenotype clusters, e.g., to include tumor morphology, proliferation state, and tumor cell interaction with other TME cell types.

      We thank the reviewer for this valuable suggestion. In the revised manuscript, we have added the flexibility to incorporate a wide range of features, including morphological ones, and enabled users to select the specific features they wish to include in their analysis. To illustrate this functionality, we have included two example datasets analyzed using this approach (See comment #3 from Reviewer #3). Additionally, as indicated above, we emphasize the importance of careful selection and interpretation of features, as improper choices may lead to biologically irrelevant results. This clarification is intended to ensure that users apply the tool thoughtfully and derive meaningful insights.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      We thank reviewer 1 for the helpful comments. As indicated in the responses below, we have taken all comments and suggestions into consideration in this revised version of the manuscript.

      Weaknesses:

      While this study convincingly describes the phenotype seen upon Drp1 loss, my major concern is that the mechanism underlying these defects in zygotes remains unclear. The authors refer to mitochondrial fragmentation as the mechanism ensuring organelle positioning and partitioning into functional daughters during the first embryonic cleavage. However, could Drp1 have a role beyond mitochondrial fission in zygotes? I raise these concerns because, as opposed to other Drp1 KO models (including those in oocytes) which lead to hyperfused/tubular mitochondria, Drp1 loss in zygotes appears to generate enlarged yet not tubular mitochondria. Lastly, while the authors discard the role of mitochondrial transport in the clustering observed, more refined experiments should be performed to reach that conclusion.

      It would be difficult to answer from this study whether Drp1 plays a role beyond mitochondrial fission in zygotes. However, the reasons why Drp1 KO zygotes differ from the somatic Drp1 KO model can be discussed as follows.

      First, the reviewer mentioned that the loss of Drp1 in oocytes leads to hyperfused/tubular mitochondria, but in fact, unlike in somatic cells, the EM images in Drp1 KO oocytes show enlarged mitochondria rather than tubular structures (Udagawa et al., Curr Biol. 2014, PMID: 25264261, Fig. 2C and Fig. S1B-D), as in the case of zygotes in this study. Mitochondria in oocytes/zygotes have the shape of a small sphere with irregular cristae located peripherally. These structural features may cause insensitivity or resistance to inner membrane fusion and the resultant failure to form tubular mitochondria as seen in somatic cell models. Nonetheless, quantitative analysis of EM images in the revised version confirmed that the mitochondria of Drp1-depleted embryos were not only enlarged but also significantly elongated (Figure 2J-2M). Therefore, in Drp1-depleted embryos, significant structural and functional (e.g., asymmetry between daughters) changes in mitochondria were observed, and these are expected to lead to defects in embryonic development.

      As for mitochondrial transport, we do not fully understand the intent of this question, but we do not entirely rule out mitochondrial transport. At least clustered mitochondria did not disperse again, but how mitochondria behave through the cytoskeleton within clusters will require further study, as the reviewer pointed out.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors show no effect of Myo19 Trim-Away, yet it remains unclear whether myo19 is involved in the positioning of mitochondria around the spindle. Judging by their co-localization during that stage, it might be. Therefore, in the absence of myo19, mitochondria might remain evenly distributed throughout mitosis, thus passively resulting in equal partitioning to daughter cells, with no severe developmental defects. Could the authors show a video of the whole process and discuss it?

      We have newly performed live imaging of mitochondria and chromosomes in Myo19 Trim-Away zygotes (n=13). As shown in Figure 1-figure supplement 2 and Figure 1-Video 2, there were no obvious changes in mitochondrial (and chromosomal) dynamics throughout the first cleavage and no significant mitochondrial asymmetry was observed. Therefore, we conclude that depletion of Myo19 does not cause mitochondrial asymmetry during embryonic cleavage. These results are described in the revised manuscript (Line 218-221).

      (2) Mitochondrial aggregation upon Drp1 depletion should be characterized in more detail: for example, % of mitochondria free, % in small clusters (> X diameter), and % in big clusters (>Y diameter).

      In the revised version, mitochondrial aggregation has been quantified by comparing the cluster size and number in control, Drp1 Trim-Away and Drp1 Trim-Away embryos expressing exogenous Drp1 (mCh-Drp1) (Figure 2G, 2H). In control embryos, mitochondria were interspersed in a large number of small clusters, while in Drp1-depleted embryos, mitochondria became highly aggregated into a small number of large clusters that was reversed by expression of mCh-Drp1. These results are described in the revised manuscript (Line 242-245).
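
      As an illustration only, the kind of cluster size and number measurement described above can be sketched as connected-component labeling on a thresholded (binary) mitochondrial image. This is a minimal sketch under stated assumptions, not the analysis code used in the study: the function name `cluster_sizes`, the choice of 4-connectivity, and the two toy masks are all hypothetical.

```python
from collections import deque


def cluster_sizes(mask):
    """Return the pixel area of each 4-connected cluster in a binary mask.

    `mask` is a list of rows; truthy entries are mitochondrial pixels.
    (Hypothetical helper for illustration, not the study's pipeline.)
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one cluster.
            queue = deque([(r, c)])
            seen[r][c] = True
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(area)
    return sizes


# Toy masks (made-up data): many small clusters versus one large cluster.
dispersed = [
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
]
aggregated = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

dispersed_sizes = cluster_sizes(dispersed)
aggregated_sizes = cluster_sizes(aggregated)
print(len(dispersed_sizes), max(dispersed_sizes))
print(len(aggregated_sizes), max(aggregated_sizes))
```

      In this toy example the same number of mitochondrial pixels is counted either as many small clusters or as a single large one, which is the contrast that a cluster size and number comparison is meant to capture.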

      (3) The discrepancies with parthenogenetic embryos derived from Drp1 (-/-) parthenotes should be commented on. Quantification of the dimensions of the clusters would help establish the degree of similarity/difference. Could the authors comment on their hypothesis as to why the clusters are remarkably larger in Drp1 depleted zygotes?

      In the revised version, we have quantified the mitochondrial aggregation in Drp1 KO parthenotes (Figure 2-figure supplement 1; the data for Drp1 KO parthenotes has been reorganized into the supplemental figure, due to lack of space in figure 2 caused by the addition of quantitative data for Drp1 Trim-Away embryos). The size of mitochondrial clusters in Drp1 KO parthenotes was significantly increased compared to controls, but as the reviewer noted, mitochondrial aggregation appears to be moderate compared to that in Drp1-depleted embryos. The phenotypic discrepancies between the two Drp1-deficient embryo models are discussed below.

      First, it is clear that the phenotypic severity of Drp1 KO oocytes depends on the age of the female. Indeed, oocytes collected from 8-week-old females arrested meiosis after NEB, mainly due to marked mitochondrial aggregation (Udagawa et al., Curr Biol. 2014, PMID: 25264261), whereas oocytes from juvenile females completed meiosis (Adhikari et al., Sci Adv. 2022, PMID: 35704569), and thus Drp1 KO parthenotes were obtained from juvenile females in the present study. Comparison of mitochondrial morphology in Drp1 KO oocytes in both papers also suggests that mitochondrial aggregation in adult mice is more intense (Udagawa et al., Curr Biol. Fig. 2A) than in juvenile mice (Adhikari et al., Sci Adv. 2022: Fig. 1G, 1H), and appears to be similar to Drp1-depleted embryos in this study (Figure 2E). There may be differences in the level of Drp1 depletion in these Drp1-deficient oocytes/zygotes. Similar differences between juvenile and adult KO females have been reported in a previous paper (Yueh et al., Development 2021, PMID: 34935904), as adult-derived Smc3<sup>Δ/Δ</sup> zygotes arrested at the 2-cell stage, whereas juvenile-derived Smc3<sup>Δ/Δ</sup> zygotes have developmental competence comparable to the wild type. Remarkably, the SMC3 protein level in juvenile Smc3<sup>Δ/Δ</sup> oocytes was also comparable to that in Smc3<sup>fl/fl</sup> oocytes. The authors surmised that the decline in maternal SMC3 between the juvenile stage and sexual maturity is probably due to the continuous induction of the promoter-Cre driver, suggesting that similar induction may also occur in Drp1 KO oocytes. In addition, we also observed not only age differences but also batch differences in Drp1 KO oocytes (and resulting embryos) such that little mitochondrial aggregation was observed in oocytes collected from some juvenile KO colonies. Therefore, for KO models showing age (sexual maturation)-dependent gradual phenotypic changes, Trim-Away may be an approach that provides more reproducible results as it induces acute degradation of maternal proteins.

      (4) Mitochondrial clusters in Drp1 trim-away zygotes resemble those seen when defects in mitochondrial positioning are obtained by TRAK2 induction (PMID: 38917013), pointing again to a role of actin in the clustering process. Could the authors explore the role of actin further?

      TRAK2 and microtubule-dependent mechanisms may also be involved in mitochondrial dynamics during the first cleavage division, possibly in association with migration of two pronuclei. Although the mitochondrial aggregation induced by TRAK2 overexpression is similar to that in Drp1-depleted embryos, it is unlikely that changes at the EM level occurred as seen in Drp1-depleted embryos (enlarged mitochondria, etc.). In addition, in TRAK2-overexpressing embryos, rather than uneven partitioning of mitochondria, the daughter blastomeres themselves were uneven in size after cleavage, making it difficult to precisely assess the similarity between the two models.

      Regarding the role of F-actin, we show that the subcellular distribution of cytoplasmic actin overlaps with that of mitochondria throughout the first cleavage and seems to accumulate in aggregated mitochondria, particularly during the mitotic phase, as higher correlation was observed (Figure 1E). Although it was not observed that actin and the myo19 motor regulate mitochondrial partitioning, as reported in somatic cell-based studies, it is possible that actin accumulated in mitochondria may be indirectly involved in mitochondrial dynamics via mitochondrial fission. For example, inverted formin 2 (INF2) enhances actin polymerization and is required for efficient mitochondrial fission, acting upstream of Drp1 (Korobova et al., Science 2013, PMID: 23349293). In the revised manuscript, we have added a description of this point (Line 452-456).

      (5) Electron microscopy images showed indeed aberrant morphology of the mitochondria, yet not a hyperfused morphology. Aspect ratio (long/short axis) quantification should be included, besides the current measurement, since mitochondria in Drp1 trim-away look bigger yet as round as in the control.

      In the revised version, detailed quantitative data on EM images have been added (Figure 2J-2M). In Drp1-depleted embryos, significant increases were observed in both the major and minor axes of mitochondria. As the reviewer noted, we also assumed that mitochondria in depleted embryos were enlarged rather than elongated, but the quantification of aspect ratio shows that significant elongation occurred. These results have been described in the revised manuscript (Line 252-256).
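
      To make the distinction between enlargement and elongation concrete, the aspect ratio requested by the reviewer is the major axis divided by the minor axis. The sketch below uses made-up axis lengths (in nm), not measurements from Figure 2J-2M, and the function name `aspect_ratio` is hypothetical. It only shows that both axes can increase while the aspect ratio also increases.

```python
def aspect_ratio(major_axis, minor_axis):
    """Aspect ratio (long axis / short axis) of one mitochondrial profile."""
    if minor_axis <= 0:
        raise ValueError("minor axis must be positive")
    return major_axis / minor_axis


# Hypothetical axis lengths in nm, chosen only for illustration.
control_ratio = aspect_ratio(500, 400)    # smaller, nearly round profile
depleted_ratio = aspect_ratio(1200, 800)  # both axes larger, and more elongated

print(control_ratio, depleted_ratio)
```

      A profile that is larger in both axes therefore counts as elongated only if the major axis grows proportionally more than the minor axis.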

      (6) Why are mitochondria in golgi-mcherry-expressing cells showing a different morphology of the clusters?

      As noted by the reviewer, compared to other mitochondrial images, Drp1-depleted embryos expressing Golgi-mCherry appear to have less mitochondrial aggregation. The exact reason is not known, but may be due to inter-lot variation of Trim21 mRNA used in this experiment. Nevertheless, substantial mitochondrial aggregation was observed compared to the control, which does not seem to affect the conclusion.

      (7) Authors comment on ROS being enriched (highly accumulated) in mitochondria. However, while quantification is missing, it might seem that ROS are equally distributed in control or Drp1 Trim-Away embryos. Could the authors quantify ROS signal inside and outside of the mitochondria, perhaps using a mask drawn by mitotracker? Furthermore, it would make these data more convincing to artificially induce/deplete ROS to validate the sensitivity of the technique to variations. Also, why is ROS pattern referred to as ectopic?

      Thank you for your useful suggestions. In the revised version, masked binary images were created from mitochondrial images to quantify ROS levels inside and outside mitochondria (Line 734-741). The result shows the accumulation of ROS to mitochondria in Drp1-depleted embryos (Figure 4-figure supplement 1E). The term ectopic was used to mean excessive accumulation of ROS in the mitochondria compared to normal embryos, but has been deleted as it is not very accurate.
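
      As a hedged illustration of the masking approach described above, the sketch below thresholds a mitochondrial image into a binary mask and compares the mean ROS signal inside and outside that mask. It is a minimal sketch with made-up pixel values; the function name `mean_ros_inside_outside` and the fixed threshold are assumptions and do not reproduce the procedure given in the Methods.

```python
def mean_ros_inside_outside(ros, mito, threshold):
    """Mean ROS intensity inside and outside a thresholded mitochondrial mask.

    `ros` and `mito` are equally sized images given as lists of rows.
    Pixels with a mitochondrial signal >= `threshold` count as inside.
    (Hypothetical helper for illustration only.)
    """
    inside, outside = [], []
    for ros_row, mito_row in zip(ros, mito):
        for ros_val, mito_val in zip(ros_row, mito_row):
            if mito_val >= threshold:
                inside.append(ros_val)
            else:
                outside.append(ros_val)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(inside), mean(outside)


# Toy 2x2 images (made-up intensities).
mito = [[10, 200], [220, 5]]
ros = [[20, 80], [100, 40]]

inside_mean, outside_mean = mean_ros_inside_outside(ros, mito, threshold=100)
print(inside_mean, outside_mean, inside_mean / outside_mean)
```

      Reporting the inside-to-outside ratio per embryo, rather than raw intensities, is one way to make such a comparison less sensitive to differences in overall dye loading.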

      Minor comments:

      (A) Video 1: images at t=-00:20 and t=00:00 of the mtGFP are actually the same images as H2B-mCherry.

      Probably a faulty filter/shutter control failed to capture GFP fluorescence at these times. It appears that the autocontrast function detected a small amount of mCherry fluorescence leakage. It would be possible to replace it with another video, but as the relevant frames were unrelated to the analysis, the previous video was used as is. The same problem also occurs in the newly added Myo19-depleted zygote movie (Figure 1-Video 2, 03:15).

      (B) Could you calculate the degree of colocalization between mt-GFP and ER-mCherry in ctrl and Drp1 trim-away? While it is apparent that ER is somehow more associated with mitochondrial clusters, it would be informative to quantify it.

      Since the ER is partially confined to the mitochondrial aggregation site, it was difficult to calculate correlation coefficients from fluorescence images of mt-GFP and ER-mCherry to quantitatively assess colocalization. Instead, line scan analysis of whole mitochondrial clumps showed that the peak of the ER-mCherry signal overlaps with that of mt-GFP, but this is not the case for Golgi-mCherry or peroxisome-mCherry (Figure 2-figure supplement 2A-2C).
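
      The line scan comparison described above can be illustrated by locating the peak of each channel's intensity profile along the same line. The profiles below are made-up values, not data from Figure 2-figure supplement 2, and the function name `peak_position` is hypothetical.

```python
def peak_position(profile):
    """Index of the maximum intensity along a line-scan profile."""
    return max(range(len(profile)), key=lambda i: profile[i])


# Hypothetical intensity profiles sampled along one line across a clump.
mt_gfp = [1, 3, 9, 4, 1]
er_mcherry = [2, 4, 8, 3, 1]
golgi_mcherry = [7, 3, 2, 2, 6]

print(peak_position(mt_gfp), peak_position(er_mcherry), peak_position(golgi_mcherry))
```

      In this toy case the ER profile peaks at the same position as the mitochondrial profile, whereas the Golgi profile peaks elsewhere, which is the pattern the line scan analysis is meant to reveal.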

      (C) Regarding the developmental arrest: The quantification of the different stages at each developmental time could be more informative. For example, at E4.5 how many embryos are at each stage (2-cell, 4-cell, ... blastocyst)? Also, could the authors comment on the reduction in developmental competence in Figure 4C, regarding the blastocyst stage?

      Many arrested embryos do not maintain their morphologies and undergo a unique degenerative process over time, known as cell fragmentation. Therefore, it is difficult to accurately determine the number of each developmental stage at, for example, E4.5 days. In this study, the 2-cell stage was observed at E1.5, the 4-8 cell at E2.5-E3.0, morula at E3.5 and the blastocyst at E4.5.

      Although the rate of embryos reaching the blastocyst stage was reduced compared to that of normal embryos, the overexpression of mCh-Drp1 may explain the failure of complete restoration of developmental competence, since embryos injected solely with mCh-Drp1 mRNA also showed reduced developmental competence. For rescue experiments, the comparison with internal controls is more important, and we have therefore described it as follows: This is a specific effect of Drp1 deletion because none of the internal control conditions increased arrest at the 2-cell stage and arrest was completely reversed by microinjecting Trim-away insensitive exogenous mCh-Drp1 mRNA (Line 337-340).

      (D) In lines 103 to 105, proliferation should be changed to division or development.

      In the revised version, proliferation has been changed to division (Line 103).

      (E) Could the authors reference the statement in lines 168-169?

      The following 3 references have been added (Hardy et al., 1993, PMID: 8410824; Meriano et al., 2004, PMID: 15588469; Seikkula et al., 2018, PMID: 29525505).

      (F) Line 448: "Cells lacking Drp1 have highly elongated mitochondria that cannot be divided into transportable units,..." This is clearly not the case for zygotes, so why are then these mitochondria still clustering and not transported elsewhere?

      Although it is difficult to answer this reviewer's question precisely, EM images of Drp1-depleted embryos suggest that individual mitochondria appear not only to be enlarged but also to have increased outer membrane attachment due to excessive aggregation. These large mitochondrial clumps may therefore be preventing transport.

      Reviewer #2 (Public review):

      We thank reviewer 2 for the helpful comments. As indicated in the responses below, we have taken all comments and suggestions into consideration in this revised version of the manuscript.

      Weaknesses:

      The authors first describe the redistribution of mitochondria during normal development, followed by alterations induced by Drp1 depletion. It would be useful to indicate the time post-hCG for imaging of fertilised zygotes (first paragraph of the results/Figure 1) to compare with subsequent Drp1 depletion experiments.

      In the revised version, the time after hCG has been indicated (Line 176-182). In subsequent Drp1 depletion experiments, the revised version notes that “no significant delay in cell cycle progression was observed following Drp1 depletion (data not shown) compared to control embryos (Figure 1A)” (Line 291-193). There was a slight discrepancy in the time post-hCG between live imaging and immunofluorescence analysis (Figure 1-figure supplement 1A), which may be due to manipulation of zygotes outside the incubator during the microinjection of mRNA.

      It is noted that Drp1 protein levels were undetectable 5h post-injection, suggesting earlier times were not examined, yet in Figure 3A it would seem that aggregation has occurred within 2 hours (relative to Figure 1).

      As the reviewer pointed out, the depletion of Drp1 is likely to have occurred at an earlier stage. In this study, due to the injection of various mRNAs to visualize organelles such as mitochondria and chromosomes, observations were started after about 5 h of incubation for their fluorescent proteins to be sufficiently expressed. Therefore, for the Western blot analysis, samples were prepared according to the time of the start of the observation.

      Mitochondria appear to be slightly more aggregated in Drp1 fl/fl embryos than in control, though comparison with untreated controls does not appear to have been undertaken. There also appears to be some variability in mitochondrial aggregation patterns following Drp1 depletion (Figure 2-suppl 1 B) which are not discussed.

      In the revised version, mitochondrial aggregation has been quantified by comparing the cluster size and number in control, Drp1 Trim-Away and Drp1 Trim-Away embryos expressing exogenous Drp1 (mCh-Drp1) (Figure 2G, 2H). We have also quantified the mitochondrial aggregation in Drp1<sup>fl/fl</sup> and Drp1<sup>Δ/Δ</sup> parthenotes (Figure 2-figure supplement 1; note that the data for Drp1 KO parthenotes has been reorganized into the supplemental figure, due to lack of space in figure 2 caused by the addition of quantitative data for Drp1 Trim-Away embryos). Mitochondria appear to be slightly more aggregated in Drp1<sup>fl/fl</sup> embryos than in control, but no significant differences in cluster size or number were observed (data not shown). On the other hand, mitochondrial clusters in Drp1 Trim-Away embryos were remarkably larger than those in Drp1<sup>Δ/Δ</sup> parthenotes. Please refer to the response to reviewer 1's comment (3) for discussion of this discrepancy.

      As noted by the reviewer, compared to other mitochondrial images, Drp1-depleted embryos expressing Golgi-mCherry appear to have less mitochondrial aggregation. The exact reason is not known, but may be due to inter-lot variation of Trim21 mRNA used in this experiment. Nevertheless, substantial mitochondrial aggregation was observed compared to the control, which does not seem to affect the conclusion.

      The authors use western blotting to validate the depletion of Drp1, however do not quantify band intensity. It is also unclear whether pooled embryo samples were used for western blot analysis.

      In the revised version, the band intensities in the Western blot analyses were quantified, which validated the previous results (Figure 1H for Myo19 depletion, Figure 2B for Drp1 expression during preimplantation development, Figure 2D for Drp1 depletion). The number of embryos analyzed is described in the Figure legends (pooled samples of 20 to 100 embryos were used).

      Likewise, intracellular ROS levels are examined however quantification is not provided. It is therefore unclear whether 'highly accumulated levels' are of significance or related to Drp1 depletion.

      In the revised version, masked binary images were created from mitochondrial images to quantify ROS levels inside and outside mitochondria (Line 734-741). The result shows the accumulation of ROS to mitochondria in Drp1-depleted embryos (Figure 4-figure supplement 1E).

      In previous work, Drp1 was found to have a role as a spindle assembly checkpoint (SAC) protein. It is therefore unclear from the experiments performed whether aggregation of mitochondria separating the pronuclei physically (or other aspects of mitochondrial function) prevents appropriate chromosome segregation or whether Drp1 is acting directly on the SAC.

      In the revised manuscript, we have discussed this reference (Zhou et al., Nature Communications, PMID: 36513638) (Line 482-483).

      Reviewer #2 (Recommendations For The Authors):

      The authors report that disruption of F-actin organization led to asymmetry in mitochondrial inheritance, however depletion of Myo19 does not impact inheritance. The authors note in the discussion that loss of another mitochondrial motor protein, Miro, has been shown to affect mitochondrial inheritance. They suggest this may be due to reduced levels of Myo19, despite data from the present study suggesting a lack of involvement of Myo19. Given that Miro1 also interacts with microtubules, and crosstalk between actin filaments and microtubules has been reported, have the authors considered whether other motor proteins, such as KIF5, may be involved in mitochondrial movement in the zygote and therefore inheritance? Myo19 also plays a role in mitochondrial architecture. Were any differences noted at the EM level?

      During oocyte meiosis and early embryonic cleavage, kinesin-5 has been reported to be important for the formation of bipolar spindles (Fitzharris, Curr Biol., 2009, PMID: 19465601) and may have some involvement in mitochondrial dynamics. Given that the migration of two pronuclei towards the zygotic centre occurs in a dynein-dependent manner (Scheffler et al., Nat Commun. 2021, PMID: 33547291), dynein may also be involved in the process of mitochondrial accumulation around the pronuclei. Nevertheless, whether microtubule-dependent mechanisms regulate mitochondrial partitioning remains controversial. Mitochondria basically diverge from microtubules at the onset of mitosis, and indeed Miro1-deleted zygotes did not show the asymmetric mitochondrial partitioning (Lee et al., Front Cell Dev Biol. 2022, PMID: 36325364). More recently, it was reported that overexpression of TRAK2 causes significant mitochondrial aggregation in embryos (Lee et al., Proc Natl Acad Sci U S A. 2024, PMID: 38917013), but since overexpression might disrupt a regulatory balance by other motors/adaptor complexes, further investigation using TRAK2-deficient embryos is expected.

      As noted by the reviewer, myo19 seems to be important for the maintenance of mitochondrial cristae architecture and, consequently, for the regulation of mitochondrial function (Shi et al., Nat Commun. 2022, PMID: 35562374). We have not observed the EM images in myo19-depleted embryos, but we examined their membrane potential and ROS by TMRM and H2DCF staining, respectively, and confirmed that they were comparable to control embryos (data not shown). The loss of myo19 in zygotes/embryos did not cause any functional changes in mitochondria, suggesting that mitochondrial architecture may not be substantially affected either.

      Transcriptomic analysis would be useful to identify alterations in cell cycle checkpoint regulators, as well as immunofluorescence to identify changes in spindle assembly checkpoint protein recruitment.

      The present results showed that the majority of Drp1-depleted embryos arrest at the G2 stage, possibly due to cell cycle checkpoint mechanisms. Transcriptome analysis would certainly be beneficial, but eventually more detailed analysis of proteins and their phosphorylation modifications, etc. is needed for accurate assessment. These studies will be the subject of future work.

      Minor comments:

      There are many instances where the English could be improved, particularly the overuse of the word 'the'.

      We have carefully checked the manuscript again and hope that the English has been improved.

      Line 144: replace 'took' with 'take'.

      We have corrected this in the revised version (Line 140).

      Line 157: it is unclear what is meant by 'hinders the functional importance of Drp1 in mature oocytes and embryos'.

      This description has been corrected to “complicates the functional analysis of Drp1 in mature oocytes and embryos” (Line 152-153).

      Line 198: replace with 'displayed a mitochondrial distribution pattern closely associated with'

      We have corrected this in the revised version (Line 195-196).

      Line 200: provide a time to clarify when the cytoplasmic meshwork was 'subsequently reorganized'

      In the revised version, “at the metaphase” has been added (Line 198).

      Line 204: replace 'to' with 'for'

      We have corrected this in the revised version (Line 203).

      Lines 285-87: consider rearranging the text to improve the flow.

      To improve the flow of the surrounding text, the following sentence has been added: “We postulated that this asymmetry was due to non-uniformity in the distribution of mitochondria around the spindle” (Line 295-297).

      Line 418: replace 'central' with 'centre'

      We have corrected this in the revised version (Line 430).

      Line 427: replace 'pertaining' with 'partitioning'

      We have corrected this in the revised version (Line 438).

      Line 574: clarify to what '1-5% of that of the oocytes' refers

      We have corrected it to “1-5% of the total volume of the zygote.” (Line 587-588).

      Line 619: indicate the dilution used

      We apologize for the previous incorrect description. We used a part of the extract as the template, not a dilution, and have corrected it to be accurate (Line 631-632).

      Line 634: replace 'on' with 'in' and detail in which medium embryos were mounted.

      We have corrected this in the revised version (Line 647).

      Please check all spelling in the figures.

      Figure 1J - inheritance is spelt incorrectly.

      Figure-Suppl 1, D: Interphase (PN) and (2-cell) is spelt incorrectly. G: inheritance is spelt incorrectly.

      Figure 5F - bottom section prior to cytokinesis, spindle is spelt 'spincle'

      Ensure consistency in abbreviation use (e.g. use of NEB and NEBD).

      Thank you for your careful correction of typographical errors. In the revised version, all points raised by the reviewers have been corrected.

      Reviewer #3 (Public review):

      We thank Reviewer #3 for the helpful comments. As indicated in the responses below, we have taken all comments and suggestions into consideration in this revised version of the manuscript.

      Seemingly, there are few apparent shortcomings. Following are the specific comments to activate the further open discussion.

      Line 246: Comments on cristae morphology of mitochondria in Drp1-depleted embryos would better be added.

      In the revised manuscript, we have added the following comment: swollen or partially elongated mitochondria with lamellar cristae structures in the inner membrane were observed in Drp1-depleted embryos. In addition, quantification of the aspect ratio (long/short axis) shows that significant mitochondrial elongation occurred (Figure 2M). These results have been described in the revised manuscript (Line 251-256).

      - Regarding Figure 2H: If possible, a representative picture of Ateam would better be included in the figure. As the authors discussed in line 458, Ateam may be able to detect whether any alterations of local energy demand occurred in the Drp1-depleted embryos.

      Thank you for your very useful comments. Although it would be interesting to investigate whether alterations in ATP levels occurred in localized areas (e.g., around the spindle), the present study used a conventional fluorescence microscope rather than confocal laser microscopy to observe ATeam fluorescence, in order to quantify the fluorescence intensity in the whole embryo (or whole blastomere). We therefore cannot currently provide the images that the reviewer requested. As shown in Figure-figure supplement 1C, ATP levels tend to be higher at the cell periphery in control embryos and at the mitochondrial aggregation areas in Drp1-depleted embryos, but high-resolution confocal images would be needed to show this clearly.

      - Line 282: In Figure 3-Video 1, mitochondria were seemingly more aggregated around female pronucleus. Is it OK to understand that there is no gender preference of pronuclei being encircled by more aggregated mitochondria?

      Review of multiple videos shows that aggregated mitochondria were localized toward the cell center, but did not preferentially concentrate near the female pronucleus.

      - Line 317: A little more explanation of the "variability" would be fine. Does that basically mean that the Ca<sup>2+</sup> response in both Drp1-depleted blastomeres were lower than control and blastomere with more highly aggregated mitochondria show severer phenotype compared to the other blastomere with fewer mito?

      We think that the reviewer's comments are mostly correct. It is clear that there is a bias in Ca<sup>2+</sup> store levels between blastomeres of Drp1-depleted embryos. However, since mitochondria were not stained simultaneously in this experiment, we cannot draw detailed conclusions, such as whether daughter blastomeres that inherit more mitochondria have higher Ca<sup>2+</sup> stores, or whether blastomeres with more aggregated mitochondria have lower Ca<sup>2+</sup> stores.

      - Regarding Figure 5B (& Figure 1-figure supplement 1B): Do authors think that there would be less abnormalities in the embryos if Drp1 is trim-awayed after 2-cell or 4-cell, in which mitochondria are less involved in the spindle?

      The marked centration of mitochondrial clusters in Drp1-depleted embryos appears to be associated with migration of the pronuclei toward the cell center, which is unique to the first embryonic cleavage. Since the assembly of the male and female pronuclei at the cell center is also unique to the first cleavage, binucleation due to mitochondrial misplacement was observed only in the first cleavage. Therefore, if Drp1 is depleted at the 2-cell or 4-cell stage, chromosome segregation errors may be less frequent. However, since unequal partitioning of mitochondria is still thought to occur, some abnormalities in embryonic development are likely to be observed.

      Reviewer #3 (Recommendations For The Authors):

      Specific comments

      - Line 262: "Since mitochondrial dynamics are spatially coordinated at the ER-mitochondria MCSs," adequate ref. would better be added.

      We have added an adequate reference to the revised manuscript (Friedman et al., 2011, PMID: 21885730).

      - Line 333-336: "...as assessed by the presence of the nuclear envelope." Do authors show the data? In Figure 4-figure supplement 1A, the difference of the phosphoH3-ser10 signal between control and Trim-Away group might be weak. For clarity, it would be helpful if authors indicate the different points to note in the figure.

      Although the data are not shown, nuclear staining of arrested 2-cell stage embryos exhibited clear nuclear membranes, similar to the DAPI image in Figure 4-figure supplement 1A. We have indicated that the data are not shown in the revised version (Line 345). Based on a report that phosphorylated histone H3 (Ser10) localizes to pericentromeric heterochromatin that can be visualized by DAPI staining in late G2 interphase cells (Hendzel et al., 1997, Chromosoma, PMID: 9362543), this study qualitatively estimated the G2 phase from the phosphorylated histone H3 signal and the DAPI counterstained images. We have noted this point in the revised figure legend (Line 1012-1014).

      Typos or points for reword/rephrase

      - Line 149: "molecular identification" may better be " molecular characteristics".

      We have corrected this in the revised version (Line 145).

      - Line 157: "hinders the functional importance" would be "implies the functional importance" or "complicates the functional analysis".

      We have corrected this in the revised version (Line 152-153).

      - Line 208: "Since the role of F-actin in many cellular events, such as cytokinesis, preclude them as targets for experimentally manipulating mitochondrial distribution, " may better be "Given many cellular roles, disruption of F-actin per se was unsuitable as a strategy for manipulating mitochondrial distribution", for example.

      We have corrected this in the revised version (Line 207-208).

      - Line 260: "with MCSs with the plasma.." may better be "with MCSs such as with the plasma..".

      We have corrected this in the revised version (Line 267-268).

      - Line 312: "distribution and segregation" may better be "distribution and the resulting segregation of the inter-organelle contacts".

      We have corrected this in the revised version (Line 324-325).

      - Line 427: "pertaining" might be "partitioning".

      We have corrected this in the revised version (Line 438).

      Line 463: "loss of Drp1 induced mitochondrial aggregation disturbs" may better be "mitochondrial aggregation induced by the loss of Drp1 disturbs".

      We have corrected this in the revised version (Line 478-479).

      - Line 752: "endoplasmic reticulum (pink) " would be " endoplasmic reticulum (aqua) ".

      We have corrected this in the revised version (Line 780).

      - Figure 5E: "(Noma 2-cell embryos)" would be "(Nomal 2-cell embryos)".

      - Figure 5F: "Mitochondrial centration prevents dual spincle assembly" would be "Mitochondrial centration prevents dual spindle assembly".

      Thank you for your careful correction of typographical errors. We have corrected all the words/expressions the reviewer pointed out in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      The main criticisms levied by both reviewers can be traced to our use of a long-term video archive to assess the effects of aging on individual chimpanzees over extended time periods. Specifically, the reviewers raised several points surrounding whether we could exclude ecological variation over years as the explanation of changes with aging, rather than aging itself. Whilst we acknowledge there are limitations to our approach, we provide a comprehensive response to these points highlighting:

      (1) Where ecological variables have been accounted for using controls (including the behaviors of other individuals, or an aging individual’s behavior at younger ages).

      (2) Where ecological data may be missing, thus a potential limitation to our study, and further data would be beneficial.

      (3) Whether, in light of these limitations, interannual ecological variation offers a likely explanation for the behavioral changes we have identified. We provide an argument that whilst ecological data would be desirable for our study, interannual changes in ecology are unlikely to explain the trends in our data. Additionally, we explain why age-related changes, such as senescence, are more likely to underpin the patterns described in our manuscript.

      Across 1-3, we have made substantial changes to the reporting of our manuscript to ensure that our results are communicated transparently, and conclusions are made with appropriate care. We have also moved all discussion of coula-nut cracking to the supplementary materials, given the points raised by reviewers about the lack of data describing coula-nut cracking in earlier field seasons.

      We hope that these modifications will enhance both the editors’ and reviewers’ assessment of our manuscript, where we have aimed to make careful conclusions that are supported by our available data. Similarly, we have aimed to communicate the importance of our results across fields of research including primatology, evolutionary anthropology, and comparative gerontology, and hope that our research will be of use to further studies within these subfields.

      Reviewer 1 (Recommendations for the authors):

      (1) If possible, include results or a summary of the behaviour of younger adults using stone tools during the same period. It would be helpful to know if they had the same or different pattern to exclude other factors that may influence the tool use (harder nuts in a particular season, diseases, motivation for other foods, etc). 

      We include data for other individuals when analyzing attendance. However, we did not collect comparable long-term efficiency data on younger adult individuals for this study. This is, in part, due to the time constraints imposed by long-term behavior coding. Additionally, only one adult was both present at Bossou throughout the 1999-2016 period and younger than the threshold for our old-age category across these years (the baseline used to compare with older adults would therefore be just one younger adult, which would not have been useful for characterizing normal variation among many younger adults over time). However, given the longitudinal data we present, we can use data from the earlier field seasons for each elderly focal individual as a personalized baseline control. Previous studies at Bossou find that across the majority of adulthood, efficiency varies between individuals, but is stable within individuals over time (e.g., Berdugo et al. 2024, cited). We detected similar stability in individuals’ efficiency over the first three field seasons sampled in our analysis, where there was very little intra-individual variation in tool-using efficiency. However, in later years, two individuals (Velu & Yo) began to exhibit relatively large reductions in efficiency.

      These results are unlikely to be explained by ecological variation. If there was a change in ecology underpinning our results, we would expect changes in ecology [1] to also introduce variation in earlier field seasons, and [2] to influence all individuals in our study similarly. As such, if the changes observed in later field seasons were due to ecological changes, they should have caused a reduced efficiency across individuals, and to a similar degree. We did not observe this result, as large reductions in efficiency were confined to two individuals. Moreover, for Yo (the individual who exhibited the largest reduction in efficiency) we found some additional evidence that changes in oil-palm-nut cracking efficiency extended beyond the period we sampled, i.e. they were evident even in 2018, reflecting a long-term, directional reduction in efficiency as compared to earlier years of her life. This consistent reduction in tool-using efficiency over multiple years adds further weight to the hypothesis that changes at the level of the individual were causing reduced tool-using efficiency, rather than our results being underpinned by interseasonal variation in ecology.

      Whilst we agree that our study is limited in the extent to which we can analytically assess ecological explanations for changes in nut-cracking efficiency, we believe that hypothetical ecological changes across field seasons do not predict our results. We now raise both sides of this debate in our discussion, where we outline our limitations (see lines 535-593).

      (2) The data from 2011 was scarce, with only one individual having 10 encounters. It would be better to be cautious with this season's results. 

      We appreciate this limitation raised by the reviewer. Velu and Yo were only encountered a few times in 2011; however, both were encountered more frequently in 2016. For 2011, we did not collect oil-palm nut cracking data for either Yo or Velu. Thus, their change in efficiency was detected by models using data from all other years, regardless of the few encounters in 2011. This sparsity of data may still have influenced our metrics for the proportion of time chimpanzees spent engaging in different behaviors when present at the outdoor laboratory in 2011, particularly for Velu, who was one of the two individuals who exhibited a change in behavior in this year (along with Fana, N = 10 for 2011). We have therefore added a line in our results and discussion highlighting the sparsity of data for Velu when estimating these proportions for 2011 (see lines 255-256 & 410).

      Minor corrections 

      (1) The last paragraph of the introduction presents many results, which should be in the results section. 

      We would like to keep this section of the introduction. Our paper investigates the effect of aging on many different aspects of nut cracking, which could become confusing for readers unless laid out clearly. We believe that having a short summary early on in the paper assists readers with following the methods and arguments presented within our paper.

      (2) The first section (Sampled data) of the results contains much information that belongs in the methods section. 

      We appreciate that there is some overlap between our methods and results sections. However, as the results section comes before the methods in our manuscript, we wanted to ensure that there is sufficient information in our results to allow them to be interpreted clearly by readers, and that the methods used to generate these results are transparently communicated. For these reasons, we will leave this information in the results, as we believe it increases our paper’s readability.

      Reviewer 2 (Public review):

      One of the main limitations of this study is the small sample size. There are only 5 of the old-aged individuals, which is not enough to draw any inferences about aging for chimpanzees more generally. Howard-Spink and colleagues also study data from only five of the 17 years of recorded data at Bossou. The selection of this subset of data requires clarification: why were these intervals chosen, why this number of data points, and how do we know that it provides a representative picture of the age-related changes of the full 17 years? 

      We note that our sample size is limited to 5 individuals. This is an inevitable constraint of analyzing aging longitudinally in long-lived species, as only a few individuals will live to old age. We argue that 17 years is a long enough period of study, as in the initially sampled field season (1999) focal individuals were reaching a mature age of adulthood (39-44 years) and then aged progressively up to ages that are typically considered to be on the extreme side for chimpanzees’ lifespans in the wild (56-61 years). We raise in our methods that whilst it is difficult to determine precisely when chimpanzees become ‘old aged’, previous studies use the age of around 40 years, as from this age survivorship begins to decrease more rapidly (see Wood et al., Science 2023). Indeed, one focal individual (Tua) disappeared during the period of our study (presumed dead), and one other individual died in 2017 (Velu), the year after our final sampled field season. As of 2025, two other focal females have since died, and only one focal individual is still alive at Bossou (Jire, the individual exhibiting the least evidence for senescence over our study period). These observations suggest that we successfully captured data from chimpanzees during the oldest ages of their lives for most individuals in the community. Moreover, the period of 1999-2016 contains the majority of data available within the Bossou Archive, with years before and after this window containing comparatively less data. This information is included within our results and methods (see sections 2.1 and 4.1).

      For our earliest field season (1999), it is unlikely that senescence had already had an effect on stone-tool use, as we measured efficiency to be high across all efficiency metrics for all individuals. For example, in 1999, the median number of hammer strikes performed by focal chimpanzees ranged from 2-4 strikes, and this was comparable to the efficiency reported across all adults observed in previous studies at Bossou (Biro et al. 2003, Anim. Cogn.). This finding suggests that senescence effects had not yet taken place, allowing us to evaluate whether aging affects efficiency over subsequent field seasons. This point is now included in the manuscript on lines 449-452.

      We sampled at 4-to-5-year intervals to balance the time-intensive nature of fine-scale behavior coding against the need to sample data across the extended 17-year time window available in our study. We limited the final year to 2016 as, in following years, data were collected using different sampling protocols (though, see limited data from 2018 in the supplementary materials). We aimed to keep the intervals between years as consistent as possible (approx. 4 years); however, for some years data were not collected at Bossou, due to disease outbreaks in the region. In these instances, we selected the closest field season where suitable data were available for study (always +/- 1 year). We have provided further clarification surrounding our sampling regime in the methods (see amendments in section 4.1).

      With measuring and interpreting the 'efficiency' of behaviors, there are in-built assumptions about the goals of the agents and how we can define efficiency. First, it may be that efficiency is not an intentional goal for nut-cracking at all, but rather, e.g., productivity as far as the number of uncrushed kernels (cf. Putt 2015). Second, what is 'efficient' for the human observer might not be efficient for the chimpanzee who is performing the behavior. More instances of tool-switching may be considered inefficient, but it might also be a valid strategy for extracting more from the nuts, etc. Understanding the goals of chimpanzees may be a difficult proposition, but these are uncertainties that must be kept in mind when interpreting and discussing 'decline' or any change in technological behaviors over time.

      We agree that knowing precisely how chimpanzees perceive their own efficiency during tool use is unlikely to be available through observation alone. However, under optimal foraging theory, it is reasonable to assume that animals aim to economize foraging behaviors such that they maximize their rate of energy intake. Moreover, a wealth of studies demonstrate that adult chimpanzees acquire and refine tool-using skill efficiency throughout their lives. For example, during nut cracking, adults often select tools with specific properties that aid efficient nut cracking (Braun et al. 2025, J. Hum. Evol.; Carvalho et al. 2008, J. Hum. Evol.; Sirianni et al. 2015, Anim. Behav.); perform nut cracking using more streamlined combinations of actions than less experienced individuals (Howard-Spink et al. 2024, PeerJ; Inoue-Nakamura & Matsuzawa 1997, J. Comp. Psychol.); and as a result end up cracking nuts using fewer hammer strikes, indicating a higher level of skill (Biro et al. 2003, Anim. Cogn.; Boesch et al. 2019, Sci. Rep.). Ultimately, these factors suggest that across adulthood, experienced chimpanzees perform nut cracking with a level of efficiency which exceeds that of novice individuals, including across the whole behavioral sequence for tool use, even if they are not aware of doing so or do not intend to do so. Previous studies at Bossou have also highlighted that there are stable inter-individual differences in efficiency over time (Berdugo et al. 2024, Nat. Hum. Behav.). This pattern of findings allows us to ask whether this acquired level of skill is stable across the oldest years of an individual’s life, or whether some individuals experience decreased efficiency with age. In addition, our selection of efficiency metrics is in keeping with a wealth of studies which examine the efficiency of stone-tool use in apes; thus, we argue that this is not problematic for our study.

      As we stated in our initial responses to reviewers, it is unlikely that tool switching is a valid strategy for tool use, as it is so rarely performed by proficient adult nut crackers (including earlier in life for our focal individuals). Nevertheless, we did not find a significant change in tool switching for oil-palm nut cracking, and this behavioral change was only observed when Yo was cracking coula nuts. As we have now moved discussion of coula nut cracking to the supplementary materials (and tempered discussion of coula nut cracking to emphasize the need for more data) this behavioral variable does not influence our reported results. 

      In our discussion, we also highlight how seemingly less efficient actions may reflect a valid strategy for nut cracking. E.g. a greater number of tool strikes may reflect a strategy of compensation for progressive tool wear. This would still reflect a reduced efficiency (e.g. in terms of the rate at which kernels can be consumed), but may perhaps be born of the necessity to accommodate changes in an individual’s physical affordances with aging. Thus, we do take the Reviewer’s point into account, but by using an alternative, more likely, example given the available data. We have now emphasized this point in lines 521-527.

      We have also clarified these matters by adding more information into our methods (see lines 798-802 and 828-829), highlighting that we take a perspective on efficiency that reflects the speed of nut processing and kernel consumption, and the number of different behavioral elements required to do so. Our phrasing now explicitly avoids using language that assumes that individuals have some perception of their own efficiency during tool use.

      For the study of the physiological impact of senescence of tool use (i.e., on strength and coordination), the study would benefit from the inclusion of variables like grip type and (approximate) stone size (Neufuss et al., 2016). The size and shape of stones for nut-cracking have been shown to influence the efficacy and 'efficiency' of tool use (i.e., the same metrics of 'efficiency' implemented by Howard-Spink et al. in the current study), meaning raw material properties are a potential confound that the authors have not evaluated. 

      We did not collect this data as part of our study. Whilst grip type could be a useful variable to measure for future studies, it is not necessary to demonstrate senescence per se. However, we agree that this could be a fruitful avenue to understand changes in behavior at greater granularity, and have added this as a recommendation for further study. We also now provide a discussion on stone dimensions and materials as part of our limitations (see lines 581-589 for both points).

      Similarly, inter- and intraspecific variation in the properties of nuts being processed is another confound (Falótico et al., 2022; Proffitt et al., 2022;). If oil palm nuts were varying year-to-year, for example, this would theoretically have an effect on the behavioral forms and strategies employed by the chimpanzees, and thus, any metric of efficiency being collected and analyzed. Further, it is perplexing that the authors analyze only one year where the coula nuts were provided at the test site, but these were provided during multiple field seasons. It would be more useful to compare data from a similar number of field seasons with both species if we are to study age-related changes in nut processing over time (one season of coula nut-cracking certainly does not achieve this). 

      We have moved all discussion of coula nuts to the supplementary materials so as to avoid any confusion with oil-palm nuts (see comments from Reviewer 2, and our response). Nut hardness may influence the difficulty with which nuts are cracked, with one of the most likely factors influencing nut hardness being its age: young nuts are relatively harder to crack, whereas older nuts, which are often worm-eaten or can be empty, crack more easily, yet are not worth cracking (Sakura & Matsuzawa, 1991; Ethology). We largely controlled for this in our study, as the nuts provided at outdoor laboratories were inspected to ensure that the majority of them were of suitable maturity for cracking, and we now clarify this control in our methods (see lines 678-680) and when discussing our study limitations (see lines 551-558). In these sections, we also highlight a previous study at Bossou that shows chimpanzees select nuts which can be readily cracked, based on their age (Sakura & Matsuzawa, 1991; Ethology).

      We acknowledge that we are limited in the extent to which we can control for interannual variation in ecology with our available data. However, we highlight why interannual variability is unlikely to fully explain our results (see lines 551-580 and response to comments from Reviewer 1). We also highlight in our limitations section that future studies should (where possible) aim to collect more ecological data to account for possible confounds more rigorously.

      Both individual personality (especially neophilia versus neophobia; e.g., Forss & Willems, 2022) and motivation factors (Tennie & Call, 2023) are further confounds that can contribute to a more valid interpretation of the patterns found. To draw any conclusions about age-related changes in diet and food preferences, we would need to have data on the overall food intake/preferences of the individuals and the food availability in the home range. The authors refer briefly to this limitation, but the implications for the interpretation of the data are not sufficiently underlined (e.g., for the relevance of age-related decline in stone tool-use ability for individual survival). 

      In our discussion, we highlight that multiple aging factors may influence apes’ dietary preferences and motivations to attend experimental (and perhaps also naturally-occurring) nut cracking sites (see lines 397-443 and 542-550). We do not believe that neophobia is a likely driver underlying our results, given that the outdoor laboratory has been used to collect data for many decades, including over a decade prior to the first field season in which data were sampled for our study (now highlighted in lines 692-694). In addition, previous studies at Bossou have determined that the outdoor laboratory is visited with comparable frequency to naturally-occurring nut cracking sites, which makes any form of novelty bias unlikely (this information is now included in our methods, see lines 397-400, and also 687-689).

      We agree that further information is required about foraging behaviours across the home range to understand changes in attendance at the outdoor laboratory, and have now provided more clarity on this within the limitations section of our discussion (see lines 542-550). In our discussion of individual survivability, we state clearly that we cannot make a conclusion about how changes in tool use influence survival with the available data, and assert that this would require data across the home range (see lines 627-638). We agree that future research is needed to assess whether changes in tool use would influence survivability, and also suggest that it may not be survival-relevant; instead changes in tool use with aging may simply be a litmus test for detecting more generalized senescence.

      Generally speaking, there is a lack of consideration for temporal variation in ecological factors. As a control for these, Howard-Spink and colleagues have examined behavioral data for younger individuals from Bossou in the same years, to ostensibly show that patterns in older adults are different from patterns in younger adults, which is fair given the available data. Nonetheless, they seem to focus mostly on the start and end points and not patterns that occur in between. For example, there is a curious drop in attendance rate for all individuals in the 2008 season, the implications of which are not discussed by the authors. 

      As the reviewer points out, when examining the attendance rates of older individuals over sampled field seasons, we used the attendance rates of younger individuals as a control. However, we do not run this analysis using start and end points only. Attendance rates were included in our model across the full range of sampled field seasons. However, as the key result here is an interaction term between age cohort (old) and the field season (scaled about the mean), we supplement this significant statistical result with a digestible comparison of attendance rates between the first and last field season, to give a general sense of effect size. We have clarified that all data were used in our model (see line 229, and also the legend for Table 2), and in this section we also provide all key model outputs and signpost where the full model output can be found in the supplementary materials.

      As far as attendance, Howard-Spink and colleagues also discuss how this might be explained by changes in social standing in later life (i.e., chimpanzees move to the fringes of the social network and become less likely to visit gathering sites). This is not senescence in the sense of physiological and cognitive decline with older age. Instead, the reduced attendance due to changes in social standing seems rather to exacerbate signs of aging rather than be an indicator of it itself. The authors also mention a flu-like epidemic that caused the death of 5 individuals; the subsequent population decline and related changes in demography also warrant more discussion and characterization in the manuscript. 

      We have adapted this part of the discussion to make it clear that social aging is not necessarily equivalent to physiological and cognitive aging. We have also clarified in this section the changes in demography at Bossou during our study, which may have further impacted social behaviors (see lines 423-443). 

      Understandably, some of these issues cannot be evaluated or corrected with the presented dataset. Nonetheless, these undermine how certain and/or deterministic their conclusions can really be considered. Howard-Spink et al. have not strongly 'demonstrated' the validity of relationships between the variables of the study. If anything, their cursory observations provide us with methods to apply and hypotheses to test in future studies. It is likely that with higher-resolution datasets, the individual variability in age-related decline in tool-use abilities will be replicated. For now, this can be considered a starting point, which will hopefully inspire future attempts to research these questions. 

      We thank the reviewer for their comments. We have adapted our manuscript to highlight that we agree that it serves as a starting point for answering these valuable questions; however, we do feel that we contribute meaningful evidence that aging effects likely underlie the findings in our data (see responses above). We agree with the reviewer that further study is needed to understand these questions in more detail, and have tried to ensure that our conclusions are suitably tempered and that further research building on our findings is strongly encouraged.

      Falótico, T., Valença, T., Verderane, M. & Fogaça, M. D. Stone tools differences across three capuchin monkey populations: food's physical properties, ecology, and culture. Sci. Rep. 12, 14365 (2022). 

      This has now been cited.

      Forss, S. & Willems, E. The curious case of great ape curiosity and how it is shaped by sociality. Ethology 128, 552-563 (2022). 

      We do not cite this – see above.

      Neufuss, J., Humle, T., Cremaschi, A. & Kivell, T. L. Nut-cracking behaviour in wild-born, rehabilitated bonobos (Pan paniscus): a comprehensive study of hand-preference, hand grips and efficiency. Am. J. Primatol. 79, e22589 (2016). 

      This has now been cited.

      Proffitt, T., Reeves, J. S., Pacome, S. S. & Luncz, L. V. Identifying functional and regional differences in chimpanzee stone tool technology. R. Soc. Open Sci. 9, 220826 (2022). 

      This has now been cited.

      Putt, S. S. The origins of stone tool reduction and the transition to knapping: An experimental approach. J. Archaeol. Sci.: Rep. 2, 51-60 (2015). 

      We do not cite this, as we instead cite studies which highlight chimpanzees’ ability to become more efficient in tool use with repeated practice (see above). 

      Tennie, C. & Call, J. Unmotivated subjects cannot provide interpretable data and tasks with sensitive learning periods require appropriately aged subjects: A Commentary on Koops et al. (2022) "Field experiments find no evidence that chimpanzee nut cracking can be independently innovated". ABC 10, 89-94 (2023). 

      We do not cite this – see above.

      Reviewer #2 (Recommendations for the authors):

      Minor Comments: 

      (1) Line 494: Citation #53 is listed twice. 

      This has been amended.

      (2) Line 501: The term 'culturally-dependent' as used here is, at best, controversial, and at worst, misapplied. I would recommend replacing it with simply the term 'cultural'. 

      This has been changed to ‘cultural’.

      Major Comments: 

      For the Introduction, in the paragraph starting on Line 91, and the Discussion, starting on Line 369, I would recommend some simple re-structuring of the argumentation. As mentioned in the Public Review, the changes in social standing according to age are not necessarily a case of senescence in the very sense of physiological or cognitive changes of the individual. This seems to have had an effect on attendance rates, which then could have been a driver of behavioral changes and even cognitive decline as ostensibly measured by the other variables. The social impact of aging should be mentioned in the Introduction (it is not currently) and the social and physiological/cognitive effects of aging should be separated in the Discussion. You can then discuss more clearly how the former via other behavioral changes can accelerate the latter (or not).

      We take the point raised about social aging. Integrating information about social aging into the introduction was challenging without disrupting the flow of the paper; however, we have included these valuable points in the discussion (see lines 423-443). We now structure this section to clearly distinguish social aging, and discuss how, in tandem with changes in demography at Bossou, it may have influenced rates of attendance to the outdoor laboratory over the years. We do not go into detail about how social aging may interact with physiological or cognitive effects of aging, as we cannot support this with the available data; however, we highlight at the end of this paragraph how all of these possible factors require further investigation.

      For the present study, it will either be impossible or impractical to gather data on the yearly ecological conditions, contextualized dietary preferences, individual personalities, etc., so I would not ask that you do so. It is important, however, to temper some of the claims being made in the manuscript about what you have 'determined' about the nature of senescence in chimpanzees and to be more transparent about the limitations and potential confounds when interpreting the data. To avoid repetition, the key points can be found in the Public Review under 'Weaknesses'. 

      We appreciate the reviewer’s understanding of the limitations of our study. Some of these factors – such as individual personalities and dietary preferences – are addressed somewhat by our use of long-term data at the level of the individual, particularly in the analyses of efficiency, where modeling individuals’ behaviors relative to those in earlier years offers an individually bespoke control. However, there are other ecological variables of possible importance that we cannot evaluate. We now address several of these points raised by reviewers in the discussion, to ensure transparency of reporting (see the limitations section of our discussion, our responses to the comments provided by Reviewer 1, and our responses to points raised in the Public Review). We have also tempered some of the phrasing surrounding our conclusions: where we say that this is the first evidence that aging can impact chimpanzee tool use, we also highlight the need for an assortment of further studies.

      Finally, the integration of the coula nut-cracking data is not well-executed as it stands. I would recommend that they collect and analyze equivalent behavioral data from the other years where coula nuts were provided. By examining only one season of coula nut-cracking, we cannot contextualize the data to past seasons; there is no sense in comparing one season of coula nut-cracking (i.e., in a sense of efficiency) to roughly contemporary seasons of palm-nut cracking due to, as you describe, differences in physical properties of the nuts. If you are not able to collect the additional data and carry out the requisite analysis, then I would recommend that the coula nut-related sections be removed from the manuscript, so that it does not detract from the logical flow of arguments and distract from the other data, which is more logically-attuned to your research questions. 

      We have removed this from the main manuscript. We have decided to include the information surrounding coula nut cracking in the supplementary materials, as this information is still relevant to the findings of our study, and may interest some readers. However, we have phrased this information to make it clear that further data is needed to compare coula nut cracking across years.

      These criticisms do not subtract from the (potential) value or importance of the work for the field. This is, of course, an important contribution to an understudied topic. As such, I would gladly advocate for the manuscript, assuming the authors would reflect on the listed caveats and make changes in response to the 'Major Comments'. 

      We thank the reviewer for their comments.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      (1) Commander-Independent Role of COMMD3: While the authors provided evidence to support the Commander-independent role of COMMD3 (such as the absence of other Commander subunits in the CRISPR screen and the lack of decreased COMMD3 levels in other subunit-KO cells), direct evidence is lacking. The mutation that specifically disrupts the COMMD3-ARF1 interaction could serve as a valuable tool to directly address this question.

      The Reviewer raised an excellent point. We fully agree with the Reviewer that multiple lines of evidence are needed to support the novel Commander-independent function of COMMD3.

      Comparative genetic analyses in Figures 4 and 5 indicate that COMMD3 regulates endosomal retrieval independently of the Commander complex. In Figure 8 of the revised manuscript, we show that point mutations introduced into the COMMD3:ARF1 interface impair this Commander-independent function. Moreover, Figure 6 demonstrates that ARF1 upregulation fully rescues the KO phenotype of COMMD3. In addition, Figure S2 further supports that COMMD3 levels, but not those of other Commander subunits, correspond to its Commander-independent function in endosomal trafficking. We have also revised the Discussion section to elaborate on the implications of these findings. We appreciate the Reviewer’s advice.

      (2) Role of ARF1 in Cargo Selection: The Commander-independent function of COMMD3 appears cargo-dependent and relies on ARF1's role in cargo selection. The authors should investigate whether KO/KD of ARF1 reduces cell surface levels of ITGA6 and TfR.

      The Reviewer correctly pointed out that KO/KD of ARF1 may provide further insights into the Commander-independent function of COMMD3. However, since ARF1 is involved in cargo sorting at both the endosome and the trans-Golgi network, its KO would disrupt multiple trafficking routes, making the data difficult to interpret. Instead, we focused on point mutations in the NTD that specifically disrupt ARF1 binding without affecting the function of the Commander complex (Fig. 8). As these mutations impair the Commander-independent function of COMMD3, our data strongly support a direct role for ARF1 in this recycling pathway. We note that the discovery of a novel trafficking pathway inevitably opens many research directions. One such direction is to systematically identify cargoes that rely on COMMD3 but not the Commander complex for endosomal retrieval.

      (3) Impact on TfR Stability: Figure 7D suggests that TfR protein levels are reduced in COMMD3-KO cells, potentially due to degradation caused by disrupted recycling. This raises the question of whether the observed reduction in cell surface TfR is due to impaired endosomal recycling or decreased total protein levels. The authors should quantify the ratio of cell surface protein to total protein for TfR, GLUT-SPR, and ITGA6 in COMMD3-KO cells.

      Based on the Reviewer's suggestion, we quantified both the total levels and the surface-to-total ratio of TfR, as shown in Figure S1 of the revised manuscript. These new data further support the conclusion that defects in TfR retrieval lead to its lysosomal degradation. The GLUT-SPR data presented in the main figures represent the surface-to-total ratio of the GLUT-SPR reporter. We thank the Reviewer for the important suggestion.

      Reviewer #1 (Recommendations for the authors):

      (1) Commander-Independent Role of COMMD3: The mutation that specifically disrupts the COMMD3-ARF1 interaction could serve as a valuable tool to directly address this question. The authors should evaluate whether the full-length mutant of COMMD3 can rescue decreased levels of CCDC93 and VPS35L, as well as cell surface ITGA6, TfR, and GLUT4 in COMMD3-KO cells.

      This is an excellent point. In our mechanistic experiments, we focused on the NTD of COMMD3 because this domain mediates its Commander-independent function and is not involved in forming the Commander holo-complex. This approach allowed us to draw unambiguous conclusions. Nevertheless, we anticipate that full-length COMMD3 carrying these point mutations would also be defective in regulating Commander-independent cargo.

      (2) Role of ARF1 in Cargo Selection: The authors should investigate whether KO/KD of ARF1 reduces cell surface levels of ITGA6 and TfR. Was ARF1 identified in the initial CRISPR screen? If so, this should be explicitly noted. Alternatively, does ARF1 overexpression rescue ITGA6 levels in COMMD3-KO cells? Furthermore, does ARF1 overexpression rescue TfR levels in COMMD3 and CCDC93 double-KO cells?

      The Reviewer correctly pointed out that KO/KD of ARF1 may provide further insights into the Commander-independent function of COMMD3. However, since ARF1 is involved in cargo sorting at both the endosome and the trans-Golgi network, its KO would disrupt multiple trafficking routes, making the data difficult to interpret. Instead, we focused on point mutations that specifically disrupt ARF1 binding without affecting the function of the Commander complex (Fig. 8). Since these mutations impair the Commander-independent function of COMMD3, our data strongly support a direct role for ARF1 in this novel recycling pathway. Based on our genetic data, we anticipate that all COMMD3-dependent cargoes will be similarly rescued in ARF1-overexpressing cells. In line with the Reviewer's comment, a key research direction we are currently pursuing is systematically determining how surface protein levels are affected by COMMD3 KO and ARF1 overexpression using surface proteomics.

      (3) Inconsistency in COMMD3 Rescue Levels (Figure 5A): Figure 5A shows comparable or higher levels of COMMD3 in rescued cells than in CCDC93-KO and VPS35L-KO cells. However, COMMD3 rescue did not increase cell surface TfR as much as in CCDC93-KO and VPS35L-KO cells. This inconsistency should be discussed or validated.

      To address the Reviewer’s inquiry, we quantified COMMD3 expression levels in these cell lines using multiple independent experiments. The new data are presented in Figure S2 of the revised manuscript. These expanded datasets allowed us to more accurately determine the relationship between COMMD3 expression and our genetic data. Since the Commander complex remains intact in the COMMD3 rescue cells, a significant portion of COMMD3 proteins are expected to be incorporated into the Commander complex, which does not regulate TfR recycling. In contrast, because the Commander complex is disrupted in Ccdc93 and Vps35l KO cells, all COMMD3 proteins are available to regulate TfR recycling in a Commander-independent manner. These findings are fully consistent with the similar surface TfR levels observed in Ccdc93/Vps35l KO cells and COMMD3 overexpressing cells. We thank the Reviewer for this excellent suggestion.

      (4) Significance of NTD in COMMD3 Function: The conclusion that "the NTD of COMMD3 mediates its Commander-independent function and interacts with ARF1" (Page 12) is not fully supported without a side-by-side comparison of NTD, CTD, and FL COMMD3 in the same experiment (e.g., Figures 6B and 6G). Additional data is needed to strengthen this claim.

      We conducted the experiment suggested by the Reviewer and included the data in Figure S3. Our results indicate that the COMMD3 CTD cannot mediate the Commander-independent function of COMMD3 in endosomal retrieval. We appreciate the Reviewer’s suggestion.

      (5) ARF1 Stabilization Experiments: To substantiate the claim that COMMD3 binds and stabilizes the GTP-form of ARF1, the authors should include a comparative experiment showing GTP-form, GDP-form, and wild-type ARF1 (e.g., Figures 6G and 7C).

      We fully agree with the Reviewer that it would be important to compare how the ARF1:COMMD3 interaction is influenced by the nucleotide-binding state. However, trapping ARF1 in its GDP-bound state remains unfeasible, and nucleotide-free small GTPases are inherently unstable. In addition, WT ARF1 likely exists as a mixture of GTP- and GDP-bound forms, further complicating the analysis. To address the Reviewer’s comment, we used AlphaFold3 predictions. Interestingly, we found that the ipTM score of GTP-ARF1:COMMD3 is significantly higher than that of GDP-ARF1:COMMD3 or apo-ARF1:COMMD3, supporting our conclusion that COMMD3 recognizes and stabilizes the active form of ARF1.

      (6) Validation of NTD Mutation (Figure 8): Co-immunoprecipitation or cellular co-localization experiments should be performed to confirm that the NTD mutation disrupts the interaction between COMMD3 and ARF1, as depicted in Figure 8.

      This is an important question, and the best approach to address it would be to measure the binding affinity of the WT and mutant proteins using ITC or SPR. However, this is currently unfeasible, as we have not yet obtained pure recombinant COMMD3 and GTP-ARF1 proteins. Co-IP, by nature, is a crude assay that often fails to detect changes in binding affinity. A previous study on other proteins showed that mutations in protein-binding interfaces strongly reduced binding affinity as measured by SPR, but these changes would have been missed by co-IP assays (PMID: 25500532). In agreement with this limitation, our co-IP experiments did not yield conclusive results. Instead, we focused on structure-guided genetic experiments, which unequivocally demonstrated the effects of targeted mutations on the Commander-independent function of COMMD3. 

      Reviewer #2 (Public review):

      (1) All existing data suggest that COMMD3 is a subunit of the Commander complex. Is there any evidence that COMMD3 can exist as a monomer?

      The Reviewer raised an intriguing point. Indeed, COMMD proteins, including COMMD3, can exist outside the Commander holo-complex and form homo- or hetero-oligomers, as monomeric COMMD proteins are likely unstable. These observations align well with the Commander-independent function identified in this study. We have revised the Discussion section of the manuscript to further elaborate on this point and thank the Reviewer for the suggestion.

      (2) In Figure 9, the author emphasizes COMMD3-dependent cargo and Commander-dependent cargo. Can the authors speculate what distinguishes these two types of cargo? Do they contain sequence-specific motifs?

      This is another important question. Our data clearly demonstrate that COMMD3 has a Commander-independent function in addition to its canonical role within the Commander holo-complex. Since cargo proteins typically possess multiple sorting signals that operate at different stages of the exocytic and endocytic pathways, identifying COMMD3-dependent sorting signals remains a challenge. ARF4 has been shown to specifically recognize the VXPX motif (PMID: 15728366), suggesting that ARF1 may similarly bind cytosolic sorting signals, with COMMD3 stabilizing this interaction. A key future direction is to systematically identify COMMD3-dependent cargo proteins and elucidate the mechanisms underlying their endosomal sorting. We have revised the Discussion section of the manuscript to explicitly address this point and thank the Reviewer for this important suggestion.

      (3) What could be the possible mechanism underlying the observation that the knockout of COMMD3 results in larger early endosomes? How is the disruption of cargo retrieval related to the increase in endosome size?

      The endosomal retrieval process is critical for recycling membrane proteins and lipids back to the plasma membrane or the trans-Golgi network. When this process is disrupted, cargo that should be recycled accumulates within endosomes, leading to their enlargement. For example, defects in retromer function can cause endosomal swelling due to cargo accumulation (PMID: 33380435). We added this citation to the revised manuscript and thank the Reviewer for the advice. 

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 4: How do the authors define Commander-dependent vs. Commander-independent cargos?

      In Figure 4, the surface expression of ITGA6 is reduced to approximately 0.75 across all knockouts. However, there is a similar level of reduction for GLUT4-SPR in the commd5 knockout and for LAMP1 in the commd5 and commd1 knockouts. Are GLUT4-SPR and LAMP1 Commander-dependent or Commander-independent cargos? Additionally, how does COMMD3 specifically identify/distinguish these cargos?

      This is an excellent point. Our data suggest that TfR is a COMMD3-dependent but Commander-independent cargo, whereas ITGA6 is a Commander-dependent cargo that does not involve COMMD3-specific functions. The other two cargoes we examined—GLUT-SPR and LAMP1—primarily rely on COMMD3, with the Commander complex playing a minor role. Together, these observations clearly demonstrate that COMMD3 has a Commander-independent function in addition to its canonical role within the Commander holo-complex. Since cargo proteins typically possess multiple sorting signals that operate at different stages of the exocytic and endocytic pathways, identifying COMMD3-dependent sorting signals remains a challenge. ARF4 has been shown to specifically recognize the VXPX motif (PMID: 15728366), suggesting that ARF1 may similarly bind cytosolic sorting signals, with COMMD3 stabilizing this interaction. A key future direction is to systematically identify COMMD3-dependent cargo proteins and elucidate the mechanisms underlying their endosomal sorting. We have revised the Discussion section of the manuscript to explicitly address this point. We thank the Reviewer for this important suggestion.

      (2) There is an increase in the surface expression of GLUT4-SPR in the commd1 knockout. Is this increase significant? The figure suggests a significant increase, but the text states it remains unchanged. Clarification is needed.

      We found that surface levels of GLUT-SPR were slightly increased in Commd1 KO cells, in stark contrast to the strong reduction observed in Commd3 KO cells (Fig. 4B). This finding is consistent with our conclusion that COMMD3 has a distinct role from other Commander subunits. We have revised the Results section to more clearly describe these data and thank the Reviewer for the advice.

      (3) Figure 5A: To support the claim that COMMD3 is upregulated in the vps35l KO/Ccdc93 KO, the authors should quantify COMMD3 expression. Also, why is there a Vps35l band present in the Vps35l knockout cells?

      Based on the Reviewer’s suggestion, we quantified the total levels of COMMD3 and included these new data in Figure S2. In this study, gene deletion was achieved through the simultaneous introduction of two independent gRNAs. Based on our previous experience, this strategy typically results in the complete loss of gene expression. We posit that the residual band observed in Vps35l KO cells originates from background signals, such as nonspecific staining by the antibody.

      (4) Figure 7: It is intriguing that COMMD3 stabilizes Arf1-GTP and can compensate for COMMD3 in knockout cells. However, is this stabilization specific to TfR cargo only? The authors should test additional Commander-dependent and Commander-independent cargos to clarify this point.

      Based on our genetic data, we anticipate that all COMMD3-dependent cargoes will be similarly rescued in ARF1-overexpressing cells. In line with the Reviewer's comment, an important direction we are pursuing is the use of surface proteomics to systematically determine how surface protein levels are affected by COMMD3 KO and ARF1 overexpression.

      (5) Is Arf1 interaction specific to COMMD3? The authors should investigate the effects of Arf1 knockout on COMMD3 expression and test its role in regulating Commander-dependent and Commander-independent cargos.

      The Reviewer raised an excellent point. Since ARF1 is involved in cargo sorting at both the endosome and the trans-Golgi network, its KO would interfere with multiple trafficking routes and the data would be difficult to interpret. Thus, in this work, we focused on the function and mechanism of the COMMD3:ARF1 complex on the endosome. Based on the suggestion of the Reviewer, we used AlphaFold3 to predict ARF1 binding to COMMD proteins. Interestingly, the complex with the highest predicted ipTM score is COMMD3:ARF1, while other COMMD proteins have much lower predicted binding scores. These results are consistent with the results of our unbiased CRISPR screens and targeted gene KO, and further support the conclusion that the COMMD3:ARF1 binding is specific and physiologically important in endosomal trafficking.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The selection of inactivated conformations based on AlphaFold modeling seems a bit biased. The authors base their selection of the “most likely” inactivated conformation on the expected flipping of V625 and the constriction at G626 carbonyls. This follows a bit of the “Streetlight effect”. It would be better to have selection criteria that are independent of what they expect to find for the inactivated state conformations. Using cues that favour sampling/modeling of the inactivated conformation, such as the deactivated conformation of the VSD used in the modeling of the closed state, would be more convincing. There may be other conformations that are more accurately representing the inactivated state. I see no objective criteria that justify the non-consideration of conformations from cluster 3 of the inactivated state modeling. I am not sure whether pLDDT is a good selection criterion. It reports on structural confidence, but that may not relate to functional relevance.

      We sincerely thank the reviewer for their perceptive critique highlighting potential bias in selecting the inactivated conformation. We recognize that over-relying on preconceived traits could limit exploration of diverse inactivated states, and we appreciate the opportunity to address this concern.

      Although we selected the model with the flipped V625 in the selectivity filter (SF) from the first round of inactivated-state sampling as the template for the second round, the resulting models still exhibited substantial diversity in their SF conformations. This selection primarily served to steer predictions away from the open-state configuration observed in the PDB 5VA2 SF, and we have clarified this rationale in the Methodology section. To assess conformational variability, we examined backbone dihedral angles (phi φ and psi ψ) at key residues in the selectivity filter (S624 – G628) and drug-binding region on the pore-lining S6 segment (Y652, F656), of all 100 models sampled in the subsequent inactivated-state-sampling attempt. By overlaying the φ and ψ dihedral angles from different models, including the open state (PDB 5VA2-based), the closed state, and representative models from AlphaFold inactivated-state-sampling Cluster 2 and Cluster 3, we found that these conformations consistently fall within or near high-probability regions of the dihedral angle distributions. This indicates that these structural states are well represented within the ensemble of conformations sampled by AlphaFold within the scope of this study, particularly at functionally critical positions.
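
      For readers unfamiliar with the dihedral analysis described above, the following minimal sketch shows how a signed dihedral angle can be computed from four atomic positions. For a backbone φ angle the four atoms are C(i-1), N(i), CA(i), C(i); for ψ they are N(i), CA(i), C(i), N(i+1). The function name and the toy coordinates are illustrative assumptions, and this is not the analysis code used in the study.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _scale(a, k):
    return (a[0] * k, a[1] * k, a[2] * k)

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (not from any hERG model): a quarter turn about the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

      Collecting these angles for each residue of interest across all sampled models gives the per-residue distributions onto which individual models can be overlaid.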

      Following the analysis above and consistent with the reviewer’s suggestion, we evaluated the top representative model from inactivated-state-sampling Cluster 3 (named “AF ic3”), which we had initially excluded. This model demonstrated SF residue G626 carbonyl oxygen flipped away from the conduction pathway, hinting at potential impact on ion conduction, yet its pore region structurally resembled the open state (Figure S9a, b). To test this objectively, we ran molecular dynamics (MD) simulations (2 runs, 1 μs long each, with applied 750 mV voltage) with varied initial ion/water configurations in the SF, finding it consistently open and conducting throughout (Figure S9c, d), consistent with our previous observations in Figure S11 that ion conduction can still occur when the upper SF is dilated. Drug docking (Figure S12) further revealed that the model exhibited binding affinities similar to those for the PDB 5VA2-based open-state structure. These findings combined led us to classify it as a possible alternative open-state conformation.

      Models from Cluster 4 were not tested due to extensive steric clashes, where residues in the SF overlapped with neighboring residues from adjacent subunits. The remaining models displayed SF conformations that combined features from earlier clusters. However, due to subunit-to-subunit variability, where individual subunits adopted differing conformations, they were classified as outliers. This combination of features may be valuable to investigate further in a follow-up study.

      We acknowledge that our approach is just one of many ways to sample different states, and alternative strategies, such as generating more models, varying multiple sequence alignment (MSA) subsampling, or testing different templates, might reveal improved models. Given that hERG channel inactivation likely spans a spectrum of conformations, our resource limitations may have restricted us to exploring and validating only part of this diversity. Nevertheless, the putative inactivated (AlphaFold Cluster 2) model’s non-conductivity and improved affinity for drugs targeting the inactivated state observed in our study suggest that this approach may be capturing relevant features of the inactivated-state conformation. We look forward to investigating other possibilities more deeply in a future study and are grateful for the reviewer’s feedback.

      (2) The comparison of predicted and experimentally measured binding affinities lacks an appropriate control. Using binding data from open-state conformations only is not the best control. A much better control is the use of alternative structures predicted by AlphaFold for each state (e.g. from the outlier clusters or not considered clusters) in the docking and energy calculations. Using these docking results in the calculations would reveal whether the initially selected conformations (e.g. from cluster 2 for the inactivated state) are truly doing a better job in predicting binding affinities. Such a control would strengthen the overall findings significantly.

      We appreciate the reviewer’s insightful suggestion. To address this, we extended our analysis by incorporating an alternative AlphaFold2-predicted model from inactivated-state-sampling cluster 3 as a structural control. This model was established in a previously discussed analysis to be open and conducting as a follow up to comment #1, so we will call it Open (AF ic3) to differentiate it from Open (PDB 5VA2). We evaluated this new model in single-state and multi-state contexts alongside our original open-state model based on the experimental PDB 5VA2 structure. Additionally, we expanded the drug docking procedure to explore a broader region around the putative drug binding site by increasing the sampling space, and we adopted an improved approach for selecting representative docking poses to better capture relevant binding modes.

      Shown in Figure 7 are comparisons of experimental drug potencies with the binding affinities from the molecular docking calculations under the following conditions:

      (a) Single-state docking using the experimentally derived open-state structure (PDB 5VA2)

      (b) Multi-state docking incorporating open (PDB 5VA2), inactivated, and closed-state conformations weighted by experimentally observed state distributions

      (c) Single-state docking using an alternative AlphaFold-predicted open-state (inactivated-state-sampling cluster 3, AF ic3)

      (d) Multi-state docking combining the AlphaFold-predicted open state (inactivated-state-sampling cluster 3, AF ic3) with the inactivated- and closed-state conformations

      Using only the open-state model (PDB 5VA2) yielded a moderate correlation with experimental data (R<sup>2</sup> = 0.43, r = 0.66, Figure 7a). Incorporating multi-state binding (weighted by their experimental distributions) improved the correlation substantially (R<sup>2</sup> = 0.63, r = 0.79, Figure 7b), boosting predictive power by 47% and underscoring the value of multi-state modeling. Importantly, this improvement was achieved without considering potential drug-induced allosteric effects on the hERG channel conformation and gating, which will be addressed in future work.

      Next, we substituted the PDB 5VA2-based open-state model with the AF ic3 open-state model. Docking to this alternative model alone produced similar performance (R<sup>2</sup> = 0.44, r = 0.66, Figure 7c), and incorporating it into the multi-state ensemble further improved the correlation with experiments (R<sup>2</sup> = 0.64, r = 0.80, Figure 7d), representing a 45% gain in R<sup>2</sup> and matching the performance of multi-state docking results based on the PDB 5VA2-derived model.
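
      The percentage gains quoted above follow directly from the reported R<sup>2</sup> values. As a minimal arithmetic check (the function name `relative_gain` is hypothetical and used only for illustration), the relative improvement can be computed as (multi-state R<sup>2</sup> minus single-state R<sup>2</sup>) divided by the single-state R<sup>2</sup>:

```python
def relative_gain(r2_single: float, r2_multi: float) -> float:
    """Relative improvement in R^2, as a percentage of the single-state value."""
    return 100.0 * (r2_multi - r2_single) / r2_single


# R^2 values as reported for Figure 7a-d in the response text.
gain_pdb = relative_gain(0.43, 0.63)  # open state based on PDB 5VA2
gain_af = relative_gain(0.44, 0.64)   # open state from AF ic3

print(f"PDB 5VA2-based ensemble: {gain_pdb:.0f}% gain in R^2")
print(f"AF ic3-based ensemble:   {gain_af:.0f}% gain in R^2")
```

      Rounded to whole percentages, this reproduces the 47% and 45% figures given in the text.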

      These findings suggest that the predictive power of computational drug docking is enhanced not merely by the accuracy of individual models, but by the structural diversity and complementarity provided by an ensemble of protein conformations. Rather than relying solely on a single experimentally determined protein structure, the ensemble benefits from incorporating AlphaFold-predicted models that capture alternative conformations identified through our state-specific sampling approach. These diverse protein models reflect different structural features, which together offer a more comprehensive representation of the ion channel’s binding landscape and enhance the predictive performance of computational drug docking. Overall, these results reinforce that multi-state modeling offers a more realistic and predictive framework for understanding drug – ion channel interactions than traditional single-state approaches, emphasizing the value of both individual model evaluation and their collective integration. We are grateful for the reviewer’s suggestion.

      (3) Figures where multiple datapoints are compared across states generally lack assessment of the statistical significance of observed trends (e.g. Figure 3d).

      We appreciate the reviewer’s comment on the statistical significance assessment in Figure 3d. To clarify, the comparisons shown in the subpanels are based on three selected representative models for each state, rather than a broader population sample (similarly for Figure 3b). In the closed-state predicted models, the strong convergence of the voltage-sensing domain (VSD), with an all-atom RMSD of 0.36 Å between cluster 1 and 2 closed-state sampling models and 0.95 Å to the outlier cluster, indicates minimal structural variation. The RMSD values reported in the manuscript text demonstrate good convergence and by themselves serve as an assessment of the statistical significance of those models. This trend extends to open-state and inactivated-state AlphaFold models with similarly limited differences in the VSD regions among them. This convergence suggests that population-based statistical analysis may not reveal meaningful deviations, as the low variability among models limits the insights beyond those obtained from comparing representative structures.

      Nonetheless, we acknowledge this limitation. In future studies, we plan to explore alternative modeling approaches to introduce greater variability, enabling a more robust statistical evaluation of state-specific trends in the predictions.
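
      For readers less familiar with the RMSD metric used above, the following is a minimal sketch of how it is computed for two structures that have already been superimposed. The superposition step is omitted, and the coordinates are toy values, not taken from the models in this study:

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized, pre-aligned
    lists of (x, y, z) coordinates, in the same units as the input."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates (hypothetical): every atom is displaced by 0.36 Å
# along z, so the RMSD between the two sets is 0.36 Å.
model_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
model_2 = [(x, y, z + 0.36) for x, y, z in model_1]
print(round(rmsd(model_1, model_2), 2))
```

      In practice the two structures would first be optimally superimposed, and the sum would run over all atoms of the region being compared (here, the VSD).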

      (4) Figure 3 and Figures S1-S4 compare structural differences between states. However, these differences are inferred from the initial models. The collection of conformations generated via the MD runs allow for much more robust comparisons of structural differences.

      We have explored these conformational state dynamics through MD simulations for the Open (5VA2-based), Inactivated (AlphaFold Cluster 2), and Closed-state models, as presented in Figures S7, S8, S10, and S11. These figures provide detailed insights: Figures S7 and S8 analyze SF and pore conformation dynamics, including averaged pore radii with and without voltage and superimposed conformational ensembles; Figure S10 tracks cross-subunit distances between protein backbone carbonyl oxygens, revealing sequential SF dilation steps near residues F627 and G628; and Figure S11 illustrates this SF dilation process over time, highlighting residue F627 carbonyl flipping and SF expansion. We appreciate the opportunity to clarify our approach.

      Reviewer #2 (Recommendations for the authors):

      Major concerns:

      (1) Protein fragments are used to model the closed and inactivated states of hERG, but the choices of fragments are not well justified. For instance, in Figure 1a, helices from 8EP1 (deactivated voltage-sensing domain) and a helix+loop from 5VA2 (selectivity filter) are used. Why just the selectivity filter and not the cytosolic domain, for instance? Why not some parts of the helices attached to the selectivity filter, or the whole membrane inserted domain of 8EP1? Same for the inactivated conformation in Figure 1c: why the cytosolic domain only?

      We thank the reviewer for their thoughtful questions regarding our choice of protein fragments for modeling the closed and inactivated states of hERG in Figures 1a and 1c, and we appreciate the opportunity to justify these selections more clearly. Our approach to template selection was guided by our experience that providing AlphaFold2 with larger templates often leads it to overly constrain predictions to the input structure, reducing its flexibility to explore alternative conformations. In contrast, smaller, targeted fragments increase the likelihood that AlphaFold2 will incorporate the desired structural features while predicting the rest of the protein. We have provided a more detailed discussion of this in the methodology section.

      For the closed state (Figure 1a), we chose the deactivated voltage-sensing domain (VSD) from the rat EAG channel (PDB 8EP1) to inspire AlphaFold2 to predict a similarly deactivated VSD conformation characteristic of hERG channel closure, as this domain’s downward shift is a hallmark of potassium channel closure. We paired this with the selectivity filter (SF) and adjacent residues from the open-state hERG structure (PDB 5VA2) to maintain its conductive conformation, as it is generally understood that K<sup>+</sup> channel closure primarily involves the intracellular gate rather than significant SF distortion. Including additional helices (e.g., S5–S6) or the entire membrane domain from PDB 8EP1 risked biasing the model toward the EAG channel’s pore structure, which differs from hERG’s, while omitting the cytosolic domain ensured focus on the VSD-driven closure without over-constraining cytoplasmic domain interactions.

      For the inactivated state (Figure 1c), we initially used only the cytosolic domain from PDB 5VA2 to anchor the prediction while allowing AlphaFold2 to freely sample transmembrane domain conformations, particularly the SF, where inactivation occurs through distortion of the filter. Excluding the SF or attached helices at this stage avoided locking the model into the open-state SF, and the cytosolic domain alone provided a minimal scaffold to maintain hERG’s intracellular architecture without dictating pore dynamics. Following the initial prediction, we initiated more extensive sampling by using one of the predicted SFs that differs from the open-state SF (PDB 5VA2) as a structural seed, aiming to guide predictions away from the open-state configuration. The VSD and cytosolic domain were also included in this state to discourage pore closure during prediction. Using larger fragments, such as the full membrane-spanning domains or additional cytosolic regions from the open-state structure, might reduce AlphaFold2’s ability to deviate from the open-state conformation, undermining our goal of capturing more diverse, state-specific features.

      It is worth noting that multiple strategies could potentially achieve the predicted models in our study, and here we only present examples of the paths we took and validated. It is likely that many of the steps may be unnecessary and could be skipped, and future work building on our approach can further explore and streamline this process. A consistent theme underlies our choices: for the closed state, we know the VSD should adopt a deactivated (“down”) conformation, so we provide AlphaFold2 with a specific fragment to guide this outcome; for the inactivated state, we recognize that the SF must change to a non-conductive conformation, so we grant AlphaFold2 flexibility to explore diverse conformations by minimizing initial constraints on the transmembrane region.

      With greater sampling and computational resources, it is possible we could identify additional plausible, non-conductive conformations that might better represent an inactivated state, as hERG inactivation may encompass a spectrum of states. In this study, due to resource limitations, we focused on generating and validating a subset of conformations. Still, we acknowledge that broader exploration could further refine these models, which could be pursued in future studies. We updated the Methods and Discussion sections to reflect this perspective, and we are grateful for the reviewer’s input, which encourages us to clarify our rationale and highlight the adaptability of our approach.

      To demonstrate the broader feasibility of this approach, we applied it to another ion channel system, voltage-gated sodium channel Na<sub>V</sub>1.5, as illustrated in Figure S14. In this example, a deactivated VSD II from the cryo-EM structure of a homologous ion channel Na<sub>V</sub>1.7 (PDB 6N4R) (DOI: 10.1016/j.cell.2018.12.018), which was trapped in a deactivated state by a bound toxin, was used as a structural template. This guided AlphaFold to generate a Na<sub>V</sub>1.5 model in which all four voltage sensor domains (VSD I–IV) exhibit S4 helices in varying degrees of deactivation. Compared to the cryo-EM open-state Na<sub>V</sub>1.5 structure (PDB 6LQA) (DOI: 10.1002/anie.202102196), the predicted model displays a visibly narrower pore, representing a plausible closed state. This example underscores the versatility of our strategy in modeling alternative conformational states across diverse ion channels.

      (2) While the authors rely on AF2 (ColabFold) for the closed and inactivated states, they use Rosetta to model loops of the open state. Why not just supply 5VA2 as a template to ColabFold and rebuild the loops that way? Without clear explanations, these sorts of choices give the impression that the authors were looking for specific answers that they knew from their extensive knowledge of the hERG system. While the modeling done in this paper is very nice, its generalizability is not obvious.

      We appreciate the reviewer’s question about our use of Rosetta to model loops in the open-state hERG channel (PDB 5VA2) rather than rebuilding it entirely with ColabFold. In the study, we conducted a control experiment supplying parts of PDB 5VA2 to ColabFold to rebuild the loops, generating 100 models (Figure 2a: predicted open state). The top-ranked model (by pLDDT) differed from our Rosetta-modelled structure by only 0.5 Å RMSD, primarily due to the flexible extracellular loops as expected, with the pore and selectivity filter (our areas of focus) remaining nearly identical. We chose the Rosetta-refined cryo-EM structure as this structure and approach have been widely used as an open-state reference in our other hERG channel studies, such as by Miranda et al. (DOI: 10.1073/pnas.1909196117) and Yang et al. (DOI: 10.1161/CIRCRESAHA.119.316404), to ensure that our results are more directly comparable to prior work in the field. Nonetheless, as both models (with loops modeled by Rosetta or AlphaFold) were virtually identical, we would expect no significant differences if either were used to represent the open state in our study. We have incorporated this clarification into the main text.

      (3) pLDDT scores were used as a measure of reliable and accurate predictions, but pLDDT is not always reliable for selecting new/alternative conformations (see https://doi.org/10.1038/s41467-024-51507-2 and https://www.nature.com/articles/s41467-024-51801-z).

      We acknowledge that while pLDDT is a valuable indicator of structural confidence in AlphaFold2 predictions, its limitations warrant consideration. In our revision, we mitigated this by not relying solely on pLDDT: we also performed protein backbone dihedral angle analysis of the protein regions of focus in all predicted models to ensure comprehensive coverage of conformational variations. From our AlphaFold modeling results, we tested a model from cluster 3 of the inactivated-state sampling process, which exhibited lower pLDDT scores, and included these results in our revised analysis. We included a note in the revised manuscript’s Discussion section: “As noted in recent studies, pLDDT scores are not reliable indicators for selecting alternative conformations (DOI: 10.1038/s41467-024-51507-2 and DOI: 10.1038/s41467-024-51801-z). To address this, we performed a protein backbone dihedral angle analysis in the regions of interest to ensure that our evaluation captured a representative range of sampled conformations.”
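
      To illustrate the kind of calculation that underlies a backbone dihedral angle analysis, here is a minimal, self-contained sketch of a dihedral angle computed from four atom positions. This is a generic textbook formula, not the code from our analysis scripts; the function name and the toy coordinates are hypothetical:

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, measured about
    the p1-p2 axis. For a residue i, phi uses (C[i-1], N[i], CA[i], C[i])
    and psi uses (N[i], CA[i], C[i], N[i+1])."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    d0, d2 = _dot(b0, b1), _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy geometry (hypothetical): a planar "trans" arrangement of four points.
trans_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(abs(trans_angle))
```

      Applying such a function to every residue in the region of interest, across all predicted models, yields per-residue phi/psi distributions that can be compared between clusters independently of pLDDT.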

      (4) Extensive work has been done using AF2 to model alternative protein conformations (https://www.biorxiv.org/content/10.1101/2024.05.28.596195v1.abstract, along with some references the authors cite, such as work by McHaourab); another group recently modeled the ion channel GLIC (https://www.biorxiv.org/content/10.1101/2024.09.05.611464v1.abstract). Therefore, this work, though generally solid and thorough, seems more like a variation on a theme than a groundbreaking new methodology, especially because of the generalizability issues mentioned above.

      We sincerely thank the reviewer for acknowledging the solidity of our study and for drawing our attention to the impressive recent efforts using AlphaFold2 to explore alternative protein conformations. These studies are valuable contributions that highlight the versatility of AlphaFold2, and we are grateful for their context in evaluating our work.

      Building on these efforts, our approach not only enhances the prediction of conformational diversity but also introduces a twist by incorporating structural templates to guide AlphaFold2 toward specific functional protein states. More significantly, our study advances beyond mere structural modeling by integrating these conformations with their rigorous validation by incorporating multiple simulation results tested against experimental data to reveal that AlphaFold-predicted conformations can align with distinct physiological ion channel states. A key finding is that drug binding predictions using AlphaFold-derived hERG channel states substantially improve correlation with experimental data, which is a longstanding challenge in computational screening of multi-state proteins like the hERG channel, for which previous structural models have been mostly limited to the open state based on the cryo-EM structures. Our approach not only captures this critical state dependence but also reveals potential molecular determinants underlying enhanced drug binding during hERG channel inactivation, a phenomenon observed experimentally but poorly understood. These insights advance drug safety assessment by improving predictive screening for hERG-related cardiotoxicity, a major cause of drug attrition and withdrawal.

      We view our methodology as a natural evolution of the advancements cited by the reviewer, offering an approach that predicts diverse hERG channel conformational states and links them to meaningful functional and pharmacological outcomes. To address the reviewer’s concern about generalizability, we have expanded the methodology section to make it easier to follow and include additional details. As an example, we show how our approach can be applied to model another ion channel system, Na<sub>V</sub>1.5, in Figure S14.

      Furthermore, to enhance the applicability of our methodology, we have uploaded the scripts for analyzing AlphaFold-predicted models to GitHub (https://github.com/k-ngo/AlphaFold_Analysis), ensuring they are adaptable for a wide range of scenarios with extensive documentation. This enables users, even those not focused on ion channels, to effectively apply our tools to analyze AlphaFold predictions for their own projects and produce publication-ready figures.

      While it is likely that multiple modeling approaches could lead AlphaFold to model alternative protein conformations, the key challenge lies in validating the physiological relevance of those predicted states. This study is intended to support other researchers in applying our template-guided approach to different protein systems, and more importantly, in rigorous in silico testing and validation of the biological significance of the conformation-specific structural models they generate.

      Minor concerns:

      (1) The authors mention in the Introduction section that capturing conformational states, especially for membrane proteins that may be significant as drug targets, is crucial. It would be helpful to relate their work to the NMR studies of domains of the hERG channel, particularly the N-terminal “eag” domain, which is crucial for channel function and can provide insights into conformational changes associated with different channel states (https://doi.org/10.1016/j.bbrc.2010.10.132).

      We appreciate the reviewer’s insightful comment regarding the PAS domain and the potential influence of other regions, such as the N-linker and distal C-region, on drug binding and state transitions.

      The PAS domain did appear in the starting templates used for initial structural modeling (as shown in Figure 1a, b, c), but it was not included in the final models used for subsequent analyses. The omission was primarily due to hardware-imposed constraints, as including these additional regions would exceed the memory capacity of our current graphics processing unit (GPU) card, leading to failures during the prediction step.

      The PAS domain, even if not serving as a conventional direct drug-binding site, can influence the gating kinetics of hERG channels. By altering the probability and duration with which channels occupy specific states, it can indirectly affect how well drugs bind. For example, if the presence of the PAS domain shifts hERG channel gating so that more channels enter (and remain in) the inactivated state as was shown previously (e.g., DOI: 10.1085/jgp.201210870), drugs with a higher affinity for that state would appear to bind more potently, as observed in previous electrophysiological experiments (e.g., DOI: 10.1111/j.1476-5381.2011.01378.x). It is also plausible that the PAS domain could exert allosteric effects that alter the conformational landscape of the hERG channel during gating transitions, potentially impacting drug accessibility or binding stability. This is an intriguing hypothesis and an important avenue for future research.

      With access to more powerful computational resources, it would be valuable to explore the full-length hERG channel, including the PAS domain and associated regions, to assess their potential contributions to drug binding and gating dynamics. We incorporated a discussion of these points into the main text, acknowledging the limitations of our current models and highlighting the need for future studies to explore these regions in greater detail. The addition reads: “…Our models excluded the N-terminal PAS domain due to GPU memory limitations, despite its inclusion in initial templates. This omission may overlook its potential roles in gating kinetics and allosteric effects on drug binding (e.g., PMID: 21449979, PMID: 23319729, PMID: 29706893, PMID: 30826123, DOI:10.4103/jpp.JPP_158_17). Future research will explore the full-length hERG channel with enhanced computational resources to assess these regions’ contributions to conformational state transitions and pharmacology.”

      (2) In the second-to-last paragraph of the Introduction, the authors describe how AlphaFold2 works. They write, “AlphaFold2 primarily requires the amino acid sequence of a protein as its input, but the method utilizes other key elements: in addition to the amino acid sequence, AlphaFold2 can also utilize multiple sequence alignments (MSAs) of similar sequences from different species, templates of related protein structures when available, and/or homologous proteins (Jumper et al., 2021a). Evolutionarily conserved regions over multiple isoforms and species indicated that the sequence is crucial for structural integrity”. The last sentence is confusing; if the authors mean that all information required to fold the protein into its 3D structure is present in its primary sequence, that has been the paradigm. It is unclear from this paragraph what the authors wanted to convey.

      We apologize for any confusion caused by this phrasing. Our intent was not to restate the well-established paradigm that a protein’s primary sequence contains the information needed for its 3D structure, but rather to emphasize how AlphaFold2 leverages evolutionary conservation, via multiple sequence alignments (MSAs), to infer structural constraints beyond what a single sequence alone might reveal. Specifically, we aimed to highlight that conserved regions across species and isoforms provide additional context that AlphaFold2 uses to enhance the accuracy of its predictions, complementing the use of templates and homologous structures as described in Jumper et al. (2021). To clarify this, we revised the sentence in the manuscript to read: “AlphaFold2 primarily requires a protein's amino acid sequence as input, but it also leverages other critical data sources. In addition to the sequence, it incorporates multiple sequence alignments (MSAs) of related proteins from different species, available structural templates, and information on homologous proteins. While the primary sequence encodes the 3D structure, AlphaFold2 harnesses evolutionary conservation from MSAs to reveal structural insights that extend beyond what a single sequence can provide.” We thank the reviewer for pointing out this ambiguity.

      (3) In the Results section, the authors state that the predictions generated by their method are evaluated by standard accuracy metrics, please elaborate - what standard metrics were used to judge the predictions and why (some references would be a nice addition). Further, on Page 6, the sentence “There are fewer differences between the open- and closed-state models (Figure S2b, d)” is confusing, fewer differences than what? or there are a few differences between the two states/models? Please clarify.

      The original sentence referring to “standard accuracy metrics” is somewhat misplaced, as our intent was not to apply any conventional “benchmarking” to judge the predictions, but rather to evaluate functional and structural relevance in a physiologically meaningful context. Specifically, we assessed drug binding affinities from molecular docking simulations (in Rosetta Energy Units, R.E.U.) against experimental drug potency data (e.g., IC<sub>50</sub> values converted to free energies in kcal/mol, Figure 7), analyzed differences in interaction networks across states in relation to known mutations affecting hERG inactivation (Figure 4, Table 2), validated ion conduction properties through MD simulations with the applied voltage against expected state-dependent hERG channel behavior (Figure 5), and compared predicted structural models to available experimental cryo-EM structures (Figure 3). We clarified in the text that our assessment emphasized the physiological plausibility of the generated conformations, drawing on evidence from existing computational and experimental studies at each step of the analysis above.
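
      For context on the conversion of IC<sub>50</sub> values to free energies mentioned above, a commonly used relation is ΔG = RT ln(IC<sub>50</sub> / 1 M). The sketch below assumes this standard relation, a temperature of 298.15 K, and that IC<sub>50</sub> approximates the dissociation constant; these settings are illustrative assumptions and are not necessarily those used in the manuscript:

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ic50_to_dg(ic50_molar: float, temperature_k: float = 298.15) -> float:
    """Approximate binding free energy (kcal/mol) from an IC50 given in
    mol/L, using dG = R*T*ln(IC50 / 1 M). Assumes IC50 ~ Kd."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return R_KCAL * temperature_k * math.log(ic50_molar)


# Illustrative potencies only (hypothetical values, not measured data).
for label, ic50 in [("1 uM", 1e-6), ("10 nM", 1e-8)]:
    print(f"IC50 = {label}: dG = {ic50_to_dg(ic50):.2f} kcal/mol")
```

      A more potent drug (lower IC<sub>50</sub>) maps to a more negative free energy, which is the quantity compared against docking scores in Rosetta Energy Units.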

      As for the sentence on page 6, “There are fewer differences between the open- and closed-state models,” we apologize for the ambiguity; we meant that the hydrogen bond networks in the selectivity filter region exhibit fewer differences between the open and closed states compared to the more pronounced variations seen between the open and inactivated states. We revised this sentence to read: “The open- and closed-state models show fewer differences in their selectivity filter hydrogen bond networks compared to those between the open and inactivated states,” to enhance readability.

      (4) In the Discussion, the authors reiterate that this methodology can be extended to sample multiple protein conformations, and their system of choice was hERG potassium channel. I think this methodology can be applied to a system when there is enough knowledge of static structures, and some information on dynamics (through simulations) and mutagenesis analysis available. A well-studied system can benefit from such a protocol to gauge other conformational states.

      We agree that this approach is well-suited to systems with sufficient static structures, dynamic insights from simulations, and mutagenesis data, as seen with the hERG channel. We appreciate the reviewer’s implicit concern about generalizability to less-characterized systems and addressed this in the Discussion as a limitation, noting that the method’s effectiveness may depend on prior knowledge. Future studies can explore whether the advent of AlphaFold3 and other deep learning approaches can enhance its applicability to systems with more limited data. We have added this comment to the Discussion: “…A limitation of our methodology is its reliance on well-characterized systems with ample static structures, molecular dynamics simulation data, and mutagenesis insights, as demonstrated with the hERG channel, which may limit its applicability to less-studied proteins.”

      (5) The Methods section must be broken down into steps to make it easier to follow for the reader (if they want to implement these steps for themselves on their system of choice).

      a. Is it possible to share example scripts and code used to piece templates together for AF2? Also, since the AF3 code is now available, the authors may comment on how their protocol can be applicable there or have plans to implement their protocol using AF3 (which is designed to work better for binding small molecules). Please see https://github.com/google-deepmind/alphafold3 for the recently released code for AF3.

      We appreciate the reviewer’s suggestion to improve the Methods section and their comments on scripts and AlphaFold3 (AF3). We revised the Methods to separate it into clear steps (e.g., template preparation, AF2 setup, clustering, and refinement) for better readability and reproducibility, and uploaded the sample scripts along with the instructions to GitHub (https://github.com/k-ngo/AlphaFold_Analysis).

      Regarding AF3’s recent code release, we plan to explore the applicability of our methodology to AF3 in a follow-up study, leveraging its advanced features to refine conformational predictions and state-specific drug docking, and added a brief comment to the Discussion to reflect this future direction: “…Following the recent release of AlphaFold3’s source code, we plan to explore the applicability of our template-guided methodology in a follow-up study, leveraging AF3’s advanced diffusion-based architecture to enhance protein conformational state predictions and state-specific drug docking, particularly given its improved capabilities for modeling small molecule – protein interactions…”

      b. The authors modified the hERG protein by removing a segment, the N-terminal PAS domain (residues M1 - R397) because of graphics card memory limitation. Would the removal of the PAS domain affect the structure and function of the channel protein? HERG and other members of the “eag K<sup>+</sup> channel” family contain a PAS domain on their cytoplasmic N terminus. Removal of this domain alters a physiologically important gating transition in HERG, and the addition of the isolated domain to the cytoplasm of cells expressing truncated HERG reconstitutes wild-type gating. (see https://doi.org/10.1371/journal.pone.0059265). Please elaborate on this.

      We thank the reviewer for raising an important point about the removal of the N-terminal PAS domain and for highlighting its physiological role in hERG channel gating transitions. In our study, unlike experimental settings where PAS removal alters gating, we believe this omission has minimal impact on our key analyses.

      The drug docking procedure focuses on optimizing drug binding poses with minor protein structural refinement around the putative drug binding site, which in our case is the hERG channel pore region, where hERG-blocking drugs predominantly bind. The cytoplasmic PAS domain, located distally from this site, remains outside the protein structure refinement zone during drug docking simulations. However, one aspect we have not yet considered is the potential effect of drug binding on hERG channel gating and vice versa, particularly given the PAS domain’s role in gating. This interplay could be significant but requires investigation beyond our current drug docking framework. We plan to explore this in future studies using alternative simulation methodologies, such as extended MD simulations or enhanced sampling techniques, to comprehensively capture these dynamic protein - ligand interactions.

      Similarly, in our 1-μs-long MD simulations assessing ion conductivity (Figure 4), the timescale is too short for PAS-mediated gating changes to propagate through the protein and meaningfully influence ion conduction and channel activation dynamics, which occur on a millisecond time scale (see e.g., DOI: 10.3389/fphys.2018.00207). To fully address this limitation, we plan to explore the inclusion of the PAS domain in a follow-up study with enhanced computational resources, allowing us to investigate its structural and functional contributions more comprehensively.

      (6) The first paragraph of the Methods reads as though AF2 has layers that recycle structures. We doubt that the authors meant it that way. Please update the language to clarify that recycling is an iterative process in which the pairwise representation, MSA, and predicted structures are passed (“recycled”) through the model multiple times to improve predictions.

      We agree that the phrasing might suggest physical layers recycling structures, which was not our intent. Instead, we meant to describe AlphaFold2’s iterative refinement process, where intermediate outputs, such as the pairwise residue representations, multiple sequence alignments (MSAs), and predicted structures, are iteratively passed (or “recycled”) through the model to enhance prediction accuracy. To clarify this, we revised the relevant sentence to read: “A critical feature of AlphaFold2 is its iterative refinement, where pairwise residue representations, MSAs, and initial structural predictions are recycled through the model multiple times, improving accuracy with each iteration.”

      Reviewer #3 (Recommendations for the authors):

      The authors should integrate the very recently published CryoEM experimental data of hERG inhibition by several drugs (Miyashita et al., Structure, 2024; DOI: 10.1016/j.str.2024.08.021).

      We thank the reviewer for the suggestion. Here, we compare drug binding in our open-state models (PDB 5VA2-derived and an additional AlphaFold-predicted model from Cluster 3 of the inactivated-state sampling attempt, named “AF ic3”) and inactivated-state models, using the cationic forms of astemizole and E-4031, with the corresponding experimental structures (Figure S13). Drug binding in the closed state is excluded as the pore architecture deviates too much from those in the cryo-EM structures. Experimental data (DOI: 10.1124/mol.108.049056) indicate that both astemizole and E-4031 bind more potently to the inactivated state of the hERG channel.

      Astemizole (Figure S13a):

      - In the PDB 5VA2-derived open-state model, astemizole binds centrally within the pore cavity, adopting a bent conformation that allows both aromatic ends of the molecule to engage in π–π stacking with the side chains of Y652 from two opposing subunits. Hydrophobic contacts are observed with S649 and F656 residues.

      - In the AF ic3 open-state model, the ligand is stabilized through multiple π–π stacking interactions with Y652 residues from three subunits, forming a tight aromatic cage around its triazine and benzimidazole rings. Hydrophobic interactions are observed with hERG residues T623, S624, Y652, F656, and S660.

      - In the inactivated-state model, astemizole adopts a compact, horizontally oriented pose deeper in the channel pore, forming the most extensive interaction network among all the states. The ligand is tightly stabilized by multiple π–π stacking interactions with Y652 residues across three subunits, and forms hydrogen bonds with residues S624 and Y652. Additional hydrophobic contacts are observed with residues F557, L622, S649, and Y652.

      - Consistent with our findings, an electrophysiology study by Saxena et al. identified hERG residues F557 and Y652 as crucial for astemizole binding, as determined through mutagenesis (DOI: 10.1038/srep24182).

      - In the cryo-EM structure (PDB 8ZYO) (DOI: 10.1016/j.str.2024.08.021), astemizole is stabilized by π–π stacking with Y652 residues. However, no hydrogen bonds are detected, which may reflect limitations in cryo-EM resolution rather than a true absence of contacts. Additional hydrophobic interactions are observed with L622 and G648 residues.

      E-4031 (Figure S13b):

      - In the PDB 5VA2-derived open-state model, E-4031 binds within the central cavity primarily through polar interactions. It forms a π–π stacking interaction with residue Y652, anchoring one end of the molecule. Polar interactions are observed with residues A653 and S660. Additional hydrophobic contacts are observed with residues A652 and Y652.

      - In the AF ic3 open-state model, E-4031 adopts a slightly deeper pose within the central cavity stabilized by dual π–π stacking interactions between its aromatic rings and hERG residue Y652. Additional hydrogen bonds are observed with residues S624 and Y652, and hydrophobic contacts are observed with residues T623 and S624.

      - In the inactivated-state model, E-4031 adopts its deepest and most stabilized binding pose, consistent with its experimentally observed preference for this state. The ligand is stabilized by multiple π–π stacking interactions between its aromatic rings and hERG residues Y652 from opposing subunits. The sulfonamide nitrogen engages in hydrogen bonding with residue S649, while the piperidine nitrogen hydrogen bonds with residue Y652. Hydrophobic contacts with residues S624, Y652, and F656 further reinforce the binding, enclosing the ligand in a densely packed aromatic and polar environment.

      - A previous mutagenesis study showed that mutations involving hERG residues F557, T623, S624, Y652, and F656 affect E-4031 binding (DOI: 10.3390/ph16091204).

      - In the cryo-EM structure (PDB 8ZYP) (DOI: 10.1016/j.str.2024.08.021), E-4031 engages in a single π–π stacking interaction with hERG residue Y652, anchoring one end of the molecule. The remainder of the ligand is stabilized predominantly through hydrophobic contacts involving residues S621, L622, T623, S624, M645, G648, S649, and additional Y652 side chains, forming a largely nonpolar environment around the binding pocket.

      In both cryo-EM structures, astemizole and E-4031 adopt binding poses that closely resemble the inactivated-state model in our docking study, consistent with experimental evidence that these drugs preferentially bind to the inactivated state (DOI: 10.1124/mol.108.049056). This raises the possibility that the cryo-EM structures may capture an inactivated-like channel state. However, closer examination of the SF reveals that the cryo-EM conformations more closely resemble the open-state PDB 5VA2 structure (DOI: 10.1016/j.cell.2017.03.048), which has been shown to be conductive here and in previous studies (DOI: 10.1073/pnas.1909196117, 10.1161/CIRCRESAHA.119.316404).

      The conformational differences between the cryo-EM and open-state docking results may reflect limitations of the docking protocol itself, as GALigandDock assumes a rigid protein backbone and cannot account for large ligand-induced conformational changes. In our open-state models, the hydrophobic pocket beneath the selectivity filter is too small to accommodate bulky ligands (Figure 3a, b), whereas the cryo-EM structures show a slight outward shift in the S6 helix that expands this space (Figure S13). These allosteric rearrangements, though small, fall outside the scope of the current docking protocol, which lacks the flexibility to capture these local, ligand-induced adjustments (DOI: 10.3389/fphar.2024.1411428).

      In contrast, docking to the AlphaFold-predicted inactivated-state model reveals a reorganization beneath the selectivity filter that creates a larger cavity, allowing deeper ligand insertion. Notably, neither our inactivated-state docking nor the available cryo-EM structures show strong interactions with F656 residues. However, in the AlphaFold-predicted inactivated model, the more extensive protrusion of F656 into the central cavity may further occlude the drug’s egress pathway, potentially trapping the ligand more effectively. This could explain why mutation of F656 significantly reduces the binding affinity of E-4031 (DOI: 10.3390/ph16091204). These findings suggest that inactivation may trigger a series of modular structural rearrangements that influence drug access and binding affinity, with different aspects potentially captured in various computational and experimental studies, rather than resulting from a single, uniform conformational change.

      Discussion of the original Wang and Mackinnon finding, DOI: 10.1016/j.cell.2017.03.048 regarding C-inactivation, pore mutation S631A and F627 rearrangement is likely warranted. Since hERG inactivation is present at 0 mV in WT channels (the likely voltage for the CryoEM study) please discuss how this might affect interpretations of starting with this structure as a template for models presented here, perhaps as part of Figure S1.

      We sincerely thank the reviewer for bringing up the insightful findings from Wang and MacKinnon regarding hERG C-type inactivation as well as the voltage context of their cryo-EM structure (PDB 5VA2). We recognize that WT hERG exhibits inactivation at 0 mV, likely the condition of the cryo-EM study, raising the possibility that PDB 5VA2, while classified as an open state, might subtly reflect features of inactivation. Notably, PDB 5VA2 has been widely adopted in numerous studies and consistently found to represent a conducting state, such as in Yang et al. (DOI: 10.1161/CIRCRESAHA.119.316404) and Miranda et al. (DOI: 10.1073/pnas.1909196117). Our MD simulations further support this, showing K<sup>+</sup> conduction in the 5VA2-based open-state model (Figure 4a, c), consistent with its selectivity filter conformation (Figure S1a). Although we used PDB 5VA2 as a starting template for predicting inactivated and closed states, our AlphaFold2 predictions did not rigidly adhere to this structure, as evidenced by distinct differences in hydrogen bond networks, drug binding affinities, pore radii, and ion conductivity between our state-specific hERG channel models (Figures S2, 5, 3b, 4). Nevertheless, this does not preclude the possibility that potential inactivated-like traits of PDB 5VA2 at 0 mV could subtly influence our predictions elsewhere in the model, which warrants further exploration in future studies. In our revised analysis, we also tested an alternative AlphaFold-predicted conformation, referred to as Open (AlphaFold cluster 3), which, while sharing some similarities with PDB 5VA2, exhibits subtle differences in the selectivity filter and pore conformations. This structure was also found to conduct ions and showed a drug binding profile similar to that of the PDB 5VA2-based open-state model. We greatly appreciate this feedback, which helped us refine and strengthen our analysis.

      Page 8, the significance of 750 and 500 mV in terms of physiological role?

      We appreciate this opportunity to clarify the methodological rationale. Although these voltages significantly exceed typical physiological membrane potentials, their use in MD simulations is a well-established practice to accelerate ion conduction events. This approach helps overcome the inherent timescale limitations of conventional MD simulations, as demonstrated in previous studies of hERG and other ion channels. For instance, Miranda et al. (DOI: 10.1073/pnas.1909196117), Lau et al. (DOI: 10.1038/s41467-024-51208-w), and Yang et al. (DOI: 10.1161/CIRCRESAHA.119.316404) applied similarly high voltages (500–750 mV) to study K<sup>+</sup> conduction through hERG, whose single-channel conductance is notably small under physiological conditions at ~2 pS (DOI: 10.1161/01.CIR.94.10.2572), necessitating amplification to observe meaningful permeation within nanosecond-to-microsecond timescales. Likewise, studies of other K<sup>+</sup> ion channels, such as Woltz et al. (DOI: 10.1073/pnas.2318900121) on the small-conductance calcium-activated K<sup>+</sup> channel SK2 and Wood et al. (DOI: 10.1021/acs.jpcb.6b12639) on the Shaker K<sup>+</sup> channel, have used elevated voltages (250–750 mV) to probe ion conduction mechanisms via MD simulations. In addition, the typical timescale of these simulations (1 μs) is too short to capture major structural effects, such as those leading to inactivation or deactivation, which occur over milliseconds under physiological conditions.
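
      A back-of-the-envelope calculation illustrates why elevated voltages are needed. The sketch below assumes a purely ohmic channel with a constant conductance of ~2 pS, which is a simplification that is unlikely to hold exactly at several hundred millivolts; the function name and the 100 mV comparison value are illustrative assumptions, not values taken from the simulations.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # charge of one monovalent ion, in coulombs


def ions_per_microsecond(conductance_pS: float, voltage_mV: float) -> float:
    """Expected K+ permeation events per microsecond for an ohmic channel.

    Assumes I = g * V with a voltage-independent conductance (a simplification).
    """
    current_A = (conductance_pS * 1e-12) * (voltage_mV * 1e-3)  # Ohm's law
    ions_per_second = current_A / ELEMENTARY_CHARGE_C
    return ions_per_second * 1e-6


if __name__ == "__main__":
    # 100 mV is an illustrative driving force; 500 and 750 mV are the applied voltages.
    for voltage in (100.0, 500.0, 750.0):
        print(f"{voltage:6.0f} mV: {ions_per_microsecond(2.0, voltage):.2f} ions/us")
```

      Under these assumptions, a 2 pS channel passes on the order of one ion per microsecond at 100 mV but roughly six to nine ions per microsecond at 500–750 mV, so only the higher voltages yield enough permeation events for statistics within a 1 μs trajectory.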

      The abstract could be edited a bit to more clearly state the novel findings in this study.

      We thank the reviewer for their suggestion. We have revised the abstract to read: “To design safe, selective, and effective new therapies, there must be a deep understanding of the structure and function of the drug target. One of the most difficult problems to solve has been resolution of discrete conformational states of transmembrane ion channel proteins. An example is K<sub>V</sub>11.1 (hERG), comprising the primary cardiac repolarizing current, I<sub>kr</sub>. hERG is a notorious drug antitarget against which all promising drugs are screened to determine potential for arrhythmia. Drug interactions with the hERG inactivated state are linked to elevated arrhythmia risk, and drugs may become trapped during channel closure. While prior studies have applied AlphaFold to predict alternative protein conformations, we show that the inclusion of carefully chosen structural templates can guide these predictions toward distinct functional states. This targeted modeling approach is validated through comparisons with experimental data, including proposed state-dependent structural features, drug interactions from molecular docking, and ion conduction properties from molecular dynamics simulations. Remarkably, AlphaFold not only predicts inactivation mechanisms of the hERG channel that prevent ion conduction but also uncovers novel molecular features explaining enhanced drug binding observed during inactivation, offering a deeper understanding of hERG channel function and pharmacology. Furthermore, leveraging AlphaFold-derived states enhances computational screening by significantly improving agreement with experimental drug affinities, an important advance for hERG as a key drug safety target where traditional single-state models miss critical state-dependent effects. By mapping protein residue interaction networks across closed, open, and inactivated states, we identified critical residues driving state transitions validated by prior mutagenesis studies. 
This innovative methodology sets a new benchmark for integrating deep learning-based protein structure prediction with experimental validation. It also offers a broadly applicable approach using AlphaFold to predict discrete protein conformations, reconcile disparate data, and uncover novel structure-function relationships, ultimately advancing drug safety screening and enabling the design of safer therapeutics.”

      Many of the Supplemental figures would fit in better in the main text, if possible, in my opinion. For instance, the network analysis (Fig. S2) appears to be novel and is mentioned in the abstract so may fit better in the main text. The discussion section could be focused a bit more, perhaps with headers to highlight the key points.

      Yes, we agree with the reviewer and made the suggested changes. We moved Figure S2 into the main text as a new figure.

      Additionally, we revised the Discussion section to improve focus and clarity.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study significantly advances our understanding of how exosomes regulate filopodia formation. Filopodia play crucial roles in cell movement, polarization, directional sensing, and neuronal synapse formation. McAtee et al. demonstrated that exosomes, particularly those enriched with the protein THSD7A, play a pivotal role in promoting filopodia formation through Cdc42 in cancer cells and neurons. This discovery unveils a new extracellular mechanism through which cells can control their cytoskeletal dynamics and interaction with their surroundings. The study employs a combination of rescue experiments, live-cell imaging, cell culture, and proteomic analyses to thoroughly investigate the role of exosomes and THSD7A in filopodia formation in cancer cells and neurons. These findings offer valuable insights into fundamental biological processes of cell movement and communication and have potential implications for understanding cancer metastasis and neuronal development.

      Weaknesses:

      The conclusions of this study are in most cases supported by data, but some aspects of data analysis need to be better clarified and elaborated. Some conclusions need to be better stated and according to the data observed.

      We appreciate the reviewer's recognition of the impact of our study. We will address the concerns about data analysis and the statement of our conclusions in our full response to reviewers.

      Reviewer #2 (Public review):

      Summary:

      The authors show that small EVs trigger the formation of filopodia in both cancer cells and neurons. They go on to show that two cargo proteins, endoglin, and THSD7A, are important for this process. This possibly occurs by activating the Rho-family GTPase CDC42.

      Strengths:

      The EV work is quite strong and convincing. The proteomics work is well executed and carefully analyzed. I was particularly impressed with the chick metastasis assay that added strong evidence of in vivo relevance.

      Weaknesses:

      The weakest part of the paper is the Cdc42 work at the end of the paper. It is incomplete and not terribly convincing. This part of the paper needs to be improved significantly.

      We appreciate the reviewer's recognition of the impact of our study. Indeed, more work needs to be done to clarify the role of Cdc42 in the induction of filopodia by exosome-associated THSD7A. We anticipate that this will be a separate manuscript, delving in-depth into how exosome-associated THSD7A interacts with recipient cells to activate Cdc42 and carrying out a variety of assays for Cdc42 activation.

      Reviewer #3 (Public review):

      Summary:

      The authors identify a novel relationship between exosome secretion and filopodia formation in cancer cells and neurons. They observe that multivesicular endosomes (MVE)-plasma membrane (PM) fusion is associated with filopodia formation in HT1080 cells and that MVEs are present in filopodia in primary neurons. Using overexpression and knockdown (KD) of Rab27/HRS in HT1080 cells, melanoma cells, and/or primary rat neurons, they found that decreasing exosome secretion reduces filopodia formation, while Rab27 overexpression leads to the opposite result. Furthermore, the decreased filopodia formation is rescued in the Rab27a/HRS KD melanoma cells by the addition of small extracellular vesicles (EVs) but not large EVs purified from control cells. The authors identify endoglin as a protein unique to small EVs secreted by cancer cells when compared to large EVs. KD of endoglin reduces filopodia formation and this is rescued by the addition of small EVs from control cells and not by small EVs from endoglin KD cells. Based on the role of filopodia in cancer metastasis, the authors then investigate the role of endoglin in cancer cell metastasis using a chick embryo model. They find that injection of endoglin KD HT1080 cells into chick embryos gives rise to less metastasis compared to control cells - a phenotype that is rescued by the co-injection of small EVs from control cells. Using quantitative mass spectrometry analysis, they find that thrombospondin type 1 domain containing 7a protein (THSD7A) is downregulated in small EVs from endoglin KD melanoma cells compared to those from control cells. They also report that THSD7A is more abundant in endoglin KD cell lysate compared to control HT1080 cells and less abundant in small EVs from endoglin KD cells compared to control cells, indicating a trafficking defect. 
Indeed, using immunofluorescence microscopy, the authors observe THSD7A-mScarlet accumulation in CD63-positive structures in endoglin KD HT1080 cells, compared to control cells. Finally, the authors determine that exosome-secreted THSD7A induces filopodia formation in a Cdc42-dependent mechanism.

      Strengths:

      (1) While exosomes are known to play a role in cell migration and autocrine signaling, the relationship between exosome secretion and the formation of filopodia is novel.

      (2) The authors identify an exosomal cargo protein, THSD7A, which is essential for regulating this function.

      (3) The data presented provide strong evidence of a role for endoglin in the trafficking of THSD7A in exosomes.

      (4) The authors associate this process with functional significance in cancer cell metastasis and neurological synapse formation, both of which involve the formation of filopodia.

      (5) The data are presented clearly, and their interpretation appropriately explains the context and significance of the findings.

      Weaknesses:

      (1) A better characterization of the nature of the small EV population is missing:

      It is unclear why the authors chose to proceed to quantitative mass spectrometry with the bands in the Coomassie from size-separated EV samples, as there are other bands present in the small EV lane but not the large EV lane. This is important to clarify because it underlies how they were able to identify THSD7A as a unique regulator of exosome-mediated filopodia formation. Is there a reason why the total sample fractions were not compared? This would provide valuable information on the nature of the small and large EV populations.

      We would like to clarify that there are two sets of proteomics data in the manuscript. The first was comparing bands from a colloidal Coomassie-stained gel from two samples: small EVs and large EVs from B16F1 cells. In this proteomics experiment, we identified endoglin as present in small EVs, but not large EVs. For this experiment, we only sent four bands from the small EV lane, chosen based on their obvious banding pattern difference on the Coomassie gel.

      In the second proteomics experiment, we used quantitative iTRAQ proteomics to compare small EVs purified from B16F1 control (shScr) and endoglin KD (shEng1 and shEng2) cell lines. In this experiment, we sent total protein extracted from small EV samples for analysis. So, these samples included the entire EV content, not just selected bands from a gel. In this experiment, we identified THSD7A as reduced in the shEng small EVs.

      (2) Data analysis and quantification should be performed with increased rigor:

      a) Figure 1C - The optical and temporal resolution are insufficient to conclusively characterize the association between exosome secretion and filopodia. Specifically, the 10-second interval used in the image acquisitions is too close to the reported 20-second median time between exosome secretion and filopodia formation. Intervals of 2-5 seconds should be used to validate this. It would also be important to determine the percentage of filopodia events that co-occur with exosome secretion. Is this a phenomenon that occurs with most or only a small number of filopodia? Additionally, resolution with typical confocal microscopy is subpar for these analyses. TIRF microscopy would offer increased resolution to parse out secretion events. As the TIRF objective is listed in the Methods section, figure legends should mention which images were acquired using TIRF microscopy.

      We acknowledge that the frame rate naturally limits our estimates of the timing of filopodia formation after exosome secretion. We set out to show a relationship between exosome secretion and filopodia formation, based on their proximity in timing. While our data set shows a median time interval of 20 seconds, the true median could be anywhere between 10 and 30 seconds, based on our frame rate. Regardless of the exact timing, our data show that exosome secretion is rapidly followed by filopodia formation events.
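
      The 10 to 30 second range follows from simple interval arithmetic, sketched below. The sketch assumes that each event (secretion or filopodia formation) actually occurred at some unknown time within the frame interval preceding the frame in which it was first seen; the function name is hypothetical.

```python
from typing import Tuple


def true_delay_bounds(observed_delay_s: float, frame_interval_s: float) -> Tuple[float, float]:
    """Bounds on the true delay between two events scored on discrete frames.

    Each event may have happened up to one frame interval before the frame in
    which it was first observed, so the true delay can differ from the
    observed delay by up to one frame interval in either direction.
    """
    lower = max(0.0, observed_delay_s - frame_interval_s)
    upper = observed_delay_s + frame_interval_s
    return lower, upper


if __name__ == "__main__":
    # Observed median delay of 20 s, imaged with one frame every 10 s.
    print(true_delay_bounds(20.0, 10.0))
    # The same observed delay with a hypothetical 2 s frame interval.
    print(true_delay_bounds(20.0, 2.0))
```

      The same arithmetic shows how a shorter frame interval would narrow the uncertainty on the measured delay.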

      To address the question of the percentage of filopodia events that are preceded by exosome secretion, the reviewer is correct in stating that we might need TIRF microscopy and a faster frame rate to observe all the MVB fusion events and get an accurate calculation of this number. The timing of the acquisition was based on the typical timing of filopodia formation, which is slow relative to MVB fusion. Thus, with the current dataset, we could miss secretion events taking place between the 10 second time intervals. Therefore, to address this question, we would need to acquire a new dataset with a much more rapid frame acquisition (multiple frames per second rather than one frame every ten seconds). Regardless, for the secretion events that we visualized with the current dataset, we always observed subsequent filopodia formation.

      No TIRF imaging was used in this manuscript. A TIRF objective was used for selected neuron imaging (see methods); however, it was used for spinning disk confocal microscopy, not for TIRF imaging. This is stated in the methods.

      b) Figure 2 - It would be important to perform further analysis to concretely determine the relationship between exosome secretion and filopodia stability. Are secretion events correlated with the stability of filopodia? Is there a positive feedback loop that causes further filopodia stability and length with increased secretion? Furthermore, is there an association between the proximity of secretion with stability? Quantification of filopodia more objectively (# of filopodia/cell) would be helpful.

      Our data show that manipulation of general exosome secretion, via Hrs knockdown, affects both de novo filopodia formation and filopodia stability (Fig 2g,h). Interestingly, knockdown of endoglin only affects de novo filopodia formation, while filopodia stability is unaffected (Fig 4g,h). These results suggest that filopodia stability is dependent upon exosome cargoes besides endoglin/THSD7A. Such cargoes might include other extracellular matrix molecules, such as fibronectin. We previously showed that exosomes promote nascent cell adhesion and rapid cell migration, through exosome-bound fibronectin (Sung et al., Nature Communications, 6:7164, 2015). We also previously found that inhibition of exosome secretion affects the persistence of invadopodia, which are filopodia-dependent structures (Hoshino et al., Cell Reports, 5:1159-1168, 2013). We agree that this is an interesting research direction, and perhaps future work could focus on exosomal factors that are responsible for filopodia persistence. This would possibly involve more proteomics analysis to identify candidate exosomal cargoes involved in this process.

      With regard to the way we plotted the filopodia data, we plotted the cancer cell data as filopodia per cell area so that it matched the neuron data, which was plotted as filopodia per 100 µm of dendrite distance. Since the neurons cannot be imaged as a whole cell, the quantification is based on the length of the dendrite in the image. We found that graphing the cancer cell data as filopodia per cell gave results similar to those obtained with filopodia per cell area. To demonstrate this, we have now plotted the filopodia per cell area data from Fig 2 as filopodia per cell and placed these new plots in Supp Fig 2.

      c) Figure 6 - Why use different gel conditions to detect THSD7A in small EVs from B16F1 cells vs HT1080 and neurons? Why are there two bands for THSD7A in panels C and E? It is difficult to appreciate the KD efficiency in E. The absence of a signal for THSD7A in the HT1080 shEng small EVs that show a signal for endoglin is surprising. The authors should provide rigorous quantification of the westerns from several independent experimental repeats.

      Detection of THSD7A via Western blot was, unfortunately, not straightforward. Due to the large size (~260 kDa) of THSD7A, its low level of expression in cancer cells, as well as the inconsistency of commercially available THSD7A antibodies, we had to troubleshoot multiple conditions. We found that it was much easier to detect THSD7A in the human fibrosarcoma cell line HT1080 than in the mouse B16F1 cells, both in the cell lysates and in the small EVs. We were unable to detect THSD7A using the same (reducing) conditions for the mouse melanoma B16F1 samples but were successful using native gel conditions. We also detected THSD7A in rat primary neuron samples. All these samples were from different source organisms (human, mouse, rat) and from either cell lysates or extracellular vesicles, further complicating the analyses. Expression and maturation of THSD7A in these different cell types and compartments could involve different post-translational modifications, such as glycosylation, thus requiring different methods to detect THSD7A on Western blots and leading to different banding patterns.

      With regard to the level of knockdown of THSD7A in the Western blot shown in Figure 6E, the normalized level is quantitated below the bands. If you compare that quantitation to the filopodia phenotypes in the same panel, they are quite concordant. Figures 7B and 7C show quantification of triplicate Western blots, highlighting the significant accumulation of THSD7A in shEng cell lysates, as well as significant small EV secretion of THSD7A in control and WT rescued conditions.

      (3) The study lacks data on the cellular distribution of endoglin and THSD7A:

      a) Figure 6 - Is THSD7A expected to be present in the nucleus as shown in panel D (label D is missing in the Figure)? It is not clear if this is observed in neurons. A Western of endogenous THSD7A on cell fractions would clarify this. The authors should further characterize the cellular distribution of THSD7A in both cell types. Similarly, the cellular distribution of endoglin in the cancer cells should be provided. This would help validate the proposed model in Figure 8.

      The image in figure 6D shows an HT1080 cell stained with phalloidin-Alexa Fluor 488 to visualize F-actin with or without expression of THSD7A-mScarlet. In order to fully visualize the thin filopodia protrusions, the cellular plane of focus of the images for this panel was purposely taken at the bottom of the cell, where the cell is attached to the coverslip glass. Thus, we interpret the red signal across the cell body as THSD7A-mScarlet expression on the plasma membrane underneath the cell, not in the nucleus. The neuron images only include the dendrite portion of the neurons; therefore, there is no nucleus present in the neuronal images. For the cellular distribution of endoglin, we agree that this is an important future direction to understand how endoglin regulates THSD7A trafficking. We have added the lack of these data to the “Limitations” section at the end of the manuscript.

      b) Figure 7 - Although the western blot provides convincing evidence for the role of endoglin in THSD7A trafficking, the microscopy data lack resolution as well as key analyses. While differences between shSCR and shEng cells are clear visually, the insets appear to be zoomed digitally, which decreases resolution and interferes with interpretation. It would be crucial to show the colocalization of endoglin and THSD7A within CD63-positive MVE structures. What are the structures in Figure 7E shSCR zoom1? It would be important to rule out that these are migrasomes using TSPAN4 staining. More information on how the analysis was conducted is needed (i.e. how extracellular areas were chosen and whether the images are representative of the larger population). A widefield image of shSCR and shEng cells and DAPI or HOECHST staining in the higher magnification images should be provided. Additionally, the authors should quantify the colocalization of external CD63 and mScarlet signals from many independently acquired images (as they did for the internal signals in panel F). Is there no external THSD7A signal in the shEng cells?

      The images for Figure 7E were taken with high resolution on a confocal microscope. Insets for Figure 7E were digitally zoomed so that readers could see the tiny structures. Zoom 1 in Figure 7E shows areas of extracellular deposition, whereas Zoom 2 shows THSD7A colocalization with CD63 in MVE. In the extracellular areas (Zoom 1), we observe small punctate depositions that are positive for CD63 and/or THSD7A-mScarlet. Our interpretation of this staining is that the cells are secreting heterogeneous small EVs that are then attached to the glass coverslip. The images and zooms in Fig 7E were chosen to be representative and indeed reveal that there is more extracellular deposition of THSD7A-mScarlet outside the control shScr cells compared to the shEng cells, consistent with more secretion of THSD7A in small EVs from shScr cells when compared to those of shEng cells (Fig 7A,B). However, we did not quantify this difference, as these experiments were conducted with transient transfection of THSD7A-mScarlet, and it is challenging to determine which cell the extracellular THSD7A-mScarlet came from, complicating any quantitative analysis on a per-cell basis.

      Quantification of internal THSD7A localization is much more straightforward in this experimental regime. Indeed, in Figure 7F, we quantitated internal colocalization of THSD7A-mScarlet and CD63, which we obtained by choosing only cells that were visually positive for THSD7A-mScarlet in each transient transfection and omitting all extracellular signals. Quantifying the extracellular colocalization of THSD7A and CD63 could certainly be a future direction for this project and would require establishing cells that stably express THSD7A-mScarlet.

      With regard to whether the extracellular deposits are migrasomes, we have no reason to believe that they would be migrasomes. The preponderance of our evidence points to exosomes as carrying THSD7A and inducing filopodia. Furthermore, CD63 is an exosome marker (Sung et al., Nat Comm, 2020) and does not induce migrasomes, unlike many other tetraspanins (Huang et al., Nat Cell Bio, 2019).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors need to clarify the figure labeling and descriptions, and conclusions should be drawn based on the findings. Some figures need to be clearer, e.g., Figure 1E needs information on what the green and red fluorescent proteins are. Do all panels in Figure 1A have the same scale bar or different ones? Figure 3A lacks a scale bar. In Figure 3, the GFP signal is in yellow; does it represent a merge or is it just the GFP alone? Figure 6D is missing a D. Figure 4D needs to be better explained. Additionally, since Figures 8B and 8C both represent a model based on all the findings of the study, they would be better presented as a separate figure from Figure 8A.

      The figure legend for figure 1E notes that green corresponds to GFP-Rab27b and the red corresponds to mCherry filler. In addition, the labels are marked to the right of the figure. For Figure 1A, we have now indicated in the legend that all scale bars = 10 µm. In figure 3, neurons were co-transfected with GFP or GFP-Rab27b. Thus, the yellow signal in these images is the merge of the mCherry filler with either GFP (expression throughout the neuron body and dendrites) or GFP-Rab27b (punctate colocalization). We have added a scale bar to Fig 3A. Figure 6D has been corrected, with a “D” label added. Figure 4D shows representative images of cells with filopodia under the various conditions, including add-back of control or endoglin-KD EVs. We have clarified the conditions in the figure legend for 4D. For Figure 8, we have now split it into 2 figures: one with data (Fig 8) and one with the model (Fig 9).

      Reviewer #2 (Recommendations for the authors):

      For the most part, this story is strong and well-presented. The findings are interesting and will significantly advance our understanding of how EVs affect various processes such as cancer metastasis. However, the Cdc42 work is not great. They only indirectly implicate Cdc42 with a somewhat iffy inhibitor (ML141) and a constitutively active form transfected into cells. Both approaches have drawbacks such as off-target effects in the case of the inhibitor and possible cross-talk to other GTPases in the case of the active mutant. The activation of Cdc42 should be demonstrated by an activity assay. Several commercial kits are available. Inhibition of Cdc42 should be tested by knockdown in addition to the inhibitor.

      We appreciate the reviewer’s recognition of our work. To address the limitations of our study, particularly the Cdc42 mechanistic work, we have now added a “Limitations of the study” section at the end of the text. Here, we address our experimental limitations and future directions.

      Reviewer #3 (Recommendations for the authors):

      (1) Since the purified small EVs contain canonical exosomal markers and originate from MVEs, the authors should consider a more consistent use of the term "exosome" to avoid confusion.

      We acknowledge that the usage of both “exosomes” and “small extracellular vesicles” can seem confusing to many readers. Typically in the EV field, we use the term “exosome” when we can reliably determine that the EVs originate from the endocytic pathway. Thus, we use this term when we have specifically perturbed this pathway by targeting Hrs or Rab27. We use the term “small extracellular vesicles” or SEVs when referring to a purified heterogeneous population of SEVs from unknown or a variety of origins. Thus, when referring to vesicles isolated from the conditioned media, we call them SEVs because we cannot determine their origin. Clarification of this terminology has been added to the introduction of the paper.

      (2) 1st results section - expressing mCherry as a "filler" is confusing, clarify that this is meant to identify cellular background.

      This has now been clarified in the paper.

      (3) Figure 3 - Although Rab27a and Rab27b play a role in exosome secretion, Rab27b does not have redundant functions with Rab27a in every cellular context. The authors should mention the specific roles of Rab27a and Rab27b in promoting MVE fusion with the PM and in regulating the anterograde movement of MVEs to the PM, respectively (Ostrowski et al. 2010, Citation 52 in the ms). Although Rab27a is not highly expressed in neurons, it is not currently clear whether Rab27b has a redundant function with Rab27a or whether there is another unknown factor that plays this role. As neurons also do not express endoglin, the mechanisms that mediate how EVs regulate filopodia formation in these cells are most probably different than in cancer cells. This should be highlighted in the discussion.

      We have now added a couple of clarifying sentences about the roles of Rab27a and Rab27b to the results section, including the Ostrowski reference and another reference suggesting possible redundancy of Rab27a and Rab27b. With regard to endoglin not being expressed by neurons, that is one reason why we carried out the proteomics with control and endoglin-KD EVs to find a universal cargo that would directly induce filopodia formation. Indeed, THSD7A seems to be such a universal cargo, expressed in both cancer cell and neuron EVs and inducing filopodia in both cell types. This point, along with the requirement for regulation of THSD7A by other molecules in neurons, is discussed in the results and discussion sections.

      (4) As the authors note, the mechanistic link between endoglin-sorted, exosomal THSD7A and Cdc42-mediated filopodia formation remains unclear. While the findings on Cdc42 are clear, they are not surprising. What is the role of mDia/ENA/VASP or BAR proteins in this? The authors should also consider an assay to determine whether exosomal THSD7A binds to the PM to cause the signaling or if the cargo is first internalized before performing its function. Since this process is both autocrine and paracrine, the authors could co-culture THSD7A-mScarlet cells with vector control cells and observe how THSD7A-mScarlet is localized in the non-expressing cells.

      As other reviewers also noted, the Cdc42 mechanistic data at the end of the paper has clear limitations that are now addressed within the manuscript in a “Limitations of the Study” section. Here we discuss our experimental troubleshooting and approach to assaying Cdc42 involvement in this process. We acknowledge there are many rigorous experiments that could be pursued in the future to strengthen our mechanism and proposed model.

      We also agree that elucidating how THSD7A specifically interacts with target cells would be very informative and insightful. This would be most effectively assayed using a cell line that is stably expressing THSD7A-mScarlet and could be a future direction of this project. However, it is out of the scope of this current publication.

    1. Author Response:

      We appreciate the reviewers’ thoughtful assessments and constructive feedback on our manuscript. The central goal of our study was to propose a simple and biologically inspired model-based reinforcement learning (MBRL) framework that draws on mechanisms observed in episodic memory systems. Unlike model-free approaches that require processing at each state transition, our model uses sequential activity (= transition model) to predict environmental changes in the long term by leveraging episode-like representations.

      While many prior studies have focused on optimizing task performance in MBRL, our primary aim is to explore how flexible, context-dependent behavior—reminiscent of that observed in biological systems—can be instantiated using simple, neurally plausible mechanisms. In particular, we emphasize the use of an Amari-Hopfield network for the context selection module. This network, governed by Hebbian learning, forms attractors that can correct for sensory noise and facilitate associative recall, allowing dynamic separation of prediction errors due to sensory noise versus those due to contextual mismatches. However, we acknowledge that our explanation of these mechanisms, especially in relation to sensory noise, was not sufficiently developed in the current manuscript. We plan to revise the text to clarify this limitation and to expand on the implications of these mechanisms in the context of psychiatric disorder-like behaviors, as illustrated in Figure 5. Several reviewers raised concerns about the clarity of our model. Our implementation is intentionally algorithmic rather than formal, designed to provide an accessible proof-of-concept model. We will revise the manuscript to better describe the core logic of the model—namely, the bidirectional interaction between the Hopfield network (X) and the hippocampal sequence module (H), where X sends the information on estimated current context to H, and H returns a future prediction based on the episode to X. This interaction forms a loop enabling the current context estimation and its reselection.
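
      The attractor-based noise correction described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 16-unit network size, and the two stored "context" patterns are hypothetical choices made only for illustration, and the sketch assumes the textbook Amari-Hopfield formulation with ±1 units, Hebbian outer-product weights, and synchronous updates.

```python
import numpy as np

def hebbian_weights(patterns):
    """Hebbian outer-product rule for +/-1 patterns (one per row), no self-connections."""
    n_units = patterns.shape[1]
    weights = patterns.T @ patterns / n_units
    np.fill_diagonal(weights, 0.0)
    return weights

def recall(weights, cue, max_steps=10):
    """Synchronously update the state until it stops changing (an attractor)."""
    state = cue.copy()
    for _ in range(max_steps):
        updated = np.where(weights @ state >= 0, 1, -1)
        if np.array_equal(updated, state):
            break
        state = updated
    return state

# Two hypothetical, mutually orthogonal "context" patterns of 16 units each.
context_a = np.array([1] * 8 + [-1] * 8)
context_b = np.array([1, -1] * 8)
weights = hebbian_weights(np.stack([context_a, context_b]))

# Sensory noise: flip one unit of context A, then let the network settle.
noisy_cue = context_a.copy()
noisy_cue[0] *= -1
recalled = recall(weights, noisy_cue)

# Reports whether the noisy cue settled back onto the stored context A.
print(np.array_equal(recalled, context_a))
```

      In this toy setting, the corrupted cue falls back into the stored pattern, which is the sense in which an attractor can absorb sensory noise. A cue that settles far from every stored pattern would instead suggest a contextual mismatch. Synchronous updates are used here only for brevity; they can oscillate in general, so asynchronous updates are the more common choice.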

      The key advantage of this architecture is its ability to flexibly adjust the temporal span of episodes used for inference and control, providing a potential solution to the challenge of credit assignment over variable time scales in MBRL. Because our model forms and stores episodes of variable length depending on the context, it can handle both short-horizon and long-horizon tasks simultaneously. Moreover, because each episode is organized by context, reselecting contexts enables rapid switching between these timescales, without requiring explicit optimization. To better illustrate this important feature, we plan to include additional experiments in the revised manuscript that demonstrate how context-dependent modulation of episode length enhances behavioral flexibility and task performance.

      Finally, we will address the comments on the presentation and the biological grounding of our model. To improve clarity and biological relevance, we will revise the Methods section to explicitly describe how the model is grounded in mechanisms observed in real neural systems. Also, we will clarify which parts of our figures represent computational results versus schematic illustrations and more clearly explain how each model component relates to known neural mechanisms. These revisions aim to improve both clarity and accessibility for a broad audience, while reinforcing the biological relevance of our approach.

      We thank the reviewers again for their insightful comments, which will help us substantially improve the manuscript. We look forward to submitting a revised version that more clearly conveys the contributions and implications of our work.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this study, Hama et al. explored the molecular regulatory mechanisms underlying the formation of the ULK1 complex. By employing the AlphaFold structural prediction tool, they showed notable differences in the complex formation mechanisms between ULK1 in mammalian cells and Atg1 in yeast cells. Their findings revealed that in mammalian cells, ULK1, ATG13, and FIP200 form a complex with a stoichiometry of 1:1:2. These predicted interaction regions were validated through both in vivo and in vitro assays, enhancing our understanding of the molecular mechanisms governing ULK1 complex formation in mammalian cells. Importantly, they identified a direct interaction between ULK1 and FIP200, which is crucial for autophagy. However, some aspects of this manuscript require further clarification, validation, and correction by the authors.

      Thank you for your thorough evaluation of our manuscript. We have carefully revised the manuscript to address your concerns by performing extra experiments and providing additional clarifications, validations, and corrections as written below.

      Reviewer #2 (Public review):

      Summary:

      This is important work that helps to uncover how the process of autophagy is initiated - via structural analyses of the initiating ULK1 complex. High-resolution structural details and a mechanistic insight of this complex have been lacking and understanding how it assembles and functions is a major goal of a field that impacts many aspects of cell and disease biology. While we know components of the ULK1 complex are essential for autophagy, how they physically interact is far from clear. The work presented makes use of AlphaFold2 to structurally predict interaction sites between the different subunits of the ULK1 complex (namely ULK1, ATG13, and FIP200). Importantly, the authors go on to experimentally validate that these predicted sites are critical for complex formation by using site-directed mutagenesis and then go on to show that the three-way interaction between these components is necessary to induce autophagy in cells.

      Strengths:

      The data are very clear. Each binding interface of ATG13 (ATG13 with FIP200/ATG13 with ULK1) is confirmed biochemically with ITC and IP experiments from cells. Likewise, IP experiments with ULK1 and FIP200 also validate interaction domains. A real strength of the work is in their analyses of the consequences of disrupting ATG13's interactions in cells. The authors make CRISPR KI mutations of the binding interface point mutants. This is not a trivial task and is the best approach as everything is monitored under endogenous conditions. Using these cells the authors show that ATG13's ability to interact with both ULK1 and FIP200 is essential for a full autophagy response.

      Thank you for your thoughtful review and for highlighting the importance of our approach.

      Weaknesses:

      I think a main weakness here is the failure to acknowledge and compare results with an earlier preprint that shows essentially the same thing (https://doi.org/10.1101/2023.06.01.543278). Arguably this earlier work is much stronger from a structural point of view as it relies not only on AlphaFold2 but also actual experimental structural determinations (and takes the mechanisms of autophagy activation further by providing evidence for a super complex between the ULK1 and VPS34 complexes). That is not to say that this work is not important, as at the least it independently helps to build a consensus for ULK1 complex structure. Another weakness is that the downstream "functional" consequences of disrupting the ULK1 complex are only minimally addressed. The authors perform a Halotag-LC3 autophagy assay, which essentially monitors the endpoint of the process. There are a lot of steps in between, knowledge of which could help with mechanistic understanding. Not the least of these is the kinase activity of ULK1 - how is this altered by disrupting its interactions with ATG13 and/or FIP200?

      Thank you for this valuable feedback. In response, we performed a detailed structural comparison between the cryo-EM structure reported in the referenced preprint and our AlphaFold-based model. We have summarized both the similarities and differences in newly included figures (revised Figure 2A, B, 3B, S1F) and provided an in-depth discussion in the main text. Furthermore, to address the downstream consequences of ULK1 complex disruption, we have investigated the impact on ULK1 kinase activity, specifically examining how mutations affecting ATG13 or FIP200 interaction alter ULK1’s phosphorylation of a key substrate ATG14. In addition, we analyzed the effect on ATG9 vesicle recruitment. We provide the corresponding data as Figure S3C-E and detailed discussions in the revised manuscript.

      Reviewer #3 (Public review):

      In this study, the authors employed the protein complex structure prediction tool AlphaFold-Multimer to obtain a predicted structure of the protein complex composed of ULK1-ATG13-FIP200 and validated the structure using mutational analysis. This complex plays a central role in the initiation of autophagy in mammals. Previous attempts at resolving its structure have failed to obtain high-resolution structures that can reveal atomic details of the interactions within the complex. The results obtained in this study reveal extensive binary interactions between ULK1 and ATG13, between ULK1 and FIP200, and between ATG13 and FIP200, and pinpoint the critical residues at each interaction interface. Mutating these critical residues led to the loss of binary interactions. Interestingly, the authors showed that the ATG13-ULK1 interaction and the ATG13-FIP200 interaction are partially redundant for maintaining the complex.

      We are grateful for your high evaluation of our work.

      The experimental data presented by the authors are of high quality and convincing. However, given the core importance of the AlphaFold-Multimer prediction for this study, I recommend the authors improve the presentation and documentation related to the prediction, including the following:

      (1) I suggest the authors consider depositing the predicted structure to a database (e.g. ModelArchive) so that it can be accessed by the readers.

      We have deposited the AlphaFold model to ModelArchive with the accession code ma-jz53c, which is indicated in the revised manuscript.

      (2) I suggest the authors provide more details on the prediction, including explaining why they chose to use the 1:1:2 stoichiometry for ULK1-ATG13-FIP200 and whether they have tried other stoichiometries, and explaining why they chose to use the specific fragments of the three proteins and whether they have used other fragments.

      We appreciate your suggestion. As we noted in the original manuscript, previous studies have shown that the C-terminal region of ULK1 and the C-terminal intrinsically disordered region of ATG13 bind to the N-terminal region of the FIP200 homodimer (Alers, Loffler et al., 2011; Ganley, Lam du et al., 2009; Hieke, Loffler et al., 2015; Hosokawa, Hara et al., 2009; Jung, Jun et al., 2009; Papinski and Kraft, 2016; Wallot-Hieke, Verma et al., 2018). We relied on these findings when determining the specific regions to include in our complex prediction and when selecting a 1:1:2 stoichiometry for ULK1–ATG13–FIP200 which was reported previously (Shi et al., 2020). We also used AlphaFold2 to predict the structures of the full-length ULK1–ATG13 complex and the complex of the FIP200N dimer with full-length ATG13, confirming that there were no issues with our choice of regions (revised Figure S1A-C). In the revised manuscript, we have provided a more detailed explanation of our rationale based on the previous reports and additional AlphaFold predictions.

      (3) I suggest the authors present the PAE plot generated by AlphaFold-Multimer in Figure S1. The PAE plot provides valuable information on the prediction.

      We provided the PAE plot in the revised Figure S1C.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 1D, the labels for the input and IP of ATG13-FLAG should be corrected to ATG13-FLAG FIP3A.

      We thank the reviewer for pointing out these labeling mistakes. We revised the labels based on the suggestions.

      (2) In the discussion section, the authors should address why ATG13-FLAG ULK1 2A in Fig. 2D leads to a significantly lower expression of ULK1 and provide possible explanations for this observation.

      ATG13 and ATG101, both core components of the ULK1 complex, are known to stabilize each other through their mutual interaction. Loss or reduction of one protein typically leads to the destabilization of the other. In this context, ULK1 is similarly stabilized by binding to ATG13. Therefore, the ATG13-FLAG ULK2A mutant, which has reduced binding to ULK1, likely loses this stabilizing activity, so ULK1 becomes destabilized, resulting in lower ULK1 expression levels. We have added this discussion to the revised manuscript.

      (3) In Figure 4B, the authors should explain why Atg13-FLAG KI significantly affects the expression of endogenous ULK1. Could Atg13-FLAG KI be interfering with its binding to ULK1? Experimental evidence should be provided to support this. Additionally, does Atg13-FLAG KI affect autophagy? Wild-type HeLa cells should be included as a control in Figure 4C and 4D to address this question.

      Thank you for your constructive suggestion. We found a technical error in the ULK1 blot of Figure 4B. Therefore, we repeated the experiment. The results show that ULK1 expression did not significantly change in the ATG13-FLAG KI. These findings are consistent with Figure S3A. We have replaced Figure 4B with this new data.

      We agree that including wild-type HeLa cells as a control is essential to determine whether ATG13-FLAG KI affects autophagy. We performed the same experiments in wild-type HeLa cells and found that ATG13-FLAG KI does not significantly impact autophagic flux. Accordingly, we have replaced Figures 4D and 4E with these new data.

      (4) In Figure 3C, the authors used an in vitro GST pulldown assay to detect a direct interaction between ULK1 and FIP200, which was also confirmed in Figure 3E. However, since FLAG-ULK1 FIP2A affects its binding with ATG13 (Fig. 3E), it is possible that ULK1 FIP2A inhibits autophagy by disrupting this interaction. The authors should therefore use an in vitro GST pulldown assay to determine whether GST-ULK1 FIP2A affects its binding with ATG13. Additionally, the authors should investigate whether the interaction between ULK1 and FIP200 in cells requires the involvement of ATG13 by using ATG13 knockout cells to confirm if the ULK1-FIP200 interaction is affected in the absence of ATG13.

      Thank you for the valuable suggestion. We examined the effect of the FIP2A mutation on the ULK1–ATG13 interaction using isothermal titration calorimetry (ITC) to obtain quantitative binding data. The results showed that the FIP2A mutation does not markedly alter the affinity between ULK1 and ATG13 (revised Figure S2B), suggesting that FIP2A mainly weakens the ULK1–FIP200 interaction. Regarding experiments in ATG13 knockout cells, ULK1 becomes destabilized in the absence of ATG13, making it technically difficult to assess how the ULK1–FIP200 interaction is affected under those conditions.

      Reviewer #2 (Recommendations for the authors):

      I feel the manuscript would benefit from a more detailed comparison with the Hurley lab paper - are the structural binding interfaces the same, or are there differences?

      We appreciate the suggestion to compare our results more closely with the work from the Hurley lab. We performed a detailed structural comparison between the cryo-EM structure reported in the referenced preprint and our AlphaFold-based model (revised Figure 2A, B, 3B, S1F) and provided an in-depth discussion in the main text.

      As mentioned, what happens downstream of disrupting the ULK1 complex? How is ULK1 activity changed, both in vitro and in cells? Does disruption of the ULK1 complex binding sites impair VPS34 activity in cells (for example by looking at PtdIns3P levels/staining)?

      Thank you for your insightful comments. We focused on elucidating how disrupting the ULK1 complex leads to impaired autophagy. To assess ULK1 activity, we measured ULK1-dependent phosphorylation of ATG14 at Ser29 (PMID: 27046250; PMID: 27938392). In FIP3A and FU5A knock-in cells, ATG14 phosphorylation was significantly reduced, indicating decreased ULK1 activity (revised Figure S3D, E). This observation is consistent with previous work showing that FIP200 recruits the PI3K complex. Notably, in ATG13 knockout cells, ATG14 phosphorylation became almost undetectable, though the underlying mechanism remains to be fully investigated. Altogether, these data point to reduced ULK1 activity as a key factor explaining the autophagy deficiency observed in FU5A knock-in cells.

      We also explored possible downstream mechanisms. One well-established function of ATG13 is to recruit ATG9 vesicles (PMID: 36791199). These vesicles serve as an upstream platform for the PI3K complex, providing the substrate for phosphoinositide generation (PMID: 38342428). To clarify how our mutations impact this step, we starved ATG13-FLAG knock-in cells and observed ATG9 localization. Unexpectedly, even in FU5A knock-in cells where ATG13 is almost completely dissociated from the ULK1 complex, ATG9A still colocalized with FIP200 (revised Figure S3C). These puncta also overlapped with p62, likely because p62 bodies recruit both FIP200 and ATG9 vesicles. Although we suspect that ATG9 recruitment is nonetheless impaired under these conditions, we were unable to definitively demonstrate this experimentally and consider it an important avenue for future study.

      Reviewer #3 (Recommendations for the authors):

      Here are some additional minor suggestions:

      (1) The UBL domains are only mentioned in the abstract but not anywhere else in the manuscript. I suggest the authors add descriptions related to the UBL domains in the Results section.

      We thank the reviewer for pointing out the lack of description of UBL domains, which we added in Results in the revised manuscript.

      (2) The authors may want to consider adding a diagram in Figure 1A to show the domain organization of the three full-length proteins and the ranges of the three fragments in the predicted structure.

      We have added a proposed diagram as Figure 1A.

      (3) I suggest the authors consider highlighting in Figure 1A the positions of the binding sites shown in Figure 1B, for example, by adding arrows in Figure 1A.

      We have added arrows in the revised Figure 1B (which was Figure 1A in the original submission).

      (4) In Figure 1D, "Atg13-FLAG" should be "Atg13-FLAG FIP3A".

      We have revised the labeling in Figure 1D.

      (5) "the binding of ATG13 and ULK1 to the FIP200 dimer one by one" may need to be re-phrased. "One by one" conveys a meaning of "sequential", which is probably not what the authors meant to say.

      We have revised the sentence as “the binding of one molecule each of ATG13 and ULK1 to the FIP200 dimer”.

      (6) In "Wide interactions were predicted between the four molecules", I suggest changing "wide" to "extensive".

      We have changed “wide” to “extensive” in the revised manuscript.

      (7) In "which revealed that the tandem two microtubule-interacting and transport (MIT) domains in Atg1 bind to the tandem two MIT interacting motifs (MIMs) of ATG13", I suggest changing the two occurrences of "tandem two" to "two tandem" or simply "tandem".

      We simply used "tandem" in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Recent work has demonstrated that the hummingbird hawkmoth, Macroglossum stellatarum, like many other flying insects, uses ventrolateral optic flow cues for flight control. However, unlike other flying insects, the same stimulus presented in the dorsal visual field elicits a directional response. Bigge et al. use behavioral flight experiments to set these two pathways in conflict in order to understand whether these two pathways (ventrolateral and dorsal) work together to direct flight and, if so, how. The authors characterize the visual environment (the amount of contrast and translational optic flow) of the hawkmoth and find that different regions of the visual field are matched to relevant visual cues in their natural environment and that the integration of the two pathways reflects a prioritization for generating behavior that supports hawkmoth safety rather than the prevalence of a particular visual cue in the environment.

      Strengths:

      This study creatively utilizes previous findings that the hawkmoth partitions their visual field as a way to examine parallel processing. The behavioral assay is well-established and the authors take the extra steps to characterize the visual ecology of the hawkmoth habitat to draw exciting conclusions about the hierarchy of each pathway as it contributes to flight control.

      Weaknesses:

      The work would be further clarified and strengthened by additional explanation included in the main text, figure legends, and methods that would permit the reader to draw their own conclusions more feasibly. It would be helpful to have all figure panels referenced in the text and referenced in order, as they are currently not. In addition, it seems that sometimes the incorrect figure panel is referenced in the text, Figure S2 is mislabeled with D-E instead of A-C and Table S1 is not referenced in the main text at all. Table S1 is extremely important for understanding the figures in the main text and eliminating acronyms here would support reader comprehension, especially as there is no legend provided for Table S1. For example, a reader that does not specialize in vision may not know that OF stands for optic flow. Further detail in figure legends would also support the reader in drawing their own conclusions. For example, dashed red lines in Figures 3 and 4 A and B are not described and the letters representing statistical significance could be further explained either in the figure legend or materials to help the reader draw their own conclusions.

      We appreciate the suggestions to improve the clarity of the manuscript. We have extensively restructured the entire manuscript. Among other changes, we have referenced all figure panels in the text in the order they appear. To do so, we combined the optic flow and contrast measurements of our setup with the methods description of the behavioural experiments (formerly Figs. 5 and 2, respectively). This new Figure 2 now introduces the methods of the study, while the remainder of the former Fig. 2, which presented the experiments that investigated the ventrolateral and dorsal responses in more detail, is now a separate figure (Fig. 3). This arrangement also better balances the amount of information contained in each figure.

      Reviewer #2 (Public review):

      Summary:

      Bigge and colleagues use a sophisticated free-flight setup to study visuo-motor responses elicited in different parts of the visual field in the hummingbird hawkmoth. Hawkmoths have been previously shown to rely on translational optic flow information for flight control exclusively in the ventral and lateral parts of their visual field. Dorsally presented patterns elicit a formerly completely unknown response - instead of using dorsal patterns to maintain straight flight paths, hawkmoths fly, more often, in a direction aligned with the main axis of the pattern presented (Bigge et al, 2021). Here, the authors go further and put ventral/lateral and dorsal visual cues into conflict. They found that the different visuomotor pathways act in parallel, and they identified a 'hierarchy': the avoidance of dorsal patterns had the strongest weight and optic flow-based speed regulation the lowest weight.

      Strengths:

      The data are very interesting, unique, and compelling. The manuscript provides a thorough analysis of free-flight behavior in a non-model organism that is extremely interesting for comparative reasons (and on its own). These data are both difficult to obtain and very valuable to the field.

      Weaknesses:

      While the present manuscript clearly goes beyond Bigge et al, 2021, the advance could have perhaps been even stronger with a more fine-grained investigation of the visual responses in the dorsal visual field. Do hawkmoths, for example, show optomotor responses to rotational optic flow in the dorsal visual field?

      We thank the reviewer for the feedback, and the suggestions for improvement of the manuscript (our implementations are detailed below). We fully agree that this study raises several intriguing questions regarding the dorsal visual response, including how the animals perceive and respond to rotational optic flow in their dorsal visual field, particularly since rotational optic flow may be processed separately from translational optic flow.

      In our free-flight setup, it was not possible to generate rotational optic flow in a controlled manner. To explore this aspect more systematically, a tethered-flight setup would be ideal, or alternatively, a free-flight setup integrated with virtual reality. This would be a compelling direction for a follow-up study.

      Reviewer #3 (Public review):

      The central goal of this paper as I understand it is to extract the "integration hierarchy" of stimulus in the dorsal and ventrolateral visual fields. The segregation of these responses is different from what is thought to occur in bees and flies and was established in the authors' prior work. Showing how the stimuli combine and are prioritized goes beyond the authors' prior conclusions that separated the response into two visual regions. The data presented do indeed support the hierarchy reported in Figure 5 and that is a nice summary of the authors' work. The moths respond to combinations of dorsal and lateral cues in a mixed way but also seem to strongly prioritize avoiding dorsal optic flow which the authors interpret as a closed and potentially dangerous ecological context for these animals. The authors use clever combinations of stimuli to put cues into conflict to reveal the response hierarchy.

      My most significant concern is that this hierarchy of stimulus responses might be limited to the specific parameters chosen in this study. Presumably, there are parameters of these stimuli that modulate the response (spatial frequency, different amounts of optic flow, contrast, color, etc). While I agree that the hierarchy in Figure 5 is consistent for the particular stimuli given, this may not extend to other parameter combinations of the same cues. For example, as the contrast of the dorsal stimuli is reduced, the inequality may shift. This does not preclude the authors' conclusions but it does mean that they may not generalize, even within this species. For example, other cue conflict studies have quantified the responses to ranges of the parameters (e.g. frequency) and shown that one cue might be prioritized or up-weighted in one frequency band but not in others. I could imagine ecological signatures of dorsal clutter and translational positioning cues could depend on the dynamic range of the optic flow, or even having spatial-temporal frequency-dependent integration independent of net optic flow.

      We absolutely agree that, in principle, an observed integration hierarchy is only valid for the stimuli tested. Yet we do believe that we provide good evidence that our key observations are also robust for stimuli related to the ones tested:

      Most importantly, we found that both pathways act in parallel (and are not mutually exclusive, or winner-takes-all, for example), when the animals can enact the locomotion induced by the dorsal and ventrolateral pathway. We tested this with the same dorsal cue (the line switching direction), but different behavioural paradigms (centring vs unilateral avoidance), and different ventrolateral stimuli (red gratings of one spatial frequency, and 100% nominal contrast black-and-white checkerboard stimuli which comprised a range of spatial frequencies) – and found the same integration strategy.

      Certainly, if the contrast of the visual cues was reduced to the point that the dorsal or ventrolateral responses became weaker, we would expect this to be visible in the combined responses, with the respective reduction in response strength for either pathway, to the same degree as they would be reduced when stimuli were shown independently in the dorsal and ventrolateral visual field.

      For testing whether the animals would show a weighting of responses when it was not possible to enact locomotion to both pathways, we felt it was important to use similar external stimuli, so that we could compare the responses and confidently interpret them in terms of integration. Indeed, how this is translated to responses in the two pathways depends a) on the spatiotemporal tuning, contrast sensitivity and exact receptive fields of the two systems, b) on the geometry of the setup and stimulus coverage, and therefore the ability of the animals to enact responses to both pathways independently, and c) on the integration weights.

      It would indeed be fascinating to obtain this tuning and the receptive fields, and having these, test a large array of combinations of stimuli and presentation geometries, so that one could extract integration weights for different presentation scenarios from the resulting flight responses in a future study.

      We also expanded the respective discussion section to reflect these points (l. 391-417) and updated the former Fig. 5, now Fig. 6, accordingly.

      The second part of this concern is that there seems to be a missed opportunity to quantify the integration, especially when the optic flow magnitude is already calculated. The discussion even highlights that an advantage of the conflict paradigm is that the weights of the integration hierarchy can be compared. But these weights, which I would interpret as stimulus-responses gains, are not reported. What is the ratio of moth response to optic flow in the different regions? When the moth balances responses in the dorsal and ventrolateral region, is it a simple weighted average of the two? When it prioritizes one over the other is the response gain unchanged? This plays into the first concern because such gain responses could strongly depend on the specific stimulus parameters rather than being constant.

      Indeed, we set up stimuli that are comparable, as they are all in the visual domain, and since we can calculate their external optic flow and contrast magnitudes, to control for imbalances in stimulus presentation, which is important for the interpretation of the resulting data.

      As we discussed above, we are confident that we are observing general principles of the integration of the two parallel pathways. However, we refrained from calculating integration weights, because these might be misleading for several reasons:

      (1) In situations where the animals can enact responses to both pathways, we show that they do so at the full original magnitudes. So there are no “weights” of the hierarchy in this case.

      (2) Only when responses to both systems are not possible in parallel, do we see a hierarchy. However, combined with point (1), this hierarchy likely depends on the geometry of the moths’ environment: it will be more pronounced the less both systems can be enacted in parallel.

      (3) The hierarchy also does not affect all features of the dorsal or ventrolateral pathway equally. The hawkmoths still regulate their perpendicular distance to ventral gratings with dorsal gratings present, to the same degree as with only ventral gratings, because perpendicular distance regulation is not a feature of the dorsal response. And while the hawkmoths show a significant reduction in their position adjustment to dorsal contrast when it is in conflict with lateral gratings (Fig. 4C), they show exactly the same amount of lateral movement and speed adjustment as for dorsal gratings alone, when not combined with lateral ones (Fig. 4D and Fig. S3A). So even for one particular setup geometry and stimulus combination, there clearly is not one integration weight for all features of the responses.

      We extended the discussion section to clarify these points: “The benefit of our study system is that the same cues activate different control pathways in different regions of the visual field, so that the resulting behaviour can directly be interpreted in terms of integration weights” (l. 448-451).

      See also l. 391-417. We also updated the former Fig. 5, now Fig. 6, to reflect this discussion.

      The authors do explain the choice of specific stimuli in the context of their very nice natural scene analysis in Fig. 1 and there is an excellent discussion of the ecological context for the behaviors. However, I struggled to directly map the results from the natural scenes to the conclusions of the paper. How do they directly inform the methods and conclusions for the laboratory experiments? Most important is the discussion in the middle paragraph of page 12, which suggests a relationship with Figure 1B but seems provocative and lacks a quantification with respect to the laboratory stimuli.

      We show that contrast cues and translational optic flow are not homogeneously distributed in the natural environments of hawkmoths. This directly relates to our laboratory findings on the responses to these stimuli in different parts of their visual field. In order to interpret the results of these behavioural experiments with respect to the visual stimuli, we performed measurements of translational optic flow and contrast cues in the laboratory setup. As a result, we make several predictions about the animals’ use of translational optic flow and contrast cues in natural settings:

      a) Hawkmoths in the lab responded strongest to ventral optic flow, even though it was not stronger in magnitude, given our measurements, than lateral optic flow. Thus, we propose that the stronger response to ventral optic flow might be an evolutionary adaptation to the natural distribution of translational optic flow cues.

      b) In the natural habitats of hawkmoths, dorsal coverage is much less frequent than ventrolateral structures generating translational optic flow, yet the hawkmoths responded with a much higher weight to the former. Moreover, in our flight tunnel experiments, the animals responded with the same or higher weights to dorsal cues, which had a lower magnitude of translational optic flow and contrast than the same cues in the ventrolateral visual field. So, by combining behavioural experiments and stimulus measurements in the lab, we showed that the weighting of dorsal and ventrolateral cues did not follow their stimulus magnitude in the lab. Moreover, comparing to the natural cue distributions, we suggest that the integration weights also did not evolve to match the prevalence of these cues in natural habitats.

      We integrated the measurements of natural visual scene statistics in the new Fig. 6, to relate the behavioural findings to the natural context also in the figure structure, and sequence logic of the text, as they are discussed here.

      The central conclusion of the first section of the results is that there are likely two different pathways mediating the dorsal and the ventrolateral response. This seems reasonable given the data, however, this was also the message that I got from the authors' prior paper (ref 11). There are certainly more comparisons being done here than in that paper and it is perfectly reasonable to reinforce the conclusion from that study but I think what is new about these results needs to be highlighted in this section and differentiated from prior results. Perhaps one way to help would be to be more explicit with the open hypotheses that remain from that prior paper.

      We appreciate the suggestion to highlight more clearly what the open questions that are addressed in this study are. As a result, we have entirely restructured the introduction, added sections to the discussion and fundamentally changed the graphical result summary in Fig. 6, to reflect the following new findings (and differences to the previous paper):

      The previous paper demonstrated that there are two different pathways in hummingbird hawkmoths that mediate visual flight guidance, and newly described one of them, the dorsal response. This established flight guidance in hummingbird hawkmoths as a model for the questions asked in the current study, which are very different in nature from the previous paper.  

      The main question addressed in the current study is how these two flight guidance pathways interact to generate consistent behaviour. Throughout the literature on parallel sensory and motor pathways guiding behaviour, there are different solutions, from winner-takes-all to equally mixed responses. We tested this fundamental question using the hummingbird hawkmoth flight guidance systems as a model.

      This is the main question addressed in the various conflict experiments in this study, and we show that indeed, the two systems operate in parallel. As long as the animals can enact both dorsal and optic-flow responses, they do so at the original strengths of the responses. Only when this is not possible, hierarchies become visible. We carefully measured the optic flow and contrast cues generated by the different stimuli to ensure that the hierarchies we observed were not generated by imbalances of the external stimuli.

      - Does the interaction hierarchy of the two pathways follow the statistics of natural environments? We previously showed qualitatively how optic flow and contrast cues are distributed across the visual field in natural habitats of the hummingbird hawkmoth. In this study, we quantitatively analysed the natural image data, including a new analysis for the contrast edges, and statistically compared the results across conditions. This quantitative analysis supported the previous qualitative assessment that the prevalence of translational optic flow was highest in the ventral and lowest in the dorsal visual field in all natural habitat types. The distribution of contrast edges across the visual field depended on habitat type much more strongly than was visible in the qualitative analysis in the previous paper. When compared to the magnitude of the behavioural responses, and considering that the hummingbird hawkmoth is predominantly found in open and semi-open habitats, the natural distributions of optic flow and contrast edges did not align with the response hierarchy observed in our laboratory experiments. Dorsal cues elicited much stronger responses relative to ventrolateral optic flow responses than would be expected.

      To provide a more complete picture of the dorsal pathway, which will be important to understand its nature and to compare it to other species, we conducted additional experiments that were specifically set up to test for response features known from the translational optic flow response, in order to compare and contrast the two systems. These experiments allowed us to show that the dorsal response is not simply a translational optic flow reduction response that creates much stronger output than the ventrolateral optic flow response. In particular, we show that the dorsal response lacked the perpendicular distance regulation of the optic flow response, while it did provide alignment with prominent contrasts (possibly to reduce the perceived translational optic flow), which is not observed in the ventrolateral optic flow response. The strong avoidance of any dorsal contrast cues, not just those inducing translational optic flow, is another feature not found in the ventrolateral pathway.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Many comparisons between visual conditions are made and it was confusing at times to know which conditions the authors were comparing. Thinking of a way to label each condition with a letter or number so that the authors could specify which conditions are specifically being compared would greatly enhance comprehension and readability.

      We appreciate this concern. To be able to refer to the individual stimulus conditions in the analysis and results description, we gave each stimulus a unique identifier (see table S1), and provided these identifiers in the respective figures and throughout the text. We hope that this makes the identification of the individual stimuli easier.

      Consider adding in descriptive words to the y-axis labels for the position graphs that would help the reader quickly understand what a positive or negative value means with respect to the visual condition.

      We have now changed the viewpoint of the example tracks in Figs. 2-5 to a virtual viewpoint from above, rather than the view from below as recorded by the camera, which required some mental rotation to reconcile the left and right sides. Moreover, we noticed that the example track axes were labelled in mm, while the axes for the plots showing median position in the tunnel were labelled in cm. We reconciled the units as well. This makes it easier to see the direct equivalence of the axes (as well as of positive and negative values) between the example tracks in those figures and the median positions, as well as the cross-index.

      There are no line numbers provided so it is a bit challenging to provide feedback on specific sentences but there are a handful of typos in the manuscript, a few examples:

      (1) Cue conflict section, first paragraph: "When both cues were presented to in combination, ..." (remove to)

      (2) The ecological relevance section, first paragraph, first sentence: "would is not to fly"

      (3) Figure S3 legend: explanation for C is labeled as B and B is not included with A

      We apologise for the missing line numbers. We added these and resolved the issues 1-3.

      Reviewer #2 (Recommendations for the authors):

      - The pictograms in Fig. 1a were at first glance not clear to me, maybe adding l, r, d, v to the first pictogram could make the figure more immediately accessible.

      We added these labels to make it more accessible.

      - I would suggest noting in the main text that the red patterns were chosen for technical reasons (see Methods), if this is correct.

      We added this information and a reference to the methods in the main text (lines 100-102).

      - "Thus, hawkmoths are currently the only insect species for which a partitioning of the visual field has been demonstrated in terms of optic-flow-based flight control [33-35]." I think that is a bit too strong and maybe it would be more interesting to connect the current data to connected data in other insects to perhaps discuss important similarities. Ref 32 for example shows that fruit flies weigh ventral translational optic flow considerably more than dorsal translational optic flow. Reichardt 1983 (Naturwissenschaften) showed that stripe fixation in large flies (a behaviour relying in part on the motion pathway) is confined to the ventral visual field, etc...

      We have changed this sentence to acknowledge partitioning in other insects, and to motivate the use of our model species for this study: While fruit flies weight ventral translational optic flow stronger than dorsal optic flow, the most extreme partitioning of the visual field in terms of optic-flow-based flight control has been observed in hawkmoths [33-35]. (lines 60-62)

      - I think the statistical differences in group means could be described in more detail, at least in Fig. 2 (to me the description was not immediately clear, in particular with the double letters).

      We added an explanation of the letter nomenclature to all respective figure legends:

      Black letters show statistically significant differences in group means or median, depending on the normality of the test residuals (see Methods, confidence level: 5%). The red letters represent statistically significant differences in group variance from pairwise Brown–Forsythe tests (significance level 5%). Conditions with different letters were significantly different from each other. The white boxplots depict the median and 25% to 75% range, the whiskers represent the data exceeding the box by more than 1.5 interquartile ranges, and the violin plots indicate the distribution of the individual data points shown in black.

      - "When translational optic flow was presented laterally" I would use a more wordy description, since it is the hawkmoth that is controlling the optic flow and in addition to translational optic flow, there might also be rotational components, retinal expansion etc.

      We extended the description to explain that the moths were generating the optic flow percept based on stationary gratings in different orientations, by way of their flight through the tunnel. Lines 127-129

      - While it is clearly stated that the measure of the perpendicular distance from the ventral and dorsal pattern via the size of the insect as seen by the camera is indirect, I would suggest to determine the measurement uncertainty of distance estimate.

      - Connected to above - is the hawkmoth area averaged over the entire flight and is the variance across frames similar in all the stimuli conditions? Is it, in principle, conceivable that the hawkmoths' pitch (up or down) is different across conditions, e.g. with moths rising and falling more frequently in a certain condition, which could influence the area in addition to distance?

      There are a number of sources that generate variance in the distance estimate (which was based on the size of the moth in each video frame, after background subtraction): the size of the animal, the contrast with which the animal was filmed (which also depended on the type of pattern in the tunnel: it was lower with ventral or dorsal patterns as a background than with lateral ones), and the speed of the animal, as motion blur could impact the moth’s image on the video. The latter is hard to calibrate, but the uncertainty related to animal size and pattern types could theoretically be estimated. However, since we moved between finishing the data acquisition for this study and publishing the paper, the original setup has been dismantled. We could attempt to recreate it as faithfully as possible, but would worry about introducing further noise. We therefore decided not to attempt to characterise the uncertainty, so as not to give a false impression of the quantifiability of this measure. For the purpose of this study, it will have to remain a qualitative, rather than a quantitative, measure. If we use a similar measure again, we will make sure to quantify all sources of uncertainty that we have access to.

      The variance in area is different between conditions. Most likely, the animals vary their flight height differently for different dorsal and ventral patterns, as they vary their lateral flight straightness with different lateral visual input. For the reasons mentioned above, we cannot disentangle the effects of variations in flight height and other sources of uncertainty relating to animal size in the video frames. We therefore averaged the extracted area across the entire flight, to obtain a coarse measure of their flight height. Future studies focusing specifically on the vertical component or filming in 3D will be required to determine the exact amount of vertical flight variation.

      - Results second paragraph, suggestion: pattern wavelength or spatial frequency instead of spatial resolution.

      - Same paragraph, suggestion: For an optimal wavelength/spatial frequency of XX

      We corrected these to spatial frequency.

      - Above Fig 3- "this strongly suggests a different visual pathway". In my opinion it would be better to say sensory-motor /visuomotor pathway or to more clearly define visual pathway? Could one in principle imagine a uniform set of local motion sensitive neurons across the entire visual field that connect differentially to descending/motor neurons.

      We appreciate this point and changed this, and further instances in the manuscript to visuomotor pathway.

      - If I understood correctly, you calculated the magnitude of optic flow in the different tunnel conditions based on the image of a fisheye camera moving centrally in the tunnel, equidistant from all walls. I did not understand why the magnitude of optic flow should differ between the four quadrants showing the same squarewave patterns. Apologies if I missed something, but maybe it is worth explaining this in more detail in the manuscript.

      We recognize that this point may not have been immediately clear and have therefore provided additional clarification in the Methods and results section (lines 106-111, 543-549). We anticipated differences in the magnitude of optic flow due to potential contrast variations arising from the way the stimuli were generated—being mounted on the inner surfaces of different tunnel walls while the light source was positioned above. On the dorsal wall, light from the overhead lamps passed through the red material. For laterally mounted patterns, the animals perceived mainly reflected light, as these tunnel walls were not transparent.

      A similar principle applied to the background, which consisted of a white diffuser allowing light to pass through dorsally, but white non-transmissive paper laterally, with 5% contrast random checkerboard patterns. The ventral side presented a more complex scenario, as it needed to be partially transparent for the ventrally mounted camera. Consequently, the animals perceived a combination of light reflections from the red patterns and the white gauze covering the ventral tunnel side, against the much darker background of the surrounding room.

      To ensure that the observed flight responses were not artifacts of deviations in visual stimulation from an ideal homogeneous environment, we used the camera to quantify the magnitude of optic flow and contrast patterns under these real experimental conditions. This approach also allowed us to directly relate the optic flow measurements taken indoors to those recorded outdoors, as we employed the same camera and analytical procedures for both datasets.

      Reviewer #3 (Recommendations for the authors):

      In addition to the considerations above I had a few minor points:

      There are so many different directions of stimuli and response that it is quite challenging to parse the results. Can this be made a little easier for the reader?

      We appreciate this concern. To be able to refer to the individual stimulus conditions in the analysis and results description, we gave each stimulus a unique identifier (see table S1), and provided these identifiers in the respective figures and throughout the text. We hope that this makes the identification of the individual stimuli easier.

      One suggestion (only a suggestion): I found myself continuously rotating the violin plots in my head so that the lateral position axis lined up with the lateral position of the tunnel icons below. Consider if rotating the plots 90 degs would help interpretability. It was challenging to keep track of which side was side.

      We did discuss this with a number of test-readers, and tried multiple configurations. They all have advantages and drawbacks, but the majority of testers preferred the current configuration, so we retained it. To help with the mental transformation from the example flight tracks in the figures, we now present the example flight tracks in Figs. 2-5 in the same reference frame as the figures showing median position (so positive and negative values on those axes correspond directly), and changed the view from a below-the-tunnel to an above-the-tunnel view, as this is the more typical depiction. We hope that this enhances readability.

      Are height measurements sensitive to the roll and pitch of the animal? I suspect this is likely small but worth acknowledging.

      They are indeed. These effects are likely small but contribute to the overall inaccuracy, which we could not quantify in this particular setup (see also response to reviewer 2 on that point), which is why the height measurements have to be considered a qualitative approximation rather than a quantification of flight height. We added text to acknowledge the effects of roll and pitch specifically (lines 657-658)

      The Brown-Forsythe test was reported as paired but this seems odd because the same moths were not used in each condition. Maybe the authors meant something different by "paired" than a paired statistical design?

      Indeed, the data were not paired in the sense that we could attribute individual datapoints to individual moths across conditions. We applied the Brown-Forsythe test in a pairwise manner, comparing the variance of each condition with that of each other condition, to test if the variance in position differed across conditions. We did phrase this misleadingly, and have corrected it to “The variance in the median lateral position (in other words, the spread of the median flight position) was statistically compared between the groups using the pairwise Brown–Forsythe tests” (l. 187-188).
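
      As an illustration of what such a pairwise comparison of variances involves, the following minimal sketch computes the Brown–Forsythe statistic (a one-way ANOVA F statistic on the absolute deviations from each group's median) for every pair of conditions. It is not the analysis code used in the study: the function names and the condition labels are hypothetical, and only the test statistic is computed. Obtaining a p-value would additionally require the F distribution, for example via `scipy.stats.levene` with `center='median'`.

```python
from itertools import combinations
from statistics import mean, median


def brown_forsythe_statistic(*groups):
    """Brown-Forsythe W: ANOVA F statistic on absolute deviations from group medians.

    Hypothetical helper for illustration; raises ZeroDivisionError if every
    group has zero spread around its own mean deviation.
    """
    deviations = []
    for group in groups:
        centre = median(group)
        deviations.append([abs(x - centre) for x in group])

    k = len(deviations)                          # number of groups
    n_total = sum(len(d) for d in deviations)    # total number of observations
    grand_mean = mean(x for d in deviations for x in d)

    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(d) * (mean(d) - grand_mean) ** 2 for d in deviations)
    within = sum((x - mean(d)) ** 2 for d in deviations for x in d)
    return (n_total - k) / (k - 1) * between / within


def pairwise_brown_forsythe(conditions):
    """Apply the two-group statistic to every pair of named conditions."""
    return {
        (a, b): brown_forsythe_statistic(conditions[a], conditions[b])
        for a, b in combinations(conditions, 2)
    }


if __name__ == "__main__":
    # Made-up median lateral positions for three hypothetical conditions.
    example = {
        "cond_1": [-1.0, -0.5, 0.0, 0.4, 1.1],
        "cond_2": [-4.0, -2.5, 0.2, 2.0, 4.5],
        "cond_3": [1.8, 2.0, 2.1, 2.3, 2.4],
    }
    for pair, w in pairwise_brown_forsythe(example).items():
        print(pair, round(w, 3))
```

      Because each pair is tested separately, the number of comparisons grows quickly with the number of conditions, which is why the question of correcting for multiple comparisons arises for this kind of design.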

      There is some concern about individual moth preferences and bias due to repeated measures. I appreciate that the individual moth's identity was not likely known in most cases, but can the authors provide an approximate breakdown of how many individual moths provided the N sample trajectories?

      This is a very valid concern, and indeed one we did investigate in a previous study with this setup. We confirmed that the majority of animals (70%, 68% and 53% out of 40 hawkmoths, measured on three consecutive days) crossed the tunnel within a randomly picked window of 3h (Stöckl et al. 2019). We now state this explicitly in the methods section (lines 594-597). Thus, for the sample sizes in our study, statistically, each moth would have contributed a small number of tracks compared to the overall number of tracks sampled.

      The statistics section of the methods said that both Tukey-Kramer (post-hoc corrected means) and Kruskal-Wallis (non-parametric medians) were done. It is sometimes not clear which test was done for which figure, and where the Kruskal-Wallis test was done there does not seem to be a corrected statistical significance threshold for the many multiple comparisons (Fig. 2). It is quite possible I am just missing the details and they need to be clarified. I think there also needs to be a correction for the Brown-Forsythe tests but I don't know this method well.

      We first performed an ANOVA, and if the test residuals were not normally distributed, we used a Kruskal-Wallis test instead. For the post-hoc tests of both we used Tukey-Kramer to correct for multiple comparisons. The figure legends did indeed miss this information. We added it to clarify our statistical analysis strategy and refer to the methods section for more details (i.e. l. 185-186). All statistical results, including the type of statistical test used, have been uploaded to the data repository as well.
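
      To illustrate the rank-based fallback mentioned above, the following minimal sketch computes the Kruskal–Wallis H statistic with the standard tie correction. It is an illustration under stated assumptions, not the analysis code of the study: the function name is hypothetical, the normality check of the ANOVA residuals and the Tukey-Kramer post-hoc step are not shown, and a p-value would additionally require the chi-squared distribution with (number of groups - 1) degrees of freedom.

```python
from collections import Counter


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with tie correction (hypothetical helper).

    Raises ZeroDivisionError if all pooled values are identical.
    """
    pooled = sorted(x for g in groups for x in g)
    n_total = len(pooled)

    # Assign each distinct value the average (mid) rank of its tied block.
    ranks = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    # Uncorrected H from the rank sums of each group.
    rank_term = sum(sum(ranks[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    tie_counts = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_counts) / (n_total ** 3 - n_total)
    return h / correction


if __name__ == "__main__":
    # Made-up data for three hypothetical conditions.
    print(kruskal_wallis_h([0.1, 0.4, 0.3], [1.2, 0.9, 1.5], [2.2, 2.8, 2.5]))
```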

      The connection to stimulus reliability in the discussion seems to conflate reliability with prevalence or magnitude.

      We have rephrased the respective discussion sections to clearly separate the prevalence and magnitude of stimuli, which was measured, from an implied or hypothesized reliability (lines 510-511).

      Line numbers would be helpful for future review.

      We apologize for missing the line numbers and have added them to the revised manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      In a previous work Prut and colleagues had shown that during reaching, high frequency stimulation of the cerebellar outputs resulted in reduced reach velocity. Moreover, they showed that the stimulation produced reaches that deviated from a straight line, with the shoulder and elbow movements becoming less coordinated. In this report they extend their previous work by addition of modeling results that investigate the relationship between the kinematic changes and torques produced at the joints. The results show that the slowing is not due to reductions in interaction torques alone, as the reductions in velocity occur even for movements that are single joint. More interestingly, the experiment revealed evidence for decomposition of the reaching movement, as well as an increase in the variance of the trajectory.

      Strengths:

      This is a rare experiment in a non-human primate that assessed the importance of cerebellar input to the motor cortex during reaching.

      Weaknesses:

      None

      Reviewer #1 (Recommendations for the authors):

      The authors have answered my questions adequately and I have no further comments.

      Reviewer #2 (Public review):

      This manuscript asks an interesting and important question: what part of 'cerebellar' motor dysfunction is an acute control problem vs a compensatory strategy to the acute control issue? The authors use a cerebellar 'blockade' protocol, consisting of high frequency stimuli applied to the cerebellar peduncle which is thought to interfere with outflow signals. This protocol was applied in monkeys performing center-out reaching movements and has been published from this laboratory in several preceding studies. I found the take-home message broadly convincing and clarifying - that cerebellar block reduces muscle activation acutely particularly in movements that involve multiple joints and therefore invoke interaction torques, and that movements progressively slow down to in effect 'compensate' for these acute tone deficits. The manuscript was generally well written, data were clear, convincing and novel. The key strengths are differentiating acute from subacute (within session but not immediate) kinematic consequences of cerebellar block.

      Reviewer #2 (Recommendations for the authors):

      I think the manuscript is good as is. That said, it would have been nice to see more of the behavioral outcomes in Figure 5 (e.g. decomposition and trajectory variability) analyzed longitudinally like the velocity measurements in Fig. 4. This would clearly strengthen the insight into acute and compensatory components of cerebellar motor deficits.

      The two behavioral measures of motor noise used in our study are movement decomposition and trajectory variability (Figure 5). Since trajectory variability is measured across trials, we could not analyze this measure longitudinally as a function of trial number. However, following the reviewer’s advice, we examined movement decomposition for successive trials in control vs. cerebellar block for movements to targets 2-4, similar to the analysis of hand velocity in Figure 4. We found no interaction effect between trial sequence x cerebellar block on movement decomposition. This result is consistent with our conclusion that noisy joint activation occurs independently of adaptive slowing of multi-joint movements. We have updated our main text (lines 293-299) and supplementary information (supplementary figure S5 and supplementary table S8) to include this result.

      Reviewer #3 (Public review):

      Summary:

      In their revised manuscript, Sinha and colleagues aim to identify distinct causes of motor impairments seen when perturbing cerebellar circuits. This goal is an important one, given the diversity of movement related phenotypes in patients with cerebellar lesion or injury, which are especially difficult to dissect given the chronic nature of the circuit damage. To address this goal, the authors use high-frequency stimulation (HFS) of the superior cerebellar peduncle in monkeys performing reaching movements. HFS provides an attractive approach for transiently disrupting cerebellar function previously published by this group. First, they find a reduction in hand velocities during reaching, which was more pronounced for outward versus inward movements. By modeling inverse dynamics, they find evidence that shoulder muscle torques are especially affected. Next, the authors examine the temporal evolution of movement phenotypes over successive blocks of HFS trials. Using this analysis, they find that in addition to the acute, specific effects on torques in early HFS trials, there was an additional progressive reduction in velocity during later trials, which they interpret as an adaptive response to the inability to effectively compensate for interaction torques during cerebellar block. Finally, the authors examine movement decomposition and trajectory, finding that even when low velocity reaches are matched to controls, HFS produces abnormally decomposed movements and higher than expected variability in trajectory.

      Strengths:

      Overall, this work provides important insight into how perturbation of cerebellar circuits can elicit diverse effects on movement across multiple timescales.

      The HFS approach provides temporal resolution and enables analysis that would be hard to perform in the context of chronic lesions or slow pharmacological interventions. Thus, this study describes an important advance over prior methods of circuit disruption in the monkey, and their approach can be used as a framework for future studies that delve deeper into how additional aspects of sensorimotor control are disrupted (e.g., response to limb perturbations).

      In addition, the authors use well-designed behavioral approaches and analysis methods to distinguish immediate from longer-term adaptive effects of HFS on behavior. Moreover, inverse dynamics modeling provides important insight into how movements with different kinematics and muscle dynamics might be differentially disrupted by cerebellar perturbation.

      In this revised version of the manuscript, the authors have provided additional analyses and clarification that address several of the comments from the original submission.

      Remaining comments:

      The argument that there are acute and adaptive effects to perturbing cerebellar circuits is compelling, but there seems to be a lost opportunity to leverage the fast and reversible nature of the perturbations to further test this idea and strengthen the interpretation. Specifically, the authors could have bolstered this argument by looking at the effects of terminating HFS - one might hypothesize that the acute impacts on joint torques would quickly return to baseline in the absence of HFS, whereas the longer-term adaptive component would persist in the form of aftereffects during the 'washout' period. As is, the reversible nature of the perturbation seems underutilized in testing the authors' ideas. While this experimental design was not implemented here, it seems like a good opportunity for future work using these approaches.

      We agree with the reviewer that examining the effect of the cerebellar block on immediate post-block washout trials in future studies will be insightful.

      The analysis showing that there is a gradual reduction in velocity during what the authors call an adaptive phase is convincing. While it is still not entirely clear why disruption of movement during the adaptive phase is not seen for inward targets, despite the fact that many of the inward movements also exhibit large interaction torques, the authors do raise potential explanations in the Discussion.

      The text in the Introduction and in the prior work developing the HFS approach overstates the selectivity of the perturbations. First, there is an emphasis on signals transmitted to the neocortex. As the authors state several times in the Discussion, there are many subcortical targets of the cerebellar nuclei as well, and thus it is difficult to disentangle target-specific behavioral effects using this approach. Second, the superior cerebellar peduncle contains both cerebellar outputs and inputs (e.g., spinocerebellar). Therefore, the selectivity in perturbing cerebellar output feels overstated. Readers would benefit from a more agnostic claim that HFS affects cerebellar communication with the rest of the nervous system, which would not affect the major findings of the study. In the revised manuscript, the authors do provide additional anatomical and evolutionary context and discuss potential limitations in the selectivity of HFS in the Materials and Methods. However, I feel that at least a brief mention of these caveats in the Introduction, where it is stated, "we then reversibly blocked cerebellar output to the motor cortex", would benefit the reader.

      Following the advice of the reviewer, we have now revised the introduction section of our manuscript in the following way (lines 61-67):

      “…We then reversibly disrupted cerebellar communication with other neural structures using high-frequency stimulation (HFS) of the superior cerebellar peduncle, assessing the impact of this perturbation on subsequent movements. Although our approach primarily affects cerebellar output to the motor cortex, it also disrupts fibers carrying input signals (e.g., spinocerebellar) and pathways to various subcortical targets (e.g., cerebellorubrospinal). Thus, our manipulation broadly interferes with cerebellar communication…”

      Reviewer #3 (Recommendations for the authors):

      Typo on line 102; "subs-sessions"

      We have corrected this typographical error in our revised manuscript (line 106).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Flowers et al. describe an improved version of qFit-ligand, an extension of qFit. qFit and qFit-ligand seek to model conformational heterogeneity of proteins and ligands, respectively, in cryo-EM and X-ray (electron) density maps using multi-conformer models - essentially extensions of the traditional alternate conformer approach in which substantial parts of the protein or ligand are kept in place. By contrast, ensemble approaches represent conformational heterogeneity through a superposition of independent molecular conformations.

      The authors provide a clear and systematic description of the improvements made to the code, most notably the implementation of a different conformer generator algorithm centered around RDKit. This approach yields modest improvements in the strain of the proposed conformers (meaning that more physically reasonable conformations are generated than with the "old" qFit-ligand) and real space correlation of the model with the experimental electron density maps, indicating that the generated conformers also better explain the experimental data than before. In addition, the authors expand the scope of ligands that can be treated, most notably allowing for multi-conformer modeling of macrocyclic compounds.

      Strengths:

      The manuscript is well written, provides a thorough analysis, and represents a needed improvement of our collective ability to model small-molecule binding to macromolecules based on cryo-EM and X-ray crystallography, and can therefore have a positive impact on both drug discovery and general biological research.

      Weaknesses:

      There are several points where the manuscript needs clarification in order to better understand the merits of the described work. Overall the demonstrated performance gains are modest (although the theoretical ceiling on gains in model fit and strain energy are not clear!).

      We thank the reviewer for their thoughtful review. To address comments, we have added clarifying statements and discussion points around the extent of performance gains, our choice of benchmarking metrics, and the “standards” in the field for significance. We expanded our analysis to highlight how to use qFit-ligand in “discovery” mode, which is aimed at supporting individual modeling efforts. As we now write in the discussion:

      “It is advisable to employ qFit-ligand selectively, focusing on cases with a moderate correlation between your input model and the experimental data, strong visual density in the binding pocket, high map resolution, or when your single-conformer ligand model is strained.”

      Additionally, we note in the discussion:

      “qFit-ligand primarily serves as a “thought partner” for manual modeling. Modelers still must resolve many ambiguities, including initial ligand placement, to fully take advantage of qFit capabilities. In active modeling workflows or large scale analyses, the workflow would only accept the output of qFit-ligand when it improves model quality. In cases where qFit-ligand degrades map-to-model fit and/or strain, we can simply revert to the input model. In practice, users can easily remove poorly fitting conformations using molecular modeling software such as COOT, while keeping the well modeled conformations, which is an advantage of the multiconformer approach over ensemble refinement methods.”
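
      The accept-or-revert rule described in this passage can be sketched in a few lines. This is only an illustration under stated assumptions: the `ModelScore` container, the `choose_model` function, and the specific rule (accept only if neither metric degrades and at least one improves) are hypothetical and are not taken from the qFit-ligand code base.

```python
from dataclasses import dataclass


@dataclass
class ModelScore:
    """Hypothetical container for the two quality metrics discussed in the text."""
    rscc: float    # real-space correlation coefficient (higher is better)
    strain: float  # ligand strain in arbitrary units (lower is better)


def choose_model(input_score: ModelScore, qfit_score: ModelScore) -> str:
    """Return "qfit" only when the multiconformer model improves quality.

    Assumed rule: revert to the input model whenever either metric degrades,
    and also when nothing changes.
    """
    no_worse = (qfit_score.rscc >= input_score.rscc
                and qfit_score.strain <= input_score.strain)
    better = (qfit_score.rscc > input_score.rscc
              or qfit_score.strain < input_score.strain)
    return "qfit" if (no_worse and better) else "input"
```

      A stricter or looser rule (for example, tolerating a small loss in fit in exchange for a large drop in strain) would be an equally valid design choice; the text does not specify one.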

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Flowers et al. aimed to enhance the accuracy of automated ligand model building by refining the qFit-ligand algorithm. Recognizing that ligands can exhibit conformational flexibility even when bound to receptors, the authors developed a bioinformatic pipeline to model alternate ligand conformations while improving the fit to density and yielding more energetically favorable conformations.

      Strengths:

      The authors present a computational pipeline designed to automatically model and fit ligands into electron density maps, identifying potential alternative conformations within the structures.

      Weaknesses:

      Ligand modeling, particularly in cases of poorly defined electron density, remains a challenging task. The procedure presented in this manuscript exhibits clear limitations in low-resolution electron density maps (resolution > 2.0 Å) and low-occupancy scenarios, significantly restricting its applicability. Considering that the maps used to establish the operational bounds of qFit-ligand were synthetically generated, it's likely that the resolution cutoff will be even stricter when applied to real-world data.

      We thank Reviewer #2 for their comments on the role of conformational flexibility and how our tool addresses the complexity involved in modeling alternative conformations. We agree that there are limitations at low resolution, limiting the application of our algorithm. That is the case with all structural biology tools. Automatically finding alternative conformations of ligands in high-resolution structures is an enhancement to the toolbox of ligand fitting. Expanding the algorithm to work with fragment screening data is important in this realm, as almost all of this data fits in the high-resolution range where qFit-ligand works best.

      The reported changes in real-space correlation coefficients (RSCC) are not substantial, especially considering a cutoff of 0.1. Furthermore, the significance of improvements in the strain metric remains unclear. A comprehensive analysis of the distribution of this metric across the Protein Data Bank (PDB) would provide valuable insights.

      We agree that the changes are small, partially because the baseline (manually modeled ligands) is very high. To provide additional evidence, we added evaluations using EDIAm, which is a more sensitive metric. In Figure 2 (page 10), representing the development dataset, we see more improvements above 0.1. With this being said, it is unclear what constitutes a ‘substantial’ improvement for either of these metrics, especially considering alternative conformations may only change the coordinates of a subset of ligands, just slightly improving the fit to density.

      We agree that looking across the PDB on strain would provide valuable insight. To explore this, we looked to see how qFit-ligand could improve the fitting of deposited ligands with high strain (see section: Evaluating qFit-ligand on a set of structures known to be highly strained, Page 15). While only a subset of these structures had alternative conformers placed (24.6%), we observed that in this subset, the ligands often improved the RSCC and strain. This figure also demonstrates that while RSCC may not change much numerically, the alternative conformers explain previously unexplained density with lower energy conformers than what is currently deposited.

      To mitigate the risk of introducing bias by avoiding real strained ligand conformations, the authors should demonstrate the effectiveness of the new procedure by testing it on known examples of strained ligand-substrate complexes.

      See above.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      A - Specific comments:

      (1) It appears necessary to provide qFit-ligand with an initial model with the ligand already placed. This is not clear from the start of the introduction on page 3. It appears that ligand position is only weakly adjusted fairly late in the process, in step F of Figure 1. It seems, therefore, that the accuracy of initial placement is rather critical (see the example discussed on page 21). At the same time, in my experience, ambiguous cases are quite common, for example with flat ligands with a few substituents sticking out or with ligands with highly mobile tails. It would be helpful for the authors to comment on the sensitivity to initial ligand placement, either in the discussion or, better yet, in the form of an analysis in which the starting model position is randomly perturbed.

      In our revised version, we have modified the introduction to clarify the necessity of including an initial ligand model (page 4).

      “The qFit-ligand algorithm takes as input a crystal or cryo-EM structure of an initial protein-ligand complex with a single conformer ligand in PDBx/mmCIF format, a density map or structure factors (encoded by a ccp4 formatted map or an MTZ), and a SMILES string for the ligand.”

      We also describe our sampling algorithm more clearly (see: Biasing Conformer Generation, page 6). Steps A-E generate many conformations (using RDKit), which are then selected/fit into experimental density (using quadratic programming). To help with additional shifting issues in the input ligand, after the first selection, we do additional rotation/translation of the generated conformers that are kept. We then do another round of fitting to the density (quadratic programming followed by mixed integer quadratic programming).
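
      The selection step described above, in which generated conformers are weighted so that their combined density best explains the experimental map, can be illustrated with a toy occupancy fit. This is a simplified stand-in, not the solver used by qFit-ligand: it minimizes a least-squares misfit with projected gradient descent, constrains occupancies to be non-negative and to sum to at most 1, and omits the mixed integer (cardinality) stage entirely. The function names and the flat-list representation of density maps are assumptions made for the example.

```python
def project_capped_simplex(v):
    """Euclidean projection onto {w : w_i >= 0, sum(w) <= 1}."""
    clipped = [max(x, 0.0) for x in v]
    if sum(clipped) <= 1.0:
        return clipped
    # The sum constraint is active: project onto the simplex sum(w) == 1
    # using the standard sort-based method.
    theta = 0.0
    cumulative = 0.0
    for j, value in enumerate(sorted(v, reverse=True), start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / j
        if value - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def fit_occupancies(observed, conformer_maps, iterations=2000):
    """Least-squares occupancies for conformer densities (toy example).

    observed: density values on a grid, flattened to a list.
    conformer_maps: one flattened calculated density per conformer.
    """
    n = len(conformer_maps)
    frob2 = sum(x * x for m in conformer_maps for x in m)
    if frob2 == 0.0:
        raise ValueError("conformer maps are all zero")
    # 2 * frob2 bounds the Lipschitz constant of the gradient from above,
    # so this step size is safe (if conservative).
    step = 1.0 / (2.0 * frob2)
    w = [0.0] * n
    for _ in range(iterations):
        model = [sum(w[i] * conformer_maps[i][k] for i in range(n))
                 for k in range(len(observed))]
        residual = [o - m for o, m in zip(observed, model)]
        grad = [-2.0 * sum(r * d for r, d in zip(residual, conformer_maps[i]))
                for i in range(n)]
        w = project_capped_simplex([wi - step * gi for wi, gi in zip(w, grad)])
    return w
```

      In this toy setting a conformer that does not contribute to the observed density ends up with an occupancy near zero, which mirrors the idea of discarding conformers that the map does not support.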

      Given this sampling, we have not elected to do an additional computational experiment to test the “radius of convergence” or dependence on initial conditions. However, we outline the fundamental procedure here so that someone can build on the work and test the idea:

      - Create single conformer models as we currently do

      - Randomly perturb the coordinates of the ligand by 0.1-0.3 Å

      - Refine to convergence, creating a series of “perturbed, modified true positives” for each dataset

      - Run qFit-ligand

      - Evaluate the variability in the resulting multi-conformer models
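
      The perturbation step in the procedure outlined above could be sketched as follows. The text does not say whether the 0.1-0.3 Å shift is applied per atom or to the ligand as a rigid body; this sketch assumes an independent per-atom displacement in a random direction, and the function name and parameters are hypothetical.

```python
import math
import random


def perturb_coordinates(coords, min_shift=0.1, max_shift=0.3, seed=None):
    """Shift each atom by a random vector of length min_shift..max_shift (Å).

    coords: list of (x, y, z) tuples in Å.
    Assumption: atoms are displaced independently of one another.
    """
    rng = random.Random(seed)
    perturbed = []
    for x, y, z in coords:
        # Draw an isotropic random direction by normalizing a Gaussian vector.
        norm = 0.0
        while norm == 0.0:
            dx, dy, dz = (rng.gauss(0.0, 1.0) for _ in range(3))
            norm = math.sqrt(dx * dx + dy * dy + dz * dz)
        scale = rng.uniform(min_shift, max_shift) / norm
        perturbed.append((x + dx * scale, y + dy * scale, z + dz * scale))
    return perturbed
```

      Running this with several seeds on the same starting model, then refining each result and passing it to qFit-ligand, would give the series of perturbed inputs needed to estimate sensitivity to initial placement.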

      (2) Top of page 6 ("Biasing Conformer Generation"): the authors say "as we only want to generate ligands that physically fit within the protein binding pocket, we bias conformation generation towards structures more likely to fit well within the receptor's binding site". Apart from the odd redundancy of this sentence, I am confused: at the stage that seems to be referred to here (A-C in Figure 1) is the fit to the electron density already taken into account, or does this only happen later (after step E)?

      Thank you for pointing this out. We have edited the statement to clarify it:

      “To guide the conformation generation from the Chem.rdDistGeom based on the ligand type and protein pocket, we developed a suite of specialized sampling functions to bias the conformational search towards structures more likely to fit well into the receptor’s binding site.”

      We do not consider the electron density during conformer generation (only selection from the generated conformers). The sampling is additionally biased by the type of ligand and the size of the binding pocket.

      (3) qFit-ligand appears to be quite slow. Are there prospects for speedup? Can the code take advantage of GPUs or multi-CPU environments?

      We agree with this. We have made some algorithmic improvements, most notably removing duplicate conformers based on root-mean-square deviation (RMSD). This, along with parallelization, decreased the average runtime from ~19 minutes to ~8 minutes (see additional details: qFit-ligand runtime, page 8). We do not currently take advantage of GPU-specific code.
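
      As an illustration of the duplicate-removal idea, the sketch below keeps a conformer only if it differs from every conformer already kept by at least a threshold RMSD. The 0.2 Å threshold, the function names, and the greedy strategy are assumptions for this example; no superposition is performed because the conformers are assumed to share the same binding-pocket frame and atom ordering.

```python
import math


def rmsd(conf_a, conf_b):
    """RMSD (Å) between two conformers given as lists of (x, y, z) tuples."""
    if len(conf_a) != len(conf_b) or not conf_a:
        raise ValueError("conformers must be non-empty and have equal atom counts")
    squared = sum((p - q) ** 2
                  for atom_a, atom_b in zip(conf_a, conf_b)
                  for p, q in zip(atom_a, atom_b))
    return math.sqrt(squared / len(conf_a))


def remove_duplicate_conformers(conformers, threshold=0.2):
    """Greedy filter: drop any conformer within `threshold` Å RMSD of a kept one."""
    kept = []
    for conformer in conformers:
        if all(rmsd(conformer, other) >= threshold for other in kept):
            kept.append(conformer)
    return kept
```

      Because every candidate is compared only with the conformers already kept, the cost falls as duplicates are discarded, which is one plausible reason such a filter shortens downstream fitting.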

      (4) Section: Detection of experimental true positive multi-conformer ligands:

      a) Why are carbohydrate ligands excluded? This seems like an important class of ligands that one would like qFit to be able to treat! Which brings me to a related question: can covalently attached groups (e.g., glycosylation sites!) be modeled using qFit-ligand, or is qFit-ligand restricted to non-covalently bound groups?

      Currently, qFit-ligand does not support covalently bound ligands, but this is an area of interest we are hoping to expand into. In the revised version, we added the non-covalently attached carbohydrates back into the true positive dataset. In Figure 4 (page 14), we show that qFit-ligand is able to improve fit to the experimental density in around 80% of structures, while also often reducing torsion strain (see additional details: qFit-ligand applied to unbiased dataset of experimental true positives, page 14).

      b) "as well as 758 cases where the ligand model's deposited alternate conformations (altlocs) were not bound in the same chain and residue number" - I do not understand what this means, or why it leads to the exclusion of so many structures. Likewise, a number of additional exclusions are described in Figure S3. Some more background on why these all happened would be helpful. Are you just left with the "easy" cases?

      Sometimes modelers will list the multiple conformations of a bound ligand as separate residues within the PDB file, rather than as a single multiconformer model. For example, rather than writing a multiconformer LIG bound at A, 201 with altlocs ‘A’ and ‘B’, a modeler might write this instead as LIG, A, 201 and LIG, A, 301. We initially excluded these kinds of structures. However, we agree that this choice resulted in the removal of many potentially valid true positives. We have since updated our data processing pipeline to include these cases, and they are examined in the updated manuscript.

      c) I do not follow the argument made at the end of this section (last two paragraphs on page 9): "when using a single average conformation to describe density from multiple conformations, the true low-energy states may be ignored". I get that, but the conformations in the "modified true positives" dataset derive directly from models in which two conformations were modeled, so this cannot be the explanation for why qFit-ligand models result in somewhat lower average strain. It would seem that the paper could be served by providing examples where single conformations were modeled in deposited structures, but qFit detects multiple conformations.

      We agree with this comment that the strain obtained from the modified true positives is likely higher than the deposited models. However, the modified structure is refined with a single conformation, and therefore changed from the deposited “A” conformation. Thus, the reduced strain observed in our qFit-ligand models relative to the modified true positives is not unexpected.

      To expand our dataset, we also looked at deposited structures with high strain, all of which were modeled as single conformers. Here, we saw a decrease in strain when alternative conformers were placed (see section: Evaluating qFit-ligand on a set of structures known to be highly strained, page 15). Further, we provide an example from the XGen macrocycle dataset where a ligand initially modeled as a single conformer exhibited relatively high strain. After qFit‐ligand modeled a second conformation, the overall strain was reduced (Figure 6C, page 19; Figure 6—figure supplement 1C, page 59).

      (5) Section: qFit-ligand applied to an unbiased dataset of experimental true positives Bottom of page 14: The paragraph starting with "qFit-ligand shows particular strength in scenarios with strong evidence..." is enigmatic: there's no illustration (unless it directly relates to the findings in Figure 4, in which case this should be more explicit). Since this points out when the reader will and will not benefit from using qFit-ligand, it should be clear what the authors are talking about.

      This claim considers all the evidence presented in the manuscript, not necessarily one particular aspect of it. We advise using qFit-ligand when there is a moderate correlation between the input model and the experimental data, strong visual density in the binding pocket, high map resolution, and/or when your single conformer ligand model is strained. We have made all of these points clearer in the updated manuscript.

      B  - Section: qFit-ligand can automatically detect and model multiple conformations of macrocycles:

      This is an exciting extension of qFit-ligand, but some aspects of the analysis strike me as worrisome. Of the initial dataset of 150 structures, fewer than half make it all the way through analysis. It's hard to believe that this is a fully representative subset. Why, for example, could 29 structures not be refined against the deposited structure factors? Why does strain calculation (in RDKit?) fail on 30 ligands? What about the other 18 cases--why did these fail (in PHENIX?).

      We agree that this is a striking number of failures; however, we note that they are not specific shortcomings of qFit-ligand (in fact, most occur because standard structural biology and/or cheminformatics software fails on many PDB depositions). These failures therefore reflect broader limitations in standard bioinformatics tools and refinement restraint files when handling macrocycles. The strain calculator we used was not built for macrocycles, and after consulting with many experts in the field, the consensus was that no method works well with macrocycles. We discuss these issues in additional detail in the discussion (page 27):

      “Additionally, our algorithm’s placement within the larger refinement and ligand modeling ecosystem highlighted other areas that need improvement. We note that macrocycles, due to their complicated and interconnected degrees of freedom, suffer acutely from the refinement issues, as demonstrated by the failure of approximately one-third of datasets in our standard preparation or post-refinement pipelines due to ligand parameterization issues. Many of these stemmed from problematic ligand restraint files, highlighting the difficulty of encoding the geometric constraints of macrocycles using standard restraint libraries. Improved force-field or restraints for macrocycles are desperately needed to improve their modeling.”

      C  - Minor issues:

      (1) "Fragment-soaked event maps" - this is a semantically strange section title!

      We have updated the section title in our revised manuscript. The new title is ‘qFit-ligand recovers heterogeneity in fragment-soaked event maps’.

      (2) Too many digits! All over the manuscript, percentages are displayed with 0.01% precision, while these mostly refer to datasets with ~150 structures. Shifting just one structure from one category to another changes these percentages by nearly 1%.

      We have updated the number of significant figures in our revised manuscript.
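
      The reviewer's arithmetic can be checked directly: with N structures, moving a single structure between categories changes a percentage by 100/N. The sketch below uses the reviewer's approximate dataset size of 150; the counts passed to `format_percent` are hypothetical and chosen only for illustration.

```python
def percent_step(n_structures):
    """Smallest possible change in a percentage when one structure changes category."""
    return 100.0 / n_structures


def format_percent(count, total, decimals=1):
    """Format count/total as a percentage with a fixed number of decimals."""
    return f"{100.0 * count / total:.{decimals}f}%"


# With about 150 structures, one structure corresponds to roughly 0.67%,
# so reporting percentages to 0.01% implies precision the data cannot support.
step = percent_step(150)
before = format_percent(100, 150)
after = format_percent(101, 150)
```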

      (3) The authors are keen to classify decreases in RSCC as significant only when these changes exceed 0.1, but do not apply the same standard for increases. For instance, in Figure 4B, if we were to classify improvements as significant if ΔRSCC > 0.1, there would be fewer significant improvements than decreases in performance (although it is visually clear that for most datasets things get better). Similarly, in Figure 5A, if we were to classify improvements as significant if ΔRSCC > 0.1, qFit-ligand would only yield significant improvements for two out of 73 cases, which is not a lot.

      We agree with the reviewer that there needs to be more consistency in our analysis of improvements/deteriorations. However, we note that operationally, when the decreases in model quality are observed, the modeler would simply reject the new model in favor of the input model. We have added to the discussion:

      “In active modeling workflows or large scale analyses, the workflow would only accept the output of qFit-ligand when it improves model quality. In cases where qFit-ligand degrades map-to-model fit and/or strain, we can simply revert to the input model. In practice, users can easily remove poorly fitting conformations using molecular modeling software such as COOT, while keeping the well modeled conformations, which is an advantage of the multiconformer approach over ensemble refinement methods.”

      There is generally no consensus in the field as to what might indicate a ‘significant’ change in RSCC, and any threshold we choose would be arbitrary. We note that in our manuscript, we had previously characterized a decrease in RSCC as ‘significant’ if it exceeded 0.1. However, as there is no real scientific justification for this cutoff, or any cutoff, we moved away from this framing in the revised manuscript. We now simply report whether RSCC improves. For example, on page 9:

      “qFit-ligand modeled an alternative conformation in 72.5% (n=98) of structures. Compared with the modified true positive models, 83.7% (n=113) of qFit-ligand models have a better RSCC and 77.0% (n=104) structures saw an improvement in EDIAm, representing an improved fit to experimental data in the vast majority of structures.”

      In addition, we have conducted additional experiments using more sensitive metrics (EDIAm) to further illustrate qFit-ligand’s performance.

      (4) Small peptides are not discussed as a class of ligands, although these are quite common.

      Canonical peptides can be modeled with standard qFit. Non-canonical peptides present failure modes similar to the macrocycles discussed above, with a mix of ATOM and HETATM records and the need for custom cif definitions and link records. For these reasons we have not included an analysis outside of the macrocycle section. We have noted this caveat in the discussion:

      “We note that even linear non-canonical peptides present similar failure modes to macrocycles, with a mix of ATOM and HETATM records and the need for custom cif definitions and link records. For these reasons, we did not include analysis on small peptide ligands; however, canonical peptides can be modeled with standard qFit [8].”

      (5) Top of page 10: "while refinement improves": what kind of refinement does this refer to?

      This refers to refinement with Phenix. We have updated this language to reflect this (page 8). “We refer to these altered structures as our ‘modified true positives’, which we use as input to qFit-ligand, and subsequent refinement using Phenix.”

      (6) Bottom of page 11: "they often did" -> "it often did"

      We have made this change in the revised version.

      (7) Top of page 14: RMSDs and B factors do have units.

      We have added the units in our revision.

      (8) Top of page 24. In the generation of a composite omit map, why are new Rfree flags being generated? Did I misunderstand that?

      r_free_flags.generate=True only creates R-free flags if they are not present in the input file, as is the case for many (especially older) PDB depositions.

      (9) Bottom of page 27: how large is the mask? Presumably when alt confs of the ligand are possible, it would be helpful for the mask to cover those?

      We agree that this mask should be updated. In our revision, we define the mask around the coordinates of the full qFit-ligand ensemble. The same mask is used to calculate the RSCC of the input (single conformer) model versus the qFit-ligand model.
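
      As an illustration of the shared-mask comparison described above, the following is a minimal sketch, not the qFit-ligand implementation. It assumes the density maps are flattened into lists of voxel values and that each conformer contributes a boolean voxel mask; the function names `union_mask` and `rscc` are hypothetical. RSCC is computed here as the Pearson correlation between observed and calculated density over the masked voxels.

```python
import math


def union_mask(conformer_masks):
    """Combine per-conformer boolean masks into one mask covering the whole ensemble."""
    return [any(voxel_flags) for voxel_flags in zip(*conformer_masks)]


def rscc(observed, calculated, mask):
    """Pearson correlation between observed and calculated density inside the mask."""
    obs = [o for o, m in zip(observed, mask) if m]
    calc = [c for c, m in zip(calculated, mask) if m]
    n = len(obs)
    if n < 2:
        raise ValueError("mask must select at least two voxels")
    mean_o = sum(obs) / n
    mean_c = sum(calc) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(obs, calc))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_c = sum((c - mean_c) ** 2 for c in calc)
    if var_o == 0 or var_c == 0:
        raise ValueError("density is constant inside the mask")
    return cov / math.sqrt(var_o * var_c)


# Toy example with made-up voxel values: both models are scored in the SAME mask.
mask = union_mask([[True, True, False, False], [False, True, True, False]])
observed = [0.9, 1.4, 1.1, 0.1]
single_conformer_density = [1.0, 1.2, 0.2, 0.0]
ensemble_density = [0.8, 1.3, 1.0, 0.0]
print(rscc(observed, single_conformer_density, mask))
print(rscc(observed, ensemble_density, mask))
```

      Scoring both models inside one ensemble-wide mask keeps the comparison from being biased by the set of voxels each model happens to cover.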

      (10) Middle of page 29: "These structure factors are then used to compute synthetic electron density maps." - It is not clear whether the following three sentences are an explanation of the details of that statement or rather things that are done afterwards.

      We clarify this in the manuscript (page 36).

      “These structure factors are then used to compute synthetic electron density maps. To each of these maps, we generate and add random Gaussian noise values scaled proportionally to the resolution. This scaling reflects the escalation of experimental noise as resolution deteriorates, a common occurrence in real-life crystallographic data.”
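
      The noise step quoted above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: the map is a flat list of density values, the noise standard deviation is taken to be `noise_scale * resolution`, and both the function name and the `noise_scale` value are hypothetical.

```python
import random


def add_resolution_scaled_noise(density, resolution, noise_scale=0.05, seed=None):
    """Add zero-mean Gaussian noise whose standard deviation grows with resolution (in Å).

    A numerically larger resolution value means a poorer map, so it gets more noise.
    noise_scale is an assumed proportionality constant, not a value from the manuscript.
    """
    rng = random.Random(seed)
    sigma = noise_scale * resolution
    return [value + rng.gauss(0.0, sigma) for value in density]


# Toy synthetic map (made-up values) perturbed at two resolutions.
synthetic_map = [0.0, 0.5, 1.2, 0.7, 0.1]
print(add_resolution_scaled_noise(synthetic_map, resolution=1.0, seed=0))
print(add_resolution_scaled_noise(synthetic_map, resolution=3.0, seed=0))
```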

      (11) Chemical synthesis: I am not qualified to assess this and am surprised to see so much detail here rather than in some other manuscript. Are the corresponding structures deposited anywhere?

      All of the structures we discuss in this manuscript are deposited in the PDB and listed in Supplementary Table 5.

      Reviewer #2 (Recommendations for the authors):

      The data should consistently present the number of structures that exhibit improvements or deterioration in particular metrics, like RSCC and strain, using a cutoff that should be significant. For instance, stating that "85.93% (n=116) of structures having a better RSCC in the qFit-ligand models compared to the modified true positive models" without clarifying the magnitude of improvement (e.g., a marginal increase of 0.01 in RSCC) lacks meaningful context. The figures should clearly indicate the specific cutoff values used for each metric. The accompanying text should provide a detailed explanation for the selection of these cutoff values, justifying their significance in the context of the study.

      Currently, there is no established consensus within the field on what constitutes a 'significant' improvement in RSCC or strain values. As such, we chose not to impose an arbitrary cutoff and just look at which structures improve RSCC. We also removed all language stating significance, as there isn’t a good standard in the field to assess significance. This is especially important as only improvements would be considered in an active modeling project. In cases where qFit ligand degrades the RSCC (or strain) to a large extent, the modeler would simply revert to the input model.

      In the first section of Results: "First, for all ligands, we perform an unconstrained search function allowing the generated conformers to only be constrained from the bounds matrix (Figure 1A). This is particularly advantageous for small ligands that benefit from less restriction to fully explore their conformational space. We then perform a fixed terminal atoms search function (Figure 1B)." It is unclear whether a fixed terminal atom search was conducted for each conformer generated in the initial step to further explore the conformational space. This aspect should be clarified to provide a more comprehensive understanding of the methodology.

      Each independent conformer generation function (A-E) is initialized with only the input ligand model and runs in parallel with the other functions. These functions do not build on each other, but rather perturb the input molecule independently of one another. In our updated manuscript, we have clarified the methodology (page 6).

      “First, in all cases, we perform an unconstrained search function (Figure 1A), a fixed terminal atoms search function (Figure 1B), and a blob search function (Figure 1C).”

      Phrase: "We randomly sampled 150 structures and, after manual inspection of the fit of alternative conformations, chose 135 crystal structures as a development set for improving qFit-ligand." The authors should explain why they filtered 10% of the structures.

      To develop qFit-ligand, we wanted to use a very high-quality dataset. We needed to know with some degree of certainty that if qFit-ligand failed to produce an alternate conformation (or generated conformations low in RSCC or high in strain), the failure was due to an algorithmic limitation rather than poor-quality input data. Therefore, after selection based on numerical metrics, we manually examined each ligand in Coot to observe if we believed the alternative conformers fit well into the density.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1:

      Reviewer #1 (Recommendations For The Authors):

      (1) At several places in the reply to reviewers and the manuscript, when discussing the new simulations conducted, the authors mention they break the 180 trials into a train/test split of 108/108 - is this value correct? If so, how? (pg 19 of updated manuscript)  

      Thank you for pointing this out; it was not clearly explained. We have now added the explanation to the Methods section: 

      “For each iteration, we randomly selected 108 responses from the full set of 180 for training, and then independently sampled another 108 from the same full set for testing. This ensured that the same orientation could appear in both sets, consistent with the structure of the original experiment.”
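
      The sampling scheme in the quoted passage can be sketched as below. This is an illustrative reading, not the authors' code: it assumes each set is drawn without replacement from the 180 trials and that the two draws are independent of each other; the function name is hypothetical.

```python
import random


def sample_train_test(n_trials=180, n_per_set=108, seed=None):
    """Draw a training set and, independently, a test set from the same pool of trials."""
    rng = random.Random(seed)
    trials = range(n_trials)
    train = rng.sample(trials, n_per_set)  # no repeats within the training set
    test = rng.sample(trials, n_per_set)   # drawn from the full pool again
    return train, test


train, test = sample_train_test(seed=0)
shared = set(train) & set(test)
print(len(train), len(test), len(shared))
```

      Because 108 + 108 exceeds 180, at least 36 trials must appear in both sets under this reading, which is why the 108/108 split is possible and why the same orientation can occur in both training and testing.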

      (2) I appreciate the authors have added the variance explained of principal components to the axes of Fig. 3, though it took me a while to notice this, and this isn't described in the figure caption at all. It would likely help readers to directly explain what the % means on each axis of Fig. 3.

      Thank you, we have now added a description in both Fig. 2 and 3:

      “The axes represent the first two principal components, with labels indicating the percent of total explained variance.”

      (3) I believe there is a typo/missing word in the new paragraph on pg 15: "neural visual WM representations in the early visual cortices are [[biased]] towards distractors" (I think the bracketed word may be omitted as a typo)

      Thank you - fixed.

    1. Author response:

      We would like to thank the editors and reviewers for their time and their helpful feedback. We largely agree with the reviewer recommendations and comments, which we will address for the next Version on Record of this manuscript. We plan to address reviewer comments in the following ways.

      Reviewers requested a more comprehensive analysis of our RNA-seq experiment comparing vehicle treatment to enoxolone treatment over time. We will improve our analysis by providing clear, accessible, and organized tables defining differentially expressed genes at each time point, gene set lists that comprise our gene ontology analysis, and the lists of shared differentially expressed genes from enoxolone treatment and HNF4⍺ knockout. While some of this data was provided in the supplementary files, we recognize that it should be more accessible for the reader. Furthermore, as suggested by the Reviewer, we will enhance our transcriptomic analysis by utilizing bioinformatic tools such as Enrichr.

      The Reviewers noted that we identified a number of lipoprotein-lowering compounds through our drug screen, but limited the impact of our manuscript by focusing on enoxolone, a known inhibitor of HNF4⍺ and modulator of lipid metabolism. While we understand the sentiment that other novel compounds would be interesting to study, we aimed to demonstrate proof of concept in this manuscript. We view the characterization of novel compounds as beyond the direct scope of this manuscript. We did not perform LipoGlo imaging and electrophoresis experiments on each drug because these experiments are low-throughput given the number of drugs and doses we examined. In light of the Reviewer’s comments, we will add some additional characterizations of our validated hits with LipoGlo imaging and electrophoresis studies.

      The reviewers also identified a number of typos in text and figures that will be addressed in the next Version on Record. We believe that the recommended changes will strengthen our manuscript and broaden its appeal. We are grateful for the opportunity to improve our work based on the reviewers’ valuable suggestions.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      We thank the reviewer for highlighting the strength of our manuscript, quoted here: “Overall, this work not only deepens our understanding of PRMT1's role in leukemia progression but also opens new avenues for targeting metabolic pathways in cancer therapy.”

      Weaknesses:

      (1) The findings rely heavily on a single AMKL cell line, with no validation in patient-derived samples to confirm clinical relevance or even another type of leukemia line. Adding the discussion of PRMT1's role in other leukemia types will increase the impact of this work.

      We mentioned in the introduction that PRMT1 is known to be a driver of leukemia with diverse types of mutations. In a related paper published in Cell Reports (Su et al. 2021), we demonstrated that PRMT1 is upregulated in myelodysplastic syndrome (MDS) patient samples and that the inhibition of PRMT1 promotes megakaryocytic differentiation of a few MDS samples. AMKL is very rare. Via the Children’s Oncology Group consortium, we obtained five AMKL samples with Down’s syndrome and AMKL with RBM15-MKL1 translocation out of 32 samples in the bank over the last 20 years. Interestingly, these patient samples also contain trisomy 19. As PRMT1 is located on chromosome 19, we speculate that PRMT1 is a significant driver of AMKL, although we have very limited genetic evidence. However, these frozen human samples derived from peripheral blood cannot be grown in a cell culture system. Although we did not perform metabolic analysis for other AMKL cell lines, we did validate in our unpublished studies that PRMT1 drives down CPT1A expression in normal bone marrow cells and platelets in mice and in a human leukemia cell line called MEG-01, which can be differentiated into megakaryocytes upon PMA (phorbol 12-myristate 13-acetate) treatment. Therefore, we expect that the PRMT1-mediated metabolic reprogramming we described here should apply to other types of hematological malignancies.

      (2) The observed heterogeneity in Prmt1 expression is noted but not further investigated, leaving gaps in understanding its broader implications.

      The expression level of PRMT1 is heterogeneous within leukemia cell populations, making it intriguing to study. We can sort the cells based on high versus low PRMT1 expression using a fluorescent dye called E84. However, we have not conducted transcriptome analysis on these two populations, mainly due to resource constraints. Theoretically, the E84 high-expression population may transiently utilize glucose more efficiently, as these cells do not ectopically express PRMT1. Therefore, when nutrient levels decline, these cells might switch to the low PRMT1 expression population. It will be interesting to see whether endogenous leukemia cells transiently expressing high levels of PRMT1 take advantage of their efficient usage of glucose and thus adapt to the niche environment successfully, as we observed in Figure 1. We agree that this would be an interesting direction to pursue in the future.

      (3) Some figures and figure legends didn't include important details or had not matching information.

      We would like to thank the reviewer for pointing out these mistakes. We have now corrected them.

      (4) Some wording is not accurate, such as line 80 "the elevated level of PRMT1 maintains the leukemic stem cells", the study is using the cell line, not leukemia stem cells.

      Leukemic stem cells are often referred to as cells that can initiate leukemia when transplanted into recipient mice, a concept first proposed by John Dick. In this study, we found that even the 6133 cell line displays heterogeneity in terms of PRMT1 expression levels. We identified a subgroup of 6133 cells as leukemia stem cells due to their ability to initiate leukemia.

      (5) In the disease model, histopathology of blood, spleen, and BM should be shown.

      We did not conduct histopathology analysis. Histopathology associated with 6133 cells has been published in Mercher et al. (JCI 2009) and in a recent preprint by Diane Krause’s group.

      (6) Can MS023 treatment reverse the metabolic changes in PRMT1 overexpression AMKL cells?

      Yes. We demonstrated in the Seahorse assays in Figure 4 that the PRMT1 inhibitor can increase oxygen consumption.

      It would be helpful to provide a summary graph at the end of the manuscript.

      Yes, we now provide a graphic abstract.

      Reviewer #2 (Public review):

      We would like to thank the reviewer for finding the manuscript novel and important.

      Weaknesses:

      (1) The manuscript lacks detailed molecular mechanisms underlying PRMT1 overexpression, particularly its role in enhancing survival and metabolic reprogramming via upregulated glycolysis and diminished oxidative phosphorylation (OxPhos). The findings primarily report phenomena without exploring the reasons behind these changes.

      In the introduction, we highlighted that numerous studies have demonstrated how PRMT1 directly interacts with several key enzymes involved in glycolysis. These studies provide a mechanism for the observed upregulation of PRMT1 in leukemia. Additionally, our previous research published in eLife (Zhang, 2015) demonstrated that PRMT1 methylates the RNA-binding protein RBM15, which can bind to the 3' UTR of mRNAs encoding various metabolic enzymes. Therefore, we propose that PRMT1 may also regulate metabolism indirectly through the RBM15 protein.

      (2) The article shows that PRMT1 overexpression leads to augmented glycolysis and low reliance on the OxPhos. However, the manuscript also shows that PRMT1 overexpression leads to increased mitochondrial number and mitochondrial DNA content and has an elevated NADPH/NAD+ ratio. Further, these overexpressing cells have the ability to better survive on alternative energy sources in the absence of glucose compared to low PRMT1-expressing parental cells. Surprisingly, the Seahorse assay in PRMT1 overexpressing cells showed no further enhancement in the ECAR after adding mitochondrial decoupler FCCP, indicating the truncated mitochondrial energetics. These results are contradictory and need a more detailed explanation in the discussion.

      We have now explained the metabolic changes in more detail. An increase in mitochondria number is not equivalent to an increase in fatty acid oxidation and oxygen consumption, as mitochondria have many other functions. PRMT1 only downregulates CPT1A, which catalyzes a rate-limiting step of long-chain fatty acid oxidation. The data suggest that PRMT1 promotes the biogenesis of mitochondria, possibly via PGC1alpha, as published by Stallcup’s group. The Seahorse assays were performed in a high concentration of glucose rather than with alternative carbon sources. FCCP treatment under high-glucose conditions did not increase the ECAR and OCR, which is normal for leukemia cells, as shown in other publications (Sriskanthadevan, 2015; Kreitz, 2019). PRMT1 could dampen the activities of the TCA cycle and the electron transport chain, as suggested by our unpublished proteomic data and by published data (Fong, 2019). The elevated NADPH/NAD+ ratio is another indication that glycolysis and anabolism are enhanced by PRMT1.

      (3) How was disease penetrance established following the 6133/PRMT1 transplant before MS023 treatment?

      The data are shown in Figure 1F, which demonstrates that the penetrance is 100%.

      (4) The 6133/PRMT1 cells show elevated glycolysis compared to parental 6133; why did the author choose the 6133 cells for treatment with the MS023 and ECAR assay (Fig.3 b)? The same is confusing with OCR after inhibitor treatment in 6133 cells; the figure legend and results section description are inconsistent.

      We apologize for the mistakes made while preparing the manuscript. The 6133/PRMT1 cells were treated with MS023 in Figure 4.

      (5) The discussion is too brief and incoherent and does not adequately address key findings. A comprehensive rewrite is necessary to improve coherence and depth.

      We agree with the reviewer. We have now added a comprehensive review of PRMT1-mediated metabolism. The PRMT1 homolog in yeast is called hmt1. In yeast, hmt1 is upregulated by glucose and enhances glycolysis. Thus, PRMT1-enhanced glycolysis is a conserved pathway in eukaryotic cells.

      (6) The materials and methods section lacks a description of statistical analysis, and significance is not indicated in several figures (e.g., Figures 1C, D, F; Figures 2D, E, F, I). Statistical significance must be consistently indicated. The methods section requires more detailed descriptions to enable replication of the study's findings.

      We have added extra details on the methods and statistical analysis for the figures.

      (7) Figures are hazy and unclear. They should be replaced with high-resolution images, ensuring legible text and data.

      We have prepared separate figure files with high resolution.

      (8) Correct the labeling in Figure 2I by removing the redundant "D."

      We thank the reviewer and have fixed the figure.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Strengths:

      The genetic approaches here for visualizing the recombination status of an endogenous allele are very clever, and by comparing the turnover of wildtype and mutant cells in the same animal the authors can make very convincing arguments about the effect of chronic loss of pu.1. Likely this phenotype would be either very subtle or nonexistent without the point of comparison and competition with the wildtype cells.

      Using multiple species allows for more generalizable results, and shows conservation of the phenomena at play.

      The demonstration of changes to proliferation and cell death in concert with higher expression of tp53 is compelling evidence for the authors' argument.

      Weaknesses:

      This paper is very strong. It would benefit from further investigating the specific relationship between pu.1 and tp53 specifically. Does pu.1 interact with the tp53 locus? Specific molecular analysis of this interaction would strengthen the mechanistic findings.

      We agree with the reviewer’s assessment regarding the significance of the relationship between PU.1 and TP53. To investigate the potential interaction between Pu.1 and Tp53 in zebrafish, we analyzed the promoter region of zebrafish tp53. Indeed, we found three PU.1 binding sites (GAGGAA) in the tp53 promoter, which are located on the antisense strand at positions -1047 to -1042, -1098 to -1093 and -1423 to -1418 relative to the transcriptional start site (Fig. S10). These potential Pu.1 binding sites indicate a direct interaction between Pu.1 and the tp53 locus. Furthermore, a previous study by Tschan et al. (2008) elucidated the mechanism by which PU.1 attenuates the transcriptional activity of the P53 tumor suppressor family through direct binding to the DNA-binding and/or oligomerization domains of p53/p73 proteins. We have also cited this study (Line 399-401) and included all of the above information in the discussion of the revised manuscript (Line 399-405).
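
      A promoter scan of the kind described above can be sketched as follows. This is a minimal illustration, not the tool used by the authors: the sequence is a made-up toy, the function names are hypothetical, and the coordinate convention (the last base of the supplied sense-strand sequence is position -1 relative to the transcriptional start site) is an assumption.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_motif_sites(promoter, motif="GAGGAA"):
    """List (strand, first, last) for each motif hit, in TSS-relative coordinates.

    The promoter is given 5'->3' on the sense strand and is assumed to end at
    position -1. An antisense hit is found by searching the sense strand for
    the motif's reverse complement.
    """
    promoter = promoter.upper()
    length = len(promoter)
    hits = []
    for strand, pattern in (("sense", motif), ("antisense", reverse_complement(motif))):
        start = promoter.find(pattern)
        while start != -1:
            first = start - length          # index 0 maps to position -length
            last = first + len(pattern) - 1
            hits.append((strand, first, last))
            start = promoter.find(pattern, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


# Toy 24-bp sequence containing one antisense site (TTCCTC) and one sense site (GAGGAA).
toy_promoter = "ACGTTTCCTCACGTACGAGGAAAC"
for strand, first, last in find_motif_sites(toy_promoter):
    print(strand, first, last)
```

      A sequence match of this kind only identifies candidate sites; it does not by itself demonstrate binding.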

      Reviewer #2 (Public review):

      Strengths:

      Generation of an elegantly designed conditional pu.1 allele in zebrafish that allows for the visual detection of expression of the knockout allele.

      The combination of analysis of pu.1 function in two model systems, zebrafish and mouse, strengthens the conclusions of the paper.

      Confirmation of the functional significance of the observed upregulation of tp53 in mutant microglia through double mutant analysis provides some mechanistic insight.

      Weaknesses:

      (1) The presented RNA-Seq analysis of mutant microglia is underpowered and details on how the data was analyzed are missing. Only 9-15 cells were analyzed in total (3 pools of 3-5 cells each). Further, the variability in relative gene expression of ccl34b.1, which was used as a quality control and inclusion criterion to define pools consisting of microglia, is extremely high (between ~4 and ~1600, Figure S7A).

      We apologize for the lack of clarity regarding the RNA-seq procedures and have accordingly added details about the RNA-seq data analysis to the “Material and methods” section (Line 491-501). Briefly, reads were aligned to the zebrafish genome using the STAR package. Raw counts were calculated with the featureCounts package. Differentially expressed genes (DEGs) were identified with the DESeq2 package. Owing to the technical challenge of unambiguously distinguishing microglia from dendritic cells (DCs) in brain cell suspensions, we employed a strategy of isolating 3-5 cells per pool and quantifying the relative expression of the microglia-specific marker ccl34b.1 normalized to the DC-specific marker ccl19a.1. This approach aimed to reduce DC contamination in downstream analyses. Across all experimental groups subjected to RNA-seq analysis, the ccl34b.1/ccl19a.1 expression ratios exceeded 5, confirming microglia as the dominant cell population. Nonetheless, residual DC contamination in the RNA-seq data cannot be entirely ruled out. We have discussed this technical constraint in the revised manuscript to ensure methodological transparency (Line 498-501).
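
      The marker-ratio check described above can be illustrated with a short sketch. It is not the authors' pipeline: the expression values are invented, the function name is hypothetical, and treating a ratio of 5 as a hard inclusion cutoff is an assumption (the response states only that the ratios of all analyzed groups exceeded 5).

```python
def select_microglia_pools(pools, min_ratio=5.0):
    """Keep pools whose ccl34b.1 / ccl19a.1 expression ratio exceeds min_ratio.

    pools: iterable of (name, ccl34b1_expression, ccl19a1_expression).
    min_ratio: assumed cutoff for microglia-dominated pools.
    """
    kept = []
    for name, ccl34b1, ccl19a1 in pools:
        if ccl19a1 == 0:
            # No detectable DC marker: the ratio is unbounded if the microglia marker is present.
            ratio = float("inf") if ccl34b1 > 0 else 0.0
        else:
            ratio = ccl34b1 / ccl19a1
        if ratio > min_ratio:
            kept.append((name, ratio))
    return kept


# Hypothetical relative expression values for three pools of 3-5 cells each.
example_pools = [("pool_A", 40.0, 2.0), ("pool_B", 9.0, 3.0), ("pool_C", 12.0, 0.0)]
print(select_microglia_pools(example_pools))
```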

      (2) The authors conclude that the reduction of microglia observed in the adult brain after cKO of pu.1 in the spi-b mutant background is due to apoptosis (Lines 213-215). However, they only provide evidence of apoptosis in 3-5 dpf embryos, a stage at which loss of pu.1 alone does lead to a complete loss of microglia (Figure 2E). A control of pu.1 KI/d839 mutants treated with 4-OHT should be added to show that this effect is indeed dependent on the loss of spi-b. In addition, experiments should be performed to show apoptosis in the adult brain after cKO of pu.1 in spi-b mutants as there seems to be a difference in the requirement of pu.1 in embryonic and adult stages.

      We apologize for the omission of data regarding conditional pu.1 knockout alone in embryos, which may have led to ambiguity. We would like to clarify that conditional pu.1 knockout alone at the embryonic stage does not induce microglial death (Fig. S2). Microglial death occurs, in both embryonic and adult brains, only when Pu.1 is disrupted in the spi-b mutant background. The blebbing morphology of some microglia after pu.1 conditional knockout in adult spi-b mutants indicates that microglia undergo apoptosis at both embryonic and adult stages (Fig. S4 and Fig. S5). The reviewer’s concern likely arises from the distinct outcomes of global pu.1 knockout (Fig. 2) versus conditional pu.1 ablation (Fig. S2). Global knockout eliminates microglia during early development due to Pu.1’s essential role in myeloid lineage specification. We have included this clarification in the revised manuscript (Line 208-211).

      (3) The number of microglia after pu.1 knockout in zebrafish did only show a significant decrease 3 months after 4-OHT injection, whereas microglia were almost completely depleted already 7 days after injection in mice. This major difference is not discussed in the paper.

      We propose that zebrafish Pu.1 and Spi-b function cooperatively to regulate microglial maintenance, analogous to the role of PU.1 alone in mice. This cooperative mechanism likely explains the observed difference in microglial depletion kinetics between zebrafish and mice following pu.1 conditional knockout. Specifically, the compensatory activity of Spi-b in zebrafish may buffer the immediate loss of Pu.1, whereas in mice, the absence of Spi-b expression in microglia eliminates this redundancy, resulting in rapid microglial depletion. Furthermore, during evolution, SPI-B appears to have acquired lineage-specific roles, becoming absent in microglia. We have included the clarification in the revised manuscript (Line 302-305).

      (4) Data is represented as mean ± SEM. Instead of SEM, standard deviation should be shown in all graphs to show the variability of the data. This is especially important for all graphs where individual data points are not shown. It should also be stated in the figure legend if SEM or SD is shown.

      We have represented our data as mean ± SD in the revised manuscript.

      Recommendations for the authors:

      Reviewing Editor:

      To further strengthen the manuscript, we ask the authors to address the reviewers' comments through additional experiments where necessary. In cases where certain experiments may be challenging, we encourage the authors to address these concerns within the text, such as by referencing any prior evidence of pu.1 and tp53 interactions or incorporating in silico analyses that support such interaction.

      As suggested, we have performed an in silico analysis of Pu.1 binding sites in the zebrafish tp53 promoter and also cited a previous paper showing how PU.1 attenuates the transcriptional activity of the P53 tumor suppressor family (Line 399-405).

      Reviewer #1 (Recommendations for the authors):

      It would be useful to investigate the relationship between pu.1 and tp53. The data presented here show that pu.1 deficient cells have higher expression of tp53, but this could be an indirect effect. However, since pu.1 has known DNA binding motifs, it would be worthwhile to investigate if there are any direct interactions between pu.1 and the tp53 locus -- does pu.1 directly bind and repress tp53 expression? This could be directly investigated with Cut & Run or an EMSA.

      The interaction between Pu.1 and Tp53 has been discussed in the public review section.

      The paper would likely also benefit from a more in-depth discussion of the relationship of the zebrafish alleles and their relationship to mammalian Pu.1 -- as presented here, the authors are implicitly arguing that zebrafish pu.1 and spi-b are both more closely related to mammalian Pu.1 than to mammalian Spi-b. A clear argument, perhaps backed up by sequence alignment and homology matching, would help readers, especially those less familiar with zebrafish genome duplications.

      We have conducted detailed sequence alignment in our previous work (Yu et al., 2017, Blood) and found zebrafish Spi-b shares the highest similarity with the mammalian SPI-B among Ets family transcription factors in zebrafish. A unique P/S/T-rich region known to be essential for mammalian SPI-B transactivation activity is present in zebrafish Spi-b. Our data do not support the interpretation that Spi-b is more closely related to mammalian Pu.1 than to Spi-b. Instead, functional compensation between pu.1 and spi-b in microglia maintenance likely reflects their shared role as Ets-family transcriptional regulators, rather than ortholog-driven redundancy.

      Reviewer #2 (Recommendations for the authors):

      (1) The nomenclature of the genes in the SPI family in zebrafish is somewhat confusing as genes were renamed several times. It would make it easier for the reader to understand if in the abstract and the main text, spi-b would be referred to as the zebrafish orthologue of mouse SPI-B (as determined by the authors in previous work) rather than the paralogue of zebrafish pu.1. To clarify which genes were analyzed in both zebrafish and mouse, Gene accession numbers should be added.

      Thank you for the recommendations. We have changed “the paralogue of zebrafish pu.1” to “the orthologue of mouse Spi-b” in the abstract (Line 22) and added gene accession numbers for both the zebrafish and mouse genes (Line 105-106 and 301-302).

      (2) Methods RNA-seq: Details on how the aligned reads were analyzed to detect differentially expressed genes are missing and should be added. In addition, a table with read counts, fold changes and adjusted p values should be added.

      We have added details of the RNA-seq analysis to the Material and Methods section (Line 491-501). A table generated by DESeq2 has been included as a supplemental file to show read counts, fold changes and adjusted p values (Supplemental file 2).

      (3) Figure 2H: It would be helpful to the reader if the KO splicing would be shown in comparison to WT splicing.

      Thank you for your suggestion. We have added the sequence result between exon 3 and exon 4 of pu.1 from wildtype cDNA to show WT splicing in Figure 2H.

      (4) Legend Figure 5C. Relative expression should be replaced with transcripts per million (TPM).

      We have corrected it in the figure legend of Figure 5C (Line 786-787).

      (5) In Figure S3. the label on the y-axis in panel B is not visible.

      We apologize for the mistake made during figure assembly. We have corrected it and the y-axis label is now visible.

      (6) In Figure S7B an explanation for the colors in the heat map is missing and should be added.

      Colors represent scaled TPM values. The red color represents high expression while the blue color represents low expression. We have added the information in the figure legend.

      (7) A justification for the use of male mice only should be added or additional experiments in female mice should be performed.

      Female mice were excluded to avoid variability associated with estrous cycle-dependent hormonal changes, which are known to influence microglial behavior (Habib and Beyer, 2015). We have added a justification in the revised manuscript (Line 547-548).

      (8) The manuscript would benefit from some language editing. A few examples are listed below:

      a) line 97: the rostral blood (RBI) should read the rostral blood island.

      b) line 373 typo: nucleus translocation should read nuclear translocation.

      c) line 393 typo: pu.1-dificent should read pu.1-deficient.

      We apologize for the typos and grammatical mistakes in the manuscript. We have checked the manuscript thoroughly and corrected them.

      Reference:

      Tschan MP, Reddy VA, Ress A, Arvidsson G, Fey MF, Torbett BE (2008) PU.1 binding to the p53 family of tumor suppressors impairs their transcriptional activity. Oncogene 27: 3489-93

      Yu T, Guo W, Tian Y, Xu J, Chen J, Li L, Wen Z (2017) Distinct regulatory networks control the development of macrophages of different origins in zebrafish. Blood 129: 509-519

      Habib P, Beyer C (2015) Regulation of brain microglia by female gonadal steroids. J Steroid Biochem Mol Biol 146: 3-14

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Tubert C. et al. investigated the role of dopamine D5 receptors (D5R) and their downstream potassium channel, Kv1, in the striatal cholinergic neuron pause response induced by thalamic excitatory input. Using slice electrophysiological analysis combined with pharmacological approaches, the authors tested which receptors and channels contribute to the cholinergic interneuron pause response in both control and dyskinetic mice (in the L-DOPA off state). They found that activation of Kv1 was necessary for the pause response, while activation of D5R blocked the pause response in control mice. Furthermore, in the L-DOPA off state of dyskinetic mice, the absence of the pause response was restored by the application of clozapine. The authors claimed that 1) the D5R-Kv1 pathway contributes to the cholinergic interneuron pause response in a phasic dopamine concentration-dependent manner, and 2) clozapine inhibits D5R in the L-DOPA off state, which restores the pause response.

      Strengths

      The electrophysiological and pharmacological approaches used in this study are powerful tools for testing channel properties and functions. The authors' group has well-established these methodologies and analysis pipelines. Indeed, the data presented were robust and reliable.

      Weaknesses:

      Although the paper has strengths in its methodological approaches, there is a significant gap between the presented data and the authors' claims.

      The authors answered most of the concerns I raised. However, the critical issue remains unresolved.

      I am still not convinced by the results presented in Fig. 6 and their interpretation. Since Clozapine acts as an agonist in the absence of an endogenous agonist, it may stimulate the D5R-cAMP-Kv1 pathway. Stimulation of this pathway should abolish the pause response mediated by thalamic stimulation in SCINs, rather than restoring the pause response. Clarification is needed regarding how Clozapine reduces D5R-ligand-independent activity in the absence of dopamine (the endogenous agonist). In addition, the authors argued that a D5R antagonist does not work in the absence of dopamine, and that therefore the D5R antagonist alone did not restore the pause response. However, if the D5R-cAMP-Kv1 pathway is already active in the L-DOPA off state, why didn't the D5R antagonist contribute to inhibition of the D5R pathway? Since Clozapine is not D5 specific and the Clozapine experiments were not conclusive, I recommend testing whether other receptors, such as the D2 receptor, contribute to the Clozapine-induced pause response in the L-DOPA-off state.

      Thank you for the opportunity to clarify this point. It seems there may have been a misunderstanding regarding our proposal about clozapine's mechanism of action. We are not suggesting that clozapine acts as an agonist, but rather as an “inverse agonist”. Unlike classical agonists, inverse agonists produce a pharmacological effect opposite to that of an agonist. Although clozapine is best known for its antagonistic effects on dopamine and serotonin receptors, under conditions where no endogenous agonist is present, it has been shown to reduce the constitutive activity of D1 and D5 receptors (PMID: 24931197). This is explained in lines 240-254 in the Results section.

      In contrast, the prototypical and selective D1/D5 receptor antagonist SCH23390 does not exhibit inverse agonist properties and would not be expected to produce effects in the absence of an agonist (PMID: 7525564). The observation that SCH23390 blocks the effects of clozapine in dopamine-depleted animals strongly supports the idea that clozapine acts through D1/D5 receptors. This is now clarified in lines 257-264.

      To further address your comments, we now include a new figure (Figure 6) presenting experiments that show D2-type receptor agonists do not restore the pause response in dyskinetic mice in the off-L-DOPA condition. These results are described in a new subsection of the Results section and discussed in a newly added paragraph in the Discussion (lines 369-380).

      Finally, to exclude a potential contribution of serotonin receptors to clozapine’s effects, we have expanded what is now Figure 7 (formerly Figure 6) to show that clozapine continues to restore the pause response even in the presence of a serotonin receptor antagonist in the bath.

      All these results are further discussed in lines 342-360.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Tubert et al. presents the role of D5 receptors (D5R) in regulating the striatal cholinergic interneuron (CIN) pause response through D5R-cAMP-Kv1 inhibitory signaling. Their findings provide a compelling model explaining the "on/off" switch of the CIN pause, driven by the distinct dopamine affinities of D2R and D5R. This mechanism, coupled with varying dopamine states, is likely critical for modulating synaptic plasticity in cortico-striatal circuits during motor learning and execution. Furthermore, the study bridges their previous finding of CIN hyperexcitability (Paz et al., Movement Disorder 2022) with the loss of the pause response in LID mice and demonstrates the restoration of the pause through D1/D5 inverse agonism.

      Strengths:

      The study presents solid findings, and the writing is logically structured and easy to follow. The experiments are well-designed, properly combining ex vivo electrophysiology recording, optogenetics, and pharmacological treatment to dissect / rule out most, if not all, alternative mechanisms in their model.

      Weaknesses:

      While the manuscript is overall satisfying, one conceptual gap needs to be further addressed or discussed: the potential "imbalance" between D2R and D5R signaling due to the ligand-independent activity of D5R in LID. Given that D2R and D5R oppositely regulate CIN pause responses through cAMP signaling, investigating the role of D2R under LID off L-DOPA (e.g., by applying D2 agonists or antagonists, even together with intracellular cAMP analogs or inhibitors) could provide critical insights. Addressing this aspect would strengthen the manuscript in understanding CIN pause loss under pathological conditions.

      Thank you for your comments. Although our primary focus is on the role of D5 receptors, we have also investigated the effects of two D2-type receptor agonists in dyskinetic mice in the off-L-DOPA condition. We found that neither quinpirole nor sumanirole was able to restore the pause response. These results are presented in Figure 6 and related text in the Results and Discussion sections.

      Understanding why D2 agonists fail to restore the pause response—despite their expected effect of reducing cAMP levels—is an important question that warrants further investigation. Interestingly, previous studies have reported paradoxical effects of D2 receptor stimulation in SCINs in animal models of dystonia (PMID: 16934985, PMID: 21912682), as well as under conditions where the SCIN’s constitutively active integrated stress response is diminished (PMID: 33888613). This is now discussed in lines 369-380.

      Reviewer #3 (Public review):

      Summary:

      Tubert et al. investigate the mechanisms underlying the pause response in striatal cholinergic interneurons (SCINs). The authors demonstrate that optogenetic activation of thalamic axons in the striatum induces burst activity in SCINs, followed by a brief pause in firing. They show that the duration of this pause correlates with the number of elicited action potentials, suggesting a burst-dependent pause mechanism. The authors demonstrated that this burst-dependent pause relied on Kv1 channels. The pause is blocked by SKF81297 and partially by sulpiride and mecamylamine, implicating D1/D5 receptor involvement. The study also shows that ZD7288 does not reduce the duration of the pause, and that lesioning dopamine neurons abolishes this response, which can be restored by clozapine.

      Weaknesses:

      While this study presents an interesting mechanism for SCIN pausing after burst activity, there are several major concerns that should be addressed:

      (1) Scope of the Mechanism: It is important to clarify that the proposed mechanism may apply specifically to the pause in SCINs following burst activity. The manuscript does not provide clear evidence that this mechanism contributes to the pause response observed in behavioral animals. While the thalamus is crucial for SCIN pauses in behavioral contexts, the exact mechanism remains unclear. Activating thalamic input triggers burst activity in SCINs, leading to a subsequent pause, but this mechanism may not be generalizable across different scenarios. For instance, approximately half of TANs do not exhibit initial excitation but still pause during behavior, suggesting that the burst-dependent pause mechanism is unlikely to explain this phenomenon. Furthermore, in behavioral animals, the duration of the pause seems consistent, whereas the proposed mechanism suggests it depends on the prior burst, which is not aligned with in vivo observations. Additionally, many in vivo recordings show that the pause response is a reduction in firing rate, not complete silence, which the mechanism described here does not explain. Please address these in the manuscript.

      Thank you for the opportunity to clarify these points. We acknowledge that the response of SCINs to optogenetic stimulation of thalamic afferents in brain slices represents a model system that may not capture all aspects of TAN responses to behaviorally salient events. Nevertheless, this approach allows us to test mechanistic hypotheses that are difficult to address in behaving animals with current technologies. This is now stated in lines 311-314.

      Importantly, our ex vivo preparation reproduces, for the first time, the loss of TAN responses observed in non-human primates with parkinsonism, enabling investigation of the underlying mechanisms. In line with your suggestion, we have expanded the Discussion (third and fourth paragraphs) to address the sources of variability in pause responses.

      (2) Terminology: The use of "pause response" throughout the manuscript is misleading. The pause induced by thalamic input in brain slices is distinct from the pause observed in behavioral animals. Given the lack of a clear link between these two phenomena in the manuscript, it is essential to use more precise terminology throughout, including in the title, bullet points, and body of the manuscript.

      Thank you for raising this important point. We agree that it is essential to be precise in describing the nature of the pause observed in our ex vivo model. While we believe that readers would recognize from the abstract and methods that our study focuses on a model of the pause response, we understand your concern about potential confusion. In response, we have revised the terminology in the abstract, bullet points, and throughout the manuscript to more clearly reflect that we are describing an ex vivo model of the pause observed in behaving animals.

      (3) Kv1 Blocker Specificity: It is unclear how the authors ruled out the possibility that the Kv1 blocker did not act directly on SCINs. Could there be an indirect effect contributing to the burst-dependent pause? Clarification on this point would strengthen the interpretation of the results.

      This issue is addressed in lines 147-150.

      (4) Role of D1 Receptors: While it is well-established that activating thalamic input to SCINs triggers dopamine release, contributing to SCIN pausing (as shown in Figure 3), it would be helpful to assess the extent to which D1 receptors contribute to this burst-dependent pause. This could be achieved by applying the D1 agonist SKF81297 after blocking nAChRs and D2 receptors.

      Figure 3C shows that the D1/D5 receptor antagonist SCH23390 does not modify the pause, while the full D1/D5 agonist SKF81297 abolishes it, indicating that in our slice preparation, baseline dopamine levels are not contributing to the pause through D1/D5 receptor stimulation.

      (5) Clozapine's Mechanism of Action: The restoration of the burst-dependent pause by clozapine following dopamine neuron lesioning is interesting, but clozapine acts on multiple receptors beyond D1 and D5. Although it may be challenging to find a specific D5 antagonist or inverse agonist, it would be more accurate to state that clozapine restores the burst-dependent pause without conclusively attributing this effect to D5 receptors.

      As explained in our response to Reviewer #1, the effect of clozapine is blocked by the D1/D5-selective antagonist SCH23390. Additionally, new data presented in Figure 7C show that clozapine's ability to restore the pause response is maintained even in the presence of a broad-spectrum serotonin receptor antagonist. Since SCINs do not significantly express D1 receptors, we believe that these findings strongly support a role for D5 receptors in SCINs.

      Comments on revisions:

      The authors have addressed many of my concerns. However, I remain unconvinced that adding an 'ex vivo' experiment fully resolves the fundamental differences between the burst-dependent pause observed in slices - defined by the duration of a single AHP - and the pause response in CHINs observed in vivo, which may involve contributions from more than one prolonged AHP. In vivo, neurons can still fire action potentials during the pause, albeit at a lower frequency. Moreover, in behaving animals, pause duration does not vary with or without initial excitation. The mechanism proposed demonstrates that the pause duration, defined by the length of a single AHP, is positively correlated with preceding burst activity.

      As discussed in paragraphs 3 and 4 of the Discussion (starting at line 285), and illustrated in Figure 1J–K, our data show that the duration of the pause can be modulated by rebound excitation from thalamic input. The absence of this rebound allows us to observe a longer pause when more spikes are elicited during the initial excitatory phase, providing a clearer readout of the contribution of intrinsic membrane mechanisms. We do not claim that intrinsic mechanisms alone account for the entire phasic response of SCINs in behaving animals (lines 295-303 in Discussion).

      To improve clarity, I recommend using the term 'SCIN pause' to describe the ex vivo findings, distinguishing them more explicitly from the 'pause response' observed in behaving animals. This distinction would help contextualize the ex vivo findings as potentially contributing to, but not fully representing, the pause response in vivo.

      We made changes to the abstract, bullet points, and main text to clarify that we are not studying the in vivo response.

      Again, it would be helpful to present raw data for pause durations rather than relying solely on ratios. This approach would provide the audience with a clearer understanding of the absolute duration of the burst-dependent pause and allow for better comparison to the ~200 ms pause observed in behaving animals.

      Thank you for your comment. Following your suggestion, we provide the average pause durations for the data shown in Figure 1H (lines 127–130). We opted not to include raw pause durations in the main text for all figures, as this would make the manuscript more difficult to read and, in our view, is unnecessary. The figures already allow readers to estimate absolute durations: in each case, pause durations are shown relative to baseline ISIs in one panel, while the corresponding absolute ISIs are shown side-by-side. This provides a clearer understanding of pause magnitude relative to the cell’s spontaneous firing, which is more informative than absolute values alone, since one would expect a pause to be longer than the average ISI. Please note that baseline ISIs are significantly shorter in dyskinetic mice (Figure 5l). Showing the pause duration relative to baseline ISI allows readers to readily compare results across figures regardless of changes in SCIN baseline firing rate.
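
      The normalization described above can be sketched in a few lines of Python. This is an illustration only, not the authors' analysis code: the function name `relative_pause` and all numbers are hypothetical, chosen so that two cells with different baseline firing show different absolute pauses but the same relative pause.

```python
from statistics import mean


def relative_pause(pause_ms, baseline_isis_ms):
    """Express a pause as a multiple of the cell's mean baseline ISI."""
    baseline = mean(baseline_isis_ms)
    if baseline <= 0:
        raise ValueError("baseline ISI must be positive")
    return pause_ms / baseline


# Hypothetical values in ms, not taken from the manuscript.
slow_cell = relative_pause(900, [250, 300, 350])   # slower baseline firing
fast_cell = relative_pause(450, [125, 150, 175])   # faster baseline firing
print(slow_cell, fast_cell)
```

      Dividing by the mean baseline ISI is one plausible reading of "relative to baseline ISI"; a median or a per-trial baseline would work the same way.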

      Additionally, it is important to note that, in vivo, pause durations are typically inferred from perievent time histograms (PETHs), which represent population averages across many trials. In contrast, in our ex vivo studies, we measured pause duration on a trial-by-trial basis. This approach enables us to analyze how the pause duration varies as a function of the initial burst size in the same neuron—something not typically reported in in vivo studies. As described in the first two paragraphs of the Results, the same SCIN may respond with a different number of spikes in successive trials, and this variability is influenced by factors such as the timing of the last spontaneous spike relative to stimulation onset (Figure 1D–F). We are not aware of studies reporting trial-by-trial analyses of pause duration in behaving animals, particularly in relation to the strength of initial excitation. Therefore, while our slice preparation may yield pause durations that are longer than those observed in vivo, direct comparison to PETH-derived pause durations from behaving animals is not straightforward.

    1. Author response:

      The following is the authors’ response to the previous reviews

      We appreciate the reviewers’ and editors’ constructive suggestions, which have helped us to improve our manuscript. We have considerably revised the description of our data analyses.

      The reason that the reviews did not change is that there were really three central points that led to the "incomplete". These were (1) the fact that there was potentially a selection bias due to double dipping, (2) the fact that there was potentially a time confound due to the lack of counterbalancing, and (3) confusion about how the modeling was done, although it seems like the modeling was of the complete block (rather than tied to specific events in that block).

      (1) Double dipping

      We appreciate the opportunity to explain our robust safeguards against double-dipping and have provided detailed clarifications regarding the data analyses (pp. 11-14). Our study ensures statistical independence between task-related region selection and hypothesis testing through three orthogonal mechanisms:

      (1) Regressor Orthogonality: Statistical Independence Between Selection and Testing

      The selection regressor (group mean activation) was mathematically independent from test regressors (group differences, behavioral scores). This was confirmed through our GLM implementation: First-level: Task vs. rest contrast (β values) for each participant; Second-level: One-sample t-tests (selection) vs. independent group/behavioral tests.

      (2) Multimodal Validation: Complementary Neural and Behavioral Measures

      We employed multiple distinct metrics to provide convergent yet independent validation of effects.

      Neural Measures: Three orthogonal indices assessed different neural dimensions.

      A. Single-brain activation examines neural activity patterns within individual decision-makers.

      B. Within-group neural synchronization (GNS) quantifies the temporal alignment of neural activity across interacting group members during shared decision processes.

      C. Functional connectivity (FC) analyses measure correlated activity between different brain regions within individual participants.

      Behavioral Safeguards: Behavioral metrics were analyzed in independent regressions, avoiding circularity.

      A. Individual performance was based on personal accuracy,

      B. collective performance represented the group-level average accuracy across raters, and

      C. their similarity was quantified as the Euclidean distance between individual and collective scores.
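
      The three behavioral metrics above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the scores are invented, and how the individual and collective score vectors are assembled is an assumption, since the response does not specify it.

```python
from math import dist
from statistics import mean


def collective_performance(rater_scores):
    """Group-level score: the average of the scores given by the raters."""
    return mean(rater_scores)


def similarity_distance(individual_scores, collective_scores):
    """Euclidean distance between two equal-length score vectors.

    A smaller distance means individual and collective scores are closer.
    """
    if len(individual_scores) != len(collective_scores):
        raise ValueError("score vectors must have the same length")
    return dist(individual_scores, collective_scores)


# Hypothetical scores, not taken from the study.
group_score = collective_performance([0.6, 0.7, 0.8])
print(similarity_distance([0.9, 0.5], [group_score, group_score]))
```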

      (3) Statistical Safeguards

      We further ensured independence by applying strict FDR correction at both the selection (p < 0.05) and testing (p < 0.05) stages. In addition, we conducted a permutation test with 1,000 pseudo-group iterations to build null distributions for GNS.

      Drawing on both classic and recent NIRS studies (e.g., Jiang et al., 2015; Liu et al., 2023; Stolk et al., 2016; Xie et al., 2023) and NIRS hyperscanning studies (e.g., Liu et al., 2019; Pärnamets et al., 2020; Reinero et al., 2021; Számadó et al., 2021; Solansky, 2011), we performed the data analyses. Below, we provide the details of our data analysis:

      Single-brain activation. To identify task-related brain regions (channels), we used a one-sample t-test based on brain activation data from all participants during the task compared to the baseline (resting state).

      (1)  Data Collection: Each participant had brain activation data (HbO signals measured by fNIRS) during the task (the entire process of reading, sharing, discussing, and decision-making) and the resting state (baseline).

      (2)  Pre-processing: We sought to explore the neural mechanisms that manipulated group identification and its effect on collective performance. Data were preprocessed using the Homer2 package in MATLAB 2020b (Mathworks Inc., Natick, MA, USA). First, motion artifacts were detected and corrected using a discrete wavelet transformation filter procedure. After that, the raw intensity data were converted to optical density (OD) changes. Then, kurtosis-based wavelet filtering (Wav Kurt) was applied to remove motion artifacts with a kurtosis threshold of 3.3 (Chiarelli, Maclin, Fabiani, & Gratton, 2015). Based on a prior multi-brain study of social interactions (Cheng et al., 2022), the output was bandpass filtered using a Butterworth filter with order 5 and cut-offs at 0.01 and 0.5 Hz to remove longitudinal signal drift and instrument noise. Finally, OD data were converted to HbO concentrations.

      (3) Individual-Level Analysis: First, a GLM was used to compute the "task vs. rest" brain activation contrast for each participant [0,1], obtaining each individual's "task effect" value (β value, representing task activation strength).

      (4) Group-Level Analysis: These "task effect" values from all participants were then aggregated, and a one-sample t-test was performed for each brain region (or channel) to determine whether the average activation in that region was significantly greater than 0 (i.e., significantly more active during the task compared to the resting state).

      (5) Task-Related Regions: If the t-test result for a brain region was significant (p < 0.05, FDR-corrected), we considered that region "task-related" and suitable for further analysis.

      (6) Subsequent Tests:

      - Group Comparisons: We examined differences in activation between groups (e.g., high vs. low group identification) using independent t-tests on the same task vs. baseline contrast.

      - Behavioral Correlations: We analyzed relationships between task-related activation (β values) and behavioral scores (e.g., individual performance) using Pearson analyses.

      - Mediation model: We examined the relationship between an individual's perceived group identification and individual performance, which was mediated by task-related activation (β values).
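
      The channel-selection step in items (4) and (5) can be sketched as follows. The response states only that p-values were FDR-corrected; this sketch assumes the Benjamini-Hochberg step-up procedure, and the function name and per-channel p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per channel: True if it survives FDR correction."""
    m = len(p_values)
    # Rank channel indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Every channel ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Hypothetical one-sample t-test p-values, one per fNIRS channel.
channel_p = [0.001, 0.20, 0.012, 0.04, 0.60]
task_related = benjamini_hochberg(channel_p)
print(task_related)
```

      Note that a channel with an uncorrected p-value below 0.05 (the fourth one here) can still fail the corrected threshold, which is the point of applying the correction before treating a channel as task-related.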

      Within-Group Neural Synchronization (GNS).

      (1) Data Collection and Pre-processing as above

      (2) Calculation: WTC was applied to generate the brain-to-brain coupling of each pair in each triad (Coherence1&2, Coherence 1&3, and Coherence 2&3). Then, three coherence values from three pairs were averaged as the GNS for each triad, that is, GNS = (Coherence 1&2 + Coherence 1&3 + Coherence 2&3) / 3.

      (3) Task-Related Regions: Time-averaged GNS (also averaged across channels in each group) was compared between the baseline session (i.e., the resting phase) and the task session (from reading information to making decisions) using a series of one-sample t-tests. When determining the frequency band of interest, the time-averaged GNS was also averaged across channels. After that, we analyzed the time-averaged GNS of each channel. Then, channels showing significant GNS were regarded as regions of interest and included in subsequent analyses.

      (4) Permutation test: The nonparametric permutation test was conducted on the observed interaction effects on GNS of the real group against the 1,000 permutation samples.

      (5) Subsequent Tests:

      - Group Comparisons: We examined differences in activation between groups (e.g., high vs. low group identification) using independent t-tests on the same task vs. baseline contrast.

      - Behavioral Correlations: We computed Pearson’s correlation between GNS and collective performance (i.e., the average of the scores assigned by the three raters to each group).

      -  Mediation model: We examined how GNS mediated the relationship between group identification and collective performance.
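
      The GNS calculation in item (2) and the permutation test in item (4) can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the coherence values are invented stand-ins for WTC output, the pseudo-group null values are random placeholders (in a real analysis each would come from recomputing coherence on a reshuffled pseudo-triad), and the one-sided p-value with a +1 correction is an assumption.

```python
import random


def triad_gns(coh_12, coh_13, coh_23):
    """GNS of one triad: the mean of its three pairwise coherence values."""
    return (coh_12 + coh_13 + coh_23) / 3


def permutation_p_value(observed, null_values):
    """One-sided p-value: share of null values at least as large as observed.

    The +1 terms count the observed value as part of the null set.
    """
    n_extreme = sum(1 for value in null_values if value >= observed)
    return (n_extreme + 1) / (len(null_values) + 1)


# Hypothetical pairwise coherence values for one real triad.
real_gns = triad_gns(0.42, 0.38, 0.40)

# Placeholder null distribution standing in for 1,000 pseudo-groups.
rng = random.Random(0)
null_gns = [
    triad_gns(rng.uniform(0.2, 0.4), rng.uniform(0.2, 0.4), rng.uniform(0.2, 0.4))
    for _ in range(1000)
]
print(permutation_p_value(real_gns, null_gns))
```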

      The brain activation connectivity.

      (1) Data Collection and Pre-processing as above

      (2) Calculation: We computed exploratory Pearson’s correlations between individual-performance-related HbO and collective-performance-related HbO.

      (3) Moderation analysis: Single-brain activation × connectivity → GNS.

      (2) Counterbalancing.

      We sincerely appreciate this valuable methodological insight. Building on prior group decision-making research (De Wilde et al., 2017; Stasser et al., 1992), we refined all stages to enhance experimental control and procedural clarity throughout the process (i.e., a. Reading information, b. Sharing private information, c. Discussing information, d. Decision) (Xie et al., 2023). Importantly, we maintained a fixed task sequence to preserve ecological validity, as this progression mirrors natural group decision-making dynamics.

      While this design choice precludes sequential counterbalancing, two factors mitigate potential temporal confounds: (1) random assignment and uniform task timing across conditions minimize systematic between-group differences; and (2) our whole-block GLM approach captures sustained decision-related neural activity rather than phase-specific effects. We fully acknowledge this limitation and will incorporate a detailed discussion of temporal considerations in the revised manuscript, while noting that our design provides unique advantages for studying naturalistic decision-making processes.

      (3) The modelling was of the complete block

      In our revised manuscript, we have explicitly stated that the analysis was performed at the block level rather than the event level, for the following reasons:

      (1) The hidden profile task is inherently a “group decision-making process” that unfolds dynamically across multiple stages (reading, sharing, discussing, and deciding). Prior research in this paradigm (De Wilde et al., 2017; Stasser & Titus, 1985; Xie et al., 2023) has consistently treated these phases as integrated blocks because the key cognitive and social processes (e.g., information integration, deliberation, and consensus formation) occur over extended interactions rather than discrete events.

      (2) Methodologically, our fNIRS hyperscanning approach requires longer blocks to reliably capture the slow hemodynamic response and the gradual emergence of inter-brain neural synchronization during naturalistic social exchanges (Cui et al., 2012; Liu et al., 2019). Event-related designs, while useful for transient stimuli, are less suited for studying prolonged, interactive decision-making where neural coupling develops over time. Thus, our block-based analysis aligns with both the cognitive demands of the task and the neuroimaging constraints, ensuring robust detection of group-level neural dynamics.

    1. Author response:

      Reviewer #1:

      The selection of heavy metal stress as the condition to investigate is not speculative. The elucidation of the genome from the Palomero toluqueño maize landrace revealed heavy metal effects during domestication (Vielle-Calzada et al., 2009). Differences concordant with its ancient origin identified chromosomal regions of low nucleotide variability that contained the three domestication loci included in this study; all three are involved in heavy-metal detoxification. Results presented in Vielle-Calzada et al. (2009) indicated that environmental changes related to heavy metal stress were important selective forces acting on maize domestication. Our study expands those results by starting to elucidate the function of these heavy metal response genes and their role in the evolutionary transition from teosinte parviglumis to maize.

      Although the paper presents some interesting findings, it is difficult to distinguish which observations are novel versus already known in the literature regarding maize HM stress responses. The rationale behind focusing on specific loci is often lacking. For example, a statistically significant region identified via LOD score on chromosome 5 contains over 50 genes, yet the authors focus on three known HM-related genes without discussing others in the region. It is unclear why ZmHMA1 was selected for mutagenesis over ZmHMA7 or ZmSKUs5.

      We appreciate the value of this comment. We will modify the manuscript to clearly show which phenotypic observations are novel and which were previously reported for maize grown under HM stress. The rationale for focusing on three specific loci is related to results from Vielle-Calzada et al. (2009) (see comment above). Although we demonstrated that these three loci show an unusual reduction in genetic variability when compared to the rest of chromosome 5 (including a separate class of genes previously identified as being affected by domestication; Hufford et al., 2012), we will expand the genetic and expression analysis to all genes included in a region precisely defined via LOD scores of five QTL 1.5-LOD support intervals that overlap with ZmHMA1. Within this region of 1.5 to 2 Mb, we will compare nucleotide variability and gene expression in response to HMs. Contrary to major domestication loci showing a single highly pleiotropic gene responsible for important domestication traits, in this chr. 5 genomic region phenotypic effects are due to multiple linked QTLs (Lemmon and Doebley, 2014). The mutagenic analysis of ZmHMA7 and ZmSKUs5 will be included in a different publication; we can anticipate that the results reinforce the conclusions of this study.

      The idea that HM stress impacted gene function and influenced human selection during domestication is of interest. However, the data presented do not convincingly link environmental factors with human-driven selection or the paleoenvironmental context of the transition. While lower nucleotide diversity values in maize could suggest selective pressure, it is not sufficient to infer human selection and could be due to other evolutionary processes. It is also unclear whether the statistical analysis was robust enough to rule out bias from a narrow locus selection. Furthermore, the addition of paleoclimate records (Paleoenvironmental Data Sources as a starting point) or conducting ecological niche modeling or crop growth models incorporating climate and soil scenarios would strengthen the arguments.

      We agree that lower nucleotide diversity values in maize are not sufficient to infer human selection and could be due to other evolutionary processes. As a matter of fact, since these same HM response loci also show unusually low nucleotide variability in teosinte parviglumis (Fig 2), we cannot discard the possibility that natural selection forces related to environmental changes could have affected native teosinte parviglumis populations in the early Holocene, before maize emergence. This possibility supports a speculative model suggesting that phenotypic changes caused by HM stress could have preceded human selection and its consequences, contributing to initial subspeciation; the model is proposed in the “Ideas and Speculation” section of the manuscript. Fortunately, as suggested by the reviewer, a large body of paleoclimatic records and paleoenvironmental data is available for the Trans-Mexican Volcanic Belt in the Holocene, including geographic regions where the emergence of maize presumably occurred. We will include an extensive analysis of available paleoenvironmental data and discuss it in light of our current results regarding the effects of HM stress. We are also expanding the physical range of our statistical analysis to cover at least 60 Kb per locus, including neighboring genes for all three loci, to determine if our results could be due to narrow locus selection.

      Despite the interest in examining HM stress in maize and the presence of a pleiotropic phenotype, the assessment of the impact of gene expression is limited. The authors rely on qPCR for two ZmHMA genes and the locus tb1, known to be associated with maize architecture. A transcriptomic analysis would be necessary to 1- strengthen the proposed connection and 2- identify other genes with linked QTLs, such as those in the short arm of chromosome 5.

      Although real-time qPCR is an accurate and reliable approach to assess the expression of specific genes such as ZmHMA1 and Tb1, we will explore the possibility of complementing our analysis with available RNA-seq results that are pertinent to this study (see for example Lin et al., 2022 and Zhang et al., 2024) and further explore causative effects between HM stress, Tb1 and ZmHMA1 expression. As also pointed out by Reviewer#1, TEs are known to influence gene expression under abiotic stress, and RNA-Seq analysis would allow us to determine if TE activity could lead to similar outcomes.

      Reviewer #2:

      The authors explored Cu/Cd stress but not a more comprehensive panel of heavy metals, making the implications of this study quite narrow. Some techniques used, such as end-point RT-PCR and qPCR, are substandard for the field. The phenotypic changes explored are not clearly connected with the potential genetic mechanisms associated with them, with the exception of nodal roots. If teosintes in response to heavy metal have phenotypic similarity with modern landraces of maize, then heavy metal stress might have been a confounding factor in the selection of maize and not a potential driving factor. Similar to the positive selection of ZmHMA1 and its phenotypic traits. In that sense, there is no clear hypothesis of what the authors are looking for in this study, and it is hard to make conclusions based on the provided results to understand its importance. The authors do not provide any clear data on the potential influence of heavy metals in the field during the domestication of maize. The potential role of Tb-1 is not very clear either.

      Thank you for these comments. We will clearly emphasize our hypothesis that HM stress was an important factor driving the emergence of maize from teosinte parviglumis through the action of HM response genes. A comprehensive panel of heavy metals would not be more accurate in terms of simulating the composition of volcanic soils evolving across 9,000 years in the region where maize presumably emerged. Copper (Cu) and cadmium (Cd) each correspond to a different affinity group for proteins of the ZmHMA family. ZmHMA1 has preferential affinity for Cu and Ag (silver), whereas ZmHMA7 has preferential affinity for Cd, Zn (zinc), Co (cobalt), and Pb (lead). Since these P1b-ATPase transporters mediate the movement of divalent cations, their function remains consistent regardless of the specific metal tested, provided it belongs to the respective affinity group. By applying sublethal concentrations of Cd (16 mg/kg) and Cu (400 mg/kg), we caused a measurable physiological response while allowing plants to complete their life cycle, including the reproductive phase, facilitating a comprehensive analysis of metal stress adaptation.

      Although real-time qPCR is an accurate and reliable approach to assess gene expression, we agree that RNA-Seq results would improve the scope of the analysis and better assess the role of Tb1 in relation to HM response (see comments for Reviewer#1). There are two phenotypic changes clearly connected with the genetic mechanisms involved in the parviglumis to maize transition: plant height and the number of seminal roots (not nodal roots). We will emphasize these phenotypic changes in a modified version of the manuscript. It is possible that HM stress represents a confounding factor in the selection of maize and not a driving factor; however, if such is the case, we think it is rather unlikely that the real driving factor could have acted through mechanisms not related to abiotic stress or HM response. To address the possibility that HM stress was a confounding factor, we will extensively analyze genetic diversity and gene expression in all loci containing genes mapping in close proximity to peak LOD scores of all 1.5-LOD support intervals located in chromosome 5 and showing pleiotropic effects on domestication traits (Lemmon and Doebley, 2014). These will also include those mapping in close proximity to ZmHMA1. The potential influence of heavy metals in the field is being investigated through the analysis of paleoenvironmental data (see response to Reviewer#1); we will include our results in a modified version of the manuscript.

      We thank both reviewers for their detailed review of the manuscript and their pertinent recommendations to improve its presentation and readability.

      References:

      Hufford, Matthew B., Xun Xu, Joost Van Heerwaarden, Tanja Pyhäjärvi, Jer-Ming Chia, Reed A. Cartwright, Robert J. Elshire, et al. 2012. Comparative population genomics of maize domestication and improvement. Nature Genetics 44(7): 808-11.

      Lemmon Zachary H., Doebley John F. 2014. Genetic dissection of a genomic region with pleiotropic effects on domestication traits in maize reveals multiple linked QTL. Genetics 198(1): 345-353.

      Lin Kaina, Zeng Meng, Williams Darron V., Hu Weimin, Shabala Sergey, Zhou Meixue, Cao Fangbin, et al. 2022. Integration of transcriptome and metabolome analyses reveals the mechanistic basis for cadmium accumulation in maize. iScience 25(12): 105484.

      Vielle-Calzada JP, De La Vega OM, Hernández-Guzmán G, Ibarra-LacLette E, Alvarez-Mejía C, Vega-Arreguín JC, Jiménez-Moraila B, Fernández-Cortés A, Corona-Armenta G, Herrera-Estrella L, Herrera-Estrella A. 2009. The Palomero genome suggests metal effects on domestication. Science 326: 1078.

      Zhang Mengyan, Zhao Lin, Yun Zhenyu, Wu Xi, Wu Qi, et al. 2024. Comparative transcriptome analysis of maize (Zea mays L.) seedlings in response to copper stress. Open Life Sciences 19(1): 20220953.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors build on their previous study that showed the midgut microbiome does not oscillate in Drosophila. Here, they focus on metabolites and find that these rhythms are in fact microbiome-dependent. Tests of time-restricted feeding, a clock gene mutant, and diet reveal additional regulatory roles for factors that dictate the timing and rhythmicity of metabolites. The study is well-written and straightforward, adding to a growing body of literature that shows the time of food consumption affects microbial metabolism which in turn could affect the host.

      We thank the reviewer for the positive comments.

      Some additional questions and considerations remain:

      (1) The main finding that the microbiome promotes metabolite rhythms is very interesting. Which microbiota are likely to be responsible for these effects? The authors' previous work in this area may shed light on this question. Are specific microbiota linked to some of the metabolic pathways investigated in Figure 5?

      This is a good question. Although the Drosophila microbiome shows limited diversity, comprised largely of two major families (Acetobacteraceae and Lactobacillaceae), effects on the host could arise from just a subset of species within these families. However, identifying these would require inoculating microbiome-free flies with single and mixed combinations of species and conducting metabolomics to examine cycling of each of the three categories of metabolites we studied-- primary, lipids and biogenic amines (each of these may respond differently to different species). We believe this is beyond the scope of this manuscript, which is focused on how cycles in these different types of metabolites change in the context of the microbiome, the circadian clock and different diets.

      (2) TF increases the number of rhythmic metabolites in both microbiome-containing and abiotic flies in Figure 1. This is somewhat surprising given that flies typically eat during the daytime rather than at night, very similar to TF conditions. I would have assumed that in a clock-functioning animal, the effect of restricting food availability should not make a huge difference in the time of food consumption, and thus downstream impacts on metabolism and microbiome. Can the authors measure food intake directly to compare the ad-lib vs TF flies to see if there are changes in food intake? Would restricting feeding to other times of day shift the timing of metabolites accordingly?

      Previous studies have indicated that there is no significant difference in food consumption between ad-lib and TF flies (Gill et al., 2015; Villanueva et al., 2019). We also found that the presence of a microbiome does not alter total food consumption when compared with germ-free flies (Zhang et al., 2023, and current manuscript). Though flies primarily feed during the day, some food consumption occurs at night (i.e., the feeding rhythm is not tight), so restricting food to the daytime can increase metabolite cycling. Restricting feeding to other times of day is expected to shift metabolite cycling; we previously showed that it shifts transcript cycling (Xu et al., Cell Metabolism 2011).

      (3) In Figure 2, Per loss of function reveals a change in the phase of rhythmic metabolites. In addition, the effect of the microbiome on these is very different: the per mutants show increased numbers of rhythmic metabolites when the microbiome is absent, unlike the controls. Is it possible that these changes are due to altered daily feeding rhythms in per mutants? Testing the time and amount of food consumed by the per mutant flies would address this question. Would TF in the per mutants rescue their metabolite rhythms and make them resemble clock-functioning controls?

      We previously showed that per<sup>01</sup> flies lose feeding rhythms in DD and LD conditions, but consume a lot more food (Barber et al, 2021). Given that locomotor rhythms are maintained in per<sup>01</sup> in LD (Konopka and Benzer 1971), these rhythms or other rhythms driven by LD cues likely account for the maintenance of metabolite rhythms. And the increased food consumption may contribute to the changes observed. To address the reviewer’s question about the microbiome, we assayed feeding rhythms in per<sup>01</sup> in the absence/presence of a microbiome on the diets that haven’t been tested before (high sugar and high protein diet). Surprisingly, feeding was rhythmic on a high protein diet, regardless of whether a microbiome was present (new Figure S10). On a high sugar diet, feeding appears to be somewhat rhythmic in the presence of a microbiome (although not significant) and not when the microbiome is absent. The same is true in iso31 controls, and in all cases, the phase is the same. Despite the similar effect of the microbiome on feeding rhythms in wild type and per<sup>01</sup>, the effect on cycling is very different. Thus, feeding rhythms do not appear to explain the effects of the microbiome on metabolite cycling in per<sup>01</sup>.

      (4) The calorie content of each diet (normal vs. high-protein vs. high-sugar) is different. The possibility of a calorie effect rather than a difference in nutrition (protein/carbohydrate) should be discussed. Another issue worth considering is the effect of high protein/sugar on the microbiome itself. While the microbiome doesn't seem to affect rhythms in the high-protein diet, the high-sugar diet seems highly microbiome-dependent in Supplementary Fig 8C vs D. Does the diet impact the microbiome and thus metabolite rhythmicity downstream?

      Thank you for these good suggestions. We have added to the discussion the possibility that caloric content, rather than nutrition (protein/carbohydrate), affects metabolite cycling in flies fed normal vs. high-protein vs. high-sugar diets. We have also discussed the possibility that effects of different diets on metabolite cycling are mediated by changes in the microbiome. We cite papers that show effects of diet on microbiomes.

      (5) It would be good if a supplementary table was provided outlining the specific metabolites that are shown in the radial plots. It is not clear if the rhythms shown in the figures refer to the same metabolites peaking at the same time, or rather the overall abundance of completely different metabolites. This information would be useful for future research in this area.

      We have added Supplementary Tables 1-21, which include all the raw metabolite data.

      Reviewer #2 (Public Review):

      Summary:

      The paper addresses several factors that influence daily changes in concentration of metabolites in the Drosophila melanogaster gut. The authors describe metabolomes extracted from fly guts at four time-points during a 24-hour period, comparing profiles of primary metabolites, lipids, and biogenic amines. The study shows that the percentage of metabolites that exhibit a circadian cycle, the peak phases of their increased appearance, and the cycling amplitude depend on the combination of factors (microbiome status, composition or timing of the diet, circadian clock genotype). Multiple general conclusions based on the data obtained with modern metabolomics techniques are provided in each part of the article. Descriptive analysis of the data supports the finding that the microbiome increases the number of metabolites whose concentration oscillates over the course of the day. Results of the experiments show that timed feeding significantly enhanced metabolite cycling and changed its phase regardless of the presence of a microbiome. The authors suggest that the host circadian rhythm modifies both the metabolite cycling period and the number of such metabolites.

      Strengths:

      The obvious strength of the study is the data on circadian cycling of the detected 843, 4510, and 4330 total primary metabolites, lipids, and biogenic amines respectively in iso31 flies and 623, 2245, and 2791 respective metabolites in per<sup>01</sup> mutants. The comparison of the abundance of these metabolites, their cycling phase, and the ratio of cycling/non-cycling metabolites is well described and illustrated. The conditions tested represent significant experimental interest for contemporary chronobiology: with/without microbiota, wild-type/mutant period gene, ad libitum/time-restricted feeding, and high-sugar/high-protein diet. The authors conclude that the complex interaction between these factors exists, and some metabolic implications of combinations of these factors can be perceived as reminiscent of metabolic implications of another combination ("...the microbiome and time-restricted feeding paradigms can compensate for each other, suggesting that different strategies can be leveraged to serve organismal health"). Enrichment analysis of cycling metabolites leads to an interesting suggestion that oscillation of metabolites related to amino acids is promoted by the absence of microbiota, alteration of circadian clock, and time-restricted feeding. In contrast, association with microbiota induces oscillation of alpha-linolenic acid-related metabolites. These results provide the initial step for hypothesising about functional explanations of the uncovered observations.

      We thank the reviewer for summarizing the contributions made by this manuscript.

      Weaknesses:

      Among the weaknesses of the study, one might point out too generalist interpretations of the results, which propose hypothetical conclusions without their mechanistic proof. The quantitative indices analysed are obviously of particular interest, however are not self-explaining and exhaustive. More specific biological examples would add valuable insights into the results of this study, making conclusions clearer. More specific comments on the weaknesses are listed below:

      (1) The criterion of the percentage of cycling metabolites used for comparisons has its own limitations. It is not clear, whether the cycling metabolites are the same in the guts with/without microbiota, or whether there are totally different groups of metabolites that cycle in each condition. GO enrichment analysis gives only a partial assessment, but is still not quantitative enough.

      Microbiome-containing flies and germ-free flies do share some cycling metabolites. Figure 6 provides GO analysis for the pathways enriched in each condition, and Figure S6 shows quantitative data on the number that overlap between different conditions. We have also expanded discussion of specific cycling groups (e.g. amino acid metabolism) to indicate that different metabolites of the same pathway may cycle under different conditions. In addition, we have added detailed information for all cycling metabolites in Supplemental Tables 1-21.

      (2) The period of cycling data is based on only 4 time points during 24 hours in 4 replicates (>200 guts per replicate) on the fifth day of the experiment (10-12-day-old adults). It does not convincingly prove that these metabolites cycle the following days or more finely within the day. Moreover, it is not clear how peaks in polar histogram plots were detected in between the timepoints of ZT0, ZT6, ZT12, and ZT18.

      We acknowledge these limitations, but note that these experiments are very challenging because of the amount of tissue/guts needed for each data point and the time it takes to dissect each gut. Thus, getting more closely spaced time points is difficult. And we believe the detection of daily peaks with four biological replicates provides good evidence for cycling. The peaks in polar histogram plots are based on the parameter of JTK_adjphase when conducting JTK cycle analysis; as the data are averaged across replicates, the average can sometimes fall in between two assayed time points. Details can be found in the attached Supplementary Tables.
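
      To illustrate why an averaged peak time can fall between two sampled time points, here is a minimal sketch that takes the circular mean of per-replicate peak times on a 24-hour clock. This is a generic illustration and not the algorithm implemented in JTK_CYCLE; the function name and the example peak times are hypothetical.

```python
import math


def circular_mean_phase(phases_h, period_h=24.0):
    """Circular mean of peak times (in hours) on a clock of length period_h."""
    angles = [2 * math.pi * p / period_h for p in phases_h]
    x = sum(math.cos(a) for a in angles) / len(angles)
    y = sum(math.sin(a) for a in angles) / len(angles)
    # atan2 gives an angle in (-pi, pi]; wrap it into [0, 2*pi).
    mean_angle = math.atan2(y, x) % (2 * math.pi)
    return mean_angle * period_h / (2 * math.pi)


# Hypothetical replicate peaks, each lying on a sampled time point (ZT6 or ZT12).
replicate_peaks = [6, 6, 12, 12]
mean_peak = circular_mean_phase(replicate_peaks)
print(round(mean_peak, 2))
```

      With two replicates peaking at ZT6 and two at ZT12, the averaged estimate lies midway between the sampled times, even though no sample was collected there.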

      (3) Average expression and amplitude are analysed for groups of many metabolites, whereas the data on distinct metabolites is hidden behind these general comparisons. This kind of loss of information can be misleading, making interpretation of the mentioned parameters quite complicated for authors and their readers. Probably more particular datasets can be extracted to be discussed more thoroughly, rather than those general descriptions.

      We analyzed groups of metabolites, dividing them into primary metabolites, lipids and biogenic amines, to extract general take-home messages and also to simplify the presentation. Figure 6 demonstrates specific pathways whose cycling is affected in each condition assayed, and Figure S11 shows examples of cycling metabolites under different conditions. To highlight a dataset that is altered under different conditions, we expanded our discussion of amino acid metabolism, and show how the specific metabolites that cycle in this pathway may vary from one condition to another (Figure S11). For more quantitative data on individual metabolites, we now provide supplementary tables that display all the cycling metabolites. These include those uniquely cycling in one group, those shared between the two groups, and those uniquely cycling in the other group.

      (4) The metabolites' preservation is crucial for this type of analysis, and both proper sampling plus normalisation require more attention. More details about measures taken to avoid different degradation rates, different sizes of intestines, and different amounts of microbes inside them will be beneficial for data interpretation.

      We were careful to control for gut size and to preserve the samples so as to minimize degradation. We placed all the fly samples on ice during collection, and the entire dissection process was also conducted on ice. Once the gut sample collection was completed, we immediately transferred the samples to dry ice; after all the samples had been collected, we stored them at -80°C. In general, gut sizes varied in the following order: females fed high-protein diets > females fed normal chow diets > females fed high-sugar diets. As the metabolomics facility suggested 10 mg samples for each biological repeat, we dissected at least 180 female guts from flies fed high-protein diets, 200 female guts from flies fed normal chow diets, and at least 250 female guts from flies fed high-sugar diets. Also, as gut sizes were smaller in sterile flies, relative to microbiome-containing flies, on a high protein diet, we collected 200 guts from sterile flies under these conditions. Finally, the service that conducted the metabolomics (UC Davis) provided three detailed files to describe the extraction process for primary metabolites, lipids, and biogenic amines, respectively. We have submitted these files as supplemental materials in the revised manuscript.
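
      As a rough arithmetic check on these numbers, the sketch below computes the per-gut mass implied if the pooled guts in each replicate sum to the 10 mg target. This is an assumption made for illustration only: the gut counts are stated as minimums for two of the diets, so the implied masses are approximate upper bounds and not measured values.

```python
TARGET_MG = 10.0  # sample mass suggested per biological repeat

# Gut counts per replicate as described in the response above.
guts_per_replicate = {
    "high-protein": 180,
    "normal chow": 200,
    "high-sugar": 250,
}

# Implied mass per gut, assuming the pool sums to exactly TARGET_MG.
implied_mg_per_gut = {
    diet: TARGET_MG / n for diet, n in guts_per_replicate.items()
}

for diet, mg in implied_mg_per_gut.items():
    print(f"{diet}: ~{mg:.3f} mg per gut")
```

      The implied ordering (high-protein, then normal chow, then high-sugar) agrees with the gut-size ordering described above.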

      (5) The data in the article describes formal phenomena, not directly connected with organism physiology. The parameters discussed obviously depend on the behavior of flies. Food consumption, sleep, and locomotor activity could be additionally taken into account.

      These are very interesting suggestions. Previous results indicated that microbiome-containing flies do not change their total food consumption or exhibit changes in feeding rhythms when compared with germ-free flies (Zhang et al., 2023), which indicates that microbiome-mediated metabolite cycling is independent of feeding rhythms. As noted above, we examined the contribution of feeding to metabolite cycling in per<sup>01</sup> flies, and did not see an obvious link. We also assayed feeding rhythms and overall food consumption in wild type under AS and AM conditions and on different diets, and likewise could not account for changes in metabolite cycling based on altered food intake (new Figure S10). We acknowledge that behavior, including locomotor activity and sleep, could indeed influence metabolite cycling. We have added discussion of this.

      (6) Division of metabolites into three classes limits functional discussion of found differences. Since the enrichment analysis provided insights into groups of metabolites of particular interest (for example, amino acid metabolism), a comparison of their cycling characteristics can be shown separately and discussed.

      The intent of this work was to provide an overall account of changes in metabolite cycling that occur under different conditions/diets/genotypes. We have expanded the discussion of amino acid metabolism and show how different metabolites of this pathway cycle under different conditions (Figure S11). We believe that discussion/analysis of other specific groups would be good follow-up studies, which can build upon this work. Detailed datasets about all cycling metabolites are provided in Tables S1-21.

      Reviewer #3 (Public Review):

      Summary:

      The authors sought to quantify the influence of the gut microbiome on metabolite cycling in a Drosophila model with extensive metabolomic profiling over a 24-hour period. The major strength of the work is the production of a large dataset of metabolites that can be the basis for hypothesis generation for more specific experiments. There are several weaknesses that make the conclusions difficult to evaluate. Additional experiments to quantify food intake over time will be required to determine the direct role of the microbiome in metabolite cycling.

      Strengths:

      An extensive metabolomic dataset was collected with treatments designed to determine the influence of the gut microbiome on metabolite circadian cycling.

      Weaknesses:

      (1) The major strength of this study is the extensive metabolomic data, but as far as I can tell, the raw data is not made publicly available to the community. The presentation of highly processed data in the figures further underscores the need to provide the raw data (see comment 3).

      The raw data have been submitted to the public metabolite database MetaboLights (https://www.ebi.ac.uk/metabolights/; ID: MTBLS8819).

      In addition, the normalized metabolite data have been added in the supplemental materials.

      (2) Feeding times heavily influence the metabolome. The authors use timed feeding to constrain when flies can eat, but there is no measurement of how much they ate and when. That needs to be addressed.

      Since food is the major source of metabolites, the timing of feeding needs to be measured for each of the treatment groups. In the previous paper (Zhang et al 2023 PNAS), the feeding activity of groups of 4 male flies was measured for the wildtype flies. That is not sufficient to determine to what extent feeding controls the metabolic profile of the flies. Additionally, timed feeding opportunities do not equate to the precise time of feeding. They may also result in dietary restriction, leading to the loss of stress resistance in the TF flies. The authors need to measure food consumption over time in the exact conditions under which transcriptomic and metabolomic cycling are measured. I suggest using the EX-Q assay as it is much less effort than the CAFE assay and can be more easily adapted to the rearing conditions of the experiments.

      As noted above, we have now added considerable additional data on feeding and feeding rhythms in microbiome-containing and sterile wild type and per<sup>01</sup> flies on different diets (Figure S10). Our previous study, using the EX-Q assay method, found no differences in either total food consumption or feeding rhythms between microbiome-containing flies and germ-free flies (Zhang et al., 2023). Also, previous work has demonstrated that there is no significant difference in food consumption between ad-lib and TF flies (Villanueva et al., 2019).

      (3) The data on the cycling of metabolites is presented in a heavily analyzed form, making it difficult to evaluate the validity of the findings, particularly when a lack of cycling is detected. The normalization to calculate the change in cycling due to particular treatments is particularly unclear and makes me question whether it is affecting the conclusions. More presentation of the raw data to show when cycling is occurring versus not would help address this concern, as would a more thorough explanation of how the normalization is calculated - the brief description in the methods is not sufficient.

      For instance, the authors state that "timed feeding had less effect on flies containing a microbiome relative to germ-free flies." One alternative interpretation of that result is that both treatments are cycling but that the normalization of one treatment to the other removes the apparent effect. This concern should be straightforward to address by showing the raw data for individual metabolites rather than the group.

      We have added Supplementary Tables 1-21, which include detailed information on metabolite identity and data processing, as well as the raw data for all the cycling metabolites.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The abstract could be rewritten to clarify. I found the last part of the introduction better but struggled to understand the abstract.

      We apologize for this. The abstract was indeed quite dense; we have revised it for clarity.

      (2) Supplementary Figure 8 could be moved to the main text. Since all the comparisons are on one page it is much easier to see the similarities and differences in the conditions tested.

      We have moved Supplementary Figure 8 to main Figure 5.

      (3) The sex and age of the flies used in all experiments should be clarified. The authors mention female guts are collected in the methods (line 111) but it is not clear if this is throughout.

      All guts used in this study were female. We have clarified this in the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Some minor notes that might be improved:

      (1) The order of obtaining eggs without microbiota might be different (first - bleaching, second - sterilisation with ethanol). Otherwise, it is not clear why dechorionating is needed after sterilisation.

      Protocols for generating axenic flies vary. We used the method Feltzin et al reported in 2019: “For newborn fly embryos (<12 hours). First, cleanse and sterilize any leftover agar from collection plates using 100% ethanol, second, dechorionate the fly embryos with 10% bleach, and then immediately rinse three times in germ-free PBS”.

      (2) References for the resources used might be provided (MetaboAnalyst5.0, JTK_CYCLEv3.1).

      We have added the reference for MetaboAnalyst5.0 and JTK_CYCLEv3.1 (Pang et al., 2022).

      (3) References or justification for the chosen composition of the diets might be useful (standard diet, high-protein diet, high-sugar diet).

      We have added the references (Bedont et al, 2021, Morris et al, 2021).

      (4) Justification of the choice of iso31 line and per<sup>01</sup> mutant might be important.

      iso31 is the standard wild type line we use in the laboratory. To understand the role of the endogenous clock in microbiome-mediated metabolite cycling, we chose the classical clock mutant per<sup>01</sup>, as it displays fewer of the non-circadian phenotypes seen in other clock mutants. For instance, loss of transcriptional activators of the clock produces additional effects (e.g. hyperactivity), likely because of its effect on the overall expression of many genes. We have added this explanation to the manuscript.

      (5) Abbreviation decoding might be introduced when it is used for the first time in the text (line 240 - TM, TS).

      We apologize for this omission and have rectified it. Thanks.

      TM (timed feeding microbiome-containing flies)

      TS (timed feeding germ-free flies)

      (6) The term "germ-free" is recommended to be avoided in the context of the paper (germ-free = infertile for animals). It might be replaced with the terms "without microbiota" or "germ-free" for example.

      Given that the reviewer recommends use of the word “germ-free” in the second sentence, we assume that the first sentence was intended to say we should avoid “sterile” (and not “germ-free”). We have edited to “germ-free” in the manuscript.

      (7) When only one diet is assumed, it might be better to say so (line 324 - "the protein diet" instead of "protein diets").

      Sorry, we have edited accordingly.

      (8) Too many speculative conclusions are confusing (line 476 - what does it mean for "just as” - how exactly similar; line 477 - what kind of "compensation"; line 503 - how exactly it is related to "metabolic homeostasis" and to which kind of homeostasis).

      “just as” was not intended to refer to any degree of similarity but only to the fact that amino acid cycling occurs in the absence of a clock, as it does in the absence of a microbiome. We speculate that this “compensates” for something that is normally conferred by the clock and the microbiome; for instance, the clock may drive cycling of a microbiome component that is important for protein metabolism. In the absence of either the clock or the microbiome, this is compensated for by amino acid cycling. We have clarified this in the text.

      We used the term "metabolic homeostasis" to reflect steady maintenance of metabolic health via interaction and modulation of different factors. As in the example given above for amino acid metabolism, a perturbation of one process might produce a change in another to optimize metabolism. We have changed the wording in the text to better convey our message (lines 576-579).

      (9) Particular examples of metabolites might be beneficial for supporting conclusions (a figure which shows, for instance, the specific data on linolenic acid: in which conditions it cycles, in which not, what is the period of cycling, what are the exact expression and JTK_amplitude values).

      All cycling metabolites, including linolenic acid, are now included in the supplemental tables.

      Reviewer #3 (Recommendations For The Authors):

      (1) The level of biological replication is unclear for the metabolomic experiments. I see that 200 guts per sample were collected and 4 repeat samples were made at each timepoint. Are these 4 biological replicates for each treatment (AS, AM, TS, TM) at each timepoint? 5 replicates are standard in metabolomics. Please be more explicit in the methods.

      There are 4 biological replicates for each time point of each of the 4 treatments. The metabolomics service recommended 4-6 replicates, so we prepared 4 replicates for each sample. As noted above, these preparations are quite difficult. We found that in general the biological replicates do not differ significantly from each other.

      (2) Wolbachia can have a significant influence on fly physiology. How was this variable addressed? Were flies checked for Wolbachia?

      All the flies are Wolbachia-free, as in our previous study (Zhang et al., 2023). Initially, we treated the flies with 1 mM kanamycin (11815024, ThermoFisher) to remove bacteria. Afterwards, we repopulated the flies with a Wolbachia-free microbiome containing Lactobacillus and Acetobacter bacteria from a medium previously occupied by other flies.

      (3) In Results section 1, the authors report changes in the percentages of metabolites that are cycling, but no statistical test is presented to show that these changes are indeed significant. The authors need to report statistics on the percentages of cycling metabolites.

      We used statistical tests, specifically JTK cycle, to determine cycling of each metabolite. The P value for cycling of each metabolite in this test is computed on the basis of all the biological replicates and all time points. Metabolites that showed a significant P value contribute to the percent cycling. As a result, there is only one value for the percentage cycling in each condition. Thus, statistical analysis cannot be done.
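      As an illustration only (this is not the analysis code used in the study), the calculation described above can be sketched as follows. The metabolite names, the P values, and the 0.05 threshold are all assumptions made for the example; the rhythmicity test itself (JTK cycle) is not reimplemented here and is assumed to have already produced one P value per metabolite.

```python
from typing import Mapping


def percent_cycling(p_values: Mapping[str, float], alpha: float = 0.05) -> float:
    """Return the percentage of metabolites whose cycling P value is below alpha.

    `p_values` maps a metabolite name to the P value reported by a rhythmicity
    test run on all replicates and time points for one condition.
    """
    if not p_values:
        raise ValueError("no metabolites supplied")
    n_cycling = sum(1 for p in p_values.values() if p < alpha)
    return 100.0 * n_cycling / len(p_values)


# Hypothetical P values for a single condition (illustrative numbers only).
example = {
    "metabolite_a": 0.001,
    "metabolite_b": 0.20,
    "metabolite_c": 0.03,
    "metabolite_d": 0.70,
}
print(percent_cycling(example))
```

      Because every replicate and time point is consumed by the per-metabolite test, the function returns a single number per condition, which is the point made in the response above.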

      (4) The authors report that the species proportions in the gut microbiome don't cycle, but do absolute CFU counts? By many accounts (see e.g. Blum et al 2013 mBio), the gut microbiome in lab flies is what they recently ate from the food. The abundance of bacteria in the gut would then be directly tied to the timing of feeding. Timed feeding should produce oscillations in individual flies, so individual flies should be analyzed.

      We assume the reviewer is suggesting that rhythmic feeding could result in rhythmic abundance of the microbiome, which could contribute to cycling. This is indeed a possibility and one we now discuss in the manuscript. Thanks! Analysis of the gut microbiome in individual flies would require quantitation of CFUs from single guts. We do not believe a single gut would yield enough material.

      (5) Line 252: the ZT9 peak could just be due to feeding and digestion.

      This is possible. We now acknowledge this in the manuscript.

      (6) What is the expectation for metabolite cycling in per mutant flies? Shouldn't per mutant flies have no cycling on average? Does the cycling suggest there is an external factor causing cycling?

      Under light-dark conditions, metabolite cycling in per mutant flies may be driven by light:dark cues, either directly or through other light-driven rhythms; e.g., locomotor activity is rhythmic in per<sup>01</sup> flies maintained in LD.

      (7) Performing food intake analysis on each of the treatments would provide critical data to address the direct role of the microbiome in metabolite cycling.

      As noted above, we now provide considerable additional data on food intake at different times of day in microbiome-containing and germ-free wild type and per<sup>01</sup> flies on different diets (Figure S11). Overall, our data indicate that food intake or feeding rhythms do not account for the effects we report here.

      (8) Please be more explicit about replication in the methods and figure legends.

      We have added n=4 for each condition in the methods and figure legends.

      (9) There are numerous minor grammatical errors such as incorrect verb tenses and usage of articles. Additional proofreading could correct these.

      Sorry! We have done a thorough proofreading and made corrections.

    1. Author response:

      Reviewer #1 (Public Review):

      Insects, such as bees, are surprisingly good at recognizing visual patterns. How they achieve this challenging task with limited computational resources is not fully understood. Based on the actual bee's behaviour and visual circuit structure, MaBouDi et al. constructed a biologically plausible model where the circuit extracts essential visual features from scanned natural scenes. The model successfully discriminated a varied set of visual patterns as the actual bee does. By implementing a type of Hebb's rule for non-associative learning, an early layer of the model extracted orientational information from natural scenes essential to pattern recognition. Throughout the paper, the authors provided intuitive logic for how the relatively simple circuit could achieve pattern recognition. This work could draw broad attention not only in visual neuroscience but also in computer vision.

      We appreciate your positive feedback.

      However, there are a number of weaknesses in the manuscript. 1) The authors claim that the model is inspired by micromorphology, yet it does not rigorously follow the detailed anatomy of the insect brain revealed as of now. 2) Some claims sound a bit too strong compared to what the authors demonstrated with the model. For example, when the authors say the model is minimal, the authors simply investigated how many lobula neurons are required for pattern discrimination in the model. However, the manuscript appears to use this to claim that the presented model is the minimal one required for visual tasks. 3) It lacks explanations of what mechanisms in the model could discriminate some patterns but not others, making the descriptions very qualitative. 4) The authors did not provide compelling evidence that the algorithm is particularly tuned to natural scenes.

      We appreciate the reviewer's constructive feedback and have revised the manuscript to clarify and strengthen our claims. Below, we address each of the concerns raised:

      (1) The model does not rigorously follow the detailed anatomy of the insect brain

      We acknowledge that our model is an abstraction rather than a direct reproduction of the full micromorphology of the insect brain. The goal of our study was not to replicate every anatomical feature but rather to extract the core computational principles underlying active vision, based on the functional activity of the insect brain. Although recent connectome studies provide detailed structural maps, they do not fully capture the functional dynamics of sensory processing and behavioural outcomes. Our model integrates key neurobiological insights, including the hierarchical structure of the optic lobes, lateral inhibition in the lobula, and non-associative learning mechanisms shaping spatiotemporal receptive fields.

      However, to address this concern, we have revised the introduction and discussion to explicitly acknowledge the model’s level of abstraction and its relationship to the known anatomy of the insect visual system. Furthermore, we highlight future directions in which connectomic data could refine our model.

      (2) Strength of claims regarding minimality of the model

      We appreciate the reviewer’s concern regarding the definition of a "minimal" model. Our intention was not to claim that this model represents the absolute minimal neural architecture for visual learning tasks but rather that it identifies a minimal set of necessary computational elements that enable pattern discrimination in insects. To clarify this, we have refined the text to ensure that our conclusions about minimality are explicitly tied to the specific constraints and assumptions of our model. For instance, in the revised manuscript, we emphasise that our findings demonstrate how the number of lobula neurons, the inhibitory lateral connections, and the non-associative learning model affect neural representation and discrimination performance, rather than establishing an absolute lower bound on the complexity required for visual processing in insects.

      (3) Mechanistic explanations for pattern discrimination

      Thank you for highlighting this point. We have conducted a more detailed analysis of the model’s response to different patterns and expanded our discussion of the underlying mechanisms. To address this, we have refined our explanation of how different scanning strategies and temporal integration mechanisms contribute to neural selectivity in the lobula and overall discrimination performance. Specifically:

      - Figure 3 illustrates how the model benefits from generating sparse coding in the visual network, leading to improved performance in pattern recognition tasks.

      - Figure 5 now includes a more detailed explanation of how different scanning strategies influence the selectivity and separability of lobula neuron responses. Additionally, we provide further analysis of why the model successfully discriminates certain patterns (e.g., simple oriented bars) but struggles with more complex spatially structured quadrant-based patterns.

      - We elaborate on how sequential sampling, temporal coding, and lateral inhibition collectively shape neural representations, enabling the model to distinguish between visual stimuli effectively.

      These refinements provide a clearer mechanistic explanation of the model’s strengths and limitations, ensuring a more comprehensive understanding of its function.
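      To make the role of lateral inhibition concrete, the following is a minimal sketch, assuming a one-step subtractive inhibition followed by rectification. It is not the network implemented in the manuscript: the function name, the inhibition strength, and the example drive values are hypothetical and chosen only to show how inhibition from neighbouring neurons can silence weakly driven units and leave a sparser response.

```python
import numpy as np


def lateral_inhibition(drive, strength=0.5):
    """One-step subtractive lateral inhibition followed by rectification.

    Each neuron's feedforward drive is reduced by `strength` times the mean
    drive of all *other* neurons; negative results are clipped to zero.
    """
    drive = np.asarray(drive, dtype=float)
    if drive.size < 2:
        raise ValueError("need at least two neurons")
    mean_of_others = (drive.sum() - drive) / (drive.size - 1)
    return np.maximum(drive - strength * mean_of_others, 0.0)


# Hypothetical feedforward drive to four lobula-like units.
drive = np.array([1.0, 0.8, 0.2, 0.1])
inhibited = lateral_inhibition(drive)
print("active before:", int(np.count_nonzero(drive)))
print("active after: ", int(np.count_nonzero(inhibited)))
```

      In this toy case the two weakly driven units fall below zero after inhibition and are clipped, so fewer units remain active; a recurrent or spiking implementation would behave differently in detail.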

      (4) Evidence that the model is tuned to natural scenes

      We have revised the manuscript to provide stronger support for the claim that the model is particularly adapted to natural scenes. Specifically:

      - Figure 3 demonstrates that training on natural images leads to sparse, decorrelated responses in the lobula, a hallmark of efficient coding observed in biological systems.

      - Supplementary Figure 2-1B shows that training with shuffled images fails to produce structured receptive fields, reinforcing that the statistical structure of natural images is crucial for efficient learning.

      - We now explicitly discuss how the receptive fields emerging from non-associative learning align with known orientation-selective responses in insect visual neurons, supporting the idea that the model is optimised for processing natural visual inputs (Figures 3 and 6, and the Discussion section).
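      For readers unfamiliar with how "sparse" and "decorrelated" responses are commonly quantified, here is a hedged sketch using two standard measures: a Treves-Rolls-style population sparseness index and the mean absolute pairwise correlation between neurons. These are assumptions for illustration; the manuscript may use different metrics, and the function names and example arrays below are hypothetical.

```python
import numpy as np


def population_sparseness(rates):
    """Sparseness index in [0, 1]: 0 for uniform activity, 1 for a single active unit."""
    r = np.asarray(rates, dtype=float)
    if r.size < 2:
        raise ValueError("need at least two neurons")
    mean_sq = np.mean(r ** 2)
    if mean_sq == 0:
        raise ValueError("sparseness is undefined for an all-silent population")
    activity_ratio = np.mean(r) ** 2 / mean_sq
    return (1.0 - activity_ratio) / (1.0 - 1.0 / r.size)


def mean_abs_correlation(responses):
    """Mean absolute off-diagonal correlation; `responses` is (n_neurons, n_stimuli)."""
    corr = np.corrcoef(np.asarray(responses, dtype=float))
    off_diagonal = corr[~np.eye(corr.shape[0], dtype=bool)]
    return float(np.mean(np.abs(off_diagonal)))


# Illustrative inputs only: one-hot activity versus uniform activity,
# and two neurons with uncorrelated responses across three stimuli.
print(population_sparseness([1.0, 0.0, 0.0, 0.0]))
print(population_sparseness([1.0, 1.0, 1.0, 1.0]))
print(mean_abs_correlation([[1.0, 0.0, -1.0], [1.0, -2.0, 1.0]]))
```

      A higher sparseness index together with a lower mean absolute correlation would correspond to the sparse, decorrelated regime described in the points above.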

      Taken together, these revisions clarify how the model captures fundamental principles of insect vision without making overly strong claims about biological fidelity. We thank the reviewer for these insightful comments; addressing them has significantly strengthened the clarity and depth of our manuscript.

      Reviewer #2 (Public Review):

      This study is inspired by the scanning movements observed in bees when performing visual recognition tasks. It uses a multilayered network, representing stages of processing in the visual lobes (lamina, medulla, lobula), and uses the lobula output as input to a model of associative learning in the mushroom body (MB). The network is first trained with short "scanning" sequences of natural images, in a non-associative adaptation process, and then several experimental paradigms where images are rewarded or punished are simulated, with the output of the MB able to provide the appropriate discriminative decisions (in some but not all cases). The lobula receptive fields formed by the initial adaptation process show spatiotemporal tuning to edges moving at particular orientations and speeds that are comparable to recorded responses of such neurons in the insect brain.

      There are two main limitations to the study in my view. First, although described (caption fig 1) as a model "inspired by the micromorphology" of the insect brain, implying a significant degree of accuracy and detail, there are many arbitrary features (unsupported by current connectomics). For example, the strongly constrained delay line structure from medulla to lobula neurons, and the use of a single MBON that has input synapses that undergo facilitation and decay according to different neuromodulators. Second, while it is reasonable to explore some arbitrary architectural features, given that not everything is yet known about these pathways, the presented work does not sufficiently assess the necessity and sufficiency of the different components, given the repeated claims that this is the "minimal circuit" required for the visual tasks explored.

      We appreciate your feedback and have refined the manuscript to clarify model design choices and address concerns regarding minimality.

      (1) Model Architecture and Functional Simplifications

      While our model is inspired by the insect visual system, it is not intended as an exact anatomical reconstruction but rather a functional abstraction to uncover key computational principles of active vision and visual learning. The delay-line structure and simplified MBON implementation were deliberate choices to enable spatiotemporal encoding and associative learning without overcomplicating the model. As connectome data alone do not fully reveal functional relationships, our approach serves as a hypothesis-generating tool for future neurobiological studies.

      (2) Necessity and Sufficiency of Model Components

      We have removed overstatements about minimality and now clarify that our model represents a functional circuit rather than the absolute minimal configuration. Additionally, we conducted new control experiments assessing the influence of different model components, further justifying key mechanisms such as spatiotemporal encoding and lateral inhibition.

      For a more detailed discussion of these revisions and improvements, please refer to our response to the Journal, above.

      Regarding the mushroom body (MB) learning model, it is strange that no reference is made to recent models closely tied to connectomic and other data in fruit flies, which suggest that separate MBONs encode positive vs. negative value; that learning is not dependent on MBON activity (so is not STDP); that feedback from MBONs to dopaminergic signalling plays an important role, etc. Possibly the MB of the bee operates in a completely different way to the fly, but the presented model relies on relatively old data about MB function, mostly from insects other than bees (e.g. locust) so its relationship to the increasingly comprehensive understanding emerging for the fly MB needs to be clarified. It is implied that the complex interaction of the differential effects of dopamine and octopamine, as modelled here, is required to learn the more complex visual paradigms, but it is not actually tested if simpler rules might suffice. Also, given previous work on models of view recognition in the MB, inspired by bees and ants, it seems plausible that simply using static 25×25 medulla activity as input to produce sparse activity in the KCs would be sufficient for MBON output to discriminate the patterns used in training, including the face stimulus. Thus it is not clear whether the spatiotemporal input and the lobula encoding are necessary to solve these tasks.

      Thank you for your suggestion. The primary focus of this study was not to uncover the exact mechanisms of associative learning in the mushroom body (MB) but rather to evaluate the role of lobula output activity in active vision. The associative learning component was included as a simplified mechanism to assess how the spatiotemporal encoding in the lobula contributes to visual pattern learning.

      We conducted a detailed analysis of lobula neuron activity, focusing on sparsity, decorrelation, and selectivity to demonstrate how the visual system extracts compact yet relevant signals before reaching the learning centre (see Figure 5). Theoretical predictions based on these findings suggest that such encoding enhances pattern recognition performance. Selecting this possible associative learning mechanism allowed us to evaluate this capability explicitly; it also facilitated comparison with previous active vision experiments and assessment of the influence of different components on bee behaviour.

      We acknowledge that recent Drosophila connectomics studies suggest alternative MB architectures, including separate MBONs encoding positive vs. negative values, learning mechanisms independent of MBON activity, and feedback from MBONs to dopaminergic pathways. However, visual learning mechanisms in the MB remain poorly characterised, especially in bees, where the functional relevance of different MBON configurations is still unclear. The decision to simplify the MB learning process was intentional, allowing us to prioritise model interpretability over anatomical replication.

      These simplifications have been explicitly discussed in the revised manuscript, where we suggest future directions for integrating more biologically detailed MB models to enhance our understanding of active visual learning in insects. For a broader discussion of our rationale for prioritising computational simplifications over direct neurobiological replication, please refer to our response to the Journal, above.

      It is also difficult to interpret the range of results in fig 3. The network sometimes learns well, sometimes just adequately (perhaps comparable to bees), and sometimes fails. The presentation of these results does not seem to identify any coherent pattern underlying success or failure, other than that the ability to generalise seems limited. That is, recognition (in most cases) requires the presentation of exactly the same stimulus in exactly the same way (same scanning pattern, distance and speed). In particular, it is hard to know what to conclude when the network appears able to learn some "complex patterns" (spirals, faces) but fails to learn the apparently simple plus vs. multiplication symbol discrimination if it is trained and tested with a scan passing across the whole pattern instead of just the lower half.

      We acknowledge that the variability in the model’s performance across different tasks and conditions required a clearer explanation. In the revised manuscript, we have analysed the underlying factors influencing success and failure in greater detail and have expanded the discussion on the model’s generalisation limitations.

      To address this, we have conducted new control experiments and deeper analyses, now presented in Figures 5 and 6F, which illustrate how scanning conditions impact recognition performance. Specifically, we examine why the model can successfully learn complex patterns (e.g., spirals, faces) but struggles with apparently simpler tasks, such as distinguishing between a plus and multiplication symbol when scanning the entire pattern rather than just the lower half. Our results suggest that spatially constrained scanning enhances discriminability, while whole-pattern scanning reduces selectivity due to weaker and less sparse feature encoding in lobula neurons.

      We have also clarified in the Discussion section that while the model demonstrates robust pattern learning under specific conditions, its ability to generalise remains limited when tested with complex patterns (Figure 6F). Further investigation is needed to explore how adaptive scanning strategies or hierarchical processing might improve generalisation.

      In summary, although it is certainly interesting to explore how active vision (scanning a visual pattern) might affect the encoding of stimuli and the ability to learn to discriminate rewarding stimuli, some claims in the paper need to be tempered or better supported by the demonstration that alternative, equally plausible, models of the visual and mushroom body circuits are not sufficient to solve the given tasks.

      There is limited knowledge in the literature regarding the neural correlates of visual-related plasticity in the mushroom body (MB). The majority of our current understanding of the MB is derived from studies on olfactory learning, particularly in Drosophila, which does not provide sufficient data to directly implement or comprehensively compare alternative models for visual learning.

      However, the primary focus of our study is on active vision and how spatiotemporal signals are encoded in the insect visual system. Rather than aiming to replicate a detailed biological model of MB function, we intentionally employed a simplified associative learning network to investigate how neural activity emerging from our visual processing model can support pattern recognition. This approach also allows us to compare model performance with bee behaviour, drawing on insights from previous experimental work on active vision in bees.

      We now discuss the limitations of our approach and the rationale for selectively incorporating specific neural network components in lines 652-677. Additionally, we have provided further justification (see responses above) for prioritising a simplified model, rather than attempting to mimic a highly detailed, yet currently unverified, alternative learning circuit. These clarifications help ensure that our claims are appropriately tempered while still demonstrating the functional relevance of our model.

      Reviewer #3 (Public Review):

      In this manuscript, the authors use the data collected and observations made on bees' scanning behaviour during visual learning to design a bio-inspired artificial neural network. The network follows the architecture of bees' visual systems, where photoreceptors project into the lamina, then the medulla; medulla neurons connect to a set of spiking neurons in the lobula. Lobula neurons project to Kenyon cells and then to the MBON, which controls reward and punishment. The authors then test the performance of the network in comparison with real bee data, finding it to perform well in all tasks. The paper attempts to reproduce a living organism network with a practical application in mind, and it is quite impressive! I appreciate both the potential implications for the understanding of biological systems and the applications in the development of autonomous agents, making the paper absolutely worth reading.

      Thank you for your positive feedback and appreciation of our work.

      However, I believe that the current version somewhat lacks in clarity regarding the methodology and in some of the keywords used to describe the model.

      Definitions:

      Throughout the manuscript, the authors use some key terminology that I believe would benefit from some clarification.

      The generated model is described in the title and once in the introduction as "neuromorphic". The model is definitely bio-inspired, but at least in some layers of the neural network, the model is built very differently from actual brain connectivity. Generally, when we use the term neuromorphic we imply many advantages of neural tissue, like energy efficiency, that I am not sure the current model is achieving. I absolutely see how this work is going in that direction, and I also fundamentally agree with the choice of terminology, but this should be clearly explained to not risk over-implications.

      We appreciate the reviewer’s feedback and acknowledge the importance of clarifying key terminology in our manuscript. As outlined in our response to the Journal, we intentionally simplified the model to focus on understanding the core computational processes involved in active vision rather than precisely replicating the full complexity of insect neural circuits (see other reasons for simplification in the manuscript). This simplification allows us to systematically analyse the influence of specific components underlying active vision mechanisms.

      Despite these simplifications, our model incorporates key neuromorphic principles, including the use of a recurrent neural network architecture and a spiking neuron model at multiple processing levels. These elements enable biologically inspired information processing, aligning with the fundamental characteristics of neuromorphic computing, even if the model does not explicitly focus on hardware efficiency or energy constraints.

      The authors describe this as a model of "active vision". This is done in the title of the article, and in the many paragraph headings (methods, results). In the introduction, however, the term active vision is reserved to the description of bees' behavior. Indeed, the developed model is not a model of active vision, as this would require for the model to control the movement of the "camera". Here instead the stimuli display is given to the model in a fixed progression. What I suspect is that the authors' aim is to describe a model that supports the bees' active vision, not a model of active vision. I believe this should be very clear from the paper, and it may be appropriate to remove the term from the title.

      While our model does not actively control camera movement in the environment, it does simulate the effects of active vision by incorporating scanning dynamics. Our results demonstrate that model responses change significantly with variations in scanning speed and restricted scanning areas, highlighting the importance of movement in shaping visual encoding. However, we acknowledge that true active vision would involve adaptive, real-time control of gaze or trajectory, which is the next step beyond the current implementation toward a more realistic model of active vision. To address your concern, we have discussed the potential for incorporating dynamic flight behaviours in future studies, allowing the model to actively adjust its scanning strategy based on learned visual cues.

      The short title says that this network is minimal. This is then characterized in the introduction as the minimal network capable of enabling active vision in bees. The authors, however, in their experiment only vary the number of lobula neurons, without changing other parts of the architecture. Given this, we can only say that 16 lobula neurons is the minimal number required to solve the experimental task with the given model. I don't believe that this is generalizable to bees, nor that this network is minimal, as there may be different architectures (for the other layers especially) that require fewer neurons overall. Moreover, the tasks attempted in the minimal network experiment did not include any of the complex stimuli presented in figure 3, like faces. It may be that 16 lobula neurons are sufficient for the X vs + and clockwise vs counter-clockwise spirals, but we do not know if increasing stimuli complexity would result in a failure of the model with 16 neurons.

      We agree that analysing only the number of lobula neurons is not sufficient to establish a truly minimal model for active vision. To address this, we conducted further control experiments to evaluate the influence of other key components, including non-associative learning, scanning behaviour, and lateral connectivity, on model performance. Our results suggest that the proposed model represents a computationally minimal network capable of implementing a basic active vision process, but a more complex model would be required for higher-order visual tasks.

      However, to avoid potential misinterpretation, we have revised the short title and updated the manuscript to clarify that our model identifies a possible minimal functional circuit rather than the absolute minimal network for active vision. Additionally, we have added further discussion on the simplifications made in the model and emphasised the need for future studies to explore alternative architectures and assess their relevance for understanding active vision in insects.

      Methodology:

      The explanation of the model is currently a bit lacking in clarity and details. This risks impacting negatively on the relevance of the whole work which is interesting and worth reading! This issue affects also the interpretation of the results, as it is not clear to what extent each part of the network could affect the results shown. This is especially the case when the network under-performs with respect to the best performing scenario (e.g., when varying the speed and part of the pattern that is observed, such as in Fig 2C). Adding a detailed technical scheme/drawing specific to the network architecture could have been a way of significantly increasing the clarity of the Methods section and the interpretation of the results.

      On a similar note, the authors make some comparisons between the model and real bees. However, it remains unclear whether these similarities are actually indicative of an optimality in the bees' visual scanning strategy, or just deriving from the authors' design. This is for me particularly important in the experiments aimed at finding the best scanning procedure. If the initial model training, based on natural images, is performed by presenting left-to-right moving frames, the highest efficiency of lower-half scanning may be due to how the weights in the initial layers are structured and a low generalizability of the model, rather than to the strategy optimality.

      We appreciate the reviewer’s constructive feedback and have taken steps to enhance the clarity, interpretability, and transparency of our model description and results. Below, we address the concerns regarding model explanation, performance interpretation, and the comparison with real bee behaviour.

      (1) Improved Model Explanation and Network Clarity: We apologise that the previous version of the manuscript did not fully detail the architecture and functioning of the model. To address this, we have expanded the Methods section with a more detailed breakdown of the network components, their roles, and their contribution to active vision processing. Additionally, we have summarised the network architecture and its implementation for visual learning tasks at the beginning of the Results section, providing a clearer overview of the information flow from visual input to associative learning. Furthermore, we have explicitly analysed and discussed the role of key model components, including scanning strategies, lateral connectivity, and non-associative learning mechanisms, clarifying how each contributes to the observed results.

      (2) Interpretation of Model Performance Variability: Understanding the factors influencing performance variability is crucial, and to improve clarity, we have conducted further analysis of model performance across different conditions, particularly examining the effects of scanning speed, spatial constraints, and feature encoding (see Figure 2C). Additionally, we have expanded the discussion on how scanning conditions impact performance, providing explanations for why some conditions lead to higher or lower discrimination success. Furthermore, we have clarified why certain stimuli present greater challenges for the model, linking these difficulties to receptive field properties and scanning dynamics.

      (3) Comparison Between Model Behaviour and Real Bees: To address your concern regarding the link between scanning preferences and true biological optimality, we have included further analysis discussing the influence of training conditions on the model’s learned behaviours. Additionally, we propose future experiments to test alternative scanning strategies, including adaptive scanning mechanisms that adjust based on visual task demands. Furthermore, we have expanded the discussion on the simplifications made in this study, explicitly stating the limitations of the model and emphasising the need for future research to explore more flexible and biologically plausible scanning mechanisms.

      We believe these revisions significantly enhance the clarity and interpretability of the study, ensuring that the model’s findings are well contextualised within both computational and biological frameworks.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Specific comments:

      (1) It is difficult to appreciate that there is a "peripheral sub-membrane microtubule array" as it is not well defined in the manuscript. This reviewer assumes that the term is clear within the respective field. Yet, while it is appreciated that there is an increased amount of MTs close to the cytoplasmic membrane, the densities appear very variable along the membrane. Please provide a clear description in the Introduction of what is meant by "peripheral sub-membrane microtubule array".

      A definition has been added to the Introduction.

      (2) The authors described a "consistent presence of a significant peripheral array in the C57BL/6J control mice, while the KO counterparts exhibited a partial loss of this peripheral bundle. Specifically, the measured tubulin intensity at the cell periphery was significantly reduced in the KO mice compared to their wild-type counterparts". In vitro "control cells had convoluted nonradial MTs with a prominent sub-membrane array, typical for β cells (Fig. 2A), KIF5B-depleted cells featured extra-dense MTs in the cell center and sparse receding MTs at the periphery (Fig. 2B,C)". Please comment/discuss why in vivo there are no "extra-dense MTs in the cell center".

      We have now added a discussion of this point, which we believe could reflect the 3D shape of a β cell in tissue and/or compensatory mechanisms in organisms.

      (3) Authors should include in the Discussion a paragraph discussing the fact that small changes in MT configuration can have strong effects.

      A paragraph has been added to the Discussion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1: Even though the reviewer appreciates that minor changes of MT configuration have severe effects, still the overall effects appear minor (40 vs. <50% or 35% vs. around 28%). Notably, there are no statistically significant differences in the different groups in Fig. 1Suppl-Fig.1 D. This reviewer is not sure if the combination of many not significantly different data points can result in significant changes and this should be checked by a statistician. Authors should include in the Discussion a paragraph discussing the fact that small changes in MT configuration can have strong effects.

      We have now added the requested paragraph to the discussion. Indeed, the differences are small, and the significance is only detected in a data set with a large sample size in Fig. 1J,K (combined data sets with smaller sizes from Fig. 1-Suppl-Fig.1 D), consistent with the fact that a larger sample size generally provides more power to detect an effect.
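
      The general point that a pooled, larger data set has more power to detect a small effect can be illustrated with a short calculation. The following is a minimal sketch, not the analysis used in the manuscript: it assumes a two-sided, two-sample z-test with equal group sizes and unit variance, and the standardized effect size of 0.4 and the group sizes are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    Assumes equal group sizes and known unit variance in both groups.
    effect_size is the difference in means in units of the standard deviation.
    """
    std_normal = NormalDist()
    # Critical value for the chosen significance level.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical effect size; the group sizes are illustrative only.
for n in (10, 40, 160):
    print(f"n per group = {n:4d}  power = {two_sample_power(0.4, n):.2f}")
```

      Under these assumptions, power rises monotonically with group size for a fixed effect, which is the behaviour the response appeals to when small subsets are not individually significant but the combined data set is.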

      (2) Unfortunately, the authors cannot block kinesin-1 resulting in microtubule accumulation in the cell center and then release the block (best inhibiting microtubule formation), to show that the MTs accumulated in the cell center will be transported to the periphery.

      This is indeed the case at the moment, yes.

      Minor comments:

      - Abstract: β-cells vs. β cells (and throughout the manuscript)

      - Page 4: "MTOC, the Golgi, (Trogden et al. 2019), and"

      - Page 5: "β-cell specific"

      - MT-sliding vs. MT sliding

      - Kinesin 1 vs. kinesin-1

      - Page 6, line 1: "β cells. actively"

      - Page 7: "a microtubule probe", should be "MT"

      - Page 9: "1μm" vs. "1 μm"

      - Page 10: "demonstrate a dramatic effect" recommended is: "demonstrate a marked effect"

      - Page 13, line 1: dramatically vs. markedly

      - Page 13, line 5: "50μm" vs. "50 μm" (in general, there should be a space between number and unit?)

      - "37 degrees C" vs. "37{degree sign}C"

      - Animal protocol number?

      - "Mice were euthanized by isoflurane inhalation"? What concentration? How long? More details are needed (no cervical dislocation?).

      - Antibodies: more identifiers are needed.

      - Antibody information in Key reagents and under 5. Reagents and antibodies do not fit (1:500 and 1:1000).

      Thank you, we corrected all relevant information now.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Hussain and collaborators aims at deciphering the microtubule-dependent ribbon formation in zebrafish hair cells. By using confocal imaging, pharmacology tools, and zebrafish mutants, the group of Katie Kindt convincingly demonstrated that ribbon, the organelle that concentrates glutamate-filled vesicles at the hair cell synapse, originates from the fusion of precursors that move along the microtubule network. This study goes hand in hand with a complementary paper (Voorn et al.) showing similar results in mouse hair cells.

      Strengths:

      This study clearly tracked the dynamics of the microtubules and of the microtubule-associated ribbons, and demonstrated ribbon fusion events. In addition, the authors have identified the critical role of kinesin Kif1aa in the fusion events. The results are compelling and the images and movies are magnificent.

      Weaknesses:

      The lack of functional data regarding the role of Kif1aa. Although it is difficult to probe and interpret the behavior of zebrafish after nocodazole treatment, I wonder whether deletion of kif1aa in hair cells may result in a functional deficit that could be easily tested in zebrafish?

      We have examined functional deficits in kif1aa mutants in another paper that was recently accepted: David et al. 2024. https://pubmed.ncbi.nlm.nih.gov/39373584/

      In David et al., we found that in addition to a subtle role in ribbon fusion during development, Kif1aa plays a major role in enriching glutamate-filled synaptic vesicles at the presynaptic active zone of mature hair cells. In kif1aa mutants, synaptic vesicles are no longer enriched at the hair cell base, and there is a reduction in the number of synaptic vesicles associated with presynaptic ribbons. Further, we demonstrated that kif1aa mutants also have functional defects including reductions in spontaneous vesicle release (from hair cells) and evoked postsynaptic calcium responses. Behaviorally, kif1aa mutants exhibit impaired rheotaxis, indicating defects in the lateral-line system and an inability to accurately detect water flow. Because our current paper focuses on microtubule-associated ribbon movement and dynamics early in hair-cell development, we have only discussed the effects of Kif1aa directly related to ribbon dynamics during this time window. In our revision, we have referenced this recent work. Currently it is challenging to disentangle how the subtle defects in ribbon formation in kif1aa mutants contribute to the defects we observe in ribbon-synapse function.

      Added to results:

      “Recent work in our lab using this mutant has shown that Kif1aa is responsible for enriching glutamate-filled vesicles at the base of hair cells. In addition, this work demonstrated that loss of Kif1aa results in functional defects in mature hair cells including a reduction in evoked post-synaptic calcium responses (David et al., 2024). We hypothesized that Kif1aa may also be playing an earlier role in ribbon formation.”

      Impact:

      Synaptogenesis in the auditory sensory cell remains elusive. This study indicates that the formation of the synaptic organelle is a dynamic process involving the fusion of presynaptic elements. This study will undoubtedly boost a new line of research aimed at identifying the specific molecular determinants that target ribbon precursors to the synapse and govern the fusion process.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors set out to resolve a long-standing mystery in the field of sensory biology - how large, presynaptic bodies called "ribbon synapses" migrate to the basolateral end of hair cells. The ribbon synapse is found in sensory hair cells and photoreceptors, and is a critical structural feature of a readily-releasable pool of glutamate that excites postsynaptic afferent neurons. For decades, we have known these structures exist, but the mechanisms that control how ribbon synapses coalesce at the bottom of hair cells are not well understood. The authors addressed this question by leveraging the highly-tractable zebrafish lateral line neuromast, which exhibits a small number of visible hair cells, easily observed in time-lapse imaging. The approach combined genetics, pharmacological manipulations, high-resolution imaging, and careful quantifications. The manuscript commences with a developmental time course of ribbon synapse development, characterizing both immature and mature ribbon bodies (defined by position in the hair cell, apical vs. basal). Next, the authors show convincing (and frankly mesmerizing) imaging data of plus end-directed microtubule trafficking toward the basal end of the hair cells, and data highlighting the directed motion of ribbon bodies. The authors then use a series of pharmacological and genetic manipulations showing the role of microtubule stability and one particular kinesin (Kif1aa) in the transport and fusion of ribbon bodies, which is presumably a prerequisite for hair cell synaptic transmission. The data suggest that microtubules and their stability are necessary for normal numbers of mature ribbons and that Kif1aa is likely required for fusion events associated with ribbon maturation. Overall, the data provide a new and interesting story on ribbon synapse dynamics.

      Strengths:

      (1) The manuscript offers comprehensive Introduction and Discussion sections that will inform generalists and specialists.

      (2) The use of Airyscan imaging in living samples to view and measure microtubule and ribbon dynamics in vivo represents a strength. With rigorous quantification and thoughtful analyses, the authors generate datasets often only obtained in cultured cells or more diminutive animal models (e.g., C. elegans).

      (3) The number of biological replicates and the statistical analyses are strong. The combination of pharmacology and genetic manipulations also represents strong rigor.

      (4) One of the most important strengths is that the manuscript and data spur on other questions - namely, do (or how do) ribbon bodies attach to Kinesin proteins? Also, and as noted in the Discussion, do hair cell activity and subsequent intracellular calcium rises facilitate ribbon transport/fusion?

      These are important strengths and, as stated, we are currently investigating which other kinesins and adaptors transport ribbons. We also have ongoing work examining how hair-cell activity impacts ribbon fusion and transport!

      Weaknesses:

      (1) Neither the data nor the Discussion address a direct or indirect link between Kinesins and ribbon bodies. Showing Kif1aa protein in proximity to the ribbon bodies would add strength.

      This is a great point. Previous immunohistochemistry work in mice demonstrated that ribbons and Kif1a colocalize in mouse hair cells (Michanski et al, 2019). Unfortunately, the antibody used in that study did not work in zebrafish. To further investigate this interaction, we also attempted to create a transgenic line expressing a fluorescently tagged Kif1aa to directly visualize its association with ribbons in vivo. At present, we have been unable to detect transient expression of Kif1aa-GFP or establish a transgenic line using this approach. While we will continue to work towards understanding whether Kif1aa and ribbons colocalize in live hair cells, currently this goal is beyond the scope of this paper. In our revision we discuss this caveat.

      Added to discussion:

      “In addition, it will be useful to visualize these kinesins by fluorescently tagging them in live hair cells to observe whether they associate with ribbons.”

      (2) Neither the data nor the Discussion address the functional consequences of loss of Kif1aa or ribbon transport. Presumably, both manipulations would reduce afferent excitation.

      Excellent point. Please see the response above to Reviewer #1 public response weaknesses.

      (3) It is unknown whether the drug treatments or genetic manipulations are specific to hair cells, so we can't know for certain whether any phenotypic defects are secondary.

      This is correct and a caveat of our Kif1aa and drug experiments. In our recently published work, we confirmed that Kif1aa is expressed in hair cells and neurons, while kif1ab is present just in neurons. Therefore, it is likely that the ribbon formation defects in kif1aa mutants are restricted to hair cells. We added this expression information to our results:

      “ScRNA-seq in zebrafish has demonstrated widespread co-expression of kif1ab and kif1aa mRNA in the nervous system. Additionally, both scRNA-seq and fluorescent in situ hybridization have revealed that pLL hair cells exclusively express kif1aa mRNA (David et al., 2024; Lush et al., 2019; Sur et al., 2023).”

      Non-hair cell effects are a real concern in our pharmacology experiments. To mitigate this in our pharmacological experiments, we have performed drug treatments at 3 different timescales: long-term (overnight), short-term (4 hr) and fast (30 min) treatments. The fast experiments were done after 30 min nocodazole drug treatment, and after this treatment we observed reduced directional motion and fusions. This fast drug treatment should not incur any long-term changes or developmental defects as hair-cell development occurs over 12-16 hrs. However, we acknowledge that drug treatments could have secondary phenotypic effects or effects that are not hair-cell specific. In our revision, we discuss these issues.

      Added to discussion:

      “Another important consideration is the potential off-target effects of nocodazole. Even at non-cytotoxic doses, nocodazole toxicity may impact ribbons and synapses independently of its effects on microtubules. While this is less of a concern in the short- and medium-term experiments (30-70 min and 4 hr), long-term treatments (16 hrs) could introduce confounding effects. Additionally, nocodazole treatment is not hair cell-specific and could disrupt microtubule organization within afferent terminals as well. Thus, the reduction in ribbon-synapse formation following prolonged nocodazole treatment may result from microtubule disruption in hair cells, afferent terminals, or a combination of the two.”

      Reviewer #3 (Public Review):

      Summary:

      The manuscript uses live imaging to study the role of microtubules in the movement of ribeye aggregates in neuromast hair cells in zebrafish. The main findings are that

      (1) Ribeye aggregates, assumed to be ribbon precursors, move in a directed motion toward the active zone;

      (2) Disruption of microtubules and kif1aa increases the number of ribeye aggregates and decreases the number of mature synapses.

      The evidence for point 2 is compelling, while the evidence for point 1 is less convincing. In particular, the directed motion conclusion is dependent upon fitting of mean squared displacement that can be prone to error and variance due to stochasticity, which is not accounted for in the analysis. Only a small subset of the aggregates meet this criterion and one wonders whether the focus on this subset misses the bigger picture of what is happening with the majority of spots.

      Strengths:

      (1) The effects of Kif1aa removal and nocodazole on ribbon precursor number and size are convincing and novel.

      (2) The live imaging of Ribeye aggregate dynamics provides interesting insight into ribbon formation. The movies showing the fusion of ribeye spots are convincing and the demonstrated effects of nocodazole and kif1aa removal on the frequency of these events are novel.

      (3) The effect of nocodazole and kif1aa removal on precursor fusion is novel and interesting.

      (4) The quality of the data is extremely high and the results are interesting.

      Weaknesses:

      (1) To image ribeye aggregates, the investigators overexpressed Ribeye-a TAGRFP under the control of a MyoVI promoter. While it is understandable why they chose to do the experiments this way, expression is not under the same transcriptional regulation as the native protein, and some caution is warranted in drawing some conclusions. For example, the reduction in the number of puncta with maturity may partially reflect the regulation of the MyoVI promoter with hair cell maturity. Similarly, it is unknown whether overexpression has the potential to saturate binding sites (for example motors), which could influence mobility.

      We agree that overexpression of transgenes using a non-endogenous promoter in transgenic lines is an important consideration. Ideally, we would do these experiments with endogenously expressed fluorescent proteins under a native promoter. However, this was not technically possible for us. The decrease in precursors is likely not due to regulation by the myo6b promoter. Although the myo6b promoter comes on early in hair cell development, the promoter only gets stronger as the hair cells mature. This would lead to a continued increase rather than a decrease in puncta numbers with development.

      Protein tags such as tagRFP always have the caveat of impacting protein function. This is partly why we complemented our live imaging with analyses in fixed tissue without transgenes (kif1aa mutants and nocodazole/taxol treatments).

      In our revision, we did perform an immunolabel on myo6b:riba-tagRFP transgenic fish and found that Riba-tagRFP expression did not impact ribbon synapse numbers or ribbon size. This analysis argues that the transgene is expressed at a level that does not impact ribbon synapses. This data is summarized in Figure 1-S1.

      Added to the results:

      “Although this latter transgene expresses Riba-TagRFP under a non-endogenous promoter, neither the tag nor the promoter ultimately impacts cell numbers, synapse counts, or ribbon size (Figure 1-S1A-E).”

      Added to methods:

      “Tg(myo6b:ctbp2a-TagRFP)<sup>idc11Tg</sup> reliably labels mature ribbons, similar to a pan-CTBP immunolabel at 5 dpf (Figure 1-S1B). This transgenic line does not alter the number of hair cells or complete synapses per hair cell (Figure 1-S1A-D). In addition, myo6b:ctbp2a-TagRFP does not alter the size of ribbons (Figure 1-S1E).”

      (2) The examples of punctae colocalizing with microtubules look clear (Figures 1 F-G), but the presentation is anecdotal. It would be better and more informative, if quantified.

      We did attempt a co-localization analysis between microtubules and ribbons but did not move forward with it due to several issues:

      (1) Hair cells have an extremely crowded environment, especially since the nucleus occupies the majority of the cell. All proteins are pushed together in the small space surrounding the nucleus and ultimately, we found that co-localization analyses were not meaningful because the distances were too small.

      (2) We also attempted to segment microtubules in these images and quantify how many ribbons were associated with microtubules, but 3D microtubule segmentation was not accurate in hair cells due to highly varying filament intensities, filament dynamics and the presence of diffuse cytoplasmic tubulin signal.

      Because of these challenges we concluded the best evidence of ribbon-microtubule association is through visualization of ribbons and their association with microtubules over time (in our timelapses). We see that ribbons localize to microtubules in all our timelapses, including the examples shown (Movies S2-S10). The only instance of ribbon dissociation is when ribbons switch from one filament to another. We did not observe free-floating ribbons in our study.

      (3) It appears that any directed transport may be rare. Simply having an alpha >1 is not sufficient to declare movement to be directed (motor-driven transport typically has an alpha approaching 2). Due to the randomness of a random walk, errors in fits to imperfect data will yield some spread in alpha even for movement driven by Brownian motion. Many of the tracks in Figure 3H look as though they might be reasonably fit by a straight line (i.e. alpha = 1).

      (4) The "directed motion" shown here does not really resemble motor-driven transport observed in other systems (axonal transport, for example) even in the subset that has been picked out as examples here. While the role of microtubules and kif1aa in synapse maturation is strong, it seems likely that this role may be something non-canonical (which would be interesting).

      Yes, it is true that directed transport of ribbon precursors is relatively rare. Only a small subset of the ribbon precursors moves directionally (α > 1, 20 %) or has a displacement distance > 1 µm (36 %) during the time windows we are imaging. The majority of the ribbons are stationary. To emphasize this result, we have added bar graphs to Figure 3I,K and now state the underlying numbers more clearly.

      “Upon quantification, 20.2 % of ribbon tracks show α > 1, indicative of directional motion, but the majority of ribbon tracks (79.8 %) show α < 1, indicating confinement on microtubules (Figure 3I, n = 10 neuromasts, 40 hair cells, and 203 tracks).

      To provide a more comprehensive analysis of precursor movement, we also examined displacement distance (Figure 3J). Here, as an additional measure of directed motion, we calculated the percent of tracks with a cumulative displacement > 1 µm. We found 35.6 % of tracks had a displacement > 1 µm (Figure 3K; n = 10 neuromasts, 40 hair cells, and 203 tracks).”
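
      For readers unfamiliar with how an α value is obtained from a track, the following is a minimal sketch of one common approach: compute the time-averaged mean squared displacement (MSD) at several frame lags and take the slope of log(MSD) versus log(lag), so that MSD scales as lag to the power α. This is an illustration under stated assumptions and not necessarily the pipeline used in the manuscript. The function names, the choice of lags, and the example track are all hypothetical.

```python
import math


def mean_squared_displacement(track, lag):
    """Time-averaged MSD of a 2D track [(x, y), ...] at an integer frame lag."""
    squared = [
        (track[i + lag][0] - track[i][0]) ** 2 + (track[i + lag][1] - track[i][1]) ** 2
        for i in range(len(track) - lag)
    ]
    return sum(squared) / len(squared)


def msd_exponent(track, max_lag=None):
    """Least-squares slope of log(MSD) vs log(lag), i.e. alpha in MSD ~ lag**alpha.

    Assumes the track moves at every lag used (a fully stationary track has
    MSD = 0, whose logarithm is undefined).
    """
    if max_lag is None:
        # Hypothetical default: use the first quarter of the track length.
        max_lag = max(2, len(track) // 4)
    lags = range(1, max_lag + 1)
    xs = [math.log(lag) for lag in lags]
    ys = [math.log(mean_squared_displacement(track, lag)) for lag in lags]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def fraction_directional(tracks, threshold=1.0, max_lag=None):
    """Fraction of tracks whose fitted alpha exceeds the threshold."""
    alphas = [msd_exponent(track, max_lag) for track in tracks]
    return sum(alpha > threshold for alpha in alphas) / len(alphas)


# Hypothetical constant-velocity track (0.1 units per frame along x).
straight_track = [(0.1 * i, 0.0) for i in range(20)]
print("alpha for constant-velocity track:", msd_exponent(straight_track, max_lag=5))
```

      For a constant-velocity track the MSD grows with the square of the lag, so the fitted exponent is 2, whereas an unbiased random walk gives values near 1. Because fits to short, noisy tracks scatter around these values, a threshold of α > 1 on its own will classify some purely diffusive tracks as directional, which is the concern raised in point (3).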

      We cannot say for certain what is happening with the stationary ribbons, but our hypothesis is that these ribbons eventually exhibit directed motion sufficient to reach the active zone. This idea is supported by the fact that we see ribbons that are stationary begin movement, and ribbons that are moving come to a stop during the acquisition of our timelapses (Movies S4 and S5). It is possible that ribbons that are stationary may not have enough motors attached, or there may be a ‘seeding’ phase where Ribeye aggregates are condensing on the ribbon.

      We also reexamined our MSD α values, as the α values we observed in hair cells were lower than those seen in canonical motor-driven transport (where α approaches 2). One reason for this difference may be the dynamic microtubule network in developing hair cells, which could affect directional ribbon movement. In our revision we plotted the distribution of α values, which confirmed that in control hair cells, the majority of the α values we see are typically less than 2 (Figure 7-S1A). Interestingly, we also compared the distribution of α values between control and taxol-treated hair cells, where the microtubule network is more stable, and found that the distribution shifted towards higher α values (Figure 7-S1A). We also plotted only ‘directional’ tracks (with α > 1) and observed significantly higher α values in taxol-treated hair cells (Figure 7-S1B). This is an interesting result which indicates that although the proportion of directional tracks (with α > 1) is not significantly different between control and taxol-treated hair cells (which could be limited by the number of motor/adapter proteins), the ribbons that move directionally do so with greater velocities when the microtubules are more stable. This supports our idea that the stability of the microtubule network could be why ribbon movement does not resemble canonical motor transport. This analysis is presented as a new figure (Figure 7-S1A-B) and is referred to in the text in the results and the discussion.

      Results:

      “Interestingly, when we examined the distribution of α values, we observed that taxol treatment shifted the overall distribution towards higher α values (Figure 7-S1A). In addition, when we plotted only tracks with directional motion (α > 1), we found significantly higher α values in hair cells treated with taxol compared to controls (Figure 7-S1B). This indicates that in taxol-treated hair cells, where the microtubule network is stabilized, ribbons with directional motion have higher velocities.”

      Discussion:

      “Our findings indicate that ribbons and precursors show directed motion indicative of motor-mediated transport (Figure 3 and 7). While a subset of ribbons moves directionally with α values > 1, canonical motor-driven transport in other systems, such as axonal transport, can achieve even higher α values approaching 2 (Bellotti et al., 2021; Corradi et al., 2020). We suggest that relatively lower α values arise from the highly dynamic nature of microtubules in hair cells. In axons, microtubules form stable, linear tracks that allow kinesins to transport cargo with high velocity. In contrast, the microtubule network in hair cells is highly dynamic, particularly near the cell base. Within a single time frame (50-100 s), we observe continuous movement and branching of these networks. This dynamic behavior adds complexity to ribbon motion, leading to frequent stalling, filament switching, and reversals in direction. As a result, ribbon transport appears less directional than the movement of traditional motor cargoes along stable axonal filaments, resulting in lower α values compared to canonical motor-mediated transport. Notably, treatment with taxol, which stabilizes microtubules, increased α values to levels closer to those observed in canonical motor-driven transport (Figure 7-S1). This finding supports the idea that the relatively lower α values in hair cells are a consequence of a more dynamic microtubule network. Overall, this dynamic network gives rise to a slower, non-canonical mode of transport.”

      (5) The effect of acute treatment with nocodazole on microtubules in movie 7 and Figure 6 is not obvious to me and it is clear that whatever effect it has on microtubules is incomplete.

      When using nocodazole, we worked to optimize the concentration of the drug to minimize cytotoxicity, while still being effective. While the more stable filaments at the cell apex remain largely intact after nocodazole treatment, there are almost no filaments at the hair cell base, which is different from the wild-type hair cells. In addition, nocodazole-treated hair cells have more cytoplasmic YFP-tubulin signal compared to wild type. We have clarified this in our results. To better illustrate the effect of nocodazole and taxol we have also added additional side-view images of hair cells expressing YFP-tubulin (Figure 4-S1F-G), that highlight cytoplasmic YFP-tubulin and long, stabilized microtubules after 3-4 hr treatment with nocodazole and taxol respectively. In these images we also point out microtubules at the apical region of hair cells that are very stable and do not completely destabilize with nocodazole treatment at concentrations that are tolerable to hair cells.

      “We verified the effectiveness of our in vivo pharmacological treatments using either 500 nM nocodazole or 25 µM taxol by imaging microtubule dynamics in pLL hair cells (myo6b:YFP-tubulin). After a 30-min pharmacological treatment, we used Airyscan confocal microscopy to acquire timelapses of YFP-tubulin (3 µm z-stacks, every 50-100 s for 30-70 min, Movie S8). Compared to controls, 500 nM nocodazole destabilized microtubules (presence of depolymerized YFP-tubulin in the cytosol, see arrows in Figure 4-S1F-G) and 25 µM taxol dramatically stabilized microtubules (indicated by long, rigid microtubules, see arrowheads in Figure 4-S1F,H) in pLL hair cells. We did still observe a subset of apical microtubules after nocodazole treatment, indicating that this population is particularly stable (see asterisks in Figure 4-S1F-H).”

      To further address concerns about verifying the efficacy of nocodazole and taxol treatment on microtubules, we added a quantification of our immunostaining data comparing the mean acetylated-α-tubulin intensities between control, nocodazole and taxol-treated hair cells. Our results show that nocodazole treatment reduces the mean acetylated-α-tubulin intensity in hair cells. This is included as a new figure (Figure 4-S1D-E) and this result is referred to in the text. To better illustrate the effect of nocodazole and taxol we have also added additional side-view images of hair cells after overnight treatment with nocodazole and taxol (Figure 4-S1A-C).

      “After a 16-hr treatment with 250 nM nocodazole we observed a decrease in acetylated-α-tubulin label (qualitative examples: Figure 4A,C, Figure 4-S1A-B). Quantification revealed significantly less mean acetylated-α-tubulin label in hair cells after nocodazole treatment (Figure 4-S1D). Less acetylated-α-tubulin label indicates that our nocodazole treatment successfully destabilized microtubules.”

      “Qualitatively more acetylated-α-tubulin label was observed after treatment, indicating that our taxol treatment successfully stabilized microtubules (qualitative examples: Figure 4-S1A,C). Quantification revealed an overall increase in mean acetylated-α-tubulin label in hair cells after taxol treatment, but this increase did not reach significance (Figure 4-S1E).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The manuscript is fairly dense. For instance, some information is repeated (page 3: ribbon synapses form along a condensed timeline in zebrafish hair cells: 12-18 hrs, and on page 5: These hair cells form 3-4 ribbon synapses in just 12-18 hrs). Perhaps the authors could condense some of the ideas? The introduction could be shortened.

      We have eliminated this repeated text in our revision. We have shortened the introduction from 1275 to 1038 words (with references).

      (2) The mechanosensory structure on page 5 is not defined for readers outside the field.

      Great point. We have added additional information to define this structure in the results:

      “We staged hair cells based on the development of the apical, mechanosensory hair bundle. The hair bundle is composed of actin-based stereocilia and a tubulin-based kinocilium. We used the height of the kinocilium (see schematic in Figure 1B), the tallest part of the hair bundle, to estimate the developmental stage of hair cells as described previously…”

      (3) Figure 1E is quite interesting but I'd rather show Figure S1 B/C as they provide statistics. In addition, the authors define 4 stages : early, intermediate, late, and mature for counting but provide only 3 panels for representative examples by mixing late/mature.

      We were torn about which ribbon quantification graph to show. Ultimately, we decided to keep the summary data in Figure 1E. This is primarily because the supplementary figure will be adjacent to the main figure in the eLife format, and the statistics will be easy to find and view.

      Figure 1 now provides a representative image for both late and mature hair cells.

      (4) The ribbon that jumps from one microtubule to another one is eye-catching. Can the authors provide any statistics on this (e.g. percentage)?

      Good point. In our revision, we have added quantification for these events. We observe 2.8 switching events per neuromast during our fast timelapses. This information is now in the text and is also shown in a graph in Figure 3-S1D.

      “Third, we often observed that precursors switched association between neighboring microtubules (2.8 switching events per neuromast, n= 10 neuromasts; Figure 3-S1C-D, Movie S7).”

      (5) With regard to acetyl-a-tub immunocytochemistry, I would suggest obtaining a profile of the fluorescence intensity on a horizontal plane (at the apical part and at the base).

      (6) Same issue with microtubule destruction by nocodazole. Can the authors provide fluorescence intensity measurements to convince readers of microtubule disruption for long and short-term application.

      Regarding quantification of microtubule disruption using nocodazole and taxol: we did attempt to create profiles of the acetylated tubulin or YFP-tubulin label along horizontal planes at the apex and base, but the amount of variability among cells and the angle of the cell in the images made this type of display and quantification challenging. In our revision, as stated above in our response to Reviewer #1’s public comment, we have added representative side-view images to show the disruptions to microtubules more clearly after short- and long-term drug experiments (Figure 4-S1A-C, F-H). In addition, we quantified the reduction in acetylated tubulin label after overnight treatment with nocodazole and found the signal was significantly reduced (Figure 3-S1D-E). Unfortunately, we were unable to do a similar quantification for YFP-tubulin because its intensity varied with mounting. The following text has been added to the results:

      “Quantification revealed significantly less mean acetylated-a-tubulin label in hair cells after nocodazole treatment (Figure 4-S1D).”

      “Quantification revealed an overall increase in mean acetylated-a-tubulin label in hair cells after taxol treatment, but this increase did not reach significance (Figure 4-S1A,C,E).”

      (7) It is a bit difficult to understand that the long-term (overnight) microtubule destabilization leads to a reduction in the number of synapses (Figure 4F) whereas short-term (30 min) microtubule destabilization leads to the opposite phenotype with an increased number of ribbons (Figure 6G). Are these ribbons still synaptic in short-term experiments? What is the size of the ribbons in the short-term experiments? Alternatively, could the reduction in synapse number upon long-term application of nocodazole be a side-effect of the toxicity within the hair cell?

      Agreed, this is a bit confusing. In our revision, we have changed our analyses so that the comparisons are more similar between the short- and long-term experiments: we examined the number of ribbons and precursors per cell (apical and basal) in both experiments (changed panels in Figure 4G, Figure 4-S2G and Figure 5G). In our live experiments we cannot be sure that ribbons are synaptic as we do not have a postsynaptic co-label. Also, we are unable to reliably quantify ribbon and precursor size in our live images due to variability in mounting. We have changed the text to clarify as follows:

      Results:

      “In each developing cell, we quantified the total number of Riba-TagRFP puncta (apical and basal) before and after each treatment. In our control samples we observed on average no change in the number of Riba-TagRFP puncta per cell (Figure 6G). Interestingly, we observed that nocodazole treatment led to a significant increase in the total number of Riba-TagRFP puncta after 3-4 hrs (Figure 6G). This result is similar to our overnight nocodazole experiments in fixed samples, where we also observed an increase in the number of ribbons and precursors per hair cell. In contrast to our 3-4 hr nocodazole treatment, similar to controls, taxol treatment did not alter the total number of Riba-TagRFP puncta over 3-4 hrs (Figure 6G). Overall, our overnight and 3-4 hr pharmacology experiments demonstrate that microtubule destabilization has a more significant impact on ribbon numbers compared to microtubule stabilization.”

      Discussion:

      “Ribbons and microtubules may interact during development to promote fusion, to form larger ribbons. Disrupting microtubules could interfere with this process, preventing ribbon maturation. Consistent with this, short-term (3-4 hr) and long-term (overnight) nocodazole increased ribbon and precursor numbers (Figure 6AG; Figure 4G), suggesting reduced fusion. Long-term treatment (overnight) resulted in a shift toward smaller ribbons (Figure 4H-I), and ultimately fewer complete synapses (Figure 4F).”

      Nocodazole toxicity: in response to Reviewer # 2’s public comment we have added the following text in our discussion:

      Discussion:

      “Another important consideration is the potential off-target effects of nocodazole. Even at non-cytotoxic doses, nocodazole toxicity may impact ribbons and synapses independently of its effects on microtubules. While this is less of a concern in the short- and medium-term experiments (30 min to 4 hr), long-term treatments (16 hrs) could introduce confounding effects. Additionally, nocodazole treatment is not hair cell-specific and could disrupt microtubule organization within afferent terminals as well. Thus, the reduction in ribbon-synapse formation following prolonged nocodazole treatment may result from microtubule disruption in hair cells, afferent terminals, or a combination of the two.”

      (8) Does ribbon motion depend on size or location?

      It is challenging to reliably quantify the actual area of precursors in our live samples, as there is variability in mounting and precursors are quite small. But we did examine where in the cell moving ribbon precursors were located (using tracks > 1 µm, as these tracks can easily be linked to cell location in Imaris). We found evidence of ribbons with tracks > 1 µm throughout the cell, both above and below the nucleus. This is now plotted in Figure 3M. We have also added the following text to the results:

      “In addition, we examined the location of precursors within the cell that exhibited displacements > 1 µm. We found that 38.9 % of these tracks were located above the nucleus, while 61.1 % were located below the nucleus (Figure 3M).”

      Although this is not an area or size measurement, this result suggests that both smaller precursors that are more apical, and larger precursors/ribbons that are more basal all show motion.

      (9) The fusion event needs to be analyzed in further detail: when one ribbon precursor fuses with another one, is there an increase in size or intensity (this should follow the law of mass conservation)? This is important to support the abstract sentence "ribbon precursors can fuse together on microtubules to form larger ribbons".

      As mentioned above, it is challenging to accurately estimate the absolute size or intensity of ribbon precursors in our live preparation. But we did examine whether there is a relative increase in area after ribbons fuse. We have plotted the change in area (within the same samples) for the two fusion events shown in Figure 8-S1A-B. In these examples, the area of the puncta after fusion is larger than either of the two precursors that fuse. Although the areas are not additive, these plots do provide some evidence that fusion does act to form larger ribbons. To accompany these plots, we have added the following text to the results:

      “Although we could not accurately measure the areas of precursors before and after fusion, we observed that the relative area resulting from the fusion of two smaller precursors was greater than that of either precursor alone. This increase in area suggests that precursor fusion may serve as a mechanism for generating larger ribbons (see examples: Figure 8-S1A-B).”

      Because we were unable to provide more accurate evidence of precursor fusion resulting in larger ribbons, we have removed this statement from our abstract and lessened our claims elsewhere in the manuscript.

      (10) The title in Figure 8 is a bit confusing. If fusion events reflect ribbon precursors fusion, it is obvious it depends on ribbon precursors. I'd like to replace this title with something like "microtubules and kif1aa are required for fusion events"

      We have changed the figure title as suggested, good idea.

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 1C. The purple/magenta colors are hard to distinguish.

      We have made the magenta color much lighter in Figure 1C to make it easier to distinguish purple and magenta.

      (2) There are places where some words are unnecessarily hyphenated. Examples: live-imaging and hair-cell in the abstract, time-course in the results.

      In our revision, we have done our best to remove unnecessary hyphens, including the ones pointed out here.

      (3) Figure 4H and elsewhere - what is "area of Ribeye puncta?" Related, I think, in the Discussion the authors refer to "ribbon volume" on line 484. But they never measured ribbon volume so this needs to be clarified.

      We have done our best to clarify what is meant by area of Ribeye puncta in the results and the methods:

      Results:

      “We also observed that the average of individual Ribeyeb puncta (from 2D max-projected images) was significantly reduced compared to controls (Figure 4H). Further, the relative frequency of individual Ribeyeb puncta with smaller areas was higher in nocodazole treated hair cells compared to controls (Figure 4I).”

      Methods:

      “To quantify the area of each ribbon and precursor, images were processed in a FIJI ‘IJMacro_AIRYSCAN_simple3dSeg_ribbons only.ijm’ as previously described (Wong et al., 2019). Here each Airyscan z-stack was max-projected. A threshold was applied to each image, followed by segmentation to delineate individual Ribeyeb/CTBP puncta. The watershed function was used to separate adjacent puncta. A list of 2D objects of individual ROIs (minimum size filter of 0.002 μm2) was created to measure the 2D areas of each Ribeyeb/CTBP puncta.”

      We did refer to ribbon volume once in the discussion, but volume is not reflected in our analyses, so we have removed this mention of volume.

      (4) More validation data showing gene/protein removal for the crispants would be helpful.

      Great suggestion. As this is a relatively new method, we have created a figure that outlines how we genotype each individual crispant animal analyzed in our study (Figure 6-S1). In the methods we have also added the following information:

      “fPCR fragments were run on a genetic analyzer (Applied Biosystems, 3500XL) using LIZ500 (Applied Biosystems, 4322682) as a dye standard. Analysis of this fPCR revealed an average peak height of 4740 a.u. in wild type, and an average peak height of 126 a.u. in kif1aa F0 crispants (Figure 6-S1). Any kif1aa F0 crispant without robust genomic cutting or a peak height > 500 a.u. was not included in our analyses.”

      Reviewer #3 (Recommendations For The Authors):

      Lines 208-209--should refer to the movie in the text.

      Movie S1 is now referenced here.

      It would be helpful if the authors could analyze and quantify the effect of nocodazole and taxol on microtubules (movie 7).

      See responses above to Reviewer #1’s similar request.

      Figure 7 caption says "500 mM" nocodazole.

      Thank you, we have changed the caption to 500 nM.

      One problem with the MSD analysis is that it is dependent upon fits of individual tracks that lead to inaccuracies in assigning diffusive, restricted, and directed motion. The authors might be able to get around these problems by looking at the ensemble averages of all the tracks and seeing how they change with the various treatments. Even if the effect is on a subset of ribeye spots, it would be reassuring to see significant effects that did not rely upon fitting.

      We are hesitant to average the MSD tracks, as not all tracks have the same number of time steps (ribbons move in and out of the z-stack during the timelapse). This makes it challenging for us to compute ensemble averages across all tracks accurately, especially for the full duration of the timelapse. This is the main reason why we added another analysis, displacements > 1 µm, as another readout of directional motion, a measure that does not rely upon fitting.
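
      To make the unequal-track-length issue concrete, the following is a minimal sketch of an ensemble-averaged MSD in which each lag is averaged only over the position pairs that exist. It is an illustration under stated assumptions, not the analysis used in the study: the function name `ensemble_msd` and the input layout (a list of tracks, each a list of (x, y, z) positions sampled at a fixed interval) are hypothetical.

```python
def ensemble_msd(tracks):
    """Ensemble-averaged mean squared displacement for unequal-length tracks.

    tracks: list of tracks; each track is a list of (x, y, z) positions
            sampled at a fixed time interval (hypothetical input layout).
    Returns (msd, counts), where msd[k] is the average squared displacement
    at a lag of k + 1 frames and counts[k] is the number of position pairs
    that contributed to that average.
    """
    max_lag = max(len(track) for track in tracks) - 1
    sums = [0.0] * max_lag
    counts = [0] * max_lag
    for track in tracks:
        # A track of n positions only contributes to lags 1 .. n - 1.
        for lag in range(1, len(track)):
            for start, end in zip(track, track[lag:]):
                sums[lag - 1] += sum((b - a) ** 2 for a, b in zip(start, end))
                counts[lag - 1] += 1
    return [s / c for s, c in zip(sums, counts)], counts
```

      The returned `counts` show how quickly the number of contributing pairs falls off at long lags, which is why an ensemble average over tracks of very different durations becomes unreliable toward the end of a timelapse.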

      The abstract states that directed movement is toward the synapse. The only real evidence for this is a statement in the results: "Of the tracks that showed directional motion, while the majority move to the cell base, we found that 21.2 % of ribbon tracks moved apically." A clearer demonstration of this would be to do the analysis of Figure 2G for the ribeye aggregates.

      It was not possible to apply to the ribbon tracks the same analysis that we used for EB3-GFP in Figure 2. In Figure 2 we did a 2D tracking analysis and measured the relative angles in 2D. In contrast, the ribbon tracking was done in 3D in Imaris, where it was not possible to obtain angles in the same way. Further, the MSD analysis was performed outside of Imaris, making it extremely difficult to link ribbon trajectories to the 3D cellular landscape in Imaris. Instead, we examined the direction of the 3D vectors in Imaris for tracks with displacements > 1 µm and determined the direction of the motion (apical, basal or undetermined). For clarity, these data are now included as a bar graph in Figure 3L. In our results, we have clarified the results of this analysis:

      “To provide a more comprehensive analysis of precursor movement, we also examined displacement distance (Figure 3J). Here, as an additional measure of directed motion, we calculated the percent of tracks with a cumulative displacement > 1 µm. We found 35.6 % of tracks had a displacement > 1 µm (Figure 3K; n = 10 neuromasts, 40 hair cells and 203 tracks). Of the tracks with displacement > 1 µm, the majority of ribbon tracks (45.8 %) moved to the cell base, but we also found a subset of ribbon tracks (20.8 %) that moved apically (33.4 % moved in an undetermined direction) (Figure 3L).”

      Some more detail about the F0 crispants should be provided. In particular, what degree of cutting was observed and what was the criteria for robust cutting?

      See our response to Reviewer 2 and the newly created Figure 6-S1.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1:

      Specificity of MYL3 Selection:

      My previous question focused on why MYL3 was prioritized over other myosin family members. While the response broadly implicates myosins in viral entry, it does not justify why MYL3 was specifically chosen. For clarity, the "Introduction sections" should explicitly state the unique features of MYL3 (e.g., domain structure, binding affinity, or prior evidence linking it to NNV) that distinguish it from other myosins.

      Thank you for your valuable comment regarding the specificity of MYL3 selection. In response, we have revised the "Introduction" section to explicitly clarify the rationale for prioritizing MYL3 over other myosin family members. Specifically, we have now included prior evidence linking MYL3 to NNV infection, citing our studies that identified MYL3 as a potential host factor interacting with the NNV CP protein. In our previous study, sixteen CP-interacting proteins were identified by Co-IP assays followed by MS, including HSP90ab1, Centrosomal protein 170B, and MYL3, among others. In addition to our findings, a previous study by other researchers has also reported that Epinephelus coioides MYL3 can bind to NNV (page 3, lines 79–81). These revisions provide a clearer justification for the selection of MYL3 and distinguish it from other myosin proteins. The added content can be found in the revised manuscript on page 3, lines 81–84.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) I miss some treatment of the lack of behavioural correlate. What does it mean that memantine benefits EEG classification accuracy without improving performance? One possibility here is that there is an improvement in response latency, rather than perceptual sensitivity. Is there any hint of that in the RT results? In some sort of combined measure of RT and accuracy?

      First, we would like to thank the reviewer for their positive assessment of our work and for their extremely helpful and constructive comments that helped to significantly improve the quality of our manuscript.  

      The reviewer rightly points out that, to our surprise, we did not obtain a correlate of the effect of memantine in our behavioral data, neither in the reported accuracy data nor in the RT data. We do not report RT results as participants were instructed to respond as accurately as possible, without speed pressure. We added a paragraph in the discussion section to point to possible reasons for this surprising finding:

      “There are several possible reasons for this lack of behavioral correlate.  For example, EEG decoding may be a more sensitive measure of the neural effects of memantine, in particular given that perceptual sensitivity may have been at floor (masked condition, experiment 1) or ceiling (unmasked condition, experiment 1, and experiment 2). It is also possible that the present decoding results are merely epiphenomenal, not mapping onto functional improvements (e.g., Williams et al., 2007). However, given that we found a tight link between these EEG decoding markers and behavioral performance in our previous work (Fahrenfort et al., 2017; Noorman et al., 2023), it is possible that the effect of memantine was just too subtle to show up in changes in overt behavior.”

      (2) An explanation is missing, about why memantine impacts the decoding of illusion but not collinearity. At a systems level, how would this work? How would NMDAR antagonism selectively impact long-range connectivity, but not lateral connectivity? Is this supported by our understanding of laminar connectivity and neurochemistry in the visual cortex?

      We have no straightforward or mechanistic explanation for this finding. In the revised discussion, we are highlighting this finding more clearly, and included some speculative explanations:

      “The present effect of memantine was largely specific to illusion decoding, our marker of feedback processing, while collinearity decoding, our marker of lateral processing, was not (experiment 1) or only weakly (experiment 2) affected by memantine. We have no straightforward explanation for why NMDA receptor blockade would impact inter-areal feedback connections more strongly than intra-areal lateral connections, considering their strong functional interdependency and interaction in grouping and segmentation processes (Liang et al., 2017). One possibility is that this finding reflects properties of our EEG decoding markers for feedback vs. lateral processing: for example, decoding of the Kanizsa illusion may have been more sensitive to the relatively subtle effect of our pharmacological manipulation, either because overall decoding was better than for collinearity or because NMDA receptor dependent recurrent processes more strongly contribute to illusion decoding than to collinearity decoding.”

      (3) The motivating idea for the paper is that the NMDAR antagonist might disrupt the modulation of the AMPA-mediated glu signal. This is in line with the motivating logic for Self et al., 2012, where NMDAR and AMPAR efficacy in macaque V1 was manipulated via microinfusion. But this logic seems to conflict with a broader understanding of NMDA antagonism. NMDA antagonism appears to generally have the net effect of increasing glu (and ACh) in the cortex through a selective effect on inhibitory GABAergic cells (eg. Olney, Newcomer, & Farber, 1999). Memantine, in particular, has a specific impact on extrasynaptic NMDARs (that is in contrast to ketamine; Milnerwood et al, 2010, Neuron), and this type of receptor is prominent in GABA cells (eg. Yao et al., 2022, JoN). The effect of NMDA antagonists on GABAergic cells generally appears to be much stronger than the effect on glutamatergic cells (at least in the hippocampus; eg. Grunze et al., 1996).

      This all means that it's reasonable to expect that memantine might have a benefit to visually evoked activity. This idea is raised in the GD of the paper, based on a separate literature from that I mentioned above. But all of this could be better spelled out earlier in the paper, so that the result observed in the paper can be interpreted by the reader in this broader context.

      To my mind, the challenging task is for the authors to explain why memantine causes an increase in EEG decoding, where microinfusion of an NMDA antagonist into V1 reduced the neural signal (Self et al., 2012). This might be as simple as the change in drug... memantine's specific efficacy on extrasynaptic NMDA receptors might not be shared with whatever NMDA antagonist was used in Self et al. 2012. Ketamine and memantine are already known to differ in this way.

      We addressed the reviewer’s comments in the following way. First, we bring up our (to us, surprising) result already at the end of the Introduction, pointing the reader to the explanation mentioned by the reviewer:

      “We hypothesized that disrupting the reentrant glutamate signal via blocking NMDA receptors by memantine would impair illusion and possibly collinearity decoding, as putative markers of feedback and lateral processing, but would spare the decoding of local contrast differences, our marker of feedforward processing. To foreshadow our results, memantine indeed specifically affected illusion decoding, but enhancing rather than impairing it. In the Discussion, we offer explanations for this surprising finding, including the effect of memantine on extrasynaptic NMDA receptors in GABAergic cells, which may have resulted in boosted visual activity.”

      Second, as outlined in the response to the first point by Reviewer #2, we are now clear throughout the title, abstract, and paper that memantine “improved” rather than “modulated” illusion decoding.

      Third, and most importantly, we restructured and expanded the Discussion section to include the reviewer’s proposed mechanisms and explanations for the effect. We would like to thank the reviewer for pointing us to this literature. We also discuss the results of Self et al. (2012), specifically the distinct effects of the two NMDAR antagonists used in this study, more extensively, and speculate that their effects may have been similar to ketamine and thus possibly opposite of memantine (for the feedback signal):

      “Although both drugs are known to inhibit NMDA receptors by occupying the receptor’s ion channel, thereby blocking current flow (Glasgow et al., 2017; Molina et al., 2020), the drugs have different actions at receptors other than NMDA, with ketamine acting on dopamine D2 and serotonin 5-HT2 receptors, and memantine inhibiting several subtypes of the acetylcholine (ACh) receptor as well as serotonin 5-HT3 receptors. Memantine and ketamine are also known to target different NMDA receptor subpopulations, with their inhibitory action displaying different time courses and intensity (Glasgow et al., 2017; Johnson et al., 2015). Blockade of different NMDA receptor subpopulations can result in markedly different and even opposite results. For example, Self and colleagues (2012) found overall reduced or elevated visual activity after microinfusion of two different selective NMDA receptor antagonists (2-amino-5-phosphonovalerate and ifenprodil) in macaque primary visual cortex. Although both drugs impaired the feedback-related response to figure vs. ground, similar to the effects of ketamine (Meuwese et al., 2013; van Loon et al., 2016), such opposite effects on overall activity demonstrate that the effects of NMDA antagonism strongly depend on the targeted receptor subpopulation, each with distinct functional properties.”

      Finally, we link these differences to the potential mechanism via GABAergic neurons:

      “As mentioned in the Introduction, this may be related to memantine modulating processing at other pre- or post-synaptic receptors present at NMDA-rich synapses, specifically affecting extrasynaptic NMDA receptors in GABAergic cells (Milnerwood et al, 2010; Yao et al., 2022). Memantine’s strong effect on extrasynaptic NMDA receptors in GABAergic cells leads to increases in ACh levels, which have been shown to increase firing rates and reduce firing rate variability in macaques (Herrero et al., 2013, 2008). This may represent a mechanism through which memantine (but not ketamine or the NMDA receptor antagonists used by Self and colleagues) could boost visually evoked activity.”

      (4) The paper's proposal is that the effect of memantine is mediated by an impact on the efficacy of reentrant signaling in visual cortex. But perhaps the best-known impact of NMDAR manipulation is on LTP, in the hippocampus particularly but also broadly.

      Perception and identification of the Kanizsa illusion may be sensitive to learning (eg. Maertens & Pollmann, 2005; Gellatly, 1982; Rubin, Nakayama, Shapley, 1997); what argues against an account of the results from an effect on perceptual learning? Generally, the paper proposes a very specific mechanism through which the drug influences perception. This is motivated by results from Self et al 2012 where an NMDA antagonist was infused into V1. But oral memantine will, of course, have a whole-brain effect, and some of these effects are well characterized and - on the surface - appear as potential sources of change in illusion perception. The paper needs some treatment of the known ancillary effects of diffuse NMDAR antagonism to convince the reader that the account provided is better than the other possibilities.

      We cannot fully exclude an effect based on perceptual learning but consider this possibility highly unlikely for several reasons. First, subjects have performed more than a thousand trials in a localizer session before starting the main task (in experiment 2 even more than two thousand) containing the drug manipulation. Therefore, a large part of putative perceptual learning would have already occurred before starting the main experiment. Second, the main experiment was counterbalanced across drug sessions, so half of the participants first performed the memantine session and then the placebo session, and the other half of the subjects the other way around. If memantine would have improved perceptual learning in our experiments, one may actually expect to observe improved decoding in the placebo session and not in the memantine session. If memantine would have facilitated perceptual learning during the memantine session, the effect of that facilitated perceptual learning would have been most visible in the placebo session following the memantine session. Because we observed improved decoding in the memantine session itself, perceptual learning is likely not the main explanation for these findings. Third, perceptual learning is known to occur for several stimulus dimensions (e.g., orientation, spatial frequency or contrast). If these findings would have been driven by perceptual learning one would have expected to see perceptual learning for all three features, whereas the memantine effects were specific to illusion decoding. Especially in experiment 2, all features were equally often task relevant and in such a situation one would’ve expected to observe perceptual learning effects on those other features as well.  

      To further investigate any potential role of perceptual learning, we analyzed participants’ performance in detecting the Kanizsa illusion over the course of the experiments. We divided each experiment’s trials into four time bins, from the beginning until the end of the experiment. For the first experiment’s first target (T1), there was no interaction between the factors bin and drug (memantine/placebo; F(3,84) = 0.89, P = 0.437; Figure S6A). For the second target (T2), we performed a repeated-measures ANOVA with the factors bin, drug, T1-T2 lag (short/long), and masks (present/absent). There was only a trend towards a bin by drug interaction (F(3,84) = 2.57, P = 0.064; Figure S6B), reflecting worse performance under memantine in the first three bins and slightly better performance in the fourth bin. The other interactions that included the bin and drug factors were not significant (all P > 0.117). For the second experiment, we performed a repeated-measures ANOVA with the factors bin, drug, masks, and task-relevant feature (local contrast/collinearity/illusion). None of the interactions that included the bin and drug factors were significant (all P > 0.219; Figure S6C). Taken together, memantine does not appear to affect Kanizsa illusion detection performance through perceptual learning. Finally, there was no interaction between the factors bin and task-relevant feature (F(6,150) = 0.76, P = 0.547; Figure S6D), implying there is no perceptual learning effect specific to Kanizsa illusion detection. We included these analyses in our revised Supplement as Fig. S6.
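
      As an illustration of the binning step only, the sketch below splits a participant's trials, in presentation order, into four consecutive time bins and computes the proportion correct in each. The function name `accuracy_by_bin` and the input format (a list of 0/1 correctness values) are assumptions made for this example; the ANOVA itself is not reproduced here.

```python
def accuracy_by_bin(correct, n_bins=4):
    """Proportion correct in consecutive time bins.

    correct: list of 0/1 values, one per trial, in presentation order
             (hypothetical input format).
    n_bins:  number of consecutive bins; assumes len(correct) >= n_bins
             so that no bin is empty.
    """
    n = len(correct)
    # Bin edges are spread as evenly as possible over the trial sequence.
    edges = [round(i * n / n_bins) for i in range(n_bins + 1)]
    return [sum(correct[a:b]) / (b - a) for a, b in zip(edges, edges[1:])]
```

      Per-bin values computed this way for each participant and drug session would then be the input to a bin by drug comparison of the kind described above.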

      (5) The cross-decoding approach to data analysis concerns me a little. The approach adopted here is to train models on a localizer task, in this case, a task where participants matched a Kanizsa figure to a target template (E1) or discriminated one of the three relevant stimuli features (E2). The resulting model was subsequently employed to classify the stimuli seen during separate tasks - an AB task in E1, and a feature discrimination task in E2. This scheme makes the localizer task very important. If models built from this task have any bias, this will taint classifier accuracy in the analysis of experimental data. My concern is that the emergence of the Kanizsa illusion in the localizer task was probably quite salient, respective to changes in stimuli rotation or collinearity. If the model was better at detecting the illusion to begin with, the data pattern - where drug manipulation impacts classification in this condition but not other conditions - may simply reflect model insensitivity to non-illusion features.

      I am also vaguely worried by manipulations implemented in the main task that do not emerge in the localizer - the use of RSVP in E1 and manipulation of the base rate and staircasing in E2. This all starts to introduce the possibility that localizer and experimental data just don't correspond, that this generates low classification accuracy in the experimental results and ineffective classification in some conditions (ie. when stimuli are masked; would collinearity decoding in the unmasked condition potentially differ if classification accuracy were not at a floor? See Figure 3c upper, Figure 5c lower).

      What is the motivation for the use of localizer validation at all? The same hypotheses can be tested using within-experiment cross-validation, rather than validation from a model built on localizer data. The argument may be that this kind of modelling will necessarily employ a smaller dataset, but, while true, this effect can be minimized at the expense of computational cost - many-fold cross-validation will mean that the vast majority of data contributes to model building in each instance. 

      It would be compelling if results were to reproduce when classification was validated in this kind of way. This kind of analysis would fit very well into the supplementary material.

      We thank the reviewer for this excellent question. We used separate localizers for several reasons, exactly to circumvent the kind of biases in decoding that the reviewer alludes to. Below we have detailed our rationale, first focusing on our general rationale and then focusing on the decisions we made in designing the specific experiments.  

      Using a localizer task in the design of decoding analysis offers several key advantages over relying solely on k-fold cross-validation within the main task:

      (1) Feature selection independence and better generalization: A separate localizer task allows for independent feature selection, ensuring that the features used for decoding are chosen without bias from the main task data. Specifically, the use of a localizer task allows us to determine the time-windows of interest independently based on the peaks of the decoding in the localizer. This allows for a better direct comparison between the memantine and placebo conditions because we can isolate the relevant time windows outside a drug manipulation. Further, training a classifier on a localizer task and testing it on a separate experimental task assesses whether neural representations generalize across contexts, rather than simply distinguishing conditions within a single dataset. This supports claims about the robustness of the decoded information.

      (2) Increased sensitivity and interpretability: The localizer task can be designed specifically to elicit strong, reliable responses in the relevant neural patterns. This can improve signal-to-noise ratio and make it easier to interpret the features being used for decoding in the test set. We facilitate this by having many more trials in the localizer tasks (1280 in E1 and 5184 in E2) than in the separate conditions of the main task, where k-folding would have to be done on very low trial numbers (e.g., after preprocessing, the 2 (mask) x 2 (lag) design in E1 leaves fewer than 256 trials for specific comparisons). The same holds for experiment 2, which has a 2x3 design but also included the base-rate manipulation. Finally, we further increase the sensitivity of the model by presenting the stimuli at full contrast during the localizer, without any manipulations of attention or masking, which allows us to extract the feature-specific EEG signals optimally.

      (3) Decoupling task-specific confounds: If decoding is performed within the main task using k-folding, there is a risk that task-related confounds (e.g., motor responses, attention shifts, drug) influence decoding performance. A localizer task allows us to separate the neural representation of interest from these task-related confounds.
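The contrast between the two validation schemes discussed above can be sketched as follows. This is a minimal illustration, not the analysis pipeline used in the study: the nearest-centroid classifier, the simulated two-class data, and all function names (`fit`, `cross_task`, `k_fold`, `simulate`) are assumptions made for the example, and accuracy stands in for the AUC-based decoding reported in the manuscript.

```python
import random
from statistics import fmean

def centroid(rows):
    """Mean feature vector of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]

def fit(X, y):
    """Nearest-centroid classifier (a stand-in): one centroid per class label."""
    return {label: centroid([x for x, lab in zip(X, y) if lab == label])
            for label in set(y)}

def predict(model, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, model[lab])))

def accuracy(model, X, y):
    """Proportion of trials classified correctly."""
    return fmean([float(predict(model, x) == lab) for x, lab in zip(X, y)])

def cross_task(X_loc, y_loc, X_main, y_main):
    """Train on the localizer only, then test on the main task."""
    return accuracy(fit(X_loc, y_loc), X_main, y_main)

def k_fold(X, y, k=5, seed=0):
    """Within-task k-fold cross-validation, for comparison."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = []
    for test in (idx[i::k] for i in range(k)):
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(model, [X[i] for i in test], [y[i] for i in test]))
    return fmean(scores)

def simulate(n, signal, seed):
    """Hypothetical data: class 1 is shifted by `signal` on each of 8 features."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(signal * lab, 1.0) for _ in range(8)] for lab in y]
    return X, y

X_loc, y_loc = simulate(1280, signal=1.0, seed=1)    # many localizer trials
X_main, y_main = simulate(200, signal=1.0, seed=2)   # fewer main-task trials
print("train on localizer, test on main task:", cross_task(X_loc, y_loc, X_main, y_main))
print("5-fold cross-validation within main task:", k_fold(X_main, y_main, k=5))
```

In the cross-task scheme no main-task trial ever enters training, so feature and time-window selection stay independent of the drug manipulation; in the k-fold scheme every model is fit on main-task data and therefore inherits whatever task-specific structure those trials contain.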

      Experiment 1 

      In experiment 1, the Kanizsa was always task relevant in the main experiment in which we employed the pharmacological manipulation. To make sure that the classifiers were not biased towards Kanizsa figures from the start (which would be the case if we had done k-folding in the main task), we used a training set in which all features were equally relevant for task performance. As can be seen in figure 1E, which plots the decoding accuracies of the localizer task, illusion decoding and rotation decoding were equally strong, whereas collinearity decoding was weaker. It may be that the Kanizsa illusion was quite salient in the localizer task, which we can’t know at present, but it was at least less salient and relevant than in the main task (where it was the only task-relevant feature). Based on the localizer decoding results one could argue that the rotation dimension and illusion dimension were most salient, because the decoding was highest for these dimensions. Clearly the model was not insensitive to non-illusory features. The localizer task of experiment 2 reveals that collinearity decoding tends to be generally lower, even when that feature is task relevant.

      Experiment 2 

      In experiment 2, the localizer task and main task were also similar, with three exceptions: during the localizer task no drug was active, and no masking and no base rate manipulation were employed. To make sure that the classifier was not biased towards a certain stimulus category (due to the bias manipulation), e.g. the stimulus that is presented most often, we used a localizer task without this manipulation. As can be seen in figure 4D decoding of all the features was highly robust, also for example for the collinearity condition. Therefore the low decoding that we observe in the main experiment cannot be due to poor classifier training or feature extraction in the localizer. We believe this is actually an advantage instead of a disadvantage of the current decoding protocol.

      Based on the rationale presented above we are uncomfortable performing the suggested analyses using a k-folding approach in the main task, because according to our standards the trial numbers are too low and the risk that these results are somehow influenced by task specific confounds cannot be ruled out.  

      Line 301 - 'Interestingly, in both experiments the effect of memantine... was specific to... stimuli presented without a backward mask.' This rubs a bit, given that the mask broadly disrupted classification. The absence of memantine results in masked results may simply be a product of the floor ... some care is needed in the interpretation of this pattern. 

      In the results section of experiment 1, we added:

      “While the interaction between masking and memantine only approached significance (P=0.068), the absence of an effect of memantine in the masked condition could reflect a floor effect, given that illusion decoding in the masked condition was not significantly better than chance.”

      While floor is less likely to account for the absence of an effect in the masked condition in experiment 2, where illusion decoding in the masked condition was significantly above chance, it is still possible that to obtain an effect of memantine, decoding accuracy needed to be higher. We therefore also added here:

      “For our time window-based analyses of illusion decoding, the specificity of the memantine effect to the unmasked condition was supported by a significant interaction between drug and masking (note, however, given overall much lower decoding accuracy in the masked condition, the lack of a memantine effect could reflect a floor effect).”

      In the discussion, we changed the sentence to read “…the effect of memantine on illusion decoding tended to be specific to attended, task-relevant stimuli presented without a backward mask.”

      Line 441 - What were the contraindications/exclusion parameters for the administration of memantine? 

      Thanks for spotting this. We have added the relevant exclusion criteria in the revised version of the supplement. See also below.

      – Allergy for memantine or one of the inactive ingredients of these products;

      – (History of) psychiatric treatment;

      – First-degree relative with (history of) schizophrenia or major depression;

      – (History of) clinically significant hepatic, cardiac, obstructive respiratory, renal, cerebrovascular, metabolic or pulmonary disease, including, but not limited to fibrotic disorders;

      – Claustrophobia;

      – Regular usage of medicines (antihistamines or occasional use of paracetamol);

      – (History of) neurological disease;

      – (History of) epilepsy;

      – Abnormal hearing or (uncorrected) vision;

      – Average use of more than 15 alcoholic beverages weekly;

      – Smoking;

      – History of drug (opiate, LSD, (meth)amphetamine, cocaine, solvents, cannabis, or barbiturate) or alcohol dependence;

      – Any known other serious health problem or mental/physical stress;

      – Used psychotropic medication, or recreational drugs over a period of 72 hours prior to each test session;

      – Used alcohol within the last 24 hours prior to each test session;

      – (History of) pheochromocytoma;

      – Narrow-angle glaucoma;

      – (History of) ulcer disease;

      – Galactose intolerance, Lapp lactase deficiency or glucose-galactose malabsorption;

      – (History of) convulsion.

      Line 587 - The localizer task used to train the classifier in E2 was collected in different sessions. Was the number of trials from separate sessions ultimately equal? The issue here is that the localizer might pick up on subtle differences in electrode placement. If the test session happens to have electrode placement that is similar to the electrode placement that existed for a majority of one condition of the localizer... this will create bias. This is likely to be minor, but machine classifiers really love this kind of minor confound.

      Indeed, the trial counts in the separate sessions for the localizer in E2 were equal. We have added that information to the methods section.  

      Experiment 1: 1280 trials collected during the intake session.

      In experiment 2: 1728 trials were collected per session (intake, and 2 drug sessions), so there were 5184 trials across three sessions.

      Reviewer #2:

      To start off, I think the reader is being a bit tricked when reading the paper. Perhaps my priors are too strong, but I assumed, just like the authors, that NMDA-receptors would disrupt recurrent processing, in line with previous work. However, due to the continuous use of the ambiguous word 'affected' rather than the more clear increased or perturbed recurrent processing, the reader is left guessing what is actually found. That's until they read the results and discussion finding that decoding is actually improved. This seems like a really big deal, and I strongly urge the authors to reword their title, abstract, and introduction to make clear they hypothesized a disruption in decoding in the illusion condition, but found the opposite, namely an increase in decoding. I want to encourage the authors that this is still a fascinating finding.

      We thank the reviewer for the positive assessment of our manuscript, and for many helpful comments and suggestions.  

      We changed the title, abstract, and introduction in accordance with the reviewer’s comment, highlighting that “memantine […] improves decoding” and “enhances recurrent processing” in all three sections. We also changed the heading of the corresponding results section to “Memantine selectively improves decoding of the Kanizsa illusion”.

      Apologies if I have missed it, but it is not clear to me whether participants were given the drug or placebo during the localiser task. If they are given the drug this makes me question the logic of their analysis approach. How can one study the presence of a process, if their very means of detecting that process (the localiser) was disrupted in the first place? If participants were not given a drug during the localiser task, please make that clear. I'll proceed with the rest of my comments assuming the latter is the case. But if the former, please note that I am not sure how to interpret their findings in this paper.

      Thanks for asking this; it was indeed unclear. In experiment 1 the localizer was performed in the intake session, in which no drugs were administered. In the second experiment the localizer was performed in all three sessions with equal trial numbers. In the intake session no drugs were administered. In the other two sessions the localizer was performed directly after pill intake, and therefore the memantine was not (or barely) active yet. We started the main task four hours after pill intake because that is the approximate peak time of memantine. Note that all three localizer tasks were averaged before using them as training set. We have clarified this in the revised manuscript.

      The main purpose of the paper is to study recurrent processing. The extent to which this study achieves this aim is completely dependent to what extent we can interpret decoding of illusory contours as uniquely capturing recurrent processing. While I am sure illusory contours rely on recurrent processing, it does not follow that decoding of illusory contours capture recurrent processing alone. Indeed, if the drug selectively manipulates recurrent processing, it's not obvious to me why the authors find the interaction with masking in experiment 2. Recurrent processing seems to still be happening in the masked condition, but is not affected by the NMDA-receptor here, so where does that leave us in interpreting the role of NMDA-receptors in recurrent processing? If the authors can not strengthen the claim that the effects are completely driven by affecting recurrent processing, I suggest that the paper will shift its focus to making claims about the encoding of illusory contours, rather than making primary claims about recurrent processing.

      We indeed used illusion decoding as a marker of recurrent processing. Clearly, such a marker based on a non-invasive and indirect method to record neural activity is not perfect. To directly and selectively manipulate recurrent processing, invasive methods and direct neural recordings would be required. However, as explained in the revised Introduction,

      “In recent work we have validated that the decoding profiles of these features of different complexities at different points in time, in combination with the associated topography, can indeed serve as EEG markers of feedforward, lateral and recurrent processes (Fahrenfort et al., 2017; Noorman et al., 2023).”  

      The timing and topography of the decoding results of the present study were consistent with our previous EEG decoding studies (Fahrenfort et al., 2017; Noorman et al., 2023). This validates the use of these EEG decoding signatures as (imperfect) markers of distinct neural processes, and we continue to use them as such. However, we expanded the discussion section to alert the reader to the indirect and imperfect nature of these EEG decoding signatures as markers of distinct neural processes: “Our approach relied on using EEG decoding of different stimulus features at different points in time, together with their topography, as markers of distinct neural processes. Although such non-invasive, indirect measures of neural activity cannot provide direct evidence for feedforward vs. recurrent processes, the timing, topography, and susceptibility to masking of the decoding signatures obtained in the present study are consistent with neurophysiology (e.g., Bosking et al., 1997; Kandel et al., 2000; Lamme & Roelfsema, 2000; Lee & Nguyen, 2001; Liang et al., 2017; Pak et al., 2020), as well as with our previous work (Fahrenfort et al., 2017; Noorman et al., 2023).” 

      The reviewer is also concerned about the lack of effect of memantine on illusion decoding in the masked condition in experiment 2. In our view, the strong effect of masking on illusion decoding (both in absolute terms, as well as when compared to its effect on local contrast decoding), provides strong support for our assumption that illusion decoding represents a marker of recurrent processing. Nevertheless, as the reviewer points out, weak but statistically significant illusion decoding was still possible in the masked condition, at least when the illusion was task-relevant. As the reviewer notes, this may reflect residual recurrent processing during masking, a conclusion consistent with the relatively high behavioral performance despite masking (d’ > 1). However, rather than invalidating the use of our EEG markers or challenging the role of NMDA-receptors in recurrent processing, this may simply reflect a floor effect. As outlined in our response to reviewer #1 (who was concerned about floor effects), in the results section of experiment 1, we added:

      “While the interaction between masking and memantine only approached significance (P=0.068), the absence of an effect of memantine in the masked condition could reflect a floor effect, given that illusion decoding in the masked condition was not significantly better than chance.”

      And for experiment 2:

      “For our time window-based analyses of illusion decoding, the specificity of the memantine effect to the unmasked condition was supported by a significant interaction between drug and masking (note, however, given overall much lower decoding accuracy in the masked condition, the lack of a memantine effect could reflect a floor effect).”

      An additional claim is being made with regards to the effects of the drug manipulation. The authors state that this effect is only present when the stimulus is 1) consciously accessed, and 2) attended. The evidence for claim 1 is not supported by experiment 1, as the masking manipulation did not interact in the cluster-analyses, and the analyses focussing on the peak of the timing window do not show a significant effect either. There is evidence for this claim coming from experiment 2 as masking interacts with the drug condition. Evidence for the second claim (about task relevance) is not presented, as there is no interaction with the task condition. A classical error seems to be made here, where interactions are not properly tested. Instead, the presence of a significant effect in one condition but not the other is taken as sufficient evidence for an interaction, which is not appropriate. I therefore urge the authors to dampen the claim about the importance of attending to the decoded features. Alternatively, I suggest the authors run their interactions of interest on the time-courses and conduct the appropriate cluster-based analyses.

      We thank the reviewer for pointing out the importance of key interaction effects. Following the reviewer’s suggestion, we dampened our claims about the role of attention. For experiment 1, we changed the heading of the relevant results section from “Memantine’s effect on illusion decoding requires attention” to “The role of consciousness and attention in memantine’s effect on illusion decoding”, and we added the following in the results section:

      “Also our time window-based analyses showed a significant effect of memantine only when the illusion was both unmasked and presented outside the AB (t<sub>28</sub>=-2.76, P=0.010, BF<sub>10</sub>=4.53; Fig. 3F). Note, however, that although these post-hoc tests of the effect of memantine on illusion decoding were significant, for our time window-based analyses we did not obtain a statistically significant interaction between the AB and memantine, and the interaction between masking and memantine only approached significance (P=0.068). Thus, although these memantine effects were slightly less robust than for T1, probably due to reduced trial counts, these results point to (but do not conclusively demonstrate) a selective effect of memantine on illusion-related feedback processing that depends on the availability of attention. In addition to the lack of the interaction effect, another potential concern…”

      For experiment 2, we added the following in the results section:

      “Note that, for our time window-based analyses of illusion decoding, although the specificity of the memantine effect to the unmasked condition was supported by a significant interaction between drug and masking, we did not obtain a statistically significant interaction between memantine and task-relevance. Thus, although the memantine effect was significant only when the illusion was unmasked and task-relevant, just like for the effect of temporal attention in experiment 1, these results do not conclusively demonstrate a selective effect of memantine that depends on attention (task-relevance).”

      In the discussion, we toned down claims about memantine’s effects being specific to attended conditions, we are highlighting the “preliminary” nature of these findings, and we are now alerting the reader explicitly to be careful with interpreting these effects, e.g.:

      “Although these results have to be interpreted with caution because the key interaction effects were not statistically significant, …”
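The reviewer's point that a significant effect in one condition and a non-significant effect in another does not establish an interaction can be made concrete with a small sketch. In a 2x2 within-subject design, the interaction is tested on the per-participant difference of differences. The function names and the decoding values below are hypothetical and chosen only for illustration; the manuscript's own statistics (including Bayes factors and cluster-based tests) are not reproduced here.

```python
from math import sqrt
from statistics import fmean, stdev

def one_sample_t(values):
    """t statistic and degrees of freedom for testing mean(values) == 0."""
    n = len(values)
    return fmean(values) / (stdev(values) / sqrt(n)), n - 1

def interaction_contrast(a1b1, a1b2, a2b1, a2b2):
    """Per-participant difference of differences: (a1b1 - a1b2) - (a2b1 - a2b2)."""
    return [(w - x) - (y - z) for w, x, y, z in zip(a1b1, a1b2, a2b1, a2b2)]

# Hypothetical per-participant decoding scores (illustration only, not study data).
memantine_unmasked = [0.62, 0.66, 0.59, 0.64, 0.61]
placebo_unmasked   = [0.58, 0.61, 0.57, 0.60, 0.60]
memantine_masked   = [0.51, 0.52, 0.50, 0.53, 0.51]
placebo_masked     = [0.50, 0.52, 0.51, 0.52, 0.50]

# Drug effect when unmasked minus drug effect when masked, per participant.
contrast = interaction_contrast(memantine_unmasked, placebo_unmasked,
                                memantine_masked, placebo_masked)
t, df = one_sample_t(contrast)
print(f"drug x masking interaction: t({df}) = {t:.2f}")
```

Only this contrast, compared against a t distribution with the stated degrees of freedom, licenses the claim that the drug effect differs between conditions; separate post-hoc tests within each condition do not.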

      How were the length of the peak-timing windows established in Figure 1E? My understanding is that this forms the training-time window for the further decoding analyses, so it is important to justify why they have different lengths, and how they are determined. The same goes for the peak AUC time windows for the interaction analyses. A number of claims in the paper rely on the interactions found in these posthoc analyses, so the 223- to 323 time window needs justification.

      Thanks for this question. The lengths of these peak-timing windows differ because the decoding of rotation is temporally very precise and short-lived, whereas the decoding of the other features lasts much longer and is more temporally variable. In fact, we have followed the same procedure as in a previously published study (Noorman et al., eLife 2025) for defining the peak-timing and length of the windows. We followed the same procedure for both experiments reported in this paper, replicating the crucial findings and therefore excluding the possibility that these findings are in any way dependent on the time windows that are selected. We have added that information to the revised version of the manuscript.

      Reviewer #3:

      First, despite its clear pattern of neural effects, there is no corresponding perceptual effect. Although the manipulation fits neatly within the conceptual framework, and there are many reasons for not finding such an effect (floor and ceiling effects, narrow perceptual tasks, etc), this does leave open the possibility that the observation is entirely epiphenomenal, and that the mechanisms being recorded here are not actually causally involved in perception per se.

      We thank the reviewer for the positive assessment of our work. The reviewer rightly points out that, to our surprise, we did not obtain a correlate of the effect of memantine in our behavioral data. We agree with the possible reasons for the absence of such an effect highlighted by the reviewer, and expanded our discussion section accordingly:

      “There are several possible reasons for this lack of behavioral correlate.  For example, EEG decoding may be a more sensitive measure of the neural effects of memantine, in particular given that perceptual sensitivity may have been at floor (masked condition, experiment 1) or ceiling (unmasked condition, experiment 1, and experiment 2). It is also possible that the present decoding results are merely epiphenomenal, not mapping onto functional improvements (e.g., Williams et al., 2007). However, given that in our previous work we found a tight link between these EEG decoding markers and behavioral performance (Fahrenfort et al., 2017; Noorman et al., 2023), it is possible that the effect of memantine in the present study was just too subtle to show up in changes in overt behavior.”

      Second, although it is clear that there is an effect on decoding in this particular condition, what that means is not entirely clear - particularly since performance improves, rather than decreases. It should be noted here that improvements in decoding performance do not necessarily need to map onto functional improvements, and we should all be careful to remain agnostic about what is driving classifier performance. Here too, the effect of memantine on decoding might be epiphenomenal - unrelated to the information carried in the neural population, but somehow changing the balance of how that is electrically aggregated on the surface of the skull. *Something* is changing, but that might be a neurochemical or electrical side-effect unrelated to actual processing (particularly since no corresponding behavioural impact is observed.)

      We would like to refer to our reply to the previous point, and we would like to add that in our previous work (Fahrenfort et al., 2017; Noorman et al., 2023) similar EEG decoding markers were often tightly linked to changes in behavioral performance. This indicates that these particular EEG decoding markers do not simply reflect some side effect not related to neural processing. However, as stated in the revised discussion section, “it is possible that the effect of memantine in the present study was just too subtle to show up in changes in overt behavior.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      (…) In my view, the part about NF-YA1 is less strong - although I realize this is a compelling candidate to be a regulator of cell cycle progression, the experimental approaches used to address this question fall a bit short, in particular, compared to the very detailed approaches shown in the rest of the manuscript. The authors show that the transcription factor NF-YA1 regulates cell division in tobacco leaves; however, there is no experimental validation in the experimental system (nodules). All conclusions are based on a heterologous cell division system in tobacco leaves. The authors state that NF-YA1 has a nodule-specific role as a regulator of cell differentiation. I am concerned the tobacco system may not allow for adequate testing of this hypothesis.

      Reviewer #1 makes a valid point by asking to focus the manuscript more explicitly on the role of NF-YA1 as a differentiation factor in a symbiotic context. We have now addressed this formally and experimentally.

      The involvement of A-type NF-Y subunits in the transition to the early differentiation of nodule cells has been documented in model legumes through several publications that we refer to in the revised version of the discussion (lines 617/623). We fully agree that the CDEL system, because it is heterologous, does not allow us more than to propose a parallel explanation for these observations - i.e., that the Medicago NF-YA1 subunit presumably acts in post-replicative cell-cycle regulation at the G2/M transition. Considering your recommendations and those of reviewer #2, we sought to support this conclusion by testing the impact of localized over-expression of NF-YA1 on cortical cell division and infection competence at an early stage of root colonization. The results of these experiments are now presented in the new Figure 9 and Figure 9-figure supplement 1-5 and described from line 435 to 495.

      With the fluorescent tools the authors have at hand (in particular tools to detect G2/M transition, which the authors suggest is regulated by NF-YA1), it would be interesting to test what happens to cell division if NF-YA1 is over-expressed in Medicago roots?

      To limit pleiotropic effects of an ectopic over-expression, we used the symbiosis-induced, ENOD11 promoter to increase NF-YA1 expression levels more specifically along the trajectory of infected cells. We chose to remain in continuity with the experiments performed in the CDEL system by opting for a destabilized version of the KNOLLE transcriptional reporter to detect the G2/M transition. The results obtained are presented in Figure 9B (quantification of split infected cells), in Figure 9-figure supplement 1B (ENOD11 expression profile), in Figure 9-figure supplement 3B (representative confocal images) and Figure 9-figure supplement 4D (quantification of pKNOLLE reporter signal). There, we show that mitosis remains inhibited in cells accommodating infection threads, but is completed in a higher proportion of outer cortical cells positioned on the infection trajectory, where ENOD11 gene transcription is active before their physical colonization.

      Based on NF-YA1 expression data published previously and their results in tobacco epidermal cells, the authors hypothesize that NF-YA regulates the mitotic entry of nodule primordial cells. Given that much of the manuscript deals with earlier stages of the infection, I wonder if NF-YA1 could also have a role in regulating mitotic entry in cells adjacent to the infection thread?

      The expression profile of NF-YA1 at early stages of cortical infection (Laporte et al., 2014) is indeed similar to the one of ENOD11 (as shown in Figure 9-figure supplement 1C) in wild-type Medicago roots, with corresponding transcriptional reporters being both activated in cells adjacent to the infection thread. Under our experimental conditions, additional expression of NF-YA1 (driven by the ENOD11 promoter) in these neighbouring cells did not impact their propensity to enter mitosis and to complete cell division. These results are presented in Figure 9-figure supplement 4D (quantification of pKNOLLE reporter signal) and Figure 9-figure supplement 5 (quantification of split neighbouring cells).

      Reviewer #1 (Recommendations For The Authors):

      - In the first part, images show the qualitative presence/absence of H3.1 or H3.3 histones.

      Upon closer inspection, many cells seem to have both histones. In Fig1-S1 for example (root meristem), it is evident that there are many cells with low but clearly present H3.1 content in the green channel; however, in the overlay, the green is lost and H3.3 (pink) is mainly visible. What does this mean in terms of the cell cycle? 

      We fully agree with reviewer #1 on these points. Independent of whether they have low or high proliferation potential, most cells retain histone H3.1 particularly in silent regions of the genome, while H3.3 is constitutively produced and enriched at transcriptionally active regions. When channels are overlaid, cells in an active proliferation or endoreduplication state (in G1, S or G2, depending on the size of their nuclei) will appear mainly "green" (H3.1-eGFP positive). Cells with a low proliferation potential (e.g., in the QC), G2-arrested (e.g., IT-traversed) or terminally differentiating (e.g., containing symbiosomes or arbuscules) will appear mainly "magenta" (H3.1-low, medium to high H3.3-mCherry content).

      Furthermore, all nodule images only display the overlay image, and individual fluorescence channels are not shown. Does the same masking effect happen here? It may be helpful to quantify fluorescence intensity not only in green but also in red channels as done for other experiments.

      Quantifying fluorescence intensity in the mCherry channel may indeed help to highlight the likely replacement of H3.1-eGFP by H3.3-mCherry in infected cells, as described by Otero and colleagues (2016) at the onset of cellular differentiation. However, the quantification method as established (i.e., measuring the corrected total nuclear fluorescence at the equatorial plane) cannot be applied, most of the time, to infected cells' nuclei due to the overlapping presence of mCherry-producing S. meliloti in the same channel (e.g., in Figure 2B). Nevertheless, and to avoid this masking effect when the eGFP and mCherry channels are overlaid, we now present them as isolated channels in revised Figures 1-3 and associated figure supplements. As the cell-wall staining is regularly included and displayed in grayscale, we assigned to both of them the Green Fire Blue lookup table, which maps intensity values to a multiple-colour sequential scheme (with blue or yellow indicating low or high fluorescence levels, respectively). We hope that this will allow a better appreciation of the respective levels of H3.1- and H3.3-fusions in our confocal images.
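For readers unfamiliar with the measurement named above, a corrected total fluorescence value is commonly computed as the integrated signal of a region of interest minus the background expected over the same area. The sketch below assumes that common definition; whether the authors used exactly this formula is not stated in the response, and the function name and pixel values are hypothetical.

```python
from statistics import fmean

def corrected_total_fluorescence(roi_pixels, background_pixels):
    """Integrated ROI intensity minus (ROI area x mean background intensity).

    roi_pixels: intensities inside the nuclear outline (one optical section).
    background_pixels: intensities from a nearby region without signal.
    """
    integrated_density = sum(roi_pixels)
    return integrated_density - len(roi_pixels) * fmean(background_pixels)

# Hypothetical intensities for one nucleus measured at its equatorial plane.
nucleus = [10, 12, 14]
background = [2, 2]
print(corrected_total_fluorescence(nucleus, background))
```

Because the correction is applied per channel, any signal from another source in the same channel within the outline (such as mCherry-producing bacteria overlapping the nucleus) inflates the integrated density, which is the limitation the authors describe for infected cells.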

      - Fig 1 B - it is hard to differentiate between S. meliloti-mCherry and H3.3-mCherry. Is there a way to label the different structures?

      In the revised version of Figure 1B, we used filled or empty arrowheads to point to histone H3-containing nuclei. To label rhizobia-associated structures, we used dashed lines to delineate nodule cells hosting symbiosomes and included the annotation “IT” for infection threads. We also indicated proliferating, endoreduplicating and differentiating tissues and cells using the following annotations: “CD” for cell division, “En” for endoreduplication and “TD” for terminal differentiation. All annotations are explained in the figure legend.

      - Fig 1 - supplement E and F - no statistics are shown.

      We performed non-parametric tests using the latest version of the GraphPad Prism software (version 10.4.1). Stars (Figure 1-figure supplement 1F) or different letters (Figure 1-figure supplement 1G) now indicate statistically significant differences. Results of the normality and non-parametric tests were included in the corresponding Source Data Files (Figure 1 – figure supplement 1 – source data 1 and 2). We have also updated the compact display of letters in other figures as indicated by the new software version. The raw data and the results of the statistical analyses remain unchanged and can be viewed in the corresponding source files.

      - Fig 2 A - overview and close-up image do not seem to be in the same focal plane. This is confusing because the nuclei position is different (so is the infection thread position).

      We fully agree that our former Figure may have confused reviewers #1 and #2 as well as readers. Figure 2A was designed to highlight, from the same nodule primordium, actively dividing cells of the inner cortex (optical section z 6-14) and cells of the outer cortex traversed, penetrated by or neighbouring an infection thread (optical section z 11-19). We initially wanted to show different magnification views of the same confocal image (i.e., a full-view of the inner cortex and a zoomed-view of the outer layers) to ensure that audiences can identify these details. In the revised version of Figure 2A, we displayed these full- and zoomed-views in upper and lower panels, respectively, and we removed the solid-line inset to avoid confusion.

      - Fig 1A and Fig 2E could be combined and shown at the beginning of the manuscript. Also, consider making the cell size increase more extreme, as it is important to differentiate G2 cells after H3.1 eviction and cells in G1. You have to look very closely at the graph to see the size differences.

      We have taken each of your suggestions into account. A combined version of our schematic representation with more pronounced nuclei size differences is now presented in Figure 1A.

      - Fig. 3 C is difficult to interpret. Can this be split into different panels?

      We realized that our previous choice of representation may have been confusing. Each value corresponds only to the H3.1-eGFP content, measured in an infected cell and reported to that of the neighbouring cell (IC / NC) within individual root samples. Therefore, we removed the green-magenta colour code and changed the legend accordingly. We hope that these slight modifications will facilitate the interpretation of the results - namely, that the relative level of H3.1 increases significantly in infected cells in the selected mutants compared to the wild-type. This mode of representation also highlights that in the mutants, there are more individual cases where the H3.1 content in an infected cell exceeds that of the neighbouring cell by more than two times. These cases would be masked if the couples of infected cells and associated neighbours would be split into different panels as in Figure 3B.
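      As an illustration only, the relative H3.1-eGFP content described above (infected cell over neighbouring cell, IC / NC) can be sketched as a simple ratio per cell pair, followed by a count of pairs in which the infected cell exceeds its neighbour more than two-fold. The function names and the intensity values below are hypothetical and are not taken from the study's source data.

```python
def relative_h3_content(pairs):
    """Return the IC / NC ratio for each (infected, neighbour) intensity pair."""
    return [ic / nc for ic, nc in pairs]


def count_above(ratios, threshold=2.0):
    """Count pairs whose IC / NC ratio is strictly above the threshold."""
    return sum(1 for r in ratios if r > threshold)


# Hypothetical fluorescence intensities (arbitrary units), one pair per root sample.
example_pairs = [(120.0, 100.0), (250.0, 100.0), (40.0, 80.0)]
ratios = relative_h3_content(example_pairs)
n_above_two_fold = count_above(ratios)
```

      Keeping each infected cell paired with its own neighbour, rather than pooling the two groups, is what lets the individual cases above the two-fold threshold remain visible.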

      - Line 357/359. I assume you mean ...'through the G2 phase can commit to nuclear division'.

      We have edited this sentence according to your suggestion, which now appears in line 370. 

      Reviewer #2 (Recommendations For The Authors):

      Cell cycle control during the nitrogen-fixing symbiosis is an important question but only poorly understood. This manuscript uses largely cell biological methods, which are always of the highest quality - to investigate host cell cycle progression during the early stages of nodule formation, where cortical infection threads penetrate the nodule primordium. The experiments were carefully conducted, the observations were detail oriented, and the results were thought-provoking. The study should be supported by mechanistic insights. 

      (1) One thought provoked by the authors' work is that while the study was carried out at an unprecedented resolution, the relationship between control of the cell cycle and infection thread penetration remains correlative. Is this reduced replicative potential among cells in the infection thread trajectory a consequence of hosting an infection thread, or a prerequisite to do so?

      We understand and share the point of view of reviewer #2. At this stage, we believe that our data won’t enable us to fully answer the question, thus this relationship remains rather correlative. The reasons are that 1) the access to the status of cortical cells below C2 is restricted to fixed material and therefore only represents a snapshot of the situation, and 2) we are currently unable to significantly interfere with mechanisms as intertwined as cell cycle control and infection control. What we can reasonably suggest from our images is that the most favorable window of the cell cycle for cells about to be crossed by an infection thread is post-replicative, i.e., the G2 phase. Typical markers of the G2 phase were recurrently observed at the onset of physical colonization – enlarged nucleus, containing less histone H3.1 than neighbouring cells in S phase (e.g., in Figure 2A). Reaching the G2 phase could therefore be a prerequisite for infection (and associated cellular rearrangements), while prolonged arrest in this same phase is likely a consequence of transcellular passage towards a forming nodule primordium.

      More importantly, in either scenario, what is the functional significance of exiting the cell cycle or endocycle? By stating that "local control of mitotic activity could be especially important for rhizobia to timely cross the middle cortex, where sustained cellular proliferation gives rise to the nodule meristem" (Line 239), the authors seem to believe that cortical cells need to stop the cell cycle to prepare for rhizobia infection. This is certainly reasonable, but the current study provides no proof, yet. To test the functional importance of cell cycle exit, one would interfere with G2/M transition in nodule cells, and examine the effect on infection.

      We fully agree with reviewer #2 that the functional importance of a cell-cycle arrest on the infection thread trajectory remains to be demonstrated. Interfering with cell-cycle progression in a system as complex and fine-tuned as infected legume roots certainly requires the right timing – at the level of the tissue and of individual cells; the right dose; and the right molecular player(s) (i.e., bona fide activators or repressors of the G2/M transition). Using the symbiosis-specific NPL promoter, activated in the direct vicinity of cortical infection threads (Figure 9-figure supplement 1B), we tried to force infectable cells to recruit the cell division program by ectopically over-expressing the Arabidopsis CYCD3.1, “mimicking” the CDEL system. So far, this strategy has not resulted in a significant increase in the number of uninfected nodules in transgenic hairy roots - though the effect on symbiosome release remains to be investigated. Provided that a suitable promoter-cell cycle regulator combination is identified, we hope to be able to answer this question in the future.

      Given that the authors have already identified a candidate, and showed it represses cell division in the CDEL system, not testing the same gene in a more relevant context seems a lost opportunity. If one ectopically expressed NF-YA1 in hairy roots, thus repressing mitosis in general, would more cells become competent to host infection threads? This seems a straightforward experiment and readily feasible with the constructs that the authors already have. If this view is too naive, the authors should explain why such a functional investigation does not belong in this manuscript.

      Reviewer #2's point is entirely valid, and we decided to address it through additional experiments. To avoid possible side effects on development by affecting cell division in general, we placed NF-YA1 under control of the symbiosis-induced ENOD11 promoter. Based on the results obtained in the CDEL system, the pENOD11::FLAG-NF-YA1 cassette was coupled to a destabilized version of the KNOLLE transcriptional reporter to detect the G2/M transition. Competence for transcellular infection was maintained upon local NF-YA1 overexpression, the latter leading to a slight (non-significant) increase in the number of infected cells per cortical layer. These results are presented in Figure 9-figure supplement 3A-B (representative confocal images) and in Figure 9-figure supplement 4A-G.

      (1b) A related comment: on Line 183, it was stated that "The H3.1-eGFP fusion protein was also visible in cells penetrated but not fully passed by an infection thread". Presumably, the authors were talking about the cell marked by the arrowhead. But its H3.1-GFP signal looks no different from the cell immediately to its left. It is hard to say which cells are ones "preparing for intracellular infection pass through S-phase", and which ones are just "regularly dividing cortical cells forming the nodule primordium". What can be concluded is that once a cell has been fully transversed by an infection thread, its H3.1 level is low. Whether this is the cause or consequence of infection cannot be resolved simply by timing the appearance or disappearance of H3.1-GFP.

      We basically agree with comment 1b. In an unsynchronized system such as infected hairy roots, it is challenging to detect the event where a cell is penetrated, but not yet completely crossed by an infection thread. What we wanted to emphasize in Figure 2A, is that host cells in the path of an infection thread re-enter the cell cycle and pass through S-phase just as their neighbours do (as pointed out by reviewer #2 in his summary). The larger nucleus with slightly lower H3.1-eGFP signal than the neighbouring cell (as indicated by the use of the Green Fire Blue lookup table) suggests that the infected cell marked by the arrowhead in Figure 2A is actually in the G2 phase. The main difference is indeed that cells allowing complete infection thread passage exit the cell cycle and largely evict H3.1 while their neighbours proceed to cell division (as exemplified by PlaCCI reporters in Figure 4CD and the new Figure 5-figure supplement 2). Whether cell-cycle exit in G2 is a cause, or a consequence of cortical infection is a question that cannot be easily answered from fixed samples, which is a limitation of our study.

      (2) The authors have convincingly demonstrated that cortical cells accommodating infection threads exit the cell cycle, inhibit cell division, and down-regulate KNOLLE expression. How do these observations reconcile with the feature called the pre-infection thread? The authors devoted one paragraph to this question in the Discussion, but this does not seem sufficient given that the pre-infection thread is a prominent concept. Is the resemblance to the cell division plane superficial, or does it reflect a co-option of the normal cytokinesis machinery for accommodating rhizobia?

      From our point of view, cortical cells forming pre-infection threads are likely in an intermediate state. PIT structures undoubtedly share many similarities with cells establishing a cell division plane. The recruitment of at least some of the players normally associated with cytokinesis has been demonstrated and is consistent with the maintenance of infectable cells in a pre-mitotic phase in Medicago, as discussed in lines 558 to 568. We nevertheless think that the arrest of the cell cycle in the G2 phase, presumably occurring in crossed cortical cells, constitutes an event of cellular differentiation and specialization in transcellular infection. 

      The following are mainly points of presentation and description: 

      (3) Line 158: I can't see "subnuclear foci" in Figure 1-figure supplement 1C-E. However, they are visible in Fig. 1C.

      We hope that presenting the eGFP and mCherry channels in separate panels and assigning them the Green Fire Blue colour scheme provides better visibility and contrast of these detailed structures. We now refer to Figure 1C in addition to Figure 1–figure supplement 1E in the main text (line 161). 

      (4) Line 160: The authors should outline a larger region containing multiple QC cells, rather than pointing to a single cell, as there are other areas in the image containing cells with the same pattern.

      We updated Figure 1-figure supplement 1E accordingly.

      (5) Fig. 1B should include single channels, since within a single plant cell, the nucleus, the infection thread, and sometimes symbiosomes all have the same color. This makes it hard to see whether the nuclei in these cells are less green, or are simply overwhelmed by the magenta color.

      To improve the readability of Figure 1B and to address suggestions from individual reviewers, we now include separate channels and have annotated the different structures labeled by mCherry.

      (6) Fig. 2A: the close-up does not match the boxed area in the left panel. Based on the labeling, it seems that the two panels are different optical sections. But why choose a different optical depth for the left panel? This can be disorienting to the reader, because one expects the close-up to be the same image, just under higher magnification.

      We fully agree that our previous choice of representation may have been confusing. As we also specified to reviewer #1, we wanted to show a full-view of proliferating cells in the inner cortex and a zoomed-view of infected cells in the outer layers of the same nodule primordium. In the revised version of Figure 2A, we displayed these full- and zoomed-views in separate panels and removed the boxed area to avoid confusion.

      (7) Figure 2-figure supplement 1B: the cell indicated by the empty arrowhead has a striking pattern of H3.1 and H3.3 distribution on condensed chromosomes. Can you comment on that?

      Reviewer #2 may be referring to the apparent enrichment of H3.3 at telomeres, previously described in Arabidopsis, while pericentromeric regions are enriched in H3.1. This distribution is indeed visible on most of the condensed chromosomes shown in Figure 2-figure supplement 1B. We included this comment in the corresponding caption.

      (8) Fig. 4: It is not very easy to distinguish M phase. Can the authors describe how each phase is supposed to look like with the reporters?

      We agree with reviewer #2 and attempted to improve Figure 4, which is now dedicated to the Arabidopsis PlaCCI reporter. ECFP, mCherry, and YFP channels were presented separately and the corresponding cell-cycle phases (in interphase and mitosis) were annotated. The Green Fire Blue lookup table was assigned to each reporter to provide the best visibility of, for example, chromosomes in early prophase. We included a schematic representation corresponding to the distribution of each reporter, using the colors of the overlaid image to facilitate its interpretation.

      (9) Line 298: what is endopolyploid? This term is used at least three times throughout the manuscript. How is it different from polyploid?

      In the manuscript, we aimed to differentiate the (poly)ploidy of an organism (reflecting the number of copies of the basic genome and inherited through the germline) from endopolyploidy produced by individual somatic cells. As reviewed by Scholes and Paige, polyploidy and endopolyploidy differ in important ways, including allelic diversity and chromosome structural differences. In the Medicago truncatula root cortex for example, a tetraploid cell generated via endoreduplication from the diploid state would contain at most two alleles at any locus. The effects of endopolyploidy on cell size, gene expression, cell metabolism and the duration of the mitotic cell cycle are not shared among individual cells or organs, contrasting to a polyploid individual (Scholes and Paige, 2015).

      See Scholes, D. R., & Paige, K. N. (2015). Plasticity in ploidy: A generalized response to stress. Trends in Plant Science, 20(3), 165‑175. https://doi.org/10.1016/j.tplants.2014.11.007

      (10) Line 332: "chromosomes on mitotic figures" - what does this mean?

      Reviewer #2 is right to point out this redundant wording. Mitotic “figures” are recognized, by definition, based on chromosome condensation. We now use the term "mitotic chromosomes" (line 344).

      (11) Fig. 6A: could the authors consider labeling the doublets, at least some of them? I understand that this nucleus contains many doublets. However, this is the first image where one is supposed to recognize these doublets, and pointing out these features can facilitate understanding. Otherwise, a reader might think the image is comparable to nuclei with no doublets in the rest of the figure.

      Following this suggestion, five of these doublets are now labeled in Figure 7A (formerly Figure 6A).

  2. May 2025
    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      (1) A previously determined 2:2 heterodimeric complex of LGI1-ADAM22 was suggested to play a role in trans interactions. Could the authors discuss if the heterohexameric 3:3 LGI1-ADAM22 is more likely to represent a cis complex or a trans complex, or if both are possible?

      We noticed that there was no obvious structural feature strongly suggesting that the heterohexameric 3:3 LGI1-ADAM22 is more likely to represent a cis complex or a trans complex. Both are possible at the synapse (and similarly, for LGI3-ADAM23 at the juxtaparanode of myelinated axons). Therefore, we revised the Introduction and Discussion sections as follows:

      Introduction: (about potential structural mechanisms of the 3:3 complex)

      “Similarly to the 2:2 complex, the 3:3 complex might serve as an extracellular scaffold to stabilize Kv1 channels or AMPA receptors in a trans-synaptic fashion. In addition, the 3:3 assembly in a cis fashion on the same membrane might regulate the accumulation of Kv1 channel complexes at axon initial segment. However, no clear evidence to prove these potential mechanistic roles of the 3:3 assembly has been provided, and the three-dimensional structure of the 3:3 complex has not yet been determined.”

      Discussion: (about a role of the LGI3–ADAM23 complex at the juxtaparanode of myelinated axons)

      “In this context, as discussed in (30), either or both of the 2:2 and 3:3 complexes might be formed in a trans fashion at the juxtaparanode of myelinated axons and bridge the axon and the innermost myelin membrane. Alternatively, the 3:3 complex formed in a cis fashion might positively regulate the clustering of the axonal Kv channels at the juxtaparanode, possibly in a similar manner at the axon initial segment.”

      *Ref. 30: Y. Miyazaki et al., Oligodendrocyte-derived LGI3 and its receptor ADAM23 organize juxtaparanodal Kv1 channel clustering for short-term synaptic plasticity. Cell Rep 43, 113634 (2024).

      (2) It is not entirely clear to me if the LGI1-ADAM22 complex is also crosslinked in the HS-AFM experiments. Could this be more clearly indicated? In addition, if this is the case, could an explanation be given about how the complex can still dissociate?

      Thank you for the constructive suggestions. A non-crosslinked 3:3 LGI1-ADAM22 complex was used for HS-AFM observations. To clarify the sample used for HS-AFM, we have modified the text as follows.

      P.8 “Dynamics of the LGI1‒ADAM22 higher-order complex observed by HS-AFM

      HS-AFM images of gel filtration chromatography fractions containing the 3:3 LGI1-ADAM22<sub>ECD</sub> complex (not chemically crosslinked with glutaraldehyde) predominantly…”

      P.10 Materials and methods

      “HS-AFM observations of the LGI1–ADAM22<sub>ECD</sub> complex (not chemically crosslinked with glutaraldehyde) were conducted on AP-mica,…”

      (3) The LGI1 and ADAM22 are of similar size. To me, this complicates the interpretation of dissociation of the complex in the HS-AFM data. How is the overinterpretation of this data prevented? In other words, what confidence do the authors have in the dissociation steps in the HS-AFM data?

      Our criteria for assigning HS-AFM images to the 3:3 LGI1–ADAM22<sub>ECD</sub> complex were based on a comparison of the simulated AFM image of the 3:3 complex obtained by cryo-EM. The automatized fitting process (42) identifies the optimal orientation of cryo-EM images that closely matches the HS-AFM image. In the present study, the concordance coefficient (CC) reached 0.8, indicating that the protein orientation in HS-AFM images of the 3:3 complex was objectively satisfactory.

      Regarding the dissociation step of ADAM22 from the 3:3 complex, we carefully analyzed the HS-AFM videos frame by frame and observed that the protrusion corresponding to ADAM22 in the 3:3 complex disappeared at a specific frame (4.5 s in the third molecule in Movie S1). The dissociation steps of ADAM22 were further confirmed by integrating multiple independent HS-AFM experiments and observations. Thus, although HS-AFM images alone cannot determine the orientation of LGI1 and ADAM22 in the 3:3 complex, the comparison of cryo-EM images with simulated AFM images enables objective assignment and orientation of proteins in the 3:3 complex through automated fitting.

      *Ref. 42: R. Amyot et al., Simulation atomic force microscopy for atomic reconstruction of biomolecular structures from resolution-limited experimental images. PLoS Comput Biol 18, e1009970 (2022).

      (4) What is the "LGI1 collapse" mentioned in Figure 4c?

      Thank you for the constructive suggestions. The term “LGI1 collapse” was intended to mean the dissociation of LGI1 from the 3:3 complex. To avoid confusion, we have revised it to “LGI1 release”.

      (5) Am I correct that the structure indicates that the trimerization is entirely organized by LGI1? This would suggest LGI1 trimerizes on its own. Can this be discussed? Has this been observed?

      Yes. The present cryo-EM structure of the 3:3 complex indicates that the trimerization can be entirely organized by LGI1. In addition, during the HS-AFM imaging, the triangle shape seems to be maintained even if one ADAM22<sub>ECD</sub> molecule is released. These findings suggest the possibility that LGI1 could trimerize on its own, although this possibility could not be tested due to the difficulty in the expression of the full-length LGI1 alone for biophysical analysis in our hands. On the other hand, considering the dynamic property of the 3:3 complex and spatial alignment of LGI1<sub>LRR</sub> and ADAM22, we cannot exclude the possibility that ADAM22 could act as a platform to facilitate the intermolecular interaction between LGI1<sub>LRR</sub> and LGI1*<sub>EPTP</sub> for the trimerization of LGI1. This discussion was added in the first paragraph of the subsection "Dynamics of the LGI1–ADAM22 higher-order complex by HS-AFM".

      (6) C3 symmetry was not applied in the cryo-EM reconstruction of the heterohexameric 3:3 LGI1-ADAM22 complex. How much is the complex deviating from C3 symmetry? What interactions stabilize the specific trimeric conformation reconstructed here, compared to other trimeric conformations?

      According to this comment, we compared the non-symmetric, present cryo-EM structure to the previously calculated _C_3 symmetry-restrained structure based on small-angle X-ray scattering analysis and the _C_3 symmetric structure generated by AlphaFold3. Their differences in the domain or protomer configuration are illustrated in Fig. S9.

      We did not find interactions that could obviously stabilize the specific trimeric conformation, but the closure motion of LGI1<sub>LRR</sub> (relative to LGI1<sub>EPTP</sub>) in chain F appears to locate it in close proximity to LGI1<sub>LRR</sub> in chain D to make the triangular assembly slightly more compact. This (partly) compact configuration might stabilize the non-symmetric trimeric configuration observed in the cryo-EM structure. This was described in the last sentence in the subsection "Cryo-EM structure of the 3:3 LGI1–ADAM22<sub>ECD</sub> complex".

      Reviewer #2 (public review):

      The functional significance of these two complexes in the context of synapse remains speculative.

      To assess the functional significance of the 3:3 complex, we spent time and effort designing mutations that solely inhibit the 3:3 assembly but failed to find such mutations. In this paper, we just focused on structural characterization of the 3:3 complex.

      Additionally, the structural presentations in Figures 1-3 (especially Figures 2-3) lack the clarity needed for general readers to fully understand the authors' key points. Enhancing the quality of these visual representations would greatly improve accessibility and comprehension.

      We made an effort to improve Figures 1-3 accordingly. Specifically, we revised them based on the strategy suggested in the Editorial comment regarding this reviewer's comment.

      Editorial comments:

      We noticed that in the reconstruction of the 3:3 complex, which is claimed to be at 3.8A resolution, beta-strands are not separated in the map and local resolution estimates vary from 6-10A. Please clarify.

      We revised Fig. S8 to show the local resolution and volume quality, which correspond to nominal resolution of 3.8 Å, estimated from gold-standard FSC.

      Reviewer #1 (Recommendations for the authors):

      (1) PDB validation reports should be presented to allow further validation

      The PDB validation reports were attached to the revised manuscript (uploaded as "related manuscript file").

      (2) In Figure 4, models below the AFM figures are difficult to see because of the light coloring. In addition, in panel c, the orientation of some of the parts of the models below the 19.2 and 34.5 s. panels do not seem to correlate with the AFM figures. Could the models be adjusted so that they represent the data better?

      Thank you for the constructive suggestions. According to the Reviewer’s comments, we have revised the AFM figures (Fig. 4).

      (3) References are sometimes missing for important statements. Please check throughout.

      Some examples:

      P3, "it has been suggested that the 3:3 complex regulates the density of synaptic molecules such as scaffolding proteins and synaptic vesicles".

      P3. "Furthermore, LGI1 forms a complex with the voltage-gated potassium channel (VGKC) through ADAM22/23".

      According to this comment, we rewrote the description about potential physiological roles of the 3:3 complex and added references as follows:

      "Similarly to the 2:2 complex, the 3:3 complex might serve as an extracellular scaffold to stabilize Kv1 channels or AMPA receptors in a trans-synaptic fashion (9, 17, 19). In addition, the 3:3 assembly in a cis fashion on the same membrane might regulate the accumulation of Kv1 channel complexes at axon initial segment (18, 20). However, no clear evidence to prove these potential mechanistic roles of the 3:3 assembly has been provided, and the three-dimensional structure of the 3:3 complex has not yet been determined."

      We also added references to the following sentences:

      p.2, (the last sentence in the first paragraph of the Introduction) “Additionally, some epilepsy-related mutations have been identified in genes encoding non-ion channel proteins such as LGI1 (4-7).”

      p.3, ln 4-5, “The metalloprotease-like domain interacts with the EPTP domain of LGI1 in the extracellular space (11, 14).”

      p.3, ln 9-10, “Furthermore, LGI1 forms a complex with the voltage-gated potassium channel (VGKC) through ADAM22/23 (9, 17, 18)”

      p.3, ln 20-22, “The results revealed the structural basis of the interaction between the EPTP domain of one LGI1 and the LRR domain of the other LGI1, as well as the interaction between the EPTP domain of LGI1 and the metalloproteinase-like domain of ADAM22 (14)”

      (4) S5 for clarity please add an overview of the complex highlighting where the different parts shown in the panels are located.

      Fig. S5 was modified accordingly. Every panel showing a zoom-up view was indicated by a box in an overview of the complex.

      (5) S7 a+b, also here add models for the structures to indicate which parts are shown.

      Could labels be added to highlight important parts?

      We added an overview of the complex with boxes that indicate the parts shown as the panels, according to this comment. We also added labels to highlight residues that are important for the LGI1<sub>EPTP</sub>–ADAM22<sub>ECD</sub> interaction in the panel showing the LGI1<sub>EPTP</sub>–ADAM22<sub>ECD</sub> interface.

      (6) S7c also shows the cartoon of the structure. How is it possible that the local resolution is not much higher than 6 Å? The overall resolution was 3.8 Å? This looks like a figure of the density plotted at a low level, and not as stated a "surface representation". Could an extra panel be shown of the density plotted at a higher level? Also, please add Å to the legend in this figure.

      Local resolution maps of the 3:3 LGI1-ADAM22<sub>ECD</sub> complex were shown as Fig. S8 in the revised manuscript. According to this comment, the distribution of the resolution was plotted onto the density at high (0.06) and low (0.03) levels. "Å" was added to the legend in the figure.

      Reviewer #2 (Recommendations for the authors):

      (1) The study was conducted using the ectodomain (ECD) of ADAM22. It remains unclear whether the 3:3 complex could form if the transmembrane domain (TMD) of ADAM22 were included. In other words, it is difficult to assess whether the observed 3:3 complex represents plausible cis interactions.

      As mentioned in our reply to the first comment from Reviewer #1, we noticed that there was no obvious structural feature strongly suggesting that the heterohexameric 3:3 LGI1–ADAM22 is more likely to represent a cis complex or a trans complex. Both are possible at the synapse (and similarly, for LGI3–ADAM23 at the juxtaparanode of myelinated axons). Therefore, we revised the Introduction and Discussion sections as follows:

      Introduction: (about potential structural mechanisms of the 3:3 complex)

      “Similarly to the 2:2 complex, the 3:3 complex might serve as an extracellular scaffold to stabilize Kv1 channels or AMPA receptors in a trans-synaptic fashion. In addition, the 3:3 assembly in a cis fashion on the same membrane might regulate the accumulation of Kv1 channel complexes at axon initial segment. However, no clear evidence to prove these potential mechanistic roles of the 3:3 assembly has been provided, and the three-dimensional structure of the 3:3 complex has not yet been determined.”

      Discussion: (about a role of the LGI3–ADAM23 complex at the juxtaparanode of myelinated axons)

      “In this context, as discussed in (30), either or both of the 2:2 and 3:3 complexes might be formed in a trans fashion at the juxtaparanode of myelinated axons and bridge the axon and the innermost myelin membrane. Alternatively, the 3:3 complex formed in a cis fashion might positively regulate the clustering of the axonal Kv channels at the juxtaparanode, possibly in a similar manner at the axon initial segment.”

      *Ref. 30: Y. Miyazaki et al., Oligodendrocyte-derived LGI3 and its receptor ADAM23 organize juxtaparanodal Kv1 channel clustering for short-term synaptic plasticity. Cell Rep 43, 113634 (2024).

      (2) Page 2, line 1: "...caused by genetic mutations." - Specify the mutations involved. Which genes are mutated? Providing this information would enhance clarity and context.

      According to this comment, we rephrased the sentence as follows:

      "LGI1 is linked to epilepsy, a neurological disorder that can be caused by genetic mutations of genes regulating neuronal excitability (e.g., voltage- or ligand-gated ion channels)."

      (3) The experimental strategy and data for both cryo-EM and HS-AFM are of high quality. However, improvements are needed in the cryo-EM/structural figures to enhance clarity. Structural components should be labeled, and the protein interfaces should be identified within the overall complex figures in Figures 2 and 3, as the current presentation is challenging for general readers to follow. For example, in Figure 2, panel a would benefit from clear labeling to indicate the locations of ADAM22 and LGI1. Panels b and c lack context unless the authors specify which interface corresponds to panel a. Additionally, panels e and f are unlabelled, making it difficult to interpret the figures. Improved annotations and descriptions would significantly enhance figure accessibility and comprehension.

      Thank you for the constructive suggestion for enhancing accessibility and comprehension of cryo-EM/structural figures. According to this comment, we labeled structural components and indicated the protein interfaces as boxes in the overall complex figures in Figures 2 and 3. Further, in Figure 2, the locations that panels b and c show were indicated as two boxes in the close-up view in panel a.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) The data are generated using ATP read-out (CTG assay). For any inhibitor of mitochondrial function, ATP assays are highly sensitive reflecting metabolic stress, yet these do not necessarily translate into cell growth inhibition using standard Trypan blue assays and tend to overestimate the effects. Please show orthogonal more robust assays of cell growth or proliferation.

      We acknowledge the sensitivity of the ATP read-out assay in reflecting metabolic stress. While additional cell growth assays such as Trypan blue exclusion could provide further insights, we believe that the current ATP assay data robustly demonstrate the effect of the IMT and venetoclax combination on cellular metabolism, which is a critical aspect of our study. The scope of our current work focused on metabolic inhibition, and we suggest that future studies could further explore cell proliferation assays to complement these findings.

      (2) It is concluded that AML cells do not utilize glucose for ATP production. Please provide formal measurements of glycolysis/lactate upon combinatorial treatment.

      We appreciate the reviewer’s suggestion to include glycolysis and lactate measurements, which could indeed add further granularity to our metabolic analysis. However, the primary focus of our study is on mitochondrial function and oxidative phosphorylation (OXPHOS) in AML cells treated with IMT and venetoclax. We believe the data presented in Figure 3 provide strong support for the conclusion that glycolysis is not a major energy source in these cells.

      Specifically, in Figure 3C, we demonstrate that AML cells maintain ATP levels and viability when cultured in galactose, a condition that restricts ATP production through glycolysis and forces cells to rely on OXPHOS. This result strongly suggests that these AML cells are not dependent on glycolysis for ATP production. Furthermore, in Supplementary Figure S3B, we show that oxygen consumption rate (OCR) measurements remain stable in the presence of excess glucose, further supporting our conclusion that the cells do not switch to glycolysis when OXPHOS is inhibited.

      These findings collectively indicate a primary reliance on OXPHOS for energy generation in AML cells, consistent with our study’s objectives to explore mitochondrial dependency and the therapeutic potential of targeting mitochondrial transcription in AML. Future studies could certainly expand on these insights by incorporating a more detailed analysis of glycolytic flux and lactate production under combinatorial treatment, but we believe the current data are sufficient to support our main conclusions.

      (3) The transcriptome data are shown without any analysis of pathways. The conclusion from this data beyond the higher number of genes impacted in the combination arm is unclear. Please provide analysis for example GO pathways and interpret in the context of the drugs' mechanism of action.

      In response to the reviewer’s question, we have added gene ontology (GO) pathway analysis to clarify the transcriptomic impact of our combination treatment with IMT and venetoclax. Functional annotation identified significant enrichment in pathways relevant to innate immune response, mitochondrial function, and cellular signaling processes. Specifically, pathways associated with immune defense, mitochondrial signaling, and intracellular signaling were notably affected. These findings suggest that the combination treatment not only disrupts cellular energy metabolism but also potentially primes immune signaling mechanisms. This aligns with the proposed mechanism, where IMT targets mitochondrial transcription and venetoclax induces apoptosis, together enhancing sensitivity in AML cells. The enriched pathways, therefore, support the mechanism of action of both drugs, showing how the combined inhibition of BCL-2 and mitochondrial transcription creates a compounded cellular disruption that enhances the therapeutic effect.

      (4) Please demonstrate (could be in supplement) matrix of combination to support the statement that the combination is synergistic using Bliss index. The actual Bliss values are missing.

      For the revision, we have now included a matrix of combination treatment effects with the corresponding Bliss synergy index values to substantiate our claim of synergy between IMT and venetoclax. This analysis, provided in the supplement, demonstrates that the observed effects exceed the expected additive impact of each drug alone, as calculated by the Bliss independence model. Specifically, the Bliss values confirm a synergistic interaction in venetoclax-sensitive AML cell lines, highlighting that the combined treatment significantly enhances inhibition of cell viability and apoptosis induction compared to single treatments. This data supports our interpretation of synergy and strengthens the mechanistic conclusions drawn from our findings on the combination therapy’s efficacy.
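
      As an illustration of the Bliss independence calculation referred to above, the following is a minimal sketch, not the authors' analysis code. It assumes each drug effect is expressed as a fractional inhibition between 0 and 1; the function names and the example values are hypothetical. A positive excess over the Bliss expectation is conventionally read as synergy, and a negative excess as antagonism.

```python
def bliss_expected(fa: float, fb: float) -> float:
    """Expected fractional inhibition if drugs A and B act independently."""
    return fa + fb - fa * fb


def bliss_excess(observed: float, fa: float, fb: float) -> float:
    """Observed combination effect minus the Bliss expectation."""
    return observed - bliss_expected(fa, fb)


def bliss_matrix(single_a, single_b, combo):
    """Excess-over-Bliss for a dose matrix.

    single_a[i]: inhibition by drug A alone at dose i
    single_b[j]: inhibition by drug B alone at dose j
    combo[i][j]: inhibition by the combination at doses (i, j)
    """
    return [
        [bliss_excess(combo[i][j], single_a[i], single_b[j])
         for j in range(len(single_b))]
        for i in range(len(single_a))
    ]


# Hypothetical values, for illustration only (not data from the study).
example = bliss_matrix(
    single_a=[0.2, 0.4],
    single_b=[0.3],
    combo=[[0.6], [0.7]],
)
```

      Reporting the full matrix rather than a single summary value shows at which dose pairs the combination exceeds the additive expectation.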

      (5) Please show KG1 data (OCR), here or in Supplement.

      In response to the reviewer’s request to include OCR data for the KG-1 cell line, we would like to clarify that OCR measurements were attempted; however, they did not yield conclusive results. This is noted in the revised manuscript (Results section), where we explain that the KG-1 cell line did not provide usable OCR data, likely due to limitations in detecting reliable mitochondrial respiration in this particular line under our experimental conditions. Therefore, we were unable to include KG-1 OCR data in the main figures or the supplement.

      Reviewer #2:

      (1) It's important that the authors show that the drug's effects in AML are due to on-target inhibition. It's critical that they show that IMT actually inhibits the mito polymerase in the AML cells in the dose range employed.

      We appreciate the importance of demonstrating on-target inhibition of mitochondrial RNA polymerase by IMT1, especially in light of the detailed characterization of IMT1b, a closely related compound, as presented in Bonekamp et al., Nature 2020. The work by Bonekamp et al. established the specificity and efficacy of IMT1b in targeting mitochondrial RNA polymerase across various tumor models. Building on these findings, we designed our study to primarily evaluate the combinatorial efficacy of IMT1 with venetoclax in AML models, assuming a similar mechanism of action as described for IMT1b. While direct confirmation of on-target inhibition in AML cells by IMT1 would undoubtedly provide additional mechanistic insight, we focused on translational aspects in this study. We believe that the foundational work provided by Bonekamp et al. supports the assumption of on-target effects by IMT1, and we suggest that future studies could explicitly verify this in the context of AML.

      (2) For Fig 1, the stated synergism between Venetoclax (Vex) and IMT in p53 mutant THP1 cells is really not evident, despite what the statistical analysis says. In some ways, the more interesting conclusion is that inhibiting mitochondrial transcription does NOT potentiate the efficacy of Bcl2 inhibition in TP53 mutant AML.

      We appreciate the reviewer’s observation regarding the lack of evident synergy between IMT and venetoclax in TP53 mutant THP-1 cells. In line with this comment, we have now expanded the discussion to emphasize that, while statistical analysis suggested a potential interaction, the biological response in TP53 mutant cells was minimal. This contrasts with the strong synergy observed in TP53 wild-type cell lines, such as MV4-11 and MOLM-13. We have now highlighted that TP53 mutation status may limit the effectiveness of mitochondrial transcription inhibition in potentiating BCL-2 inhibition. This addition underscores the importance of mutation profiles, such as TP53 status, in predicting response to combination therapies in AML and is now clearly addressed in the revised discussion.

      (3) They combine IMT with Vex, but Vex plus azacytidine or decitabine is the approved therapy for AML. Any clinical trial would likely start with this backbone (like Vex+Aza). They should test combinations of IMT with Vex/Aza or Vex/Dec.

      While we recognize the importance of testing IMT in combination with clinically approved therapies like Vex+Aza, our current study was designed to explore the potential of IMT in combination with venetoclax alone. Expanding to other combinations would be an excellent direction for future research but is beyond the scope of our current investigation.

      (4) It's interesting that AML cell lines do not show any reliance on ATP generation from glycolysis, but would this still be the case when OxPhos is inhibited with IMT? Such a simple experiment would be much more interesting and could help them better understand the mechanism of IMT efficacy.

      We thank the reviewer for highlighting this point regarding the reliance of AML cell lines on glycolysis under OxPhos inhibition. In our study, we observed that AML cells predominantly rely on OxPhos, and we did test for ATP production in conditions that favored glycolysis by growing AML cells with galactose instead of glucose in the medium. As described in the manuscript, we did not observe significant ATP production or cell viability from glycolysis, even under these conditions. This finding suggests that AML cells have a low capacity to adapt to glycolytic ATP generation when OxPhos is disrupted by IMT, reinforcing the view that they are highly dependent on mitochondrial function for energy production. We agree that this adaptation—or lack thereof—is an intriguing aspect of IMT efficacy in targeting energy metabolism in AML cells, and we have clarified this point in the discussion.

      (5) OxPhos measurements need statistical analyses.

      We appreciate the reviewer’s suggestion to include statistical analyses for the OXPHOS measurements. We would like to clarify that statistical analyses were included in the initial submission. These are detailed in Figure 3 and its legend, as well as in the Statistical Analysis section, where we specify methods such as the calculation of standard error across replicates. This approach was implemented to ensure the rigor of our OCR data and its conclusions on OXPHOS inhibition in AML cells.
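
      For readers unfamiliar with the calculation, a standard error across replicates can be sketched as follows. This is a generic illustration using the sample standard deviation, not the authors' pipeline; the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for replicate measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are needed for a standard error")
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return mean(values), stdev(values) / sqrt(n)
```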

      (6) Given that the combo-treated mice do not exhibit much leukemia in the blood through ~180 days, and yet start dying after 100 days, the authors should comment on this, given that the bone marrow has been shown to be a refuge that protects leukemia cells from various therapies.

      We thank the reviewer for highlighting the observed discrepancy between peripheral blood leukemia levels and survival in combo-treated mice. While leukemic cells were minimally detected in the blood up to approximately 180 days, treated mice began to show signs of disease progression and reduced survival around 100 days. This may suggest that residual leukemic cells persist within the bone marrow, which has been established as a sanctuary site for leukemic cells, providing protection from various therapies. The bone marrow environment likely supports a survival niche, enabling these residual cells to evade treatment effects and potentially initiate disease relapse. We have added this interpretation to the discussion to acknowledge the possibility of bone marrow as a protective refuge, which may limit the full eradication of leukemia in these models despite apparent peripheral blood clearance.

      (7) For Fig 5C, the authors should statistically compare the Combo with Vex alone.

      We have now included statistical comparisons between the combination treatment and venetoclax alone in Fig 5C to provide a clearer interpretation of the data.

      (8) The analyses of gene expression using RNAseq of harvested leukemia cells from the PDX model (Table S2), some more discussion of these results would be helpful, particularly given that neither drug is directly targeting nuclear gene expression.

      We thank the reviewer for their suggestion to discuss the RNAseq findings in more detail. In the revised manuscript, we have expanded on the functional annotation of the gene expression changes observed in leukemia cells from the PDX model following combination treatment (Table S2). The enriched pathways include innate immune involvement, mitochondrial function and immune signaling, and intracellular signaling. This suggests that while neither IMT nor venetoclax directly targets nuclear gene expression, the combined treatment induces secondary effects that alter these pathways, potentially contributing to the treatment’s efficacy in AML. This expanded discussion provides greater insight into how the drug combination impacts gene expression and cellular pathways.

      (9) We need more information on the PDX models, in terms of the classification (M1 to M6) of the patient AMLs and genetics (specific mutations, not just the genes mutated, and chromosomal alterations).

      Additional details regarding the classification and genetic background of the PDX models have been included in the manuscript to better contextualize our findings.

      (10) The authors should discuss whether or not IMT represents an improvement over other therapies intended to target Oxphos in AML (clearly, the low toxicity of IMT is a plus, at least in mice).

      We appreciate the reviewer’s suggestion to discuss IMT in comparison with other OXPHOS-targeting therapies for AML. In the revised discussion, we highlight IMT’s unique properties, particularly its low toxicity profile, which may offer advantages over other OXPHOS inhibitors. This low toxicity, demonstrated in preclinical studies, suggests that IMT might improve patient tolerability compared to existing therapies that target mitochondrial function.

      (11) The authors examined toxicity by weighing the mice and performing CBCs. Measurements of liver and kidney toxicity will be necessary for further clinical development.

      We thank the reviewer for the suggestion to further investigate liver and kidney toxicity. In our study, we assessed toxicity through regular weight monitoring and complete blood counts (CBCs) to evaluate overall health status. While additional liver and kidney toxicity measurements will indeed be important in future studies, resource limitations currently prevent us from performing these additional analyses in this model. We agree that these assessments will be essential as we progress towards clinical development, and we plan to address them in upcoming preclinical studies.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      The Reviewer asks that we provide the source of PDGF-AB/BB proteins.

      We apologize for omitting this information. We now give the source of PDGF-AB/BB in the Methods as PeproTech. In our revised manuscript we state on page 7, line 142: “Cells were then treated with recombinant human PDGF-AB (40ng/ml; PeproTech, 10770584) or -BB (20ng/ml; PeproTech, 10771918) for 5 days.”

      The Reviewer asks that we adequately report our chosen irradiation parameters suggesting that we consider (PMCID: PMC5495460) for appropriate parameter reporting.

      We thank the Reviewer for this excellent suggestion. We now report the irradiation parameters in more detail, following the shared manuscript, in Page 9, line 10, line 204.

      The Reviewer requests more details about the age range to distinguish young from old donors.

      In the Methods section of our revised manuscript, we now provide the age ranges: our older donors were between 53 and 67 years of age, while our younger donors ranged between 19 and 27 years of age. These changes are reflected on page 6, line 128: “Human degenerated NP and AF tissues (Grade IV or V on Pfirrmann grade; 64.6 ±8.5 years old) were obtained as the surgical waste from donors with discogenic pain, with each donor providing written informed consent. Healthy NP and AF cells (23.0 ±3.7 years old) were gifted by Professor Lisbet Haglund from McGill University (Tissue Biobank #2019-4896).”

      The Reviewer wonders about the rationale for using different concentrations of PDGF-AB/BB in the degenerate cell and irradiation experiments.

      We apologize for our lack of clarity. We initially treated cells with different concentrations (20 and 40 ng/ml) of PDGF-AB/BB to establish a dose-response. From our MTT and gene expression analyses, we determined that 20 ng/ml was sufficient to elicit significant changes in cell proliferation markers, including MKI67, CCNB1 and CCND1. Increasing the concentration of either growth factor to 40 ng/ml did not significantly influence these parameters. However, for our bulk RNA-seq experiments, we reasoned that 40 ng/ml of PDGF-AB might reveal clearer changes in signaling molecules, since its effects on cell growth were maximal at this concentration, while PDGF-BB was maintained at 20 ng/ml based on its efficacy in our mitogenic response.

      The Reviewer asks that we consider describing the effects of PDGF-AB/BB as mitigating or therapeutic rather than protective both in the title and throughout the manuscript.

      We agree with the Reviewer’s recommendation, and we have now changed the title to “Therapeutic effects of PDGF-AB/BB against cellular senescence in human intervertebral disc”. Moreover, we implemented this change in the revised manuscript as requested.

      The Reviewer believes that changes in the NP are more clinically evident (by imaging methods), despite degeneration often initiating from the AF (annulus fibrosus), e.g., through tears/microtears, and would like us to reflect this in our revised manuscript.

      We agree with the Reviewer’s comment, and we thank them for this added accuracy. On this basis, we have now corrected our language in the introduction, stating on page 4, line 68: “To date, the main focus of IVD cell studies has been on the NP, as changes in the NP are easily detected through imaging techniques like MRI, making it the most visible indicator of disc degeneration in clinical practice. In addition, NP plays a crucial role in the progression of IVD degeneration due to its susceptibility to significant structural and functional changes during aging and degeneration.”

      The Reviewer points out a prior study which examined the effects of X-ray irradiation on NF-kB signaling in young and aged IVDs (PMCID: PMC5495460) suggesting that we include this reference in our revised manuscript.

      We thank the Reviewer for this suggestion, and we now reference this elegant study in the discussion section of our revised manuscript. On page 20, line 440, we state: “In fact, it has been shown that NF-kB signaling was elevated in mouse IVDs exposed to a single 20 Gy dose of irradiation in an ex vivo culture model.”

      The Reviewer asks that our experimental methods are described in the order of the experimental workflow. For example, section 2.2 describes RNA sequencing, which is a terminal assay. Section 2.2 may be more appropriate for detailing the methods of PDGF-AB/BB treatment, along with the rationale.

      We thank the Reviewer for pointing this out and have reorganized the Methods section accordingly.

      Reviewer #2:

      The Reviewer requests more experimental details in the methodology including the rationale for such methods/conditions as well as specific culture models utilized, substrates, cell density, and media components.

      We apologize for our lack of clarity. We have now revised the Methods section based on these comments.

      The Reviewer asks about the quantitative data for the β-galactosidase assay and immunofluorescence of senescence-associated proteins such as P21 and P16.

      We apologize for omitting this information. We have now included the quantification of P21- and P16-positive cells, which is presented in the revised Figure 4. For the β-galactosidase assay, we were unable to quantify the percentage of positive cells because we did not perform nuclei staining, making it difficult to accurately determine the total cell number. Instead, we provide representative images showing the full field of view at 10X magnification using an Echo microscope.

      The Reviewer requests the protein level data of PDGFRA to determine if the transcripts are being translated to protein.

      We thank the Reviewer for this suggestion. The protein expression of PDGFRA has been included in Supplementary Figure 2. We found that PDGFRA protein levels were decreased in both NP and AF cells in response to PDGF treatments. It is known that upon binding with PDGF ligands, PDGFRA undergoes rapid internalization and degradation, a mechanism that prevents overstimulation of the signaling pathway (doi: 10.1042/BST20200004). The upregulated gene expression is probably an attempt to compensate for this degradation and supports continued activation of PDGFRA signaling, emphasizing its crucial role in the response to PDGF treatment. Thus, we added the following to the discussion section on page 22, line 486: “Interestingly, while mRNA level was increased in PDGF treated NP cells, its protein level was decreased, highlighting the complexity in PDGF receptor dynamics. Upon binding with PDGF ligands, PDGFRA is known to undergo rapid internalization and degradation, a mechanism that prevents overstimulation of the signaling pathway (Rogers and Fantauzzo 2020). The upregulated gene expression is probably an attempt to compensate for this degradation and supports continued activation of PDGFRA signaling, emphasizing its crucial role in response to the PDGF treatment.”

      The Reviewer points out that our conclusion that “PDGF do not mediate their effects via the PDGFRA” is not supported by the current data asking that further discussion, interpretation, and direct comparison of the nucleus pulposus and annulus fibrosus data sets be presented to the readers.

      We thank the Reviewer for the insightful comment. In page 20, line 432, we have corrected our language to now state: “In contrast, while PDGF treatment alleviated the senescent phenotype in AF cells, it also induced changes in pathways such as response to mechanical stimuli and neurogenesis, which were distinct from those in NP cells. This indicates that the treatment enhanced IVD functionality through different mechanisms within the two compartments.”

      The Reviewer cannot appreciate the changes in S-phase between control and treated groups.

      We apologize for the poor quality of the figure in our initial submission. We analyzed the data in S phase and included them in our revised Figures 5C and 5F.

      The Reviewer believes that discectomies are typically not performed on patients with discogenic back pain but on patients who are undergoing surgery for a herniated disc.

      We agree with the Reviewer, and we have corrected our language in the revised manuscript. On page 6, line 128, we now state: “Human degenerated NP and AF tissues (Grade IV or V on Pfirrmann grade; 64.6 ±8.5 years old) were obtained as the surgical waste from donors with disc herniation, with each donor providing written informed consent.”

      The Reviewer asks about the protein-protein interactions in AF cells.

      We thank the Reviewer for this suggestion, and we have now included this analysis in Figure 3.

      The Reviewer requests more details about the protocol and doses for the irradiation studies.

      In the revised manuscript, we added this information in page 10, line 204.

      The Reviewer asks whether the gene expression of PDGFRA was increased or decreased in irradiated cells compared to non-irradiated cells.

      The gene expression of PDGFRA was decreased in NP cells exposed to irradiation compared to non-irradiated cells. The data are shown in Figure 4 and their description in the text is in page 17, line 411.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zhao and colleagues employ Drosophila nephrocytes as a model to investigate the effects of a high-fat diet on these podocyte-like cells. Through a highly focused analysis, they initially confirm previous research in their hands demonstrating impaired nephrocyte function and move on to observe the mislocalization of a slit diaphragm-associated protein (pyd). Employing a reporter construct, they identify the activation of the JAK/STAT signaling pathway in nephrocytes. Subsequently, the authors demonstrate the involvement of this pathway in nephrocyte function from multiple angles, using a gain-of-function construct, silencing of an inhibitor, and ectopic overexpression of a ligand. Silencing the effector Stat92E via RNAi or inhibiting JAK/STAT with Methotrexate effectively restored impaired nephrocyte function induced by a high-fat diet, while showing no impact under normal dietary conditions.

      Strengths:

      The findings establish a link between JAK/STAT activity and the impact of a high-fat diet on nephrocytes. This nicely underscores the importance of organ crosstalk for nephrocytes and supports a potential role for JAK/STAT in diabetic nephropathy, as previously suggested by other models.

      Weaknesses:

      The analysis is overly reliant on tracer endocytosis and single lines. Immunofluorescence of slit diaphragm proteins would provide a more specific assessment of the phenotypes.

      We thank the reviewer for the positive comments and pointing out that slit diaphragm markers would provide a more specific assessment of the phenotypes. In our revised manuscript, we used Sns-mRuby3, in which mRuby3 was tagged endogenously at the C-terminal of Sns (PMID: 39195240 and PMID: 39431457), to show the slit diaphragm pattern.

      Reviewer #2 (Public Review):

      Summary:

      In their manuscript, Zhao et al. describe a link between a high-fat diet and JAK-STAT pathway activation in nephrocytes. Nephrocytes are the homologs of mammalian podocytes, and it has previously been shown that metabolic syndrome and obesity are associated with worse outcomes for chronic kidney disease. A study from 2021 (Lubojemska et al.) already confirmed a severe nephrocyte phenotype upon feeding Drosophila a high-fat diet and also linked lipid overflow to nephrocyte dysfunction by expressing adipose triglyceride lipase in the fat body. In this study, the authors identify a second pathway and mechanism by which lipid dysregulation impacts nephrocyte function. In detail, they show activation of JAK-STAT signaling in nephrocytes upon feeding flies a high-fat diet, which was induced by expression of Upd2 (a leptin-like hormone) in the fat body, the adipose tissue of Drosophila. Further, they show that genetic and pharmacological interventions can reduce JAK-STAT activation and thereby prevent the nephrocyte phenotype in the high-fat diet model.

      Strengths:

      The strength of this study is the combination of genetic tools and pharmacological intervention to confirm a mechanistic link between the fat body/adipose tissue and nephrocytes. Inter-organ communication is crucial in the development of several diseases, but the underlying mechanisms are only poorly understood. Using Drosophila, it is possible to investigate several players of one pathway, here JAK-STAT. This was done, by investigating the functional role of Hop, Socs36E, and Stat92E in nephrocytes and has also been combined with feeding a high-fat diet, to assess restoration of nephrocyte function by inhibiting JAK-STAT signaling. Adding a translational approach was done by inhibiting JAK-STAT signaling with methotrexate, which also resulted in attenuated nephrocyte dysfunction. Expression of the leptin-like hormone upd2 in the fat body is a good approach to studying inter-organ communication and the impact of other organs/tissue on nephrocyte function and expands their findings from nephrocyte function towards whole animal physiology.

      Weaknesses:

      Although the general findings of this study are of great interest, there are some weaknesses in the study, which should be addressed. Overall, the number of flies investigated for the majority of the experiments is very low (6 flies) and it is not clear whether the flies used, are from independent experiments to exclude problems with food/diet. For the analysis, the mean values of flies should be calculated, as one fly can be considered a biological replicate, but not all individual cells. By increasing the number of flies investigated, statistical analysis will become more solid. In addition, the morphological assessment is rather preliminary, by only using a Pyd antibody. Duf or Sns should be visualized as well, also the investigation of the different transgenic fly strains studying the importance of JAK-STAT signaling in nephrocytes needs to include a morphological assessment. Moreover, the expected effect of feeding a high-fat diet on nephrocytes needs to be shown (e.g. by lipid droplet formation) and whether upd2 is actually increased here should also be assessed. The time points of assessment vary between 1, 3, and 7 days and should be consistent throughout the study or the authors should describe why they use different time points.

      We thank the reviewer for the comments and suggestions. HFD causes an enlarged crop (Liao et al, 2021, PMID: 33171202) and accumulation of lipid droplets in the intestine. To exclude problems with different batches of food/diet, we checked the crop and the intestine during sample preparation as indicators of food consistency.

      We followed the suggestion to use per-fly mean values in the data analysis; each fly was considered one biological replicate in the revised version. We added another slit diaphragm protein reporter, Sns-mRuby3, in which the mRuby3 fluorescent protein was tagged at the C-terminus of endogenous Sns. This reporter was used to show the effects on slit diaphragm protein of HFD, of manipulating the Jak/Stat pathway (ppl-Gal4>upd2 and dot-Gal4>UAS-Stat92E-RNAi), and of drug treatment.
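
      The per-fly averaging described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script; it assumes each nephrocyte measurement is stored as a (fly identifier, value) pair, and the function name and identifiers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def per_fly_means(measurements):
    """Collapse per-cell measurements to one mean value per fly.

    measurements: iterable of (fly_id, value) pairs, one pair per nephrocyte.
    Returns a dict mapping fly_id to the mean of that fly's cells, so that
    each fly contributes a single data point to downstream statistics.
    """
    by_fly = defaultdict(list)
    for fly_id, value in measurements:
        by_fly[fly_id].append(value)
    return {fly_id: mean(values) for fly_id, values in by_fly.items()}
```

      Statistical tests are then run on the per-fly values, so the sample size equals the number of flies rather than the number of cells.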

      Lubojemska et al 2021 (PMID: 33945525) showed that HFD leads to lipid droplet accumulation in larval nephrocytes. Following the reviewer’s suggestion, we stained the adult nephrocytes with Nile red and found lipid droplet formation caused by HFD, verifying the HFD effects on lipid droplet accumulation.

      Regarding the timepoints, the newly eclosed flies (1-day old) were treated for 7 days (transferred to fresh diet or shifted from 18 to 29 °C for 7 days to induce target gene expression). Thus, the flies were 7 days old. In the revised manuscript, we changed “1-day-old females” to “7-day-old females” in the figure legend. The exception was Figure 4, panels G and H, where we used Day 3 for the UAS-hop.Tum overexpression in the flp-out clones, which differs from the HFD approach (Day 7). This is because Hop.Tum is a strong gain-of-function mutation. UAS-hop.Tum overexpression in the eye imaginal disc leads to apoptosis via up-regulation of the proapoptotic gene hid (Bhawana Maurya et al, 2021, PMID: 33824299). Thus, we used Day 3 for this experiment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There are relevant issues, that should be addressed:

      Major:

      - The analysis of JAK/STAT signaling in nephrocytes is limited to nephrocyte function, despite the nice slit diaphragm phenotype shown in Figure 2A. What happens to the slit diaphragm in the other genotypes, the rescue settings in particular? Immunofluorescence of Pyd should be explored for all conditions to evaluate proper phenocopy. Tracer endocytosis is much less specific.

      We thank the reviewer for the suggestion. We made a transgenic line Sns-mRuby3, in which mRuby3 was tagged to the endogenous Sns C-terminal. It has been used as a slit diaphragm reporter (PMID: 39195240 and PMID: 39431457). Apart from the tracer assays, we used Sns-mRuby3 reporter and/or Pyd staining to visualize the changes in slit-diaphragm structures.

      - The interventions are restricted to single RNAi lines and reporters, raising concerns about specificity/potential off-targets. Additional lines should be tested for verification.

      Different versions of RNAi lines are available for targeting fly genes. For UAS-Socs36E-RNAi, we chose the one that was generated with a short hairpin, which is known to restrict off-target effects (Ni et al, 2011, PMID: 21460824). For UAS-Stat92E-RNAi, we added an independent RNAi line (Figure 6 - figure supplement 1 and 2).

      Minor:

      - In Figure 2C, the image of HFD shows a section that cuts through the surface at a shallower angle, making everything appear blurry. This image should be replaced.

      We replaced Figure 2C (the image of HFD) with another one.

      - What is the relevance (if any) of reduced electrodense vacuoles with a high-fat diet? An effect on endocytic trafficking/endosome architecture remains unexplored.

      Lubojemska et al (PMID: 33945525) studied the endocytic trafficking/endosome architecture of larval nephrocytes and found that HFD impaired endocytosis. We studied adult pericardial nephrocytes. It is very likely that the endocytic trafficking/endosome architecture is also affected by HFD in adult nephrocytes.

      - How do the findings presented in this manuscript correlate with a similar study by Lubojemska et al.? At least the discussion should provide more evaluation of this aspect.

      Lubojemska et al (PMID: 33945525) assayed larval nephrocytes and found that a HFD leads to ectopic accumulation of lipid droplets in the nephrocytes and decreased endocytosis. They further demonstrated that lipid droplet lipolysis and PGC1α counteract the harmful effects of a HFD. We performed Nile red staining and verified the accumulation of lipid droplets in adult pericardial nephrocytes upon HFD feeding, which agrees with the findings of Lubojemska et al. We found that a HFD activates the Jak/Stat pathway, which mediates the nephrocyte functional defects. A previous study showed that Stat1 has an inhibitory effect on PGC1α transcription (PMID: 26689548). Further study is needed to investigate the interaction between the Jak/Stat pathway and PGC1α transcription. We added this information to the discussion.

      - Please check spelling and grammar.

      Reviewer #2 (Recommendations For The Authors):

      (1) Which cells are investigated? Please state.

      Pericardial nephrocytes were used in this study. This information was added to the Results section.

      (2) Rephrase 'chronic kidney disease model'. Feeding for 7 days and assessment after 7 days cannot be considered chronic as flies can live more than 60 days.

      Lubojemska et al (PMID: 33945525) fed the newly hatched larvae with a HFD and used the third instar larvae for the experiments. The term “chronic kidney disease” has been used in the reference PMID: 33945525. It takes about 4 days for fly larvae to develop from the first instar to the third instar. Thus, the animals were fed on the HFD for only 4 days. In this regard, feeding for seven days might be considered as chronic.

      (3) Line 89: Curran et al., 2014). with risk increasing risk as BMI increases (Hsu et al., 2006). Please correct this sentence.

      We thank the reviewer for finding the error. In the revised version, the sentence was changed to “with increasing risk as BMI increases (Hsu et al., 2006)”.

      (4) Figure 1: The authors should explain why they use FITC-Albumin and 10kDA dextran, what are the differences, and why are both used?

      The tracers differ in size (70 kDa FITC-Albumin and 10 kDa dextran). Both have been used in previous publications (Zhao et al 2024, PMID: 39431457 and Weavers et al 2009, PMID: 18971929) to show that nephrocytes can efficiently take up tracers of different sizes.

      (5) Figure 3: The JAK-STAT sensor was used on Day 1 to confirm activation of JAKSTAT signaling, which means a very fast response towards the HFD after 24hrs. How is the activation after 7 days? The nephrocyte assessment in Figures 1 and 2 is done at the later time point, how about earlier time points in HFD? One would expect an earlier phenotype as well if JAK-STAT signaling is causative.

      In Figure 3C, newly eclosed flies (1-day old) were fed a control diet or a HFD for 7 days. Thus, the legend should read “7-day-old females”. We apologize for the confusion. The caption was updated to “7-day-old females”.

      (6) Figure 4H: I don't understand how many cells or flies are depicted and analysed? Are the dots one nephrocyte from 4 flies? If yes, the numbers need to be increased.

      In Figure 4H, we quantified 5 UAS-hop.Tum clones and 5 neighbor cells. We only found 5 clones from 4 flies. We didn’t quantify all the nephrocytes, since we compared each clone with its neighbor cell. To make it easier to follow, we changed the description to “n= 5 clones and 5 neighbor cells”.

      (7) Figure 4: Why are flies investigated at different ages? Day 1 vs Day 3? This should be consistent with the HFD approach and day 7. Or investigate the HFD at earlier time points as well.

      In Figure 4, the newly eclosed flies (1-day old) were shifted from 18 to 29 °C for 7 days to induce target gene expression. Thus, the flies were 7 days old. In the revised manuscript, we changed “1-day-old females” to “7-day-old females” in the figure legend. We used Day 3 for the UAS-hop.Tum overexpression in the flp-out clones, which differs from the HFD approach (Day 7). This is because Hop.Tum is a strong gain-of-function mutation. UAS-hop.Tum overexpression in the eye imaginal disc leads to apoptosis by up-regulating the proapoptotic gene hid (Bhawana Maurya et al, 2021, PMID: 33824299). Thus, we used Day 3 for this experiment.

      (8) Figure 5: Do the authors see upd2-GFP in the nephrocyte or at the nephrocyte? Is upd2 filtered to bind the JAK-STAT-receptor? They should show this, which is easy to do due to the GFP label.

      We thank the reviewer for the suggestion. We examined the nephrocytes from ppl-Gal4>upd2-GFP flies and found Upd2-GFP in the nephrocytes. We further showed that ppl-Gal4 was not expressed in the nephrocytes, suggesting that Upd2-GFP is secreted from the fat body and transported to the nephrocytes. We stained the nephrocytes for Pyd and found a compromised fingerprint pattern caused by Upd2-GFP expression in the fat body. The data were added to Figure 5 - figure supplement 1.

      (9) Figure 5: What are the upd2 levels after day 1 and compared to HFD at day 7? In the Rajan et al manuscript, upd2 levels have been assessed by qPCR, this can be done here as well. Although there is a mechanistic link shown here, I think it would be interesting to test the upd2 levels at the different time points assessed.

      In the Rajan et al manuscript, they showed that the expression of upd2 was up regulated by HFD. My previous work showed that HFD changes taste perception. We performed qPCR to determine the expression of upd2 and verified that upd2 was upregulated in HFD fed flies (Yunpo Zhao et al. 2023. PMID: 37934669). We included the reference in the revised version.

      (10) Figure 6: Does a Socs36E overexpression e.g. with the Bloomington strain 91352 also rescue the HFD phenotype, by blocking JAK-STAT signaling?

      We thank the reviewer for the suggestion. We tested the effect of Socs36E overexpression and observed that UAS-Socs36E can partially rescue the HFD-induced nephrocyte functional decline. The data were not included in the revised manuscript. Notably, apart from having an inhibitory effect on Jak/Stat signaling, Socs36E represses the MAPK pathway (Amoyel et al, 2016, PMID: 26807580).

      (11) Figure 7: What is the control for the methotrexate treatment? What is the solvent?

      We used DMSO as the solvent for methotrexate and used it as the control for the methotrexate treatment. We added the following sentences to the Methods section: “Methotrexate (06563, Sigma-Aldrich, MO) was dissolved in DMSO to make a 10 mM stock solution”, and “The samples incubated in Schneider’s Medium supplemented with DMSO vehicle were used as a control”.

      (12) Why did the authors use Dot-Gal4 for the Socs36E knockdown and Dot-Gal4ts for the Stat92E knockdown?

      We used Dot-Gal4ts and temperature shifting to restrict the Stat92E knockdown at adult stages.

      (13) Supplementary Figure 1: Please add the individual data to the figure as done for all other figures.

      We thank the reviewer for this comment. The individual data points were added to the figure as suggested.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      The authors use microscopy experiments to track the gliding motion of filaments of the cyanobacteria Fluctiforma draycotensis. They find that filament motion consists of back-and-forth trajectories along a "track", interspersed with reversals of movement direction, with no clear dependence between filament speed and length. It is also observed that longer filaments can buckle and form plectonemes. A computational model is used to rationalise these findings.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      Much work in this field focuses on molecular mechanisms of motility; by tracking filament dynamics this work helps to connect molecular mechanisms to environmentally and industrially relevant ecological behavior such as aggregate formation.

      The observation that filaments move on tracks is interesting and potentially ecologically significant.

      The observation of rotating membrane-bound protein complexes and tubular arrangement of slime around the filament provides important clues to the mechanism of motion.

      The observation that long filaments buckle has the potential to shed light on the nature of mechanical forces in the filaments, e.g. through the study of the length dependence of buckling.

      We thank the reviewer for listing these positive aspects of the presented work.

      Weaknesses:

      The manuscript makes the interesting statement that the distribution of speed vs filament length is uniform, which would constrain the possibilities for mechanical coupling between the filaments. However, Figure 1C does not show a uniform distribution but rather an apparent lack of correlation between speed and filament length, while Figure S3 shows a dependence that is clearly increasing with filament length. Also, although it is claimed that the computational model reproduces the key features of the experiments, no data is shown for the dependence of speed on filament length in the computational model. The statement that is made about the model "all or most cells contribute to propulsive force generation, as seen from a uniform distribution of mean speed across different filament lengths", seems to be contradictory, since if each cell contributes to the force one might expect that speed would increase with filament length.

      We agree that the data shows in general a lack of correlation, rather than strictly being uniform. In the revised manuscript, we intend to collect more data from observations on glass to better understand the relation between filament length and speed.

      In considering longer filaments, one also needs to consider the increased drag created by each additional cell - in other words, overall friction will either increase or be constant as filament length increases. Therefore, if only one cell (or few cells) are generating motility forces, then adding more cells in longer filaments would decrease speed.

      Since the current data does not show any decrease in speed with increasing filament length, we stand by the argument that the data supports that all (or most) cells in a filament are involved in force generation for motility. We would revise the manuscript to make this point - and our arguments about assuming multiple / most cells in a filament contributing to motility - clear.
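
      The drag argument above can be illustrated with a minimal sketch. It is not the model used in the manuscript: it assumes overdamped motion, a drag that grows linearly with the number of cells, and an identical thrust from every active cell. The function name `filament_speed` and the unit values of force and drag are hypothetical and chosen only for illustration.

```python
def filament_speed(n_cells, n_active, force_per_cell=1.0, drag_per_cell=1.0):
    """Steady-state speed of an overdamped filament: total thrust / total drag.

    Assumptions (illustrative only): every cell adds the same drag, and every
    active cell contributes the same propulsive force.
    """
    if n_cells <= 0 or not 0 <= n_active <= n_cells:
        raise ValueError("need n_cells > 0 and 0 <= n_active <= n_cells")
    total_thrust = n_active * force_per_cell
    total_drag = n_cells * drag_per_cell
    return total_thrust / total_drag


if __name__ == "__main__":
    for n in (10, 50, 100, 200):
        all_cells = filament_speed(n, n_active=n)  # every cell pushes
        one_cell = filament_speed(n, n_active=1)   # a single motor cell
        print(f"{n:4d} cells: all active -> {all_cells:.3f}, one active -> {one_cell:.3f}")
```

      Under these assumptions, the speed is independent of length when all cells push, and falls as 1/N when only one cell pushes, which is the contrast the argument relies on.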

      The computational model misses perhaps the most interesting aspect of the experimental results which is the coupling between rotation, slime generation, and motion. While the dependence of synchronization and reversal efficiency on internal model parameters are explored (Figure 2D), these model parameters cannot be connected with biological reality. The model predictions seem somewhat simplistic: that less coupling leads to more erratic reversal and that the number of reversals matches the expected number (which appears to be simply consistent with a filament moving backwards and forwards on a track at constant speed).

      We agree that the coupling between rotation, slime generation and motion is interesting and important when studying the specific mechanism leading to filament motion. However, we believe it is even more fundamental to consider the intercellular coordination that is needed to realise this motion. Individual filaments are a collection of independent cells. This raises the question of how they can coordinate their thrust generation in such a way that the whole filament can both move and reverse direction of motion as a single unit. With the presented model, we want to start addressing precisely this point.

      The model allows us to qualitatively understand the relation between coupling strength and reversals (erratic vs. coordinated motion of the filament). It also provides a hint about the possibility of de-coordination, which we then look for and identify in longer filaments.

      While the model’s results seem obvious in hindsight, the analysis of the model allows phrasing the question of cell-to-cell coordination, which so far has not been brought up when considering the inherently multi-cell process of filament motility.

      Filament buckling is not analysed in quantitative detail, which seems to be a missed opportunity to connect with the computational model, eg by predicting the length dependence of buckling.

      Please note that Figure S10 provides an analysis of filament length and number of buckling instances observed. This suggests that buckling happens only in filaments above a certain length.

      We do agree that further analyses of buckling - both experimentally and through modelling would be interesting. This study, however, focussed on cell-to-cell coupling / coordination during filament motility. We have identified the possibility of de-coordination through the use of a simple 1D model of motion, and found evidence of such de-coordination in experiments. Notice that the buckling we report does not depend on the filament hitting an external object. It is a direct result of a filament activity which, in this context, serves as evidence of cellular de-coordination.

      Now that we have observed buckling and plectoneme formation, these processes need to be analysed with additional experiments and modelling. The appropriate model for this process needs to be 3D, and should ideally include torques arising from filament rotation. Experimentally, we need to identify means of influencing filament length and motion and see if we can measure buckling frequency and position across different filament lengths. These works are ongoing and will have to be summarised in a separate, future publication.

      Reviewer #2 (Public review):

      Summary:

      The authors combined time-lapse microscopy with biophysical modeling to study the mechanisms and timescales of gliding and reversals in filamentous cyanobacterium Fluctiforma draycotensis. They observed the highly coordinated behavior of protein complexes moving in a helical fashion on cells' surfaces and along individual filaments as well as their de-coordination, which induces buckling in long filaments.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      The authors provided concrete experimental evidence of cellular coordination and de-coordination of motility between cells along individual filaments. The evidence is comprised of individual trajectories of filaments that glide and reverse on surfaces as well as the helical trajectories of membrane-bound protein complexes that move on individual filaments and are implicated in generating propulsive forces.

      We thank the reviewer for listing these positive aspects of the presented work.

      Limitations:

      The biophysical model is one-dimensional and thus does not capture the buckling observed in long filaments. I expect that the buckling contains useful information since it reflects the competition between bending rigidity, the speed at which cell synchronization occurs, and the strength of the propulsion forces.

      Cell-to-cell coordination is a more fundamental phenomenon than the buckling and twisting of longer filaments, in that the latter is a consequence of limits of the former. In this sense, we are focussing here on something that we think is the necessary first step to understand filament gliding. The 3D motion of filaments (bending, plectoneme formation) is fascinating and can have important consequences for collective behaviour and macroscopic structure formation. As a consequence of cellular coupling, however, it is beyond the scope of the present paper.

      Please also see our response above. We believe that the detailed analysis of buckling and plectoneme formation requires (and merits) dedicated experiments and modelling which go beyond the focus of the current study (on cellular coordination) and will constitute a separate analysis that stands on its own. We are currently working in that direction.

      Future directions:

      The study highlights the need to identify molecular and mechanical signaling pathways of cellular coordination. In analogy to the many works on the mechanisms and functions of multi-ciliary coordination, elucidating coordination in cyanobacteria may reveal a variety of dynamic strategies in different filamentous cyanobacteria.

      We thank the reviewer for highlighting this point again and seeing the value in combining molecular and dynamical approaches.

      Reviewer #3 (Public review):

      Summary:

      The authors present new observations related to the gliding motility of the multicellular filamentous cyanobacteria Fluctiforma draycotensis. The bacteria move forward by rotating about their long axis, which causes points on the cell surface to move along helical paths. As filaments glide forward they form visible tracks. Filaments preferentially move within the tracks. The authors devise a simple model in which each cell in a filament exerts a force that either pushes forward or backwards. Mechanical interactions between cells cause neighboring cells to align the forces they exert. The model qualitatively reproduces the tendency of filaments to move in a concerted direction and reverse at the end of tracks.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      The observations of the helical motion of the filament are compelling. The biophysical model used to describe cell-cell coordination of locomotion is clear and reasonable. The qualitative consistency between theory and observation suggests that this model captures some essential qualities of the true system.

      The authors suggest that molecular studies should be directly coupled to the analysis and modeling of motion. I agree.

      We thank the reviewer for listing these positive aspects of the presented work and highlighting the need for combining molecular and biophysical approaches.

      Weaknesses:

      There is very little quantitative comparison between theory and experiment. It seems plausible that mechanisms other than mechano-sensing could lead to equations similar to those in the proposed model. As there is no comparison of model parameters to measurements or similar experiments, it is not certain that the mechanisms proposed here are an accurate description of reality. Rather the model appears to be a promising hypothesis.

      We agree with the referee that the model we put forward is one of several possible. We note, however, that the assumption of mechanosensing by each cell - as done in this model - results in capturing both the alignment of cells within a filament (with some flexibility) and reversal dynamics. We have explored an even more minimal 1D model, where the cell’s direction of force generation is treated as an Ising-like spin and coupled between nearest neighbours (without assuming any specific physico-chemical basis). We found that this model was not fully able to capture both phenomena. In that model, we found that alignment required high levels of coupling (which is hard to justify except for mechanical coupling) and reversals were not readily explainable (and required additional assumptions). These points led us to the current, mechanically motivated model.
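
      For readers unfamiliar with the "Ising-like" alternative mentioned above, the following is a generic sketch of such a chain, not the authors' implementation. It assumes each cell's force direction is a spin of ±1, nearest-neighbour coupling of strength `coupling` (in units of the noise level), and Metropolis updates; the function name and all parameter values are hypothetical.

```python
import math
import random


def mean_alignment(n_cells, coupling, n_sweeps=2000, seed=0):
    """Average |mean spin| of a 1D nearest-neighbour spin chain (Metropolis).

    Each spin (+1 or -1) stands for the direction in which one cell pushes.
    The chain energy is E = -coupling * sum_i s_i * s_(i+1), with open ends.
    """
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n_cells)]
    total = 0.0
    for _ in range(n_sweeps):
        for _ in range(n_cells):
            i = rng.randrange(n_cells)
            neighbours = 0
            if i > 0:
                neighbours += spins[i - 1]
            if i < n_cells - 1:
                neighbours += spins[i + 1]
            # Energy change if spin i is flipped.
            delta_e = 2.0 * coupling * spins[i] * neighbours
            if delta_e <= 0 or rng.random() < math.exp(-delta_e):
                spins[i] = -spins[i]
        total += abs(sum(spins)) / n_cells
    return total / n_sweeps


if __name__ == "__main__":
    for k in (0.1, 1.0, 3.0):
        print(f"coupling {k}: mean alignment {mean_alignment(20, k):.2f}")
```

      In a sketch of this kind, alignment along the whole chain only emerges at strong coupling, and nothing in the dynamics ties a change of direction to reaching a track end, which is consistent with the limitations described above.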

      The parameterisation of the current model would require measuring cellular forces. To this end, a recent study has attempted to measure some of the physical parameters in a different filamentous cyanobacterium [1] and in our revision we will re-evaluate model parameters and dynamics in light of that study. We will also attempt to directly verify the presence of mechano-sensing by obstructing the movement of filaments.

      Summary from the Reviewing Editor:

      The authors present a simple one-dimensional biophysical model to describe the gliding motion and the observed statistics of trajectory reversals. However, the model does not capture some important experimental findings, such as the buckling occurring in long filaments, and the coupling between rotation, slime generation, and motion. More effort is recommended to integrate the information gathered on these different aspects to provide a more unified understanding of filament motility. In particular, the referees suggest performing a more quantitative analysis of the buckling in long filaments. Finally, it is also recommended to discuss the results in the context of previous literature, in order to better explain their relevance. Please find below the detailed individual recommendations of the three reviewers.

      We thank the editor for this accurate summary of the presented work and for highlighting the key points raised by the reviewers. We have provided below point-by-point replies to these.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The relevance of the study organism Fluctiforma draycotensis is not clearly explained, and the results are not discussed in the context of previous literature. The motivation would be clearer if the manuscript explained why this model organism was chosen and how the results compare with those previously observed for this or other organisms.

      We have extended the introduction and discussion sections to make it clearer why we have worked with this organism and how the findings from this work relate to previous ones. In brief, Fluctiforma draycotensis is a useful organism to work with as it not only displays significant motility but it also displays intriguing collective behaviour at different scales. Previous works on gliding motility in filamentous cyanobacteria have mostly focussed on the model organism Nostoc punctiforme, which only displays motility after differentiation into hormogonia [1]. There have also been studies in a range of different filamentous species, including those of the non-monophyletic genus, Phormidium, but these studies mostly looked at effects of genetic deletions on motility [2] or utilised electron microscopy to identify proteins (or surface features) involved in motility [3-5]. It must be noted that motility is also described and studied in non-filamentous cyanobacteria, but the dynamics of motion and molecular mechanisms there are different to filamentous cyanobacteria [6,7]. These previous studies are now cited / summarised in the revised introduction and discussion sections.

      The inferred tracks, probably associated with secreted slime, play a key role since it is supposed that the tracks provide the external force that keeps the filaments straight. Movie S3, in phase contrast, provides convincing evidence for the tracks, but they cannot be seen in the fluorescence images presented in the main text. Clearer evidence of them should be shown in the main text. An especially important aspect of the tracks is where they start and end since the computational model assumes that reversal happens due to forces generated by reaching the end of a track. Therefore it seems important to comment on what produces the tracks, to check whether reversals actually happen at the end of a track, etc. Perhaps tracks could be stained with Concanavalin-A?

      To confirm that reversals happen on track ends, we have now performed an analysis on agar, where we can see tracks on phase microscopy. This analysis confirms that, on agar, reversals indeed happen on track ends. We added this analysis, along with images showing tracks clearly as a new Fig in the main text (see new Fig. 1).

      Further confirming the reversal at track ends, we note that filaments on circular tracks do not reverse over durations longer than the ‘expected reversal interval’ of a filament on a straight track (see details in response to Reviewer 2).

      Regarding what produces the tracks on agar, we are still analysing this using different methods and these results will be part of a future study. Fluorescent staining can be used to visualise slime tubes using TIRF microscopy, as shown in Fig. S8, however, visualising tracks on agar using low magnification microscopy has been difficult due to background fluorescence from agar.

      We would also like to clarify that the model does not incorporate any assumptions regarding the track-filament interaction, other than that the track ends behave akin to a physical boundary for the filament. The observed reversal at track ends and “what” produces the track are distinct aspects of filament motion. We do not think that the model’s assumption of filament reversal at the end of the track requires understanding of the mechanism of slime production.

      Reviewer #3 (Recommendations for the authors):

      The manuscript combines three distinct topics: (1) the difference in locomotion on glass vs agar, (2) the development of a biophysical model, and (3) the helical motion of filament. It is not clear what insight one can gain from any one of these topics about the two others. The manuscript would be strengthened by more clearly connecting these three aspects of the work. A stronger comparison of theory to observation would be very useful. Some suggestions:

      (1) The observation that it is only the longest filaments that buckle is interesting. It should be possible to predict the critical length from the biophysical model. Doing so could allow fits of some model parameters.

      (2) What model parameters change between glass and agar? Can you explain these qualitative differences in motility by changing one model parameter?

      (3) Is it possible to exert a force on one end of a filament to see if it is really mechano-sensing that couples their motion?

      We thank the reviewer for this comment and agree with them that a better connection between model and experiment should be sought. We believe that the new analyses, presented below in response to the 2nd suggestion of the reviewer, provide such a connection in the context of reversal frequency. As stated below, we think that the 1st suggestion falls outside of the scope of the current work, but should form the basis of a future study.

      Regarding suggestion (1) - addressing buckling:

      We agree with the reviewer that using a model to predict a critical buckling length would be useful. We note, however, that the presented study focussed on cell-to-cell coupling / coordination during filament motility using a 1D, beadchain model. The buckling observations served, in this context, as evidence of cellular de-coordination. Now that we have observed buckling (and plectoneme formation), these processes need to be analysed with further experiments and modelling. The appropriate model for studying buckling would have to be at least 2D (ideally 3D) and consider elastic forces and torques relating to filament bending, rotation, and twisting. Experimentally, we need to identify means of influencing filament length and motion and undertake further measurements of buckling frequency and position across different filament lengths. These investigations are ongoing and will be summarised in a separate, future publication.

      Regarding suggestion (2) - addressing differences in motility on agar vs. glass:

      We believe that the two key differences between agar and glass experiments are the occasional detachment of filaments from substrate on glass and the lack of confining tracks on glass. These differences might arise from the interactions between the filament, the slime, and the surface. As both slime and agar contain polysaccharides, the slime-agar interaction can be expected to be different from the slime-glass interaction. Additionally, in the agar experiments, the filaments are confined between the agar and a glass slide, while they are not confined on the glass, leaving them free to lift up from the glass surface. We expect these factors to alter reversal frequency between the two conditions. To explore this possibility, we have now extended the analysis of experimental data from glass and present that (see details below):

      (i) dwell times are similar between agar and glass, and

      (ii) reversal frequency distribution is different between glass and agar, and remains constant across filament length on glass.

      We were able to explore these experimental findings with new model simulations, by removing the assumption of an “external bounding frame”. We then analysed reversal frequency against model parameters, as detailed below.

      “The movement of the filaments on glass. We have extended our analysis of motility on glass resulting in the following noted features. Firstly, the median speed shows a weak positive correlation with filament length on glass (see original Fig S3B vs. updated Fig. S3A). This is slightly different to agar, where we do not observe any strong correlation in either direction (see original, Fig. 1 vs. updated Fig 2). Both the cases of positive, and no correlation, support our original hypothesis that the propulsion force is generated by multiple cells within the filament.

      Secondly, the filaments on glass display ‘stopping’ events that are not followed by a reversal, but are instead followed by a continuation in the original direction of motion, which we term ‘stop-go’ events, in contrast to the reversals. The dwell times associated with reversals and ‘stop-go’ events are similarly distributed (see original Fig S3A vs. updated Fig S3B). Furthermore, the dwell time distributions are similar between agar and glass (compare old Fig. 1C vs. new Fig 2C and new Fig. S3B). This suggests that the reversal process is the same on both agar and glass.

      Thirdly, we find that the frequencies of both reversal and stop-go events on glass are uncorrelated with the filament length (see new Fig. S4A) and there are approximately twice as many reversals as stop-go events. In contrast, the filaments on agar reverse with a frequency that is inversely proportional to the filament length (which is in turn proportional to the track length) (see original Fig. S1). The distribution of reversal frequencies on agar is broader and flatter than the distribution on glass (see new Fig. S4B). These findings are in line with the idea that tracks on agar (which are defined by filament length) dictate reversal frequency, resulting in the strong correlations we observe between reversal frequency, track length, and filament length. On glass, filament movement is not constrained by tracks, and we have a specific reversal frequency independent of filament length.”
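
      The inverse relation between reversal frequency and filament length on agar can be illustrated with a simple kinematic sketch. It is not taken from the manuscript: it assumes a filament moving at constant speed between the two ends of a track whose length is a fixed multiple of the filament length, with an optional dwell at each end. The function name, the ratio of 2, and the speed value are hypothetical.

```python
def reversal_frequency(filament_length_um, speed_um_per_s,
                       track_to_filament_ratio=2.0, dwell_time_s=0.0):
    """Reversals per second for a filament shuttling along a finite track.

    Assumptions (illustrative only): constant speed between track ends, and a
    track length equal to track_to_filament_ratio * filament length.
    """
    if track_to_filament_ratio <= 1.0:
        raise ValueError("track must be longer than the filament")
    if filament_length_um <= 0 or speed_um_per_s <= 0:
        raise ValueError("length and speed must be positive")
    # Distance the filament can travel before its leading end hits a track end.
    free_path_um = (track_to_filament_ratio - 1.0) * filament_length_um
    interval_s = free_path_um / speed_um_per_s + dwell_time_s
    return 1.0 / interval_s


if __name__ == "__main__":
    for length in (50, 100, 200, 400):
        freq = reversal_frequency(length, speed_um_per_s=1.0)
        print(f"L = {length:3d} um -> {freq:.4f} reversals/s")
```

      With a negligible dwell time, doubling the filament length halves the reversal frequency in this sketch; a non-zero dwell time weakens that scaling for short filaments.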

      “Model can capture movement of filaments on glass and provides hypotheses regarding constancy of reversal frequency with length. We believe the model parameters controlling cellular memory (ω<sub>max</sub>) and strength of cellular coupling (K<sub>ω</sub>) describe the internal behaviour of a filament and therefore should not change depending on the substrate. Thus, we expect the model to be able to capture movement on glass just by removal of any ‘confining tracks’, i.e external forces, from the simulations. Indeed, we find that the model displays both stop-go and reversal events when simulated without any external force and can capture the dwell time distribution under this condition (compare new Figs. S12,S13 with S3).

      In terms of reversal frequency, however, the model shows a reduction in reversal frequency with filament length (see new Fig. S15). This is in contrast to the experimental data. We find, however, that model results also show a reduction in reversal frequency with increasing ω<sub>max</sub> and K<sub>ω</sub> (see new Fig. S14 and S15). This effect is stronger with ω<sub>max</sub>, while it quickly saturates with K<sub>ω</sub> (see new Fig. S14). Therefore, one possibility of reconciling the model and experiment results in terms of constant reversal frequency with filament length would be to assume that ω<sub>max</sub> is decreasing with filament length (see new Fig. S16). Testing this hypothesis - or adding additional mechanisms into the model - will constitute the basis of future studies.”

      Regarding suggestion (3) - role of mechanosensing:

      We have tried several experiments to evaluate mechanosensing. First, we have used a micropipette or a thin wire placed on the agar, to create a physical barrier in the way of the filaments. The micropipette approach was not quite feasible in our current setup. The wire approach was possible to implement, but the wire caused a significant undulation / perturbation on agar. Possibly relating to this, filaments tended to continue moving alongside the wire barrier. Therefore, these experiments were inconclusive at this stage with regards to mechanosensing a physical barrier. As an alternative, we have attempted trapping gliding filaments using an optical trap with a far red laser that should not affect the physiology of the cells. This did not cause an immediate reversal in filament motion. However, this could be due to the optical trap strength being below the threshold value for mechanosensing. The force per unit length generated by filamentous cyanobacteria has been calculated via a model of self-buckling rods, giving a value of ≈1nN/μm [8]. In comparison, the optical trap generates forces on the scale of pN. Thus, the trap force is several orders of magnitude lower than the propulsive force generated by a filament, given filament lengths in the range of ten to several hundreds μm. We conclude that the lack of observed response may be due to the optical trap force being too weak.
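      The order-of-magnitude argument above can be checked with a short calculation. The sketch below uses the force per unit length quoted in the text (≈1 nN/μm); the trap force of 1 pN and the example filament lengths are illustrative assumptions, not measured values from this study.

```python
import math

# Force per unit length quoted in the text from ref. [8]: about 1 nN per micrometre.
FORCE_PER_LENGTH_N_PER_UM = 1e-9

# Assumption for illustration only: the text says the trap force is "on the
# scale of pN", so 1 pN is used here as a representative value.
ASSUMED_TRAP_FORCE_N = 1e-12


def propulsive_force_newtons(length_um: float) -> float:
    """Total propulsive force of a filament, assuming force scales with length."""
    return FORCE_PER_LENGTH_N_PER_UM * length_um


def orders_of_magnitude_above(force_n: float, reference_n: float) -> float:
    """How many powers of ten `force_n` exceeds `reference_n` by."""
    return math.log10(force_n / reference_n)


# Example filament lengths (micrometres), chosen to span "ten to several hundreds".
for length_um in (10, 100, 500):
    force = propulsive_force_newtons(length_um)
    gap = orders_of_magnitude_above(force, ASSUMED_TRAP_FORCE_N)
    print(f"{length_um:>4} um filament: {force:.1e} N, "
          f"{gap:.1f} orders of magnitude above the assumed trap force")
```

      Under these assumptions, even the shortest filament considered generates a force several orders of magnitude above the trap force, consistent with the interpretation given above.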

      Thus, the experiments we can perform using our current available methods and equipment are not able to prove either the presence or the absence of mechanosensing in the filament. We plan to perform further experiments in this direction, involving new and/or improved experimental setups, such as use of Atomic Force Microscopy.

      We would like to note that there is an additional observation that supports the idea of reversals being mediated by mechanosensing at the end of a track, instead of the locations of the track ends being caused by the intrinsic reversal frequency of the filament. In a few instances (N = 4), filaments on agar ended up on a circular track (see Movie S4 for an example). These filaments did not reverse over durations a few times longer than the ‘expected reversal interval’ of a filament on a straight track.

      Should $N$ following eq 7 and in eq 9 be $N_f$?

      We have corrected this typo.

      It would be useful to include references to what is known about mechanosensing in cyanobacteria.

      We agree with the reviewer, and we have now updated the discussion section to include this information. Mechanosensing has not yet been shown directly in any cyanobacteria, but several species are shown to harbor genes that are implicated (by homology) in mechanosensing. In particular, analysis of cyanobacterial genomes predicts the presence of a significant number of homologues of the Escherichia coli mechanosensory ion channels MscS and MscL [9]. We have also identified similar MscS protein sequences in F. draycotensis. These channels open when the membrane tension increases, allowing the cell to protect itself from swelling and rupturing when subject to extreme osmotic shock [10,11].

      We also note that F. draycotensis, as with other filamentous cyanobacteria, has genes associated with type IV pili, which may be involved in surface-based motility [1]. Type IV pili have been shown to be mechanosensitive. For example, in cells of Pseudomonas aeruginosa that ‘twitch’ on a surface using type IV pili, application of mechanical shear stress results in increased production of an intracellular signalling molecule involved in promoting biofilm production. The pilus retraction motor has been shown to be involved in this shear-sensing response [12]. Additionally, twitching P. aeruginosa cells often reverse in response to collisions with other cells. Reversal is also caused by collisions with inert glass microfibres, which suggests that the pili-based motility can be affected by a mechanical stimulus [13].

      References

      (1) D. D. Risser, Hormogonium Development and Motility in Filamentous Cyanobacteria. Appl Environ Microbiol 89, e0039223 (2023).

      (2) T. Lamparter et al., The involvement of type IV pili and the phytochrome CphA in gliding motility, lateral motility and photophobotaxis of the cyanobacterium Phormidium lacuna. PLoS One 17, e0249509 (2022).

      (3) E. Hoiczyk, Gliding motility in cyanobacteria: observations and possible explanations. Arch Microbiol 174, 11-17 (2000).

      (4) D. G. Adams, D. Ashworth, B. Nelmes, Fibrillar Array in the Cell Wall of a Gliding Filamentous Cyanobacterium. Journal of Bacteriology 181 (1999).

      (5) L. N. Halfen, R. W. Castenholz, Gliding in a blue-green alga: a possible mechanism. Nature 225, 1163-1165 (1970).

      (6) S. N. Menon, P. Varuni, F. Bunbury, D. Bhaya, G. I. Menon, Phototaxis in Cyanobacteria: From Mutants to Models of Collective Behavior. mBio 12, e0239821 (2021).

      (7) F. D. Conradi, C. W. Mullineaux, A. Wilde, The Role of the Cyanobacterial Type IV Pilus Machinery in Finding and Maintaining a Favourable Environment. Life (Basel) 10 (2020).

      (8) M. Kurjahn, A. Deka, A. Girot, L. Abbaspour, S. Klumpp, M. Lorenz, O. Bäumchen, S. Karpitschka, Quantifying gliding forces of filamentous cyanobacteria by self-buckling. eLife 12:RP87450 (2024).

      (9) S.C. Johnson, J. Veres, H. R. Malcolm, Exploring the diversity of mechanosensitive channels in bacterial genomes. Eur Biophys J 50, 25–36 (2021).

      (10) S.I. Sukharev, W.J. Sigurdson, C. Kung, F. Sachs, Energetic and spatial parameters for gating of the bacterial large conductance mechanosensitive channel, MscL. Journal of General Physiology, 113(4), 525-540 (1999).

      (11) N. Levina, S. Tötemeyer, N.R. Stokes, P. Louis, M.A. Jones, I.R. Booth, Protection of Escherichia coli cells against extreme turgor by activation of MscS and MscL mechanosensitive channels: identification of genes required for MscS activity. The EMBO Journal (1999).

      (12) V.D. Gordon, L. Wang, Bacterial mechanosensing: the force will be with you, always. Journal of cell science 132(7):jcs227694 (2019).

      (13) M.J. Kühn, L. Talà, Y.F. Inclan, R. Patino, X. Pierrat, I. Vos, Z. Al-Mayyah, H. Macmillan, J. Negrete Jr, J.N. Engel, A. Persat, Mechanotaxis directs Pseudomonas aeruginosa twitching motility. Proceedings of the National Academy of Sciences. 118(30):e2101759118 (2021).

    1. Author response:

      We sincerely thank the editor and all three reviewers for their constructive comments. We deeply appreciate the reviewers’ efforts in highlighting both the strengths and the weaknesses of our study. To enhance the quality and clarity of our work, we plan to address the concerns raised in the public reviews through the following actions:

      (1) Improving the tone and language of the manuscript

      We will revise the manuscript thoroughly, incorporating additional explanations and clarifications where necessary, and improving the tone and language to enhance readability and precision. In particular, we will pay careful attention to the terms “positional information,” “positional value,” and “positional cue,” and we plan to explain them in a historical context.

      (2) Extending analysis to regular blastemas

      To validate the applicability of our proposed model beyond the accessory limb model (ALM), we will examine the gene expression patterns of key signaling molecules in regular blastemas generated by limb amputation. This will allow us to test whether the mechanisms we describe are also active during normal limb regeneration.

      (3) Increasing sample sizes in critical experiments

      In order to ensure reproducibility and statistical reliability, we will increase the number of biological replicates in key experiments within the limitations regulated by our animal ethics approval. Additionally, we will collect data that clearly defines the dorsal/ventral axis within the structures, as far as possible. We will also revise the manuscript to pay closer attention to the anterior/posterior/dorsal/ventral axis in the existing data, ensuring that it is clearly described.

      (4) Adding quantitative gene expression data

      To support and reinforce our in situ hybridization results, we will include additional quantitative gene expression analyses (e.g., qRT-PCR), thereby strengthening the conclusions drawn from our expression data.

      We are grateful for the reviewers’ insights and are confident that these revisions will significantly strengthen our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews:

      We sincerely thank the reviewers for their thoughtful review and feedback. We believe that our work will provide valuable insights into how MRSA evolves under bacteriophage predation and stimulate efforts to use genetic trade-offs to combat drug resistance. We have substantially revised the paper and performed several additional experiments to address the reviewers' questions and concerns.

      Summary:

      (1) Testing for genetic trade-offs in additional S. aureus strains

      We obtained 30 clinical isolates of the S. aureus USA300 strain that were isolated between 2008 and 2011 (see Table S1). We first tested the ΦStaph1N, Evo2, and ΦNM1γ6 phages against this expanded strain panel and found that Evo2 showed strong activity against all 30 strains (Table S4). We tested whether Evo2 infection could elicit trade-offs in β-lactam resistance for a subset of these strains. We found that Evo2 infection caused a ~10-100-fold reduction in their MIC against oxacillin. This data is now incorporated into a revised Figure 2 in panel C.

      (2) Testing additional staphylococcal phages

      We isolated from the environment a phage called SATA8505. Similar to ΦStaph1N and Evo2, SATA8505 belongs to the Kayvirus genus and infects the MRSA strains MRSA252, MW2, and LAC. Phage-resistant MRSA recovered following SATA8505 infection also showed a strong reduction in oxacillin resistance (Figure S5). Furthermore, we confirmed that resistance against ΦNM1γ6, which belongs to the Dubowvirus genus, does not elicit trade-offs in β-lactam resistance (Figure S4). Sequencing analysis of ΦNM1γ6-resistant LAC strains showed a different mutation, in fmhC, which was not observed with the ΦStaph1N and Evo2 phages (Table 1). We have added this new data into the main text and supplemental figures and tables. Future work will focus on obtaining a comprehensive analysis of a wide range of phage families.

      (3) Testing additional antibiotics

      We also expanded our trade-off analysis to include a wider range of antibiotic classes (Table S3). Overall, the loss of resistance appears to be confined to β-lactams.

      (4) Genetic analysis of ORF141

      In order to determine the function of ORF141, which is mutated in Evo2, we attempted to clone wild-type ORF141 into a staphylococcal plasmid and perform complementation assays with Evo2. Unfortunately, obtaining the plasmid-borne wild-type ORF141 has proven to be tricky, as all clones developed frameshifts or deletions in the open reading frame. We posit that the gene product of ORF141 is toxic to the bacteria. We are currently working on placing the gene under more stringent expression conditions but feel that these efforts fall outside of the scope of this paper.

      (5) Testing the effect of single mutants  

      Our genomic analysis showed that phage-resistant MRSA evolved multiple mutations following phage infection, making it difficult to determine the mechanism of each mutation alone. For example, phage-resistant MW2 and LAC evolved nonsense mutations in transcriptional regulators mgrA, arlR, and sarA. To test whether these mutations alone were sufficient to confer resistance, we obtained MRSA strains with single-gene knockouts of mgrA, arlR, and sarA and tested their ability to resist phage. We observed that deletion of mgrA in MW2 resulted in a modest reduction in phage sensitivity (Figure S7). However, we did not observe any changes in the other mutant strains. These results suggest that phage resistance in these strains is likely caused by a combination of mutations. Determining the mechanisms of these mutations is the focus of our future work.

      (6) Transcriptomics of phage-resistant MRSA strains

      To further assess the effects of the phage resistance mutations, we performed bulk RNA-seq on phage-resistant MW2 and LAC strains and compared their differential expression levels to the respective wild-type strains. We picked these strains because our genomic data showed that they had evolved mutations in known transcriptional regulators (e.g. mgrA). Our analysis shows that both strains significantly modulate their gene expression (Figure 4). Notably, both strains upregulate the cell wall-associated protein ebh, while downregulating several genes involved in quorum sensing, virulence, and secretion. We have included this new data in Figure 4 and Table S5 and added an entire section in the manuscript discussing these results and their implications.  

      (7) Co-treatment of MRSA with phage and β-lactam

      We performed checkerboard experiments on MRSA strains with phage and β-lactam gradients (Figure 6). We found that under most conditions, MRSA cells were only able to recover under low phage and β-lactam concentrations. Notably, these recovered cells were still phage resistant and β-lactam sensitive. However, under one condition where MW2 was treated with ΦStaph1N and β-lactam, we found that some recovered cells still had high levels of β-lactam resistance, showing a distinct mutational profile. We discuss these results in detail in the main text.
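      For readers unfamiliar with the checkerboard format mentioned in point (7), the layout can be sketched as a two-dimensional grid in which each well receives one phage dose and one antibiotic concentration. The sketch below is illustrative only: the two-fold dilution scheme, the starting multiplicity of infection, the starting oxacillin concentration, and the 8 x 8 grid size are all assumptions and are not taken from the manuscript.

```python
# Illustrative checkerboard layout: every combination of a phage dose and an
# antibiotic concentration is assigned to one well of a 2D grid.
# All numeric values below are hypothetical placeholders.

ASSUMED_TOP_MOI = 10.0                   # hypothetical highest phage dose (MOI)
ASSUMED_TOP_OXACILLIN_UG_PER_ML = 64.0   # hypothetical highest drug concentration
ASSUMED_DILUTION_STEPS = 7               # hypothetical number of two-fold steps


def twofold_series(top: float, steps: int) -> list[float]:
    """Return `steps` values halving from `top`, followed by a 0.0 no-agent control."""
    return [top / 2**i for i in range(steps)] + [0.0]


phage_axis = twofold_series(ASSUMED_TOP_MOI, ASSUMED_DILUTION_STEPS)
drug_axis = twofold_series(ASSUMED_TOP_OXACILLIN_UG_PER_ML, ASSUMED_DILUTION_STEPS)

# checkerboard[row][col] holds the (MOI, drug concentration) pair for one well.
checkerboard = [[(moi, drug) for drug in drug_axis] for moi in phage_axis]

for row in checkerboard:
    print("  ".join(f"({moi:g}, {drug:g})" for moi, drug in row))
```

      In such a layout, the corner with both agents at zero serves as the growth control, and the row and column with a single agent at zero give the single-agent responses.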

      Reviewer #1:

      Strengths:

      Phage-mediated re-sensitization to antibiotics has been reported previously but the underlying mutational analyses have not been described. These studies suggest that phages and antibiotics may target similar pathways in bacteria.

      We thank Reviewer 1 for this assessment. We hope that the data provided in this work will help stimulate further inquiries into this area and help in the development of better phage-based therapies to combat MRSA.

      Weaknesses:

      One limitation is the lack of mechanistic investigations linking particular mutations to the phenotypes reported here. This limits the impact of the work.

      We acknowledge the limitations of our initial analysis. We note (and cite) that separate studies have already linked mutations in femA, mgrA, arlR, and sarA with reduced β-lactam resistance and virulence phenotypes in MRSA, but not to phage resistance. For the other mutations, we could not find literature linking them to our observed phenotypes. We analyzed the effects of single gene knockouts of mgrA, arlR, and sarA on MRSA’s phage resistance. However, as shown above, the results only showed modest effects on phage resistance in the MW2 strain (see Figure S7 and lines 309-317). We therefore believe that mutations in single genes are not sufficient to cause the trade-offs in phage/β-lactam resistance. Because each MRSA strain evolved multiple mutations (e.g. MW2 evolved 6 or more mutations), we feel that determining the effects of all possible permutations of those mutations was beyond the scope of the paper.

      However, to bridge the mutational data with our phenotypic observations, we performed RNAseq and compared the transcriptomes of un-treated and phage-treated MRSA strains (see Figure 4, Table S5, and lines 337-391). Our results show that phage-treated MRSA strains significantly modulate their transcript levels. Indeed, some of the changes in gene expression can explain the phenotypic observations (e.g. overexpression of ebh can lead to reduced clumping). Further, the results show some unexpected patterns, such as the downregulation of quorum sensing genes or genes involved in type VII secretion.

      Another limitation of this work is the use of lab strains and a single pair of phages. However, while incorporation of clinical isolates would increase the translational relevance of this work it is unlikely to change the conclusions.

      We thank the reviewer for this suggestion. We would like to clarify that MW2, MRSA252, and LAC are pathogenic clinical isolates that were isolated between 1997 and the 2000s. However, we acknowledge that, because these 3 strains have been propagated for many generations, they might have acquired laboratory adaptations. We therefore obtained 30 USA300 clinical strains that were isolated in more recent years (~2008-2011) and tested our phages against them. We note that these clinical isolates (generously provided by Dr. Petra Levin’s lab) were preserved with minimal passaging to reduce the effects of laboratory adaptation. We found that the Evo2 phage was able to elicit oxacillin trade-offs in those strains as well (see Table S1, Table S7, Fig 2C, and lines 210 – 225).

      For the phages, we had to work with phage(s) that could infect all three MRSA strains. That is why in our initial tests, we focused on ΦStaph1N and Evo2, both members of the Kayvirus genus. Now in our revised work, we extend our analysis to ΦNM1γ6, a member of the Dubowvirus genus, that also infects the LAC strain, but not MW2 and MRSA252. We find that ΦNM1γ6 is unable to drive trade-offs in β-lactam resistance (see lines 229 – 238). Next, we analyzed the effects of SATA8505, also a member of the Kayvirus genus. Here, we observed that SATA8505 can elicit trade-offs in β-lactam resistance (see Figure S5 and lines 238 – 246). These results suggest that not all staphylococcal phages can elicit these trade-offs and call for more comprehensive analyses of different types of phages.

      Reviewer #1 (Recommendations for the authors):

      Specific questions:

      (1) The Evo2 isolate is an evolved version of phage Staph1N with more potent lytic activity. Is this reflected in more pronounced antibiotic sensitivity?

      We did not observe that Evo2-treated MRSA cells showed more sensitivity towards β-lactams. However, we did observe that Evo2 was able to elicit these trade-offs at lower multiplicities of infection (MOI) (see lines 173 – 176 and Figure S2). Further, we did observe that Evo2 caused a greater trade-off in virulence phenotypes (hemolysis and cell agglutination) (see lines 416 – 419, lines 433 – 435, and Figure 5).

      In our revisions, we also tested Evo2-treated MRSA against a wide range of antibiotics. We did not observe significant changes in MICs against those agents.   

      (2) Are there mutations in the SCCmec cassette or the MecA gene after selection against ΦStaph1N?

      We did not observe any mutations in known resistance genes SCCmec or blaZ. Furthermore, we did not see any differential expression of those genes in our transcriptomic data (see lines 344 and 346).  

      (3) The authors report that phage ΦNM1γ6 does not induce antibiotic sensitivity changes despite being effective against bacterial strain LAC. Were mutational sequencing studies performed with the resistant isolates that emerged against this strain? Can the authors hypothesize why these did not impact the virulence or resistance of LAC despite effective killing? How does this align with their models for ΦStaph1N?

      We thank the reviewer for that insightful question. In our revised manuscript, we found that ΦNM1γ6 elicits a point mutation in the fmhC gene, which is involved in cell wall maintenance (see lines 326 – 335). To our knowledge, this point mutation has not been linked to phage resistance or drug sensitivity in MRSA. Notably, this mutation was not observed with ΦStaph1N or Evo2. We therefore speculate that ΦNM1γ6 binds to a different receptor molecule on the MRSA cell wall.

      (4) If I understand correctly, the authors attribute these effects of phage predation on antibiotic sensitivity and virulence to orthogonal selection pressures. A good test of this model would be to examine the mutations that emerge in antibiotic/phage co-treatment. This should be done.

      We thank the reviewer for this suggestion. As described in the summary section above, we performed checkerboard experiments on MRSA strains with phage and β-lactam gradients (see lines 440 – 494 and Figure 6). We found that under most conditions, MRSA cells were only able to recover under low phage and β-lactam concentrations. Notably, these recovered cells were still phage resistant and β-lactam sensitive. However, under one condition where MW2 was treated with ΦStaph1N and β-lactam, we found that some recovered cells still had high levels of β-lactam resistance and only limited phage resistance, showing a distinct mutational profile (Figure S6). Under these conditions, we think that the selective pressure exerted by ΦStaph1N is “overcome” by the selective pressure of the high oxacillin concentration, a point that we discuss in the main text.

      Reviewer #2 (Public review):

      Summary:

      The work presented in the manuscript by Tran et al deals with bacterial evolution in the presence of bacteriophage. Here, the authors have taken three methicillin-resistant S. aureus strains that are also resistant to beta-lactams. Eventually, upon being exposed to phage, these strains develop beta-lactam sensitivity. Besides this, the strains also show other changes in their phenotype such as reduced binding to fibrinogen and hemolysis.

      Strengths:

      The experiments carried out are convincing to suggest such in vitro development of sensitivity to the antibiotics. Authors were also able to "evolve" phage in a similar fashion thus showing enhanced virulence against the bacterium. In the end, authors carry out DNA sequencing of both evolved bacteria and phage and show mutations occurring in various genes. Overall, the experiments that have been carried out are convincing.

      We thank Reviewer 2 for their positive comments.

      Weaknesses:

      Although more experiments are not needed, additional experiments could add more information. For example, the phage gene showing the HTH motif could be reintroduced in the bacterial genome and such a strain can then be assayed with wildtype phage infection to see enhanced virulence as suggested. At least one such experiment proves the discoveries regarding the identification of mutations and their outcome.

      We thank the reviewer for this suggestion. We attempted to clone ORF141 into an expression plasmid and perform complementation experiments with Evo2 phage; however, all transformants that were isolated had premature stop-codons and frameshifts in the wild-type ORF141 insert that would disrupt protein function. We therefore think that the gene product of ORF141 might be toxic to the cells. We are currently working on placing the gene under more stringent transcriptional control but feel that these efforts fall outside of the scope of this paper.  

      Secondly, I also feel that authors looked for beta-lactam sensitivity and they found it. I am sure that if they look for rifampicin resistance in these strains, they will find that too. In this case, I cannot say that the evolution was directed to beta-lactam sensitivity; this is perhaps just one trait that was observed. This is the only weakness I find in the work. Nevertheless, I find the experiments convincing enough; more experiments only add value to the work.  

      We thank the reviewer for their comments. Because both phages and β-lactams interface with the bacterial cell wall, we posited that phage resistance would reduce resistance in cell wall targeting antibiotics. In our revisions, we have expanded our analysis to include a much wider range of antibiotic classes, including rifampicin, mupirocin, erythromycin, and other cell wall disruptors, such as daptomycin and teicoplanin. We did not observe any significant changes to the MICs of these other antibiotics (see Table S3 and lines 191-199). It therefore appears that the effects of these trade-offs are confined to beta-lactams.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      This study uses in vivo multimodal high-resolution imaging to track how microglia and neutrophils respond to light-induced retinal injury from soon after injury to 2 months post-injury. The in vivo imaging finding was subsequently verified by ex vivo study. The results suggest that despite the highly active microglia at the injury site, neutrophils were not recruited in response to acute light-induced retinal injury.

      Strengths:

      An extremely thorough examination of the cellular-level immune activity at the injury site. In vivo imaging observations being verified using ex vivo techniques is a strong plus.

      Thank you!

      Weaknesses:

      This paper is extremely long, and in the perspective of this reviewer, needs to be better organized. Update: Modifications have been made throughout, which has made the manuscript easier to follow.

      Thank you!

      Study weakness: though the finding prompts more questions and future studies, the findings discussed in this paper are potentially important for us to understand how the immune cells respond differently to different severity levels of injury. The study also demonstrated an imaging technology which may help us better understand cellular activity in living tissue during earlier time points.

      We agree that AOSLO has much to offer and this represents some of the earliest reports of its kind.  

      Comments on revisions:

      I appreciate the thorough clarification and re-organization by the authors, and the messages in the manuscript are now more apparent. I recommend also briefly discussing limitations/future improvements in the discussion or conclusion.

      We have added a section to the discussion entitled “Limitations and future improvements”, please see lines 665 – 677.

      Reviewer #3 (Public review):

      Summary

      This work investigated the immune response in the murine retina after focal laser lesions. These lesions are made with close to 2 orders of magnitude lower laser power than the more prevalent choroidal neovascularization model of laser ablation. Histology and OCT together show that the laser insult is localized to the photoreceptors and spares the inner retina, the vasculature and the pigment epithelium. As early as 1-day after injury, a loss of cell bodies in the outer nuclear layer is observed. This is accompanied by strong microglial proliferation to the site of injury in the outer retina where microglia do not typically reside. The injury did not seem to result in the extravasation of neutrophils from the capillary network, constituting one of the main findings of the paper. The demonstrated paradigm of studying the immune response and potentially retinal remodeling in the future in vivo is valuable and would appeal to a broad audience in visual neuroscience.

      Strengths

      Adaptive optics imaging of murine retina is cutting edge and enables non-destructive visualization of fluorescently labeled cells in the milieu of retinal injury. As may be obvious, this in vivo approach is a benefit for studying fast and dynamic immune processes on a local time scale - minutes and hours, and also for the longer days-to-months follow-up of retinal remodeling as demonstrated in the article. In certain cases, the in vivo findings are corroborated with histology.

      Thank you!

      The analysis is sound and accompanied by stunning video and static imagery. A few different sets of mouse models are used, a) two different mouse lines, each with a fluorescent tag for neutrophils and microglia, b) two different models of inflammation - endotoxin-induced uveitis (EIU) and laser ablation are used to study differences in the immune interaction.

      Thank you!

      One of the major advances in this article is the development of the laser ablation model for 'mild' retinal damage as an alternative to the more severe neovascularization models. This model would potentially allow for controlling the size, depth and severity of the laser injury opening interesting avenues for future study.

      Thank you!

      The time-course, 2D and 3D spatial activation pattern of microglial activation are striking and provide an unprecedented view of the retinal response to mild injury.

      We agree that this more complete spatial and temporal evaluation made possible by in vivo imaging is novel.

      Weaknesses

      Generalization of the (lack of) neutrophil response to photoreceptor loss - there is ample evidence in literature that neutrophils are heavily recruited in response to severe retinal damage that includes photoreceptor loss. Why the same was not observed here in this article remains an open question. One could hypothesize that neutrophil recruitment might indeed occur under conditions that are more in line with the more extreme damage models, for example, with a stronger and global ablation (substantially more photoreceptor loss over a larger area). This parameter space is too unwieldy and large to address the question conclusively in the current article, i.e. how much photoreceptor loss leads to neutrophil recruitment? By the same token, the strong and general conclusion in the title - Photoreceptor loss does not recruit neutrophils - cannot be made until an exhaustive exploration has been made of the same parameter space. A scaling back may help here, to reflect the specific, mild form of laser damage explored here, for instance - Mild photoreceptor loss does not recruit neutrophils despite...

      We are striving for clarity and accuracy in our title without adding too many qualifiers.  At present, we feel that the title as submitted is consistent and aligned with the central finding of our manuscript.  The nuance that the reviewer points to is elaborated in the body of the manuscript and we hope the general readership appreciates the same level of detail as appreciated by reviewer #3.

      EIU model - The EIU model was used as a positive control for neutrophil extravasation. Prior work with flow cytometry has shown a substantial increase in neutrophil counts in the EIU model. Yet, in all, the entire article shows exactly 2 examples in vivo and 3 ex vivo (Figure 7) of extravasated neutrophils from the EIU model (n = 2 mice). The general conclusion made about neutrophil recruitment (or lack thereof) is built partly upon this positive control experiment. But these limited examples, especially in the case where literature reports a preponderance of extravasated neutrophils, raise a question on the paradigm(s) used to evaluate this effect in the mild laser damage model.

      This is a helpful suggestion. We agree that readers should see more evidence of the positive control. Therefore we have now included two more supplementary files that show that there is a strong neutrophil response to EIU.  In Figure 7 – supplementary figure 1, we show many Ly-6G-positive neutrophils in the retina seen with histology at the 24 hour time point. In Figure 7 – video 3, we show massive Catchup-positive neutrophil presence in vivo at 24hrs as well.  This aligns with our positive control and also the literature.

      Overall, the strengths outweigh the weaknesses, provided the conclusions/interpretations are reconsidered.

      With the added clarification about the magnitude of the neutrophil response in EIU, we feel that the conclusions presented in the manuscript as-is are valid and appropriate.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      The authors are applauded for embracing the reviewers' feedback and making substantial revisions. Some minor comments below:

      The weakness noted in the public review encourages the authors to reconsider the interpretations drawn based on the results. One would have expected to see far more examples of extravasated neutrophils from the EIU model. That this was not seen weakens the neutrophil recruitment claim substantially. Even without this claim, the methods, laser damage model, time-course and spatial activation pattern of microglial activation are all striking and unprecedented. So, as stated in the public review, the strengths do indeed outweigh the weaknesses once the neutrophil claim is softened.

      We address this in the response above. A strong neutrophil response was observed to EIU. This was confirmed with both histology and in vivo imaging.

      This was alluded to by Reviewer 1 in the prior review - at times, there is an overemphasis on imaging technology that distracts from the scientific questions. The imaging is undoubtedly cutting-edge but also documented in prior work by the authors. Any efforts to reduce or balance the emphasis would help with the general flow.

      Given that these discoveries are made possible partly through new technology, we prefer to keep the details of the innovation in the current manuscript. Given the exceptionally large readership of eLife, we feel some description of the AOSLO imaging is warranted in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors explored how galanin affects whole-brain activity in larval zebrafish using wide-field Ca2+ imaging, genetic modifications, and drugs that increase brain activity. The authors conclude that galanin has a sedative effect on the brain under normal conditions and during seizures, mainly through the galanin receptor 1a (galr1a). However, acute "stressors(?)" like pentylenetetrazole (PTZ) reduce galanin's effects, leading to increased brain activity and more seizures. The authors claim that galanin can reduce seizure severity while increasing seizure occurrence, speculated to occur through different receptor subtypes. This study confirms galanin's complex role in brain activity, supporting its potential impact on epilepsy.

      Strengths:

      The overall strength of the study lies primarily in its methodological approach using whole-brain Calcium imaging facilitated by the transparency of zebrafish larvae. Additionally, the use of transgenic zebrafish models is an advantage, as it enables genetic manipulations to investigate specific aspects of galanin signaling. This combination of advanced imaging and genetic tools allows for addressing galanin's role in regulating brain activity.

      Weaknesses:

      The weaknesses of the study also stem from the methodological approach, particularly the use of whole-brain Calcium imaging as a measure of brain activity. While epilepsy and seizures involve network interactions, they typically do not originate across the entire brain simultaneously. Seizures often begin in specific regions or even within specific populations of neurons within those regions. Therefore, a whole-brain approach, especially with Calcium imaging and its inherent limitations, may not fully capture the localized nature of seizure initiation and propagation, potentially limiting the understanding of Galanin's role in epilepsy.

      Furthermore, Galanin's effects may vary across different brain areas, likely influenced by the predominant receptor types expressed in those regions. Additionally, the use of PTZ as a "stressor" is questionable since PTZ induces seizures rather than conventional stress. Referring to seizures induced by PTZ as "stress" might be a misinterpretation intended to fit the proposed model of stress regulation by receptors other than Galanin receptor 1 (GalR1).

      The description of the EAAT2 mutants is missing crucial details. EAAT2 plays a significant role in the uptake of glutamate from the synaptic cleft, thereby regulating excitatory neurotransmission and preventing excitotoxicity. Authors suggest that in EAAT2 knockout (KO) mice galanin expression is upregulated 15-fold compared to wild-type (WT) mice, which could be interpreted as galanin playing a role in the hypoactivity observed in these animals.

      Indeed, our observation of the unexpected hypoactivity in EAAT2a mutants, described in our description of this mutant (Hotz et al., 2022), prompted us to initiate this study formulating the hypothesis that the observed upregulation of galanin is a neuroprotective response to epilepsy.

      However, the study does not explore the misregulation of other genes that could be contributing to the observed phenotype. For instance, if AMPA receptors are significantly downregulated, or if there are alterations in other genes critical for brain activity, these changes could be more important than the upregulation of galanin. The lack of wider gene expression analysis leaves open the possibility that the observed hypoactivity could be due to factors other than, or in addition to, galanin upregulation.

      We have performed a transcriptome analysis that we are still evaluating. We can already state that AMPA receptor genes are not significantly altered in the mutant.

      Moreover, the observation that in double KO mice for both EAAT2 and galanin, there was little difference in seizure susceptibility compared to EAAT2 KO mice alone further supports the idea that galanin upregulation might not be the reason for the observed phenotype. This indicates that other regulatory mechanisms or gene expressions might be playing a more pivotal role in the manifestation of hypoactivity in EAAT2 mutants.

      We agree that upregulation of galanin transcripts is at best one of a suite of regulatory mechanisms that lead to hypoactivity in EAAT2 zebrafish mutants.

      These methodological shortcomings and conceptual inconsistencies undermine the perceived strengths of the study and hinder understanding of Galanin's role in epilepsy and stress regulation.

      Reviewer #2 (Public Review):

      Summary:

      This study is an investigation of galanin and galanin receptor signaling on whole-brain activity in the context of recurrent seizure activity or under homeostatic basal conditions. The authors primarily use calcium imaging to observe whole-brain neuronal activity accompanied by galanin qPCR to determine how manipulations of galanin or the galr1a receptor affect the activity of the whole-brain under non-ictal or seizure event conditions. The authors' Eaat2a-/- model (introduced in their Glia 2022 paper, PMID 34716961) that shows recurrent seizure activity alongside suppression of neuronal activity and locomotion in the time periods lacking seizures is used in this paper in comparison to the well-known pentylenetetrazole (PTZ) pharmacological model of epilepsy in zebrafish. Given the literature cited in their Introduction, the authors reasonably hypothesize that galanin will exert a net inhibitory effect on brain activity in models of epilepsy and at homeostatic baseline, but were surprised to find that this hypothesis was only moderately supported in their Eaat2a-/- model. In contrast, under PTZ challenge, fish with galanin overexpression showed increased seizure number and reduced duration while fish with galanin KO showed reduced seizure number and increased duration. These results would have been greatly enriched by the inclusion of behavioral analyses of seizure activity and locomotion (similar to the authors' 2022 Glia paper and/or PMIDs 15730879, 24002024). In addition, the authors have not accounted for sex as a biological variable, though they did note that sex sorting zebrafish larvae precludes sex selection at the younger ages used. It would be helpful to include smaller experiments taken from pilot experiments in older, sex-balanced groups of the relevant zebrafish to increase confidence in the findings' robustness across sexes. 
      A possible major caveat is that all of the various genetic manipulations are non-conditional as performed, meaning that developmental impacts of galanin overexpression or galanin or galr1a knockout on the observed results have not been controlled for and may have had a confounding influence on the authors' findings. Overall, this study is important and solid (yet limited), and carries clear value for understanding the multifaceted functions that neuronal galanin can have under homeostatic and disease conditions.

      Strengths:

      - The authors convincingly show that galanin is upregulated across multiple contexts that feature seizure activity or hyperexcitability in zebrafish, and appears to reduce neuronal activity overall, with key identified exceptions (PTZ model).

      - The authors use both genetic and pharmacological models to answer their question, and through this diverse approach, find serendipitous results that suggest novel underexplored functions of galanin and its receptors in basal and disease conditions. Their question is well-informed by the cited literature, though the authors should cite and consider their findings in the context of Mazarati et al., 1998 (PMID: 9822761). The authors' Discussion places their findings in context, allowing for multiple interpretations and suggesting some convincing explanations.

      - Sample sizes are robust and the methods used are well-characterized, with a few exceptions (as the paper is currently written).

      - Use of a glutamatergic signaling-based genetic model of epilepsy (Eaat2a-/-) is likely the most appropriate selection to test how galanin signaling can alter seizure activity, as galanin is known to reduce glutamatergic release as an inhibitory mechanism in rodent hippocampal neurons via GalR1a (alongside GIRK activation effects). Given that PTZ instead acts through GABAergic signaling pathways, it is reasonable and useful to note that their glutamate-based genetic model showed different effects than did their GABAergic-based model of seizure activity.

      Weaknesses:

      - The authors do not include behavioral assessments of seizure or locomotor activity that would be expected in this paper given their characterizations of their Eaat2a-/- model in the Glia 2022 paper that showed these behavioral data for this zebrafish model. These data would inform the reader of the behavioral phenotypes to expect under the various conditions and would likely further support the authors' findings if obtained and reported.

      We agree that a thorough behavioral assessment would have strengthened the study, but we deemed it outside of the scope of this study.

      - No assessment of sex as a biological variable is included, though it is understood that these specific studied ages of the larvae may preclude sex sorting for experimental balancing as stated by the authors.

      The study was done on larval zebrafish (5 days post fertilization). The first signs of sexual differentiation become apparent at about 17 days post fertilization (reviewed in Ye and Chen, 2020). Hence sex is not a biological variable at the stage studied.

      - The reported results may have been influenced by the loss or overexpression of galanin or loss of galr1a during developmental stages. The authors did attempt to use the hsp70l system to overexpress galanin, but noted that the heat shock induction step led to reduced brain activity on its own (Supplementary Figure 1). Their hsp70l:gal model shows galanin overexpression anyway (8-fold) regardless of heat induction, so this model is still useful as a way to overexpress galanin, but it should be noted that this galanin overexpression is not restricted to post-developmental timepoints and is present during development.

      The developmental perspective is an important point to consider. Due to the rapid development of the zebrafish, it is not trivial to untangle this. In the zebrafish, we first observe epileptic seizures as early as 3 days post fertilization (dpf), when the brain is clearly not yet well developed (e.g. behavioral responses to light are still minimal). Even the 5 dpf stage, at which most of our experiments were conducted, can by no means be considered post-developmental.

      Reviewer #3 (Public Review):

      Summary:

      The neuropeptide galanin is primarily expressed in the hypothalamus and has been shown to play critical roles in homeostatic functions such as arousal, sleep, stress, and brain disorders such as epilepsy. Previous work in rodents using galanin analogs and receptor-specific knockout has provided convincing evidence for the anti-convulsant effects of galanin.

      In the present study, the authors sought to determine the relationship between galanin expression and whole-brain activity. The authors took advantage of the transparent nature of larval zebrafish to perform whole-brain neural activity measurements via widefield calcium imaging. Two models of seizures were used (eaat2a-/- and pentylenetetrazol; PTZ). In the eaat2a-/- model, spontaneous seizures occur and the authors found that galanin transcript levels were significantly increased and associated with a reduced frequency of calcium events. Similarly, two hours after PTZ galanin transcript levels roughly doubled and the frequency and amplitude of calcium events were reduced. The authors also used a heat shock protein line (hsp70I:gal) where galanin transcript levels are induced by activation of heat shock protein, but this line also shows higher basal transcript levels of galanin. Again, the higher level of galanin in hsp70I:gal larval zebrafish resulted in a reduction of calcium events and a reduction in the amplitude of events. In contrast, galanin knockout (gal-/-) increased calcium activity, indicated by an increased number of calcium events, but a reduction in amplitude and duration. Knockout of the galanin receptor subtype galr1a via crispants also increased the frequency of calcium events.

      In subsequent experiments, eaat2a-/- mutants were crossed with hsp70I:gal or gal-/- to increase or decrease galanin expression, respectively. These experiments showed modest effects, with eaat2a-/- x gal-/- knockouts showing an increased normalized area under the curve and seizure amplitude.

      Lastly, the authors attempted to study the relationship between galanin and brain activity during a PTZ challenge. The hsp70I:gal larva showed an increased number of seizures and reduced seizure duration during PTZ. In contrast, gal-/- mutants showed an increased normalized area under the curve and a stark reduction in the number of detected seizures, a reduction in seizure amplitude, but an increase in seizure duration. The authors then ruled out the role of Galr1a in modulating this effect during PTZ, since the number of seizures was unaffected, whereas the amplitude and duration of seizures were increased.

      Strengths:

      (1) The gain- and loss-of function galanin manipulations provided convincing evidence that galanin influences brain activity (via calcium imaging) during interictal and/or seizure-free periods. In particular, the relationship between galanin transcript levels and brain activity in Figures 1 & 2 was convincing.

      (2) The authors use two models of epilepsy (eaat2a-/- and PTZ).

      (3) Focus on the galanin receptor subtype galr1a provided good evidence for the important role of this receptor in controlling brain activity during interictal and/or seizure-free periods.

      Weaknesses:

      (1) Although the relationship between galanin and brain activity during interictal or seizure-free periods was clear, the manuscript currently lacks mechanistic insight into the role of galanin during seizure-like activity induced by PTZ.

      We completely agree and concede that this study constitutes only a first attempt to understand the (at least for us) perplexing complexity of galanin function on the brain.

      (2) Calcium imaging is the primary data for the paper, but there are no representative time-series images or movies of GCaMP signal in the various mutants used.

      We have now added various movies in supplementary data.

      (3) For Figure 3, the authors suggest that hsp70I:gal x eaat2a-/- mutants would further increase galanin transcript levels, which were hypothesized to further reduce brain activity. However, the authors failed to measure galanin transcript levels in this cross to show that galanin is actually increased more than the eaat2a-/- mutant or the hsp70I:gal mutant alone.

      After a couple of unsuccessful mating attempts with our older mutants, we finally decided not to wait for a new generation to grow up, deeming the experiment not crucial (but still nice to have).

      (4) Similarly, transcript levels of galanin are not provided in Figure 2 for Gal-/- mutants and galr1a KOs. Transcript levels would help validate the knockout and any potential compensatory effects of subtype-specific knockout.

      To validate the gal-/- mutant line, we decided to show loss of protein expression (Suppl. Figure 2), which we deem more relevant for demonstrating loss of function. Galanin transcript levels in galr1a KOs were also added to the same figure. However, validation of the galr1a KO could not be performed because transcript levels are close to the detection limit and no antibodies are available.

      (5) The authors very heavily rely on calcium imaging of different mutant lines. Additional methods could strengthen the data, translational relevance, and interpretation (e.g., acute pharmacology using galanin agonists or antagonists, brain or cell recordings, biochemistry, etc).

      Again, we agree and concede that a number of additional approaches are needed to gain more insight into the complex role of galanin in regulating overall brain activity. These include, among others, behavioral assays, multiple single-cell recordings, and pharmacological interventions.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Minor issues:

      (1) "Sedative" effect of galanin is somewhat vague and seems overapplied without the inclusion of behavioral data showing sedation effects. I would replace "sedative" with something clearer, like the phrase "net inhibitory effect" or similar.

      We have modified the wording as deemed appropriate.

      (2) Include new data that is sufficiently powered to detect or rule out the effects of sex as a biological variable within the various experiments.

      At this stage sex is not a biological variable. Sex determination starts at a late larval stage around 14 dpf. Our analysis is based on 5 dpf larvae.

      (3) Attempt to perform some experiments with galanin/galr1a manipulations that have been induced after the majority of development without using heat shock induction if possible (unknown how feasible this is in current model systems).

      In the current model this is not feasible, but it is an excellent suggestion for future studies that would then also address more long-term effects in the model.

      (4) Figure 2 should include qPCR results for galanin or galr1a mRNA expression to match Figure 1C, F, and Figure 2C and to confirm reductions in the respective RNA transcript levels of gal or galr1a. It could be useful to perform qPCR for galanin in all galr1aKO mice to ascertain whether compensatory elevations in galanin occur in response to galr1aKO.

      (5) Axes should be made with bolder lines and bolder/larger fonts for readability and consistency throughout.

      Indeed, an excellent suggestion. We have adjusted the axes significantly improving the readability of the graphs.

      (6) The bottom of the image for Figure 2 appears to have been cut off by mistake (page 5).

      (7) The ending of the legend text for Figure 3 appears to have been cut off by mistake (page 6).

      Both regrettable mistakes have been corrected (already in the initial posted version).

      Reviewer #3 (Recommendations For The Authors):

      (1) The introduction or first paragraph of the results should be revised to more directly state the hypotheses. Several critical details were only clear after reading the discussion.

      We added some words to the introduction, hoping that the critical points are now more apparent to the reader.

      (2) Galanin is known to be rapidly depleted by seizures (Mazarati et al., 1998; Journal of Neuroscience, PMID #9822761) but this paper did not appear to be cited or considered. Could the rapid depletion of galanin during seizures help explain the confusing effects of galanin manipulations during PTZ?

      We have added a sentence and the reference to the discussion.

      (3) Figure 1 panels are incorrect. For example, Panel 'F' is used twice and the figure legend is also incorrect due to the labeling errors. In-text references to the figure should also be updated accordingly.

      (4) In Figure 2 N-P, the delta F/F threshold wording is partially cropped. The figure should be updated.

      Thank you for pointing out this mistake. Both figures have now been updated (already in the initial posted version).

      (5) The naming and labeling of groups in the manuscript and figures should be updated to more accurately reflect the fish used for each experiment. As it currently stands, I found the labeling confusing and sometimes misleading. For example, Figure 3 'controls' are actually eaat2a-/- mutants, whereas the other group is hsp70I:gal x eaat2a-/- crosses or gal-/- x eaat2a-/- crosses. In other Figures, 'controls' are eaat2a+/+ larvae, or wild-type siblings (sometimes unclear).

      We have made appropriate changes to the manuscript to make this point clearer to the reader, especially when the controls are eaat2a mutants.

      (6) Figure 4J and 4K only show 5 data points, when the authors clearly indicate that 6 fish had seizures. Continuation of this data in Figure 4L shows 6 data points.

      Indeed, the 6 data points in Figure 4J and K are hard to see due to their nearly complete overlap. At higher magnification all six data points become distinguishable. We will try some different plotting approaches for the revision.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public review):

      Summary:

      Gene transfer agent (GTA) from Bartonella is a fascinating chimeric GTA that evolved from the domestication of two phages. Not much is known about how the expression of the BaGTA is regulated. In this manuscript, Korotaev et al noted the structural similarity between BrrG (a protein encoded by the ror locus of BaGTA) to a well-known transcriptional anti-termination factor, 21Q, from phage P21. This sparked the investigation into the possibility that BaGTA cluster is also regulated by anti-termination. Using a suite of cell biology, genetics, and genome-wide techniques (ChIP-seq), Korotaev et al convincingly showed that this is most likely the case. The findings offer the first insight into the regulation of GTA cluster (and GTA-mediated gene transfer) particularly in this pathogen Bartonella. Note that anti-termination is a well-known/studied mechanism of transcriptional control. Anti-termination is a very common mechanism for gene expression control of prophages, phages, bacterial gene clusters, and other GTAs, so in this sense, the impact of the findings in this study here is limited to Bartonella.

      Strengths:

      Convincing results that overall support the main claim of the manuscript.

      Weaknesses:

      A few important controls are missing.

      We sincerely appreciate reviewer #1's positive assessment of our manuscript. In response to the concern regarding control samples/experiments, we have addressed this issue in our revision, by providing data of the replicates of our experiments. We acknowledge that antitermination is a well-established mechanism of expression control in bacteria, including bacterial gene clusters, phages, prophages, and at least one other GTA. As reviewer #2 also noted, our study presents a unique example of phage co-domestication, where antitermination integrates both phage remnants at the regulatory level. We have emphasized this original aspect more clearly in the revised manuscript.

      Reviewer 1 (Recommendations for the authors):

      (1) Provide RMSD and DALI scores to show how similar the AlphaFold-predicted structures of BrrG are to other anti-termination factors. This should be done for Fig1B and also for Suppl. Fig 1 to support the claim that BrrG, GafA, GafZ, Q21 share structural features.

      In the revised manuscript we provide RMSD and DALI scores in supplementary Fig. 1A (Suppl. Fig. 1A). In Suppl. Fig. 1B we further include a heatmap of similarity values.

      (2) Throughout the manuscript, flow cytometry data of gfp expression was used and shown as a single replicate. Korotaev et al wrote in the legends that error bars are shown (that is not true for e.g. Figs. 3, 4, and 5). It is difficult for reviewers/readers to gauge how reliable the experiments are.

      In the revised manuscript we show all replicates for the flow cytometry histograms.

      For Fig. 2C, all replicates are provided in Suppl. Fig. 3.

      For Fig. 3B, all replicates are provided in Suppl. Fig. 4.

      For Fig. 4B, all replicates are provided in Suppl. Fig. 5.

      For Fig. 5B, all replicates are provided in Suppl. Fig. 6.

      (3) I am unsure how ChIP-seq in Fig. 2A was performed (with anti-FLAG or anti-HA antibodies? I cannot tell from the Materials & Methods). More importantly, I did not see the control for this ChIP-seq experiment. If a FLAG-tagged BrrG was used for ChIP-seq, then a WT non-tagged version should be used as a negative control (not sequencing INPUT DNA), this is especially important for anti-terminator that can co-travel with RNA polymerase. Please also report the number of replicates for ChIP-seq experiments.

      Fig. 2A presents the coverage plot from the ChIP-Seq of ∆brrG +pPtet:3xFLAG-brrG (N’ in green). As anticipated by the referee, we had used ∆brrG +pTet:brrG (untagged) as control (grey). Each strain was tested in a single replicate. The C-terminal tag produced results similar to the untagged version, suggesting it is non-functional. All tested tags are shown in Supplementary Figure 2.

      (4) Korotaev et al mentioned that BrrG binds to DNA (as well as to RNA polymerase). With the availability of existing ChIP-seq data, the authors should be able to locate the DNA-binding element of BrrG, this additional information will be useful to the community.

      We identified a putative binding site of BrrG using our ChIP-Seq data. The putative binding site is indicated in Fig. 2D of the revised manuscript.

      (5) Mutational experiments to break the potential hairpin structure are required to strengthen the claim that this putative hairpin is the potential transcriptional terminator.

      We did not claim the identified hairpin is a confirmed terminator, but proposed it as a candidate. We agree with the referee that the suggested experiment would be necessary to definitively establish its function. However, our main objective was to show that BrrG acts as a processive antiterminator, which we demonstrated by replacing the putative terminator with a well-characterized synthetic one that BrrG successfully bypassed. Therefore, we chose not to perform the proposed experiment and have accordingly softened our conclusions regarding the hairpin’s potential terminator function.

      Reviewer 2 (Public review):

      Summary:

      In this study, the authors identified and characterized a regulatory mechanism based on transcriptional anti-termination that connects the two gene clusters, capsid and run-off replication (ROR) locus, of the bipartite Bartonella gene transfer agent (GTA). Among genes essential for GTA functionality identified in a previous transposon sequencing project, they found a potential antiterminator of phage origin within the ROR locus. They employed fluorescence reporter and gene transfer assays of overexpression and knockout strains in combination with ChIP-seq and promoter-fusions to convincingly show that this protein indeed acts as an antiterminator counteracting attenuation of the capsid gene cluster expression.

      Impact on the field:

      The results provide valuable insights into the evolution of the chimeric BaGTA, a unique example of phage co-domestication by bacteria. A similar system found in the other broadly studied Rhodobacterales/Caulobacterales GTA family suggests that antitermination could be a general mechanism for GTA control.

      Strengths:

      Results of the selected and carefully designed experiments support the main conclusions.

      Weaknesses:

      It remains open why overexpression of the antiterminator does not increase the gene transfer frequency.

      We are grateful for reviewer #2's thoughtful and encouraging feedback on our manuscript. The reviewer raises an important question about why overexpression of the antiterminator does not increase gene transfer frequency. While we acknowledge this point, we consider it beyond the scope of the current study. Our findings clearly demonstrate that the antiterminator induces capsid component expression in a large proportion of cells. However, the fact that this expression plateaus at high levels rather than exhibiting a transient peak, as seen in the wild type, suggests that antiterminators do not regulate GTA particle release via lysis. We are actively investigating this further through additional experiments, which we plan to publish separately from this study.

      Reviewer 2 (Recommendations for the authors):

      (1) The authors wrote "GTAs are not self-transmitting because the DNA packaging capacity of a GTA particle is too small to package the entire gene cluster encoding it" (page 3). I thought that at least the Bartonella capsid gene cluster should be self-transmissible within the 14 kb packaged DNA (https://doi.org/10.1371/journal.pgen.1003393, https://doi.org/10.1371/journal.pgen.1000546). This was also concluded by Lang et al (https://doi.org/10.1146/annurev-virology-101416-041624). In this case the presented results would have important implications. As the gene cluster and the anti-terminator required for its expression are separated on the chromosome, it would not be possible to transfer an active GTA gene cluster, although the DNA coding for the genes required for making the packaging agent itself, theoretically fits into a BaGTA particle. Could the authors comment on that? I think it would be helpful to add the sizes of the different gene clusters and the distance between them in Fig. 2A. The ROR amplified region spans 500kb, is the capsid gene cluster within this region?

      We thank the reviewer for bringing up this interesting point. The ror gene cluster, which encodes the antiterminator BrrG, is approximately 9.2 kb in size and could feasibly be packaged in its entirety into a GTA particle. In contrast, the bgt cluster (capsid cluster) is approximately 20 kb in size, which exceeds the packaging limit of GTA particles, and is separated from the ror cluster by approximately 35 kb. Consequently, if the ror cluster is transferred via a GTA particle into a recipient host that does not encode the bgt gene cluster, the ror cluster would not be expressed.

      We added the sizes of the gene clusters to Fig. 1A.

      (2) Another side-note regarding the introduction: On page three the authors write: "GTAs encode bacteriophage-like particles and in contrast to phages transfer random pieces of host bacterial DNA". While packaging is not specific, certain biases in the packaging frequency are observed in both studied GTA families. For Bartonella this is ROR. In the two GTA-producing strains D. shibae and C. crescentus origin and terminus of replication are not packaged and certain regions are overrepresented (https://doi.org/10.1093/gbe/evy005, https://doi.org/10.1371/journal.pbio.3001790). Furthermore, D. shibae plasmids are not packaged but chromids are. I think the term "random" does not properly describe these observations. I would suggest using "not specific" instead.

      We thank the reviewer for this suggestion and adjusted the wording on p. 3 accordingly.

      (3) Page 5: Remove "To address this". It is not needed as you already state "To test this hypothesis" in the previous sentence.

      We adjusted the wording on p. 5 accordingly.

      (4) I think the manuscript would greatly benefit from a summary figure to visualize the Q-like antiterminator-dependent regulatory circuit for GTA control and its four components described on pages 15 and 16.

      We thank the reviewer for this valuable suggestion. We included a summary figure (Fig. 6) in the discussion section of the revised manuscript.

      (5) Page 17: It might be worth noting that GafA is highly conserved along GTAs in Rhodobacterales (https://doi.org/10.3389/fmicb.2021.662907) and so is probably regulatory integration into the ctrA network (https://doi.org/10.3389/fmicb.2019.00803). It's an old mechanism. It would be also interesting to know if it is a common feature of the two archetypical GTAs that the regulator is not part of the cluster itself.

      We agree with the reviewer’s comments and have revised the wording to state that GafA is highly conserved.

    1. Author response:

      The following is the authors’ response to the previous reviews

      General Response to Reviewers:

      We thank the Reviewers for their comments, which continue to substantially improve the quality and clarity of the manuscript, and therefore help us to strengthen its message while acknowledging alternative explanations.

      All three reviewers raised the concern that we have not proven that Rab3A is acting on a presynaptic mechanism to increase mEPSC amplitude after TTX treatment of mouse cortical cultures.  The reviewers’ main point is that we have not shown a lack of upregulation of postsynaptic receptors in mouse cortical cultures. We want to stress that we agree that postsynaptic receptors are upregulated after activity block in neuronal cultures.  However, the reviewers are not acknowledging that we have previously presented strong evidence at the mammalian NMJ that there is no increase in AChR after activity blockade, and therefore the requirement for Rab3A in the homeostatic increase in quantal amplitude points to a presynaptic contribution. We agree that we should restrict our firmest conclusions to the data in the current study, but in the Discussion we are proposing interpretations. We have added the following new text:

      “The impetus for our current study was two previous studies in which we examined homeostatic regulation of quantal amplitude at the NMJ.  An advantage of studying the NMJ is that synaptic ACh receptors are easily identified with fluorescently labeled alpha-bungarotoxin, which allows for very accurate quantification of postsynaptic receptor density. We were able to detect a known change due to mixing 2 colors of alpha-BTX to within 1% (Wang et al., 2005).  Using this model synapse, we showed that there was no increase in synaptic AChRs after TTX treatment, whereas miniature endplate current increased 35% (Wang et al., 2005). We further showed that the presynaptic protein Rab3A was necessary for full upregulation of mEPC amplitude (Wang et al., 2011). These data strongly suggested Rab3A contributed to homeostatic upregulation of quantal amplitude via a presynaptic mechanism.  With the current study showing that Rab3A is required for the homeostatic increase in mEPSC amplitude in cortical cultures, one interpretation is that in both situations, Rab3A is required for an increase in the presynaptic quantum.”

      The point we are making is that the current manuscript is an extension of that work, and that the interpretation of our findings regarding the variability of upregulation of postsynaptic receptors in our mouse cortical cultures further supports the idea that there is a Rab3A-dependent presynaptic contribution to homeostatic increases in quantal amplitude.

      Public Reviews:

      Reviewer #1 (Public review):

      Koesters and colleagues investigated the role of the small GTPase Rab3A in homeostatic scaling of miniature synaptic transmission in primary mouse cortical cultures using electrophysiology and immunohistochemistry. The major finding is that TTX incubation for 48 hours does not induce an increase in the amplitude of excitatory synaptic miniature events in neuronal cortical cultures derived from Rab3A KO and Rab3A Earlybird mutant mice. NASPM application had comparable effects on mEPSC amplitude in control and after TTX, implying that Ca2+-permeable glutamate receptors are unlikely modulated during synaptic scaling. Immunohistochemical analysis revealed no significant changes in GluA2 puncta size, intensity, and integral after TTX treatment in control and Rab3A KO cultures. Finally, they provide evidence that loss of Rab3A in neurons, but not astrocytes, blocks homeostatic scaling. Based on these data, the authors propose a model in which neuronal Rab3A is required for homeostatic scaling of synaptic transmission, potentially through GluA2-independent mechanisms.

      The major finding - impaired homeostatic up-scaling after TTX treatment in Rab3A KO and Rab3 earlybird mutant neurons - is supported by data of high quality. However, the paper falls short of providing any evidence or direction regarding potential mechanisms. The data on GluA2 modulation after TTX incubation are likely statistically underpowered, and do not allow drawing solid conclusions, such as GluA2-independent mechanisms of up-scaling.

      The study should be of interest to the field because it implicates a presynaptic molecule in homeostatic scaling, which is generally thought to involve postsynaptic neurotransmitter receptor modulation. However, it remains unclear how Rab3A participates in homeostatic plasticity.

      Major (remaining) point:

      (1) Direct quantitative comparison between electrophysiology and GluA2 imaging data is complicated by many factors, such as different signal-to-noise ratios. Hence, comparing the variability of the increase in mini amplitude vs. GluA2 fluorescence area is not valid. Thus, I recommend removing the sentence "We found that the increase in postsynaptic AMPAR levels was more variable than that of mEPSC amplitudes, suggesting other factors may contribute to the homeostatic increase in synaptic strength." from the abstract.

      We have not removed the statement, but altered it to soften the conclusion. It now reads, “We found that the increase in postsynaptic AMPAR levels in wild type cultures was more variable than that of mEPSC amplitudes, which might be explained by a presynaptic contribution, but we cannot rule out variability in the measurement.”

      Similarly, the data do not directly support the conclusion of GluA2-independent mechanisms of homeostatic scaling. Statements like "We conclude that these data support the idea that there is another contributor to the TTX- induced increase in quantal size." should be thus revised or removed.

      This particular statement appears only in the previous response to reviewers. We deleted the sentences that start “The simplest explanation Rab3A regulates a presynaptic contributor…” and “Imaging of immunofluorescence more variable…”. We also deleted “our data suggest…consistently leads to an increase in mEPSC amplitude and sometimes leads to…”. We added “…the lack of a robust increase in receptor levels leaves open the possibility that there is a presynaptic contributor to quantal size in mouse cortical cultures. However, the variability could arise from technical factors associated with the immunofluorescence method, and the mechanism of Rab3A-dependent plasticity could be presynaptic for the NMJ and postsynaptic for cortical neurons.”

      Reviewer #2 (Public review):

      I thank the authors for their efforts in the revision. In general, I believe the main conclusion that Rab3A is required for TTX-induced homeostatic synaptic plasticity is well-supported by the data presented, and this is an important addition to the repertoire of molecular players involved in homeostatic compensations. I also acknowledge that the authors are more cautious in making conclusions based on the current evidence, and the structure and logic have been much improved.

      The only major concern I have still falls on the interpretation of the mismatch between GluA2 cluster size and mEPSC amplitude. The authors argue that they are only trying to say that changes in the cluster size are more variable than those in the mEPSC amplitude, and they provide multiple explanations for this mismatch. It seems incongruous to state that the simplest explanation is a presynaptic factor when you have all these alternative factors that very likely have contributed to the results. Further, the authors speculate in the discussion that Rab3A does not regulate postsynaptic GluA2 but instead regulates a presynaptic contributor. Do the authors mean that, in their model, the mEPSC amplitude increases can be attributed to two factors- postsynaptic GluA2 regulation and a presynaptic contribution (which is regulated by Rab3A)? If so, and Rab3A does not affect GluA2 whatsoever, shouldn't we see GluA2 increase even in the absence of Rab3A? The data in Table 1 seems to indicate otherwise.

      The main body of this comment is addressed in the General Response to Reviewers. In addition, we deleted the text “current data, coupled with our previous findings at the mouse neuromuscular junction, support the idea that there are additional sources contributing to the homeostatic increase in quantal size.” We added new text, so the sentence now reads: “Increased receptors likely contribute to increases in mEPSC amplitudes in mouse cortical cultures, but because we do not have a significant increase in GluA2 receptors in our experiments, it is impossible to conclude that the increase is lacking in cultures from Rab3A<sup>-/-</sup> neurons.”

      I also question the way the data are presented in Figure 5. The authors first compare 3 cultures and then 5 cultures altogether, if these experiments are all aimed to answer the same research question, then they should be pooled together. Interestingly, the additional two cultures both show increases in GluA2 clusters, which makes the decrease in culture #3 even more perplexing, for which the authors comment in line 261 that this is due to other factors. Shouldn't this be an indicator that something unusual has happened in this culture?

      Data in this figure is sufficient to support that GluA2 increases are variable across cultures, which hardly adds anything new to the paper or to the field. 

      A major goal of performing the immunofluorescence measurements in the same cultures for which we had electrophysiological results was to address the common impression that the homeostatic effect itself is highly variable, as the reviewer notes in the comment “…GluA2 increases are variable across cultures…” Presumably, if GluA2 increases are the mechanism of the mEPSC amplitude increases, then variable GluA2 increases should correlate with variable mEPSC amplitude increases, but that is not what we observed. We are left with the explanation that the immunofluorescence method itself is very variable. We have added the point to the Discussion, which reads, “the variability could arise from technical factors associated with the immunofluorescence method, and the mechanism of Rab3A-dependent homeostatic plasticity could be presynaptic for the NMJ and postsynaptic for cortical neurons.”

      Finally, the implication of “Shouldn’t this be an indicator that something unusual has happened in this culture?” if it is not due to culture to culture variability in the homeostatic response itself, is that there was a technical problem with accurately measuring receptor levels. We have no reason to suspect anything was amiss in this set of coverslips (the values for controls and for TTX-treated were not outside the range of values in other experiments). In any of the coverslips, there may be variability in the amount of primary anti-GluA2 antibody, as this was added directly to the culture rather than prepared as a diluted solution and added to all the coverslips. But to remove this one experiment because it did not give the expected result is to allow bias to direct our data selection.

      The authors further cite a study with comparable sample sizes, which shows a similar mismatch based on p values (Xu and Pozzo-Miller 2007), yet the effect sizes in this study actually match quite well (both ~160%). P values cannot be used to show whether two effects match, but effect sizes can. Therefore, the statement in lines 411-413 "... consistently leads to an increase in mEPSC amplitudes, and sometimes leads to an increase in synaptic GluA2 receptor cluster size" is not very convincing, and can hardly be used to support "the idea that there are additional sources contributing to the homeostatic increase in quantal size.”

      We have the same situation; our effect sizes match (19.7% increase for mEPSC amplitude; 18.1% increase for GluA2 receptor cluster size, see Table 1), but in our case, the p value for receptors does not reach statistical significance. Our point here is that there is published evidence that the variability in receptor measurements is greater than the variability in electrophysiological measurements. But we have softened this point, removing the sentences containing “…consistently leads and sometimes...” and “…additional sources contributing…”.
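
      The distinction drawn in this exchange between effect size and p value can be illustrated with a minimal sketch. The numbers below are hypothetical values chosen for illustration, not data from the manuscript, and the helper names `percent_change` and `welch_t` are illustrative assumptions. Both invented datasets show the same 20% increase in the mean, yet the Welch t statistic differs several-fold because the scatter differs.

```python
import math
import statistics


def percent_change(control, treated):
    """Effect size expressed as percent change of the mean relative to control."""
    mean_control = statistics.mean(control)
    return 100.0 * (statistics.mean(treated) - mean_control) / mean_control


def welch_t(control, treated):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_c, n_t = len(control), len(treated)
    var_c = statistics.variance(control)   # sample variance (n - 1 denominator)
    var_t = statistics.variance(treated)
    se_sq = var_c / n_c + var_t / n_t      # squared standard error of the difference
    t = (statistics.mean(treated) - statistics.mean(control)) / math.sqrt(se_sq)
    df = se_sq ** 2 / (
        (var_c / n_c) ** 2 / (n_c - 1) + (var_t / n_t) ** 2 / (n_t - 1)
    )
    return t, df


# Hypothetical measurements (arbitrary units); NOT values from the study.
tight_control = [9, 10, 11, 10, 9, 11]
tight_treated = [11, 12, 13, 12, 11, 13]
noisy_control = [4, 16, 7, 13, 2, 18]
noisy_treated = [5, 19, 8, 16, 3, 21]

for label, con, trt in [
    ("tight", tight_control, tight_treated),
    ("noisy", noisy_control, noisy_treated),
]:
    t_stat, dof = welch_t(con, trt)
    print(f"{label}: change = {percent_change(con, trt):.1f}%, "
          f"t = {t_stat:.2f}, df = {dof:.1f}")
```

      With these made-up values, both datasets give a 20% increase, but the t statistic is roughly 3.9 for the tight dataset and roughly 0.5 for the noisy one. The sketch stops at the t statistic; converting it to a p value would require the t distribution, which is outside the Python standard library.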

      I would suggest simply showing mEPSC and immunostaining data from all cultures in this experiment as additional evidence for homeostatic synaptic plasticity in WT cultures, and leave out the argument for "mismatch". The presynaptic location of Rab3A is sufficient to speculate a presynaptic regulation of this form of homeostatic compensation.

      We have removed all uses of the word “mismatch,” but feel the presentation of the 3 matched experiments, 23-24 cells (Figure 5A, D), and the additional 2 experiments for a total of 5 cultures, 48-49 cells (Figure 5C, F), is important in order to demonstrate that the lack of statistically significant receptor response is due neither to a variable homeostatic response in the mEPSC amplitudes, nor to a small number of cultures.

      Minor concerns:

      (1) Line 214, I see the authors cite literature to argue that GluA2 can form homomers and can conduct currents. While GluA2 subunits edited at the Q/R site (they are in nature) can form homomers with very low efficiency in exogenous systems such as HEK293 cells (as done in the cited studies), it's unlikely for this to happen in neurons (they can hardly traffic to synapses if possible at all).

      We were unable to identify a key reference that characterized GluA2 homomers vs. heteromers in native cortical neurons, but we have rewritten the section in the manuscript to acknowledge the low conductance of homomers:

      “…to assess whether GluA2 receptor expression, which will identify GluA2 homomers and GluA2 heteromers (the former unlikely to contribute to mEPSCs given their low conductance relative to heteromers (Swanson et al., 1997; Mansour et al., 2001)…”

      (2) Lines 221-222, the authors may have misinterpreted the results in Turrigiano 1998. This study does not show that the increase in receptors is most dramatic in the apical dendrite, in fact, this is the only region they have tested. The results in Figures 3b-c show that the effect size is independent of the distance from soma.

      Figure 3 in Turrigiano et al. (1998) shows that the increase in glutamate responsiveness is higher at the cell body than along the primary dendrite. We have revised our description to indicate that an increase in responsiveness on the primary dendrite has been demonstrated in Turrigiano et al. (1998).

      “We focused on the primary dendrite of pyramidal neurons as a way to reduce variability that might arise from being at widely ranging distances from the cell body, or, from inadvertently sampling dendritic regions arising from inhibitory neurons. In addition, it has been shown that there is a clear increase in response to glutamate in this region (Turrigiano et al., 1998).”

      “…synaptic receptors on the primary dendrite, where a clear increase in sensitivity to exogenously applied glutamate was demonstrated (see Figure 3 in (Turrigiano et al., 1998)).”

      (3) Lines 309-310 (and other places mentioning TNFa), the addition of TNFa to this experiment seems out of place. The authors have not performed any experiment to validate the presence/absence of TNFa in their system (citing only 1 study from another lab is insufficient). Although it's convincing that glia Rab3A is not required for homeostatic plasticity here, the data does not suggest Rab3A's role (or the lack of) for TNFa in this process.

      We have modified the paragraph in the Discussion that addresses the glial results, to describe more clearly the data that supported an astrocytic TNF-alpha mechanism: “TNF-alpha accumulates after activity blockade, and directly applied to neuronal cultures, can cause an increase in GluA1 receptors, providing a potential mechanism by which activity blockade leads to the homeostatic upregulation of postsynaptic receptors (Beattie et al., 2002; Stellwagen et al., 2005; Stellwagen and Malenka, 2006).”

      We have also acknowledged that we cannot rule out TNF-alpha coming from neurons in the cortical cultures: “…suggesting the possibility that neuronal Rab3A can act via a non-TNF-alpha mechanism to contribute to homeostatic regulation of quantal amplitude, although we have not ruled out a neuronal Rab3A-mediated TNF-alpha pathway in cortical cultures.”

      Reviewer #3 (Public review):

      This manuscript presents a number of interesting findings that have the potential to increase our understanding of the mechanism underlying homeostatic synaptic plasticity (HSP). The data broadly support that Rab3A plays a role in HSP, although the site and mechanism of action remain uncertain.

      The authors clearly demonstrate that Rab3A plays a role in HSP at excitatory synapses, with substantially less plasticity occurring in the Rab3A KO neurons. There is also no apparent HSP in the Earlybird Rab3A mutation, although baseline synaptic strength is already elevated. In this context, it is unclear if the plasticity is absent, already induced by this mutation, or just occluded by a ceiling effect due to the synapses already being strengthened. Occlusion may also occur in the mixed cultures when Rab3A is missing from neurons but not astrocytes. The authors do appropriately discuss these options. The authors have solid data showing that Rab3A is unlikely to be active in astrocytes. Finally, they attempt to study the linkage between changes in synaptic strength and AMPA receptor trafficking during HSP, and conclude that trafficking may not be solely responsible for the changes in synaptic strength during HSP.

      Strengths:

      This work adds another player into the mechanisms underlying an important form of synaptic plasticity. The plasticity is likely only reduced, suggesting Rab3A is only partially required and perhaps multiple mechanisms contribute. The authors speculate about some possible novel mechanisms, including whether Rab3A is active pre-synaptically to regulate quantal amplitude.

      As Rab3A is primarily known as a pre-synaptic molecule, this possibility is intriguing. However, it is based on the partial dissociation of AMPAR trafficking and synaptic response and lacks strong support. On average, they saw a similar magnitude of change in mEPSC amplitude and GluA2 cluster area and integral, but the GluA2 data was not significant due to higher variability. It is difficult to determine if this is due to biology or methodology - the imaging method involves assessing puncta pairs (GluA2/VGlut1) clearly associated with a MAP2 labeled dendrite. This is a small subset of synapses, with usually less than 20 synapses per neuron analyzed, which would be expected to be more variable than mEPSC recordings averaged across several hundred events. However, when they reduce the mEPSC number of events to similar numbers as the imaging, the mEPSC amplitudes are still less variable than the imaging data. The reason for this remains unclear. The pool of sampled synapses is still different between the methods and recent data has shown that synapses have variable responses during HSP. Further, there could be variability in the subunit composition of newly inserted AMPARs, and only assessing GluA2 could mask this (see below). It is intriguing that pre-synaptic changes might contribute to HSP, especially given the likely localization of Rab3A. But it remains difficult to distinguish if the apparent difference in imaging and electrophysiology is a methodological issue rather than a biological one. Stronger data, especially positive data on changes in release, will be necessary to conclude that pre-synaptic factors are required for HSP, beyond the established changes in post-synaptic receptor trafficking.

      Regarding the concern that the lack of increase in receptors is due to a technical issue, please see General Response to Reviewers, above. We have also softened our conclusions throughout, acknowledging we cannot rule out a technical issue.

      Other questions arise from the NASPM experiments, used to justify looking at GluA2 (and not GluA1) in the immunostaining. First, there is a strong frequency effect that is unclear in origin. One would expect NASPM to merely block some fraction of the post-synaptic current, and not affect pre-synaptic release or block whole synapses. But the change in frequency seems to argue (as the authors do) that some synapses only have CP-AMPARs, while the rest of the synapses have few or none. Another possibility is that there are pre-synaptic NASPM-sensitive receptors that influence release probability. Further, the amplitude data show a strong trend towards smaller amplitude following NASPM treatment (Fig 3B). The p value for both control and TTX neurons was 0.08 - it is very difficult to argue that there is no effect. The decrease on average is larger in the TTX neurons, and some cells show a strong effect. It is possible there is some heterogeneity between neurons on whether GluA1/A2 heteromers or GluA1 homomers are added during HSP. This would impact the conclusions about the GluA2 imaging as compared to the mEPSC amplitude data.

      The key finding in Figure 3 is that NASPM did not eliminate the statistically significant increase in mEPSC amplitude after TTX treatment (Fig 3A). Whether or not NASPM-sensitive receptors contribute to mEPSC amplitude is a separate question (Fig 3B). We are open to the possibility that NASPM reduces mEPSC amplitude in both control and TTX-treated cells (p = 0.08 for both), but that does not change our conclusion that NASPM has no effect on the TTX-induced increase in mEPSC amplitude. The mechanism underlying the decrease in mEPSC frequency following NASPM is interesting, but does not alter our conclusions regarding the role of Rab3A in homeostatic synaptic plasticity of mEPSC amplitude. In addition, the Reviewer does not acknowledge Supplemental Figure 1, which shows a similar lack of correspondence between homeostatic increases in mEPSC amplitude and GluA1 receptors in two cultures where matched data were obtained. Therefore, we do not think our lack of a robust increase in receptors can be explained by our failing to look at the relevant receptor.

      To understand the role of Rab3A in HSP will require addressing two main issues:

      (1) Is Rab3A acting pre-synaptically, post-synaptically or both? The authors provide good evidence that Rab3A is acting within neurons and not astrocytes. But where it is acting (pre or post) would aid substantially in understanding its role. The general view in the field has been that HSP is regulated post-synaptically via regulation of AMPAR trafficking, and considerable evidence supports this view. More concrete support for the authors' suggestion of a pre-synaptic site of control would be helpful.

      We agree that definitive evidence for a presynaptic role of Rab3A in homeostatic plasticity of mEPSC amplitudes in mouse cortical cultures requires demonstrating that loss of Rab3A in postsynaptic neurons does not disrupt the plasticity, whereas loss in presynaptic neurons does. Without these data, we can only speculate that the Rab3A-dependence of homeostatic plasticity of quantal size in cortical neurons may be similar to that of the neuromuscular junction, where it cannot be receptors. We have added to the Discussion that the mechanism of Rab3A regulation of homeostatic plasticity of quantal amplitude could differ between cortical neurons and the neuromuscular junction (lines 448-450 in markup). Establishing a way to co-culture Rab3A-/- and Rab3A+/+ neurons in ratios that would allow us to record from a Rab3A-/- neuron that has mainly Rab3A+/+ inputs (or vice versa) is not impossible, but requires either transfection or transgenic expression with markers that identify the relevant genotype, and will be the subject of future experiments.

      (2) Rab3A is also found at inhibitory synapses. It would be very informative to know if HSP at inhibitory synapses is similarly affected. This is particularly relevant as at inhibitory synapses, one expects a removal of GABARs or a decrease in GABA release (i.e., the opposite of whatever is happening at excitatory synapses). If both processes are regulated by Rab3A, this might suggest a role for this protein more upstream in the signaling; an effect only at excitatory synapses would argue for a more specific role just at those synapses.

      We agree with the Reviewer, that it is important to determine the generality of Rab3A function in homeostatic plasticity. Establishing the homeostatic effect on mIPSCs and then examining them in Rab3A-/- cultures is a large undertaking and will be the subject of future experiments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor (remaining) points:

      (1) The figure referenced in the first response to the reviewers (Figure 5G) does not exist.

      We meant Figure 5F, which has been corrected in the current response.

      (2) I recommend showing the data without binning (despite some overlap).

      The box plot function in Origin does not allow plotting without binning, but we can make the bin size so small that, for all intents and purposes, there is close to 1 sample in each bin. When we do this, the majority of data points overlap in a straight vertical line. Previously described concerns were regarding the gaps in the data, but it should be noted that these are cell means and we are not depicting the distributions of mEPSC amplitudes within a recording or across multiple recordings.

      (3) Please auto-scale all axes from 0 (e.g., Fig 1E, F).

      We have rescaled all mEPSC amplitude axes in box plots to go from 0 (Figures 1, 2 and 6).

      (4) Typo in Figure legend 3: "NASPM (20 um)" => uM

      Fixed.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 140, frequencies are reported in Hz while other places are in sec-1, while these are essentially the same, they should be kept consistent in writing.

      All mEPSC frequencies have been changed to sec<sup>-1</sup>, except we have left “Hz” for repetitive stimulation and filtering.

      (2) Paragraph starting from line 163 (as well as other places where multiple groups are compared, such as the occlusion discussion), the authors assessed whether there was a change in baseline between WT and mutant group by doing pairwise tests, this is not the right test. A two-way ANOVA, or at least a multivariant test would be more appropriate.

      We have performed a two-way ANOVA, with genotype as one factor and treatment as the other factor. The p values in Figures 1 and 2 have been revised to reflect p values from the post-hoc Tukey test on the specific interactions (for each particular genotype, TTX vs CON effects). The difference between the two untreated WT strains was not significant in the post-hoc Tukey test, and we have revised the text. The difference between the untreated WT from the Rab3A+/Ebd colony and the untreated Rab3AEbd/Ebd mutant was still significant in the post-hoc Tukey test, and this has replaced the Kruskal-Wallis test. The two-way ANOVA was also applied to the neuron-glia experiments, and the p values in Figure 6 were adjusted accordingly.
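
      The two-way design described here (genotype as one factor, treatment as the other) can be sketched as follows. This is a minimal pure-Python illustration for a balanced design with made-up numbers. It is not the authors' analysis, and it does not implement the post-hoc Tukey test, which needs the studentized range distribution. The function name `two_way_anova` and the cell values are assumptions for illustration only.

```python
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA with interaction.

    cells maps (level_a, level_b) -> list of observations; every cell must
    hold the same number (>= 2) of observations.
    Returns {effect: (sum_of_squares, degrees_of_freedom, F)}.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if len(cells) != len(levels_a) * len(levels_b):
        raise ValueError("every combination of factor levels needs a cell")
    if n < 2 or any(len(v) != n for v in cells.values()):
        raise ValueError("this sketch only handles balanced designs, n >= 2")

    grand = mean(x for v in cells.values() for x in v)
    mean_a = {a: mean(x for b in levels_b for x in cells[(a, b)]) for a in levels_a}
    mean_b = {b: mean(x for a in levels_a for x in cells[(a, b)]) for b in levels_b}

    # Sums of squares for the main effects, the cells, and the residual.
    ss_a = len(levels_b) * n * sum((m - grand) ** 2 for m in mean_a.values())
    ss_b = len(levels_a) * n * sum((m - grand) ** 2 for m in mean_b.values())
    ss_cells = n * sum((mean(v) - grand) ** 2 for v in cells.values())
    ss_ab = ss_cells - ss_a - ss_b          # interaction
    ss_err = sum((x - mean(v)) ** 2 for v in cells.values() for x in v)

    df_a, df_b = len(levels_a) - 1, len(levels_b) - 1
    df_ab = df_a * df_b
    df_err = len(cells) * (n - 1)
    ms_err = ss_err / df_err

    return {
        "A": (ss_a, df_a, (ss_a / df_a) / ms_err),
        "B": (ss_b, df_b, (ss_b / df_b) / ms_err),
        "AxB": (ss_ab, df_ab, (ss_ab / df_ab) / ms_err),
    }


# Hypothetical amplitudes (arbitrary units); NOT values from the study.
# Factor A = genotype, factor B = treatment.
example = {
    ("WT", "CON"): [10.0, 11.0, 9.0, 10.0],
    ("WT", "TTX"): [12.0, 13.0, 11.0, 12.0],
    ("KO", "CON"): [10.0, 11.0, 9.0, 10.0],
    ("KO", "TTX"): [10.0, 11.0, 9.0, 10.0],
}
result = two_way_anova(example)
for effect, (ss, df, f_ratio) in result.items():
    print(f"{effect}: SS = {ss:.2f}, df = {df}, F = {f_ratio:.2f}")
```

      In this invented example only the WT cells respond to treatment, so the genotype-by-treatment interaction term carries as much variance as either main effect. Real datasets are rarely balanced, in which case a dedicated statistics package would be the more appropriate tool.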

      (3) Relevant to the second point under minor concerns, I suggest this sentence be removed, as reducing variability and avoiding inhibitory projects are reasons good enough to restrict the analysis to the apical dendrites.

      We have revised the description of the Turrigiano et al., 1998 finding from their Figure 3 and feel it still strengthens the justification for choosing to analyze only synapses on the apical dendrite.

      Reviewer #3 (Recommendations for the authors):

      Minor points:

      The comments on lines 256-7 could seem misleading - the NASPM results wouldn't rule out contribution of those other subunits, only non-GluA2 containing combinations of those subunits. I would suggest revising this statement. Also, NASPM does likely have an effect, just not one that changes much with TTX treatment.

      At new line 213 (markup) we have added the modifier “homomeric” to clarify our point that the lack of NASPM effect on the increase in mEPSC amplitude after TTX indicates that the increase is not due to more homomeric Ca<sup>2+</sup>-permeable receptors. We have always stated that NASPM reduces mEPSC amplitude, but it is in both control and treated cultures.

      Strong conclusions based on a single culture (lines 314-5) seem unwarranted.

      We have softened this statement with a “suggesting that” substituted for the previous “Therefore,” but stand by our point that the mEPSC amplitude data support a homeostatic effect of TTX in Culture #3, so the lack of increase in GluA2 cluster size needs an explanation other than variability in the homeostatic effect itself.

      Saying (line 554) something is 'the only remaining possibility' also seems unwarranted.

      We have softened this statement to read, “A remaining possibility…”.

      Beattie EC, Stellwagen D, Morishita W, Bresnahan JC, Ha BK, Von Zastrow M, Beattie MS, Malenka RC (2002) Control of synaptic strength by glial TNFalpha. Science 295:2282-2285.

      Mansour M, Nagarajan N, Nehring RB, Clements JD, Rosenmund C (2001) Heteromeric AMPA receptors assemble with a preferred subunit stoichiometry and spatial arrangement. Neuron 32:841-853.

      Stellwagen D, Malenka RC (2006) Synaptic scaling mediated by glial TNF-alpha. Nature 440:1054-1059.

      Stellwagen D, Beattie EC, Seo JY, Malenka RC (2005) Differential regulation of AMPA receptor and GABA receptor trafficking by tumor necrosis factor-alpha. J Neurosci 25:3219-3228.

      Swanson GT, Kamboj SK, Cull-Candy SG (1997) Single-channel properties of recombinant AMPA receptors depend on RNA editing, splice variation, and subunit composition. J Neurosci 17:5869.

      Turrigiano GG, Leslie KR, Desai NS, Rutherford LC, Nelson SB (1998) Activity-dependent scaling of quantal amplitude in neocortical neurons. Nature 391:892-896.

      Wang X, Wang Q, Yang S, Bucan M, Rich MM, Engisch KL (2011) Impaired activity-dependent plasticity of quantal amplitude at the neuromuscular junction of Rab3A deletion and Rab3A earlybird mutant mice. J Neurosci 31:3580-3588.

      Wang X, Li Y, Engisch KL, Nakanishi ST, Dodson SE, Miller GW, Cope TC, Pinter MJ, Rich MM (2005) Activity-dependent presynaptic regulation of quantal size at the mammalian neuromuscular junction in vivo. J Neurosci 25:343-351.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study uncovers a protective role of the ubiquitin-conjugating enzyme variant Uev1A in mitigating cell death caused by over-expressed oncogenic Ras in polyploid Drosophila nurse cells and by RasK12 in diploid human tumor cell lines. The authors previously showed that overexpression of oncogenic Ras induces death in nurse cells, and now they perform a deficiency screen for modifiers. They identified Uev1A as a suppressor of this Ras-induced cell death. Using genetics and biochemistry, the authors found that Uev1A collaborates with the APC/C E3 ubiquitin ligase complex to promote proteasomal degradation of Cyclin A. This function of Uev1A appears to extend to diploid cells, where its human homologs UBE2V1 and UBE2V2 suppress oncogenic Ras-dependent phenotypes in human colorectal cancer cells in vitro and in xenografts in mice.

      Strengths:

      (1) Most of the data is supported by a sufficient sample size and appropriate statistics.

      (2) Good mix of genetics and biochemistry.

      (3) Generation of new transgenes and Drosophila alleles that will be beneficial for the community.

      We greatly appreciate these comments.

      Weaknesses:

      (1) Phenotypes are based on artificial overexpression. It is not clear whether these results are relevant to normal physiology.

      Downregulation of Uev1A, Ben, and Cdc27 together significantly increased the incidence of dying nurse cells in normal ovaries (Figure 2-figure supplement 4), indicating that the mechanism we uncovered also protects nurse cells from death during normal oogenesis.

      (2) The phenotype of "degenerating ovaries" is very broad, and the study is not focused on phenotypes at the cellular level. Furthermore, no information is provided in the Materials and Methods on how degenerating ovaries are scored, despite this being the most important assay in the study.

      Thanks for pointing out this issue. We quantified the phenotype of nurse cell death using “degrading/total egg chambers per ovary”, not “degenerating ovaries” (see all quantification data in our manuscript). Notably, this phenotype ranges from mild to severe. In normal nurse cells, nuclei exhibit a large, round morphology in DAPI staining (see the first panel in Figure 1D). During early death, nurse cell nuclei become disorganized and begin to condense and fragment (see the third panel in Figure 2-figure supplement 2E). In late-stage death, the nuclei are completely fragmented into small, condensed spherical structures (see the second panel in Figure 1D), making cellular-level phenotypic quantification impossible. Since all nurse cells within the same egg chamber are interconnected, their death process is synchronous. Thus, quantifying the phenotype at the egg-chamber level is more practical than at the cellular level. To improve clarity, we will provide a detailed description of the phenotype and integrate this explanation into the main text of the revised manuscript.

      (3) In Figure 5, the authors want to conclude that uev1a is a tumor-suppressor, and so they over-express ubev1/2 in human cancer cell lines that have RasK12 and find reduced proliferation, colony formation, and xenograft size. However, genes that act as tumor suppressors have loss-of-function phenotypes that allow for increased cell division. The Drosophila uev1a mutant is viable and fertile, suggesting that it is not a tumor suppressor in flies. Additionally, they do not deplete human ubev1/2 from human cancer cell lines and assess whether this increases cell division, colony formation, and xenograft growth.

      We apologize for our misleading description. In Figure 5, we aimed to demonstrate that UBE2V1/2, like Uev1A in Drosophila “nos>Ras<sup>G12V</sup>+bam-RNAi” germline tumors (Figure 4), suppress oncogenic KRAS-driven overgrowth in diploid human cancer cells. Importantly, this function of Uev1A and UBE2V1/2 is dependent on Ras-driven tumors; there is no evidence that they act as broad tumor suppressors in the absence of oncogenic Ras. Drosophila uev1a mutants were lethal, not viable (see Lines 131-133), and germline-specific knockdown of uev1a (nos>uev1a-RNAi) caused female sterility without inducing tumors. These findings suggest that Uev1A lacks tumor-suppressive activity in the Drosophila female germline in the absence of Ras-driven tumors. We will revise the manuscript to prevent misinterpretation. Furthermore, we will investigate whether depletion of UBE2V1, UBE2V2, or both promotes oncogenic KRAS-driven overgrowth in human cancer cells.

      (4) A critical part of the model does not make sense. CycA is a key part of their model, but they do not show CycA protein expression in WT egg chambers or in their over-expression models (nos.RasV12 or bam>RasV12). Based on Lilly and Spradling 1996, Cyclin A is not expressed in germ cells in region 2-3 of the germarium; whether CycA is expressed in nurse cells in later egg chambers is not shown but is critical to document comprehensively.

      We appreciate this critical comment. CycA is a key cyclin that partners with Cdk1 to promote cell division (Edgar and Lehner, 1996). Notably, nurse cells are post-mitotic endocycling cells (Hammond and Laird, 1985) and typically do not express CycA (Lilly and Spradling, 1996) (see the last sentence, page 2518, paragraph 3). However, their death induced by oncogenic Ras<sup>G12V</sup> is significantly suppressed by monoallelic deletion of either cycA or cdk1 (Zhang et al., 2024). Conversely, ectopic CycA expression in nurse cells triggers their death (Figure 2C, 2D). These findings suggest that polyploid nurse cells exhibit high sensitivity to aberrant division-promoting stress, which may represent a distinct form of cellular stress unique to polyploid cells. To further test our model, we will compare CycA expression levels in normal nurse cells versus those undergoing oncogenic Ras<sup>G12V</sup>-induced cell death.

      (5) The authors should provide more information about the knowledge base of uev1a and its homologs in the introduction.

      Thanks for this suggestion. We will include this information in the introduction of the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors performed a genetic screen using deficiency lines and identified Uev1a as a factor that protects nurse cells from Ras<sup>G12V</sup>-induced cell death. According to a previous study from the same lab, this cell death is caused by aberrant mitotic stress due to CycA upregulation (Zhang et al.). This paper further reveals that Uev1a forms a complex with APC/C to promote proteasome-mediated degradation of CycA.

      In addition to polyploid nurse cells, the authors also examined the effect of Ras<sup>G12V</sup>-overexpression in diploid germline cells, where Ras<sup>G12V</sup>-overexpression triggers active proliferation, not cell death. Uev1a was found to suppress its overgrowth as well.

      Finally, the authors show that the overexpression of the human homologs, UBE2V1 and UBE2V2, suppresses tumor growth in human colorectal cancer xenografts and cell lines. Notably, the expression of these genes correlates with the survival of colorectal cancer patients carrying the Ras mutation.

      Strength:

      This paper presents a significant finding that UBE2V1/2 may serve as a potential therapy for cancers harboring Ras mutations. The authors propose a fascinating mechanism in which Uev1a forms a complex with APC/C to inhibit aberrant cell cycle progression.

      We greatly appreciate these comments.

      Weakness:

      The quantification of some crucial experiments lacks sufficient clarity.

      Thanks for highlighting this issue. We will provide requested details regarding these quantification data in the revised manuscript.

      References

      Edgar, B.A., and Lehner, C.F. (1996). Developmental control of cell cycle regulators: a fly's perspective. Science 274, 1646-1652.

      Hammond, M.P., and Laird, C.D. (1985). Chromosome structure and DNA replication in nurse and follicle cells of Drosophila melanogaster. Chromosoma 91, 267-278.

      Lilly, M.A., and Spradling, A.C. (1996). The Drosophila endocycle is controlled by Cyclin E and lacks a checkpoint ensuring S-phase completion. Genes Dev 10, 2514-2526.

      Zhang, Q., Wang, Y., Bu, Z., Zhang, Y., Zhang, Q., Li, L., Yan, L., Wang, Y., and Zhao, S. (2024). Ras promotes germline stem cell division in Drosophila ovaries. Stem Cell Reports 19, 1205-1216.

    1. Author response:

      We sincerely thank the reviewers and editors for their thoughtful and constructive evaluation of our manuscript and their recognition of its technical strengths, including advanced spatio-temporal Ca2+ imaging, image processing, and the rational design of selective AVP receptor ligands. We appreciate their acknowledgement that our study contributes to the understanding of glucose-dependent AVP effects in pancreatic islet physiology. Their comments will guide us to refine the scope of our work, which focuses on how α and β cells respond to AVP under varying glucose and hormonal conditions, rather than on linear correlations between function and transcript levels or metabolic profiles in individual cells. Most of the reviewers’ concerns and proposed remedies reflect a reductionist framework, which we believe cannot fully account for emergent behavior within the islet collective. As we and others have shown, islet cells do not behave in isolation; their responses often depend on the state of the entire cell population (1, 2). This means that even under identical experimental conditions, responses can differ depending on the islet’s current state. These patterns are not random, but reflect how the islet integrates signals dynamically (3, 4).

      To take advantage of both the systems and molecular side, we do plan to address several of the reviewers' suggestions with new experiments and analyses:

      First, we will add hormone, specifically glucagon, secretion assays to support our conclusions on α cell responses and possible paracrine effects. Second, we will include a targeted transcript analysis of V1bR using RNAscope and extend the pharmacological characterization of downstream signaling using selective agonists and inhibitors. Third, we will clarify the rationale for using forskolin and add new experiments using GLP-1 analogues to selectively increase cAMP in β cells, allowing us to examine direct AVP effects. Fourth, we will reinforce the presence of emergent behavior and the point that variability in islet responses is not experimental noise but a hallmark of the collective, non-linear behavior of the islet cells, which should later drive a rethinking of experimental designs and the interpretation of pharmacological responses. In conclusion, we believe that our study provides new insights into AVP modulation in pancreatic islets and highlights the importance of context-dependent responses in α and β cells. We are grateful for the opportunity to revise our manuscript and look forward to strengthening it further through the review process.

      (1) Jin E, Briggs JK, Benninger RKP, Merrins MJ. Glucokinase activity controls peripherally-located subpopulations of β-cells that lead islet Ca2+ oscillations. eLife Sciences Publications, Ltd; 2025.

      (2) Korošak D, Jusup M, Podobnik B, Stožer A, Dolenšek J, Holme P, et al. Autopoietic Influence Hierarchies in Pancreatic β Cells. Phys Rev Lett. 2021;127(16):168101.

      (3) Ball P. How life works: a user's guide to the new biology. Chicago: The University of Chicago Press; 2023. 541 p.

      (4) Fancher S, Mugler A. Fundamental Limits to Collective Concentration Sensing in Cell Populations. Phys Rev Lett. 2017;118(7):078101.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study investigates prey capture by archer fish, showing that even though the visuomotor behavior unfolds very rapidly (within 40-70 ms), it is not hardwired; it can adapt to different simulated physics and different prey shapes. Although there was agreement that the model system, experimental design, and main hypothesis are certainly interesting, opinions were divided on whether the evidence supporting the central claims is incomplete. A more rigorous definition and assessment of "reflex speed", more detailed evidence of stimulus control, and a more detailed analysis of individual subjects could potentially increase confidence in the main conclusions.

      Thank you very much. There are several points that we absolutely had to make sure are very well understood. (1) Explaining in the best possible way the experiment with a fly sliding on top of a glass plate. Here, the virtual ballistic landing point can be calculated using simple high school physics. It turns out that this is where the fish turn to – even though the fly is not falling at all. Once this is understood it becomes clear that we can precisely measure latency and accuracy of the C-start turns. In the new version we explain this essential aspect in more detail and add an extra Figure (new Figure 2). This may, perhaps, help readers to notice this important background (previously covered in Fig. 1C). (2) The full experimental evidence that the VR method works is presented in more detail and all measurements necessary will be clear after the new Figure 2. They will however not be clear if this Figure is ignored. (3) We have rewritten the manuscript to make it easier to understand what we wanted to show, why we needed VR to proceed and why the archerfish highspeed decision lent itself so readily to tackle the problem. (4) We emphasize the importance of speed-accuracy tradeoffs in standard decision-making and also include data on the absence of such a relation in the archerfish highspeed decisions.

      So, in summary, we have emphasized what we wanted to show and what we did not want to show, we have rewritten the text to make it easier for future readers and we have tried to add more guidance to the figures. We do hope very much that the beauty of the quite unexpected findings is more easily visible to those who take the trouble of actually reading the paper.  

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors test whether the archerfish can modulate the fast response to a falling target. By manipulating the trajectory of the target, they claim that the fish can modulate the fast response. While it is clear from the result that the fish can modulate the fast response, the experimental support for the argument that the fish can do it for a reflex-like behavior is inadequate.

      Please note that we have not simply tested whether archerfish can 'modulate the fast response'. We quantitatively test specific hypotheses on the rules used by the fish. For this the accuracy of the decisions is analyzed with respect to specific points that can be calculated precisely in each of the experiments. These points are shown on the figures and in the movies that were meant to illustrate this important aspect. We had to make sure that the way we calculate the predicted point(s) is made as clear as possible in the text. We added more text and separated the fundamentally important aspects in a separate Figure 2 to make it more difficult to overlook the fundamental aspects that lay the foundation for everything that follows.

      Strengths:

      Overall, the question that the authors raised in the manuscript is interesting.

      Thank you and we do hope very much that, with our revision, you will see the beauty of the findings.

      Weaknesses:

      (1) The argument that the fish can modulate reflex-like behavior relies on the claim that the archerfish makes the decision in 40 ms. There is little support for the 40 ms reaction time. The reaction time for the same behavior in Schlegel 2008 is 60-70 ms, and in Tsvilling 2012 about 75 ms, if we take the half height of the maximum as the estimated reaction time in both cases. If we take the peak (or average) of the distribution as an estimation of reaction time, the reaction time is even longer. This number is critical for the analysis the authors perform since if the reaction time is longer, maybe this is not a reflex as claimed. In addition, mentioning the 40 ms in the abstract is overselling the result. The title is also not supported by the results.

      Although the minimum latency is indeed 40 ms (it can be slightly less: e.g., see the evidence in the paper, for instance the plots in the new Fig. 4) the paper's statements are not dependent on a specific number. Even if minimum latency was 100 ms (which it is not) the speed of the response and the absence of a speed-accuracy relation (now shown directly in Fig. 4) is what is of importance. To show this we have completely rewritten large parts of the manuscript.

      (2) A critical technical issue of the stimulus delivery is not clear. The frame rate is 120 FPS and the target horizontal speed can be up to 1.775 m/s. This produces a target jumping on the screen 15 mm in each frame. This is not a continuous motion. Thus, the similarity between the natural system where the target experiences ballistic trajectory and the experiment here is not clear. Ideally, another type of stimulus delivery system is needed for a project of this kind that requires fast-moving targets (e.g. Reiser, J. Neurosci.Meth. 2008). In addition, the screen is rectangular and not circular, so in some directions, the target vanishes earlier than others. It must produce a bias in the fish response but there is no analysis of this type.

      Please note that the new Fig. 3 (former Fig. 2) reports all the evidence that is needed to just show this and in a way that could in no way have been better. We have rewritten the text to explain what needs to be shown experimentally in order to be able to proceed, what critical tests were done and what results were obtained. We also add a short comment on another unsuccessful attempt that we have tried before.
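      For reference, the per-frame step size quoted in comment (2) follows directly from the two numbers given there (a frame rate of 120 FPS and a horizontal target speed of up to 1.775 m/s). The short sketch below only reproduces that arithmetic; the function name is a hypothetical choice for illustration and nothing here is taken from the authors' stimulus software.

```python
def displacement_per_frame_mm(speed_m_per_s, frames_per_s):
    """Distance (in mm) a target moves between two consecutive frames."""
    return speed_m_per_s / frames_per_s * 1000.0


# Values quoted in the reviewer's comment.
FRAME_RATE_FPS = 120.0
MAX_SPEED_M_PER_S = 1.775

step_mm = displacement_per_frame_mm(MAX_SPEED_M_PER_S, FRAME_RATE_FPS)
frame_interval_ms = 1000.0 / FRAME_RATE_FPS

# step_mm is about 14.8 mm, in line with the roughly 15 mm quoted above;
# frame_interval_ms is about 8.3 ms.
```

      The same function can be used to see how the step shrinks for slower targets or higher frame rates.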

      (3) The results here rely on the ability to measure the error of response in the case of a virtual experiment. It is not clear how this is done since the virtual target does not fall. How do the authors validate that the fish indeed perceives the virtual target as the falling target? Since the deflection is at a later stage of the virtual trajectory, it is not clear what is the actual physics that governs the world of the experiment. Overall, the experimental setup is not well designed.

      Understanding this aspect is essential. If the glass plate experiment is not thoroughly understood (new Fig. 2 with new text to emphasize that this is absolutely essential) nothing that follows makes any sense, including what is meant by the statement that the decision could be hardwired to ballistic motion.
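      The "virtual ballistic landing point" referred to in the responses above can be illustrated with a minimal sketch. It assumes drag-free projectile motion over a flat water surface, which is the textbook calculation and not necessarily the exact procedure used in the paper. The function name, the coordinate convention and the example height are hypothetical; only the 1.775 m/s speed appears in the text above.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def ballistic_landing_point(x0, y0, vx, vy, height, vz=0.0, g=G):
    """Return the (x, y) point where a freely falling object would reach the water.

    x0, y0 : horizontal start position in m
    vx, vy : initial horizontal velocity components in m/s
    height : initial height above the water surface in m
    vz     : initial vertical speed in m/s (positive = upwards)
    Air resistance is neglected (an assumption of this sketch).
    """
    # Positive root of: height + vz * t - 0.5 * g * t**2 = 0
    t_fall = (vz + math.sqrt(vz * vz + 2.0 * g * height)) / g
    return (x0 + vx * t_fall, y0 + vy * t_fall)


# Hypothetical example: purely horizontal initial motion from 0.3 m height.
landing = ballistic_landing_point(0.0, 0.0, 1.775, 0.0, 0.3)
```

      For an object that only slides horizontally (as on the glass plate), the same formula gives the point where it would have landed had it been falling, which is why the prediction can be computed even though nothing falls.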

      Reviewer #2 (Public review):

      Summary:

      This manuscript studies prey capture by archer fish, which observe the initial values of motion of aerial prey they made fall by spitting on them, and then rapidly turn to reach the ballistic landing point on the water surface. The question raised by the article is whether this incredibly fast decision-making process is hardwired and thus unmodifiable or can be adjusted by experience to follow a new rule, namely that the landing point is deflected by a certain amount from the expected ballistic landing point. The results show that the fish learn the new rule and use it afterward in a variety of novel situations that include height, side, and speed of the prey, and which preserve the speed of the fish's decision. Moreover, a remarkable finding presented in this work is the fact that fish that have learned to use the new rule can relearn to use the ballistic landing point for an object based on its shape (a triangle) while keeping simultaneously the 'deflected rule' for an object differing in shape (a disc); in other words, fish can master simultaneously two decision-making rules based on the different shape of objects.

      Strengths:

      The manuscript relies on a sophisticated and clever experimental design that allows changing the apparent landing point of a virtual prey using a virtual reality system. Several robust controls are provided to demonstrate the reliability and usefulness of the experimental setup.

      Overall, I very much like the idea conveyed by the authors that even stimuli triggering apparently hardwired responses can be relearned in order to be associated with a different response, thus showing the impressive flexibility of circuits that are sometimes considered mediating pure reflexive responses.

      Thank you so much for this precise assessment of what we have shown!

      This is the case - as an additional example - of the main component of the Nasanov pheromone of bees (geraniol), which triggers immediate reflexive attraction and appetitive responses, and which can, nevertheless, be learned by bees in association with an electric shock so that bees end up exhibiting avoidance and the aversive response of sting extension to this odorant (1), which is a fully unnatural situation, and which shows that associative aversive learning is strong enough to override preprogrammed responding, thus reflecting an impressive behavioral flexibility.

      That's very interesting, thanks and we are very happy to mention this important study in the revised version.

      Weaknesses:

      As a general remark, there is some information that I missed and that is mandatory in the analysis of behavioral changes.

      Firstly, the variability in the performances displayed. The authors mentioned that the results reported come from 6 fish (which is a low sample size). How were the individual performances in terms of consistency? Were all fish equally good in adjusting/learning the new rule? How did errors vary according to individual identity? It seems to me that this kind of information should be available as the authors reported that individual fish could be recognized and tracked (see lines 620-635) and is essential for appreciating the flexibility of the system under study.

      Secondly, the speed of the learning process is not properly explained. Admittedly, fish learn in an impressive way the new rule and even two rules simultaneously; yet, how long did they need to achieve this? In the article, Figure 2 mentions that at least 6 training stages (each defined as a block of 60 evaluated turn decisions, which actually shows that the standard term 'Training Block' would be more appropriate) were required for the fish to learn the 'deflected rule'. While this means 360 trials (turning starts), I was left with the question of how long this process lasted. How many hours, days, and weeks were needed for the fish to learn? And as mentioned above, were all fish equally fast in learning? I would appreciate explaining this very important point because learning dynamics is relevant to understanding the flexibility of the system.

      First, it is very important to keep the question in mind that we wanted to clarify: Does the system have the potential to re-tune the decisions to other non-ballistic relations between the input variables and the output? This would have been established if one fish was found capable of doing that. We have rewritten the introduction and discussion to specifically say what our aim was. We feel that the paper is already extremely long and difficult to understand (even after we tried very hard in this revision to explain everything in detail and as well as we could), requires the establishment of a method whose success was really unexpected and finding a degree of plasticity that we did not expect at all. We also have added a section in the discussion stating what we can and cannot say given the number of fish examined. For instance, we do not know if there are differences in the speed at which the different individuals mastered the new rules and if social learning could play a role to speed up the acquisition. That is a brilliant idea and we are very interested in checking this - but we wanted to stick with the (quite ambitious) goal of the present study.

      Reference:

      (1) Roussel, E., Padie, S. & Giurfa, M. Aversive learning overcomes appetitive innate responding in honeybees. Anim Cogn 15, 135-141, doi:10.1007/s10071-011-0426-1 (2012).

      Thanks for this reference!

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor comments:

      (1) What is the difference between Reinel, J. Exp. Bio. 2016 and the current study?

      Clearly in that study all objects were strictly falling ballistically, and latency and accuracy of the turn decisions were determined when the initial motion was not only horizontal but had an additional vertical component of speed. The question of that study was whether the need to account for an additional variable (vertical speed) in the decision would affect its latency or accuracy. The study showed that also then archerfish rapidly turn to the later impact point. It also showed that accuracy and latency were not changed by the added degree of freedom.

      (2) How do Figures 2 F and G demonstrate that an accurate start is possible?

      See above.

      (3) Figure 4 is hard to follow, it is not clear what is presented and how it supports the claim that the new rule is represented in a way that allows immediate generalization.

      Yes, this is not at all an easy experiment. Briefly, fish were re-trained at only one height level and then are tested at other levels. The strategy is as in the experiments Schuster et al. 2004 Current Biology, Vol. 14, 1565–1568, Figure 5. We have changed text and Figure (new Figure 5) to show how the predictions were reached.

      Reviewer #2 (Recommendations for the authors):

      Minor remarks

      Lines 88-90: I was surprised to see that in this section, the authors did not mention the speed-accuracy trade-off which has inspired numerous experiments in animal behavior (1). This could be used to back their point, namely, that speed comes with an apparent cost of a loss in accuracy.

      Yes, that is a crucial aspect that was completely missing even though it demonstrates a key aspect of 'standard' versus some 'highspeed' decisions! We definitely had to include it and also to show, directly under the conditions of our present experiments (in the new Fig. 4) the absence of a significant speed-accuracy relation for the archerfish highspeed decisions! Thank you very much for emphasizing this crucial aspect!

      Lines 182-184: Specify that this situation corresponds to the hatched bar in Figure (this can be specified in the caption of the figure, where the bar is not mentioned).

      Thanks!

      Lines 187-188: here and elsewhere (e.g. lines 224-225, etc), the error made by the fish is presented in cm (see Figure 2 where the inset shows how the error was computed). I wonder if it would not be more appropriate to present it in terms of the angular difference between the trajectory made by the fish and the food delivery location.

      Angles could also be used, but because of the large variation in initial distances (that we wanted to make sure that the fish had to capture a rule, allowing them to respond from various distances) another measure was used that we found somehow more natural: it is simply how close a fish would get to the landing point if it continued in the direction assumed after the turn. Although we describe how we defined accuracy we did not discuss why this measure was used in this (and many previous studies). We are very happy to add this. Please also note that running all tests based on angular errors (which we also have done throughout to ensure that the conclusions are independent on an arbitrary measure of the error) leads to no different conclusion. We have added a brief explanation in the text and in the new Fig. 2.
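      The two error measures discussed in this exchange can be contrasted in a short sketch. It implements the verbal definition given above (how close a fish would get to the landing point if it continued in the direction assumed after the turn) next to the angular alternative suggested by the reviewer. All function names and the 2D coordinate convention are hypothetical, and clamping the path to a forward ray is an assumption of this sketch rather than a detail stated by the authors.

```python
import math


def distance_error(fish_xy, heading_rad, target_xy):
    """Closest approach to the target when continuing straight along the heading."""
    dx = target_xy[0] - fish_xy[0]
    dy = target_xy[1] - fish_xy[1]
    ux, uy = math.cos(heading_rad), math.sin(heading_rad)
    along = dx * ux + dy * uy  # signed distance to the target along the heading
    if along <= 0.0:
        # Target lies behind the fish: the start position is the closest point.
        return math.hypot(dx, dy)
    # Perpendicular distance from the target to the heading line.
    return abs(dx * uy - dy * ux)


def angular_error(fish_xy, heading_rad, target_xy):
    """Absolute angle (rad) between the heading and the bearing to the target."""
    bearing = math.atan2(target_xy[1] - fish_xy[1], target_xy[0] - fish_xy[0])
    diff = heading_rad - bearing
    # Wrap the difference into [-pi, pi] before taking the magnitude.
    return abs(math.atan2(math.sin(diff), math.cos(diff)))


# Same angular error, different starting distances (hypothetical numbers, in m).
near = distance_error((0.0, 0.0), 0.0, (3.0, 4.0))
far = distance_error((0.0, 0.0), 0.0, (6.0, 8.0))
```

      In this example the angular error is identical for both targets while the distance error doubles with the starting distance, which illustrates why the two measures are not interchangeable when initial distances vary widely.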

      Lines 299-323: Is it my impression or did fish have more trouble in generalizing their learned rule to the condition untrained larger height (see for instance red curves in Figures 4 D, E, G)? Could the authors elaborate on this point?

      We changed the code to make this clearer. The red curves (before marked A to highlight impact point option A) correspond to the errors to the ballistic impact point without deflection, so what would have to be compared are the black curves (marked P to highlight the virtual impact point that should be chosen had the fish immediately generalized to the untrained conditions). We have rewritten the text and the labels in the Figure (now Figure 5) to illustrate the predictions and to name them in more helpful ways and so that they can't be confused with panel labels. At any rate, what needs to be compared, to check the idea, are the black curves, and these are not statistically different between both heights (p=0.525, Mann-Whitney). Interestingly, none of the black curves from all panels (D-G) differ (p>0.3).

      Line 559: if we are speaking here about luminance contrast, it should read 'Michelson Contrast' rather than 'Michelsen Contrast'.

      Absolutely, thanks!

      References

      (1) Chittka, L., Skorupski, P. & Raine, N. E. Speed-accuracy tradeoffs in animal decision making. Trends Ecol Evol 24, 400-407, doi:10.1016/j.tree.2009.02.010 (2009).

      An excellent paper that helps to stress our main question

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Pakula et al. explore the impact of reactive oxygen species (ROS) on neonatal cerebellar regeneration, providing evidence that ROS activates regeneration through Nestin-expressing progenitors (NEPs). Using scRNA-seq analysis of FACS-isolated NEPs, the authors characterize injury-induced changes, including an enrichment in ROS metabolic processes within the cerebellar microenvironment. Biochemical analyses confirm a rapid increase in ROS levels following irradiation, and forced catalase expression, which reduces ROS levels, impairs external granule layer (EGL) replenishment post-injury.

      Strengths:

      Overall, the study robustly supports its main conclusion and provides valuable insights into ROS as a regenerative signal in the neonatal cerebellum.

      Comments on revisions:

      The authors have addressed most of the previous comments. However, they should clarify the following response:

      *"For reasons we have not explored, the phenotype is most prominent in these lobules, that is why they were originally chosen. We edited the following sentence (lines 578-579):

      First, we analyzed the replenishment of the EGL by BgL-NEPs in vermis lobules 3-5, since our previous work showed that these lobules have a prominent defect."*

      It has been reported that the anterior part of the cerebellum may have a lower regenerative capacity compared to the posterior lobe. To avoid potential ambiguity, the authors should clarify that "the phenotype" and "prominent defect" refer to more severe EGL depletion at an earlier stage after IR rather than a poorer regenerative outcome. Additionally, they should provide a reference to support their statement or indicate if it is based on unpublished observations.

      Our comment does not refer to a more severe EGL depletion at an earlier stage. There is instead poorer regeneration of the anterior region. The irradiation approach used provides consistent cell killing of GCPs across the cerebellum. This can be seen in Fig. 1c, e, g, i in our previous publication: Wojcinski et al. (2017) Cerebellar granule cell replenishment post-injury by adaptive reprogramming of Nestin+ progenitors. Nature Neuroscience, 20:1361-1370. Also, Fig. 2e, g, k, m in that paper shows that by P5 and P8, posterior lobule 8 recovers better than anterior lobules 1-5.

      Reviewer #2 (Public review):

      Summary:

      The authors have previously shown that the mouse neonatal cerebellum can regenerate damage to granule cell progenitors in the external granular layer, through reprogramming of gliogenic nestin-expressing progenitors (NEPs). The mechanisms of this reprogramming remain largely unknown. Here the authors used scRNAseq and ATACseq of purified neonatal NEPs from P1-P5 and showed that ROS signatures were transiently upregulated in gliogenic NEPs vs. neurogenic NEPs 24 hours post injury (P2). To assess the role of ROS, mice transgenic for global catalase activity were assessed to reduce ROS. Inhibition of ROS significantly decreased gliogenic NEP reprogramming and diminished cerebellar growth post-injury. Further, inhibition of microglia across this same time period prevented one of the first steps of repair - the migration of NEPs into the external granule layer. This work is the first demonstration that the tissue microenvironment of the damaged neonatal cerebellum is a major regulator of neonatal cerebellar regeneration. Increased ROS is seen in other CNS damage models, including adults, thus there may be some shared mechanisms across age and regions, although interestingly neonatal cerebellar astrocytes do not upregulate GFAP as seen in adult CNS damage models. Another intriguing finding is that global inhibition of ROS did not alter normal cerebellar development.

      Strengths:

      This paper presents a beautiful example of using single cell data to generate biologically relevant, testable hypotheses of mechanisms driving important biological processes. The scRNAseq and ATACseq analyses are rigorously conducted and conclusive. Data is very clearly presented and easily interpreted, supporting the hypothesis next tested by reducing ROS in irradiated brains.

      Analyses of whole tissue and FACS-sorted NEPs in transgenic mice where human catalase was globally expressed in mitochondria were rigorously controlled and conclusively show that ROS upregulation was indeed decreased post injury and very clearly the regenerative response was inhibited. The authors are to be commended on the very careful analyses which are very well presented and again, easy to follow with all appropriate data shown to support their conclusions.

      Weaknesses:

      The authors also present data to show that microglia are required for an early step of mobilizing gliogenic NEPs into the damaged EGL. While the data that PLX5622 administration from P0-P5 or even P0-P8 clearly shows that there is an immediate reduction of NEPs mobilized to the damaged EGL, there is no subsequent reduction of cerebellar growth such that by P30, the treated and untreated irradiated cerebella are equivalent in size. There is speculation in the discussion about why this might be the case. Additional experiments and tools are required to assess mechanisms. Regardless, the data still implicate microglia in the neonatal regenerative response, and this finding remains an important advance.

      As stated previously, the suggested follow-up experiments, while relevant, are extensive and considered beyond the scope of the current paper.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Pakula et al. explore the impact of reactive oxygen species (ROS) on neonatal cerebellar regeneration, providing evidence that ROS activates regeneration through Nestin-expressing progenitors (NEPs). Using scRNA-seq analysis of FACS-isolated NEPs, the authors characterize injury-induced changes, including an enrichment in ROS metabolic processes within the cerebellar microenvironment. Biochemical analyses confirm a rapid increase in ROS levels following irradiation, and forced catalase expression, which reduces ROS levels, impairs external granule layer (EGL) replenishment post-injury.

      Strengths:

      Overall, the study robustly supports its main conclusion and provides valuable insights into ROS as a regenerative signal in the neonatal cerebellum.

      Weaknesses:

      (1) The diversity of cell types recovered from scRNA-seq libraries of sorted Nes-CFP cells is unexpected, especially the inclusion of minor types such as microglia, meninges, and ependymal cells. The authors should validate whether Nes and CFP mRNAs are enriched in the sorted cells; if not, they should discuss the potential pitfalls in sampling bias or artifacts that may have affected the dataset, impacting interpretation.

      In our previous work, we thoroughly assessed the transgene using RNA in situ hybridization for Cfp, immunofluorescent analysis for CFP and scRNA-seq analysis for Cfp transcripts (Bayin et al., Science Adv. 2021, Fig. S1-2)(1), and characterized the diversity within the NEP populations of the cerebellum. Our present scRNA-seq data also confirms that Nes transcripts are expressed in all the NEP subtypes. A feature plot for Nes expression has been added to the revised manuscript (Fig 1E), as well as a sentence explaining the results. Of note, since this data was generated from FACS-isolated CFP+ cells, the perdurance of the protein allows for the detection of immediate progeny of Nes-expressing cells, even in cells where Nes is not expressed once cells are differentiated. Finally, oligodendrocyte progenitors, perivascular cells, some rare microglia and ependymal cells have been demonstrated to express Nes in the central nervous system; therefore, detecting small groups of these cells is expected (2-4). We have added the following sentence (lines 391-394):

      “Detection of Nes mRNA confirmed that the transgene reflects endogenous Nes expression in progenitors of many lineages, and also that the perdurance of CFP protein in immediate progeny of Nes-expressing cells allowed the isolation of these cells by FACS (Figure 1E)”.

      (2) The authors should de-emphasize the claim that ROS signaling and related gene upregulation occur exclusively in gliogenic NEPs. Genes such as Cdkn1a, Phlda3, Ass1, and Bax are identified as differentially expressed in neurogenic NEPs and granule cell progenitors (GCPs), with Ass1 absent in GCPs. According to Table S4, gene ontology (GO) terms related to ROS metabolic processes are also enriched in gliogenic NEPs, neurogenic NEPs, and GCPs.

      As the reviewer requested, we have de-emphasized that ROS signaling is preferentially upregulated in gliogenic NEPs, since we agree with the reviewer that there is some evidence for similar transcriptional signatures in neurogenic NEPs and GCPs. We added the following (lines 429-531):

      “Some of the DNA damage and apoptosis related genes that were upregulated in IR gliogenic-NEPs (Cdkn1a, Phlda3, Bax) were also upregulated in the IR neurogenic-NEPs and GCPs at P2 (Supplementary Figure 2B-E).”

      And we edited the last few sentences of the section to state (lines 453-459):

      “Interestingly, we did not observe significant enrichment for GO terms associated with cellular stress response in the GCPs that survived the irradiation compared to controls, despite significant enrichment for ROS signaling related GO-terms (Table S4). Collectively, these results indicate that injury induces significant and overlapping transcriptional changes in NEPs and GCPs. The gliogenic- and neurogenic-NEP subtypes transiently upregulate stress response genes upon GCP death, and an overall increase in ROS signaling is observed in the injured cerebella.”

      (3) The authors need to justify the selection of only the anterior lobe for EGL replenishment and microglia quantification.

      We thank the reviewers for asking for this clarification. Our previous publications on regeneration of the EGL by NEPs have all involved quantification of these lobules, thus we think it is important to stay with the same lobules. For reasons we have not explored, the phenotype is most prominent in these lobules, that is why they were originally chosen. We edited the following sentence (lines 578-579):

      “First, we analyzed the replenishment of the EGL by BgL-NEPs in vermis lobules 3-5, since our previous work showed that these lobules have a prominent defect.”

      (4) Figure 1K: The figure presents linkages between genes and GO terms as a network but does not depict a gene network. The terminology should be corrected accordingly.

      We have corrected the terminology and added the following (lines 487-489):

      “Finally, linkages between the genes in differentially open regions identified by ATAC-seq and the associated GO-terms revealed an active transcriptional network involved in regulating cell death and apoptosis (Figure 1K).”

      (5) Figure 1H and S2: The x-axis appears to display raw p-values rather than log10(p.value) as indicated. The x-axis should ideally show -log10(p.adjust), beginning at zero. The current format may misleadingly suggest that the ROS GO term has the lowest p-values.

      Apologies for the mistake. The data represents raw p-values and the x-axis has been corrected.
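
      For readers comparing the two axis conventions discussed in point (5), the following is a minimal sketch of how raw p-values map to -log10(p.adjust). It assumes a Benjamini-Hochberg adjustment and uses made-up p-values; the function names and numbers are illustrative and are not taken from the manuscript or its analysis code.

```python
import math

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def neg_log10(values):
    """Transform p-values so that larger bars mean stronger evidence."""
    return [-math.log10(v) for v in values]

# Hypothetical raw p-values for four GO terms (illustrative only).
raw = [0.001, 0.01, 0.03, 0.5]
adj = bh_adjust(raw)
scores = neg_log10(adj)
```

      On this scale the axis starts at zero and the most significant term has the longest bar, which avoids the reading the reviewer warned about, where a raw p-value axis makes the smallest p-value look like the weakest term.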

      (6) Genes such as Ppara, Egln3, Foxo3, Jun, and Nos1ap were identified by bulk ATAC-seq based on proximity to peaks, not by scRNA-seq. Without additional expression data, caution is needed when presenting these genes as direct evidence of ROS involvement in NEPs.

      We modified the text to discuss the discrepancies between the analyses. While some of this could be due to the lower detection limits in the scRNA-seq, it also highlights that chromatin accessibility is not a direct readout for expression levels and further analysis is needed. Nevertheless, both scRNA-seq and ATAC-seq have identified similar mechanisms, and our mutant analysis confirmed our hypothesis that an increase in ROS levels underlies repair, further increasing the confidence in our analyses. Further investigation is needed to understand the downstream mechanisms. We added the following sentence (lines 478-481):

      “However, not all genes in the accessible areas were differentially expressed in the scRNA-seq data. While some of this could be due to the detection limits of scRNA-seq, further analysis is required to assess the mechanisms of how the differentially accessible chromatin affects transcription.”

      (7) The authors should annotate cell identities for the different clusters in Table S2.

      All cell types have been annotated in Table S2.

      (8) Reiterative clustering analysis reveals distinct subpopulations among gliogenic and neurogenic NEPs. Could the authors clarify the identities of these subclusters? Can we distinguish the gliogenic NEPs in the Bergmann glia layer from those in the white matter?

      Thank you for this clarification. As shown in our previous studies, we cannot distinguish between the gliogenic NEPs in the Bergmann glia layer and the white matter based on scRNA-seq, but expression of the Bergmann glia marker Gdf10 suggests that a large proportion of the cells in the Hopx+ clusters are in the Bergmann glia layer. The distinctions within the major subpopulations that we characterized (Hopx-, Ascl1-expressing NEPs and GCPs) are driven by their proliferative/maturation status, as we previously observed. We have included a detailed annotation of all the clusters in Table S2, as requested, and a UMAP for Mki67 expression in Fig 1E. We have clarified this in the following sentence (lines 383-385):

      “These groups of cells were further subdivided into molecularly distinct clusters based on marker genes and their cell cycle profiles or developmental stages (Figure 1D, Table S2).”

      (9) In the Methods section, the authors mention filtering out genes with fewer than 10 counts. They should specify if these genes were used as background for enrichment analysis. Background gene selection is critical, as it influences the functional enrichment of gene sets in the list.

      As requested, the approach used has been added to the Methods section of the revised paper. Briefly, the background genes used by the goseq function are the same genes used for the probability weight function (nullp). The mm8 genome annotation was used in the nullp function, and all annotated genes were used as background genes to compute GO term enrichment. The following was added (lines 307-308):

      “The background genes used to compute the GO term enrichment include all genes with gene symbol annotations within mm8.”
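
      To illustrate why the reviewer considers the background choice critical, here is a minimal sketch using a plain hypergeometric over-representation test. This is a simplification and not the goseq procedure described above, which additionally weights genes through the probability weight function; all gene counts below are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(background_size, term_in_background, de_size, term_in_de):
    """P(X >= term_in_de) when de_size genes are drawn without replacement
    from a background holding term_in_background genes annotated to the term."""
    upper = min(term_in_background, de_size)
    total = comb(background_size, de_size)
    tail = sum(
        comb(term_in_background, k)
        * comb(background_size - term_in_background, de_size - k)
        for k in range(term_in_de, upper + 1)
    )
    return tail / total

# Hypothetical example: 300 differentially expressed genes, 10 of them in the GO term.
# Background A: all annotated genes (20000 genes, 200 in the term).
p_all_annotated = hypergeom_enrichment_p(20000, 200, 300, 10)
# Background B: only genes passing an expression filter (8000 genes, 150 in the term).
p_filtered = hypergeom_enrichment_p(8000, 150, 300, 10)
```

      With these made-up numbers the same gene list looks more strongly enriched against the larger, all-annotated background than against the filtered one, which is why stating the background explicitly matters for interpreting the GO results.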

      (10) Figure S1C: The authors could consider using bar plots to better illustrate cell composition differences across conditions and replicates.

      As suggested, we have included bar plots in Fig. S1D-F.

      (11) Figures 4-6: It remains unclear how the white matter microglia contribute to the recruitment of BgL-NEPs to the EGL, as the mCAT-mediated microglia loss data are all confined to the white matter.

      We have thought about the question and had initially quantified the microglia in the white matter and the rest of the lobules (excluding the EGL) separately. However, there are very few microglia outside the white matter in each section, thus it is not possible to obtain reliable statistical data on such a small population. We therefore did not include the cells in the analysis. We have added this point in the main text (line 548).

      “As a possible explanation for how white matter microglia could influence NEP behaviors, given the small size of the lobules and how the cytoarchitecture is disrupted after injury, we think it is possible that secreted factors from the white matter microglia could reach the BgL NEPs. Alternatively, there could be a relay system through an intermediate cell type closer to the microglia.” We have added these ideas to the Discussion of the revised paper (lines 735-738).

      Reviewer #2 (Public review):

      Summary:

      The authors have previously shown that the mouse neonatal cerebellum can regenerate damage to granule cell progenitors in the external granular layer, through reprogramming of gliogenic nestin-expressing progenitors (NEPs). The mechanisms of this reprogramming remain largely unknown. Here the authors used scRNAseq and ATACseq of purified neonatal NEPs from P1-P5 and showed that ROS signatures were transiently upregulated in gliogenic NEPs vs. neurogenic NEPs 24 hours post injury (P2). To assess the role of ROS, mice transgenic for global catalase activity were assessed to reduce ROS. Inhibition of ROS significantly decreased gliogenic NEP reprogramming and diminished cerebellar growth post-injury. Further, inhibition of microglia across this same time period prevented one of the first steps of repair - the migration of NEPs into the external granule layer. This work is the first demonstration that the tissue microenvironment of the damaged neonatal cerebellum is a major regulator of neonatal cerebellar regeneration. Increased ROS is seen in other CNS damage models including adults, thus there may be some shared mechanisms across age and regions, although interestingly neonatal cerebellar astrocytes do not upregulate GFAP as seen in adult CNS damage models. Another intriguing finding is that global inhibition of ROS did not alter normal cerebellar development.

      Strengths:

      This paper presents a beautiful example of using single cell data to generate biologically relevant, testable hypotheses of mechanisms driving important biological processes. The scRNAseq and ATACseq analyses are rigorously conducted and conclusive. Data is very clearly presented and easily interpreted, supporting the hypothesis next tested by reducing ROS in irradiated brains.

      Analyses of whole tissue and FACS-sorted NEPs in transgenic mice where human catalase was globally expressed in mitochondria were rigorously controlled and conclusively show that ROS upregulation was indeed decreased post injury and very clearly the regenerative response was inhibited. The authors are to be commended on the very careful analyses which are very well presented and again, easy to follow with all appropriate data shown to support their conclusions.

      Weaknesses:

      The authors also present data to show that microglia are required for an early step of mobilizing gliogenic NEPs into the damaged EGL. While the data that PLX5622 administration from P0-P5 or even P0-P8 clearly shows that there is an immediate reduction of NEPs mobilized to the damaged EGL, there is no subsequent reduction of cerebellar growth such that by P30, the treated and untreated irradiated cerebella are equivalent in size. There is speculation in the discussion about why this might be the case, but there is no explanation for why further, longer treatment was not attempted nor was there any additional analyses of other regenerative steps in the treated animals. The data still implicate microglia in the neonatal regenerative response, but how remains uncertain.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      This is an exemplary manuscript.

      The methods and data are very well described and presented.

      I actually have very little to ask the authors except for an explanation of why PLX treatment was discontinued after P5 or P8 and what other steps of NEP reprogramming were assessed in these animals? Was NEP expansion still decreased at P8 even in the presence of PLX at this stage? Also - was there any analysis attempted combining mCAT and PLX?

      We agree with the reviewer that a follow-up study that goes into a deeper analysis of the role of microglia in GCP regeneration and any interaction with ROS signaling would be interesting. However, it would require a set of tools that we do not currently have. We did not have enough PLX5622 to perform additional experiments or extend the length of treatment. Plexxikon informed us in 2021 that they were no longer manufacturing PLX5622 because they were focusing on new analogs for in vivo use, and thus we had to use what we had left over from a completed preclinical cancer study. We nevertheless think it is important to publish our preliminary results to spark further experiments by other groups.

      References

      (1) Bayin NS, Mizrak D, Stephen ND, Lao Z, Sims PA, Joyner AL. Injury induced ASCL1 expression orchestrates a transitory cell state required for repair of the neonatal cerebellum. Sci Adv. 2021;7(50):eabj1598.

      (2) Cawsey T, Duflou J, Weickert CS, Gorrie CA. Nestin-Positive Ependymal Cells Are Increased in the Human Spinal Cord after Traumatic Central Nervous System Injury. J Neurotrauma. 2015;32(18):1393-402.

      (3) Gallo V, Armstrong RC. Developmental and growth factor-induced regulation of nestin in oligodendrocyte lineage cells. The Journal of neuroscience : the official journal of the Society for Neuroscience. 1995;15(1 Pt 1):394-406.

      (4) Huang Y, Xu Z, Xiong S, Sun F, Qin G, Hu G, et al. Repopulated microglia are solely derived from the proliferation of residual microglia after acute depletion. Nat Neurosci. 2018;21(4):530-40.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Jin et al. investigated how the bacterial DNA damage (SOS) response and its regulator protein RecA affect the development of drug resistance under short-term exposure to beta-lactam antibiotics. Canonically, the SOS response is triggered by DNA damage, which results in the induction of error-prone DNA repair mechanisms. These error-prone repair pathways can increase mutagenesis in the cell, leading to the evolution of drug resistance. Thus, inhibiting the SOS regulator RecA has been proposed as a means to delay the rise of resistance.

      In this paper, the authors deleted the RecA protein from E. coli and exposed this ∆recA strain to selective levels of the beta-lactam antibiotic, ampicillin. After an 8 h treatment, they washed the antibiotic away and allowed the surviving cells to recover in regular media. They then measured the minimum inhibitory concentration (MIC) of ampicillin against these treated strains. They note that after just 8 h of treatment with ampicillin, the ∆recA strain had developed higher MICs towards ampicillin, while by contrast, wild-type cells exhibited unchanged MICs. This MIC increase was also observed in subsequent generations of bacteria, suggesting that the phenotype is driven by a genetic change.

      The authors then used whole genome sequencing (WGS) to identify mutations that accounted for the resistance phenotype. Within resistant populations, they discovered key mutations in the promoter region of the beta-lactamase gene, ampC; in the penicillin-binding protein PBP3, which is the target of ampicillin; and in the AcrB subunit of the AcrAB-TolC efflux machinery. Importantly, mutations in the efflux machinery can impact resistance towards other antibiotics, not just beta-lactams. To test this, they repeated the MIC experiments with other classes of antibiotics, including kanamycin, chloramphenicol, and rifampicin. Interestingly, they observed that the ∆recA strains pre-treated with ampicillin showed higher MICs towards all other antibiotics tested. This suggests that the mutations conferring resistance to ampicillin are also increasing resistance to other antibiotics.

      The authors then performed an impressive series of genetic, microscopy, and transcriptomic experiments to show that this increase in resistance is not driven by the SOS response, but by independent DNA repair and stress response pathways. Specifically, they show that deletion of recA reduces the bacterium's ability to process reactive oxygen species (ROS) and repair its DNA. These factors drive accumulation of mutations that can confer resistance towards different classes of antibiotics. The conclusions are reasonably well-supported by the data, but some aspects of the data and the model need to be clarified and extended.

      Strengths:

      A major strength of the paper is the detailed bacterial genetics and transcriptomics that the authors performed to elucidate the molecular pathways responsible for this increased resistance. They systematically deleted or inactivated genes involved in the SOS response in E. coli. They then subjected these mutants to the same MIC assays as described previously. Surprisingly, none of the other SOS gene deletions resulted in an increase in drug resistance, suggesting that the SOS response is not involved in this phenotype. This led the authors to focus on the localization of DNA PolI, which also participates in DNA damage repair. Using microscopy, they discovered that in the RecA deletion background, PolI co-localizes with the bacterial chromosome at much lower rates than in wild-type. This led the authors to conclude that deletion of RecA hinders PolI and DNA repair. Although the authors do not provide a mechanism, this observation is nonetheless valuable for the field and can stimulate further investigations in the future.

      In order to understand how RecA deletion affects cellular physiology, the authors performed RNA-seq on ampicillin-treated strains. Crucially, they discovered that in the RecA deletion strain, genes associated with antioxidative activity (cysJ, cysI, cysH, sodA, sufD) and Base Excision Repair (mutH, mutY, mutM), which repairs oxidized forms of guanine, were all downregulated. The authors conclude that down-regulation of these genes might result in elevated levels of reactive oxygen species in the cells, which in turn, might drive the rise of resistance. Experimentally, they further demonstrated that treating the ∆recA strain with the antioxidant GSH prevents the rise of MICs. These observations will be useful for more detailed mechanistic follow-ups in the future.

      Weaknesses:

      Throughout the paper, the authors use language suggesting that ampicillin treatment of the ∆recA strain induces higher levels of mutagenesis inside the cells, leading to the rapid rise of resistance mutations. However, as the authors note, the mutants enriched by ampicillin selection can play a role in efflux and can thus change a bacterium's sensitivity to a wide range of antibiotics, in what is known as cross-resistance. The current data is not clear on whether the elevated "mutagenesis" is driven by ampicillin selection or by a bona fide increase in mutation rate.

      Furthermore, on a technical level, the authors employed WGS to identify resistance mutations in the ampicillin-treated wild-type and ∆recA strains. However, the WGS methodology described in the paper is inconsistent. Notably, wild-type WGS samples were picked from non-selective plates, while ΔrecA WGS isolates were picked from selective plates with 50 μg/mL ampicillin. Such an approach biases the frequency and identity of the mutations seen in the WGS and cannot be used to support the idea that ampicillin treatment induces higher levels of mutagenesis.

      Finally, it is important to establish what the basal mutation rates of both the WT and ∆recA strains are. Currently, only the ampicillin-treated populations were reported. It is possible that the ∆recA strain has inherently higher mutagenesis than WT, with a larger subpopulation of resistant clones. Thus, ampicillin treatment might not in fact induce higher mutagenesis in ∆recA.

      Comments on revisions:

      Thank you for responding to the concerns raised previously. The manuscript overall has improved.

      We sincerely thank the reviewer for raising this important point. We acknowledge that in our initial submission, our mutation analysis was based on a limited number of replicates (n=6), which may not have been sufficient to robustly distinguish between mutation induction and selection. In response to this concern, we have substantially expanded our experimental dataset. Specifically, we redesigned the mutation rate validation experiment by increasing the number of biological replicates in each condition to 96 independent parallel cultures. This enabled us to systematically assess mutation frequency distributions under four conditions (WT, WT+ampicillin, ΔrecA, ΔrecA+ampicillin), using both maximum likelihood estimation (MLE) and distribution-based fluctuation analysis (new Figure 1F, 1G, and Figure S5).

      These expanded datasets revealed that:

      (1) While the estimated mutation rate was significantly elevated in ΔrecA+ampicillin compared to ΔrecA alone (Fig. 1G),

      (2) The distribution of mutation frequencies in ΔrecA+ampicillin was highly skewed with evident jackpot cultures (Fig. 1F), and

      (3) The observed pattern significantly deviated from Poisson expectations, which is inconsistent with uniform mutagenesis and instead supports clonal selection from an early-arising mutational pool (Fig. S5).
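
      As a reading aid for the fluctuation-analysis argument in points (1)-(3), the sketch below shows two simple summary statistics computed on parallel-culture mutant counts: the variance-to-mean ratio (about 1 under a Poisson expectation, much larger when jackpot cultures are present) and the classic Luria-Delbrück p0 estimate of mutational events per culture. This is an illustrative simplification, not the MLE pipeline used for Figure 1G, and the counts are invented rather than taken from the study.

```python
import math
from statistics import mean, pvariance

def fano_factor(counts):
    """Variance-to-mean ratio; about 1 for Poisson-distributed counts."""
    return pvariance(counts) / mean(counts)

def p0_mutations_per_culture(counts):
    """Luria-Delbruck p0 estimate: m = -ln(fraction of cultures with no mutants)."""
    p0 = sum(1 for c in counts if c == 0) / len(counts)
    if p0 == 0 or p0 == 1:
        raise ValueError("p0 method needs some, but not all, cultures without mutants")
    return -math.log(p0)

# Hypothetical mutant counts from 12 parallel cultures (not the authors' data).
even_counts = [1, 2, 0, 1, 3, 1, 0, 2, 1, 1, 2, 0]
jackpot_counts = [0, 1, 0, 0, 2, 0, 150, 1, 0, 0, 3, 0]

even_fano = fano_factor(even_counts)
jackpot_fano = fano_factor(jackpot_counts)
jackpot_m = p0_mutations_per_culture(jackpot_counts)
```

      In the invented jackpot series a single culture dominates the variance, so the ratio is far above 1 even though most cultures carry few or no mutants. That is the kind of skew the response describes as inconsistent with uniform mutagenesis.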

      Importantly, these new results do not contradict our original conclusions but rather extend and refine them. The previous evidence for ROS-mediated mutagenesis remains valid and is supported by our GSH experiments, transcriptomic analysis of oxidative stress genes, and DNA repair pathway repression. However, the additional data now indicate that ROS-induced variants are not uniformly induced after antibiotic exposure but are instead generated stochastically under the stress-prone ΔrecA background and then selectively enriched upon ampicillin treatment.

      Taken together, we now propose a two-step model of resistance evolution in ΔrecA cells (new Figure 5):

      Step i: RecA deficiency creates a hypermutable state through impaired repair and elevated ROS, increasing the probability of resistance-conferring mutations.

      Step ii: β-lactam exposure acts as a selective bottleneck, enriching early-arising mutants that confer resistance.

      We have revised both the Results and Discussion sections to clearly articulate this complementary relationship between mutational supply and selection, and we believe this integrated model better explains the observed phenotypes and mechanistic outcomes.

      Reviewer #2 (Public review):

      This study aims to demonstrate that E. coli can acquire rapid antibiotic resistance mutations in the absence of a DNA damage response. The authors employed a modified Adaptive Laboratory Evolution (ALE) workflow to investigate this, initiating the process by diluting an overnight culture 50-fold into an ampicillin selection medium. They present evidence that a recA- strain develops ampicillin resistance mutations more rapidly than the wild-type, as indicated by the Minimum Inhibitory Concentration (MIC) and mutation frequency. Whole-genome sequencing of recA- colonies resistant to ampicillin showed predominant inactivation of genes involved in the multi-drug efflux pump system, contrasting with wild-type mutations that seem to activate the chromosomal ampC cryptic promoter. Further analysis of mutants, including a lexA3 mutant incapable of inducing the SOS response, led the authors to conclude that the rapid evolution of antibiotic resistance occurs via an SOS-independent mechanism in the absence of recA. RNA sequencing suggests that antioxidative response genes drive the rapid evolution of antibiotic resistance in the recA- strain. They assert that rapid evolution is facilitated by compromised DNA repair, transcriptional repression of antioxidative stress genes, and excessive ROS accumulation.

      Strengths:

      The experiments are well-executed and the data appear reliable. It is evident that the inactivation of recA promotes faster evolutionary responses, although the exact mechanisms driving this acceleration remain elusive and deserve further investigation.

      Weaknesses:

      Some conclusions are overstated. For instance, the conclusion regarding the LexA3 allele, indicating that rapid evolution occurs in an SOS-independent manner (line 217), contradicts the introductory statement that attributes evolution to compromised DNA repair.

      We thank the reviewer for this insightful observation, which highlights a central conceptual advance of our study. Our data indeed indicate that resistance evolution in ΔrecA occurs independently of canonical SOS induction (as shown by the lack of resistance in lexA3, dpiBA, and translesion polymerase mutants), yet is clearly associated with impaired DNA repair capacity (e.g., downregulation of polA, mutH, mutY).

      This apparent “contradiction” reflects the dual role of RecA: it functions both as the master activator of the SOS response and as a key factor in SOS-independent repair processes. Thus, the rapid resistance evolution in ΔrecA is not due to loss of SOS, but rather due to the broader suppression of DNA repair pathways that RecA coordinates, which elevates mutational load under stress. (This point is discussed in further detail in our response to Reviewer 1.)

      The claim made in the discussion of Figure 3 that the hindrance of DNA repair in recA- is crucial for rapid evolution is at best suggestive, not demonstrative. Additionally, the interpretation of the PolI data implies its role, yet it remains speculative.

      We appreciate this comment and would like to respectfully clarify that our conclusion regarding the role of DNA repair impairment is supported by several independent lines of mechanistic evidence.

      First, our RNA-seq analysis revealed transcriptional suppression of multiple DNA repair genes in ΔrecA cells following ampicillin treatment, including polA (DNA Pol I) and the base excision repair genes mutH, mutY, and mutM (Fig. 4K). This indicates that multiple repair pathways, including those responsible for correcting oxidative DNA lesions, are downregulated under these conditions.

      Second, we observed a significant reduction in DNA Pol I protein expression as well as reduced colocalization with chromosomal DNA in ΔrecA cells, suggesting impaired engagement of repair machinery (Fig. 3C-E). These phenotypes are not limited to transcriptional signatures but extend to functional protein localization.

      Third, and most importantly, resistance evolution was fully suppressed in ΔrecA cells upon co-treatment with glutathione (GSH), which reduces ROS levels. As GSH did not affect ampicillin killing (Fig. 4J), these findings suggest that mutagenesis and thus the emergence of resistance requires both ROS accumulation and the absence of efficient repair.

      Therefore, we believe these data go beyond correlation and demonstrate a mechanistic role for DNA repair impairment in driving stress-associated resistance evolution in ΔrecA. We have revised the Discussion to emphasize the strength of this evidence while avoiding overstatement.

      In Figure 2A table, mutations in amp promoters are leading to amino acid changes.

      We thank the reviewer for spotting this inconsistency. Indeed, the ampC promoter mutations we identified reside in non-coding regulatory regions and do not result in amino acid substitutions. We have corrected the annotation in Fig. 2A and clarified in the main text that these mutations likely affect gene expression through transcriptional regulation, rather than protein sequence alteration.

      The authors' assertion that ampicillin significantly influences persistence pathways in the wild-type strain, affecting quorum sensing, flagellar assembly, biofilm formation, and bacterial chemotaxis, lacks empirical validation.

      We thank the reviewer for pointing this out. In the original version, we acknowledged transcriptional enrichment of genes related to quorum sensing, flagellar assembly, and chemotaxis in the wild-type strain upon ampicillin treatment. However, as we did not directly assess persistence phenotypes (e.g., biofilm formation or persister levels), we agree that such functional inferences were not fully supported. We have revised the relevant statements to focus solely on transcriptomic changes and have removed language suggesting direct effects on persistence pathways.

      Figure 1G suggests that recA cells treated with ampicillin exhibit a strong mutator phenotype; however, it remains unclear if this can be linked to the mutations identified in Figure 2's sequencing analysis.

      We appreciate the reviewer’s comment. This point is discussed in further detail in our response to Reviewer 1.

      Reviewer #3 (Public review):

      In the present work, Zhang et al. investigate the involvement of the bacterial DNA damage repair SOS response in the evolution of beta-lactam drug resistance in Escherichia coli. Using a combination of microbiological, bacterial genetics, laboratory evolution, next-generation sequencing, and live-cell imaging approaches, the authors propose that short-term (transient) drug resistance evolution can take place in RecA-deficient cells in an SOS response-independent manner. They propose that the evolvability of drug resistance is alternatively driven by the oxidative stress imposed by accumulation of reactive oxygen species and compromised DNA repair. Overall, this is a nice study that addresses a growing and fundamental global health challenge (antimicrobial resistance).

      Strengths:

      The authors introduce new concepts to antimicrobial resistance evolution mechanisms. They show that short-term exposure to beta-lactams can induce durably fixed antimicrobial resistance mutations. They propose this is due to compromised DNA repair and oxidative stress. Antibiotic resistance evolution under transient stress is poorly studied, so the authors' work is a nice mechanistic contribution to this field.

      Weaknesses:

      The authors do not show any direct evidence of altered mutation rate or accumulated DNA damage in their model.

      We appreciate the reviewer’s comment. This point is discussed in further detail in our response to Reviewer 1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I would like to suggest two minor changes to the text.

      (1) Re. WGS data.

      The authors write in their response: "We appreciate your concern regarding potential inconsistencies in the WGS methodology. However, we would like to clarify that the primary aim of the WGS experiment was to identify the types of mutations present in the wild type and ΔrecA strains after treatment of ampicillin, rather than to quantify or compare mutation frequencies. This purpose was explicitly stated in the manuscript."

      I think the source of my confusion stemmed from this part in the text:

      "In bacteria, resistance to most antibiotics requires the accumulation of drug resistance associated DNA mutations developed over time to provide high levels of resistance (29). To verify whether drug resistance associated DNA mutations have led to the rapid development of antibiotic resistance in recA mutant strain, we..."

      I would change the phrase "verify whether drug resistance associated DNA mutations have led to the rapid development of antibiotic resistance in recA mutant strain" to "identify the types of mutations present in the wild type and ΔrecA strains after treatment of ampicillin." This would explicitly state what the sequencing was for (i.e., ID-ing mutations). The current phrase can give the impression that WGS was used to validate rapid or high mutagenesis.

      Thanks for this suggestion. We have revised this description to “In bacteria, resistance to most antibiotics requires the accumulation of drug resistance associated DNA mutations that can arise stochastically and, under stress conditions, become enriched through selection over time to confer high levels of resistance (33). Having observed a non-random and right-skewed distribution of mutation frequencies in ΔrecA isolates following ampicillin exposure, we next sought to determine whether specific resistance-conferring mutations were enriched in ΔrecA isolates following antibiotic exposure.”

      (2) Re. whether the mutations are "induced" or "pre-existing."

      The authors write:

      "We appreciate your detailed feedback on the language used to describe our data. We understand the concern regarding the use of the term "induced" in relation to beta-lactam exposure. To clarify, we employed not only beta-lactam antibiotics but also other antibiotics, such as ciprofloxacin and chloramphenicol, in our experiments (data not shown). However, we observed that beta-lactam antibiotics specifically induced the emergence of resistance or altered the MIC in our bacterial populations. If resistance had pre-existed before antibiotic exposure, we would expect other antibiotics to exhibit a similar selective effect, particularly given the potential for cross-resistance to multiple antibiotics."

      I think it is important to discuss the negative data for the other antibiotics (along with the other points made in your Reviewer response) in the main text.

      This point is discussed in further detail in our response to Reviewer 1 (Public Review).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public reviews):

      (1) A cartoon paradigm of the HFD treatment window would be a helpful addition to Figure 1. Relatedly, the authors might consider qualifying MHFD as 'lactational MHFD.' Readers might miss the fact that the exposure window starts at birth.

      This is a good suggestion. The MHFD-L model has been used previously (e.g. Vogt et al. 2014). We have added a cartoon of the MHFD-L model and the PLX treatments to Figure 4, which we feel helps the reader, and we thank the reviewer for the suggestion.

      (2) More details on the modeling pipeline are needed either in Figure 1 or text. Of the ~50 microglia that were counted (based on Figure 1J), were all 50 quantified for the morphological assessments? Were equal numbers used for the control and MHFD groups? Were the 3D models adjusted manually for accuracy? How much background was detected by IMARIS that was discarded? Was the user blind to the treatment group while using the pipeline? Were the microglia clustered or equally spread across the PVN?

      In response to this suggestion, we have expanded the description of the image analysis routine in the methods. The analysis focused on detailed changes in microglial morphology as opposed to overall changes in microglia throughout the PVH as a whole. Accordingly, we applied anatomically matched ROIs to the PVH for the measurements. As described in the methods, the Imaris Filaments tool was used to visualize microglia fully contained within a tissue section and a mask derived from the 3D model for these cells was used to isolate them for further analysis, thereby separating these cells from interstitial labeling corresponding to parts of cell processes or other labeling not associated with selected cells. There was no formal “background subtraction.” This was an error in the previous version of the manuscript and we have revised the methods to reflect the process actually used. The images were segmented (to enhance signal to noise for 3D rendering), and then a Gaussian filter was applied to improve edge detection, which facilitates the morphological measurements.

      (3) Suggest toning back some of the language. For example: "...consistent with enhanced activity and surveillance of their immediate microenvironment" (Line 195) could be "...perhaps consistent with...". Likewise, "profound" (Lines 194, 377) might be an overstatement.

      Revisions have been made to both the Introduction and Discussion to modulate our representation of this controversial issue.

      (4) Representative images for AgRP+ cells (quantified in Figure 2J) are missing. Why not a co-label of Iba1+/AgRP+ as per Figure 1, 3? Also, what was quantified in Figure 2J - soma? Total immunoreactivity?

      Because the density of AgRP labeling does not change in the ARH, we omitted the red channel image (AgRP labeling) to highlight the similarity of the microglial morphology. To address the reviewer’s concerns, we have reconstituted the revised figure with both the green (microglial) and red (AgRP) channels depicted.

      Figure 2J displays the numbers of AgRP neurons counted in selected ROIs through the ARH. The Methods section has been revised to include the visualization procedure used for the cell counts.

      (5) For the PLX experiment:

      a) "...we depleted microglia during the lactation period" (Line 234). This statement suggests microglia decreased from the first injection at P4 and throughout lactation, which is inaccurate. PLX5622 effects take time, upwards of a week. Thus, if PLX5622 injections started at P4, it could be P11 before the decrease in microglia numbers is stable. Moreover, by the time microglia are entirely knocked down, the pups might be supplementing some chow for milk, making it unclear how much PLX5622 they were receiving from the dam, which could also impact the rate at which microglia repopulation commences in the fetal brain. Quantifying microglia across the P4-P21 treatment window would be helpful, especially at P16, since the PVN AgRP microglia phenotypes were demonstrated and roughly when pups might start eating some chow. b) I am surprised that ~70% of the microglia are present at P21. Does this number reflect that microglia are returning as the pups no longer receive PLX5622 from milk from the dam? Does it reflect the poor elimination of microglia in the first place?

      This is an important point, and we have revised the first sentence in section 2.3 to clarify the PLX treatment logic and added a cartoon to Fig. 4 to show the treatment timeline. PLX5622 was administered daily to the pups, not to the dams. We also agree with the interpretation that PLX5622 reduced the numbers of microglia, as supported by the microglial cell counts, rather than eliminating them completely, and have made revisions to clarify this distinction. Although mice were weighed at weaning, cellular measurements were only made in mice perfused at P55.

      (6) Was microglia morphology examined for all microglia across the PVN? Is it possible that a focus on PVNmpd microglia would reveal a stronger phenotype? In Figure 4H, J, AgRP+ terminals are counted in PVN subregions - PVNmpd and PVNpml, with PVNmpd showing a decrease of ~300 AgRP+ terminals in MHFD/Veh (rescued in MHFD/PLX5622). In Figure 1K, AgRP+ terminals across what appears to be the entire PVN decrease by ~300, suggesting that PVNmpd is driving this phenotype. If true, then do microglia within the PVNmpd display this morphology phenotype?

      We have revised the description of the analysis procedures to clarify these points. All measurements were made in user-defined, matched regions of interest according to morphological features of the PVH. No measurements were made that included the entire PVH, and we revised the Methods section to improve clarity.

      (7) What chow did the pups receive as they started to consume solid food? Is this only a MHFD challenge, or could the pups be consuming HFD chow that fell into the cage?

      The pups were weaned onto the same normal chow diet that the dams received prior to MHFD-L treatment. The cages were inspected daily and minimal HFD spillage was observed, although we cannot rule out with certainty any contribution of the pups directly consuming the HFD. We have edited Methods section 5.2 for clarity.

      (8) Figure 5: Does internalized AgRP+ co-localize with CD68+ lysosomes? How was 'internalized' determined?

      This important point has been clarified by revisions to the Methods section.

      (9) Different sample sizes are used across experiments (e.g., Figure 4 NCD n=5, MHFD n=4). Does this impact statistical significance?

      Sample size does affect the power of ANOVA, with larger samples reducing the chance of errors. ANOVA is generally robust in the face of moderate departures from the assumption of equal sample sizes and equal variance such as we experienced in the PLX5622 experiment. Here we used t-tests to detect differences in a single variable between two groups and two-way ANOVA to compare treatment by diet and treatment changes in the PLX5622 studies. Additional detail has been added to the Methods section to clarify this point.
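
      As a minimal sketch of how unequal group sizes enter a two-group comparison, the snippet below computes the pooled-variance (Student) t statistic alongside the Welch variant, which does not assume equal variances. The group sizes mirror the n=5 versus n=4 example raised by the reviewer, but the values themselves are hypothetical and are not data from the study. The function names are illustrative, and Welch's test is shown only for comparison; it is not stated to be part of the analysis.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance (Student) t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Each group's sample variance is weighted by its own degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df


def welch_t(a, b):
    """Welch t statistic with Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical values only; these are not measurements from the study.
ncd = [10, 12, 11, 13, 14]   # n = 5
mhfd = [8, 9, 7, 8]          # n = 4

t_student, df_student = student_t(ncd, mhfd)
t_welch, df_welch = welch_t(ncd, mhfd)
print(f"Student: t = {t_student:.3f}, df = {df_student}")
print(f"Welch:   t = {t_welch:.3f}, df = {df_welch:.2f}")
```

      With small, unbalanced groups the two statistics and their degrees of freedom can differ, which is one reason to report which variant was used and the n per group.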

      Reviewer #2 (Public reviews):

      (1) Under chow-fed conditions, there is a decrease in the number of microglia in the PVH and ARH between P16 and P30, accompanied by an increase in complexity/volume. With the exception of PVH microglia at P16, this maturation process is not affected by MHFD. This "transient" increase in microglial complexity could also reflect premature maturation of the circuit.

      This is an interesting possibility that requires future investigation (see response to Recommended Suggestions, above).

      (2) The key experiment in this paper, the ablation of microglia, was presumably designed to prevent microglial expansion/activation in the PVH of MHFD pups. However, it also likely accelerates and exaggerates the decrease in cell number during normal development regardless of maternal diet. Efforts to interpret these findings are further complicated because microglial and AgRP neuronal phenotypes were not assessed at earlier time points when the circuit is most sensitive to maternal influences.

      We agree that evaluations of microglia and hypothalamic circuits at many more time points would indeed be informative (see comments above).

      (3) Microglial loss was induced broadly in the forebrain. Enhanced AgRP outgrowth to the PVH could be caused by actions elsewhere, such as direct effects on AgRP neurons in the ARH or secondary effects of changes in growth rates.

      A local effect of microglia in the ARH that affects growth of AgRP axons remains a distinct possibility that deserves a targeted examination (see response to Recommended Suggestions, above).

      (4) Prior publications from the authors and other groups support the idea that the density of AgRP projections to the PVH is primarily driven by factors regulating outgrowth and not pruning. The failure to observe increased engulfment of AgRP fibers by PVH microglia is therefore not surprising. The possibility that synaptic connectivity is modulated by microglia was not explored.

      Synaptic pruning and regulation of axon targeting are not mutually exclusive processes, and microglia may participate in both. Here we evaluated innervation of the PVH, which is sensitive to MHFD-L exposure, and engulfment of AgRP terminals by microglia, which does appear to be altered by MHFD-L. Given previous observations of terminal engulfment by microglia in other brain regions in response to environmental changes (e.g. prolonged stress), it is not unreasonable to expect this outcome in the offspring of MHFD-L dams. In future work it will be important to profile multiple cell types in the PVH for microglia-dependent and MHFD-L-sensitive changes in targeting of AgRP axons. Equally important is a full characterization of postsynaptic changes in PVH neurons.

      Reviewer #3 (Public reviews):

      There was no attempt to interrogate microglia in different parts of the hypothalamus functionally. Morphology alone does not reflect a potential for significant signaling alterations that may occur within and between these and other cell types.

      The authors should discuss the limitations of their approach and findings and propose future directions to address them.

      We agree that evaluations of microglia and hypothalamic circuits at many more time points that include analyses of multiple regions would indeed be informative. We have added statements to the manuscript that address the limitations of our experimental approach and suggest future studies that will extend understanding of underlying mechanisms beyond those investigated here.

      Recommendations for the authors:

      Reviewing Editors Comments:

      (1) The Abstract is 405 words and should be shortened to less than 200 words.  

      The abstract has been edited to 200 words.

      (2) The authors might consider raising the question in the Introduction of whether reduced AgRP innervation of the PVN in MHFD-treated mice is due to decreased axonal growth, enhanced microglial-mediated pruning, or a combination of both. The potential effects on axonal growth should be given more consideration.

      This is an important point that we agree deserves additional consideration in the manuscript. Our past work has focused on leptin’s ability to influence axonal targeting of PVH neurons by AgRP and PPG neurons through a cell-autonomous mechanism and our conclusion is that leptin primarily induces axon growth. Because in this study our design did not focus on changes in axon growth over time but on regional changes in microglia and their interactions with AgRP terminals we did not want to divert attention from our logic in the introduction by highlighting multiple mechanisms. However, we have added a brief mention in the Introduction and have expanded consideration of axonal growth effects to the Discussion. Distinguishing between microglia’s role in synaptic density or axon targeting in this pathway is an important goal of future work.

      (3) Line 37, a high-fat diet should be defined here as HFD and used consistently thereafter. Note that "high-fat-diet exposure" requires two hyphens.

      The suggested revisions have been made throughout the manuscript.

      (4) Line 38 and elsewhere, MHFD does not adequately describe the treatment being limited to the lactation period, perhaps MLHFD would be better or just LHFD (because the pups can't lactate).

      The suggested revisions have been made throughout the manuscript, and we have used MHFD-L to describe maternal consumption of a high-fat diet that is restricted to the lactation period.

      (5) Line 110, leptin-deficient mice (add hyphen).

      (6) Line 183, NCD should be defined.

      The suggested revisions have been made throughout the manuscript.

      (7) Lines 237- 238, it is not clear what is widespread in the rostral forebrain. Is it the loss of microglia? What is the dividing point between the rostral and caudal forebrain? Were microglia depleted in the caudal forebrain too?

      We have revised this section of the manuscript to focus the description on the hypothalamus alone and specify that the reduction in microglial density is not restricted to the PVH.  

      (8) Line 245, microglial-mediated effects (add hyphen).

      (9) Line 247, vehicle-treated mice (add hyphen).

      The suggested revisions have been made throughout the manuscript.

      (10) Line 457, when referring to genes, the approved gene name should be used in italics, AgRP should be Agrp (italics).

      The suggested revision has been made throughout the manuscript.

      (11) Line 459, the name of the Syn-Tom mice in the Key Resource table, Methods, and Text should be consistent. It would be best to use the formal name of the Ai34 line of mice on the JAX website.

      The suggested revisions have been made throughout the manuscript.

      (12) Figure 1G, H, and I: "um" should use the Greek micro sign (µm); Fig. 1J and K: replace # with Number. The same suggestions apply to all the other figures.

      Both the manuscript and figures have been revised in accordance with this recommendation.

      (13) Figures 4G, H, I, and J and Figures 5M and O: The font size is too small to see well.

      Fonts have been changed in the figures to improve visibility.

      Reviewer #1 (Recommendations for the authors):

      (1) Figures are out of order in the text. For example, Figure 1A is followed next by the results for Figure 1J instead of Figure 1B.

      We regret that the organization of figure panels makes for awkward matching for the reader as they proceed through the text. We designed the figures to facilitate comparisons between cellular responses and differences in labeling. After evaluating a reorganization, we decided to maintain the original panel configurations, but have revised the text to more closely follow the presentation of cellular features in the figures.

      (2) Figure 1B: All images lack scale bars.

      (3) Line 433 - 'underlie' is spelled wrong.

      (4) Rosin et al should be 2019 and not 2018.

      These corrections have been implemented in the revised text and figures.

      (5) The statement that "the effects of MHFD on microglial morphology in the PVH of offspring display both temporal and regional specificity, which correspond to a decrease in the density of AgRP inputs to the PVH" (Line 196) needs clarification, as the phrase "regional specificity" has not been substantiated in this section even though it is discussed later.

      We agree with this comment and have revised section 2.1 to more closely match the data presented to this point in the manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) The claim of "spatial specificity" in the effects of MHFD on microglia is based on an increase in the complexity/volume of microglia at P16 in the PVH that was not seen in the ARH or BNST. The transient nature of the effect raises several questions: Does the effect on the PVH represent premature maturation?

      This is an interesting suggestion. However, given how little is known about microglial maturation in the hypothalamus it is difficult to address. It is indeed possible that microglia mature at different rates in each AgRP target, and that MHFD-L exposure alters the rate of maturation in some regions but not others. This will require a great deal more analysis of both microglia and ARH projections to understand fully (see below).

      (2) To support their central claim that microglia in the PVH "sculpt the density of AgRP inputs to the PVH" the authors report effects on Iba1+ cells in the PVH of chow-fed dams at P55, body weight at P21, and AgRP projections in the PVH at an unspecified age. It is hard to understand what is happening across "normal" development in chow-fed dams since the number of Iba1+ cells decreases from ~50 to ~25 between P16 and P30 (Figure 1), but then increases to >60 cells at P55 (Figure 4). Given the large fluctuations in microglial population across time, analyzing the same parameters (i.e. microglial number/morphology in the ARH and PVH, AgRP neuronal number in the ARH, and fiber density in the PVH, and body weight) across time points before, during and after the critical period in chow and MHFD conditions would be very helpful.

      The time points we evaluated were chosen to be during and after the previously determined critical period for development of AgRP projections to the PVH, which were then compared with adults (which were all P55) to assess longevity of the effects. We have incorporated revisions to improve the clarity of when measurements were assessed, and treatments implemented. Defining the cellular dynamics of microglia across time remains a major challenge for the field and will certainly be informed by future studies with additional time points, as well as by in vivo imaging studies focused on regions identified here. Although such studies are beyond the scope of the present work, their completion would advance our current understanding of how microglia respond to nutritional changes during development of feeding circuits.

      (3) As microglia are also ablated in the ARH, direct effects on AgRP neurons or indirect effects via changes in growth rates could also contribute to increased AgRP fiber density in the PVH. In support of the first possibility, postnatal microglial depletion increases the number of AgRP neurons (Sun, et al. 2023).

      We agree with the suggestion, also raised by the Reviewing Editor, which has been addressed briefly in the Introduction, and in more detail by revisions to the Discussion section.

      (4) The failure to assess alpha-MSH fibers in the same animals was a missed opportunity. They are also affected by MHFD but likely involve a distinct mechanism (Vogt, et al 2014).

      Given the paired interest in POMC neurons and AgRP neurons, we understand the reviewer’s comment. We chose to focus solely on AgRP neurons because we do not currently have a way to genetically target axonal labeling exclusively to POMC neurons due to the shared precursor origin of POMC neurons and a percentage of NPY neurons in the ARH, as shown by Lori Zeltser’s laboratory. Moreover, the elegant work by Vogt et al. focused on responses of POMC neurons in the MHFD-L model. However, it certainly remains possible that microglia in the PVH interact with terminals derived from POMC neurons, as well as with terminals derived from other afferent populations of neurons.

      (5) All statistical analyses involved unpaired t-tests. Two-way ANOVAs should be used to assess the effects of age and HFD and interactions between these factors.

      We used t-tests to detect differences in a single variable between two groups and two-way ANOVA to compare treatment by diet and treatment changes in the PLX5622 studies. Additional detail has been added to the Methods section and information added to the figure legend for Fig. 4 to clarify this point.

      Reviewer #3 (Recommendations for the authors):

      I suggest exploring a deeper characterization of the microglia in various parts of the hypothalamus in different conditions. This could include cytokine assessment or spatial transcriptomics.

      We agree that a great deal more work is needed to improve our understanding of how microglia impact hypothalamic development more broadly and to identify underlying molecular mechanisms. We are hopeful that the data presented here will motivate additional study of microglial dynamics in multiple hypothalamic regions, as well as detailed studies of cellular signaling events for factors derived from MHFD-L dams that impact neural development in the hypothalamus.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors used a leucine/pantothenate auxotrophic strain of Mtb to screen a library of FDA-approved compounds for their antimycobacterial activity and found significant antibacterial activity of the inhibitor semapimod. In addition to alterations in pathways, including amino acid and lipid metabolism and transcriptional machinery, the authors demonstrate that semapimod treatment targets leucine uptake in Mtb. The work presents an interesting connection between nutrient uptake and cell wall composition in mycobacteria.

      Strengths:

      The link between the leucine uptake pathway and PDIM is interesting but has not been characterized mechanistically. The authors discuss that PDIM presents a barrier to the uptake of nutrients and show binding of the drug with PpsB. However, it is unclear why only the leucine uptake pathway was affected.

      We observe interference with the uptake of L-leucine, but not of pantothenate, in the mc2 6206 strain upon semapimod treatment. At present, we do not know whether PDIM presents a barrier exclusively to the uptake of L-leucine. Further studies may shed light on the underlying mechanism(s) by which L-leucine uptake is modulated by this small molecule.

      We still do not know what PpsB actually does for amino acid uptake - is it a transporter?

      By BLI-Octet we do not find any interaction between L-leucine and PpsB. Therefore, we doubt that PpsB is a transporter of L-leucine.

      Does semapimod binding affect its activity?

      Our study suggests that semapimod treatment alters PDIM architecture, which becomes restrictive to L-leucine. However, at present the exact mechanism is not clear. Further studies are required to thoroughly examine the effect of semapimod on Mtb PpsB activity and alterations in PDIM by mass spectrometry.

      Does the auxotrophic Mtb have lower PDIM levels compared to wild-type Mtb?

      Based on the published report by Mulholland et al. and on the vancomycin susceptibility phenotype in our study, both strains appear to have comparable PDIM levels.

      The authors show an interesting result where they observed antibacterial activity of semapimod against H37Rv only in vivo and not in vitro. What do the authors think is the basis of this observation? It is possible semapimod has an immunomodulatory effect on the host since leucine is an essential amino acid in mice. The authors could check pro-inflammatory cytokine levels in infected mouse lungs with and without drug treatment.

      Semapimod inhibits production of proinflammatory cytokines such as TNF-α, IL-1β, and IL-6, which would indeed help the pathogen establish chronic infection. However, a significant reduction in bacterial loads in lungs and spleen upon semapimod treatment despite inhibition of proinflammatory cytokines clearly indicates bacterial dependence on host-derived exogenous leucine during intracellular growth.

      The authors show that the semapimod-resistant auxotroph lacks PDIM. The conclusions would be further strengthened by including validations using PDIM mutants, including del-ppsB Mtb and other genes of the PDIM locus, whether in vivo this mutant would be more susceptible (or resistant) to semapimod treatment.

      PDIM is a virulence factor, and plays an important role in the intracellular survival of the TB pathogen. Mtb strains lacking PDIM are expected to show attenuated growth during infection, even without semapimod treatment. In such a case, it might be difficult to draw any conclusions about the effect of semapimod against PDIM(-) strains in vivo.

      Prolonged subculturing can introduce mutations in PDIM, which can be overcome by supplementing with propionate (Mulholland et al, Nat Microbiol, 2024). Did the authors also supplement their cultures with propionate? It would be interesting to see what mutations would result in Semr strains with propionate supplementation along with prolonged semapimod treatment.

      Considering the fact that extensive subculturing may result in loss of PDIM, we avoided prolonged subculturing of bacteria. As presented in Fig. 6b, the WT bacteria retain PDIM. While performing the initial screening of drugs, we did not anticipate such a phenotype, and hence bacteria were cultured in regular 7H9-OADS medium without propionate supplementation.

      A comprehensive future study would help examine the effect of propionate on the generation of semapimod-resistant mutants in Mtb mc2 6206.

      Weaknesses:

      I have summarized the limitations above in my comments. Overall, it would be helpful to provide more mechanistic details to study the connection between leucine uptake and PDIM.

      Reviewer #2 (Public review):

      Summary

      This important study uncovers a novel mechanism for L-leucine uptake by M. tuberculosis and shows that targeting this pathway with 'Semapimod' interferes with bacterial metabolism and virulence. These results identify the leucine uptake pathway as a potential target to design new anti-tubercular therapy.

      Strengths

      The authors took numerous approaches to prove that L-leucine uptake of M. tuberculosis is an important physiological phenomenon and may be effectively targeted by 'Semapimod'. This study utilizes a series of experiments using a broad set of tools to justify how the leucine uptake pathway of M. tuberculosis may be targeted to design new anti-tubercular therapy.

      Weaknesses

      The study does not explain how L-leucine is taken up by M. tuberculosis, leaving the mechanism unclear. Even though 'Semapimod' binds to the PpsB protein, the relevant connection between changes in PDIM and amino acid transport remains incomplete.

      While leucine uptake involves specific transporters in other bacteria, no such transport system is known in Mtb. By screening small molecule inhibitors, we came across a molecule, semapimod, which selectively kills the leucine auxotroph (mc2 6206), but not the WT Mtb. To understand the underlying mechanism of differential susceptibility of the WT and auxotrophic strains to this molecule, we evaluated the effect of restoration of leuCD and panCD expression on susceptibility of the auxotrophic strain to semapimod. Interestingly, our results demonstrated that upon endogenous expression of leuCD genes, the mc2 6206 strain becomes resistant to killing by semapimod. In contrast, no effect of panCD expression was observed on semapimod susceptibility of mc2 6206. These findings were further substantiated by gene expression analysis of semapimod-treated mc2 6206, which exhibits differential regulation of a set of genes that are altered upon leucine depletion in Mtb as well as in other bacteria. Overall, the results thus provide the first evidence of perturbation of L-leucine uptake by semapimod treatment of the leucine auxotroph.

      To gain further mechanistic insights into the effect of semapimod on leucine uptake in Mtb, we generated a semapimod-resistant strain, which exhibits point mutations in four genes including ppsB. Interestingly, overexpression of wild-type ppsB, but not of the other genes, restored susceptibility of the resistant bacteria to semapimod. Our observations that semapimod interacts with PpsB, and that the semapimod-resistant strain accumulates a mutation in PpsB resulting in loss of PDIM, together support the involvement of cell-wall PDIM in regulation of L-leucine transport in Mtb.

      As mentioned above, we anticipate that semapimod treatment brings about certain modifications in PDIM, which becomes more restrictive to L-leucine. A comprehensive future study will be helpful to examine the effect of semapimod on Mtb physiology.

      Also, the fact that the drug does not function on WT bacteria makes it a weak candidate to consider its usefulness for a therapeutic option.

      We agree that semapimod is not an appropriate drug candidate against TB owing to its inhibitory effect on production of proinflammatory cytokines such as TNF-α, IL-1β, and IL-6, an effect that helps the pathogen establish chronic infection. However, a significant reduction in bacterial loads in lungs and spleen upon semapimod treatment despite inhibition of proinflammatory cytokines clearly indicates bacterial dependence on host-derived exogenous leucine during intracellular growth. Therefore, targeting L-leucine uptake can be a novel therapeutic strategy against TB.

      Reviewer #3 (Public review):

      Agarwal et al identified the small molecule semapimod from a chemical screen of repurposed drugs with specific antimycobacterial activity against a leucine-dependent strain of M. tuberculosis. To better understand the mechanism of action of this repurposed anti-inflammatory drug, the authors used RNA-seq to reveal a leucine-deficient transcriptomic signature from semapimod challenge. The authors then measured a decreased intracellular concentration of leucine after semapimod challenge, suggesting that semapimod disrupts leucine uptake as the primary mechanism of action. Unexpectedly, however, resistant mutants raised against semapimod had a mutation in the polyketide synthase gene ppsB that resulted in loss of PDIM synthesis. The authors believe growth inhibition is a consequence of decreased accumulation of leucine as a result of an impaired cell wall and a disrupted, unknown leucine transporter. This study highlights the importance of branched-chain amino acids for M. tuberculosis survival, and the chemical genetic interactions between semapimod and ppsB indicate that ppsB is a conditionally essential gene in a medium depleted of leucine.

      The conclusions regarding the leucine and PDIM phenotypes are moderately supported by experimental data. The authors do not provide experimental evidence to support a specific link between leucine uptake and impaired PDIM production. Additional work is needed to support these claims and strengthen this mechanism of action.

      As mentioned above, the overall results from this study provide the first evidence of perturbation of L-leucine uptake by semapimod treatment of the leucine auxotroph. Our observations that semapimod interacts with PpsB, and that the semapimod-resistant strain accumulates a mutation in PpsB resulting in loss of PDIM, together support the involvement of cell-wall PDIM in regulation of L-leucine transport in Mtb.

      As mentioned above, it appears that semapimod treatment brings about certain modifications in PDIM, which becomes restrictive to L-leucine. Future studies are required to gain detailed mechanistic insights into the effect of semapimod on Mtb physiology.

      Since leucine uptake and PDIM synthesis are important concepts of the manuscript, experiments would benefit from exploring other BCAAs to know if the phenotypes observed are specific to leucine, and adding additional strains to the 2D TLC experiments to provide confidence in the absence of the PDIM band.

      We thank the peer reviewer for this suggestion. We would be happy to analyse the effect of semapimod on the levels of other amino acids, including BCAAs, by mass spectrometry.

      The intriguing observation that wild-type H37Rv is resistant to semapimod but the leucine-auxotroph is sensitive should be further explored. If the authors are correct and semapimod does inhibit leucine uptake through a specific transporter or disrupted cell wall (PDIM synthesis), testing semapimod activity against the leucine-auxotroph in various concentrations of BCAAs could highlight the importance of intracellular leucine. H37Rv is still able to synthesize endogenous leucine and is able to circumvent the effect of semapimod.

      We thank the peer reviewer for this suggestion. We would explore the possibility of analysing the effect of increasing concentrations of BCAAs on mc2 6206 susceptibility to semapimod.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript finds a negative relationship between tuberculin skin test-induced type I interferon activity and chest X-ray tuberculosis severity in humans. This evidence is between incomplete and solid. It needs a bioinformatics/transcriptomics reviewer to make a more insightful judgement. The manuscript demonstrates a convincing role for Stat2 in controlling Mycobacterium marinum infection in zebrafish embryos; incomplete data are presented linking reduced leukocyte recruitment to the infection susceptibility phenotype.

      Strengths:

      (1) An interesting analysis of TST response correlated with chest X-ray pathology.

      (2) Novel data on a protective role for Stat2 in a natural host-mycobacterial species infection pairing.

      We appreciate the reviewer’s positive comments.

      Weaknesses:

      (1) The transcriptional modules are very large sets of genes that do not present a clear picture of what is actually being measured relative to other biological pathways.

      The transcriptional module analysis is a major strength of our approach. These gene signatures are derived from independent experiments, most of which have been previously published/validated [1,2]. To clarify, they represent co-regulated gene sets downstream of signalling pathways. Increased number of genes in these modules increases their combinatorial specificity for a given biological pathway. In the human data, they serve as orthogonal validation for the bioinformatic analysis showing enrichment of the type I IFN pathway among TST transcriptome genes that are negatively correlated with radiographic disease severity in pulmonary TB (see Figure 2). Importantly, our modules confirm the relationship with type I IFN signalling (see Figure 2E) by discriminating from type II IFN signalling, which is not statistically significantly correlated with radiographic TB severity (see Figure S6C-E).

      (2) The link between infection-Stat2-leukocyte recruitment and containment of infection is plausible, but lacks a specific link to the first part of the manuscript.

      For clarification, the first part of the study seeks to identify immune response pathways that relate to severity of human disease, leading to the identification of type I IFN signalling. Since the human data are limited to an observational analysis in which we cannot test causality, the second part of our study uses a genetically tractable experimental model to test the hypothesis that type I IFN signalling is host-protective and explore possible mechanisms for a beneficial effect. This leads to the observation that type I IFN responses contribute to early myeloid cell recruitment to the site of infection, that has previously been shown to be crucial for containment of mycobacterial infection in zebrafish larvae. We will further evaluate the introduction and results sections to ensure a clear link between the human and zebrafish work.

      Major concerns

      (1) Line 158: The two transcriptional modules should be placed in the context of other DEG patterns. The macrophage type I interferon module, in particular, is quite large (361 genes). Can this be made more granular in terms of type I IFN ligands and STAT2-dependent genes?

      We respectfully disagree with this comment. For clarification, the 360 gene module reflects the zebrafish larval response to IFNphi1 protein [3]. Type I IFNs are known to induce hundreds of interferon stimulated genes [4]. As explained above, the size of the modules increases specificity for a given signalling pathway. In this case, we are most interested in discriminating type I and type II IFN signalling pathways that represent very different upstream biological processes. The discrimination we achieve with our modular approach is a major advance over previous reports of gene signatures in TB that do not discriminate between the two pathways. In this study, we did not discriminate between signalling downstream of type I IFN ligands and STAT2, consistent with existing literature showing that type I IFN signalling is STAT2 dependent [5,6].

      (2) The ifnphi1 injection into mxa:mCherry stat2 crispants is a nice experiment to demonstrate loss of type I IFN responsiveness. Further data is required to demonstrate if important mycobacterial control pathways (IFNy, TNF, il6?, etc) are intact in stat2 crispants before being able to conclude that these phenotypes are specific to type I IFN.

      Thank you for the positive comment. We acknowledge this point and will attempt to evaluate whether pro-inflammatory cytokine responses are intact in stat2 CRISPants by qPCR or bulk RNAseq. However, these experiments may prove inconclusive because of the limited sensitivity in this approach.

      Reviewer #2 (Public review):

      Summary:

      This study shows that type I interferon (IFN-I) signaling helps protect against mycobacterial infection. Using human gene expression data and a zebrafish model, the authors find that reduced IFN-I activity is linked to more severe disease. They also show that zebrafish lacking the IFN-I signaling gene stat2 are more vulnerable to infection due to poor macrophage migration. These results suggest a protective role for IFN-I in mycobacterial disease, challenging previous findings from other animal models.

      Strengths:

      Strengths of the manuscript include the use of human clinical samples to support relevance to disease, along with a genetically tractable zebrafish model that enables mechanistic insight.

      We welcome the reviewer’s positive summary of our study.

      Weaknesses:

      (1) The manuscript presents intriguing human data showing an inverse correlation between IFN-I gene signatures and TB disease, but the findings remain correlative and may be cohort-specific. Given that the skin is not a primary site of TB and is relatively immunotolerant, the biological relevance of downregulated IFN-I-related genes in this tissue to systemic or pulmonary TB is unclear.

      We agree with the reviewer that the observational human data are correlative. That is precisely why we extend the study to undertake mechanistic studies in a genetically tractable animal model, using M. marinum infection of zebrafish larvae. In the introduction, we already provide a detailed rationale for the strengths of the TST model to study human immune responses to a standardised mycobacterial challenge. This approach mitigates the confounding of heterogeneity in bacterial burden and sampling different stages of the natural history of infection in conventional observational human studies. Therefore, the application of the TST is a major strength of this study. We do not understand the context in which the reviewer suggests the skin is immunotolerant. In the present study and previous work we provide molecular level analysis of the TST as a robust cell mediated immune response that reflects molecular perturbation in granuloma from the site of pulmonary TB disease [1].

      (2) The reliance on stat2 CRISPants in zebrafish offers a limited view of IFN-I signaling. Including additional crispant lines targeting other key regulators (e.g., ifnar1, tyk2, irf3, irf7) would strengthen the interpretation and clarify whether the observed effects reflect broader IFN-I pathway disruption.

      We respectfully disagree with this comment. Our objective was to test the role of type I IFN signalling in M. marinum infection of zebrafish. We show that stat2 deletion effectively disrupts type I IFN signalling (Figure S8). Therefore, we do not see a compelling rationale to evaluate other molecules in the signalling pathway.

      (3) The conclusion that IFN-I is protective contrasts with established findings from murine and non-human primate models, where IFN-I is often detrimental. While the authors highlight species differences, the lack of functional human data and reliance on M. marinum in zebrafish limit the translational relevance. A more balanced discussion addressing these discrepancies would improve the manuscript.

      We acknowledge that our findings contrast with the prevailing view in published literature to date. We will further review the discussion to see how we can elaborate on the potential strengths and weaknesses of different experimental approaches, which may underpin these discrepancies.

      (4) Quantification of bacterial burden using fluorescence intensity alone may not accurately reflect bacterial viability. Complementary methods, such as qPCR for bacterial DNA, would provide a more robust assessment of antimicrobial activity.

      We and others have previously validated the use of the quantitative measures of fluorescence, used here as a measure of bacterial load [7,8]. Importantly, our measurements do not rely purely on the total fluorescence signal, but also measures of dissemination of infection, for which we see consistent findings. It is also widely recognised that DNA measurements do not necessarily correlate well with bacterial viability. Therefore, we respectfully disagree that a PCR-based approach will add substantial value to our existing analysis.

      (5) Finally, the authors should clarify whether impaired macrophage recruitment in stat2 crispants results from defects in chemotaxis, differentiation, or survival, and address discrepancies between their human blood findings and prior studies.

      We acknowledge that these are important questions. Our data show that stat2 disruption does not impact total macrophage numbers at baseline (Figure 4A,B) and therefore do not support any effect of Stat2 signalling on steady state macrophage survival or differentiation. The downregulation of macrophage mpeg1 expression in M. marinum infection precludes long-term follow-up of these cells in the context of infection [9]. Therefore, we cannot currently test the hypothesis that Stat2 signalling may influence death of macrophages recruited to the site of infection or make them more susceptible to the cytopathic effects of direct mycobacterial infection. We will attempt to confirm using short-term time-lapse imaging that cellular migration to the site of hindbrain M. marinum infection is reduced in stat2 deficient zebrafish. On the strength of what is possible to test and the established role of type I IFNs in induction of several chemokines [10,11], the most likely effect is that Stat2 signalling increases recruitment through chemokine production. We are exploring the possibility of testing changes to the chemokine profile in stat2 CRISPants by qPCR or bulk RNAseq, but these experiments may prove inconclusive because of the limitations of sensitivity in this approach.

      We recognize that our finding of no relationship between peripheral blood type I IFN activity and severity of human TB contrasts with that of previous studies. As stated in the discussion, the most likely explanation for this is our use of transcriptional modules which reflect exclusive type I IFN responses. The signatures used in other studies include both type I and type II IFN inducible genes and therefore also reflect IFN gamma driven responses.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors presented an interesting study providing an insight into the role of Type-I interferon responses in tuberculosis (TB) pathogenesis by combining transcriptome analysis of PBMCs and TST from tuberculosis patients. The zebrafish model was used to identify the changes in the innate immune cell population of macrophages and neutrophils. The findings suggested that Type-I interferon signatures inversely correlated with disease severity in the TST transcriptome data. The authors validated the observations by CRISPR-mediated disruption of stat2 (a critical transcription factor for type I interferon signaling) in zebrafish larvae, showing increased susceptibility to M. marinum infection. Traditionally, type-I interferon responses have been viewed as detrimental in mycobacterial infections, with studies suggesting enhanced susceptibility in certain mouse models. The study tried to identify and further characterize the understanding of the role of type-I interferons in TB.

      Strengths:

      Traditionally, type-I interferon responses have been viewed as detrimental in mycobacterial infections, with studies suggesting enhanced susceptibility in certain mouse models. The study tried to further understand the role of type-I interferons in TB pathogenesis.

      We thank the reviewer for their summary.

      Weaknesses:

      Though the study showed an inverse correlation of Type-I interferon with radiological features of TB, the molecular mechanism is largely unexplored, which makes it difficult to understand the basis of the results shown in the manuscript.

      We respectfully disagree with this comment. The observations in the human data lead to the hypothesis that type I IFN responses may be host-protective, which we then test specifically in the zebrafish model, and explore candidate mechanisms, focussing on myeloid cell recruitment to the site of infection.

      References

      (1) Bell, L.C.K., Pollara, G., Pascoe, M., Tomlinson, G.S., Lehloenya, R.J., Roe, J., Meldau, R., Miller, R.F., Ramsay, A., Chain, B.M., et al. (2016). In Vivo Molecular Dissection of the Effects of HIV-1 in Active Tuberculosis. PLoS Pathog. 12, e1005469. https://doi.org/10.1371/journal.ppat.1005469.

      (2) Pollara, G., Turner, C.T., Rosenheim, J., Chandran, A., Bell, L.C.K., Khan, A., Patel, A., Peralta, L.F., Folino, A., Akarca, A., et al. (2021). Exaggerated IL-17A activity in human in vivo recall responses discriminates active tuberculosis from latent infection and cured disease. Sci. Transl. Med. 13, eabg7673. https://doi.org/10.1126/scitranslmed.abg7673.

      (3) Levraud, J.-P., Jouneau, L., Briolat, V., Laghi, V., and Boudinot, P. (2019). IFN-Stimulated Genes in Zebrafish and Humans Define an Ancient Arsenal of Antiviral Immunity. J. Immunol. Baltim. Md 1950 203, 3361–3373. https://doi.org/10.4049/jimmunol.1900804.

      (4) Schoggins, J.W. (2019). Interferon-Stimulated Genes: What Do They All Do? Annu. Rev. Virol. 6, 567–584. https://doi.org/10.1146/annurev-virology-092818-015756.

      (5) Blaszczyk, K., Nowicka, H., Kostyrko, K., Antonczyk, A., Wesoly, J., and Bluyssen, H.A.R. (2016). The unique role of STAT2 in constitutive and IFN-induced transcription and antiviral responses. Cytokine Growth Factor Rev. 29, 71–81. https://doi.org/10.1016/j.cytogfr.2016.02.010.

      (6) Begitt, A., Droescher, M., Meyer, T., Schmid, C.D., Baker, M., Antunes, F., Knobeloch, K.-P., Owen, M.R., Naumann, R., Decker, T., et al. (2014). STAT1-cooperative DNA binding distinguishes type 1 from type 2 interferon signaling. Nat. Immunol. 15, 168–176. https://doi.org/10.1038/ni.2794.

      (7) Stirling, D.R., Suleyman, O., Gil, E., Elks, P.M., Torraca, V., Noursadeghi, M., and Tomlinson, G.S. (2020). Analysis tools to quantify dissemination of pathology in zebrafish larvae. Sci. Rep. 10, 3149. https://doi.org/10.1038/s41598-020-59932-1.

      (8) Takaki, K., Davis, J.M., Winglee, K., and Ramakrishnan, L. (2013). Evaluation of the pathogenesis and treatment of Mycobacterium marinum infection in zebrafish. Nat. Protoc. 8, 1114–1124. https://doi.org/10.1038/nprot.2013.068.

      (9) Benard, E.L., Racz, P.I., Rougeot, J., Nezhinsky, A.E., Verbeek, F.J., Spaink, H.P., and Meijer, A.H. (2015). Macrophage-expressed perforins mpeg1 and mpeg1.2 have an anti-bacterial function in zebrafish. J. Innate Immun. 7, 136–152. https://doi.org/10.1159/000366103.

      (10) Lehmann, M.H., Torres-Domínguez, L.E., Price, P.J.R., Brandmüller, C., Kirschning, C.J., and Sutter, G. (2016). CCL2 expression is mediated by type I IFN receptor and recruits NK and T cells to the lung during MVA infection. J. Leukoc. Biol. 99, 1057–1064. https://doi.org/10.1189/jlb.4MA0815-376RR.

      (11) Buttmann, M., Merzyn, C., and Rieckmann, P. (2004). Interferon-beta induces transient systemic IP-10/CXCL10 chemokine release in patients with multiple sclerosis. J. Neuroimmunol. 156, 195–203. https://doi.org/10.1016/j.jneuroim.2004.07.016.

    1. Author response:

      Reviewer #1:

      Lipid transfer proteins (LTPs) play a crucial role in the intramembrane lipid exchange within cells. However, the molecular mechanisms that govern this activity remain largely unclear. Specifically, the way in which LTPs surmount the energy barrier to extract a single lipid molecule from a lipid bilayer is not yet fully understood. This manuscript investigates the influence of membrane properties on the binding of Ups1 to the membrane and the transfer of phosphatidic acid (PA) by the LTP. The findings reveal that Ups1 shows a preference for binding to membranes with positive curvature. Moreover, coarse-grained molecular dynamics simulations indicate that positive curvature decreases the energy barrier associated with PA extraction from the membrane. Additionally, lipid transfer assays conducted with purified proteins and liposomes in vitro demonstrate that the size of the donor membrane significantly impacts lipid transfer efficiency by Ups1-Mdm35 complexes, with smaller liposomes (characterized by high positive curvature) promoting rapid lipid transfer.

      This study offers significant new insights into the reaction cycle of phosphatidic acid (PA) transfer by Ups1 in mitochondria. Notably, the authors present compelling evidence that, alongside negatively charged phospholipids, positive membrane curvature enhances lipid transfer - an effect that is particularly relevant at the mitochondrial outer membrane. The experiments are technically robust, and my primary feedback pertains to the interpretation of specific results.

      (1) The authors conclude from the lipid transfer assays (Figure 5) that lipid extraction is the rate-limiting step in the transfer cycle. While this conclusion seems plausible, it should be noted that the authors employed high concentrations of Ups1-Mdm35 along with less negatively charged phospholipids in these reactions. This combination may lead to binding becoming the rate-limiting factor. The authors should take this point into consideration. In this type of assay, it is challenging to clearly distinguish between binding, lipid extraction, and membrane dissociation as separate processes.

      We thank the reviewer for the constructive and positive evaluation of our manuscript. We agree that, while our data support the interpretation that the rate-limiting step occurs at the donor membrane, it is difficult to dissect in our assay which of the individual steps at the donor membrane - such as binding of Ups1, lipid extraction into the binding pocket, or dissociation of Ups1 - is rate-limiting. Nevertheless, although we cannot exclude contributions from membrane binding or dissociation, several observations suggest that lipid extraction is a rate-limiting step under our experimental conditions.

      The acceptor membrane has a similar lipid composition to the donor membrane (if anything, the donor membrane is slightly richer in binding-promoting lipids). If binding were rate-limiting, similar constraints would be expected at the acceptor membrane during lipid insertion. However, this is not observed.

      Regarding dissociation, if this step were rate-limiting, one would expect similar constraints to be evident at the acceptor vesicles as well. Nevertheless, membrane dissociation might be mechanistically coupled to lipid extraction and thus difficult to evaluate as an independent step.

      Based on our data and the considerations described above, we suggest that lipid extraction is the dominant rate-limiting step at the donor membrane under our conditions. However, we agree that a clear separation of these individual steps is not possible with the current experimental design. We will revise the corresponding passage to clarify that the rate-limiting step occurs at the donor membrane and, based on our observations, likely involves lipid extraction. Future studies aimed at dissecting these steps will be important for elucidating the mechanism and regulation of Ups1-mediated lipid transfer both in vitro and in vivo.

      (2) The authors should discuss that variations in the size of liposomes will also affect the distance between them at a constant concentration, which may affect the rate of lipid transfer. Therefore, the authors should determine the average size and size distribution of liposomes after sonication (by DLS or nanoparticle analyzer, etc.)

      We agree that variations in liposome size will influence the average distance between vesicles at a given lipid concentration, which may in turn affect the rate of lipid transfer. As suggested, we will include DLS measurements to characterize the size distribution of our different liposome preparations.

      Our setup was designed to keep the total membrane surface area comparable across conditions. This approach ensures a comparable overall binding capacity for Ups1 and enables the comparison of membrane binding and lipid extraction from different membranes. However, we agree that vesicle spacing, which is affected by liposome size at constant lipid concentration, could potentially influence certain steps in the transfer process, such as the time required for Ups1 to travel between donor and acceptor membranes. Whether this intermembrane travel time contributes to rate limitation is indeed an interesting question, and we will address this point through further discussion in the revised manuscript.

      Investigating such effects in our current experimental system would require altering the vesicle concentration, which would in turn change the total membrane surface area and introduce additional variables. Nevertheless, exploring the influence of vesicle spacing and intermembrane distance on lipid transfer represents a promising direction for future studies aimed at dissecting the rate-limiting steps of the transfer cycle.
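      The geometric argument above can be illustrated with a rough back-of-the-envelope calculation. The sketch below is not taken from the manuscript: it assumes ideal spherical unilamellar vesicles, an area per lipid of about 0.7 nm², a bilayer thickness of 4 nm, and a purely hypothetical lipid concentration of 100 µM. Under these assumptions the total membrane area is fixed by the lipid concentration, whereas the number of vesicles falls roughly with the square of the diameter, so the mean spacing grows roughly as diameter to the power 2/3.

```python
import math

# Assumed values for illustration only; none of these come from the manuscript.
AREA_PER_LIPID_NM2 = 0.7   # rough area per lipid in a fluid phospholipid bilayer
AVOGADRO = 6.022e23        # molecules per mole
NM3_PER_LITRE = 1e24       # 1 L = (1e8 nm)^3


def lipids_per_vesicle(diameter_nm, bilayer_thickness_nm=4.0):
    """Approximate lipid count of one spherical unilamellar vesicle.

    Sums the areas of the outer and inner leaflets and divides by the
    assumed area per lipid.
    """
    r_out = diameter_nm / 2
    r_in = max(r_out - bilayer_thickness_nm, 0.0)
    area_nm2 = 4 * math.pi * (r_out ** 2 + r_in ** 2)
    return area_nm2 / AREA_PER_LIPID_NM2


def mean_spacing_nm(diameter_nm, lipid_conc_molar, bilayer_thickness_nm=4.0):
    """Estimate mean centre-to-centre vesicle spacing as n**(-1/3),
    where n is the vesicle number density per nm^3."""
    lipids_per_litre = lipid_conc_molar * AVOGADRO
    vesicles_per_litre = lipids_per_litre / lipids_per_vesicle(
        diameter_nm, bilayer_thickness_nm
    )
    vesicles_per_nm3 = vesicles_per_litre / NM3_PER_LITRE
    return vesicles_per_nm3 ** (-1 / 3)


if __name__ == "__main__":
    lipid_conc = 100e-6  # hypothetical total lipid concentration, mol/L
    for d in (30, 100, 400):
        spacing = mean_spacing_nm(d, lipid_conc)
        print(f"diameter {d:4d} nm -> mean spacing about {spacing:7.0f} nm")
```

      The estimate ignores vesicle polydispersity and diffusion, so it only indicates the direction and approximate scale of the effect; measured DLS size distributions would be needed to refine it.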

      (3) The authors use NBD-PA in the lipid transfer assays. Does the size of the donor liposomes affect the transfer of NBD-PA and DOPA similarly? Since NBD-labeled lipids are somewhat unstable within lipid bilayers (as shown by spontaneous desorption in Figure 5B), monitoring the transfer of unlabeled PA in at least one setting would strengthen the conclusion of the swap experiments.

      Ups1-mediated transfer of PA has been demonstrated both by mass spectrometry analysis of donor and acceptor vesicles (Connerth et al., 2012) and by NBD-fluorescence-based lipid transfer assays (Lu et al., 2020; Miliara et al., 2015; Miliara et al., 2019; Miliara et al., 2023; Potting et al., 2013; Watanabe et al., 2015). The fluorescence-based approach has been the most widely applied across multiple studies and has enabled detailed analysis of various aspects of lipid transfer by Ups1. It has been used to investigate mutants of key structural elements, such as the lipid-binding pocket and the α2–loop region. It has also been used to analyze fusion constructs between Ups1 and Mdm35, the influence of Mdm35 variants, and competition with excess Mdm35. Additionally, by comparing the transfer of NBD-labeled PA and NBD-labeled PS, this assay has provided insights into the determinants of the lipid specificity of Ups1. Hence, our experiments are based on the standard assay used to analyse lipid transfer in the field and thus can be correlated with the majority of published data.

      Nevertheless, we agree that it is important to keep in mind that NBD labeling may alter the biophysical properties of lipids and, consequently, affect their transfer efficiency. Moreover, NBD-labeled lipids are not suitable for comparing the transfer efficiency of different PA species, as the label itself may mask differences in acyl chain composition. Therefore, it will be valuable to establish complementary methods that do not rely on NBD-labeled PA. We aim to develop these non-standard methods for possible inclusion in the present study, but even if not fully implemented at this stage, they will certainly form an important part of future investigations.

      (4) The present study suggests that membrane domains with positive curvature at the outer membrane may serve as starting points for lipid transport by Ups1-Mdm35. Is anything known about the mechanisms that form such structures? This should be discussed in the text.

      The origin of positively curved membrane domains is indeed highly relevant in the context of our findings, and while not the primary focus of this work, we will place more emphasis on discussing how such curvature may arise. Mechanisms include the action of curvature-generating proteins, asymmetric lipid composition and curvature induced at membrane contact sites. We have so far included examples of proteins in the outer mitochondrial membrane that are expected to influence curvature in their vicinity, and we will expand on this aspect and other contributing factors more thoroughly in the revised text.

      Reviewer #2:

      Summary:

      Lipid transfer between membranes is essential for lipid biosynthesis across different organelle membranes. Ups1-Mdm35 is one of the best-characterized lipid transfer proteins, responsible for transferring phosphatidic acid (PA) between the mitochondrial outer membrane (OM) and inner membrane (IM), a process critical for cardiolipin (CL) synthesis in the IM. Upon dissociation from Mdm35, Ups1 binds to the intermembrane space (IMS) surface of the OM, extracts a PA molecule, re-associates with Mdm35, and moves through the aqueous IMS to deliver PA to the IM. Here, the authors analyzed the early steps of this PA transfer - membrane binding and PA extraction - using a combination of in vitro biochemical assays with lipid liposomes and purified Ups1-Mdm35 to measure liposome binding, lipid transfer between liposomes, and lipid extraction from liposomes. The authors found that membrane curvature, a previously overlooked property of the membrane, significantly affects PA extraction but not PA insertion into liposomes. These findings were further supported by MD simulations.

      Strengths:

      The experiments are well-designed, and the data are logically interpreted. The present study provides an important basis for understanding the mechanism of lipid transfer between membranes.  

      Weaknesses:

      The physiological relevance of membrane curvature in lipid extraction and transfer still remains open.

      We thank the reviewer for the constructive feedback on our work. We agree that the physiological relevance of membrane curvature in lipid extraction and transfer remains an open question. Our data show that Ups1 binding to native-like OM membranes under physiological pH conditions is curvature-dependent, supporting the idea that this mechanism may optimize lipid transfer in vivo. While the intricate biophysical basis of this behaviour can only be dissected in vitro, these findings offer valuable insight into how curvature may functionally regulate Ups1 activity in the cellular context. To directly test this, it will be important in future studies to identify Ups1 mutants that lack curvature sensitivity and assess their performance in vivo, which will help clarify the physiological importance of this mechanism.

      Reviewer #3:

      The manuscript by Sadeqi et al. studies the interactions between the mitochondrial protein Ups1 and reconstituted membranes. The authors apply synthetic liposomal vesicles to investigate the role of pH, curvature, and charge on the binding of Ups1 to membranes and its ability to extract PA from them. The manuscript is well written and structured. With minor exceptions, the authors provide all relevant information (see minor points below) and reference the appropriate literature in their introduction. The underlying question of how the energy barrier for lipid extraction from membranes is overcome by Ups1 is interesting, and the data presented by the authors could offer a valuable new perspective on this process. It is also certainly a challenging in vitro reconstitution experiment, as the authors aim to disentangle individual membrane properties (e.g., curvature, charge, and packing density) to study protein adsorption and lipid transfer. I have one major suggestion and a few minor ones that the authors might want to consider to improve their manuscript and data interpretation:

      Major Comments:

      The experiments are performed with reconstituted vesicles, which are incubated with recombinant protein variants and quantitatively assessed in flotation and pelleting assays. According to the Materials and Methods section, the lipid concentration in these assays is kept constant at 5 µM. However, the authors change the size of the vesicles to tune their curvature. Using the same lipid concentration but varying vesicle sizes results in different total vesicle concentrations. Moreover, larger vesicles (produced by freeze-thawing and extrusion) tend to form a higher proportion of multilamellar vesicles, thus also altering the total membrane area available for binding. Could these differences in the experimental system account for the variation in binding? To address this, the authors would need to perform the experiments either under saturation (excess protein) conditions or find an experimental approach to normalize for these differences.

      We thank the reviewer for the constructive and positive comments. We agree that, since the total number of lipids was kept constant, the number of vesicles varied with vesicle size in our experiments. However, the setup was specifically designed to maintain a comparable total membrane surface area across conditions, ensuring a comparable number of available binding sites for Ups1. Because the surface area of a single vesicle scales with the square of its radius, keeping the vesicle number constant would have led to a marked reduction in binding surface for the smaller vesicles. Our approach was therefore aimed at preserving comparable binding capacity while varying membrane curvature.
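
      The scaling behind this design choice can be illustrated with a short calculation. The following is only a sketch under simplifying assumptions: ideal unilamellar spheres, a thin-bilayer approximation in which both leaflets have the same area, and hypothetical placeholder values for the area per lipid (0.7 nm²), the number of lipids and the two vesicle diameters. None of these values are taken from the experiments.

```python
import math


def vesicle_count(n_lipids, diameter_nm, area_per_lipid_nm2=0.7):
    """Number of unilamellar vesicles formed from n_lipids.

    Thin-bilayer approximation: both leaflets are assigned the
    same area, pi * d**2 (equal to 4 * pi * r**2).
    """
    leaflet_area = math.pi * diameter_nm ** 2
    lipids_per_vesicle = 2.0 * leaflet_area / area_per_lipid_nm2
    return n_lipids / lipids_per_vesicle


def total_outer_area(n_lipids, diameter_nm, area_per_lipid_nm2=0.7):
    """Total outer-leaflet area (nm^2) summed over all vesicles."""
    n_vesicles = vesicle_count(n_lipids, diameter_nm, area_per_lipid_nm2)
    return n_vesicles * math.pi * diameter_nm ** 2


# Hypothetical example: the same amount of lipid in small and large vesicles.
N_LIPIDS = 1e12
for d in (50.0, 200.0):
    print(d, vesicle_count(N_LIPIDS, d), total_outer_area(N_LIPIDS, d))
```

      In this idealized picture, the vesicle number changes with the inverse square of the diameter while the total outer surface stays the same, which is the sense in which a constant lipid amount keeps the binding surface comparable. Multilamellarity and finite bilayer thickness are deliberately ignored here.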

      With respect to multilamellarity, we thank the reviewer for addressing this important point. As described above, we aimed to maintain a constant total membrane surface area across all conditions to ensure an equal number of potential binding sites. We agree that multilamellarity in large liposomes could restrict accessibility to part of the membrane surface. However, we see in our experiments that even when the total membrane surface area of the small liposomes is reduced to one quarter of the standard amount, binding to the small liposomes remained stronger than to the larger liposomes at the higher concentration. This strongly indicates that restricted accessibility cannot account for the curvature-specific effect observed. Nonetheless, we will further address this aspect experimentally and in the discussion of the revised manuscript.

      References

      Connerth, M., Tatsuta, T., Haag, M., Klecker, T., Westermann, B., & Langer, T. (2012). Intramitochondrial transport of phosphatidic acid in yeast by a lipid transfer protein. Science, 338(6108), 815-818. https://doi.org/10.1126/science.1225625 

      Lu, J., Chan, C., Yu, L., Fan, J., Sun, F., & Zhai, Y. (2020). Molecular mechanism of mitochondrial phosphatidate transfer by Ups1. Commun Biol, 3(1), 468. https://doi.org/10.1038/s42003-020-01121-x 

      Miliara, X., Garnett, J. A., Tatsuta, T., Abid Ali, F., Baldie, H., Perez-Dorado, I., Simpson, P., Yague, E., Langer, T., & Matthews, S. (2015). Structural insight into the TRIAP1/PRELI-like domain family of mitochondrial phospholipid transfer complexes. EMBO Rep, 16(7), 824-835. https://doi.org/10.15252/embr.201540229

      Miliara, X., Tatsuta, T., Berry, J. L., Rouse, S. L., Solak, K., Chorev, D. S., Wu, D., Robinson, C. V., Matthews, S., & Langer, T. (2019). Structural determinants of lipid specificity within Ups/PRELI lipid transfer proteins. Nat Commun, 10(1), 1130. https://doi.org/10.1038/s41467-019-09089-x

      Miliara, X., Tatsuta, T., Eiyama, A., Langer, T., Rouse, S. L., & Matthews, S. (2023). An intermolecular hydrogen bonded network in the PRELID-TRIAP protein family plays a role in lipid sensing. Biochim Biophys Acta Proteins Proteom, 1871(1), 140867. https://doi.org/10.1016/j.bbapap.2022.140867

      Potting, C., Tatsuta, T., Konig, T., Haag, M., Wai, T., Aaltonen, M. J., & Langer, T. (2013). TRIAP1/PRELI complexes prevent apoptosis by mediating intramitochondrial transport of phosphatidic acid. Cell Metab, 18(2), 287-295. https://doi.org/10.1016/j.cmet.2013.07.008

      Watanabe, Y., Tamura, Y., Kawano, S., & Endo, T. (2015). Structural and mechanistic insights into phospholipid transfer by Ups1-Mdm35 in mitochondria. Nat Commun, 6, 7922. https://doi.org/10.1038/ncomms8922

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) 8 molar urea not only denatures proteins but also denatures DNA. Obviously, this does not affect the ChIP, since antibodies often recognize small linear epitopes and the proteins are crosslinked. However, under high urea conditions the BUR elements should be rendered single-stranded, and one wonders whether this has any effect on the procedure. The authors should alert the reader of these circumstances.

      Thank you for raising this important question about the effects of 8M urea. We have added a brief paragraph explaining this point in the revised manuscript. Despite common misconceptions, 8M urea by itself does not actively convert double-stranded DNA to single-stranded DNA. For this conversion to occur, a heat denaturation step is required. Once DNA is heat-denatured to become single-stranded, urea can maintain this configuration. This is why the addition of 8M urea to acrylamide gel electrophoresis is a standard method for analyzing single-stranded oligonucleotides, but the DNA must first be denatured by heat (Summer et al., J. Vis. Exp. (32), e1485, DOI: 10.3791/1485). This is clearly described in published work comparing the status of DNA with and without heat treatment in an 8M urea-containing buffer (Hegedus et al., Nucl. Acids Res. 2009, doi:10.1093/nar/gkp539).

      We have additional evidence supporting this conclusion in the context of our urea ultracentrifugation experiment. Both crosslinked and un-crosslinked genomic DNA purified by 8M urea centrifugation can be digested with restriction enzymes, which indicates that the DNA remains double-stranded. For instance, we previously published SATB1 ChIP-3C results using Sau3A-digested DNA after urea purification. In the current paper, we used HindIII to digest urea-purified DNA for urea4C-seq. The BUR reference map can also be generated after restriction digestion of urea-purified DNA and isolating and sequencing SATB1-bound restriction fragments in vitro. If genomic DNA were denatured by 8M urea ultracentrifugation, we would not have been able to digest it with restriction enzymes to obtain these results.

      We have now added a sentence noting that SATB1 is a double-stranded DNA-binding protein that does not bind to single-stranded DNA, as we have previously shown (Dickinson et al., 1992, Ref 32).

      (2) An important conclusion is that urea-ChIP reveals direct DNA binding events, whereas standard ChIP shows indirect binding (which is stripped off by urea). I do not see any evidence for direct binding. At low resolution, predicted BUR elements are enriched in domains where SATB-1 is mapped by urea-ChIP. A statement like 'In a zoomed-in view, covering a 430 kb region, SATB1 sites identified from urea ChIP-seq precisely coincided with BUR peaks' is certainly not correct: most BUR peaks do not show significant SATB-1 binding. The randomly chosen regions shown in Figure 4 – Supplement 1 show how poor the overlap of SATB-1 and BURs is; indeed, they show that SATB-1 binds DNA mostly at non-BUR sites. I see Figure 2D, but such cumulative plots can be highly biased by very few cases. I suggest showing these data in heat maps instead.

      We believe there may be some confusion regarding the interpretation of our figures. Looking at Track 3 (BUR reference map, RED peaks) and urea SATB1 Tracks 4 and 5 (replicas from two independent experiments) in Fig. 2B, the SATB1 peaks detected by urea ChIP-seq do indeed coincide with BUR peaks. In the revised manuscript, we have provided a further ‘zoomed-in’ view to better illustrate this point and also provided the underlying BUR sequence from one of these SATB1-bound regions (Figure 2—supplement figure 1).

      It is true that many more BURs exist than SATB1-bound BURs, especially in gene-poor regions where BURs are clustered. However, from the perspective of SATB1-bound peaks, the majority of these coincide with BURs, as shown by both the deepTools analyses and the new heatmaps added as suggested (Figure 2E, and Figure 7—supplement figure 3).

      The results from our genome-wide quantitative analyses using deepTools to compare peaks from urea SATB1 ChIP-seq data and the BUR reference map shown in Supplementary Tables 1 and 2 are consistent with the heatmap analyses.

      We must apologize for an error in the scaling of the y-axis in Figure 4-supplement figure 1 that likely contributed to some confusion. We have corrected our mistake in the revised manuscript. When we prepared the figure and relabeled the axes for legibility, the y-axis of the BUR reference track was mislabeled: the peaks were labeled on a scale of 0.1-1 read counts per million reads, but the data shown are actually scaled at 0.1-2 read counts per million reads. Unfortunately, we did not notice this error and, using the figure as a guide for scaling, displayed the urea SATB1 ChIP-seq peaks at a scale of 0.1-1 read counts per million reads to match the mislabeled BUR reference track. This had the effect of reducing the signal-to-noise ratio in the SATB1 ChIP-seq data (Figure 1). We have now standardized the y-axis for fair comparison, using a scale of 0.1-2 for all tracks. This more clearly shows that there are indeed more BUR peaks than SATB1-bound sites, consistent with our quantitative analysis.

      We hope that these clarifications, as well as the added heatmaps and binding site example, allay the concerns about the specificity and overlap of SATB1 binding on BURs.

      (3) In Figure 6C 'peaks' are compared. However, looking at Figure 4 - Supplement 1 again it is clear that peak calling can yield a misleading impression. Figure 6D suggests that there are more BURs than SATB-1 peaks but this is not true from looking at the browser.

      We thank the reviewer for this observation. As noted in our response to point 2 above, the inconsistent y-axis scaling in Figure 4-supplement figure 1 created a misleading impression, which we have corrected in the revised manuscript. When properly displayed with consistent y-axis scaling, the browser view aligns with our quantitative data showing that there are indeed many more BURs than SATB1-bound sites. As mentioned under 2 above, we have performed genome-wide quantitative analysis by deepTools (Supplementary Tables 1 and 2) to confirm the results shown by bar graphs in Fig. 6C, 6D and Fig. 2D. 

      In Figure 6C, the bars show the percentage of SATB1-bound peaks in each cell type (denominator) that overlap with confirmed BUR sites in the BUR reference map (numerator). In Figure 6D, we show the percentage of total BUR sites in the BUR reference map (denominator) that are bound by SATB1 from urea ChIP-seq (numerator). To avoid any confusion, we have added brief subtitles to Figures 6C and 6D in the revised manuscript.
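
      The difference between the two percentages can be made explicit with a toy calculation. The following is a minimal sketch, not the deepTools-based analysis used in the manuscript: the interval coordinates are invented, all intervals are assumed to lie on a single chromosome and to be half-open, and the function names are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def percent_overlapping(query, reference):
    """Percentage of query intervals that overlap any reference interval.

    The query set is the denominator; swapping the two arguments
    therefore answers a different question.
    """
    if not query:
        raise ValueError("query must contain at least one interval")
    hits = sum(1 for q in query if any(overlaps(q, r) for r in reference))
    return 100.0 * hits / len(query)


# Invented toy coordinates: few SATB1 peaks, many BUR sites.
satb1_peaks = [(100, 400), (1000, 1300), (5000, 5300)]
bur_sites = [
    (150, 450), (1100, 1400), (2000, 2300), (3000, 3300),
    (4000, 4300), (6000, 6300), (7000, 7300), (8000, 8300),
]

# Figure 6C-style question: what share of SATB1 peaks fall on BURs?
pct_peaks_at_burs = percent_overlapping(satb1_peaks, bur_sites)
# Figure 6D-style question: what share of all BURs are bound by SATB1?
pct_burs_bound = percent_overlapping(bur_sites, satb1_peaks)
print(pct_peaks_at_burs, pct_burs_bound)
```

      With the same two interval sets, the first value can be high while the second is low, simply because the denominators differ. This is why most SATB1 peaks can coincide with BURs even though most BURs are not SATB1-bound.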

      (4) An important conclusion is that urea-ChIP reveals direct DNA binding events, whereas standard ChIP shows indirect binding (which is stripped off by urea). I do not yet see any evidence for direct binding. It cannot be excluded that the binding is RNA-mediated. The authors mention in passing that urea-ChIP material still contains (specific!) RNA. Given that this is a new procedure, the authors should document the RNA content of urea-ChIP and RNase-treat their samples prior to ChIP to monitor an RNA contribution.

      Thank you for raising this important point. The direct binding of SATB1 to BURs is well-established in our previous work. Indeed, this was the main motivation to explore the reason for the lack of evidence for genome-wide SATB1 binding to BURs in the DNA-binding profile by standard ChIP-seq. This has been a major point of confusion for us for many years.

      SATB1 was originally identified through a search for mammalian proteins that could recognize BURs specifically and not just any A+T-rich sequence. The Satb1 gene was originally cloned from an expression cDNA library, and the encoded SATB1 protein bound the BUR probe but not a mutated AT-rich BUR (control) probe. Subsequent experiments confirmed that SATB1 specifically binds to many BURs without requiring additional factors. Furthermore, SATB1 recognizes BURs by binding in the minor groove of double-stranded DNA, presumably recognizing the altered phosphate backbone structure of BUR DNA, rather than accessing nucleotide bases (Dickinson et al., 1992).

      We do agree with the reviewer, however, that there is a possibility that RNA can redirect SATB1 to different subsets of BURs and/or to interact indirectly with different regulatory regions depending on cell type or developmental stage. Although urea ultracentrifugation clearly separates most RNA (found in the middle region of the tube) from genomic DNA (pelleted at the bottom) (de Belle et al., 1998), upon crosslinking cells, a small quantity of RNA is found co-pelleted with DNA (our recent unpublished results). This RNA, tightly associated with crosslinked chromatin, may have some impact on SATB1 function.

      Based on our preliminary data, we are currently planning to study the impact of RNA using RNase A as well as by targeting specific RNAs employing an anti-sense approach. We believe that thoroughly addressing the impact of RNA warrants a full paper, including the potential roles of specific non-coding RNAs in SATB1 function, and thus is beyond the scope of the current paper. However, we have now added discussion of this important point in the manuscript.

      (5) An important aspect of the model is that SATB1 tethers active genes to inactive LADs. However, in the 4C experiment the BUR elements used to anchor the looping are both in the accessible, active chromatin domain. If the authors want to maintain their statement, they must show a 4C result that connects the 2 distinct domains and transverses A/B domain boundaries. Currently, the data only show a looping within accessible chromatin.

      We thank Reviewer 1 for bringing up the important point that our model could potentially be interpreted as “SATB1 tethers active genes to inactive LADs.” Since we describe that BURs are enriched in LADs and that SATB1 binds a subset of BURs, readers may assume that we aim to demonstrate, through urea 4C-seq, that SATB1 tethers active genes to transcriptionally-inactive LADs (via BURs). However, this is not our intention in the model (Figure 8). In the experiment we designed for our present study, we selected BUR-1 and BUR-2 as viewpoints from a non-LAD gene-rich region (inter-LAD). The fact that these BURs are bound by SATB1 indicates that they are part of the “hard-to-access” SATB1-rich subnuclear structure, which resists extraction, in contrast to accessible chromatin. Thus, we illustrate in the model that BURs anchored to the SATB1-rich nuclear substructure make contact with accessible chromatin over long distances in a SATB1-dependent manner. Therefore, we do not intend to conclude from our current study that SATB1 mediates interactions between LADs and inter-LADs (accessible chromatin); this would be a topic for future research. In the original model in the submitted manuscript, we used the terms “inaccessible” and “accessible.” In the revised version, we changed “inaccessible” to “SATB1-rich subnuclear structure” in the model and carefully revised the text of the Figure 8 legend to clarify the model.

      At this time, we do not know exactly how LADs and SATB1 nuclear architecture are related spatially and functionally. While LADs are mapped as genomic domains in proximity to Lamin B1 by LaminB1-DamID, BURs are mapped at ~300-500 bp resolution by urea ChIP-seq. To gain further insight into this important question, a large body of DNA-FISH and immunoDNA-FISH experiments will be required, comparing different cell types to see whether and how specific BURs move between LADs and SATB1 nuclear architecture. Such experiments may benefit from testing the Gabrg1 and Gabra2 loci, where many BURs are anchored to SATB1 in neurons but not in thymocytes, for instance. This is included in the Discussion in the revised manuscript.

      Regarding the reviewer's second point about showing more extended domains for 4C interactions, we would like to highlight that Figure 5—supplement figure 3 in our submitted manuscript addresses this concern. This figure shows that BUR-interactions extend to multiple gene-rich regions across intervening gene-poor regions. Interestingly, BUR-1 and BUR-2 interactions skip a transcriptionally silent gene-rich region containing olfactory receptor genes but interact with subsequent gene-rich regions containing active genes. These data demonstrate that BUR-interactions do indeed traverse A- and B-compartment boundaries.  In the revised manuscript (in Figure 5—supplement figure 3), we newly added a Lamin B1-DamID (thymocyte) track.  Comparing with LADs, BUR-1 interactions occur mostly in non-LAD regions. Some minor overlap with LADs was detected in high resolution views (not shown). Future experiments testing BUR viewpoints that reside within LADs are required to assess whether SATB1 mediates interactions between B and A compartments.

      (6) The description of the urea-co-immunoprecipitation experiment (Figure 3C) could be improved to make it unequivocally clear that co-binding to chromatin is tested, not protein-protein interaction (which is destroyed by urea).

      Thank you for this helpful suggestion. We have revised the text in the manuscript by stating “Distinct from protein-protein co-immunoprecipitation (co-IP) using whole cell or nuclear extracts, we examined the direct co-binding status on chromatin in vivo of SATB1 and CTCF or cohesin by urea ChIP-Western”.

      Reviewer #2:

      (1) Since SATB1 has been described to interact with beta-catenin, I wonder if the authors have looked at TCF4/TCF7l2 binding patterns and their potential overlap with SATB1 binding patterns. This might appear a trivial request. However, uncontrolled WNT signalling is a major feature of cancer undergoing metastasis - a process that the authors have earlier associated with unscheduled SATB1 expression in triple-negative breast cancer.

      We thank the reviewer for highlighting this important point about the potential relationship between SATB1 and TCF4/TCF7l2 binding patterns. Based on published observations with other factors (Rad21, CTCF, BRG1, RUNX) that show substantial overlap with SATB1 in standard ChIP-seq peaks (Kakugawa et al., Cell Rep 19, 1176-1188 (2017), DOI: 10.1016/j.celrep.2017.04.038; Poterlowicz et al., PLoS Genet (2017), DOI: 10.1371/journal.pgen.1006966), we would anticipate that TCF4 might also show significant overlap with SATB1. An important question is whether the DNA binding profile of TCF4 depends on SATB1.

      We have not yet generated ChIP-seq data for TCF4 in the presence and absence of SATB1, but we agree that such experiments could provide important insights into cancer progression as well as brain function. This represents an interesting direction for future work. We have added this point in our discussion based on your kind suggestion.

      (2) The CTCF sizes indicated in the western blot analyses of Figures 3C and Figure 3 - supplement figure 2 do not display the normal size, which is around 130 kDa. Either the issue is erroneous marking or a so-called salt effect to slow the migration in the gel. Alternatively, it reflects a slower migrating form of CTCF generated by for example PARylation (by PARP1) that is known to approach 180 kDa. It would be useful if the authors could clarify this minor issue.

      We appreciate the reviewer pointing out this discrepancy. As the reviewer correctly noted, CTCF can appear at a higher molecular weight due to post-translational modifications such as PARylation and O-GlcNAcylation, which alter its migration during electrophoresis.

      Upon re-examination of our raw data for Figure 3—supplement figure 2A, we discovered that the marker lane for the CTCF panel was broken, and the 150kDa band was erroneously assigned. This led to the 150kDa marker being placed below the CTCF migration position, which is clearly an error. We thank the reviewer for bringing this to our attention.

      We have checked our other data and consistently observe CTCF migrating below the 150kDa band, similar to the pattern shown on the Abcam website for the antibody we used (ab128873) (Figure 2). For Figure 3-supplement figure 2, we will use a marker lane from a parallel gel with identical composition and run time to correctly indicate the molecular weight. We have also corrected the marker position in Figure 3C.

      Reviewing Editor (Recommendations for the authors):

      (1) The introduction states that urea ChIP-seq is "unbiased", which is difficult to unambiguously determine and therefore might be an overstatement. Maybe the authors could consider rephrasing.

      We agree with the reviewer's assessment and have rephrased our description of the urea ChIP-seq method to avoid using the term "unbiased."

      (2) The authors propose that in standard ChIP, most SATB1 is in the insoluble fraction. This seems easy to test and demonstrating it may help to further clarify the differences between the protocols.

      We appreciate this suggestion and would like to clarify our description. What we stated in the manuscript was:

      "We envision that SATB1 bound to inaccessible nuclear regions may be lost in the insoluble fraction."

      This refers specifically to a subpopulation of SATB1 that is bound to the high-salt extraction-resistant nuclear substructure, not to the total SATB1 protein. We also noted elsewhere in the manuscript that:

      "SATB1 proteins are found in high salt-resistant fraction as well as salt-extracted fraction (40). Thus, it is possible that soluble SATB1 may associate with open chromatin."

      Our unpublished results show that SATB1 proteins exist in at least two distinct forms based on protein mobility: SATB1 with high mobility and another with very low or no mobility. While we have identified the SATB1 domain responsible for each of these distinct mobility patterns, we have not yet identified biochemical differences that would allow us to distinguish them conclusively. Therefore, an experiment to test the distribution of SATB1 in soluble versus insoluble fractions would show SATB1 in both fractions but would not necessarily provide information about the functional significance of these different populations. We believe this is an important area for future research and are working to develop tools to specifically distinguish and characterize SATB1 in the soluble versus insoluble fractions.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work studies representations in a network with one recurrent layer and one output layer that needs to path-integrate so that its position can be accurately decoded from its output. To formalise this problem, the authors define a cost function consisting of the decoding error and a regularisation term. They specify a decoding procedure that at a given time averages the output unit center locations, weighted by the activity of the unit at that time. The network is initialised without position information, and only receives a velocity signal (and a context signal to index the environment) at each timestep, so to achieve low decoding error it needs to infer its position and keep it updated with respect to its velocity by path integration.
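
      The decoding step summarized above, an activity-weighted average of unit center locations, can be sketched as follows. This is only an illustration of the general idea and not the authors' implementation: the function names, the 2D tuple representation and the Euclidean error are assumptions, and the centers are simply passed in rather than inferred from the network's activity.

```python
import math


def decode_position(activities, centers):
    """Decode a 2D position as the activity-weighted mean of unit centers.

    activities: non-negative firing rates, one per output unit.
    centers:    (x, y) center location of each output unit.
    """
    total = sum(activities)
    if total <= 0:
        raise ValueError("at least one unit must be active")
    x = sum(a * cx for a, (cx, _) in zip(activities, centers)) / total
    y = sum(a * cy for a, (_, cy) in zip(activities, centers)) / total
    return (x, y)


def decoding_error(decoded, true_position):
    """Euclidean distance between decoded and true position."""
    return math.dist(decoded, true_position)


# Hypothetical example: two units, the first three times as active.
estimate = decode_position([3.0, 1.0], [(0.0, 0.0), (4.0, 0.0)])
print(estimate, decoding_error(estimate, (1.5, 0.0)))
```

      A decoder of this form can only be accurate if each unit's activity is concentrated around its center, which is one way to see why localized, place-like output fields are favored by the objective.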

      The authors take the trained network and let it explore a series of environments with different geometries while collecting unit activities to probe learned representations. They find localised responses in the output units (resembling place fields) and border responses in the recurrent units. Across environments, the output units show global remapping and the recurrent units show rate remapping. Stretching the environment generally produces stretched responses in output and recurrent units. Ratemaps remain stable within environments and stabilise after noise injection. Low-dimensional projections of the recurrent population activity form environment-specific clusters that reflect the environment's geometry, which suggests independent rather than generalised representations. Finally, the authors discover that the centers of the output unit ratemaps cluster together on a triangular lattice (like the receptive fields of a single grid cell), and find significant clustering of place cell centers in empirical data as well.

      The model setup and simulations are clearly described, and are an interesting exploration of the consequences of a particular set of training requirements - here: path integration and decodability. But it is not obvious to what extent the modelling choices are a realistic reflection of how the brain solves navigation. Therefore it is not clear whether the results generalize beyond the specifics of the setup here.

      Strengths:

      The authors introduce a very minimal set of model requirements, assumptions, and constraints. In that sense, the model can function as a useful 'baseline', that shows how spatial representations and remapping properties can emerge from the requirement of path integration and decodability alone. Moreover, the authors use the same formalism to relate their setup to existing spatial navigation models, which is informative.

      The global remapping that the authors show is convincing and well-supported by their analyses. The geometric manipulations and the resulting stretching of place responses, without additional training, are interesting. They seem to suggest that the recurrent network may scale the velocity input by the environment dimensions so that the exact same path integrator-output mappings remain valid (but maybe there are other mechanisms too that achieve the same).

      The clustering of place cell peaks on a triangular lattice is intriguing, given there is no grid cell input. It could have something to do with the fact that a triangular lattice provides optimal coverage of 2d space? The included comparison with empirical data is valuable, although the authors only show significant clustering - there is no analysis of its grid-like regularity.

      First of all, we would like to thank the reviewer for their comprehensive feedback and their insightful comments. Importantly, as you point out, our goal with this model was to build a minimal model of place cell representations, where representations were encouraged to be place-like, but free to vary in tuning and firing locations. By doing so, we could explore what upstream representations facilitate place-like representations, and even remapping (as it turned out), with minimal assumptions. However, we agree that our task does not capture some of the nuances of real-world navigation, such as sensory observations, which could be useful extensions in future work. Then again, the simplicity of our setup makes it easier to interpret the model, and makes it all the more surprising that it learns many behaviors exhibited by real-world place cells.

      As to the distribution of phases - we also agree that a hexagonal arrangement likely reflects some optimal configuration for decoding of location.

      And we agree that the symmetry within the experimental data is important; we have revised analyses on experimental phase distributions, and included an analysis of ensemble grid score, to quantify any hexagonal symmetries within the data.

      Weaknesses:

      The navigation problem that needs to be solved by the model is a bit of an odd one. Without any initial position information, the network needs to figure out where it is, and then path-integrate with respect to a velocity signal. As the authors remark in Methods 4.2, without additional input, the only way to infer location is from border interactions. It is like navigating in absolute darkness. Therefore, it seems likely that the salient wall representations found in the recurrent units are just a consequence of the specific navigation task here; it is unclear if the same would apply in natural navigation. In natural navigation, there are many more sensory cues that help inferring location, most importantly vision, but also smell and whiskers/touch (which provides a more direct wall interaction; here, wall interactions are indirect by constraining velocity vectors). There is a similar but weaker concern about whether the (place cell like) localised firing fields of the output units are a direct consequence of the decoding procedure that only considers activity center locations.

      Thank you for raising this point; we absolutely agree that the navigation task is somewhat niche. However, this was a conscious decision, made to minimize possible confounding from alternate input sources, such as observations. In part, this experimental design was inspired by the suggestion that grid cells support navigation/path integration in open-field environments with minimal sensory input (as they could conceivably do so with no external input). This also pertains to your other point, that boundary interactions are necessary for navigation. In our model, using boundaries is one solution, but there is another, conceivably better, way around this problem: to path integrate in an egocentric frame, starting from the initial position. Since the locations of place fields are inferred only after a trajectory has been traversed, the network is free to create a new or shifted representation every time, independently of the arena. In this case, one might have expected generalized solutions, such as grid cells, to emerge. That this is not the case seems to suggest that grid cells may not be optimal for pure path integration or, at the very least, are hard to learn (but may still play a part, as alluded to by place field locations). We have tried to make these points more evident in the revised manuscript.

      As for the point that the decoding may lead to place-like representations, this is a fair point. Indeed, we did choose this form of decoding, inspired by the localized firing of place cells, in the hope that it would encourage minimally constrained, place-like solutions. However, compared to other works (Sorscher and Xu) that hand-tune the functional form of their place cells, our decoding (although biased towards centralized tuning curves) allows for flexible functional forms, including the position of the place cell centers, their tuning width, whether or not the activity is center-surround, and how units tune to different environments/rooms. This allows us to study several features of the place cell system, such as remapping and field formation. We have revised the model description to make this clearer.

      The conclusion that 'contexts are attractive' (heading of section 2) is not well-supported. The authors show 'attractor-like behaviour' within a single context, but there could be alternative explanations for the recovery of stable ratemaps after noise injection. For example, the noise injection could scramble the network's currently inferred position, so that it would need to re-infer its position from boundary interactions along the trajectory. In that case the stabilisation would be driven by the input, not just internal attractor dynamics. Moreover, the authors show that different contexts occupy different regions in the space of low-dimensional projections of recurrent activity, but not that these regions are attractive.

      We agree that boundary interactions could facilitate the convergence of representations after noise injection. We did try to moderate this claim by the wording “attractor-like”, but we agree that boundaries could confound this result. We have therefore performed a modified noise injection experiment, where we let the network run for an extended period of time, before noise injection (and no velocity signal), see Appendix Velocity Ablation in the revised text. Notably, representations converge to their pre-scrambled state after noise injection, even without a velocity signal. However, place-like representations do not converge for all noise levels in this case, possibly indicating that boundary interactions do serve an error-correcting function, also. Thank you for pointing this out.

      As for the attractiveness of contexts, we agree that more analyses were required to demonstrate this. We have therefore conducted a supplementary analysis where we run the trained network with a mismatch in context/geometry, and demonstrate that the context signal fixes the representation, up to geometric distortions.

      The authors report empirical data that shows clustering of place cell centers like they find for their output units. They report that 'there appears to be a tendency for the clusters to arrange in hexagonal fashion, similar to our computational findings'. They only quantify the clustering, but not the arrangement. Moreover, in Figure 7e they only plot data from a single animal, then plot all other animals in the supplementary. Does the analysis of Fig 7f include all animals, or just the one for which the data is plotted in 7e? If so, why that animal? As Appendix C mentions that the ratemap for the plotted animal 'has a hexagonal resemblance' whereas others have 'no clear pattern in their center arrangements', it feels like cherry-picking to only analyse one animal without further justification.

      Thank you for pointing this out; we agree that this is not sufficiently explained and explored in the current version. We have therefore conducted a grid score analysis of the experimental place center distributions, to uncover possible hexagonal symmetries. We chose this particular animal in part because it featured the largest number of included cells while also demonstrating the most striking phase distribution; the distributions for all animals were included in the supplementary. Originally, this was only intended as a preliminary analysis, suggesting non-uniformity in experimental place field distributions, but we realize that all of these distributions may provide interesting insight into the distributional properties of place cells.

      We have explained these choices in the revised text, and expanded analyses on all animals to showcase these results more clearly.

      Reviewer #2 (Public Review):

      Summary:

      The authors proposed a neural network model to explore the spatial representations of the hippocampal CA1 and entorhinal cortex (EC) and the remapping of these representations when multiple environments are learned. The model consists of a recurrent network and output units (a decoder) mimicking the EC and CA1, respectively. The major results of this study are: the EC network generates cells with their receptive fields tuned to a border of the arena; decoder develops neuron clusters arranged in a hexagonal lattice. Thus, the model accounts for entorhinal border cells and CA1 place cells. The authors also suggested the remapping of place cells occurs between different environments through state transitions corresponding to unstable dynamical modes in the recurrent network.

      Strengths:

      The authors found a spatial arrangement of receptive fields similar to their model's prediction in experimental data recorded from CA1. Thus, the model proposes a plausible mechanism to generate hippocampal spatial representations without relying on grid cells. This result is consistent with the observation that grid cells are unnecessary to generate CA1 place cells.

      The suggestion about the remapping mechanism shows an interesting theoretical possibility.

      We thank the reviewer for their kind feedback.

      Weaknesses:

      The explicit mechanisms of generating border cells and place cells and those underlying remapping were not clarified at a satisfactory level.

      The model cannot generate entorhinal grid cells. Therefore, how the proposed model is integrated into the entire picture of the hippocampal mechanism of memory processing remains elusive.

      We appreciate this point, and hope to clarify: From a purely architectural perspective, place-like representations are generated by linear combinations of recurrent unit representations, which, after training, appear border-like. During remapping, the network is simply evaluated/run in different geometries/contexts, which, it turns out, causes the network to exhibit different representations, likely as solutions to optimally encoding position in the different environments. We have attempted to revise the text to make some of these interpretations more clear. We have also conducted a supplementary analysis to demonstrate how representations are determined by the context signal directly, which helps to explain how recurrent and output units form their representations.

      We also agree that our model does not capture the full complexity of the hippocampal formation. However, we would argue that its simplicity (focusing on a single cell type and a pure path integration task) acts as a useful baseline for studying the role of place cells during spatial navigation. The fact that our model captures a range of place cell behaviors (field formation, remapping and geometric deformation) without grid cells also points to several interesting possibilities, such as that grid cells may not be strictly necessary for place cell formation and remapping, or that border cells may account for many of the peculiar behaviors of place cells. However, we wholeheartedly agree that including e.g. sensory information and memory storage/retrieval tasks would prove a very interesting extension of our model to more naturalistic tasks and settings. In fact, our framework could easily accommodate this, e.g. by decoding contexts/observations/memories from the network state, alongside location.

      Reviewer #3 (Public Review):

      Summary:

      The authors used recurrent neural network modelling of spatial navigation tasks to investigate border and place cell behaviour during remapping phenomena.

      Strengths:

      The neural network training seemed for the most part (see comments later) well-performed, and the analyses used to make the points were thorough.

      The paper and ideas were well explained.

      Figure 4 contained some interesting and strong evidence for map-like generalisation as environmental geometry was warped.

      Figure 7 was striking, and potentially very interesting.

      It was impressive that the RNN path-integration error stayed low for so long (Fig A1), given that normally networks that only work with dead-reckoning have errors that compound. I would have loved to know how the network was doing this, given that borders did not provide sensory input to the network. I could not think of many other plausible explanations... It would be even more impressive if it was preserved when the network was slightly noisy.

      Thank you for your insightful comments! Regarding the low path integration error, there is a slight statistical signal from the boundaries, as trajectories tend to turn away from arena boundaries. However, we agree, that studying path integration performance in the face of noise would make for a very interesting future development.

      Weaknesses:

      I felt that the stated neuroscience interpretations were not well supported by the presented evidence, for a few reasons I'll now detail.

      First, I was unconvinced by the interpretation of the reported recurrent cells as border cells. An equally likely hypothesis seemed to be that they were position cells that are linearly encoding the x and y position, which, when your environment only contains external linear boundaries, look the same. As in figure 4, in environments with internal boundaries the cells do not encode them, they encode (x,y) position. Further, if I'm not misunderstanding, there is, throughout, a confusing case of broken symmetry. The cells appear to code not for any random linear direction, but for either the x or y axis (i.e. there are x cells and y cells). These look like border cells in environments in which the boundaries are external only, and align with the axes (like square and rectangular ones), but the same also appears to be true in the rotationally symmetric circular environment, which strikes me as very odd. I can't think of a good reason why the cells in circular environments should care about the particular choice of (x,y) axes... unless the choice of position encoding scheme is leaking influence throughout. A good test of these would be differently oriented (45 degree rotated square) or more geometrically complicated (two diamonds connected) environments in which the differences between a pure (x,y) code and a border code are more obvious.

      Thank you for pointing this out. This is an excellent point, that we agree could be addressed more rigorously. Note that there is no position encoding in our model; the initial state of the network is a vector of zeros, and the network must infer its location from boundary interactions and context information alone. So there is no way for positional information to leak through to the recurrent layer directly. However, one possible reason for the observed symmetry breaking, is the fact that the velocity input signal is aligned with the cardinal directions. To investigate this, we trained a new model, wherein input velocities are rotated 45 degrees relative to the horizontal, as you suggest. The results, shown and discussed in appendix E (Learned recurrent representations align with environment boundaries), do indicate that representations are tuned to environment boundaries, and not the cardinal directions, which hopefully improves upon this point.
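
      The rotated-input control described above can be pictured as a rotation applied to each 2D velocity before it enters the network. The following is a minimal sketch under that assumption; the function name `rotate_velocities` and the array layout are illustrative and are not taken from the authors' code.

```python
import numpy as np

def rotate_velocities(velocities, angle_degrees):
    """Rotate an array of 2D velocity vectors counter-clockwise.

    velocities: array of shape (n_steps, 2) holding (vx, vy) per step.
    angle_degrees: rotation angle in degrees (hypothetical parameter).
    """
    theta = np.deg2rad(angle_degrees)
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])
    # Row vectors are rotated by right-multiplying with the transpose.
    return np.asarray(velocities) @ rotation.T

# A step along the horizontal axis becomes a diagonal step of equal length.
rotated = rotate_velocities(np.array([[1.0, 0.0]]), 45.0)
```

      Because a rotation preserves vector length, speeds are unchanged and only the alignment between the input axes and the arena walls differs between the two conditions.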

      Next, the decoding mechanism used seems to have forced the representation to learn place cells (no other cell type is going to be usefully decodable?). That is, in itself, not a problem. It just changes the interpretation of the results. To be a normative interpretation for place cells you need to show some evidence that this decoding mechanism is relevant for the brain, since this seems to be where they are coming from in this model. Instead, this is a model with place cells built into it, which can then be used for studying things like remapping, which is a reasonable stance.

      This is a great point, and we agree. We do write that we perform this encoding to encourage minimally constrained place-like representations (to study their properties), but we have revised to make this more evident.

      However, the remapping results were also puzzling. The authors present convincing evidence that the recurrent units effectively form 6 different maps of the 6 different environments (e.g. the sparsity of the code, or fig 6a), with the place cells remapping between environments. Yet, as the authors point out, in neural data the finding is that some cells generalise their co-firing patterns across environments (e.g. grid cells, border cells), while place cells remap, making it unclear what correspondence to make between the authors network and the brain. There are existing normative models that capture both entorhinal's consistent and hippocampus' less consistent neural remapping behaviour (Whittington et al. and probably others), what have we then learnt from this exercise?

      Thanks for raising this point! We agree that this finding is surprising, but we hold that it actually shows something quite important: that border-type units are sufficient to create place-like representations, and that the model learns several of the behaviors associated with place cells and remapping (including global remapping and field stretching). In other words, a single cell type known to exist upstream of place cells is sufficient to explain a surprising range of phenomena, demonstrating that other cell types are not strictly necessary. However, we agree that understanding why the boundary-type units sometimes rate remap, and whether that can be true for some border-type cells in the brain (either directly, or through gating mechanisms), would be important future developments. Related to this point, we also expanded upon the influence of the context signal for representation selection (appendix F).

      Concerning the relationship to other models, we would argue that the simplicity of our model is one of its core strengths, making it possible to disentangle what different cell types are doing. While other models, including TEM, are highly important for understanding how different cell types and brain regions interact to solve complex problems, we believe there is a need for minimal, understandable models that allow us to investigate what each cell type is doing, and this is where we believe our work is important. As an example, our model not only highlights the sufficiency of boundary-type cells as generators of place cells; its lack of e.g. grid cells also suggests that grid cells may not be strictly necessary for e.g. open-field/sensory-deprived navigation, as is often claimed.

      One striking result was figure 7, the hexagonal arrangement of place cell centres. I had one question that I couldn't find the answer to in the paper, which would change my interpretation. Are place cell centres within a single cluster of points in figure 7a, for example, from one cell across the 100 trajectories, or from many? If each cluster belongs to a different place cell then the interpretation seems like some kind of optimal packing/coding of 2D space by a set of place cells, an interesting prediction. If multiple place cells fall within a single cluster then that's a very puzzling suggestion about the grouping of place cells into these discrete clusters. From figure 7c I guess that the former is the likely interpretation, from the fact that clusters appear to maintain the same colour, and are unlikely to be co-remapping place cells, but I would like to know for sure!

      This is a good point, and you are correct: one cluster tends to correspond to one unit. To make this more clear, we have revised Fig. 7 so that each decoded center is shaded by unit identity. And yes, this is seemingly in line with some form of optimal packing/encoding of space!

      I felt that the neural data analysis was unconvincing. Most notably, the statistical effect was found in only one of seven animals. Random noise is likely to pass statistical tests 1 in 20 times (at 0.05 p value), this seems like it could have been something similar? Further, the data was compared to a null model in which place cell fields were randomly distributed. The authors claim place cell fields have two properties that the random model doesn't: (1) clustering to edges (as experimentally reported) and (2) much more provocatively, a hexagonal lattice arrangement. The test seems to conflate the two; I think that nearby ball radii could be overrepresented, as in figure 7f, due to either effect. I would have liked to see a computation of the statistic for a null model in which place cells were random but with a bias towards the boundaries of the environment that matches the observed changing density, to distinguish these two hypotheses.

      Thanks for raising this point. We agree that we were not clear enough in our original manuscript. We included additional analyses in one animal, to showcase one preliminary case of non-uniform phases. To mitigate this, we have performed the same analyses for all animals, and included a longer discussion of these results (included in the supplementary material). We have also moderated the discussion on Ripley’s H to encompass only non-uniformity, and added a grid score analysis to showcase possible rotational symmetries in the data. We hope this gets our findings across more clearly.
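
      For readers unfamiliar with the statistic, the sketch below shows one common way to estimate Ripley's H for a set of field centers. It is an illustration only: it uses the textbook estimator without edge correction, the function name `ripley_h` is hypothetical, and the synthetic points stand in for real place field centers. It is not the analysis pipeline used in the manuscript.

```python
import numpy as np

def ripley_h(points, radii, area):
    """Naive Ripley's H (no edge correction) for 2D points.

    K(r) = area / (n * (n - 1)) * #{ordered pairs i != j with d_ij <= r}
    H(r) = sqrt(K(r) / pi) - r, roughly 0 for a uniform random pattern
    and positive when points cluster at scale r.
    """
    points = np.asarray(points, dtype=float)
    radii = np.asarray(radii, dtype=float)
    n = len(points)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt(np.sum(diffs ** 2, axis=-1))
    # Keep only distances between distinct points.
    pair_dists = dists[~np.eye(n, dtype=bool)]
    k = np.array([area * np.sum(pair_dists <= r) / (n * (n - 1)) for r in radii])
    return np.sqrt(k / np.pi) - radii

rng = np.random.default_rng(0)
radii = np.array([0.05])

# Uniform random centers in a unit square (the null pattern).
uniform_points = rng.uniform(0.0, 1.0, size=(200, 2))

# Synthetic clustered centers: 10 tight clusters of 20 points each.
cluster_centers = rng.uniform(0.1, 0.9, size=(10, 2))
clustered_points = np.repeat(cluster_centers, 20, axis=0) + 0.01 * rng.standard_normal((200, 2))

h_uniform = ripley_h(uniform_points, radii, area=1.0)
h_clustered = ripley_h(clustered_points, radii, area=1.0)
```

      A positive H indicates only that points are closer together than expected under uniformity; it does not by itself distinguish boundary clustering from lattice-like structure, which is why a separate symmetry measure such as a grid score is needed.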

      Some smaller weaknesses:

      - Had the models trained to convergence? From the loss plot it seemed like not, and when including regularisers, recent work (grokking phenomena, e.g. Nanda et al. 2023) has shown the importance of letting the regulariser minimise completely to see the resulting effect. Else you are interpreting representations that are likely still being learnt, a dangerous business.

      Longer training time did not seem to affect representations. However, due to the long trajectories and statefulness involved, training was time-intensive and could become unstable for very long training. We therefore stopped training at the indicated time.

      - Since RNNs are nonlinear, it seems that eigenvalues larger than 1 don't necessarily mean unstable?

      This is a good point; stability is not guaranteed. We have updated the text to reflect this.
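
      A toy example makes the reviewer's point concrete. In the sketch below, a hand-picked 2x2 recurrent matrix (an illustrative choice, not a matrix from the trained model) has an eigenvalue of 2, so the purely linear system diverges, yet the same matrix followed by a ReLU drives any non-negative state to zero.

```python
import numpy as np

# Hand-picked recurrent weights with eigenvalues +2 and -2.
W = np.array([[0.0, -2.0],
              [-2.0, 0.0]])
eigenvalues = np.linalg.eigvals(W)

def relu(x):
    return np.maximum(x, 0.0)

h_relu = np.array([1.0, 0.5])   # state of the ReLU network
h_linear = h_relu.copy()        # state of the linearised network

for _ in range(5):
    h_relu = relu(W @ h_relu)   # non-negative state times negative weights is clipped to 0
    h_linear = W @ h_linear     # the linear system grows by a factor of 2 per step

relu_norm = np.linalg.norm(h_relu)
linear_norm = np.linalg.norm(h_linear)
```

      The eigenvector associated with the eigenvalue 2 points along (1, -1), a direction the ReLU state can never occupy, so the unstable mode of the weight matrix is never expressed in the nonlinear dynamics.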

      - Why do you not include a bias in the networks? ReLU networks without bias are not universal function approximators, so it is a real change in architecture that doesn't seem to have any positives?

      We found that bias tended to have a detrimental effect on training, possibly related to the identity initialization used (see e.g. Le et al. 2015), and found that training improved when biases were fixed to zero.
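
      As a rough illustration of the setup referred to here, the sketch below builds a bias-free ReLU recurrent layer with identity-initialized recurrent weights and a zero initial state. The layer sizes, the input scale, the use of velocity as the only input, and the function names are assumptions made for the example; the actual model also receives a context signal and is trained, neither of which is shown.

```python
import numpy as np

def init_params(n_recurrent, n_inputs, rng, input_scale=0.1):
    """Identity recurrent weights, small random input weights, no bias terms."""
    return {
        "W_rec": np.eye(n_recurrent),
        "W_in": input_scale * rng.standard_normal((n_recurrent, n_inputs)),
    }

def run_rnn(params, inputs):
    """Run h_t = relu(W_rec h_{t-1} + W_in x_t) starting from h_0 = 0."""
    h = np.zeros(params["W_rec"].shape[0])
    states = []
    for x in inputs:
        h = np.maximum(params["W_rec"] @ h + params["W_in"] @ x, 0.0)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
params = init_params(n_recurrent=8, n_inputs=2, rng=rng)
velocities = 0.05 * rng.standard_normal((20, 2))  # hypothetical velocity inputs
states = run_rnn(params, velocities)
silent_states = run_rnn(params, np.zeros((20, 2)))
```

      One consequence of fixing biases to zero is visible in `silent_states`: with a zero initial state and zero input, the network remains silent, so all activity has to be driven by the inputs.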

      - The claim that this work provided a mathematical formalism of the intuitive idea of a cognitive map seems strange, given that upwards of 10 of the works this paper cite also mathematically formalise a cognitive map into a similar integration loss for a neural network.

      We agree that other works also provide ways of formalizing this concept. However, our goal in doing so was to elucidate common features across these seemingly disparate models. We also found that the concept of a learned and target map made it easier to come up with novel models, such as one wherein place cells are constructed to match a grid cell label.

      Aim Achieved? Impact/Utility/Context of Work

      Given the listed weaknesses, I think this was a thorough exploration of how this network with these losses is able to path-integrate its position and remap. This is useful, it is good to know how another neural network with slightly different constraints learns to perform these behaviours. That said, I do not think the link to neuroscience was convincing, and as such, it has not achieved its stated aim of explaining these phenomena in biology. The mechanism for remapping in the entorhinal module seemed fundamentally different to the brain's, instead using completely disjoint maps; the recurrent cell types described seemed to match no described cell type (no bad thing in itself, but it does limit the permissible neuroscience claims) either in tuning or remapping properties, with a potentially worrying link between an arbitrary encoding choice and the responses; and the striking place cell prediction was unconvincingly matched by neural data. Further, this is a busy field in which many remapping results have been shown before by similar models, limiting the impact of this work. For example, George et al. and Whittington et al. show remapping of place cells across environments; Whittington et al. study remapping of entorhinal codes; and Rajkumar Vasudeva et al. 2022 show similar place cell stretching results under environmental shifts. As such, this paper's contribution is muddied significantly.

      Thank you for this perspective; we agree that all of these are important works that arrive at complementary findings. We hold that the importance of our paper lies in its minimal nature, and its focus on place cells, via a purpose-built decoding that enables place-like representations. In doing so, we can point to possibly under explored relationships between cell types, in particular place cells and border cells, while challenging the necessity of other cell types for open-field navigation (i.e. grid cells). In addition, our work points to a novel connection between grid cells, place cells and even border cells, by way of the hexagonal arrangement of place unit centers. However, we agree that expanding our model to include more biologically plausible architectures and constraints would make for a very interesting extension in the future.

      Thank you again for your time, as well as insightful comments.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Even after reading Methods 5.3, I found it hard to understand how the ratemap population vectors that produce Fig 3e and Fig 5 are calculated. It's unclear to me how there can be a ratemap at a single timestep, because calculating a ratemap involves averaging the activity in each location, which would take a whole trajectory and not a single timestep. But I think I've understood from Methods 5.1 that instead the ratemap is calculated by running multiple 'simultaneous' trajectories, so that there are many visited locations at each timestep. That's a bit confusing because as far as I know it's not a common way to calculate ratemaps in rodent experiments (probably because it would be hard to repeat the same task 500 times, while the representations remain the same), so it might be worth explaining more in Methods 5.3.

      We understand the confusion, and have attempted to make this more clear in the revised manuscript. We did indeed create ratemaps over many trajectories for time-dependent plots, for the reasons you mentioned. We also agree that this would be difficult to do experimentally, but found it an interesting way to observe convergence of representations in our simulated scenario.
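
      The idea of a ratemap at a single timestep can be sketched as follows: positions and activities are collected across many simultaneous trajectories at one timestep, and activity is averaged within spatial bins. The function name, arena size, bin count, and the synthetic Gaussian unit below are assumptions made for illustration; the number of trajectories simply echoes the figure mentioned by the reviewer.

```python
import numpy as np

def ratemap_at_timestep(positions, activity, arena_size=(1.0, 1.0), bins=16):
    """Average one unit's activity per spatial bin across trajectories.

    positions: (n_trajectories, 2) array of (x, y) at a single timestep.
    activity:  (n_trajectories,) array of the unit's activity at that timestep.
    Bins that no trajectory visits at this timestep are set to NaN.
    """
    extent = [[0.0, arena_size[0]], [0.0, arena_size[1]]]
    # Summed activity per bin, then visit counts per bin.
    summed, _, _ = np.histogram2d(positions[:, 0], positions[:, 1],
                                  bins=bins, range=extent, weights=activity)
    counts, _, _ = np.histogram2d(positions[:, 0], positions[:, 1],
                                  bins=bins, range=extent)
    ratemap = np.full_like(summed, np.nan)
    visited = counts > 0
    ratemap[visited] = summed[visited] / counts[visited]
    return ratemap

rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 1.0, size=(500, 2))           # 500 simultaneous trajectories
activity = np.exp(-np.sum((positions - 0.5) ** 2, axis=1) / (2 * 0.1 ** 2))  # synthetic place-like unit
ratemap = ratemap_at_timestep(positions, activity, bins=8)
```

      Repeating this for successive timesteps yields a sequence of ratemaps, which is what allows convergence of the representation to be tracked over time in simulation.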

      Fig 3b-d shows multiple analyses to support output unit global remapping, but no analysis to support the claim that recurrent units remap by rate changes. The examples in Fig 3ai look pretty convincing, but it would be useful to also have a more quantitative result.

      We agree; we had only shown, using ratemaps, that units turn off/become silent. We have therefore added an explicit analysis, showcasing rate remapping in recurrent units (see appendix G; Recurrent units rate remap).

      Reviewer #2 (Recommendations For The Authors):

      Some parts of the current manuscript are hard to follow. Particularly, the model description is not transparent enough. See below for the details.

      Major comments:

      (1) Mathematical models should be explained more explicitly and carefully. I had to guess or desperately search for the definitions of parameters. For instance, define the loss function L in eq.(1). Though I can assume L represents the least square error (in A.8), I could not find the definition in Model & Objective. N should also be defined explicitly in equation (3). Is this the number of output cells?

      Thank you for pointing this out, we have revised to make it more clear.

      (2) In Fig. 1d, how were the velocity and context inputs given to individual neurons in the network? The information may be described in the Methods, but I could not identify it.

      This was described in the methods section (Neural Network Architecture and Training), but we realize that we used confusing notation, when comparing with Fig. 1d. We have therefore changed the notation, and it should hopefully be clearer now. Thanks for pointing out this discrepancy.

      (3) I took a while to understand equations (3) and (4) (for instance, t is not defined here). The manuscript would be easier to read if equations (5) and (6) are explained in the main text but not on page 18 (indeed, these equations are just copies of equations 3 and 4). Otherwise, the authors may replace equations (3) and (4) with verbal explanations similar to figure legend for Fig. 1b.

      (4) Is there any experimental evidence for uniformly strong EC-to-CA1 projections assumed in the non-trainable decoder? This point should be briefly mentioned.

      Thank you for raising this point. The decoding from EC (the RNN) to CA1 (the output layer) consists of a trainable weight matrix, and may thus be non-uniform in magnitude. The non-trainable decoding acts on the resulting “CA1” representation only. We hope that improvements to the model description also make this more evident.

      (5) The explanation of Fig. 3 in the main text is difficult to follow because subpanels are explained in separate paragraphs, some of which are very short, as short as just a few lines.

      This presentation style makes it difficult to follow the logical relationships between the subpanels. This writing style is obeyed throughout the manuscript but is not popular in neuroscience.

      Thanks for pointing this out, we have revised to accommodate this.

      (6) Why do field centers cluster near boundaries? No underlying mechanisms are discussed in the manuscript.

      This is a good point; we have added a note on this; it likely reflects the border tuning of upstream units.

      (7) In Fig. 4, the authors presented how cognitive maps may vary when the shape and size of open arenas are modified. The results would be more interesting if the authors explained the remapping mechanism. For instance, on page 8, the authors mentioned that output units exhibit global remapping between contexts, whereas recurrent units mainly rate remapping.

      Why do such representational differences emerge?

      We agree! Thanks for raising this point. We have therefore expanded upon this discussion in section 2.4.

      (8) In the first paragraph of page 10, the authors stated ".. some output units display distinct field doubling (see both Fig. 4c), bottom right, and Fig. 4d), middle row)". I could not understand how Fig. 4d, middle row supports the argument. Similarly, they stated "..some output units reflect their main boundary input (with greater activity near one boundary)." I can neither understand what the authors mean to say nor which figures support the statement. Please clarify.

      This is a good point, there was an identifier missing; we have updated to refer to the correct “magnification”. Thanks!

      (9) The underlying mechanism of generating the hexagonal representation of output cells remains unclear. The decoder network uses a non-trainable decoding scheme based on localized firing patterns of output units. To what extent does the hexagonal representation depend on the particular decoding scheme? Similarly, how does the emergence of the hexagonal representation rely on the border representation in the upstream recurrent network? Showing several snapshots of the two place representations during learning may answer these questions.

      This is an interesting point, and we have added some discussion on this matter. In particular, we speculate whether it’s an optimal configuration for position reconstruction, which is demanded by the task and thus highly likely dependent on the decoding scheme. We have not reached a conclusive method to determine the explicit dependence of the hexagonal arrangement on the choice of decoding scheme. Still, it seems this would require comparison with other schemes. In our framework, this would require changing the fundamental operation of the model, which we leave as inspiration for future work. We have also added additional discussion concerning the relationship between place units, border units, and remapping in our model. As for exploring different training snapshots, the model is randomly initialized, which suggests that earlier training steps should tend to reveal unorganized/uninformative phase arrangements, as phases are learned as a way of optimizing position reconstruction. However, we do call for more analysis of experimental data to determine whether this is true in animals, which would strongly support this observation. We also hope that our work inspires other models studying the formation and remapping of place cells, which could serve as a starting point for answering this question in the future.

      (10) Figure 7 requires a title including the word "hexagonal" to make it easier to find the results demonstrating the hexagonal representations. In addition, please clarify which networks, p or g, gave the results shown here.

      We agree, and have added it!

      Minor comments:

      (11) In many paragraphs, conclusions appear near their ends. Stating the conclusion at the beginning of each paragraph whenever possible will improve the readability.

      We have made several rewrites to the manuscript, and hope this improves readability.

      (12) Figure A4 is important as it shows evidence of the CA1 spatial representation predicted by the model. However, I could not find where the figure is cited in the manuscript. The authors can consider showing this figure in the main text.

      We agree, and we have added more references to the experimental data analyses in the main text, as well as expanded this analysis.

      (13) The main text cites figures in the following format: "... rate mapping of Fig. 3a), i), boundary ...." The parentheses make reading difficult.

      We have removed the overly stringent use of double parentheses, thanks for letting us know.

      (14) It would be nice if the authors briefly explained the concept of Ripley's H function on page 14.

      Yes, we have added a brief descriptor.
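
      For readers unfamiliar with it, Ripley's H function is usually derived from Ripley's K function. The block below gives the standard textbook form for a two-dimensional point pattern; it is a sketch for orientation only, and the exact estimator and edge correction used in the manuscript are not stated here. The symbols \lambda (point density) and r (search radius) are notation introduced for this illustration.

```latex
% K(r): expected number of other points within distance r of a typical point,
% normalised by the point density \lambda.
\[
K(r) = \frac{1}{\lambda}\,\mathbb{E}\bigl[\#\{\text{other points within distance } r \text{ of a typical point}\}\bigr]
\]
% L(r) rescales K so that complete spatial randomness gives L(r) = r in 2D.
\[
L(r) = \sqrt{\frac{K(r)}{\pi}}, \qquad H(r) = L(r) - r
\]
```

      Under complete spatial randomness K(r) = \pi r^2, so H(r) = 0; positive values of H(r) indicate clustering at scale r and negative values indicate dispersion.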

    1. Author response:

      The following is the authors’ response to the previous reviews

      Review 1:

      Weaknesses:

      The weaknesses of the study also stem from the methodological approach, particularly the use of whole-brain Calcium imaging as a measure of brain activity. While epilepsy and seizures involve network interactions, they typically do not originate across the entire brain simultaneously. Seizures often begin in specific regions or even within specific populations of neurons within those regions. Therefore, a whole-brain approach, especially with Calcium imaging with inherited limitations, may not fully capture the localized nature of seizure initiation and propagation, potentially limiting the understanding of Galanin's role in epilepsy.

      We agree with the reviewers that the whole-brain imaging approach is both a strength and a weakness. This manuscript and our previously published paper (Hotz et al., 2022) indeed show that the seizures have an initiation point and spread throughout the brain, interestingly affecting the telencephalon last. Localized seizure initiation was outside the scope of this manuscript; studying it would, however, also require imaging techniques. Using cell-type-specific drivers for specific neuronal subpopulations is an interesting approach, but outside the scope of this study. Another interesting approach would be a more detailed analysis of glia in the context of epilepsy.

      Furthermore, Galanin's effects may vary across different brain areas, likely influenced by the predominant receptor types expressed in those regions. Additionally, the use of PTZ as a "stressor" is questionable since PTZ induces seizures rather than conventional stress. Referring to seizures induced by PTZ as "stress" might be a misinterpretation intended to fit the proposed model of stress regulation by receptors other than Galanin receptor 1 (GalR1).

      We also agree that a more regional approach, once more reliable information is available on the expression domains of the different galanin receptors and their respective roles, is an important future research direction.

      The description of the EAAT2 mutants is missing crucial details. EAAT2 plays a significant role in the uptake of glutamate from the synaptic cleft, thereby regulating excitatory neurotransmission and preventing excitotoxicity. Authors suggest that in EAAT2 knockout (KO) mice galanin expression is upregulated 15-fold compared to wild-type (WT) mice, which could be interpreted as galanin playing a role in the hypoactivity observed in these animals.

      However, the study does not explore the misregulation of other genes that could be contributing to the observed phenotype. For instance, if AMPA receptors are significantly downregulated, or if there are alterations in other genes critical for brain activity, these changes could be more important than the upregulation of galanin. The lack of wider gene expression analysis leaves open the possibility that the observed hypoactivity could be due to factors other than, or in addition to, galanin upregulation.

      We are in the process of preparing a manuscript describing a more detailed gene expression study of this and a chemically induced seizure model. Surprisingly, we did not observe strong effects on glutamate receptor-related genes. This does not preclude, and indeed we deem it likely, that additional factors (e.g., other neuropeptides) play a role.

      Moreover, the observation that in double KO mice for both EAAT2 and galanin there was little difference in seizure susceptibility compared to EAAT2 KO mice alone further supports the idea that galanin upregulation might not be the reason for the observed phenotype. This indicates that other regulatory mechanisms or gene expressions might be playing a more pivotal role in the manifestation of hypoactivity in EAAT2 mutants.

      Yes, we agree that galanin is likely not the only player. This warrants further investigations.

      These methodological shortcomings and conceptual inconsistencies undermine the perceived strengths of the study and hinder understanding of Galanin's role in epilepsy and stress regulation.

      Review 2:

      Previous concerns about sex or developmental biological variables were addressed, as their model's seizure phenotype emerges rapidly and long prior to the establishment of zebrafish sexual maturity. However, in the course of re-review, some additional concerns (below) were detected that, if addressed, could further improve the manuscript. These concerns relate to how seizures were defined from the measurement of fluorescent calcium imaging data. Overall, this study is important and convincing, and carries clear value for understanding the multifaceted functions that neuronal galanin can perform under homeostatic and disease conditions.

      We are pleased that we could dispel the initial concerns.

      Additional Concerns:

      - The authors have validated their ability to measure behavioral seizures quantitatively in their 2022 Glia paper but the information provided on defining behavioral seizures was limited. The definition of behavioral seizure activity is not expanded upon in this paper, but could provide detail about how the behavioral seizures relate to a seizure detected via calcium imaging.

      In this paper, we indeed do not address behavioral seizures but focus completely on neuronal seizures as defined in the material and methods section (“seizures were defined as calcium fluctuations reaching at least 100% of ΔF/F0 in the whole brain.”). Epileptic seizures in zebrafish, whether evoked pharmacologically or resulting from genetic mutations, elicit stereotyped locomotor behavior, as described in multiple publications (e.g., Baraban et al., 2005, Berghmans et al., 2007, Baxendale et al., 2012 and references therein).
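
      To make the quoted amplitude criterion concrete, the following minimal sketch computes ΔF/F0 from a whole-brain fluorescence trace and flags stretches at or above 100%. It is an illustration under stated assumptions, not the authors' analysis code: the function names, the use of the median of an initial window as F0, and the example numbers are all hypothetical. It applies only the amplitude threshold and does not test for synchrony across brain regions.

```python
from statistics import median


def delta_f_over_f0(trace, baseline_frames):
    """Normalise a fluorescence trace as (F - F0) / F0.

    Assumption: F0 is the median of the first `baseline_frames` samples.
    The baseline definition actually used in the study is not given here.
    """
    f0 = median(trace[:baseline_frames])
    if f0 <= 0:
        raise ValueError("baseline fluorescence must be positive")
    return [(f - f0) / f0 for f in trace]


def threshold_events(dff, threshold=1.0):
    """Return (onset, offset) index pairs, offset exclusive, where dff >= threshold.

    A threshold of 1.0 corresponds to 100% ΔF/F0.
    """
    events = []
    onset = None
    for i, value in enumerate(dff):
        above = value >= threshold
        if above and onset is None:
            onset = i
        elif not above and onset is not None:
            events.append((onset, i))
            onset = None
    if onset is not None:
        events.append((onset, len(dff)))
    return events


# Hypothetical whole-brain trace in arbitrary fluorescence units.
trace = [100, 100, 100, 150, 250, 300, 120, 100, 210]
dff = delta_f_over_f0(trace, baseline_frames=3)
events = threshold_events(dff, threshold=1.0)
```

      In this toy trace the baseline is 100, so only the samples at 250, 300 and 210 cross the 100% criterion.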

      - Related to the previous point, for the calcium imaging, the difference between an increase in fluorescence that the authors think reflects increased neuronal activity and the fluorescence that corresponds to seizures is not very clear. This detail is necessary because exactly when the term "seizure" describes a degree of increased activity can be difficult to distinguish objectively.

      In our material and methods section, we describe our working definition of a seizure. Seizures are easily distinguished from increased activity by being synchronized.

      - The supplementary movies that were added were very useful, but raised some questions. For example, what brain regions were pulsating? What areas seemed to constantly exhibit strong fluorescence and was this an artifact? It seemed that sometimes there was background fluorescence in the body. Perhaps an anatomical diagram could be provided for the readers. In addition, there were some movies with much greater fluorescence changes - are these the seizures? These are some reasons for our request for clarified definitions of the term "seizure".

      The “pulsating” (or “flickering”) brain activity is spontaneous neuronal activity. Some areas may appear to be more active, probably owing to a denser packing of neurons and intrinsically higher spontaneous neuronal activity. However, since we only use normalized data, this does not affect our measurements.

      - While it is not critical to change, I will again note the possible confusion that the use of the word "sedative" in this context may cause. However, I do understand this is a stylistic choice.

      - Supplementary Figure 1B: the N values along the x-axis appear to have been duplicated and the duplications are offset and overlapping with one another by mistake.

      Thank you for pointing this out. We have corrected the figure accordingly.

      Review 3:

      (1) Although the relationship between galanin and brain activity during interictal or seizure-free periods was clear, the revised manuscript still lacks mechanistic insight in the role of galanin during seizure-like activity induced by PTZ.

      We agree that the mechanistic role of galanin still needs to be defined. The role is more complex than we expected, mainly due to its negative feedback properties. A complete mechanistic understanding will require a number of additional studies and is unfortunately outside of the scope of this manuscript.

      (2) The revised manuscript continues to heavily rely on calcium imaging of different mutant lines. Confirmation of knockouts has been provided with immunostaining in a new supplementary figure. Additional methods could strengthen the data, translational relevance, and interpretation (e.g., acute pharmacology using galanin agonists or antagonists, brain or cell recordings, biochemistry, etc).

      Cell recordings and biochemistry are challenging in the small larval zebrafish brain. We deem the genetic manipulations that we describe to be more informative than pharmacological experiments due to specificity issues.

    1. Author response

      eLife Assessment

      The authors investigated KLF Transcription Factor 16 (KLF16) as an inhibitor of osteogenic differentiation, which plays a critical role in bone development, metabolism and repair. The results of the study are valuable as they could help to facilitate future research on the regulation of osteogenesis in vitro and in vivo. However, the evidence overall is incomplete, as validation by knockout mouse models would help to strengthen the conclusions.

      We appreciate the editors’ evaluation and recognition of the importance of our research. The primary goal and value of our study is to provide robust bioinformatics analyses of 20 independent iPSC lines, which can lead to the identification of novel genes involved in osteogenic differentiation. The identification of KLF16 serves to illustrate this goal. A thorough analysis of the function of any single gene both in vitro and in vivo is beyond the initial scope of this study. To validate KLF16’s inhibitory role in osteogenic differentiation, we provided evidence showing overexpression of Klf16 suppressed osteogenic differentiation in vitro, and Klf16<sup>+/-</sup> mice exhibited enhanced bone mineral content and density in vivo.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Ru and colleagues investigated regulatory gene interactions during osteogenic differentiation. By profiling transcriptomic changes during mesenchymal stem cell differentiation, they identified KLF16 as a key transcription factor that inhibits osteogenic differentiation and mineralization. It was found that overexpression of KLF16 suppressed osteogenesis in vitro, while Klf16<sup>+/-</sup> mice exhibited enhanced bone density, underscoring its regulatory role in bone formation.

      Strengths:

      (1) Bioinformatics is strong and comprehensive.

      (2) Identification of KLF16 in osteoblast differentiation is exciting and innovative.

      We appreciate the reviewer’s comments on our bioinformatic analyses of MSC osteogenic differentiation and the identification of KLF16 as a new osteogenesis regulator. The differentiation of iPSC-derived MSCs to OBs serves as a valuable model for investigating gene expression and regulatory networks in osteogenic differentiation. This study provides insights into the complex and dynamic regulation of the transcriptomic landscape in osteogenic differentiation and supplies a foundational resource for additional investigation into normal bone formation and the mechanisms underlying pathological conditions.

      Weaknesses:

      (1) The mechanism of KLF16 function is not studied.

      (2) Studies of KLF16 in bone development, from both in vitro and in vivo perspectives, are descriptive.

      Our study aims to apply rigorous bioinformatic analyses of 20 iPSC lines to identify novel genes involved in osteogenic differentiation. With this strategy, we successfully identified KLF16 as a regulator of osteogenic differentiation. We validated this with both in vitro and in vivo models even though we had limited availability of Klf16 knockout mice when the study was conducted. We demonstrated that overexpression of Klf16 suppressed osteogenesis in vitro, while Klf16<sup>+/-</sup> mice exhibited increased bone mineral density, trabecular number, and cortical bone area, highlighting its role in bone formation. With these mice now available, further investigation into the mechanism of KLF16's function is possible.

      (3) Findings in bioinformatics analysis are mostly redundant with previous studies in the field, and can be simplified.

      We compared our bulk RNA-seq data with our previously published single-cell RNA-seq (scRNA-seq) data generated from iPSC-induced cells during osteogenic differentiation (Housman et al., 2022). The purpose was to corroborate the expression patterns of the genes we focused on during osteogenic differentiation. We found similar differential expression patterns in a pseudobulk analysis of the scRNA-seq data, even though there are significant differences between these two studies, including cell culture conditions, sequencing approaches (bulk vs. single cell), goals of the studies (key TF drivers of osteoblast differentiation vs. mapping differentiation stages and inter-species gene programs in human and chimp), and findings (identification of TFs vs. identification of interspecific regulatory differences).

      Importantly, we performed network analyses to identify key transcription factors, which were not redundant with previous studies. We constructed a transcription factor regulatory network analysis during human osteogenic differentiation, and identified a network organized into five interactive modules. The most exciting finding was the identification of KLF16 as one of the strongest regulators in Module 5 (Figure 3), which previously was not demonstrated to be involved in bone formation. We also demonstrated known TF genes regulating osteogenic differentiation in these modules, and performed gene ontology (GO) and reactome pathway (RP) analyses for regulatory functions and pathways specific to each module. To clarify that our findings do not overlap with previous studies, we will revise the manuscript focusing on Module 5 and simplify the description of the bioinformatics analysis as the reviewer suggested.

      Reviewer #2 (Public review):

      In their manuscript with the title "Integrated transcriptomic analysis of human induced pluripotent stem cell (iPSC)-derived osteogenic differentiation reveals a regulatory role of KLF16", Ru et al. have analyzed the gene expression changes during the osteogenic differentiation of iPSC-derived mesenchymal stem/stromal cells into preosteoblasts and osteoblasts. As part of the computational analyses, they have investigated the transcription factor regulatory network mediating this differentiation process, which has also led to the identification of the transcription factor KLF16. Overexpression experiments in vitro and the analysis of heterozygous KLF16 knockout mice in vivo indicate that KLF16 is an inhibitor of osteogenic differentiation.

      The integrated analysis of iPSC bulk transcriptomic data is a major strength of the study, and it is also great that the authors provide deeper functional characterization of the transcription factor KLF16, one of the newly identified candidate regulators of osteogenic differentiation.

      We appreciate the reviewer’s summary and comments on the strength of our bioinformatic analyses of iPSC/MSC osteogenic differentiation and the deep functional characterization of KLF16, as well as the novelty of our findings.

      However, characterization of KLF16 expression in the mouse and validation of the knockout model are currently lacking. Alternative explanations for the mutant phenotype should be considered to improve the strength of the conclusions.

      If all issues can be addressed, the study would provide an important resource for the field that would facilitate future research on the regulation of osteogenesis in vitro and in vivo, with potential implications for preclinical and clinical research as well as bioengineering.

      We appreciate the reviewer’s valuable suggestions. Klf16 is highly expressed in mandibular, maxillary and tail mesenchyme at embryonic Day 12 (D'Souza et al., 2002), indicating its role in early bone development. We will further characterize the expression of Klf16 in mice, especially in the developing bones.

      We identified Klf16 as a potential regulator of osteogenic differentiation, and then validated this with both in vitro and in vivo models. Overexpression of Klf16 suppressed osteogenesis in vitro, and Klf16<sup>+/-</sup> mice showed increased bone mineral content and density, indicating its regulatory role in bone formation. We agree with the reviewer that the bone phenotypes of Klf16 knockout mice potentially can be affected by other factors in addition to osteogenic differentiation. As both bone formation and resorption are critical for bone development, we evaluated osteoclastogenesis in the Klf16<sup>+/-</sup> mice by analyzing the expression of osteoclast marker CALCR and regulator RANKL in the femurs of the Klf16<sup>+/-</sup> mice. Neither CALCR nor RANKL decreased in the bone of Klf16<sup>+/-</sup> mice, indicating that osteoclastogenesis is not decreased; therefore, increased bone mineral content and density in the mutant mice is more likely attributed to enhanced bone formation rather than reduced resorption by osteoclasts. Additionally, we will discuss other alternative explanations for the bone phenotypes of Klf16 knockout mice as suggested by the reviewer.

      References

      D'Souza, U. M., Lammers, C.-H., Hwang, C. K., Yajima, S. and Mouradian, M. M. (2002). Developmental expression of the zinc finger transcription factor DRRF (dopamine receptor regulating factor). Mechanisms of Development 110, 197-201.

      Housman, G., Briscoe, E. and Gilad, Y. (2022). Evolutionary insights into primate skeletal gene regulation using a comparative cell culture model. PLOS Genetics 18, e1010073-e1010073.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank all the reviewers for their time and valuable feedback, which helped us improve our manuscript. Based on the comments, we have made several critical changes to the revised manuscript.

      (1) We have changed our threshold for detecting freezing epochs from 1 cm/s to 0 cm/s in this revised manuscript. This change allows us to capture periods when animals are completely still on the treadmill, better matching the "true freezing" behavior seen in freely moving set-ups. We have added a new supplementary video (Supplementary Video 2) that better demonstrates the freezing response we observe. All results and figures in the revised manuscript reflect this updated threshold (Figures 2-6, Supplementary Figures 1-6, Tables 1-6). Our main findings remain robust, demonstrating that freezing serves as a reliable conditioned response in our paradigms, comparable to freely moving animals. Specifically, freezing behavior increased reliably in the fear-conditioned environment following CFC across all paradigms. We have also added data from a no-shock control group (Supplementary Figure 2) which, when compared to the conditioned group, shows that freezing responses in the conditioned group result from fear conditioning rather than immobility. We do observe other avoidance behaviors unique to our treadmill-based task, such as hesitation, backward movement, and slow crawls. These conditioned behaviors are captured through a separate metric: the time taken to complete a lap.
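
      The effect of moving the threshold from 1 cm/s to 0 cm/s can be illustrated with a short sketch that extracts freezing epochs from a treadmill speed trace. This is a hypothetical example, not the authors' pipeline: the function names, the sampling interval, and in particular the minimum epoch duration (`min_duration_s`) are assumptions introduced here, since the response does not state one.

```python
def freezing_epochs(speed_cm_s, dt_s, threshold_cm_s=0.0, min_duration_s=1.0):
    """Return (start, end) sample indices, end exclusive, of freezing epochs.

    A sample counts as still when |speed| <= threshold_cm_s. With the default
    threshold of 0 cm/s only complete immobility qualifies.
    Assumption: epochs shorter than `min_duration_s` are discarded.
    """
    epochs = []
    start = None
    for i, speed in enumerate(speed_cm_s):
        still = abs(speed) <= threshold_cm_s
        if still and start is None:
            start = i
        elif not still and start is not None:
            if (i - start) * dt_s >= min_duration_s:
                epochs.append((start, i))
            start = None
    # Close an epoch that runs to the end of the recording.
    if start is not None and (len(speed_cm_s) - start) * dt_s >= min_duration_s:
        epochs.append((start, len(speed_cm_s)))
    return epochs


def percent_time_freezing(speed_cm_s, epochs):
    """Percentage of samples that fall inside the given freezing epochs."""
    if not speed_cm_s:
        return 0.0
    frozen = sum(end - start for start, end in epochs)
    return 100.0 * frozen / len(speed_cm_s)


# Hypothetical speed trace sampled every 0.5 s (values in cm/s).
speed = [5.0, 0.0, 0.0, 0.0, 4.0, 0.0, 3.0, 0.5, 0.5]
strict = freezing_epochs(speed, dt_s=0.5, threshold_cm_s=0.0)
lenient = freezing_epochs(speed, dt_s=0.5, threshold_cm_s=1.0)
```

      With the 0 cm/s threshold only the fully immobile stretch is counted, whereas the 1 cm/s threshold also labels the slow crawl at the end of the trace as freezing.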

      (2) As suggested by the reviewers, we have separately analyzed fear discrimination and extinction dynamics across recall days (Supplementary Figures 2, 5 and 6, Tables 1-6). To assess fear discrimination, we use within-group comparisons to evaluate how well animals differentiate between the two VRs across days. For extinction, we use within-VR comparisons to examine freezing dynamics over time. Freezing across recall days is compared to baseline freezing (pre-conditioning) using a Linear Mixed Effects model (Tables 1-6), with recall days as fixed effects and mouse as a random effect, using baseline freezing as the reference.
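
      One way to write down a model of this kind is shown below. It is a sketch under assumptions rather than the authors' exact specification: the random effect is taken to be a per-mouse random intercept, and all symbols (y, \beta, u, \varepsilon, D) are notation introduced for illustration.

```latex
% y_{ij}: freezing (%) of mouse i in session j
% \beta_0: mean baseline (pre-conditioning) freezing, the reference level
% \beta_d: change in freezing on recall day d relative to baseline
% u_i: random intercept for mouse i (assumed structure)
\[
y_{ij} = \beta_0 + \sum_{d=1}^{D} \beta_d \,\mathbb{1}[\mathrm{day}_{ij} = d] + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]
```

      Under this formulation, testing \beta_d = 0 asks whether freezing on recall day d differs from baseline within a given VR, and extinction corresponds to \beta_d returning toward zero on later days.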

      (3) We have expanded the behavioral dataset in Paradigm 1 to investigate the effect of shock amplitude on the conditioned fear response (Supplementary Figure 2 C-E). Consistent with findings in freely moving animals, our data show that increasing shock intensity from 0.6 mA to 1.0 mA leads to stronger freezing. For the revised manuscript, we specifically increased the sample size in the 0.6 mA group (n = 8) in Paradigm 1, as this intensity is used in Paradigm 3. These additional data demonstrate that combining a lower shock amplitude with shorter inter-shock intervals and retaining the tail-coat during recall can enhance freezing, suggesting that these parameters help compensate for lower shock intensity.

      (4) We have increased the sample size of the imaging dataset (now n = 8, Figures 7-8).

      Finally, we acknowledge that many aspects of this paradigm still require optimization. The head-fixed CFC paradigm is in its early stages compared to the decades of research dedicated to understanding fear learning parameters in freely moving CFC paradigms. While there are numerous parameters that could be tested (both those identified through our own discussions and those raised by the reviewers), it is not feasible for a single lab to conduct a full evaluation of all the possible factors that could influence CFC in the head-fixed prep. A key limitation is that our approach requires robust navigation behavior in the VR without rewards, which requires weeks of training per mouse. It also necessitates larger sample sizes at the outset, as not all animals will meet the behavioral criteria required for CFC. Another important consideration is scalability. Unlike freely moving CFC paradigms, which allow parallel testing of many animals with minimal pre-training, the VR-CFC setup requires several weeks of behavior training and involves a more complex integration of hardware and software to accurately track behavior in virtual space. The number of VR rigs that can be operated simultaneously in a single lab is often limited, making high-throughput testing more challenging. These factors mean that the testing of a single parameter in a group of animals requires approximately 3–4 months to complete. Despite these constraints, we are committed to continuing to refine this paradigm over time. With this manuscript, our main aim was to provide a detailed framework, initial parameters, and evidence for conditioned behavior in the head-fixed preparation. By doing so, we hope to facilitate the adoption of this paradigm by researchers interested in studying the neural correlates of learning and memory using multiphoton imaging and stimulation techniques. This approach enables investigations that are not possible in freely moving animals, while the presence of freezing as a conditioned response allows for direct comparisons to the extensive body of work done in freely moving paradigms. Moving forward, we anticipate that optimizing this paradigm and identifying the key parameters that drive learning will be a collaborative, community-led effort.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors set out to develop a contextual fear learning (CFC) paradigm in head-fixed mice that would produce freezing as the conditioned response. Typically, lick suppression is the conditioned response in such designs, but this (1) introduces a potential confounding influence of reward learning on neural assessments of aversion learning and (2) does not easily allow comparison of head-fixed studies with extensive previous work in freely moving animals, which use freezing as the primary conditioned response.

      The first part of this study is a report on the development and outcomes of 3 variations of the CFC paradigm in a virtual reality environment. The fundamental design is strong, with head-fixed mice required to run down a linear virtual track to obtain a water reward. Once trained, the water reward is no longer necessary and mice will navigate virtual reality environments. There are rigorous performance criteria to ensure that mice that make it to the experimental stage show very low levels of inactivity prior to fear conditioning. These criteria do result in only 40% of the mice making it to the experimental stage, but high rates of activity in the VR environment are crucial for detecting learning-related freezing. It is possible that further adjustments to the procedure could improve attrition rates.

      We acknowledge that further adjustments to the procedure could improve attrition rates, and we will continue to work on improving the paradigm.

      Paradigm versions 1 and 2 vary the familiarity of the control context while paradigm versions 2 and 3 vary the inter-shock interval. Paradigm version 1 is the most promising, showing the greatest increase in conditioned freezing (~40%) and good discrimination between contexts (delta ~15-20%). Paradigm version 2 showed no clear evidence of learning - average freezing at recall day 1 was not different than pre-shock freezing. First-lap freezing showed a difference, but this single-lap effect is not useful for many of the neural circuit questions for which this paradigm is meant to facilitate. Also, the claim that mice extinguished first-lap freezing after 1 day is weak. Extinction is determined here by the loss of context discrimination, but this was not strong to begin with. First-lap freezing does not appear to be different between Recall Day 1 and 2, but this analysis was not done.

      This is an important point. Following reviewer suggestions, we have replotted our figures for all paradigms to show within-VR freezing (see Supplementary Figures 2, 5 and 6) as the appropriate method for quantifying fear extinction across days. Using an LME model (Tables 1-6), we quantify freezing during recall days against baseline freezing levels measured before fear conditioning within each VR. In Paradigm 2, while some fear discrimination persists across days, extinction does occur rapidly. After the first lap in the CFC VR, we observed no significant differences in freezing compared to the baseline. These results are shown in the revised Supplementary Figure 5, and the revised text is in lines 393-399.

      Paradigm version 3 has some promise, but the magnitude of the context discrimination is modest (~10% difference in freezing). Thus, further optimization of the VR CFC will be needed to achieve robust learning and extinction. This could include factors not thoroughly tested in this study, including context pre-exposure timing and duration and shock intensity and frequency.

      We acknowledge that many aspects of this paradigm still need optimization, as virtual reality CFC is in its early stages, and we have not explored all of the parameter space. We describe above the reasoning for this. However, for this revised version of the paper we have added new behavioral data (Supplementary Figure 2 C-E) showing that increasing shock intensities from 0.6 mA to 1 mA enhances freezing, both in the first lap and on average. There are of course many other parameters that are likely important, like the ones pointed out here by the reviewer, but exploring the entire parameter space will take many years and will likely require many labs. The purpose of this paper is to show that VR-CFC fundamentally works and is a starting point from which the field can build on. We have now pointed out in the introduction (lines 54-58) and discussion (lines 730-737, 810-814) that there remains significant scope for improving this paradigm and optimizing parameters in the future.

      The second part of the study is a validation of the head-fixed CFC VR protocol through the demonstration that fear conditioning leads to the remapping of dorsal CA1 place fields, similar to that observed in freely moving subjects. The results support this aim and largely replicate previous findings in freely moving subjects. One difference from previous work of note is that VR CFC led to the remapping of the control environment, not just the conditioning context. The authors present several possible explanations for this lack of specificity to the shock context, further underscoring the need for further refinement of the CFC protocol before it can be widely applied. While this experiment examined place cell remapping after fear conditioning, it did not attempt to link neural activity to the learned association or freezing behavior.

      This is an interesting observation. We think that the remapping observed in the control context likely occurred due to the absence of reward in a previously rewarded environment. Our prior work has demonstrated that removal of reward causes increased remapping (Krishnan et al., 2022, Krishnan and Sheffield, 2023). In other words, the continued presence of reward within an environment stabilizes CA1 place fields. The Moita et al. (2004) paper, which showed remapping only in the fear-conditioned context and not in the control context, provided rats with food pellets throughout the experimental session in both the control and conditioned contexts, likely to increase the exploration necessary for identifying place cells. The presence of reward in the Moita et al. experiment could explain the minimal remapping observed in their control context compared to our control context, which lacked reward. Another possibility lies in the different intervals between place cell activity recordings in our study and that of Moita et al. While Moita et al. separated their recordings by just one hour, our recordings were separated by a full day, with a sleep period in between. The absence of sleep and the shorter time interval between conditioning and retrieval sessions in their study could explain the minimal remapping observed by Moita et al. compared to our findings. We have now addressed this discrepancy explicitly in lines 596-606.

      Although we agree with the reviewer that it would be informative to perform analysis of how neural activity correlates with freezing responses, we think this warrants its own stand-alone manuscript as the neural dynamics and methods to appropriately analyze them are complicated. We are in the midst of analyzing this data further and will present these findings in a separate publication.

      In summary, this is an important study that sets the initial parameters and neuronal validation needed to establish a head-fixed CFC paradigm that produces freezing behaviors. In the discussion, the authors note the limitations of this study, suggest the next steps in refinement, and point to several future directions using this protocol to significantly advance our understanding of the neural circuits of threat-related learning and behavior.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Krishnan et al devised three paradigms to perform contextual fear conditioning in head-fixed mice. Each of the paradigms relied on head-fixed mice running on a treadmill through virtual reality arenas. The authors tested the validity of three versions of the paradigms by using various parameters. As described below, I think there are several issues with the way the paradigms are designed and how the data are interpreted. Moreover, as Paradigm 3 was published previously in a study by the same group, it is unclear to me what this manuscript offers beyond the validations of parameters used for the previous publication. Below, I list my concerns point-by-point, which I believe need to be addressed to strengthen the manuscript.

      Major comments

      (1) In the analysis using the LME model (Tables 1 and 2), I am left wondering why the mice had increased freezing across recall days as well as increased generalization (increased freezing to the familiar context, where shock was never delivered). Would the authors expect freezing to decrease across recall days, since repeated exposure to the shock context should drive some extinction? This is complicated by the analysis showing that freezing was increased only on retrieval day 1 when analyzing data from the first lap only. Since reward (e.g., motivation to run) is removed during the conditioning and retrieval tests, I wonder if what the authors are observing is related to decreased motivation to perform the task (mice will just sit, immobile, not necessarily freezing per se). I think that these aspects need to be teased out.

      This is an important point and we agree teasing out a lack of motivation versus fearful freezing would be useful. To address the possibility that reduced motivation to run without reward could contribute to the observed freezing behavior, we have now included a no-shock control group in the revised manuscript (n = 7; Supplementary Figure 2A-B, H–I). These control mice experienced the same protocol, including the wearing of a tail coat, but did not receive any shocks. We observed no increases in freezing across days in these controls, confirming that the increased freezing in the Familiar context of our experimental group stems from fear conditioning rather than the removal of reward from a previously rewarded context. If reduced motivation from reward removal were the primary driver, similar freezing patterns would have emerged in the no-shock controls. We have added lines 248-261 in the revised manuscript, discussing this point, and we thank the reviewer for motivating us to do this experiment and analysis.

      That said, the precise mechanisms underlying the fear generalization observed in the nonconditioned context—particularly its emergence during later recall days—remain unclear. Studies in freely moving animals have shown that fear memories initially specific to the conditioned context can become generalized with repeated exposures, which may be occurring here (Biedenkapp & Rudy, 2007; Wiltgen & Silva, 2007). Alternatively, it is possible that the combination of fear conditioning and the removal of expected reward contributes to a delayed generalization effect. This may reflect a limitation of our approach, which relies on reward to motivate initial training. As noted by another reviewer, we have now addressed this potential drawback of reward-based training in the discussion (see lines 809-817). Clearly, unique factors specific to the head-fixed VR paradigm may contribute to this phenomenon. Understanding the mechanisms underlying fear generalization in the head-fixed VR CFC paradigm will be a valuable direction for future research.

      (2) Related to point 1, the authors actually point out that these changes could be due to the loss of the water reward. So, in line 304, is it appropriate to call this freezing? I think it will be very important for the authors to exactly define and delineate what they consider as freezing in this task, versus mice just simply sitting around, immobile, and taking a break from performing the task when they realize there is no reward at the end.

      As noted in point 1 above, we have added a no-shock control group (n = 7; Supplementary Figure 2A-B, H–I) to determine whether the observed freezing was driven by fear conditioning or by reduced motivation to run in the absence of reward. The absence of increased freezing in these controls supports the interpretation that the behavior in the conditioned group is fear-related. In future studies, incorporating additional physiological measures, such as heart rate monitoring, could further help distinguish fear-related freezing from other forms of immobility.

      (3) In the second paradigm, mice are exposed to both novel and (at the time before conditioning) neutral environments just before fear conditioning. There is a big chance that the mice are 'linking' the memories (Cai et al 2016) of the two contexts such that there is no difference in freezing in the shock context compared to the neutral context, which is what the authors observe (Lines 333-335). The experiment should be repeated such that exposure to the contexts does not occur on the conditioning day.

      This is an interesting idea. If memory linking were driving the observed freezing patterns, we would expect similarly reduced fear discrimination across all three paradigms, as mice experience both contexts sequentially in each case. Instead, the effect appears to be specific to Paradigm 2, suggesting that other factors are responsible. We agree it would be informative to eliminate pre-conditioning exposure to both environments, to assess whether this improves fear discrimination and to clarify the potential contribution of memory linking. We plan to do this in future studies that are beyond the scope of this initial paper on VR-CFC.

      (4) On lines 360-361, the authors conclude that extinction happens rapidly, within the first lap of the VR trial. To my understanding, that would mean that extinction would happen within the first 5-10 seconds of the test (according to Figure S1E). That seems far too fast for extinction to occur, as this never occurs in freely behaving mice this quickly.

      We agree with the reviewer that extinction in Paradigm 2 appears to occur relatively rapidly. However, the average time to complete the first lap in the fear-conditioned context in Paradigm 2 is 25.68 ± 5.55 seconds (as stated in line 384), indicating that extinction occurs within approximately the first 30 seconds of context exposure, not within 5–10 seconds. This is specific to Paradigm 2 and does not happen in either of the other paradigms, as shown in Supplementary Figure 4. For clarification, Figure S1E pertains to baseline running in Paradigm 1 and does not apply to Paradigm 2.

      As the reviewer points out, even at 30 seconds, extinction seems to be happening more quickly in Paradigm 2 than in freely moving setups. This may be due to a key structural difference in our setup. The VR-CFC task is organized into discrete trials, with mice being teleported back to the start after reaching the end of the virtual track. Completing a full lap without receiving a shock could serve as a clear signal that the threat is no longer present, because completing a lap means that the animal has surveyed all locations within the environment. This structure could accelerate extinction compared to freely moving setups, where animals take longer to explore their complete environment due to the lack of discrete trials. Although this is true for all our paradigms, the accelerated extinction seen in Paradigm 2 versus Paradigms 1 and 3 may be driven by other factors. As noted by the reviewers, other task parameters, such as context pre-exposure timing, shock intensity, and conditioning duration, are likely to play a role in shaping extinction dynamics. These factors warrant further investigation, and we plan to explore them in future studies to better understand the conditions influencing extinction in the VR-CFC paradigm.

      (5) Throughout the different paradigms, the authors are using different shock intensities. This can lead to differences in fear memory encoding as well as in levels of fear memory generalization. I don't think that comparisons can be made across the different paradigms as too many variables (including shock intensity - 0.5/0.6mA can be very different from 1.0 mA) are different. How can the authors pinpoint which works best? Indeed, they find Paradigm 3 'works' better than Paradigm 2 because mice discriminate better between the neutral and shock contexts. This can definitely be driven by decreased generalization from using a 0.6mA shock in Paradigm 3 compared to 1.0 mA shock in Paradigm 2.

      The reviewer brings up important points here. We have now added new data evaluating 0.6 mA shocks in Paradigm 1 (Supplementary Figure 2A–E, n=8). These data show that 1.0 mA shocks produced stronger conditioned responses and greater fear discrimination compared to 0.6 mA. Our goal in Paradigm 3 was to begin with a lower shock intensity and assess whether additional modifications—specifically the shorter ISI and retention of the tail-coat during recall—could enhance fear conditioning. Surprisingly, despite the weaker shock intensity, Paradigm 3 resulted in improved discrimination and freezing behavior relative to Paradigm 2. We have now clarified this point in the manuscript (lines 466-470), and we interpret this outcome as evidence that the shorter ISIs and contextual cue continuity (tail-coat) likely play a more significant role in enhancing learning and recall. However, as noted in the text (lines 511-514), further testing is needed to determine the individual contributions of each parameter to successful VR-CFC. Fully optimizing the parameter settings will take additional time and resources, and we aim to continually refine the parameter space in the future, as has been done over the years for freely moving animals.

      (6) There are some differences in the calcium imaging dataset compared to other studies, and the authors should perform additional testing to determine why. This will be integral to validating their head-fixed paradigm(s) and showing they are useful for modeling circuit dynamics/behaviors observed in freely behaving mice. Moreover, the sample size (number of mice) seems low.

      The one notable difference between our imaging study and that done in freely moving animals is that we observed remapping of place cells in the control context. In contrast, Moita et al. (2004) reported more stable place fields in the control context. A key distinction is that their study included rewards in the control context, which may have contributed to the spatial stability. We now discuss this difference in the manuscript (lines 599-605).

      It should be noted that there are many key distinctions among paradigms that study neural activity during fear conditioning in freely moving animals. These include varying exposure times to environments (1–6 days), the time interval between neural activity recordings, and the use of food rewards during the experiment stages in freely moving animals to encourage exploration for place cell identification. Although freely moving paradigms that investigate fear conditioning and place cells are heterogeneous, we were encouraged by the replication of several key findings. This validates VR-based CFC as a viable tool for neural circuit investigations. While future work will include more thorough analyses, our current findings demonstrate the paradigm's effectiveness for modeling circuit dynamics and behavior. We have now expanded our dataset, which includes four additional mice, further corroborating these original findings.

      (7) It appears that the authors have already published a paper using Paradigm 3 (Ratigan et al 2023). If they already found a paradigm that is published and works, it is unclear to me what the current manuscript offers beyond that initial manuscript.

      The reviewer is correct that we have published a paper using Paradigm 3. However, this manuscript goes beyond that one and provides a much more comprehensive description and fundamental analysis of the behavior and experimental parameters regarding VR-CFC, allowing the research community to adapt our paradigm reproducibly. While Ratigan et al. (2023) offered only a minimal description of behavior and included just Paradigm 3, we present two additional paradigms along with neuronal validation using hippocampal place cells. We have now explicitly stated this in the introduction (lines 50-55).

      (8) As written, the manuscript is really difficult to follow with the averages and standard error reported throughout the text. This reporting in the text occurred heterogeneously throughout the text, as sometimes it was reported and other times it was not. Cleaning this reporting up throughout the paper would greatly improve the flow of the text and qualitative description of the results.

      We completely agree with this point and have now cleaned up the text, leaving details only in a few places we felt were important.

      Reviewer #3 (Public review):

      Summary:

      Krishnan et al. present a novel contextual fear conditioning (CFC) paradigm using a virtual reality (VR) apparatus to evaluate whether conditioned context-induced freezing can be elicited in head-fixed mice. By combining this approach with two-photon imaging, the authors aim to provide high-resolution insights into the neural mechanisms underlying learning, memory, and fear. Their experiments demonstrate that head-fixed mice can discriminate between threat and non-threat contexts, exhibit fear-related behavior in VR, and show context-dependent variability during extinction. Supplemental analyses further explore alternative behaviors and the influence of experimental parameters, while hippocampal neuron remapping is tracked throughout the experiments, showcasing the paradigm's potential for studying memory formation and extinction processes.

      Strengths:

      Methodological Innovation: The integration of a VR-based CFC paradigm with real-time two-photon imaging offers a powerful, high-resolution tool for investigating the neural circuits underlying fear, learning, and memory.

      Versatility and Utility: The paradigm provides a controlled and reproducible environment for studying contextual fear learning, addressing challenges associated with freely moving paradigms.

      Potential for Broader Applications: By demonstrating hippocampal neuron remapping during fear learning and extinction, the study highlights the paradigm's utility for exploring memory dynamics, providing a strong foundation for future studies in behavioral neuroscience.

      Comprehensive Data Presentation: The inclusion of supplemental figures and behavioral analyses (e.g., licking behaviors and variability in extinction) strengthens the manuscript by addressing additional dimensions of the experimental outcomes.

      Weaknesses:

      Characterization of Freezing Behavior: The evidence supporting freezing behavior as the primary defensive response in VR is unclear. Supplementary videos suggest the observed behaviors may include avoidance-like actions (e.g., backing away or stopping locomotion) rather than true freezing. Additional physiological measurements, such as EMG or heart rate, are necessary to substantiate the claim that freezing is elicited in the paradigm.

      To strengthen our claim that freezing is a conditioned response in this task, we have taken three key steps:

      (1) We adjusted our freezing detection threshold from 1 cm/s to near 0 cm/s to capture only periods where the animal is virtually motionless on the treadmill. We validated this approach in Figure 2, particularly in the zoomed-in track position trace in Figure 2A, which clearly shows that the identified freezing epochs correspond to no change in track position. All analyses and figures have been updated to reflect this more stringent threshold.

      (2) We have added a no-shock control group in the revised manuscript (n = 7; Supplementary Figure 2A-B, H–I) where mice experienced the same protocol, including wearing a tail-coat, but received no shocks. These mice showed no increases in freezing behavior, which further demonstrates that the increased freezing we observe is a result of fear conditioning.

      (3) We have added a new supplementary video (Supplementary Video 2) that better illustrates the freezing behavior in our task.

      That said, we fully agree with the reviewer that freezing is not the only defensive response observed. Other behaviors—such as hesitation, backward movement, and slowing down—also emerge that are unique to our treadmill-based paradigm. We chose to focus on freezing in this manuscript to align with convention in freely moving fear conditioning studies and to facilitate direct comparisons. We agree that additional physiological measurements (e.g., EMG or heart rate) would provide further validation and could help distinguish between different forms of defensive responses. We view this as an important future direction and plan to incorporate such measures in upcoming studies. We highlight this in the results section (lines 175-179, 262-268) and in the discussion (lines 739-750).
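
      As an illustration of the stricter freezing criterion described in step (1) above, the following is a minimal sketch of how near-zero-speed epochs could be extracted from a track-position trace. It is not the analysis code used in the study: the function name, the 0.1 cm/s threshold, the 1 s minimum duration, and the sampling interval are all hypothetical values chosen only for the example.

```python
def freezing_epochs(position_cm, dt_s, speed_thresh_cm_s=0.1, min_duration_s=1.0):
    """Return (start, end) index ranges of intervals with near-zero speed.

    All parameter values are illustrative assumptions, not the study's settings.
    Index i refers to the interval between position samples i and i + 1;
    `end` is exclusive.
    """
    # Absolute speed between successive track-position samples.
    speeds = [abs(b - a) / dt_s for a, b in zip(position_cm, position_cm[1:])]

    epochs = []
    start = None
    for i, v in enumerate(speeds):
        if v <= speed_thresh_cm_s:
            if start is None:
                start = i  # a candidate motionless epoch begins
        else:
            # Movement resumed: keep the epoch only if it lasted long enough.
            if start is not None and (i - start) * dt_s >= min_duration_s:
                epochs.append((start, i))
            start = None

    # Close an epoch that runs to the end of the recording.
    if start is not None and (len(speeds) - start) * dt_s >= min_duration_s:
        epochs.append((start, len(speeds)))
    return epochs


# Toy trace: the animal advances, stays at 2 cm for several samples, then moves on.
trace = [0, 1, 2, 2, 2, 2, 2, 3, 4]
print(freezing_epochs(trace, dt_s=0.5))
```

      Under this kind of criterion, only intervals with essentially no change in track position are counted, which is the property the zoomed-in trace in Figure 2A is described as showing. A teleport at the end of a lap appears as a large position jump and is therefore never scored as freezing.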

      Analysis of Extinction: Extinction dynamics are only analyzed through between-group comparisons within each Recall day, without addressing within-group changes in behavior across days. Statistical comparisons within groups would provide a more robust demonstration of extinction processes.

      This is an important distinction and we have now added figures (Supplementary Figures 2H-I, 5C-D, 6C-D) showing within-VR behavior across Recall days, along with statistical comparisons and a description of the extinction process based on these results.

      Low Sample Sizes: Paradigm 1 includes conditions with very low sample sizes (N=1-3), limiting the reliability of statistical comparisons regarding the effects of shock number and intensity. Increasing sample sizes or excluding data from mice that do not match the conditions used in Paradigms 2 and 3 would improve the rigor of the analysis.

      While we included all conditions in Figure 2 for completeness, we have separated these conditions in Supplementary Figure 2 to ensure clarity. This allows researchers interested in this paradigm to see the approximate range of conditioned responses observed across different parameters. When comparing Paradigm 1 with Paradigms 2 and 3, we have used only data from the 1 mA, six-shock condition.

      Potential Confound of Water Reward: The authors critique the use of reward in conjunction with fear conditioning in prior studies but do not fully address the potential confound introduced by using water reward during the training phase in their own paradigm.

      We agree this is a point that needs discussion. We have now noted the limitation of using water rewards during training in the discussion section, particularly its effect on the animal’s motivation in the long term and on place cell activity (lines 814-820).

      Recommendations for the authors

      Reviewer #1 (Recommendations for the authors):

      I suggest changing "3 paradigms" to "3 versions of a CFC paradigm," as the paradigm is fundamentally the same, but parameters were adjusted towards finding an optimal protocol.

      We have changed this phrasing where applicable.

      Figure S2: There appear to be different sets of shock parameters for different mice, most with an n of 1 or 2. This is not reliable for making a decision for optimal shock parameters and should not be discussed in that way until a full-powered comparison is completed. Also, the N adds up to 19, yet only 18 are described as being included in the study.

      We thank the reviewer for this important point. We agree that the current study is not powered to definitively identify optimal parameter settings. We have been careful not to interpret it in that way in the text. Rather, we adopted a commonly used starting point from the freely moving literature—1 mA with six shocks—as our initial condition (lines 196-199). To provide context for others interested in pursuing this work, we have presented a range of conditioned responses from different parameter combinations to illustrate potential variability. In most cases, these data are intended for illustrative purposes only and are not meant to support firm conclusions. We agree that a systematic and fully powered investigation of each parameter would be highly valuable, and we plan to pursue this in future work (and hope other labs contribute to this goal, too), much like the iterative optimizations performed in freely moving paradigms over time.

      We thank the reviewer for catching the sample size discrepancy and have now corrected it.

      The number of animals for the no-shock condition should be included.

      Thank you. We have now included this.

      A possible explanation for the lower fear and poorer discrimination in versions 2 and 3 could be that 10 min pre-exposure to the CFC context on day -1 led to latent inhibition. Shorter (or eliminated) pre-exposure may improve outcomes.

      We agree that the exposure time is a parameter that we should explore. We have highlighted this in the discussion (lines 729-736) as a parameter that is worth testing in the future.

      For analysis of extinction, it is best to establish this within condition - is freezing to the CFC context significantly reduced compared with initial recall and similar to pre-training freezing? By using discrimination as your index of extinction, increases in control context freezing/inactivity can eliminate context discrimination without the conditioned response of freezing actually undergoing extinction.

      This is a good point, and we have now included analysis and conclusions based on a within-VR comparison for the analysis of fear extinction (Supplementary Figures 2H-I, 5C-D, 6C-D).

      Reviewer #3 (Recommendations for the authors):

      Clarification of Treadmill Shape: The manuscript describes the treadmill as "spherical" throughout. However, based on representative images and videos, the treadmill appears cylindrical. This discrepancy should be clarified to ensure consistency between the text and visuals.

      The reviewer is correct that the treadmill is cylindrical, and this was an error on our part. We have corrected it throughout.

      Figure and Legend Labeling: To improve clarity, all figures and their legends should be explicitly labeled with the corresponding paradigm (1, 2, or 3) to facilitate interpretation.

      We have now added a label on all figures that clarifies which Paradigm the figures are referring to. We have also explicitly added this to the figure legends.

      Objective Language: Subjective language, such as "since we wanted animals to" (Line 850), should be revised to reflect an objective tone (e.g., "to allow animals to"). Similarly, phrases like "We believe" (Line 896) should be avoided to maintain an unbiased presentation.

      We have removed subjective language from our text.

      Placement of Future Directions: Speculations on future experimental plans, such as the use of sex as a biological variable (Lines 895-903), should be included in the Discussion section rather than the Methods. Additionally, remarks about the responsiveness of female mice to tail shocks should be moved to the main text for proper contextualization.

      We have moved these lines as suggested by the reviewer.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Guo and colleagues used a cell rounding assay to screen a library of compounds for inhibition of TcdB, an important toxin produced by Clostridioides difficile. Caffeic acid and derivatives were identified as promising leads, and caffeic acid phenethyl ester (CAPE) was further investigated.

      Strengths:

      Considering the high morbidity rate associated with C. difficile infections (CDI), this manuscript presents valuable research in the investigation of novel therapeutics to combat this pressing issue. Given the rising antibiotic resistance in CDI, the significance of this work is particularly noteworthy. The authors employed a robust set of methods and confirmatory tests, which strengthen the validity of the findings. The explanations provided are clear, and the scientific rationale behind the results is well-articulated. The manuscript is extremely well written and organized. There is a clear flow in the description of the experiments performed. Also, the authors have investigated the effects of CAPE on TcdB in careful detail, and reported compelling evidence that this is a meaningful and potentially useful metabolite for further studies.

      Weaknesses:

      The authors have made some changes in the revised version. However, many of the changes were superficial, and some concerns still need to be addressed. Important details are still missing from the description of some experiments. Authors should carefully revise the manuscript to ascertain that all details that could affect interpretation of their results are presented clearly. For instance, authors still need to include details of how the metabolomics analyses were performed. Just stating that samples were "frozen for metabolomics analyses" is not enough. Was this mass-spec or NMR-based metabolomics? Assuming it was mass-spec, what kind? How was metabolite identity assigned, etc.? These are important details, which need to be included. Even in cases where additional information was included, the authors did not discuss how the specific way in which certain experiments were performed could affect interpretation of their results. One example is the potential for compound carryover in their experiments. Another important one is the fact that CAPE affects bacterial growth and sporulation. Therefore, it is critical that authors acknowledge that they cannot discard the possibility that other factors besides compound interactions with the toxin are involved in their phenotypes. As stated previously, authors should also be careful when drawing conclusions from the analysis of microbiota composition data, and changes to the manuscript should be made to reflect this. Ascribing causality to correlational relationships is a recurring issue in the microbiome field. Again, I suggest authors carefully revise the manuscript and tone down some statements about the impact of CAPE treatment on the gut microbiota.

      Thanks for your constructive suggestion. We have carefully revised the manuscript according to your suggestions.

      Reviewer #2 (Public review):

      I appreciate the authors' responses to my original review. This is a comprehensive analysis of CAPE on C. difficile activity. It seems like this compound affects all aspects of C. difficile, which could make it effective during infection but also make it difficult to understand the mechanism. Even considering the authors' responses, I think it is critical for the authors to work on the conclusions regarding the infection model. There is some protection from disease by CAPE but some parameters are not substantially changed. For instance, weight loss is not significantly different in the C. difficile only group versus the C. difficile + CAPE group. Histology analysis still shows a substantial amount of pathology in the C. difficile + CAPE group. This should be discussed more thoroughly using precise language.

      Thanks for your constructive suggestion. We have carefully revised the manuscript according to your suggestions.

      Reviewer #3 (Public review):

      Summary:

      The study is well written, and the results are solid and well demonstrated. It shows a field that can be explored for the treatment of CDI.

      Strengths:

      Results are really good, and the CAPE shows a good and promising alternative for treating CDI.

      Weaknesses:

      Some references are too old or missing.

      Comments on revisions:

      I have read your study after comments made by all referees, and I noticed that all questions and suggestions addressed to the authors were answered and well explained. Some of the minor and major issues related to the article were also solved. I am satisfied with all the effort given by the authors to improve their manuscript.

      Thanks again for your review.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The legend of Figure 3SB is incorrect. It should read "Growth curves of C. difficile BAA-1870 in the presence of varying concentrations of CAPE (0-64 µg/mL)". Also, there is something wrong with the symbols in this figure. I suspect what is happening is that the symbols for the concentrations of 32 and 64 µg/mL are superimposing, but this is a problem because the lower line looks like a closed circle, which is supposed to represent the condition where no CAPE was added. The authors should change the symbols to allow clear distinction between each of the conditions.

      Thanks for your constructive suggestion. We have modified the panel and figure legend in Figure 3SB. The concentrations of 32 μg/mL and 64 μg/mL are quite similar, which makes it challenging to differentiate between the corresponding data points on the graph. To enhance clarity, we have utilized distinct colors to help distinguish these closely valued lines as effectively as possible.

      Since the authors observed a significant effect of CAPE on both bacterial growth and spore production, their discussion and conclusions need to reflect the fact that the effects observed can no longer be attributed solely to toxin inhibition.

      Thanks for your comments. We have modified the corresponding description according to your suggestions.

      In lines 43-45, authors state that "CAPE treatment of C. difficile-challenged mice induces a remarkable increase in the diversity and composition of the gut microbiota (e.g., Bacteroides spp.)". It is still unclear to this reviewer why mention Bacteroides between parentheses. Does this mean that there was an increase in the abundance of Bacteroides? If that is the case this needs to be stated more clearly.

      Thanks for your comments. Treatment with CAPE indeed significantly increased the abundance of Bacteroides spp. in the gut microbiota (Figure 7H-J). However, to avoid ambiguity in the abstract, we have chosen to delete the specific mention of Bacteroides spp. within the parentheses.

      The modifications made to lines 132-135 still do not address my concern. Authors stated in the manuscript that "compounds that were not bound to TcdB were removed". But how was this done? This needs to be clearly explained in the manuscript. In the response to reviewers document, authors state that this was done through centrifugation. But given that the goal here is to separate excess of small molecule from a protein target, just stating that centrifugation was used is not enough. Did the authors use ultracentrifugation? What were the conditions employed. This is critical so that the reader can assess the degree of compound carryover that may have occurred. Also, authors need to clearly acknowledge the caveats of their experimental design by stating that they cannot rule out the contribution of compound carryover to their results.

      Thanks for your comments. We employed ultrafiltration centrifugal partition to remove the unbound small molecule compounds. Due to the large molecular weight of TcdB, approximately 270 kDa, we selected a 100 kDa molecular weight cutoff ultrafiltration membrane. The centrifugation was performed at 4000 g for 5 min to eliminate the compounds that did not bind to TcdB. We have incorporated the relevant methods and discussed the potential impacts on the respective sections of the manuscript.

      In line 142, authors added the molar concentration of caffeic acid, as requested. Although this helps, it is even more important that molar concentrations are added every time a compound concentration is mentioned. For instance, just 2 lines down there is another mention of a compound concentration. It would be informative if authors also added molar concentrations here and throughout the manuscript.

      Thanks for your comments. In our initial test design, we utilized the concentration unit of μg/mL. However, during the conversion to μM using the dilution method, some values do not result in neat, whole numbers. For instance, the conversion of 32 μg/mL of caffeic acid phenethyl ester yields 112.55 μM, which appears somewhat irregular when expressed in this manner.
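      The conversion mentioned above can be sketched in a few lines. This is only an illustration: the molar mass of caffeic acid phenethyl ester (about 284.31 g/mol for C17H16O4) is an assumption supplied here, not a value given in the response, and the function name is hypothetical.

```python
# Assumed molar mass of caffeic acid phenethyl ester (C17H16O4), in g/mol.
# This value is not stated in the response; it is supplied for illustration.
CAPE_MOLAR_MASS_G_PER_MOL = 284.31


def ug_per_ml_to_um(conc_ug_per_ml: float, molar_mass_g_per_mol: float) -> float:
    """Convert a mass concentration in ug/mL to a molar concentration in uM."""
    # ug/mL is the same as mg/L; dividing by g/mol gives mmol/L,
    # and multiplying by 1000 converts mmol/L to umol/L (uM).
    return conc_ug_per_ml / molar_mass_g_per_mol * 1000.0


if __name__ == "__main__":
    # The two concentrations discussed for Figure 3SB.
    for conc in (32, 64):
        molar = ug_per_ml_to_um(conc, CAPE_MOLAR_MASS_G_PER_MOL)
        print(f"{conc} ug/mL = {molar:.2f} uM")
```

      Under that assumed molar mass, 32 μg/mL comes out at roughly the 112.55 μM quoted above, so reporting both units side by side would be straightforward even where the molar values are not round numbers.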

      Line 277. For the sake of clarity, I would strongly suggest that authors use the term "control mice" instead of "model mice".

      Thanks for your comments. We have modified “model mice” to “control mice” throughout the manuscript.

      In line 302, the word taxa should not be capitalized. I capitalized it in my original comments simply to draw attention to it.

      Thanks for your comments. We have modified this word.

      In the section starting in line 318, authors still need to include details of how the metabolomics analyses were performed. Just stating that samples were "frozen for metabolomics analyses" is not enough. Was this mass-spec or NMR-based metabolomics. Assuming it was mass-spec, what kind? How was metabolite identity assigned? Etc, etc. These are important details, which need to be included.

      Thanks for your comments. We have added some metabolomics methods in the corresponding section.

      In line 338, the authors misunderstood my original comment. This sentence should read "...the final product of purine degradation, were markedly decreased in mice after...".

      Thanks for your comments. We have modified this sentence.

      Panels of figure 3 are still incorrectly labeled. The secondary structure predictions are shown in A and C, not A and B as is currently stated in the legend.

      Thanks for your comments. We have modified the figure legend in Figure 3.

      About Figure 5C, I thank the authors for the clarification, but this explanation should be included in the figure legend.

      Thanks for your comments. We have added the relevant information to the figure legend.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review):

      Batra, Cabrera and Spence et al. present a model which integrates histone posttranslational modification (PTM) data across cell models to predict gene expression with the goal of using this model to better understand epigenetic editing. This gene expression prediction model approach is useful if a) it predicts gene expression in specific cell lines b) it predicts expression values rather than a rank or bin, c) if it helps us to better understand the biology of gene expression or d) it helps us to understand epigenome editing activity. Problematically for point a) and b) it is easier to directly measure gene expression than to measure multiple PTMs and so the real usefulness of this approach mostly relates to c) and d).

      We appreciate this point from Reviewer #1 and the instructive comments and helpful feedback on our study. We designed our approach keeping in mind that the primary use case is to understand how epigenome editing would affect gene expression.

      Other approaches have been published that use histone PTM to predict expression (e.g. PMID 27587684, 36588793). Is this model better in some way? No comparisons are made although a claim is made that direct comparisons are difficult. I appreciate that the authors have not used the histone PTM data to predict gene expression levels of an "average cell" but rather that they are predicting expression within specific cell types or for unseen cell types. Approaches that predict expression levels are much more useful whereas some previous approaches have only predicted expressed or not expressed or a rank order or bin-based ranking. The paper does not seem to have substantial novel insights into understanding the biology of gene expression.

      We thank Reviewer #1 again for this insightful comment. We have included citations for a series of papers (PMIDs: 27587684, 30147283, 36588793) that performed gene expression prediction using histone PTM data. However, each of these methods performs classification of gene expression as opposed to predicting the actual gene expression value via regression. Additionally, the referenced studies all work with Roadmap Epigenomics read-depth data as opposed to p-values obtained from the ENCODE pipelines, making it difficult to make direct comparisons. We outline in the Discussion section that creating a comprehensive dataset of epigenome editing outcomes, including quantification of histone PTMs before and after in situ perturbations, will improve our understanding of the effects of dCas9-p300 on gene expression and assist in the design of gRNAs for achieving fine-tuned control over gene expression levels. In this revised version of our study, we have also added new data (Figure 3 – figure supplement 3) to further benchmark our model against others.

      The approach of using this model to predict epigenetic editor activity on transcription is interesting and to my knowledge novel although only examined in the context of a p300 editor. As the authors point out, the interpretation of the epigenetic editing data is convoluted by things like sgRNA activity scoring and to fully understand the results likely would require histone PTM profiling and maybe dCas9 ChIP-seq for each sgRNA which would be a substantial amount of work.

      We agree with the Reviewer and view these experiments as important components of future studies.

      Furthermore, from the model evaluation of H3K9me3 it seems the model is performing modestly for other forms of epigenetic or transcriptional editing, e.g. we know for the best studied transcriptional editor which is CRISPRi (dCas9-KRAB) that recruitment to a locus is associated with robust gene repression across the genome and is associated with H3K9me3 deposition by recruitment of KAP1/HP1/SETDB1 (PMID: 35688146, 31980609, 27980086, 26501517).

      This is an interesting point. We have included new data (Figure 4 – figure supplement 1) that quantify how sensitive the trained gene expression model is to perturbations in H3K9me3. Indeed, our data suggest that the model predictions are sensitive to perturbations in H3K9me3. For instance, there is a clear decrease and a gradual increase as the position where the perturbation is performed moves from upstream to downstream of the TSS. Additionally, the magnitude of the predicted fold-change is a function of how much the H3K9me3 is perturbed and hence the magnitude of change would be even higher if the perturbation magnitude is increased. However, this precise magnitude is hard to estimate in the absence of experimental perturbation data for H3K9me3. Leveraging our model in combination with KRAB-based CRISPRi is an exciting and important aspect of future studies.

      One concern overall with this approach is that dCas9-p300 has been observed to induce sgRNA independent off target H3K27Ac (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8349887/ see Figure S5D) which could convolute interpretation of this type of experiment for the model.

      This remains an excellent point and indeed, we and others have observed that dCas9-p300 can result in off-target H3K27ac levels (both increased and suppressed) across the genome. Our study focused on p300, because the molecule is one of the few known proteins that can catalyze H3K27ac in the human genome, and H3K27ac remains a proxy for active genomic regulatory elements. Nevertheless, any off target activity of dCas9-p300 could certainly convolute our analyses. We have included language to address this caveat in our discussion.

      Reviewer #2 (Public review):

      Summary:

      The authors build a gene expression model based on histone post-translational modifications, and find that H3K27ac is correlated with gene expression. They proceed to perturb H3K27ac at 13 gene promoters in two cell types, and measure gene expression changes to test their model.

      We remain appreciative of the constructive feedback and input from Reviewer #2 on our manuscript.

      Strengths:

      The combination of multiple methods to model expression, along with utilizing 6 histone datasets in 13 cell types allowed the authors to build a model that correlates between 0.7-0.79 with gene expression. They use dCas9-p300 fusions to perturb H3K27ac and monitor gene expression to test their model. Ranked correlations of the HEK293 data showed some support for the predictions after perturbation of H3K27ac.

      Weaknesses:

      The perturbation of 5 genes in K562 with perturb-seq data shows a modest correlation of ~0.5 and isn't included in the main figures. The authors are then left to speculate reasons why the outcome of epigenome editing doesn't fit their predictions, which highlights the limited value in the current version of this method.

      We agree with the reviewer’s suggestion and highlight in our conclusion that generating epigenome editing data across a variety of cell types and across many genes will help uncover the underlying mechanisms of gene expression modulation.

      As mentioned before, testing genes that were not expressed being most activated by dCas9-p300 weaken the correlations vs. looking at a broad range of different gene expression as the original model was trained on.

      We appreciate this comment from Reviewer #2. We note that the data generated from this dCas9-p300 perturb-seq experiment used gRNAs from a pre-existing library published previously (PMID: 37034704). While this library enabled deeper interrogation of dCas9-p300 driven effects compared to our previous revision, the gRNAs in this library were designed against genes associated with haploinsufficiency in neuronal cell types, and which were generally lowly-expressed in K562 cells. Further, we restricted our analysis here to promoter-proximal gRNAs (as opposed to enhancer-targeted gRNAs in the library), focusing our scope even more so. Thus the genes ultimately used for analysis are enriched for low expression.

      If the authors want this method to be used to predict outcomes of epigenome editing, expanding to dCas9-KRAB and other CRISPRa methods (SAM and VPR) would be useful. Those datasets are published and could be analyzed for this manuscript.

      This is an exciting suggestion from Reviewer #2. We agree, and view this as a component of future work in this area.

      The authors don't compare their method to other prediction methods.

      In this revised version of our study, we have also added new data (Figure 3 – figure supplement 3) to further benchmark our model against others. These data demonstrate that our CNN model outperforms existing approaches across multiple cell types.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Looking at the individual genes in K562 shows a random looking range of predictions and observed, with the exception of Bcl11A which is one of two genes in this set of 5 that are not expressed. I will repeat my earlier comment, that epigenome editing and CRISPRa methods generally show the most upregulation with the lowest expressed genes. I speculate that plotting endogenous expression vs. outcome (assuming using all gRNAs within a reasonable and similar distance to TSS) would produce a correlation of -0.5 or greater and be as useful as this method.

      We agree, and believe that this demonstrates more work is needed in this emerging research area.

      The methods describe Perturb-seq analysis but not the bench experiments.

      We have added the bench methods related to our Perturb-seq experiments to our revised manuscript under the Experimental Methods section in the Appendix.

      I don't understand why the authors can't compare to other methods as that is fairly standard in new prediction papers. I get that others used REMC vs. ENCODE, and were rank or binary based, but the authors could use REMC data and/or convert their data to ranked or binary and still compare. Lacking that it's hard to judge this manuscript.

      We have added benchmarking against existing methods as Figure 3 – figure supplement 3.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Our revised manuscript thoroughly addresses all comments and suggestions raised by the reviewers, as detailed in our point-by-point response. To strengthen our findings, we have conducted additional in vivo experiments to evaluate the presence of fibro-adipogenic progenitors (FAPs) at different time points during HO formation in control and BYL719-treated mice. Our results indicate that BYL719 reduces the accumulation of FAPs and promotes muscle fiber regeneration in vivo. We have also expanded our discussion on BYL719’s effects on mTOR signaling, further clarifying key points raised by Reviewer #1, and have addressed all minor comments.

      Additionally, in response to Reviewer #2, we have employed an orthogonal and complementary approach using a new model. We conducted chondrogenic differentiation experiments with murine MSCs expressing either ACVR1wt or ACVR1<sup>R206H</sup>. qPCR analysis of chondrogenic gene markers (Sox9, Acan, Col2a1) demonstrates that Activin A enhances their expression in ACVR1<sup>R206H</sup> cells, whereas BYL719 strongly suppresses their expression, regardless of ACVR1 mutational status. These new data further confirm that BYL719 effectively inhibits genes involved in ossification and osteoblast differentiation, independent of the ACVR1 mutation. We have also expanded our discussion to further clarify points raised by Reviewer #2 and have addressed all remaining minor comments.

      Below, we provide a detailed point-by-point response to the reviewers’ comments:

      Reviewer #1:

      Point 1: In this revised manuscript, the authors clearly showed that BYL719 suppressed the proliferation and differentiation of murine myoblasts, C2C12 cells, in addition to human MSCs in vitro. Furthermore, BYL719 decreased migratory activity in vitro in monocytes and macrophages without suppressing proliferation. Overall, these data suggested that BYL719 is not a specific chemical compound for cell types or signaling pathways as mentioned in the manuscript by the authors themselves. Therefore, it was still unclear how to explain the molecular mechanisms in inhibition of HO by the compound in a specific signaling pathway in a specific cell type, MSCs, contradicting many other possibilities. The authors should add logical explanations in the manuscript.

      Regarding its selectivity, BYL719 is a potent and highly selective inhibitor of PI3Kα. This has been demonstrated in multiple studies and in several in vitro kinase assay panels (Furet et al. PMID: 23726034, Fritsch et al. PMID: 24608574). The IC50 or Kd values for BYL719 against PI3Kα were at least 50 times lower than for most other kinases tested. Moreover, BYL719 is also highly selective for PI3Kα (IC50 = 4.6 nmol/L) compared to the other class I PI3Ks (PI3Kβ (IC50 = 1,156 nmol/L), PI3Kδ (IC50 = 290 nmol/L), PI3Kγ (IC50 = 250 nmol/L)) (Fritsch et al). Consistent with these data, we show that, at the concentrations tested, BYL719 does not have a direct effect on any kinase receptor within the TGF-β superfamily, including ACVR1 or ACVR1<sup>R206H</sup>.
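      As a quick check of the selectivity figures quoted above, the fold difference between each isoform and PI3Kα can be computed directly from the IC50 values. The sketch below uses only the numbers cited in this response; the dictionary keys and the function name are illustrative choices, not part of any published analysis.

```python
# IC50 values in nmol/L, as quoted in this response (attributed to Fritsch et al.).
IC50_NMOL_PER_L = {
    "PI3K-alpha": 4.6,
    "PI3K-beta": 1156.0,
    "PI3K-delta": 290.0,
    "PI3K-gamma": 250.0,
}


def fold_selectivity(ic50_by_kinase: dict, target: str) -> dict:
    """Return IC50(other kinase) / IC50(target) for every non-target kinase."""
    reference = ic50_by_kinase[target]
    return {
        kinase: ic50 / reference
        for kinase, ic50 in ic50_by_kinase.items()
        if kinase != target
    }


if __name__ == "__main__":
    # A higher fold value means weaker inhibition relative to PI3K-alpha.
    for kinase, fold in fold_selectivity(IC50_NMOL_PER_L, "PI3K-alpha").items():
        print(f"{kinase}: {fold:.0f}-fold higher IC50 than PI3K-alpha")
```

      With these inputs every class I isoform sits above the 50-fold margin mentioned in the text, with PI3Kγ the closest to it.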

      Rather than blocking ACVR1 kinase activity, in our manuscript we provide evidence that BYL719 has the potential to inhibit osteochondroprogenitor specification and prevent an exacerbated inflammatory response in vivo (Valer et al., 2019a PMID: 31373426, and this manuscript) through different mechanisms, such as (i) increasing SMAD1/5 degradation, (ii) reducing transcriptional responsiveness to BMPs and Activin, (iii) blocking non-canonical ACVR1 responses such as the activation of AKT/mTOR. All these defined molecular mechanisms contribute to suppress HO in vitro and in vivo, as we report and explain throughout the manuscript. Selective PI3Kα inhibition is at the core of the different molecular pathways described. As such, PI3Kα blockade inhibits the phosphorylation of GSK3 and compromises SMAD1 protein stability, thereby altering canonical responsiveness and osteochondroprogenitor specification (Gamez et al PMID: 26896753; Valer et al PMID: 31373426). Moreover, PI3Kα blockade downregulates Akt/mTOR signalling, which is critical for FOP and non‐genetic (trauma induced) HO in preclinical models (Hino et al, 2017 PMID: 28758906; Hino et al. PMID: 30392977). Finally, PI3Kα inhibition hampers a number of proinflammatory pathways, thereby limiting the expression of pro-inflammatory cytokines, reducing the proliferation of monocytes, macrophages and mast cells, and partially blocking the migration of monocytes. As we suggest in the discussion of the manuscript, this effect likely causes a poor recruitment of monocytes and macrophages at injury sites and throughout the in vivo ossification process.

      Noteworthy, in our manuscript we do not refer to a “specific chemical compound for cell types”. Rather, in the Discussion we write “the administration of BYL719 prevented an exacerbated inflammatory response in vivo, possibly due to specific effects observed on immune cell populations.” This sentence did not intend to imply that BYL719 only affects these specific cell types, but aimed to emphasize the effects observed on those cell populations, even though systemic BYL719 may affect all populations. We rephrased it to “the administration of BYL719 prevented an exacerbated inflammatory response in vivo, possibly due to the effects observed on immune cell populations.” to provide a clearer message as suggested by the reviewer. We thank the reviewer for these questions and hope that these explanations and changes in the text improve the clarity of the message.

      Mesenchymal stem/stromal cells (MSCs) are osteochondroprogenitor cells that can follow distinct differentiation paths. In this study, we use these cells as an in vitro model for the study of osteochondroprogenitor specification. MSCs, and induced MSCs (iMSCs), have been widely used as in vitro cellular models of osteochondroprogenitor specification for the analysis of markers, signaling, modulation, and differentiation potential or capacity. Their use as models for this purpose has been extensively studied in wild type MSCs, and in the presence of FOP mutations (Boeuf and Richter PMID: 20959030; Schwartzl et al. PMID: 37923731).

      Point 2: Related to comment #1, the effects of BYL719 on the proliferation and differentiation of fibro-adipogenic cells in skeletal muscle, which are potential progenitor cells of HO, should be important to support the claim of the authors.

      We have performed additional in vivo experiments to assess the presence of fibro-adipogenic progenitors (FAPs) at different time points during HO formation in control and BYL719-treated mice in the mouse model of heterotopic ossification, quantifying the number of FAPs during the progression of HO. These data are shown in the new Figure 3-figure supplement 1. We demonstrate that BYL719 reduces the number of PDGFRA+ cells (FAPs, red) throughout the ossification process in vivo. Moreover, we now also show an enlargement of the diameter of myofibers (labelled with wheat germ agglutinin, green) when animals were treated with BYL719, indicating improved muscle regeneration and further validating the data reported as supplementary figures that were added in the first revision of this manuscript.

      Point 3: BYL719 inhibited signaling through not only ACVR1-R206H and ACVR1-Q207D but also wild type ACVR1 and suppressed the chondrogenic differentiation of parental MSCs regardless of the expression of wild type or mutant ACVR1. Again, these findings suggest that BYL719 inhibits HO through a multiple and nonspecific pathway in multiple types of cells in vivo. The authors are encouraged to explain logically the use of bone marrow-derived MSCs to examine the effects of BYL719.

      As detailed in main point 1, we consider that the main target, molecular mechanisms and pathways inhibited by BYL719 are specific and well characterised in other research articles and further defined in this manuscript, including the generation of PI3Kα-deficient mice in an FOP background, which undoubtedly demonstrates an essential role for PI3Kα in ACVR1-driven heterotopic ossification in vivo. Altogether, we are confident that BYL719 inhibits HO through multiple and specific pathways that arise from the PI3Kα inhibition. As a systemically administered drug, BYL719 affects the multiple types of cells in vivo that express PI3Kα. It is well known that PI3Kα is exquisitely required for chondrogenesis and osteogenesis (Zuscik et al. PMID; Gamez et al PMID: 26896753 1824619). Accordingly, throughout the manuscript we refrain from suggesting a specific effect on ACVR1-R206H cells and instead describe an inhibitory effect on cell number and differentiation regardless of the ACVR1 form expressed.

      Similarly, as detailed in main point 1, MSCs and hiPSCs have been extensively used as in vitro cellular models of osteochondroprogenitor specification for the analysis of markers, signaling, modulation, and differentiation potential or capacity (Barruet et al., PMID: 28716551; Kan et al., PMID: 39308190).

      Point 4: BYL719 clearly inhibits an mTOR pathway. Is there a possibility that BYL719 suppresses HO by inhibiting mTOR rather than PI3K? The authors are encouraged to show the unique role of PI3K in BYL719-suppressed HO formation.

      As clarified above, BYL719 is a potent and selective inhibitor of PI3Kα, with minimal off-target inhibition against other kinases, as has been demonstrated in multiple studies and in several in vitro kinase assay panels. In the same study, while the IC50 of BYL719 against PI3Kα was 4.6 nmol/L, the IC50 against mTOR was >9,100 nmol/L, indicating that mTOR was not directly inhibited. mTOR is one of the well-known pathways that are activated downstream of PI3K. Therefore, it is no surprise that blocking PI3Kα will block mTOR signalling. This potential effect was already demonstrated in previous publications (Valer et al., 2019a PMID: 31373426) and discussed throughout the first revision. We consider that the additive effect of mTOR inhibition and other molecular mechanisms downstream of PI3Kα, including reduced SMAD1/5 protein levels, contribute to the in vivo HO inhibition by BYL719.

      Reviewer #2:

      Point 1: It is also important to note that, in most of the data, there is no significant difference between cells with wild-type ACVR1 and those with the R206H mutation. The authors demonstrated that ACVR1 is not a target of BYL719 based on NanoBRET assay data, suggesting that BYL719's effect is not specific to FOP cells, even though they used an FOP mouse model to show in vivo effects.

      The main effect of R206H mutation is the gain of function in response to Activin A. For most of the responses to other ACVR1 ligands (e.g. BMP6/7), we observe a slightly increased response in the presence of the mutation (which is consistent with previous research, usually labelling RH as a “weak activating mutant” unless Activin A is added (Song et al., PMID: 20463014)). Therefore, as expected, most of the differences between WT and RH mutant cells can be observed mostly upon Activin A addition, as observed, for example, in Figure 3 of our manuscript.

      We agree with the reviewer that, at the concentrations used, BYL719 does not specifically target FOP cells. However, we believe that it targets downstream pathways of PI3Kα inhibition that are essential for osteochondrogenic specification, regardless of mutation status. This therapeutic strategy aligns with other experimental drugs, including Palovarotene (validated for FOP) and Garetosmab and Saracatinib (in advanced clinical trials), which target Activin A function, ACVR1 activity, or osteochondrogenic differentiation irrespective of the mutant allele. Unlike these molecules, BYL719 has been chronically administered to patients (including children) without major side effects (Gallagher et al.; PMID: 38297009), further supporting its potential for safe long-term use.

      The authors should consider that the effect of Activin A on R206H cells is not identical to that of BMP6 on WT cells. If the authors aim to identify the target of BYL719 in FOP cells, they should compare R206H cells treated with Activin A/BYL719 to WT cells treated with BMP6/BYL719.

      We use Activin A and BMP6, both high-affinity ACVR1 ligands, to demonstrate, as observed in figure 6, that PI3Kα inhibition can inhibit the expression of genes within GO terms ossification and osteoblast differentiation. It is important to note, however, that Activin A canonical signaling receptor is ACVR1B. Since BYL719 blocks the induction of a heterotopic ossification gene expression signature common to Activin A and BMP6, in the context of the FOP mutation R206H, our results indicate that BYL719 inhibition affects a signaling pathway downstream of ACVR1, activated by either BMP6 (wild type receptor, relevant for non-genetic heterotopic ossifications) or Activin (R206H mutant receptor, relevant for FOP).

      We consider that the comparison (RH ACTA BYL vs WT BMP6 BYL) would provide confounding results arising from intrinsic model differences in basal expression programs (WT vs RH), and differences in the quantitative level of signaling of the different ligands at these specific doses. First, if we only consider SMAD1/5 signaling, Activin A and BMP6 won’t have identical signaling, and differences will arise from the strength of that signaling. Secondly, in the suggested comparison we would find, mostly, all the differential gene expression promoted by Activin A canonical signaling through type I receptors ACVR1B/ALK4 in complex with ACVR2A or ACVR2B, promoting SMAD2/3 activation (in addition to the altered signaling that ACVR1-R206H could promote). Examples of differential response in pSMAD1/5 in ACVR1-WT or RH with BMP ligands and R206H with Activin A ligand, and examples of pSMAD2/3 canonical signaling in R206H cells have been described (Ramachandran et al, PMID: 34003511; Hatsell et al., PMID: 26333933).

      Point 2: The interpretation of the data in the new Figure 5 is inappropriate. Based on the expression levels of SOX9, COL2A1, and ACAN, it is unclear whether the effect of BYL719 is due to the inhibition of differentiation or proliferation. The addition of Activin A showed no difference between ACVR1/WT and ACVR1/R206H cells, suggesting that these cells did not accurately replicate the FOP condition.

      To gain consistency in our manuscript, we decided to use an orthogonal and complementary approach in a completely new model. We performed new experiments of chondrogenic differentiation using murine MSCs from UBC-Cre-ERT2/ACVR1<sup>R206H</sup> knock-in mice. These cells, when treated with 4OH-tamoxifen, express the intracellular exons of human ACVR1<sup>R206H</sup> in the murine Acvr1 locus. Therefore, we can compare differentiation of wild type and R206H MSCs isolated from the same mice. We initiated the chondrogenic differentiation assay from confluent cells to minimize changes in cell proliferation throughout the process. These new results are shown in the new Figure 5F. Mutant (RH) cells display an enhanced chondrogenic response to activin A compared to wild type cells. The treatment with BYL719 decreased the expression of chondrogenic markers irrespective of the mutational status of ACVR1 in the cells, further supporting our previous results in this manuscript and published article (Valer et al., 2019a PMID: 31373426).

      Point 3: The additional investigation of RNA-seq data provided useful information but was insufficient to fully address the purpose of this study. The authors should identify downregulated genes by comparing WT cells treated with Activin A/BYL719 and Activin A alone and then compare these identified genes with those shown in Figure 5E. Additionally, they should compare R206H cells treated with Activin A/BYL719 to WT cells treated with BMP6/BYL719. These comparisons will clarify whether there are FOP-specific BYL719-regulated genes.

      We thank the reviewer for considering that RNAseq data provides useful information. As already discussed in our answer above, our results indicate that regardless of the ligand (Activin A or BMP6) and regardless of the ACVR1 mutation (WT, relevant for non-genetic heterotopic ossifications or RH, relevant for FOP), BYL719 can inhibit the expression of the genes relevant to endochondral ossification. In our opinion, this is a very relevant conclusion of this study.

      We have deeply considered the strategy proposed by the reviewer, comparing “WT cells treated with Activin A/BYL719 and Activin A alone and then compare these identified genes with those shown in Figure 5E” and/or comparing “R206H cells treated with Activin A/BYL719 to WT cells treated with BMP6/BYL719”. While we have discussed why we do not consider appropriate the first comparison proposed, there are a number of reasons why we are not confident that the second comparison would provide a straightforward conclusion.

      Regarding the second suggested comparison, already addressed in Main point 1, we consider that it would provide confounding results for the reasons detailed there. Regarding the first suggested comparison, we also consider that it would provide confounding results. There are several reasons why we do not consider that the genes only found in the RH comparison can be confidently considered genes that are only affected by BYL719 in RH cells.

      First, the effect of BYL719 in an osteogenic-prone sample (for example, RH-ActA) is higher than the effect that we can observe in the absence of this activation (for example, WT-ActA), as observed in the higher number of significantly downregulated genes in the RH ActA BYL vs RH ActA comparison, compared to WT ActA BYL vs WT ActA. Similar results are observed in figure 3C, where the expressions of the genes are significantly inhibited in RH ActA compared to RH ActA BYL. This inhibition is not significantly observed in WT ActA compared to WT ActA BYL because the osteogenic expression of these genes is already very weak in the absence of ACVR1 R206H. This weak signaling of pSMAD1/5 in the absence of osteogenic signaling (RH without ligand or, especially, WT with Activin A) has already been described (Ramachandran et al. PMID: 34003511). Therefore, even though the inhibition is present in both comparisons, as observed in figure 6C, the extent of the observed effect is different. Second, the comparisons being contrasted contain different numbers of DEGs. If we compare the 67 downregulated genes from one comparison and 38 downregulated genes from the other comparison, the unequal list size may inflate the number of unique genes in the group with more downregulated genes. To test these concerns, we performed the comparison that the reviewer suggested and we found, for example, that amongst the 38 differentially downregulated ossification genes in (WT_ActA_BYL vs WT_ActA) and 67 differentially downregulated ossification genes in (RH_ActA_BYL vs RH_ActA), 39 genes were only found in the RH comparison, while 10 were only found in the WT comparison, and 28 were found in both.
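      The bookkeeping behind the overlap counts reported above can be sketched with plain set operations. The actual gene lists are not given in this response, so the identifiers below are hypothetical placeholders constructed only to reproduce the reported counts (38 and 67 genes, 28 shared); the function name is likewise illustrative.

```python
def partition_overlap(list_a: set, list_b: set) -> dict:
    """Split two gene sets into A-only, shared, and B-only subsets."""
    return {
        "only_a": list_a - list_b,
        "shared": list_a & list_b,
        "only_b": list_b - list_a,
    }


# Placeholder identifiers (NOT real gene names), sized to match the reported counts.
shared_genes = {f"shared_{i}" for i in range(28)}
wt_down = shared_genes | {f"wt_only_{i}" for i in range(10)}   # WT_ActA_BYL vs WT_ActA
rh_down = shared_genes | {f"rh_only_{i}" for i in range(39)}   # RH_ActA_BYL vs RH_ActA

parts = partition_overlap(wt_down, rh_down)

if __name__ == "__main__":
    print("WT list size:", len(wt_down))
    print("RH list size:", len(rh_down))
    for name, genes in parts.items():
        print(name, len(genes))
    # Fraction of each list that is unique to it; the longer list
    # contributes more unique entries in absolute terms.
    print("unique fraction, WT:", round(len(parts["only_a"]) / len(wt_down), 2))
    print("unique fraction, RH:", round(len(parts["only_b"]) / len(rh_down), 2))
```

      The three subsets account for every gene in both lists (10 + 28 = 38 and 39 + 28 = 67), which is a useful sanity check before interpreting the genes unique to either comparison.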

      These effects are present, for example, when studying the ID genes, well-known downstream mediators of BMP signaling. In this case, ID1 is downregulated in both comparisons, while ID2, ID3, and ID4 are downregulated only in the RH group, despite the fact that ID1, ID2, ID3, and ID4 are all similarly regulated and increase their expression with similar time curves upon BMP signaling activation (Yang et al., PMID: 23771884). Therefore, we consider that the proposed comparisons will not help us to identify specific BYL719-regulated genes relevant for FOP and/or ACVR1 R206H signaling. Again, we consider that the effect of BYL719 is not specific to FOP cells. Our results show that regardless of the ligand (Activin A or BMP6) and regardless of the ACVR1 mutation (WT, relevant for non-genetic heterotopic ossifications, or RH, relevant for FOP), BYL719 can inhibit the expression of the genes linked to ossification and osteoblast differentiation, which could be important for the treatment of FOP and non-genetic heterotopic ossifications.
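
      The overlap counts reported above for the ossification genes (67 in the RH comparison, 38 in the WT comparison, 28 shared) can be checked with a short set-based sketch. The gene identifiers below are hypothetical placeholders, not the actual gene lists, and the function name `partition_degs` is an assumption introduced only for illustration.

```python
def partition_degs(rh_genes, wt_genes):
    """Split two downregulated-gene lists into RH-only, WT-only and shared sets."""
    rh, wt = set(rh_genes), set(wt_genes)
    return {"rh_only": rh - wt, "wt_only": wt - rh, "shared": rh & wt}


# Placeholder identifiers (hypothetical); only the list sizes come from the text.
shared = {f"shared_{i}" for i in range(28)}
rh_list = shared | {f"rh_{i}" for i in range(39)}  # 67 genes in total
wt_list = shared | {f"wt_{i}" for i in range(10)}  # 38 genes in total

parts = partition_degs(rh_list, wt_list)
counts = {name: len(genes) for name, genes in parts.items()}

# Fraction of each list that appears "unique" to its comparison.
rh_unique_fraction = counts["rh_only"] / len(rh_list)
wt_unique_fraction = counts["wt_only"] / len(wt_list)

print(counts)
print(round(rh_unique_fraction, 2), round(wt_unique_fraction, 2))
```

      With a fixed number of shared genes, the longer list necessarily contributes more "unique" genes, which is the list-size concern raised in the response.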

      Point 4: The data in Figure 7 are not relevant to the aim of this study because the cell lines used in these experiments did not have ACVR1/R206H mutations. The authors mentioned that BMP6 is a ligand for ACVR1 and, therefore, these experiments reflect the situation of inflammatory cells in FOP. This is inappropriate and not rational. As mentioned above, the effect of Activin A on FOP cells is not identical to the effect of BMP6 in wild-type cells. The data in Figure 7 indicated that the effect of BYL719 is unrelated to the presence of BMP6, clearly demonstrating that these experiments are not related to the activation of ACVR1. In the gene expression analyses, almost all genes showed no changes with the addition of BMP6. Only TGF and CCL2 showed upregulation in THP1 cells, and the treatment with BYL719 failed to inhibit the effect of BMP6, suggesting that these experiments merely demonstrate the effect of BYL719 on inflammatory cells irrespective of the presence of the HO signal.

      We consider that Figure 7 is relevant to the aim of this study. As shown in Fig. 8, treatment of FOP mice with BYL719 led to decreased recruitment of immune cells within the FOP lesions, suggesting a direct effect of BYL719 on immune cells. This is very relevant for the FOP pathology, since flare-ups have been linked with inflammatory episodes since the very early characterization of the disease (Mejias-Rivera et al., PMID: 38672135). Given the technical difficulties in transducing THP1, RAW264 and HMC1 cell lines with lentiviral particles carrying ACVR1 R206H, we decided to partially recapitulate ACVR1 R206H activation with recombinant BMP6 and to test the effect of BYL719 in these conditions. In these models, we found that BYL719 inhibited the expression of key genes driving immune cell activation, in a cell-type- and ligand-independent manner. To clarify this rationale, we have swapped Figures 7 and 8 and adjusted our conclusions accordingly. We have softened our interpretations, emphasizing the absence of the ACVR1 R206H mutant receptor in these experiments.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public review)

      Summary

      The results offer compelling evidence that L5-L5 tLTD depends on presynaptic NMDARs, a concept that has previously been somewhat controversial. It documents the novel finding that presynaptic NMDARs facilitate tLTD through their metabotropic signaling mechanism.

      We thank Reviewer 1 for their kind words and thoughtful feedback!

      Strengths

      The experimental design is clever and clean. The approach of comparing the results in cell pairs where NMDA is deleted either presynaptically or postsynaptically is technically insightful and yields decisive data. The MK801 experiments are also compelling.

      We are very grateful for this kind feedback!

      Weaknesses

      No major weaknesses were noted by this reviewer.

      We were happy to see that Reviewer 1 had no concerns in the Public Review. We address their Recommendations here below.

      Reviewer #1 (Recommendations for the authors):

      There is one minor issue that the authors might want to address. In Figure 6C, the average time course of the controls (blue symbols) shows a clear decline in the baseline. The rate of this decline appears to be similar to the initial decline rate observed after inducing tLTD.

      Sorry, the x-axis was truncated so the first data points were not visible. We fixed Fig 6C as well as 6G, which suffered from the same problem.

      Reviewer 2 (Public review)

      Summary

      The study characterized the dependence of spike-timing-dependent long-term depression (tLTD) on presynaptic NMDA receptors and the intracellular cascade after NMDAR activation possibly involved in the observed decrease in glutamate probability release at L5-L5 synapses of the visual cortex in mouse brain slices.

      We are grateful for Reviewer 2’s thoughtful and detailed feedback!

      Strengths

      The genetic and electrophysiological experiments are thorough. The experiments are well-reported and mainly support the conclusions. This study confirms and extends current knowledge by elucidating additional plasticity mechanisms at cortical synapses, complementing existing literature.

      We were thrilled to see that the reviewer thinks our experiments are “thorough”, “well-reported” and they “mainly support the conclusions”!

      Weaknesses

      While one of the main conclusions (preNMDARs mediating presynaptic LTD) is resolved in a very convincing genetic approach, the second main conclusion of the manuscript (non-ionotropic preNMDARs) relies on the use of a high concentration of extracellular blockers (MK801, 2 mM; 7-chlorokynurenic acid, 100 µM), but no controls for the specific actions of these compounds are shown.

      We thank the reviewer for calling our genetic approach “very convincing”!

      Regarding the pharmacological controls: for MK-801, we deliberately used a high extracellular concentration in the mM-range to match the intracellular concentrations used both in our own experiments and in prior studies (Berretta and Jones, 1996; Brasier and Feldman, 2008; Buchanan et al., 2012; Corlew et al., 2007; Humeau et al., 2003; Larsen et al., 2011; Rodríguez-Moreno et al., 2011; Rodríguez-Moreno and Paulsen, 2008). Our goal was to isolate the variable of application site (internal vs. external) while keeping concentration constant. If we had used the lower, more conventional µM-range extracellular concentrations (e.g., Huettner and Bean, 1988; Kemp et al., 1988; Tovar and Westbrook, 1999), differences in outcome might have reflected differences in drug efficacy rather than localization — particularly since failure to observe an effect at low concentrations would be hard to interpret.

      We now clarify this rationale in the revised manuscript (lines 578-585).

      As for 7-chlorokynurenic acid (7-CK), the 100 µM concentration we used is standard for effectively blocking the glycine-binding site of NMDARs (e.g., Nabavi et al., 2013).

      We also added two supplementary figures to show the effects of washing in MK-801 and 7-CK. In MK-801, responses are stable at low frequency (clarified in the manuscript lines 155-157 and Supp Fig 1 caption text). However, 7-CK suppresses responses appreciably, which takes time to stabilize. We clarify in the revised manuscript that in 7-CK experiments, we waited for this stabilization before inducing tLTD (lines 167-172 and Supp Fig 2 caption text). This additional suppression is consistent with 7-CK also acting as a potent competitive inhibitor of L-glutamate transport into synaptic vesicles (Bartlett et al., 1998).

      In addition, no direct testing for ions passing through preNMDAR has been performed.

      Sorry for being unclear; we have previously tested directly for ions passing through preNMDARs. For example, we showed blockade with Mg<sup>2+</sup> before (Abrahamsson et al., 2017; Wong et al., 2024), and we showed preNMDAR Ca<sup>2+</sup> supralinearities before (Abrahamsson et al., 2017; Buchanan et al., 2012). To improve the manuscript, we clarified the text accordingly (lines 140-141).

      It is not known if the results can be extrapolated to adult brain as the data were obtained from 11-18 days-old mice slices, a period during which synapses are still maturing and the cortex is highly plastic.

      Thank you, this is a good point. We address this point in the revised manuscript (lines 428-432). While our study focuses on the early postnatal period (P11–P18), when plasticity mechanisms are prominent and synaptic maturation is ongoing, we agree that extrapolation to the adult brain should be made with caution.

      Reviewer #2 (Recommendations for the authors):

      Points 1-3 were also found in the Public Review so are not addressed again here.

      (4) Results seem to be obtained in the absence of inhibition blocking and the role of inhibition in tLTD is not described. It should be indicated whether present results are obtained with or without the functional inhibitory synapse activation. If GABAergic synapses are not blocked authors need to show what happens when this inhibition is blocked.

      We agree that extracellular stimulation can inadvertently recruit inhibitory circuits. However, in our paired whole-cell recordings, synaptic responses are always subthreshold and exclusively reflect the direct connection between the two recorded neurons (Chou et al., 2024; Song et al., 2005). Under these conditions, inhibitory synapses are not activated, and we therefore did not apply GABAergic blockers. We thank the reviewer for raising this, which is now clarified in the Methods (lines 539-541) of the revised manuscript.

      (5) In some figures, the number of experiments seems to be low, and this number of experiments might be increased (Figures 1C, 3C, 4B).

      We acknowledge that the number of experiments in these figures is modest, but these recordings are technically demanding, and the data are carefully curated. Importantly, the observed effects were statistically significant, indicating that the sample sizes were sufficient. We also note that concerns about statistical power are typically more critical in the case of negative or null results, whereas our findings were positive.

      (6) The discussion is detailed but it is not clear that the activation of JNK2 needs to be achieved by a non-ionotropic action of NMDAR as activation after ionotropic NMDAR activation has been described in the literature. This point needs to be clarified and expanded.

      Sorry that we were unclear on this point. We clarified this on lines 371-372 of the manuscript.

      (7) Adding a cartoon/schematic summarizing the proposed mechanism for tLTD would help the reading of the manuscript.

      We appreciate this suggestion and agree that a schematic would be helpful. However, we prefer to hold off on including one at this stage, as aspects of the underlying mechanism — particularly the role of CB1 receptors in presynaptic pyramidal cells (Sjöström et al., 2003) — are currently under active investigation in a separate project. To avoid potentially misleading oversimplifications, we would prefer to revisit a summary schematic once these uncertainties have been resolved.

      Minor:

      (1) Concentration of compounds is recommended to be included in the figures or in the text. This would make it easy to follow the results.

      We appreciate the suggestion. However, we avoid repeating concentrations to emphasize that conditions are consistent unless otherwise stated. All compound concentrations are clearly listed in the Methods and remain unchanged across experiments. We believe this streamlined approach avoids redundancy while keeping the results clear.

      (2) In some figures, failures in synaptic transmission can be observed (and changes after tLTD). The authors may analyse changes in a number of failures in synaptic transmission after tLTD as an additional indication of a presynaptic expression of this form of tLTD. PPR may also be included in all figures.

      While failures in synaptic transmission are occasionally visible, we chose to focus on CV analysis, which is mathematically equivalent to failure rate analysis, as both rely on the same underlying variability in synaptic responses (Brock et al., 2020). Provided failures are reliably extracted (which requires sufficient signal-to-noise), CV and failure rate analyses should yield consistent conclusions.

      In contrast, PPR analysis is not mathematically equivalent to CV analysis and may offer complementary insights into presynaptic mechanisms. However, the presence of preNMDARs complicates the use of paired-pulse stimulation during baseline: preNMDARs enhance release during high-frequency activity (Abrahamsson et al., 2017; Sjöström et al., 2003; Wong et al., 2024), so repeated stimulation can suppress synaptic responses when preNMDARs are blocked, potentially confounding interpretation. For this reason, we limited PPR analysis to Figures 5 and 6, where conditions were appropriate.

      Admittedly, our manuscript was previously not clear on when we did paired-pulse stimulation and when we did not. We have clarified this in the revised manuscript (lines 548-551 and lines 569-574).
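
      The link between CV and failure rate described above can be illustrated with a minimal sketch. It assumes a simple binomial release model with N independent release sites, a uniform release probability p and a quantal size q; this model and the example parameter values are assumptions made for illustration and are not taken from the manuscript.

```python
import math


def binomial_release_stats(n_sites, p_release, quantal_size=1.0):
    """Summary statistics for a simple binomial release model (an assumption)."""
    mean = n_sites * p_release * quantal_size
    variance = n_sites * p_release * (1.0 - p_release) * quantal_size ** 2
    cv = math.sqrt(variance) / mean
    # A failure occurs when none of the N sites releases.
    failure_rate = (1.0 - p_release) ** n_sites
    return {
        "mean": mean,
        "cv": cv,
        "inv_cv2": 1.0 / cv ** 2,
        "failure_rate": failure_rate,
    }


# Hypothetical example: release probability halves after tLTD, N and q unchanged.
before = binomial_release_stats(n_sites=5, p_release=0.5)
after = binomial_release_stats(n_sites=5, p_release=0.25)

for label, stats in (("before", before), ("after", after)):
    print(label, {key: round(value, 4) for key, value in stats.items()})
```

      Under these assumptions, a presynaptic reduction in p lowers 1/CV² and raises the failure rate together, which is why the two analyses are expected to point to the same locus of expression.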

      (3) Discussion: Line 363-64, hippocampal (SC-CA1 synapses) results exist where postsynaptic MK801 blocks presynaptic tLTD, this may be added here and in the references.

      While we acknowledge that postsynaptic MK-801 has been shown to block presynaptic tLTD at hippocampal SC–CA1 synapses, we note that the hippocampus is part of the archicortex, whereas our study focuses on neocortical circuits, as highlighted in the manuscript title. Given the substantial anatomical and functional differences between these regions, we prefer to keep our discussion focused on the neocortex to maintain conceptual coherence.

      (4) Discussion: While authors indicate "non-ionotropic" they do not discuss whether this action can be named properly "metabotropic" and whether G-proteins may be in fact needed for this action. The authors may briefly discuss this point.

      We previously referred to non-ionotropic NMDAR signaling as “metabotropic,” but reconsidered after discussions with colleagues, including Juan Lerma, who pointed out that the term typically implies G-protein coupling, which has not been definitively shown in this context. While the term “metabotropic” is used inconsistently in the literature (Heuss and Gerber, 2000; Heuss et al., 1999) — sometimes broadly to indicate non-ion flow signaling — we prefer to avoid potential confusion and therefore use “non-ionotropic” unless and until G-protein involvement is clearly demonstrated. We clarified this on lines 423-427 of the Discussion.

      (5) Page 19, line 451 NMDR needs to be corrected to NMDAR.

      Thanks! This was corrected.

      Reviewer 3 (Public review)

      Summary

      In this manuscript, "Neocortical Layer-5 tLTD Relies on Non-Ionotropic Presynaptic NMDA Receptor Signaling", Thomazeau et al. seek to determine the role of presynaptic NMDA receptors and the mechanism by which they mediate expression of frequency-independent timing-dependent long-term depression (tLTD) between layer-5 (L5) pyramidal cells (PCs) in the developing mouse visual cortex. By utilizing sophisticated methods, including sparse Cre-dependent deletion of GluN1 subunit via neonatal iCre-encoding viral injection, in vitro quadruple patch clamp recordings, and pharmacological interventions, the authors elegantly show that L5 PC->PC tLTD is (1) dependent on presynaptic NMDA receptors, (2) mediated by non-ionotropic NMDA receptor signaling, and (3) is reliant on JNK2/Syntaxin-1a (STX1a) interaction (but not RIM1αβ) in the presynaptic neuron. The study elegantly and pointedly addresses a long-standing conundrum regarding the lack of frequency dependence of tLTD.

      We thank the reviewer for calling our methods “sophisticated” and our study “elegant”! We appreciate the kind feedback!

      Strengths

      The authors did a commendable job presenting a very polished piece of work with high-quality data that this Reviewer feels enthusiastic about. The manuscript has several notable strengths. Firstly, the methodological approach used in the study is highly sophisticated and technically challenging and successfully produced high-quality data that were easily accessible to a broader audience. Secondly, the pharmacological interventions used in the study targeted specific players and their mechanistic roles, unveiling the mechanism in question step-by-step. Lastly, the manuscript is written in a well-organized manner that is easy to follow. Overall, the study provides a series of compelling evidence that leads to a clear illustration of mechanistic understanding.

      We are elated that the reviewer described our study with words such as “polished”, “high-quality”, “sophisticated”, and “compelling”!

      Minor comments

      (1) For the broad readership, a brief description of JNK2-mediated signaling cascade underlying tLTD, including its intersection with CB1 receptor signaling may be desired.

      Thank you, this is a great suggestion for improving clarity. We briefly address this point in the revised manuscript (lines 360-363).

      (2) The authors used juvenile mice, P11 to P18 of age. It is a typical age range used for plasticity experiments, but it is also true that this age range spans before and after eye-opening in mice (~P13) and is a few days before the onset of the classical critical period for ocular dominance plasticity in the visual cortex. Given the mechanistic novelty reported in the study, can authors comment on whether this signaling pathway may be age-dependent?

      Thanks, Reviewer 2 also raised this point. In the revised manuscript, we discuss this point (lines 428-432).

      Reviewer #3 (Recommendations for the authors):

      (1) Minor typos: page 4 line 101: sensitivity -> sensitive.

      We fixed this typo.

      (2) Page 15 line 333: sensitivity -> sensitive.

      We fixed this typo.

      (3) Minor aesthetic suggestion: On the scale bars for all examples, LTP and LTD data are easily confused with the letter L. I'd suggest flipping them left to right.

      We thank the reviewer for the suggestion. We flipped the scale bars in all figures.

      References

      Abrahamsson, T., Chou, C.Y.C., Li, S.Y., Mancino, A., Costa, R.P., Brock, J.A., Nuro, E., Buchanan, K.A., Elgar, D., Blackman, A.V., et al. 2017. Differential Regulation of Evoked and Spontaneous Release by Presynaptic NMDA Receptors. Neuron 96: 839-855 e835

      Bartlett, R.D., Esslinger, C.S., Thompson, C.M., and Bridges, R.J. 1998. Substituted quinolines as inhibitors of L-glutamate transport into synaptic vesicles. Neuropharmacology 37: 839-846

      Berretta, N., and Jones, R.S. 1996. Tonic facilitation of glutamate release by presynaptic N-methyl-D-aspartate autoreceptors in the entorhinal cortex. Neuroscience 75: 339-344

      Brasier, D.J., and Feldman, D.E. 2008. Synapse-specific expression of functional presynaptic NMDA receptors in rat somatosensory cortex. J Neurosci 28: 2199-2211

      Brock, J.A., Thomazeau, A., Watanabe, A., Li, S.S.Y., and Sjöström, P.J. 2020. A Practical Guide to Using CV Analysis for Determining the Locus of Synaptic Plasticity. Frontiers in Synaptic Neuroscience 12: 11. doi: 10.3389/fnsyn.2020.00011

      Buchanan, K.A., Blackman, A.V., Moreau, A.W., Elgar, D., Costa, R.P., Lalanne, T., Tudor Jones, A.A., Oyrer, J., and Sjöström, P.J. 2012. Target-Specific Expression of Presynaptic NMDA Receptors in Neocortical Microcircuits. Neuron 75: 451-466

      Chou, C.Y.C., Wong, H.H.W., Guo, C., Boukoulou, K.E., Huang, C., Jannat, J., Klimenko, T., Li, V.Y., Liang, T.A., Wu, V.C., and Sjöström, P.J. 2024. Principles of visual cortex excitatory microcircuit organization. The Innovation 6: 1-11

      Corlew, R., Wang, Y., Ghermazien, H., Erisir, A., and Philpot, B.D. 2007. Developmental switch in the contribution of presynaptic and postsynaptic NMDA receptors to long-term depression. J Neurosci 27: 9835-9845

      Heuss, C., and Gerber, U. 2000. G-protein-independent signaling by G-protein-coupled receptors. Trends in Neurosciences 23: 469-475

      Heuss, C., Scanziani, M., Gähwiler, B.H., and Gerber, U. 1999. G-protein-independent signaling mediated by metabotropic glutamate receptors. Nature Neuroscience 2: 1070-1077

      Huettner, J.E., and Bean, B.P. 1988. Block of N-methyl-D-aspartate-activated current by the anticonvulsant MK-801: selective binding to open channels. PNAS 85: 1307-1311

      Humeau, Y., Shaban, H., Bissière, S., and Lüthi, A. 2003. Presynaptic induction of heterosynaptic associative plasticity in the mammalian brain. Nature 426: 841-845

      Kemp, J.A., Foster, A.C., Leeson, P.D., Priestley, T., Tridgett, R., Iversen, L.L., and Woodruff, G.N. 1988. 7-Chlorokynurenic acid is a selective antagonist at the glycine modulatory site of the N-methyl-D-aspartate receptor complex. PNAS 85: 6547-6550

      Larsen, R.S., Corlew, R.J., Henson, M.A., Roberts, A.C., Mishina, M., Watanabe, M., Lipton, S.A., Nakanishi, N., Perez-Otano, I., Weinberg, R.J., and Philpot, B.D. 2011. NR3A-containing NMDARs promote neurotransmitter release and spike timing-dependent plasticity. Nat Neurosci 14: 338-344

      Nabavi, S., Kessels, H.W., Alfonso, S., Aow, J., Fox, R., and Malinow, R. 2013. Metabotropic NMDA receptor function is required for NMDA receptor-dependent long-term depression. PNAS 110: 4027-4032

      Rodríguez-Moreno, A., Kohl, M.M., Reeve, J.E., Eaton, T.R., Collins, H.A., Anderson, H.L., and Paulsen, O. 2011. Presynaptic induction and expression of timing-dependent long-term depression demonstrated by compartment-specific photorelease of a use-dependent NMDA receptor antagonist. J Neurosci 31: 8564-8569

      Rodríguez-Moreno, A., and Paulsen, O. 2008. Spike timing-dependent long-term depression requires presynaptic NMDA receptors. Nat Neurosci 11: 744-745

      Sjöström, P.J., Turrigiano, G.G., and Nelson, S.B. 2003. Neocortical LTD via coincident activation of presynaptic NMDA and cannabinoid receptors. Neuron 39: 641-654

      Song, S., Sjöström, P.J., Reigl, M., Nelson, S., and Chklovskii, D.B. 2005. Highly nonrandom features of synaptic connectivity in local cortical circuits. PLoS biology 3: e68

      Tovar, K.R., and Westbrook, G.L. 1999. The incorporation of NMDA receptors with a distinct subunit composition at nascent hippocampal synapses in vitro. J Neurosci 19: 4180-4188

      Wong, H.H., Watt, A.J., and Sjöström, P.J. 2024. Synapse-specific burst coding sustained by local axonal translation. Neuron 112: 264-276 e266

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public review):

      (1) “It is likely that metabolism changes ex vivo vs in vivo, and therefore stable isotope tracing experiments in the explants may not reflect in vivo metabolism.”

      We agree with the reviewer that metabolic changes may differ ex vivo versus in vivo. We now state: “Lastly, an important caveat to our study is that metabolism changes ex vivo versus in vivo, and thus, in the future, in vivo studies can be performed to assess metabolic changes.” (lines 591-593).

      (2) “The retina at P0 is composed of both progenitors and differentiated cells. It is not clear if the results of the RNA-seq and metabolic analysis reflect changes in the metabolism of progenitors, or of mature cells, or changes in cell type composition rather than direct metabolic changes in a specific cell type.”

      We have clarified that the metabolic changes may be in RPCs or in other retinal cell types on lines 149-152: “Since these measurements were performed in bulk, and the ratio of RPCs to differentiated cells declines as development proceeds, it is not clear whether glycolytic activity is temporally regulated within RPCs or in other retinal cell types.”

      However, since we mined a single cell (sc) RNA-seq dataset, we are able to attribute gene expression specifically within RPCs (Figure 1).

      (3) “The biochemical links between elevated glycolysis and pH and beta-catenin stability are unclear. White et al found that higher pH decreased beta-catenin stability (JCB 217: 3965) in contrast to the results here. Oginuma et al found that inhibition of glycolysis or beta-catenin acetylation does not affect beta-catenin stability (Nature 584:98), again in contrast to these results. Another paper showed that acidification inhibits Wnt signaling by promoting the expression of a transcriptional repressor and not via beta-catenin stability (Cell Discovery 4:37). There are also additional papers showing increased pH can promote cell proliferation via other mechanisms (e.g. Nat Metab 2:1212). It is possible that there is organ-specificity in these signaling pathways however some clarification of these divergent results is warranted.”

      We have added the information and references brought up by the reviewer in our discussion (lines 529-549 and 570-574). We have also suggested future experiments to further analyse our system in line with the studies now referenced (lines 580-589).

      (4) The gene expression analysis is not completely convincing. E.g. the expression of additional glycolytic genes should be shown in Figure 1. It is not clear why Hk1 and Pgk1 are specifically shown, and conclusions about changes in glycolysis are difficult to draw from the expression of these two genes. The increase in glycolytic gene expression in the Pten-deficient retina is generally small.

      We have expanded the list of glycolytic genes analysed, in modified Figure 1B, and expanded the description of these results on lines 156-166.

      (5) Is it possible that glycolytic inhibition with 2DG slows down the development and production of most newly differentiated cells rather than specifically affecting photoreceptor differentiation?

      We added a comment to this effect to the discussion: “It is possible that glycolytic inhibition with 2DG slows down the development and production of most newly differentiated cells rather than specifically affecting photoreceptor differentiation, which we could assess in the future.” (lines 600-603).

      (6) “Likewise the result that an increase in pH from 7.4 to 8.0 is sufficient to increase proliferation implies that pH regulation may have instructive roles in setting the tempo of retinal development and embryonic cell proliferation. Similarly, the results show that acetate supplementation increases proliferation (I think this result should be moved to the main figures).”

      We have added the acetate data to main Figure 7E.

      We added a supplemental data table that was inadvertently not included in our last submission. Figure 2– Data supplement 1.

      Reviewer #2 (Recommendations for the authors):

      Major points

      (1) Assuming that increased glycolysis gets RPCs to exit from the proliferative stage earlier, the total number of retinal cells, notably that of the rod photoreceptors, should be reduced since the pool of proliferating cells is depleted earlier. Is that really the case for a mature retina? To address this question, the authors should perform quantifications of photoreceptors at a stage where most developmental cell death has concluded (i.e. at P14 or later; Young, J. Comp. Neurol. 229:362-373, 1984) and check whether or not there are more or less photoreceptors present.

      We have previously quantified the numbers of each cell type in Pten RPC-cKO retinas, and as suggested by the reviewer, there are fewer rod photoreceptors at P7 (Tachibana et al. 2016. J Neurosci 36 (36) 9454-9471) and P21 (Hanna et al. 2025. IOVS. Mar 3;66(3):45). We have edited the following sentence: “Using cellular birthdating, we previously showed that Pten-cKO RPCs are hyperproliferative and differentiate on an accelerated schedule between E12.5 and E18.5, yet fewer rod photoreceptors are ultimately present in P7 (Tachibana et al., 2016) and P21 (Hanna et al., 2025) retinas, suggestive of a developmental defect.” (lines 184-187).

      (2) Figure 1B, 1H: On what data are these two figures based? The plots suggest that a high-density time series of gene expression and rod photoreceptor birth was performed, yet it is not clear where and how this was done. The authors should provide the data, plot individual data points, and, if applicable perform a statistical analysis to support their idea that glycolytic gene expression (as a surrogate for glycolysis) overlaps in time with rod photoreceptor birth (Figure 1B) and that in Pten KO the glycolytic gene expression is shifted forward in time (Figure 1H). If the data required to construct these plots (min. 5 data points, min 3 repeats each) does not exist or cannot be generated (e.g. from reanalysis of previously published datasets), then these graphs should be removed.

      We have removed the previous Figure 1B and Figure 1H.

      (3) Figure 2E: Which PKM isozyme was analyzed here? Does the genetic analysis allow us to distinguish between PKM1 and PKM2? Since PKM governs the key rate-limiting step of glycolysis but was not significantly upregulated, does this not contradict the authors' main hypothesis? If PKM at some point was inhibited (see also below comment to Figure 5) one would expect an accumulation of glycolytic intermediates, including phosphoenolpyruvate. Was such an effect observed?

      The data in Figure 2E are bulk RNA-seq data. Since there is only a single Pkm gene that is alternatively spliced, the RNA-sequencing data cannot distinguish between the four PK isozymes that arise from alternative splicing. Specifically, we used an Illumina NextSeq 500 to sequence 75 bp single-end reads, which capture transcripts for the alternatively spliced Pkm1 and Pkm2 mRNAs that carry a common 3’ end. We added a statement to this effect: “However, since we employed 75 bp single-end sequencing, we could not distinguish between alternatively spliced Pkm1 and Pkm2 mRNAs.” (lines 215-216).

      We have not performed metabolic analyses of glycolytic intermediates, but we have proposed such a strategy as an important avenue of investigation for future studies in the Discussion: “Lastly, an important caveat to our study is that metabolism changes ex vivo versus in vivo, and thus, in the future, in vivo studies can be performed to assess metabolic changes.” (lines 591-593).

      (4) Figure 3 and materials & methods: For the retinal explant cultures, was the RPE included in the cultured explants? If so, how can the authors distinguish drug effects on neuroretina and RPE? If the RPE was not included, then the authors should discuss how the missing RPE - neuroretina interaction could have influenced their results.

      We remove the RPE from the retinal explants, as indicated in the Methods section. The RPE is a metabolic hub that allows transport of nutrients for the retina, so in the absence of the RPE, there is not an immediate source of energy, such as glucose, to the retina. However, the media (DMEM) contains 25 mM glucose to replace the RPE as an energy source, and we now show that RPCs express GLUT1, which allows uptake of glucose (see new Figure 3A).

      We added the following sentence: “P0 explants were mounted on Nucleopore membranes and cultured on top of retinal explant media, providing a source of nutrients, growth factors and glucose.” (lines 241-243).

      (5) Figure 3: It seems rather odd that, if glycolysis was so important for retinal proliferation, differentiation, and metabolism in general, the inhibition of glycolysis with 2DG should not produce a strong degeneration. However, since 2DG competes with glucose, and must be used at nearly equimolar concentration to block glycolysis in a meaningful way, it is possible that the 2DG concentration used simply was not high enough to substantially inhibit glycolysis. Since the inhibitory effect of 2DG depends on the glucose concentration, the authors should measure and provide the concentration of glucose in the explant culture medium. This value should be given either in results or materials and methods.

      We recently published a manuscript showing that 2DG treatments at the same concentrations employed in this study are effective at reducing lactate production in the developing retina in vivo, which is the expected effect of reduced glycolysis (Hanna et al. 2025. IOVS). However, in this study, we did not observe an impact on cell survival.

      We do not agree that it is necessary to measure glucose in the media since the anti-proliferative effect of 2DG is well known, and we are working in the effective range established by multiple groups. We have clarified that we are in the effective range by adding the following sentences: “2DG is typically used in the range of 5-10 mM in cell culture studies and in general, has anti-proliferative effects. To test whether 2DG treatment was in the effective range, explants were exposed to BrdU, which is incorporated into S-phase cells, for 30 minutes prior to harvesting. 2DG treatment resulted in a dose-dependent inhibition of RPC proliferation as evidenced by a reduction in BrdU<sup>+</sup> cells (Figure 3D), indicating that our treatment was in the effective range.” (lines 246-251).

      (6) Figure 3F: The authors use immunostaining for cleaved, activated caspase-3 to assess the amount of apoptotic cell death. However, there are many different possible mechanisms for neuronal cells to die, the majority of which are caspase-independent. To assess the amount of cell death occurring, the authors should perform a TUNEL assay (which labels apoptotic and non-apoptotic forms of cell death; Grasl-Kraupp et al., Hepatology 21:1465-8, 1995), quantify the numbers of TUNEL-positive cells in the retina, and compare this to the numbers of cells positive for activated caspase-3.

      We agree with the reviewer that there are more ways for a cell to die than just apoptosis, and that TUNEL would pick up dying cells undergoing either apoptosis or necrosis, for example. However, our data with cleaved caspase-3, an executioner protease for apoptosis, provide clear evidence of cell death in our different conditions. Since this manuscript is not focused on cell death pathways, we have not performed the additional TUNEL assay.

      (7) Figure 4F and 4I: At post-natal day P7 the rod outer segments (OSs) only just start to grow out and the characteristic, rhodopsin-filled disk stacks are not yet formed. To test whether the PFKB3 gain-of function or the Pten KO has a marked effect on OS formation and length, the authors should perform the same tests on older, more mature retina at a time when rod OS show their characteristic disk structures (e.g. somewhere between P14 to P30). The same applies to the 2DG inhibition on the Pten KO retina.

      The precocious differentiation of rod outer segments observed in P7 Pten-cKO retinas does not persist in adulthood, and instead reflects a developmental acceleration. Indeed, we found that in Pten cKO retinas at 3-, 6- and 12-months of age, rod and cone photoreceptors degenerate, and cone outer segments are shorter (Hanna et al., 2025; Tachibana et al., 2016). These data demonstrate that Pten is required to support rod and cone survival.

      (8) Figure 5: Lowering media pH is a rather coarse and untargeted intervention that will have multiple metabolic consequences independent of PKM2. It is thus hardly possible to attribute the effects of pH manipulation to any specific enzyme. To assess this and possibly confirm the results obtained with low pH, the authors should perform a targeted inhibition experiment, for instance using Shikonin (Chen et al., Oncogene 30:4297-306, 2011), to selectively inhibit PKM2. If the retinal explant cultures contained the RPE, an additional question would be how the changes in RPE would alter lactate flux and metabolization between RPE and neuroretina (see also question 4 above).

      We have reframed the rationale for the pH manipulation experiments, highlighting the importance of pH in cell fate specification, and indicating that the aggregation of PKM2 is only one possible effect of lower pH.

      We wrote: “Given that altered glycolysis influences intracellular pH, which in turn controls cell fate decisions, we set out to assess the impact of manipulating pH on cell fate selection in the retina. One of the expected impacts of lowering pH was the aggregation of PKM2, a rate-limiting enzyme for glycolysis, which aggregates in reversible, inactive amyloids (Cereghetti et al., 2024).” (lines 362-366). 

      We have also added a discussion point: “Whether pH manipulations also impact the stability of other retinal proteins, such as PKM2, can be further investigated in the future using specific PKM2 inhibitors, such as Shikonin (Chen et al., 2011).” (lines 545-547).

      (9) Figure 5G: As for Figure 3F, the authors should perform TUNEL assays to assess the number of cells dying independent of caspase-3.

      Please see response to point 6.

      (10) Figure 7E: In the figure legend "K" should read "E". From the figure and the legend, it is not clear to which cell type this diagram should refer. This must be specified. Importantly, the insulin-dependent glucose-transporter 4 (GLUT4) highlighted in Figure 7E, while expressed on inner retinal vasculature endothelial cells, is not expressed in retinal neurons. What GLUTs exactly are expressed in what retinal neurons may still be to some extent contentious (cf. Chen et al., eLife, https://doi.org/10.7554/eLife.91141.3 ; and reviewer comments therein), yet RPE cells clearly express GLUT1, photoreceptors likely express GLUT3, Müller glia cells may express GLUT1, while horizontal cells likely express GLUT2 (Yang et al., J Neurochem. 160:283-296, 2022).

      We have removed this summary schematic for simplicity.

      (11) Materials and methods: The retinal explant culture system must be described in more detail. Important questions concern the use of medium and serum for which the providers, order numbers, and batch/lot numbers (whichever is applicable) must be given. The glucose concentration in the medium (including the serum content) should be measured. A key concern is whether the explants were cultivated submerged into the medium - this would prevent sufficient oxygenation and drive metabolism towards glycolysis (i.e. the Pasteur effect) - or whether they were cultivated on top of the liquid medium, at the interface between air and liquid (i.e. a situation that would favor OXPHOS).

      We have added further detail to the methods section for the explant assay (lines 686-689). We cultured the retinal explants on membranes on top of the media, which is the standard methodology in the field and in our laboratory (Cantrup et al., 2012; Tachibana et al., 2016; Touahri et al., 2024). Typically, RPCs undergo aerobic glycolysis, meaning that even in the presence of oxygen, they still prefer glycolysis rather than OXPHOS. We demonstrated that 2DG treatment blocks RPC proliferation, indicating that RPCs are indeed favoring glycolysis in our assay system.

      (12) A point the authors may want to discuss additionally is the potential relevance of their data for the pathogenesis of human diseases, especially early developmental defects such as they occur in oxygen-induced retinopathy of prematurity.

      We would like to thank the reviewer for their valuable comment. Given that retinopathy of prematurity (ROP) is primarily vascular in nature, and we have not investigated vascular defects in this study, we have elected not to add a discussion of ROP to our manuscript.

      Minor points

      (1) Please add a label indicating the ages of the retina to images showing the entire retina (i.e. "P7"; e.g. in Figures 1F, 3, 4D, 5, etc.).

      Figure 1:

      1D: E18.5 indicated at the bottom of the two panels

      1F – P0 is indicated at the bottom of the two panels.

      Figure 3C-H: P0 explant stage and days of culture indicated

      Figure 4D: E12.5 BrdU and P7 harvest date indicated

      Figure 5C-H: P0 explant stage and days of culture indicated

      Figure 7A-E: P0 explant stage and days of culture indicated

      (2) The term Ctnnb1 should be introduced also in the abstract.

      We now state in the abstract that Ctnnb1 encodes β-catenin.

      (3) Line 249: "...remaining..." should probably read "...remained...".

      Changed (now line 260).

      (4) Line 381: The sentence "...correlating with the propensity of some RPCs to continue to proliferate while others to differentiate.", should probably be rewritten to something like "...correlating with the propensity of some RPCs to continue to proliferate while others differentiate.".

      We have corrected this sentence.

      (5) The structure of the discussion might benefit from the introduction of subheadings.

      We have introduced subheadings.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1H shows the kinetics of rod photoreceptor production as accelerated, but does not represent the fact that fewer rods are ultimately produced, which appears to be the case from the data. If so, the Pten cKO curve should probably be lower than WT to reflect that difference.

      We have removed this graph (as per Reviewer #2, point 2).

      (2) KEGG analysis also showed that the HIF-1 signaling pathway is altered in the Pten cKO retina. What is the significance of that, and is it related to metabolic dysregulation? It has been shown that lactate can promote vessel growth, which initiates at birth in the mouse retina.

      We have added some information on HIF-1 to the Discussion: “The increased glycolytic gene expression in Pten-cKO retinas is likely tied to the increased expression of hypoxia-inducible factor 1-alpha (Hif1a), a known target of mTOR signaling that transcriptionally activates Slc2a1 (GLUT1) and glycolytic genes (Hanna et al., 2022). Indeed, mTOR signaling is hyperactive in Pten-cKO retinas (Cantrup et al., 2012; Tachibana et al., 2016; Tachibana et al., 2018; Touahri et al., 2024), and likewise, in Tsc1-cKO retinas, which also increase glycolysis via HIF-1A (Lim et al., 2021).” (lines 489-494).

      Cantrup, R., Dixit, R., Palmesino, E., Bonfield, S., Shaker, T., Tachibana, N., Zinyk, D., Dalesman, S., Yamakawa, K., Stell, W. K., Wong, R. O., Reese, B. E., Kania, A., Sauve, Y., & Schuurmans, C. (2012). Cell-type specific roles for PTEN in establishing a functional retinal architecture. PLoS One, 7(3), e32795. https://doi.org/10.1371/journal.pone.0032795

      Cereghetti, G., Kissling, V. M., Koch, L. M., Arm, A., Schmidt, C. C., Thüringer, Y., Zamboni, N., Afanasyev, P., Linsenmeier, M., Eichmann, C., Kroschwald, S., Zhou, J., Cao, Y., Pfizenmaier, D. M., Wiegand, T., Cadalbert, R., Gupta, G., Boehringer, D., Knowles, T. P. J., Mezzenga, R., Arosio, P., Riek, R., & Peter, M. (2024). An evolutionarily conserved mechanism controls reversible amyloids of pyruvate kinase via pH-sensing regions. Dev Cell. https://doi.org/10.1016/j.devcel.2024.04.018

      Chen, J., Xie, J., Jiang, Z., Wang, B., Wang, Y., & Hu, X. (2011). Shikonin and its analogs inhibit cancer cell glycolysis by targeting tumor pyruvate kinase-M2. Oncogene, 30(42), 4297-4306. https://doi.org/10.1038/onc.2011.137

      Hanna, J., Touahri, Y., Pak, A., David, L. A., van Oosten, E., Dixit, R., Vecchio, L. M., Mehta, D. N., Minamisono, R., Aubert, I., & Schuurmans, C. (2025). Pten Loss Triggers Progressive Photoreceptor Degeneration in an mTORC1-Independent Manner. Invest Ophthalmol Vis Sci, 66(3), 45. https://doi.org/10.1167/iovs.66.3.45

      Tachibana, N., Cantrup, R., Dixit, R., Touahri, Y., Kaushik, G., Zinyk, D., Daftarian, N., Biernaskie, J., McFarlane, S., & Schuurmans, C. (2016). Pten Regulates Retinal Amacrine Cell Number by Modulating Akt, Tgfbeta, and Erk Signaling. J Neurosci, 36(36), 9454-9471. https://doi.org/10.1523/JNEUROSCI.0936-16.2016

      Touahri, Y., Hanna, J., Tachibana, N., Okawa, S., Liu, H., David, L. A., Olender, T., Vasan, L., Pak, A., Mehta, D. N., Chinchalongporn, V., Balakrishnan, A., Cantrup, R., Dixit, R., Mattar, P., Saleh, F., Ilnytskyy, Y., Murshed, M., Mains, P. E., Kovalchuk, I., Lefebvre, J. L., Leong, H. S., Cayouette, M., Wang, C., Sol, A. D., Brand, M., Reese, B. E., & Schuurmans, C. (2024). Pten regulates endocytic trafficking of cell adhesion and Wnt signaling molecules to pattern the retina. Cell Rep, 43(4), 114005. https://doi.org/10.1016/j.celrep.2024.114005

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper describes the cryoEM structure of RAD51 filament on the recombination intermediate. In the RAD51 filament, the insertion of a DNA-binding loop called the L2 loop stabilizes the separation of the complementary strand for the base-pairing with an incoming ssDNA and the non-complementary strand, which is captured by the second DNA-binding channel called the site II. The molecular structure of the RAD51 filament with a recombination intermediate provides a new insight into the mechanism of homology search and strand exchange between ssDNA and dsDNA.

      Strengths:

      This is the first human RAD51 filament structure with a recombination intermediate called the D-loop. The work has been done with great care, and the results shown in the paper are compelling based on cryo-EM and biochemical analyses. The paper is really nice and important for researchers in the field of homologous recombination, which gives a new view on the molecular mechanism of RAD51-mediated homology search and strand exchange.

      Weaknesses:

      The authors need more careful text writing. Without page and line numbers, it is hard to give comments.

      We would like to thank the reviewer for their kind words of appreciation of our work.

      Reviewer #2 (Public review):

      Summary:

      Homologous recombination (HR) is a critical pathway for repairing double-strand DNA breaks and ensuring genomic stability. At the core of HR is the RAD51-mediated strand-exchange process, in which the RAD51-ssDNA filament binds to homologous double-stranded DNA (dsDNA) to form a characteristic D-loop structure. While decades of biochemical, genetic, and single-molecule studies have elucidated many aspects of this mechanism, the atomic-level details of the strand-exchange process remained unresolved due to a lack of atomic-resolution structure of RAD51 D-loop complex.

      In this study, the authors achieved this by reconstituting a RAD51 mini-filament, allowing them to solve the RAD51 D-loop complex at 2.64 Å resolution using a single particle approach. The atomic resolution structure reveals how specific residues of RAD51 facilitate the strand exchange reaction. Ultimately, this work provides unprecedented structural insight into the eukaryotic HR process and deepens the understanding of RAD51 function at the atomic level, advancing the broader knowledge of DNA repair mechanisms.

      Strengths:

      The authors overcame the challenge of RAD51's helical symmetry by designing a minifilament system suitable for single-particle cryo-EM, enabling them to resolve the RAD51 D-loop structure at 2.64 Å without imposed symmetry. This high resolution revealed precise roles of key residues, including F279 in Loop 2, which facilitates strand separation, and basic residues on site II that capture the displaced strand. Their findings were supported by mutagenesis, strand exchange assays, and single-molecule analysis, providing strong validation of the structural insights.

      Weaknesses:

      Despite the detailed structural data, some structure-based mutagenesis data interpretation lacks clarity. Additionally, the proposed 3′-to-5′ polarity of strand exchange relies on assumptions from static structural features, such as stronger binding of the 5′-arm, which are not directly supported by other experiments. This makes the directional model compelling but contradicts several well-established biochemical studies that support a 5'-to-3' polarity relative to the complementary strand (e.g., Cell 1995, PMID: 7634335; JBC 1996, PMID: 8910403; Nature 2008, PMID: 18256600).

      Overall:

      The 2.6 Å resolution cryoEM structure of the RAD51 D-loop complex provides remarkably detailed insights into the residues involved in D-loop formation. The high-quality cryoEM density enables precise placement of each nucleotide, which is essential for interpreting the molecular interactions between RAD51 and DNA. Particularly, the structural analysis highlights specific roles for key domains, such as the N-terminal domain (NTD), in engaging the donor DNA duplex.

      This structural interpretation is further substantiated by single-molecule fluorescence experiments using the KK39,40AA NTD mutant. The data clearly show a significant reduction in D-loop formation by the mutant compared to wild-type, supporting the proposed functional role of the NTD observed in the cryoEM model.

      However, the strand exchange activity interpretation presented in Figure 5B could benefit from a more rigorous experimental design. The current assay measures an increase in fluorescence intensity, which depends heavily on the formation of RAD51-ssDNA filaments. As shown in Figure S6A, several mutants exhibit reduced ability to form such filaments, which could confound the interpretation of strand exchange efficiency. To address this, the assay should either: (1) normalize for equivalent levels of RAD51-ssDNA filaments across samples, or (2) compare the initial rates of fluorescence increase (i.e., the slope of the reaction curve), rather than endpoint fluorescence, to better isolate the strand exchange activity itself.
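
      The second option above, comparing initial rates rather than endpoint fluorescence, can be sketched as a least-squares slope over the early part of each trace. This is only an illustrative sketch: the function name `initial_rate`, the window length, and the synthetic traces are assumptions made here, not the assay data or analysis used in the study.

```python
import math


def initial_rate(times, signal, window):
    """Least-squares slope of signal versus time, using only points with t <= window."""
    pts = [(t, y) for t, y in zip(times, signal) if t <= window]
    if len(pts) < 2:
        raise ValueError("need at least two points inside the window")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    if sxx == 0:
        raise ValueError("time points inside the window must not all be equal")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    return sxy / sxx


# Hypothetical traces: both approach the same plateau, but with different rate constants.
times = [float(t) for t in range(0, 601, 30)]           # seconds
fast = [1000.0 * (1.0 - math.exp(-0.010 * t)) for t in times]
slow = [1000.0 * (1.0 - math.exp(-0.002 * t)) for t in times]

# Early slopes separate the two traces more sharply than their late-time values do.
rate_fast = initial_rate(times, fast, window=60.0)
rate_slow = initial_rate(times, slow, window=60.0)
```

      Restricting the fit to an early window keeps the estimate close to the starting rate, before substrate depletion or plateauing flattens the curve; the window would need to be chosen per assay.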

      Based on the structural features of the D-loop, the authors propose that strand pairing and exchange initiate at the 3'-end of the complementary strand in the donor DNA and proceed with a 3'-to-5' polarity. This conclusion, drawn from static structural observations, contrasts with several well-established biochemical studies that support a 5'-to-3' polarity relative to the complementary strand (e.g., Cell 1995, PMID: 7634335; JBC 1996, PMID: 8910403; Nature 2008, PMID: 18256600). While the structural model is compelling and methodologically robust, this discrepancy underscores the need for further experiments.

      We would like to thank the reviewer for highlighting the importance of our findings to our understanding of the mechanism of homologous recombination.

      We agree with the reviewer that the reduced filament-forming ability of some of the RAD51 mutants complicates a straightforward interpretation of their strand-exchange assay. Interestingly, the RAD51 mutants that appear most impaired are the esDNA-capture mutants that do not contact the ssDNA in the structure of the pre-synaptic filament. However, the RAD51 NTD mutants, which display the most severe defect in strand exchange, have near-WT filament-forming ability.

      The reviewer correctly points out that the polarity of strand exchange by RecA and RAD51 is an extensively researched topic that has been characterised in several authoritative studies. In our paper, we simply describe the mechanistic insights obtained from the structural D-loop models of RAD51 (our work) and RecA (Yang et al, PMID: 33057191). The structures illustrate a very similar mechanism of D-loop formation that proceeds with opposite polarity of strand exchange for RAD51 and RecA. Comparison of the D-loop structures for RecA and RAD51 provides an attractive explanation for the opposite polarity, as caused by the different positions of their dsDNA-binding domains in the filament structure. We agree with the reviewer that further investigation will be needed for an adequate rationalisation of the available evidence. We will mention the relevant literature in the revised version of the manuscript.

      Reviewer #3 (Public review):

      Summary:

      Building on their pioneering expertise in studying RAD51 biology, in this paper, the authors aim to capture and investigate the structural mechanism of human RAD51 filament bound with a displacement loop (D-loop), which occurs during the dynamic synaptic state of the homologous recombination (HR) strand-exchange step. As the structures of both pre- and post-synaptic RAD51 filaments were previously determined, a complex structure of RAD51 filaments during strand exchange is one of the key missing pieces of information for a complete understanding of how RAD51 functions in the HR pathway. This paper aims to determine the high-resolution cryo-EM structure of RAD51 filament bound with the D-loop. Combined with mutagenesis analysis and biophysical assays, the authors aim to investigate the D-loop DNA structure, RAD51-mediated strand separation and polarity, and a working model of RAD51 during HR strand invasion in comparison with RecA.

      Strengths:

      (1) The structural work and associated biophysical assays in this paper are solid, elegantly designed, and interpreted. These results provide novel insights into RAD51's function in HR.

      (2) The DNA substrate used was well designed, taking into consideration the nucleotide number requirement of RAD51 for stable capture of donor DNA. This DNA substrate choice lays the foundation for successfully determining the structure of the RAD51 filament on D-loop DNA using single-particle cryo-EM.

      (3) The authors utilised their previous expertise in capping DNA ends using monomeric streptavidin and combined their careful data collection and processing to determine the cryo-EM structure of full-length human RAD51 bound at the D-loop in high resolution. This interesting structure forms the core part of this work and allows detailed mapping of DNA-DNA and DNA-protein interaction among RAD51, invading strands, and donor DNA arms (Figures 1, 2, 3, 4). The geometric analysis of D-loop DNA bound with RAD51 and EM density for homologous DNA pairing is also impressive (Figure S5). The previously disordered RAD51's L2-loop is now ordered and traceable in the density map and functions as a physical spacer when bound with D-loop DNA. Interestingly, the authors identified that the side chain position of F279 in the L2_loop of RAD51_H differs from other F279 residues in L2-loops of E, F, and G protomers. This asymmetric binding of L2 loops and RAD51_NTD binding with donor DNA arms forms the basis of the proposed working model about the polarity of csDNA during RAD51-mediated strand exchange.

      (4) This work also includes mutagenesis analysis and biophysical experiments, especially EMSA, single-molecule fluorescence imaging using an optical tweezer, and DNA strand exchange assay, which are all suitable methods to study the key residues of RAD51 for strand exchange and D-loop formation (Figure 5).

      Weaknesses:

      (1) The proposed model for the 3'-5' polarity of RAD51-mediated strand invasion is based on the structural observations in the cryo-EM structure. This study lacks follow-up biochemical/biophysical experiments to validate the proposed model in comparison with RecA, as well as methods to capture structures of any intermediate states with different polarity models.

      (2) The functional impact of key mutants designed based on structure has not been tested in cells to evaluate how these mutants impact the HR pathway.

      The significance of the work for the DNA repair field and beyond:

      Homologous recombination (HR) is a key pathway for repairing DNA double-strand breaks and involves multiple steps. RAD51 forms nucleoprotein filaments first with 3' overhang single-strand DNA (ssDNA), followed by a search and exchange with a homologous strand. This function serves as the basis of an accurate template-based DNA repair during HR. This research addressed a long-standing challenge of capturing RAD51 bound with the dynamic synaptic DNA and provided the first structural insight into how RAD51 performs this function. The significance of this work extends beyond the discovery of biology for the DNA repair field, into its medical relevance. RAD51 is a potential drug target for inhibiting DNA repair in cancer cells to overcome drug resistance. This work offers a structural understanding of RAD51's function with the D-loop and provides new strategies for targeting RAD51 to improve cancer therapies.

      We thank the reviewer for their positive comments on the significance of our work. Concerning the proposed polarity of strand exchange based on our structural finding, please see our reply to the previous reviewer; we agree with the reviewer that further experimentation will be needed to reach a settled view on this.

      Testing the functional effects of the RAD51 mutants on HR in cells was not an aim of the current work but we agree that it would be a very interesting experiment, which would likely provide further important insights into the mechanism of strand exchange at the core of the HR reaction.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1:

      (1) The initial high accumulation by all cells followed by the emergence of a sub-population that has reduced its intracellular levels of tachyplesin is a key observation and I agree with the authors' conclusion that this suggests an induced response to the AMP is important in facilitating the bimodal distribution. However, I think the conclusion that upregulated efflux is driving the reduction in signal in the "low accumulator" subpopulation is not fully supported. Steady-state amounts of intracellular fluorescent AMP are determined by the relative rates of influx and efflux and a decrease could be caused by decreasing influx (while efflux remained unchanged), increasing efflux (while influx remained unchanged), or both decreasing influx and increasing efflux. Given the transcriptomic data suggest possible changes in the expression of enzymes that could affect outer membrane permeability and outer membrane vesicle formation as well as efflux, it seems very possible that changes to both influx and efflux are important. The "efflux inhibitors" shown to block the formation of the low accumulator subpopulation have highly pleiotropic or incompletely characterised mechanisms of action so they also do not exclusively support a hypothesis of increased efflux.

      We agree with the reviewer that the emergence of low accumulators after 30 min in the presence of extracellular tachyplesin-NBD (Figure 4A) could be due to either decreased influx while efflux remained unchanged, increased efflux while influx remained unchanged, or both decreasing influx and increasing efflux. Increased proteolytic activity or increased secretion of OMVs could also play a role.

      We have now acknowledged that “Reduced intracellular accumulation of tachyplesin-NBD in the presence of extracellular tachyplesin-NBD could be due to decreased drug influx, increased drug efflux, increased proteolytic activity or increased secretion of OMVs.” (lines 313-315).

      However, the emergence of low accumulators after 60 min in the absence of extracellular tachyplesin-NBD in our efflux assays (Figure 4C) cannot be due to decreased influx while efflux remained unchanged, because no extracellular tachyplesin-NBD was present. We acknowledge that in our original manuscript we did not explicitly state that the efflux assays reported in Figure 4C-D were performed in the absence of tachyplesin-NBD in the extracellular environment. We have now clarified this point in our manuscript, added illustrations in Figure 4A and 4C-D, and carried out efflux assays using ethidium bromide (EtBr) to further support our conclusions about the primary role played by efflux in reducing tachyplesin accumulation in low accumulators. We have added the following paragraphs to our revised manuscript:

      “Next, we performed efflux assays using ethidium bromide (EtBr) by adapting a previously described protocol [62]. Briefly, we preloaded stationary phase E. coli with EtBr by incubating cells at a concentration of 254 µM EtBr in M9 medium for 90 min. Cells were then pelleted and resuspended in M9 to remove extracellular EtBr. Single-cell EtBr fluorescence was measured at regular time points in the absence of extracellular EtBr using flow cytometry. This analysis revealed a progressive homogeneous decrease of EtBr fluorescence due to efflux from all cells within the stationary phase E. coli population (Figure S13A). In contrast, when we performed efflux assays by preloading cells with tachyplesin-NBD (46 μg mL<sup>-1</sup> or 18.2 μM), followed by pelleting and resuspension in M9 to remove extracellular tachyplesin-NBD, we observed a heterogeneous decrease in tachyplesin-NBD fluorescence in the absence of extracellular tachyplesin-NBD: a subpopulation retained high tachyplesin-NBD fluorescence, i.e. high accumulators; whereas another subpopulation displayed decreased tachyplesin-NBD fluorescence, 60 min after the removal of extracellular tachyplesin-NBD (Figure 4B). Since these assays were performed in the absence of extracellular tachyplesin-NBD, decreased tachyplesin-NBD fluorescence could not be ascribed to decreased drug influx or increased secretion of OMVs in low accumulators, but could be due to either enhanced efflux or proteolytic activity in low accumulators.

      Next, we repeated efflux assays using EtBr in the presence of 46 μg mL<sup>-1</sup> (or 20.3 µM) extracellular tachyplesin-1. We observed a heterogeneous decrease of EtBr fluorescence with a subpopulation retaining high EtBr fluorescence (i.e. high tachyplesin accumulators) and another population displaying reduced EtBr fluorescence (i.e. low tachyplesin accumulators, Figure S14B) when extracellular tachyplesin-1 was present. Moreover, we repeated tachyplesin-NBD efflux assays in the presence of M9 containing 50 μg mL<sup>-1</sup> (244 μM) carbonyl cyanide m-chlorophenyl hydrazone (CCCP), an ionophore that disrupts the proton motive force (PMF) and is commonly employed to abolish efflux and found that all cells retained tachyplesin-NBD fluorescence (Figure S15B). However, it is important to note that CCCP does not only abolish efflux but also other respiration-associated and energy-driven processes [63].

      Taken together, our data demonstrate that in the absence of extracellular tachyplesin, stationary phase E. coli homogeneously efflux EtBr, whereas only low accumulators are capable of performing efflux of intracellular tachyplesin after initial tachyplesin accumulation. In the presence of extracellular tachyplesin, only low accumulators can perform efflux of both intracellular tachyplesin and intracellular EtBr. However, it is also conceivable that besides enhanced efflux, low accumulators employ proteolytic activity, OMV secretion, and variations to their bacterial membrane to hinder further uptake and intracellular accumulation of tachyplesin in the presence of extracellular tachyplesin.”

      These amendments can be found on lines 316-350 and in the new Figure S13 and Figure 4. We have also carried out more tachyplesin-NBD accumulation assays using single and double gene-deletion mutants lacking efflux components; please see Response 3 to reviewer 2 and the data reported in Figure 4B.
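
      The influx/efflux reasoning in this exchange can be illustrated with a deliberately simplified one-compartment model, dC/dt = k_in·C_ext − k_out·C. The sketch below is an assumption-laden illustration, not the authors' analysis: the first-order rate constants `k_in` and `k_out`, the function names, and the units are hypothetical, and `k_out` lumps together every first-order loss process (efflux and, for example, proteolysis), so the model cannot distinguish between them.

```python
import math


def steady_state(c_ext, k_in, k_out):
    """Steady-state intracellular level where influx balances loss: k_in * c_ext / k_out."""
    if k_out <= 0:
        raise ValueError("k_out must be positive")
    return k_in * c_ext / k_out


def intracellular(t, c0, c_ext, k_in, k_out):
    """Closed-form solution of dC/dt = k_in * c_ext - k_out * C with C(0) = c0."""
    c_ss = steady_state(c_ext, k_in, k_out)
    return c_ss + (c0 - c_ss) * math.exp(-k_out * t)


# With extracellular compound present, halving k_in or doubling k_out
# lowers the steady state by the same factor, so the two are indistinguishable.
baseline = steady_state(c_ext=10.0, k_in=0.2, k_out=0.1)
less_influx = steady_state(c_ext=10.0, k_in=0.1, k_out=0.1)
more_efflux = steady_state(c_ext=10.0, k_in=0.2, k_out=0.2)

# After washout (c_ext = 0) the decay no longer depends on k_in at all.
washout_a = intracellular(t=30.0, c0=100.0, c_ext=0.0, k_in=0.2, k_out=0.1)
washout_b = intracellular(t=30.0, c0=100.0, c_ext=0.0, k_in=0.02, k_out=0.1)
```

      Under these assumptions, a washout experiment isolates the loss term, which is why differences between subpopulations after removal of the extracellular compound point to the loss processes rather than to uptake.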

      (2) A conclusion of the transcriptomic analysis is that the lower accumulating subpopulation was exhibiting "a less translationally and metabolically active state" based on less upregulation of a cluster of genes including those involved in transcription and translation. This conclusion seems to borrow from well-described relationships referred to as bacterial growth laws in which the expression of genes involved in ribosome production and translation is directly related to the bacterial growth (and metabolic) rate. However, the assumptions that allow the formulation of the bacterial growth laws (balanced, steady state, exponential growth) do not hold in growth arrest. A non-growing cell could express no genes at all or could express ribosomal genes at a very low level, or efflux pumps at a high level. The distribution of transcripts among the functional classes of genes does not reveal anything about metabolic rates within the context of growth arrest - it only allows insight into metabolic rates when the constraint of exponential growth can be assumed. Efflux pumps can be highly metabolically costly; for example, Tn-Seq experiments have repeatedly shown that mutants for efflux pump gene transcriptional repressors have strong fitness disadvantages in energy-limited conditions. There are no data presented here to disprove a hypothesis that the low accumulators have high metabolic rates but allocate all of their metabolic resources to fortifying their outer membranes and upregulating efflux. This could be an important distinction for understanding the vulnerabilities of this subpopulation. Metabolic rates can be more directly estimated for single cells using respiratory dyes or pulsed metabolic labelling, for example, and these data could allow deeper insight into the metabolic rates of the two subpopulations. 
My main recommendation for additional experiments to strengthen the conclusions of the paper would be to attempt to directly measure metabolic or translational activity in the high- and low-accumulating populations. I do not think that the transcriptomic data are sufficient to draw conclusions about this but it would be interesting to directly measure activity. Otherwise, it might be reasonable to simply soften the language describing the two populations as having different activity levels. They do seem to have different transcriptional profiles, and this is already an interesting observation.

      We agree with the reviewer that it might be misleading to draw conclusions on bacterial metabolic states solely based on transcriptomic data. We have therefore removed the statement “low accumulators displayed a less translationally and metabolically active state”. We have instead stated the following: “Our transcriptomics analysis showed that low tachyplesin accumulators downregulated protein synthesis, energy production, and gene expression processes compared to high accumulators”. Moreover, we have employed the membrane-permeable redox-sensitive dye C<sub>12</sub>-resazurin, which is reduced to the fluorescent C<sub>12</sub>-resorufin in metabolically active cells, to obtain a more direct estimate of the metabolic state of low and high accumulators of tachyplesin. We have added the following paragraph reporting our new data:

      “Our transcriptomics analysis also showed that low tachyplesin accumulators downregulated protein synthesis, energy production, and gene expression compared to high accumulators. To gain further insight on the metabolic state of low tachyplesin accumulators, we employed the membrane-permeable redox-sensitive dye, resazurin, which is reduced to the highly fluorescent resorufin in metabolically active cells. We first treated stationary phase E. coli with 46 μg mL<sup>-1</sup> (18.2 μM) tachyplesin-NBD for 60 min, then washed the cells, and then incubated them in 1 μM resazurin for 15 min and measured single-cell fluorescence of resorufin and tachyplesin-NBD simultaneously via flow cytometry. We found that low tachyplesin-NBD accumulators also displayed low fluorescence of resorufin, whereas high tachyplesin-NBD accumulators also displayed high fluorescence of resorufin (Figure S16), suggesting lower metabolic activity in low tachyplesin-NBD accumulators.”

      These amendments can be found on lines 398-408 and in Figure S16.

      (3) The observation that adding nutrients to the stationary phase cultures pushes most of the cells to the "high accumulator" state is presented as support of the hypothesis that the high accumulator state is a higher metabolism/higher translational activity state. However, it is important to note that adding nutrients will cause most or all of the cells in the population to start to grow, thus re-entering the familiar regime in which bacterial growth laws apply. This is evident in the slightly larger cell sizes seen in the nutrient-amended condition. In contrast to stationary phase cells, growing cells largely do not exhibit the bimodal distribution, and they are much more sensitive to tachyplesin, as demonstrated clearly in the supplement. Growing cells are not necessarily the same as the high-accumulating subpopulation of non-growing cells.

      Following the reviewer’s suggestion, we are no longer using the nutrient supplementation data to support the hypothesis that high accumulators possess higher metabolism or translational activity.

      The nutrient supplementation data is now only used to investigate whether tachyplesin-NBD accumulation and efficacy can be increased, and not to show that high tachyplesin-NBD accumulators are more metabolically or translationally active.

      Furthermore, our previous statement “Our data suggests that such slower-growing subpopulations might display lower antibiotic accumulation and thus enhanced survival to antibiotic treatment.” has now been removed from the discussion.

      (4) It might also be worth adding some additional context around the potential to employ efflux inhibitors as therapeutics. It is very clear that obtaining sufficient antimicrobial drug accumulation within Gram-negative bacteria is a substantial barrier to effective treatments, and large concerted efforts to find and develop therapeutic efflux pump inhibitors have been undertaken repeatedly over the last 25 years. Sufficiently selective inhibitors of bacterial efflux pumps with appropriate drug-like properties have been challenging to find and none have entered clinical trials. Multiple psychoactive drugs have been shown to impact efflux in bacteria but usually using concentrations in the 10-100 uM range (as here). Meanwhile, the Ki values for their human targets are usually in the sub- to low-nanomolar range. The authors rightly note that the concentration of sertraline they have used is higher than that achieved in patients, but this is by many orders of magnitude, and it might be worth expanding a bit on the substantial challenge of finding efflux inhibitors that would be specific and non-toxic enough to be used therapeutically. Many advances in structural biology, molecular dynamics, and medicinal chemistry may make the quest for therapeutic efflux inhibitors more fruitful than it has been in the past but it is likely to remain a substantial challenge.

      We agree with this comment and we have now added the following statement:

      “This limitation underscores the broader challenge of identifying EPIs that are both effective and minimally toxic within clinically achievable concentrations, while also meeting key therapeutic criteria such as broad-spectrum efficacy against diverse efflux pumps, high specificity for bacterial targets, and non-inducers of AMR [117]. However, advances in biochemical, computational, and structural methodologies hold the potential to guide rational drug design, making the search for effective EPIs more promising [118]. Therefore, more investigation should be carried out to further optimise the use of sertraline or other EPIs in combination with tachyplesin and other AMPs.”

      This amendment can be found on lines 535-542.

      (5) My second recommendation is that the transcriptomic data should be made available in full and in a format that is easier for other researchers to explore. The raw data should also be uploaded to a sequence repository, such as the NCBI Geo database or the EMBL ENA. The most useful format for sharing transcriptomic data is a table (such as an excel spreadsheet) of transcripts per million counts for each gene for each sample. This allows other researchers to do their own analyses and compare expression levels to observations from other datasets. When only fold change data are supplied, data cannot be compared to other datasets at all, because they are relative to levels in an untreated control which are not known. The cluster analysis is one way of gaining insight into biological function revealed by transcriptional profile, but it can hide interesting additional complexities. For example, rpoS is named as one of the transcription-associated genes that are higher in the high accumulator subpopulation and evidence of generally increased activity. But RpoS is the stress sigma factor that drives much lower levels of expression generally than the housekeeping sigma factor RpoD, even though it recognises many of the same promoters (and some additional stress-specific promoters). Therefore, increased RpoS occupancy of RNAP would be expected to result in overall lower levels of transcription. However, it is also true that the transcript level for the rpoS gene is a particularly poor indicator of expression - rpoS is largely post-transcriptionally regulated. More generally, annotations are always evolving and key functional insights related to each gene might change in the future, so the results are a more durable resource if they are presented in a less analysed form as well as showing the analysis steps. It can also be important to know which genes were robustly expressed but did not change, versus genes that were not detected.

      Sequencing data associated with this study have now been uploaded and linked under NCBI BioProject accession number PRJNA1096674 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1096674).

      We have added this link to the methods under subheading “Accession Numbers” on lines 858-860. Additionally, transcripts per million counts for each gene for each sample have been added to the Figure 3 - Source Data file as requested by the reviewer.
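      For readers who want to reproduce or re-derive a transcripts-per-million table of the kind requested here, the following is a minimal sketch of the standard TPM normalisation for a single sample. It is not the authors' pipeline, which is not described in this response; the function name `tpm` and the toy counts and gene lengths are hypothetical and serve only as illustration.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts for one sample to transcripts per million (TPM).

    counts     -- raw read counts per gene (hypothetical input)
    lengths_bp -- gene lengths in base pairs, in the same order as counts
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Step 1: normalise each count by gene length in kilobases (reads per kilobase).
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rpk)
    if total == 0:
        # No reads at all: return zeros rather than dividing by zero.
        return [0.0] * len(rpk)
    # Step 2: scale so that the values for the sample sum to one million.
    return [r / total * 1e6 for r in rpk]


# Toy example: three genes whose counts scale exactly with their lengths,
# so all three receive the same TPM value.
toy_tpm = tpm([100, 200, 300], [1000, 2000, 3000])
```

      Because TPM values sum to one million within each sample, they can be compared across samples and datasets in a way that fold changes relative to an unreported control cannot.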

      (6) In the introduction, the susceptibility of AMP efficacy to resistance mechanisms is discussed:

      "However, compared to small molecule antimicrobials, AMP resistance genes typically confer smaller increases in resistance, with polymyxin-B being a notable exception 7, 8. Moreover, mobile resistance genes against AMPs are relatively rare, and horizontal acquisition of AMP resistance is hindered by phylogenetic barriers owing to functional incompatibility with the new host bacteria9, again with plasmid-transmitted polymyxin resistance being a notable exception."

      It seems worth pointing out that polymyxins are the only AMPs that can reasonably be compared with small molecule antibiotics in terms of resistance acquisition since they are the only AMPs that have been widely used as drugs and therefore had similar chances to select for resistance among diverse global microbial populations.

      We have now clarified that we are referring to laboratory evolutionary analyses of resistance towards small molecule antibiotics and AMPs (Spohn et al., 2019) and that polymyxins are the only AMPs that have been used in antibiotic treatment to date.

      We have added the following statement to address this point:

      “Bacteria have developed genetic resistance to AMPs, including proteolysis by proteases, modifications in membrane charge and fluidity to reduce affinity, and extrusion by AMP transporters. However, compared to small molecule antimicrobials, AMP resistance genes typically confer smaller increases in resistance in experimental evolution analyses, with polymyxin-B and CAP18 being notable exceptions [8]. Moreover, mobile resistance genes against AMPs are relatively rare and horizontal acquisition of AMP resistance is hindered by phylogenetic barriers owing to functional incompatibility with the new host bacteria [9]. Plasmid-transmitted polymyxin resistance constitutes a notable exception [10], possibly because polymyxins are the only AMPs that have been in clinical use to date [9].”

      This amendment can be found on lines 57-65.

      (7) In the description of Figure 4, " tachyplesin monotherapy" is mentioned. It is not really appropriate to describe the treatment of a planktonic culture of bacteria in a test tube as a therapy since there is no host that is benefitting.

      We have now replaced “tachyplesin monotherapy” with “tachyplesin treatment”.

      (8) In the discussion, it is stated that " tachyplesin accumulates intracellularly only in bacteria that do not survive tachyplesin exposure" but this is clearly not true. All bacteria accumulate tachyplesin intracellularly initially, but if the bacteria are non-growing during the exposure, some of them are able to reduce their intracellular levels. The fraction of survivors is roughly correlated with the fraction of bacteria that do not maintain high intracellular levels of tachyplesin and that do not stain with propidium iodide, but for any given cell it seems that there is no clear point at which a high intracellular level of tachyplesin means that it will definitely not survive.

      We have now clarified this statement as follows: “We show that after an initial homogeneous tachyplesin accumulation within a stationary phase E. coli population, tachyplesin is retained intracellularly by bacteria that do not survive tachyplesin exposure, whereas tachyplesin is retained only in the membrane of bacteria that survive tachyplesin exposure.”

      This amendment can be found on lines 443-446.

      (9) Also in the discussion: " Our data suggests that such slower-growing subpopulations might display lower antibiotic accumulation and thus enchanced [sic] survival to antibiotic treatment." This does not really relate to the results here because the bimodal distributions were primarily studied in the absence of growth. In the LB/exponential growth situations where the population was growing but a very small subpopulation of low accumulators was observed, no measurements were made to indicate subpopulation growth rates.

      We have now removed this statement from the manuscript.

      (10) In discussion, L-Ara4N appears to be referred to as both positively charged and negatively charged; this should be clarified.

      We have now clarified that L-Ara4N is positively charged.

      This amendment can be found on line 496.

      (11) Discussion of TF analysis seems to overstate what is supported by the evidence. The correlation of up- and downregulated genes with previously described TF regulons (probably measured in very different conditions) does not really demonstrate TF activity. This could be measured directly with additional experiments but in the absence of those experiments claims about detecting TF activity should probably be avoided. The attempts to directly demonstrate the importance of those transcription factors to the observed accumulation activity were not successful.

      We have now removed from the discussion the previous paragraph related to the TF analysis. We have also modified the results section reporting the TF analysis as follows: “Next, we sought to infer transcription factor (TF) activities via differential expression of their known regulatory targets [61]. A total of 126 TFs were inferred to exhibit differential activity between low and high accumulators (Data Set S4). Among the top ten TFs displaying higher inferred activity in low accumulators compared to high accumulators, four regulate transport systems, i.e. Nac, EvgA, Cra, and NtrC (Figure S12). However, further experiments should be carried out to directly measure the activity of these TFs.”

      Finally, we have also moved the TFs’ data from Figure 3 to Figure S12 in the Supplementary information.

      These amendments can be found on lines 288-293.

      (12) When discussing the possibility of nutrient supplementation versus efflux inhibition as a potential therapeutic strategy, it could be noted that nutrient supplementation cannot be done in many infection contexts. The host immune system and host/bacterial cell density control nutrient access.

      We have now added the following statement: “Moreover, nutrient supplementation as a therapeutic strategy may not be viable in many infection contexts, as host density and the immune system often regulate access to nutrients [3]”.

      These amendments can be found on lines 553-555.

      Reviewer 2:

      (1) Some questions regarding the mechanism remain. One shortcoming of the setup of the transcriptomics experiment is that the tachyplesin-NBD probe itself has antibiotic efficacy and induces phenotypes (and eventually cell death) in the ´high accumulator´ cells. This makes it challenging to interpret whether any differences seen between the two groups are causative for the observed accumulation pattern or if they are a consequence of differential accumulation and downstream phenotypic effects.

      We agree with the reviewer and we have now acknowledged that “tachyplesin-NBD has antibiotic efficacy (see Figure 2) and has an impact on the E. coli transcriptome (Figure 3). Therefore, we cannot conclude whether the transcriptomic differences reported between low and high accumulators of tachyplesin-NBD are causative for the distinct accumulation patterns or if they are a consequence of differential accumulation and downstream phenotypic effects.”

      These amendments can be found on lines 283-287.

      (2) It would be relevant to test and report the MIC of sertraline for the strain tested, particularly since in Figure 4G an initial reduction in CFUs is observed for sertraline treatment, which suggests the existence of biological effects in addition to efflux inhibition.

      We have now measured the MIC of sertraline against E. coli BW25113 and found it to be 128 μg mL<sup>-1</sup> (418 µM). This value is more than four times higher than the sertraline concentration employed in our study, i.e. 30 μg mL<sup>-1</sup> (98 μM).

      These amendments can be found on lines 389-391 and data has been added to Figure 4 – Source Data.

      (3) The role of efflux systems is further supported by the finding that efflux pump inhibitors sensitize E. coli to tachyplesin and prevent the occurrence of the tolerant ´low accumulator´ subpopulations. In principle, this is a great way of validating the role of efflux pumps, but the limited selectivity of these inhibitors (CCCP is an uncoupling agent, and for sertraline direct antimicrobial effects on E. coli have been reported by Bohnert et al.) leaves some ambiguity as to whether the synergistic effect is truly mediated via efflux pump inhibition. To strengthen the mechanistic angle of the work analysis of tachyplesin-NBD accumulation in mutants of the identified efflux components would be interesting.

      We have now performed tachyplesin-NBD accumulation assays using 28 single and 4 double E. coli BW25113 gene-deletion mutants of efflux components and transcription factors regulating efflux. While for the majority of the mutants we recorded bimodal distributions of tachyplesin-NBD accumulation similar to the distribution recorded for the E. coli BW25113 parental strain (Figure 4B and Figure S13), we found unimodal distributions of tachyplesin-NBD accumulation constituted only of high accumulators for both ΔqseB and ΔqseBΔqseC mutants as well as reduced numbers of low accumulators for the ΔacrAΔtolC mutant (Figure 4B). Considering that the AcrAB-TolC tripartite RND efflux system is known to confer genetic resistance against AMPs like protamine and polymyxin-B [29,30] and that the quorum sensing regulators qseBC might control the expression of acrA [64], these data further corroborate the hypothesis that low accumulators can efflux tachyplesin and survive treatment with this AMP.

      These amendments can be found on lines 351-361, in the new Figure 4B and in the new Figure S14.

      Moreover, we have carried out further efflux assays with both ethidium bromide and tachyplesin-NBD to further demonstrate the role of efflux in reduced accumulation of tachyplesin, and we have acknowledged that other mechanisms (i.e. reduced influx, increased protease activity or increased secretion of OMVs) could play an important role; please see Response 1 to Reviewer 1.

      (4) The authors imply that protease could contribute to the low accumulator mechanism. Proteases could certainly cleave and thus inactivate AMPs/tachyplesin, but would this effect really lead to a reduction in fluorescence levels since the fluorophore itself would not be affected by proteolytic cleavage?

      We agree with the reviewer that nitrobenzoxadiazole (NBD) might not be cleaved by proteases that inactivate tachyplesin and other AMPs. Therefore, inactivation of tachyplesin by proteases might not affect cellular fluorescence levels unless efflux of NBD is possible following the cleavage of tachyplesin-NBD. We have therefore removed the statement “Conversely, should efflux or proteolytic activities by proteases underpin the functioning of low accumulators, we should observe high initial tachyplesin-NBD fluorescence in the intracellular space of low accumulators followed by a decrease in fluorescence due to efflux or proteolytic degradation.” We have now stated the following: “Low accumulators displayed an upregulation of peptidases and proteases compared to high accumulators, suggesting a potential mechanism for degrading tachyplesin (Table S1 and Data Set S3).”

      These amendments can be found on lines 280-282.

      (5) To facilitate comparison with other literature (e.g. papers on sertraline) it would be helpful to state compound concentrations also as molar concentrations.

      We have now added the molar concentrations alongside all instances where concentrations are stated in μg mL<sup>-1</sup>.
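      As a worked illustration of this conversion, the sketch below reproduces several of the molar values quoted in this response from their mass concentrations. The molecular weights are assumptions taken as the standard free-base values (sertraline 306.23 g/mol, CCCP 204.62 g/mol, verapamil 454.6 g/mol); they are not stated in the response itself, and the function and dictionary names are hypothetical.

```python
def ug_per_ml_to_um(conc_ug_per_ml, mw_g_per_mol):
    """Convert a mass concentration in µg/mL to a molar concentration in µM.

    1 µg/mL equals 1 mg/L; dividing by the molecular weight (g/mol) gives
    mmol/L, and multiplying by 1000 gives µmol/L (µM).
    """
    if mw_g_per_mol <= 0:
        raise ValueError("molecular weight must be positive")
    return conc_ug_per_ml / mw_g_per_mol * 1000.0


# Assumed free-base molecular weights in g/mol (not given in the response).
MOLECULAR_WEIGHTS = {
    "sertraline": 306.23,
    "CCCP": 204.62,
    "verapamil": 454.6,
}

# Concentrations quoted in the response, in µg/mL.
sertraline_used_um = ug_per_ml_to_um(30, MOLECULAR_WEIGHTS["sertraline"])
sertraline_mic_um = ug_per_ml_to_um(128, MOLECULAR_WEIGHTS["sertraline"])
cccp_um = ug_per_ml_to_um(50, MOLECULAR_WEIGHTS["CCCP"])
verapamil_um = ug_per_ml_to_um(50, MOLECULAR_WEIGHTS["verapamil"])
```

      Under these assumed molecular weights the rounded results agree with the values reported in the text (98 μM and 418 μM for sertraline, 244 μM for CCCP, 110 μM for verapamil).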

      (6) The authors tested a series of efflux pump inhibitors and found that CCCP and sertraline prevented the generation of the low accumulator subpopulation, whereas other inhibitors did not. An overview and discussion of the known molecular targets and mode of action of the different selected inhibitors could reveal additional insights into the molecular mechanism underlying the synergy with tachyplesin.

      We have now added molecular targets and mode of action of the different inhibitors where known. “Moreover, we repeated tachyplesin-NBD efflux assays in the presence of M9 containing 50 μg mL<sup>-1</sup> (244 μM) carbonyl cyanide m-chlorophenyl hydrazone (CCCP), an ionophore that disrupts the proton motive force (PMF) and is commonly employed to abolish efflux and found that all cells retained tachyplesin-NBD fluorescence (Figure S15B). However, it is important to note that CCCP does not only abolish efflux but also other respiration-associated and energy-driven processes [63].” And “Interestingly, M9 containing 30 µg mL<sup>-1</sup> (98 μM) sertraline (Figure 4D and S15C), an antidepressant which inhibits efflux activity of RND pumps, potentially through direct binding to efflux pumps [65] and decreasing the PMF [66], or 50 µg mL<sup>-1</sup> (110 μM) verapamil (Figure S15D), a calcium channel blocker that inhibits MATE transporters [67] by a generally accepted mechanism of PMF generation interference [68,69], was able to prevent the emergence of low accumulators. Furthermore, tachyplesin-NBD cotreatment with sertraline simultaneously increased tachyplesin-NBD accumulation and PI fluorescence levels in individual cells (Figure 4E and F, p-value < 0.0001 and 0.05, respectively). The use of berberine, a natural isoquinoline alkaloid that inhibits MFS transporters [70] and RND pumps [71], potentially by inhibiting conformational changes required for efflux activity [70], and baicalein, a natural flavonoid compound that inhibits ABC [72] and MFS [73,74] transporters, potentially through PMF dissipation [75], prevented the formation of a bimodal distribution of tachyplesin accumulation, however displayed reduction in fluorescence of the whole population (Figure S15E and F). 
Phenylalanine-arginine beta-naphthylamide (PAβN), a synthetic peptidomimetic compound that inhibits RND pumps [76] through competitive inhibition [77], reserpine, an indole alkaloid that inhibits ABC and MFS transporters, and RND pumps [78], by altering the generation of the PMF [69], and 1-(1-naphthylmethyl)piperazine (NMP), a synthetic piperazine derivative that inhibits RND pumps [79], through non-competitive inhibition [80], did not prevent the emergence of low accumulators (Figure S15G-I).”

      These amendments can be found on lines 337-342 and 367-385.

      (7) Page 8. The term ´medium accumulators´ for a 1:1 mix of low and high accumulators is misleading.

      We have now replaced the term “medium accumulators” with “a 1:1 (v/v) mixture of low and high accumulators”.

      These amendments to the description can be found on lines 238-239.

      (8) Figure 3. It may be more appropriate to rephrase the title of the figure to ´biological processes associated with low tachyplesin accumulation´ (rather than ´facilitate accumulation´). The same applies to the section title on page 8.

      We have amended the title of Figure 3 as requested by the reviewer.

      (9) The fact that the low accumulation phenotype depends on the growth media and conditions and can be prevented by nutrients is highly relevant. I would encourage the authors to consider showing the corresponding data in the main manuscript rather than in the SI.

      We have created a new Figure 5, displaying the impact of the nutritional environment and bacterial growth phase on both tachyplesin-NBD accumulation and efficacy.

      (10) In the discussion the authors state´ Heterogeneous expression of efflux pumps within isogenic bacterial populations has been reported 29,32,33,67-69. However, recent reports have suggested that efflux is not the primary mechanism of antimicrobial resistance within stationary-phase bacteria 31,70.´. In light of the authors´ findings that the response to tachyplesin is induced by exposure and is not pre-selected, could they speculate on why this specific response can be induced in stationary, but not exponential cells? Could there be a combination of pre-existing traits and induced responses at play? Could e.g. the reduced growth rate/metabolism in these cells render these cells less susceptible to the intracellular effects of tachyplesin and slow down the antibiotic efficacy, giving the cells enough time to mount additional protective responses that then lead to the low accumulation phenotype?

      We have now acknowledged that it is conceivable that other pre-existing traits of low accumulators also contribute to reduced tachyplesin accumulation. For example, reduced protein synthesis, energy production and gene expression in low accumulators could slow down tachyplesin efficacy, giving low accumulators more time to mount efflux as an additional protective response.

      “As our accumulation assay did not require the prior selection for phenotypic variants, we have demonstrated that low accumulators emerge subsequent to the initial high accumulation of tachyplesin-NBD, suggesting enhanced efflux as an induced response. However, it is conceivable that other pre-existing traits of low accumulators also contribute to reduced tachyplesin accumulation. For example, reduced protein synthesis, energy production, and gene expression in low accumulators could slow down tachyplesin efficacy, giving low accumulators more time to mount efflux as an additional protective response.”

      This amendment can be found on lines 482-489.

      (11) In the abstract: Is it true that low accumulators ´sequester´ the drug in their membrane? In my understanding ´sequestering´ would imply that low accumulators would bind higher levels of tachyplesin-NBD in their membrane compared to high accumulators (and thereby preventing it from entering the cells). According to Figure 1 J, K, it rather seems that the fluorescent signal around the membrane is also stronger in high accumulators.

      We have now removed the sentence “low accumulators sequester the drug in their membrane” from the abstract. We have instead stated: “These phenotypic variants display enhanced efflux activity to limit intracellular peptide accumulation.”

      These amendments can be found on lines 34-35.

      Reviewer 3:

      (1) The authors' claims about high efflux being the main mechanism of survival are unconvincing, given the current data. There can be several alternative hypotheses that could explain their results, such as lower binding of the AMP, lower rate of internalization, metabolic inactivity, etc. It is unclear how efflux can be important for survival against a peptide that the authors claim binds externally to the cell. The addition of efflux assays would be beneficial for clear interpretations. Given the current data, the authors' claims about efflux being the major mechanism in this resistance are unconvincing (in my humble opinion). Some direct evidence is necessary to confirm the involvement of efflux. The data with CCCP in Figure 4C can only indicate accumulation, not efflux. The authors are encouraged to perform direct efflux assays using known methods (e.g., PMIDs 20606071, 30981730, etc.). Figure 4A: The data does not support the broad claims about efflux. First, if the peptide is accumulated on the outside of the outer membrane, how will efflux help in survival? The dynamics shown in 4A may be due to lower binding, lower entry, or lower efflux. These mechanisms are not dissected here. Second, the heterogeneity can be preexisting or a result of the response to this stress. Either way, whether active efflux or dynamic transcriptomic changes are responsible for these patterns is not clear. Direct efflux assays are crucial to conclude that efflux is a major factor here.

      This important comment is similar in scope to the first comment of reviewer 1, and it arises partly because we had not clearly explained the efflux assays reported in Figure 4 of the original manuscript. We kindly refer this reviewer to our extensive response 1 to reviewer 1 and the corresponding amendments on lines 316-350 and in the new Figure S13 and Figure 4 (reported in response 1 to reviewer 1 above), where we have now fully addressed this reviewer’s and reviewer 1’s concerns and performed new experiments following their important suggestions and the methods described in PMID 20606071, suggested by this reviewer.

      (2) The fluorescent imaging experiments can be conducted in the presence of externally added proteases, such as proteinase K, which has multiple cleavage sites on tachyplesin. This would ensure that all the external peptides (both free and bound) are removed. If the signal is still present, it can be concluded that the peptide is present internally. If the peptide is primarily external, the authors need to explain how efflux could help with externally bound peptides. Figure 1J-K: How are the authors sure about the location of the intensity? The peptide can be inside or outside and still give the same signal. To prove that the peptide is inside or outside, a proteolytic cleavage experiment is necessary (proteinase K, Arg-C proteinase, clostripain, etc.).

      We thank the reviewer for this important suggestion.

      We have now performed experiments where stationary phase E. coli was incubated in 46 μg mL<sup>-1</sup> (18.2 μM) tachyplesin-NBD in M9 for 60 min. Next, cells were pelleted and washed to remove extracellular tachyplesin-NBD and then incubated in either M9 or 20 μg mL<sup>-1</sup> (0.7 μM) proteinase K in M9 for 120 min. We found that the fluorescence of low accumulators decreased over time in the presence of proteinase K; in contrast, the fluorescence of high accumulators did not decrease over time in the presence of proteinase K. These data therefore suggest that tachyplesin-NBD is present only on the cell membrane of low accumulators and both on the membrane and intracellularly in high accumulators.

      Moreover, confocal microscopy using tachyplesin-NBD along with the membrane dye FM™ 4-64FX further confirmed that tachyplesin-NBD is present only on the cell membrane of low accumulators and both on the membrane and intracellularly in high accumulators.

      These amendments can be found on lines 173-179, lines 188-192 and in the new Figures S4 and S6.

      (3) Further genetic experiments are necessary to test whether efflux genes are involved at all. The genetic data presented by the authors in Figure S11 is crucial and should be further extended. The problem with fitting this data to the current hypothesis is as follows: If specific efflux pumps are involved in the resistance mechanism, then single deletions would cause some changes to the resistance phenotype, and the data in Figure S11 would look different. If there is redundancy (as is the case in many efflux phenotypes), the authors may consider performing double deletions on the major RND regulators (for example, evgA and marA). Additionally, the deletion of pump components such as TolC (one of the few OM components) and adaptors (such as acrA/D) might also provide insights. If the peptide is present in the periplasm, then deletions involving outer components would become important.

      This important comment is similar in scope to the third comment of reviewer 2. We have now performed tachyplesin-NBD accumulation assays using 28 single and 4 double E. coli BW25113 gene-deletion mutants of efflux components and transcription factors regulating efflux. While for the majority of the mutants we recorded bimodal distributions of tachyplesin-NBD accumulation similar to the distribution recorded for the E. coli BW25113 parental strain (Figure 4B and Figure S13), we found unimodal distributions of tachyplesin-NBD accumulation constituted only of high accumulators for both ΔqseB and ΔqseBΔqseC mutants as well as reduced numbers of low accumulators for the ΔacrAΔtolC mutant.

      These amendments can be found on lines 351-361, in the new Figure 4B and in the new Figure S14; please also see our response to comment 3 of reviewer 2.

      (4) Line numbers would have been really helpful. Please mention the size of the peptide (length and spatial) for readers.

      We have now added line numbers to the revised manuscript. The length and molecular weight of tachyplesin-1 have now been added on line 75.

      (5) Figure S4 is unclear. How were the low accumulators collected? What prompted the low-temperature experiment? The conclusion that it accumulates at the outer membrane is unjustified. Where is the data for high accumulators?

      We have now corrected the results section to state that tachyplesin-NBD accumulates on the cell membranes, rather than at the outer membrane of E. coli cells.

      These amendments can be found on lines 178 and 190.

      We would like to clarify that in Figure S4 we compare the distribution of tachyplesin-NBD single-cell fluorescence at low temperature versus 37 °C across the whole stationary phase E. coli population; we did not collect low accumulators only.

      The low-temperature experiment was prompted by a previous publication (Zhou Y et al. 2015: doi: 10.1021/ac504880r. Epub 2015 Mar 24. PMID: 25753586) that showed that non-specific adherence of antimicrobials to the bacterial surface occurs at low temperatures and that passive and active transport of antimicrobials across the membrane is significantly diminished. Additionally, there are previous reports that suggest low temperatures inhibit post-binding peptide-lipid interactions, but not the primary binding step (PMID: 16569868; PMCID: PMC1426969; PMID: 3891625; PMCID: PMC262080).

      Therefore, the low-temperature experiment was performed to quantify the fluorescence of cells due to non-specific binding. This quantification allowed us to deduce that the fluorescence of high accumulators above the measured non-specific binding fluorescence (measured in the low-temperature experiment for the whole stationary phase E. coli population) is the result of intracellular tachyplesin-NBD accumulation. In contrast, the comparable fluorescence levels between all the cells in the low-temperature experiment and the low accumulator subpopulation at 37 °C suggest that tachyplesin-NBD is predominantly accumulated on the cell membranes of low accumulators instead of intracellularly.

      Please also see our response to comment 2 above for further evidence supporting that tachyplesin-NBD accumulates only on the cell membranes of low accumulators and both on the cell membranes and intracellularly in high accumulators.

      (6) Figure S5: Describe the microfluidic setup briefly. Why did the distribution pattern change (compared to Figure 1A)? Now, there are more high accumulators. Does the peptide get equally distributed between daughter cells?

      We have now added a brief description of the microfluidic setup on lines 182-184.

      The difference in the abundance of low and high accumulators between the microfluidics and flow cytometry measurements is likely due to differences in cell density, i.e. a few cells per channel vs millions of cells in a tube. A second major difference is that tachyplesin-NBD is continuously supplied in the microfluidic device for the entire duration of the experiment, therefore, the extracellular concentration of tachyplesin-NBD does not decrease over time. In contrast, tachyplesin-NBD is added to the tube only at the beginning of the experiment, therefore, the extracellular concentration of tachyplesin-NBD likely decreases in time as it is accumulated by the bacteria. The relative abundance of low and high accumulators changes with the extracellular concentration of tachyplesin-NBD as shown in Figure 1A.

      We have added a sentence to acknowledge this discrepancy on lines 186-187.

      No instances of cell division were observed in stationary phase E. coli in the absence of nutrients in all microfluidics assays. Therefore, we cannot comment on the distribution of tachyplesin-NBD across daughter cells.

      (7) How did the authors conclude this: "tachyplesin accumulation on the bacterial membrane may not be sufficient for bacterial eradication"? It is completely unclear to this reviewer.

      We presented this hypothesis at the end of the section “Tachyplesin accumulates primarily in the membranes of low accumulators” as a link to the following section “Tachyplesin accumulation on the bacterial membranes is insufficient for bacterial eradication” where we test this hypothesis. For clarity, we have now moved this sentence to the beginning of the section “Tachyplesin accumulation on the bacterial membranes is insufficient for bacterial eradication”.

      (8) What is meant by membrane accumulation? Outside, inside, periplasm? Where? Figure 2H conclusions are unjustified. Bacterial killing with many antibiotics is associated with membrane damage, which is an aftereffect of direct antibiotic action. How can the authors state that "low accumulators primarily accumulate tachyplesin-NBD on the bacterial membrane, maintaining an intact membrane, strongly contributing to the survival of the bacterial population"? This reviewer could not find justifications for the claims about the location of the accumulation or cells actively maintaining an intact membrane. Also, PI staining reports damage both membranes.

      Based on the experiments that we carried out following this reviewer’s suggestions (please see our response to comment 2 above), it is likely that tachyplesin-NBD is present only on the bacterial surface, i.e. in or on the outer membrane of low accumulators, considering that their fluorescence decreases during treatment with proteinase K. However, to take a more conservative approach, we now write “on the cell membranes” throughout the manuscript, i.e. either the outer or the inner membrane.

      We have also rephrased the statement reported by the reviewer as follows:

      “Taken together with PI staining data indicating membrane damage caused by high tachyplesin accumulation, these data demonstrate that low accumulators, which primarily accumulate tachyplesin-NBD on the bacterial membranes, maintain membrane integrity and strongly contribute to the survival of the bacterial population in response to tachyplesin treatment.”

      These amendments can be found on lines 228-232.

      (9) Figure 3: The findings about cluster 2 and cluster 4 genes do not correlate logically. If the cells are in a metabolically low active state, how are the cells getting enough energy for active efflux and membrane transport? This scenario is possible, but the authors must confirm the metabolic activity by measuring respiration rates. Also, metabolically less-active cells may import a lower number of peptides to begin with. That also may contribute to cell survival. Additionally, lowered metabolism is a known strategy of antibiotic survival that is distinctly different from efflux-mediated survival.

      Following this reviewer’s comment and comment 2 of reviewer 1, we have now carried out further experiments to estimate the metabolic activity of low and high accumulators. Please see our response to comment 2 of reviewer 1 above.

      (10) Figure S10: How did the authors test their hypothesis that cardiolipin is involved in the binding of the peptide to the membrane? The transcriptome data does not confirm it. Genetic experiments are necessary to confirm this claim.

      We would like to clarify that we have not set out to test the hypothesis that cardiolipin is involved in the binding of tachyplesin-NBD. We have only stated that cardiolipin could bind tachyplesin due to its negative charge. We have now cited two previous studies that suggest that tachyplesin has an increased affinity for lipid mixtures containing either cardiolipin (Edwards et al. ACS Inf Dis 2017) or PG lipids (Matsuzaki et al. BBA 1991), i.e. the main constituents of cardiolipins.

      These amendments can be found on lines 264-267.

      (11) Figure 4B-F: There are several controls missing. For Sertraline treatment, the authors must test that the metabolic profile, transcriptomic changes, or import of the peptide are not responsible for enhanced survival. CCCP will not only abolish efflux but also many other respiration-associated or all other energy-driven processes.

      Figure 4D presents data acquired in efflux assays in the absence of extracellular tachyplesin-NBD. Therefore, altered tachyplesin-NBD import cannot contribute to the lack of formation of the low accumulator subpopulation.

      We have now acknowledged that it is conceivable that increased tachyplesin efficacy is due to metabolic and transcriptomic changes induced by sertraline.

      These amendments can be found on lines 396-397.

      We have also acknowledged that CCCP does not only abolish efflux but also other respiration-associated and energy-driven processes.

      These amendments can be found on lines 341-342.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is a very well-written paper presenting interesting findings related to the recovery following the end-Permian event in continental settings, from N China. The finding is timely as the topic is actively discussed in the scientific community. The data provides additional insights into the faunal, and partly, floral global recovery following the EPE, adding to the global picture.

      Strengths:

      The conclusions are supported by an impressive amount of sedimentological and paleontological data (mainly trace fossils) and illustrations.

      We thank Reviewer #1 for the positive assessments.

      Weaknesses: [eliminated in revision]

      We thank Reviewer #1.

      Reviewer #2 (Public review):

      Summary:

      The authors made a thorough revision of the manuscript, strengthening the message. They also considered all the comments made by the reviewers and provided appropriate and convincing arguments.

      Strengths:

      The revised manuscript clarifies all the major points raised by the reviewers, and the way the information is presented (in the text, figures and tables) is clear.

      We thank Reviewer #2 for the positive comments on our work.

      Weaknesses:

      The authors provided an appropriate and convincing rebuttal regarding the potential weakness I pointed out in the first review of the manuscript. Therefore, I do not see any major issue in their work.

      Introduction

      (1) P. 2, L. 32: Replace "to migrated" with "to migrate".

      Revised as suggested.

      (2) P. 3, L. 43-44: We recently published a review article on the tetrapod terrestrial record from the Central European Basin, showing that Olenekian tetrapod faunas (and ichnofaunas) were already quite rich and diverse. Article: https://doi.org/10.1016/j.earscirev.2025.105085

      Yes, we have read this paper. This summary is very important for the understanding of the biotic recovery after the PTME, especially in the early stage. We have added the new result in our manuscript.

      (3) P. 3, L. 57: Replace "recovered terrestrial ecosystems in tropical" with "recovered tropical terrestrial ecosystems".

      Revised as suggested.

      Results and Discussion

      (4) P. 6, L. 118: Replace "declined" with "decline".

      Revised as suggested.

      (5) P. 7, L. 131: Replace "microbial" with "microbially".

      Revised as suggested.

      Conclusions

      (6) P. 11, L. 224: Replace "as little as" with "as early as".

      Revised as suggested.

      (7) P. 11, L. 227: Replace "not only results in" with "not only result in".

      Revised as suggested.

      (8) P. 11, L. 230: Replace "suggesting" with "suggest".

      Revised as suggested.

      Reviewer #3 (Public review):

      Summary:

      This manuscript by Guo and colleagues features the documentation and interpretation of three successions of continental to marginal marine deposits spanning the P/T transition and their respective ichnofaunas. Based on these new data inferences concerning end-Permian mass extinction and Triassic recovery in the tropical realm are discussed.

      Strengths:

      The manuscript is well-written and organized and includes a large amount of new lithological and ichnological data that illuminate ecosystem evolution in a time of large-scale transition. The lithological documentations, facies interpretations, and ichnotaxonomic assignments look okay (with a few exceptions).

      We thank Reviewer #3 for the positive assessments.

      Weaknesses: [all eliminated in revision]

      We thank Reviewer #3.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The authors found that IL-1b signaling is pivotal for hypoxemia development and can modulate NETs formation in LPS+HVV ALI model.  

      Strengths: 

      They used IL1R1 ko mice and proved that IL1R1 is involved in ALI model proving that IL1b signalling leads towards ARDS. In addition, hypothermia reduces this effect, suggesting a therapeutic option.  

      We thank the Reviewer for recognizing the strengths of our study and their positive feedback.

      Weaknesses: 

      (1) IL1R1 binds IL1a and IL1b. What would be the role of IL1a in this scenario? 

      Thank you for asking this question. We have addressed this in our previous paper (Nosaka et al. Front Immunol 2020;11; 207), where we used anti-IL-1α and IL-1α KO mice in our model and found that neither anti-IL-1α treated mice nor IL-1α KO mice were protected. Thus, IL-1β, but not IL-1α, plays a role in inducing hypoxemia during LPS+HVV. We will now add this point to the discussion of our revised manuscript.

      (2) The authors depleted neutrophils using anti-Ly6G. What about MDSCs? Do these latter cells be involved in ARDS and VILI?  

      Anti-Ly6G neutrophil depletion may potentially affect G-MDSCs as well (Blood Adv 2022 Jul 29;7(1):73–86); however, we have not looked directly at G-MDSCs. If these cells were depleted, we would have expected to see an increase in inflammation, which we did not. Instead, anti-Ly6G treated mice were protected. Thus, we cannot comment on any presumed role of G-MDSCs in the LPS+HVV-induced severe ALI model that we used.

      (3) The authors found that TH inhibited IL-1β release from macrophages led to less NETs formation and albumin leakage in the alveolar space in their lung injury model. A graphical abstract could be included suggesting a cellular mechanism.  

      Thanks for summarizing our findings and the suggestion. Unfortunately, eLife does not publish a graphical abstract.

      (4) If Macrophages are responsible for IL1b release that via IL1R1 induces NETosis, what happens if you deplete macrophages? what is the role of epithelial cells?  

      Previous studies have found that macrophage depletion is protective in several models of ALI (Eyal. Intensive Care Med. 2007;33:1212–1218., Lindauer. J Immunol. 2009;183:1419–1426.), and other researchers have found that airway epithelial cells did not contribute to IL-1β secretion (Tang. PLoS ONE. 2012;7:e37689.). We have previously reported that epithelial cells produce IL-18 without an LPS priming signal during LPS+HVV (Nosaka et al. Front Immunol 2020;11; 207). Thus, IL-18 is not sufficient to induce hypoxemia, as Saline+HVV treated mice do not develop hypoxemia (Nosaka et al. Front Immunol 2020;11; 207). We will now add this point to the revised discussion of the manuscript.

      Reviewer #2 (Public review): 

      Summary: 

      The manuscript by Nosaka et al is a comprehensive study exploring the involvement of IL1beta signaling in a 2-hit model of lung injury + ventilation, with a focus on modulation by hypothermia. 

      Strengths: 

      The authors demonstrate quite convincingly that interleukin 1 beta plays a role in the development of ventilator-induced lung injury in this model, and that this role includes the regulation of neutrophil extracellular trap formation. The authors use a variety of in vivo animal-based and in vitro cell culture work, and interventions including global gene knockout, cell-targeted knockout and pharmacological inhibition, which greatly strengthen the ability to make clear biological interpretations. 

      We thank the Reviewer for their positive feedback.

      Weaknesses: 

      A primary point for open discussion is the translatability of the findings to patients. The main model used, one of intratracheal LPS plus mechanical ventilation is well accepted for research exploring the pathogenesis and potential treatments for acute respiratory distress syndrome (ARDS). However, the interpretation may still be open to question - in the model here, animals were exposed to LPS to induce inflammation for only 2 hours, and seemingly displayed no signs of sickness, before the start of ventilation. This would not be typical for the majority of ARDS patients, and whether hypothermia could be effective once substantial injury is already present remains an open question. The interaction between LPS/infection and temperature is also complicated - in humans, LPS (or infection) induces a febrile, hyperthermic response, whereas in mice LPS induces hypothermia (eg. Ganeshan K, Chawla A. Nat Rev Endocrinol. 2017;13:458-465). Given this difference in physiological response, it is therefore unclear whether hypothermia in mice and hypothermia in humans are easily comparable. Finally, the use of only young, male animals such as in the current study has been typical but may be criticised as limiting translatability to people. 

      Therefore while the conclusions of the paper are well supported by the data, and the biological pathways have been impressively explored, questions still remain regarding the ultimate interpretations.  

      We agree with the reviewer that there is only minimal pulmonary inflammation at two hours post LPS (Dagvadorj et al. Immunity 42, 640–653). This is a limitation of the experimental model we used in our study. Additionally, as the reviewer pointed out, LPS induces hyperthermia in humans; however, it is also well established that physiological hypothermia occurs in humans with severe infections and sepsis (Baisse. Am J Emerg Med. 2023 Sep: 71: 134-138., Werner. Am J Emerg Med. 2025 Feb;88:64-78.). Therefore, the difference between human and mouse responses to sepsis or infections may be more nuanced. Furthermore, it is important to distinguish between physiological hypothermia (just <36°C) and therapeutic hypothermia (typically 32-34°C). We will add to the discussion whether hypothermia serves as a protective response and whether the transition from normothermia to hyperthermia could have detrimental effects. We only used young male mice in our study, as the Reviewer points out; we will also add this point to the revised discussion as a limitation of our study.

      Recommendations for the authors: 

      (i) With hypothermia, metabolic activity would be expected to be reduced and therefore presumably impact on CO2/pH. These may have an impact on outcomes from ventilation, so could the authors include this data and discuss as appropriate? 

      We have now included these data in Suppl Fig 6. While we observed significant differences in blood pH and PaCO<sub>2</sub> in the hypothermia treatment group, these values remained within the clinically normal range (PaCO<sub>2</sub>: 35-45 mmHg, pH: 7.35-7.45). Neither alkalosis (PaCO<sub>2</sub> < 35 mmHg, pH > 7.45) nor acidosis (PaCO<sub>2</sub> > 45 mmHg, pH < 7.35) was observed.
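      The thresholds quoted in this response can be written as a simple classification rule. The following is a minimal sketch, assuming only the cut-offs stated above; the function name and the "unclassified" label for mixed readings are hypothetical, and it is not a clinical tool.

```python
def classify_blood_gas(ph: float, paco2_mmhg: float) -> str:
    """Classify one blood gas reading using the thresholds quoted in the response.

    Normal range: PaCO2 35-45 mmHg and pH 7.35-7.45.
    Alkalosis: PaCO2 < 35 mmHg and pH > 7.45.
    Acidosis: PaCO2 > 45 mmHg and pH < 7.35.
    Any other combination is returned as "unclassified" (an assumption of
    this sketch; the response does not discuss mixed readings).
    """
    if 7.35 <= ph <= 7.45 and 35 <= paco2_mmhg <= 45:
        return "normal"
    if ph > 7.45 and paco2_mmhg < 35:
        return "alkalosis"
    if ph < 7.35 and paco2_mmhg > 45:
        return "acidosis"
    return "unclassified"


# Hypothetical readings, for illustration only (not data from the study).
example_readings = [(7.40, 40.0), (7.50, 30.0), (7.30, 50.0)]
labels = [classify_blood_gas(ph, paco2) for ph, paco2 in example_readings]
```

      Under this rule, a group whose mean values differ significantly from another group can still fall entirely within the "normal" band, which is the situation the response describes.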

      (ii) It is noticeable that there are quite large differences in experimental numbers between groups - typically 7-12, 5-12 in Figure 2. How were these N determined? For example is there a reason why there is apparently N = 8 for BALF neutrophils in the saline + HVV group (Figure 1c) but N = 12 for LPS + HVV group? Did any animals die during any of the protocols for example? 

      We conducted experiments with 4 mice per experiment (2 mice per group × 2, or 4 mice per group) for ventilation experiments, and pooled data from 5-6 independent experiments or 3-4 independent experiments, respectively. No mouse mortality was observed (unless otherwise noted). However, in the severe ARDS group, some mice were dehydrated by the endpoint of experiments, preventing blood or BALF collection. As a result, sample sizes were unequal in some cases. Nevertheless, no data were selectively excluded.

      (iii) Discussion - On page 13 you refer to data involving Cl-amidine administration. This does not seem to be related to any experiments reported in the manuscript. 

      We apologize for this mistake and have removed it.

      (iv) Methods - authors state that BALF was obtained after 150 minutes of ventilation, yet the experiments apparently lasted for 180 minutes. Presumably this is an error? 

      We apologize for this inconsistency. We collected blood for measuring blood gas at 30 min and 150 min after ventilation. However, mice were kept on the ventilator for 30 min longer, and then the mice were euthanized and BALF was collected. Thus, BALF was collected at 180 min, 30 minutes after the final blood draw. We have corrected the methods in the revised manuscript.

      (v) Statistical methods - authors state that sometimes Mann-Whitney U-test was used and sometimes unpaired t-test, presumably reflecting that some data were normally distributed and some were not. Could the authors please describe the tests used to confirm distribution of data. 

      We have clarified which statistical methods were used in our revised manuscript.

      Briefly, normality within the groups was assessed using the Shapiro-Wilk and Kolmogorov-Smirnov tests. Three-way ANOVA (Figure 1B; Supplemental Figure 1B-D; Supplemental Figure 6), one-way ANOVA (Supplemental Figure 4D-E; Supplemental Figure 5C), and two-way ANOVA were performed for data with more than two groups, followed by Tukey's post hoc test. Some groups analyzed by two-way ANOVA in Figure 1 and Supplemental Figure 1 failed the normality tests due to zero values (analyte not detected by ELISA) or the relatively small sample size, as samples were distributed across multiple measurements. However, the primary group of interest, LPS+HVV, showed significant differences from other groups with consistently low P-values in most datasets, supporting the decision to retain the ANOVA analyses. For comparisons between two groups, the Mann-Whitney U test was used when one or both groups failed the Shapiro-Wilk normality test, while the unpaired Student's t-test was applied to the remaining normally distributed data.
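      The two-group rule described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the authors' analysis code: the function name is hypothetical, the Shapiro-Wilk p-values are assumed to have been computed elsewhere, and the 0.05 significance level is an assumption because the response does not state the threshold used.

```python
def choose_two_group_test(shapiro_p_a: float, shapiro_p_b: float,
                          alpha: float = 0.05) -> str:
    """Pick the two-group comparison following the rule described in the response.

    If one or both groups fail the Shapiro-Wilk normality test (p < alpha),
    the Mann-Whitney U test is used; otherwise the unpaired Student's t-test.
    The default alpha of 0.05 is an assumption of this sketch.
    """
    for p in (shapiro_p_a, shapiro_p_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p-values must lie between 0 and 1")
    if shapiro_p_a < alpha or shapiro_p_b < alpha:
        return "Mann-Whitney U test"
    return "unpaired Student's t-test"


# Hypothetical p-values, for illustration only.
both_normal = choose_two_group_test(0.40, 0.22)
one_fails = choose_two_group_test(0.40, 0.01)
```

      Making the rule explicit shows that the choice of test depends on the weaker of the two groups: a single non-normal group is enough to switch to the non-parametric test.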

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript presents a significant and rigorous investigation into the role of CHMP5 in regulating bone formation and cellular senescence. The study provides compelling evidence that CHMP5 is essential for maintaining endolysosomal function and controlling mitochondrial ROS levels, thereby preventing the senescence of skeletal progenitor cells.

      Strengths:

      The authors demonstrate that the deletion of Chmp5 results in endolysosomal dysfunction, elevated mitochondrial ROS, and ultimately enhanced bone formation through both autonomous and paracrine mechanisms. The innovative use of senolytic drugs to ameliorate musculoskeletal abnormalities in Chmp5-deficient mice is a novel and critical finding, suggesting potential therapeutic strategies for musculoskeletal disorders linked to endolysosomal dysfunction.

      Weaknesses:

      The manuscript requires a deeper discussion or exploration of CHMP5's roles and a more refined analysis of senolytic drug specificity and effects. This would greatly enhance the comprehensiveness and clarity of the manuscript.

      We thank the reviewer for these insightful comments. In the revised manuscript, we have expanded the discussion of the distinct roles of CHMP5 in different cell types. Specifically, we add the following sentences (Lines 433-439 in the combined manuscript):

      “Also, a previous study by Adoro et al. did not detect endolysosomal abnormalities in Chmp5 deficient developmental T cells [1]. Since both osteoclasts and T cells are of hematopoietic origin, and meanwhile osteogenic cells and MEFs, which show endolysosomal abnormalities after CHMP5 deficiency, are of mesenchymal origin, it turns out that the function of CHMP5 in regulating endolysosomal pathway could be cell lineage-specific, which remains clarified in future studies.”

      In addition, we tested another senolytic drug Navitoclax (ABT-263), which is a BCL-2 family inhibitor and induces apoptosis of senescent cells, in Chmp5<sup>Ctsk</sup> mice. Micro-CT analysis showed that ABT-263 could also improve periskeletal bone overgrowth in Chmp5<sup>Ctsk</sup> mice (Fig. 5F). Furthermore, we have also discussed the potential off-target effects of senolytic drugs in Chmp5<sup>Ctsk</sup> mice in the revised manuscript. Specifically, we added the following paragraph (Lines 441-451):

      “Furthermore, it is unclear whether the effect of senolytic drugs in Chmp5<sup>Ctsk</sup> mice involves targeting osteoclasts other than osteogenic cells, as osteoclast senescence has not yet been evaluated. However, the efficacy of Q + D in targeting osteogenic cells, which is the focus of the current study, was confirmed in Chmp5<sup>Dmp1</sup> mice (Fig. 5C-E). Additionally, Q + D caused a higher cell apoptotic ratio in Chmp5<sup>Ctsk</sup> compared to wild-type periskeletal progenitors in ex vivo culture (Fig. 5A), demonstrating the effectiveness of Q + D in targeting osteogenic cells in the Chmp5<sup>Ctsk</sup> model. Furthermore, an alternative senolytic drug ABT-263 could also ameliorate periskeletal bone overgrowth in Chmp5<sup>Ctsk</sup> mice (Fig. 5F). Together, these results confirm that osteogenic cell senescence is responsible for the bone overgrowth in Chmp5<sup>Ctsk</sup> and Chmp5<sup>Dmp1</sup> mice, and senolytic treatments are effective in alleviating these skeletal disorders.”

      Reviewer #2 (Public review):

      Summary:

      The authors try to show the importance of CHMP5 for skeletal development.

      Strengths:

      The findings of this manuscript are interesting. The mouse phenotypes are well done and are of interest to a broader (bone) field.

      Weaknesses:

      The mechanistic insights are mediocre, and the cellular senescence aspect poor.

      In total, it has not been shown that there are actual senescent cells that are reduced after D+Q treatment. These statements need to be scaled back substantially.

      We thank the reviewer for these constructive comments. We have added additional results to strengthen the senescent phenotypes of Chmp5-deficient skeletal progenitor cells, including significant enrichment of the SAUL_SEN_MAYO geneset (positively correlated with cell senescence) and the KAMMINGA_SENESCENCE geneset (negatively correlated with cell senescence) at the transcriptional level by GSEA analysis of RNA-seq data (Fig. S3C), and the increase of γH2Ax<sup>+</sup>;GFP<sup>+</sup> cells at periskeletal overgrowth in Chmp5<sup>Ctsk</sup>;Rosa26<sup>26mTmG/+</sup> mice vs. the periosteum of Chmp5<sup>Ctsk/+</sup>;Rosa26<sup>26mTmG/+</sup> control mice (Fig. 3E). These results further support the senescent phenotypes of Chmp5-deficient skeletal progenitors.

      Furthermore, the combination of Q + D caused a higher cell apoptotic ratio in Chmp5<sup>Ctsk</sup> vs. wild-type periskeletal progenitors in ex vivo culture (Fig. 5A), suggesting their effectiveness in targeting periskeletal progenitor cell senescence in Chmp5<sup>Ctsk</sup> mice. In addition, we tested an alternative senolytic drug ABT-263, which is an inhibitor of the BCL-2 family and induces apoptosis of senescent cells, in Chmp5<sup>Ctsk</sup> mice, and ABT-263 could also alleviate periskeletal bone overgrowth in Chmp5<sup>Ctsk</sup> mice (Fig. 5F). Together, these results demonstrate that osteogenic cell senescence is responsible for abnormal bone overgrowth in Chmp5-deficient mice and that senolytic drugs are effective in improving these skeletal disorders.

      Reviewer #3 (Public review):

      Summary:

      In this study, Zhang et al. reported that CHMP5 restricts bone formation by controlling endolysosome-mitochondrion-mediated cell senescence. The effects of CHMP5 on osteoclastic bone resorption and bone turnover have been reported previously (PMID: 26195726); in that study, the aberrant bone phenotype was observed in the CHMP5-ctsk-CKO mouse model. Using the same mouse model, Zhang et al. report a novel role of CHMP5 in osteogenesis through affecting cell senescence. Overall, it is an interesting study and provides new insights in the field of cell senescence and bone.

      Strengths:

      Analyzed the bone phenotype of CHMP5-periskeletal progenitor-CKO mouse model and found the novel role of senescent cells on osteogenesis and migration.

      Weaknesses:

      (1) There are a lot of papers that have reported that senescence impairs osteogenesis of skeletal stem cells. In this study, the author claimed that Chmp5 deficiency induces skeletal progenitor cell senescence and enhanced osteogenesis. Can the authors explain the controversial results?

      Different skeletal stem cell populations in time and space have been identified and reported [2-6]. The present study shows that Chmp5 deficiency in periskeletal (Ctsk-Cre) and endosteal (Dmp1-Cre) osteogenic cells causes cell senescence and aberrant bone formation. Although cell senescence during aging can impair the osteogenesis of marrow stromal cells (MSCs), which contributes to diseases with low bone mass such as osteoporosis, aging can also increase heterotopic ossification or mineralization in musculoskeletal soft tissues such as ligaments and tendons [7]. Notably, the abnormal periskeletal bone overgrowth in Chmp5<sup>Ctsk</sup> mice was mainly mapped to insertion sites of tendons and ligaments on the bone (Fig. 1A and E), consistent with changes during aging. More broadly, aging can also cause abnormal ossification or mineralization in other body tissues, such as the heart valve [8, 9]. These different results reflect an aberrant state of ossification or mineralization in musculoskeletal tissues and throughout the body during aging. Based on the reviewer’s comment, we have discussed these results in the revised manuscript. Specifically, we add the following paragraph (Lines 453-462 in the combined manuscript):

      “Notably, aging is associated with decreased osteogenic capacity in marrow stromal cells, which is related to conditions with low bone mass, such as osteoporosis. Rather, aging is also accompanied by increased ossification or mineralization in musculoskeletal soft tissues, such as tendons and ligaments [7]. In particular, the abnormal periskeletal overgrowth in Chmp5<sup>Ctsk</sup> mice was predominantly mapped to insertion sites of tendons and ligaments on the bone (Fig. 1A and E), which is consistent with changes during aging and suggests that mechanical stress at these sites could contribute to the aberrant bone growth. These results suggest that skeletal stem/progenitor cells at different sites of musculoskeletal tissues could demonstrate different, even opposite outcomes in osteogenesis, due to cell senescence.”

      (2) Co-culture of Chmp5-KO periskeletal progenitors with WT ones should be conducted to detect the migration and osteogenesis of WT cells in response to Chmp5-KO-induced senescent cells. In addition, the co-culture of WT periskeletal progenitors with senescent cells induced by H2O2, radiation, or from aged mice would provide more information.

      In the present study, the increased proliferation and osteogenesis of CD45-;CD31-;GFP- periskeletal progenitors were shown as paracrine mechanisms of Chmp5-deficient periskeletal progenitors to promote bone overgrowth in Chmp5<sup>Ctsk</sup> mice (Figs. 4F, G, and S4C-E). According to the reviewer’s suggestion, we have carried out the coculture experiment and the coculture of Chmp5<sup>Ctsk</sup> with wild-type skeletal progenitors could promote osteogenesis of wild-type cells (Fig. S4B), which further supports the paracrine effect of Chmp5-deficient periskeletal progenitors.

      In addition, the cause and outcome of cell senescence could be highly heterogeneous, and different causes of cell senescence can cause significantly distinct, even opposite outcomes. Although the coculture experiments of WT periskeletal progenitors with senescent cells induced by H2O2, radiation, or from aged mice are very interesting, these are beyond the scope of the current study.

      (3) Many EVs were secreted from Chmp5-deleted periskeletal progenitors, compared to the rarely detected EVs around WT cells. Since EVs of BMSCs or osteoprogenitors show strong effects of promoting osteogenesis, did the EVs contribute to the enhanced osteogenesis induced by Chmp5 deficiency?

      This is an interesting question. Although we did not separately test the effect of EVs from Chmp5-deficient periskeletal progenitors on the osteogenesis of WT skeletal progenitors, the CD45-;CD31-;GFP- skeletal progenitor cells from Chmp5<sup>Ctsk</sup> mice have an increased capacity of osteogenesis compared to corresponding cells from control animals (Figs. 4G and S4D). Also, the coculture of Chmp5-deficient with wild-type skeletal progenitors could enhance the osteogenesis of wild-type cells (Fig. S4B). These results suggest that EVs from Chmp5-deficient periskeletal progenitors could promote osteogenesis of neighboring WT skeletal progenitors. The specific functions of EVs of Chmp5-deficient periskeletal progenitors in regulating osteogenesis will be further investigated in future studies.

      (4) EVs secreted from senescent cells propagate senescence and impair osteogenesis. Why do EVs secreted from senescent cells induced by Chmp5 deficiency have opposite effects on osteogenesis?

      The question is similar to comments #1 and #3 from this reviewer. First, the manifestations (including the secretory phenotype) and outcomes of cell senescence could be highly heterogeneous depending on inducers, tissue and cell contexts, and other factors such as “time”. Different causes of cell senescence could lead to different manifestations and outcomes, which have been discussed in the manuscript (Lines 381-383). Similarly, as mentioned above, skeletal stem/progenitor cells at different sites of musculoskeletal tissues could also demonstrate distinct, even opposite outcomes, as a result of cell senescence (Lines 453-462). Second, CD45-;CD31-;GFP- periskeletal progenitor cells from Chmp5<sup>Ctsk</sup>;Rosa26<sup>mTmG/+</sup> mice have an increased capacity of proliferation and osteogenesis compared to corresponding cells from control animals (Figs. 4F, G and S4C-E). Furthermore, the conditioned medium of Chmp5-deficient skeletal progenitors promoted the proliferation of ATDC5 cells (Fig. 4E) and the coculture of Chmp5<sup>Ctsk</sup> and wild-type periskeletal progenitors could enhance the osteogenesis of wild-type cells (Fig. S4B). Taken together, these results show paracrine actions of Chmp5-deficient periskeletal progenitors in promoting aberrant bone growth in Chmp5 conditional knockout mice. We also refer the reviewer to our responses to comments #1 and #3.

      (5) The Chmp5-ctsk mice show accelerated aging-related phenotypes, such as hair loss and joint stiffness. Did Ctsk also label cells in hair follicles or joint tissue?

      This is an interesting question. Although we did not check the expression of CHMP5 in hair follicles, which is outside the scope of the present study, the result in Fig. 1E showed the expression of Ctsk in joint ligaments, tendons, and their insertion sites on the bone (Lines 108-111). Notably, the periskeletal bone overgrowth in Chmp5<sup>Ctsk</sup> mice was mainly mapped to insertion sites of ligaments and tendons on the bone, which has been discussed in the revised manuscript (Lines 456-460).

      (6) Fifteen proteins were found to increase and five proteins to decrease in the cell supernatant of Chmp5<sup>Ctsk</sup> periskeletal progenitors. How about SASP factors in the secretory profile?

      The SASP phenotype and related factors of senescent cells could be highly heterogeneous depending on inducers, cell types, and timing of senescence [10, 11]. Most of the proteins we identified in the secretome analysis have previously been reported in the secretory profile of osteoblasts or involved in the regulation of osteogenesis. Although we were interested in changes in common SASP factors, such as cytokines and chemokines, the experiment did not detect these factors, probably due to their small molecular weights and the technical limitations of the mass-spec analysis. We have clarified this in the revised manuscript. Specifically, we add the following sentences (Lines 258-261):

      “Notably, the secretome analysis did not detect common SASP factors, such as cytokines and chemokines, in the secretory profile of Chmp5<sup>Ctsk</sup> periskeletal progenitors, probably due to their small molecular weights and the technical limitations of the mass-spec analysis.”

      (7) D+Q treatment mitigates musculoskeletal pathologies in Chmp5 conditional knockout mice. In the previously published paper (CHMP5 controls bone turnover rates by dampening NF-κB activity in osteoclasts), inhibition of osteoclastic bone resorption rescues the aberrant bone phenotype of the Chmp5 conditional knockout mice. Is the effect of D+Q on bone overgrowth because of the inhibition of bone resorption?

      This is an important question. We have discussed the potential off-target effect of senolytic drugs in Chmp5<sup>Ctsk</sup> mice in the revised manuscript. Specifically, we add the following paragraph (Lines 441-451):

      “Furthermore, it is unclear whether the effect of senolytic drugs in Chmp5<sup>Ctsk</sup> mice involves targeting osteoclasts other than osteogenic cells, as osteoclast senescence has not yet been evaluated. However, the efficacy of Q + D in targeting osteogenic cells, which is the focus of the current study, was confirmed in Chmp5<sup>Dmp1</sup> mice (Fig. 5C-E). Additionally, Q + D caused a higher cell apoptotic ratio in Chmp5<sup>Ctsk</sup> compared to wild-type periskeletal progenitors in ex vivo culture (Fig. 5A), demonstrating the effectiveness of Q + D in targeting osteogenic cells in the Chmp5<sup>Ctsk</sup> model. Furthermore, an alternative senolytic drug ABT-263 could also ameliorate periskeletal bone overgrowth in Chmp5<sup>Ctsk</sup> mice (Fig. 5F). Together, these results confirm that osteogenic cell senescence is responsible for the bone overgrowth in Chmp5<sup>Ctsk</sup> and Chmp5<sup>Dmp1</sup> mice and senolytic treatments are effective in alleviating these skeletal disorders.”

      (8) The role of VPS4A in cell senescence should be measured to support the conclusion that CHMP5 regulates osteogenesis by affecting cell senescence.

      We thank the reviewer for this suggestion. The current study mainly reports the function of CHMP5 in the regulation of skeletal progenitor cell senescence and osteogenesis. The roles of VPS4A in cell senescence and skeletal biology will be further explored in future studies. We have discussed this in the revised manuscript. Specifically, we add the following sentence (Lines 407-409):

      “The roles of VPS4A in regulating musculoskeletal biology and cell senescence should be further explored in future studies.”

      (9) Cell senescence with markers, such as p21 and H2AX, co-stained with GFP should be performed in the mouse models to indicate the effects of Chmp5 on cell senescence in vivo.

      According to the reviewer’s suggestion, we have performed immunostaining of γH2AX and colocalization with GFP in Chmp5<sup>Ctsk</sup>;Rosa26<sup>mTmG/+</sup> and Chmp5<sup>Ctsk/+</sup>;Rosa26<sup>mTmG/+</sup> mice. The results showed that there are more γH2AX+;GFP+ cells in the periskeletal overgrowth in Chmp5<sup>Ctsk</sup>;Rosa26<sup>mTmG/+</sup> mice compared to the periosteum of Chmp5<sup>Ctsk/+</sup>;Rosa26<sup>mTmG/+</sup> control animals. Because the γH2AX staining is one of the critical results supporting the senescent phenotype of Chmp5-deficient periskeletal progenitors, we have added these results to Fig. 3E and moved Fig. 3F of the original manuscript to Fig. S3E due to space limitations in Figure 3. In sum, these results further enrich the senescent manifestations of Chmp5-deficient periskeletal progenitors.

      (10) The ATDC5 cell line, an osteochondroma cell line, is not a good cell model of periskeletal progenitors. Maybe primary periskeletal progenitor cells are a better choice.

      ATDC5 cells are typically used as a chondrocyte progenitor cell line. However, our previous study showed that ATDC5 cells could also be used as a reasonable cell model for periskeletal progenitors [12], which was mentioned in the manuscript (Lines 202-204). In addition, the results of ATDC5 cells were also verified in primary periskeletal progenitor cells in this study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Despite the robust experimental framework and intriguing findings, there are several areas that require further attention to enhance the manuscript's overall quality and clarity:

      (1) The manuscript could benefit from a more in-depth discussion of the tissue-specific roles of CHMP5, particularly in addressing why CHMP5 deficiency results in distinct outcomes in osteogenic cells as opposed to other cell types, such as osteoclasts. Expanding the discussion would greatly enhance the comprehensiveness and clarity of the manuscript.

      Based on the reviewer’s suggestion, we have expanded the discussion of the distinct roles of CHMP5 in different cell types. Specifically, we state (Lines 433-439):

      “Also, a previous study by Adoro et al. did not detect endolysosomal abnormalities in _Chmp5_-deficient developmental T cells [1]. Since both osteoclasts and T cells are of hematopoietic origin, and meanwhile osteogenic cells and MEFs, which show endolysosomal abnormalities after CHMP5 deficiency, are of mesenchymal origin, it turns out that the function of CHMP5 in regulating the endolysosomal pathway could be cell lineage-specific, which remains clarified in future studies.”

      (2) Given that Figures 1 and 2 suggest that the absence of Chmp5 (Chmp5<sup>Ctsk</sup> and Chmp5<sup>Dmp1</sup>) leads to disordered proliferation or mineralization of bone or osteoblasts, the manuscript should delve deeper into the potential links between these findings and aging-related processes, such as age-associated fibrosis. Providing clearer explanations and discussion on these connections would help present a more cohesive understanding of the results in the context of aging.

      We thank the reviewer for this favorable suggestion. A feature of aging is heterotopic ossification or mineralization in musculoskeletal soft tissues, including tendons and ligaments [7]. Notably, the abnormal periskeletal bone formation in Chmp5<sup>Ctsk</sup> mice in this study was mostly mapped to the insertion sites of tendons and ligaments on the bone (Fig. 1A and E), which is consistent with changes during aging and suggests that mechanical stress at these sites could be a contributor to periskeletal overgrowth. We have discussed these results in the revised manuscript. Specifically, we add the following paragraph (Lines 453-462):

      “Notably, aging is associated with decreased osteogenic capacity in marrow stromal cells, which is related to conditions with low bone mass, such as osteoporosis. Rather, aging is also accompanied by increased ossification or mineralization in musculoskeletal soft tissues, such as tendons and ligaments [7]. In particular, the abnormal periskeletal overgrowth in Chmp5<sup>Ctsk</sup> mice was predominantly mapped to the insertion sites of tendons and ligaments on the bone (Fig. 1A and E), which is consistent with changes during aging and suggests that mechanical stress at these sites could contribute to the aberrant bone growth. These results suggest that skeletal stem/progenitor cells at different sites of musculoskeletal tissues could demonstrate different, even opposite outcomes in osteogenesis, due to cell senescence.”

      (3) The manuscript would be improved by a more refined analysis in Figures 3 and 5, particularly in relation to the use of senolytic drugs. Furthermore, a detailed discussion of the specificity and potential off-target effects of quercetin and dasatinib treatments in Chmp5-deficient mice would strengthen the therapeutic claims of these drugs.

      In Figure 3, we have added additional experiments and results to strengthen the senescent phenotypes of Chmp5-deficient periskeletal progenitors, including significant enrichment of the SAUL_SEN_MAYO geneset (positively correlated with cell senescence) and the KAMMINGA_SENESCENCE geneset (negatively correlated with cell senescence) at the transcriptional level by GSEA analysis of RNA-seq data (Fig. S3F), and an increase of γH2AX+;GFP+ cells at the site of periskeletal overgrowth in Chmp5<sup>Ctsk</sup>;Rosa26<sup>mTmG/+</sup> mice compared to the periosteum of Chmp5<sup>Ctsk/+</sup>;Rosa26<sup>mTmG/+</sup> control mice (Fig. 3E). These results further enrich the senescent molecular manifestations of Chmp5-deficient periskeletal progenitors.

      In Figure 5, we used an alternative senolytic drug ABT-263 to treat Chmp5<sup>Ctsk</sup> mice, and this anti-senescence treatment could also alleviate periskeletal bone overgrowth in this mouse model (Fig. 5F). Furthermore, we have also discussed the potential off-target effects of senolytic drugs in Chmp5<sup>Ctsk</sup> mice. Specifically, we add the following paragraph (Lines 441-451):

      “Furthermore, it is unclear whether the effect of senolytic drugs in Chmp5<sup>Ctsk</sup> mice involves targeting osteoclasts other than osteogenic cells, as osteoclast senescence has not yet been evaluated. However, the efficacy of Q + D in targeting osteogenic cells, which is the focus of the current study, was confirmed in Chmp5<sup>Dmp1</sup> mice (Fig. 5C-E). Additionally, Q + D caused a higher cell apoptotic ratio in Chmp5<sup>Ctsk</sup> compared to wild-type periskeletal progenitors in ex vivo culture (Fig. 5A), demonstrating the effectiveness of Q + D in targeting osteogenic cells in the Chmp5<sup>Ctsk</sup> model. Furthermore, an alternative senolytic drug ABT-263 could also ameliorate periskeletal bone overgrowth in Chmp5<sup>Ctsk</sup> mice (Fig. 5F). Together, these results confirm that osteogenic cell senescence is responsible for the bone overgrowth in Chmp5<sup>Ctsk</sup> and Chmp5<sup>Dmp1</sup> mice and senolytic treatments are effective in alleviating these skeletal disorders.”

      (4) The manuscript could be further enhanced by providing more details into how CHMP5 specifically regulates VPS4A protein levels. Notably, this is a central aspect of the paper linking CHMP5 to endolysosomal dysfunction.

      We thank the reviewer for this important suggestion. One of the novel findings of this study is that CHMP5 regulates the protein level of VPS4A without affecting its RNA transcription. The mechanism of CHMP5 in the regulation of VPS4A protein will be reported in a separate study. However, we have discussed the potential mechanism in the manuscript (Lines 399-409). Specifically, we state:

      “However, the mechanism of CHMP5 in the regulation of the VPS4A protein has not yet been studied. Since CHMP5 can recruit the deubiquitinating enzyme USP15 to stabilize IκBα in osteoclasts by suppressing ubiquitination-mediated proteasomal degradation [13], it is also possible that CHMP5 stabilizes the VPS4A protein by recruiting deubiquitinating enzymes and regulating the ubiquitination of VPS4A, which needs to be clarified in future studies. Notably, mutations in the VPS4A gene in humans can cause multisystemic diseases, including musculoskeletal abnormalities [14] (OMIM: 619273), suggesting that normal expression and function of VPS4A are important for musculoskeletal physiology. The roles of VPS4A in regulating musculoskeletal biology and cell senescence should be further explored in future studies.”

      (5) The discussion section could be enriched by more thoroughly integrating the current findings with previous studies on CHMP5, particularly those exploring its role in osteoclast differentiation and NF-κB signaling.

      The comment is similar to comment #1 of this reviewer. We have expanded the discussion of the distinct functions of CHMP5 in osteoclasts and osteogenic cells (Lines 424-439). We also refer the reviewer to our response to comment #1.

      (6) Figure S4 D is incorrectly arranged and should be revised accordingly.

      Sorry for the confusion. We have added additional annotations to make the images clearer. Now it is Fig. S4E in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) Abstract: A clinical perspective or at least an outline is desirable.

      The clinical importance of the findings of this study in understanding and treating musculoskeletal disorders of lysosomal storage diseases has been highlighted at the end of the abstract (Line 38).

      (2) Introduction: Header missing.

      The protein name is BCL2, not Bcl2.

      These have been corrected in the revised manuscript (Lines 41, 66).

      (3) Results

      The mouse phenotype experiments are well done.

      Hmga1, Hmga2, Trp53, Ets1, and Txn1 are not typical senescence-associated genes. How about Cdkn2a and Cdkn1a? These could easily be highlighted in Figure 3B.

      Hmga1, Hmga2, Trp53, Ets1, and Txn1 are within the geneset of Reactome Cellular Senescence. Notably, only the protein levels of CDKN2A (p16) and CDKN1A (p21) showed significant changes (Fig. 3D), whereas the mRNA levels of Cdkn2a and Cdkn1a did not show significant changes according to the RNA-seq data. We have added the result of Cdkn2a and Cdkn1a mRNA levels to Fig. S3D in the revised manuscript. Also, we add the following sentences in the text (Lines 193-195):

      “However, the mRNA levels of Cdkn2a (p16) and Cdkn1a (p21) did not show significant changes according to the RNA-seq analysis (Fig. S3D).”

      Figure 3C: Which gene set was used for SASP?

      The SASP geneset in Fig. 3C was from the Reactome database. We have clarified this in the figure legend of Fig. 3 in the revised manuscript (Line 1013).

      The symptom "joint stiffness/contracture" could also be due to skeletal abnormalities related to Chmp5<sup>Ctsk</sup>.

      Joint stiffness/contracture during aging is mainly the result of heterotopic ossification or mineralization in musculoskeletal soft tissues, including ligaments, tendons, joint capsules, and their insertion sites on the bone. Notably, the periskeletal bone overgrowth in Chmp5<sup>Ctsk</sup> mice was mainly mapped to the insertion sites of tendons, ligaments, and joint capsules on the bone, which is consistent with changes during aging. These results have been discussed in the revised manuscript (Lines 456-460).

      Overall, cellular senescence needs at least Cdkn2a and/or Cdkn1a and another marker, i.e. SenMayo or telomere-associated foci or senescence-associated distortion of satellites.

      We have run GSEA with the SenMayo geneset and the result is added in Fig. S3F in the revised manuscript. Also, we ran another geneset KAMMINGA_SENESCENCE which includes genes downregulated in cell senescence. Both genesets are significantly enriched in Chmp5-deficient periskeletal progenitors based on RNA-seq data (Fig. S3F).

      In addition, we also performed immunostaining for another senescence marker γH2AX and the results showed that there are more γH2AX+;GFP+ cells in periskeletal overgrowth in Chmp5<sup>Ctsk</sup>;Rosa26<sup>mTmG/+</sup> mice compared to the periosteum of Chmp5<sup>Ctsk/+</sup>;Rosa26<sup>mTmG/+</sup> control animals (Fig. 3E).

      Together, these results further support the senescent phenotypes of Chmp5-deficient periskeletal progenitors.
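
      For readers less familiar with the statistic, the enrichment score behind a GSEA result can be sketched as a running sum over the ranked gene list. The block below is an illustration only: the function name and the toy gene labels are hypothetical, it uses the unweighted form rather than the correlation-weighted form of standard GSEA, and it does not compute the normalized enrichment score (NES), which additionally divides the score by the mean score obtained from permuted data.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (Kolmogorov-Smirnov-like).

    ranked_genes: gene labels ordered from most up- to most down-regulated.
    gene_set: collection of gene labels to test for enrichment.
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    members = set(gene_set)
    hits = [gene in members for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on a gene-set member, step down otherwise.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy example with hypothetical labels: the set sits at the top of the ranking.
toy_ranking = ["g1", "g2", "g3", "g4"]
top_score = enrichment_score(toy_ranking, {"g1", "g2"})
bottom_score = enrichment_score(toy_ranking, {"g3", "g4"})
```

      A positive score indicates that the set is concentrated among up-regulated genes and a negative score among down-regulated genes, which is the pattern expected for a set of genes downregulated in senescence.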

      For Figure 4A: What is the NES?

      The value of NES has been added in Fig. 4A.

      The existence of vesicles does not necessarily indicate more SASP.

      We agree with the reviewer that the secretion of extracellular vesicles is not directly correlated with the SASP. In this study, the increased secretory vesicles around Chmp5<sup>Ctsk</sup> periskeletal progenitors represent a secretory phenotype of Chmp5-deficient periskeletal progenitors and have paracrine effects in the abnormal bone growth in Chmp5 conditional knockout mice as shown in Figs. 4 and S4.

      The Chmp5-deficient cells COULD promote the proliferation and osteogenesis of other progenitors, but they might as well not. And whether this is through the SASP is completely unresolved.

      CD45<sup>-</sup>;CD31<sup>-</sup>;GFP<sup>-</sup> periskeletal progenitor cells from Chmp5<sup>Ctsk</sup>;Rosa26<sup>mTmG/+</sup> mice showed an increased capacity of proliferation and osteogenesis compared to the corresponding cells from control animals (Figs. 4F, G, and S4C-E). Also, the conditioned medium of Chmp5-deficient skeletal progenitors promoted the proliferation of ATDC5 cells (Fig. 4E). In addition, the coculture of Chmp5<sup>Ctsk</sup> and wild-type periskeletal progenitors could enhance the osteogenesis of wild-type cells (Fig. S4B). These results demonstrate the paracrine actions of Chmp5-deficient periskeletal progenitors in promoting aberrant bone growth in Chmp5<sup>Ctsk</sup> and Chmp5<sup>Dmp1</sup> mice. However, the factors that mediate the paracrine effects of Chmp5-deficient periskeletal progenitors remain to be clarified in future studies.

      This has been mentioned in the revised manuscript (Lines 263-265).

      Figure 5C: The time points are not labelled.

      The time point of 16 weeks was mentioned in the Methods section and has now been added to the legend of Fig. 5C (Line 1063).

      Figure 5B: Was the bone's overall thickness quantified?

      In Fig. 5B, bone morphology in Chmp5<sup>Ctsk</sup> mice is irregular and difficult to quantify. Therefore, we did not quantify the overall bone thickness in these animals. However, the thickness of the cortical bone was measured by micro-CT analysis in Chmp5<sup>Dmp1</sup> mice after treatment with Q + D (Fig. 5E). Also, we have added the image of the gross femur thickness of Chmp5<sup>Dmp1</sup> mice before and after treatment with Q + D in Fig. 5E.

      It needs to be demonstrated that the actual cell number was reduced after D+Q treatment.

      The Q + D treatment caused a higher cell apoptotic ratio in Chmp5<sup>Ctsk</sup> vs. wild-type skeletal progenitors in ex vivo culture (Fig. 5A), suggesting its effectiveness in targeting the senescent periskeletal progenitors.

      Figure 7A: What is the NES?

      The value of NES has been added in Fig. 7A.

      Reviewer #3 (Recommendations for the authors):

      (1) The WB analysis should be quantified in Figure 3D.

      In Fig. 3D, the numbers above the lanes of p16 and p21 are the results of the quantification of the band intensity after normalization by β-Actin, which has been indicated in the Figure legend (Lines 1015-1017).
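
      The normalization described here amounts to dividing each target band by the loading-control band in the same lane. The following is a minimal sketch under stated assumptions: the function name and the densitometry values are invented for illustration, and the further rescaling to the first lane is a common convention that is assumed here rather than taken from the figure legend.

```python
def normalized_band_intensity(target, loading_control, reference_lane=0):
    """Normalize target-band intensities to a loading control, lane by lane.

    target: raw intensities of the protein of interest (e.g. p16), one per lane.
    loading_control: raw intensities of the loading control (e.g. beta-actin).
    reference_lane: lane whose normalized value is rescaled to 1.0 (assumption).
    """
    if len(target) != len(loading_control):
        raise ValueError("target and loading_control need one value per lane")
    ratios = [t / c for t, c in zip(target, loading_control)]
    reference = ratios[reference_lane]
    return [ratio / reference for ratio in ratios]


# Hypothetical densitometry readings for a control lane and a knockout lane.
example = normalized_band_intensity([100.0, 300.0], [200.0, 200.0])
```

      With equal loading-control signal in both lanes, the second lane in this made-up example reads three times the first.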

      (2) The osteoblast detection should be measured with antibody against osteocalcin.

      This comment did not specify what result the reviewer was referring to. However, most of the experiments in this study were performed in primary skeletal progenitor cells or cell lines. Osteoblasts were not specifically involved in the current study.

      (3) Co-culture of Chmp5-KO periskeletal progenitors with WT ones should be conducted to detect the migration and osteogenesis of WT cell in response to Chmp5-KO induced senescent cells. In addition, co-culture of WT periskeletal progenitors with senescent cells induced by H2O2, radiation, or from aged mice would provide more information.

      This comment is the same as comment #2 in the Public Reviews of this Reviewer. We already carried out the coculture experiment of Chmp5-deficient and wild-type periskeletal progenitors and the result was added in Fig. S4B. We refer the reviewer to our response to comment #2 in the Public Reviews for more details.

      (4) D+Q treatment mitigates musculoskeletal pathologies in Chmp5 conditional knockout mice. In the previously published paper (CHMP5 controls bone turnover rates by dampening NF-κB activity in osteoclasts), inhibition of osteoclastic bone resorption rescues the aberrant bone phenotype of the Chmp5 conditional knockout mice. Is the effect of D+Q on bone overgrowth because of the inhibition of bone resorption?

      This comment is the same as comment #7 in the Public Reviews of this Reviewer, where we have already addressed this question.

      (5) The role of VPS4A in cell senescence should be measured to support the conclusion that CHMP5 regulates osteogenesis through affecting cell senescence.

      This comment is the same as comment #8 in the Public Reviews of this Reviewer. We refer the reviewer to our response to that comment.

      (6) Cell senescence with the markers, such as p21 and H2AX, co-stained with GFP should be performed in the mouse models to indicate the effects of Chmp5 on cell senescence in vivo.

      This comment is the same as comment #9 in the Public Reviews of this Reviewer. We have performed immunostaining of γH2AX and colocalization with GFP in Chmp5<sup>Ctsk</sup>;Rosa26<sup>mTmG/+</sup> mice and Chmp5<sup>Ctsk/+</sup>;Rosa26<sup>mTmG/+</sup> mice. The results showed that there were more γH2AX+;GFP+ cells at the site of periskeletal overgrowth in Chmp5<sup>Ctsk</sup>;Rosa26<sup>mTmG/+</sup> mice compared to the periosteum of Chmp5<sup>Ctsk/+</sup>;Rosa26<sup>mTmG/+</sup> control mice (Fig. 3E). We also refer the reviewer to our response to comment #9 in Public Reviews.

      (7) The ATDC5 cell line, an osteochondroma cell line, is not a good cell model of periskeletal progenitors. Maybe primary periskeletal progenitor cells are a better choice.

      This comment is the same as comment #10 in the Public Reviews of this Reviewer. Our previous study showed that ATDC5 cells could be used as a reasonable cell model for periskeletal progenitors [12]. Also, most of the results of ATDC5 cells in the current study were verified in primary periskeletal progenitors.

      References

      (1) Adoro S, Park KH, Bettigole SE, Lis R, Shin HR, Seo H, et al. Post-translational control of T cell development by the ESCRT protein CHMP5. Nat Immunol. 2017;18(7):780-90. doi: 10.1038/ni.3764. PubMed PMID: 28553951.

      (2) Kassem M, Bianco P. Skeletal stem cells in space and time. Cell. 2015;160(1-2):17-9. doi: 10.1016/j.cell.2014.12.034. PubMed PMID: 25594172.

      (3) Chan CKF, Gulati GS, Sinha R, Tompkins JV, Lopez M, Carter AC, et al. Identification of the Human Skeletal Stem Cell. Cell. 2018;175(1):43-56 e21. doi: 10.1016/j.cell.2018.07.029. PubMed PMID: 30241615.

      (4) Debnath S, Yallowitz AR, McCormick J, Lalani S, Zhang T, Xu R, et al. Discovery of a periosteal stem cell mediating intramembranous bone formation. Nature. 2018;562(7725):133-9. Epub 20180924. doi: 10.1038/s41586-018-0554-8. PubMed PMID: 30250253; PubMed Central PMCID: PMC6193396.

      (5) Mizuhashi K, Ono W, Matsushita Y, Sakagami N, Takahashi A, Saunders TL, et al. Resting zone of the growth plate houses a unique class of skeletal stem cells. Nature. 2018;563(7730):254-8. doi: 10.1038/s41586-018-0662-5. PubMed PMID: 30401834; PubMed Central PMCID: PMC6251707.

      (6) Zhang F, Wang Y, Zhao Y, Wang M, Zhou B, Zhou B, et al. NFATc1 marks articular cartilage progenitors and negatively determines articular chondrocyte differentiation. Elife. 2023;12. Epub 20230215. doi: 10.7554/eLife.81569. PubMed PMID: 36790146; PubMed Central PMCID: PMC10076019.

      (7) Dai GC, Wang H, Ming Z, Lu PP, Li YJ, Gao YC, et al. Heterotopic mineralization (ossification or calcification) in aged musculoskeletal soft tissues: A new candidate marker for aging. Ageing Res Rev. 2024;95:102215. Epub 20240205. doi: 10.1016/j.arr.2024.102215. PubMed PMID: 38325754.

      (8) Mohler ER, 3rd, Adam LP, McClelland P, Graham L, Hathaway DR. Detection of osteopontin in calcified human aortic valves. Arterioscler Thromb Vasc Biol. 1997;17(3):547-52. doi: 10.1161/01.atv.17.3.547. PubMed PMID: 9102175.

      (9) Mohler ER, 3rd, Gannon F, Reynolds C, Zimmerman R, Keane MG, Kaplan FS. Bone formation and inflammation in cardiac valves. Circulation. 2001;103(11):1522-8. doi: 10.1161/01.cir.103.11.1522. PubMed PMID: 11257079.

      (10) Paramos-de-Carvalho D, Jacinto A, Saude L. The right time for senescence. Elife. 2021;10. Epub 2021/11/11. doi: 10.7554/eLife.72449. PubMed PMID: 34756162; PubMed Central PMCID: PMC8580479.

      (11) Wiley CD, Campisi J. The metabolic roots of senescence: mechanisms and opportunities for intervention. Nat Metab. 2021;3(10):1290-301. Epub 2021/10/20. doi: 10.1038/s42255-021-00483-8. PubMed PMID: 34663974; PubMed Central PMCID: PMC8889622.

      (12) Ge X, Tsang K, He L, Garcia RA, Ermann J, Mizoguchi F, et al. NFAT restricts osteochondroma formation from entheseal progenitors. JCI Insight. 2016;1(4):e86254. doi: 10.1172/jci.insight.86254. PubMed PMID: 27158674; PubMed Central PMCID: PMC4855520.

      (13) Greenblatt MB, Park KH, Oh H, Kim JM, Shin DY, Lee JM, et al. CHMP5 controls bone turnover rates by dampening NF-kappaB activity in osteoclasts. J Exp Med. 2015;212(8):1283-301. Epub 20150720. doi: 10.1084/jem.20150407. PubMed PMID: 26195726; PubMed Central PMCID: PMC4516796.

      (14) Rodger C, Flex E, Allison RJ, Sanchis-Juan A, Hasenahuer MA, Cecchetti S, et al. De Novo VPS4A Mutations Cause Multisystem Disease with Abnormal Neurodevelopment. Am J Hum Genet. 2020;107(6):1129-48. Epub 20201112. doi: 10.1016/j.ajhg.2020.10.012. PubMed PMID: 33186545; PubMed Central PMCID: PMC7820634.

    1. Author response:

      eLife Assessment

      This manuscript introduces a useful protein-stability-based fitness model for simulating protein evolution and unifying non-neutral models of molecular evolution with phylogenetic models. The model is applied to four viral proteins that are of structural and functional importance. The justification of some hypotheses regarding fitness is incomplete, as well as the evidence for the model's predictive power, since it shows little improvement over neutral models in predicting protein evolution.

      We are grateful for the constructive comments that helped improve our study. Regarding the comment about justification of fitness, we will include additional information in the revised manuscript to support the relevance of modeling protein evolution accounting for protein folding stability. We agree that increasing the parameterization of the developed birth-death model is interesting, provided it does not lead to overfitting. The model presented considers the fitness of protein variants to determine their reproductive success through the corresponding birth and death rates, varying among lineages, and it is biologically meaningful and technically correct (Harmon 2019). Following a suggestion of the first reviewer to allow variation of the global birth-death rate among lineages, we will additionally incorporate this aspect into the model and evaluate its performance with the data used to evaluate the models. The integration of structurally constrained substitution models of protein evolution, as Markov models, into the birth-death process was made following standard approaches of molecular evolution in population genetics (Yang 2006; Carvajal-Rodriguez 2010; Arenas 2012; Hoban, et al. 2012) and we will provide more information about it in the revised manuscript. Regarding the predictive power, our study showed good accuracy in predicting the real folding stability of forecasted protein variants. On the other hand, predicting the exact sequences proved to be more challenging, indicating a need for more accurate substitution models of molecular evolution. Altogether, we believe our findings provide a significant contribution to the field, as accurately forecasting the folding stability of future real proteins is fundamental for predicting their protein function and enabling a variety of applications. Additionally, we implemented the models into a freely available computer framework, with detailed documentation and diverse practical examples.
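
      To make the link between fitness and lineage-specific rates concrete, the following is a minimal sketch and not the implementation in the released framework. It assumes a fitness value scaled to [0, 1] and the constraint b + d = 1 mentioned in the review; the linear mapping from fitness to birth rate and both function names are hypothetical choices made for illustration.

```python
def birth_death_rates(fitness):
    """Map a fitness value in [0, 1] to (birth, death) rates with b + d = 1.

    Assumption for illustration only: the birth rate equals the fitness,
    and the death rate is its complement.
    """
    if not 0.0 <= fitness <= 1.0:
        raise ValueError("fitness must lie in [0, 1]")
    birth = fitness
    death = 1.0 - fitness
    return birth, death


def birth_probability(birth, death):
    """Probability that the next event affecting a lineage is a birth."""
    total = birth + death
    if total <= 0.0:
        raise ValueError("at least one rate must be positive")
    return birth / total


# Hypothetical variant whose folding stability translates to a fitness of 0.75.
b, d = birth_death_rates(0.75)
p_birth = birth_probability(b, d)
```

      Under these assumptions, a lineage carrying a more stable variant is more likely to split than to go extinct at its next event, while the total event rate per lineage stays fixed at one.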

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ferreiro et al. present a method to simulate protein sequence evolution under a birth-death model where sequence evolution is constrained by structural constraints on protein stability. The authors then use this model to explore the predictability of sequence evolution in several viral structural proteins. In principle, this work is of great interest to molecular evolution and phylodynamics, which have struggled to couple non-neutral models of sequence evolution to phylodynamic models like birth-death. Unfortunately, though, the model shows little improvement over neutral models in predicting protein evolution, and this ultimately appears to be due to fundamental conceptual problems with how fitness is modeled and linked to the phylodynamic birth-death model.

      We thank the reviewer for the positive comments about our work.

      Regarding predictive power, the study showed good accuracy in predicting the real folding stability of forecasted protein variants under a selection model, but not under a neutral model. However, predicting the exact sequences was more challenging. For example, substitutions among amino acids with similar physicochemical properties can yield similar folding stability while differing in the specific sequence; more accurate substitution models of molecular evolution are required in the field. We consider that forecasting the folding stability of future real proteins is an important advancement in forecasting protein evolution, given the essential role of folding stability in protein function and its variety of applications. Regarding the conceptual concerns related to fitness modeling, we clarify this issue in detail in our responses to the specific comments below.

      Major concerns:

      (1) Fitness model: All lineages have the same growth rate r = b-d because the authors assume b+d=1. But under a birth-death model, the growth r is equivalent to fitness, so this is essentially assuming all lineages have the same absolute fitness since increases in reproductive fitness (b) will simply trade off with decreases in survival (d). Thus, even if the SCS model constrains sequence evolution, the birth-death model does not really allow for non-neutral evolution such that mutations can feed back and alter the structure of the phylogeny.

      We thank the reviewer for this comment that aims to improve the realism of our model. In the model presented (but see below for another model, derived from the proposal of the reviewer, that we are now implementing into the framework and applying to the data used for the evaluation of the models), the fitness predicted from a protein variant is used to obtain the corresponding birth rate of that variant. In this way, protein variants with high fitness have high birth rates leading to overall more birth events, while protein variants with low fitness have low birth rates resulting in overall more extinction events, which has biological meaning for the study system. The statement “All lineages have the same growth rate r = b-d” is incorrect for our model because b and d can vary among lineages according to the fitness. For example, a lineage might have b=0.9, d=0.1, r=0.8, while another lineage could have b=0.6, d=0.4, r=0.2. Likewise, the statement “this is essentially assuming all lineages have the same absolute fitness” is incorrect. Clearly, assuming that all lineages have the same fitness would not make sense: in that situation the folding stability of the forecasted protein variants would be similar under any model, which is not the case as shown in the results. In our model, the fitness affects the reproductive success, where protein variants with a high fitness have higher birth rates leading to more birth events, while those with lower fitness have higher death rates leading to more extinction events. This parameterization is meaningful for protein evolution because the fitness of a protein variant can affect its survival (birth or extinction) without necessarily affecting its rate of evolution. While a faster growth rate can sometimes be associated with higher fitness, a variant with high fitness does not necessarily accumulate substitutions at a faster rate.

      Regarding the phylogenetic structure, the model presented considers variable birth and death events across different lineages according to the fitness of the corresponding protein variants, and this alters the derived phylogeny (i.e., protein variants selected against can go extinct while others with high fitness can produce descendants). We are not sure about the meaning of the term “mutations can feed back” in the context of our system. Note that we use Markov models of evolution, which are well-established in the field (despite their limitations), and substitutions are fixed mutations, which still could be reverted later if selected by the substitution model (Yang 2006). Altogether, we find that the presented birth-death model is technically correct and appropriate for modeling our biological system. Its integration with structurally constrained substitution (SCS) models of protein evolution, as Markov models, is correct following general approaches of molecular evolution in population genetics (Yang 2006; Carvajal-Rodriguez 2010; Arenas 2012; Hoban, et al. 2012). We will provide a more detailed description of the model in the revised manuscript.
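
      The numerical example in this response can be reproduced with a minimal sketch. It assumes only the constraint b + d = 1 stated in the reviewer's comment; the function name `rates_from_birth` is hypothetical, and the mapping from fitness to birth rate used by the framework is not shown because it is not specified here.

```python
def rates_from_birth(birth_rate):
    """Return (birth, death, growth) for one lineage under the constraint b + d = 1.

    Hypothetical helper for illustration; the birth rate is taken as given
    rather than derived from a fitness value.
    """
    if not 0.0 <= birth_rate <= 1.0:
        raise ValueError("birth rate must lie in [0, 1] when b + d = 1")
    death_rate = 1.0 - birth_rate
    growth_rate = birth_rate - death_rate  # r = b - d, which equals 2b - 1 here
    return birth_rate, death_rate, growth_rate


# The two lineages used as examples in the response.
for b in (0.9, 0.6):
    birth, death, growth = rates_from_birth(b)
    print(f"b={birth:.1f} d={death:.1f} r={growth:.1f} b+d={birth + death:.1f}")
```

      In this parameterization the total event rate b + d is the same for every lineage, while the split between birth and death events, and therefore b - d, differs between them.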

      Apart from these clarifications about the birth-death model used, we understand the point of the reviewer and following the suggestion we are now incorporating an additional birth-death model that accounts for variable global birth-death rate among lineages. Specifically, we are following the model proposed by Neher et al (2014), where the death rate is considered as 1 and the birth rate is modeled as 1 + fitness. In this model, the global birth-death rate varies among lineages. We are now implementing this model into the computer framework and applying it to the data used for the evaluation of the models. Preliminary results, which will be finally presented in the revised manuscript, indicate that this model yields similar predictive accuracy compared to the previous birth-death model. If this is confirmed, accounting for variability in the global birth-death rate does not appear to play a major role in the studied systems of protein evolution. We will present this additional birth-death model and its results in the revised manuscript.
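
      As a rough illustration of the additional parameterization described above (death rate fixed at 1, birth rate equal to 1 + fitness), the sketch below computes the per-lineage rates. The function name, the returned fields, and the lower bound on fitness are assumptions made for this example, not details taken from Neher et al. (2014) or from the framework.

```python
def neher_style_rates(fitness):
    """Per-lineage rates with death rate 1 and birth rate 1 + fitness.

    Hypothetical helper for illustration only.
    """
    if fitness < -1.0:
        raise ValueError("fitness below -1 would give a negative birth rate")
    death = 1.0
    birth = 1.0 + fitness
    total = birth + death  # overall event rate, now lineage-specific
    return {
        "birth": birth,
        "death": death,
        "total": total,
        "growth": birth - death,    # equals the fitness term
        "p_birth": birth / total,   # chance that the next event is a birth
    }


for f in (0.0, 0.5, 1.0):
    print(f, neher_style_rates(f))
```

      Here both the total event rate and the growth rate change with fitness, which is the sense in which the global birth-death rate varies among lineages.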

      (2) Predictive performance: Similar performance in predicting amino acid frequencies is observed under both the SCS model and the neutral model. I suspect that this rather disappointing result owes to the fact that the absolute fitness of different viral variants could not actually change during the simulations (see comment #1).

      The study shows similar performance in predicting the sequences of the forecasted proteins under both the SCS model and the neutral model, but shows differences in predicting the folding stability of the forecasted proteins between these models. Indeed, as explained in the previous answer, the birth-death model accounts for variation in fitness among lineages, leading to differences among lineages in reproductive success. The new birth-death model that we are now implementing, which incorporates variation of the global birth-death rate among lineages, is producing similar preliminary results. In addition to these considerations, it is known that SCS models applied to phylogenetics (such as ancestral molecular reconstruction) can model protein evolution with high accuracy in terms of folding stability. However, inferring sequences (i.e., ancestral sequences) is considerably more challenging even for ancestral molecular reconstruction (Arenas, et al. 2017; Arenas and Bastolla 2020). The observed sequence diversity is much greater than the observed structural diversity (Illergard, et al. 2009; Pascual-Garcia, et al. 2010), and substitutions among amino acids with similar physicochemical properties can result in protein variants with similar folding stability but different specific amino acid sequences; further work is needed in the field of substitution models of molecular evolution. We will expand the discussion of this aspect in the revised manuscript.

      (3) Model assessment: It would be interesting to know how much the predictions were informed by the structurally constrained sequence evolution model versus the birth-death model. To explore this, the authors could consider three different models: 1) neutral, 2) SCS, and 3) SCS + BD. Simulations under the SCS model could be performed by simulating molecular evolution along just one hypothetical lineage. Seeing if the SCS + BD model improves over the SCS model alone would be another way of testing whether mutations could actually impact the evolutionary dynamics of lineages in the phylogeny.

      In the present study, we compare the neutral model + birth-death (BD) with the SCS model + BD. Markov substitution models Q are applied over an evolutionary time (i.e., branch length, t), which allows us to determine the probability of substitution events during that time period [P(t) = exp (Qt)]. This approach is traditionally used in phylogenetics to model the incorporation of substitutions over time. Therefore, to compare the neutral and SCS models, an evolutionary time is required; in this case it is provided by the birth-death process. The suggestions 1) and 2) cannot be compared without an underlying evolutionary history. However, comparisons in terms of likelihood, and other aspects, between models that ignore the protein structure and the implemented SCS models are already available in our previous studies based on coalescent simulations or given phylogenetic trees (Arenas, et al. 2013; Arenas, et al. 2015). There, SCS models produced proteins with more realistic folding stability than models that ignore evolutionary constraints from the protein structure, and those findings are consistent with the results from the present study where we explore the application of these models to forecasting protein evolution. We would like to emphasize that forecasting the folding stability of future real proteins is a significant and novel finding, as folding stability is fundamental to protein function and has diverse implications. While accurately forecasting the exact sequences would indeed be ideal, this remains a challenging task with current substitution models. In this regard, we will discuss in the revised manuscript the need to develop more accurate substitution models.
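
      The relation P(t) = exp(Qt) mentioned above can be illustrated with a small, dependency-free sketch. The two-state rate matrix and its rates are toy values chosen for this example (real protein models use a 20 x 20 matrix), and the scaling-and-squaring Taylor approximation is simply one convenient way to compute the exponential, not necessarily what the framework does.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q, t, terms=30, squarings=10):
    """Approximate P(t) = exp(Q t) with a Taylor series plus scaling and squaring."""
    n = len(q)
    scale = 2 ** squarings
    a = [[q[i][j] * t / scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term now holds a^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


# Toy two-state model: rate 0.3 from state 0 to 1, rate 0.1 from state 1 to 0.
alpha, beta, t = 0.3, 0.1, 2.0
Q = [[-alpha, alpha], [beta, -beta]]
P = transition_matrix(Q, t)

# Closed-form probability of staying in state 0 for a two-state chain.
p00 = beta / (alpha + beta) + alpha / (alpha + beta) * math.exp(-(alpha + beta) * t)
print(P[0][0], p00)
```

      Each row of P(t) sums to one, and longer branch lengths t move the rows toward the stationary frequencies of the substitution model.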

      (4) Background fitness effects: The model ignores background genetic variation in fitness. I think this is particularly important as the fitness effects of mutations in any one protein may be overshadowed by the fitness effects of mutations elsewhere in the genome. The model also ignores background changes in fitness due to the environment, but I acknowledge that might be beyond the scope of the current work.

      This comment made us realize that more information about the features of the implemented SCS models should be included in the manuscript. In particular, the implemented SCS models consider a negative design based on the observed residue contacts in nearly all proteins available in the Protein Data Bank (Arenas, et al. 2013; Arenas, et al. 2015). This data is provided as an input file and it can be updated to incorporate new structures (see the framework documentation and the practical examples). Therefore, the prediction of folding stability is a combination of positive design (direct analysis of the target protein) and negative design (consideration of background proteins to reduce biases), thus incorporating background molecular diversity. This important feature was not sufficiently described in the manuscript, and we will add more details in the revised version. Regarding the fitness caused by the environment, we agree with the reviewer. This is a challenge for any method aiming to forecast evolution, as future environmental shifts are inherently unpredictable and may impact the accuracy of the predictions. Although one might attempt to incorporate such effects into the model, doing so risks overparameterization, especially when the additional factors are uncertain or speculative. We will include a discussion in the revised manuscript about our perspective on the potential effects of environmental changes on forecasting evolution.

      (5) In contrast to the model explored here, recent work on multi-type birth-death processes has considered models where lineages have type-specific birth and/or death rates and therefore also type-specific growth rates and fitness (Stadler and Bonhoeffer, 2013; Kunhert et al., 2017; Barido-Sottani, 2023). Rasmussen & Stadler (eLife, 2019) even consider a multi-type birth-death model where the fitness effects of multiple mutations in a protein or viral genome collectively determine the overall fitness of a lineage. The key difference with this work presented here is that these models allow lineages to have different growth rates and fitness, so these models truly allow for non-neutral evolutionary dynamics. It would appear the authors might need to adopt a similar approach to successfully predict protein evolution.

      We agree with the reviewer that robust birth-death models have been developed applying statistics and, in many cases, the primary aim of those studies is the development and refinement of the model itself. Regarding the study by Rasmussen and Stadler 2019, it incorporates an external evaluation of mutation events where the used fitness is specific for the proteins investigated in that study, which may pose challenges for users interested in analyzing other proteins. In contrast, our study takes a different approach. We implement a fitness function that can be predicted and evaluated for any type of protein (Goldstein 2013), making it broadly applicable. In addition, we provide a freely available and well-documented computational framework to facilitate its use. The primary aim of our study is not the development of novel or complex birth-death models. Rather, we aim to explore the integration of a standard birth-death model with structurally constrained substitution models for the purpose of predicting protein evolution. In the context of protein evolution, substitution models are a critical factor (Liberles, et al. 2012; Wilke 2012; Bordner and Mittelmann 2013; Echave, et al. 2016; Arenas, et al. 2017; Echave and Wilke 2017), and their combination with a birth-death model constitutes a first approximation upon which next studies can build to better understand this biological system. We will include these considerations in the revised manuscript.
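
      For readers unfamiliar with stability-based fitness, the sketch below shows one form often used in biophysical models of protein evolution, in which fitness is the fraction of protein molecules in the folded state. Whether the framework uses exactly this expression, temperature, or units is an assumption made here for illustration; the function name is hypothetical.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal / (mol K)


def fraction_folded(delta_g, temperature=298.15):
    """Fraction of folded protein for a folding free energy delta_g in kcal/mol.

    Assumed two-state expression: f = 1 / (1 + exp(delta_g / kT)).
    More negative delta_g means a more stable protein and a fitness closer to 1.
    """
    kt = GAS_CONSTANT * temperature
    return 1.0 / (1.0 + math.exp(delta_g / kt))


for dg in (-5.0, -1.0, 0.0, 1.0):
    print(f"dG={dg:+.1f} kcal/mol  fitness={fraction_folded(dg):.4f}")
```

      Because this function saturates for very stable proteins, variants that differ in sequence but are all comfortably stable receive nearly identical fitness, which is in line with the observation that stability is easier to forecast than the exact sequence.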

      Reviewer #2 (Public review):

      Summary:

      In this study, "Forecasting protein evolution by integrating birth-death population models with structurally constrained substitution models", David Ferreiro and co-authors present a forward-in-time evolutionary simulation framework that integrates a birth-death population model with a fitness function based on protein folding stability. By incorporating structurally constrained substitution models and estimating fitness from ΔG values using homology-modeled structures, the authors aim to capture biophysically realistic evolutionary dynamics. The approach is implemented in a new version of their open-source software, ProteinEvolver2, and is applied to four viral proteins from HIV-1 and SARS-CoV-2.

      Overall, the study presents a compelling rationale for using folding stability as a constraint in evolutionary simulations and offers a novel framework and software to explore such dynamics. While the results are promising, particularly for predicting biophysical properties, the current analysis provides only partial evidence for true evolutionary forecasting, especially at the sequence level. The work offers a meaningful conceptual advance and a useful simulation tool, and sets the stage for more extensive validation in future studies.

      We also thank this reviewer for the positive comments on our study. Regarding the predictive power, our results showed good accuracy in predicting the folding stability of the forecasted protein variants. However, predicting the specific sequences of these variants is more challenging. For example, substitutions among amino acids with similar physicochemical properties can result in different sequences but similar folding stability. We believe that these findings are realistic and interesting as they indicate that while forecasting folding stability is feasible, forecasting the specific sequence evolution is more complex than one could anticipate.

      Strengths:

      The results demonstrate that fitness constraints based on protein stability can prevent the emergence of unrealistic, destabilized variants - a limitation of traditional, neutral substitution models. In particular, the predicted folding stabilities of simulated protein variants closely match those observed in real variants, suggesting that the model captures relevant biophysical constraints.

      We agree with the reviewer and appreciate the consideration that forecasting the folding stability of future real proteins is a relevant finding. For instance, folding stability is fundamental for protein function and affects several other molecular properties.

      Weaknesses:

      The predictive scope of the method remains limited. While the model effectively preserves folding stability, its ability to forecast specific sequence content is not well supported.

      It is known that structurally constrained substitution (SCS) models applied to phylogenetics (such as ancestral molecular reconstruction) can model protein evolution with high accuracy in terms of folding stability, while inferring sequences (i.e., ancestral sequences) remains considerably more challenging (Arenas, et al. 2017; Arenas and Bastolla 2020). The observed sequence diversity is much higher than the observed structural diversity (Illergard, et al. 2009; Pascual-Garcia, et al. 2010), and substitutions between amino acids with similar physicochemical properties can result in protein variants with similar folding stability but with different specific amino acid composition. We will expand the discussion of this aspect in the manuscript.

      Only one dataset (HIV-1 MA) is evaluated for sequence-level divergence using KL divergence; this analysis is absent for the other proteins. The authors use a consensus Omicron sequence as a representative endpoint for SARS-CoV-2, which overlooks the rich longitudinal sequence data available from GISAID. The use of just one consensus from a single time point is not fully justified, given the extensive temporal and geographical sampling available. Extending the analysis to include multiple timepoints, particularly for SARS-CoV-2, would strengthen the predictive claims. Similarly, applying the model to other well-sampled viral proteins, such as those from influenza or RSV, would broaden its relevance and test its generalizability.

      The evaluation of forecasting evolution using real datasets is complex due to several conceptual and practical aspects. In contrast to traditional phylogenetic reconstruction of past evolutionary events and ancestral sequences, forecasting evolution often begins with a variant that is evolved forward in time and requires a rough fitness landscape to select among possible future variants (Lässig, et al. 2017). Another concern for validating the method is the need to know the initial variant that gives rise to the corresponding forecasted variants, and it is not always known. Thus, we investigated systems where the initial variant, or a close approximation, is known, such as scenarios of in vitro monitored evolution. In the case of SARS-CoV-2, the Wuhan variant is commonly used as the starting variant of the pandemic. Next, since forecasting evolution is highly dependent on the used model of evolution, unexpected external factors can be dramatic for the predictions. For this reason, systems with minimal external influences provide a more controlled context for evaluating forecasting evolution. For instance, scenarios of in vitro monitored virus evolution avoid some external factors such as host immune response. Another important aspect is the availability of data at two (i.e., present and future) or more time points along the evolutionary trajectory, with sufficient genetic divergence between them to identify clear evolutionary signatures. Additionally, using consensus sequences can help mitigate effects from unfixed mutations, which should not be modeled by a substitution model of evolution. Altogether, not all datasets are appropriate to properly evaluate forecasting evolution. We will include these considerations in the revised manuscript.

      Sequence comparisons based on the KL divergence require, at the studied time point, an observed distribution of amino acid frequencies among sites and an estimated distribution of amino acid frequencies among sites. In the study datasets, this is only the case for the HIV-1 MA dataset, which belongs to a previous study from one of us and collaborators where we obtained at least 20 independent sequences at each sampling point (Arenas, et al. 2016). We will provide additional information on this aspect in the manuscript.
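
      To make the data requirement concrete, the following sketch computes the KL divergence at a single alignment site from an observed and a predicted set of residues. The pseudocount, the direction of the divergence (observed relative to predicted), and the toy columns are assumptions made for this example, not details of the analysis in the manuscript.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(column, pseudocount=1.0):
    """Amino acid frequencies at one site; the pseudocount must be positive."""
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive to avoid zero frequencies")
    counts = {aa: pseudocount for aa in AMINO_ACIDS}
    for aa in column:
        if aa in counts:  # gaps and ambiguous symbols are ignored
            counts[aa] += 1.0
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}


def kl_divergence(p, q):
    """KL divergence D(p || q) in nats between two frequency dictionaries."""
    return sum(p[aa] * math.log(p[aa] / q[aa]) for aa in AMINO_ACIDS)


# Toy columns of 20 sequences each, for illustration only.
observed = site_frequencies("AAAAAAAAAG" * 2)
predicted = site_frequencies("AAAAAAAAGG" * 2)
print(kl_divergence(observed, predicted))
```

      Reliable frequency estimates need many independent sequences per time point, which is why a single consensus sequence cannot be evaluated in this way.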

      Regarding the Omicron dataset, we used 384 curated sequences of the Omicron variant of concern to construct the study dataset and we believe that it is a representative sample. The sequence used for the initial time point was the Wuhan variant (Wu, et al. 2020), which is commonly assumed to be the origin of the pandemic in SARS-CoV-2 studies. As previously indicated, the use of consensus sequences is convenient to avoid variants with unfixed mutations. Regarding extending the analysis to other timepoints (other variants of concern), we respectfully disagree because Omicron is the variant of concern with the highest genetic distance to the Wuhan variant, and a high genetic distance is required to properly evaluate the prediction method. We noted that earlier variants of concern show a small number of fixed mutations in the study proteins, despite the availability of large numbers of sequences in databases such as GISAID.

      Additionally, we investigated the evolutionary trajectories of HIV-1 protease (PR) in 12 intra-host viral populations.

      Next, following the proposal of the reviewer, we will incorporate the analysis of an additional viral dataset (probably influenza following the suggestion of the reviewer) to further assess the generalizability of the method. Still, as previously indicated, not all datasets are suitable for a proper evaluation of forecasting evolution. Factors such as the shape of the fitness landscape and the amount of genetic variation over time can influence the accuracy of predictions. We will present the results of the analysis of the new data in the revised manuscript.

      It would also be informative to include a retrospective analysis of the evolution of protein stability along known historical trajectories. This would allow the authors to assess whether folding stability is indeed preserved in real-world evolution, as assumed in their model.

      Our present study is not focused on investigating the evolution of the folding stability over time, although it provides this information indirectly at the studied time points. Instead, the present study shows that the folding stability of the forecasted protein variants is similar to the folding stability of the corresponding real protein variants for diverse viral proteins, which is an important evaluation of the method. Next, the folding stability can indeed vary over time in both real and modeled evolutionary scenarios, and our present study is not in conflict with this. In that regard, which is not the aim of our present study, some previous phylogenetic-based studies have reported temporal fluctuations in folding stability for diverse data (Arenas, et al. 2017; Olabode, et al. 2017; Arenas and Bastolla 2020; Ferreiro, et al. 2022).

      Finally, a discussion on the impact of structural templates - and whether the fixed template remains valid across divergent sequences - would be valuable. Addressing the possibility of structural remodeling or template switching during evolution would improve confidence in the model's applicability to more divergent evolutionary scenarios.

      This is an important point. For the datasets that required homology modeling (in several cases it was not necessary because the sequence was present in a protein structure of the PDB), the structural templates were selected using SWISS-MODEL, and we applied the best-fitting template. We will include additional details about the parameters of the homology modeling in the revised version. Indeed, our method assumes that the protein structure is maintained over the studied evolutionary time, which can be generally reasonable for short timescales where the structure is conserved (Illergard, et al. 2009; Pascual-Garcia, et al. 2010). Over longer evolutionary timescales, structural changes may occur, and in such cases, modeling the evolution of the protein structure would be necessary. To our knowledge, modeling the evolution of the protein structure remains a challenging task that requires substantial methodological developments. Recent advances in artificial intelligence, particularly in protein structure prediction from sequence, may offer promising tools for addressing this challenge. However, we believe that evaluating such approaches in the context of structural evolution would be difficult, especially given the limited availability of real data with known evolutionary trajectories involving structural change. In any case, this is probably an important direction for future research. We will include this discussion in the revised manuscript.

      Cited references

      Arenas M. 2012. Simulation of Molecular Data under Diverse Evolutionary Scenarios. PLoS Comput Biol 8:e1002495.

      Arenas M, Bastolla U. 2020. ProtASR2: Ancestral reconstruction of protein sequences accounting for folding stability. Methods Ecol Evol 11:248-257.

      Arenas M, Dos Santos HG, Posada D, Bastolla U. 2013. Protein evolution along phylogenetic histories under structurally constrained substitution models. Bioinformatics 29:3020-3028.

      Arenas M, Lorenzo-Redondo R, Lopez-Galindez C. 2016. Influence of mutation and recombination on HIV-1 in vitro fitness recovery. Molecular Phylogenetics and Evolution 94:264-270.

      Arenas M, Sanchez-Cobos A, Bastolla U. 2015. Maximum likelihood phylogenetic inference with selection on protein folding stability. Molecular Biology and Evolution 32:2195-2207.

      Arenas M, Weber CC, Liberles DA, Bastolla U. 2017. ProtASR: An Evolutionary Framework for Ancestral Protein Reconstruction with Selection on Folding Stability. Systematic Biology 66:1054-1064.

      Bordner AJ, Mittelmann HD. 2013. A new formulation of protein evolutionary models that account for structural constraints. Molecular Biology and Evolution 31:736-749.

      Carvajal-Rodriguez A. 2010. Simulation of genes and genomes forward in time. Current Genomics 11:58-61.

      Echave J, Spielman SJ, Wilke CO. 2016. Causes of evolutionary rate variation among protein sites. Nature Reviews Genetics 17:109-121.

      Echave J, Wilke CO. 2017. Biophysical Models of Protein Evolution: Understanding the Patterns of Evolutionary Sequence Divergence. Annu Rev Biophys 46:85-103.

      Ferreiro D, Khalil R, Gallego MJ, Osorio NS, Arenas M. 2022. The evolution of the HIV-1 protease folding stability. Virus Evol 8:veac115.

      Goldstein RA. 2013. Population Size Dependence of Fitness Effect Distribution and Substitution Rate Probed by Biophysical Model of Protein Thermostability. Genome Biol Evol 5:1584-1593.

      Harmon LJ. 2019. Introduction to birth-death models. In: Phylogenetic Comparative Methods. https://lukejharmon.github.io/pcm/chapter10_birthdeath/.

      Hoban S, Bertorelle G, Gaggiotti OE. 2012. Computer simulations: tools for population and evolutionary genetics. Nature Reviews Genetics 13:110-122.

      Illergard K, Ardell DH, Elofsson A. 2009. Structure is three to ten times more conserved than sequence--a study of structural response in protein cores. Proteins 77:499-508.

      Lässig M, Mustonen V, Walczak AM. 2017. Predicting evolution. Nature Ecology & Evolution 1:0077.

      Liberles DA, Teichmann SA, Bahar I, Bastolla U, Bloom J, Bornberg-Bauer E, Colwell LJ, de Koning AP, Dokholyan NV, Echave J, et al. 2012. The interface of protein structure, protein biophysics, and molecular evolution. Protein Science 21:769-785.

      Neher RA, Russell CA, Shraiman BI. 2014. Predicting evolution from the shape of genealogical trees. Elife 3.

      Olabode AS, Kandathil SM, Lovell SC, Robertson DL. 2017. Adaptive HIV-1 evolutionary trajectories are constrained by protein stability. Virus Evol 3:vex019.

      Pascual-Garcia A, Abia D, Mendez R, Nido GS, Bastolla U. 2010. Quantifying the evolutionary divergence of protein structures: the role of function change and function conservation. Proteins 78:181-196.

      Wilke CO. 2012. Bringing molecules back into molecular evolution. PLoS Comput Biol 8:e1002572.

      Wu F, Zhao S, Yu B, Chen YM, Wang W, Song ZG, Hu Y, Tao ZW, Tian JH, Pei YY, et al. 2020. A new coronavirus associated with human respiratory disease in China. Nature 579:265-269.

      Yang Z. 2006. Computational Molecular Evolution. Oxford, England: Oxford University Press.

    1. Author response:

      Reviewer #1 (Public review):

      (1) The broader significance of the findings needs to be better articulated. While the authors emphasize that comparing adaptive traits in sympatry and allopatry provides insights into selective processes shaping reproductive isolation and coexistence, it is unclear what key conceptual or theoretical questions are being addressed. Are these patterns expected under certain evolutionary scenarios? Have they been empirically demonstrated in other systems? The authors should explicitly state the overarching research question, incorporate some predictions, and better contextualize their findings within the existing literature. If the results challenge or support previous work, that should be highlighted to strengthen the study's importance in a broader context.

      We thank the reviewer for their valuable feedback. We understand that the framing of the results and the discussion did not sufficiently highlight the broader significance of our findings. In the revised version of the manuscript, we will explicitly state the theoretical questions and our hypotheses in the introduction, and better compare our results to pre-existing examples from the literature.

      (2) The motivation for studying visual signals and mate choice in allopatric populations (i.e., at the intraspecific level) is not well articulated, leaving their role in the broader narrative unclear. In particular, the rationale behind experiments 1, 2, and 3 is not well defined, as the authors have not made a strong case for the need for these intraspecific comparisons in the introduction. This issue is further compounded by the authors' primary focus on signal evolution in sympatry throughout both the results and the discussion. For instance, the divergence of iridescence in allopatry is a potentially interesting result. But the authors have not discussed its implications.

      Overall, given that the primary conclusions are based on results and analyses in sympatry, the role of allopatric populations in shaping these conclusions needs to be better integrated and justified.

      Without a stronger link between the comparative framework and the study's key takeaways, the use of allopatric populations feels somewhat peripheral rather than central to the study's aim.

      Since the primary conclusions remain valid even without the allopatric comparisons, their inclusion requires a clearer rationale.

      We recognize that the current manuscript places more emphasis on the sympatric Morpho population, and that the analysis and the discussion of the results regarding the allopatric Morpho population were underdeveloped. In the revised version, we plan to address this by (1) developing the rationale behind the male choice experiments performed on the allopatric population. We will argue that intraspecific comparison helps identify the traits involved in mate preference within species (iridescent color and/or wing pattern) and that those results can be compared to the interspecific mate choice results to identify the traits involved in species recognition. To explain the relevance of the comparison with the allopatric population, we will also (2) strengthen expectations on the effect of species interactions on the evolution of traits and mate recognition in sympatric populations vs. allopatric populations.

      (3) While the authors demonstrate that iridescence is indistinguishable to predators in sympatry, they overstate the role of predation in driving convergence. The present study does not experimentally demonstrate that iridescence in this species has a confusion effect or contributes to evasive mimicry. Alternatively, convergence could result from other selective forces, such as signal efficacy due to environmental conditions, rather than being solely driven by predation.

      We acknowledge that this study neither demonstrates that iridescence contributes to evasive mimicry nor that predation is the driver of the convergence in iridescence. We will tone down the interpretation of the results in the discussion and state that predation is not the only selective pressure that could have promoted a convergent evolution of iridescence in sympatric species, although this observation is consistent with the evasive mimicry hypothesis.

      Reviewer #2 (Public review):

      My only major comment concerns the authors' favoured explanation for aposematism (or evasive mimicry) for convergence among species, which is based upon the you-can't-catch-me hypothesis first presented by Young 1971. Although there is supporting work showing that iridescent-like stimuli are more difficult to precisely localize by a range of viewers, most of the evidence as applied to the Morpho system is circumstantial, and I'm not certain that there is widespread acceptance of this hypothesis. Given that the present study deals with closely-related (sub)species, one alternative explanation - a "null" hypothesis of sorts - is for a lack of divergence (from a common starting point) as opposed to evolutionary convergence per se. In other words, two subspecies are likely to retain ancestral character states unless there is selection that causes them to diverge. I feel that the manuscript would benefit from a discussion of this alternative, if not others. Signalling to predators could very well be involved in constraining the extent of convergence, but this seems a little premature to state as an up-front conclusion of this work. There is also the result of a *dorsal* wing manipulation by Vieira-Silva et al. 2024 (https://doi.org/10.1111/eth.13517), which seems difficult to reconcile in light of this explanation. Whereas this paper is cited by the authors, a more nuanced discussion of their experimental results would seem appropriate here.

      We thank the reviewer for their constructive comments on our manuscript. We appreciate the reviewer’s concern regarding the way iridescence convergence between sympatric species is discussed in our manuscript, which aligns with similar concerns raised by Reviewer 1. We will improve the discussion on the different evolutionary forces that could have favored this convergent iridescent signal in sympatry to bring more nuance to the discussion.

      Reviewer #3 (Public review):

      First, when using allopatric and sympatric (sub)species pairs to test evolutionary hypotheses, replication is important. Ideally, multiple allopatric and sympatric (sub)species pairs are compared to avoid outlier (sub)species or pairs that lead to biased conclusions. Unfortunately, the current study compares 1 allopatric and 1 sympatric (sub)species pair, hence having poor (no) replication on the level of allopatric and sympatric (sub)species pairs.

      We would like to thank the reviewer for their constructive feedback. We agree that replication is important to test evolutionary hypotheses and that our study lacks replication for allopatric and sympatric Morpho populations. Ideally, one would require several allopatric and sympatric replicates pointing respectively toward divergence and convergence of Morpho iridescence to draw conclusions about the effect of species interaction on trait evolution. Our study is a first attempt at answering this question, covering few Morpho populations but proposing a broad assessment of iridescence and mate preference for those populations. We will make sure to mention this limitation more clearly in the revised version of our manuscript.

      Second, chemical profiles were only measured for sympatric species and not for allopatric (sub)species, which limits the interpretation of this data. The allopatric (sub)species could have been measured as non-coexistence "control". If coexistence and convergence in wing colouration drive the evolution of alternative mate recognition signals, such alternative signals should not evolve/diverge for allopatric (sub)species where wing colouration is still a reliable mate recognition cue. More importantly, no details are provided on the quantification of butterfly chemical profiles, which is essential to understand such data. It is unclear how the chemical profiles were quantified and what data (concentrations, ratios, proportions) were used to perform NMDS and generate Figure 5 and the associated statistical tests.

      We recognize that having the chemical profiles of the genitalia of the Morpho from the allopatric population would have made a stronger case arguing in favor of reinforcement acting on the divergence of the chemical compounds found on the genitalia of the sympatric Morpho species. Due to limited access to the biological material needed by the time of the chromatography, we could not test for lower divergence in the chemical profiles of allopatric Morpho butterflies. We will mention this limitation in the results, and clarify the protocol used to extract the chemical profiles, by mentioning the use of concentration data to generate Figure 5 and the associated statistical tests.

      Third, throughout the discussion, the authors mention that their results support natural selection by predators on iridescent wing colouration, without measuring natural selection by predators or any other measure related to predation. It is unclear by what predators any of the butterfly species are predated on at this point.

      In the next version of the manuscript, we will mention previous predation experiments performed on Morpho and other butterflies showing evidence that birds can be predators of those species. Those observations led us to test for the putative effect of predation on the evolution of their color pattern, without directly measuring predation rates. We will make sure this information is transparent in the revised manuscript.

      To continue on the interpretation of the data related to selection on specific traits by specific selection agents: This study did not measure any form of selection or any selection agent. Hence, it is not known if iridescent wing colouration is actually under selection by predators and/or mates, if maybe other selection agents are involved or if these traits converge due to genetic correlations with other traits under selection. For example, iridescent colouration in ground beetles functions as antipredator defence but also in thermo- and water regulation. None of these issues are recognized or discussed.

      We acknowledge that the lack of discussion on alternative evolutionary forces involved in the evolution of iridescence has been highlighted by all reviewers. We will discuss how environmental factors, genetic factors or correlation with other traits might explain the convergent signal of iridescence found in sympatric Morpho species, and not only focus on the putative effect of predation.

      Finally, some of the results are weakly supported by statistics or questionable methodology. Most notably, the perception of the iridescence coloration of allopatric subspecies by bird visual systems. Although for females, means and errors (not indicated what exactly, SD, SE or CI) are clearly above the 1 JND line, for males, means are only slightly above this line and errors or CIs clearly overlap with the 1 JND line. Since there is no additional statistical support, higher means but overlap of SD, SE or CI with the baseline provides weak statistical support for differences.

      We thank the reviewer for raising interpretation issues concerning the chromatic distances of allopatric Morpho species measured with a bird vision model. We will make sure to bring nuance to the interpretation of this graph, and clearly mention in the figure’s legend that the error bars represent the confidence intervals obtained after performing a bootstrap analysis.

      Regarding the assortative mating experiment, the results are clearly driven by M. bristowi. For M. theodorus, females mate equally often with conspecifics (6 times) as with M. bristowi (5 times). For males, the ratio is slightly better (6 vs 3), but with such low numbers, I doubt this is statistically testable. Overall low mating for M. bristowi could indicate suboptimal experimental conditions, and hence results should be interpreted with care.

      Regarding the wing manipulation experiment, M. theodorus does not show a preference when dummies with non-modified wings are presented and prefers non-modified dummies over modified dummies. This is acknowledged by the authors but not further discussed. Certainly, some control treatment for wing modification could have been added.

      We recognize that the tetrad experiment results are mainly driven by M. bristowi’s behavior. This experiment would have benefited from more replicates. We will mention that the conclusions we draw for this experiment are mainly driven by male M. bristowi behavior, and that it is more difficult to test for assortative or disassortative mating in M. theodorus, adding more nuance to our interpretation. We will also make sure to discuss further the effect of wing modification in the discussion.

      Overall, the fact that certain measurements only provide evidence for 1 of the 2 (sub)species (assortative mating, wing manipulation) or one sex of one of the species (bird visual systems) means overall interpretation and overgeneralization of the results to both allopatric or sympatric species should be done with care, and such nuances should ideally be discussed.

      The aim of the authors, "to investigate the antagonistic effects of selective pressures generated by mate recognition and shared predation" has not been achieved, and the conclusions regarding this aim are not supported by the results. Nevertheless, the iridescence colour measurements are solid, and some of the behavioural experiments and chemical profile measurements seem to yield interesting results. The study would benefit from less overinterpretation of the results in the framework of predation and more careful consideration of methodological difficulties, statistical insecurities, and nuances in the results.

      Overall, we would like to thank all reviewers for their thorough assessment of our work. We understand that the imbalance between mate choice data, visual model data and chemical data only gives us a partial assessment of species recognition in Morpho butterflies, thus requiring more precision in the interpretation and the discussion of our results. We will implement all the comments made by the reviewers in the next version of our manuscript.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this work, the authors have developed SPLASH+, a micro-assembly and biological interpretation framework that expands on their previously published reference-free statistical approach (SPLASH) for sequencing data analysis.

      Thank you for this thorough overview of our work.

      Strengths:

      (1) The methodology developed by the authors seems like a promising approach to overcome many of the challenges posed by reference-based single-cell RNA-seq analysis methods.

      Thank you for your positive comment on the potential of our approach to address the limitations of reference-based methods for scRNA-Seq analysis.

      (2) The analysis of the RNU6 repetitive small nuclear RNA provides a very compelling example of a type of transcript that is very challenging to analyze with standard reference-based methods (e.g., most reads from this gene fail to align with STAR, if I understood the result correctly).

      We thank the reviewer for their positive comment. We agree that the variation in RNU6 detected by SPLASH+ underscores the potential of our reference-free method to make discoveries in cases where reference-based approaches fall short.

      Weaknesses:

      (1) The manuscript presents a number of case studies from very diverse domains of single-cell RNA-seq analysis. As a result, the manuscript has been challenging to review, because it requires domain expertise in centromere biology, RNA splicing, RNA editing, V(D)J transcript diversity, and repeat polymorphisms.

      We appreciate the reviewer’s effort in thoroughly evaluating this manuscript, especially given the broad range of biological domains discussed. Our main goal in presenting a wide range of applications was to highlight the key strength of the SPLASH+ framework: its ability to unify diverse biological discoveries within a single method that operates directly on sequencing reads.

      (2) Although the paper focuses on SmartSeq2 full-length single-cell RNA-seq data analysis, the vast majority of single-cell RNA-seq data that is currently being generated comes from droplet-based methods (e.g., 10x Genomics) that sequence only the 3' or 5' ends of transcripts. As a result, it is unclear if SPLASH+ is also applicable to these types of data.

      We thank the reviewer for this comment. Due to the specific data format of barcoded single-cell sequencing platforms such as 10x Genomics, extending the SPLASH framework to support 10x analysis required engineering a specialized preprocessing tool. We have addressed this in a recent work, which is now available as a preprint (https://doi.org/10.1101/2024.12.24.630263).

      (3) The criteria used for the selection of the 10 'core genes' have not been sufficiently justified.

      We chose these genes because SPLASH+ detected regulated splicing for them in nearly all tissues (18 out of 19) analyzed in our study (i.e., it identified anchors classified as splicing anchors in those tissues). Our subsequent analysis showed that all these genes are involved in either splicing regulation or histone modification. We will further clarify this selection criterion in the revision.

      (4) It is currently unclear how the splicing diversity discovered in this paper relates to the concept of noisy splicing (i.e., there are likely many low-frequency transcripts and splice junctions that are unlikely to have a significant functional impact beyond triggering nonsense-mediated decay).

      In our analysis, to ensure sufficient read coverage, we considered significant anchors supported by more than 50 reads and detected in over 10 cells. Additionally, our downstream analyses (including splicing analysis) are based on assembled sequences (compactors) generated through our micro-assembly step. This process effectively acts as a denoising step by filtering out sequences likely caused by sequencing errors or with very low read support. However, we agree that the detected splice variants have not been fully functionally characterized, and further functional experiments may be needed.
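
      As an illustration only, the coverage filter described above (more than 50 supporting reads, detected in more than 10 cells) could be sketched as follows. This is not the SPLASH+ implementation: the function name `filter_anchors` and the `(anchor, cell_id, read_count)` record layout are hypothetical, and we assume here that read support is summed across cells.

```python
from collections import defaultdict


def filter_anchors(records, min_reads=50, min_cells=10):
    """Keep anchors with > min_reads total reads observed in > min_cells cells.

    records: iterable of (anchor, cell_id, read_count) tuples
    (hypothetical layout, chosen for illustration only).
    """
    reads = defaultdict(int)   # total supporting reads per anchor
    cells = defaultdict(set)   # distinct cells in which each anchor is seen
    for anchor, cell_id, count in records:
        if count > 0:
            reads[anchor] += count
            cells[anchor].add(cell_id)
    # Both thresholds are strict ("more than"), as stated in the text.
    return sorted(
        a for a in reads
        if reads[a] > min_reads and len(cells[a]) > min_cells
    )
```

      With these thresholds, an anchor with 100 reads spread over exactly 10 cells would be discarded, because both conditions are strict.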

      (5) The paper presents only a very superficial discussion of the potential weaknesses of the SPLASH+ method.

      We discussed two potential limitations of SPLASH+ in the Conclusions section: (1) it is not suitable for differential gene expression analysis, and (2) although we provide a framework for interpreting and analyzing SPLASH results, further work is still needed to improve the annotation of calls lacking BLAST matches. We will expand the discussion of these limitations in the revision.

      (6) The cursory mention of metatranscriptome in the conclusion of the paper is confusing, as it might suggest the presence of microbial cells in sterile human tissues (which has recently been discredited in cancer, see e.g. https://www.science.org/content/article/journal-retracts-influential-cancer-microbiome-paper).

      We will remove the mention of metatranscriptome in the revised manuscript.

      Reviewer #2 (Public review):

      The authors extend their SPLASH framework with single-cell RNA-seq in mind, in two ways. First, they introduce "compactors", which are possible paths branching out from an anchor. Second, they introduce a workflow to classify compactors according to the type of biological sequence variation represented (splicing, SNV, etc). They focus on simulated data for fusion detection, and then focus on analyzing the Tabula sapiens Smart-seq2 data, showing extensive results on alternative splicing analysis, VDJ, and repeat elements.

      This is strong work with an impressive array of biological investigations and results for a methods paper. I have various concerns about terminology and comparisons, as follows (in a somewhat arbitrary order, apologies).

      Thank you for this thorough overview of our work and your positive comment on the strength of our work.

      (1) The discussion of the weaknesses of the consensus sequence approach of SPLASH is an odd way to motivate SPLASH+ in my opinion, in that SPLASH is not yet so widely used, so the baseline for SPLASH+ is really standard alignment-based approaches. It is fine to mention consensus sequence issues briefly, but it felt belabored.

      We thank the reviewer and agree that the primary comparison for SPLASH+ is with reference-based methods. However, since SPLASH+ builds upon SPLASH, we also aimed to highlight the limitations of the consensus step in the original SPLASH and how SPLASH+ addresses them. To maintain the main focus of the paper on comparison with reference-based methods and biological investigations, the comparison with the consensus approach was provided in a Supplementary Figure. We will shorten this discussion in the revision.

      (2) Regarding compactors reducing alignment cost: the comparison should really be between compactor construction and alignment vs read alignment (and maybe vs modern contig construction algorithms and alignment).

      Since the SPLASH framework is fundamentally reference-free and does not require read alignment, we compared the number of sequence alignments for compactors to the total read alignments required by a reference-based method to show that while compactors are aligned to the reference, the number of alignments needed is still orders of magnitude less than a reference-based approach requiring alignment of all the reads.

      (3) The language around "compactors" is a bit confusing, where the authors sometimes refer to the tree of possibilities from an anchor as a "compactor", and sometimes a compactor is a single branch. Presumably, ideally, compactors should be DAGs, not trees, i.e., they can connect back together. Perhaps the authors could comment on whether this matters/would be a valuable extension.

      We thank the reviewer for their comment. We refer to each generated assembled sequence as “a compactor”, and we attempted to make this clear in the paper. We will review the text further to ensure this definition is clear in the revised version.

      (4) The main oddness of the splicing analysis to me is not using cell-type/state in any way in the statistical testing. This need not be discrete cell types: psiX, for example, tested whether exonic PSI was variable with reference to a continuous gene expression embedding. Intuitively, such transcriptome-wide signal should be valuable for a) improving power and b) distinguishing cell-type intrinsic/"noisy" from cell-type specific splicing variation. A straightforward way of doing this would be pseudobulking cell types. Possibly a more sophisticated hierarchical model could be constructed also.

      We appreciate the reviewer’s concern regarding SPLASH+ not using cell type metadata. SPLASH, which performs the core statistical inference in SPLASH+, is an unsupervised tool specifically designed to make biological discoveries without relying on metadata (such as cell type annotations in scRNA-Seq). This is particularly useful in scRNA-seq, where cell type labels could be missing, imprecise, or may miss important within-cell-type variation. As shown in the paper, even without using metadata, SPLASH+ performed better than both SpliZ and Leafcutter (two metadata-dependent tools) in terms of achieving higher concordance and identifying more differentially spliced genes. Regarding pseudobulking, as has been shown in the SpliZ paper (https://doi.org/10.1038/s41592-022-01400-x), pseudobulking requires multiple pseudobulked replicates per cell type for reliable inference, which is often not feasible in scRNA-seq settings, making such methods statistically suboptimal for single-cell studies. We will add a discussion on pseudobulking in the revision.

      (5) A secondary weakness is that some informative reads will not be used, for example, unspliced reads aligning to alternative exons. This relates to the broader weakness of SPLASH that it is blind to changes in coverage that are not linked to a specific anchor (which should be acknowledged somewhere, maybe in the Discussion). In the deeply sequenced SS2 data, this is likely not an issue, but might be more limiting in sparser data. A related issue is that coverage change indicative of, e.g., alternative TSS or TES (that do not also include a change in splice junction use) will not be detected. In fairness, all these weaknesses are shared by LeafCutter. It would be valuable to have a comparison to a more "traditional" splicing analysis approach (pick your favorite of rMATS, MISO, SUPPA).

      We thank the reviewer for their comment. As noted in the Conclusion, the SPLASH framework is not designed for differential gene expression analysis, which relies on quantifying read coverage. Rather, it focuses on detecting differential sequence diversity arising from mechanisms like alternative splicing or RNA editing. We will clarify this limitation further in the revised Conclusion. 

      Regarding splicing evaluation, we have performed extensive comparisons with two widely used and recent methods—SpliZ and Leafcutter—for both bulk and single-cell splicing analysis. While we appreciate the reviewer’s suggestion to include an additional method, given the current length of the paper and the fact that Leafcutter has previously been shown to outperform rMATS, MAJIQ, and Cufflinks2 (https://www.nature.com/articles/s41588-017-0004-9), we believe the current comparisons provide sufficient support for the evaluation of the splicing detection by SPLASH+.

      (6) "We should note that there is no difference between gene fusions and other RNA variants (e.g., RNA splicing) from a sequence assembly viewpoint". Maybe this is true in an abstract sense, but I don't think it is in reality. AS can produce hundreds of isoforms from the same gene, and be variable across individual cells. Gene fusions are generally less numerous/varied and will be shared across clonal populations, so the complexity is lower. That simplicity is balanced against the challenge that any genes could, in principle, fuse.

      We selected the fusion benchmarking dataset solely to evaluate how well compactors reconstruct sequences. Since our goal was to assess the accuracy of reconstructed compactor sequences, we needed a benchmarking dataset with ground truth sequences, which this dataset provides. We explained our main reason and purpose for selecting the fusion dataset in the text, but we will clarify it further in the revision.

      (7) For the fusion detection assessment, SPLASH+ is given the correct anchor for detection. This feels like cheating since this information wouldn't usually be available. Can the authors motivate this? Are the other methods given comparable information? Also, TPM>100 seems like a very high expression threshold for the assessment.

      We agree with the reviewer that the fusion benchmarking dataset should not be used to assess the entire SPLASH+ framework. In fact, we did not use this dataset to evaluate SPLASH+; it was used exclusively to evaluate the performance of compactors as a standalone module. Specifically, we tested how well compactors can reconstruct fusion sequences when provided with seed sequences corresponding to fusion junctions. This aligns with our expectation from compactors in SPLASH+, that they should correctly reconstruct the sequence context for the detected anchors. As noted in our previous response, since our goal was to assess the accuracy of reconstructed compactor sequences, we required a benchmarking dataset with ground truth sequences, which this dataset provides. We will clarify this further in the revision.

      We appreciate the reviewer’s concern that a TPM of 100 is high. In Figure 1C, we presented the full TPM distribution for fusions missed or detected by compactors. The 100 threshold was an arbitrary benchmark to illustrate the clear difference in TPM profiles between these two sets of fusions. We will clarify this point in the revised manuscript.

      (8) Why are only 3'UTRs considered and not 5'? Is this because the analysis is asymmetric, i.e., only considering upstream anchors and downstream variation? If so, that seems like a limitation: how much additional variation would you find if including the other direction?

      We thank the reviewer for their comment. SPLASH+ can, in principle, detect variation in 5’ UTR regions, as demonstrated by the variations observed in the 5’ UTRs of the genes ANPC16 and ARPC2. If sequence variation exists in the 5′ UTR, SPLASH+ can still detect it by identifying an anchor upstream of the variable region, as it directly parses sequencing reads to find anchors with downstream sequence diversity. Even when the variation occurs near the 5′ end of the 5′ UTR, SPLASH+ can still capture this diversity if the user selects a shorter anchor length.

      (9) I don't find the theoretical results very meaningful. Assuming independent reads (equivalently binomial counts) has been repeatedly shown to be a poor assumption in sequencing data, likely due to various biases, including PCR. This has motivated the use of overdispersed distributions such as the negative Binomial and beta binomial. The theory would be valuable if it could say something at a specified level of overdispersion. If not, the caveat of assuming no overdispersion should be clearly stated.

      We appreciate the reviewer’s comment. We will clarify this in the revised paper.
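
      To make the overdispersion caveat raised in point (9) concrete, the short sketch below compares the variance of a binomial count with that of a beta-binomial count having the same mean. It is an illustration of the standard variance formulas only, not part of the SPLASH+ theory; the parameterization by an intra-class correlation `rho` and the function names are our own choices for this example.

```python
def binomial_variance(n, p):
    """Variance of a Binomial(n, p) count (independent reads)."""
    return n * p * (1 - p)


def beta_binomial_variance(n, p, rho):
    """Variance of a beta-binomial count with mean n*p.

    rho in [0, 1] is the intra-class correlation between reads;
    rho = 0 recovers the binomial case, and larger rho inflates the
    variance by the factor 1 + (n - 1) * rho.
    """
    return n * p * (1 - p) * (1 + (n - 1) * rho)
```

      Because the inflation factor grows with the number of reads n, even a small correlation between reads can substantially widen the null distribution at high coverage, which is why results derived under independence should state that assumption explicitly.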

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript entitled "Phosphodiesterase 1A Physically Interacts with YTHDF2 and Reinforces the Progression of Non-Small Cell Lung Cancer" explores the role of PDE1A in promoting NSCLC progression by binding to the m6A reader YTHDF2 and regulating the mRNA stability of several novel target genes, consequently activating the STAT3 pathway and leading to metastasis and drug resistance.

      Strengths:

      The study addresses a novel mechanism involving PDE1A and YTHDF2 interaction in NSCLC, contributing to our understanding of cancer progression.

      Reviewer #2 (Public review):

      Summary

      This revised manuscript investigates the role and the mechanism by which PDE1 impacts NSCLC progression. The authors provide evidence to demonstrate that PDE1 binds to the m6A reader YTHDF2, in turn regulating the STAT3 signaling pathway through its interaction, promoting metastasis and angiogenesis.

      Strength:

      The study uncovers a novel PDE1A/YTHDF2/SOCS2/STAT3 pathway in NSCLC progression and the findings provide a potential treatment strategy for NSCLC patients with metastasis.

      Weakness:

      In the discussion, it is stated in the revised version that "the role of YTHDF2 in PDE1A-driven tumor metastasis should be elucidated in future studies". However, given that the physical interaction of PDE1A and YTHDF2 plays a critical role in PDE1A-mediated NSCLC metastasis, showing whether YTHDF2 mimics the effect of PDE1A in metastasis would strengthen the manuscript.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) In Figure 1A, the y-axis should be "IOD/Area" instead of "IDO/Area".

      Figure 1A was revised as suggested.

      (2) Figure 3A legend for (F) and (G) was switched.

      Figure 3A legend was revised as suggested “(F-G) The mRNA (F) and protein (G) levels of indicated genes were determined in P3 and P0 NSCLC cells.”.

      (3) The statistical analysis should be performed for Figure 3H.

      Figure 3H was revised as suggested.

      (4) Figure 4F, Y-axis has a typo for "vessels" and statistical analysis should be performed on this data.

      Figure 4F was revised as suggested.

      (5) Figure 6 E, typo for "migrated" on the y-axis.

      Figure 6E was revised as suggested.

      (6) Figure 7 C, typos for "expression" on y-axis in both figures need to be fixed.

      Figure 7C was revised as suggested.

      (7) P-values for Figure 7B need to be stated.

      Figure 7B was revised as suggested.

      (8) m6A should be consistent throughout the manuscript.

      m6A is now consistent throughout the manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      IKK is the key signaling node for inflammatory signaling. Despite the availability of molecular structures, how the kinase achieves its specificity remains unclear. This paper describes a dynamic sequence of events in which autophosphorylation of a tyrosine near the active site facilitates phosphorylation of the serine on the substrate via a phospho-transfer reaction. The proposed mechanism is conceptually novel in several ways, suggesting that the kinase is dual specificity (tyrosine and serine) and that it mediates a phospho-transfer reaction. While bacteria contain phosphorylation-transfer enzymes, this is unheard of for mammalian kinases. However, what the functional significance of this enzymatic activity might be remains unaddressed.

      The revised manuscript adequately addresses all the points I suggested in the review of the first submission.

      Response: The authors thank the reviewer for their valuable comments and constructive criticisms for the betterment of the manuscript. We also thank them for appreciating our work. We agree with the reviewer that the functional significance of this particular enzymatic activity of IKK2 is yet to be fully realized.

      Reviewer #2 (Public review):

      The authors investigate the phosphotransfer capacity of Ser/Thr kinase IkB kinase (IKK), a mediator of cellular inflammation signaling. Canonically, IKK activity is promoted by activation loop phosphorylation at Ser177/Ser181. Active IKK can then unleash NF-kB signaling by phosphorylating repressor IkBα at residues Ser32/Ser36. Noting the reports of other IKK phosphorylation sites, the authors explore the extent of autophosphorylation.

      Semi-phosphorylated IKK purified from Sf9 cells exhibits the capacity for further autophosphorylation. Anti-phosphotyrosine immunoblotting indicated unexpected tyrosine phosphorylation. Contaminating kinase activity was tested by generating a kinase-dead K44M variant, supporting the notion that the unexpected phosphorylation was IKK-dependent. In addition, the observed phosphotyrosine signal required phosphorylated IKK activation loop serines.

      Two candidate IKK tyrosines were examined as the source of the phosphotyrosine immunoblotting signal. Activation loop residues Tyr169 and Tyr188 were each rendered non-phosphorylatable by mutation to Phe. The Tyr variants decreased both autophosphorylation and phosphotransfer to IkBα. Likewise, Y169F and Y188F IKK2 variants immunoprecipitated from TNFa-stimulated cells also exhibited reduced activity in vitro.

      The authors further focus on Tyr169 phosphorylation, proposing a role as a phospho-sink capable of phosphotransfer to IkBα substrate. This model is reminiscent of the bacterial two-component signaling phosphotransfer from phosphohistidine to aspartate. Efforts are made to phosphorylate IKK2 and remove ATP to assess the capacity for phosphotransfer. Phosphorylation of IkBα is observed after ATP removal, although there are ambiguous requirements for ADP.

      Strengths:

      Ultimately, the authors draw together the lines of evidence for IKK2 phosphotyrosine and ATP-independent phosphotransfer to develop a novel model for IKK2-mediated phosphorylation of IkBα. The model suggests that IKK activation loop Ser phosphorylation primes the kinase for tyrosine autophosphorylation. With the assumption that IKK retains the bound ADP, the phosphotyrosine is conformationally available to relay the phosphate to IkBα substrate. The authors are clearly aware of the high burden of evidence required for this unusual proposed mechanism. Indeed, many possible artifacts (e.g., contaminating kinases or ATP) are anticipated and control experiments are included to address many of these concerns. The analysis hinges on the fidelity of pan-specific phosphotyrosine antibodies, and the authors have probed with two different anti-phosphotyrosine antibody clones. Taken together, the observations are thought-provoking, and I look forward to seeing this model tested in a cellular system.

      Weaknesses:

      Multiple phosphorylated tyrosines in IKK2 were apparently identified by mass spectrometric analyses. LC-MS/MS spectra are presented, but fragments supporting phospho-Y188 and Y325 are difficult to distinguish from noise. It is common to find non-physiological post-translational modifications in over-expressed proteins from recombinant sources. Are these IKK2 phosphotyrosines evident by MS in IKK2 immunoprecipitated from TNFa-stimulated cells? Identifying IKK2 phosphotyrosine sites from cells would be especially helpful in supporting the proposed model.

      The authors thank the reviewer for their detailed comments and constructive criticisms, which helped enrich the manuscript. We also thank them for pointing out the critical points in the model. We agree with the reviewer that testing this model in a cellular system is required to bolster this concept. However, an appropriate cellular assay system to investigate and monitor this mode of phosphotransfer is still elusive. We agree with the reviewer’s concerns on the identification of Y188 and Y325 as potential phosphosites. They have been omitted in the current version and relevant changes have been incorporated. IKK2’s tyrosine phosphorylation status in cells has been reported previously. Although we have not analyzed IKK2 from TNFα-treated cells in this study, a different study of the phospho-status of cellular IKK2 indicated tyrosine phosphorylation (Meyer et al 2013).

      Reviewer #3 (Public review):

      Summary:

      The authors investigate the kinase activity of IKK2, a crucial regulator of inflammatory cell signaling. They describe a novel tyrosine kinase activity of this well-studied enzyme and a highly unusual phosphotransfer from phosphorylated IKK2 onto substrate proteins in the absence of ATP as a substrate.

      Strengths:

      The authors provide an extensive biochemical characterization of the processes with recombinant protein, western blot, autoradiography, and protein engineering, and now provide MS data.

      Weaknesses:

      The identity and purity of the proteins used have improved in the revised work. Since the findings are so unexpected and potentially of wide-reaching interest, this is important. Similarly, specific detection of phospho-Ser/Thr vs phospho-Tyr relies largely on antibodies, which can have varying degrees of specificity. Using multiple antibodies and MS improves the quality of the data.

      Authors thank the reviewer for their crisp comments and constructive criticisms that helped improve the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Generally, the paper is well written, but the first 4 figures are slow going and could be condensed to show the key points, so that the reader gets to Figures 6 and 7, which contain the "meat" of the paper.

      Specific points:

      Several figures should be quantified and experimental reproducibility is not always clear.

      I understand that Figure 3 shows that K44M abolishes both S32/36 phosphorylation and tyrosine phosphorylation, but not PEST region phosphorylation. This suggests that autophosphorylation is reflective of its known specific biological role in signal transduction. But I do not understand why "these results strongly suggest that IKK2-autophosphorylation is critical for its substrate specificity". That statement would be supported by a mutant that no longer autophosphorylates, and as a result shows a loss of substrate specificity, i.e. phosphorylates non-specific residues more strongly. Is that the case? Maybe Darwech et al 2010 or Meyer et al 2013 showed this? Later figures seem to address this point, so maybe this conclusion should be stated later in the paper.

      Page 10: DFG+1 is mentioned without proper introduction. Does the Chen et al 2014 paper inform the authors' interest in Y169 phosphorylation, or is it just an additional interesting finding? Does this publication belong in the Introduction or the Discussion?

      To understand the significance of Figure 4D, we need a WT IKK2 control, or is there prior literature to cite?

      This is relevant for the conclusion that Y169 phosphorylation is particularly important for S32 phosphorylation.

      The cold ATP quenching experiment is nice for testing the model that Y169 functions as a phospho sink that allows for a transfer reaction. However, there is only a single timepoint and condition, which does not allow for a quantitative analysis. Furthermore, a positive control would make this experiment more compelling, and the Y169F mutant should show that cold ATP quenching reduces the phosphorylation of IkBα.

      Note after revision: I thank the authors for addressing these points. The manuscript is thereby improved.

      We thank the reviewer for appreciating our efforts in addressing their concerns.

      Reviewer #2 (Recommendations for the authors):

      In the revisions, the authors provide LC-MS/MS spectra for putative phospho-Y325 and phospho-Y188. The details are hard to see at the scale provided, but the fragment ions for pY188 and pY325 peptides are unconvincing. Phospho-Y169, on the other hand, is much more credible. In addition, the revision rebuttal clarifies that Y188 would be packed into a catalytically important core, and Y188F is likely to disrupt the fold. Taken together, it seems doubtful that Y188 is subject to any significant autophosphorylation, and presenting the Y188F data (and discussion) seems like a distraction.

      We agree with the reviewer’s concerns on the identification of Y188 and Y325 as potential phosphosites. They have been omitted in the current version and relevant sections in the manuscript text and figures have been edited.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for the careful review of our manuscript. Overall, they were positive about our use of cutting-edge methods to identify six inversions segregating in Lake Malawi. Their distribution in ~100 Lake Malawi species demonstrated that they were differentially segregating in different ecogroups/habitats and could potentially play a role in local adaptation, speciation, and sex determination. Reviewers were positive about our finding that the chromosome 10 inversion was associated with sex determination in a deep benthic species and its potential role in regulating traits under sexual selection. They agree that this work is an important starting point in understanding the role of these inversions in the amazing phenotypic diversity found in the Lake Malawi cichlid flock.

      There were two main criticisms that were made which we summarize:

      (1) Lack of clarity. It was noted that the writing could be improved to make many technical points clearer. Additionally, certain discussion topics were not included that should be.

      We will rewrite the text and add additional figures and tables to address the issues that were brought up in a point-by-point response. We will improve or include (1) the nomenclature used to describe the inversions in different lineages, (2) descriptions of the various genomic approaches, (3) a figure documenting the samples and technologies used for each ecogroup, and (4) integration of LR sequences to identify inversion breakpoints at the finest resolution possible.

      (2) We overstate the role that selection plays in the spread of these inversions and neglect other evolutionary processes that could be responsible for their spread.

      We agree with the overarching point. We did not show that selection is involved in the spread of these inversions, and other forces could be at play. Additionally, there were concerns with our model that the inversions introgressed from a Diplotaxodon ancestor into benthic ancestors, because incomplete lineage sorting or balancing selection (via sex determination) could be at play instead. Overall, we agree with the reviewers, with the following caveats. 1. Our analysis of the genetic distance between Diplotaxodons and benthic species in the inverted regions is more consistent with their spread through introgression than through incomplete lineage sorting or balancing selection. 2. Further, the role of these inversions is likely different in different species. For example, the inversions on 10 and 11 play a role in sex determination in some species but not others, and the potential pressures acting on the inverted and non-inverted haplotypes will be very different. These are very interesting and important questions, both for understanding the adaptive radiations in Lake Malawi and in general, and we are actively studying crosses to understand the role of these inversions in phenotypic variation between two species. We will modify the text to make all of these points clearer.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using high-quality genomic data (long-reads, optical maps, short-reads) and advanced bioinformatic analysis, the authors aimed to document chromosomal rearrangements across a recent radiation (Lake Malawi Cichlids). Working on 11 species, they achieved a high-resolution inversion detection and then investigated how inversions are distributed within populations (using a complementary dataset of short-reads), associated with sex, and shared or fixed among lineages. The history and ancestry of the inversions is also explored.

      On one hand, I am very enthusiastic about the global finding (many inversions well-characterized in a highly diverse group!) and impressed by the amount of work put into this study. On the other hand, I have struggled so much to read the manuscript that I am unsure about how much the data supports some claims. I'm afraid most readers may feel the same and really need a deep reorganisation of the text, figures, and tables. I reckon this is difficult given the complexity brought by different inversions/different species/different datasets but it is highly needed to make this study accessible.

      The methods of comparing optical maps, and looking at inversions at macro-evolutionary scales can be useful for the community. For cichlids, it is a first assessment that will allow further tests about the role of inversions in speciation and ecological specialisation. However, the current version of the manuscript is hardly accessible to non-specialists and the methods are not fully reproducible.

      Strengths:

      (1) Evidence for the presence of inversion is well-supported by optical mapping (very nice analysis and figure!).

      (2) The link between sex determination and inversion in chr 10 in one species is very clearly demonstrated by the proportion in each sex and additional crosses. This section is also the easiest to read in the manuscript and I recommend trying to rewrite other result sections in the same way.

      (3) A new high-quality reference genome is provided for Metriaclima zebra (and possibly other assemblies? - unclear).

      (4) The sample size is great (31 individuals with optical maps if I understand well?).

      (5) Ancestry at those inversions is explored with outgroups.

      (6) Polymorphism for all inversions is quantified using a complementary dataset.

      Weaknesses:

      (1) Lack of clarity in the paper: As it currently reads, it is very hard to follow the different species, ecotypes, samples, inversions, etc. It would be useful to provide a phylogeny explicitly positioning the samples used for assembly and the habitat preference. Then the text would benefit from being organised either by variant or by subgroups rather than by successive steps of analysis.

      We have extensively rewritten the paper to improve the clarity. With respect to this point, we moved Figure 6 to Figure 1, which places the phylogeny of Lake Malawi cichlids at the beginning of the paper. We incorporated information about samples/technologies by ecogroup into this figure to help the reader gain an overview of the technologies involved. We added information about habitat for each ecogroup as well. While we considered a change to the text organization suggested here, we thought it was clearer to keep the original headings.

      (2) Lack of information for reproducibility: I couldn't find clearly the filters and parameters used for the different genomic analyses for example. This is just one example and I think the methods need to be re-worked to be reproducible. Including the codes inside the methods makes it hard to follow, so why not put the scripts in an indexed repository?

      We now provide a link to a GitHub repository (https://github.com/ptmcgrat/CichlidSRSequencing/tree/Kumar_eLife) containing the scripts used for the major analyses in the paper. Because our data is behind a secure Dropbox account, readers will not be able to run the analysis. However, they can see the exact programs, filters, and parameters used for the manuscript, which are embedded within each script.

      (3) Further confirmation of inversions and their breakpoints would be valuable. I don't understand why the long-reads (that were available and used for genome assembly) were not also used for SV detection and breakpoint refinement.

      We did use long reads to confirm the presence of the inversions by creating five new genome assemblies from the PacBio HiFi reads: two additional Metriaclima zebra samples and three Aulonocara samples. Alignment of these five genomes to the MZ_GT3 reference is shown in Figures S2 – S7. These genome assemblies were also used to identify the breakpoints of the inversions. However, because of the extensive amount of repetitive DNA at the breakpoints (which is known to be important for the formation of large inversions), our ability to resolve the breakpoints was limited.

      (4) Lack of statistical testing for the hypothesis of introgression: Although cichlids are known for high levels of hybridization, inversions can also remain balanced for a long time. What could allow us to differentiate introgression from incomplete lineage sorting?

      The coalescent time of the inverted regions between Diplotaxodons and benthics should allow us to distinguish these two mechanisms. Our finding that the genetic distance, which is related to coalescent time, is smaller within the inversions than across the whole genome is supportive of introgression. However, we did not perform any simulations or statistical tests. We make it clearer in the text that incomplete lineage sorting remains a possible mechanism for the distribution of inversions within these ecogroups.
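
      The comparison described in this response can be illustrated with a toy calculation. The sketch below is not the authors' pipeline: the function names, the two 40 bp haplotypes, and the region coordinates are all invented for illustration. It computes a simple per-site distance inside a hypothetical inverted region and in the flanking sequence, so the two values can be compared.

```python
# Toy illustration only: sequences, coordinates, and names are hypothetical.

def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a in "-N" or b in "-N":
            continue  # skip gaps and uncalled bases
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def inside_vs_outside(seq_a: str, seq_b: str, start: int, end: int):
    """Return (distance within [start, end), distance in the flanking sequence)."""
    inside = p_distance(seq_a[start:end], seq_b[start:end])
    outside = p_distance(seq_a[:start] + seq_a[end:], seq_b[:start] + seq_b[end:])
    return inside, outside


# Made-up haplotypes; positions 10-29 stand in for an inverted region.
diplotaxodon = "ACGTACGTAC" + "GGTTCAGGTTCAGGTTCAGG" + "ACGTACGTAC"
benthic = "ACGAACGTTC" + "GGTTCAGGTTCTGGTTCAGG" + "ACCTACGTAA"

inside, outside = inside_vs_outside(diplotaxodon, benthic, 10, 30)
print(f"inside: {inside:.2f}, outside: {outside:.2f}")
```

      A smaller distance inside the region than outside it is the pattern the response describes as supportive of introgression. On its own it is not a formal test, which matches the authors' statement that no simulations or statistical tests were performed.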

      (5) The sample size is unclear: possibly 31 for Bionano, 297 for short-reads, how many for long-reads or assemblies? How is this sample size split across species? This would deserve a table.

      We have included this information in the new Figure 1.

      (6) Short read combines several datasets but batch effect is not tested.

      We do not test for batch effects. However, we do note that all of the datasets were analyzed by the same pipeline starting from alignment, so batch effects would be restricted to aspects of the reads themselves. Additionally, samples from the different datasets clustered as expected by lineage and inferred inversion, so batch effects are unlikely to have affected the analysis for these purposes.

      (7) It is unclear how ancestry is determined because the synteny with outgroups is not shown.

      Ancestry was determined using the genome alignments of two outgroups from outside of Lake Malawi. This is shown in Figure S8.

      (8) The level of polymorphism for the different inversions is difficult to interpret because it is unclear whether replicates are different species within an eco-group or different individuals from the same species. How could it be that homozygous references are so spread across the PCA? I guess the species-specific polymorphism is stronger than the ancestral order but in such a case, wouldn't it be worth re-doing the PCA on a subset?

      The genomic PCA plots reflect the evolutionary histories that are observed in the whole genome phylogenies. Because the distribution of the inverted alleles violates the species tree, they form separate clusters on the PCA plots that can be used to genotype specific species. We have also performed this analysis on benthics (utaka/shallow benthics/deep benthics) and the distribution matches the expectation.

      Reviewer #2 (Public review):

      Summary:

      Chromosomal inversions have been predicted to play a role in adaptive evolution and speciation because of their ability to "lock" together adaptive alleles in genomic regions of low recombination. In this study, the authors use a combination of cutting-edge genomic methods, including BioNano and PacBio HiFi sequencing, to identify six large chromosomal inversions segregating in over 100 species of Lake Malawi cichlids, a classic example of adaptive radiation and rapid speciation. By examining the frequencies of these inversions present in species from six different lineages, the authors show that there is an association between the presence of specific inversions with specific lineages/habitats. Using a combination of phylogenetic analyses and sequencing data, they demonstrate that three of the inversions have been introduced to one lineage via hybridization. Finally, genotyping of wild individuals as well as laboratory crosses suggests that three inversions are associated with XY sex determination systems in a subset of species. The data add to a growing number of systems in which inversions have been associated with adaptation to divergent environments. However, like most of the other recent studies in the field, this study does not go beyond describing the presence of the inversions to demonstrate that the inversions are under sexual or natural selection or that they contribute to adaptation or speciation in this system.

      Strengths:

      All analyses are very well done, and the conclusions about the presence of the six inversions in Lake Malawi cichlids, the frequencies of the inversions in different species, and the presence of three inversions in the benthic lineages due to hybridization are well-supported. Genotyping of 48 individuals resulting from laboratory crosses provides strong support that the chromosome 10 inversion is associated with a sex-determination locus.

      Weaknesses:

      The evidence supporting a role for the chromosome 11 inversion and the chromosome 9 inversion in sex determination is based on relatively few individuals and therefore remains suggestive. The authors are mostly cautious in their interpretations of the data. However, there are a few places where they state that the inversions are favored by selection, but they provide no evidence that this is the case and there is no consideration of alternative hypotheses (i.e. that the inversions might have been fixed via drift).

      We have removed mention of chromosome 9’s potential role in sex determination from the paper. While our analysis of sex association with chromosome 11 was limited compared to our analysis of chromosome 10, it was still statistically significant, and we believe it should be left in the paper. The role of 11 (and 9 and 10) in sex determination was also demonstrated using an independent dataset by Blumer et al (https://doi.org/10.1101/2024.07.28.605452).
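
      The response does not say which statistical test underlies the sex association. As a hedged illustration of how such an association can be assessed when few individuals are available, the sketch below implements a two-sided Fisher's exact test on a 2x2 table of sex against inversion genotype. The function name and the counts are invented and are not data from the study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Hypothetical counts: rows are males and females,
# columns are inversion heterozygotes and homozygotes.
p_value = fisher_exact_two_sided(9, 1, 1, 9)
print(f"p = {p_value:.4g}")
```

      With small samples an exact test of this kind avoids the large-sample approximation behind a chi-squared test, which is one reason it is a common choice for genotype-by-sex tables.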

      We agree that we did not properly consider alternative hypotheses in the original submission and have rewritten the Discussion substantially to consider various alternative hypotheses.

      Reviewer #3 (Public review):

      This is a very interesting paper bringing truly fascinating insight into the genomic processes underlying the famous adaptive radiation seen in cichlid fishes from Lake Malawi. The authors use structural and sequence information from species belonging to distinct ecotypic categories, representing subclades of the radiation, to document structural variation across the evolutionary tree, infer introgression of inversions among branches of the clade, and even suggest that certain rearrangements constitute new sex-determining loci. The insight is intriguing and is likely to make a substantial contribution to the field and to seed new hypotheses about the ecological processes and adaptive traits involved in this radiation.

      I think the paper could be clarified in its prose, and that the discussion could be more informative regarding the putative roles of the inversions in adaptation to each ecotypic niche. Identifying key, large inversions shared in various ways across the different taxa is really a great step forward. However, the population genomics analysis requires further work to describe and decipher in a more systematic way the evolutionary forces at play and their consequences on the various inversions identified.

      The model of evolution involving multiple inversions putatively linking together co-adapted "cassettes" could be better spelled out since it is not entirely clear how the existing theory on the recruitment of inversions in local adaptation (e.g. Kirkpatrick and Barton) operates on multiple unlinked inversions. How such loci correspond to distinct suites of integrated traits, or not, is not very easy to envision in the current state of the manuscript.

      This is a very interesting point, and we agree that it creates complications for a simple model of local adaptation. We imagine, though, that the actual evolutionary history was much more complicated than a single Rhamphochromis-type species separating from a single Diplotaxodon-type species and could have occurred sequentially, involving multiple species that are now extinct. A better understanding of the role each of these inversions plays in phenotypic diversity could potentially help us determine if different inversions carry variation that could be linked to distinct habitat differences. We have added a line to the discussion.

      The role of one inversion in sex determination is apparent and truly intriguing. However, the implication of such locus on ecological adaptation is somewhat puzzling. Also, whether sex determination loci can flow across species via introgression seems quite important as a route to chromosomal sex determination, so this could be discussed further.

      Another very interesting point. If the inversions are involved in ecological adaptation (an important caveat), then potentially the inverted and non-inverted haplotypes play dual roles in the Aulonocara animals with the inverted haplotype carrying adaptive alleles to deep water and the non-inverted haplotype carrying alleles resolving sexual conflict. We have broadened our discussion about their function at the origin including non-adaptive roles.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Overall, the paper is well-written and clear. I do have a few suggestions for changes that would help the reader:

      (1) Figure 1: the figure legend could be expanded here to help the reader; what are the blue and yellow lines? Why are there two lines for the GT3a assembly? And, I had to somehow read the legend a few times to understand that the top line is the UMD2a reference assembly, and the next line is the new Bionano map.

      Fixed in what is now Figure 2.

      (2) Paragraph starting on line 133: you use the word "test" to refer to the Bionano analyses; it is not clear whether anything is being tested. Perhaps "analyse the maps" or just "map" would be more clear? Or more explanation?

      The text has been modified to address this point.

      (3) L145-146: perhaps change "a single inversion" and "a double inversion" to "single inversions" and "double inversions".

      The text has been modified to address this point.

      (4) L157: suppression of recombination in inversion heterozygotes is "textbook" material and perhaps does not need a reference. Or, you could reference an empirical paper that demonstrates this point. Though I love the Kirkpatrick and Barton paper, it certainly is not the correct reference for this point.

      The Kirkpatrick reference was incorrectly included here. The correct reference is an empirical demonstration (Conte) that regions of suppressed recombination have been observed at the location of the inversions. We have also moved this reference further up in the sentence to a more appropriate position.

      (5) L173: how do you know this is an assembly error and not polymorphism?

      The text has been modified to address this point.

      (6) L277(?): "currently growing in the lab" is probably unnecessary.

      The text has been modified to address this point.

      (7) L298: "the inversion on 10 acts as an XY sex determiner": the inversion itself is not the sex determination gene; rather, it is linked. I think it would be more precise, here and throughout the paper, to say that these inversions likely harbor the sex determination locus (for example, the wording on lines 369-370 is misleading).

      We agree with the larger point that the inversion might not be causal for sex determination; however, it could still be causal through positional effects. We have modified the text to make it clear that it could also carry the causal locus (or loci).

      (8) Figure 6: overall, this figure is very helpful! However, it contains several problematic statements. In no case do you have evidence that these inversions are "favored by selection"; such statements should be deleted. Also, in point 3, you state that inversions 9, 11, and 20 are transferred to benthic lineages, and then that these inversions are involved in sex determination. But, your data suggests that it is chromosomes 9, 10, and 11 that are linked to sex determination.

      This figure is now Figure 1. We have removed these problematic statements.

      (9) L356-360: I would move the references that are currently at the end of the sentence to line 357 after the statement about the previous work on hybridization. Otherwise, it reads as if these previous papers demonstrated what you have demonstrated in your work.

      The text has been modified to address this point.

      (10) Overall, the discussion focuses completely on adaptive explanations for your results, and I would like to see at least an acknowledgement that drift could also be involved unless you have additional data to support adaptive explanations.

      We have rewritten the text to account for the possibility of drift (lines 404 and 405).

      Reviewer #3 (Recommendations for the authors):

      The paper utilizes heterogeneous datasets coming from different sources, and it is not always clear which specimens were used to generate structural information (bionano) or sequence information. A diagram summarizing the sequence data, methodologies, and research questions would be beneficial for the reader to navigate in this paper.

      Much of this information has been added to what is now Figure 1. All of this data is also found in Table S2.

      The authors performed genome alignments to analyze and homologize inversion, but this process is not clearly described. For the PCA, SNP information likely involves mapping onto a common reference genome. However, it is not clear how this was achieved given the different species and varying divergence times involved.

      We now include a link to the GitHub repository that contains the commands that were run. Because the overall level of sequence divergence between cichlid species is quite low (2 × 10^-3; Malinsky et al), mapping different species onto a common reference is commonly performed in Lake Malawi cichlids.
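
      A short worked calculation shows why divergence at this level leaves cross-species mapping feasible. Only the divergence value comes from the response. The 150 bp read length is an assumed, typical short-read length, and the calculation treats sites as independent, which is a simplification.

```python
# Divergence per site, as quoted in the response.
divergence_per_site = 2e-3

# Assumption: a typical short-read length; not stated in the response.
read_length_bp = 150

# Expected number of mismatches between a read and the common reference
# that are attributable to between-species divergence alone.
expected_mismatches = read_length_bp * divergence_per_site

# Probability that a read carries no divergence-derived mismatch at all,
# under the simplifying assumption that sites are independent.
prob_no_mismatch = (1 - divergence_per_site) ** read_length_bp

print(f"expected mismatches per read: {expected_mismatches:.2f}")
print(f"fraction of reads with no mismatch: {prob_no_mismatch:.3f}")
```

      Under these assumptions a read differs from the reference at well under one position on average, which is within what standard short-read aligners tolerate.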

      The introgression scenario is very intriguing but its role in local adaptation of the ecogroup types is not easy to understand. I understand this is still an outstanding question, but it is unclear how the directionality of introgressions was estimated. This can be substantiated using tree topology analysis, comparative estimates of sequence divergence, and accumulation of DNA insertions. The diagram does not clearly indicate which ones are polymorphic. In some cases, polymorphic inversions could result from the coexistence of native and introgressed haplotypes.

      We agree that this analysis would be interesting but is beyond the scope of this paper.

      The alternative model of introgression proposed in the cited preprint is interesting and deserves a formal analysis here. The authors consider it unclear what would drive "back" introgressions of non-inverted haplotypes, but this would depend on the selection regimes acting on the inversions themselves, which can include forms of balancing selection and a role for recessive lethals (heterozygote advantage). For instance, a standard haplotype could be favored if it shelters deleterious mutations carried by an inversion. Testing the introgression history over a wider range of branches and directions would provide further insights.

      We agree that this analysis would be interesting but is beyond the scope of this paper.

      The prose in the paper is occasionally muddled and somewhat unclear. Referring to chromosomes solely by their numbers (e.g., "inversion on 11") complicates readability.

      This is the standard way to refer to chromosomes in cichlids. Although it complicates readability, we believe any other method would be inconsistent with other papers. Changes to nomenclature might improve the readability of this paper, but would make it more difficult to compare our results for these chromosomes with those from other papers.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1:

      This paper seeks to address the question of how quantitative trait variation and expression variation are related. scRNAseq represents an appealing approach to eQTL mapping as it is possible to simultaneously genotype individual cells and measure expression in the same cell. As eQTL mapping requires large sample sizes to identify statistical relationships, the use of scRNAseq is likely to dramatically increase the statistical power of such studies. However, there are several technical challenges associated with scRNAseq and the authors' study is focused on addressing those challenges. Most of the points raised by my review of the initial version have been addressed. However, one point remains and one additional point should be considered. In this version the authors have introduced the use of data imputation using a published algorithm, DISCERN. This has greatly increased the variation explained by their model as presented in figure 3. However, it is possible that the explained variance is now an overestimation as a result of using the imputed expression data. I think that it would be appropriate to present figure 3 using the sparse data presented in the initial version of the paper and the newly presented imputed data so that the reader can draw their own conclusions about the interpretation.

      We thank the reviewer for pointing this out. We decided to present the results obtained from the sparse data in the main Figure 3 to avoid any overestimation. We also performed the variance partitioning at different sample sizes and used an optimized implementation of the GREML method to handle large sample sizes instead of relying on a bootstrap estimate. As for the benefits of denoising the expression data, we illustrate them in Supplementary Figure S6 so that readers can draw their own conclusions about this imputation method. The imputation generally increases the contribution of the expression-genotype interaction and decreases the residuals of the model by up to 8%.

      Reviewer #1:

      Given that the authors overcame many technical and analytical challenges in the course of this research, the study would be greatly strengthened through analysis of at least one, and ideally several, more conditions which would expand the conclusions that could be drawn from the study and demonstrate the power of using scRNAseq to efficiently quantify expression in different environments.

      Our aim was to illustrate the benefit of one-pot scRNA-seq for eQTL mapping and the association of transcriptomic variation to trait variation. We think we have reached this goal with the current study. We understand that performing another scRNA-seq experiment in a new environment would help expand/validate our conclusions, but we think this would be a better fit for a future study. 

      Reviewer #2:

      The authors now say the main take-home for their work is (1) they have established methods for linkage mapping with scRNA-seq and that these (2) "can help gain insights about the genotype-phenotype map at a broader scale." My opinion in this revision is much the same as it was in the first round: I agree that they have met the first goal, and the second theme has been so well explored by other literature that I'm not convinced the authors' results meet the bar for novelty and impact. To my mind, success for this manuscript would be to support the claim that the scRNA-seq approach helps "reveal hidden components of the yeast genotype-to-phenotype map." I'm not sure the authors have achieved this. I agree that the new Figure 3 is a nice addition: a result that apparently hasn't been reported elsewhere (30% of growth trait variation can't be explained by expression). The caveats are that this is a negative result that needs to be interpreted with caution; and that it would be useful for the authors to clarify whether the ability to do this calculation is a product of the scRNA-seq method per se or whether they could have used any bulk eQTL study for it. Beside this, I regret to say that I still find that the results in the revision recapitulate what the bulk eQTL literature has already found, especially for the authors' focal yeast cross: heritability, expression hotspots, the role of cis- and trans-acting variation, etc.

      We agree with the reviewer that this study does not reveal new modes of transcription regulation or phenomena that were not highlighted or hypothesized in the literature. To avoid confusion, we refrained from using the word “reveal” for such cases. However, we provide convincing evidence that one-pot scRNA-seq helps refine our understanding of the genotype-phenotype map in two ways. First, the greater scalability of this approach allowed us to find a median number of eQTLs per gene that is ~4 times higher than in the largest bulk-eQTL mapping in the same genetic background. For 60% of these genes, i.e. the ones with higher expression heritability in our dataset, the ability to explain their transcriptomic variation from SNPs increased by ~16% on average, which is substantial. This gain in power can thus improve our understanding of the gene network by highlighting new downstream effects of mutations or transcriptome variation. Second, by performing one-pot eQTL mapping as opposed to large-scale bulk eQTL mapping, thousands of transcriptomes can be collected simultaneously without having to use batching strategies. This enables the association between phenotype, genotype and expression variation, which we show in Figure 3 through variance partitioning. While it is possible that the growth trait variation not being fully explained by expression could be an artifact of scRNA-seq, we do not believe this is the case because most transcriptional variation is explained by genotype (~76%).

      Furthermore, we show that by having to control expression for growth, by missing some hotspots of regulation and by missing multiple eQTLs for each gene, previous bulk-eQTL analyses could not replicate the significant association between eQTL hotspots and QTL hotspots, which this study highlights. Thus, we agree in general that many of the insights about transcriptional regulation have been obtained through ‘brute-force’ bulk RNA-seq, which fundamentally can reach tens of thousands of transcriptomes as well, but we believe the one-pot scRNA-seq approach is much easier and more expedient once the challenges of genotyping single cells, denoising, and low coverage have been solved (which we believe we have done). There is indeed another reviewed preprint [Boocock et al, eLife] that has used similar approaches to those of our study since the publication of our manuscript (in October 2023).

      Likewise, when in the first round of review I recommended that the authors repeat their analyses on previous bulk RNA-seq data from Albert et al., my point was to lead the authors to a means to provide rigorous, compelling justification for the scRNA-seq approach. The response to reviewers and the text (starting on line 413) says the comparison in its current form doesn't serve this purpose because Albert et al. studied fewer segregants. Wouldn't down-sampling the current data set allow a fair comparison? Again, to my mind what the current manuscript needs is concrete evidence that the scRNA-seq method per se affords truly better insights relative to what has come before.

      We agree that down-sampling the current dataset would allow for a fair comparison. Thus, we illustrate the results of the variance partitioning at different sample sizes. While the total variance explained is similar, the contribution of the genotype-expression interaction increases with sample size, highlighting the increase in the confidence of the associations between expression and genotype that contribute to trait variation. We also showed that many important low-effect-size eQTLs are missed at a sample size of 1000 compared to a sample size of 4000. Indeed, by increasing the scale of eQTL mapping ~4-fold, about 60% of genes have increased heritability, and this increase is due to eQTLs that cumulatively explain more than 15% of transcript level variation.

      I also recommend that the authors take care to improve the main text for readability and professionalism. It would benefit from further structural revision throughout (especially in the figure captions) to allow high-impact conclusions to be highlighted and low-impact material to be eliminated. Figure 4 and the results text sections from line 319 onward could be edited for concision or perhaps moved to supplementary if they obscure the authors' case for the scRNA-seq approach. The text could also benefit from copy editing (e.g. three clauses starting with "while" in the paragraph starting on line 456; "od ratio" on line 415). I appreciate the authors' work on the discussion, including posing big picture questions for the field (lines 426-429), but I don't see how they have anything to do with the current scRNA-seq method.

      We thank the reviewer for their suggestions for improving the readability of the text. We edited some of the figure captions and result section titles to better highlight the main results. However, we do not think that the last result section obscures our findings but rather supports the fact that scRNA-seq refines our understanding of the GPM. Indeed, we discovered many new eQTLs that are related to both expression and trait variation, highlighting the potential for understanding the downstream effects of mutations on the gene network and on trait variation through multiple trans-regulation paths.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This work provides a new Python toolkit for combining generative modeling of neural dynamics and inversion methods to infer likely model parameters that explain empirical neuroimaging data. The authors provided tests to show the toolkit's broad applicability and accuracy; hence, it will be very useful for people interested in using computational approaches to better understand the brain.

      Strengths:

      The work's primary strength is the tool's integrative nature, which seamlessly combines forward modelling with backward inference. This is important as available tools in the literature can only do one and not the other, which limits their accessibility to neuroscientists with limited computational expertise. Another strength of the paper is the demonstration of how the tool can be applied to a broad range of computational models popularly used in the field to interrogate diverse neuroimaging data, ensuring that the methodology is not optimal to only one model. Moreover, through extensive in-silico testing, the work provided evidence that the tool can accurately infer ground-truth parameters, which is important to ensure results from future hypothesis testing are meaningful.

      We are happy to hear the positive feedback on our effort to provide an open-source and widely accessible tool for both fast forward simulations and flexible model inversion, applicable across popular models of large-scale brain dynamics.

      Weaknesses:

      Although the tool itself is the main strength of the work, the paper lacked a thorough analysis of issues concerning robustness and benchmarking relative to existing tools.

      The first issue is the robustness to the choice of features to be included in the objective function. This choice significantly affects the training and changes the results, as the authors even acknowledged themselves multiple times (e.g., Page 17 last sentence of first paragraph or Page 19 first sentence of second paragraph). This brings the question of whether the accurate results found in the various demonstrations are due to the biased selection of features (possibly from priors on what worked in previous works). The robustness of the neural estimator and the inference method to noise was also not demonstrated. This is important as most neuroimaging measurements are inherently noisy to various degrees.

      The second issue is on benchmarking. Because the tool developed is, in principle, only a combination of existing tools specific to modeling or Bayesian inference, the work failed to provide a more compelling demonstration of its added value. This could have been demonstrated through appropriate benchmarking relative to existing methodologies, specifically in terms of accuracy and computational efficiency.

      We fully agree with the reviewer that the VBI estimation heavily depends on the choice of data features, and this is the core of the inference procedure, not its weakness. We have demonstrated different scenarios showing how the informativeness of features (commonly used in the literature) results in varying uncertainty quantification. For instance, using summary statistics of functional connectivity (FC) and functional connectivity dynamics (FCD) matrices to estimate the global coupling parameter leads to fast convergence; however, these are not sufficient to accurately estimate the whole-brain heterogeneous excitability parameter, which requires features such as statistical moments of time series. VBI provides a taxonomy of data features that users can employ to test their hypotheses. It is important to note that one major advantage of VBI is its ability to perform estimation using a battery of data features, rather than relying on a limited set (such as only FC or FCD) as is often the case in the literature. In the revised version, we will elaborate further by presenting additional scenarios to demonstrate the robustness of the estimation. We will also evaluate the robustness of the neural density estimators to (dynamical/additive) noise.
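
      As a rough illustration of the two feature families mentioned above (FC summary statistics versus statistical moments of the time series), the following minimal sketch computes both from a set of regional time series. The function names `pearson`, `fc_features`, and `moment_features` are hypothetical and are not part of the VBI API; the choice of mean/standard deviation of the FC entries and of mean, standard deviation, and skewness per region is an assumption made only for this example.

```python
import statistics
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length time series."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def fc_features(ts):
    """Mean and standard deviation of the off-diagonal FC entries.

    ts is a list with one time series (list of floats) per region.
    """
    fc = [pearson(ts[i], ts[j]) for i, j in combinations(range(len(ts)), 2)]
    return [statistics.fmean(fc), statistics.pstdev(fc)]


def moment_features(ts):
    """Per-region mean, standard deviation, and skewness, concatenated."""
    feats = []
    for x in ts:
        m, s = statistics.fmean(x), statistics.pstdev(x)
        skew = statistics.fmean([((v - m) / s) ** 3 for v in x])
        feats.extend([m, s, skew])
    return feats
```

      Under these assumptions, `fc_features` yields a low-dimensional summary of the whole network, whereas `moment_features` keeps region-specific information, which is the kind of feature the response argues is needed for heterogeneous, region-wise parameters.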

      More importantly, regarding benchmarking, we would like to draw attention to a key point about existing tools and methods. The literature often uses optimization for fitting whole-brain network models, and its limitations for reliable causal hypothesis testing have been pointed out in the Introduction/Discussion. As also noted by the reviewer under Strengths, and to the best of our knowledge, there are no existing tools other than VBI that can scale and generalize to operate across whole-brain models for Bayesian model inversion. Previously, we developed Hamiltonian Monte Carlo (HMC) sampling for the Epileptor model in epilepsy (Hashemi et al., 2020, Jha et al., 2022). This phenomenological model is very well-behaved in terms of numerical integration, gradient calculation, and dynamical system properties (Jirsa et al., 2014). However, this does not directly generalize to other models, particularly the Montbrió model for resting-state, which exhibits bistability with noise driving transitions between states. As shown in Baldy et al., 2024, even at the level of a single neural mass model (i.e., one brain region), gradient-based HMC failed to capture such switching behaviour, particularly when only one state variable (membrane potential) was observed while the other (firing rate) was missing. Our attempts to use other methods (e.g., the second-derivative-based Laplace approximation used in Dynamic Causal Modeling) also failed, due to divergence in gradient calculation. Nevertheless, reparameterization techniques (Baldy et al., 2024) and hybrid algorithms (Gabrié et al., 2022) could offer improvements, although this remains an open problem for these classes of computational models.

      In sum, for oscillatory systems, it has been shown previously that the SBI approach used in VBI substantially outperforms both gradient-based and gradient-free alternative methods (Gonçalves et al., 2020, Hashemi et al., 2023, Baldy et al., 2024). Importantly, for bistable systems with switching dynamics, gradient-based methods fail to converge, while gradient-free methods do not scale to the whole-brain level (Hashemi et al., 2020). Hence, the generalizability of VBI relies on the fact that neither the model nor the data features need to be differentiable. We will clarify this point in the revised version. Moreover, we will provide better explanations for some terms mentioned by the reviewer in Recommendations.

      Hashemi, M., Vattikonda, A. N., Sip, V., Guye, M., Bartolomei, F., Woodman, M. M., & Jirsa, V. K. (2020). The Bayesian Virtual Epileptic Patient: A probabilistic framework designed to infer the spatial map of epileptogenicity in a personalized large-scale brain model of epilepsy spread. NeuroImage, 217, 116839.

      Jha, J., Hashemi, M., Vattikonda, A. N., Wang, H., & Jirsa, V. (2022). Fully Bayesian estimation of virtual brain parameters with self-tuning Hamiltonian Monte Carlo. Machine Learning: Science and Technology, 3(3), 035016.

      Jirsa, V. K., Stacey, W. C., Quilichini, P. P., Ivanov, A. I., & Bernard, C. (2014). On the nature of seizure dynamics. Brain, 137(8), 2210-2230.

      Baldy, N., Breyton, M., Woodman, M. M., Jirsa, V. K., & Hashemi, M. (2024). Inference on the macroscopic dynamics of spiking neurons. Neural Computation, 36(10), 2030-2072.

      Baldy, N., Woodman, M., Jirsa, V., & Hashemi, M. (2024). Dynamic Causal Modeling in Probabilistic Programming Languages. bioRxiv, 2024-11.

      Gabrié, M., Rotskoff, G. M., & Vanden-Eijnden, E. (2022). Adaptive Monte Carlo augmented with normalizing flows. Proceedings of the National Academy of Sciences, 119(10), e2109420119.

      Gonçalves, P. J., Lueckmann, J. M., Deistler, M., Nonnenmacher, M., Öcal, K., Bassetto, G., ... & Macke, J. H. (2020). Training deep neural density estimators to identify mechanistic models of neural dynamics. eLife, 9, e56261.

      Hashemi, M., Vattikonda, A. N., Jha, J., Sip, V., Woodman, M. M., Bartolomei, F., & Jirsa, V. K. (2023). Amortized Bayesian inference on generative dynamical network models of epilepsy using deep neural density estimators. Neural Networks, 163, 178-194.

      Reviewer #2 (Public review):

      Summary:

      Whole-brain network modeling is a common type of dynamical systems-based method to create individualized models of brain activity incorporating subject-specific structural connectome inferred from diffusion imaging data. This type of model has often been used to infer biophysical parameters of the individual brain that cannot be directly measured using neuroimaging but may be relevant to specific cognitive functions or diseases. Here, Ziaeemehr et al introduce a new toolkit, named "Virtual Brain Inference" (VBI), offering a new computational approach for estimating these parameters using Bayesian inference powered by artificial neural networks. The basic idea is to use simulated data, given known parameters, to train artificial neural networks to solve the inverse problem, namely, to infer the posterior distribution over the parameter space given data-derived features. The authors have demonstrated the utility of the toolkit using simulated data from several commonly used whole-brain network models in case studies.

      Strengths:

      (1) Model inversion is an important problem in whole-brain network modeling. The toolkit presents a significant methodological step up from common practices, with the potential to broadly impact how the community infers model parameters.

      (2) Notably, the method allows the estimation of the posterior distribution of parameters instead of a point estimation, which provides information about the uncertainty of the estimation, which is generally lacking in existing methods.

      (3) The case studies were able to demonstrate the detection of degeneracy in the parameters, which is important. Degeneracy is quite common in this type of model. If not handled mindfully, it may lead to spurious or unstable parameter estimation. Thus, the toolkit can potentially be used to improve feature selection or to simply indicate the uncertainty.

      (4) In principle, the posterior distribution can be directly computed given new data without doing any additional simulation, which could improve the efficiency of parameter inference on the artificial neural network if well-trained.

      We thank the reviewer for the careful consideration of important aspects of the VBI tool, such as uncertainty quantification, degeneracy detection, parallelization, and amortization strategy.

      Weaknesses:

      (1) While the posterior estimator was trained with a large quantity of simulated data, the testing/validation is only demonstrated with a single case study (one point in parameter space) per model. This is not sufficient to demonstrate the method's accuracy and reliability, but only its feasibility. Demonstrating the accuracy and reliability of the posterior estimation in large test sets would inspire more confidence.

      (2) The authors have only demonstrated validation of the method using simulated data, but not features derived from actual EEG/MEG or fMRI data. So, it is unclear if the posterior estimator, when applied to real data, would produce results as sensible as using simulated data. Human data can often look quite different from the simulated data, which may be considered out of distribution. Thus, the authors should consider using simulated test data with out-of-distribution parameters to validate the method and using real human data to demonstrate, e.g., the reliability of the method across sessions.

      (3) The z-scores used to measure prediction error are generally between 1-3, which seems quite large to me. It would give readers a better sense of the utility of the method if comparisons to simpler methods, such as k-nearest neighbor methods, are provided in terms of accuracy.

      (4) A lot of simulations are required to train the posterior estimator, which seems much more than existing approaches. Inferring from Figure S1, at the required order of magnitudes of the number of simulations, the simulation time could range from days to years, depending on the hardware. Although once the estimator is well-trained, the parameter inverse given new data will be very fast, it is not clear to me how often such use cases would be encountered. Because the estimator is trained based on an individual connectome, it can only be used to do parameter inversion for the same subject. Typically, we only have one session of resting state data from each participant, while longitudinal resting state data where we can assume the structural connectome remains constant, is rare. Thus, the cost-efficiency and practical utility of training such a posterior estimator remains unclear.

      We agree with the reviewer that it is necessary to show results on larger synthetic test sets, and we will elaborate further by presenting additional scenarios to demonstrate the robustness of the estimation. However, there are some points raised by the reviewer that we need to clarify.

      The validation on empirical data was beyond the scope of this study, as it relates to model validation rather than the inversion algorithms. This is also because we aimed to avoid repetition, given that we have previously demonstrated model validation on empirical data using these techniques, for invasive sEEG (Hashemi et al., 2023), MEG (Sorrentino et al., 2024), EEG (Angiolelli et al., 2025) and fMRI (Lavanga et al., 2023, Rabuffo et al., 2025). Note that if the features of the observed data are not included during training, VBI ignores them, as it requires an invertible mapping function between parameters and data features.

      We have used z-scores and posterior shrinkage to measure prediction performance, as these are Bayesian metrics that take into account the variance of both prior and posterior rather than only the mean value or thresholding for ranking of the prediction used in k-NN or confusion matrix methods. This helps avoid biased accuracy estimation, for instance, if the mean posterior is close to the true value but there is no posterior shrinkage. Although shrinkage is bounded between 0 and 1, we agree that z-scores have no upper bound for such diagnostics.
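
      For readers unfamiliar with these two diagnostics, the following minimal sketch computes them from posterior samples of a single parameter. It assumes the commonly used definitions (posterior z-score as the absolute distance between posterior mean and true value in units of posterior standard deviation; shrinkage as one minus the ratio of posterior to prior variance); the exact formulas used in the manuscript are not reproduced here, and the function names are hypothetical rather than part of the VBI API.

```python
import statistics


def posterior_zscore(samples, true_value):
    """|posterior mean - true value| divided by the posterior standard deviation."""
    mean = statistics.fmean(samples)
    sd = statistics.pstdev(samples)
    return abs(mean - true_value) / sd


def posterior_shrinkage(samples, prior_variance):
    """1 - (posterior variance / prior variance); values near 1 mean strong contraction."""
    return 1.0 - statistics.pvariance(samples) / prior_variance


if __name__ == "__main__":
    # Toy posterior samples for one parameter; prior assumed uniform on [0, 2],
    # whose variance is (2 - 0)**2 / 12.
    samples = [0.9, 1.0, 1.1]
    print(posterior_zscore(samples, true_value=1.2))
    print(posterior_shrinkage(samples, prior_variance=4.0 / 12.0))
```

      Read together, a small z-score with shrinkage close to 1 indicates an accurate and well-constrained estimate, whereas a small z-score with shrinkage near 0 indicates that the posterior has barely moved from the prior.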

      Finally, the number of required simulations depends on the dimensionality of the parameter space and the informativeness of the data features. For instance, estimating a single global scaling parameter requires around 100 simulations, whereas estimating whole-brain heterogeneous parameters requires substantially more simulations. Nevertheless, we have provided fast simulations, and one key advantage of VBI is that simulations can be run in parallel (unlike MCMC sampling, which is more limited in this regard). Hence, with commonly accessible CPUs/GPUs, the fast simulations and parallelization capabilities of the VBI tool allow us to run on the order of 1 million simulations within 2–3 days on desktops, or in less than half a day on supercomputers at cohort level, rather than over several years! It has been previously shown that the SBI method used in VBI provides an order-of-magnitude faster inversion than HMC for whole-brain epilepsy spread (Hashemi et al., 2023). Moreover, after training, the amortized strategy is critical for enabling hypothesis testing within seconds to minutes. We agree that longitudinal resting-state data under the assumption of a constant structural connectome is rare; however, this strategy is essential in brain diseases such as epilepsy, where experimental hypothesis testing is prohibitive.

      We will clarify these points and better explain some terms mentioned by the reviewer in the revised manuscript.

      Hashemi, M., Vattikonda, A. N., Jha, J., Sip, V., Woodman, M. M., Bartolomei, F., & Jirsa, V. K. (2023). Amortized Bayesian inference on generative dynamical network models of epilepsy using deep neural density estimators. Neural Networks, 163, 178-194.

      Sorrentino, P., Pathak, A., Ziaeemehr, A., Lopez, E. T., Cipriano, L., Romano, A., ... & Hashemi, M. (2024). The virtual multiple sclerosis patient. Iscience, 27(7).

      Angiolelli, M., Depannemaecker, D., Agouram, H., Regis, J., Carron, R., Woodman, M., ... & Sorrentino, P. (2025). The virtual parkinsonian patient. npj Systems Biology and Applications, 11(1), 40.

      Lavanga, M., Stumme, J., Yalcinkaya, B. H., Fousek, J., Jockwitz, C., Sheheitli, H., ... & Jirsa, V. (2023). The virtual aging brain: Causal inference supports interhemispheric dedifferentiation in healthy aging. NeuroImage, 283, 120403.

      Rabuffo, G., Lokossou, H. A., Li, Z., Ziaee-Mehr, A., Hashemi, M., Quilichini, P. P., ... & Bernard, C. (2025). Mapping global brain reconfigurations following local targeted manipulations. Proceedings of the National Academy of Sciences, 122(16), e2405706122.

    1. Author response:

      We thank all three reviewers for providing excellent suggestions that we feel will enhance the clarity and impact of our manuscript. When we submit the revised manuscript, we plan to respond to each comment and provide additional data and discussion points as requested. Below, we include an outline of the main points that we intend to address.

      (1) Reviewers 1 and 2 both suggested investigating degenerative changes in Purkinje cells that are more resistant to age-related loss. We will look for hallmarks of neurodegeneration, such as shrunken dendrites and axonal swellings, in two areas: surviving Purkinje cells adjacent to stripes of cell loss, and the Purkinje cells in aged mice without Purkinje cell loss.

      (2) We agree with Reviewer 2’s point that our manuscript would benefit from discussion of the differences in vulnerability between individual mice. Therefore, we will elaborate upon possible reasons why some aged mice are more resistant to age-related Purkinje cell loss than others.

      (3) We will take Reviewer 3’s suggestion to perform zebrin II co-staining in our GFP reporter mice, given our findings that calbindin staining can be unreliable in this context.

      (4) We appreciate Reviewer 3’s comment that quantification would support the observations made in our study. To provide quantitative evidence for our categorization of mice with striped and non-striped Purkinje cell loss, we will measure the gaps (or lack thereof) between Purkinje cell bodies in the anterior zone.

      (5) We will also incorporate several minor but important changes suggested by all three reviewers.

      Thank you to the reviewers and editors for taking the time and effort to review our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents compelling evidence for a novel treatment approach in a challenging patient population with MSS/pMMR mCRC, where traditional immunotherapy has often fallen short. The combination of SBRT and tislelizumab not only yielded a high disease control rate but also indicated significant improvements in the tumor's immune landscape. The safety profile appears favorable, which is crucial for patients who have already undergone multiple lines of therapy.

      Strengths:

      The results underscore the potential of leveraging radiation therapy to enhance the effectiveness of immunotherapy, especially in tumor environments previously deemed hostile to immune interventions. Future research should focus on larger cohorts to validate these findings and explore the underlying mechanisms of immune modulation post-treatment.

      Weaknesses:

      I believe the author's work is commendable and should be considered with some minor modifications:

      (1) While the author categorized patients based on the type of RAS mutation and the location of colorectal cancer metastasis, the article does not adequately address how these classifications influence treatment outcomes, such as whether KRAS or NRAS mutations, as well as the type of metastatic lesions, affect the sensitivity to gamma-ray treatment and lead to varying responses.

      Thank you very much for your question. In the revised manuscript, we added an analysis of the impact of RAS mutation types and different metastatic sites on patient prognosis. Unfortunately, due to the limited number of samples, we were unable to obtain satisfactory results. We placed the relevant results in the supplementary figure.

      (2) In Figure 2, clarification is needed on how the author differentiated between on-target and off-target lesions. I observed that some images depicted both lesion types at the same level, which could lead to confusion.

      We sincerely apologize for any oversight in our previous submission. To clarify, during radiotherapy planning, we pre-select target lesions at the CT image level, and subsequently define the planning target volume (PTV) by marking these pre-selected areas with the 50% isodose lines. In our efficacy evaluation, we distinguish between the target lesions inside the PTV and any lesions outside the target area. In response to your valuable feedback, we have now added the isodose lines for the target lesions to the supplementary figure for greater clarity.

      (3) The author performed only a basic difference analysis. A more comprehensive analysis, including calculations of markers related to treatment efficacy, could offer additional insights for clinical practice.

      To identify potential markers associated with treatment efficacy, we attempted to establish a Cox proportional hazards model and conducted both univariate and multivariate Cox regression analyses. Unfortunately, due to the constraints of sample size and sequencing depth, the analyses did not yield statistically significant results, and we were unable to identify markers that could clearly predict treatment outcomes.

      (4) The transcriptome sequencing analysis provides insights into how stereotactic radiotherapy sensitizes immunotherapy; however, it currently relies on a simple pre- and post-treatment group comparison. It would be beneficial to include additional subgroups to explore more nuanced findings.

      We acknowledge the limitations in the depth of our analysis. In addition to performing differential analysis between the responder group (PR) and the non-responder group (Non-PR), we also conducted differential gene expression analysis on samples before and after treatment. The results revealed a consistent increase in the expression of NOS2 in both groups following Gamma Knife combined with immunotherapy, suggesting that this gene may serve as a potential prognostic factor influencing treatment outcomes. However, given the limited number of studies exploring the role of NOS2 in this context, we recognize that further research is necessary to better understand its involvement and to substantiate its potential as a predictive marker.

      (5) The author briefly discusses the effects of changes in tumor fibrosis and angiogenesis on treatment outcomes. Further experiments may be necessary to validate these findings and investigate the underlying mechanisms of immune regulation following treatment.

      We sincerely appreciate your thoughtful feedback on our results. In response, we conducted additional experiments, including immunohistochemical analysis of patient samples before and after combined treatment. The results demonstrated a reduction in the expression of CD31, a marker of tumor angiogenesis, following the combined treatment. This finding further supports our hypothesis that Gamma Knife treatment, in combination with immunotherapy, may effectively inhibit tumor angiogenesis, contributing to an improved therapeutic outcome.

      Reviewer #2 (Public review):

      Summary:

      This Phase II clinical trial investigates the combination of Gamma Knife Stereotactic Body Radiation Therapy (SBRT) with Tislelizumab for the treatment of metastatic colorectal cancer (mCRC) in patients with proficient mismatch repair (pMMR). The study addresses a critical clinical challenge in the management of pMMR CRC, focusing on the selection of appropriate candidates. The results suggest that the combination of Gamma Knife SBRT and Tislelizumab provides a safe and potent treatment option for patients with pMMR/MSS/MSI-L mCRC who have become refractory to first- and second-line chemotherapy. The study design is rigorous, and the outcomes are promising.

      Advantage:

      The trial design was meticulously structured, and appropriate statistical methods were employed to rigorously analyze the results. Bioinformatics approaches were utilized to further elucidate alterations in the patient's tumor microenvironment and to explore the underlying factors contributing to the observed differences in treatment efficacy. The conclusions drawn from this trial offer valuable insights for managing advanced colorectal cancer in patients who have not responded to first- and second-line therapies.

      Weakness:

      (1) Clarity and Structure of the Abstract

      - Results Section: The results section should contain important data; I suggest that some important sequencing data be shown to enhance understanding.

      Thank you for your insightful question. In response, we have revised the content of the article and restructured the abstract to enhance its scientific clarity and make it more accessible to readers.

      (2) As the authors used the NanoString assay for transcriptome analysis, more detail should be provided, such as the version of R and the bioinformatics analysis methods.

      We have also addressed the missing details in our research methodology. The revised manuscript now includes a complete description of the research methods, along with the specific software and versions used.

      (3) It is interesting that PD-L1 expression increased in the included patients after Gamma Knife Stereotactic Body Radiation Therapy (SBRT) treatment. How can this be explained?

      Thank you for your thought-provoking question. PD-L1 plays a crucial role in tumor cell immune evasion, and anti-PD-1/PD-L1 inhibitors have emerged as effective immune checkpoint inhibitors, widely used in cancer therapy. In our clinical trial, we observed an increase in PD-L1 expression in some patients following combined treatment. Existing literature suggests that activation of various oncogenic and stress response pathways, along with post-translational modifications of PD-L1 (such as phosphorylation, glycosylation, acetylation, ubiquitination, and palmitoylation), can influence its expression[1]. We hypothesize that the increase in PD-L1 expression may be attributed to the activation of specific signaling pathways induced by the radiation from Gamma Knife treatment, as well as to enhanced tumor stress in response to the treatment. However, the precise mechanisms underlying this observation require further experimental investigation. A deeper understanding of these processes could help optimize our clinical treatment strategies.

      (4) It would be helpful to include a brief discussion of the limitations of the study, such as sample size constraints and their impact on the generalizability of the results. This will give readers a more comprehensive understanding of the findings.

      Thank you for highlighting the limitations of the article. In response, we have added a detailed discussion of the constraints arising from the limited number of experimental samples and insufficient sequencing depth. This addition aims to provide readers with a clearer understanding of the study's limitations and the context of our research findings.

      (5) Language Accuracy: There are a few instances where wording could be more professional or precise.

      Regarding the language, we apologize that the wording of the technical content was not sufficiently careful or accurate; English is not our native language. We have rechecked the article and revised the wording and grammar so that you and other readers can grasp our research more accurately.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The research presented in this article is commendable; however, I would like to propose several revisions for consideration:

      Consideration of Concomitant Medications: It is imperative to ascertain whether enrolled patients utilized additional pharmacological agents alongside the trial regimen. Such concurrent drug use could potentially influence the final outcomes. A concise discussion of this aspect is warranted within the manuscript.

      Clinical Characterization of Response Groups: An examination of the clinical characteristics distinguishing the effective and non-responsive cohorts within the trial is essential. This inquiry merits further exploration, as it may elucidate factors influencing treatment efficacy.

      Tumor Microenvironment Analysis: The authors highlight the implications of tumor fibrosis and angiogenesis on therapeutic response. Identification of specific biomarkers associated with these phenotypes is crucial. I recommend undertaking straightforward testing and validation to substantiate these observations.

      Thank you very much for your valuable suggestions, many of which have been incorporated into the revised manuscript. Regarding the consideration of concurrent medication, we would like to clarify that all patients included in the study were advanced CRC patients who had progressed during first- or second-line treatments. As such, targeted therapy or chemotherapy was used concurrently in the trial. Previous studies have not indicated that different targeted therapies influence the efficacy of Gamma Knife treatment, though some chemotherapy agents may vary in their side effects. However, we believe these differences do not significantly impact the final outcomes. Given that existing chemotherapy regimens do not substantially affect patient prognosis, we considered the combined drug treatment regimen to be an irrelevant variable in our analysis.

      Additionally, we have carefully examined the clinical characteristics of patients across different groups. We have also included an analysis of the impact of various mutation types and metastatic sites in the revised manuscript. Furthermore, we plan to perform CD31 staining on lesions from both the responder and non-responder groups before and after Gamma Knife treatment to assess the role of angiogenesis in treatment response.

      Reviewer #2 (Recommendations for the authors):

      The abstract should be revised for greater clarity and include key results that substantiate the conclusions. The discussion section needs to more thoroughly address the limitations of the clinical trial, providing readers with a deeper understanding of the trial's findings and implications. Additionally, the methods section should be more rigorous and detailed, offering sufficient information to enhance the transparency and robustness of the experimental design.

      Thank you for your constructive suggestions regarding the shortcomings in our manuscript. In response, we have thoroughly reviewed the article and addressed the missing content, including revisions to the abstract, results, discussion, and methods sections. Additionally, we have refined the grammar and wording throughout the manuscript to enhance its professionalism and ensure it aligns with the standards expected for publication.

      [1] Yamaguchi H, Hsu J M, Yang W H, et al. Mechanisms regulating PD-L1 expression in cancers and associated opportunities for novel small-molecule therapeutics. Nature Reviews Clinical Oncology, 2022, 19(5): 287-305.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The Authors investigated the anatomical features of the excitatory synaptic boutons in layer 1 of the human temporal neocortex. They examined the size of the synapse, the macular or the perforated appearance and the size of the synaptic active zone, the number and volume of the mitochondria, the number of the synaptic and the dense core vesicles, also differentiating between the readily releasable, the recycling and the resting pool of synaptic vesicles. The coverage of the synapse by astrocytic processes was also assessed, and all the above parameters were compared to other layers of the human temporal neocortex. The Authors conclude that the subcellular morphology of the layer 1 synapses is suitable for the functions of the neocortical layer, i.e. the synaptic integration within the cortical column. The low glial coverage of the synapses might allow the glutamate spillover from the synapses enhancing synaptic crosstalk within this cortical layer.

      Strengths:

      The strengths of this paper are the abundant and very precious data about the fine structure of the human neocortical layer 1. Quantitative electron microscopy data (especially those derived from the human brain) are very valuable, since this is highly time- and energy-consuming work. The techniques used to obtain the data, as well as the analyses and the statistics performed by the Authors, are all solid, strengthen this manuscript, and mainly support the conclusions drawn in the discussion.

      Comments on latest version:

      The corrected version of the article titled “Ultrastructural sublaminar specific diversity of excitatory synaptic boutons in layer 1 of the adult human temporal lobe neocortex” has been improved thanks to the comments and suggestions of the reviewers. The Authors implemented several of my comments and suggestions. However, many of them were not addressed. It is understandable that the Authors did not start a whole new series of experiments investigating inhibitory synapses (as this was a misunderstanding affecting two of the three reviewers). But the English text is still very hard to understand and has many mistakes, although I suggested an extensive review of the use of English. Furthermore, my suggestions about avoiding many abbreviations in the abstract, analysing and discussing the perforated synapses in more detail, the figure presentation (Figure 3), and including data about the astrocytic coverage in the Results section were not implemented. My questions about the number of docked vesicles and p10 vesicles, as well as about the different categories of the vesicle pools, have not been answered either. Many other minor comments and suggestions were answered, corrected and implemented, but I think the manuscript could have been improved more if the Authors had taken into account all of the reviewers' suggestions, not only some of them. I still have several main and minor concerns, including a few new ones that I did not notice earlier but still think are important.

      We would like to thank the reviewer for the comments.

      - We worked on the English again and tried to improve the language.

      - We avoided using too many abbreviations in the Abstract and reduced them to a minimum.

      - We included a small paragraph about non-perforated vs. perforated active zones in both the Results and Discussion sections. However, since the majority of active zones in all cortical layers of the human TLN were of the macular type, we concluded that it is not relevant to describe their function in more detail.

      - In Figure 3 A-C we added contour lines to the boutons to make their outlines more visible.

      - We completed the data about the astrocytic coverage in the Results section (see also below).

      - Concerning the vesicle pools please see below.

      Main concerns:

      (1) Epileptic patients:

      As all patients were epileptic, it is not correct to state in the abstract that non-epileptic tissue was investigated. Even if the seizure onset zone was not in the region investigated, seizures usually invade the temporal lobe in TLE. If you can prove that no spiking activity occurred in the sample you investigated and the seizures did not invade that region, then you can write that it is presumably non-epileptic. I would suggest writing “L1 of the human temporal lobe neocortical biopsy tissue”. See also Methods lines 608-612. Write “non-epileptic” or “non-affected” only if you verified it with ECoG. If this was the case, please write a few sentences about it in the Methods.

      We rephrased Material and Methods concerning this point and added that patients were monitored with EEG, MRI and multielectrode recordings. In addition, we stated that the epileptic focus was always far away from the neocortical tissue samples. Furthermore, we added a small paragraph that functional studies using the same methodology have shown that neocortical access tissue samples taken from epilepsy surgery do not differ in electrophysiological properties and synaptic physiology when compared with acute slice preparations in experimental animals and we quoted the relevant papers.

      We hope that the reviewer is now convinced that our tissue samples can be regarded as non-affected.

      (2) About the inhibitory/excitatory synapses.

      “Since our focus was on excitatory synaptic boutons as already stated in the title we have not analyzed inhibitory SBs.” Now I do understand that only excitatory synapses were investigated. Although it was written in the title, I did not realize it, since throughout the manuscript the Authors wrote about synapses, distinguished between inhibitory and excitatory synapses in the text, showed numerous excitatory and inhibitory synapses in Figure 2, and discussed inhibitory interneurons in the Discussion as well. Maybe this was the reason why two of the three reviewers (including myself) thought you investigated both types of synapses but did not differentiate between them. So please emphasize in the Abstract (line 40), Introduction (e.g. lines 92-97) and the Discussion (line 369) that only excitatory synaptic boutons were investigated.

      As this paper investigated only excitatory synaptic boutons, I think it is irrelevant to write such a long section in the Discussion about inhibitory interneurons and their functions in L1 of the human temporal lobe neocortex. The same applies to the schematic drawing of the possible wiring of L1 (Figure 7). As neither inhibitory interneurons nor the connections of the different excitatory cells were examined, but only the morphology of single synaptic boutons without any reference to their origin, I think this figure does not illustrate the work done in this paper. It could be a figure in a review paper about the human L1, but it is inappropriate in this study.

      We followed the reviewer’s suggestion and pointed out explicitly that we only investigated excitatory synaptic boutons. We also changed the Discussion and focused more on circuitry in L1 and the role of CR-cells.

      (3) Perforated synapses

      “The findings of the Geinismann group suggesting that perforated synapses are more efficient than non-perforated ones is nowadays very controversially discussed.” I did not ask the Authors to say that perforated synapses are more efficient. However, based on the literature (e.g. Harris et al, 1992; Carlin and Siekievitz, 1982; Nieto-Sampedro et al., 1982), the presence of perforated synapses is indeed a good sign of synapse division/formation, which in turn might be coupled to synaptic plasticity (Geinisman et al, 1993), increased synaptic activity (Vrensen and Cardozo, 1981), LTP (Geinisman et al, 1991, Harris et al, 2003), pathological axonal sprouting (Frotscher et al, 2006), etc. I think it is worth mentioning this at least in the Discussion.

      We agree with the reviewer and added a small paragraph in the Results section about the two types of AZs in L1 of the human TLN. We pointed out that both types are present, macular non-perforated and perforated AZs, but that the majority in all layers were of the non-perforated type. In the Discussion we added some papers pointing out the role of perforated synapses.

      (4) Question about the vesicle pools

      Results, Line 271: It is still not clear why the RRP was defined as ≤10 nm and ≤20 nm. Why did you use two categories? One would be sufficient (for example ≤20 nm). Or were the vesicles between 10 and 20 nm considered to be part of the RRP? In this case there is a typo; it should be ≥10 nm and ≤20 nm.

      The Authors' answer to my question was: “We decided that also those very close within 10 and 20 nm away from the PreAZ, which is less than a SV diameter may also contribute to the RRP since it was shown that SVs are quite mobile.”

      This does not clarify why you used two categories. Furthermore, I (like Referee #2) did not receive an answer to my question of how you could have 3x as many docked vesicles as vesicles ≤10 nm. The category ≤10 nm should also contain the docked vesicles. If this is not the case, please clarify what your categories were.

      We thank the reviewer for pointing out that mentioning two distance criteria (p10 and p20) to define one physiological entity (RRP) is somewhat confusing and we acknowledge that the initial response to the reviewers falls short of explaining this choice. This is indeed only understandable in the context of the original paper by Sätzler et al. 2002, where these criteria were first introduced. We therefore referenced this publication more prominently in the paragraph in question.

      To explain this, we would first like to clarify the definition of the two RRP classification criteria used (p10 and p20), which has caused some confusion amongst the reviewers as to which vesicles were included:

      - p10 criterion: p ≤ 10 nm (SVs have a minimum distance less than or equal to 10 nm from the PreAZ), including ‘docked’ vesicles, which have a distance of zero or less (p0)

      - p20 criterion: p ≤ 20 nm (SVs have a minimum distance less than or equal to 20 nm from the PreAZ), including vesicles of the p10 criterion.

      As mentioned, these criteria were first introduced in Sätzler et al. 2002, which examined the Calyx of Held synapse. In that paper, we tried to establish a morphological correlate to existing physiological measurements, which included the RRP. As there is no known marker that would allow vesicles contributing to the RRP to be identified anatomically, we looked at existing physiological experiments such as Schneggenburger et al. 1999; Wu and Borst 1999; Sun and Wu 2001 and compared their total numbers to our measurements. As the number of docked vesicles (p0, see above) was on the lower side of these physiological estimates, we also looked at vesicles close to the AZ, which we think could be recruited within a short time (≤ 10 msec). Comparing with existing literature, we found that at p20 we get pool sizes comparable to midrange estimates of reported RRP sizes. In order to account for the variability of the observed physiological pool sizes, we reported all three measurements (p0, p10, p20) not only for the original Calyx of Held, but in all subsequent studies of different CNS synapses by our group since then.
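
      The nesting of the three distance criteria described above (p0 within p10 within p20) can be illustrated with a minimal Python sketch. The function name, the input format and the example distances are hypothetical and serve only as an illustration; they are not taken from the manuscript or from Sätzler et al. 2002.

```python
from typing import Dict, Iterable


def count_pool_criteria(distances_nm: Iterable[float]) -> Dict[str, int]:
    """Count vesicles under the nested p0, p10 and p20 distance criteria.

    distances_nm: minimum distance of each synaptic vesicle from the
    presynaptic active zone (PreAZ), in nanometres (hypothetical input).
    """
    distances = list(distances_nm)
    return {
        # 'docked' vesicles: distance of zero or less
        "p0": sum(d <= 0 for d in distances),
        # p10 includes the docked vesicles
        "p10": sum(d <= 10 for d in distances),
        # p20 includes every vesicle counted under p10
        "p20": sum(d <= 20 for d in distances),
    }


# Made-up distances for a single bouton, for illustration only
example = [0.0, 4.2, 9.9, 10.0, 15.5, 20.0, 35.0, 120.0]
counts = count_pool_criteria(example)
print(counts)
```

      Because each criterion contains the previous one, the counts can only stay equal or grow from p0 to p10 to p20 when all three are computed on the same set of boutons with the same method.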

      As it remains uncertain whether such a correlate indeed exists, we followed the suggestion to rephrase RRP and RP as putative RRP and putative RP (see also Rollenhagen et al. 2007). We thank both reviewers for pointing out this omission.

      Concerning the difference between ‘docked’ vesicles and vesicles within the p10 perimeter criterion: first of all, the reviewer is right in saying that the category p10 (≤10 nm) should also contain the docked vesicles (see above). The finding of 3x as many ‘docked’ vesicles in our TEM tomography as in the p10 distance analysis could be partly explained, on the one hand, by a very high variability between patients (as expressed by the high SD, Table 1) and, on the other hand, by a high intraindividual synaptic bouton variability. In both sublayers, there is a huge difference in the number of vesicles within the p10 criterion of individual synaptic boutons, ranging from 0 to ~40 with a mean value of ~1 to ~4 (calculated per patient), the upper level being close to the values calculated with TEM tomography for the ‘docked’ vesicles.

      (5) Astrocytic coverage

      In Fig. 6, data are presented on the astrocytic coverage derived from L1 and L4. In my previous review I asked for this to be included in the text of the Results as well, but I still do not see it. The Results also do not state how many samples from which layer were investigated in this analysis. Only percentages are given, and only for L1 (but how many patients, and whether L1a and/or L1b and/or L4, is not provided). In contrast, Figure 6 and Supplementary Table 2 (patient table) contain the information that this analysis was made in L4 as well. Please include this information in the text as well (around lines 348-360).

      In our previous revised version, we had included the values shown in Fig. 6 for both L1 and L4 in the Results section (L4: lines 352 – 355: ‘The findings in L1…’). However, we agree with the reviewer and have now also added the number of patients and synapses investigated (now lines 359 – 365).

      About how to determine glial elements. I cannot agree with the Authors that glial elements can be determined with high certainty based only on the anatomical features of the profiles seen in the EM. “With 25 years of experience in (serial) EM work” I would say that glial elements can be very similar to spine necks and axonal profiles.

      All in all, if similar methods were used to determine the glial coverage in the different layers of the human neocortex, then the results can be compared (I guess this is the case). However, I would state in the text that proper determination would need immunostaining and a new analysis. The current approach only gives an estimate, with the possibility of a certain degree of error.

      We do not entirely agree with the reviewer on this point. As stated in the text, there are structural criteria to identify astrocytic elements (see citations quoted). These gold-standard criteria are also commonly used by other well-known groups (DeFelipe and co-workers, Francisco Clasca and co-workers, the late Michael Frotscher and co-workers, etc.). However, in a past paper about astrocytic coverage of synaptic complexes in L5 of the human TLN, immunohistochemistry against glutamine synthetase, a key enzyme in astrocytes, was carried out to describe the coverage. This experiment supports our findings in the other cortical layers of the human TLN. As the reviewer might know, immunohistochemistry always leads to a reduction in ultrastructural preservation, so we decided not to use immunohistochemistry for the subsequent publications on the other cortical layers. We added a short note on this in the Material and Methods section.

      (6) Large interindividual differences in the synapse density should be discussed in the Discussion.

      As suggested by the reviewer, we have included a sentence in the Discussion stating that interindividual differences can be related to differences in age, gender and the use of different methodology, as suggested by DeFelipe and co-workers (1999).

      Reviewer #2 (Public review):

      Summary:

      The study of Rollenhagen et al examines the ultrastructural features of Layer 1 of human temporal cortex. The tissue was derived from drug-resistant epileptic patients undergoing surgery, was selected as being distant from the epileptic focus, and as such was considered to be non-epileptic. The analysis included 4 patients of different age, sex, medication and onset of epilepsy. The MS is a follow-on study to 3 previous publications from the same authors on different layers of the temporal cortex:

      Layer 4 - Yakoubi et al 2019 eLife

      Layer 5 - Yakoubi et al 2019 Cerebral Cortex,

      Layer 6 - Schmuhl-Giesen et al 2022 Cerebral Cortex

      They find that the L1 synaptic boutons mainly have a single active zone and a very large pool of synaptic vesicles, and are mostly devoid of astrocytic coverage.

      Strengths:

      The MS is well written and easy to read. The Results section gives a detailed set of figures showing many morphological parameters of synaptic boutons and surrounding glial elements. The authors provide comparative data of all the layers examined by them so far in the Discussion. Given that anatomical data in human brain are still very limited, the current MS has substantial relevance.

      The work appears to be generally well done, and the EM and EM tomography images are of very good quality. The analysis is clear and precise.

      Weaknesses:

      The authors made all the corrections required, answered most of my concerns, included additional data sets, and clarified statements where needed.

      My remaining points are:

      Synaptic vesicle diameter (which has been established to be ~40 nm independent of species) can properly be measured with EM tomography only, as it provides the possibility to find the largest diameter of every given vesicle. Measuring it in 50 nm thick sections results in underestimation (as here, where the values are ~25 nm), because the measured diameter will be smaller than the true diameter if the vesicle is not cut in the middle (which is the least probable scenario). The authors have the EM tomography data set for measuring the vesicle diameter properly.

      We thank the reviewer for the helpful comments. We followed the recommendation to measure the vesicle diameter using our TEM tomography tilt series, but came to similar results concerning this synaptic parameter. As stated in our Material and Methods section, we only counted (measured) clear ring-like structures according to a paper by Abercrombie (1963). Since our results are similar for both methods, we do believe that our measurements are correct. Even random single measurements on the original 3D tilt-series yielded comparable results (Lübke and co-workers, personal observation). Furthermore, our results are within the ranges, although with high variability, also described by other groups (see discussion lines 436 - 449). We therefore hope that the reviewer will now accept our measurements.
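
      The geometric point raised by the reviewer, that a sphere not cut through its centre shows a profile smaller than its true diameter, can be made concrete with a short Python sketch. It rests on strong simplifying assumptions that are not taken from the manuscript: the cut is an infinitely thin plane, and its offset from the sphere centre is uniformly distributed between 0 and the radius. Under these assumptions the mean profile diameter is π/4 times the true diameter. The function name and the 40 nm input are illustrative only.

```python
import math


def mean_profile_diameter(true_diameter_nm: float, steps: int = 100_000) -> float:
    """Mean diameter of circular profiles from random plane cuts of a sphere.

    Simplifying assumptions (not from the manuscript): the cut is an
    infinitely thin plane, and its offset from the sphere centre is
    uniformly distributed between 0 and the radius.
    """
    radius = true_diameter_nm / 2.0
    total = 0.0
    for i in range(steps):
        # Midpoint rule over the offset h in [0, radius]
        h = (i + 0.5) * radius / steps
        # A plane at offset h cuts a circle of diameter 2 * sqrt(r^2 - h^2)
        total += 2.0 * math.sqrt(radius * radius - h * h)
    return total / steps


mean_d = mean_profile_diameter(40.0)
print(round(mean_d, 1))  # close to pi * 40 / 4, i.e. about 31.4
```

      This idealised model does not describe real 50 nm sections, in which a whole vesicle can lie inside the section and is imaged in projection, so it illustrates the direction of the possible bias only and does not decide between the two positions in the exchange above.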

      It is a bit misleading to call vesicle populations at certain arbitrary distances from the presynaptic active zone the readily releasable pool, recycling pool and resting pool, as these are functional categories and cannot directly be translated to vesicles at certain distances. It is even debated whether the morphologically docked vesicles are the ones that are readily releasable, as further molecular steps, such as proper priming, are also prerequisites for release.

      It would help to call these pools "putative" correlates of the morphological categories.

      We followed the suggestion by the reviewer and renamed our vesicle pools as putative RRP, putative RP and putative resting pools.

      Reviewer #3 (Public review):

      Summary:

      Rollenhagen et al. offer a detailed description of layer 1 of the human neocortex. They use electron microscopy to assess the morphological parameters of presynaptic terminals, active zones, vesicle density/distribution, mitochondrial morphology and astrocytic coverage. The data were collected from tissue from four patients undergoing epilepsy surgery. As the epileptic focus was localized to the hippocampus in all patients, the tissue examined in this manuscript is considered non-epileptic (access) tissue.

      Strengths:

      The quality of the electron microscopic images is very high, and the data is analyzed carefully. Data from human tissue is always precious and the authors here provide a detailed analysis using adequate approaches, and the data is clearly presented.

      Weaknesses:

      The text connects functional and morphological characteristics in a very direct way. For example, connecting plasticity to any measurement the authors present would be rather difficult without any additional functional experiments. References to various vesicle pools based on the location of the vesicles is also more complex than it is suggested in the manuscript. The text should better reflect the limitations of the conclusions that can be drawn from the authors' data.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Astrocytic coverage

      In Fig. 6, data are presented on the astrocytic coverage derived from L1 and L4. In my previous review I asked for this to be included in the text of the Results as well, but I still do not see it. The Results also do not state how many samples from which layer were investigated in this analysis. Only percentages are given, and only for L1 (but how many patients, and whether L1a and/or L1b and/or L4, is not provided). In contrast, Figure 6 and Supplementary Table 2 (patient table) contain the information that this analysis was made in L4 as well. Please include this information in the text as well (around lines 348-360).

      See above.

      About how to determine glial elements. I cannot agree with the Authors that glial elements can be determined with high certainty based only on the anatomical features of the profiles seen in the EM. “With 25 years of experience in (serial) EM work” I would say that glial elements can be very similar to spine necks and axonal profiles. Please see the photos below: out of the 16 circled profiles (2nd picture, very similar to each other), only 3 belong to an astroglial cell (last picture, purple profiles-purple cell), 10 are spines/spine necks/small caliber dendrites of pyramidal cells, and 3 are axonal profiles (last but one picture, blue profiles, marked with arrows on the right side). If you follow in your serial sections those elements which you think are glial processes, and they are indeed attached to a confidently identifiable glial cell, I agree that it is a glial process. But identifying small, almost empty profiles as glial processes from one single EM section, without any specific staining, is very uncertain. Please check the database of the Allen Institute made from the V1 visual cortex of a mouse. It is a large series of EM sections in which they reconstructed thousands of neurons, astroglial and microglial cells. It is possible to double click on a profile in the EM picture and it will show the cell to which that profile belongs. https://portal.brain-map.org/connectivity/ultrastructural-connectomics Pictures included here: https://elife-rp.msubmit.net/eliferp_files/2024/11/25/00132644/02/132644_2_attach_21_29456_convrt.pdf

      All in all, if similar methods were used to determine the glial coverage in the different layers of the human neocortex, then they can be compared (I guess this is the case). However, I would say in the text that proper determination would need immunostaining and a new analysis. This only gives an estimation with the possibility of a certain degree of error.

      As stated above, we carried out glutamine synthetase immunohistochemistry in L5 of the human TLN and came to the same results. However, we added a sentence on this in the chapter on astrocytic coverage in the Material and Methods section. Additionally, we modified this chapter according to the reviewer’s suggestion.

      Minor comments

      Introduction: Last sentence is not understandable (lines 101-103), please rephrase. (contribute to understand or contribute in understanding or contribute to the understanding of..., but definitely not contribute to understanding). The authors should check and review extensively for improvements to the use of English, or use a program such as Grammarly.

      Results: Grammar (line 107): L1 in the adult mammalian neocortex represents a relatively...

      Line 173: “Some SBs in both sublaminae were seen to establish either two or three SBs on the same spine, spines 173 of other origin or dendritic shafts." - Some SBs established two or three SBs? I would write Some SBs established two or three synapses on...

      Line 243: “The synaptic cleft size were slightly, but non-significantly different"

      Line 260: “DCVs play an important role in endo- and exocytosis, the build-up of PreAZs by releasing Piccolo and Bassoon (Schoch and Gundelfinger 2006; Murkherjee et al. 2010)," - please, correct this.

      We have made the corrections suggested by the reviewer.

      Line 374: No point at the end of the last phrase.

      Discussion:

      Lines 400-404: “The majority of SBs in L1 of the human TLN had a single at most three AZs that could be of the non perforated macular or perforated type comparable with results for other layers in the human TLN but by ~1.5-fold larger than in rodent and non-human primates." - What is comparable with the other layers, but different from animals? Please rephrase this sentence, it is not understandable. I already mentioned this sentence in my previous review, but nothing happened.

      Lines 435-437: “Remarkably, the total pool sizes in the human TLN were significantly larger by more than 6-fold (~550 SVs/AZ), and ~4.7-fold (~750 SVs/AZ;) than those in L4 and L5 (Yakoubi et al. 2019a, b; see also Rollenhagen et al. 2018) in rats." Please rethink what you wished to say and compare to the sentence meaning. I think you wanted to compare human TLN L1 pool size to L4 and L5 in the human TLN (Yakoubi 2019a and b) and to rat (Rollenhagen 2018). Instead, you compared all layers of the human TLN to L4 and L5 in rats (with partly wrong references). Please rephrase this.

      Lines 483-484: “Astrocytes serve as both a physical barrier to glutamate diffusion and as mediate neurotransmitter uptake via transporters".

      This sentence is grammatically incorrect, please rephrase.

      We corrected the sentences as suggested by the reviewer.

      Methods:

      In the text, there are only 4 patients (lines 603-604), but in the supplementary table there are 9 patients (5 new included for L4 astrocytic coverage). Please, correct it in the text.

      Lines 608-609: “neocortical access tissue samples were resected to control the seizures for histological inspection by neuropathologists." - What is the meaning of this? Please, rephrase.

      We thank the reviewer for the comment and have added the 5 patients used for L4 to the Material and Methods section, as well as to the Results section.

      The reviewer is right, and we rephrased and corrected the sentence concerning the inspection by neuropathologists.

      Figures

      Figures 5B: The legend says “SB (sb) synapsing on a stubby spine (sp) with a prominent spine apparatus (framed area) and a thick dendritic segment (de) in L1b" - In my opinion this is not one synaptic bouton, but two. Clearly visible membranes separate them, close to the spine.

      Supplemental Table 2 (patient table). If there is no information about Hu_04 patient's epilepsy, please write N/A (= not available) instead of - (which means it does not exist).

      The reviewer is right, and we corrected the figure and the legend, as well as the table accordingly.

      Reviewer #2 (Recommendations for the authors):

      The authors addressed almost all of my concerns; only this one remained:

      If there is, however, relevant literature on "methods based on EM tomography" and "stereological methods to estimate both types of error" (over- and underestimates) that we are missing out on, we would appreciate the reviewer providing us with the corresponding references so that we can include such calculations in our paper.

      There is a very detailed new study on calculating correction for TEM 2D 3D, Rothman et al 2023 PLOS One. That addresses most of these issues.

      We thank the reviewer for drawing our attention to the publication by Rothman et al. 2023, which is a very detailed and comprehensive study looking at accurately estimating distributions of 3D size and densities of particles from 2D measurements using – amongst others – ET and TEM images as well as synaptic vesicles for validating their method. However, we do not see how this would be relevant to the reported mean diameters and their corresponding variances. And even if we had reported on vesicle size/diameter distributions (referred to as G(d) in Rothman et al. 2023), the authors themselves state that “… the results from our ET and TEM image analysis highlight the difficulty in computing a complete G(d) of MFT vesicles due to their small size…

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Weaknesses:

      It would be helpful for the authors to highlight why their technique (large scale analysis of one emm type) can yield more information than a typical GWAS analysis of invasive vs. non-invasive strains. Are SNPs easier to identify using a large-scale core genome? Is it more likely evolutionarily to find mutations in non-coding regions as opposed to the core genome and accessory genes, and this is what this technique allows? Did the analysis yield unexpected genes or new genes that had not been previously identified in other GWAS analyses? These points may need to be made more apparent in the results and deserve some thought in the discussion section.

      We thank the reviewer for pointing out the importance of this study. By focusing on bacteria within a single emm type, false positives caused by confounding lineage effects can be minimized, which contributes to greater accuracy of the pan-GWAS. We have added relevant text describing the strong points of our pan-GWAS approach to the Results and Discussion sections, as follows:

      “The present pan-GWAS of bacteria within a single emm type minimized lineage effects, thus reducing false positives.” (lines 204–205)

      “The present study focused on emm89 S. pyogenes, known to cause increasing rates of invasive infections worldwide, and also assessed differences between emm89 strains causing invasive and non-invasive infections. By focusing on bacteria within closed phylogenies, false positives caused by confounding lineage effects were minimized, thus contributing to a higher level of accuracy of the pan-GWAS.” (lines 420–424)

      In addition, we would like to comment more regarding the reviewer’s question, "Is it more likely evolutionarily to find mutations in non-coding regions as opposed to the core genome and accessory genes, and this is what this technique allows?". Mutations are generally considered to be more frequent in non-coding than coding regions. However, the actual mutation frequencies in both types of regions were not assessed in this study. Nevertheless, exploring non-coding regions using the k-mer method is of considerable importance, as variations significantly associated with infectious phenotypes may contribute to alterations in gene expression and other regulatory mechanisms.

      The Alpha-fold data does not demonstrate why the mutations the authors identified could contribute to the invasive phenotype. It would be helpful to show an overlay of the predicted structures containing the different SNPs to demonstrate the potential structural differences that can occur due to the SNP. This would make the data more convincing that the SNP has a potential impact on the function of the protein. Similarly, the authors discuss modification of the hydrophobicity of the side chain in the ferrichrome transporter (lines 317-318) due to a SNP, but this is not immediately obvious in the figure (Fig. 5).

      As the reviewer suggested, we have substituted Figure 5E in the previous version with a figure illustrating the molecular surface within proximity of the mutation. We speculated that the mutation may induce a small indentation on the surface, and thus attenuate the stability of the hydrophobic bond between FhuB and FhuD by invasion of solvent into the indentation. Additionally, the wild-type and mutated models are shown as separate images for better visibility, rather than as the overlay of predicted models suggested by the reviewer. Relevant text in the Results section and the legend of Figure 5E has been revised accordingly, as follows:

      “The mutation was predicted to induce formation of a small indentation on the molecular surface, thus increasing the surface area accessible to the solvent, and is considered to potentially affect the stability of the hydrophobic bond between FhuB and FhuD, and thus ferrichrome transport (Figure 5E).” (lines 360–363)

      “The 73rd valine in FhuB, shown in magenta, was substituted with alanine. The molecular surface is illustrated with a wireframe and that of the predicted indentation is shown with an arrowhead.” (lines 1162–1164)

      Reviewer #1 (Recommendations for the author):

      The figure legend for Fig. 3C needs to be explained so that it is similarly laid out as in Fig. 2C. Fig. 2C should indicate that the magenta color represents the invasive phenotype.

      Based on this helpful suggestion, more detailed information about the magenta color representing the invasive phenotype has been added to the legends of Fig. 2C and 3C, as follows:

      “Colored bars above indicate countries and phenotypes, and magenta bars represent invasive phenotypes. Using the Roary program, gene names starting with “Group_” were automatically assigned. Position indicates the location of each SNP/indel on the core gene alignment. The full results are shown in Table S6.” (lines 1116–1120)

      “Colored bars above indicate countries and phenotypes, and magenta bars represent invasive phenotypes. Using the Roary program, gene names starting with “Group_” were automatically assigned. The full results are shown in Table S8.” (lines 1130–1133)

      The wording and organization of results in the k-mer section started to get confusing around lines 270-271. It begins to be a list of results and would be better served by some interpretation or explanation of the significance (why it is important to find such mutations). For example, for mutations you find in non-coding regions, do you expect them to have a detrimental effects on gene expression/regulation?

      As the reviewer kindly suggested, we have added interpretation or explanation of the significance of Comp_6 and Comp_24 to the Results section. We analyzed the function of the non-coding region of Comp_6 by employing web-based in silico tools, including MLDSPP and BacPP, though no promoter sequences could be identified. Next, using BLAST, a search for known promoter sequences of S. pyogenes M1 strain SF370 of the CDBProm database was attempted, because the web-based in silico promoter prediction tools are not suitable for S. pyogenes. However, neither identical nor homologous sequences were detected. Thus, the significance of this region remains unknown. In Comp_24, group_141 was also identified in the COGs-based pan-GWAS as a non-invasiveness related gene. Furthermore, group_141 showed high levels of correlation with group_139 and group_467, encoding transposase and uncharacterized protein, respectively, which suggests that the presence of an MGE is associated with a non-invasive phenotype.

      Relevant text has been added to the Materials and Methods (lines 653–657) and Results (lines 308–311 and 314–319) sections, as follows:

      “Promoter sequences in intergenic regions were predicted using web-based tools, MLDSPP and BacPP[29,30]. Additionally, BLAST was employed to search the promoter sequences of S. pyogenes strain SF370 registered in the CDBProm database (https://aw.iimas.unam.mx/cdbprom/)[69]” (lines 653–657)

      “We speculated that this region is related to regulation of gene expression. However, no promoter sequences were identified by utilizing MLDSPP, BacPP, and BLAST, thus the significance of this region remains to be clarified[29,30].” (lines 308–311)

      “Furthermore, group_141 was also identified in the COGs-based pan-GWAS as a non-invasiveness-related gene along with group_139 and group_467, which encode transposase and uncharacterized protein, respectively (Table S8 and Figure S4). Taken together, the absence of an MGE containing group_141, and the presence of another MGE harboring group_142 and group_143 may result in an invasive phenotype.” (lines 314–319)

      Additionally, new references (#29, 30, and 69) concerning bacterial promoter prediction have been included in the revised version of the manuscript.

      Because there is no difference in intracellular free ferric ions in the fhuB mutant compared with the wild-type, the authors speculate that the upregulation of the fhuBCD operon can compensate for the loss of function of the fhuB gene, but there is insufficient data to support this claim.

      As the reviewer indicates, the data presented in the previous version were insufficient to support our speculation. Therefore, the following sentence has been deleted from the manuscript (previous version line 367):

      “Therefore, the upregulation of fhuBCD may compensate for the impaired function mediated by SNP T218C.”

      The authors mention that there was no direct association between invasiveness and acquisition of genes (lines 451-455), including antibiotic resistance genes from prophages and MGEs (lines 467-469). These data should be moved to the results section to focus the results on the correlation between invasiveness and mutation of existing DNA vs acquisition of new DNA.

      Accordingly, we have added relevant text to the Results section, as follows:

      “On the other hand, the present pan-GWAS found no genes encoding known virulence factors significantly associated with invasiveness, thus further analysis of the relationships of detected distribution patterns with prophages and MGEs was performed.” (lines 264–267)

      Minor spelling error at line 210 ("waws" instead of "was").

      As the reviewer kindly pointed out, the spelling has been corrected. (line 233)

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      Line 55: Does this rate apply to all types of infections?

      We appreciate this question from the reviewer. We checked which types of infection the mortality rate applies to and confirmed that it represents STSS only. The relevant text has therefore been revised, as follows:

      “However, even with proper treatment, the mortality rate of patients with STSS remains high, ranging from 23–81%[6].” (lines 72–73)

      Line 58: Could you explain the protein encoded by the emm gene and the role of the hypervariable region in pathogenesis?

      As requested, relevant text regarding the pathogenic role of the hypervariable region of M protein has been added, as follows:

      “S. pyogenes has been classified into at least 240 emm types based on a hypervariable region sequence of the emm gene, which encodes the M protein. This hypervariable region of the M protein is responsible for type-specific antigenicity and binds with high affinity to C4b-binding protein, a major fluid phase inhibitor of the classical and lectin pathways of the complement system that confers resistance to opsonophagocytosis[8].” (lines 76–81)

      Line 161: Figure 1C does not show the strain with the different pattern.

      The authors apologize for the lack of clarity. In Fig. 1C, the strain is shown by a pale pink color bar used to indicate the related clade. For clarity, an arrowhead pointing to the strain from outside of the tree has been added along with the following text in the legend:

      “Arrowhead indicates strain belonging to the novel clade.” (lines 1102–1103)

      Line 239: It could be interesting to examine the genes in the region between the mobile elements found in the global cohort, as the result profile was very different from the Japanese group, which revealed more specific genes. Consider adding this to the results section.

      Based on the reviewer’s insightful suggestion, we attempted to find regions between the mobile genetic element-related genes. However, contigs generated from short reads were not adequate to identify such a genome structure. Therefore, we calculated the pairwise correlation of the presence of significant COGs across the 666 strains to predict genes located on prophages and MGEs, and added the results to Figure S4. Eight clusters were detected as coexisting COG groups, seven of which comprised phage- or MGE-related genes. Furthermore, a cluster with antimicrobial-resistant genes was shown to be correlated with non-invasive infections. It is thus speculated that gain or loss of gene sets via phages and MGEs, rather than acquisition of virulence genes, may lead to changes in fitness to the environment and bacterial phenotypes. Relevant text has been added to the revised versions of the Results, Discussion, and Materials and Methods sections, as follows:

      “On the other hand, the present pan-GWAS found no genes encoding known virulence factors significantly associated with invasiveness, thus further analysis of the relationships of detected distribution patterns with prophages and MGEs was performed. For calculating the pairwise correlation of the presence of significant COGs in the 666 strains, the COGs were clustered into eight coexisting groups, seven of which contained phage- and/or MGEs-related genes (Figure S4). The largest group comprised 65 genes including phage proteins, while the second largest with 42 genes was found to be associated with non-invasive infections, and included group_2689, group_1833, and ermA1, encoding TetR/AcrR family transcriptional regulator, multidrug efflux system permease protein, and rRNA adenine N-6-methyltransferase, respectively.” (lines 264–273)

      “On the other hand, a cluster comprising 49 non-invasiveness-associated genes including antibiotic-resistance genes was identified. Furthermore, among the genes showing a significant correlation with the infectious phenotype, approximately 90% (152 of 169) were associated with non-invasiveness. One possible explanation is that significantly related genes reflect the process of not only gain of factors but also loss of those affecting fitness cost.” (lines 517–522)

      “The correlation of the presence of significant COGs was calculated and visualized using the R program.” (lines 643–644)
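
      As an illustration only, the pairwise presence correlation described above can be sketched as follows. This is not the authors' R code: the function names, the gene labels, and the toy presence/absence vectors are hypothetical, and the sketch assumes the correlation is a Pearson correlation on 0/1 vectors (the phi coefficient), which the source does not specify.

```python
from math import sqrt


def phi(a, b):
    """Pearson correlation of two equal-length 0/1 vectors (phi coefficient)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A gene present in every strain or in none has no defined correlation.
        return float("nan")
    return cov / sqrt(var_a * var_b)


def correlation_matrix(presence):
    """Map each pair of gene names to the correlation of their presence vectors."""
    names = list(presence)
    return {g: {h: phi(presence[g], presence[h]) for h in names} for g in names}


# Hypothetical toy data: rows are genes, columns are six strains (1 = present).
toy_presence = {
    "gene_a": [1, 1, 0, 0, 1, 0],
    "gene_b": [1, 1, 0, 0, 1, 0],  # always co-occurs with gene_a
    "gene_c": [0, 0, 1, 1, 0, 1],  # never co-occurs with gene_a
}
toy_corr = correlation_matrix(toy_presence)
```

      In this toy input, genes that always co-occur correlate at 1 and genes that never co-occur correlate at -1; clustering such a matrix is one way groups of coexisting genes could be identified.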

      Line 548: What cutoff values were used in Fastp?

      The default cutoff value for Fastp (Q>15) was used, and relevant text has been added to the Materials and Methods section in the revised version, as follows:

      “All collected sequences were subjected to quality checks using Fastp v.0.20.1, with a default cutoff value of Q>15[53].” (lines 600–601)
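
      For context, a Phred quality score Q corresponds to a base-call error probability of 10^(-Q/10). The short sketch below only illustrates that standard conversion for the cutoff quoted above; the function name is hypothetical and nothing here reproduces Fastp's internal filtering logic.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


# Error probability implied by the quoted cutoff of Q = 15 (roughly 3%).
error_at_q15 = phred_to_error_probability(15)
# For comparison, Q = 20 corresponds to a 1% error probability.
error_at_q20 = phred_to_error_probability(20)
```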

      Line 635: Were the transcriptome experiments performed in triplicate?

      We apologize for the confusion. The transcriptome experiment was performed only once with three samples for each condition. The notation “(n=3 for each condition)” has been added to the relevant text portion in the Materials and Methods section (line 696).

      Discussion section: I believe the authors should place more emphasis on the fact that FhuB is associated with non-invasiveness, to provide clearer context in the discussion.

      Based on this helpful suggestion, we have revised relevant text in the Discussion section, as follows:

      “Transcriptomic analysis findings suggested that the Japan-specific fhuB mutation associated with non-severe invasive infections contributes to the growth rate of S. pyogenes in human blood by adapting to the environment.” (lines 457–459)

      Also, “V73A” has been removed from the relevant text in the Discussion section to provide a clearer and more precise context, with the revised sentence as follows:

      “Two possible roles of the FhuB mutation in the pathogenesis of severe invasive infections are thus proposed.” (lines 470–471)

    1. Author response:

      The following is the authors’ response to the previous reviews

      We would like to respond to just one remaining concern from Reviewer 1 and Reviewer 2 regarding potential overfitting in Test Set 1, which involves combinations already present in the training set. DIPx’s (and TAIJI’s) performance in Test Set 1 is better than in Test Set 2, which involves combinations not present in the training set. Let’s consider two general points to highlight why the improved performance is not the result of overfitting.

      (1) Suppose we are testing the effect of one drug D; the training may involve, for example, selecting an optimal dose. A validated effect of D in an independent test set is not an overfit, even though we are using the same drug in the training and the test set. Testing one drug is an extreme case, but the same idea holds for any number of drugs. What matters is the independence of the test set.

      (2) A prediction model P1 will legitimately perform better than model P2, if P1 uses better or more informative features than P2. The features could be those used directly in the model, but they could also be other observable characteristics not directly used in the model, such as optimal subregions of the feature space. DIPx or TAIJI results indicate that the identity of previously trained combinations is one such informative feature. The set of previously trained combinations corresponds to a subregion of the feature space. DIPx’s prediction performance for known combinations would be expected to follow the results from Test Set 1; we cannot expect that if there is an overfitting issue. Finally, we note that Test Set 1 was established and used in the AstraZeneca Dream Challenge for rigorously testing the prediction of known combinations.

    1. Author response:

      We appreciate the constructive and thoughtful reviews provided by the reviewers and editorial team. We thank you for the opportunity to submit a provisional response and are grateful for the detailed and critical feedback that will strengthen our work. Below, we provide a summary of our planned revisions in response to the public reviews from Reviewer #1 and Reviewer #2.

      Reviewer #1 – Public Review Response Plan

      (1) Sample Overlap (MR Bias):

      We plan to substitute several non-overlapping GWAS data sources to validate the association between aneurysms and atherosclerosis, thereby eliminating bias and Type I errors caused by sample overlap.

      (2) Multivariable MR (MVMR):

      We will attempt to incorporate known confounding factors (e.g., hypertension, smoking, diabetes) within the multivariable MR framework to verify the robustness of our results.

      (3) Clarifications and Presentation:

      - We will correct eTable citations.

      - Distinguish correctly between "incidence" and "prevalence".

      - Reorganize results to consistently present primary analyses first (IVW), followed by sensitivity results.

      - Expand the methods section to fully reflect all analyses.

      Reviewer #2 – Public Review Response Plan

      (1) Justification of MMP Selection:

      We will provide a detailed rationale for the inclusion of the 12 MMPs, based on prior literature and biological relevance.

      (2) Multiple Testing Clarification:

      We will clarify the Bonferroni correction strategy, explicitly accounting for all tests (e.g., 72 comparisons × multiple MR methods).

      (3) Instrument Selection Threshold:

      - We agree with the reviewer and will revise the SNP selection strategy, starting from p < 5×10⁻⁸ and only relaxing thresholds when fewer than 3 instruments are found.

      - Clarify the reasons why we do not use LD proxies.

      (4) Pleiotropy and Heterogeneity Tests:

      - We will add Egger's intercept results alongside MR-PRESSO.

      - Specify the R packages used (e.g., TwoSampleMR).

      - To prevent cluttered data presentation, we have included both heterogeneity and pleiotropy p-values in the supplementary tables.

      - Supplement forest plots showing outlier exclusion effects.

      (5) Clarifications in Figures and Tables:

      - Fix the duplicated “simple mode” entry in Figure 2.

      - Correct inconsistencies in p-values between figures and text.

      - Improve figure legends (e.g., color bar labels, panel identifiers).

      - Revise Table 4 title for clarity.

      - Remove the term "causal" where associations are nominal (e.g., p ~ 0.05).
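
      The multiple-testing and instrument-selection plans listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the planned analysis: the family-wise alpha of 0.05, the fallback threshold of 5e-6, the SNP identifiers, and all function names are hypothetical. Only the 72 comparisons, the starting threshold of 5×10⁻⁸, and the minimum of 3 instruments come from the response. Because the number of MR methods is not stated, the example corrects for the 72 comparisons alone.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def select_instruments(snp_pvalues, thresholds=(5e-8, 5e-6), min_instruments=3):
    """Try thresholds from strictest to most relaxed until enough SNPs pass.

    Returns the threshold used and the SNPs passing it. If no threshold
    yields enough instruments, the result for the most relaxed one is returned.
    """
    if not thresholds:
        raise ValueError("at least one threshold is required")
    selected = []
    for threshold in thresholds:
        selected = [snp for snp, p in snp_pvalues.items() if p < threshold]
        if len(selected) >= min_instruments:
            break
    return threshold, selected


# 72 comparisons at an assumed family-wise alpha of 0.05.
per_test_alpha = bonferroni_threshold(0.05, 72)

# Hypothetical exposure GWAS p-values keyed by made-up SNP identifiers.
toy_pvalues = {"rs_a": 1e-9, "rs_b": 3e-8, "rs_c": 2e-7, "rs_d": 4e-6, "rs_e": 0.01}
used_threshold, instruments = select_instruments(toy_pvalues)
```

      In this toy input only two SNPs pass the genome-wide threshold, so the sketch falls back to the assumed relaxed threshold and keeps four instruments.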

    1. Author response:

      We would like to express our sincere gratitude to the editor and reviewers for their thoughtful comments and suggestions on our manuscript. Below is our interim response to the reviewers’ public review:

      Reviewer 1:

      (1) We appreciate the reviewer’s insightful comment on the consideration of RAS mutation type and lesion metastasis site in our study. We will undertake a more comprehensive review of the literature and conduct a detailed analysis to assess how these factors influence treatment efficacy in our cohort.

      (2) Regarding the radiotherapy planning process, we will provide further clarification in the revised manuscript. Specifically, we select the target lesion using CT imaging and delineate it by marking the 50% isodose line to define the planning target volume (PTV). In assessing treatment efficacy, we differentiate between target lesions (within the PTV) and off-target lesions (outside the PTV). We will update the figures to include the isodose line display for better clarity.

      (3 & 4) We acknowledge the limitations of our study, particularly with respect to the sample size, which may hinder the statistical power required for a comprehensive analysis of treatment effect markers and subgroup variations. Nonetheless, we will continue to refine our analyses in the revised manuscript to provide additional insights and strengthen the conclusions where possible.

      (5) During the early stages of our research, our team conducted a series of investigations into the impact of tumor fibrosis and angiogenesis on treatment outcomes. We have accumulated a substantial body of data, and we will summarize these findings in the revised manuscript to provide further context and support for our current study.

      Reviewer 2:

      (1, 4 & 5) We greatly appreciate the reviewer’s careful reading of the manuscript. We will revise the abstract, methods, and results sections to improve clarity and precision. Additionally, we will refine the overall wording of the manuscript to enhance its scientific rigor and professionalism.

      (2) We also appreciate the reviewer’s suggestions regarding the methods and results. These will be incorporated into the revised manuscript, with additional detail in the methods section to clarify our experimental approach and strengthen the discussion of our findings.

      (3) This is an intriguing point raised by the reviewer. We agree that the upregulation of PD-L1 expression following SBRT treatment could potentially enhance the efficacy of subsequent immunotherapy. To explore this further, we will conduct a detailed literature review and provide a more in-depth analysis of our data to elucidate the underlying mechanisms.

      We trust that the clarifications provided above partially address the reviewers' concerns. We are committed to fully resolving the raised issues through more comprehensive revisions in the subsequent manuscript update.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Ning et al. reported that Bcas2 played an indispensable role in zebrafish primitive hematopoiesis by sequestering β-catenin in the nucleus. The authors showed that loss of Bcas2 caused primitive hematopoietic defects in zebrafish. They found that Bcas2 deficiency promoted β-catenin nuclear export in a CRM1-dependent manner in vivo and in vitro. They further validated that BCAS2 directly interacted with β-catenin in the nucleus and enhanced β-catenin accumulation through its CC domains. They unveil a novel insight into Bcas2, which is critical for zebrafish primitive hematopoiesis by regulating nuclear β-catenin stabilization rather than through its canonical pre-mRNA splicing functions. Overall, the study is impressive and well-performed, although there are also some issues to address.

      Strengths:

      The study unveils a novel function of Bcas2, which is critical for zebrafish primitive hematopoiesis by sequestering β-catenin. The authors validated the results in vivo and in vitro. Most of the figures are clear and convincing. This study nicely complements the function of Bcas2 in primitive hematopoiesis.

      Weaknesses:

      A portion of the figures were over-exposed.

      Thank you for taking the time to review our manuscript. We agree with your suggestion, and the exposure of Figure 5C and Figure 7E has been reduced. We hope that the revisions meet your expectations.

      Reviewer #2 (Public Review):

      Summary:

      Ning and colleagues present studies supporting a role for breast carcinoma amplified sequence 2 (Bcas2) in positively regulating primitive wave hematopoiesis through amplification of beta-catenin-dependent (canonical) Wnt signaling. The authors present compelling evidence that zebrafish bcas2 is expressed at the right time and place to be involved in primitive hematopoiesis, that there are primitive hematopoietic defects in hetero- and homozygous mutant and knockdown embryos, that Bcas2 mechanistically positively regulates canonical Wnt signaling, and that Bcas2 is required for nuclear retention of B-cat through physical interaction involving armadillo repeats 9-12 of B-cat and the coiled-coil domains of Bcas2. Overall, the data and writing are clean, clear, and compelling. This study is a first-rate analysis of a strong phenotype with highly supportive mechanistic data. The findings shed light on the controversial question of whether, when, and how canonical Wnt signaling may be involved in hematopoietic development. We detail some minor concerns and questions below, which if answered, we believe would strengthen the overall story and resolve some puzzling features of the phenotype. Notwithstanding these minor concerns, we believe this is an exceptionally well-executed and interesting manuscript that is likely suitable for publication with minor additional experimental detail and commentary.

      Strengths:

      (1) The study features clear and compelling phenotypes and results.

      (2) The manuscript narrative exposition and writing are clear and compelling.

      (3) The authors have attended to important technical nuances sometimes overlooked, for example, focusing on different pools of cytosolic or nuclear b-catenin.

      (4) The study sheds light on a controversial subject: regulation of hematopoietic development by canonical Wnt signaling and presents clear evidence of a role.

      (5) The authors present evidence of phylogenetic conservation of the pathway.

      Weaknesses:

      (1) The authors present compelling data that Bcas2 regulates nuclear retention of B-cat through physical association involving binding between the Bcas2 CC domains and B-cat arm repeats 9-12. Transcriptional activation of Wnt target genes by B-cat requires physical association between B-cat and Tcf/Lef family DNA binding factors involving key interactions in Arm repeats 2-9 (Graham et al., Cell 2000). Mutually exclusive binding by B-cat regulatory factors, such as ICAT that prevent Tcf-binding is a documented mechanism (e.g. Graham et al., Mol Cell 2002). It would appear - based on the arm repeat usage by Bcas2 (repeats 9-12)-that Bcas2 and Tcf binding might not be mutually exclusive, which would support their model that Bcas2 physical association with B-cat to retain it in the nucleus would be compatible with co-activation of genes by allowing association with Tcf. It might be nice to attempt a three-way co-IP of these factors showing that B-cat can still bind Tcf in the presence of Bcas2, or at least speculate on the plausibility of the three-way interaction.

      We appreciate your assessment and generous comments for the manuscript. As you mentioned, the binding sites for TCF on β-catenin almost do not overlap with those for BCAS2. It is likely that BCAS2-mediated nuclear sequestration of β-catenin would be compatible with the initiation of gene transcription by allowing TCF to associate with β-catenin. To test this possibility, we have taken your suggestion and performed co-IP assays. The results showed that β-catenin still bound with TCF4 in the presence of BCAS2 (Supplemental Figure 12), confirming that the binding of BCAS2 to β-catenin would not interfere with the formation of β-catenin/TCF complex.

      (2) A major way that canonical Wnt signaling regulates hematopoietic development is through regulation of the LPM hematopoietic competence territories by activating expression of cdx1a, cdx4, and their downstream targets hoxb5a and hoxa9a (Davidson et al., Nature 2003; Davidson et al., Dev Biol 2006; Pilon et al., Dev Biol 2006; Wang et al., PNAS 2008). Could the authors assess (in situ) the expression of cdx1a, cdx4, hoxb5a, and hoxa9a in the bcas2 mutants?

      We agree with your suggestion and have examined the expression of cdx4 and hoxa9a by performing WISH. Diminished expression of cdx4 and hoxa9a was detected in the lateral plate mesoderm of bcas2<sup>+/-</sup> embryos at the 6-somite stage (Supplemental Figure 7).

      (3) The authors show compellingly that even heterozygous loss of bcas2 has strong Wnt-inhibitory effects. If Bcas2 is required for canonical Wnt signaling and bcas2 is expressed ubiquitously from the 1-cell stage through at least the beginning of gastrulation, why do bcas2 KO embryos not have morphological axis specification defects consistent with loss of early Wnt signaling, like loss of head (early), or brain anteriorization (later)? Could the authors provide some comments on this puzzle? Or if they do see any canonical Wnt signaling patterning defects in het- or homozygous embryos, could they describe and/or present them?

      You have raised an interesting question. In fact, we did not observe ventralization or axis determination defects in the early embryos of bcas2<sup>+/-</sup> mutants. Even in the very small number of homozygous mutant embryos, we did not find such morphological defects. Given that the homozygous and heterozygous mutant embryos were derived from crossing bcas2<sup>+/-</sup> males with bcas2<sup>+/-</sup> females, maternal Bcas2 might still remain and function in these embryos during gastrulation when axis determination and neural patterning took place. Accordingly, we have expanded our discussion to incorporate these insights (Line 565-572).

      Reviewer #3 (Public Review):

      Summary:

      This manuscript utilized zebrafish bcas2 mutants to study the role of bcas2 in primitive hematopoiesis and further confirms that it has a similar function in mice. Moreover, they showed that bcas2 regulates the transition of hematopoietic differentiation from angioblasts via activating Wnt signaling. By performing a series of biochemical experiments, they also showed that bcas2 accomplishes this by sequestering b-catenin within the nucleus, rather than through its known function in pre-mRNA splicing.

      Strengths:

      The work is well-performed, and the manuscript is well-written.

      Weaknesses:

      Several issues need to be clarified.

      (1) Is wnt signaling also required during hematopoietic differentiation from angioblasts? Can the authors test angioblast and endothelial markers in embryos with wnt inhibition? Also, can the authors add export inhibitor LMB to the mouse mutants to test if sequestering of b-catenin by bcas2 is conserved during primitive hematopoiesis in mice?

      Thank you very much for your appreciation and detailed assessment. To test whether Wnt signaling is also required during hematopoietic differentiation from angioblasts, wild-type embryos were exposed to 10 µM CCT036477, a small molecule β-catenin antagonist, from 9 hpf and then collected for WISH experiments. As shown in Supplemental Figure 8, the expression of hemangioblast markers npas4l, scl, and gata2 and endothelial marker fli1a remained unchanged, but the expression of erythroid progenitor marker gata1 was significantly reduced. These results suggest that canonical Wnt pathway may not be required for the generation of hemangioblasts or their endothelial differentiation, but is pivotal for their hematopoietic differentiation.

      It is quite difficult to validate the conserved role of BCAS2 during primitive hematopoiesis in mice, because the toxicity of LMB may cause severe adverse effects in mice.[1,2]

      (2) Bcas2 is required for primitive myelopoiesis in ALM. Does bcas2 play a similar function in primitive myelopoiesis, or is bcas2/b-catenin interaction more important for hematopoietic differentiation in PLM?

      You have raised an important question. In our study, we have demonstrated that the expression of myeloid progenitor marker pu.1 was significantly decreased in bcas2 mutants, hinting that Bcas2 is pivotal for primitive myelopoiesis. To further clarify the function of Bcas2 in primitive myelopoiesis, we injected 8 ng of bcas2 morpholino into Tg(coro1a:GFP) embryos at the 1-cell stage and examined β-catenin distribution at 17 hpf via immunostaining. We observed a significant decline of nuclear β-catenin in primitive myeloid cells (Supplemental Figure 9), indicating that Bcas2 is highly likely to play a similar role in sequestering β-catenin within the nucleus during primitive myelopoiesis.

      (3) Is it possible that CC1-2 fragment sequester b-catenin? The different phenotypes between this manuscript and the previous article (Yu, 2019) may be due to different mutations in bcas2. Is it possible that the bcas2 mutation in Yu's article produces a complete CC1-2 fragment, which might sequester b-catenin?

      This is an interesting perspective. To test the possibility that CC1-2 sequesters β-catenin, mRNA expressing the CC domains of BCAS2 has been co-injected with bcas2 morpholino into Tg(gata1:GFP) embryo at the one-cell stage. Increased nuclear β-catenin levels were detected in the GFP-positive hematopoietic progenitor cells at 16 hpf (Supplemental Figure 11). Our findings support that CC1-2 fragment of BCAS2 can sequester β-catenin within the nucleus.

      In the previous article (Yu, 2019), a 5-base deletion in the third exon of BCAS2 was generated by TALEN, so the CC domains of this mutant should be affected. It is therefore difficult to conclude that the mutant BCAS2 protein in Yu’s study still associates with β-catenin.

      (4) Can the author clarify what embryos the arrows point to in SI Figure 2D? In SI Figure 6B and B', can the author clarify how the nucleus and cytoplasm are bleached? In B, the nucleus also appears to be bleached.

      Thank you for your query and suggestion. In our revisions, the corresponding clarifications have been supplemented (Line 239-242; Line 978-979).

      We acknowledge that the nuclei in both the BCAS2 overexpression group and the control group were slightly bleached. However, because we performed real-time analysis of fluorescence recovery after photobleaching and observed a much slower recovery of cytoplasmic fluorescence in BCAS2-overexpressing cells, the conclusion that BCAS2 inhibits the nuclear export of β-catenin, but not its nuclear import, remains unchanged.

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      (1) In this study, the authors detected β-catenin distribution in erythrocytes (gata1-GFP+ cells). Estimating the β-catenin distribution in the myeloid cells is recommended.

      Thank you for your assessment; we have taken your suggestion. Tg(coro1a:GFP) embryos, which are commonly used to track both macrophages and neutrophils,[3] were injected with 8 ng of bcas2 morpholino at the 1-cell stage and collected for immunostaining to examine the β-catenin distribution at 17 hpf. We observed a significant decline of nuclear β-catenin in primitive myeloid cells (Supplemental Figure 9). This result indicates that Bcas2 is highly likely to play a similar role in sequestering β-catenin within the nucleus during primitive myelopoiesis.

      (2) The reduced nuclear localization of β-catenin in Figure 3H required further evidence. It would be helpful if the authors quantified the fluorescence intensity in the cell nucleus and cytoplasm. Meanwhile, the figures (Figure 5C, Figure 7E) were over-exposed. Please validate these figures.

      Thank you for your suggestions. We agree with you that the fluorescence intensity of β-catenin in the nucleus and cytoplasm should be quantified. However, as the nucleus comprises a large part of the cell, we believe it would be more appropriate to quantify the relative fluorescence intensity by dividing the fluorescence intensity of nuclear β-catenin by the fluorescence intensity of DAPI.

      Such quantifications have been added for Figure 3G, 5C, 7E, S9A, and S13A. In addition, we have reduced the exposure of Figure 5C and Figure 7E. We hope that you will be satisfied with the revisions.
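
      The normalization described above can be sketched as follows. This is only an illustrative sketch, not the authors' image-analysis pipeline: the function name `relative_nuclear_intensity`, the use of mean pixel intensity within a nuclear mask, and the example pixel values are all assumptions made for illustration.

```python
from statistics import mean


def relative_nuclear_intensity(bcat_nuclear, dapi_nuclear):
    """Return mean nuclear beta-catenin signal divided by mean DAPI signal.

    Both arguments are sequences of pixel intensities (arbitrary units)
    sampled inside the same nuclear mask of a single cell. Using the mean
    over the mask is an assumption; the source only states that nuclear
    beta-catenin intensity is divided by DAPI intensity.
    """
    dapi_mean = mean(dapi_nuclear)
    if dapi_mean <= 0:
        raise ValueError("DAPI signal must be positive to normalize against it")
    return mean(bcat_nuclear) / dapi_mean


# Hypothetical pixel values for one control and one mutant nucleus.
control_ratio = relative_nuclear_intensity([120, 130, 110], [200, 210, 190])
mutant_ratio = relative_nuclear_intensity([60, 66, 54], [200, 210, 190])
print(control_ratio, mutant_ratio)
```

      Dividing by the DAPI signal in the same mask gives a per-nucleus ratio, so differences in nuclear size or staining depth between cells affect the numerator and denominator together.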

      (3) The authors used cKO mice to validate that the erythrocytes were eliminated. It would be interesting to detect β-catenin distribution by immunofluorescent staining in primitive hematopoietic cells in cKO mice. Addressing this issue can provide further evidence to support the conservation of Bcas2.

      We appreciate your suggestion. However, we found that red blood cells were almost eliminated in the yolk sac of Bcas2<sup>F/F</sup>;Flk1-Cre mice at E12.5. It is difficult to further detect β-catenin distribution in primitive erythroid cells in these mice.

      (4) The authors discovered that Bcas2 mediated β-catenin nuclear export in a CRM1-dependent manner. CRM1 is a key regulator involved in the majority of factors of nuclear export via recognizing specific nuclear export signals (NES). Validating the NES of Bcas2 is recommended. Furthermore, I wonder about the relationship between Bcas2 and CRM1 in regulating β-catenin nuclear export. One possibility is that Bcas2 covers the NES to inhibit the interaction between CRM1 and β-catenin, thus leading to β-catenin accumulation in the cell nucleus. The authors should discuss this possibility accordingly.

      Thank you for providing an interesting perspective. CRM1-mediated nuclear export of β-catenin usually requires CRM1 to recognize and bind the NES sequences in chaperone proteins, such as APC, Axin and Chibby.[4-6] Moreover, CRM1 can bind directly to β-catenin and function as an efficient nuclear exporter for it.[7] Since BCAS2 has not been reported to contain any recognizable NES sequences, it will be interesting to investigate whether BCAS2 competitively inhibits β-catenin from associating with CRM1, or with the chaperone proteins. We have rewritten the discussion on CRM1-dependent nuclear export of β-catenin in line with your comments (Line 572-578).

      (5) It would be interesting if the authors could answer the specificity in Bcas2-mediated protein nuclear export pathway. The authors should detect other classical factors (CRM1 mediated) distribution when loss of Bcas2.

      Thank you for bringing up this point. To test whether BCAS2 specifically regulates CRM1-mediated nuclear export of β-catenin, we have investigated the nucleocytoplasmic distribution of other known CRM1 cargoes, such as ATG3 and CDC37L.[8] BCAS2 overexpression in HeLa cells slightly enhanced the nuclear localization of CDC37L, and had no significant impact on that of ATG3 (Supplemental Figure 11), indicating the specificity of BCAS2 in the regulation of CRM1-dependent nuclear export of β-catenin.

      Minor concerns:

      (1) The name "bcas2Δ7+/- and bcas2Δ14+/-" should be changed into "bcas2+/Δ7 and bcas2+/Δ14"(+/Δ7 or +/Δ14 should be superior on the right).

      Thank you for your suggestion. We have changed the names of the mutants throughout the manuscript.

      (2) The scale bar position in the figures should be unified.

      We agree with your suggestion and have unified the scale bar position in all figures.

      (3) In Figure 4E, "Nuclear" should be changed into "Nucleus".

      We apologize for the mistake and Figure 4E has been revised.

      (4) There are some unaesthetic issues in the figures. The figures need to be further edited. Figure 3H "β-catenin and Merge", Figure 4D "Merge". All these words should be centered in the figures.

      Thank you. We have edited all the figures to ensure that the text is centered.

      Reviewer #2 (Recommendations For The Authors):

      (1) It would be nice to have whole blot images for the Westerns in Supplementary Info.

      Thank you for your suggestion. Whole images for immunoblotting have been supplemented as Source data.

      (2) Line 292 change 5 hpf to 5 dpf.

      (3) Line 301 change "primary" to "primitive"?

      We apologize for the mistakes. We have incorporated these suggestions in the revised manuscript and reexamined spelling throughout the paper.

      (4) Figure S2C: is "Maker" a typographical error? Change to "ladder"?

      We apologize for this typographical error and we have revised it in Figure S2C.

      References

      (1) Ishizawa J, Kojima K, Hail N, Tabe Y, Andreeff M. Expression, function, and targeting of the nuclear exporter chromosome region maintenance 1 (CRM1) protein. Pharmacology & Therapeutics. 2015;153:25-35.

      (2) Li X, Feng Y, Yan MF, et al. Inhibition of Autism-Related Crm1 Disrupts Mitosis and Induces Apoptosis of the Cortical Neural Progenitors. Cerebral Cortex. 2020;30(7):3960-3976.

      (3) Li L, Yan B, Shi YQ, Zhang WQ, Wen ZL. Live Imaging Reveals Differing Roles of Macrophages and Neutrophils during Zebrafish Tail Fin Regeneration. Journal of Biological Chemistry. 2012;287(30):25353-25360.

      (4) Neufeld KL, Nix DA, Bogerd H, et al. Adenomatous polyposis coli protein contains two nuclear export signals and shuttles between the nucleus and cytoplasm. Proceedings of the National Academy of Sciences of the United States of America. 2000;97(22):12085-12090.

      (5) Li FQ, Mofunanya A, Harris K, Takemaru KI. Chibby cooperates with 14-3-3 to regulate β-catenin subcellular distribution and signaling activity. Journal of Cell Biology. 2008;181(7):1141-1154.

      (6) Cong F, Varmus H. Nuclear-cytoplasmic shuttling of Axin regulates subcellular localization of β-catenin. Proceedings of the National Academy of Sciences of the United States of America. 2004;101(9):2882-2887.

      (7) Ki H, Oh M, Chung SW, Kim K. β-Catenin can bind directly to CRM1 independently of adenomatous polyposis coli, which affects its nuclear localization and LEF-1/β-catenin-dependent gene expression. Cell Biology International. 2008;32(4):394-400.

      (8) Kirli K, Karaca S, Dehne HJ, et al. A deep proteomics perspective on CRM1-mediated nuclear export and nucleocytoplasmic partitioning. eLife. 2015;4.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Using a cross-modal sensory selection task in head-fixed mice, the authors attempted to characterize how different rules reconfigured representations of sensory stimuli and behavioral reports in sensory (S1, S2) and premotor cortical areas (medial motor cortex or MM, and ALM). They used silicon probe recordings during behavior, a combination of single-cell and population-level analyses of neural data, and optogenetic inhibition during the task.

      Strengths:

      A major strength of the manuscript was the clarity of the writing and motivation for experiments and analyses. The behavioral paradigm is somewhat simple but well-designed and well-controlled. The neural analyses were sophisticated, clearly presented, and generally supported the authors' interpretations. The statistics are clearly reported and easy to interpret. In general, my view is that the authors achieved their aims. They found that different rules affected preparatory activity in premotor areas, but not sensory areas, consistent with dynamical systems perspectives in the field that hold that initial conditions are important for determining trial-based dynamics.

      Weaknesses:

      The manuscript was generally strong. The main weakness in my view was in interpreting the optogenetic results. While the simplicity of the task was helpful for analyzing the neural data, I think it limited the informativeness of the perturbation experiments. The behavioral read-out was low dimensional -a change in hit rate or false alarm rate- but it was unclear what perceptual or cognitive process was disrupted that led to changes in these read-outs. This is a challenge for the field, and not just this paper, but was the main weakness in my view. I have some minor technical comments in the recommendations for authors that might address other minor weaknesses.

      I think this is a well-performed, well-written, and interesting study that shows differences in rule representations in sensory and premotor areas and finds that rules reconfigure preparatory activity in the motor cortex to support flexible behavior.

      Reviewer #2 (Public Review):

      Summary:

      Chang et al. investigate neuronal activity firing patterns across various cortical regions in an interesting context-dependent tactile vs visual detection task, developed previously by the authors (Chevee et al., 2021; doi: 10.1016/j.neuron.2021.11.013). The authors report the important involvement of a medial frontal cortical region (MM, probably a similar location to wM2 as described in Esmaeili et al., 2021 & 2022; doi: 10.1016/j.neuron.2021.05.005; doi: 10.1371/journal.pbio.3001667) in mice for determining task rules.

      Strengths:

      The experiments appear to have been well carried out and the data well analysed. The manuscript clearly describes the motivation for the analyses and reaches clear and well-justified conclusions. I find the manuscript interesting and exciting!

      Weaknesses:

      I did not find any major weaknesses.

      Reviewer #3 (Public Review):

      This study examines context-dependent stimulus selection by recording neural activity from several sensory and motor cortical areas along a sensorimotor pathway, including S1, S2, MM, and ALM. Mice are trained to either withhold licking or perform directional licking in response to visual or tactile stimulus. Depending on the task rule, the mice have to respond to one stimulus modality while ignoring the other. Neural activity to the same tactile stimulus is modulated by task in all the areas recorded, with significant activity changes in a subset of neurons and population activity occupying distinct activity subspaces. Recordings further reveal a contextual signal in the pre-stimulus baseline activity that differentiates task context. This signal is correlated with subsequent task modulation of stimulus activity. Comparison across brain areas shows that this contextual signal is stronger in frontal cortical regions than in sensory regions. Analyses link this signal to behavior by showing that it tracks the behavioral performance switch during task rule transitions. Silencing activity in frontal cortical regions during the baseline period impairs behavioral performance.

      Overall, this is a superb study with solid results and thorough controls. The results are relevant for context-specific neural computation and provide a neural substrate that will surely inspire follow-up mechanistic investigations. We only have a couple of suggestions to help the authors further improve the paper.

      (1) We have a comment regarding the calculation of the choice CD in Fig S3. The text on page 7 concludes that "Choice coding dimensions change with task rule". However, the motor choice response is different across blocks, i.e. lick right vs. no lick for one task and lick left vs. no lick for the other task. Therefore, the differences in the choice CD may be simply due to the motor response being different across the tasks and not due to the task rule per se. The authors may consider adding this caveat in their interpretation. This should not affect their main conclusion.

      We thank the Reviewer for the suggestion. We have discussed this caveat and performed a new analysis to calculate the choice coding dimensions using right-lick and left-lick trials (Fig. S3h) on page 8. 

      “Choice coding dimensions were obtained from left-lick and no-lick trials in respond-to-touch blocks and right-lick and no-lick trials in respond-to-light blocks. Because the required lick directions differed between the block types, the difference in choice CDs across task rules (Fig. S4f) could have been affected by the different motor responses. To rule out this possibility, we did a new version of this analysis using right-lick and left-lick trials to calculate the choice coding dimensions for both task rules. We found that the orientation of the choice coding dimension in a respond-to-touch block was still not aligned well with that in a respond-to-light block (Fig. S4h;  magnitude of dot product between the respond-to-touch choice CD and the respond-to-light choice CD, mean ± 95% CI for true vs shuffled data: S1: 0.39 ± [0.23, 0.55] vs 0.2 ± [0.1, 0.31], 10 sessions; S2: 0.32 ± [0.18, 0.46] vs 0.2 ± [0.11, 0.3], 8 sessions; MM: 0.35 ± [0.21, 0.48] vs 0.18 ± [0.11, 0.26], 9 sessions; ALM: 0.28 ± [0.17, 0.39] vs 0.21 ± [0.12, 0.31], 13 sessions).”
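
      The alignment measure quoted above (magnitude of the dot product between two choice CDs) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: it assumes a coding dimension is the unit-normalized difference between trial-averaged population activity for two trial types, which this response does not spell out, and the helper names and toy data are hypothetical.

```python
import math


def coding_dimension(trials_a, trials_b):
    """Unit vector along the difference of trial-averaged activity.

    Each argument is a list of trials; each trial is a list with one firing
    rate per unit. Defining the CD as a normalized mean difference is an
    assumption made for this sketch.
    """
    n_units = len(trials_a[0])
    mean_a = [sum(t[i] for t in trials_a) / len(trials_a) for i in range(n_units)]
    mean_b = [sum(t[i] for t in trials_b) / len(trials_b) for i in range(n_units)]
    diff = [a - b for a, b in zip(mean_a, mean_b)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("trial types have identical mean activity")
    return [d / norm for d in diff]


def alignment(cd_1, cd_2):
    """Magnitude of the dot product between two unit coding dimensions."""
    return abs(sum(x * y for x, y in zip(cd_1, cd_2)))


# Toy two-unit population: the choice axis lies along unit 0 in one block
# type and along unit 1 in the other, so the two CDs are orthogonal.
cd_touch = coding_dimension([[2.0, 0.0], [4.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]])
cd_light = coding_dimension([[0.0, 5.0]], [[0.0, 1.0]])
print(alignment(cd_touch, cd_light))
```

      Values near 1 indicate that the two rules share a choice axis and values near 0 indicate orthogonal axes. The quoted analysis compares the observed value against shuffled data, which is not reproduced in this sketch.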

      We also have included the caveats for using right-lick and left-lick trials to calculate choice coding dimensions on page 13.

      “However, we also calculated choice coding dimensions using only right- and left-lick trials. In S1, S2, MM and ALM, the choice CDs calculated this way were also not aligned well across task rules (Fig. S4h), consistent with the results calculated from lick and no-lick trials (Fig. S4f). Data were limited for this analysis, however, because mice rarely licked to the unrewarded water port (# of licks<sub>unrewarded port</sub> / # of licks<sub>total</sub>, respond-to-touch: 0.13, respond-to-light: 0.11). These trials usually came from rule transitions (Fig. 5a) and, in some cases, were potentially caused by exploratory behaviors. These factors could affect choice CDs.”

      (2) We have a couple of questions about the effect size on single neurons vs. population dynamics. From Fig 1, about 20% of neurons in frontal cortical regions show task rule modulation in their stimulus activity. This seems like a small effect in terms of population dynamics. There is somewhat of a disconnect from Figs 4 and S3 (for stimulus CD), which show remarkably low subspace overlap in population activity across tasks. Can the authors help bridge this disconnect? Is this because the neurons showing a difference in Fig 1 are disproportionally stimulus selective neurons?

      We thank the Reviewer for the insightful comment and agree that it is important to link the single-unit and population results. We have addressed these questions by (1) improving our analysis of task modulation of single neurons  (tHit-tCR selectivity) and (2) examining the relationship between tHit-tCR selective neurons and tHit-tCR subspace overlaps.  

      Previously, we averaged the AUC values of time bins within the stimulus window (0-150 ms, 10-ms bins). If the 95% CI on this averaged AUC value did not include 0.5, the unit was considered to show significant selectivity. This approach was highly conservative and may have underestimated the percentage of units showing significant selectivity, particularly units showing transient selectivity. In the revised manuscript, we now define a unit as showing significant tHit-tCR selectivity when the AUC values of three consecutive time bins (>30 ms, 10-ms bins) were significant. Using this new criterion, the percentage of tHit-tCR selective neurons increased compared with the previous analysis. We have updated Figure 1h and the results on page 4:

      “We found that 18-33% of neurons in these cortical areas had area under the receiver-operating curve (AUC) values significantly different from 0.5, and therefore discriminated between tHit and tCR trials (Fig. 1h; S1: 28.8%, 177 neurons; S2: 17.9%, 162 neurons; MM: 32.9%, 140 neurons; ALM: 23.4%, 256 neurons; criterion to be considered significant: Bonferroni corrected 95% CI on AUC did not include 0.5 for at least 3 consecutive 10-ms time bins).”
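
      The consecutive-bin criterion quoted above can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function name `is_selective` and the example confidence intervals are hypothetical, the Bonferroni-corrected intervals are assumed to be computed elsewhere, and the sketch does not require all bins in a run to deviate from 0.5 in the same direction, since the response does not state whether they must.

```python
def is_selective(ci_bounds, chance=0.5, min_consecutive=3):
    """Return True if enough consecutive bins have a CI that excludes chance.

    ci_bounds is a list of (lower, upper) confidence bounds on the AUC, one
    pair per 10-ms time bin, already corrected for multiple comparisons.
    """
    run = 0
    for lower, upper in ci_bounds:
        if lower > chance or upper < chance:
            run += 1  # this bin is significant; extend the current run
            if run >= min_consecutive:
                return True
        else:
            run = 0  # a non-significant bin breaks the run
    return False


# Hypothetical unit with a transient response: three significant bins in a row.
transient = [(0.40, 0.60), (0.55, 0.70), (0.56, 0.70), (0.60, 0.80), (0.45, 0.62)]
# Hypothetical unit whose significant bins are never three in a row.
scattered = [(0.55, 0.70), (0.56, 0.70), (0.40, 0.60), (0.60, 0.80)]
print(is_selective(transient), is_selective(scattered))
```

      Requiring a run of significant bins, rather than averaging AUC over the whole stimulus window, lets brief selectivity count while still guarding against isolated chance crossings.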

      Next, we checked how tHit-tCR selective neurons were distributed across sessions. We found that the percentage of tHit-tCR selective neurons in each session varied (S1: 9-46%, S2: 0-36%, MM: 25-55%, ALM: 0-50%). We examined the relationship between the numbers of tHit-tCR selective neurons and tHit-tCR subspace overlaps. Sessions with more neurons showing task rule modulation tended to show lower subspace overlap, but this correlation was modest and only marginally significant (r = -0.32, p = 0.08, Pearson correlation, n = 31 sessions). While we report the percentage of neurons showing significant selectivity as a simple way to summarize single-neuron effects, this does neglect the magnitude of task rule modulation of individual neurons, which may also be relevant.

      In summary, the apparent disconnect between the effect sizes of task modulation of single neurons and of population dynamics could be explained by (1) the percentages of tHit-tCR selective neurons were underestimated in our old analysis, (2) tHit-tCR selective neurons were not uniformly distributed among sessions, and (3) the percentages of tHit-tCR selective neurons were weakly correlated with tHit-tCR subspace overlaps. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      For the analysis of choice coding dimensions, it seems that the authors are somewhat data limited in that they cannot compare lick-right/lick-left within a block. So instead, they compare lick/no lick trials. But given that the mice are unable to initiate trials, the interpretation of the no lick trials is a bit complicated. It is not clear that the no lick trials reflect a perceptual judgment about the stimulus (i.e., a choice), or that the mice are just zoning out and not paying attention. If it's the latter case, what the authors are calling choice coding is more of an attentional or task engagement signal, which may still be interesting, but has a somewhat different interpretation than a choice coding dimension. It might be worth clarifying this point somewhere, or if I'm totally off-base, then being more clear about why lick/no lick is more consistent with choice than task engagement.

      We thank the Reviewer for raising this point. We have added a new paragraph on page 13 to clarify why we used lick/no-lick trials to calculate choice coding dimensions, and we now discuss the caveat regarding task engagement.  

      “No-lick trials included misses, which could be caused by mice not being engaged in the task. While the majority of no-lick trials were correct rejections (respond-to-touch: 75%; respond-to-light: 76%), we treated no-licks as one of the available choices in our task and included them to calculate choice coding dimensions (Fig. S4c,d,f). To ensure stable and balanced task engagement across task rules, we removed the last 20 trials of each session and used stimulus parameters that achieved similar behavioral performance for both task rules (Fig. 1d; ~75% correct for both rules).”

      In addition, to address a point made by Reviewer 3 as well as this point, we performed a new analysis to calculate choice coding dimensions using right-lick vs left-lick trials. We report this new analysis on page 8:

      “Choice coding dimensions were obtained from left-lick and no-lick trials in respond-to-touch blocks and right-lick and no-lick trials in respond-to-light blocks. Because the required lick directions differed between the block types, the difference in choice CDs across task rules (Fig. S4f) could have been affected by the different motor responses. To rule out this possibility, we did a new version of this analysis using right-lick and left-lick trials to calculate the choice coding dimensions for both task rules. We found that the orientation of the choice coding dimension in a respond-to-touch block was still not aligned well with that in a respond-to-light block (Fig. S4h;  magnitude of dot product between the respond-to-touch choice CD and the respond-to-light choice CD, mean ± 95% CI for true vs shuffled data: S1: 0.39 ± [0.23, 0.55] vs 0.2 ± [0.1, 0.31], 10 sessions; S2: 0.32 ± [0.18, 0.46] vs 0.2 ± [0.11, 0.3], 8 sessions; MM: 0.35 ± [0.21, 0.48] vs 0.18 ± [0.11, 0.26], 9 sessions; ALM: 0.28 ± [0.17, 0.39] vs 0.21 ± [0.12, 0.31], 13 sessions).” 

      We added discussion of the limitations of this new analysis on page 13:

      “However, we also calculated choice coding dimensions using only right- and left-lick trials. In S1, S2, MM and ALM, the choice CDs calculated this way were also not aligned well across task rules (Fig. S4h), consistent with the results calculated from lick and no-lick trials (Fig. S4f). Data were limited for this analysis, however, because mice rarely licked to the unrewarded water port (# of licks<sub>unrewarded port</sub> / # of licks<sub>total</sub>, respond-to-touch: 0.13, respond-to-light: 0.11). These trials usually came from rule transitions (Fig. 5a) and, in some cases, were potentially caused by exploratory behaviors. These factors could affect choice CDs.”

      The authors find that the stimulus coding direction in most areas (S1, S2, and MM) was significantly aligned between the block types. How do the authors interpret that finding? That there is no major change in stimulus coding dimension, despite the change in subspace? I think I'm missing the big picture interpretation of this result.

      That there is no significant change in stimulus coding dimensions but a change in subspace suggests that the subspace change largely reflects a change in the choice coding dimensions.

      As I mentioned in the public review, I thought there was a weakness with interpretation of the optogenetic experiments, which the authors generally interpret as reflecting rule sensitivity. However, given that they are inhibiting premotor areas including ALM, one might imagine that there might also be an effect on lick production or kinematics. To rule this out, the authors compare the change in lick rate relative to licks during the ITI. What is the ITI lick rate? I assume pretty low, once the animal is well-trained, in which case there may be a floor effect that could obscure meaningful effects on lick production. In addition, based on the reported CI on delta p(lick), it looks like MM and AM did suppress lick rate. I think in the future, a task with richer behavioral read-outs (or including other measurements of behavior like video), or perhaps something like a psychological process model with parameters that reflect different perceptual or cognitive processes could help resolve the effects of perturbations more precisely.

      Eighteen and ten percent of trials had at least one lick in the ITI in respond-to-touch and respond-to-light blocks, respectively. These relatively low rates of ITI licking could indeed make an effect of optogenetics on lick production harder to observe. We agree that future work would benefit from more complex tasks and measurements, and have added the following to make this point (page 14):

      “To more precisely dissect the effects of perturbations on different cognitive processes in rule-dependent sensory detection, more complex behavioral tasks and richer behavioral measurements are needed in the future.”

      Reviewer #2 (Recommendations For The Authors):

      I have the following minor suggestions that the authors might consider in revising this already excellent manuscript:

      (1) In addition to showing normalised z-score firing rates (e.g. Fig 1g), I think it is important to show the grand-average mean firing rates in Hz.

      We thank the Reviewer for the suggestion and have added the grand-average mean firing rates as a new supplementary figure (Fig. S2a). To provide more details about the firing rates of individual neurons, we have also added to this new figure the distribution of peak responses during the tactile stimulus period (Fig. S2b).

      (2) I think the authors could report more quantitative data in the main text. As a very basic example, I could not easily find how many neurons, sessions, and mice were used in various analyses.

      We have added relevant numbers at various points throughout the Results, including within the following examples:

      Page 3: “To examine how the task rules influenced the sensorimotor transformation occurring in the tactile processing stream, we performed single-unit recordings from sensory and motor cortical areas including S1, S2, MM and ALM (Fig. 1e-g, Fig. S1a-h, and Fig. S2a; S1: 6 mice, 10 sessions, 177 neurons, S2: 5 mice, 8 sessions, 162 neurons, MM: 7 mice, 9 sessions, 140 neurons, ALM: 8 mice, 13 sessions, 256 neurons).”

      Page 5: “As expected, single-unit activity before stimulus onset did not discriminate between tactile and visual trials (Fig. 2d; S1: 0%, 177 neurons; S2: 0%, 162 neurons; MM: 0%, 140 neurons; ALM: 0.8%, 256 neurons). After stimulus onset, more than 35% of neurons in the sensory cortical areas and approximately 15% of neurons in the motor cortical areas showed significant stimulus discriminability (Fig. 2e; S1: 37.3%, 177 neurons; S2: 35.2%, 162 neurons; MM: 15%, 140 neurons; ALM: 14.1%, 256 neurons).”

      Page 6: “Support vector machine (SVM) and Random Forest classifiers showed similar decoding abilities (Fig. S3a,b; medians of classification accuracy [true vs shuffled]; SVM: S1 [0.6 vs 0.53], 10 sessions, S2 [0.61 vs 0.51], 8 sessions, MM [0.71 vs 0.51], 9 sessions, ALM [0.65 vs 0.52], 13 sessions; Random Forests: S1 [0.59 vs 0.52], 10 sessions, S2 [0.6 vs 0.52], 8 sessions, MM [0.65 vs 0.49], 9 sessions, ALM [0.7 vs 0.5], 13 sessions).”

      Page 6: “To assess this for the four cortical areas, we quantified how the tHit and tCR trajectories diverged from each other by calculating the Euclidean distance between matching time points for all possible pairs of tHit and tCR trajectories for a given session and then averaging these for the session (Fig. 4a,b; S1: 10 sessions, S2: 8 sessions, MM: 9 sessions, ALM: 13 sessions, individual sessions in gray and averages across sessions in black; window of analysis: -100 to 150 ms relative to stimulus onset; 10 ms bins; using the top 3 PCs; Methods).” 
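
      The trajectory-divergence measure described in this passage can be sketched as below. This is an illustrative sketch under stated assumptions: each trajectory is taken to be a list of time bins holding the scores on the top 3 PCs, and the function names are hypothetical rather than taken from the authors' analysis code.

```python
import math
from itertools import product

def trajectory_distance(traj_a, traj_b):
    """Euclidean distance at each matching time bin between two trajectories."""
    return [math.dist(p, q) for p, q in zip(traj_a, traj_b)]

def session_distance(hit_trajs, cr_trajs):
    """Average the time-resolved distance over all possible tHit/tCR pairs."""
    pair_distances = [
        trajectory_distance(h, c) for h, c in product(hit_trajs, cr_trajs)
    ]
    n_pairs = len(pair_distances)
    return [sum(bin_vals) / n_pairs for bin_vals in zip(*pair_distances)]

# Toy session: one tHit and two tCR trajectories, two time bins, three PCs.
hit = [[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]]
cr = [
    [[0.0, 0.0, 0.0], [1.0, 0.0, 4.0]],
    [[0.0, 3.0, 0.0], [1.0, 0.0, 0.0]],
]
print(session_distance(hit, cr))
```

      The result is one distance value per time bin, which corresponds to a single gray per-session trace; averaging these traces across sessions would give the black trace.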

      Page 8: “In contrast, we found that S1, S2 and MM had stimulus CDs that were significantly aligned between the two block types (Fig. S4e; magnitude of dot product between the respond-to-touch stimulus CDs and the respond-to-light stimulus CDs, mean ± 95% CI for true vs shuffled data: S1: 0.5 ± [0.34, 0.66] vs 0.21 ± [0.12, 0.34], 10 sessions; S2: 0.62 ± [0.43, 0.78] vs 0.22 ± [0.13, 0.31], 8 sessions; MM: 0.48 ± [0.38, 0.59] vs 0.24 ± [0.16, 0.33], 9 sessions; ALM: 0.33 ± [0.2, 0.47] vs 0.21 ± [0.13, 0.31], 13 sessions).”

      Page 9: “For respond-to-touch to respond-to-light block transitions, the fractions of trials classified as respond-to-touch for MM and ALM decreased progressively over the course of the transition (Fig. 5d; rank correlation of the fractions calculated for each of the separate periods spanning the transition, Kendall’s tau, mean ± 95% CI: MM: -0.39 ± [-0.67, -0.11], 9 sessions, ALM: -0.29 ± [-0.54, -0.04], 13 sessions; criterion to be considered significant: 95% CI on Kendall’s tau did not include 0).”
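
      The rank correlation used for the transition analysis can be illustrated with a small sketch. It assumes the simplest variant of Kendall's tau (tau-a, which does not correct for ties); the response does not state which variant was used, and the function name and example fractions are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical session: fraction of trials classified as respond-to-touch
# in five successive periods spanning a touch-to-light transition.
periods = [1, 2, 3, 4, 5]
fractions = [0.9, 0.6, 0.8, 0.5, 0.3]
print(kendall_tau_a(periods, fractions))
```

      A negative tau indicates that the classified fraction tends to fall as the analysis window moves later in the transition, which is the direction reported for MM and ALM.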

      Page 11: “Lick probability was unaffected during S1, S2, MM and ALM experiments for both tasks, indicating that the behavioral effects were not due to an inability to lick (Fig. 6i, j; 95% CI on Δ lick probability for cross-modal selection task: S1/S2 [-0.18, 0.24], 4 mice, 10 sessions; MM [-0.31, 0.03], 4 mice, 11 sessions; ALM [-0.24, 0.16], 4 mice, 10 sessions; Δ lick probability for simple tactile detection task: S1/S2 [-0.13, 0.31], 3 mice, 3 sessions; MM [-0.06, 0.45], 3 mice, 5 sessions; ALM [-0.18, 0.34], 3 mice, 4 sessions).”

      (3) Please include a clearer description of trial timing. Perhaps a schematic timeline of when stimuli are delivered and when licking would be rewarded. I may have missed it, but I did not find explicit mention of the timing of the reward window or if there was any delay period.

      We have added the following (page 3): 

      “For each trial, the stimulus duration was 0.15 s and an answer period extended from 0.1 to 2 s from stimulus onset.”

      (4) Please include a clear description of statistical tests in each figure legend as needed (for example please check Fig 4e legend).

      We have added details about statistical tests in the figure legends:

      Fig. 2f: “Relationship between block-type discriminability before stimulus onset and tHit-tCR discriminability after stimulus onset for units showing significant block-type discriminability prior to the stimulus. Pearson correlation: S1: r = 0.69, p = 0.056, 8 neurons; S2: r = 0.91, p = 0.093, 4 neurons; MM: r = 0.93, p < 0.001, 30 neurons; ALM: r = 0.83, p < 0.001, 26 neurons.” 

      Fig. 4e: “Subspace overlap for control tHit (gray) and tCR (purple) trials in the somatosensory and motor cortical areas. Each circle is a subspace overlap of a session. Paired t-test, tCR – control tHit: S1: -0.23, 8 sessions, p = 0.0016; S2: -0.23, 7 sessions, p = 0.0086; MM: -0.36, 5 sessions, p < 0.001; ALM: -0.35, 11 sessions, p < 0.001; significance: ** for p<0.01, *** for p<0.001.”

      Fig. 5d,e: “Fraction of trials classified as coming from a respond-to-touch block based on the pre-stimulus population state, for trials occurring in different periods (see c) relative to respond-to-touch → respond-to-light transitions. For MM (top row) and ALM (bottom row), progressively fewer trials were classified as coming from the respond-to-touch block as analysis windows shifted later relative to the rule transition. Kendall’s tau (rank correlation): MM: -0.39, 9 sessions; ALM: -0.29, 13 sessions. Left panels: individual sessions, right panels: mean ± 95% CI. Dashed lines are chance levels (0.5). e, Same as d but for respond-to-light → respond-to-touch transitions. Kendall’s tau: MM: 0.37, 9 sessions; ALM: 0.27, 13 sessions.”

      Fig. 6: “Error bars show bootstrap 95% CI. Criterion to be considered significant: 95% CI did not include 0.”
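
      The significance criterion stated in these legends (a 95% CI that does not include 0) can be sketched as below. This is a minimal percentile bootstrap of the mean, offered as an assumption: the response does not specify the bootstrap variant or the resampled statistic, and the function names are hypothetical.

```python
import random

def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI of the mean (assumed statistic and method)."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

def is_significant(ci):
    """Criterion from the legends: the CI must not include 0."""
    lower, upper = ci
    return lower > 0 or upper < 0

# Intervals quoted in the response:
print(is_significant((-0.67, -0.11)))  # Kendall's tau for MM, touch-to-light
print(is_significant((-0.31, 0.03)))   # delta lick probability for MM
```

      Under this criterion the MM Kendall's tau interval counts as significant, whereas the MM lick-probability interval does not, matching how the two results are described in the response.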

      (5) P. 3 - "To examine how the task rules influenced the sensorimotor transformation occurring in the tactile processing stream, we performed single-unit recordings from sensory and motor cortical areas including S1, S2, MM, and ALM using 64-channel silicon probes (Fig. 1e-g and Fig. S1a-h)." Please specify if these areas were recorded simultaneously or not.

      We have added “We recorded from one of these cortical areas per session, using 64-channel silicon probes.”  on page 3.  

      (6) Figure 4b - Please describe what gray and black lines show.

      The gray traces are the distance between tHit and tCR trajectories in individual sessions and the black traces are the averages across sessions in different cortical areas. We have added this information on page 6 and in the Figure 4b legend. 

      Page 6: “To assess this for the four cortical areas, we quantified how the tHit and tCR trajectories diverged from each other by calculating the Euclidean distance between matching time points for all possible pairs of tHit and tCR trajectories for a given session and then averaging these for the session (Fig. 4a,b; S1: 10 sessions, S2: 8 sessions, MM: 9 sessions, ALM: 13 sessions, individual sessions in gray and averages across sessions in black; window of analysis: -100 to 150 ms relative to stimulus onset; 10 ms bins; using the top 3 PCs; Methods).”

      Fig. 4b: “Distance between tHit and tCR trajectories in S1, S2, MM and ALM. Gray traces show the time-varying tHit-tCR distance in individual sessions and black traces are session-averaged tHit-tCR distance (S1: 10 sessions; S2: 8 sessions; MM: 9 sessions; ALM: 13 sessions).”

      (7) In addition to the analyses shown in Figure 5a, when investigating the timing of the rule switch, I think the authors should plot the left and right lick probabilities aligned to the timing of the rule switch time on a trial-by-trial basis averaged across mice.

      We thank the Reviewer for suggesting this addition. We have added a new figure panel to show the probabilities of right- and left-licks during rule transitions (Fig. 5a).

      Page 8: “The probabilities of right-licks and left-licks showed that the mice switched their motor responses during block transitions depending on task rules (Fig. 5a, mean ± 95% CI across 12 mice).” 

      (8) P. 12 - "Moreover, in a separate study using the same task (Finkel et al., unpublished), high-speed video analysis demonstrated no significant differences in whisker motion between respond-to-touch and respond-to-light blocks in most (12 of 14) behavioral sessions.". Such behavioral data is important and ideally would be included in the current analysis. Was high-speed videography carried out during electrophysiology in the current study?

      Finkel et al. has been accepted in principle for publication and will be available online shortly. Unfortunately we have not yet carried out simultaneous high-speed whisker video and electrophysiology in our cross-modal sensory selection task.

      Reviewer #3 (Recommendations For The Authors):

      (1) Minor point. For subspace overlap calculation of pre-stimulus activity in Fig 4e (light purple datapoints), please clarify whether the PCs for that condition were constructed in matched time windows. If the PCs are calculated from the stimulus period 0-150ms, the poor alignment could be due to mismatched time windows.

      We thank the Reviewer for the comment and clarify our analysis here. We previously used time-matched windows to calculate subspace overlaps. However, the pre-stimulus activity was much weaker than the activity during the stimulus period, so the subspaces of reference tHit were subject to noise and we were not able to obtain reliable PCs. This caused the subspace overlap values between the reference tHit and control tHit to be low and variable (mean ± SD, S1: 0.46 ± 0.26, n = 8 sessions, S2: 0.46 ± 0.18, n = 7 sessions, MM: 0.44 ± 0.16, n = 5 sessions, ALM: 0.38 ± 0.22, n = 11 sessions). Therefore, we used the tHit activity during the stimulus window to obtain PCs and projected pre-stimulus and stimulus activity in tCR trials onto these PCs. We have now added a more detailed description of this analysis in the Methods (page 32).

      “To calculate the separation of subspaces prior to stimulus delivery, pre-stimulus activity in tCR trials (-100 to 0 ms from stimulus onset) was projected to the PC space of the tHit reference group and the subspace overlap was calculated. In this analysis, we used tHit activity during stimulus delivery (0 to 150 ms from stimulus onset) to obtain reliable PCs.”

      We acknowledge this time alignment issue and have now removed the reported subspace overlap between tHit and tCR during the pre-stimulus period from Figure 4e (light purple). However, we think the correlation between pre- and post- stimulus-onset subspace overlaps should remain similar regardless of the time windows that we used for calculating the PCs. For the PCs calculated from the pre-stimulus period (-100 to 0 ms), the correlation coefficient was 0.55 (Pearson correlation, p <0.01, n = 31 sessions). For the PCs calculated from the stimulus period (0-150 ms), the correlation coefficient was 0.68 (Figure 4f, Pearson correlation, p <0.001, n = 31 sessions). Therefore, we keep Figure 4f.  

      (2) Minor point. To help the readers follow the logic of the experiments, please explain why PPC and AMM were added in the later optogenetic experiment since these are not part of the electrophysiology experiment.

      We have added the following rationale on page 9.

      “We recorded from AMM in our cross-modal sensory selection task and observed visually-evoked activity (Fig. S1i-k), suggesting that AMM may play an important role in rule-dependent visual processing. PPC contributes to multisensory processing51–53 and sensory-motor integration50,54–58.  Therefore, we wanted to test the roles of these areas in our cross-modal sensory selection task.”

      (3) Minor point. We are somewhat confused about the timing of some of the example neurons shown in figure S1. For example, many neurons show visually evoked signals only after stimulus offset, unlike tactile evoked signals (e.g. Fig S1b and f). In addition, the reaction time for visual stimulus is systematically slower than tactile stimuli for many example neurons (e.g. Fig S1b) but somehow not other neurons (e.g. Fig S1g). Are these observations correct?

      These observations are all correct. We have a manuscript from a separate study using this same behavioral task (Finkel et al., accepted in principle) that examines and compares (1) the onsets of tactile- and visually-evoked activity and (2) the reaction times to tactile and visual stimuli. The reaction times to tactile stimuli were slightly but significantly shorter than the reaction times to visual stimuli (tactile vs visual, 397 ± 145 vs 521 ± 163 ms, median ± interquartile range [IQR], Tukey HSD test, p = 0.001, n = 155 sessions). We examined how well activity of individual neurons in S1 could be used to discriminate the presence of the stimulus or the response of the mouse. For discriminability for the presence of the stimulus, S1 neurons could signal the presence of the tactile stimulus but not the visual stimulus. For discriminability for the response of the mouse, the onsets for significant discriminability occurred earlier for tactile compared with visual trials (two-sided Kolmogorov-Smirnov test, p = 1 × 10^-16, n = 865 neurons with DP onset in tactile trials, n = 719 neurons with DP onset in visual trials).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study combines a range of advanced ultrastructural imaging approaches to define the unusual endosomal system of African trypanosomes. Compelling images show that instead of a distinct set of compartments, the endosome of these protists comprises a continuous system of membranes with functionally distinct subdomains as defined by canonical markers of early, late and recycling endosomes. The findings suggest that the endocytic system of bloodstream stages has evolved to facilitate the extraordinarily high rates of membrane turnover needed to remove immune complexes and survive in the blood, which is of interest to anyone studying infectious diseases.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Bloodstream stages of the parasitic protist, Trypanosoma brucei, exhibit very high rates of constitutive endocytosis, which is needed to recycle the surface coat of Variant Surface Glycoproteins (VSGs) and remove surface immune complexes. While many studies have shown that the endo-lysosomal systems of T. brucei BF stages contain canonical domains, as defined by classical Rab markers, it has remained unclear whether these protists have evolved additional adaptations/mechanisms for sustaining these very high rates of membrane transport and protein sorting. The authors have addressed this question by reconstructing the 3D ultrastructure and functional domains of the T. brucei BF endosome membrane system using advanced electron tomography and super-resolution microscopy approaches. Their studies reveal that, unusually, the BF endosome network comprises a continuous system of cisternae and tubules that contain overlapping functional subdomains. It is proposed that a continuous membrane system allows higher rates of protein cargo segregation, sorting and recycling than can otherwise occur when transport between compartments is mediated by membrane vesicles or other fusion events.

      Strengths:

      The study is a technical tour-de-force using a combination of electron tomography, super-resolution/expansion microscopy, immune-EM of cryo-sections to define the 3D structures and connectivity of different endocytic compartments. The images are very clear and generally support the central conclusion that functionally distinct endocytic domains occur within a dynamic and continuous endosome network in BF stages.

      Weaknesses:

      The authors suggest that this dynamic endocytic network may also fulfil many of the functions of the Golgi TGN and that the latter may be absent in these stages. Although plausible, this comment needs further experimental support. For example, have the authors attempted to localize canonical markers of the TGN (e.g. GRIP proteins) in T. brucei BF and/or shown that exocytic carriers bud directly from the endosomes?

      We agree with the criticism and have shortened the discussion accordingly and clearly marked it as speculation. However, we do not want to completely abandon our hypothesis.

      The paragraph now reads:

      Lines 740 – 751:

      “Interestingly, we did not find any structural evidence of vesicular retrograde transport to the Golgi. Instead, the endosomal ‘highways’ extended throughout the posterior volume of the trypanosomes approaching the trans-Golgi interface. It is highly plausible that this region represents the convergence point where endocytic and biosynthetic membrane trafficking pathways merge. A comparable merging of endocytic and biosynthetic functions has been described for the TGN in plants. Different marker proteins for early and recycling endosomes were shown to be associated and/ or partially colocalized with the TGN suggesting its function in both secretory and endocytic pathways (reviewed in Minamino and Ueda, 2019). As we could not find structural evidence for the existence of a TGN we tentatively propose that trypanosomes may have shifted the central orchestrating function of the TGN as a sorting hub at the crossroads of biosynthetic and recycling pathways to the endosome. Although this is a speculative scenario, it is experimentally testable.”

      Furthermore, we removed the lines 51 - 52, which included the suggestion of the TGN as a master regulator, from the abstract.

      Reviewer #2 (Public Review):

      The authors suggest that the African trypanosome endomembrane system has unusual organisation, in that the entire system is a single reticulated structure. It is not clear if this is thought to extend to the lysosome or MVB. There is also a suggestion that this unusual morphology serves as a trans-(post)Golgi network rather than the more canonical arrangement.

      The work is based around very high-quality light and electron microscopy, as well as utilising several marker proteins, Rab5A, 11 and 7. These are deemed as markers for early endosomes, recycling endosomes and late or pre-lysosomes. The images are mostly of high quality but some inconsistencies in the interpretation, appearance of structures and some rather sweeping assumptions make this less easy to accept. Two perhaps major issues are claims to label the entire endosomal apparatus with a single marker protein, which is hard to accept as certainly this reviewer does not really even know where the limits to the endosomal network reside and where these interface with other structures. There are several additional compartments that have been defined by Rab proteins as well, and which are not even mentioned. Overall I am unconvinced that the authors have demonstrated the main things they claim.

      The endomembrane system in bloodstream form T. brucei is clearly delimited. Compared to mammalian cells it is tidy and confined to the posterior part of the spindle-shaped cell. The endoplasmic reticulum is linked to one side of the longitudinal cell axis, marked by the attached flagellum, while the mitochondrion locates to the opposite side. Glycosomes are easily identifiable as spheres, as are acidocalcisomes, which are smaller than glycosomes and – in electron micrographs – are characterized by high electron density. All these organelles extend beyond the nucleus, which is not the case for the endosomal compartment, the lysosome and the Golgi. The vesicles found in the posterior half of the trypanosome cell are quantitatively identifiable as COP1, CCVI or CCVII vesicles, or exocytic carriers. The lysosome has a higher degree of morphological plasticity, but this is not the topic of the present work. Thus, the endomembrane system in T. brucei is comparatively well structured and delimited, which is why we have chosen trypanosomes as a cell biological model.

      We have published EP1::GFP as marker for the endosome system and flagellar pocket back in 2004. We have defined the fluid phase volume of the trypanosome endosome in papers published between 2002 and 2007. This work was not intended to represent the entirety of RAB proteins. We were only interested in 3 canonical markers for endosome subtypes. We do not claim anything that is not experimentally tested, we have clearly labelled our hypotheses as such, and we do not make sweeping assumptions.

      The approaches taken are state-of-the-art but not novel, and because of the difficulty in fully addressing the central tenet, I am not sure how much of an impact this will have beyond the trypanosome field. For certain this is limited to workers in the direct area and is not a generalisable finding.

      To the best of our knowledge, there is no published research that has employed 3D Tokuyasu or expansion microscopy (ExM) to label endosomes. The key takeaway from our study, which is the concept that "endosomes are continuous in trypanosomes" certainly is novel. We are not aware of any other report that has demonstrated this aspect.

      The doubts formulated by the reviewer regarding the impact of our work beyond the field of trypanosomes are not timely. Indeed, our results, and those of others, show that the conclusions drawn from work with just a few model organisms are not generalisable. We are finally on the verge of a new cell biology that considers the plethora of evolutionary solutions beyond opisthokonts. We believe that this message should be widely acknowledged and considered. And we are certainly not the only ones who are convinced that the term "general relevance" is unscientific and should no longer be used in biology.

      Reviewer #3 (Public Review):

      Summary:

      As clearly highlighted by the authors, a key plank in the ability of trypanosomes to evade the mammalian host’s immune system is its high rate of endocytosis. This rapid turnover of its surface enables the trypanosome to ‘clean’ its surface removing antibodies and other immune effectors that are subsequently degraded. The high rate of endocytosis is likely reflected in the organisation and layout of the endosomal system in these parasites. Here, Link et al., sought to address this question using a range of light and three-dimensional electron microscopy approaches to define the endosomal organisation in this parasite.

      Before this study, the vast majority of our information about the make-up of the trypanosome endosomal system was from thin-section electron microscopy and immunofluorescence studies, which did not provide the necessary resolution and 3D information to address this issue. Therefore, it was not known how the different structures observed by EM were related. Link et al., have taken advantage of the advances in technology and used an impressive combination of approaches at the LM and EM level to study the endosomal system in these parasites. This innovative combination has now shown the interconnected-ness of this network and demonstrated that there are no ‘classical’ compartments within the endosomal system, with instead different regions of the network enriched in different protein markers (Rab5a, Rab7, Rab11).

      Strengths:

      This is a generally well-written and clear manuscript, with the data well-presented supporting the majority of the conclusions of the authors. The authors use an impressive range of approaches to address the organisation of the endosomal system and the development of these methods for use in trypanosomes will be of use to the wider parasitology community.

      I appreciate their inclusion of how they used a range of different light microscopy approaches even though for instance the dSTORM approach did not turn out to be as effective as hoped. The authors have clearly demonstrated that trypanosomes have a large interconnected endosomal network, without defined compartments and instead show enrichment for specific Rabs within this network.

      Weaknesses:

      My concerns are:

      i) There is no evidence for functional compartmentalisation. The classical markers of different endosomal compartments do not fully overlap but there is no evidence to show a region enriched in one or other of these proteins has that specific function. The authors should temper their conclusions about this point.

      The reviewer is right in stating that Rab-presence does not necessarily mean Rab-function. However, this assumption is as old as the Rab literature. That is why we have focused on the 3 most prominent endosomal marker proteins. We report that for endosome function you do not necessarily need separate membrane compartments. This is backed by our experiments.

      ii) The quality of the electron microscopy work is very high but there is a general lack of numbers. For example, how many tomograms were examined? How often were fenestrated sheets seen? Can the authors provide more information about how frequent these observations were?

      The fenestrated sheets can be seen in the majority of the 37 tomograms recorded of the posterior volume of the parasites. Furthermore, we have randomly generated several hundred tiled (= very large) electron micrographs of bloodstream form trypanosomes for unbiased analyses of endomembranes. In these 2D-datasets the “footprint” of the fenestrated flat and circular cisternae is frequently detectable in the posterior cell area.

      We now have included the corresponding numbers in all EM figure legends.

      iii) The EM work always focussed on cells which had been processed before fixing. Now, I understand this was important to enable tracers to be used. However, given the dynamic nature of the system these processing steps and feeding experiments may have affected the endosomal organisation. Given their knowledge of the system now, the authors should fix some cells directly in culture to observe whether the organisation of the endosome aligns with their conclusions here.

      This is a valid criticism; however, it is the cell culture that provides an artificial environment. As for a possible effect of cell harvesting by centrifugation on the integrity and functionality of the endosome system, we consider this very unlikely for one simple reason. The mechanical forces acting in and on the parasites as they circulate in the extremely crowded and confined environment of the mammalian bloodstream are obviously much higher than the centrifugal forces involved in cell preparation. This becomes particularly clear when one considers that the mass of the particle to be centrifuged determines the actual force exerted by the g-forces. Nevertheless, the proposed experiment is a good control, although much more complex than proposed, since tomography is a challenging technique. We have performed the suggested experiment and acquired tomograms of unprocessed cells. The corresponding data is now included as supplementary movie 2, 3 and 4. We refer to it in lines 202 – 206: To investigate potential impacts of processing steps (cargo uptake, centrifugation, washing) on endosomal organization, we directly fixed cells in the cell culture flask, embedded them in Epon, and conducted tomography. The resulting tomograms revealed endosomal organization consistent with that observed in cells fixed after processing (see Supplementary movie 2, 3, and 4).

      We furthermore thank the reviewer for the experiment suggestion in the acknowledgments.

      iv) The discussion needs to be revamped. At the moment it is just another run through of the results and does not take an overview of the results presenting an integrated view. Moreover, it contains reference to data that was not presented in the results.

      We have improved the discussion accordingly.

      Recommendations for the authors:

      The reviewers concurred about the high calibre of the work and the importance of the findings.

      They raised some issues and made some suggestions to improve the paper without additional experiments - key issues include

      (1) Better referencing of the trypanosome endocytosis/ lysosomal trafficking literature.

      The literature, especially the experimental and quantitative work, is very limited. We now provide a more complete set of references. However, we would like to mention that we had cited a recent review that critically references the trypanosome literature with emphasis on the extensive work done with mammalian cells and yeast.

      (2) Moving the dSTORM data that detracts from otherwise strong data to a supplementary figure.

      We have done this.

      (3) Removal of the conclusion that the continuous endosome fulfils the functions of TGN, without further evidence.

      As stated above, this was not a conclusion in our paper, but rather a speculation, which we have now more clearly marked as such. Lines 740 to 751 now read:

      “Interestingly, we did not find any structural evidence of vesicular retrograde transport to the Golgi. Instead, the endosomal ‘highways’ extended throughout the posterior volume of the trypanosomes approaching the trans-Golgi interface. It is highly plausible that this region represents the convergence point where endocytic and biosynthetic membrane trafficking pathways merge. A comparable merging of endocytic and biosynthetic functions was already described for the TGN in plants. Different marker proteins for early and recycling endosomes were shown to be associated and/ or partially colocalized with the TGN suggesting its function in both secretory and endocytic pathways (reviewed in Minamino and Ueda, 2019). As we could not find structural evidence for the existence of a TGN we tentatively propose that trypanosomes may have shifted the central orchestrating function of the TGN as a sorting hub at the crossroads of biosynthetic and recycling pathways to the endosome. Although this is a speculative scenario, it is experimentally testable.”

      (4) Broader discussion linking their findings to other examples of organelle maturation in eukaryotes (e.g cisternal maturation of the Golgi)

      We have improved the discussion accordingly.

      Reviewer #1 (Recommendations For The Authors):

      What are the multi-vesicular vesicles that surround the marked endosomal compartments in Fig 1? Do they become labelled with fluid phase markers with longer incubations (e.g. late endosome/ lysosomal)?

      The function of MVBs in trypanosomes is still far from being clear. They are filled with fluid phase cargo, especially ferritin, but are devoid of VSG. Hence it is likely that MVBs are part of the lysosomal compartment. In fact, this part of the endomembrane system is highly dynamic. MVBs can be physically connected to the lysosome or can form elongated structures. The surprising dynamics of the trypanosome lysosome will be published elsewhere.

      Figure 2. The compartments labelled with EP1::Halo are very poorly defined due to the low levels of expression of the reporter protein and/or sensitivity of detection of the Halo tag. Based on these images, it would be hard to conclude whether the endosome network is continuous or not. In this respect, it is unclear why the authors didn't use EP1-GFP for these analyses? Given the other data that provides more compelling evidence for a single continuous compartment, I would suggest removing Fig 2A.

      We have used EP1::GFP to label the entire endosome system (Engstler and Boshart, 2004). Unfortunately, GFP is not suited for dSTORM imaging. By creating the EP1::Halo cell line, we were able to utilize the most prominent dSTORM fluorescent dye, Alexa 647. This was not primarily done to generate super resolution images, but rather to measure the dynamics of the GPI-anchored, luminal protein EP with single molecule precision. The results from this study will be published separately. But we agree with the reviewer and have relocated the dSTORM data to the supplementary material.

      The observation that Rab5a/7 can be detected in the lumen of lysosome is interesting. Mechanistically, this presumably occurs by invagination of the limiting membrane of the lysosome. Is there any evidence that similar invagination of cytoplasmic markers occurs throughout or in subdomains of the endocytic network (possibly indicative of a 'late endosome' domain)?

      So far, we have not observed this. The structure of the lysosome and the membrane influx from the endosome are currently being investigated.

      The authors note that continuity of functionally distinct membrane compartments in the secretory/endocytic pathways has been reported in other protists (e.g T. cruzi). A particular example that could be noted is the endo-lysosomal system of Dictyostelium discoideum which mediates the continuous degradation and eventual expulsion of undigested material.

      We tried to include this in the discussion but ultimately decided against it because the Dictyostelium system cannot be easily compared to the trypanosome endosome.

      Reviewer #2 (Recommendations For The Authors):

      Abstract

      Not sure that 'common' is the correct term here. Frequent, near-universal..... it would be true that endocytosis is common across most eukaryotes.

      We have changed the sentence to “common process observed in most eukaryotes” (line 33).

      Immune evasion - the parasite does not escape the immune system, but does successfully avoid its impact, at least at the population level.

      We have replaced the word “escape” with “evasion” (line 35).

      The third sentence needs to follow on correctly from the second. Also, more than Igs are internalised and potentially part of immune evasion, such as C3, Factor H, ApoL1 etcetera.

      We believe that there may be a misunderstanding here. The process of endocytic uptake and lysosomal degradation has so far only been demonstrated in the context of VSG-bound antibodies, which is why we only refer to this. Of course, the immune system comprises a wide range of proteins and effector molecules, all of which could be involved in immune evasion.

      I do not follow the logic that the high flux through the endocytic system in trypanosomes precludes distinct compartmentalisation - one could imagine a system where a lot of steps become optimised for example. This idea needs expanding on if it is correct.

      Membrane transport by vesicle transfer between several separate membrane compartments would be slower than the measured rate of membrane flux.

      Again I am not sure 'efficient' on line 40. It is fast, but how do you measure efficiency? Speed and efficiency are not the same thing.

      We have replaced the word “efficient” with “fast” (line 42).

      The basis for suggesting endosomes as a TGN is unclear. Given that there are AP complexes, retromer, exocyst and other factors that are part of the TGN or at least post-G differentiation of pathways in canonical systems, this seems a step too far. There really is no evidence in the rest of the MS that seems to support this.

      Yes, we agree and have clarified the discussion accordingly. We have not completely removed the discussion on the TGN but have labelled it more clearly as speculation.

      I am aware I am being pedantic here, but overall the abstract seems to provide an impression of greater novelty than may be the case and makes several very bold claims that I cannot see as fully valid.

      We are not aware of any claim in the summary that we have not substantiated with experiments, or any hypothesis that we have not explained.

      Moreover, the concept of fused or multifunctional endosomes (or even other endomembrane compartments) is old, and has been demonstrated in metazoan cells and yeast. The concept of rigid (in terms of composition) compartments really has been rejected by most folks with maturation, recycling and domain structures already well-established models and concepts.

      We agree that the (transient) presence of multiple Rab proteins decorating endosomes has been demonstrated in various cell types. This finding formed the basis for the endosomal maturation model in mammals and yeast, which has replaced the previous rigid compartment model.

      However, we do not appreciate attempts to question the originality of our study by claiming that similar observations have been made in metazoans or yeast. This is simply wrong. There are no reports of a functionally structured, continuous, single and large endosome in any other system. The only membrane system that might be similar was described in the American parasite Trypanosoma cruzi, however, without the use of endosome markers or any functional analysis. We refer to this study in the discussion.

      In summary, the maturation model falls short in explaining the intricacies of the membrane system we have uncovered in trypanosomes. Therefore, one plausible interpretation of our data is that the overall architecture of the trypanosome endosomes represents an adaptation that enables the remarkable speed of plasma membrane recycling observed in these parasites. In our view, both our findings and their interpretation are novel and worth reporting. Again, modern cell biology should recognize that evolution has developed many solutions for similar processes in cells, about whose diversity we have learned almost nothing because of our reductionist view. A remarkable example is provided by the Picozoa, tiny bipartite eukaryotes that pack the entire nutritional apparatus into one pouch and the main organelles with the locomotor system into the other. Another one is the “extreme” cell biology of many protozoan parasites such as Giardia, Toxoplasma or Trypanosoma.

      Higher plants have been well characterised, especially at the level of Rab/Arf proteins and adaptins.

      We now mention plant endosomes in our brief discussion of the trypanosome TGN. Lines 744 – 747:

      “A comparable merging of endocytic and biosynthetic functions was already described for the TGN in plants. Different marker proteins for early and recycling endosomes were shown to be associated and/ or partially colocalized with the TGN suggesting its function in both secretory and endocytic pathways (reviewed in Minamino and Ueda, 2019).”

      The level of self-citing in the introduction is irritating and unscholarly. I have no qualms with crediting the authors with their own excellent contributions, but work from Dacks, Bangs, Field and others seems to be selectively ignored, with an awkward use of the authors' own publications. Diversity between organisms for example has been a mainstay of the Dacks lab output, Rab proteins and others from Field and work on exocytosis and late endosomal systems from Bangs. These efforts and contributions surely deserve some recognition?

      This is an original article and not a review. For a comprehensive overview the reviewer might read our recent overview article on exo- and endocytic pathways in trypanosomes, in which we have extensively cited the work of Mark Field, Jay Bangs and Joel Dacks. In the present manuscript, we have cited all papers that touch on our results or are otherwise important for a thorough understanding of our hypotheses. We do not believe that this approach is unscientific, but rather improves the readability of the manuscript. Nevertheless, we have now cited additional work.

      For the uninitiated, the posterior/anterior axis of the trypanosome cell as well as any other specific features should be defined.

      In lines 102 - 110 we wrote:

      “This process of antibody clearance is driven by hydrodynamic drag forces resulting from the continuous directional movement of trypanosomes (Engstler et al., 2007). The VSG-antibody complexes on the cell surface are dragged against the swimming direction of the parasite and accumulate at the posterior pole of the cell. This region harbours an invagination in the plasma membrane known as the flagellar pocket (FP) (Gull, 2003; Overath et al., 1997). The FP, which marks the origin of the single attached flagellum, is the exclusive site for endo- and exocytosis in trypanosomes (Gull, 2003; Overath et al., 1997). Consequently, the accumulation of VSG-antibody complexes occurs precisely in the area of bulk membrane uptake.”

      We think this sufficiently introduces the cell body axes.

      I don't understand the comment concerning microtubule association. In mammalian cells, such association is well established, but compartments still do not display precise positioning. This likely then has nothing to do with the microtubule association differences.

      We have clarified this in the text (lines 192 – 199). There is no report of cytoplasmic microtubules in trypanosomes. All microtubules appear to be either subpellicular or within the flagellum. To maintain their structure and position, the endosomes should be associated either with subpellicular microtubules, as is the case with the endoplasmic reticulum, or with the more enigmatic actomyosin system of the parasites. We have been working on the latter possibility and intend to publish a follow-up paper to the present manuscript.

      The inability to move past the nucleus is a poor explanation. These compartments are dynamic. Even the nucleus does interesting things in trypanosomes and squeezes past structures during development in the tsetse fly.

      The distance between the nucleus and the microtubule cytoskeleton remains relatively constant even in parasites that squeeze through microfluidic channels. This is not unexpected as the nucleus can be highly deformed. A structure the size of the endosome will not be able to physically pass behind the nucleus without losing its integrity. In fact, the recycling apparatus is never found in the anterior part of the trypanosome, most probably because the flagellar pocket is located at the posterior cell pole.

      L253 What is the evidence that EP1 labels the entire FP and endosomes? This may be extensive, but this claim requires rather more evidence. This is again suggested at l263. Again, please forgive me for being pedantic, but this is an overstatement unless supported by evidence that would be incredibly difficult to obtain. This is even sort of acknowledged on l271 in the context of non-uniform labelling. This comes again in l336.

      The evidence that EP1 labels the entire FP and endosomes is presented in Engstler and Boshart (2004; doi: 10.1101/gad.323404).

      Perhaps I should refrain from comments on the dangers of expansion microscopy, or asking what has actually been gained here. Oddly, the conclusion on l290 is a fair statement that I am happy with.

      An in-depth discussion regarding the advantages and disadvantages of expansion microscopy is beyond the manuscript's intended scope. Our approach involved utilizing various imaging techniques to confirm the validity of our findings. We appreciate that our concluding sentence is pleasing.

      F2 - The data in panel A seem quite poor to me. I also do not really understand why the DAPI stain in the first and second columns fails to coincide or why the kinetoplast is so diffuse in the second row. The labelling for EP1 presents as very small puncta, and hence is not evidence for a continuum. What is the arrow in A IV top? The data in panel B are certainly more in line with prior art, albeit that there is considerable heterogeneity in the labelling and of the FP for example. Again, I cannot really see this as evidence for continuity. There are gaps.... Albeit I accept that labelling of such structures is unlikely to ever be homogenous.

      We agree that the dSTORM data represents the least robust aspect of the findings we have presented, and we concur with relocating it to the supplementary material.

      F3 - Rather apparent, and specifically for Rab7, that there is differential representation - for example, Cell 4 presents a single Rab7 structure while the remaining examples demonstrate more extensive labelling. Again, I am content that these are highly dynamic structures but this needs to be addressed at some level and commented upon. If the claim is for continuity, the dynamics observed here suggest the usual; some level of obvious overlap of organellar markers, but the representation in F3 is clever but not sure what I am looking at. Moreover, the title of the figure is nothing new. What is also a bit odd is that the extent of the Rab7 signal, and to some extent the other two Rabs used, is rather variable, which makes this unclear to me as to what is being detected. Given that the Rab proteins may be defining microdomains or regions, I would also expect a region of unique staining as well as the common areas. This needs to at least be discussed.

      The differences in the representation result from the dynamics of the labelled structures. Therefore, we have selected different cells to provide examples of what the labelling can look like. We now mention this in the results section.

      The overlap of the different Rab signals was perhaps to be expected, but we now have demonstrated it experimentally. Importantly, we performed a rigorous quantification by calculating the volume overlaps and the Pearson correlation coefficients.

      In previous studies the data were presented as maximal intensity projections, which inherently lack the complete 3D information.

      We found that Rab proteins define microdomains and that there are regions of unique staining as well as common areas, as shown in Figure 3. The volumes do not completely overlap. This is now more clearly stated in lines 315 – 319:

      “These objects showed areas of unique staining as well as partially overlapping regions. The pairwise colocalization of different endosomal markers is shown in Figure 3 A, XI - XIII and 3 B. The different cells in Figure 3 B were selected to represent the dynamic nature of the labelled structures. Consequently, the selected cells provide a variety of examples of how the labelling can appear.”

      This had already been stated in lines 331 – 336:

      “In summary, the quantitative colocalization analyses revealed that on the one hand, the endosomal system features a high degree of connectivity, with considerable overlap of endosomal marker regions, and on the other hand, TbRab5A, TbRab7, and TbRab11 also demarcate separated regions in that system. These results can be interpreted as evidence of a continuous endosomal membrane system harbouring functional subdomains, with a limited amount of potentially separated early, late or recycling endosomes.”
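
      The two measures named in this response, volume overlap and the Pearson correlation coefficient, can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: the function names `pearson` and `volume_overlap`, the flattened voxel lists, and the intensity thresholds are all assumptions introduced here for illustration, and a real analysis would operate on segmented 3D image stacks.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length voxel intensity lists."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("channels must be non-empty and of equal length")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Covariance and variances (unnormalised; the 1/n factors cancel).
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant channel has no defined correlation")
    return cov / sqrt(var_a * var_b)


def volume_overlap(a, b, threshold_a, threshold_b):
    """Fraction of the thresholded volume of channel a that is also occupied by channel b."""
    in_a = [x > threshold_a for x in a]
    in_b = [y > threshold_b for y in b]
    voxels_a = sum(in_a)
    if voxels_a == 0:
        return 0.0
    shared = sum(1 for p, q in zip(in_a, in_b) if p and q)
    return shared / voxels_a


# Hypothetical flattened voxel intensities for two marker channels.
channel_1 = [0, 5, 5, 5]
channel_2 = [0, 0, 5, 5]
overlap = volume_overlap(channel_1, channel_2, 1, 1)
correlation = pearson(channel_1, channel_2)
```

      Note that the overlap fraction is asymmetric (it depends on which channel is taken as the reference volume), which is one reason pairwise comparisons are usually reported in both directions alongside the symmetric Pearson coefficient.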

      F4-6 - Fabulous images. But a couple of issues here; first, as the authors point out, there is distance between the gold and the antigen. So, this of course also works in the z-plane as well as the x/y-planes and some of the gold may well be associated with membraneous figures that are out of the plane, which would indicate an absence of colinearity on one specific membrane. Secondly, in several instances, we have Rab7 essentially mixed with Rab11 or Rab5 positive membrane. While data are data and should be accepted, this is difficult to reconcile when, at least to some level, Rab7 is a marker for a late-endosomal structure and where the presence of degradative activity could reside. As division of function is, I assume, the major reason for intracellular compartmentalisation, such a level of admixture is hard to rationalise. A continuum is one thing but the data here seem to be suggesting something else, i.e. almost complete admixture.

      We are grateful for the positive feedback regarding the image quality. It is true that the "linkage error," representing the distance between the gold and the antigen, also functions to some extent in the z-axis. However, it's important to note that the z-dimension of the section in these Figures is 55 nm. Nevertheless, it is interesting to observe that gold label, which likely marks the corresponding Rab antigen on membranes that are not visible within the section itself, is discernible in Figure 4C (indicated by arrows).

      We have clarified this in lines 397 – 400:

      “Consequently, gold particles located further away may represent cytoplasmic TbRab proteins or, as the “linkage error” can also occur in the z-plane, correspond to membranes that are not visible within the 55 nm thickness of the cryosection (Figure 4, panel C, arrows).”

      The coexistence of different Rabs is most likely concentrated in regions where transitions between different functions are likely. Our focus was primarily on imaging membranes labelled with two markers. We wanted to show that the prevailing model of separate compartments in the trypanosome literature is not correct.

      F7 - Not sure what this adds beyond what was published by Grunfelder.

      First, this figure is an important control that links our results to published work (Grünfelder et al., 2003). Second, we include double staining of cargo with Rab5, Rab7, and Rab11, whereas Grünfelder focused only on Rab11. Therefore, our data is original and of such high quality that it warrants a main figure.

      F8 - and l583. This is odd as the claim is 'proof' which in science is a hard thing to claim (and this is definitely not at a six sigma level of certainty, as used by the physics community). However, I am seeing structures in the tomograms which are not contiguous - there are gaps here between the individual features (Green in the figure).

      We have replaced the term "proof". It is important to note that the structures in individual tomograms cannot all be completely continuous because the sections are limited to a thickness of 250 nm. Therefore, it is likely that they have more connectivity above and below the imaged section. Nevertheless, we believe that the quality of the tomograms is satisfactory, considering that 3D Tokuyasu is a very demanding technique and the production of serial Tokuyasu tomograms is not feasible in practice.

      Discussion - Too long and the self-citing of four papers from the corresponding author to the exclusion of much prior work is again noted, with concerns about this as described above. Moreover, at least four additional Rab proteins are known associated with the trypanosome endosomal system, 4, 5B, 21 and 28. These have been completely ignored.

      We have outlined our position on referencing in original articles above. We also explained why we focused on the key marker proteins associated with early (Rab5), late (Rab7) and recycling endosomes (Rab11). We did not ignore the other Rabs, we just did not include them in the present study.

      Overall this is disappointing. I had expected a more robust analysis, with a clearer discussion and placement in context. I am not fully convinced that what we have here is as extreme as claimed, or that we have a substantial advance. There is nothing here that is mechanistic or the identification of a new set of gene products, process or function.

      We do not think that this is constructive feedback.

      This MS suggests that the endosomal system of African trypanosomes is a continuum of membrane structures rather than representing a set of distinct compartments. A combination of light and electron microscopy methods are used in support. The basic contention is very challenging to prove, and I'm not convinced that this has been. Furthermore, I am also unclear as to the significance of such an organisation; this seems not really addressed.

      We acknowledge and respect varying viewpoints, but we hold a differing perspective in this matter. We are convinced that the data decisively supports our interpretation. May future work support or refute our hypothesis.

      Reviewer #3 (Recommendations For The Authors):

      Line 81 - delete 's

      Done.

      Generally, the introduction was very well written and clearly summarised our current understanding but the paragraph beginning line 134 felt out of place and repeated some of the work mentioned earlier.

      We have removed this paragraph.

      For the EM analysis throughout quantification would be useful as highlighted in the public review. How many tomograms were examined, and how often were types of structures seen? I understand the sample size is often small but this would help the reader appreciate the diversity of structures seen.

      We have included the numbers.

      Following on from this how were the cells chosen for tomogram analysis? For example, the dividing cell in 1D has palisades associating with the new pocket - is this commonly seen? Does this reflect something happening in dividing cells. This point about endosomal division was picked up in the discussion but there was little about in the main results.

      This issue is undoubtedly inherent to the method itself, and we have made efforts to mitigate it by generating a series of tomograms recorded randomly. We have refrained from delving deeper into the intricacies of the cell cycle in this manuscript, as we believe that it warrants a separate paper.

      As the authors prosecute, the co-localisation analysis highlights the variable nature of the endosome and the overlap of different markers. When looking at the LM analysis, I was struck by the variability in the size and number of labelled structures in the different cells. For example, in 3A Rab7 is 2 blobs but in 3B Cell 1 it is 4/5 blobs. Is this just a reflection of the increase in the endosome during the cell cycle?

      The variability in representation is a direct consequence of the dynamic nature of the labelled structures. For this reason, we deliberately selected different cells to represent examples of what the labelling can look like. We have decided not to mention the dynamics of the endosome during the cell cycle. This will be the subject of a further report.

      Moreover, Rab 11 looks to be the marker covering the greatest volume of the endosomal system - is this true? I think there's more analysis of this data that could be done to try and get more information about the relative volumes etc of the different markers that haven't been drawn out. The focus here is on the co-localisation.

      Precisely because we recognize the importance of this point, we intend to turn our attention to the cell cycle in a separate publication.

      I appreciate that it is an awful lot of work to perform the immuno-EM and the data is of good quality but in the text, there could be a greater effort to tie this to the LM data. For example, from the Rab11 staining in LM you would expect this marker to be the most extensive across the networks - is this reflected in the EM?

      For the immuno-EM there were no numbers, the authors had measured the position of the gold but what was the proportion of gold that was in/near membranes for each marker? This would help the reader understand both the number of particles seen and the enrichment of the different regions.

      Our original intent was to perform a thorough quantification (using stereology) of the immuno-EM data. However, we later realized that the necessary random imaging approach is not suitable for Tokuyasu sections of trypanosomes. In short, the cells are too far apart, and the cell sections are only occasionally cut so that the endosomal membranes are sufficiently visible. Nevertheless, we continue to strive to generate more quantitative data using conventional immuno-EM.

      The innovative combination of Tokuyasu tomograms with immuno-EM was great. I noted though that there was a lack of fenestration in these models. Does this reflect the angle of the model or the processing of these samples?

      We are grateful to the referee, as we have asked ourselves the same question. However, we do not attribute the apparent lack of fenestration to the viewing angle, since we did not find fenestration in any of the Tokuyasu tomograms. Our suspicion is more directed towards a methodological problem. In the Tokuyasu workflow, all structures are mainly fixed with aldehydes. As a result, lipids are only effectively fixed through their association with membrane proteins. We suggest that the fenestration may not be visible because the corresponding lipids may have been lost due to incomplete fixation.

      We now clearly state this in lines 563 – 568:

      “Interestingly, these tomograms did not exhibit the fenestration pattern identified in conventional electron tomography. We suspect that this is due to methodological reasons. The Tokuyasu procedure uses only aldehydes to fix all structures. Consequently, effective fixation of lipids occurs only through their association with membrane proteins. Thus, the lack of visible fenestration is likely due to possible loss of lipids during incomplete fixation.”

      The discussion needs to be reworked. Throughout it contains references to results not in the main results section such as supplementary movie 2 (line 735). The explicit references to the data and figures felt odd and more suited to the results rather than the discussion. Currently, each result is discussed individually in turn and more effort needs to be made to integrate the results from this analysis here but also with previous work and the data from other organisms, which at the moment sits in a standalone section at the end of the discussion.

      We have improved the discussion and removed the previous supplementary movies 2 and 3. Supplementary movie 1 is now mentioned in the results section.

      Line 693 - There was an interesting point about dividing cells describing the maintenance of endosomes next to the old pocket. Does that mean there was no endosome by the new pocket and if so where is this data in the manuscript? This point relates back to my question about how cells were chosen for analysis - how many dividing cells were examined by tomography?

      The fate of endosomes during the cell cycle is not the subject of this paper. In this manuscript we show only one dividing cell using tomography. An in-depth analysis focusing on what happens during the cell cycle will be published separately.

      Line 729 - I'm unclear how this represents a polarization of function in the flagellar pocket. The pocket I presume is included within the endosomal system for this analysis but there was no specific mention of it in the results and no marker of each position to help define any specialisation. From the results, I thought the focus was on endosomal co-localisation of the different markers. If the authors are thinking about specialisation of the pocket this paper from Mark Field shows there is evidence for the exocyst to be distributed over the entire surface of the pocket, which is relevant to the discussion here. Boehm, C.M. et al. (2017) The trypanosome exocyst: a conserved structure revealing a new role in endocytosis. PLoS Pathog. 13, e1006063

      We have formulated our statement more cautiously. However, we are convinced that membrane exchange cannot physically work without functional polarization of the pocket. We know that Rab11, for example, is not evenly distributed on the pocket. By the way, in Boehm et al. (2017) the exocyst is not shown to cover the entire pocket (as shown in Supplementary Video 1).

      We now refer to Boehm et al. (Lines 700 – 703):

      “Boehm et al (2017) report that in the flagellar pocket endocytic and exocytic sites are in close proximity but do not overlap. We further suggest that the fusion of EXCs with the flagellar pocket membrane and clathrin-mediated endocytosis take place on different sites of the pocket. This disparity explains the lower colocalization between TbRab11 and TbRab5A.”

      Line 735 - link to data not previously mentioned I think. When I looked at this data I couldn't find a key to explain what all the different colours related to.

      We have removed the previous supplementary movies 2 and 3. We now reference supplementary movie 1 in the results section.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Du et al. address the cell cycle-dependent clearance of misfolded protein aggregates mediated by the endoplasmic reticulum (ER) associated Hsp70 chaperone family and ER reorganisation. The observations are interesting and impactful to the field.

      Strength:

      The manuscript addresses the connection between the clearance of misfolded protein aggregates and the cell cycle using a proteostasis reporter targeted to ER in multiple cell lines. Through imaging and some biochemical assays, they establish the role of BiP, an Hsp70 family chaperone, and Cdk1 inactivation in aggregate clearance upon mitotic exit.

      Furthermore, the authors present an initial analysis of the role of ER reorganisation in this clearance. These are important correlations and could have implications for ageing-associated pathologies. Overall, the results are convincing and impactful to the field.

      Weakness:

      The manuscript still lacks a mechanistic understanding of aggregate clearance. Even though the authors have provided the role of different cellular components, such as BiP, Cdk1 and ATL2/3 through specific inhibitors, at least an outline establishing the sequence of events leading to clearance is missing. Moreover, the authors show that the levels of ER-FlucDM-eGFP do not change significantly throughout the cell cycle, indicating that protein degradation is not in play. Therefore, addressing/elaborating on the mechanism of disassembly can add value to the work. Also, the physiological relevance of aggregate clearance upon mitotic exit has not been tested, nor have the cellular targets of this mode of clearance been identified or discussed.

      Thank you for your suggestions. 

      We have added descriptions about the sequence of events leading to clearance in the abstract (line 33) and discussion (line 316). 

      We have commented on the future work that could address the molecular mechanisms behind the aggregate clearance in the discussion (line 388). 

      It has been difficult to address the physiological relevance of aggregate clearance during cell division, as the inhibition of BiP or the depletion of ATL2/3, both of which prevent aggregate clearance, causes cellular consequences that are not specific to aggregate clearance. Future work that leads to an understanding of aggregate clearance at the molecular level may allow us to address this more specifically. Furthermore, based on the proteomic analysis, we have commented on the potential defects that could arise from perturbed cellular health in cells expressing ER-FlucDM-eGFP (line 359).

      To identify pathological targets that undergo clearance in the same way as ER-FlucDM-eGFP, we tested three pathological mutants (CFTR-∆F508, AAT S and Z variants) that are known to misfold and accumulate in the ER. Unfortunately, expression of these mutants did not result in the confinement of aggregates in the nucleus. The data related to this have been added as Figure S1E and S1F (line 102) in this revised manuscript. We have also commented in the discussion that pathological targets are yet to be identified and could be a part of future work (line 392).

      Reviewer #2 (Public review):

      This paper describes an interesting observation that ER-targeted misfolded proteins are trapped within vesicles inside the nucleus to facilitate quality control during cell division. This work supports the concept that transient sequestration of misfolded proteins is a fundamental mechanism of protein quality control. The authors satisfactorily addressed several points raised in the review of the first submission. The manuscript is improved but is still unable to fully address the mechanisms.

      Strengths:

      The observations in this manuscript are very interesting and open up many questions on proteostasis biology.

      Weaknesses:

      Despite inclusions of several protein-level experiments, the manuscript remained a microscopy-driven work and missed the opportunity to work out the mechanisms behind the observations.

      Thank you for your suggestions. We believe that our study has provided a genetic basis for the involvement of ER reorganization and BiP during cell division in aggregate clearance, which is a new observation. We have also commented in this revised manuscript about the future work that could address the molecular mechanisms behind the aggregate clearance in the discussion (line 388).  

      Reviewer #3 (Public review):

      This paper describes a new mechanism for the clearance of protein aggregates associated to endoplasmic reticulum re-organization that occurs during mitosis.

      Experimental data showing clearance of protein aggregates during mitosis is solid, statistically significant, and very interesting. The authors performed several new experiments, included in the revised version, to address the concerns raised by reviewers: a new proteomic analysis; co-localization of the aggregates with the ER membrane protein Sec61beta; the demonstration that expression of the aggregate-prone protein in the nucleus does not result in accumulation of aggregates; detection of protein aggregates in the insoluble fraction after cell disruption; and, most importantly, the finding that knockdown of ATL proteins involved in the organization of ER shape and structure impaired the clearance mechanism. This last observation addresses one of the weakest points of the original version which was the lack of experimental correlation between ER structure capability to re-shape and the clearance mechanism.

      In conclusion, this new mechanism of protein aggregate clearance from the ER was not completely understood in this work but the manuscript presented, particularly in the revised version, an ensemble of solid observations and mechanistic information to scaffold future studies that clarify more details of this mechanism. As stated by the authors: "How protein aggregates are targeted and assembled into the intranuclear membranous structure waits for future investigation". This new mechanism of aggregate clearance from the ER is not expected to be fully understood in a single work but this paper may constitute one step to better comprehend the cell capability to resolve protein aggregates in different cell compartments.

      We thank the reviewer for the comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The manuscript presents a very interesting set of observations that could have significant implications on age-related protein misfolding and aggregate clearance. There are a few places in the manuscript that still need more clarity. Some are listed below, which I think can improve the manuscript.

      - The new data associated with proteomic analysis is appreciated, but the information gained has not been explored or elaborated sufficiently in the manuscript. Based on the differential expression of cell cycle proteins, how the authors interpret cellular health is unclear. Also, the physiological role of this mode of aggregate clearance remains unclear.

      We have added our interpretation of perturbed cellular health in cells expressing ER-FlucDM-eGFP in the discussion (line 359).

      It has been difficult to address the physiological relevance of aggregate clearance during cell division, as the inhibition of BiP or the depletion of ATL2/3, both of which prevent aggregate clearance, causes cellular consequences that are not specific to aggregate clearance. Future work that leads to an understanding of aggregate clearance at the molecular level may allow us to address this more specifically.

      - In Figure 3A, have the authors measured the total GFP intensity from interphase through early G1? Even though the number and area of the aggregates decrease significantly, the cytoplasmic GFP signal does not seem to increase. Considering new CHX chase experiments and total Fluorescence intensity calculations (Figure S7D), which indicate no difference, one would expect an increase in cytoplasmic signal upon the disassembly of aggregates. Therefore, the data from Figures 3A and 7D seem contradictory. Can the authors please explain?

      We apologize for the confusion. The images in Figure 3A were derived from fixed cells, so a different cell is shown for each cell cycle phase and the images are not suitable for quantification. Fluorescence intensity changes can be better appreciated in Figure 3C or 4D, as these are time-lapse microscopy images of live cells progressing through mitosis and cytokinesis. The data used to quantify fluorescence intensity in Figure S7D were derived from live cells imaged at specific time points to avoid unwanted fluorescence bleaching during time-lapse microscopy.

      - Do the authors expect a similar clearance of pathological aggregates such as mutant FUS or TDP43 condensates? Showing aggregate disassembly of disease-relevant aggregates would be an excellent addition to the manuscript, but it might be beyond the scope of the current version. However, the authors can comment/speculate how their study might extend to pathological condensates.

      We tested three pathological mutants (CFTR-∆F508, AAT S and Z variants) that are known to misfold and accumulate in the ER. Unfortunately, expression of these mutants did not result in the confinement of aggregates in the nucleus. The data related to this have been added as Figure S1E and S1F (line 102) in this revised manuscript. We have commented that pathological targets are yet to be identified and could be a part of future work (line 392).

      - The presence of ER membrane around these aggregates is an interesting observation. This membrane is retained even after nuclear membrane breakdown. What could be the relevance of membrane-bound aggregates, especially since the membrane can limit the access of chaperones involved in disassembly? This observation becomes more important since the depletion of ER membrane fusion proteins also leads to the accumulation of aggregates. Are the membranes a beacon for disassembly? The authors may comment/ speculate. This could also be an important aspect of the mechanism of clearance.

      We think that the ER membranes around the aggregates are disassembled when the ER networks reorganize during mitotic exit and this may allow accessibility of BiP to disaggregate the aggregates. We have added this in the discussion (line 316).

    1. eLife Assessment

      This important manuscript uses circuit mapping, chemogenetics, and optogenetics to demonstrate a novel hippocampal lateral septal circuit that regulates social novelty behaviours and shows that downstream of the hippocampal septal circuit, septal projections to the ventral tegmental area are necessary for general novelty discrimination. The strength of the evidence supporting the claims is convincing but would be strengthened by the inclusion of additional functional assays. The work will be of interest to systems and behavioural neuroscientists who are interested in the brain mechanisms of social behaviours.

      We thank the reviewers for their thoughtful and constructive feedback. We are excited that both reviewers thought that the manuscript was of “interest to specialists in the field and to the broad readership of the journal”, that the paper was “well-written and logically organized” and that the “study opens an avenue to study these circuits further to uncover the plasticity and synaptic mechanisms regulating social novelty preference.” Additionally, the reviewers wrote that the experiments were “well-designed” “with clever controls and conditions to provide compelling evidence for their conclusion.” The reviewers additionally provided constructive feedback, which we address in our responses below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study investigated the neural circuits underlying social novelty preference in mice. Using viral circuit tracing, chemogenetics, and optogenetics in the vHPC, LS, and VTA, the authors found that vHPC to LS projections may contribute to the salience of social novelty investigations. In addition, the authors identify LS projections to the VTA involved in social novelty and familiar food responses. Finally, via viral tracing, they demonstrate that vHPC-LS neurons may establish direct monosynaptic connections with VTA dopaminergic neurons. The experiments are well-designed, and the conclusions are mostly very clear. The manuscript is well-written and logically organized, and the content will be of interest to specialists in the field and to the broad readership of the journal.

      Strengths:

      (1) The vHPC has been involved in social memory for novel and familiar conspecifics. Yet, how the vHPC conveys this information to drive motivation for novel social investigations remains unclear. The authors identified a pathway from the vHPC to the LS and eventually the VTA, that may be involved in this process.

      (2) Mice became familiar with a novel conspecific by co-housing for 72h. This represents a familiarization session with a longer duration as compared to previous literature. Using this new protocol, the authors found robust social novelty preference when animals were given a choice between a novel and familiar conspecific.

      (3) The effects of vHPC-LS inhibition are specific to novel social stimuli. The authors included novel food and novel object control experiments and those were not affected by neuronal manipulations.

      (4) For optogenetic studies, the authors applied closed-loop photoinhibition only when the animals investigated either the novel conspecific or the familiar. This optogenetic approach allowed for the investigation of functional manipulations to selective novel or familiar stimuli approaches.

      Weaknesses:

      (1) The abstract and the overall manuscript pose that the authors identified a novel vHPC-LS-VTA pathway that is necessary for mice to preferentially investigate novel conspecifics. However, the authors assessed the functional manipulations of vHPC-LS and LS-VTA circuits independently and the sentence could be misleading. Therefore, a viral strategy specifically designed to target the vHPC-LS-VTA circuit combined with optogenetic/chemogenetic tools and behavior may be necessary for the statement of this conclusion.

      The reviewer raises an important point. Although Figure 3 shows that vHPC (vCA1 and vCA3) is the source of the greatest number of monosynaptic inputs onto LS-VTA neurons, we did not perform any experiments that specifically manipulated vHPC neurons that project to LS-VTA neurons. While these experiments would be extremely interesting, they are technically challenging and beyond the scope of this study.

      (2) The authors combined males and females in their analysis, as neural circuit manipulation affected novelty discrimination ratios in both sexes. However, supplementary Figure 1 demonstrates that chemogenetic inhibition of the vHPC-LS circuit may cause stronger effects in male mice as compared to females.

      The reviewer makes an interesting point. We can confirm that we found no significant differences in the effectiveness of our vHPC-LS inhibition between the males and females (2-factor ANOVA with sex (male/female) and drug condition (saline/CNO) as factors on the discrimination scores of hM4Di expressing animals: interaction p=0.2241, sex: p=0.1233, drug condition: p=0.0166). These data suggest that there are no significant sex differences in the effectiveness of inhibition of the vHPC-LS neurons.

      (3) In most experiments, the same animals were used for social novelty preference, for food or object novelty responses but washout periods between experiments are not mentioned in the methods section. In this line, the authors did not mention the time frame between the closed-loop optogenetic experiments that silenced the vHPC-LS only during familiar and then only novel social investigations. When using the same animals tested for social experiments in the same context there may be an effect of context-dependent social behaviors that could affect future outcomes.

      We thank the reviewer for this important clarification. We apologize for not including these crucial details in our Methods section. For both the chemogenetic and optogenetic inhibition experiments, all conditions were separated by a minimum of 24 hours. In the chemogenetic inhibition experiments, saline and CNO conditions were counterbalanced between animals. Similarly, we counterbalanced the order of light ON vs light OFF conditions across animals during our optogenetic inhibition experiments.
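
      The counterbalancing described above can be illustrated with a minimal sketch. The animal IDs and the function name `counterbalance` are hypothetical and are not taken from the study; the sketch only shows one simple way to alternate the order of two conditions across animals so that each order is used equally often. Sessions for a given animal would still be run at least 24 hours apart, as stated in the response.

```python
from itertools import cycle


def counterbalance(animal_ids, orders):
    """Assign each animal one condition order, rotating through the given orders."""
    order_cycle = cycle(orders)
    return {animal: next(order_cycle) for animal in animal_ids}


# Hypothetical animal IDs; the two possible orders of the chemogenetic conditions.
animals = ["m1", "m2", "m3", "m4", "m5", "m6"]
orders = [("saline", "CNO"), ("CNO", "saline")]

schedule = counterbalance(animals, orders)
for animal, order in schedule.items():
    print(animal, "->", " then ".join(order))
```

      The same helper would apply to the optogenetic sessions by passing the light ON and light OFF conditions as the two orders.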

      (4) All the experiments were performed in a non-cell-type-specific manner. The viral strategies used targeted multiple neuronal subpopulations that could have divergent effects on social novelty preference. This constraint could be added in the discussion section.

      The reviewer raises an important point. In our study, while we specifically manipulate projection populations (either vHPC-LS or LS-VTA), it is possible that these projection populations themselves are composed of heterogeneous cell types. It would be an interesting direction of study to pursue in the future.

      (5) The authors' assumptions were all based on experiments of necessity. The authors could use an experiment of sufficiency by targeting for instance the LS-VTA circuit and assess if animals reduce novel social investigations with LS-VTA photostimulation.

      We agree with the reviewers that it would be interesting to determine if LS-VTA neurons are sufficient, in addition to being necessary, to drive social novelty. These will be interesting experiments to pursue in the future.

      Reviewer #2 (Public Review):

      Summary:

      Rashid and colleagues demonstrate a novel hippocampal lateral septal circuit that is important for social recognition and drives the exploration of novel conspecifics. Their study spans from neural tracing to closed-loop optogenetic experiments with clever controls and conditions to provide compelling evidence for their conclusion. They demonstrate that downstream of the hippocampal septal circuit, septal projections to the ventral tegmental area are necessary for general novelty discrimination. The study opens an avenue to study these circuits further to uncover the plasticity and synaptic mechanisms regulating social novelty preference.

      Strengths:

      Chemogenetic and optogenetic experiments have excellent behavioral controls. The synaptic tracing provides important information that informs the narrative of experiments presented and invites future studies to investigate the effects of septal input on dopaminergic activity.

      Weaknesses:

      Important methodological details are unclear for the circuit manipulation experiments and for analyses where multiple measures are needed but missing. Based on the legends, the chemogenetic experiment is done in a within-animal design; that is, the same mouse receives SAL and CNO. However, the data is not presented in a within-animal manner such that we can distinguish if the behavior of the same animal changes with drug treatment. Similarly, the methods specify that the optogenetic manipulations were done in three different conditions, but the analyses do not report within-animal changes across conditions nor account for multiple measures within subjects.

      Thank you for raising this important point. We agree that a repeated measures ANOVA would be ideal, but there is sufficient behavioral variability that such analyses will be difficult without very large sample sizes.

      Finally, it is unclear if the order of drug treatment and conditions were counterbalanced across subjects.

      As mentioned in the above response to Reviewer 1, for both the chemogenetic and optogenetic inhibition experiments, all conditions were separated by a minimum of 24 hours and we counterbalanced the order of chemogenetic (saline/CNO) and optogenetic (light ON/light OFF) experimental manipulations across animals.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigates the potential of targeting specific regions within the RNA genome of the Porcine Epidemic Diarrhea Virus (PEDV) for antiviral drug development. The authors used SHAPE-MaP to analyze the structure of the PEDV RNA genome in infected cells. They categorized different regions of the genome based on their structural characteristics, focusing on those that might be good targets for drugs or small interfering RNAs (siRNAs).

      They found that dynamic single-stranded regions can be stabilized by compounds (e.g., to form G-quadruplexes), which inhibit viral proliferation. They demonstrated this by targeting a specific G4-forming sequence with a compound called Braco-19. The authors also describe stable (structured) single-stranded regions that they used to design siRNAs showing that they effectively inhibited viral replication.

      Strengths:

      There are a number of strengths to highlight in this manuscript.

      (1) The study uses a sophisticated technique (SHAPE-MaP) to analyze the PEDV RNA genome in situ, providing valuable insights into its structural features.

      (2) The authors provide a strong rationale for targeting specific RNA structures for antiviral development.

      (3) The study includes a range of experiments, including structural analysis, compound screening, siRNA design, and viral proliferation assays, to support their conclusions.

      (4) Finally, the findings have potential implications for the development of new antiviral therapies against PEDV and other RNA viruses.

      Overall, this interesting study highlights the importance of considering RNA structure when designing antiviral therapies and provides a compelling strategy for identifying promising RNA targets in viral genomes.

      Weaknesses:

      I have some concerns about the utility of the 3D analyses, the effects of their synonymous mutants on expression/proliferation, a potentially missed control for studies of mutants, and the therapeutic utility of the compound they tested vs. G-quadruplexes.

      We thank the reviewer for their positive assessment and insightful comments. Below, we address each point of concern:

      (1) The utility of the 3D analyses:

      In the revised manuscript, we have toned down this discussion and moved Figure 3A to the supplementary materials to reduce any sense of fragmentation in the overall story. While SHAPE-MaP technology is mature and convenient to use and can indeed capture some RNA structural elements with special functions in certain cases, we acknowledge that its application for 3D analyses requires further validation. We believe this approach will become more prevalent in future research.

      (2) The effects of synonymous mutants on expression/proliferation:

      In the PEDV genome, the PQS1 mutation site encodes lysine (AAG). Given that lysine has only two codons (AAG and AAA), the G3109A synonymous mutation represented our sole viable option. Published studies (Ding et al., 2024) confirm that neither AAG nor AAA are classified as rare or dominant codons in mammalian cells. Therefore, the observed changes in viral proliferation levels are likely to stem from alterations in RNA secondary structure rather than codon usage effects.

      REFERENCES:

      Ding W, Yu W, Chen Y, et al. Rare codon recoding for efficient noncanonical amino acid incorporation in mammalian cells. Science. 2024;384(6700):1134-1142. 

      (3) Potentially missed control for studies of mutants:

      In the revised manuscript, we have incorporated additional control experiments evaluating Braco-19's therapeutic effects on the PQS3 mutant strain (Figure 4 – figure supplement 3).

      (4) The therapeutic utility of Braco-19 vs. G-quadruplexes:

      While Braco-19 is indeed a broad-spectrum G4 ligand, our data clearly show that not all PQSs in the viral genome can form G4 structures. Our findings primarily provide proof-of-concept that sequences with high G4-forming potential in viral genomes represent viable targets for antiviral therapy. Future studies could leverage SHAPE-guided structural insights to design ligands with enhanced specificity for viral G4s, potentially improving therapeutic utility while minimizing off-target effects.

      Reviewer #2 (Public review):

      Summary:

      Luo et al. use SHAPE-MaP to find suitable RNA targets in Porcine Epidemic Diarrhoea Virus. Results show that dynamic and transient structures are good targets for small molecules, and that exposed strand regions are adequate targets for siRNA. This work is important to segment the RNA targeting.

      Strengths:

      This work is well done and the data supports its findings and conclusions. When possible, more than one technique was used to confirm some of the findings.

      Weaknesses:

      The study uses a cell line that is not porcine (not the natural target of the virus).

      We thank the reviewer for their insightful comments and recognition of our study's value. The most commonly employed cell models for in vitro PEDV studies are monkey-derived Vero E6 cells and porcine PK1 cells. However, PEDV (particularly our strain) exhibits significantly lower replication efficiency in PK1 cells compared to Vero cells, and no cytopathic effects were observed in PK1 cells. In our preliminary attempts to perform SHAPE-MaP experiments using infected PK1 cells, the sequencing data showed less than 0.03% alignment to the PEDV genome, rendering subsequent analysis and downstream experiments unfeasible.

      Reviewer #3 (Public review):

      Summary:

      This manuscript by Luo et al. applied SHAPE-MaP to analyze the secondary structure of the Porcine Epidemic Diarrhoea Virus (PEDV) RNA genome in infected cells. By combining SHAPE reactivity and Shannon entropy, the study indicated that the folding of the PEDV genomic RNA was nonuniform, with the 5' and 3' untranslated regions being more compactly structured, which revealed potentially antiviral targetable RNA regions. Interestingly, the study also suggested that compounds bound to well-folded RNA structures in vitro did not necessarily exhibit antiviral activity in cells, because the binding of these compounds did not necessarily alter the functions of the well-folded RNA regions. Later in the manuscript, the authors focus on guanine-rich regions, which may form G-quadruplexes and be potential targets for small interfering RNA (siRNA). The manuscript shows the binding effect of Braco-19 (a G-quadruplex-binding ligand) to a predicted G4 region in vitro, along with the inhibition of PEDV proliferation in cells. This suggests that targeting high SHAPE-high Shannon G4 regions could be a promising approach against RNA viruses. Lastly, the manuscript identifies 73 single-stranded regions with high SHAPE and low Shannon entropy, which demonstrated high success in antiviral siRNA targeting.

      Strengths:

      The paper presents valuable data for the community. Additionally, the experimental design and data analysis are well documented.

      Weakness:

      The manuscript presents the effect of Braco-19 on PQS1, a single G4 region with high SHAPE and high Shannon entropy, to suggest that "the compound can selectively target the PQS1 of the high SHAPE-high Shannon region in cells" (lines 625-626). While the effect of Braco-19 on PQS1 is supported by strong evidence in the manuscript, the conclusion regarding the G4 region with high SHAPE and high Shannon entropy is based on a single target, PQS1.

      We thank the reviewer for their positive assessment of our methodology and dataset. We propose that dynamic RNA structures in high SHAPE-high Shannon regions, when stabilized by small molecules, can serve as viable targets for antiviral therapy. G-quadruplexes represent a characteristic type of such dynamic structures that compete with local stem-loop formations in the genome. While we identified seven highly conserved PQSs in the PEDV genome, only PQS1 was located within a high SHAPE-high Shannon region. To further validate this concept, we have supplemented the revised manuscript with Thioflavin T (ThT) fluorescence turn-on assays (Figures 3D, 3E, and Figure 3 – figure supplement 6), which provide additional evidence for the differential G4-forming capabilities of PQSs across regions with distinct structural features.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major Comments:

      (1) It could be valuable for the authors to spend some more effort comparing their approach to siRNA target discovery and design to current methods for siRNA design. It would be good to highlight which components are novel, and which might offer superior performance with respect to other existing methods.

      We thank the reviewer for highlighting this important point. In response, we have rewritten the relevant section in the discussion:

      “Our approach uniquely integrates in situ RNA structural data (SHAPE reactivity and Shannon entropy) to prioritize siRNA targets within stable single-stranded regions (high SHAPE reactivity, low Shannon entropy), which are experimentally validated as accessible in infected cells. This represents a significant departure from traditional siRNA design methods that rely primarily on sequence conservation, thermodynamic rules (e.g., Tuschl rules), or in vitro structural predictions (Ali Zaidi et al., 2023; Qureshi et al., 2018; Tang and Khvorova, 2024), which may not accurately reflect intracellular RNA accessibility. Bowden-Reid et al. designed 39 antiviral siRNAs against various SARS-CoV-2 variants based on sequence conservation, ultimately identifying 8 highly effective sequences (Bowden-Reid et al., 2023). Notably, five of these effective sequences targeted regions that were located in high SHAPE-high Shannon regions according to SARS-CoV-2 SHAPE datasets (Supplementary Table 8) (Manfredonia et al., 2020). This independent finding aligns perfectly with our conclusions and demonstrates that SHAPE-based siRNA design outperforms sequence/structure-agnostic approaches, at least in terms of significantly improving antiviral siRNA screening efficiency. Given the growing availability of SHAPE datasets for numerous viruses, we are confident that our methodology will facilitate more precise design of antiviral siRNAs.”
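
      The region categories used throughout this response can be sketched as a simple two-threshold classification. This is an illustrative sketch only: the cutoff values, the example numbers, and the function name `classify_region` are assumptions made for the example and are not the thresholds or code used in the manuscript.

```python
def classify_region(shape, shannon, shape_cutoff, shannon_cutoff):
    """Label a region from its SHAPE reactivity and Shannon entropy.

    The cutoffs are hypothetical inputs (for example, genome-wide medians);
    the manuscript's actual thresholds are not reproduced here.
    """
    high_shape = shape > shape_cutoff
    high_shannon = shannon > shannon_cutoff
    if high_shape and not high_shannon:
        # Stable single-stranded region: prioritized for siRNA targeting.
        return "high SHAPE-low Shannon"
    if high_shape and high_shannon:
        # Dynamic region: candidate for stabilization by small molecules.
        return "high SHAPE-high Shannon"
    if not high_shannon:
        # Well-defined, stably folded structure.
        return "low SHAPE-low Shannon"
    return "low SHAPE-high Shannon"


# Hypothetical per-region values: (SHAPE reactivity, Shannon entropy).
regions = {"r1": (0.9, 0.02), "r2": (0.8, 0.30), "r3": (0.1, 0.01)}
labels = {
    name: classify_region(shape, shannon, shape_cutoff=0.5, shannon_cutoff=0.1)
    for name, (shape, shannon) in regions.items()
}
print(labels)
```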

      (2) The section targeting their discovered G4 structure with Braco-19 is interesting, particularly showing effects on viral proliferation; however, it's not clear to me how this compound could be used therapeutically against PEDV, as it is a non-selective binder of G4 structures. Their results are good support for the presence and functionality of a G4 structure in PEDV, but I don't see any strategy outlined in the manuscript on how this could be specifically targeted with Braco-19.

      While Braco-19 is indeed a broad-spectrum G4 ligand, our data demonstrate that not all PQSs in the viral genome can form G4 structures under physiological conditions. Our results specifically show that Braco-19 exerts its anti-PEDV activity by targeting PQS1, which is located in a high SHAPE-high Shannon entropy region. This target specificity was further confirmed by the complete resistance of the PQS1mut strain (lacking G4-forming ability) to Braco-19 treatment in our in vitro assays. 

      Additionally, previous studies have reported that during rapid viral replication, viral RNA accumulates to levels that significantly exceed host RNA concentrations. This "concentration advantage" suggests that G4 ligands like Braco-19 would preferentially bind viral G4 structures over host targets, thereby enhancing their antiviral specificity in vivo. In summary, our data provide proof-of-concept that viral genomic regions with high G4-forming potential - particularly those in high SHAPE-high Shannon entropy regions - represent promising targets for antiviral therapy.

      (3) The section where they proposed 3D RNA structures based on sequence similarity feels "tacked on" and I don't see how it adds to the overall story. The authors identify a short RNA hairpin in the PEDV genome with some sequence similarity to the CPEB3 nuclease P4 hairpin. However, they don't provide any evidence that this motif functions in a similar way or that it's important for the virus's life cycle. They also don't explain how this similarity could be exploited for antiviral drug development. It's not clear whether targeting this motif would have any effect on the virus. It's interesting that these two sequences share nucleotides, but it's unlikely that they share any homology...perhaps they convergently evolved (or were captured), but the similarity could also be coincidental.

      We appreciate the reviewer's insightful observation regarding this section. While our intention was to demonstrate that flexible conformations in high SHAPE-high Shannon regions could potentially be targeted, we acknowledge that extensive discussion of these motifs' functions would exceed the scope of this study, resulting in some disconnection from the main narrative. In response to this valuable feedback, we have therefore removed this section from the manuscript.

      (4) The authors should consider the optimality of the synonymous mutation (G3109A) that they introduced, as G3109A could swap a rare codon for a more optimal one. Even though the protein sequence is unaffected, the translation rate (and ability to proliferate) could be very different due to altered codon optimality. Additionally, to show the inactivity of the PQS3 mutant, the Braco-19 treatment studies performed on the PQS1 mutants could be repeated with PQS3 - using this as a control for these experiments.

      We appreciate the reviewer's insightful comment regarding codon optimization. In the PEDV genome, the PQS1 mutation site encodes lysine (AAG). Since lysine has only two codons (AAG and AAA), the G3109A synonymous mutation was our only viable option. Published literature (Ding et al. 2024) confirms that neither AAG nor AAA is classified as a preferred or a rare codon in mammalian cells. Therefore, this substitution should have minimal direct impact on translation efficiency. Compared to nonsynonymous mutations that would alter amino acid sequences, we believe this synonymous mutation represents the optimal approach for maintaining native protein function while introducing the desired structural modification.

      REFERENCES:

      Ding W, Yu W, Chen Y, et al. Rare codon recoding for efficient noncanonical amino acid incorporation in mammalian cells. Science. 2024;384(6700):1134-1142.
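
      The lysine codon argument above can be checked with a few lines of code. The following is a minimal sketch, assuming the standard genetic code; the function names are hypothetical and are not taken from the manuscript.

```python
# Lysine codons in the standard genetic code; there are only these two.
LYSINE_CODONS = frozenset({"AAA", "AAG"})


def differing_positions(codon_a, codon_b):
    """Return the 0-based positions at which two codons differ."""
    if len(codon_a) != 3 or len(codon_b) != 3:
        raise ValueError("codons must be three nucleotides long")
    pairs = zip(codon_a.upper(), codon_b.upper())
    return [index for index, (a, b) in enumerate(pairs) if a != b]


def synonymous_lysine_alternatives(codon):
    """Return the other lysine codon(s) available for a synonymous change."""
    codon = codon.upper()
    if codon not in LYSINE_CODONS:
        raise ValueError(f"{codon} does not encode lysine")
    return sorted(LYSINE_CODONS - {codon})


# Starting from AAG, list the synonymous options and where they differ.
alternatives = synonymous_lysine_alternatives("AAG")
changed = differing_positions("AAG", alternatives[0])
```

      Starting from AAG, the only synonymous alternative is AAA, and the two codons differ only at the third base, which is consistent with a single G-to-A substitution.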

      In the revised version, we have added control experiments showing the inhibitory activity of Braco-19 against the PQS3 mutant strain (Figure 4—figure supplement 3C) and discussed it in the results section.

      “Furthermore, as a control, we observed nearly identical inhibitory activity of Braco-19 against both the PQS3 mutant strain (AJ1102-PQS3mut) and wild-type virus (Figure 4—figure supplement 3C), demonstrating the specificity of Braco-19's action on PQS1.”

      Minor Comments:

      (5) The authors' description of the Shannon Entropy could be improved. The current description makes it seem like the Shannon Entropy only provides information on base pairing, however, the Shannon entropy quantifies the uncertainty of structural states at each position and is calculated based on the probabilities of the different states (paired or unpaired) that a nucleotide can adopt.

      We have revised the description of Shannon entropy in the manuscript:

      "The pairing probability of each nucleotide derived from SHAPE reactivities was subsequently used to calculate Shannon entropy. Regions with high Shannon entropy may adopt alternative conformations, while those with low Shannon entropy correspond to either well-defined RNA structures or persistently single-stranded regions (Mathews, 2004; Siegfried et al., 2014)."

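      The reviewer's description of Shannon entropy over the paired and unpaired states can be illustrated as follows. This is a simplified two-state sketch, assuming each nucleotide has a single pairing probability and that entropy is reported in bits; it is not the authors' pipeline, and published SHAPE workflows may instead sum over all possible pairing partners or use a different logarithm base.

```python
import math


def two_state_entropy(p_paired):
    """Shannon entropy (bits) of a nucleotide paired with probability p_paired."""
    if not 0.0 <= p_paired <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    entropy = 0.0
    for p in (p_paired, 1.0 - p_paired):
        if p > 0.0:  # the term 0 * log(0) is taken to be 0
            entropy -= p * math.log2(p)
    return entropy


def entropy_profile(pairing_probabilities):
    """Per-nucleotide entropy for a sequence of pairing probabilities."""
    return [two_state_entropy(p) for p in pairing_probabilities]


# Hypothetical probabilities: always unpaired, ambiguous, always paired.
profile = entropy_profile([0.0, 0.5, 1.0])
```

      Entropy is zero both for a nucleotide that is always paired and for one that is always unpaired, which is why low entropy alone cannot distinguish a well-defined structure from a persistently single-stranded region.
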
      (6) The overall writing of the manuscript is very good, but there are some minor grammatical issues throughout, e.g., here are some of the ones that I caught:

      a) Lines 71-3: "various types of RNA structures such as hairpin structure, RNA single-strand, RNA pseudoknot and RNA G-quadruplex (G4)" - the examples should be plural and, rather than "hairpins" (or in addition), perhaps add "helixes" to be more generically correct(?).

      We have revised the relevant description: 

      "various types of RNA structures such as stem-loop structures (with double-helical stems), RNA single-strand, RNA pseudoknot and RNA G-quadruplex (G4)"

      b) Lines 74-5: "Of these, RNA G4 has shown considerable promise because of the high stability and modulation by small molecules" should be "Of these, RNA G4 has shown considerable promise because of its high stability and ability for modulation by small molecules."

      We have revised the sentence:

      “Of these, RNA G4 has shown considerable promise because of its high stability and ability for modulation by small molecules.”

      c) Line 76: "have" should be "has".

      We have revised the sentence.

      d) Lines 104-5 (and elsewhere): "frameshift stimulation element (FSE)" should be "frameshift stimulatory element (FSE)".

      We have revised the sentence.

      e) Lines 428-9: "following the Manfredonia's methods" should be "following Manfredonia's method" or "following the Manfredonia method".

      We have made the appropriate edit.

      These edits ensure grammatical accuracy and consistency with standard scientific terminology. We appreciate the reviewer's attention to detail, which has significantly improved the clarity of our manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) There are some important references missing, on shape-seq from Julius Lucks.

      We have added citations to the foundational work by Lucks et al. (2011, PNAS) that pioneered in vitro RNA structure probing using SHAPE-seq.

      (2) Describe the acronym "SHAPE".

      We have now included the full name of SHAPE: “Selective 2’-Hydroxyl Acylation and Primer Extension”.

      (3) Line 81: 2"-hydroxyl-selective - the prime is incorrect.

      We thank the reviewer for catching this technical error. We have corrected "2"-hydroxyl" to "2'-hydroxyl".

      (4) Explaining a bit better how shape reagent works would be beneficial (one sentence should suffice).

      We have revised the Introduction section:

      “SHAPE reagents like NAI selectively modify flexible, unpaired 2′-OH groups in RNA, and these modifications are detected as mutations during reverse transcription, enabling precise mapping of RNA secondary structures through sequencing.”

      (5) Line 128: cite the paper that introduced NAI.

      We have now properly cited the original publication introducing NAI (Spitale et al., 2012).

      (6) Line 243: Can you describe what the compound is?

      The compound is Braco-19. This has now been included in the methods section. 

      (7) Line 272: describe what 3Dpol is and the source of it.

      We have supplemented the relevant information as follows:

      "3Dpol (recombinant RNA-dependent RNA polymerase; Abcam, ab277617, 0.02 mg/reaction)"

      (8) Figure 1 legend: For both C and D, the explanation of the G4 structure and the RISC complex should be added, otherwise, it becomes unclear why they are there.

      We have revised the captions for Figure 1 as follows:

      "(A) Well-folded regions (low SHAPE reactivity and low Shannon entropy; 26.40% of genome). These regions represent stably folded RNA structures with minimal conformational flexibility, likely serving as structural scaffolds or functional elements in viral replication. (B) Dynamic structured regions (low SHAPE reactivity and high Shannon entropy; 11.70% of genome). These conformationally plastic domains likely mediate regulatory switches between alternative secondary structures during infection. (C) Dynamic unpaired regions (high SHAPE reactivity and high Shannon entropy; 26.90% of genome). These regions are prone to form non-canonical nucleic acid structures (e.g., G-quadruplexes), which can be stabilized by small-molecule ligands to inhibit viral replication. (D) Persistent unpaired regions (high SHAPE reactivity and low Shannon entropy; 9.67% of genome). These regions are more accessible for siRNA binding, facilitating recruitment of Argonaute proteins and Dicer to form the RNA-induced silencing complex (RISC) for targeted cleavage."

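      The four categories in the revised legend amount to a two-threshold classification on SHAPE reactivity and Shannon entropy. The sketch below illustrates that logic only. The cutoff values are hypothetical placeholders, since the thresholds used in the study are not stated in this response, and the function name is an assumption.

```python
def classify_region(shape, entropy, shape_cutoff, entropy_cutoff):
    """Assign a region to one of the four SHAPE/entropy categories."""
    high_shape = shape > shape_cutoff
    high_entropy = entropy > entropy_cutoff
    if not high_shape and not high_entropy:
        return "well-folded"
    if not high_shape and high_entropy:
        return "dynamic structured"
    if high_shape and high_entropy:
        return "dynamic unpaired"
    return "persistent unpaired"


# Hypothetical cutoffs, chosen only to make the example concrete.
SHAPE_CUTOFF = 0.5
ENTROPY_CUTOFF = 0.1

label = classify_region(0.8, 0.3, SHAPE_CUTOFF, ENTROPY_CUTOFF)
```

      With these placeholder cutoffs, a region with high reactivity and high entropy falls into the "dynamic unpaired" category, the class the legend associates with G-quadruplex formation.
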
      (9) Figure S2 panel A should be in Figure 1. This is a nice picture showing the backbone of the research.

      In the revised manuscript, we have reorganized Figure 1 and Figure S2 by incorporating the SHAPE-MaP workflow diagram (previously Figure S2A) into Figure 1 as panel (A).

      (10) Please add the citation to Braco-19.

      We have now added the appropriate citation for Braco-19 (Gowan et al., 2002) in the revised manuscript.

      (11) Figure 5 legend: could you add in parentheses what ds means (and call Figure S28).

      We appreciate the reviewer's attention to detail. In the revised manuscript, we have clarified the abbreviations in the Figure 5 legend: ss (single-stranded targeting siRNAs); ds (dual-stranded targeting siRNAs). 

      (12) Line 107: I would argue that the "stabilization of a G4" inhibited viral proliferation. And that supports the point of the paper, that a small molecule that stabilizes the G4 can be used to reduce viral replication. I suggest emphasizing this throughout the paper.

      We fully concur with the reviewer's insightful perspective. In the revised manuscript, we have comprehensively strengthened the point of 'G4 stabilization' as an antiviral mechanism through the following enhancements:

      (1) In the Results section: We present Thioflavin T (ThT) fluorescence assays demonstrating the G4-forming capability of PQSs in the full-length PEDV genomic RNA context:

      “These findings indicate that although most PQSs can form G4 structures in vitro, PQS1—located in the high SHAPE-high Shannon entropy region—demonstrates the most robust G4-forming capability when competing with local secondary structures in the genomic context.”

      (2) In the Results section: The inclusion of Braco-19 inhibition assays using PQS3 mutant virus as control provides robust evidence that Braco-19 exerts its antiviral effects specifically through PQS1 stabilization:

      “Furthermore, as a control, we observed nearly identical inhibitory activity of Braco-19 against both the PQS3 mutant strain (AJ1102-PQS3mut) and wild-type virus, demonstrating the specificity of Braco-19's action on PQS1.”

      (3) In the Discussion section: We have rewritten the mechanistic interpretation to emphasize: 

      "Crucially, Braco-19 showed no inhibitory activity against the PQS1-mutant strain while maintaining potent activity against the PQS3-mutant strain (Figure 4E, Figure 4—figure supplement 3C). This suggests that the compound can selectively target the PQS1 of the high SHAPE-high Shannon region in cells." 

      (13) For PQS1, it's suggested that it is indeed a competing and transient conformation that forms the G4. I wonder if using an extended PQS1 (perhaps what is shown in Figure 3E) and using fluorescence, and/or K+ vs Li+, and/or in-vitro SHAPE could tell us more about this dynamic structure. Thioflavin T or any other fluorescent molecule that binds to G4s could be easily used to show how the formation of G4 may happen or not. In addition, how Braco-19 could really lock the dynamic structure in-vitro as well. I think the field would benefit from a deeper investigation of it.

      To address the dynamic competition between G4 and alternative RNA conformations, we performed Thioflavin T (ThT) fluorescence turn-on assay (now in Figure 3D-E and Figure 3—figure supplement 6) under physiological K+ conditions (100 mM), with PRRSV-G4 RNA as a positive control. This reads as:

      “To validate whether SHAPE analysis could reflect the competitive conformational folding of PQSs in the PEDV genome, we performed in vitro transcription to obtain local intact structures containing PQSs within dynamic single-stranded regions and stable double-stranded regions (Table S6). Thioflavin T (ThT) fluorescence turn-on assays were conducted under physiological K+ conditions (100 mM), with the G4 sequence of porcine reproductive and respiratory syndrome virus (PRRSV) serving as a positive control (Control-G4) (Fang et al., 2023). The results demonstrated that for short PQS sequences containing only G4-forming motifs (Table S7), PQS1, PQS3, PQS4, and PQS6 all induced significant ThT fluorescence enhancement (Figure 3D-E, Figure 3—figure supplement 6), confirming their ability to form G4 structures. However, in long RNA fragments encompassing PQSs and their flanking sequences, only PQS1 and PQS4 exhibited pronounced ThT fluorescence responses (Figure 3D-E), whereas PQS2, PQS3, and PQS6 showed negligible signals (Figure 3E, Figure 3—figure supplement 6). Notably, the PQS1-long chain displayed the strongest fluorescence signal, while its mutant counterpart (PQS1mut-long chain) exhibited the lowest background fluorescence (Figure 3D). These findings indicate that although most PQSs can form G4 structures in vitro, PQS1—located in the high SHAPE-high Shannon entropy region—demonstrates the most robust G4-forming capability when competing with local secondary structures in the genomic context. Therefore, PQS1 was selected for further structural and functional validation.”

      (14) Figure S29 is nice and informative. Consider moving it to the main text.

      We appreciate the reviewer's positive assessment of Figure S29. Now we have renamed this figure as "Figure 5—Supplement 2".

    1. Author response:

      Reviewer #1:

      (1) Changes in blood volume due to brain activity are indirectly related to neuronal responses. The exact relationship is not clear, however, we do know two things for certain: (a) each measurable unit of blood volume change depends on the response of hundreds or thousands of neurons, and (b) the time course of the volume changes is slow compared to the potential time course of the underlying neuronal responses. Both of these mean that important variability in neuronal responses will be averaged out when measuring blood changes. For example, if two neighbouring neurons have opposite responses to a given stimulus, this will produce opposite changes in blood volume, which will cancel each other out in the blood volume measurement due to (a). This is important in the present study because blood volume changes are implicitly being used as a measure of coding in the underlying neuronal population. The authors need to acknowledge that this is a coarse measure of neuronal responses and that important aspects of neuronal responses may be missing from the blood volume measure.

      The reviewer is correct: we do not measure neuronal firing, but use blood volume as a proxy for bulk local neuronal activity, which does not capture the richness of single neuron responses. We will highlight this point in the manuscript. This is why the paper focuses on large-scale spatial representations as well as cross-species comparison. For this latter purpose, fMRI responses are on par with our fUSI data, with both neuroimaging techniques showing the same weakness.

      (2) More importantly for the present study, however, the effect of (b) is that any rapid changes in the response of a single neuron will be cancelled out by temporal averaging. Imagine a neuron whose response is transient, consisting of rapid excitation followed by rapid inhibition. Temporal averaging of these two responses will tend to cancel out both of them. As a result, blood volume measurements will tend to smooth out any fast, dynamic responses in the underlying neuronal population. In the present study, this temporal averaging is likely to be particularly important because the authors are comparing responses to dynamic (nonstationary) stimuli with responses to more constant stimuli. To a first approximation, neuronal responses to dynamic stimuli are themselves dynamic, and responses to constant stimuli are themselves constant. Therefore, the averaging will mean that the responses to dynamic stimuli are suppressed relative to the real responses in the underlying neurons, whereas the responses to constant stimuli are more veridical. On top of this, temporal following rates tend to decrease as one ascends the auditory hierarchy, meaning that the comparison between dynamic and stationary responses will be differently affected in different brain areas. As a result, the dynamic/stationary balance is expected to change as you ascend the hierarchy, and I would expect this to directly affect the results observed in this study.

      It is not trivial to extrapolate from what we know about temporal following in the cortex to know exactly what the expected effect would be on the authors' results. As a first-pass control, I would strongly suggest incorporating into the authors' filterbank model a range of realistic temporal following rates (decreasing at higher levels), and spatially and temporally average these responses to get modelled cerebral blood flow measurements. I would want to know whether this model showed similar effects as in Figure 2. From my guess about what this model would show, I think it would not predict the effects shown by the authors in Figure 2. Nevertheless, this is an important issue to address and to provide control for.

      We understand the reviewer’s concern about potential differences in response dynamics in stationary vs non-stationary sounds. In particular, it seems that the reviewer is concerned that responses to foregrounds may be suppressed in non-primary fields because foregrounds are not stationary, and non-primary regions could struggle to track and respond to these sounds. Nevertheless, we observed the contrary, with non-primary regions over-representing non-stationary (dynamic) sounds relative to stationary ones. For this reason, we are inclined to think that this explanation cannot account for our findings.

      Furthermore, background sounds are not completely constant: they are still dynamic sounds, but their temporal modulation rates are usually faster (see Figure 3B). Similarly, neural responses to these two types of sounds are dynamic (see for example Hamersky et al., 2025, Figure 1). Thus, we are not sure that blood volume would transform the responses to these types of sounds non-linearly.

      We understand the comment that temporal following rates might differ across regions in the auditory hierarchy and agree. In fact, we show that tuning to temporal rates differs across regions and partly explains the differences in background invariance we observe. We think the reviewer’s suggestion is already implemented by our spectrotemporal model, which incorporates the full range of realistic temporal following rates (up to 128 Hz). The temporal averaging is done as we take the output of the model (which varies continuously through time) and average it in the same window as we used for our fUSI data. When we fit this model to the ferret data, we find that voxels in non-primary regions, especially VP (tertiary auditory cortex), tend to be more tuned to low temporal rates (Figure 2F, G), and that background invariance is stronger in voxels tuned to low rates. This is, however, not true in humans, suggesting that background invariance in humans relies on different computational mechanisms.
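
      The temporal averaging step described above can be sketched as follows. This is an illustrative example only: the sampling period, window bounds, and function name are assumptions chosen for the example, not values taken from the study.

```python
def window_mean(samples, sample_period_s, start_s, stop_s):
    """Average the samples whose timestamps fall within [start_s, stop_s]."""
    selected = [
        value
        for index, value in enumerate(samples)
        if start_s <= index * sample_period_s <= stop_s
    ]
    if not selected:
        raise ValueError("window contains no samples")
    return sum(selected) / len(selected)


# Hypothetical model output sampled once per second after sound onset.
model_output = [0.0, 0.0, 1.0, 2.0, 3.0, 0.0]
summary = window_mean(model_output, sample_period_s=1.0, start_s=2.0, stop_s=4.5)
```

      Applying the same window to the model output and to the measured signal puts the two on a common temporal footing, so fast fluctuations within the window are averaged out in both.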

      (3) I do not agree with the equivalence that the authors draw between the statistical stationarity of sounds and their classification as foreground or background sounds. It is true that, in a common foreground/background situation - speech against a background of white noise - the foreground is non-stationary and the background is stationary. However, it is easy to come up with examples where this relationship is reversed. For example, a continuous pure tone is perfectly stationary, but will be perceived as a foreground sound if played loudly. Background music may be very non-stationary but still easily ignored as a background sound when listening to overlaid speech. Ultimately, the foreground/background distinction is a perceptual one that is not exclusively determined by physical characteristics of the sounds, and certainly not by a simple measure of stationarity. I understand that the use of foreground/background in the present study increases the likely reach of the paper, but I don't think it is appropriate to use this subjective/imprecise terminology in the results section of the paper.

      We appreciate the reviewer’s comment that the classification of our sounds into foregrounds and backgrounds is not verified by any perceptual experiments. We use those terms to be consistent with the literature, including the paper we derived this definition from (Kell et al., 2019). These terms are widely used in studies where no perceptual or behavioral experiments are included, and even when animals are anesthetized. However, we will emphasize the limits of this definition when introducing it, as well as in the discussion.

      (4) Related to the above, I think further caveats need to be acknowledged in the study. We do not know what sounds are perceived as foreground or background sounds by ferrets, or indeed whether they make this distinction reliably to the degree that humans do. Furthermore, the individual sounds used here have not been tested for their foreground/background-ness. Thus, the analysis relies on two logical jumps - first, that the stationarity of these sounds predicts their foreground/background perception in humans, and second, that this perceptual distinction is similar in ferrets and humans. I don't think it is known to what degree these jumps are justified. These issues do not directly affect the results, but I think it is essential to address these issues in the Discussion, because they are potentially major caveats to our understanding of the work.

      We agree with the reviewer that the foreground-background distinction might be different in ferrets. In anticipation of that issue, we had enriched the sound set with more ecologically relevant sounds, such as ferret and other animal vocalizations. Nevertheless, the point remains valid and is already raised in the discussion. We will emphasize this limitation in addition to the limitation of our definition of foregrounds and backgrounds.

      Reviewer #2:

      (1) Interpretation of the cerebral blood volume signal: While the results are compelling, more caution should be exercised by the authors in framing their results, given that they are using an indirect measure of neural activity. This is the difference between stating "CBV in area MEG was less background invariant than in higher areas" vs. saying "MEG was less background invariant than other areas". Beyond framing, the basic properties of the CBV signal should be better explored:

      a) Cortical vasculature is highly structured (e.g. Kirst et al. (2020) Cell). One potential explanation for the results is simply differences in vasculature and blood flow between primary and secondary areas of auditory cortex, even if fUS is sensitive to changes in blood flow, changes in capillary beds, etc. (Mace et al., 2011, Nat. Methods). This concern could be addressed by either analyzing spontaneous fluctuations in the CBV signal during silent periods or computing a signal-to-noise ratio of voxels across areas across all sound types. This is especially important given the complex 3D geometry of gyri and sulci in the ferret brain.

      We agree with the reviewers that there could be differences in vasculature across subregions of the auditory cortex. We will run analyses providing comparisons of basic signal properties across our different regions of interest. We note that this point would also be valid for the human fMRI data, for which we cannot run these controls. Nevertheless, this should not affect our analyses and results, which should be independent of local vascular density. First, we normalize the signal in each voxel before any analysis, so that the absolute strength of the signal, or blood volume in a given voxel, does not matter. Second, we do see sound-evoked responses in all regions (Figure S2) and only focus on reliable voxels in each region. Third, our analysis mostly relies on voxel-based correlation across sounds, which is independent of the mean and variance of the voxel responses. Thus, we believe that differences in vascular architecture across regions are unlikely to affect our results.

      b) Figure 1 leaves the reader uncertain what exactly is being encoded by the CBV signal, as temporal responses to different stimuli look very similar in the examples shown. One possibility is that the CBV is an acoustic change signal. In that case, sounds that are farther apart in acoustic space from previous sounds would elicit larger responses, which is straightforward to test. Another possibility is that the fUS signal reflects time-varying features in the acoustic signal (e.g. the low-frequency envelope). This could be addressed by cross-correlating the stimulus envelope with fUS waveform. The third possibility, which the authors argue, is that the magnitude of the fUS signal encodes the stimulus ID. A better understanding of the justification for only looking at the fUS magnitude in a short time window (2-4.8 s re: stimulus onset) would increase my confidence in the results.

      We thank the reviewer for raising that point as it highlights that the layout of Figure 1 is misleading. While Figure 1B shows an example snippet of our sound streams, Figure 1D shows the average timecourse of CBV time-locked to a change in sound (foreground or background, isolated or in a mixture). This is the average across all voxels and sounds, and the point is just to illustrate the dynamics for the three broad categories. In Figure 1E, however, we show the cross-validated cross-correlation of CBV across sounds (and different time lags). To obtain this, we compute for each voxel the response to each sound at each time lag, thus obtaining, for each lag, two vectors whose length is the number of sounds, one per repeat. Then, we correlate all these vectors across the two repeats, obtaining one cross-correlation matrix per voxel. We finally average these matrices across all voxels. The fact that you see red squares demonstrates that the signal encodes sound identity, since CBV is more similar across two repeats of the same sound (e.g., in the foreground only matrix, 0-5 s vs 0-5 s) than across two different sounds (0-5 s vs. 7-12 s). We will modify the figure layout as well as the legend to improve clarity.
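
      The cross-validated correlation procedure described above can be sketched as follows. This is a minimal illustration using toy data, not the authors' analysis code; the data layout (`[lag][sound]` per repeat) and the function names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lag_correlation_matrix(repeat_1, repeat_2):
    """Correlate across sounds every lag of repeat 1 with every lag of repeat 2.

    Each repeat is indexed as [lag][sound] for a single voxel.
    """
    return [[pearson(a, b) for b in repeat_2] for a in repeat_1]


def average_matrices(matrices):
    """Element-wise mean of equally sized matrices (one per voxel)."""
    count = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(m[i][j] for m in matrices) / count for j in range(cols)]
        for i in range(rows)
    ]


# Toy voxel: two lags, three sounds; repeat 2 is a rescaled copy of repeat 1.
rep_1 = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
rep_2 = [[2.0, 4.0, 6.0], [6.0, 4.0, 2.0]]
matrix = lag_correlation_matrix(rep_1, rep_2)
mean_matrix = average_matrices([matrix, matrix])
```

      In this toy case the diagonal entries equal 1 even though the second repeat has a different mean and variance, because the Pearson correlation depends only on the pattern across sounds and not on the overall signal amplitude.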

      (2) Interpretation of the human data: The authors acknowledge in the discussion that there are several differences between fMRI and fUS. The results would be more compelling if they performed a control analysis where they downsampled the ferret fUS data spatially and temporally to match the resolution of fMRI and demonstrated that their ferret results hold with lower spatiotemporal resolution.

      We agree with the reviewer that the use of different techniques might get in the way of cross-species comparison. We will add additional discussion on this point. We already control for the temporal aspect by using the average of stimulus-evoked activity across time (note that due to scanner noise, sounds are presented cut into small pieces in the fMRI experiments). Regarding the spatial aspect, there are several things to consider. First, both species have brains of very different sizes, a factor that is conveniently compensated for by the higher spatial resolution of fUSI compared to fMRI (0.1 vs 2 mm). Downsampling to fMRI resolution would lead to having one voxel per region per slice, which is not feasible. We also summarize results with one value per region, which is a form of downsampling that is fairer across species. Furthermore, we believe that we already established in a previous study (Landemard et al., 2021, eLife) that fUSI and fMRI data are comparable signals. Indeed, we could predict human fMRI responses to most sounds from ferret fUSI responses to the same sounds.

      Reviewer #3:

      As mentioned above, interpretation of the invariance analyses using predictions from the spectrotemporal modulation encoding model hinges on the model's ability to accurately predict neural responses. Although Figure S5 suggests the encoding model was generally able to predict voxel responses accurately, the authors note in the introduction that, in human auditory cortex, this kind of tuning can explain responses in primary areas but not in non-primary areas (Norman-Haignere & McDermott, PLOS Biol. 2018). Indeed, the prediction accuracy histograms in Figure S5C suggest a slight difference in the model's ability to predict responses in primary versus non-primary voxels. Additional analyses should be done to a) determine whether the prediction accuracies are meaningfully different across regions and b) examine whether controlling for prediction accuracy across regions (i.e., sub-selecting voxels across regions with matched prediction accuracy) affects the outcomes of the invariance analyses.

      The reviewer is correct: the spectrotemporal model tends to perform less well in human non-primary cortex. We believe this does not contradict our results but goes in the same direction: while there is a gradient in invariance in both ferrets and humans, this gradient is predicted by the spectrotemporal model in ferrets, but not in humans (possibly indeed because predictions are less good in human non-primary auditory cortex). Regardless of the mechanism, this result points to a difference across species. We will clarify these points by quantifying potential differences in prediction accuracy in both species and comment on those in the manuscript.

      A related concern is the procedure used to train the encoding model. From the methods, it appears that the model may have been fit using responses to both isolated and mixture sounds. If so, this raises questions about the interpretability of the invariance analyses. In particular, fitting the model to all stimuli, including mixtures, may inflate the apparent ability of the model to "explain" invariance, since it is effectively trained on the phenomenon it is later evaluated on. Put another way, if a voxel exhibits invariance, and the model is trained to predict the voxel's responses to all types of stimuli (both isolated sounds and mixtures), then the model must also show invariance to the extent it can accurately predict voxel responses, making the result somewhat circular. A more informative approach would be to train the encoding model only on responses to isolated sounds (or even better, a completely independent set of sounds), as this would help clarify whether any observed invariance is emergent from the model (i.e., truly a result of low-level tuning to spectrotemporal features) or simply reflects what it was trained to reproduce.

      We thank the reviewer for this suggestion and will run an additional prediction using only the sounds presented in isolation. This will be included in the next version of the manuscript.

      Finally, the interpretation of the foreground invariance results remains somewhat unclear. In ferrets (Figure 2I), the authors report relatively little foreground invariance, whereas in humans (Figure 5G), most participants appear to show relatively high levels of foreground invariance in primary auditory cortex (around 0.6 or greater). However, the paper does not explicitly address these apparent cross-species differences. Moreover, the findings in ferrets seem at odds with other recent work in ferrets (Hamersky et al. 2025 J. Neurosci.), which shows that background sounds tend to dominate responses to mixtures, suggesting a prevalence of foreground invariance at the neuronal level. Although this comparison comes with the caveat that the methods differ substantially from those used in the current study, given the contrast with the findings of this paper, further discussion would nonetheless be valuable to help contextualize the current findings and clarify how they relate to prior work.

      We thank the reviewer for this point. We will indeed add further discussion of the difference between ferrets and humans in foreground invariance in primary auditory cortex. In addition, while we found a trend for higher background invariance than foreground invariance in ferret primary auditory cortex, this difference was not significant and many voxels exhibit similar levels of background and foreground invariance (for example in Figure 2D, G). Thus, we do not think our results are inconsistent with Hamersky et al., 2025, though we agree the bias towards background sounds is not as strong in our data. This might indeed reflect differences in methodology, both in the signal that is measured (blood volume vs spikes), and the sound presentation paradigm. We will add this point to our discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The ventral nerve cord (VNC) of organisms like Drosophila is an invaluable model for studying neural development and organisation in more complex organisms. Its well-defined structure allows researchers to investigate how neurons develop, differentiate, and organise into functional circuits. As a critical central nervous system component, the VNC plays a key role in controlling motor functions, reflexes, and sensory integration.

      Particularly relevant to this work, the VNC provides a unique opportunity to explore neuronal hemilineages - groups of neurons that share molecular, genetic, and functional identities. Understanding these hemilineages is crucial for elucidating how neurons cooperate to form specialized circuits, essential for comprehending normal brain function and dysfunction.

      A significant challenge in the field has been the lack of developmentally stable, hemilineage-specific driver lines that enable precise tracking and measurement of individual VNC hemilineages. The authors address this need by generating and validating a comprehensive, lineage-specific split-GAL4 driver library.

      Strengths and weaknesses

      The authors select new marker genes for hemilineages from previously published single-cell data of the VNC. They generate and validate specific and temporally stable lines for almost all the hemilineages in the VNC. They successfully achieved their aims, and their results support their conclusions. This will be a valuable resource for investigating neural circuit formation and function.

      We thank the reviewer for her/his positive comments and time reviewing our manuscript. We are pleased that the reviewer recognized the value of our work in generating a comprehensive, lineage-specific split-GAL4 driver library for VNC hemilineages. We agree that this will be a critical resource for investigating neural circuit formation and function, and we are encouraged by the positive comments regarding the novelty and potential impact of our approach.

      Reviewer #1 (Recommendations for the authors):

      I have no suggestions for further experiments, data, or analyses. There are some grammatical errors and referencing issues throughout, but the editors will hopefully catch them.

      We appreciate the reviewer’s comments regarding the grammatical errors and referencing issues and have carefully checked the revised manuscript.

      Reviewer #2 (Public review):

      It is my pleasure to review this manuscript from Soffers, Lacin, and colleagues, in which they identify pairs of transcription factors unique to (almost) every ventral nerve cord hemilineage in Drosophila and use these pairs to create reagents to label and manipulate these cells. The advance is sold as largely technical: as a pipeline for identifying durably expressed transcription factor codes in postmitotic neurons from single cell RNAseq data, generating knock-in alleles in the relevant genes, using these to match transcriptional cell types to anatomic cell types, and then using the alleles as a genetic handle on the cells for downstream explication of their function. Yet I think the work is gorgeous in linking the expression of genes that are causal for neuron-type-specific characteristics to the anatomic instantiations of those neurons. It is astounding that the authors are able to use their deep collective knowledge of hemilineage anatomy and gene expression to match 33 of 34 transcriptional profiles. Together with other recent studies, this work drives a major course correction in developmental biology, away from empirically identified cell type "markers" (in Drosophila neuroscience, often genomic DNA fragments that contain enhancers found to be expressed in specific neurons at specific times), and towards methods in which the genes that generate neuronal type identity are actually used to study those neurons. Because the relationship between fate and form/function is built into the tools, I believe that this approach will be a Trojan horse to integrate the fields of neural development and systems neuroscience.

      We thank the reviewer for their time reviewing our manuscript, generous compliments, and appreciation of the potential of our study to drive a major shift in developmental biology, moving away from traditional marker-based methods toward utilizing the genes that mark neuronal type identity in “omics” datasets. Much like the Trojan Horse, initially a concealed and subtle tool, we hope that the strategy outlined here will have continued impact, as we and others plan to leverage future high-resolution and developmental series of scRNAseq datasets to generate driver lines that target neuronal cell types with utmost precision.

      Reviewer #2 (Recommendations for the authors):

      Line 126-127: I'm not sure if it is true to say "most TFs in the CNS are expressed in a hemilineage-specific manner." As the authors haven't formally interrogated how different neuronal features relate to expression patterns of all ~600 Drosophila TFs, how about replacing "most" with "many?"

      The reviewer makes an excellent point. Work by Lacin and colleagues has demonstrated via genetic studies that lineage-specific transcription factors that regulate the specification and differentiation of postembryonic neurons are stably expressed during development. This was documented for 15 transcription factors in Lacin et al., 2014, and our lab has identified additional examples since. When we refer to the stable expression of transcription factors, we refer to such transcription factors, not the complete set of ~600 transcription factors described to date. We have added this citation to clarify this statement and replaced “Most” with “Many” (p6 line 135). We have also addressed this in the introduction (p5 lines 109-116). Of note, as we conducted this study, we found that it is closer to a rule than an exception that a transcription factor that acted as a cluster marker was also stably expressed during development. Thus, a growing number of transcription factors are now documented to be stably expressed in a hemilineage-specific manner.

      Line 265: Typo? 334 should be 34?

      We thank the reviewer for noting this typographical error and have corrected it.

      Line 522: Refs 56, 57 here related to chinmo, mamo, br-c don't show br-c or mamo mark temporal cohorts of postmitotic neurons. Consider adding PMID: 19883497, 18510932, and 31545163.

      We thank the reviewer for pointing this out and have added these references that demonstrate that broad, Mamo and Chinmo mark temporal cohorts in the developing adult CNS (p17 line 535).

      Reviewer #3 (Public review):

      Soffers et al. developed a comprehensive genetic toolkit that enables researchers to access neuronal hemilineages during developmental and adult time points using scRNA-seq analysis to guide gene cassette exchange-based or CRISPR-based tool building. Currently, research groups studying neural circuit development are challenged with tying together findings in the development and mature circuit function of hemilineage-related neurons. Here, the authors leverage publicly available scRNA-seq datasets to inform the development of a split-Gal4 library that targets 32 of 34 hemilineages in development and adult stages. The authors demonstrated that the split-Gal4 library, or genetic toolkit, can be used to assess the functional roles, neurotransmitter identity, and morphological changes in targeted cells. The tools presented in this study should prove to be incredibly useful to Drosophila neurobiologists seeking to link neural developmental changes to circuit assembly and mature circuit function. Additionally, some hemilineages have more than one split-Gal4 combination that will be advantageous for studies seeking to disrupt associated upstream genes.

      Strengths:

      Informing genetic tool development with publicly available scRNA-seq datasets is a powerful approach to creating specific driver lines. Additionally, this approach can be easily replicated by other researchers looking to generate similar driver lines for more specific subpopulations of cells, as mentioned in the Discussion.

      The unification of optogenetic stimulation data of 8B neurons and connectomic analysis of the Giant-Fiber-induced take-off circuit was an excellent example of the utility of this study. The link between hemilineage-specific functional assays and circuit assembly has been limited by insufficient genetic tools. The tools and data present in this study will help better understand how collections of hemilineages develop in a genetically constrained manner to form circuits amongst each other selectively.

      Weaknesses:

      Although cell position, morphology (to some extent), and gene expression are good markers to track cell identity across developmental time, there are genetic tools available that could have been used to permanently label cells that expressed genes of interest from birth, ensuring that the same cells are being tracked in fixed tissue images.

      Although gene activation is a good proxy for assaying neurochemical features, relying on whether neurochemical pathway genes are activated in a cell to determine its phenotype can be misleading given that the Trojan-Gal4 system commandeers the endogenous transcriptional regulation of a gene but not its post-transcriptional regulation. Therefore, neurochemical identity is best identified via protein detection. (strong language used in this section of the paper).

      The authors mainly rely on the intersectional expression of transcription factors to generate split-Gal4 lines and target hemilineages specifically. However, the Introduction (Lines 97-99) makes a notable point about how driver lines in the past, which have also predominantly relied on the regulatory sequences of transcription factors, lack the temporal stability to investigate hemilineages across time. This point seems to directly conflict with the argument made in the Results (Lines 126-127) that states that most transcription factors are stably expressed in hemilineage neurons that express them. It is generally known that transcription factors can be expressed stably or transiently depending on the context. It is unclear how using the genes of transcription factors in this study circumvents the issue of creating temporally stable driver lines.

      We thank the reviewer for taking the time to thoroughly and carefully review our manuscript. We appreciate the reviewer’s comments on its strengths, and we too hope that this body of work will prove to be incredibly useful to Drosophila neurobiologists seeking to link neural developmental changes to circuit assembly and mature circuit function. Likewise, we appreciate the reviewer’s careful consideration of its weaknesses, as the reviewer raises valid points. We have addressed these in our revised manuscript and believe this has significantly improved it.

      Weakness 1: Although cell position, morphology (to some extent), and gene expression are good markers to track cell identity across developmental time, there are genetic tools available that could have been used to permanently label cells that expressed genes of interest from birth, ensuring that the same cells are being tracked in fixed tissue images.

      The reviewer is fully correct, and we are aware of techniques developed by the laboratories of U. Banerjee, T. Lee, and J. Truman that can make transient GAL4 expression permanent, such as G-TRACE and lineage filtering. A common feature of these techniques is that effector activity is permanent (FLP-mediated removal of the FRT-flanked stop codon preceding GFP in G-TRACE or LexA in lineage filtering) but the GAL4 activity is not, and GAL4 activity is needed to take advantage of the vast collection of UAS-based effector lines such as RNAi libraries. For example, the study of Harris et al., 2015 from the Truman lab beautifully showed the strength of this kind of approach for labeling the hemilineages, but their approach cannot be used for functional studies for the reasons mentioned above. Fly lines using these approaches already carry several transgenes and require the addition of several more to be used for functional studies. Our approach requires only two transgenes and is compatible with all UAS lines. One additional advantage of the split-GAL4 combinations that we identify here is that they are inserted in genes that are stably expressed throughout larval and pupal development in postmitotic cells, such that they can be used for functional manipulations during development. We emphasized this point in the discussion on page 16 under the heading “Mapping and manipulating morphological outgrowth patterns of hemilineages during development”.

      Weakness 2: Although gene activation is a good proxy for assaying neurochemical features, relying on whether neurochemical pathway genes are activated in a cell to determine its phenotype can be misleading given that the Trojan-Gal4 system commandeers the endogenous transcriptional regulation of a gene but not its post-transcriptional regulation. Therefore, neurochemical identity is best identified via protein detection. (strong language used in this section of the paper).

      We thank the reviewer for bringing up this important point. We agree that the Trojan-GAL4 approach will not faithfully recapitulate expression of genes that undergo posttranscriptional regulation. Our previous eLife paper (Lacin et al., 2019) showed that this is the case for Trojan driver lines for the ChAT gene. This study demonstrated that ChAT drivers unexpectedly but strongly labeled many GABAergic and glutamatergic neurons in both the brain and VNC. With RNA in situ hybridization and immunostaining approaches, we showed that these neurons indeed express ChAT mRNA but not the protein. After our publication, another group showed that a class of miRNAs binds to the 3’UTR of the ChAT gene and regulates its expression post-transcriptionally (Griffith 2023). We believe that one major reason the Trojan driver lines do not faithfully recapitulate this expression pattern is the presence of the Hsp70 transcriptional terminator located at the 5’ end of the trojan exon, which prematurely ends the transcript and affects the host gene’s 3’ UTR regulation. For this reason, we have recently generated new Trojan plasmids that allow the retention of the 3’UTR of the host gene in the transcript. We have revised the Results section “Neurotransmitter use” on pages 11-12 to address this point and have modified the language.

      Weakness 3: The authors mainly rely on the intersectional expression of transcription factors to generate split-Gal4 lines and target hemilineages specifically. However, the Introduction (Lines 97-99) makes a notable point about how driver lines in the past, which have also predominantly relied on the regulatory sequences of transcription factors, lack the temporal stability to investigate hemilineages across time. This point seems to directly conflict with the argument made in the Results (Lines 126-127) that states that most transcription factors are stably expressed in hemilineage neurons that express them. It is generally known that transcription factors can be expressed stably or transiently depending on the context. It is unclear how using the genes of transcription factors in this study circumvents the issue of creating temporally stable driver lines.

      We thank the reviewer for pointing out this apparent paradox, which we have clarified in the manuscript (p4 lines 94-102). Driver lines in the past have relied on the intersection of genes to label a defined set of neurons, which helped mark narrower cell populations in the adult CNS than enhancer traps could. Elegant and elaborate screening methods have been devised to identify hemidriver combinations that mark specific subsets of neurons in the adult (Meissner et al, 2025 (eLife 98405.2) and citations therein). However, these hemidrivers do not leverage the expression pattern of hemilineage marker genes. Instead, their expression is controlled by random 2-3 kb genomic fragments. We and others observed that these drivers are not stably expressed during development. Hence, hemidriver combinations that work beautifully to target adult neuronal cell populations often cannot be directly used for developmental studies. Work by Lacin et al. 2014 has demonstrated that transcription factors that mark hemilineages are often stably expressed in the embryo, larva, and even adult. When we made driver lines for these TFs using artificial exons, their complete endogenous enhancer elements remained intact. Consequently, we find that Trojan driver lines recapitulate the expression pattern of the transcription factor gene in which they were inserted, and the hemidrivers are stably expressed during development. Hence, leveraging scRNAseq cluster markers for hemilineages and converting them to Trojan driver lines, the approach we took in this paper, has proven a powerful method to generate stable driver lines for developmental studies.

      Reviewer #3 (Recommendations for the authors):

      (1) Line 14: Affiliations typo should be corrected to "St. Louis".

      We thank the reviewer for catching this and have corrected the typo.

      (2) Line 26: "model systems have focused on only on a few".

      We have replaced the words “a few regions” with “select regions” to better convey that studies to date have been performed, but not at the CNS level, owing to the lack of genetic driver lines.

      (3) Line 52: The use of "medium" here is ambiguous without a comparison.

      We agree that the term “medium” in line 52 could be ambiguous without context, and we appreciate your suggestion to clarify this. The revised sentence now reads: “Drosophila has served as a powerful model system to investigate how neuronal circuits function due to its medium complexity compared to vertebrate models.”

      (4) Line 91-92: Consider shortening to "of behavioral circuit assembly".

      Thank you for this suggestion. We have revised p4 lines 90-91 to: “Thus, taking a hemilineage-based approach is essential for a systematic and comprehensive understanding of behavioral circuit assembly during development in complex nervous systems.”

      (5) Line 216-217: Consider establishing what the expected morphology and neurochemical phenotype for 2A neurons is before presenting findings.

      This suggestion is well taken, and we agree that this paragraph did not fully get across the point we were trying to make. The purpose of this paragraph is to explain the workflow by which we assigned 16 hemilineages to orphan clusters, which is why we present the data in this order and present the morphology of hemilineage 2A last. To accommodate the reviewer’s suggestion, we have now clarified our approach before diving into the results to improve the flow of this paragraph (p8 lines 218-223). Briefly, to annotate the 16 orphan scRNAseq clusters, we took one orphan cluster at a time, picked its top cluster marker genes that had not yet been established as marker genes for any hemilineage, and visualized the morphology of the neurons that expressed such a cluster marker using a reporter line for the marker or an antibody stain for its protein. We then compared this to documented hemilineage morphologies and, to narrow down our search, compared the observed trajectories to those of unannotated hemilineages that used the same neurotransmitter as the orphan scRNAseq cluster. The evaluation of the documented hemilineage morphologies was the last step in our method to assign hemilineages to orphan scRNAseq clusters, which is why we chose to present the expected morphology of a hemilineage at the end.

      (6) "Neurochemical" phenotype and "neurotransmitter" identity are sometimes used interchangeably and seem to mean the same thing. Consider choosing one term throughout.

      We thank the reviewer for this suggestion and have changed the terminology to “neurotransmitter use” (p11-12 lines 326-359).

      (7) Line 235: MARCM technique citation needed.

      We thank the reviewer for pointing this out. The citation (no. 37, p9 line 249) was present in the Methods section, but we had inadvertently omitted it from the main text; we have now corrected this.

      (8) Line 281: typo, should be "patterns".

      We thank the reviewer for noting this and have corrected this.

      (9) Line 469: End of sentence needs a ".".

      We have added the punctuation mark.

      (10) Line 516: "driver line combinations to express...".

      We have inserted the word “to” to correct it.

      (11) Please make sure that the correct genotypes are matched in the figure legends and Table 1. For instance, knot-GAL4-DBD is listed as the hemi driver for 10B neurons in Figure 3 but only knot-p65.AD is listed in Table 1.

      We thank the reviewer for catching this mistake. The correct hemidriver combination used in Figure 3L is knot-GAL4-AD with hb9-GAL4-DBD. We have updated the legend and carefully checked the legends and tables.

      (12) Consider making different color choices for readability when possible and be consistent with labeling CadN. For instance, in Figure 1 the magenta color has three separate meanings: CadN, Acj6, and unc-4. Any of the three genes can be mistaken for another by a reader mainly paying attention to the magenta color. I find that one color can mean two things in a figure if organized properly but any more begs for confusion. Also, CadN can be easily labeled if used in a new figure (e.g. Figure 1-Supplement 1).

      We thank the reviewer for this insightful observation and have adjusted Figure 1 so that CadN is displayed in blue and reporter genes expressing Acj6, Unc-4 or their intersection are displayed in green. The legend has been modified to reflect these changes.

      (13) If Seurat object changes or additional quality control steps were taken from the original studies, please provide these changes. Similarly, provide any scRNA-seq code used or cite code used for readers to access. Also, provide a section in the methods briefly describing how genes were chosen (criteria) for tool development.

      We thank the reviewer for noting that we had not described our scRNA analysis pipeline and criteria to select transcription factors in the Methods section of the manuscript. We have added this section at p19 lines 548-558. Briefly, we used the Seurat object generated by Allen et al., 2015, and did not change quality control steps, normalizations or scaling. Candidate genes from which to make split-GAL4 drivers were chosen based on their ability to mark the clusters defined by Allen et al. We did not use computer-based algorithms; instead, we made a list of the top cluster markers. Then, we made binary combinations amongst these cluster markers and with hemilineage markers we had identified before (Lacin et al, 2014; Lacin et al 2019), and used the code generated by Allen et al., 2015 (deposited on GitHub) with Seurat v5 to test if these combinations marked unique clusters. We then prioritized testing these combinations based on the availability of antibodies, BAC lines and CRIMIC/MiMIC constructs to validate their expression pattern prior to creating split-GAL4 lines for these candidates.
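
      The step of forming binary marker combinations and checking whether each marks a single cluster can be illustrated with a minimal sketch. This is not the code used in the study (which relied on the R code of Allen et al. with Seurat); it is a simplified Python illustration that assumes each cluster has already been reduced to a set of marker genes. The function name `unique_marker_pairs` and the gene and cluster labels (`tfA`, `cluster_1`, and so on) are hypothetical placeholders, not real data.

```python
from itertools import combinations

# Hypothetical toy input: cluster label -> set of marker genes for that cluster.
# These names are placeholders for illustration only, not data from the study.
TOY_CLUSTER_MARKERS = {
    "cluster_1": {"tfA", "tfB", "tfC"},
    "cluster_2": {"tfA", "tfC", "tfD"},
    "cluster_3": {"tfB", "tfD"},
}


def unique_marker_pairs(cluster_markers):
    """Return {(gene_a, gene_b): cluster} for gene pairs found together in exactly one cluster."""
    pair_to_clusters = {}
    for cluster, genes in cluster_markers.items():
        # Sort so that each pair has one canonical ordering across clusters.
        for pair in combinations(sorted(genes), 2):
            pair_to_clusters.setdefault(pair, []).append(cluster)
    # Keep only the pairs that co-occur in a single cluster.
    return {
        pair: clusters[0]
        for pair, clusters in pair_to_clusters.items()
        if len(clusters) == 1
    }


if __name__ == "__main__":
    for pair, cluster in sorted(unique_marker_pairs(TOY_CLUSTER_MARKERS).items()):
        print(pair, "->", cluster)
```

      In this toy input, the pair (`tfA`, `tfC`) is shared by two clusters and is therefore excluded, whereas every other pair maps to one cluster. A real analysis would evaluate co-expression at the level of individual cells rather than set membership of precomputed markers.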

      (14) In regard to the seemingly contradictory argument that most transcription factors are stably expressed when most drivers of the past used regulatory elements of transcription factors: the paper could be strengthened by either a) describing how older driver lines differ from the lines presented in the paper or b) remarking on the endogenous temporal stability of the transcription factors used in this study.

      We thank the reviewer for pointing this out, and we agree that it is necessary to clarify this apparent paradox since it is essential for understanding the impact of the present work. We have revised our manuscript as described in our response to Weakness 3.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is a very well-written paper presenting interesting findings related to the recovery following the end-Permian event in continental settings, from N China. The finding is timely as the topic is actively discussed in the scientific community. The data provides additional insights into the faunal, and partly, floral global recovery following the EPE, adding to the global picture.

      Strengths:

      The conclusions are supported by an impressive amount of sedimentological and paleontological data (mainly trace fossils) and illustrations.

      We thank Reviewer #1 for the positive assessments.

      Weaknesses:

      The occurrence of MISS (Microbially Induced Sedimentary Structures) could be discussed more in detail as these provide interesting information directly linked to the delayed recovery of the biota.

      We thank the reviewer for highlighting this important point. In the Phanerozoic, documented increases in microbial abundance generally coincided with rapid warming, and those hyperthermal events were causally linked to mass extinctions in continental realms, including the Permian–Triassic mass extinction (Mays et al., 2021). Accumulations of cyanobacteria and other microbes were favored by low dissolved oxygen concentrations (Pacton et al., 2011), and the secondary metabolites they produced may also have been toxic to animals (Paerl and Otten, 2013). Therefore, repeated algal and bacterial blooms in the post-extinction interval could have disrupted ecological stability and inhibited the restoration of ecosystems.

      Accordingly, the sentence in Lines 127–130, “The depauperate ichnofauna of the late Smithian were monospecific, representing initial recolonization of empty niches by opportunists, but the coeval thrived microbial mats indicated harsh environments, which might have inhibited the recovery of freshwater ecosystems (Tu et al., 2016; Chu et al., 2017; Mays et al., 2021).”, has been rephrased as:

      “The depauperate ichnofauna of the late Smithian were monospecific, representing initial recolonization of empty niches by opportunists. However, recurrent occurrences of microbial induced sedimentary structures (MISS) in the Liujiagou Formation imply that depressed ecosystems persisted until the Smithian (Tu et al., 2016; Chu et al., 2017). Studies revealed that the increase in microbial abundances were generally associated with hyperthermals, which would be the principal causes for mass extinction on land (Mays et al., 2021). Accumulations of microbes were favored by low dissolved oxygen concentration condition and their secondary metabolites could also be toxic to animals (Pacton et al., 2011; Paerl and Otten, 2013). Therefore, repeated thriving of MISS during the Dienerian–Smithian disrupted ecological stability in freshwater ecosystem and delayed biotic recovery in North China.”

      References:

      Mays, C., et al. 2021. Lethal microbial blooms delayed freshwater ecosystem recovery following the end-Permian extinction. Nat. Commun. 12, 5511. https://doi.org/10.1038/s41467-021-25711-3

      Pacton, M., et al. 2011. Amorphous organic matter—Experimental data on formation and the role of microbes. Rev. Palaeobot. Palynol. 166, 253–267. https://doi.org/10.1016/j.revpalbo.2011.05.011

      Paerl, H. W. & Otten, T. G. 2013. Harmful cyanobacterial blooms: causes, consequences, and controls. Microb. Ecol. 65, 995–1010. https://doi.org/10.1007/s00248-012-0159-y

      Reviewer #2 (Public review):

      Summary:

      A rapid recovery of the ecosystems during the late Early Triassic, in the aftermath of the end-Permian mass extinction, is discussed based on different types of fossils.

      Strengths:

      The combined study of invertebrate trace fossils, tetrapod bones, and plant remains together with their stratigraphic distribution in different sections provides a convincing case to support a rapid recovery as the authors hypothesize.

      We thank Reviewer #2 for the positive comments on our work.

      Weaknesses:

      The study is based on three regions with Triassic successions from the North China block. While a first-hand study of other localities of similar age would be ideal, this is of course a difficult task. Instead, the authors provide comparisons with other worldwide regions to build their case and support the initial hypothesis.

      Globally, ichnoassemblages reported from the Lower Triassic are relatively impoverished (Guo et al., 2019). We previously compiled ichnoassemblages from several continental basins, including South Africa, Antarctica, North America, European Basin and North China (Fig. 14 in Guo et al., 2019). However, most of the Early Triassic strata lack bioturbation (e.g., Guo et al., 2019; Buatois et al., 2021). In contrast, the coeval deposits in North China contain diverse trace fossils, making it an ideal place for ichnological investigations. Hence, this study mainly focuses on the ichnological records in North China, but we hope more work will be done in other basins.

      References:

      Guo, W.W, et al. 2019. Secular variations of ichnofossils from the terrestrial Late Permian–Middle Triassic succession at the Shichuanhe section in Shaanxi Province, North China. Glob. Planet. Change 181, 102978. https://doi.org/10.1016/j.gloplacha.2019.102978

      Buatois, L.A., et al. 2021. Impact of Permian mass extinctions on continental invertebrate infauna. Terra Nova 33, 455–464. https://doi.org/10.1111/ter.12530

      Reviewer #3 (Public review):

      Summary:

      This manuscript by Guo and colleagues features the documentation and interpretation of three successions of continental to marginal marine deposits spanning the P/T transition and their respective ichnofaunas. Based on these new data inferences concerning end-Permian mass extinction and Triassic recovery in the tropical realm are discussed.

      Strengths:

      The manuscript is well-written and organized and includes a large amount of new lithological and ichnological data that illuminate ecosystem evolution in a time of large-scale transition. The lithological documentations, facies interpretations, and ichnotaxonomic assignments look okay (with a few exceptions).

      We thank Reviewer #3 for the positive assessments.

      Weaknesses:

      Some interpretations in Table 1 could be questioned: For facies association FA2 the interpretation as "terrestrial facies with periodical flooding" should be put into the right column and, given the fossil content, other interpretations, such as "marine facies" or "lagoonal environment" with some plant debris and (terrestrial) animal remains washed in, could also be possible. For FA3 the statement "bioturbation is absent" is in conflict with the next statement "strata are moderately reworked". For FA5 the observation of a "monospecific ichnoassemblage" contradicts the listing of several ichnotaxa.

      We thank the reviewer for this feedback. The “FA2: terrestrial facies with periodical flooding” has been moved to the right column. As for the depositional environment of FA2, this interval was essentially terrestrial according to the well-developed paleosols (Yu et al., 2021). Meanwhile, regional geological surveys have shown a faunal transition in this interval among a series of successions, from typical marine fauna containing Lingula, Eumorphotis, etc. in the southwest to a marine bivalve-terrestrial conchostracan mixed fauna in the northeast (Yin and Lin, 1979; Chu et al., 2019). Therefore, the occurrence of episodic transgressions is suggested.

      FA3 (coastal mudplain facies) occurs in both the upper Sunjiagou Formation and the lower Heshang Formation (Fig. S1); the former lacks bioturbation, whereas the latter is moderately bioturbated. We have stated this clearly in Table S1.

      The ichnofauna in FA5 is dominated by Skolithos, Lockeia and Gordia, with only one poorly preserved specimen of Palaeophycus; these occur at the Shichuanhe and Liulin sections. However, these ichnotaxa were distributed separately, each occurrence being characterized by low diversity (a single ichnogenus) and high density. We have deleted the “monospecific ichnoassemblage” for clarity.

      References:

      Chu, D., et al. 2019, Mixed continental-marine biotas following the Permian-Triassic mass extinction in South and North China: Palaeogeography, Palaeoclimatology, Palaeoecology, v. 519, p. 95–107, doi:10.1016/j.palaeo.2017.10.028.

      Yu, Y., et al. 2021, Latest Permian–Early Triassic paleoclimatic reconstruction by sedimentary and isotopic analyses of paleosols from the Shichuanhe section in central North China Basin: Palaeogeography, Palaeoclimatology, Palaeoecology, v. 585, p. 110726, doi:10.1016/j.palaeo.2021.110726.

      Yin, H.F., Lin, H.M., 1979. Marine Triassic faunas and the geologic time from Shihchienfeng Group in the northern Weihe River Basin, Shaanxi Province. Acta Stratigr. Sin. 3, 233–241 (in Chinese).

      Concerning the structure of the manuscript, certain hypotheses related to the end-Permian mass extinction and the process of the P/T extinction and recovery, namely the existence of a long-persisting "tropic dead zone" are introduced as a foregone conclusion to which the new data seemingly shall be fit as corroborating evidence. Some of the data - e.g. the presence of a supposedly Smithian-age ichnofauna are interpreted as a fast recovery shortening the duration of the "tropic dead zone" episode - but these interpretations could also be interpreted as contradicting the idea of a "dead zone" sensu stricto in favour of a "normal" post-extinction environment with low diversity and occurrence of typical disaster taxa. Due to their large error bars the early Triassic radiometric ages did not put much of a constraint on the age determination of the earliest post-extinction ichnofaunas discussed here.

      In the first ~5 Myr of the Triassic, there is evidence for a broad equatorial belt (30°N-40°S) where marine and terrestrial animals were nearly absent (the “equatorial tetrapod gap”; Sun et al., 2012). However, the nature, duration and range of the “equatorial tetrapod gap” remain debated. Allen et al. (2020) show poleward migrations of terrestrial tetrapods during the Late Permian to Middle Triassic, with the marine reptile diversity peak still restricted to northern low latitudes. Romano et al. (2020) argued that the Early Triassic equatorial terrestrial tetrapod gap would be narrower and restricted the “death belt” to between 15° N and about 31° S, while Liu et al. (2022) consider that the exact boundaries of this gap likely varied with climate change (hot phases). Moreover, the duration of the gap is also debated: it has been considered long-lasting (Late Permian to Middle Triassic), restricted to the Induan (Bernardi et al., 2018), or spanning the Induan to the early Spathian (Liu et al., 2022). Regardless of these discrepancies, all the related studies show the existence of the “low latitudinal tetrapod gap”, which is mentioned as background information. On this basis, this study aims to reveal when and how terrestrial ecosystems recovered from the “tropic dead zone” from an ecological point of view, rather than from tetrapods only.

      The rapidly recovered terrestrial ecosystems are represented by diverse traces and by the co-occurring tetrapods and plants found in the Heshanggou Formation. We acknowledge that the chronostratigraphy of the Lower Triassic in North China (and in most continental basins globally) is not constrained by precise ages. This formation, however, can be constrained to the Spathian (possibly straddling the earliest Middle Triassic), based on integrated magnetostratigraphic correlation, fossil records and geochemical data (Liu, 2018; Guo et al., 2022). The Smithian-age ichnofaunas here are not interpreted as a rapidly recovering biota, but as early opportunist-dominated communities that explored empty ecospace under inhospitable environments. Our study also roughly constrains the “tropical dead zone” in North China to the Induan to late Smithian (Fig. 4).

      References:

      Allen, B.J., et al. 2020. The latitudinal diversity gradient of tetrapods across the Permo-Triassic mass extinction and recovery interval. Proc Biol Sci 287, 20201125. https://doi.org/10.1098/rspb.2020.1125

      Bernardi, M., et al. 2018. Tetrapod distribution and temperature rise during the Permian-Triassic mass extinction. Proc Biol Sci 285, 20172331. https://doi.org/10.1098/rspb.2017.2331

      Guo, W., et al. 2022. Late Permian–Middle Triassic magnetostratigraphy in North China and its implications for terrestrial-marine correlations. Earth Planet. Sci. Lett. 585, 117519. https://doi.org/10.1016/j.epsl.2022.117519

      Liu, J. 2018. New progress on the correlation of Chinese terrestrial Permo-Triassic strata. Vertebrata Palasiatica, 56, 327-342. 10.19615/j.cnki.1000-3118.180709

      Liu, J., et al. 2021. Permo-Triassic tetrapods and their climate implications. Glob. Planet. Change 103618. https://doi.org/10.1016/j.gloplacha.2021.103618

      Romano, M., et al. 2020. Early Triassic terrestrial tetrapod fauna: a review. Earth-Sci. Rev. 210, 103331. https://doi.org/10.1016/j.earscirev.2020.103331

      Sun, Y., et al. 2012. Lethally hot temperatures during the Early Triassic greenhouse. Science 338, 366–70. https://doi.org/10.1126/science.1224126

      Considering the somewhat equivocal evidence and controversial ideas about the P/T transition, the introduction could be improved by describing how the idea of a "tropic dead zone" arose against the background of earlier ideas, alternative views, and conflicting data. In the discussion section, alternative interpretations of the extensive data presented here - e.g. proximal-distal shifts in lithofacies with respect to the sediment source, sea level changes, preservation bias, the local occurrence of hostile environments instead of a regional scale, etc. should be discussed, also to avoid the impression that the author's conclusion was driven by confirmation bias.

      As mentioned above, the nature, duration and range of the “equatorial tetrapod gap” remain controversial, a disagreement that derives primarily from differences in the databases (body fossils only vs. both skeletal and footprint data) and analytical methods used. However, detailed discussion of these differences is beyond the scope of our study. This paper provides new evidence for the "tropical dead zone" from an ecological perspective (invertebrate ichnology, paleobotany and newly found tetrapods). Our results show that the "tropical dead zone" in North China terminated in the Smithian, followed by the reappearance of many animals in the Spathian, indicating that terrestrial ecosystems recovered more rapidly than previously thought.

      We have improved the Introduction section by providing a summary of the “equatorial tetrapod gap”. Lines 33-35: “A tropical “tetrapod gap”, spanning between 15°N and ~31°S, prevailed through the Early Triassic, or at least during particular intervals of intense global warming (Bernardi et al., 2018; Allen et al., 2020; Romano et al., 2020; Liu et al., 2022).” is revised to:

      “A tropical “tetrapod gap”, spanning between 15°N and ~31°S, prevailed in the Early Triassic, or at particular interval of intense global warming, even though the nature, duration and range remain debated (Bernardi et al., 2018; Allen et al., 2020; Romano et al., 2020; Liu et al., 2022).”

      In the Discussion section, Lines 180-181: “Although the specimens are not yet fully prepared for taxonomic description, they clearly show the existence of tetrapod at this level” is revised to:

      “Although the specimens are not yet fully prepared for taxonomic description, they clearly show the existence of tetrapods at this level, narrowing the “tetrapod gap” to the Spathian.”

      We also add a new paragraph from Line 208:

      “Our results also shed light on the timing of the tropical dead zone. The late Smithian-age ichnofauna, although impoverished, represents early opportunist-dominated communities that explored empty ecospace under inhospitable environments, which constrains the equatorial death belt to the late Smithian in North China.”

      Contrary to the authors' claim, Figures S7 and S8 suggest that burrow size does not vary much within the studied sections. Size decreases and increases in the Shichuanhe and Liulin sections do not occur contemporaneously, are usually within the error-bar range, and might be driven by ichnotaxa composition, i.e. the presence or absence of larger ichnotaxa, rather than by size changes in the same ichnotaxon (and producer group). Here the measurement data would be needed as well to check the basis of the authors' interpretations.

      We thank the reviewer for highlighting this important point. We have checked the accuracy of our raw data. Neither the average size across all ichnogenera nor that of single ichnogenera changes markedly, although both increase slightly upwards in the Spathian (Figures S7c and S8). This tendency is congruent with other coeval studies in North China (e.g., Shu et al., 2018; Xing et al., 2020). The presence of larger ichnotaxa will indeed raise the average sizes of fossil-bearing horizons; however, burrows of single ichnogenera in the Spathian generally show wider size distributions than in the Smithian, which might be associated with enriched producer groups or different growth stages of the same biota.

      The asynchronous burrow size changes in the Shichuanhe and Liulin sections could be attributed to sedimentary facies. Late Permian deposits at Shichuanhe are finer than those at Liulin, which is located at the basin margin. As a result, tiny traces, like Helminthoidichnites, which were widely distributed at Shichuanhe, are absent at the Liulin section. Those traces significantly reduce the average sizes in this interval, leading to inconsistent size variation patterns.
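      The composition effect described above can be illustrated with a minimal sketch. All burrow widths below are hypothetical values chosen for illustration, not measurements from the studied sections; the function names are likewise assumptions. The sketch shows that adding abundant tiny traces lowers the pooled average of a horizon even when the size of each individual ichnogenus is unchanged.

```python
from statistics import mean

# Hypothetical burrow widths in mm (illustration only, not measured data).
fine_grained_horizon = {
    "Skolithos": [4.0, 5.0, 6.0],
    "Helminthoidichnites": [1.0, 1.0, 1.0],  # tiny traces present
}
basin_margin_horizon = {
    "Skolithos": [4.0, 5.0, 6.0],  # same sizes, tiny traces absent
}


def pooled_mean(widths_by_ichnogenus):
    """Average width over all burrows in a horizon, ignoring ichnogenus."""
    all_widths = [w for widths in widths_by_ichnogenus.values() for w in widths]
    return mean(all_widths)


def per_ichnogenus_mean(widths_by_ichnogenus):
    """Average width computed separately for each ichnogenus."""
    return {name: mean(widths) for name, widths in widths_by_ichnogenus.items()}


print(pooled_mean(fine_grained_horizon))   # pooled mean pulled down by tiny traces
print(pooled_mean(basin_margin_horizon))   # pooled mean without tiny traces
print(per_ichnogenus_mean(fine_grained_horizon)["Skolithos"])
print(per_ichnogenus_mean(basin_margin_horizon)["Skolithos"])
```

      In this toy case the pooled means differ between the two horizons while the Skolithos mean is identical in both, which is why per-ichnogenus comparisons are the more informative check on genuine size change.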

      References:

      Shu, W., et al. 2018. Limuloid trackways from Permian-Triassic continental successions of North China. Palaeogeogr. Palaeoclimatol. Palaeoecol. 508, 71–90. https://doi.org/10.1016/j.palaeo.2018.07.022

      Xing, Z.F., et al. 2020. Trace fossils from the Lower Triassic of North China—a potential signature of the gradual recovery of a terrestrial ecosystem. Palaeoworld 30, 95–105. https://doi.org/10.1016/j.palwor.2020.06.002

      Some arthropod tracks assigned here to Kouphichnium might not represent limulid traces but other (non-marine) arthropod taxa in accordance with their occurrence in terrestrial facies/non-marine units of the succession. More generally, the ichnotaxonomy of arthropod trackways is not yet well resolved - beyond Kouphichnium and Diplichnites various similar-looking types may occur that can have a variety of distinct insect, crustacean, millipede, etc. producers (including larval stages).

      Individual trace-makers can produce different traces, and different organisms can make morphologically similar traces. For this reason, it is hard to establish a one-to-one relationship between trace fossils and their producers in most cases, especially for invertebrates. Kouphichnium could therefore have been made by arthropods other than limuloids.

      However, horseshoe crabs, originating in the early Ordovician, invaded freshwater environments twice in the Paleozoic and once in the Mesozoic (Lamsdell, 2016), and their body fossils have been found from the Early Triassic of Germany (e.g., Hauschke and Wilde, 2008) and North China (which occur with their traces; unpublished data). Accordingly, we tentatively suggest that Kouphichnium found in this interval was primarily produced by limuloids.

      References:

      Hauschke, N., Wilde, V. 2008. Limuliden aus dem Oberen Buntsandstein von Süddeutschland. Hallesches Jahrb. Für Geowiss. 30, 21–26.

      Lamsdell, J.C. 2016. Horseshoe crab phylogeny and independent colonizations of fresh water: ecological invasion as a driver for morphological innovation. Palaeontology 59, 181–194. https://doi.org/10.1111/pala.12220

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      (1)  Line 112 - was identified during..; please change to ...was identified in successions of late Changhsingian-early Smithian age.

      Revised as suggested.

      (2)  Line 116 - change prolong to prolonged.

      Revised as suggested.

      (3) Line 121 - change ichnofaunal to ichnofauna (check the entire sentence).

      We checked the manuscript thoroughly and revised as suggested.

      (4) Figure 1 caption - check sentence starting with - Base map...(delete 'of is')

      Revised as suggested.

      (5) Line 471 - tiny instead of tinny.

      Revised as suggested.

      (6) Figure S9 - would it be possible to include this reconstruction in the main manuscript?

      We have moved the artistic illustration to the main text as Figure 5.

      (7) Add the illustrators name / or indicate if it is produced by AI.

      We have added the sentence “The artistic illustration is credited to J. Sun” at the end.

      Reviewer #2 (Recommendations for The Authors):

      (1) Line 15 – change 252 million years ago to ca. 252 million years ago.

      Revised as suggested.

      (2) Line 18 – change low-latitude North China to low-latitude present-day North China.

      Actually, the paleolatitude of North China during the Early Triassic was about 17-18°N according to paleomagnetic results (Huang et al., 2018; Guo et al., 2022).

      References:

      Guo, W., et al. 2022. Late Permian–Middle Triassic magnetostratigraphy in North China and its implications for terrestrial-marine correlations. Earth Planet. Sci. Lett. 585, 117519. https://doi.org/10.1016/j.epsl.2022.117519

      Huang, B., et al. 2018. Paleomagnetic constraints on the paleogeography of the East Asian blocks during Late Paleozoic and Early Mesozoic times. Earth-Sci. Rev. 186, 8–36. https://doi.org/10.1016/j.earscirev.2018.02.004

      (3) Line 25 - "possible" doesn't seem the appropriate term here for the structure of the sentence. Could it be "to make possible" that it meant? Or otherwise you could write "possibly". Please revise this.

      Revised “possible” to “possibly”.

      (4) Line 33 – change “are” to “were”.

      Revised as suggested.

      (5) Line 43 – There are other, more appropriate articles that should (also) be cited here, especially because Mujal et al. (2017) doesn't deal with the Central European Basin (so you could even remove this reference). For sure this one should be cited:

      Scholze, F., Wang, Z., Kirscher, U., Kraft, J., Schneider, J.W., Götz, A.E., Joachimski, M.M., Bachtadse, V., 2017. A multistratigraphic approach to pinpoint the Permian-Triassic boundary in continental deposits: the Zechstein–Lower Buntsandstein transition in Germany. Glob. Planet. Chang. 152, 129–151. http://dx.doi.org/10.1016/j.gloplacha.2017.03.004.

      We have replaced Mujal’s paper with Scholze et al., (2017) in the main text.

      (6) Line 46 – change “Roopnarinev et al., 2019” to “Roopnarine et al., 2019”.

      Revised as suggested.

      (7) Line 53 – Here Mujal et al. (2017) would be more appropriate, since it deals with a basin from the western peri-Tethys, also, this other article by Mujal et al. (2017) discussed the recovery in the western peri-Tethys based on tetrapod footprints:

      Mujal, E., Fortuny, J., Bolet, A., Oms, O., López, J.Á., 2017. An archosauromorph dominated ichnoassemblage in fluvial settings from the late Early Triassic of the Catalan Pyrenees (NE Iberian Peninsula). PLoS One 12 (4), e0174693. http://dx.doi.org/10.1371/journal.pone.0174693.

      Revised as suggested.

      (8) Line 58 – change “relatively diversified trace fossils have been found during the late Early Triassic” to “because relatively diversified trace fossils have been found in upper Lower Triassic deposits”.

      Revised as suggested.

      (9) Line 58 – change “recovered” to “ecosystems recovered”.

      Revised as suggested.

      (10) Line 81 – These two paragraphs could be under a section named Geological setting or similar.

      Yes, these two paragraphs are brief introductions of the geological background of North China, so we change the section name to “Geological Settings and Methods”.

      (11) Line 99 – change “behavioural” to “behavioral”.

      Revised as suggested; we also checked the spelling throughout.

      (12) Line 103 – add “is” before adopted.

      The sentence “Tiering, referring to the life position of an animal vertically in the sediment, is divided into surficial, semi-infaunal (0–0.5 cm), shallow (0.5–6 cm), intermediate (6–12 cm) and deep infaunal tiers (> 12 cm), adopted from Minter et al. (2017).” is changed to “…, based on Minter et al. (2017).”

      (13) Line 113 –change “mainly” to “were mainly”.

      Revised as suggested.

      (14) Line 116 - change prolong to prolonged.

      Revised as suggested.

      (15) Line 121 – add “preserved” before in.

      Revised as suggested.

      (16) Line 123 - change “were” to “are”.

      Revised as suggested.

      (17) Line 127 – “Kouphichnium” instead of “Kouphichnim”.

      Revised as suggested.

      (18) Line 135 – change to “Occupied by”.

      Revised as suggested.

      (19) Line 140 – change “bioturbations” to “bioturbated deposits”.

      Revised as suggested.

      (20) Line 145 – “Spathian” rather than “Spthian”.

      Revised as suggested.

      (21) Line 140 – change “displayed” to “displays”.

      Revised as suggested.

      (22) Line 160 – change “continental” to “terrestrial”.

      Revised as suggested.

      (23) Line 165 – “Marchetti” rather than “Marchettti”.

      Revised as suggested.

      (24) Line 168 – change “relationships” to “relation”.

      Revised as suggested.

      (25) Line 177 – “including” instead of “includes”.

      Revised as suggested.

      (26) Line 181 and Line 214– change “tetrapod” to “tetrapods”.

      Revised as suggested.

      (27) Line 195 and Line 218 – change “cooccurred” to “co-occurring”.

      Revised as suggested.

      (28) Line 540 – delete “herein”.

      Revised as suggested.

      (29) Line 559 – “Helminthoidichnites tenuis”, it should be in italics.

      Revised as suggested.
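      The tiering scheme quoted in the response to comment (12) above can be written as a small lookup, shown here as a minimal sketch. The depth ranges are those stated in the quoted sentence (based on Minter et al., 2017); the function name and the handling of boundary values (a depth exactly on a boundary is assigned to the shallower tier, and a depth of zero is treated as surficial) are assumptions made for illustration, since the source does not specify them.

```python
def tier_for_depth(depth_cm: float) -> str:
    """Return the tier name for a burrow penetration depth in centimetres.

    Ranges follow the scheme quoted in the text; boundary handling is an
    assumption (boundary depths go to the shallower tier).
    """
    if depth_cm < 0:
        raise ValueError("depth must be non-negative")
    if depth_cm == 0:
        return "surficial"
    if depth_cm <= 0.5:
        return "semi-infaunal"   # 0-0.5 cm
    if depth_cm <= 6:
        return "shallow"         # 0.5-6 cm
    if depth_cm <= 12:
        return "intermediate"    # 6-12 cm
    return "deep"                # > 12 cm


for depth in (0, 0.3, 3, 8, 15):
    print(depth, tier_for_depth(depth))
```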

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper is an elegant, mostly observational work, detailing observations that polysome accumulation appears to drive nucleoid splitting and segregation. Overall I think this is an insightful work with solid observations.

      Thank you for your appreciation and positive comments. In our view, an appealing aspect of this proposed biophysical mechanism for nucleoid segregation is its self-organizing nature and its ability to intrinsically couple nucleoid segregation to biomass growth, regardless of nutrient conditions.

      Strengths:

      The strengths of this paper are the careful and rigorous observational work that leads to their hypothesis. They find the accumulation of polysomes correlates with nucleoid splitting, and that the nucleoid segregation occurring right after splitting correlates with polysome segregation. These correlations are also backed up by other observations:

      (1) Faster polysome accumulation and DNA segregation at faster growth rates.

      (2) Polysome distribution negatively correlating with DNA positioning near asymmetric nucleoids.

      (3) Polysomes form in regions inaccessible to similarly sized particles.

      These above points are observational, I have no comments on these observations leading to their hypothesis.

      Thank you!

      Weaknesses:

      It is hard to state weaknesses in any of the observational findings, and furthermore, their two tests of causality, while not being completely definitive, are likely the best one could do to examine this interesting phenomenon.

      It is indeed difficult to prove causality in a definitive manner when the proposed coupling mechanism between nucleoid segregation and gene expression is self-organizing, i.e., does not involve a dedicated regulatory molecule (e.g., a protein, RNA, metabolite) that we could have eliminated through genetic engineering to establish causality. We are grateful to the reviewer for recognizing that our two causality tests are the best that can be done in this context.

      Points to consider / address:

      Notably, demonstrating causality here is very difficult (given the coupling between transcription, growth, and many other processes) but an important part of the paper. They do two experiments toward demonstrating causality that help bolster - but not prove - their hypothesis. These experiments have minor caveats, my first two points.

      (1) First, "Blocking transcription (with rifampicin) should instantly reduce the rate of polysome production to zero, causing an immediate arrest of nucleoid segregation". Here they show that adding rifampicin does indeed lead to polysome loss and an immediate halting of segregation - data that does fit their model. This is not definitive proof of causation, as rifampicin also (a) stops cell growth, and (b) stops the translation of secreted proteins. Neither of these two possibilities is ruled out fully.

      That’s correct; cell growth also stops when gene expression is inhibited, which is consistent with our model in which gene expression within the nucleoid promotes nucleoid segregation and biomass growth (i.e., cell growth), inherently coupling these two processes. This said, we understand the reviewer’s point: the rifampicin experiment doesn’t exclude the possibility that protein secretion and cell growth drive nucleoid segregation. We are assuming that the reviewer is envisioning an alternative model in which sister nucleoids would move apart because they would be attached to the membrane through coupled transcription-translation-protein secretion (transertion) and the membrane would expand between the separating nucleoids, similar to the model proposed by Jacob et al. in 1963 (doi:10.1101/SQB.1963.028.01.048). There are several observations arguing against cell elongation/transertion acting as a predominant mechanism of nucleoid segregation.

      (1) For this alternative mechanism to work, membrane growth must be localized at the middle of the splitting nucleoids (i.e., midcell position for slow growth and ¼ and ¾ cell positions for fast growth) to create a directional motion. To our knowledge, there is no evidence of such localized membrane incorporation. Furthermore, even if membrane growth was localized at the right places, the fluidity of the cytoplasmic membrane (PMID: 6996724, 20159151, 24735432, 27705775) would be problematic. To circumvent the membrane fluidity issue, one could potentially evoke an additional connection to the rigid peptidoglycan, but then again, peptidoglycan growth would have to be localized at the middle of the splitting nucleoid. However, peptidoglycan growth is dispersed early in the cell division cycle when the nucleoid splitting happens in fast growing cells and only appears to be zonal after the onset of cell constriction (PMID: 35705811, 36097171, 2656655).

      (2) Even if we ignore the aforementioned caveats, Paul Wiggins’s group ruled out the cell elongation/transertion model by showing that the rate of cell elongation is slower than the rate of chromosome segregation (PMID: 23775792). In our revised manuscript, we clarify this point and provide confirmatory data showing that the cell elongation rate is indeed slower than the nucleoid segregation rate (Figure 1H and Figure 1 - figure supplement 5A), indicating that it cannot be the main driver.

      (3) The asymmetries in nucleoid compaction that we described in our paper are predicted by our model. We do not see how they could be explained by cell growth or protein secretion.

      (4) We also show that polysome accumulation at ectopic sites (outside the nucleoid) results in correlated nucleoid dynamics, consistent with our proposed mechanism. It is not clear to us how such nucleoid dynamics could be explained by cell growth or protein secretion (transertion).

      (1a) As rifampicin also stops all translation, it also stops translational insertion of membrane proteins, which in many old models has been put forward as a possible driver of nucleoid segregation, and perhaps independent of growth. This should at least be mentioned in the discussion, or if there are past experiments that rule this out it would be great to note them.

      It is not clear to us how the attachment of the DNA to the cytoplasmic membrane could alone create a directional force to move the sister nucleoids. We agree that old models have proposed a role for cell elongation (providing the force) and transertion (providing the membrane tether). Please see our response above for the evidence (from the literature and our work) against it. This was mentioned in the Introduction and Results section, but we agree that this was not well explained. We have now put emphasis on the related experimental data (Figure 1H, Figure 1 – figure supplement 5A) and revised the text (lines 199 - 210) to clarify these points.

      (1b) They address at great length in the discussion the possibility that growth may play a role in nucleoid segregation. However, this is testable - by stopping surface growth with antibiotics. Cells should still accumulate polysomes for some time, it would be easy to see if nucleoids are still segregated, and to what extent, thereby possibly decoupling growth and polysome production. If successful, this or similar experiments would further validate their model.

      We reviewed the literature and could not find a drug that stops cell growth without stopping gene expression. Any drug that affects the integrity or potential of the membrane depletes cells of ATP; without ATP, gene expression is inhibited. However, our experiment in which we drive polysome accumulation at ectopic sites decouples polysome accumulation from cell growth. In this experiment, by redirecting most of chromosome gene expression to a single plasmid-encoded gene, we reduce the rate of cell growth but still create a large accumulation of polysomes at an ectopic location. This ectopic polysome accumulation is sufficient to affect nucleoid dynamics in a correlated fashion. In the revised manuscript, we have clarified this point and added model simulations (Figure 7 – figure supplement 2) to show that our experimental observations are predicted by our model.

      (2) In the second experiment, they express excess TagBFP2 to delocalize polysomes from midcell. Here they again see the anticorrelation of the nucleoid and the polysomes, and in some cells, it appears similar to normal (polysomes separating the nucleoid) whereas in others the nucleoid has not separated. The one concern about this data - and the differences between the "separated" and "non-separated" nuclei - is that the over-expression of TagBFP2 has a huge impact on growth, which may also have an indirect effect on DNA replication and termination in some of these cells. Could the authors demonstrate these cells contain 2 fully replicated DNA molecules that are able to segregate?

      We have included new flow cytometry data of fluorescently labeled DNA to show that DNA replication is not impacted.

      (3) What is not clearly stated and is needed in this paper is to explain how polysomes do (or could) "exert force" in this system to segregate the nucleoid: what a "compaction force" is by definition, and what mechanism causes this to arise (what causes the "force") as the "compaction force" arises from new polysomes being added into the gaps between them caused by thermal motions.

      They state, "polysomes exert an effective force", and they note their model requires "steric effects (repulsion) between DNA and polysomes" for the polysomes to segregate, which makes sense. But this makes it unclear to the reader what is giving the force. As written, it is unclear if (a) these repulsions alone are making the force, or (b) is it the accumulation of new polysomes in the center by adding more "repulsive" material, the force causes the nucleoids to move. If polysomes are concentrated more between nucleoids, and the polysome concentration does not increase, the DNA will not be driven apart (as in the first case). However, in the second case (which seems to be their model), the addition of new material (new polysomes) into a sterically crowded space is not exerting force - it is filling in the gaps between the molecules in that region, space that needs to arise somehow (like via Brownian motion). In other words, if the polysome region is crowded with polysomes, space must be made between these polysomes for new polysomes to be inserted, and this space must be made by thermal (or ATP-driven) fluctuations of the molecules. Thus, if polysome accumulation drives the DNA segregation, it is not "exerting force", but rather the addition of new polysomes is iteratively rectifying gaps being made by Brownian motion.

      We apologize for the understandable confusion. In our picture, the polysomes and DNA (conceptually considered as small plectonemic segments) basically behave as dissolved particles. If these particles were noninteracting, they would simply mix. However, both polysomes and DNA segments are large enough to interact sterically. So as density increases, steric avoidance implies a reduced conformational entropy and thus a higher free energy per particle. We argue (based on Miangolarra et al. 2021 PMID: 34675077 and Xiang et al. 2021 PMID: 34186018) that the demixing of polysomes and DNA segments occurs because DNA segments pack better with each other than they do with polysomes. This raises the free energy cost associated with DNA-polysome interactions compared to DNA-DNA interactions. We model this effect by introducing a term in the free energy, χ_np, which we refer to as a repulsion between DNA and polysomes, though as explained above it arises from entropic effects. At realistic cellular densities of DNA and polysomes, this repulsive interaction is strong enough to cause the DNA and polysomes to phase separate.

      This same density-dependent free energy that causes phase separation can also give rise to forces, just in the way that a higher pressure on one side of a wall can give rise to a net force on the wall. Indeed, the “compaction force” we refer to is fundamentally an osmotic pressure difference. At some stages during nucleoid segregation, the region of the cell between nucleoids has a higher polysome concentration, and therefore a higher osmotic pressure, than the regions near the poles. This results in a net poleward force on the sister nucleoids that drives their migration toward the poles. This migration continues until the osmotic pressure equilibrates. Therefore, both phase separation (due to the steric repulsion described above) and nonequilibrium polysome production and degradation (which creates the initial accumulation of polysomes around midcell) are essential ingredients for nucleoid segregation.

      This has been clarified in the revised text, with the support of additional simulation results showing how the asymmetry in polysome distribution causes a compaction force (Figure 4A).
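      The argument above can be summarized schematically with a generic Flory-Huggins-type free energy. This is only an illustrative sketch and not necessarily the form used in the manuscript: the volume fractions φ_n (DNA segments) and φ_p (polysomes), the lattice-site volume v, and the cross-sectional area A are assumed symbols introduced here, and only χ_np comes from the text. The first expression shows how the χ_np term penalizes DNA-polysome mixing, the second gives the corresponding osmotic pressure, and the third expresses the "compaction force" as a pressure difference between the mid-cell and polar regions.

```latex
% Illustrative mixing free energy per lattice site, in units of k_B T.
% \phi_n, \phi_p: assumed volume fractions of DNA segments and polysomes.
\[
f(\phi_n,\phi_p) = \phi_n \ln \phi_n + \phi_p \ln \phi_p
  + (1-\phi_n-\phi_p)\ln(1-\phi_n-\phi_p)
  + \chi_{np}\,\phi_n \phi_p
\]

% Osmotic pressure derived from f; v is an assumed lattice-site volume.
\[
\Pi = \frac{k_B T}{v}\left(
  \phi_n \frac{\partial f}{\partial \phi_n}
  + \phi_p \frac{\partial f}{\partial \phi_p} - f \right)
\]

% Schematic net poleward force on a sister nucleoid of cross-section A.
\[
F \approx A\left(\Pi_{\mathrm{mid}} - \Pi_{\mathrm{pole}}\right)
\]
```

      In this picture the force vanishes once the pressures on either side of a nucleoid equilibrate, which matches the statement that migration continues until the osmotic pressure equilibrates.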

      The authors use polysome accumulation and phase separation to describe what is driving nucleoid segregation. Both terms are accurate, but it might help the less physically inclined reader to have one term, or have what each of these means explicitly defined at the start. I say this most especially in terms of "phase separation", as the currently huge momentum toward liquid-liquid interactions in biology causes the phrase "phase separation" to often evoke a number of wider (and less defined) phenomena and ideas that may not apply here. Thus, a simple clear definition at the start might help some readers.

      In our case, phase separation means that the DNA-polysome steric repulsion is strong enough to drive their demixing, which creates a compact nucleoid. As mentioned in a previous point, this effect is captured in the free energy by the χ_np term, which is an effective repulsion between DNA and polysomes, though it arises from entropic effects.

      In the revised manuscript, we now illustrate this with our theoretical model by initializing a cell with a diffuse nucleoid and low polysome concentration. For the sake of simplicity, we assume that the cell does not elongate. We observe that the DNA-polysome steric repulsion is sufficient to compact the nucleoid and place it at mid-cell (new Figure 4A).
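
      For readers less familiar with the terminology, a generic Flory-Huggins-type free energy density illustrates how an effective repulsion term can drive demixing. This is a textbook form given under stated assumptions, not necessarily the functional form used in the manuscript: φ_n and φ_p denote the DNA and polysome volume fractions, N_n and N_p are hypothetical effective sizes, and only χ_np is taken from the text above.

```latex
% Generic Flory-Huggins-type free energy density (illustrative; assumed form)
\frac{f}{k_B T} =
  \frac{\phi_n}{N_n}\ln\phi_n
  + \frac{\phi_p}{N_p}\ln\phi_p
  + (1-\phi_n-\phi_p)\ln(1-\phi_n-\phi_p)
  + \chi_{np}\,\phi_n\,\phi_p
% The first three terms are mixing entropies that favor a uniform state;
% the last term penalizes DNA-polysome overlap. For sufficiently large
% \chi_{np}, the uniform state becomes unstable and the two components demix.
```

      In this picture, "phase separation" means only that the χ_np term outweighs the mixing entropy, so that DNA-rich and polysome-rich regions form.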

      (4) Line 478. "Altogether, these results support the notion that ectopic polysome accumulation drives nucleoid dynamics". Is this right? Should it not read "results support the notion that ectopic polysome accumulation inhibits/redirects nucleoid dynamics"?

      We think that the ectopic polysome accumulation drives nucleoid dynamics. In our theoretical model, we can introduce polysome production at fixed sources to mimic the experiments where ectopic polysome production is achieved by high plasmid expression. The model is able to recapitulate the two main phenotypes observed in experiments (Figure 7). These new simulation results have been added to the revised manuscript (Figure 7 – figure supplement 2).

      (5) It would be helpful to clarify what happens as the RplA-GFP signal decreases at midcell in Figure 1- is the signal then increasing in the less "dense" parts of the cell? That is, (a) are the polysomes at midcell redistributing throughout the cell? (b) is the total concentration of polysomes in the entire cell increasing over time?

      It is a redistribution—the RplA-GFP signal remains constant in concentration from cell birth to division (Figure 1 – Figure Supplement 1E). This is now clarified in the revised text.

      (6) Line 154. "Cell constriction contributed to the apparent depletion of ribosomal signal from the mid-cell region at the end of the cell division cycle (Figure 1B-C and Movie S1)" - It would be helpful if when cell constriction began and ended was indicated in Figures 1B and C.

      Good idea. We have added markers in Figure 1C to indicate the average start of cell constriction. This relative time from birth to division was estimated as described in the new Figure 1 – figure supplement 2. We have also indicated that cell birth and division correspond to the first and last images/timepoints in Figure 1B and C, respectively. The two-dimensional average cell projections presented in Figure 3D also indicate the average timing of cell constriction, consistent with our analysis in Figure 1 – figure supplement 2.

      (7) In Figure 7 they demonstrate that radial confinement is needed for longitudinal nucleoid segregation. It should be noted (and cited) that past experiments of Bacillus L-forms in microfluidic channels showed a clear requirement for rod shape (and a given width) in the positioning and spacing of the nucleoids.

      Wu et al, Nature Communications, 2020. "Geometric principles underlying the proliferation of a model cell system" https://dx.doi.org/10.1038/s41467-020-17988-7

      Good point! We have revised the text to mention this work. Thank you.

      (8) "The correlated variability in polysome and nucleoid patterning across cells suggests that the size of the polysome-depleted spaces helps determine where the chromosomal DNA is most concentrated along the cell length. This patterning is likely reinforced through the displacement of the polysomes away from the DNA dense region"

      It should be noted that this likely functions not just in one direction (polysomes dictating DNA location), but also in the reverse, as the footprint of compacted DNA should also exclude (and thus affect) the location of polysomes.

      We agree that the effects could go both ways at this early stage of the story. We have revised the text accordingly.

      (9) Line 159. "Rifampicin is a transcription inhibitor that causes polysome depletion over time. This indicates that all ribosomal enrichments consist of polysomes and therefore will be referred to as polysome accumulations hereafter". Here and throughout this paper they use the term polysome, but cells also have monosomes (and 2-somes, etc.). Rifampicin stops the assembly of all of these, and thus the loss of localization could occur from both. Thus, is it accurate to state that all transcription events occur in polysomes? Or are they grouping all of the n-somes into one group?

      In the original discussion, we noted that our term “polysomes” also includes monosomes for simplicity, but we agree that the term should have been defined much earlier. The manuscript has been revised accordingly. Furthermore, in the revised manuscript, we have included additional simulation results with three different diffusion coefficients that reflect different polysome sizes to show that different polysome species with fewer or more ribosomes give similar results (Figure 4 – figure supplement 4). This shows that the average polysome description in our model is sufficient.

      Thank you for the valuable comments and suggestions!

      Reviewer #2 (Public review):

      Summary:

      The authors perform a remarkably comprehensive, rigorous, and extensive investigation into the spatiotemporal dynamics between ribosomal accumulation, nucleoid segregation, and cell division. Using detailed experimental characterization and rigorous physical models, they offer a compelling argument that nucleoid segregation rates are determined at least in part by the accumulation of ribosomes in the center of the cell, exerting a steric force to drive nucleoid segregation prior to cell division. This evolutionarily ingenious mechanism means cells can rely on ribosomal biogenesis as the sole determinant for the growth rate and cell division rate, avoiding the need for two separate 'sensors,' which would require careful coupling.

      Terrific summary! Thank you for your positive assessment.

      Strengths:

      In terms of strengths, the paper is very well written, the data are of extremely high quality, and the work is of fundamental importance to the field of cell growth and division. This is an important and innovative discovery enabled through a combination of rigorous experimental work and innovative conceptual, statistical, and physical modeling.

      Thank you!

      Weaknesses:

      In terms of weaknesses, I have three specific thoughts.

      Firstly, my biggest question (and this may or may not be a bona fide weakness) is how unambiguously the authors can be sure their ribosomal labeling is reporting on polysomes, specifically. My reading of the work is that the loss of spatial density upon rifampicin treatment is used to infer that spatial density corresponds to polysomes, yet this feels like a relatively indirect way to get at this question, given rifampicin targets RNA polymerase and not translation. It would be good if a more direct way to confirm polysome dependence were possible.

      The heterogeneity of ribosome distribution inside E. coli cells has been attributed to polysomes by many labs (PMID: 25056965, 38678067, 22624875, 31150626, 34186018, 10675340). The attribution is also consistent with single-molecule tracking experiments showing that slow-moving ribosomes (polysomes) are excluded by the nucleoid whereas fast-diffusing ribosomes (free ribosomal subunits) are distributed throughout the cytoplasm (PMID: 25056965, 22624875). These points are now mentioned in the revised manuscript.

      Second, the authors invoke a phase separation model to explain the data, yet it is unclear whether there is any particular evidence supporting such a model, whether they can exclude simpler models of entanglement/local diffusion (and/or perhaps this is what is meant by phase separation?) and it's not clear if claiming phase separation offers any additional insight/predictive power/utility. I am OK with this being proposed as a hypothesis/idea/working model, and I agree the model is consistent with the data, BUT I also feel other models are consistent with the data. I also very much do not think that this specific aspect of the paper has any bearing on the paper's impact and importance.

      We appreciate the reviewer’s comment, but the output of our reaction-diffusion model is a bona fide phase separation (spinodal decomposition). So, we feel that we need to use the term when reporting the modeling results. Inside the cell, the situation is more complicated. As the reviewer points out, there are likely entanglements (not considered in our model) and other important factors (please see our discussion on the model limitations). This said, we have revised our text to clarify our terms and proposed mechanism.

      Finally, the writing and the figures are of extremely high quality, but the sheer volume of data here is potentially overwhelming. I wonder if there is any way for the authors to consider stripping down the text/figures to streamline things a bit? I also think it would be useful to include visually consistent schematics of the question/hypothesis/idea each of the figures is addressing to help keep readers on the same page as to what is going on in each figure. Again, there was no figure or section I felt was particularly unclear, but the sheer volume of text/data made reading this quite the mental endurance sport! I am completely guilty of this myself, so I don't think I have any super strong suggestions for how to fix this, but just something to consider.

      We agree that there is a lot to digest. We could not come up with great ideas for visuals other than the schematics we already provide. However, we have revised the text to clarify our points and added a simulation result (Figure 4A) to help explain biophysical concepts.

      Reviewer #3 (Public review):

      Summary:

      Papagiannakis et al. present a detailed study exploring the relationship between DNA/polysome phase separation and nucleoid segregation in Escherichia coli. Using a combination of experiments and modelling, the authors aim to link physical principles with biological processes to better understand nucleoid organisation and segregation during cell growth.

      Strengths:

      The authors have conducted a large number of experiments under different growth conditions and physiological perturbations (using antibiotics) to analyse the biophysical factors underlying the spatial organisation of nucleoids within growing E. coli cells. A simple model of ribosome-nucleoid segregation has been developed to explain the observations.

      Weaknesses:

      While the study addresses an important topic, several aspects of the modelling, assumptions, and claims warrant further consideration.

      Thank you for your feedback. Please see below for a response to each concern.

      Major Concerns:

      Oversimplification of Modelling Assumptions:

      The model simplifies nucleoid organisation by focusing on the axial (long-axis) dimension of the cell while neglecting the radial dimension (cell width). While this approach simplifies the model, it fails to explain key experimental observations, such as:

      (1) Inconsistencies with Experimental Evidence:

      The simplified model presented in this study predicts that translation-inhibiting drugs like chloramphenicol would maintain separated nucleoids due to increased polysome fractions. However, experimental evidence shows the opposite: separated nucleoids condense into a single lobe post-treatment (Bakshi et al., 2014), indicating limitations in the model's assumptions/predictions. For the nucleoids to coalesce into a single lobe, polysomes must cross the nucleoid zones via the radial shells around the nucleoid lobes.

      We do not think that the results from chloramphenicol-treated cells are inconsistent with our model. Our proposed mechanism predicts that nucleoids will condense in the presence of chloramphenicol, consistent with experiments. It also predicts that nucleoids that were still relatively close at the time of chloramphenicol treatment could fuse if they eventually touched through diffusion (thermal fluctuation) to reduce their interaction with the polysomes and minimize their conformational energy. Fusion is, however, not expected for well-separated nucleoids since their diffusion is slow in the crowded cytoplasm. This is consistent with our experimental observations: In the presence of a growth-inhibitory concentration of chloramphenicol (70 μg/mL), nucleoids in relatively close proximity can fuse, but well-separated nucleoids condense and do not fuse. Since the growth rate inhibition is not immediate upon chloramphenicol treatment, many cells with well-separated condensed nucleoids divide during the first hour. As a result, the non-fusion phenotype is more obvious in non-dividing cells, achieved by pre-treating cells with the cell division inhibitor cephalexin (50 μg/mL). In these polyploid elongated cells, well-separated nucleoids condensed but did not fuse, not even after an hour in the presence of chloramphenicol. We have revised the manuscript to add these data (illustrative images and a quantitative analysis) in Figure 4 – figure supplement 1.

      (2) The peripheral localisation of nucleoids observed after A22 treatment in this study and others (e.g., Japaridze et al., 2020; Wu et al., 2019), which conflicts with the model's assumptions and predictions. The assumption of radial confinement would predict nucleoids to fill up the volume or ribosomes to go near the cell wall, not the nucleoid, as seen in the data.

      The reviewer makes a good point that DNA attachment to the membrane through transertion could contribute to the nucleoid being peripherally localized in A22 cells. We have revised the text to add this point. However, we do not think that this contradicts the proposed nucleoid segregation mechanism described in our model. On the contrary, by attaching the nucleoid to the cytoplasmic membrane along the cell width, transertion might help reduce the diffusion and thus exchange of polysomes across nucleoids. We have revised the text to discuss transertion over radial confinement.

      (3) The radial compaction of the nucleoid upon rifampicin or chloramphenicol treatment, as reported by Bakshi et al. (2014) and Spahn et al. (2023), also contradicts the model's predictions. This is not expected if the nucleoid is already radially confined.

      We originally invoked radial confinement to explain the observation that polysome accumulations do not equilibrate between DNA-free regions. We agree that transertion is an alternative explanation. Thank you for bringing it to our attention. However, please note that this does not contradict the model. In our view, it actually supports the 1D model by providing a reasonable explanation for the slow exchange of polysomes across DNA-free regions. The attachment of the nucleoid to the membrane along the cell width may act as a diffusion barrier. We have revised the text and the title of the manuscript accordingly.

      (4) Radial Distribution of Nucleoid and Ribosomal Shell:

      The study does not account for well-documented features such as the membrane attachment of chromosomes and the ribosomal shell surrounding the nucleoid, observed in super-resolution studies (Bakshi et al., 2012; Sanamrad et al., 2014). These features are critical for understanding nucleoid dynamics, particularly under conditions of transcription-translation coupling or drug-induced detachment. Work by Yongren et al. (2014) has also shown that the radial organisation of the nucleoid is highly sensitive to growth and the multifork nature of DNA replication in bacteria.

      We have revised the manuscript to discuss the membrane attachment. Please see the previous response.

      The omission of organisation in the radial dimension and the entropic effects it entails, such as ribosome localisation near the membrane and nucleoid centralisation in expanded cells, undermines the model's explanatory power and predictive ability. Some observations have been previously explained by the membrane attachment of nucleoids (a hypothesis proposed by Rabinovitch et al., 2003, and supported by experiments from Bakshi et al., 2014, and recent super-resolution measurements by Spahn et al.).

      We agree—we have revised the text to discuss membrane attachment in the radial dimension. See previous responses.

      Ignoring the radial dimension and membrane attachment of nucleoid (which might coordinate cell growth with nucleoid expansion and segregation) presents a simplistic but potentially misleading picture of the underlying factors.

      Please see above.

      This reviewer suggests that the authors consider an alternative mechanism, supported by strong experimental evidence, as a potential explanation for the observed phenomena:

      Nucleoids may transiently attach to the cell membrane, possibly through transertion, allowing for coordinated increases in nucleoid volume and length alongside cell growth and DNA replication. Polysomes likely occupy cellular spaces devoid of the nucleoid, contributing to nucleoid compaction due to mutual exclusion effects. After the nucleoids separate following ter separation, axial expansion of the cell membrane could lead to their spatial separation.

      This “membrane attachment/cell elongation” model is reminiscent of the hypothesis proposed by Jacob et al. in 1963 (doi:10.1101/SQB.1963.028.01.048). There are several lines of evidence arguing against it as the major driver of nucleoid segregation:

      (Below is a slightly modified version of our response to a comment from Reviewer 1—see page 3)

      (1) For this alternative model to work, axial membrane expansion (i.e., cell elongation) would have to be localized at the middle of the splitting nucleoids (i.e., midcell position for slow growth and ¼ and ¾ cell positions for fast growth) to create a directional motion. To our knowledge, there is no evidence of such localized membrane incorporation. Furthermore, even if membrane growth was localized at the right places, the fluidity of the cytoplasmic membrane (PMID: 6996724, 20159151, 24735432, 27705775) would be problematic. To get around this fluidity issue, one could invoke a connection to the rigid peptidoglycan, but then again, peptidoglycan growth would have to be localized at the middle of the splitting nucleoid to “push” the sister nucleoids apart from each other. However, peptidoglycan growth is dispersed prior to cell constriction (PMID: 35705811, 36097171, 2656655).

      (2) Even if we ignore the aforementioned caveats, Paul Wiggins’s group ruled out the cell elongation/transertion model by showing that the rate of cell elongation is slower than the rate of chromosome segregation (PMID: 23775792). In the revised manuscript, we confirm that the cell elongation rate is indeed overall slower than the nucleoid segregation rate (see Figure 1 - figure supplement 5A where the subtraction of the cell elongation rate from the nucleoid segregation rate at the single-cell level leads to positive values).

      (3) Furthermore, our correlation analysis comparing the rate of nucleoid segregation to the rate of either cell elongation or polysome accumulation argues that polysome accumulation plays a larger role than cell elongation in nucleoid segregation. These data were already shown in the original manuscript (Figure 1I and Figure 1 – figure supplement 5B) but were not highlighted in this context. We have revised the text to clarify this point.

      (4) The membrane attachment/cell elongation model does not explain the nucleoid asymmetries described in our paper (Figure 3), whereas they can be recapitulated by our model.

      (5) The cell elongation/transertion model cannot predict the aberrant nucleoid dynamics observed when chromosomal expression is largely redirected to plasmid expression (Figure 7). In the revised manuscript, we have added simulation results showing that these nucleoid dynamics are predicted by our model (Figure 7 – figure supplement 2).

      Based on these arguments, we do not believe that a mechanism based on membrane attachment and cell elongation is the major driver of nucleoid segregation. However, we do believe that it may play a complementary role (see “Nucleoid segregation likely involves multiple factors” in the Discussion). We have revised the text to clarify our thoughts and mention the potential role of transertion.

      Incorporating this perspective into the discussion or future iterations of the model may provide a more comprehensive framework that aligns with the experimental observations in this study and previous work.

      As noted above, we have revised the text to mention transertion.

      Simplification of Ribosome States:

      Combining monomeric and translating ribosomes into a single 'polysome' category may overlook spatial variations in these states, particularly during ribosome accumulation at the mid-cell. Without validating uniform mRNA distribution or conducting experimental controls such as FRAP or single-molecule measurements to estimate the proportions of ribosome states based on diffusion, this assumption remains speculative.

      Indeed, for simplicity, we adopt an average description of all polysomes with an average diffusion coefficient and interaction parameters, which is sufficient for capturing the fundamental mechanism underlying nucleoid segregation. To illustrate that considering multiple polysome species does not change the physical picture, we have considered an extension of our model, which contains three polysome species, each with a different diffusion coefficient (D_P = 0.018, 0.023, or 0.028 μm²/s), reflecting that polysomes with more ribosomes will have a lower diffusion coefficient. Simulation of this model reveals that the different polysome species have essentially the same concentration distribution, suggesting that the average description in our minimal model is sufficient for our purposes. We present these new simulation results in Figure 4 – figure supplement 4 of the revised manuscript.
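
      To give a rough sense of how close the three diffusion coefficients quoted above are, the sketch below computes a characteristic one-dimensional diffusion time, t = L²/(2D), for each. This is a back-of-the-envelope estimate only and is not the reaction-diffusion model itself; the 1 μm length scale and the function names are assumptions chosen for illustration.

```python
# Back-of-the-envelope comparison of the three polysome diffusion coefficients
# quoted in the response. The 1 um length scale is an illustrative assumption.

# Diffusion coefficients from the text, in um^2/s
DIFFUSION_COEFFS_UM2_PER_S = (0.018, 0.023, 0.028)


def diffusion_time_1d(length_um: float, diff_coeff_um2_per_s: float) -> float:
    """Characteristic 1D diffusion time t = L^2 / (2 D), in seconds."""
    if length_um < 0:
        raise ValueError("length must be non-negative")
    if diff_coeff_um2_per_s <= 0:
        raise ValueError("diffusion coefficient must be positive")
    return length_um ** 2 / (2.0 * diff_coeff_um2_per_s)


def diffusion_times(length_um: float = 1.0) -> dict:
    """Map each quoted diffusion coefficient to its characteristic time."""
    return {
        d: diffusion_time_1d(length_um, d) for d in DIFFUSION_COEFFS_UM2_PER_S
    }


if __name__ == "__main__":
    for d, t in diffusion_times().items():
        print(f"D = {d:.3f} um^2/s -> t ~ {t:.1f} s to diffuse over 1 um")
```

      Over the assumed 1 μm, the three species give times of roughly 28 s, 22 s, and 18 s, that is, a spread of about 1.6-fold between the slowest and fastest species.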

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Does the polysome density correlate with the origins? If the majority of ribosomal genes are expressed near the origins,

      This is indeed an interesting point that we mention in the discussion. The fact that the chromosomal origin is surrounded by highly expressed genes (PMID: 30904377) and is located near the middle of the nucleoid prior to DNA replication (PMID: 15960977, 27332118, 34385314, 37980336) can only help the model that we propose by increasing the polysome density at the mid-nucleoid position.

      (2) Red lines in 3C are hard to resolve - can the authors make them darker?

      Absolutely. Sorry about that.

      Reviewer #2 (Recommendations for the authors):

      The authors use rifampicin treatment as a mechanism to trigger polysome disassembly and show this leads to homogeneous RplA distribution. This is a really important experiment as it is used to link RplA localization to polysomes, and to argue that RplA density is reporting on polysomes. Given rifampicin inhibits RNA polymerase (and given the only reference of the three linking rifampicin to polysome disassembly is the 1971 Blundell and Wild ref), it would perhaps be useful to more conclusively show polysome depletion (as opposed to inhibition of mRNA synthesis, which is upstream of polysome assembly) by using an alternative compound more commonly linked to polysome disassembly (e.g., puromycin) and showing timelapse loss of density as a function of treatment time. This is not a required experiment, but given the idea that RplA density reports on polysomes is central to the authors' interpretation, it feels like this would be a thing worth being certain of. An alternative model is that ribosomes undergo self-assembly into local storage depots when not being used, but those depots are not translationally active/lack polysomes. I don't know if I think this is likely, but I'm not convinced the rifampicin treatment + waiting for a relatively long period of time unambiguously excludes other possible mechanisms given the large-scale remodeling of the intracellular environment upon mRNA inhibition. I 100% buy the relationship between ribosomal distribution and nucleoid segregation (and the ectopic expression experiments are amazing in this regard), so my own pause for thought here is "do we know those ribosomes are in polysomes in the ribosome-dense regions". I'm not sure the answer to this question has any bearing on the impact and importance of this work (in my mind, it doesn't, but perhaps there's a reason it does?). The way to unambiguously show this would really be to do cryo-ET and show polysomes in the dense ribosomal regions, but I would never suggest the authors do that here (that's an entire other paper!).

      We agree that mRNAs play a role, as mRNAs are major components of polysomes and most mRNAs are expected to be in the form of polysomes (i.e., in complex with ribosomes). In addition, as mentioned above, the enrichments of ribosome distribution are known to be associated with polysomes (PMID: 25056965, 38678067, 22624875, 31150626, 34186018, 10675340). The attribution is consistent with single-molecule tracking experiments showing that slow-moving ribosomes (polysomes) are excluded by the nucleoid whereas fast-diffusing ribosomes (free ribosomal subunits) are distributed throughout the cytoplasm (PMID: 25056965, 22624875). This is also consistent with cryo-ET results that we actually published (see Figure S5, PMID: 34186018). We have added this information to the revised manuscript. Thank you for alerting us to this oversight.

      On line 320 the authors state "Our single-cell studies provided experimental support that phase separation between polysomes and DNA contributes to nucleoid segregation." - this comes pretty out of left field? I didn't see any discussion of this hypothesis leading up to this sentence, nor is there evidence I can see that necessitates phase separation as a mechanistic explanation unless we are simply using phase separation to mean cellular regions with distinct cellular properties (which I would advise against). If the authors really want to pursue this model I think much more support needs to be provided here, including (1) defining what the different phases are, (2) providing explicit description of what the attractive/repulsive determinants of these different phases could be/are, and (3) ruling out a model where the behavior observed is driven by a combination of DNA / polysome entanglement + steric exclusion; if this is actually the model, then being much more explicit about this being a locally arrested percolation phenomenon would be essential. Overall, however, I would probably dissuade the authors from pursuing the specific underlying physics of what drives the effects they're seeing in a Results section, solely because I think ruling in/out a model unambiguously is very difficult. Instead, this would be a useful topic for a Discussion, especially couched under a "our data are consistent with..." if they cannot exclude other models (which I think is unreasonably difficult to do).

      Thank you for your advice. We have revised the text to more carefully choose our words and define our terms.

      Minor comments:

      The results in "Cell elongation may also contribute to sister nucleoid migration near the end of the division cycle" are really interesting, but this section is one big paragraph, and I might encourage the authors to divide this paragraph up to help the reader parse this complex (and fascinating) set of results!

      We have revised this section to hopefully make it more accessible.

      Reviewer #3 (Recommendations for the authors):

      Technical Controls:

      The authors should conduct a photobleaching control to confirm that the perceived 'higher' brightness of new ribosomes at the mid-cell position is not an artefact caused by older ribosomes being photobleached during the imaging process. Comparing results at various imaging frequencies and intensities is necessary to address this issue.

      The ribosome localization data across 30 nutrient conditions (Figure 2, Figure 1 – figure supplement 6, Figure 2 – Figure supplement 1, Figure 2 – Figure supplement 3 and Figure 5) are from snapshot images, which do not have any photobleaching issue. They confirm the mid-cell accumulation seen by time-lapse microscopy. We have revised the text to clarify this point.

      Novelty of Experimental Measurements:

      While the scale of the study is unprecedented, claims of novelty (e.g., line 142) regarding ribosome-nucleoid segregation tracking are overstated. Similar observations have been made previously (e.g., Bakshi et al., 2012; Bakshi et al., 2014; Chai et al., 2014).

      Our apologies. The text in line 142 oversimplified our rationale. This has been corrected in the revised manuscript.

    1. Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript reports experiments designed to dissect the function of N-cadherin during mammalian folliculogenesis, using the mouse as a model system. Prior studies have shown that this is the principal cadherin expressed by the follicular granulosa cells. Two main strategies are used - small-molecule inhibitors that target N-cadherin and a conditional knockout where the gene encoding N-cad is deleted in granulosa cells. The authors also take advantage of the ability to reproduce key events of folliculogenesis, such as oocyte meiotic maturation, in vitro. Four main conclusions are drawn from the studies: (i) cadherin-based cell contact is required to maintain cadherin (N-cad in the granulosa cells; E-cad in the oocyte) at the plasma membrane; (ii) N-cad is required for cumulus layer expansion; (iii) N-cad is required for meiotic maturation of the oocyte; (iv) N-cad is required for ovulation.

      Strengths:

      The experiments are logically conceived, clearly described and presented, and carefully interpreted. A key strength of the paper is that multiple approaches have been used (drugs, knockouts, immunofluorescence, PLA, in vitro and in vivo studies). Taken together, they clearly establish essential roles for N-cadherin during folliculogenesis.

      It is intriguing that, when cadherin activity is impaired, the cadherins are lost from the plasma membrane. This suggests that, in a multicellular context, interactions with other cadherins, either in cis within the same cell or in trans with a neighboring cell, are required to maintain cadherins at the membrane. Hence, beyond their significance for understanding female reproductive biology, these experiments have broader implications for cell biology.

      Weaknesses:

      A few points could be considered or clarified by the authors:

      The YAP experiments were confusing to the reviewer. CRS-066 increased YAP activity, as indicated by increased expression of target genes. Since CRS-066 prevents expansion, this result suggests that YAP antagonizes expansion. Therefore, blocking YAP should favor expansion. Yet, the YAP inhibitor impaired expansion. In the reviewer's eyes, these results seem to be contradictory.

      The mechanism through which N-cadherin inhibitors block cumulus expansion isn’t fully elucidated but isn’t deemed to be through YAP alone. The transcriptional changes indicate crosstalk between N-cadherin, β-catenin and Hippo/YAP pathways, as well as impacting on the signalling between cumulus cells and the oocyte.

      It is intriguing that the inhibitors were able to efficiently block oocyte maturation. Oocytes from which the cumulus granulosa cells have been removed (denuded) will mature in vitro in the absence of LH or EGF. Since the effect of the inhibitors is to break the contact between the cumulus cells and oocyte, one might have predicted that this would not impair the ability of the oocytes to mature. Perhaps the authors could comment on this.

      Indeed, removal of cumulus cells permits oocyte meiotic maturation by reducing oocyte cAMP, leading to activation of meiosis promoting factor (MPF). A hypothesis would be that cyclic nucleotides and MPF arrest in the oocyte are maintained when N-cadherin contacts are blocked but this was not determined.

      Regarding the experiments where the inhibitors were administered intra-peritoneally, the authors might comment on the rationale for choosing the doses that were used. An additional point to consider is that, since N-cadherin is expressed in a variety of tissues, an effect of interfering with N-cadherin at these non-ovarian sites could indirectly influence ovarian function.

      Doses were chosen based on previously reported use of these inhibitors in vivo (Mrozik et al. 2020). Possible effects of the N-cadherin antagonists in other tissues were carefully considered in this and the previous Mrozik et al. study. While we saw no evidence of effects in gross morphological observations, or on closer examination of vasculature or blood in these studies, this possibility is not excluded.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript entitled "N-cadherin mechanosensing in ovarian follicles controls oocyte maturation and ovulation" aimed to investigate the role of N-cadherin in different ovarian physiological processes, including cumulus oocyte expansion, oocyte maturation, and ovulation. The authors performed several in vitro and in vivo mice experiments, using diverse techniques to reinforce their results.

      First, they identified two compounds (N-cadherin antagonists) that block the adhesion of periovulatory COCs to fibronectin by screening a small molecule library using the xCELLigence™ system, with proper and complementary controls. Second, the authors showed the presence of N-cadherin adherens junctions between granulosa cells and cumulus cells and at the interface of cumulus cell transzonal projections and the oocyte throughout folliculogenesis. These adherens complexes between cumulus cells and oocytes were disrupted when N-cadherin was inhibited, as shown by representative confocal images. Then, the authors assessed COC expansion and oocyte meiotic maturation to determine whether the loss of oocyte membrane β-catenin and E-cadherin upon N-cadherin inhibitor treatment disrupts the bi-directional communication between cumulus cells and the oocyte. Indeed, N-cadherin antagonists disrupted both processes (cumulus expansion and oocyte meiotic maturation). However, the expression of known mediators of COC expansion (e.g., Areg and Ptgs2) was either increased or unaffected. Nevertheless, RNA-Seq showed consistent effects of the antagonist CRS-066 on cell signaling genes.

      In vivo studies in mice were also performed using stimulation protocols (together with one of the antagonists or vehicle) or granulosa-specific Cdh2 knockouts to further analyze the role of N-cadherin. N-cadherin antagonist CRS-066 (but not LCRF-0006) significantly reduced mouse ovulation compared to controls. RNA-sequencing data analysis identified distinct gene expression profiles in CRS-066-treated compared to control ovaries. Ovulation in CdhFl/FL; Amhr2Cre mice after stimulation was also significantly reduced; multiple large unruptured follicles were observed in these granulosa-specific Cdh2 mutant ovaries, and the mRNA expression of Areg and Ptgs2 was reduced.

      The authors conclude that their study identified N-cadherin as a mechanosensory regulator important in ovarian granulosa cell differentiation able to respond to hormone stimuli both in vivo and in vitro, demonstrating a critical role for N-cadherin in ovarian follicular development and ovulation. They highlighted the potential to inhibit ovulation by targeting this signaling mechanism.

      Strengths:

      This remarkable manuscript is very well designed, performed, and discussed. The authors analyzed different aspects, and their data supports their conclusions.

      Weaknesses:

      This study was performed using the mouse as a research model; further studies in larger animals and humans would be interesting and warranted.

      Indeed, this would be interesting. Ongoing research into therapeutic applications of N-cadherin targeting is reviewed in Blaschuk OW. Front Cell Dev Biol. 2022 Mar 3;10:866200.

      Minor comments:

      Some results are intriguing. While the AREG and PTGS2 mRNA increased within the COC in vitro by the N-cadherin antagonists, in vivo, the treatment induced a significant increase in both genes when analyzing the whole ovary. What are the authors' ideas that could explain these discrepancies in outcomes?

      Comparing the responses in IVM COCs to in vivo whole ovaries carries multiple caveats, though as noted, the observations are consistent with altered mechanotransduction in each case. It is important to note the change in pre-ovulatory follicle gene expression in vivo, which likely affects the response of follicles to ovulatory stimulus.

      The authors stated that the ovaries from mice treated in the same manner and collected either before hCG treatment (eCG 44 h) or 11 h after hCG showed equivalent numbers of follicles at each stage of development from primary to antral. However, in Panel l from Figure 5, there is a significant increase in the number of antral follicles in the CRS-066 group (hCG 11 h) compared to the vehicle. Could the author discuss it in the manuscript?

      A small change in these follicle types was significant in hCG 11h treated mice and is consistent with the altered response to the ovulatory stimulus and reduced ovulation resulting in persistent antral follicles.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Is the mechanism by which the small molecules block N-cad's adhesive activity known? And is the stable residence of cadherins in the plasma membrane known to depend on their engagement with other cadherins either in cis or in trans?

      Adhesion interactions between N-cadherin molecules in cis or trans result in their clustering and enrichment at the membrane. Molecular docking models of the small molecule N-cadherin inhibitors are not available. However, these inhibitors were designed as peptidomimetics of the N-cadherin amino-terminus, which has been shown to interact in trans with N-cadherin on neighbouring cells (Blaschuk OW. Front Cell Dev Biol. 2022 Mar 3;10:866200).

      Since the inhibitors are blocking cadherin activity, one might have expected the cumulus cell mass to disaggregate into individual cells. Yet, Figures 3a and 3c show that this does not happen. Could the authors speculate how the cells are being held together?

    1. Author Response

      We appreciate the insightful feedback provided by the editors and reviewers, who have recognized the novelty of our study. We have mapped the spatial distribution of six endogenous somatic histone H1 variants within the nuclei of several human cell lines using specific antibodies, and the results strongly suggest functional differences between variants. We will submit a revised version of the manuscript to accommodate the reviewers' comments.

      To answer the reviewers' comments at this stage:

      1. We have investigated co-localization of H1 variants with HP1 proteins and we are eager to add some of these data to a revised version of this manuscript.

      2. With respect to the functional significance of the results presented here, we want to stress that, as a consequence of the differential distribution and abundance of H1 variants among cell types, depletion of different variants has different consequences. For example, depletion of H1.2, but not of other variants, has a great impact on chromatin compaction. In addition, cell lines lacking H1.3/H1.5 expression show basal up-regulation of some interferon-stimulated genes (ISGs) and particular repetitive elements, as previously described upon induced depletion of H1.2/H1.4 in a breast cancer cell line or in pancreatic adenocarcinomas with lower levels of replication-dependent H1 variants (Izquierdo et al. 2017 NAR 45:11622). Our results therefore reinforce the existing link between H1 content and immune signature. We are eager to add these data to a revised version of this manuscript. Moreover, we also analyzed the chromatin structural changes upon combined depletion of H1.2 and H1.4. Combined H1.2/H1.4 depletion triggers a global chromatin decompaction, which supports previous observations from ATAC-Seq and Hi-C experiments in these cells (Izquierdo et al. 2017 NAR 45:11622; Serna-Pujol et al. 2022 NAR 50:3892). Although H1 content is more compromised in these cells (30% total H1 reduction) compared to single H1 KDs, the phenotype observed could not be recapitulated when other H1 KD combinations, in which total H1 content was reduced similarly, were investigated (Izquierdo et al. 2017 NAR 45:11622), supporting that the deleterious defects were due to the non-redundant role of H1.2 and H1.4 proteins. Indeed, this manuscript supports this notion, as H1.2 and H1.4 show a different genome-wide and nuclear distribution.

      3. We fully agree with the reviewers that the use of commercially available antibodies does not guarantee their quality and specificity. As this issue was crucial for our studies, we extensively assayed the performance and specificity of the antibodies using different approaches. The validations were shown in our previous publications, where these antibodies were successfully used for ChIP-seq (Serna-Pujol et al. 2022 NAR 50:3892; Salinas-Pena et al., under revision). In summary, performance of the H1.0 (05-629l, Millipore), H1.2 (ab4086, abcam), H1.4 (702876, Invitrogen), H1.5 (711912, Invitrogen) and H1X (ab31972, abcam) antibodies was tested by Western blot, ChIP and proteomic analyses (all the results are included in Supplementary Figure 1 in Serna-Pujol et al. 2022 NAR 50:3892). Specifically, we tested specificity using inducible KDs for the depletion of each of the somatic H1 variants in T47D. We also checked that the antibodies did not recognize additional H1 variants using recombinant proteins or cell lines naturally lacking some of the variants. All the experiments confirmed that the antibodies were variant-specific. In addition, when the corresponding epitope was absent, the antibodies did not gain new cross-reactivity with other variants. More recently, validation of the specificity of the H1.3 antibody (ab203948) was performed following the same experimental approaches described for the rest of the antibodies (Salinas-Pena et al., under revision).

      4. Our immunofluorescence data, together with ChIP-seq data, do not rule out binding of H1 variants to a wide variety of chromatin types, but show enrichment or preferential binding to certain regions or chromatin types. Our data on interphase nuclei do not suggest any type of quenching or saturation. Detection with antibodies depends on epitope accessibility, as with all immunofluorescence data, and we have acknowledged that post-translational modifications of H1 may occlude antibody accessibility, as some phospho-H1 antibodies give distribution patterns different from those of total/unmodified H1 antibodies. Thus, we cannot exclude that specific modified H1s exhibit particular distribution patterns that are not being recapitulated in our data. This represents another layer of complexity in H1 diversity, and we agree that the repertoire of H1 PTMs and their functional roles is an interesting matter of study that needs to be addressed. Still, our data are highly relevant, as they demonstrate for the first time the unique distribution patterns of H1 variants among multiple cell lines without using overexpression of tagged H1 variants, which in our experience produces mislocalization of H1s.

      5. We will further explain how the relative quantification of H1 variants in different cell lines was performed, in case this is not clear enough. We agree that more sophisticated mass spectrometry-based quantification is desirable, and we are collaborating to do this using internal H1 peptide controls, but this is outside the scope of this manuscript, as the observed patterns of distribution of H1 variants do not depend on mild differences in variant abundance. Only the absence of H1.3 and H1.5 in some cell lines alters the distribution of other variants.

      6. We have also studied the spatial distribution of H1 variants in non-tumorigenic cell lines and we are eager to add this to a revised version of the manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Public review:

      In this study, Porter et al. report on outcomes from a small, open-label, pilot randomized clinical trial comparing dornase-alfa to the best available care in patients hospitalized with COVID-19 pneumonia. As the number of randomized participants is small, investigators describe also a contemporary cohort of controls and the study concludes about a decrease of inflammation (reflected by CRP levels) after 7 days of treatment but no other statistically significant clinical benefit.

      Suggestions to the authors:

      • The RCT does not follow CONSORT statement and reporting guidelines

      We thank you for this suggestion and have now amended the order and content of the manuscript to follow the CONSORT statement as closely as possible.

      • The authors have chosen a primary outcome that cannot be at least considered as clinically relevant or interesting. After 3 years of the pandemic with so much research, why investigate if a drug reduces CRP levels as we already have marketed drugs that provide beneficial clinical outcomes such as dexamethasone, anakinra, tocilizumab and baricitinib?

      We thank the reviewer for bringing up this central topic. The answer to this question has both a historical and practical component. This trial was initiated in June of 2020 and was completed in June of 2021. At that time there were no known treatments for the severe immune pathology of COVID-19 pneumonia. In June 2020, dexamethasone data came out and we incorporated dexamethasone into the study design. It took much longer for all other anti-inflammatories to be tested. Hence, our decision to trial an approved endonuclease was based purely on basic science work on the pathogenic role of cell-free chromatin and NETs in murine sepsis and flu models and the ability of DNase I to clear them and reduce pathology in these animal models. In addition, evidence for the presence of cell-free chromatin components in COVID-19 patient plasma had already been communicated in a pre-print. Finally, several studies had reported the anti-inflammatory effects of dornase treatment in CF patients. Hence there was a strong case for a cheap, safe, pulmonary noninvasive treatment that could be self-administered outside the clinical setting.

      The identification of novel/repurposed treatments effective for COVID-19 was hampered by patient recruitment to competing studies during a pandemic. This resulted in small studies with inconclusive or contrary findings. In general, effective treatments were only picked up in very large RCTs. For example, demonstrating dexamethasone as effective in COVID-19 required recruitment of 6,425 patients into the RECOVERY study. Multiple trials of anti-IL-6 therapy gave conflicting evidence until RECOVERY recruited 4,116 adults with COVID-19 (tocilizumab, n=2,022; control, n=2,094); the same was true for baricitinib (4,148 randomised to treatment and 4,008 to standard care). Anakinra is approved for patients with elevated suPAR, based on data from one randomized clinical trial of 594 patients, of whom 405 had active treatment (PMID: 34625750). However, a systematic review analysing over 1,627 patients (anakinra 888, control 739) with COVID-19 showed no benefit (PMID: 36841793). Regarding the choice of the primary endpoint, there is a wealth of clinical evidence to support the relevance of CRP as a prognostic marker for COVID-19 pneumonia patients and it is a standard diagnostic and prognostic clinical parameter in infectious disease wards. This choice in March 2020 was supported by evidence of the prognostic value of IL-6; CRP is a surrogate of IL-6. We also provide our own data from a large study of severe COVID-19 pneumonia in figure 1, showing how well CRP correlates with survival.

      In summary, our data suggest that dornase yields an anti-inflammatory effect that is comparable or potentially superior to cytokine-blocking monotherapies at a fraction of the cost and potentially without additional adverse effects such as an increased risk of co-infections.

      We now provide additional justification on these points in the introduction on pg.4 as follows:

      “The trial was initiated in June 2020 and was completed in September of 2021. At the start of the trial only dexamethasone had been proven to benefit hospitalized COVID-19 pneumonia patients and was thus included in both arms of the trial. To increase the chance of reaching significance under challenging constraints in patient access, we opted to increase our sample size by using a combination of randomized individuals and available CRP data from matched contemporary controls (CC) hospitalized at UCL but not recruited to a trial. These approaches demonstrated that when combined with dexamethasone, nebulized DNase treatment was an effective anti-inflammatory treatment in randomized individuals with or without the implementation of CC data.”

      We also added the following explanation in the discussion on pg. 16:

      “Our study design offered a solution to the early screening of compounds for inclusion in larger platform trials. The study took advantage of frequent repeated measures of quantifiable CRP in each patient, to allow a smaller sample size to determine efficacy/futility than if powered on clinical outcomes. We applied a CRP-based approach that was similar to the CATALYST and ATTRACT studies. CATALYST showed in much smaller groups (usual care, 54; namilumab, 57; and infliximab, 35) that namilumab, an antibody that blocks the cytokine GM-CSF, reduced CRP even in participants treated with dexamethasone, whereas infliximab, which targets TNF-α, had no significant effect on CRP. This led to a suggestion that namilumab should be considered as an agent to be prioritised for further investigation in the RECOVERY trial. A direct comparison of our results with CATALYST is difficult due to the different nature of the modelling employed in the two studies. However, in general dornase alfa exhibited comparable significance in the reduction in CRP compared to standard of care as described for namilumab, at a fraction of the cost. Furthermore, endonuclease therapies may prove superior to cytokine-blocking monotherapies, as they are unlikely to increase the risk for microbial co-infections that have been reported for antibody therapies that neutralize cytokines that are critical for immune defence such as IL-1β, IL-6 or GM-CSF.”

      • Please provide in Methods the timeframe for the investigation of the primary endpoint

      This information is provided in the analysis on pg. 8:

      “The primary outcome was the least square (LS) mean CRP up to 7 days or at hospital discharge whichever was sooner.”

      • Why was day 35 chosen for the read-out of the endpoint?

      We now state on pg. 8 that “Day 35 was chosen as being likely to include most early mortality due to COVID-19, being 4 weeks after completion of a week of treatment (i.e. day 7 of treatment + 28 days (4 x 7 days)).”

      • The authors performed an RCT but in parallel chose to compare also controls. They should explain their rationale as this is not usual. I am not very enthusiastic to see mixed results like Figures 2c and 2d.

      We initially aimed at a fully randomized trial. However, the swift implementation of trial prioritization strategies towards large and pre-established trial platforms in the UK made the recruitment of COVID-19 patients to small studies extremely challenging. Thus, we struggled to gain access to patients. Our power calculations suggested that a mixed trial with randomized and contemporary controls was the best way forward that could provide sufficient power under these restrictions in patient access.

      That being said, we also provide the primary endpoint (CRP) results in Fig. 3B as well as the results for the length of hospitalization (Fig. S3D) for the randomized subjects only.

      • Analysis is performed in mITT; this is a major limitation. The authors should provide at least ITT results. And they should describe in the main manuscript why they chose mITT analysis.

      We apologize if this point was confusing. We performed the analysis on the ITT as defined in our SAP: “The primary analysis population will be all evaluable patients randomised to BAC + dornase alfa or BAC only who have at least one post-baseline CRP measurement, as well as matched historical comparators.”

      We understand that this might be mistaken for an mITT because the N in the ITT (39) doesn’t match the number randomised and because we had stated on pg. 8 that “Efficacy assessments of primary and secondary outcomes in the modified intention-to-treat population were performed.”

      However, we did randomise 41 participants, but:

      One participant in the DA arm never received treatment. The individual withdrew consent and was replaced. We also have no CRP data for this participant in the database, so they were unevaluable, and we couldn’t include them in the baseline table even if we wanted to. In addition, one participant in the BAC arm had only a baseline CRP measurement available and was therefore not evaluable.

      We have corrected the confusing statement on pg. 8 and added an additional explanation.

      “Efficacy assessments of primary and secondary outcomes in the intention-to-treat (ITT) population were performed on all randomised participants who had received at least one dose of dornase alfa if randomized to treatment. For full details see Statistical Analysis Plan. The ITT was adjusted to mitigate the following protocol violations, where one participant in the BAC arm and one in the DA arm withdrew before they received treatment and had only a baseline CRP measurement available. The participant in the DA arm was replaced with an additional recruited patient. Exploratory endpoints were only available in randomised participants and not in the CC. In this case, a post hoc within group analysis was conducted to compare baseline and post-baseline measurements.”
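      The evaluability rule described above (at least one post-baseline CRP measurement) can be sketched as a simple filter. This is a minimal illustration only, not the trial's analysis code: the `Participant` record, the participant IDs, the arm assignments and the measurement counts are all hypothetical placeholders, and only the totals (41 randomised, 39 evaluable) come from the response.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    pid: str                  # hypothetical identifier
    arm: str                  # "DA" or "BAC" (placeholder assignment)
    post_baseline_crp: int    # number of post-baseline CRP measurements


def evaluable(participants):
    """Keep participants with at least one post-baseline CRP measurement."""
    return [p for p in participants if p.post_baseline_crp >= 1]


# Hypothetical cohort: 39 participants with follow-up CRP data ...
cohort = [Participant(f"P{i:02d}", "DA" if i % 2 else "BAC", 3) for i in range(39)]
# ... plus two without any post-baseline CRP measurement (one per arm).
cohort += [Participant("P39", "DA", 0), Participant("P40", "BAC", 0)]

print(len(cohort), "randomised;", len(evaluable(cohort)), "evaluable")
```

      With these placeholder records, 41 participants are randomised and 39 pass the filter, which matches the N reported for the primary analysis population.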

      • It is also not usual to exclude patients from analysis because investigators just do not have serial measurements. This is lost to follow up and investigators should have pre-decided what to do with lost-to-follow-up.

      Our protocol pre-specified that the primary analysis population should have at least one postbaseline CRP measurement (pg. 13 of protocol). The patient that was excluded was one that initially joined the trial but withdrew consent after the first treatment but before the first post-treatment blood sample could be drawn. Hence, the pre-treatment CRP of this patient alone provided no useful information.

      • In Table 1 I would like to see all randomized patients (n=39), which is missing. There are also baseline characteristics that are missing, like which other treatments as BAT received by those patients except for dexamethasone.

      Table 1 includes all 39 patients plus 60 CCs. Table 2 shows additional treatments given for COVID-19 as part of BAC.

      • In the first paragraph of clinical outcomes, the authors refer to a cohort that is not previously introduced in the manuscript. This is confusing. And I do not understand why this analysis is performed in the context of this RCT although I understand its pilot nature.

      One of the main criticisms we have encountered in this study has been the choice of the primary endpoint. The best way to respond to these questions was to provide data to support the prognostic relevance of CRP in COVID-19 pneumonia from a separate independent study where no other treatments such as dexamethasone, anakinra or anti-IL6 therapies were administered. We think this is a very useful analysis that provides essential context for the trial and the choice of the primary endpoint, indicating that CRP has good enough resolution to predict clinical outcomes.

      • Propensity-score selected contemporary controls may introduce bias in favor of the primary study analysis, since controls are already adjusted for age, sex and comorbidities.

      The contemporary controls were selected to best match the characteristics of the randomized patients including that the first CRP measurement upon admission surpassed the trial threshold, so we do not see how this selection process introduces biases, as it was blinded with regards to the course of the CRP measurements. Given that this was a small trial, matching for baseline characteristics is necessary to minimize confounding effects.

      • The authors do not clearly present numerically survivors and non-survivors at day 34, even though this is one of the main secondary outcomes.

      We now provide the mortality numbers in the following paragraph on pg. 13.

      “Over 35 days follow up, 1 person in the BAC + dornase-alfa group died, compared to 8 in the BAC group. The hazard ratio observed in the Cox proportional hazards model (95% CI) was 0.47 (0.06, 3.86), which estimates that throughout 35 days follow-up, there was a 53% reduced chance of death at any given timepoint in the BAC + dornase-alfa group compared to the BAC group, though the confidence intervals are wide due to a small number of events. The p-value from a log-rank test was 0.460, which does not reach statistical significance at an alpha of 0.05.”
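      The arithmetic behind the quoted interpretation can be checked with a short sketch. It only restates how a hazard ratio maps to a percentage reduction in hazard and whether the reported confidence interval contains the no-effect value of 1; the function names are illustrative and nothing here re-fits the Cox model.

```python
def percent_hazard_reduction(hazard_ratio: float) -> float:
    """Percentage reduction in hazard implied by a hazard ratio below 1."""
    return (1.0 - hazard_ratio) * 100.0


def ci_includes_no_effect(lower: float, upper: float, null_value: float = 1.0) -> bool:
    """True when the confidence interval contains the no-effect hazard ratio."""
    return lower <= null_value <= upper


# Values quoted in the response: HR 0.47, 95% CI (0.06, 3.86).
hr, ci_low, ci_high = 0.47, 0.06, 3.86

print(f"{percent_hazard_reduction(hr):.0f}% lower hazard")
print("CI includes HR = 1:", ci_includes_no_effect(ci_low, ci_high))
```

      The interval spans 1, which is consistent with the reported log-rank p-value not reaching significance.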

      • It is unclear why another cohort (Berlin) was used to associate CRP with mortality. CRP association with mortality should (also) be performed within the current study.

      As we explained above, the Berlin cohort CRP data serve to substantiate the relevance of CRP as a primary endpoint in a cohort that experienced sufficient mortality as this cohort did not receive any approved anti-inflammatory therapy. Mortality in our COVASE trial was minimal, since all patients were on dexamethasone and did not reach the highest severity grade, since we opted to treat patients before they deteriorated further. The overall mortality was 8% across all arms of our study, which does not provide enough events for mortality measurements. In contrast the Berlin cohort did not receive dexamethasone and all patients had reached a WHO severity grade 7 category with mortality at 30%.

      My other concerns are:

      • This report is about an RCT and the authors should follow the CONSORT reporting guidelines. Please amend the manuscript and Figure 1b accordingly and provide a CONSORT checklist.

      We now provide a CONSORT checklist and have amended the CONSORT diagram accordingly.

      • Please provide in brief the exclusion criteria in the main manuscript

      We have now included the exclusion criteria in the manuscript on pg. 6.

      “1.1.1 Exclusion criteria

      1. Females who are pregnant, planning pregnancy or breastfeeding

      2. Concurrent and/or recent involvement in other research or use of another experimental investigational medicinal product that is likely to interfere with the study medication within (specify time period e.g. last 3 months) of study enrolment

      3. Serious condition meeting one of the following:

      a. Respiratory distress with respiratory rate >=40 breaths/min

      b. Oxygen saturation <=93% on high-flow oxygen

      4. Require mechanical invasive or non-invasive ventilation at screening

      5. Concurrent severe respiratory disease such as asthma, COPD and/or ILD

      6. Any major disorder that in the opinion of the Investigator would interfere with the evaluation of the results or constitute a health risk for the trial participant

      7. Terminal disease and life expectancy <12 months without COVID-19

      8. Known allergies to dornase alfa and excipients

      9. Participants who are unable to inhale or exhale orally throughout the entire nebulisation period

      So briefly, patients were excluded if they were:

      1. Pregnant, planning pregnancy or breastfeeding

      2. Serious condition meeting one of the following:

      a. Respiratory distress with respiratory rate >=40 breaths/min

      b. Oxygen saturation <=93% on high-flow oxygen

      3. Require ventilation at screening

      4. Concurrent severe respiratory disease such as asthma, COPD and/or ILD

      5. Terminal disease and life expectancy <12 months without COVID-19

      6. Known allergies to dornase alfa and excipients

      7. Participants who are unable to inhale or exhale orally throughout the entire nebulisation period”

      • "The final trial visit occurred at day 35." "Analysis included mortality at day 35". I am not sure I understand why. In clinicaltrials.gov all endpoints are meant to be studies at day 7 except for mortality rate day 28. Why day 35 was chosen? Please be consistent.

      Thank you for identifying this inconsistency. We have amended the record on clinicaltrials.gov to read “the time to event data was censored at 28 days post last dose (up to d35) for the randomised participants and at the date of the last electronic record for the CC.”

      • Please provide in Methods the timeframe for the investigation of the primary endpoint

      • The authors performed an RCT but in parallel chose to compare also controls. They should explain their rationale as this is not usual. I am not very enthusiastic to see mixed results like Figures 2c and 2d.

      • Analysis is performed in mITT; this is a major limitation. The authors should provide at least ITT results. And they should describe in the main manuscript why they chose mITT analysis.

      • It is also not usual to exclude patients from analysis because investigators just do not have serial measurements. This is lost to follow up and investigators should have pre-decided what to do with lost-to-follow-up.

      • Figure 1b as in CONSORT statement, please provide reasons why screened patients were not enrolled.

      • In Table 1 I would like to see all randomized patients (n=39), which is missing. There are also baseline characteristics that are missing, like which other treatment as BAT received those patients except for dexamethasone.

      • In the first paragraph of clinical outcomes, the authors refer to a cohort that is not previously introduced in the manuscript. This is confusing. And I do not understand why this analysis is performed in the context of this RCT although I understand its pilot nature.

      • In Figure 2 the authors draw results about ITT although in methods describe that they performed an mITT analysis. Please be consistent.

      Please see answers provided to these queries above.

      Reviewer #2 (Recommendations For The Authors):

      1) Suppl Figure 2B would be more informative if presented as a Table with N of patients with per day sampling

      We now provide the primary end point daily sampling table in Table 3.

      2) The numbers at risk should figure under the KM curves

      The numbers at risk for figures 1E, 2C, 2D have been added as graphs either in the main figures or in the supplement.

      3) HD in Supplementary figure 3 should be explained

      We apologize for this omission. We now provide a description for the healthy donor samples that we used in the cell-free DNA measurements in figure S3B on pg. 14:

      “Compared to the plasma of anonymized healthy donor volunteers at the Francis Crick Institute (HD), plasma cf-DNA levels were elevated in both BAC and DA-treated COVASE participants.”

      4) Presentation is inappropriate for Table S4

      We thank the reviewer for pointing out this issue. We have now formatted Table S4 to be consistent with all other tables.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript is a focused investigation of the phospho-regulation of a C. elegans kinesin-2 motor protein, OSM-3. In C. elegans sensory cilia, the kinesin-2 motor proteins Kinesin-II complex and OSM-3 homodimer transport IFT trains anterogradely to the ciliary tip. Kinesin-II carries OSM-3 as an inactive passenger from the ciliary base to the middle segment, where kinesin-II dissociates from IFT trains and OSM-3 gets activated and transports IFT trains to the distal segment. Therefore, activation/inactivation of OSM-3 plays an essential role in its ciliary function.

      Strengths:

      In this study, using mass spectrometry, the authors have shown that the NEKL-3 kinase phosphorylates a serine/threonine patch at the hinge region between coiled coils 1 and 2 of an OSM-3 dimer, referred to as the elbow region in ubiquitous kinesin-1. Phosphomimic mutants of these sites inhibit OSM-3 motility both in vitro and in vivo, suggesting that this phosphorylation is critical for the autoinhibition of the motor. Conversely, phospho-dead mutants of these sites hyperactivate OSM-3 motility in vitro and affect the localization of OSM-3 in C. elegans. The authors also showed that an alanine-to-threonine mutation at one of the phosphorylation sites rescues OSM-3 function in live worms.

      Weaknesses:

      Collectively, this study presents evidence for the physiological role of OSM-3 elbow phosphorylation in its autoregulation, which affects ciliary localization and function of this motor. Overall, the work is well performed, and the results mostly support the conclusions of this manuscript. However, the work will benefit from additional experiments to further support conclusions and rule out alternative explanations, filling some logical gaps with new experimental evidence and in-text clarifications, and improving writing before I can recommend publication.

      We appreciate Reviewer #1’s comments and suggestions. We have now provided additional evidence and discussion to further support our conclusions and fill the logical gaps. We have also provided alternative explanations for our data and improved the writing.

      Reviewer #2 (Public review):

      Summary:

      The regulation of kinesin is fundamental to cellular morphogenesis. Previously, it has been shown that OSM-3, a kinesin required for intraflagellar transport (IFT), is regulated by autoinhibition. However, it remains totally elusive how the autoinhibition of OSM-3 is released. In this study, the authors have shown that NEKL-3 phosphorylates OSM-3 and releases its autoinhibition.

      The authors found NEKL-3 directly phosphorylates OSM-3 (although the method is not described clearly) (Figure 1). The phosphorylated residues lie in the "elbow" of OSM-3. The authors introduced phospho-dead (PD) and phospho-mimic (PM) mutations by genome editing and found that the OSM-3(PD) protein does not form cilia, and instead accumulates at the axonal tips. The phenotype is similar to another constitutively active mutant of OSM-3, OSM-3(G444E) (Imanishi et al., 2006; Xie et al., 2024). osm-3(PM) has shorter cilia, which resembles loss-of-function mutants of osm-3 (Figure 3). The authors performed structural prediction and showed that G444E and PD mutations change the conformation of OSM-3 protein (Figure 3). In the single-molecule assays, G444E and PD mutations exhibited increased landing rate (Figure 4). By unbiased genetic screening, the authors identified a suppressor mutant of osm-3(PD), in which A489T occurs. The result confirms the importance of this residue. Based on these results, the authors suggest that NEKL-3 induces phosphorylation of the elbow domain and inactivates OSM-3 motor when the motor is synthesized in the cell body. This regulation is essential for proper cilia formation.

      Strengths:

      The finding is interesting and gives new insight into how the IFT motor is regulated.

      Weaknesses:

      The methods section has not presented sufficient information to reproduce this study.

      We appreciate that Reviewer #2 is also positive about our study. We have now provided sufficient information in the revised Methods section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major Concerns

      (1) Why do the authors think that NEKL-3 phosphorylates OSM-3 in the first place? This seems to come out of nowhere and prior evidence indicating that NEKL-3 may be phosphorylating OSM-3 is not even mentioned in the Introduction.

      We thank the Reviewer for raising this important point. Our hypothesis that NEKL-3 phosphorylates OSM-3 stems from prior findings in our lab. In a previous study (Yi et al., Traffic, 2018, PMID: 29655266), we identified NEKL-4, a member of the NIMA kinase family, as a suppressor of the OSM-3(G444E) hyperactive mutation. This discovery prompted us to explore the broader role of NIMA kinases in regulating OSM-3. Subsequent genetic screens (Xie et al., EMBO J, 2024, PMID: 38806659) revealed that both NEKL-3 and NEKL-4 suppress multiple OSM-3 mutations, further supporting their functional interaction. Given the established role of NIMA kinases in phosphorylation-dependent processes (Fry et al., JCS, 2012, PMID: 23132929; Chivukula et al., Nat. Med., 2020, PMID: 31959991; Thiel, C. et al. Am. J. Hum. Genet. 2011, PMID: 21211617; Smith, L. A. et al., J. Am. Soc. Nephrol., 2006, PMID: 16928806), we hypothesized that NEKL-3/4 may directly phosphorylate OSM-3 to modulate its activity.

      To test this hypothesis, we expressed recombinant C. elegans NEKL-3 and OSM-3 proteins and conducted in vitro phosphorylation assays. While we were unable to obtain active recombinant NEKL-4 (limitations noted in the revised text), our experiments with NEKL-3 revealed phosphorylation at residues 487-490 (YSTT motif) in OSM-3’s tail region, as confirmed by mass spectrometry. These findings are now explicitly contextualized in the Introduction and Results sections of the revised manuscript.

      Page #4, Line #11:

      “...In our previous study (Yi et al., Traffic, 2018, PMID: 29655266), a genetic screen targeting the OSM-3(G444E) hyperactive mutation identified NEKL-4, a member of the NIMA kinase family, as a suppressor of this phenotype. This finding, combined with reports that NIMA kinases regulate ciliary processes independently of their canonical mitotic roles (Fry et al., JCS, 2012, PMID: 23132929; Chivukula et al., Nat. Med., 2020, PMID: 31959991; Thiel, C. et al. Am. J. Hum. Genet. 2011, PMID: 21211617; Smith, L. A. et al., J. Am. Soc. Nephrol., 2006, PMID: 16928806), prompted us to investigate whether NIMA kinases modulate OSM-3-driven intraflagellar transport. We hypothesized that NEKL-3/4, as paralogs within this family, might directly phosphorylate OSM-3 to regulate its motility...”

      Page #4, line #26:  

      “... To determine whether NIMA kinase family members could directly phosphorylate OSM-3, we purified prokaryotic recombinant C. elegans NEKL-3/NEKL-4 and OSM-3 protein in order to perform in vitro phosphorylation assays. We were able to obtain active recombinant NEKL-3 but not NEKL-4. The in vitro phosphorylation assays showed that NEKL-3 directly phosphorylates OSM-3 (Fig. 1A-B, Appendix Table S1). Subsequent mass spectrometric analysis revealed phosphorylation at residues 487-490, which localize to the conserved "YSTT" motif within OSM-3’s C-terminal tail region ...”

      (2) The authors need to characterize the proteins they expressed and purified for in vitro ATPase and motility assays. Are these proteins monomers or dimers?

      For our in vitro ATPase and motility assays, OSM-3 was expressed in E. coli BL21(DE3) and purified using established protocols (Xie et al., EMBO J, 2024, PMID: 38806659; Imanishi et al., JCB, 2006, PMID: 17000874). To confirm its oligomeric state, we analyzed recombinant OSM-3 by size-exclusion chromatography coupled with multiangle light scattering (SEC-MALS). As reported in Xie et al. (2024), OSM-3 (~80 kDa monomer) elutes with a molecular weight of 173–193 kDa under physiological buffer conditions, consistent with a homodimeric assembly. These findings confirm that the functional unit used in our assays is the biologically relevant dimer. This characterization has been added to the revised manuscript on Page #35, Line #7.

      “…OSM-3 was expressed in E. coli BL21(DE3) and purified for in vitro assays using established protocols (REFs). Size-exclusion chromatography coupled with multiangle light scattering (SEC-MALS) (Xie et al., EMBO J., 2024) confirmed that recombinant OSM-3 forms a homodimer (173–193 kDa) under physiological conditions, ensuring its dimeric state remained intact....” 
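The dimer assignment above rests on simple arithmetic: the SEC-MALS mass divided by the monomer mass should land near an integer. A minimal sketch of that check, using only the two figures quoted in the response (the ~80 kDa monomer mass is approximate, and the function name is illustrative):

```python
# Estimate subunits per particle from a SEC-MALS mass range.
MONOMER_KDA = 80.0               # approximate OSM-3 monomer mass quoted in the response
MALS_RANGE_KDA = (173.0, 193.0)  # SEC-MALS mass range quoted in the response


def subunit_estimate(measured_kda: float, monomer_kda: float) -> float:
    """Return the measured particle mass divided by the monomer mass."""
    return measured_kda / monomer_kda


low, high = (subunit_estimate(m, MONOMER_KDA) for m in MALS_RANGE_KDA)
nearest = {round(low), round(high)}  # nearest whole number of subunits at each end

print(f"subunits per particle: {low:.2f} to {high:.2f}; nearest integer(s): {sorted(nearest)}")
```

Both ends of the range round to two subunits, which is what "consistent with a homodimeric assembly" means here; the ratio sits somewhat above 2.0 because the monomer mass is only approximate.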

      (3) The authors primarily used PD and PM mutations, which affect all four amino acids in the region. This may or may not be physiologically relevant. Figure 5 indicates that T489 is a critical regulatory site. However, this conclusion is undermined by reliance on PD mutations, which affect all four amino acids. Creating PM (T489E) and PD (T489A) mutations based on WT OSM-3 would better reflect physiological relevance. In vitro assays with a single phosphomimic or phospho-dead mutation at residue 489 are missing at the end of this story. This would better link Figure 5 with the rest of the manuscript.

      We thank the reviewer for this constructive critique. Below, we address the concerns and integrate new data to strengthen the link between T489 and autoinhibition:

      To probe the regulatory role of T489 phosphorylation, we generated osm-3(T489E) (phosphomimetic, PM) and osm-3(T489A) (phospho-dead, PD) mutant animals. Strikingly, both mutants formed axonal puncta (Figure S7), recapitulating the hyperactive phenotype of the OSM-3(G444E) mutant. While the similar puncta formation in PM and PD mutants initially appeared paradoxical, this observation underscores the necessity of dynamic phosphorylation cycling at T489 for proper autoinhibition. Specifically, the PD mutant (T489A) likely disrupts phosphorylation-dependent stabilization of autoinhibition, leading to constitutive activation, whereas the PM mutant (T489E) may mimic a "locked" phosphorylated state, preventing dephosphorylation-dependent release of autoinhibition in cilia and trapping OSM-3 in an aggregation-prone conformation. These results highlight T489 as a structural linchpin whose post-translational modification dynamically regulates motor activity. While the precise molecular mechanism (such as how phosphorylation modulates tail-motor domain interactions) remains to be elucidated, our data conclusively demonstrate that perturbing T489 (even in isolation) destabilizes autoinhibition, driving puncta formation and constitutive activity.

      We have integrated the above paragraph in the revised manuscript on page #8, line #27.

      (4) There seems to be a disconnect between the MT gliding assays in Figure 4C and single molecule motility assays in Figure 4E. The gliding assays show that all constructs can glide microtubules at near WT speeds. Yet, the motility assays show that WT and PM cannot land or walk on MTs. The authors need to explain why this is the case. Is this because surface immobilization of kinesin from its tail disrupts autoinhibition? Alternatively, the protein preparation may include monomers that cannot be autoinhibited and cannot land and processively walk on surface-immobilized microtubules (because they only have one motor domain) but can glide microtubules when immobilized on the surface from their tail.

      The surface immobilization of OSM-3 via its tail domain disrupts autoinhibition, a phenomenon previously observed in other kinesins such as kinesin-1 (Nitzsche et al, Methods Cell Biol., 2010, PMID: 20466139). In our assays, OSM-3 was nonspecifically immobilized on glass surfaces, enabling microtubule gliding by motors whose autoinhibition was relieved through tail anchoring. Critically, the PD and PM mutations reside in the tail region and do not alter the intrinsic properties of the motor head domain. Consequently, once autoinhibition is released via immobilization, the gliding velocities reflect the conserved motor head activity, which is expected to remain comparable across all constructs. While we cannot entirely rule out the presence of monomeric OSM-3 in solution, several lines of evidence argue against this possibility. First, the mutations are located in the elbow region, which is dispensable for motor dimerization. Second, SEC-MALS analysis from prior studies confirms that purified OSM-3 exists predominantly as dimers in solution. 

      We have discussed these issues in the revised text on page #10, line #18: 

      “…In our gliding assays, OSM-3PM has an increased gliding speed of 0.69 ± 0.07 μm/s (Fig. 4 C-D), similar to the PD mutant. PD and PM mutations are confined to the elbow region, leaving the motor head’s mechanochemical properties intact. Upon tail immobilization, which releases autoinhibition, the gliding speeds reflect motor head activity. Single-molecule assays, however, directly resolve their native regulatory states: PD mutants are constitutively active, whereas PM mutants persist in an autoinhibited state (Fig. 4E-G). Although monomeric OSM-3 could theoretically mediate single-motor gliding, the previous SEC-MALS data demonstrate that OSM-3 purifies as stable dimers (Xie et al., EMBO J, 2024, PMID: 38806659). Thus, dimeric OSM-3 is likely the predominant functional species in our assays…”

      (5) An alternative explanation for the data is that both PD and PM mutations result in loss-of-function effects, disrupting OSM-3 activity. For instance:

      a) In Figure 2C, both mutations cause shorter cilia than the wild type (WT).

      b) In Figure 4A, both mutations result in higher ATPase activity than WT.

      c) In Figure 4D, both mutations show increased gliding velocity compared to WT. These results suggest the observed effects could stem from loss of function rather than phosphorylation-specific regulation.

      Although PD and PM mutations exhibit superficially similar "loss-of-function" phenotypes in certain assays, they mechanistically disrupt motor regulation in distinct ways:

      a) Ciliary Length (Figure 2C)

      PD Mutants: Hyperactivation causes OSM-3-PD to prematurely aggregate into axonal puncta, preventing ciliary entry. Consequently, cilia are built solely by the weaker Kinesin-II motor, which only constructs shorter middle segments.

      PM Mutants: OSM-3-PM retains autoinhibition during transport (enabling ciliary entry) but cannot be dephosphorylated in cilia. This blocks activation, leaving OSM-3-PM partially functional and resulting in cilia intermediate in length between WT and PD.

      We have discussed this issue in the revised text on page #5, line #30:

      “…These findings indicate that OSM-3-PM is in an autoinhibited state capable of ciliary delivery, yet fails to achieve full activation due to defective dephosphorylation. This incomplete activation results in suboptimal motor function and intermediate ciliary length phenotypes (Fig. 2B-C). In contrast, OSM-3-PD exhibits constitutive activation leading to aggregation into axonal puncta, which completely abolishes its ciliary entry capacity (Fig. 2A-B)...”

      b) ATPase Activity (Figure 4A)

      PD Mutants: Fully autoinhibition-released (98.15% of KHC ATPase activity), consistent with constitutive activation.

      PM Mutants: Show partial ATPase activity (34.28% of KHC), reflecting imperfect phosphomimicry. While the DDEE substitution introduces negative charges, it fails to fully replicate the steric/kinetic effects of phosphorylated tyrosine (Y487; phenyl ring absent), resulting in incomplete autoinhibition stabilization. Despite this, the residual inhibition is sufficient to phenocopy shorter cilia in vivo.

      We have discussed this issue in the revised text on page #7, line#19:

      “…The PM mutant’s partial ATPase activity (34.28% of KHC) might arise from imperfect phosphomimicry: while the DDEE substitution introduces negative charges, it lacks the steric bulk of phosphorylated tyrosine (pY487). This incomplete mimicry allows residual autoinhibition, sufficient to limit ciliary construction in vivo...”

      c) Microtubule Gliding Velocity (Figure 4D)

      Gliding Assay Limitation: Tail immobilization artificially releases autoinhibition, masking regulatory differences. Thus, all constructs (PD, PM) exhibit similar velocities (~0.7 µm/s), reflecting conserved motor head activity.

      Single-Molecule Assay (Figure 4E): Directly resolves native autoinhibition states:

      PD mutants show robust motility (autoinhibition released).

      PM mutants remain largely inactive (autoinhibition retained).

      We have discussed this issue in the revised text on page #10, line#18:

      “…In our gliding assays, OSM-3PM has an increased gliding speed of 0.69 ± 0.07 μm/s (Fig. 4 C-D), similar to the PD mutant. PD and PM mutations are confined to the elbow region, leaving the motor head’s mechanochemical properties intact. Upon tail immobilization, which releases autoinhibition, the gliding speeds reflect motor head activity. Single-molecule assays, however, directly resolve their native regulatory states: PD mutants are constitutively active, whereas PM mutants persist in an autoinhibited state (Fig. 4E-G)...”

      Minor Suggestions and Concerns

      (1) Lines 60-66: References that support these observations are missing from this section.

      We have added the relevant references.

      (2) Lines 66-67: I would revise this sentence as "It remains unclear how OSM-3 becomes enriched...".

      We have made the changes.

      (3) Line 85: The authors should describe how they perform these assays (i.e. recombinantly expressed NEKL-3 and OSM-3, are these C. elegans proteins, and which expression system was used...).

      We have described them in the main text and the Methods section.

      Page #4 line #26

      “...To determine whether NIMA kinase family members could directly phosphorylate OSM-3, we purified prokaryotic recombinant C. elegans NEKL-3/NEKL-4 and OSM-3 protein in order to perform in vitro phosphorylation assays...”

      Page #35 line#12

      “...Basically, point mutations were introduced into the pET.M.3C OSM-3-eGFP-His6 plasmid for prokaryotic expression. Plasmid-transformed E. coli (BL21) was cultured at 37°C and induced overnight at 23°C with 0.2 mM IPTG. Cells were lysed in lysis buffer (50 mM NaPO4 pH 8.0, 250 mM NaCl, 20 mM imidazole, 10 mM bME, 0.5 mM ATP, 1 mM MgCl2, Complete Protease Inhibitor Cocktail (Roche)) and Ni-NTA beads were applied for affinity purification. After incubation, beads were washed with wash buffer (50 mM NaPO4 pH 6.0, 250 mM NaCl, 10 mM bME, 0.1 mM ATP, 1 mM MgCl2) and eluted with elution buffer (50 mM NaPO4 pH 7.2, 250 mM NaCl, 500 mM imidazole, 10 mM bME, 0.1 mM ATP, 1 mM MgCl2). Protein concentration was determined by standard Bradford assay. C. elegans nekl-3 cDNA was cloned into the pGEX-6P GST vector, expressed in E. coli BL21 (DE3) and purified for in vitro phosphorylation assays. Plasmid-transformed E. coli (BL21) was cultured at 37°C and induced overnight at 18°C with 0.5 mM IPTG. Cells were lysed in lysis buffer (50 mM NaPO4 pH 8.0, 250 mM NaCl, 1 mM DTT, Complete Protease Inhibitor Cocktail (Roche)) and GST beads were applied for affinity purification. After incubation, beads were washed with wash buffer (50 mM NaPO4 pH 6.0, 250 mM NaCl, 1 mM DTT) and eluted with elution buffer (50 mM NaPO4 pH 7.2, 150 mM NaCl, 10 mM GSH, 1 mM DTT). Purified proteins were dialyzed against storage buffer (50 mM Tris-HCl, pH 8.0, 150 mM NaCl). Protein concentration was determined by standard Bradford assay...”

      (4) Line 141: The first sentence of this paragraph lacks motivation. I would start this sentence with "To directly observe the effects of phosphor mutants in the elbow region in microtubule binding and motility of OSM-3, we...".

      We have made the change.

      (5) Figure 1B: The mass spectrometry data in Figure 1B lacks adequate explanation. The Methods section should detail the experimental protocol, data interpretation, and any databases used. Additionally, the manuscript should list all identified phosphorylation sites on OSM-3 to provide context, including whether Y487_T490 is the major site.

      We have provided the detailed experimental protocol, data interpretation, and databases used in the Methods section. All identified sites are listed in Appendix Table S1.

      (6) Figure 1C: Is it possible to model the effect of PM and PD mutations using AlphaFold? The authors should also show PAE or pLDDT scores of their model.

      AlphaFold does not model the effects of point mutations well, so we used Rosetta relax to capture their possible conformational changes, as shown in the revised Figure 3. We have provided PAE and pLDDT as a new figure, Figure S2.

      (7) Figure 2D: The unit for speed should use a lowercase "s" for seconds.

      We have fixed it.

      (8) Figure 3: I am not sure whether this figure stands for a main text figure on its own, as it is only a Rosetta prediction and is not supported by any experimental data. In addition, it remains unclear what the labels on the x-axis mean.

      We have updated the figure and explained the labels on the x-axis in Figure S4 to make it more reader-friendly.

      (9) Figure 4: NEKL-3-treated OSM-1 should be included as a positive control in the in vitro experiments.

      We assume that the Reviewer meant NEKL-3-treated OSM-3.

      In our other study, which has just been accepted by the Journal of Cell Biology, NEKL-3-treated OSM-3 showed significantly reduced affinity for microtubules and very low ATPase activity. We have cited and discussed this in the revised text on page #10, line #28:

      “…As demonstrated in our recent study (Huang et al., JCB, 2025, In press, attached), phosphorylation of OSM-3 by NEKL-3 at two distinct regions, Ser96 and the conserved "elbow" motif, differentially regulates its activity and localization. Phosphorylation at Ser96 reduces OSM-3’s ATPase activity and alters its ciliary distribution from the distal segment to a uniform localization, while elbow phosphorylation induces autoinhibition, retaining OSM-3 in the cell body. Strikingly, in vitro phosphorylation of OSM-3 by NEKL-3 significantly reduces its microtubule-binding affinity, likely arising from combined modifications at both sites. We propose a model wherein elbow phosphorylation ensures anterograde ciliary transport, while Ser96 phosphorylation fine-tunes distal segment targeting. This multistep regulation may involve distinct phosphatases to reverse phosphorylation at specific sites, a hypothesis warranting further investigation…”

      (10) Figure 4C, D, and F: The unit of velocity is wrong. The authors should use the same units they used in the table shown in Figure 4B.

      We have fixed these errors.

      (11) Figure 4F: The velocity of PD is a lot lower than G444E. Therefore, it would be more appropriate to refer to PD as partially active, rather than hyperactive.

      We have made the change. 

      (12) Figure 5: There is too much genetics jargon on this figure (EMF, F2, 100%Dyf,...). How are the alleles numbered? Is it OK to refer to them as Alleles 1 and 2 for simplicity?

      According to the established C. elegans allele nomenclature, each worm allele has a unique number named after the lab code for identification. We have simplified the labels and updated the figure to make it more reader-friendly.

      (13) Figure 5E: A plot would be more reader-friendly than a table. Additionally, the legend for Fig. 5E mistakenly refers to it as "D."

      We have changed the table to a plot and fixed the mistakes. We thank the Reviewer for pointing them out.

      Reviewer #2 (Recommendations for the authors):

      (1) The model appears as if NEKL-3 induces dephosphorylation of OSM-3 (Figure 6). This is not consistent with the conclusions described in the Discussion and is confusing.

      We have updated the model figure and fixed the error.

      (2) It should be described why the authors hypothesized NEKL-3 phosphorylates OSM-3. Was there genetic evidence? Did the authors screen cilia-related kinases? Or did the authors identify it incidentally? Providing this information would help readers to understand the context of the research.

      We appreciate both Reviewers for pointing out this issue. 

      Our hypothesis that NEKL-3 phosphorylates OSM-3 stems from prior findings in our lab. In a previous study (Yi et al., Traffic, 2018, PMID: 29655266), we identified NEKL-4, a member of the NIMA kinase family, as a suppressor of the OSM-3(G444E) hyperactive mutation. This discovery prompted us to explore the broader role of NIMA kinases in regulating OSM-3. Subsequent genetic screens (Xie et al., EMBO J, 2024, PMID: 38806659) revealed that both NEKL-3 and NEKL-4 suppress multiple OSM-3 mutations, further supporting their functional interaction. Given the established role of NIMA kinases in phosphorylation-dependent processes (Fry et al., JCS, 2012, PMID: 23132929; Chivukula et al., Nat. Med., 2020, PMID: 31959991; Thiel, C. et al. Am. J. Hum. Genet. 2011, PMID: 21211617; Smith, L. A. et al., J. Am. Soc. Nephrol., 2006, PMID: 16928806), we hypothesized that NEKL-3/4 may directly phosphorylate OSM-3 to modulate its activity.

      To test this hypothesis, we expressed recombinant C. elegans NEKL-3 and OSM-3 proteins and conducted in vitro phosphorylation assays. While we were unable to obtain active recombinant NEKL-4 (limitations noted in the revised text), our experiments with NEKL-3 revealed phosphorylation at residues 487-490 (YSTT motif) in OSM-3’s tail region, as confirmed by mass spectrometry. These findings are now explicitly contextualized in the Introduction and Results sections of the revised manuscript.

      Page #4, Line #11:

      “... In our previous study (Yi et al., Traffic, 2018, PMID: 29655266), a genetic screen targeting the OSM-3(G444E) hyperactive mutation identified NEKL-4, a member of the NIMA kinase family, as a suppressor of this phenotype. This finding, combined with reports that NIMA kinases regulate ciliary processes independently of their canonical mitotic roles (Fry et al., JCS, 2012, PMID: 23132929; Chivukula et al., Nat. Med., 2020, PMID: 31959991; Thiel, C. et al. Am. J. Hum. Genet. 2011, PMID: 21211617; Smith, L. A. et al., J. Am. Soc. Nephrol., 2006, PMID: 16928806), prompted us to investigate whether NIMA kinases modulate OSM-3-driven intraflagellar transport. We hypothesized that NEKL-3/4, as paralogs within this family, might directly phosphorylate OSM-3 to regulate its motility...”

      Page #4, line #26: 

      “... To determine whether NIMA kinase family members could directly phosphorylate OSM-3, we purified prokaryotic recombinant C. elegans NEKL-3/NEKL-4 and OSM-3 protein in order to perform in vitro phosphorylation assays. We were able to obtain active recombinant NEKL-3 but not NEKL-4. The in vitro phosphorylation assays showed that NEKL-3 directly phosphorylates OSM-3 (Fig. 1A-B, Appendix Table S1). Subsequent mass spectrometric analysis revealed phosphorylation at residues 487-490, which localize to the conserved "YSTT" motif within OSM-3’s C-terminal tail region...”

      (3) It is curious that the authors have not addressed the cilia phenotype and the localization of OSM-3 in the nekl-3 mutant. Regardless of whether these observations agree with the proposed mechanisms, it is essential for the authors to show and discuss the cilia phenotype and OSM-3 localization in nekl-3 mutants.

      We thank the Reviewer for highlighting this critical point. Indeed, nekl-3 null mutants are inviable due to essential mitotic roles (Barstead et al., 2012, PMID: 23173093), precluding direct analysis of ciliary phenotypes. To bypass this limitation, we recently generated nekl-3 conditional knockouts (cKOs) in ciliated neurons (Huang et al., JCB, 2025 in press, attached). In these mutants, OSM-3, which is normally enriched in the ciliary distal segment, becomes uniformly distributed along the cilium. This redistribution correlates with premature activation of OSM-3-driven anterograde motility in the ciliary middle region, consistent with our proposed model where NEKL-3 phosphorylation suppresses OSM-3 activity. We have now integrated this result and discussion into the revised manuscript, reinforcing the physiological relevance of NEKL-3-mediated regulation in ciliary transport.

      Page #6 line #10

      “… While nekl-3 null mutants are inviable due to essential mitotic roles (Barstead et al., 2012, PMID: 23173093), conditional knockout (cKO) of nekl-3 in ciliated neurons (Huang et al., JCB, 2025 in press, attached) revealed its critical role in regulating OSM-3 dynamics. In nekl-3 cKO animals, OSM-3 (normally enriched in the ciliary distal segment) redistributed uniformly along the cilium, concomitant with premature activation of anterograde motility in the middle ciliary region. This phenotype aligns with our model wherein NEKL-3 phosphorylation suppresses OSM-3 activity, ensuring spatiotemporal regulation of IFT…”

      (4) The methods section lacks some information, which is critical to reproducing this study.

      We have now provided detailed information in the methods section in the revised manuscript.

      (a) It is not described how the authors determined phosphorylation of OSM-3 by NEKL-3. In methods, nothing is described about the assay.

      We performed in vitro phosphorylation assays using recombinant OSM-3 and NEKL-3 purified from bacteria. We then used LC-MS/MS for identification of phosphorylation sites. We have now updated the methods section to include all the information.

      Page #4 line #26

      “... To determine whether NIMA kinase family members could directly phosphorylate OSM-3, we purified prokaryotic recombinant C. elegans NEKL-3/NEKL-4 and OSM-3 protein in order to perform in vitro phosphorylation assays. We were able to obtain active recombinant NEKL-3 but not NEKL-4. The in vitro phosphorylation assays showed that NEKL-3 directly phosphorylates OSM-3 (Fig. 1A-B, Appendix Table S1). Subsequent mass spectrometric analysis revealed phosphorylation at residues 487-490, which localize to the conserved "YSTT" motif within OSM-3’s C-terminal tail region...”

      Page #36, line #19

      “In vitro phosphorylation assay

      20 μM purified OSM-3 was incubated with 1 μM GST-NEKL-3 at 30 °C in 100 μL reaction buffer (50 mM Tris-HCl pH 8.0, 10 mM MgCl2, 150 mM NaCl, and 2 mM ATP) for 30 min. The reaction was terminated by boiling for 5 min in SDS sample buffer.

      Mass spectrometry

      Following NEKL-3 treatment, OSM-3 proteins were resolved by SDS-PAGE and visualized with Coomassie Brilliant Blue staining. Protein bands corresponding to OSM-3 were excised and subjected to digestion using the following protocol: reduction with 5 mM TCEP at 56°C for 30 min; alkylation with 10 mM iodoacetamide in darkness for 45 min at room temperature, and tryptic digestion at 37°C overnight with a 1:20 enzyme-to-protein ratio. The resulting peptides were subjected to mass spectrometry analysis. Briefly, the peptides were analyzed using an UltiMate 3000 RSLCnano system coupled to an Orbitrap Fusion Lumos mass spectrometer (Thermo Fisher Scientific). We applied an in-house proteome discovery searching algorithm to search the MS/MS data against the C. elegans database. Phosphorylation sites were determined using PhosphoRS algorithm with manual validation of MS/MS spectra.”
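For readers who want to reproduce the assay, the reaction stoichiometry follows directly from the concentrations quoted in the in vitro phosphorylation protocol. A minimal sketch, using only those quoted values (the helper name is illustrative, not part of the authors' methods):

```python
def amount_pmol(conc_uM: float, volume_uL: float) -> float:
    """Amount in pmol: uM x uL = pmol (1e-6 mol/L x 1e-6 L = 1e-12 mol)."""
    return conc_uM * volume_uL


VOLUME_UL = 100                        # reaction volume quoted in the protocol

osm3 = amount_pmol(20, VOLUME_UL)      # substrate: 20 uM OSM-3
nekl3 = amount_pmol(1, VOLUME_UL)      # kinase: 1 uM GST-NEKL-3
atp = amount_pmol(2000, VOLUME_UL)     # 2 mM ATP = 2000 uM

print(f"OSM-3: {osm3} pmol, GST-NEKL-3: {nekl3} pmol, ATP: {atp} pmol")
print(f"substrate:kinase = {osm3 / nekl3:.0f}:1, ATP:substrate = {atp / osm3:.0f}:1")
```

The kinase is therefore substoichiometric (one NEKL-3 per twenty OSM-3 molecules) and ATP is in large excess over substrate.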

      (b) The method of structural prediction by AlphaFold2 and LocalColabFold needs clarification. In general, the prediction gives several candidates. How did the authors choose one of these candidates?

      We generated five candidate models, all of which showed similar conformations. We therefore chose the model with the highest confidence. We have provided PAE and pLDDT as additional data in Figure S2 and discussed them in the revised text on Page #4, line #32:

      “...To gain structural insights from this motif, we employed LocalColabFold based on AlphaFold2 to predict the dimeric structure of OSM-3 (Evans et al., 2022; Jumper et al., 2021; Mirdita et al., 2022). The highest-confidence model was selected for further analysis (Fig. 1C, Fig. S2)...”

      (c) The methods to predict conformational changes by introducing various point mutations are interesting (Figure 3). However, the methods require more detailed descriptions. In the current form, the manuscript only lists the tools used. The pipelines and parameters need to be described. This information is important because AlphaFold-based predictions often give folded conformations because the training data are mainly composed of folded proteins. It is surprising that the methods applied here give open conformations induced by point mutations.

      We have described the pipelines in the revised Methods section on page#34, line#25: 

      “…OSM-3 model was predicted using LocalColabFold (Evans et al., 2022; Jumper et al., 2021; Mirdita et al., 2022). Mutated proteins were designed in PyMOL 2.6, choosing the rotamer of the mutated residues in G444E, PM and PD models with the least clash as the initial conformation. To predict mutation-induced conformational changes, the initial models were subjected to PyRosetta (Chaudhury et al., 2010). The energies of pre-relaxed models were evaluated with Rosetta Energy Function 2015 (Alford et al., 2017), and then the relax procedure was applied to the models with default parameters to minimize their energy and obtain the relaxed models, which were visualized in PyMOL. In detail, the classic relax mover was used with default settings. The relax script has been uploaded to GitHub: https://github.com/young55775/RosettaRelax_for_OSM3...”
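      For readers unfamiliar with this kind of workflow, a score-then-relax step of the sort described above might look roughly like the following PyRosetta sketch. This is not the authors' script (which is in the repository linked above); the function names `relaxed_name` and `relax_model` and the input file names in the usage comment are hypothetical, and the sketch assumes a working PyRosetta installation and a mutated model already saved as a PDB file.

```python
from pathlib import Path


def relaxed_name(pdb_in: str) -> str:
    """Hypothetical naming helper: 'G444E.pdb' -> 'G444E_relaxed.pdb'."""
    path = Path(pdb_in)
    return str(path.with_name(path.stem + "_relaxed" + path.suffix))


def relax_model(pdb_in: str) -> tuple:
    """Score a model with ref2015, run ClassicRelax with default settings,
    write the relaxed model, and return (energy_before, energy_after)."""
    # Imported here so that the naming helper works without PyRosetta installed.
    import pyrosetta
    from pyrosetta.rosetta.protocols.relax import ClassicRelax

    pyrosetta.init("-mute all")
    pose = pyrosetta.pose_from_pdb(pdb_in)

    # Rosetta Energy Function 2015 (ref2015)
    scorefxn = pyrosetta.create_score_function("ref2015")
    energy_before = scorefxn(pose)

    relax = ClassicRelax()          # the "classic relax mover", default settings
    relax.set_scorefxn(scorefxn)
    relax.apply(pose)               # modifies the pose in place

    energy_after = scorefxn(pose)
    pose.dump_pdb(relaxed_name(pdb_in))
    return energy_before, energy_after


# Example use (file names are placeholders, not files from the study):
#   for model in ["WT.pdb", "G444E.pdb", "PM.pdb", "PD.pdb"]:
#       before, after = relax_model(model)
#       print(model, before, after)
```

      Because relaxation is stochastic, a pipeline like this is usually run several times per model and the lowest-energy result is kept; whether the authors did so is not stated in the response.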

      (5) The authors have purified proteins. Do they show different properties in gel filtration that are consistent with the structural prediction? It is anticipated that open-form mutants are eluted earlier than closed forms.

      We thank the reviewer for this insightful suggestion. Indeed, our recent study showed that the open form of the active OSM-3 G444E mutant eluted earlier than the wild-type closed form (Xie et al., EMBO J., 2024). While the current study did not use gel filtration chromatography (SEC) to directly compare the hydrodynamic properties of the OSM-3 mutants, our functional assays provide robust evidence for the conformational changes predicted by structural modeling. For example, ATPase activity assays revealed that the open-state mutants (e.g., G444E and PD mutants) exhibited significantly enhanced enzymatic activity (Figure 4A), consistent with structural predictions of an active, destabilized autoinhibitory interface (Figure 3A). These functional readouts collectively validate the predicted structural states. While SEC could further corroborate these findings by distinguishing compact (closed) from extended (open) conformations, we prioritized assays that directly link structural predictions to in vitro enzymatic activity and in vivo ciliary transport dynamics. Future studies incorporating SEC or cryo-EM will provide additional biophysical validation of these states.

      We have revised the text in the manuscript (Page #7, Line #22):

      “…Notably, the open-state OSM-3 mutants (e.g., G444E) displayed elevated ATPase activity, consistent with structural predictions of autoinhibition release (Fig. 3A, Fig. 4A) (Xie et al., 2024). While hydrodynamic profiling (e.g., SEC) could further resolve conformational states, our functional assays directly connect predicted structural changes to altered biochemical and cellular activity...”

      Minor point

      (1) Line 85 "MIMA kinase family" should be "NIMA kinase family".

      We have corrected the typo and thank the Reviewer for pointing it out.

      (2) M.S. and D.S. need to be defined in Figure 2D.

      We have updated the figures.

    1. Author Response

      The following is the authors’ response to the current reviews.

      1) The main issue relates to Set2, and how STIM1 expression rescues Set2-dependent functions in Set2 KO flies. If Set2 is downstream of STIM1, how would STIM1 over-expression rescue a Set2-dependent effect?

      The STIM rescue is of Set2 knockdown (RNAi) flies and NOT of Set2 knockout flies. Overexpression of STIM raises SOCE in primary cultures of Drosophila neurons (as demonstrated in previous publications from our group: Agrawal et al., 2010; Chakraborty et al., 2016; Deb et al., 2016). The higher SOCE drives greater expression of Set2 from the endogenous locus, thus reducing the efficacy of the Set2 RNAi. This accounts for the rescue of Set2 KD flies by STIM in Figure S2E. We have explained this in lines 227-234.

      2) There is still no characterization of SOCE in fpDANs from flies expressing native Orai or the dominant negative OraiE180A mutant.

      Measurement of SOCE is not technically feasible in ex-vivo preps due to the presence of extracellular calcium in the brain milieu. In the past we have measured SOCE from primary cultures of central dopaminergic neurons expressing either native Orai OR OraiE180A mutant (Pathak et al., 2015) where we found that all dopaminergic neurons expressing OraiE180A exhibit very low SOCE. This is the reason we have not measured SOCE in the fewer cells of the fpDAN subset marked by THD' GAL4. This point has been specifically mentioned and explained in the section on “limitations of the study” at the end of the manuscript.

      3) The revised version does not include an analysis of the STIM:Orai stoichiometry, which has been demonstrated to be essential for SOCE.

      To measure such stoichiometry we would need to perform direct measurements of STIM and Orai levels by protein extraction from the fpDANs of all appropriate genotypes. This is not feasible due to the small number of cells available from each brain.

      I confirm that there are no changes to the text OR figures from the previous version of the manuscript.


      The following is the authors’ response to the original reviews.

      […]

      The manuscript by Mitra and coworkers analyses the functional role of Orai in the excitability of central dopaminergic neurons in Drosophila. The authors show that a dominant-negative mutant of Orai (OraiE180A) significantly alters the gene expression profile of flight-promoting dopaminergic neurons (fpDANs). Among them, OraiE180A attenuates the expression of Set2 and enhances that of E(z), shifting the level of epigenetic signatures that modulate gene expression. The present results also demonstrate that Set2 expression via Orai involves the transcription factor Trl. The Orai-Trl-Set2 pathway modulates the expression of VGCC, which, in turn, are involved in dopamine release. The topic investigated is interesting and timely and the study is carefully performed and technically sound; however, there are several major concerns that need to be addressed:

      1) In Figure S2E, STIM is overexpressed in the absence of Set2 and this leads to rescue. It is presumed that STIM overexpression causes excess SOCE, yet this is rarely the case. Perhaps the bigger concern, however, is how excess SOCE might overcome the loss of SET2 if SET2 mediates SOCE-induced development of flight. These data are more consistent with something other than SET2 mediating this function.

      Our statement that STIM overexpression overcomes deficits in SOCE is based on the following published work, which has been highlighted in the revised version of the manuscript (see Lines 226-233):

      1. Studies of SOCE in wildtype cultured larval Drosophila neurons demonstrated that overexpression of STIM raised SOCE to the same extent as co-expression of STIM and Orai in the WT background (Chakraborty et al, 2016; Figure 1D).

      2. Both Carbachol-induced IP3-mediated Ca2+ release and SOCE (measured by Ca2+ add back after Thapsigargin-induced store depletion) were rescued in primary cultures of IP3R hypomorphic mutant (itprku) Drosophila neurons by overexpression of STIM (Agrawal et al., 2010; Figure 8A-G).

      3. Deb et al., 2016 (Supplementary Figure 2h,i) reaffirmed that overexpression of STIM significantly improves SOCE after Thapsigargin-induced passive store-depletion in Drosophila neurons expressing IP3RRNAi.

      4. Consistent with the cellular rescue of SOCE, defects in flight initiation and physiology observed in the heteroallelic IP3R hypomorphic background (itprku) could be rescued by overexpression of STIM (Agrawal et al., 2010; Figure 3A-E) as well as Orai (Venkiteswaran and Hasan, 2009; Figure 3).

      5. In Figure S2E, we show that flight deficits arising from THD’> Set2RNAi are rescued upon overexpression of STIM (i.e. THD’>Set2RNAi; STIMOE). Here and in another recent publication (Mitra et al., 2021) we show that neurons expressing Set2RNAi exhibit reduced expression of the IP3R and reduced ER-Ca2+ release presumably leading to reduced SOCE. As mentioned above we have consistently found that STIM overexpression raises both IP3-mediated Ca2+ release and SOCE in Drosophila neurons.

      In this study, we propose that Ca2+ release through the IP3R followed by SOCE are part of a positive feedback loop (described in the revised manuscript- see Lines 302-307) driving expression of Set2 which in turn upregulates expression of mAChR and IP3R (Figure 3F) to regulate dopaminergic neuron function. Our observation that loss of Set2 (THD’>Set2RNAi) can be rescued by STIM overexpression is consistent with this model because:

      1. Loss of Set2 (THD’>Set2RNAi) results in downregulation of several genes including mAChR and IP3R leading to decreased SOCE.

      2. As evident from our previous studies, increased STIM expression in the Set2RNAi background (THD’>Set2RNAi; STIMOE) is expected to enhance SOCE, which we predict would rescue Set2 expression, leading to rescue of other Set2-dependent downstream functions like flight (Figure 2D).

      2) In Figure 3, data is provided linking SET2 expression and Cch-induced Ca2+ responses. The presentation of these data is confusing. In addition, the results may be a simple side effect of SET2-dependent expression of IP3R. Given that this article is about SOCE, why isn't SOCE shown here? More generally, there are no measurements of SOCE in this entire article. Measuring SOCE (not what is measured in response to Cch) could help eliminate some of this confusion.

      This section has been rewritten in the revised version for clarity, and we have explained how Set2-dependent IP3R expression is an important component of Orai-mediated Ca2+ entry in fpDANs (see Lines 302-307). Here, we propose that IP3-mediated Ca2+ release and SOCE, through Orai, are together part of a positive feedback loop (see Lines 286-307) driving transcription of Set2, which in turn upregulates mAChR and IP3R expression (Figure 3F). We hypothesized that the observed loss of CCh-induced Ca2+ response in the Set2RNAi background (Figure 3B-D; THD’>Set2RNAi) results from decreased itpr and mAChR expression and verified this in Figure 3E. This is further validated by the rescue of CCh-induced Ca2+ response and itpr/mAChR expression in the OraiE180A background upon Set2 overexpression (Figure 3B-E; THD’>OraiE180A; Set2OE). We were constrained to measure CCh-induced Ca2+ responses in OraiE180A expressing neurons for the following reasons (highlighted in the revised version of the manuscript; see Lines 307-313; ‘Limitations of the study’, Lines 719-735):

      1. SOCE measurements through Tg-mediated store Ca2+ release followed by Ca2+ add back require a 0 Ca2+ environment that can only be achieved in culture. The Drosophila brain is bathed in hemolymph, which contains Ca2+, and no methods exist to readily deplete Ca2+ from the tissue to create a 0 Ca2+ environment without also affecting the health of the neurons.

      2. Cultures of the subset of dopaminergic neurons (THD’) we have focused on in this study were not feasible due to the small number of neurons being studied from the total number of dopaminergic neurons in the brain (~35/400). In previous studies we have shown that SOCE post-Tg induced store depletion is abrogated in cultured dopaminergic neurons from Drosophila upon expression of OraiE180A (Pathak et al., 2015). Furthermore, Carbachol-induced IP3-mediated Ca2+ release is tightly coupled to SOCE in Drosophila neurons (Venkiteswaran and Hasan, 2009) and Ca2+ release from the IP3R is physiologically relevant for flight behavior in THD’ neurons (Sharma and Hasan, 2020).

      3) A significant gap in the study relates to the conclusion that trl is a SOCE-regulated transcription factor. This conclusion is entirely based on genetic analysis of STIMKO heterozygous flies in which a copy of the trl13C hypomorph allele is introduced. While these results suggest a genetic interaction between the expression of the two genes, the evidence that expression translates into a functional interaction that places trl immediately downstream of SOCE is not rigorous or convincing. All that can be said is that the double mutant shows a defect in flight which could arise from an interruption of the circuit. Further, it is not clear whether the trl13C hypomorph is only introduced during the critical 72-96 hour time window when the Orai1E180E phenotype shows up. The same applies to the over-expression of Set2 and the other genes. If the expression is not temporally controlled, then the phenotype could be due to the blockade of an entirely different aspect of flight neuron function.

      The idea that Trl functions downstream of Orai-mediated Ca2+ entry in THD’ neurons is based on the following genetic evidence (highlighted in the revised version; see Lines 339-341; 351-367; 647-65; ‘Limitations of the study’: 736-739)

      1. In Figure 4D, we show evidence of genetic interaction between trl-STIM and trl-Set2. The rescue of trl13c/STIMKO with STIM overexpression in THD’ neurons indicates that excess SOCE (driven by STIMOE) may activate the residual Trl (there exists a WT Trl copy in this genetic background) to rescue THD’ flight function. This is further supported by the rescue of trl/STIMKO with Set2 overexpression in THD’ neurons, which is consistent with the feedback loop model proposed in Figure 5C (see Lines 390-396), where we propose that reduced SOCE leads to reduced ‘activated’ Trl and thus reduced Set2 expression, and the latter is rescued by Set2OE. The manner in which SOCE ‘activates’ Trl is the subject of ongoing investigations.

      2. The trl hypomorphic alleles (including trl13C) exist as genetic mutants and they affect Trl function in all tissues throughout development. While we concede that these mutant alleles would affect multiple functions at other stages of development, which may impinge on the phenotypes noted in Figure S4B, we have used a targeted RNAi approach to validate Trl function specifically in the THD’ neurons (see Figure 4C; Lines 339-341).

      3. Overexpression mediated rescues (including Set2) were not induced only during the critical 72-96 hrs APF developmental window. Having established that Orai function drives critical gene expression during this window (Figure 1), it is reasonable to assume that Set2 rescue of loss of flight in OraiE180A occurs in the same time window where flight is disrupted (see Lines 221-224).

      4) In Figure 4, data is shown that SOCE compensates for the loss of Trl, the presumed mediator of SOCE-dependent flight. The fact that flight deficits are rescued by raising SOCE in the absence of Trl is very inconsistent with this conclusion.

      We apologise for this confusion and have clarified it in the revision (see Lines 346-367). trl13c is a recessive allele of Trl and has been written as such throughout the text and in the figures (i.e., trl13c and NOT Trl13c). In all cases of Trl mutant rescue by STIMOE and Set2OE there exists residual Trl that can be activated by excess SOCE, thus leading to the rescue. This is true for trl13C/STIMKO, where each mutant is present as a heterozygote (the complete genotype of this strain is STIMKO/+; trl13c/+; this has been corrected in the revision). Similarly, for TrlRNAi we expect reduced levels (but not complete loss) of Trl. Thus the SOCE rescue of loss of Trl occurs in conditions where Trl levels are reduced but NOT absent. Homozygous trl null mutants are lethal.

      5) In Figure 5 (A-C), data is provided that Trl transcripts are unaffected by loss of SOCE and that overexpression cannot rescue flightlessness. From this, the authors conclude that this gene "must" be calcium responsive. While that is one possibility, it is also possible that these genes are not functionally linked.

      The idea that Trl is functionally linked to SOCE is based on the following evidence (included in the revised version- see Lines 339-341; 346-367; 391-396)

      1. In Figure 4C we show that flight defects caused by partial loss of Trl (THD’>TrlRNAi) were rescued by STIM overexpression (THD’>TrlRNAi; STIMOE). As mentioned above we have found that STIM overexpression raises SOCE.

      2. Heteroalleles of the trl13C hypomorph exhibit a strong genetic interaction with a single copy of the null allele of STIMKO as shown by the flight deficit of trl13c/+; STIMKO/+ (trl13C/STIMKO ) flies (Figure 4D). The genotypes will be corrected in the revision.

      3. Flight defects in trl13C/STIMKO flies could be rescued by STIM overexpression in the THD’ neurons (trl13C/STIMKO; THD’>STIMOE)

      4. In Figure 4E, we show that partial loss of Trl in THD’ neurons (THD’>TrlRNAi) leads to decreased expression of the Ca2+ responsive genes mAChR, itpr, and Set2 genes indicating that Trl is a constituent of the SOCE-driven transcriptional feedback loop (see Figure 5C).

      Since we could not detect a well-defined Ca2+ binding domain in Trl, we hypothesize that it could be activated by a Ca2+ dependent post-translational modification. Phosphoproteome analysis of Trl demonstrated that it does indeed undergo phosphorylation at a Threonine residue (T237; Zhai et al., 2008), which lies within a potential site for CaMKII. Independently, CaMKII has been identified as a binding partner of Trl from a Trl interactome study (Lomaev et al., 2018). Past work from our group (Ravi et al., 2018) identified a role for CaMKII in THD’ neurons in the context of flight. We are currently testing if CaMKII functions downstream of SOCE in THD’ neurons to mediate flight and will update this information in the next version of the manuscript.

      (Now included in the revised version of the manuscript as Figure S5; Lines 397-424.)

      6) There is no characterization of SOCE in fpDANs from flies expressing native Orai or the dominant negative OraiE180A mutant. While the authors refer to previous studies, as the manuscript is essentially based on Orai function thapsigargin-induced SOCE should be tested using the Ca2+ add-back protocol in order to assess the release of Ca2+ from the ER in response to thapsigargin as well as the subsequent SOCE.

      The fpDANs consist of 16-19 neurons in each hemisphere (PPL1 are 10-12 and PPM3 are 6-7 cells; Pathak et al., 2015). Measuring SOCE from these neurons in vivo is not possible due to the presence of abundant extracellular Ca2+ in the brain. Given their sparse number, it proved technically challenging to isolate the fpDANs in culture to perform SOCE measurements using the Ca2+ add back protocol. Due to these reasons, we have relied upon using Carbachol to elicit IP3-mediated Ca2+ release and SOCE as a proxy for in vivo SOCE. In previous studies we have shown that Carbachol treatment of cultured Drosophila neurons elicits IP3-mediated Ca2+ release and SOCE (Agrawal et al., 2010; Figure 8). Moreover, expression of OraiE180A completely blocks SOCE as measured in primary cultures of dopaminergic neurons (Pathak et al., 2015; Figure 1E). Hence we have not repeated SOCE measurements from all dopaminergic neurons in this work. In the revised version we have explicitly stated this weakness of our study and the reasons for it (See Lines 307-313; ‘Limitations of the study’-Lines 719-735).

      7) In the experiments performed to rescue flight duration in Set2RNAi individuals the authors overexpress STIM and attribute the effect to "Excess STIM presumably drives higher SOCE sufficient to rescue flight bout durations caused by deficient Set2 levels.". This should be experimentally tested as the STIM:Orai stoichiometry has been demonstrated as essential for SOCE.

      The assumption that STIM overexpression drives higher SOCE is based upon previously published work from Drosophila neurons (Agrawal et al., 2010; Chakraborty et al, 2016; Deb et al., 2016) which demonstrates that excess WT STIM overcomes IP3R deficiencies (RNAi or hypomorphic mutants) to rescue SOCE. We agree that STIM-Orai stoichiometry is essential for SOCE, and propose that the rescue backgrounds possess sufficient WT Orai, which is recruited by the excess STIM to mediate the rescue. We have referenced the earlier work to validate our use of STIMOE for rescue of SOCE (See Lines 226-233).

      Here, we propose that Set2 is part of a positive feedback loop (see Lines 286-307) driving transcription of mAChR and IP3R (Figure 3F). In keeping with this hypothesis, we posit that the phenotypes observed in the Set2RNAi background (Figure 2D) result from decreased itpr and mAChR expression (validated in Figure 3E). This is further validated by the Set2 overexpression mediated rescue of OraiE180A (Figure 2D) and rescue of itpr/mAChR expression in the OraiE180A background (Figure 3B-E; THD’>OraiE180A; Set2OE).

      8) The authors show that overexpression of OraiE180A results in Stim downregulation at the mRNA level. What about the protein level? And more important, how does OraiE180A downregulate Stim expression? Does it promote Stim degradation? Does it inhibit Stim expression?

      We hypothesize that changes in STIM mRNA observed in the THD’ > OraiE180A neurons stems from an overall reduction in IP3-mediated Ca2+ release and SOCE due to loss of Trl-Set2 driven gene expression detailed in our transcriptional feedback loop model (Figure 5C; see Lines 286-307; 581-591). We have attempted to explain this aspect more clearly in the revised version of the manuscript. While we agree that measuring levels of STIM protein would be helpful, estimation of protein levels from a limited number of neurons (~35 cells per brain) is technically challenging. The STIM antibody does not work well in immunohistochemistry. In the absence of any experimental evidence we cannot comment on how expression of OraiE180A might affect STIM protein turnover (see Lines 307-313).

      9) Lines 271-273, the authors state "whereas overexpression of a transgene encoding Set2 in THD' neurons either with loss of SOCE (OraiE180A) or with knockdown of the IP3R (itprRNAi), lead to significant rescue of the Ca2+ response". This is attributed to a positive effect of Set2 expression on IP3R expression and the authors show a positive correlation between these two parameters; however, there is no demonstration that Set2 expression can rescue IP3R expression in cells where the IP3R is knocked down (itprRNAi). This should be further demonstrated.

      The rescue of IP3R expression by Set2 overexpression in itprRNAi was demonstrated in a different set of Drosophila neurons in an earlier study (Mitra et al., 2021) and has not been repeated specifically in THD’ neurons (see Lines 286-307). Similar to the previous study, here we tested CCh-stimulated Ca2+ responses of THD’ neurons with itprRNAi and itprRNAi; Set2OE (Fig S3), which are indeed rescued by Set2OE (see Lines 280-285).

      10) The data presented in Figure 3E should be functionally demonstrated by analyzing the ability of CCh to release Ca2+ from the intracellular stores in the absence of extracellular Ca2+.

      CCh-mediated Ca2+ release from the intracellular stores in the absence of extracellular Ca2+ has been described in primary cultures of Drosophila neurons in previously published work (Venkiteswaran and Hasan, 2009; Agrawal et al., 2010). This work focuses on a set of 16-19 dopaminergic neurons in a hemisphere of the Drosophila central brain. It is technically challenging to generate a 0 Ca2+ environment in vivo, which is essential for measuring store Ca2+ release. Given their meagre numbers, primary culture of these neurons is not readily feasible (see Lines 307-313; ‘Limitations of the study’, Lines 719-735).

      11) The conclusion that SOCE regulates the neuronal excitability threshold is based entirely on either partial behavioral rescue of flight, or measurements of KCl-induced Ca2+ rises monitored by GCaMP6m in DAN neurons. The threshold for neuronal excitability is a precise parameter based on rheobase measurements of action potentials in current-clamp. Measurements of slow calcium signals using a slow dye such as GCaMP6m should not be equated with neuronal excitability. What is measured is a loss of the calcium response in high K depolarization experiments, which occurs due to the loss of expression of Cav channels. Hence, the use of this term is not accurate and will confuse readers. The use of terms referring to neuronal excitability needs to be changed throughout the manuscript. As such, the conclusions regarding neuronal excitability should be strongly tempered and the data reinterpreted as there are no true measurements of neuronal excitability in the manuscript. All that can be said is that expression of certain ion channel genes is suppressed. Since both Na+ channels and K+ channel expression is down-regulated, it is hard to say precisely how membrane excitability is altered without action potential analysis.

      The claim that SOCE influences neuronal excitability is based on the following observations:

      1. Interruption of the transcriptional feedback loop involving SOCE, Trl, and Set2 through loss of any of its constituents, results in the downregulation of VGCCs (Figure 5G, 6H), which are essential components of action potentials.

      2. OraiE180A mediated loss of SOCE in THD’ neurons abrogates the KCl-evoked depolarization response (Figure 6B, C) measured using GCaMP6m. We verified that this response requires VGCC function using pharmacological inhibition of L-type VGCCs (Figure 6E, F).

      3. SOCE-deficient THD’ neurons, which were presumably compromised in their ability to evoke action potentials, could be rescued to undergo KCl-evoked depolarisation by expression of NachBac, which lowers the depolarization threshold (Figure 7C, D), or through optogenetic stimulation using CsChrimson (Figure 7F).

      We agree that ‘neuronal excitability threshold’ is a precise electrophysiological parameter that has not been directly investigated here by measurement of action potentials. Therefore, references to neuronal excitability have been tempered throughout the revised manuscript and replaced with a more generic reference to ‘neuronal activity’. In this context we have included further evidence supporting reduced activity of THD’ neurons upon loss of SOCE in the revision.

      Since one of the key functional outcomes of activity during critical developmental periods such as the 72-96 hrs APF developmental window identified in this study, is remodelling of neuronal morphology, we decided to investigate the same in our context. Neuronal activity can drive changes in neurite complexity and axonal arborization (Depetris-Chauvin et al., 2011) especially during critical developmental periods (Sachse et al., 2007). To understand if Orai mediated Ca2+ entry and downstream gene expression through Set2 affects this activity-driven parameter, we investigated the morphology of fpDANs, and specifically measured the complexity of presynaptic terminals within the 2’1 lobe MB using super-resolution microscopy. We found striking changes in the neurite volume upon expression of OraiE180A which could be rescued by restoring either Set2 (OraiE180A; Set2OE) or by inducing hyperactivity through NachBac expression (OraiE180A ; NachBacOE). These data have been included in the revised manuscript (Figure 8 B, C, D; see Lines 481-482; 519-534; 584-591; 701-704).

      12) Related, since trl does not contain any molecular domains that could be regulated by Ca2+ signaling, it is unclear whether trl is directly regulated by SOCE or the regulation is highly indirect. Reporter assays evaluating trl activation upon Ca2+ rises would provide much stronger and more direct evidence for the conclusion that trl is a SOCE-regulated TF. As such the evidence is entirely based on RNAi downregulation of trl which indicates that trl is essential but has no bearing on exactly what point of the signaling cascade it is involved.

      We agree that luciferase Trl reporters would provide a direct method to test SOCE-mediated activation. Future investigations will be targeted in this direction. Regarding possible mechanisms of Trl activation: since we could not detect a well-defined Ca2+ binding domain in Trl, we hypothesize that it may be activated through phosphorylation by a Ca2+-sensitive kinase. Phosphoproteome analysis of Trl indicates that it does indeed undergo phosphorylation at a Threonine residue (T237; Zhai et al., 2008), which may be mediated by the Ca2+-sensitive kinase CaMKII, based on binding partners identified in the Trl interactome (Lomaev et al., 2018). Past work (Ravi et al., 2018) has indeed demonstrated a requirement for CaMKII in THD’ neurons for flight. We are currently testing whether CaMKII functions downstream of SOCE in these neurons to mediate flight, and will be updating this information in the next version of the manuscript.

      New data and analysis have been included (see Figure S5; Lines 397-424; ‘Limitations of the study’, Lines 736-739).

      13) Are NFAT levels altered in the Orai1 loss of function mutant? If not, this should be explicitly stated. It would seem based on previous literature that some gene regulation may be related to the downregulation of this established Ca2+-dependent transcription factor. Same for NFkb.

      As mentioned in the revised version of the manuscript (see Lines 315-326), Drosophila NFAT lacks a calcineurin binding site and is therefore not sensitive to Ca2+ (Keyser et al., 2007). In the past we tested whether knockdown of NF-kB in dopaminergic neurons gave a flight phenotype and did not observe any measurable deficit. From the RNAseq data we find a slight downregulation of NFAT (0.49 fold, p value = 0.048) and NF-kB (0.26 fold, p value = 0.258), the significance of which is unclear at this point. We did not find any consensus binding sites for these two factors in the regulatory regions of downregulated genes from THD’ neurons.
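      To put the two reported values on the log2 scale commonly used for RNA-seq comparisons, the short sketch below converts them and checks each p value against a threshold. It rests on two assumptions that the response does not state: that "0.49 fold" and "0.26 fold" are linear expression ratios relative to control, and that 0.05 is the relevant significance cutoff. The function name is illustrative.

```python
import math

ALPHA = 0.05  # assumed conventional threshold; not stated in the response


def log2_fold_change(fold: float) -> float:
    """Convert a linear fold change (treated / control) to log2 scale."""
    return math.log2(fold)


# (linear fold change, p value) as quoted in the response
reported = {
    "NFAT": (0.49, 0.048),
    "NF-kB": (0.26, 0.258),
}

summary = {}
for gene, (fold, p_value) in reported.items():
    summary[gene] = (log2_fold_change(fold), p_value < ALPHA)
    print(f"{gene}: log2FC = {summary[gene][0]:.2f}, below alpha: {summary[gene][1]}")
```

      Read this way, only the NFAT change falls below the assumed threshold, and only marginally, which is in line with the authors' caution about interpreting either value.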

      14) Does over-expression of Set2 restore ion channel expression especially those of the VGCCs? This would provide rigorous, direct evidence that SOCE-mediated regulation of VGCCs through Set2 controls voltage-gated calcium channel signaling.

      Set2 overexpression in the OraiE180A background indeed restores the expression of VGCC genes (see Figure 6H; Lines 461-468).

      15) All 6 representative panels from Figure 3B are duplicated in Figure 4G. Likewise, 2 representative panels from Figure 5H are duplicated in Figure 6D. Although these panels all represent the results from control experiments, the relevant experiments were likely not conducted at the same time and under the same conditions. Thus, control images from other experiments should not be used simply because they correspond to controls. This situation should be clarified.

      We regret the confusion caused by the same representative images for the control experiments. These have been replaced by new representative images for Figure 4G and 6D in the updated version of the manuscript.

      16) The figures are unusually busy and difficult to follow. In part this is because they usually have many panels (Fig. 1: A-I; Fig. 2, A-J, etc) but also because the arrangement of the panels is not consistent: sometimes the following panel is found to the right, other times it is below. It would help the reader to make the order of the panels consistent, and, if possible, reduce the number of panels and/or move some of the panels to new figures (eLife does not limit the number of display items).

      The image panels have been rearranged for ease of reading in the updated version of the manuscript.

      17) As a final recommendation, the reviewers suggest that the authors: a. Reword the text that refers to membrane excitability, since membrane excitability was not directly measured here; b. Explain why STIM1 rescues the partial loss of flight in Set2 RNAi flies (Fig. S2E); and c. Explain how/why trl is calcium regulated and test using luciferase (or other) reporter assays whether Orai activation leads to trl activation.

      a. Textual references to membrane excitability have been appropriately modified and some new data has been included in this regard (see Figure 8 B, C, D; Lines 481-483; 519-534; 584-591; 701-704).

      b. We have provided a detailed explanation for how STIM overexpression might rescue the phenotypes caused by Set2RNAi in Point 1 (see Lines 226-233). In short, these phenotypes depend upon IP3R-mediated Ca2+ entry driving a transcriptional feedback loop. We relied upon past reports that STIM overexpression upregulates IP3R-mediated Ca2+ release and SOCE in Drosophila itpr mutant neurons (Agrawal et al., 2010; Chakraborty et al., 2016; Deb et al., 2016). We therefore propose that STIM overexpression in the Set2RNAi background rescues IP3R-mediated Ca2+ release followed by SOCE, which drives enhanced Set2 transcription, counteracting the effects of the RNAi. We will explain this more clearly with past references in the next revision.

      c. We have provided a detailed response to this comment in Point 12. Briefly, we agree that building luciferase reporters for Trl could be an ideal strategy to test for its responsiveness to SOCE and needs to be done in the future. As an alternate strategy, we have looked at data from existing studies of interacting partners of Trl (Lomaev et al., 2017) and identified CamKII, which is Ca2+ responsive (Braun and Schulman, 1995; Yasuda et al., 2022) and thus might activate Trl through a phosphorylation-switch-like mechanism (see Figure S5; ‘Limitations of the study’, Lines 397-424; 736-739). Moreover, a previous publication identified a requirement for CamKII in THD’ neurons for Drosophila flight (Ravi et al., 2018). We have tested the ability of a dominant active version of CamKII to rescue THD’>E180A flight deficits and have included this information in the next version of the manuscript.

      References

      1. Agrawal N, Venkiteswaran G, Sadaf S, Padmanabhan N, Banerjee S, Hasan G. Inositol 1,4,5-Trisphosphate Receptor and dSTIM Function in Drosophila Insulin-Producing Neurons Regulates Systemic Intracellular Calcium Homeostasis and Flight. J Neurosci. 2010. 30:1301-1313. doi:10.1523/jneurosci.3668-09.2010

      2. Braun AP, Schulman H. A non-selective cation current activated via the multifunctional Ca(2+)-calmodulin-dependent protein kinase in human epithelial cells. J Physiol. 1995. 488:37-55. doi:10.1113/jphysiol.1995.sp020944

      3. Chakraborty S, Deb BK, Chorna T, Konieczny V, Taylor CW, Hasan G. Mutant IP3 receptors attenuate store-operated Ca2+ entry by destabilizing STIM-Orai interactions in Drosophila neurons. J Cell Sci. 2016. 129:3903-3910. doi:10.1242/jcs.191585

      4. Deb BK, Pathak T, Hasan G. Store-independent modulation of Ca2+ entry through Orai by Septin 7. Nat Commun. 2016. 7:11751. doi:10.1038/ncomms11751

      5. Depetris-Chauvin A, Berni J, Aranovich EJ, Muraro NI, Beckwith EJ, Ceriani MF. Adult-specific electrical silencing of pacemaker neurons uncouples molecular clock from circadian outputs. Curr Biol. 2011. 21:1783-1793. doi:10.1016/j.cub.2011.09.027

      6. Keyser P, Borge-Renberg K, Hultmark D. The Drosophila NFAT homolog is involved in salt stress tolerance. Insect Biochem Mol Biol. 2007. 37:356-362. doi:10.1016/j.ibmb.2006.12.009

      7. Kilo L, Stürner T, Tavosanis G, Ziegler AB. Drosophila Dendritic Arborisation Neurons: Fantastic Actin Dynamics and Where to Find Them. Cells. 2021. 10:2777. doi:10.3390/cells10102777

      8. Lomaev D, Mikhailova A, Erokhin M, et al. The GAGA factor regulatory network: Identification of GAGA factor associated proteins. PLoS One. 2017. 12:e0173602. doi:10.1371/journal.pone.0173602

      9. Mitra R, Richhariya S, Jayakumar S, Notani D, Hasan G. IP3/Ca2+ signals regulate larval to pupal transition under nutrient stress through the H3K36 methyltransferase dSET2. Development. 2021. 148:dev199018. doi:10.1101/2020.11.25.399329

      10. Pathak T, Agrawal T, Richhariya S, Sadaf S, Hasan G. Store-Operated Calcium Entry through Orai Is Required for Transcriptional Maturation of the Flight Circuit in Drosophila. J Neurosci. 2015. 35:13784-13799. doi:10.1523/jneurosci.1680-15.2015

      11. Ravi P, Trivedi D, Hasan G. FMRFa receptor stimulated Ca2+ signals alter the activity of flight modulating central dopaminergic neurons in Drosophila melanogaster. Barsh GS, ed. PLOS Genet. 2018. 14:e1007459. doi:10.1371/journal.pgen.1007459

      12. Sachse S, Rueckert E, Keller A, Okada R, Tanaka NK, Ito K, Vosshall LB. Activity-dependent plasticity in an olfactory circuit. Neuron. 2007. 56:838-850. doi:10.1016/j.neuron.2007.10.035

      13. Sharma A, Hasan G. Modulation of flight and feeding behaviours requires presynaptic IP3Rs in dopaminergic neurons. Elife. 2020. 9:e62297. doi:10.7554/elife.62297

      14. Venkiteswaran G, Hasan G. Intracellular Ca2+ signalling and store operated Ca2+ entry are required in Drosophila neurons for flight. Proc Natl Acad Sci. 2009. 106:10326-10331. doi:10.1073/pnas.0902982106

      15. Yasuda R, Hayashi Y, Hell JW. CaMKII: a central molecular organizer of synaptic plasticity, learning and memory. Nat Rev Neurosci. 2022. 23:666-682. doi:10.1038/s41583-022-00624-2

      16. Zhai B, Villén J, Beausoleil SA, Mintseris J, Gygi SP. Phosphoproteome Analysis of Drosophila melanogaster Embryos. J Proteome Res. 2008. 7:1675-1682. doi:10.1021/pr700696a

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary:

      The global decline of amphibians is primarily attributed to deadly disease outbreaks caused by the chytrid fungus, Batrachochytrium dendrobatidis (Bd). It is unclear whether and how skin-resident immune cells defend against Bd. Although it is well known that mammalian mast cells are crucial immune sentinels in the skin and play a pivotal role in the immune recognition of pathogens and orchestrating subsequent immune responses, the roles of amphibian mast cells during Bd infections are largely unknown. The current study developed a novel way to enrich X. laevis skin mast cells by injecting the skin with recombinant stem cell factor (SCF), a KIT ligand required for mast cell differentiation and survival. The investigators found an enrichment of skin mast cells provides X. laevis substantial protection against Bd and mitigates the inflammation-related skin damage resulting from Bd infection. Additionally, the augmentation of mast cells leads to increased mucin content within cutaneous mucus glands and shields frogs from the alterations to their skin microbiomes caused by Bd.

      Strengths:

      This study underscores the significance of amphibian skin-resident immune cells in defenses against Bd and introduces a novel approach to examining interactions between amphibian hosts and fungal pathogens.

      Weaknesses:

      The main weakness of the study is the lack of functional analysis of X. laevis mast cells. Upon activation, mast cells have the characteristic feature of degranulation to release histamine, serotonin, proteases, cytokines, and chemokines, etc. The study should determine whether X. laevis mast cells can be degranulated by two commonly used mast cell activators IgE and compound 48/80 for IgE-dependent and independent pathways. This can be easily done in vitro. It is also important to assess whether in vivo these mast cells are degranulated upon Bd infection using avidin staining to visualize vesicle releases from mast cells. Figure 3 only showed rSCF injection caused an increase in mast cells in naïve skin. They need to present whether Bd infection can induce mast cell increase and rSCF injection under Bd infection causes a mast cell increase in the skin. In addition, it is unclear how the enrichment of mast cells provides protection against Bd infection and alternations to skin microbiomes after infection. It is important to determine whether skin mast cells release any contents mentioned above.

      We would like to thank the reviewer for taking the time to review our work and for providing us with valuable feedback.

      Please note that amphibians do not possess the IgE antibody isotype [1].

      To our knowledge there have been no published studies using approaches for studying mammalian mast cell degranulation to examine amphibian mast cells. Notably, several studies suggest that amphibian mast cells lack histamine [2, 3, 4, 5] and serotonin [2, 6]. While there are commercially available kits and reagents for examining mammalian mast cell granule content, most of these reagents may not cross-react with their amphibian counterparts. This is especially true of cytokines and chemokines, which diverged quickly with evolution and thus do not share substantial protein sequence identity across species as divergent as frogs and mammals. Respectfully, while following up on these findings is possible, it would involve considerable additional work to find reagents that would detect amphibian mast cell contents.

      We would also like to respectfully point out that while mast cell degranulation is a feature most associated with mammalian mast cells, this is not the only means by which mammalian mast cells confer their immunological effects. While we agree that defining the biology of amphibian mast cell degranulation is important, we anticipate that since the anti-Bd protection conferred by enriching frog mast cells is seen after 21 days of enrichment, it is quite possible that degranulation may not be the central mechanism by which the mast cells are mediating this protection.

      As noted in our manuscript, frog mast cells upregulate their expression of interleukin-4 (IL4), which is a hallmark cytokine associated with mammalian mast cells [7]. We are presently exploring the role of the frog IL4 in the observed mast cell anti-Bd protection. Should we generate meaningful findings in this regard, we will add them to the revised version of this manuscript.

      We are also exploring the heparin content of frog mast cells and capacities of these cells to degranulate in vitro in response to compound 48/80. In addition, we are exploring in vivo mast cell degranulation via histology and avidin-staining. Should these studies generate significant findings, we will include them in the revised version of this manuscript.

      Per the reviewer’s suggestion, in our revised manuscript we also plan to include data showing whether Bd infections affect skin mast cell numbers and how rSCF injection impacts skin mast cell numbers in the context of Bd infections.

      In regard to how mast cells impact Bd infections and skin microbiomes, our data indicate that mast cells are augmenting skin integrity during Bd infections and promoting mucus production, as indicated by the findings presented in Figure 4A-C and Figure 5A-C, respectively. There are several mammalian mast cell products that elicit mucus production. In mammals, this mucus production is mediated by goblet cells while the molecular control of amphibian skin mucus gland content remains incompletely understood. Interleukin-13 (IL13) is the major cytokine associated with mammalian mucus production [8], while to our knowledge this cytokine is either not encoded by amphibians or else has yet to be identified and annotated in these animals’ genomes. IL4 signaling also results in mucus production [9] and we are presently exploring the possible contribution of the X. laevis IL4 to skin mucus gland filling. Any significant findings on this front will be included in the revised manuscript. Histamine release contributes to mast cell-mediated mucus production [10], but as we outline above, several studies indicate that amphibian mast cells may lack histamine [2, 3, 4, 5]. Mammalian mast cell-produced lipid mediators also play a critical role in eliciting mucus secretion [11] and our transcriptomic analysis indicates that frog mast cells express several enzymes associated with production of such mediators. We will highlight this observation in our revised manuscript.

      We anticipate that X. laevis mast cells influence skin integrity, microbial composition and Bd susceptibility in a myriad of ways. Considering the substantial differences between amphibian and mammalian evolutionary histories and physiologies, we anticipate that many of the mechanisms by which X. laevis mast cells confer anti-Bd protection will prove to be specific to amphibians and some even unique to X. laevis. We are most interested in deciphering what these mechanisms are but foresee that they will not necessarily reflect what one would expect based on what we know about mammalian mast cells in the context of mammalian physiologies.

      Reviewer #2 (Public Review):

      Summary:

      In this study, Hauser et al investigate the role of amphibian (Xenopus laevis) mast cells in cutaneous immune responses to the ecologically important pathogen Batrachochytrium dendrobatidis (Bd) using novel methods of in vitro differentiation of bone marrow-derived mast cells and in vivo expansion of skin mast cell populations. They find that bone marrow-derived myeloid precursors cultured in the presence of recombinant X. laevis Stem Cell Factor (rSCF) differentiate into cells that display hallmark characteristics of mast cells. They inject their novel (r)SCF reagent into the skin of X. laevis and find that this stimulates the expansion of cutaneous mast cell populations in vivo. They then apply this model of cutaneous mast cell expansion in the setting of Bd infection and find that mast cell expansion attenuates the skin burden of Bd zoospores and pathologic features including epithelial thickness and improves protective mucus production and transcriptional markers of barrier function. Utilizing their prior expertise with expanding neutrophil populations in X. laevis, the authors compare mast cell expansion using (r)SCF to neutrophil expansion using recombinant colony-stimulating factor 3 (rCSF3) and find that neutrophil expansion in Bd infection leads to greater burden of zoospores and worse skin pathology.

      Strengths:

      The authors report a novel method of expanding amphibian mast cells utilizing their custom-made rSCF reagent. They rigorously characterize expanded mast cells in vitro and in vivo using histologic, morphologic, transcriptional, and functional assays. This establishes solid footing with which to then study the role of rSCF-stimulated mast cell expansion in the Bd infection model. This appears to be the first demonstration of the exogenous use of rSCF in amphibians to expand mast cell populations and may set a foundation for future mechanistic studies of mast cells in the X. laevis model organism.

      We thank the reviewer for recognizing the breadth and extent of the undertaking that culminated in this manuscript. Indeed, this manuscript would not have been possible without considerable reagent development and adaptation of techniques that had previously not been used for amphibian immunity research. In line with the reviewer’s sentiment, to our knowledge this is the first report of using molecular approaches to augment amphibian mast cells, which we hope will pave the way for new areas of research within the fields of comparative immunology and amphibian disease biology.

      Weaknesses:

      The conclusions regarding the role of mast cell expansion in controlling Bd infection would be stronger with a more rigorous evaluation of the model, as there are some key gaps and remaining questions regarding the data. For example:

      1. Granulocyte expansion is carefully quantified in the initial time courses of rSCF and rCSF3 injections, but similar quantification is not provided in the disease models (Figures 3E, 4G, 5D-G). A key implication of the opposing effects of mast cell vs neutrophil expansion is that mast cells may suppress neutrophil recruitment or function. Alternatively, mast cells also express notable levels of csfr3 (Figure 2) and previous work from this group (Hauser et al, Facets 2020) showed rG-CSF-stimulated peritoneal granulocytes express mast cell markers including kit and tpsab1, raising the question of what effect rCSF3 might have on mast cell populations in the skin. Considering these points, it would be helpful if both mast cells and neutrophils were quantified histologically (based on Figure 1, they can be readily distinguished by SE or Giemsa stain) in the Bd infection models.

      We thank the reviewer for this insightful suggestion. We are performing a further examination of skin granulocyte content during Bd infections and plan on including any significant findings in our revised manuscript.

      We predict that rSCF administration results in the accumulation of mast cells that are polarized such that they ablate the inflammatory response elicited by Bd infection. Mammalian mast cells, including peritonea-resident mast cells, express csf3r [12, 13]. Although the X. laevis animal model does not permit nearly the degree of immune cell resolution afforded by mammalian animal models, we do know that the adult X. laevis peritonea contain heterogeneous leukocyte populations. We anticipate that the high kit expression reported by Hauser et al., 2020 in the rCSF3-recruited peritoneal leukocytes reflects the presence of mast cells therein. As such, and in acknowledgement of the reviewer’s suggestion, we also think that the cells recruited by rCSF3 into the skin may include not only neutrophils but also mast cells. Possibly, these mast cells have distinct polarization states from those enriched by rSCF. While the lack of antibodies against frog neutrophils or mast cells has limited our capacity to address this question, we will attempt to reexamine by histology the proportions of skin neutrophils and mast cells in the skins of frogs under the conditions described in our manuscript. Any new findings in this regard will be included in the revised version of this work.

      2. Epithelial thickness and inflammation in Bd infection are reported to be reduced by rSCF treatment (Figure 3E, 5A-B) or increased by rCSF3 treatment (Figure 4G) but quantification of these critical readouts is not shown.

      We thank the reviewer for this suggestion. We will score epithelial thickness under the distinct conditions described in our manuscript and present the quantified data in the revised paper.

      3. Critical time points in the Bd model are incompletely characterized. Mast cell expansion decreases zoospore burden at 21 dpi, while there is no difference at 7 dpi (Figure 3E). Conversely, neutrophil expansion increases zoospore burden at 7 dpi, but no corresponding 21 dpi data is shown for comparison (Figure 4G). Microbiota analysis is performed at a third time point, 10 dpi (Figure 5D-G), making it difficult to compare with the data from the 7 dpi and 21 dpi time points. Reporting consistent readouts at these three time points is important to draw solid conclusions about the relationship of mast cell expansion to Bd infection and shifts in microbiota.

      Because there were no significant effects of mast cell enrichment at 7 days post Bd infection, we chose to look at the microbiome composition in a subsequent experiment at 10 days and 21 days post Bd infection, with 10 days being a bit more of a midway point between the initial exposure and day 21, when we see the effect on Bd loads. We will clarify this rationale in the revised manuscript.

      The enrichment of neutrophils in frog skins resulted in prompt (12 hours post enrichment) skin thickening (in the absence of Bd infection) and increased frog Bd susceptibility by 7 days of infection. Conversely, mast cell enrichment stabilized the skin mucosal and symbiotic microbial environment, presumably accounting at least in part for the lack of further Bd growth on mast cell-enriched animals by 21 days of infection. Our question regarding the roles of inflammatory granulocytes/neutrophils during Bd infections was that of ‘how’ rather than ‘when’ these cells affect Bd infections. Because the central focus of this work was mast cells and not other granulocyte subsets, when we saw that rCSF3-recruited granulocytes adversely affected Bd infections at 7 days post infection, we did not pursue the kinetics of these responses further. We plan to explore the roles of inflammatory mediators and disparate frog immune cell subsets during the course of Bd infections, but we feel that these future studies are more peripheral to the central thesis of the present manuscript regarding the roles of frog mast cells during Bd infections.

      4. Although the effect of rSCF treatment on Bd zoospores is significant at 21 dpi (Figure 3E), bacterial microbiota changes at 21 dpi are not (Figure S3B-C). This discrepancy, how it relates to the bacterial microbiota changes at 10 dpi, and why 7, 10, and 21 dpi time points were chosen for these different readouts (Figure 5F-G), is not discussed.

      Our results indicate that after 10 days of Bd infection, control Bd-challenged animals exhibited reduced microbial richness, while skin mast cell-enriched Bd-infected frogs were protected from this disruption of their microbiome. The amphibian microbiome serves as a major barrier to these fungal infections [14], and we anticipate that Bd-mediated disruption of microbial richness and composition facilitates host skin colonization by this pathogen. Control and mast cell-enriched animals had similar skin Bd loads at 10 days post infection. However, by 21 days of Bd infection the mast cell-enriched animals maintained their Bd loads at the levels observed at 10 days post infection, whereas the control animals had significantly greater Bd loads. Thus, we anticipate that frog mast cells are conferring the observed anti-Bd protection in part by preventing microbial disassembly and thus interfering with optimal Bd colonization and growth on frog skins. In other words, maintained microbial composition at 10 days of infection may be preventing additional Bd colonization/growth, as seen when comparing skins of control and mast cell-enriched frogs at 21 days post infection. By 21 days of infection, control animals rebounded from the Bd-mediated reduction in bacterial richness seen at 10 days. The fact that after 21 days of infection control animals also had significantly greater Bd loads than mast cell-enriched animals suggests that there may be a critical earlier window during which microbial composition is able to counteract Bd growth.

      While the current draft of our manuscript has a paragraph to this effect (see below), we appreciate the reviewer conveying to us that our perspective on the relationship between skin mast cells and the kinetics of microbial composition and Bd loads could be better emphasized. We plan to revise our manuscript to include the above discussion points.

      Bd infections caused major reductions in bacterial taxa richness, changes in composition and substantial increases in the relative abundance of Bd-inhibitory bacteria early in the infection. Similar changes to microbiome structure occur during experimental Bd infections of red-backed salamanders and mountain yellow-legged frogs [15, 16]. In turn, progressing Bd infections corresponded with a return to baseline levels of Bd-inhibitory bacteria abundance and rebounding microbial richness, albeit with dissimilar communities to those seen in control animals. These temporal changes indicate that amphibian microbiomes are dynamic, as are the effects of Bd infections on them. Indeed, Bd infections may have long-lasting impacts on amphibian microbiomes [15]. While Bd infections manifested in these considerable changes to frog skin microbiome structure, mast cell enrichment appeared to counteract these deleterious effects to their microbial composition. Presumably, the greater skin mucosal integrity and mucus production observed after mast cell enrichment served to stabilize the cutaneous environment during Bd infections, thereby ameliorating the Bd-mediated microbiome changes. While this work explored the changes in established antifungal flora, we anticipate the mast cell-mediated inhibition of Bd may be due to additional, yet unidentified bacterial or fungal taxa. Intriguingly, while mammalian skin mast cell functionality depends on microbiome-elicited SCF production by keratinocytes [17], our results indicate that frog skin mast cells in turn impact skin microbiome structure and likely their function. It will be interesting to further explore the interdependent nature of amphibian skin microbiomes and resident mast cells.

      5. The time course of rSCF or rCSF3 treatments relative to Bd infection in the experiments is not clear. Were the treatments given 12 hours prior to the final analysis point to maximize the effect? For example, in Figure 3E, were rSCF injections given at 6.5 dpi and 20.5 dpi? Or were treatments administered on day 0 of the infection model? If the latter, how do the authors explain the effects at 7 dpi or 21 dpi given mast cell and neutrophil numbers return to baseline within 24 hours after rSCF or rCSF3 treatment, respectively?

      Please find the schematic of the immune manipulation, Bd infection, and sample collection times below. We will include a figure like this in our revised manuscript.

      The title of the manuscript may be mildly overstated. Although Bd infection can indeed be deadly, mortality was not a readout in this study, and it is not clear from the data reported that expanding skin mast cells would ultimately prevent progression to death in Bd infections.

      We acknowledge this point. The revised manuscript will be titled: “Amphibian mast cells: barriers to chytrid fungus infections”.

      Reviewer #3 (Public Review):

      Summary:

      Hauser et al. provide an exceptional study describing the role of resident mast cells in amphibian epidermis that produce anti-inflammatory cytokines that prevent Batrachochytrium dendrobatidis (Bd) infection from causing harmful inflammation, and also protect frogs from changes in skin microbiomes and loss of mucin in glands and loss of mucus integrity that otherwise cause changes to their skin microbiomes. Neutrophils, in contrast, were not protective against Bd infection. Beyond the beautiful cytology and transcriptional profiling, the authors utilized elegant cell enrichment experiments to enrich mast cells by recombinant stem cell factor, or to enrich neutrophils by recombinant colony-stimulating factor-3, and examined respective infection outcomes in Xenopus.

      Strengths:

      Through the use of recombinant IL4, the authors were able to test and eliminate the hypothesis that mast cell production of IL4 was the mechanism of host protection from Bd infection. Instead, impacts on the mucus glands and interaction with the skin microbiome are implicated as the protective mechanism. These results will press disease ecologists to examine the relative importance of this immune defense among species, the influence of mast cells on the skin microbiome and mucosal function, and open the potential for modulating mucosal defense.

      We thank the reviewer for recognizing the significance and utility of the findings presented in our manuscript.

      Weaknesses:

      A reduction of bacterial diversity upon infection, as described at the end of the results section, may not always be an "adverse effect," particularly given that anti-Bd function of the microbiome increased. Some authors (see Letourneau et al. 2022 ISME, or Woodhams et al. 2023 DCI) consider these short-term alterations as encoding ecological memory, such that continued exposure to a pathogen would encounter an enriched microbial defense. Regardless, mast cell-initiated protection of the mucus layer may negate the need for this microbial memory defense.

      We thank the reviewer for their insightful comment. We will revise our discussion to include this possible interpretation.

      While the description of the mast cell location in the epidermal skin layer in amphibians is novel, it is not known how representative these results are across species ranging in chytridiomycosis susceptibility. No management applications are provided such as methods to increase this defense without the use of recombinant stem cell factor, and more discussion is needed on how the mast cell component (abundance, distribution in the skin) of the epidermis develops or is regulated.

      We appreciate the reviewer’s comment and would like to point out that the work presented in our manuscript was driven by comparative immunology questions more than by conservation biology.

      We thank the reviewer for suggesting expanding our discussion to include potential management applications and potential mechanisms for regulating frog skin mast cells. While any content to these effects would be highly speculative, we agree that it may spark new interest and pave new avenues for research. To this end, our revised manuscript will include a paragraph to this effect.

      References:

      1. Flajnik, M.F. A cold-blooded view of adaptive immunity. Nat Rev Immunol 18, 438-453 (2018).

      2. Mulero, I., Sepulcre, M.P., Meseguer, J., Garcia-Ayala, A. & Mulero, V. Histamine is stored in mast cells of most evolutionarily advanced fish and regulates the fish inflammatory response. Proc Natl Acad Sci U S A 104, 19434-19439 (2007).

      3. Reite, O.B. A phylogenetical approach to the functional significance of tissue mast cell histamine. Nature 206, 1334-1336 (1965).

      4. Reite, O.B. Comparative physiology of histamine. Physiol Rev 52, 778-819 (1972).

      5. Takaya, K., Fujita, T. & Endo, K. Mast cells free of histamine in Rana catasbiana. Nature 215, 776-777 (1967).

      6. Galli, S.J. New insights into "the riddle of the mast cells": microenvironmental regulation of mast cell development and phenotypic heterogeneity. Lab Invest 62, 5-33 (1990).

      7. Babina, M., Guhl, S., Artuc, M. & Zuberbier, T. IL-4 and human skin mast cells revisited: reinforcement of a pro-allergic phenotype upon prolonged exposure. Archives of dermatological research 308, 665-670 (2016).

      8. Lai, H. & Rogers, D.F. New pharmacotherapy for airway mucus hypersecretion in asthma and COPD: targeting intracellular signaling pathways. J Aerosol Med Pulm Drug Deliv 23, 219-231 (2010).

      9. Rankin, J.A. et al. Phenotypic and physiologic characterization of transgenic mice expressing interleukin 4 in the lung: lymphocytic and eosinophilic inflammation without airway hyperreactivity. Proc Natl Acad Sci U S A 93, 7821-7825 (1996).

      10. Church, M.K. Allergy, Histamine and Antihistamines. Handb Exp Pharmacol 241, 321-331 (2017).

      11. Nakamura, T. The roles of lipid mediators in type I hypersensitivity. J Pharmacol Sci 147, 126-131 (2021).

      12. Aponte-Lopez, A., Enciso, J., Munoz-Cruz, S. & Fuentes-Panana, E.M. An In Vitro Model of Mast Cell Recruitment and Activation by Breast Cancer Cells Supports Anti-Tumoral Responses. Int J Mol Sci 21 (2020).

      13. Jamur, M.C. et al. Mast cell repopulation of the peritoneal cavity: contribution of mast cell progenitors versus bone marrow derived committed mast cell precursors. BMC Immunol 11, 32 (2010).

      14. Walke, J.B. & Belden, L.K. Harnessing the Microbiome to Prevent Fungal Infections: Lessons from Amphibians. PLoS Pathog 12, e1005796 (2016).

      15. Jani, A.J. et al. The amphibian microbiome exhibits poor resilience following pathogen-induced disturbance. ISME J 15, 1628-1640 (2021).

      16. Muletz-Wolz, C.R., Fleischer, R.C. & Lips, K.R. Fungal disease and temperature alter skin microbiome structure in an experimental salamander system. Mol Ecol 28, 2917-2931 (2019).

      17. Wang, Z. et al. Skin microbiome promotes mast cell maturation by triggering stem cell factor production in keratinocytes. J Allergy Clin Immunol 139, 1205-1216 e1206 (2017).

    1. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      For decades it has been accepted that only the growth-arrested "stumpy" form of Trypanosoma brucei can infect the arthropod vector, the Tsetse fly, but this was recently challenged by a demonstration that - under artificial conditions that are known to enhance infectivity - the proliferative "slender" form can also establish Tsetse infections. The infectiousness of the two forms is a fundamental question in trypanosome biology and epidemiology, concerning both infection dynamics and parasite differentiation. The authors of the current study provide compelling evidence that without artificial enhancement, the "stumpy" form is indeed much more infective for Tsetse than the slender form; they suggest that this is probably also true in the wild.

      Since the authors of this paper did not themselves test the effect of enhancing conditions, the precise reason for the discrepancy in results between the two laboratories has not been demonstrated conclusively.

      This specific comment was addressed in the revision and illustrated with new data.

      Differences between the strain clones, the cell culture conditions and/or the fly colony maintenance conditions could explain part of the differences in infection rates observed here as compared to the Schuster et al. study (1). However, the use of the lectin-inhibitory sugar N-acetyl-glucosamine to enhance infection rates in the latter study could be a more likely explanation. To assess this hypothesis, an additional experimental challenge was performed to compare infection rates in teneral versus adult flies, with or without N-acetyl-glucosamine supplement in an infective meal containing 10<sup>5</sup> slender parasites/ml (Figure 2). Whereas no infection was detected in adult flies, N-acetyl-glucosamine supplementation of the infective meal increased the infection rate from 2.4% to 13.3% in teneral flies (Figure 2).

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Ngoune et al. present compelling evidence that Slender cells are challenged to infect tsetse flies. They explore the experimental context of a recent important paper in the field, Schuster et al., that presents evidence suggesting the proliferative Slender bloodstream T.brucei can infect juvenile tsetse flies. Schuster et al. was disruptive to the widely accepted paradigm that the Stumpy bloodstream form is solely responsible for tsetse infection and T.brucei transmission potential. Evidence presented here shows that in all cases, Stumpy form parasites are exponentially more capable of infecting tsetse flies. They further show that Slender cells do not infect mature flies.

      However, they raise questions of immature tsetse immunological potential and field transmission potential that their experiments do not address. Specifically, they do not show that teneral tsetse flies are immunocompromised, that tsetse flies must be immunocompromised for Slender infection nor that younger teneral tsetse infection is not pertinent to field transmission.

      All these specific comments were addressed in the revision and illustrated with new data and references.

      - The limited immunocompetence of teneral flies has been extensively studied by the labs of S. Aksoy at Yale and M. Lehane at Liverpool. In the discussion, we provide key references from these two labs (refs 19-22).

      - Differences between the strain clones, the cell culture conditions and/or the fly colony maintenance conditions could explain part of the differences in infection rates observed here as compared to the Schuster et al. study (1). However, the use of the lectin-inhibitory sugar N-acetyl-glucosamine to enhance infection rates in the latter study could be a more likely explanation. To assess this hypothesis, an additional experimental challenge was performed to compare infection rates in teneral versus adult flies, with or without N-acetyl-glucosamine supplement in an infective meal containing 10<sup>5</sup> slender parasites/ml (Figure 2). Whereas no infection was detected in adult flies, N-acetyl-glucosamine supplementation of the infective meal increased the infection rate from 2.4% to 13.3% in teneral flies (Figure 2).

      - Our comment on the relevance to field transmission is simply based on field observations of fly biology. For example, according to the capture-recapture experiments described by Hargrove (Insect Sci Applic, 1990; new ref 23), wild female mortality was reported as 6.8% shortly after emergence, was <1% for ages 20-50 days and rose to 5% by 130 days (a pattern similar to that for laboratory-reared tsetse), while wild male daily mortality was 8.3% after emergence, fell to 5.5% by 9 days, then rose continuously to more than 10% by 30 days. This means that adult flies represent the majority of individuals in a wild tsetse population. Hence, given that both males and females are strictly hematophagous and can live up to nine months, the impact of teneral flies (up to 4 days after emergence) on trypanosome transmission appears limited, if not incidental.
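
      The step from daily mortality to "adults are the majority" can be illustrated with a rough calculation. The sketch below assumes a stable population with constant daily emergence and a single daily mortality rate for all ages, which is a deliberate simplification because field mortality varies with age and sex. The function name, the example mortality values and the 270-day maximum age (taken as roughly nine months) are hypothetical choices for illustration and are not fitted to the capture-recapture data.

```python
def teneral_fraction(daily_mortality, teneral_days=4, max_age_days=270):
    """Share of a stable population that is younger than `teneral_days`.

    Assumes constant daily emergence and one constant daily mortality for
    all ages (a simplification; real mortality changes with age and sex).
    """
    if not 0.0 <= daily_mortality < 1.0:
        raise ValueError("daily_mortality must be in [0, 1)")
    survival = 1.0 - daily_mortality
    # Flies of age a (in days) are present in proportion survival**a.
    weights = [survival ** age for age in range(max_age_days)]
    return sum(weights[:teneral_days]) / sum(weights)


if __name__ == "__main__":
    # Illustrative mortality values only, not measured rates.
    for m in (0.01, 0.03, 0.05):
        print(f"daily mortality {m:.0%}: teneral share {teneral_fraction(m):.1%}")
```

      Under these assumptions the teneral share stays well below one half for any of the example rates, which is the qualitative point of the argument; an age-structured model would be needed for a quantitative estimate.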

      Strengths:

      Experimental Design is precise and elegant, outcomes are convincing. Discussion is compelling and important to the field. This is a timely piece that adds important data to a critical discussion of host:parasite interactions, of relevance to all parasite transmission.

      Thank you

      Weaknesses:

      As above, the authors dispute the biological relevance of teneral tsetse infection in the wild, without offering evidence to the contrary. Statements need to be softened for claims regarding immunological competence or relevance to field transmission.

      All these specific comments were addressed in the revision and illustrated with new data and references.

      - The limited immunocompetence of teneral flies has been extensively studied by the labs of S. Aksoy at Yale and M. Lehane at Liverpool. In the discussion, we provide key references from these two labs (refs 19-22).

      - Differences between the strain clones, the cell culture conditions and/or the fly colony maintenance conditions could explain part of the differences in infection rates observed here as compared to the Schuster et al. study (1). However, the use of the lectin-inhibitory sugar N-acetyl-glucosamine to enhance infection rates in the latter study could be a more likely explanation. To assess this hypothesis, an additional experimental challenge was performed to compare infection rates in teneral versus adult flies, with or without N-acetyl-glucosamine supplement in an infective meal containing 10<sup>5</sup> slender parasites/ml (Figure 2). Whereas no infection was detected in adult flies, N-acetyl-glucosamine supplementation of the infective meal increased the infection rate from 2.4% to 13.3% in teneral flies (Figure 2).

      - Our comment on the relevance to field transmission is simply based on field observations of fly biology. For example, according to the capture-recapture experiments described by Hargrove (Insect Sci Applic, 1990; new ref 23), wild female mortality was reported as 6.8% shortly after emergence, was <1% for ages 20-50 days and rose to 5% by 130 days (a pattern similar to that for laboratory-reared tsetse), while wild male daily mortality was 8.3% after emergence, fell to 5.5% by 9 days, then rose continuously to more than 10% by 30 days. This means that adult flies represent the majority of individuals in a wild tsetse population. Hence, given that both males and females are strictly hematophagous and can live up to nine months, the impact of teneral flies (up to 4 days after emergence) on trypanosome transmission appears limited, if not incidental.

      Reviewer #2 (Public Review):

      Summary:

      In contrast to the recent findings reported by Schuster S et al., this brief paper presents evidence suggesting that the stumpy form of T. brucei is likely the most pre-adapted form to progress through the life cycle of this parasite in the tsetse vector.

      Strengths:

      One significant experimental point is that all fly infection experiments are conducted in the absence of "boosting" metabolites like GlcNAc or S-glutathione. As a result, flies infected with slender trypanosomes present very low or nonexistent infection rates. This provides important experimental evidence that the findings of Schuster S and colleagues may need to be revisited.

      Thank you

      Weaknesses:

      However, I believe the authors should have included their own set of experiments demonstrating that the presence of these metabolites in the infectious bloodmeal enhances infection rates in flies receiving blood meals containing slender trypanosomes. Considering the well-known physiological variabilities among flies from different facilities, including infection rates, this would have strengthened the experimental evidence presented by the authors.

      This specific comment was addressed in the revision and illustrated with new data.

      Differences between the strain clones, the cell culture conditions and/or the fly colony maintenance conditions could explain part of the differences in infection rates observed here as compared to the Schuster et al. study (1). However, the use of the lectin-inhibitory sugar N-acetyl-glucosamine to enhance infection rates in the latter study could be a more likely explanation. To assess this hypothesis, an additional experimental challenge was performed to compare infection rates in teneral versus adult flies, with or without N-acetyl-glucosamine supplement in an infective meal containing 10<sup>5</sup> slender parasites/ml (Figure 2). Whereas no infection was detected in adult flies, N-acetyl-glucosamine supplementation of the infective meal increased the infection rate from 2.4% to 13.3% in teneral flies (Figure 2).

      Reviewer #3 (Public Review):

      The dogma in the Trypanosome field is that transmission by Tsetse flies is ensured by stumpy forms. This has been recently challenged by the Engstler lab (Schuster et al.), who showed that slender forms can also be transmitted by teneral flies. In this work, the authors aimed to test whether transmission by slender forms is possible and frequent. The authors observed that most stumpy forms infections with teneral and adult flies were successful while only 1 out of 24 slender form infections were successful.

      In this revised version of the manuscript, the authors made some text changes and included statistical testing as a new section of the Materials and Methods. It seems the comparison of midgut infection in adult vs teneral flies was significant in most of the conditions. However, the critical comparison is still missing: within each type of fly (adult or teneral), was the MG infection significantly different between slender and stumpy forms?

      An ANOVA statistical analysis was performed and a dedicated section was added to the revised version. MG infection rate comparisons were statistically significant between teneral and adult flies infected with ST at each amount (p<0.02 with 10 parasites; p<0.0001 with 100 and 1,000 parasites) and with 1,000 SL (p<0.0001). MG infection rate comparisons were statistically significant (p<0.0001) between parasite stages (SL and ST) at each amount (10, 100 and 1,000) and for each fly group (teneral and adult), except in teneral flies infected with 1,000 parasites (p=0.2356).

      Given no additional experiments were performed, it remains unknown why this work and Schuster et al. reached different conclusions. As a result it remains unclear in which conditions slender forms could be important for transmission. Several variables could explain differences between the two groups: the strain used, the presence or absence of N-acetylglucosamine and/or glutathione, how Tsetse colonies were maintained, thorough molecular and cellular characterisation of slender and stumpy forms (to avoid using intermediate forms as slender forms), comparison to recent field parasite strains.

      This specific comment was addressed in the revision and illustrated with new data.

      Differences between the strain clones, the cell culture conditions and/or the fly colony maintenance conditions could explain part of the differences in infection rates observed here as compared to the Schuster et al. study (1). However, the use of the lectin-inhibitory sugar N-acetyl-glucosamine to enhance infection rates in the latter study could be a more likely explanation. To assess this hypothesis, an additional experimental challenge was performed to compare infection rates in teneral versus adult flies, with or without N-acetyl-glucosamine supplement in an infective meal containing 10<sup>5</sup> slender parasites/ml (Figure 2). Whereas no infection was detected in adult flies, N-acetyl-glucosamine supplementation of the infective meal increased the infection rate from 2.4% to 13.3% in teneral flies (Figure 2).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The manuscript is improved, but the author has not addressed much of the constructive criticism offered that would benefit the manuscript.

      To clarify, evidence from Schuster et al did not demonstrate, rather it suggested. That is a major point of this paper - that the previous evidence presented had caveats. Terms such as demonstrate or prove are inappropriate in most biological contexts, unless evidence is without caveat.

      This specific comment was addressed in the revision and illustrated with new data.

      Differences between the strain clones, the cell culture conditions and/or the fly colony maintenance conditions could explain part of the differences in infection rates observed here as compared to the Schuster et al. study (1). However, the use of the lectin-inhibitory sugar N-acetyl-glucosamine to enhance infection rates in the latter study could be a more likely explanation. To assess this hypothesis, an additional experimental challenge was performed to compare infection rates in teneral versus adult flies, with or without N-acetyl-glucosamine supplement in an infective meal containing 10<sup>5</sup> slender parasites/ml (Figure 2). Whereas no infection was detected in adult flies, N-acetyl-glucosamine supplementation of the infective meal increased the infection rate from 2.4% to 13.3% in teneral flies (Figure 2).

      Statements regarding teneral flies in the field are softened. Yet the referenced papers pertain more to commensurate coinfections rather than reduced immunocapacity of immature teneral flies in the field. This should be clarified.

      The limited immunocompetence of teneral flies has been extensively studied by the labs of S. Aksoy at Yale and M. Lehane at Liverpool. In the discussion, we provide key references from these two labs (refs 19-22).

      The text remains convoluted to read with grammatical errors in places. For example, it is incorrect to begin a sentence with However. There are far too many run-on sentences in the manuscript that confuse this straightforward story.

      The revised text was improved as much as possible.

      All text requires grammatical refinement and softer claims unless additional experiments are undertaken.

      Reviewer #2 (Recommendations For The Authors):

      I continue to endorse the publication of this manuscript; however, I am somewhat disappointed by the authors' justifications for not conducting additional experiments or exploring other factors that might influence the infection phenotypes in the fly.

      This specific comment was addressed in the revision and illustrated with new data.

      Differences between the strain clones, the cell culture conditions and/or the fly colony maintenance conditions could explain part of the differences in infection rates observed here as compared to the Schuster et al. study (1). However, the use of the lectin-inhibitory sugar N-acetyl-glucosamine to enhance infection rates in the latter study could be a more likely explanation. To assess this hypothesis, an additional experimental challenge was performed to compare infection rates in teneral versus adult flies, with or without N-acetyl-glucosamine supplement in an infective meal containing 10<sup>5</sup> slender parasites/ml (Figure 2). Whereas no infection was detected in adult flies, N-acetyl-glucosamine supplementation of the infective meal increased the infection rate from 2.4% to 13.3% in teneral flies (Figure 2).

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors performed an integration of 48 scRNA-seq public datasets and created a single-cell transcriptomic atlas for AML (222 samples comprising 748,679 cells). This is important since most AML scRNA-seq studies suffer from small sample size coupled with high heterogeneity. They used this atlas to further dissect AML with t(8;21) (AML1-ETO/RUNX1-RUNX1T1), which is one of the most frequent AML subtypes in young people. In particular, they were able to predict Gene Regulatory Networks in this AML subtype using pySCENIC, which identified the paediatric regulon defined by a distinct group of hematopoietic transcription factors (TFs) and the adult regulon for t(8;21). They further validated this in bulk RNA-seq with AUCell algorithm and inferred prenatal signature to 5 key TFs (KDM5A, REST, BCLAF1, YY1, and RAD21), and the postnatal signature to 9 TFs (ENO1, TFDP1, MYBL2, KLF1, TAGLN2, KLF2, IRF7, SPI1, and YBX1). They also used SCENIC+ to identify enhancer-driven regulons (eRegulons), forming an eGRN, and found that prenatal origin shows a specific HSC eRegulon profile, while a postnatal origin shows a GMP profile. They also did an in silico perturbation and found AP-1 complex (JUN, ATF4, FOSL2), P300, and BCLAF1 as important TFs to induce differentiation. Overall, I found this study very important in creating a comprehensive resource for AML research.

      Strengths:

      (1) The generation of an AML atlas integrating multiple datasets with almost 750K cells will further support the community working on AML.

      (2) Characterisation of t(8;21) AML proposes new interesting leads.

      We thank the reviewer for a succinct summary of our work and highlighting its strengths.

      Weaknesses:

      Were these t(8;21) TFs/regulons identified from any of the single datasets? For example, if the authors apply pySCENIC to any dataset, would they find the same TFs, or is it the increase in the number of cells that allows identification of these?

      The purpose of our study was to gain biological insights by integrating multiple datasets, to overcome limitations from small sample size. We expect that the larger dataset would improve network inference, which is what we implemented in the manuscript, hence we have not looked at individual datasets. However, we will investigate this further in the revised manuscript by running pySCENIC on individual datasets and comparing to the results drawn from the whole atlas.

      Reviewer #2 (Public review):

      Summary:

      The authors assemble 222 publicly available bone marrow single-cell RNA sequencing samples from healthy donors and primary AML, including pediatric, adolescent, and adult patients at diagnosis. Focusing on one specific subtype, t(8;21), which, despite affecting all age classes, is associated with better prognosis and drug response for younger patients, the authors investigate if this difference is reflected also in the transcriptomic signal. Specifically, they hypothesize that the pediatric and part of the young population acquires leukemic mutations in utero, which leads to a different leukemogenic transformation and ultimately to differently regulated leukemic stem cells with respect to the adult counterpart. The analysis in this work heavily relies on regulatory network inference and clustering (via SCENIC tools), which identifies regulatory modules believed to distinguish the pre-, respectively, post-natal leukemic transformation. Bulk RNA-seq and scATAC-seq datasets displaying the same signatures are subsequently used for extending the pool of putative signature-specific TFs and enhancer elements. Through gene set enrichment, ontology, and perturbation simulation, the authors aim to interpret the regulatory signatures and translate them into potential onset-specific therapeutic targets. The putative pre-natal signature is associated with increased chemosensitivity, RNA splicing, histone modification, stem-ness marker SMARCA2, and potentially maintained by EP300 and BCLAF1.

      Strengths:

      The main strength of this work is the compilation of a pediatric AML atlas using the efficient Cellxgene interface. Also, the idea of identifying markers for different disease onsets, interpreting them from a developmental angle, and connecting this to the different therapy and relapse observations, is interesting. The results obtained, the set of putative up-regulated TFs, are biologically coherent with the mechanisms and the conclusions drawn. I also appreciate that the analysis code was made available and is well documented.

      We thank the reviewer for reviewing our work and highlighting its key features, including the creation of the AML atlas and the downstream analysis and interpretation for the t(8;21) subtype.

      We also appreciate the useful critique of our paper provided below.

      Weaknesses:

      There were fundamental flaws in how methods and samples were applied, a general lack of critical examination of both the results and the appropriateness of the methods for the data at hand, and in how results were presented. In particular:

      (1) Cell type annotation:

      a) The 2-phase cell type annotation process employed for the scRNA-seq sample collection raised concerns. Initially annotated cells are re-labeled after a second round with the same cell types from the initial label pool (Figure 1E). The automatic annotation tools were used without specifying the database and tissue atlases used as a reference, and no information was shown regarding the consensus across these tools.

      We believe that most of the reviewer’s criticisms stem from a misunderstanding, and we apologize for not explaining certain aspects of our work more clearly.

      The two types of cell type annotation applied were different and served distinct purposes:

      • One was using general bone marrow/blood reference datasets to annotate blood subtype lineage clusters.

      • The other was using a CD34 purified AML specific reference dataset which included leukaemia-associated annotations, to identify HSPC subpopulations. We also implemented this on a single-cell level to allow more robust identification of these rare populations in a large dataset.

      This is probably not well explained in the methods and figure presentation. We will clearly indicate in the revised manuscript that the different HSPC annotations represent a separate analysis and will update the figures to highlight this. We will provide a comprehensive review of the annotation strategies implemented, including the automated tool outputs, which may be useful for the single-cell community.

      b) Expression of the CD34 marker is only reported as a selection method for HSPCs, which is not in line with common practice. The use of CD34 alone is acceptable only as a surface marker, while robust annotation of HSPCs should be based on the expression of gene sets.

      We used CD34 expression in conjunction with other cell type annotations and marker sets to identify LSCs, although the results are the same when we use HSPC-annotated cells without conditioning on CD34 expression. In the revised manuscript, we will simplify this analysis to use HSPC clusters, as suggested by the reviewer.

      c) During several analyses, the cell types used were either not well defined or contradictory, such as in Figure 2D, where it is not clear if pySCENIC and AUC scores were computed on HSPCs alone or merged with CMPs. In other cases, different cell type populations are compared and used interchangeably: comparing the HSPC-derived regulons with bulk (probably not enriched for CD34+ cells) RNA samples could be an issue if there are no valid assumptions on the cell composition of the bulk sample.

      As mentioned in the Methods, we only excluded lymphoid cell types from the pySCENIC analysis to overcome the bias that some samples were enriched using CD34 selection when preparing them for scRNA-seq. We will make this clearer in the text and figures of the revised manuscript. It is difficult to overcome this bias when using bulk RNA samples, which may explain why some of our samples do not fit into our defined signature groups. However, as we do not have access to primary samples ourselves, we cannot provide a better matched experimental cohort for validation.

      (2) Method selection:

      a) The authors should explain why they use pySCENIC and not any other approach. They should briefly explain how pySCENIC works and what they get out in the main text. In addition they should explain the AUCell algorithm and motivate its usage.

      pySCENIC is a state-of-the-art method for network inference from scRNA-seq data and is widely used within the single-cell community (over 5000 citations for both versions of the SCENIC pipeline). The pipeline has been benchmarked as one of the top performers for GRN analysis (Nguyen et al., 2021, Briefings in Bioinformatics). AUCell is a module within the pySCENIC pipeline that summarises the activity of a set of genes (a regulon) into a single number, which helps to compare and visualise different regulons. We agree with the reviewer that this could have been explained more clearly within the manuscript. We will update the text in the revised manuscript to add more explanation.
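
      As a rough illustration of how a gene set can be summarised into a single per-cell number, the sketch below ranks the genes of one cell by expression and measures how early the regulon's genes appear within the top-ranked genes, normalised to the best achievable value. This is a simplified, AUCell-style calculation written from scratch, not the pySCENIC implementation; the function name, the toy expression values and the `top_fraction` threshold are hypothetical.

```python
def aucell_like_score(expression, gene_set, top_fraction=0.05):
    """Simplified AUCell-style activity score for one cell, in [0, 1].

    expression: dict mapping gene name -> expression value in this cell.
    gene_set:   set of gene names (for example, the targets of one TF).
    Ties in expression keep their input order here (a simplification).
    """
    # Rank genes from highest to lowest expression.
    ranked = sorted(expression, key=lambda gene: expression[gene], reverse=True)
    max_rank = max(1, int(len(ranked) * top_fraction))

    # Area under the recovery curve: cumulative hits at each rank position.
    hits = 0
    area = 0
    for gene in ranked[:max_rank]:
        if gene in gene_set:
            hits += 1
        area += hits

    # Best case: all detectable set genes occupy the very top ranks.
    detectable = sum(1 for gene in gene_set if gene in expression)
    k = min(detectable, max_rank)
    best_area = k * (k + 1) // 2 + k * (max_rank - k)
    return area / best_area if best_area else 0.0


if __name__ == "__main__":
    # Toy cell with ten genes; g1 is the most highly expressed.
    cell = {f"g{i}": 11 - i for i in range(1, 11)}
    print(aucell_like_score(cell, {"g1", "g2"}, top_fraction=0.5))
    print(aucell_like_score(cell, {"g9", "g10"}, top_fraction=0.5))
```

      Because the score depends only on within-cell ranks, it can be compared across cells with different sequencing depths, which is the property that makes this kind of summary convenient for comparing regulons.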

      b) The obtained GRN signatures were not critically challenged on an external dataset. Therefore, the evidence that supports these signatures to be reliable and significant to the investigated setting is weak.

      These signatures were inferred from the most suitable AML single-cell RNA datasets available to date, and we used two independent datasets to validate our findings (the TARGET AML bulk RNA sequencing cohort and the Lambo et al. scRNA-seq dataset). To our knowledge, there are no better-suited datasets for validation. Experimental validation on patient samples is beyond the scope of this study.

      (3) There are some issues with the analysis & visualization of the data.

      We will provide new statistical tests to improve robustness of the analysis as well as presentation and visualization of the data in the revised manuscript.

      (4) Discussion:

      a) What exactly is the 'regulon signature' that the authors infer? How can it be useful for insights into disease mechanisms?

      The 'regulon signature' here refers to a gene regulatory program (multiple gene modules, each defined by a transcription factor and its targets) that is specific to a given age group. Further investigation of these signatures can help to explain why patients of different ages follow different clinical courses. We will add more text on the utility of the 'regulon signature' we discovered to the discussion section of the revised manuscript.

      b) The authors write 'Together this indicates that EP300 inhibition may be particularly effective in t(8;21) AML, and that BCLAF1 may present a new therapeutic target for t(8;21) AML, particularly in children with inferred pre-natal origin of the driver translocation.' I am missing a critical discussion of what is needed to further test the two targets. Put differently: Would the authors take the risk of a clinical study given the evidence from their analysis?

      Of course, many extensive studies would be required before these findings are clinically translatable. We can include some perspectives on what further work is required in terms of further experimental validation and potential subsequent clinical study.

    1. Author response:

      The following is the authors’ response to the original reviews

      Thank you for your valuable comments, which helped us improve our manuscript. We will make the following modifications in the revised manuscript:

      (1) In the first paragraph of the Result section, we will provide a summary of trimeric G proteins in Ciona and explain how we focused on Gαs and Gαq in the initial phase of this study.

      We added a summary of trimeric G proteins in Ciona in the initial part of the Results section (page 6, line 23 to page 8, line 5). In this summary, we added the following sentence explaining why we focused on Gαs and Gαq in the initial phase of this study: "Among them, we prioritized examining the Gα proteins having an excitatory function (Gαq and Gαs) rather than inhibitory roles since previous studies suggested that excitatory events like Ca<sup>2+</sup> transient and neuropeptide secretion occur when Ciona metamorphose."

      (2) As Reviewer 1 suggests, the polymodal roles of papilla neurons are interesting. Although we could not address this through functional analyses in this study, we will add a discussion of this aspect. The added sentences will be similar to the following:

      “The recent study (Hoyer et al., 2024) provided several lines of evidence suggesting that PSNs can serve as the sensors of several chemicals in addition to the mechanical stimuli. This finding and our model could be mutually related because these chemicals could modify Ca<sup>2+</sup> and cAMP production. The use of G protein signaling allows Ciona to reflect various environmental stimuli to initiate metamorphosis in the appropriate situation, both mechanically and chemically.”

      We added a discussion related to the recent publication by Hoyer and colleagues on page 23, lines 13-18: "A recent study[19] provided several lines of evidence suggesting that PNs can serve as the sensors of several chemicals in addition to mechanical stimuli. This finding and our model could be mutually related because these chemicals could modify Ca<sup>2+</sup> and cAMP production. G protein signaling allows Ciona to reflect various environmental stimuli to initiate metamorphosis either mechanically or chemically according to the situation."

      (3) As both reviewers suggested, imaging cAMP on the backgrounds of some G protein knockdowns is essential, and we will conduct the experiments.

      We added the data on cAMP imaging in Gαs, Gαq, and dvGαi_Chr2 knockdown larvae in Supplementary Figure S4C-D and Figure 6E.

      (4) We carefully modify the text throughout the manuscript so that the descriptions suitably reflect the results.

      We modified the descriptions of experimental results so that the text reflects the results more precisely.

      Reviewer #1:

      Pg1 - need to add an additional '6' to the author list to clarify which two or more authors contributed equally.

      We added a 6 as suggested. Thank you for pointing this out.

      Pg3 - note that a larval adhesive organ applies not to all benthic adults, but to benthic sessile adults. This makes it sound like the adhesive organ can trigger metamorphosis, but has that been shown, in Ciona or others? Need to specify the role of cells secreting adhesive vs. sensory cells that trigger metamorphosis.

      We divided the corresponding sentence into two to clearly state that adhesion and triggering metamorphosis are related but could be different events. Moreover, we modified the sentence to state that physical contact is one example of a cue triggering metamorphosis. We then added another example of a factor triggering metamorphosis—i.e., chemicals from the organisms surrounding the adherence site (page 3, lines 16-20 of the revised version):

      "Many marine invertebrates exhibit a benthic lifestyle at the adult stage[4]. Their planktonic larvae have an adhesive organ that secretes adhesives and adheres to a substratum. The cues associated with the adhesion, such as the physical contact with the substratum and a chemical from organisms surrounding the adherence site, can trigger their metamorphosis."

      Pg 4 - although mechanosensation is the focus here, could there also be chemoreception/chemoreceptors involved in Ciona metamorphosis? For example, Hoyer et al. 2024 (Current Biology 34(6):1168-1182) concluded that some palp sensory neurons were multimodal and could be both chemo- and mechano-sensory.

      We added statements about this recent finding in the Introduction and Discussion sections. In the Introduction (page 4, lines 16-18), however, we also stated that a mechanical stimulus can trigger metamorphosis in the lab without the need to supply these chemicals. This is to emphasize that the mechanical stimulus is the focus of this study. In the Discussion, we added a statement that G-protein signaling could also be used to receive the chemical stimuli (page 23, lines 13-18).

      Pg 6 - Before starting functional characterizations, it would be useful to give an overview (table?) of the G proteins found in papillae, and what receptor they are suspected of binding to, or if this is completely unknown, and which downstream pathways they likely activate. That is, to show some results about which G proteins are found in Ciona, and which are found in papillae. In this way, it will make more sense for readers when the Gai is suddenly introduced later, following the sections of Gaq and Gas.

      Thank you for your idea to improve the readability of this manuscript. In the initial part of the Results section (page 6, line 22 to page 8, line 5), we added descriptions of the repertoire of trimeric G-proteins in Ciona, including phylogenetic analyses, and expression in the papillae based on RNA-seq data, followed by the reason why we initially focused on Gaq and Gas. The data are displayed in Supplementary Figure S1. The phylogenetic analyses were modified from those shown in Supplementary Figure S5 of the previous version. We also added the general downstream activities of Gas, Gai and Gaq in the Introduction section (page 6, lines 10-12). Considering the contents, the general function of Ga12/13 was stated in the Results section (page 8, lines 2-3).

      We did not add the information about their partner receptors in this early section. This is because there are many candidates, and we could not pick some of them. Instead, we described our current suppositions about their possible partners in the Discussion (page 23, line 22 to page 24, line 19). However, we suspect that there are more candidates, and we wish to promote unbiased research in the future.

      Pg 9 - would be good to know the timing of this PF fluorescence increase and the timing of stimulation in the text here, relevant to the 30-min gap before metamorphosis initiation

      We added the start times for the cAMP reduction and re-upregulation in the following sentence (page 11, lines 17-18): "The cAMP reduction and increase respectively started at 35 seconds and 4 min 40 seconds after stimulation on average."

      Pg 28 - Phylogenetic analysis: Given that the results may be of interest to metamorphosis in other marine invertebrates as discussed in the last paragraph of the paper, it would be useful to include G proteins from these other animal phyla where available in the phylogenetic tree. Similarly, in Figure S5A it would be useful to highlight further all the different Ciona G proteins, and the different protein families, through the use of additional colour/labelling (regardless of whether this remains Fig S5A, or becomes part of the main figures)

      We drew a phylogenetic tree of G-proteins including those in some sessile and benthic animals (barnacle, sea anemone, hydra, sponge, sea urchin and shell). However, we decided not to add the tree in the revised version because, unfortunately, the bootstrap values of many branches were not high enough to have confidence in the results. We hope you understand our decision. Ciona divergent G-proteins are likely to be specific to Ciona.

      According to your comment, we highlighted all Ciona G alpha proteins in red in Figure S5A, which is now Figure S1A in the revised version.

      Figure 3E and Figure S3 - is the data shown as an average of all larvae measured (n=5 and n=4) or is it data from one representative larva out of the 4-5 measured? This needs clarification.

      The original graphs in Figure 3E and Figure S3 are typical examples. We added the graphs summarizing data of all larvae in each experimental condition in Supplementary Figure S4 (corresponding to Supplementary Figure S3 of the original version). Figure 3E remains as a typical example of the result of a single larva to explain our data analysis in detail.

      Experimental suggestion - As mentioned above, one missing detail seems to be the need for evidence that cAMP is elevated in the papillae directly as a result of Gs activation- this could be shown with measurement of cAMP via PF in Gs knockdown larvae that are mechanically stimulated compared to wildtype stimulated and non-stimulated?

      Thank you for your suggestion. The experiments are indeed important. We added the data of Pink Flamindo imaging in the Gas, Gaq and dvGai_Chr2 knockdown conditions. The results of Gas and Gaq knockdowns are described in page 11, line 24 to page 12, line 5, and are displayed in Supplementary Figure S4C-D. The result of dvGai_Chr2 knockdown is given on page 16, lines 20-22 and shown in Figure 6E.

      In order to insert the data of cAMP imaging of dvGai_Chr2 knockdown larvae, we transferred some panels of Figure 6 to Supplementary Figure S6. In addition, the knockdown data of dvGαi_Chr4 and double knockdowns of Gai genes are also included in Supplementary Figure S6.

      Reviewer #2:

      Page 6, line 3-4 in the first paragraph of the "Results"; the authors state "Neither morphant showed any signature of metamorphosis even though both were allowed to adhere to the base of culture dishes...". However, judging from Fig. 1E, "the percentage of metamorphosis initiation" (indicated by the initiation of tail regression) in Gαq morphants is not close to 0 (average about 40%), thus I am not convinced this observation can be described as "Neither morphant showed any signature of metamorphosis..." in this sentence.

      Thank you for your suggestion. In writing the original text, we oversimplified some of the descriptions when trying to improve the readability. We agree this resulted in imprecision in places. We have revised all these passages in our revision. In this particular case, we softened the overly emphatic statement to better reflect the results, changing “... any signature of metamorphosis...” to “... reduced rate of metamorphosis initiation...” In addition, we stated that the effect of Gαq MO was weaker than that of Gαs MO on page 8, lines 10-12. The weaker effect of Gαq MO was due to the redundant role of the Gi pathway, which is shown on page 17, lines 10-17, and in Figure 6G-H.

      Similarly, in the next paragraph describing the knockdown of PLCβ1/2/3, PLCβ4, and IP3R genes, the authors appear to neglect that there is a weaker effect of the PLCβ4 MO, and simply described the results as "The knockdown larvae of these three genes failed to start metamorphosis". Based on Fig. 1H, about 30% of the PLCβ4 MO-injected animals still initiated tail regression. This difference may have some biological meaning and thus should be described more precisely.

      We added the following sentence on page 8, lines 18-19 of the revised version: “The effect of PLCβ4 MO was weaker than those of the other MOs, suggesting that this PLC plays an auxiliary role.”

      Page 7, second paragraph, on the description of GCaMP8 fluorescence and also at the end of Fig. 1O legend, the citation to "Figure S1" is confusing; Fig. S1 is the phylogenetic tree of PLCβ proteins. Is there additional data regarding this Gαq MO plus GCaMP8 mRNA injection experiment?

      Figure S1 of the original version corresponds to Figure S2 of the revised version. To avoid confusion, we deleted this citation from the legend of Figure 1O. By this modification, the sentence stating the repertoire of PLCb and IP3R in Ciona (page 8, lines 15-16) is the only sentence citing Figure S2 in the revised version.

      Page 8, first sentence; The purpose of theophylline treatment is not to prevent larvae from adhesion, thus I would suggest modifying this sentence to: "We treated wild-type larvae with theophylline after tail amputation, and we observed that most theophylline-treated larvae completed tail regression without adhesion (Figure 2D-F)".

      We modified the sentence according to your comment. Thank you for your suggestion.

      Page 9, second paragraph; judging from the data presented in Fig. 3C, I think this description: "when papillae were removed from larvae, theophylline failed to induce metamorphosis" is not accurate, because about ~30% of the Papilla cut +Theophylline-treated larvae still initiated their tail regression. This needs to be explained clearly.

      We modified the sentence (page 11, lines 2-3) as follows: “...the average rate of metamorphosis induction by theophylline was reduced from 100% to 30%...”

      Similarly in the next few sentences regarding the results presented in Fig, 3D, the effects of overexpressing those genes are not uniform. While amputation of papillae in larvae overexpressing caPLCβ1/2/3 could inhibit metamorphosis almost completely, papilla cut seems to have a weaker effect on caGαq, caGαs, and bPAC-overexpressing larvae.

      We added a description explaining that caPLCβ1/2/3 was the most sensitive to papilla amputation, and the possibility that PLCβ1/2/3 works specifically in the papillae (page 11, lines 9-11): “Among these experiments, caPLCβ1/2/3 overexpression was the most sensitive to papilla amputation, suggesting that PLCβ1/2/3 acts specifically in the papillae during metamorphosis.”

      Page 9, the paragraph on using the fluorescent cAMP indicator; there is a discrepancy between the described developmental time when the authors conducted this experiment and the metamorphosis competent timing (after 24hpf) described on page 7. On page 26, the authors describe "The Pink Flamindo mRNA-injected larvae were immobilized on Poly L lysine-coated glass bottom dishes at 20-21 hpf...". Did the authors start stimulating the larvae to observe the fluorescent signal soon after immobilization, or wait several hours until the larvae passed 24hpf and then conduct the experiment?

      The latter is the case. The immobilized larvae were kept until they acquired the competence for metamorphosis and then stimulation/recording was carried out. This point is described in the Materials and Methods section of the revised version (page 29, lines 16-18):

      "The Pink Flamindo mRNA-injected larvae were immobilized on Poly L lysine-coated glass-bottom dishes at 20-21 hpf, and stimulated their adhesive papillae around 25 hpf."

      Page 10, the description "...Gαq morphants initiated metamorphosis when caGαs was overexpressed in the nervous system (Figure 4F)". It should be noted that the result is only a partial rescue. To be precise, this description needs to be modified.

      We changed the sentence to reflect the results more precisely (page 14, lines 2-3): “Moreover, caGαs overexpression in the nervous system significantly, although not perfectly, ameliorated the effect of Gαq MO (Figure 4F).”

      Page 12-13, This description and the figure 5E presented is a bit confusing to me. The figure legend for 5E: "GABA is necessary for Ca2+ transient in the adhesive papillae (arrow)" But the arrow in this image points to a place with no fluorescent signal, and on the upper corner it labeled as "29% (n=17)". Does that mean the proportion of "no Ca2+ increase after stimulation" was 29% among the 17 samples examined? Or actually, is the other way around that 81% of the examined larvae did not show Ca2+ signal increase after stimulation?

      The latter is the case. We added a caption explaining this clearly in the Figure legend: “The percentage and number exhibit the rate of animals showing Ca<sup>2+</sup> transient in the papillae.”

      Page 13, second paragraph; I do not agree with the overly simplified description that "GABA significantly ameliorated the metamorphosis-failed phenocopies of Gαq, PLCβ, and Gαs morphants". As shown in Fig. 5F-H, adding GABA exerts different levels of partial rescue effect on each morphant, and thus should be described clearly.

      When the outliers are neglected, the effect of GABA is most evident in Gαs knockdowns. This suggests that the target(s) of GABA signaling is more likely to be Gq pathway components. We added the following sentence to the revised version (page 15, lines 14-16):

      “Among the three morphants, GABA exhibited the most effective rescues in Gαs knockdowns than Gαq and PLCβ.”

      In addition, we think this sentence establishes a more logical connection with the sentence that follows it: “These results could be explained by assuming enhancement of the Gq pathway by GABA through PLCβ and another GABA-mediated metamorphic pathway bypassing Gq components.” Thank you for your suggestion.

      The section "Contribution of Gi to metamorphosis" confirmed the possibility that GABA signaling targets Gq pathway components.

      Page 13, the first paragraph on "Contribution of Gi to metamorphosis"; the description that "The knockdown of this gene (Gαi) exhibited a significantly reduced rate of metamorphosis;..." is misleading. I would suggest modifying the entire sentence as "The knockdown of this gene (Gαi) exhibited a moderate (although statistically significant) reduction of metamorphosis rate, suggesting the presence of another Gαi regulating metamorphosis".

      Thank you for your suggestion. We modified the sentence (page 16, lines 2-4 in the revised version) as recommended. We believe the description is much improved.

      Page 20, the last sentence about Ciona papilla neurons expressing transcription factor Islet; the authors seem to attempt to make some comparison with the vertebrate pancreatic beta cells in this paragraph, but the comparison and the argument are not fully developed in this current format.

      To deepen this discussion, we added the following sentence (page 23, lines 10-12): “The atypical secretion of GABA might depend on the transcription factor like Islet shared between Ciona papilla neurons and vertebrate beta cells.”

      However, we would like to limit the depth of our discussion on this point, as we hope to expand on it further in future studies.

      Other suggestions:

      Page 3, second paragraph: as they become unable to "move" after metamorphosis -> "relocate"

      We corrected the word as suggested.

      Page 4, second paragraph: In the first sentence, the author states the current understanding of chordate phylogeny and cites Delsuc et al. 2006 Nature paper at the end of this sentence. However, in this paper cephalochordates were erroneously grouped with echinoderms, and thus chordates did not form a monophyletic clade. A later paper by Bourlat et al. (Nature 444:85-88, 2006) corrected this problem, and subsequently Delsuc et al. also published another paper (genesis, 46:592-604, 2008) with broader sampling to overcome this problem. These later publications need to be included for the sake of correctness.

      We added this reference.

      Page 14, regarding the redundant function of the typical Gαi protein in the papillae; the authors may try double KD of Gαi and dvGαi_Chr2 in their experimental system to test this idea.

      We carried out double knockdown of typical Gai and dvGαi_Chr2. However, we could not address their redundant role sufficiently because most of the double knockdown larvae exhibited severe shape malformation.

      dvGαi_Chr4 is also expressed in the papillae. We carried out knockdown of this gene and found that it resulted in a very minor but statistically significant reduction of the metamorphosis rate, suggesting that this Gαi also plays a supportive role in metamorphosis. We also carried out double knockdown of dvGαi_Chr2 and dvGαi_Chr4. The double KD larvae exhibited responsiveness to GABA, probably because of the presence of typical Gαi.

      These results are described on page 16, lines 2-18, and the data are shown in Supplementary Figure S6A-D of the revised version.

      Responses to the Reviewing editor's comments:

      "Larvae of the ascidian Ciona initiate metamorphosis tens of minutes after adhesion to a substratum via its adhesive organ." - Larvae is plural so change to 'via their adhesive organ'

      The sentence was corrected as suggested.

      "Metamorphosis is a widespread feature of animal development that allows them" - revise the sentence, e.g. "Metamorphosis is a widespread feature of development that allows animals"

      The sentence was corrected as suggested.

      "GABA synthase (GAD)" GAD is not called GABA synthase but glutamate decarboxylase - clarify, e.g. encoding the enzyme synthesizing GABA called glutamate decarboxylase (GAD)

      This part was corrected exactly as suggested. Thank you.

      "IP3 is received by its receptor on the endoplasmic reticulum (ER) and releases calcium ion (Ca2+ )" revise to "IP3 is received by its receptor on the endoplasmic reticulum (ER) that releases calcium ion (Ca2+ )"

      The sentence was corrected as suggested.

      "Moreover, GPCR is implicated as the mediator of settlement" - GPCRs are implicated

      This sentence was modified as suggested.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This manuscript details the results of a small pilot study of neoadjuvant radiotherapy followed by combination treatment with hormone therapy and dalpiciclib for early-stage HR+/HER2-negative breast cancer.

      Strengths:

      The strengths of the manuscript include the scientific rationale behind the approach and the inclusion of some simple translational studies.

      Weaknesses:

      The main weakness of the manuscript is that overly strong conclusions are made by the authors based on a very small study of twelve patients. A study this small is not powered to fully characterize the efficacy or safety of a treatment approach, and can, at best, demonstrate feasibility. These data need validation in a larger cohort before they can have any implications for clinical practice, and the treatment approach outlined should not yet be considered a true alternative to standard evidence-based approaches.

      I would urge the authors and readers to exercise caution when comparing results of this 12-patient pilot study to historical studies, many of which were much larger, and had different treatment protocols and baseline patient characteristics. Cross-trial comparisons like this are prone to mislead, even when comparing well powered studies. With such a small sample size, the risk of statistical error is very high, and comparisons like this have little meaning.

      We greatly appreciate your evaluation of our study and fully agree with the limitations you have pointed out. We have clearly stated the limitations of the small sample size and emphasized the need for a larger population to validate our preliminary findings in the discussion section (Lines 311-316).

      We acknowledge that this small sample size is not powered to characterize this regimen as a promising alternative regimen in the treatment of patients with HR-positive, HER2-negative breast cancer. Therefore, we have revised the description of this regimen to serve as a feasible option for neoadjuvant therapy in HR-positive, HER2-negative breast cancers both in the discussion (Lines 317-320) and the abstract (Lines 71-72).

      We agree with you that cross-trial comparisons should be approached with caution due to differences in study designs and patient populations. In our discussion section, we acknowledge that small sample size limited the comparison of our data with historical data in the literature due to the potential bias (Lines 312-313). We clearly state that such comparisons hold limited significance (Lines 313-314) and suggest a larger population to validate our preliminary findings.

      • Why was dalpiciclib chosen, as opposed to another CDK4/6 inhibitor?

      Thank you for your comments. The rationale for selecting dalpiciclib over other CDK4/6 inhibitors in our study is primarily based on the following considerations:

      (1) Clinical Efficacy: In several clinical trials, including DAWNA-1 and DAWNA-2, the combination of dalpiciclib with endocrine therapies such as fulvestrant, letrozole, or anastrozole has been shown to significantly extend the progression-free survival (PFS) in patients with hormone receptor-positive, HER2-negative advanced breast cancer [1-2].

      (2) Tolerability and Management of Adverse Reactions: The primary adverse reactions associated with dalpiciclib are neutropenia, leukopenia, and anemia. Despite these potential side effects, the majority of patients are able to tolerate them, and with proper monitoring and management, these reactions can be effectively mitigated [1-2].

      (3) Comparable pharmacodynamics to other CDK4/6 inhibitors: The combination of CDK4/6 inhibitors, including palbociclib, ribociclib, and abemaciclib, with aromatase inhibitors has demonstrated an enhanced ability to suppress tumor proliferation and increase the rate of clinical response in neoadjuvant therapy for HR-positive, HER2-negative breast cancer [3-5]. Furthermore, preclinical studies have shown that dalpiciclib has comparable in vivo and in vitro pharmacodynamic activity to palbociclib, suggesting its potential effectiveness in similar treatment regimens [6].

      (4) Accessibility and Regulatory Approval: Dalpiciclib gained marketing approval in China on December 31, 2021, which facilitates the accessibility of this medication, making it a more convenient option when considering treatment plans.

      References:

      (1) Zhang P, Zhang Q, Tong Z, et al. Dalpiciclib plus letrozole or anastrozole versus placebo plus letrozole or anastrozole as first-line treatment in patients with hormone receptor-positive, HER2-negative advanced breast cancer (DAWNA-2): a multicentre, randomised, double-blind, placebo-controlled, phase 3 trial[J]. The Lancet Oncology, 2023, 24(6): 646-657.

      (2) Xu B, Zhang Q, Zhang P, et al. Dalpiciclib or placebo plus fulvestrant in hormone receptor-positive and HER2-negative advanced breast cancer: a randomized, phase 3 trial[J]. Nature Medicine, 2021, 27(11): 1904-1909.

      (3) Hurvitz S A, Martin M, Press M F, et al. Potent cell-cycle inhibition and upregulation of immune response with abemaciclib and anastrozole in neoMONARCH, phase II neoadjuvant study in HR+/HER2− breast cancer[J]. Clinical Cancer Research, 2020, 26(3): 566-580.

      (4) Prat A, Saura C, Pascual T, et al. Ribociclib plus letrozole versus chemotherapy for postmenopausal women with hormone receptor-positive, HER2-negative, luminal B breast cancer (CORALLEEN): an open-label, multicentre, randomised, phase 2 trial[J]. The Lancet Oncology, 2020, 21(1): 33-43.

      (5) Ma C X, Gao F, Luo J, et al. NeoPalAna: neoadjuvant palbociclib, a cyclin-dependent kinase 4/6 inhibitor, and anastrozole for clinical stage 2 or 3 estrogen receptor–positive breast cancer[J]. Clinical Cancer Research, 2017, 23(15): 4055-4065.

      (6) Long F, He Y, Fu H, et al. Preclinical characterization of SHR6390, a novel CDK 4/6 inhibitor, in vitro and in human tumor xenograft models[J]. Cancer Science, 2019, 110(4): 1420-1430.

      • The eligibility criteria are not consistent throughout the manuscript, sometimes saying early breast cancer, other times saying stage II/III by MRI criteria.

      Thank you for pointing out the inconsistencies in the description of the eligibility criteria in our manuscript. We deeply apologize for any confusion caused by these inconsistencies. We have revised the term from “early-stage HR-positive, HER2-negative breast cancer” to “early or locally advanced HR-positive, HER2-negative breast cancer” (Lines 128 and 150). The term “early or locally advanced” encompasses two different stages of breast cancer, whereas “Stage II/III by MRI criteria” refers to specific stages within the TNM staging system.

      • The authors should emphasize the 25% rate of conversion from mastectomy to breast conservation and also report the type and nature of axillary lymph node surgery performed. As the authors note in the discussion section, rates of pathologic complete response/RCB scores are less prognostic for hormone-receptor-positive breast cancer than other subtypes, so one of the main rationales for neoadjuvant medical therapy is for surgical downstaging. This is a clinically relevant outcome.

      We appreciate your constructive comments. Based on your suggestions, we have made the following revisions and additions to the article.

      The breast conservation rate serves as a secondary endpoint in our study (Lines 62 and 179). We have highlighted the significant 25% conversion rate from mastectomy to breast conservation in both the results (Lines 229-230) and discussion sections (Lines 290-292).

      In our study, all patients underwent lymph node surgery, including sentinel lymph node biopsy or axillary lymph node dissection. Among them, 58.3% of patients (7/12) underwent sentinel lymph node biopsies.

      We agree with your point that the prognostic value of the pathologic complete response/RCB score is lower for hormone receptor-positive breast cancer than for other subtypes. We have therefore revised the discussion section to clarify that one of the principal objectives of neoadjuvant therapy in this patient population is to facilitate downstaging and enhance the rate of breast conservation (Lines 289-290). We have also emphasized that this neoadjuvant therapeutic regimen appeared to improve the likelihood of pathological downstaging and of achieving a margin-free resection, particularly for those with locally advanced and high-risk breast cancer (Lines 293-295).
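
      As a rough illustration of the small-sample caveat raised in this review, the sketch below computes a 95% Wilson score interval for the 7/12 sentinel lymph node biopsy proportion reported above. The choice of the Wilson method and the helper name `wilson_interval` are assumptions made here for illustration only; they are not part of the study's statistical analysis.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (illustrative helper)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1 + z**2 / n
    # Centre of the interval is pulled toward 0.5 for small n.
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# 7 of 12 patients underwent sentinel lymph node biopsy (figure taken from the text).
low, high = wilson_interval(7, 12)
print(f"observed 7/12 = {7 / 12:.1%}, 95% Wilson interval: {low:.1%} to {high:.1%}")
```

      With only 12 patients, the resulting interval is several tens of percentage points wide, which is consistent with the caution about comparing these results with historical data.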

      Reviewer #2 (Public review):

      Firstly, as this is a single-arm preliminary study, we are curious about the order of radiotherapy and endocrine therapy. In addition, given the radiotherapy, we are also concerned about wound recovery after surgery and whether related data were collected.

      Thanks for the comments. The treatment sequence in this study is to first administer radiotherapy, followed by endocrine therapy. A meta-analysis has indicated that concurrent radiotherapy with endocrine therapy does not significantly impact the incidence of radiation-induced toxicity or survival rates compared to a sequential approach [1]. In light of preclinical research suggesting enhanced therapeutic efficacy when radiotherapy is delivered prior to CDK4/6 inhibitors, we have opted to administer radiotherapy before the combination therapy of CDK4/6 inhibitors and hormone therapy [2].

      In our study, we collected data on surgical wound recovery. All 12 patients had Class I incisions, which healed by primary intention. The wounds exhibited no signs of redness, swelling, exudate, or fat necrosis.

      References:

      (1) Li Y F, Chang L, Li W H, et al. Radiotherapy concurrent versus sequential with endocrine therapy in breast cancer: A meta-analysis[J]. The Breast, 2016, 27: 93-98.

      (2) Petroni G, Buqué A, Yamazaki T, et al. Radiotherapy delivered before CDK4/6 inhibitors mediates superior therapeutic effects in ER+ breast cancer[J]. Clinical Cancer Research, 2021, 27(7): 1855-1863.

      Secondly, in the methodology, please describe the sample size estimation of this study and follow up details.

      Thanks for pointing out this crucial omission. Sample size estimation for this study and follow-up details have been added in the methodology section. The section on sample size estimation has been revised to state in Statistical analysis: “This exploratory study involves 12 patients, with the sample size determined based on clinical considerations, not statistical factors (Lines 210-211).” The section on follow up has been revised to state in Procedures section “A 5-year follow-up is conducted every 3 months during the first 2 years, and every 6 months for the subsequent 3 years. Additionally, safety data are collected within 90 days after surgery for subjects who discontinue study treatment (Lines 169-172).”

      Thirdly, in Table 1, the item HER2 expression, it's better to categorise HER2 into 0, 1+, 2+ and FISH-.

      Thank you very much for pointing out this issue. The item HER2 expression in Table 1 has been revised from “negative, 1+, 2+ and FISH-” to “0, 1+, 2+ and FISH-”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I can find no problems with the experiments performed in this study, but there are several results that are not easily explained. I would like to see more consideration of possible explanations. For example, one of the major differences between the CESA structures from primary and secondary cell walls is the displacement of TM7 in the primary cell wall CESAs that leads to the formation of a lipid-exposed channel. Why does this vary between primary and secondary cell wall CESA proteins? Could it explain differences in properties, such as crystallinity, between primary and secondary cell wall cellulose?

      At this time, the different position of TM helix 7 observed in our GmCesA structures is just an observation. We have some emerging evidence that this helix is also flexible in POCesA8 under certain conditions; however, we do not know whether this affects catalytic activity or cellulose coalescence. We have revised the text to avoid the interpretation that TM 7 repositioning is a characteristic feature of primary cell wall CesAs only.

      Similarly, regarding the formation of the larger structures from mixtures of different CESA trimers. Why do they not form rosettes? Particularly as these appear to be forming 2-dimensional structures.

      We have included additional data on the interaction between different CesA isoform trimers (Figure 6). To answer the reviewer’s question, the most likely reasons for not observing closely packed rosette-like structures are (a) steric interferences between the micelles harboring the individual CesA trimers, and (b) the lack of a stabilizing cellulose fiber. This interpretation is supported by 2D class averages of dimers of CesA1 and CesA3 trimers (now shown in Fig. 6). The class averages show an ‘upside-down and side-by-side’ orientation of the two trimers, consistent with interferences between the solubilizing detergent micelles. The implications of this non-physiological arrangement are discussed in the revised manuscript. In a biological membrane, the CesA trimers are confined to the same plane in the same orientation, which is likely necessary to form ordered arrangements.

      What role does the NTD play in trimer formation given its apparent very high class specificity?

      We have no data suggesting any contribution of the NTD to trimer formation. Recent work on moss CesA5 and similar AlphaFold predictions suggest that, for some CesAs, an extreme N-terminal region can interact with the beta sheet of the catalytic domain via beta-strand augmentation. Whether this interaction can contribute to CesA-CesA interactions remains unknown.

      Reviewer #2 (Recommendations For The Authors):

      The authors provide PDB codes but not EMDB codes for the EM maps. I would also encourage the authors to upload the raw micrographs to the EMPIAR database.

      The EMDB codes are shown in Table 1 and data transfer to EMPIAR is ongoing.  

      Page 6 line 144, the statement "All CesA isoforms show greatest catalytic activity at neutral pH" seems to contradict the data in Figure 1e and the subsequent statements. This sentence should be removed.

      The text has been revised to indicate that CesA1 and CesA6 show highest activity under mild alkaline conditions.  

      Page 6, line 150, the authors state "The affinities for substrate binding range from 1.4 mM for CesA1 to 0.6 and 2.4 mM for CesA3 and CesA6, respectively." How were the affinities determined? Are these affinities or Michaelis constants? Is it known whether CesAs are rapid equilibrium enzymes? This should be clarified.

      The text now states that we performed Michaelis-Menten kinetics using the ‘UDP-Glo’ glycosyltransferase assay kit. We are uncertain about whether CesAs can be classified as rapid equilibrium enzymes. The rate-limiting step of cellulose biosynthesis has been proposed to be glycosyl transfer, rather than cellulose translocation. To avoid any confusion, we changed the text from '…reveals Michaelis Menten constants for substrate binding of CesA1 and CesA3' to '…reveals Michaelis Menten constants for CesA1 and CesA3 with respect to UDP-Glc'.
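
      As an illustration only (not the authors' analysis code), the sketch below shows how a Michaelis constant can be recovered from a substrate titration. The data are synthetic and noise-free; the value of 1.4 mM simply mirrors the figure quoted in the reviewer's comment, and the function names `mm_rate` and `fit_hanes_woolf` are hypothetical. The Hanes-Woolf linearization is used here only to keep the example dependency-free; for noisy measurements, nonlinear least squares is generally preferred.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten initial rate at substrate concentration s."""
    return vmax * s / (km + s)


def fit_hanes_woolf(conc, rates):
    """Estimate (vmax, km) by least squares on the Hanes-Woolf form
    s/v = s/vmax + km/vmax, which is a straight line in s."""
    xs = list(conc)
    ys = [s / v for s, v in zip(conc, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                 # equals 1 / vmax
    intercept = mean_y - slope * mean_x  # equals km / vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Hypothetical titration: concentrations in mM, rates in arbitrary units.
conc = [0.1, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
rates = [mm_rate(s, vmax=10.0, km=1.4) for s in conc]
vmax_est, km_est = fit_hanes_woolf(conc, rates)
print(f"vmax ~ {vmax_est:.3f}, Km ~ {km_est:.3f} mM")
```

      Note that a Km obtained this way describes the substrate concentration at half-maximal rate; it equals a binding affinity only under a rapid-equilibrium assumption, which is the distinction the reviewer raises.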

      Page 6, line 153, the authors state "CesA1's apparent Ki for UDP is roughly 0.8 mM, whereas this concentration is increased to about 1.2 to 1.5 mM for CesA6 and CesA3, respectively." From the Figure 1g legend, it appears that the authors performed additional experiments at different UDP-Glc concentrations in order to determine Ki that are not shown. This data should be included as a figure supplement as the data presented are insufficient to determine Ki (only IC50).

      The UDP inhibition data show apparent IC50 values, and this has been corrected in the text. For each CesA isoform, the titration was done at one UDP-Glc concentration only.    
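
      The difference between an apparent IC50 and a Ki can be made concrete with the Cheng-Prusoff relation. The sketch below assumes purely competitive inhibition of UDP against UDP-Glc, which the response does not state; all numbers and the function name `ki_competitive` are hypothetical.

```python
def ki_competitive(ic50, substrate, km):
    """Cheng-Prusoff relation for a competitive inhibitor:
    Ki = IC50 / (1 + [S] / Km). All concentrations share one unit."""
    return ic50 / (1.0 + substrate / km)


# Hypothetical values in mM: the same IC50 maps to different Ki values
# depending on the substrate concentration used in the titration.
ic50 = 1.2
km = 0.6
for substrate in (0.3, 0.6, 1.2):
    ki = ki_competitive(ic50, substrate, km)
    print(f"[S] = {substrate} mM -> Ki = {ki:.3f} mM")
```

      Because the conversion depends on both [S]/Km and the assumed inhibition mechanism, a titration at a single UDP-Glc concentration supports reporting an apparent IC50 rather than a Ki.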

      Page 8, line 202, the authors state that TM helix 7 of the primary cell wall CesAs is more flexible "as evidenced by weaker density." The density for the TM helix 7 should be shown. If the density shown in Supplementary Figure 3 corresponds to TM helices the number of the helices should be indicated as it is not immediately obvious from the amino acid residue numbers.

      The densities for TM helix 7 of all CesA isoforms are shown in Supplemental Figure 3. The helices are now labeled to orient the reader.  

      Reviewer #2 (Public Review)

      The authors demonstrate via truncation that the N-terminus of the CesA is not involved in the interactions between the isoforms and propose that the CSR hook-like extensions are the primary mediator of trimer-trimer interactions. This argument would be strengthened by equivalent truncation experiments in which the CSR region is removed.

      We performed the suggested experiment. We replaced the CSR in N-terminally truncated GmCesA1 and GmCesA3 with a 20-residue long linker. The resulting constructs assemble into homotrimeric complexes as observed for the wild type and only N-terminally truncated versions. However, the CSR-truncated constructs of the different isoforms do not interact with each other in vitro. Further, CSR-deleted GmCesA3 also does not interact with full-length CesA1, suggesting that two CSR domains of different isoforms are necessary for homotrimer interaction. This data is now shown as Fig. 5.  

      Reviewer #3 (Recommendations For The Authors):

      Major Points

      (1) The authors state on Line 354 that they were unable to isolate heterotrimers, but they need to provide the data to support this claim; for example, it is important for readers to understand whether co-expression of all three CESAs leads to only homotrimers or only monomers. This information is essential to exclude model C in Figure 6.

      We have revised the corresponding discussion and toned down the statement that heterotrimeric complexes did not form in our recombinant expression system. Co-expression of differently tagged secondary or primary cell wall CesAs in Sf9 cells has consistently resulted in negligible amounts of material that can be purified sequentially over different affinity matrices (corresponding to the tags on the recombinantly expressed CesAs – His, Strep, Flag). While this does not exclude the formation of a small fraction of hetero-oligomeric complexes (which could be trimers as observed in the structures or monomers interacting via their CSR regions), it demonstrates that CesAs favor the same isoform for trimer formation, rather than partnering with other isoforms. An example of such a purification is now shown as Supplemental Figure 8.

      Determining whether heterotrimers are formed upon co-expression of different CesA isoforms requires high resolution structural analysis because co-purification of different isoforms can also be due to interactions between different homo-trimeric complexes, as demonstrated in this study.

      While we cannot exclude that factors exist in planta that may prevent the formation of homotrimers and favor the formation of hetero-trimers, it is important to keep in mind that currently no experimental data supports the formation of hetero-trimeric complexes. Instead, our work demonstrates that existing data on CesA isoform interactions can be explained by the interaction of homotrimers of different isoforms.

      (2) The evidence that the products of GmCESA1, GmCESA3, and GmCESA6 homotrimers are cellulose is that they consume UDP-glucose and produce a beta-glucanase-sensitive product. Other beta-glucans synthesized by similar GT2 family proteins (e.g. CSLDs, Yang et al., 2020 Plant Cell or CSLCs, Kim et al., 2020 PNAS) would be sensitive to this enzyme, and the product cannot truly be called cellulose unless it forms microfibrils. Previous reports of CESA activity in vitro have demonstrated that the products form genuine cellulose microfibrils rather than amorphous beta-glucan (via electron microscopy); extensively documented that the product is sensitive to beta-glucanase, but not other enzymes (e.g., callose or MLG degrading enzymes); provided linkage analysis of the product to conclusively demonstrate that it is a beta1,4-linked glucan; and documented a loss of activity when key catalytic residues were mutated (Purushotham et al., 2016 PNAS; Cho et al., 2017 Plant Phys; Purushotham et al., 2020 Science).

      Other GT2 characterization efforts have documented activity to similar standards (e.g. CSLDs, Yang et al., 2020 Plant Cell or CSLFs, Purushotham et al., 2022 Science Advances). At least one independent method should be provided, and the TEM of the product is necessary for readers to appreciate whether the product forms true cellulose microfibrils.

      There may be some confusion regarding the nomenclature. Therefore, we revised the second sentence of the Introduction to define ‘cellulose’ as a beta-1,4 linked glucose polymer, in accordance with the ‘Essentials of Glycobiology’. This is also consistent with enzyme nomenclature as the primary product of cellulose synthase is a single glucose polymer, and not a fibril. For example, most bacterial cellulose synthases only produce amorphous (single chain) cellulose. 

      We show that the GmCesA products can be degraded with a beta-1,4 specific glucanase (cellulase), which demonstrates the formation of authentic cellulose. This study does not focus on the formation of fibrillar cellulose apart from suggesting a revised model for a microfibril-forming CSC.

      (3) The position of isoxaben-resistant mutations implies that primary cell wall CESAs form heterotrimers (Shim et al., 2018 Frontiers in Plant Biology). Indeed, in their previous description of the POCESA8 structure (Purushotham et al., 2020 Science), the authors discussed the position of isoxaben-resistant mutations as a way to justify the way that TM7 of one CESA can contribute to forming the cellulose translocation pore in the neighbouring CESA within a heterotrimer. However, in this manuscript, the authors document a different location for TM7 in the GmCESA1, GmCESA3, and GmCESA6 homotrimers, which would change the position of these resistance mutations. Please discuss.

      As stated in the manuscript, we do not know what the functional implication of the TM7 flexibility may be, but we speculate that it could affect the alignment of the synthesized cellulose polymers. Regarding the previously reported POCesA8 structure, the mapping of one of the reported isoxaben resistance mutants to the C-terminus of TM7 was not used to justify the structure; the structure with its position of TM7 stands on its own.  Considering recent observations suggesting that isoxaben may affect cellulose biosynthesis via secondary effects, we prefer not to speculate on the mechanism by which these mutations cause the apparent resistance to isoxaben (PMID: 37823413).

      (4) The authors present no evidence that GmCESA1/3/6 are involved in primary cell wall synthesis. Please include gene expression information (documenting widespread expression consistent with primary CESAs) and rigorous molecular phylogenetic analysis (or references to these published data) to clarify that these are indeed primary cell wall CESAs.

      This has been addressed. We have included additional figures (Fig. 1 and S1B) that show the strong and wide distribution of the selected CesAs in soybean leaves, their co-expression with primary cell wall markers, and their phylogenetic clustering with Arabidopsis primary cell wall CesAs.  

      (5) Several small changes need to be made to the abstract to ensure that it aligns with the data: Line 28: add "in vitro" after "their assembly into homotrimeric complexes" Line 28: change "stabilized by the PCR" to "presumably stabilized by the PCR".

      We inserted ‘in vitro’ as requested. We did not insert the second modification as requested since CesA trimers are stabilized by the PCR. This is a fact arising from several experimentally determined CesA trimer structures.  

      (6) In all graphs in all figures it is unclear what the sample size is and what the bars represent. These must be stated in the figure legends. It is best practice to plot individual data points so that readers can easily interpret both the sample size and the variation.

      The sample sizes and error bars are now defined in the relevant figure legends.

      (7) The methods need to unambiguously define GmCESA1, GmCESA3, GmCESA6 protein identities using appropriate accession numbers.

      The accession codes are now provided in the Methods.

      Minor Points

      (1) Does CESA1 have higher activity in Figure 1D because of the pH at which the assay was conducted (see Figure 1E)? Could this difference in activity or pH preference have also affected their capacity to resolve TM7 of CESA1?

      We consistently observe higher in vitro catalytic activity of CesA1, compared to CesA3 and CesA6. Activity assays are performed at a pH of 7.5, roughly halfway between the activity maxima of CesA3 and CesA1/6. At this pH, we expect activity differences to arise from factors other than the buffer pH. As detailed above, we do not know whether the conformational flexibility of TM helix 7 affects catalytic activity.

      (2) Line 55: The authors should cite additional papers that also provide insight into CESA structure (e.g. Qiao et al 2021 PNAS).

      A recent publication on moss CesA5 has been included. Qiao et al unfortunately report on a dimeric assembly of a fragment of Arabidopsis thaliana’s CesA3 catalytic domain, which we consider non-physiological. We added a brief statement in the Discussion explaining that our GmCesA3 structure is inconsistent with the dimeric arrangement reported by Qiao et al.

      (3) Line 95: these references are about secondary cell wall CESA isoforms, but there are more appropriate references for the primary CESAs that should be included in place of these papers.

      Fagard et al. report on growth defects in roots and dark-grown hypocotyls linked to Arabidopsis CesA1 and CesA6, which are primary cell wall CesAs. Nevertheless, we have included two additional recent publications from the Meyerowitz and Persson labs.

      (4) Line 121-122: Please cite a specific figure that supports this claim, since the (Purushotham et al., 2020) reference refers to POCESA8 enrichment results, but the claims are about the GmCESA1/3/6 enrichment.

      The POCesA8 reference has been removed. The classification into monomers and trimers arises from the data processing described in this manuscript and is consistent with similar results obtained for POCesA8.

      (5) Line 314: It is more appropriate to use "enzyme activity" rather than "cellulose synthesis".

      We prefer to use cellulose biosynthesis since the enzyme produces cellulose.

      (6) Figure 1: please add colour to the graphs to clarify which trend lines belong to which data series (especially Figure 1G).

      The figure (now Fig. 2) has been revised as suggested.  

      (7) Figure 2D: It's not clear which parts are GmCESA and which are POCESA8; please clarify the figure legend.

      Thank you, the legend has been revised accordingly (now Fig. 3).

      (8) In Figure 5, It's not clear that the one CESA is maintained at a steady concentration throughout the assay since there is only a bar for that CESA at the highest concentration (e.g. in Figure 5A, the blue bar for CESA1 only appears on the right-most assay, but there was CESA1 in all assays, so this should be indicated).

      In the panel the reviewer is referring to, the blue bar corresponds to the activity measured for only CesA1 at a concentration of 20 µM. The red columns (indicated as ‘Mix’) represent the activities measured in the presence of 20 µM of CesA1 plus increasing concentrations of CesA3. The purple columns represent activities obtained for only CesA3 at the indicated concentrations. Numerical addition of the activities of CesA1 alone at 20 µM (blue column) and CesA3 alone (purple columns) gives rise to the gray columns, now indicated by a capital ‘sigma’ sign. We are unclear on how the figure could be improved, but we have revised the legend to avoid confusion.
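
      The relationship between the columns described above can be written out in a few lines. This is a sketch with invented activity values (arbitrary units), intended only to show how the additive 'sigma' columns are derived and compared with the measured mixtures; none of the numbers come from the manuscript, and `additive_expectation` is a hypothetical helper.

```python
# Hypothetical activities in arbitrary units; keys are CesA3 concentrations in uM.
cesa1_alone = 5.0                          # 20 uM CesA1 only (blue column)
cesa3_alone = {5: 0.8, 10: 1.5, 20: 2.9}   # CesA3 only (purple columns)
mix = {5: 7.1, 10: 9.0, 20: 12.4}          # 20 uM CesA1 + CesA3 (red columns)


def additive_expectation(fixed_activity, titrated_activities):
    """Sum of the individually measured activities (the 'sigma' columns)."""
    return {c: fixed_activity + a for c, a in titrated_activities.items()}


sigma = additive_expectation(cesa1_alone, cesa3_alone)

# A mixture exceeding the additive expectation points to a non-additive effect.
for c in sorted(mix):
    excess = mix[c] - sigma[c]
    print(f"{c} uM CesA3: mix={mix[c]:.1f}, sigma={sigma[c]:.1f}, excess={excess:+.1f}")
```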

      (9) Figure 5 legend needs to be clarified to indicate whether monomers or homotrimers were used in the assays.

      This is now shown as Fig. 7 and the legend has been revised as requested. The experiments were performed with the trimeric CesA fractions.

      (10) There seem to be some random dots near the top of Figures 6B & 6C

      Removed. Thank you.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the authors):

      We appreciate the reviewers' thoughtful comments and suggestions. Below, we provide point-by-point responses to the recommendations and outline the updates made to the manuscript.

      (1) [Discussion, "the obvious experiment is to manipulate a neuron's anatomical embedding while leaving stimulus information intact."] The epiphenomenon can arise from the placement and types of a neuron's neurotransmitters and neuromodulators, too.

      The content of vesicles released by a neuron is obviously of great importance in determining postsynaptic impact. However, we’re suggesting that (assuming vesicular content is held constant) the anatomically-relevant patterning of spiking might additionally affect the postsynaptic neuron’s integration of the presynaptic input. To avoid confusion, we updated the text accordingly: “the obvious experiment is to manipulate a neuron's anatomical embedding while minimally impacting external and internal variables, such as stimulus information and levels of neurotransmitters or neuromodulators” (Line 594 - 596).

      (2) “In all conditions, the slope of the input duration versus sensitivity line was still positive at 1,800 seconds (Fig. 3B)". This may suggest that the estimate of the calculated statistics (ISI, PSTH) is more reliable with more data, rather than (or in addition to) specific information being extracted from faraway time points. Another potential confound is that the training statistics were calculated from all training data, so the test data is a better match to training data when test statistics are calculated from more data. Overall, the validity of the conclusions following this observation is not clear to me.

      This is a great point. Accordingly, we revised the text to include this possibility: “Because the training data were of similar duration, this could be explained by either of two possibilities. First, the signal is relatively short, but noisy—in this case, extended sampling will increase reliability. Second, the anatomical signal is, itself, distributed over time scales of tens to hundreds of seconds.” (Line 252 - 255).

      (3) "This further suggests that there is a latent neural code for anatomical location embedded within the spike train, a feature that could be practically applied to determining the brain region of a recording electrode without the need for post-hoc histology". The performance of the model at the subregion level, which is a typical level of desired precision in locating cells, does not seem to support such a practical application. Please clarify to avoid confusion.

      The current model should not be considered a replacement for traditional methods, such as histology. Our intention is to convey that, with the inclusion of multimodal data and additional samples, a computational approach to anatomical localization has great promise. We updated the manuscript to clarify this point: “While significantly above chance, the structure-level model still lacks the accuracy for immediate practical application. However, it is highly likely that the incorporation of datasets with diverse multi-modal features and alternative regions from other research groups will increase the accuracy of such a model. In addition, a computational approach can be combined with other methods of anatomical reconstruction.” (Line 355 - 359).

      Additionally, we directly addressed this point in our original manuscript (Discussion section: Line 498 - 505 in the current version). Furthermore, following the release of our preprint, independent efforts have adopted a multimodal strategy with qualitatively similar results (Yu et al., 2024). Other recent work expands on the idea of utilizing single-neuron features for brain region/structure characterization (Le Merre et al., 2024).

      Yu, H., Lyu, H., Xu, E. Y., Windolf, C., Lee, E. K., Yang, F., ... & Hurwitz, C. (2024). In vivo cell-type and brain region classification via multimodal contrastive learning. bioRxiv, 2024-11.

      Le Merre, P., Heining, K., Slashcheva, M., Jung, F., Moysiadou, E., Guyon, N., ... & Carlén, M. (2024). A Prefrontal Cortex Map based on Single Neuron Activity. bioRxiv, 2024-11.

      (4) "These results support the notion the meaningful computational division in murine visuocortical regions is at the level of VISp versus secondary areas.". The use of the word "meaningful" is vague and this conclusion is not well justified because it is possible that subregions serve different functional roles without having different spiking statistics.

      Precisely! It is well established that different subregions serve different functional purposes - but they do not necessitate different regional embeddings. It is important to note the difference between stimulus encoding and the embedding that we are describing. As a rough analogy, the regional embedding might be considered a language, while the stimulus is the content of the spoken words. However, to avoid vague words, we revised the sentence to “These results suggest that the computational differentiability of murine visuocortical regions is at the level of VISp versus secondary areas.” (Line 380 - 381)

      (5) Figure 3D left/right halves look similar. A measure of the effect size needs to accompany these p-values.

      We assume the reviewer is referring to Figure 3E. Although some of the violin plots in Figure 3E look similar, they are not identical. In the revision, we include effect sizes in the caption.

      (6) Figure 3A, 3F: Could uncertainty estimates be provided?

      Yes. We added uncertainty estimates to the text (Line 272 - 294) and to the caption of Figure S2, which displays confusion matrices corresponding to Figure 3A. The inclusion of similar estimates for 3F would be so unwieldy as to be a disservice to the reader—there are 240 unique combinations of stimulus parameters and structures. In the context of the larger figure, 3F serves to illustrate a relationship between stimulus, region, and the anatomical embedding.

      (7) Page 21. "semi-orthogonal". Please reword or explain if this usage is technical.

      We replaced “semi-orthogonal” with “dissociable” (Line 549).

      (8) [Page 11, "This approach tested whether..."] Unclear sentence. Please reword.

      We changed “This approach tested whether the MLP’s performance depended on viewing the entire ISI distribution or was enriched in a subset of patterns” to “This approach identified regions of the ISI distribution informative for classification” (Line 261).

      Reviewer #2 (Recommendations for the authors):

      We appreciate the reviewer’s comments and summary of the results. We agree that the introductory results (Figs. 1-3) are not particularly compelling when considered in isolation. They provide a baseline of comparison for the subsequent results. Our intention was to approach the problem systematically, progressing from well-established, basic methods to more advanced approaches. This allows us to clearly test a baseline and avoid analytical leaps or untested assumptions. Specifically:

      ● Figure 1 provides an evaluation of the standard dimensionality reduction methods. As expected, these methods yield minimal results, serving as a clear baseline. This is consistent, for example, with an understanding of single units as rate-varying Poisson processes.

      ● Figures 2 and 3 then build upon these results in two steps: spiking features common in the neuroscience literature (firing rate, coefficient of variation, etc.) are evaluated with linear supervised methods, and more detailed spiking features, such as the ISI distribution, are evaluated with nonlinear supervised machine learning methods.

      By starting from the standpoint of the status quo, we are better able to contextualize the significance of our later findings in Figures 4–6.
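
      For readers less familiar with the spiking features named above, the following sketch computes a firing rate, the coefficient of variation of the inter-spike intervals (ISIs), and a normalized ISI histogram from a list of spike times. It is an illustration under stated assumptions, not the authors' feature pipeline: the function names, the bin edges, and the example spike times are all hypothetical.

```python
import statistics
from bisect import bisect_right


def isis(spike_times):
    """Inter-spike intervals from spike times given in seconds."""
    ts = sorted(spike_times)
    return [b - a for a, b in zip(ts, ts[1:])]


def firing_rate(spike_times, duration):
    """Mean firing rate in spikes per second over a recording of given duration."""
    return len(spike_times) / duration


def isi_cv(spike_times):
    """Coefficient of variation of the ISIs (population std divided by mean)."""
    intervals = isis(spike_times)
    return statistics.pstdev(intervals) / statistics.mean(intervals)


def isi_histogram(spike_times, bin_edges):
    """Fraction of ISIs in each half-open bin [edge_i, edge_{i+1});
    intervals outside the edges are ignored."""
    counts = [0] * (len(bin_edges) - 1)
    for x in isis(spike_times):
        idx = bisect_right(bin_edges, x) - 1
        if 0 <= idx < len(counts):
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Hypothetical spike train recorded for 3 seconds.
spikes = [0.0, 0.5, 1.5]
print(firing_rate(spikes, duration=3.0), isi_cv(spikes))
print(isi_histogram(spikes, bin_edges=[0.0, 0.6, 1.2]))
```

      Scalar summaries such as rate and CV are natural inputs for a linear classifier, whereas the histogram is a higher-dimensional description of the ISI distribution of the kind a nonlinear model can exploit.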

      Response to Specific Points in the Summary

      (6) Separability of VISp vs. Secondary Visual Areas

      I found the entire argument about visual areas somewhat messy and unclear. The stimuli used might not drive the secondary visual areas particularly well and might necessitate task engagement.

      We appreciate your feedback that the dissection of visual cortical structures is unclear. To summarize, as shown in the bottom three rows of Figure 6, there is a notable lack of diagonality in visuocortical structures. This means that our model was unable to learn signatures to reliably predict these classes. In contrast, visuocortical layer is returned well above chance, and superstructures (primary and secondary areas) are moderately well identified, albeit still well above chance.

      Consider a thought experiment: if Charlie Gross had not shown faces to monkeys to find IT, or Newsome and others had not shown motion to find MT, or Zeki and others color stimuli to find V4, we would conclude that there are no differences.

      The thought experiment is misleading. The results specifically do not arise from stimulus selectivity—much of Newsome’s own work suggests that the selectivity of neurons in IT etc. is explained by little more than rate varying Poisson processes. In this case, there should be no fundamental anatomical difference in the “language” of the neurons in V4 and IT, only a difference in the inputs driving those neurons. In contrast, our work suggests that the “language” of neurons varies as a function of some anatomical divisions. In other words, in contrast to a Poisson rate code, our results predict that single neuron spike patterns might be remarkably different in MT and IT— and that this is not a function of stimulus selectivity. Notably, the anatomical (and functional) division between V1 and secondary visual areas does not appear to manifest in a different “language”, thus constituting an interesting result in and of itself.

      We regret a failure to communicate this in a tight and compelling fashion on the first submission, but hope that the revision is limpid and accessible.

      Barberini, C. L., Horwitz, G. D., & Newsome, W. T. (2001). A comparison of spiking statistics in motion sensing neurones of flies and monkeys. Motion Vision: Computational, Neural, and Ecological Constraints, 307-320.

      Bair, W., Zohary, E., & Newsome, W. T. (2001). Correlated firing in macaque visual area MT: time scales and relationship to behavior. Journal of Neuroscience, 21(5), 1676-1697.

      Similarly, why would drifting gratings be a good example of a stimulus for the hippocampus, an area thought to be involved in memory/place fields?

      The results suggest that anatomical “language” is not tied to stimuli. It is imperative to recall that neurons are highly active absent experimentally imposed stimuli, such as when an animal is at rest, when an animal is asleep, and when an animal is in the dark (relevant to visual cortices). With this in mind, also recall that, despite the lack of stimuli tailored to the hippocampus, neurons therein were still reliably separable from neurons in seven nuclei in the thalamus, six of which are not classically considered visual regions. Should these regions (including hippocampus) have been inert during the presentation of visual stimuli, there would have been very little separability.

      (7) Generalization across laboratories

      “[C]omparison across laboratories was somewhat underwhelming. It does okay but none of the results are particularly compelling in terms of performance.”

      Any result above chance is a rejection of the null hypothesis: that a model trained on a set of animals in Laboratory A will be ineffective in identifying brain regions when tested on recordings collected in Laboratory B (in different animals and under different experimental conditions). As an existence proof, the results suggest conserved principles (however modest) that constrain neuronal activity as a function of anatomy. That models fail to achieve high accuracy (in this context) is not surprising (given the limitations of available recordings); that models achieve anything above chance, however, is.
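
      One simple way to check whether a classification accuracy exceeds chance is an exact one-sided binomial test. The sketch below is illustrative only: it assumes independent test units and a chance level of 1/(number of classes), neither of which is stated in the response, and the counts are invented. The authors' own evaluation (for example, balanced accuracy or permutation-based tests) may differ.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the probability of at least k
    correct predictions out of n if each is correct with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical cross-laboratory test: 10 classes, 500 units, 90 classified correctly.
n_classes = 10
n_units = 500
n_correct = 90
chance = 1 / n_classes
p_value = binomial_tail(n_correct, n_units, chance)
print(f"accuracy = {n_correct / n_units:.2f}, chance = {chance:.2f}, p = {p_value:.2e}")
```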

      Thus, after reading the paper many times, I think part of the problem is that the study is not cohesive, and the authors need to either come up with a tool or demonstrate a scientific finding.

      We demonstrate that neuronal spike trains carry robust anatomical information. We developed an ML architecture for this and that architecture is publicly available.

      They try to split the middle and I am left somewhat perplexed about what exact scientific problem they or other researchers are solving.

      We humbly suggest that the question of a neuron’s “language” is highly important and central to an understanding of how brains work. From a computational perspective, there is no reason for a vast diversity of cell types, nor a differentiation of the rules that dictate neuronal activity in one region versus another. A Turing-complete system can be trivially constructed from a small number of simple components, such as an excitatory and inhibitory cell type. This is the basis of many machine learning tools.

      Please do not confuse stimulus specificity with the concept of a neuron’s language. Neurons in VISp might fire more in response to light, while those in auditory cortex respond to sound. This does not mean that these neurons are different - only that their inputs are. Given the lack of a literature describing our main effect—that single neuron spiking carries information about anatomical location—it is difficult to conclude that our results are either commonplace or to be expected.

      I am also unsure why the authors think some of these results are particularly important.

      See above.

      For instance, has anyone ever argued that brain areas do not have different spike patterns?

      Yes. In effect, by two avenues. The first is a lack of any argument otherwise (please do not conflate spike patterns with stimulus tuning), and the second is the preponderance of, e.g., rate codes across many functionally distinct regions and circuits.

      Is that not the premise for all systems neuroscience?

      No. The premise for all systems neuroscience (from our perspective) is that the brain is a) a collection of interacting neurons and b) the collective system of neurons gives rise to behavior, cognition, sensation, and perception. As stated above, these axiomatic first principles fundamentally do not require that neurons, as individual entities, obey different rules in different parts of the brain.

      I could see how one could argue no one has said ISIs matter but the premise that the areas are different is a fundamental part of neuroscience.

      Based on logic and the literature, we fundamentally disagree. Consider: while systems neuroscience operates on the principle that brain regions have specialized functions, there is no a priori reason to assume that these functions must be reflected in different underlying computational rules. The simplest explanation is that a single language of spiking exists across regions, with functional differences arising from processing distinct inputs rather than fundamentally different spiking rules. For example, an identical spike train in the amygdala and Layer 5 of M1 would have profoundly different functional impacts, yet the spike timing itself could be identical (even as stimulus response). Until now, evidence for region-specific spiking patterns has been lacking, and our work attempts to begin addressing this gap. There is extensive further work to be conducted in this space, and it is certain that models will improve, rules will be clarified, and mechanisms will be identified.

      Detailed major comments

      (1) Exploratory trends in spiking by region and structure across the population:

      The argument in this section is that unsupervised analyses might reveal subtle trends in the organization of spiking patterns by area. The authors show 4 plots from t-SNE and claim to see subtle organization. I have concerns. For Figure 1C, it is nearly impossible to see if a significant structure exists that differentiates regions and structures. So this leads certain readers to conclude that the authors are looking at the artifactual structure (see Chari et al. 2024) - likely to contribute to large Twitter battles. Contributing to this issue is that the hyperparameter for tSNE was incorrectly chosen. I do think that a different perplexity should be used for the visualization in order to better show the underlying structure; the current visualization just looks like a single "blob". The UMAP visualizations in the supplement make this point more clearly. I also think the authors should include a better plot with appropriate perplexity or not include this at all. The color map of subtle shades of green and yellow is hard to see as well in both Figure S1 and Figure 1.

      In response to the feedback, we replaced t-SNE/UMAP with LDA, while keeping PCA for dimensionality reduction.

      As stated in the original methods, t-SNE/UMAP hyperparameters were chosen based on the combination that led to the greatest classifiable separability of the regions/structures in the space (across a broad range of possible combinations). It just so happens that the maximally separable structure from a regions/structures perspective is the “blob”. This suggests that perhaps the predominant structure the t-SNE finds in the data is not driven by anatomy. If we selected hyperparameters in some other way that was not based specifically on regions/structures (e.g. simple visual inspection of the plots) the conformation would of course be different and not blob-like. However, we removed the t-SNE and UMAP to avoid further confusion.

      The “muddy appearance” is not an issue with the color map. As seen in Figure 1B, the chosen colors are visibly distinct. Figure 1C (previous version) appeared muddy yellow/green because of points that overlap with transparency, resulting in a mix of clearly defined classes (e.g., a yellow point on top of a blue point creating green). This overlap is a meaningful representation of the separability observed in this analysis. We also tried using 2D KDE for visualization, but it did not improve the impression of visual separability.

      We are removing p-values from the figures because they lead to the impression that we over-interpret these results quantitatively. However, we calculated p-values based on label permutation similar to the way R2 suggests (see previous methods). The conflation with the Wasserstein distances is an understandable misunderstanding. These are unrelated to p-values and used for the heatmaps in S1 only (see previous methods).
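
      The label-permutation procedure mentioned above can be sketched as follows. This is a minimal illustration, not the analysis code used for the manuscript: the function names, the one-dimensional feature, the two-label restriction, and the separability statistic (absolute difference of group means) are all assumptions chosen for brevity.

```python
import random
from statistics import fmean


def mean_gap(values, labels):
    """Hypothetical separability statistic: |mean(group a) - mean(group b)|."""
    groups = sorted(set(labels))
    if len(groups) != 2:
        raise ValueError("this sketch handles exactly two labels")
    a = [v for v, g in zip(values, labels) if g == groups[0]]
    b = [v for v, g in zip(values, labels) if g == groups[1]]
    return abs(fmean(a) - fmean(b))


def permutation_p_value(values, labels, n_permutations=999, seed=0):
    """Fraction of label shuffles whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = mean_gap(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any link between values and labels
        if mean_gap(values, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never zero.
    return (hits + 1) / (n_permutations + 1)
```

      Shuffling the labels while leaving the values untouched approximates the null hypothesis that anatomical labels carry no information about the feature.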

      Instead of p-values, we now use the adjusted Rand index, which measures how accurately neurons within the same region are clustered together (see Line 670 - 671, Figure 1C, and Figure S1) (Hubert & Arabie 1985). This quantifies the extent to which the distribution of points in dimensionally-reduced space is shaped by region/structure.

      Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218. https://doi.org/10.1007/BF01908075
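
      For readers unfamiliar with the index, the following is a minimal sketch of the Hubert & Arabie formula using only the Python standard library. It is an illustration, not the code used for the manuscript; the function name and the toy labels are assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected agreement between two partitions of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree

    # Pairs of items that share a group in both partitions (contingency cells).
    cells = Counter(zip(labels_true, labels_pred))
    index = sum(comb(c, 2) for c in cells.values())

    # Pairs that share a group within each partition separately.
    pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = pairs_true * pairs_pred / total_pairs
    maximum = (pairs_true + pairs_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single group in both partitions
    return (index - expected) / (maximum - expected)
```

      A value of 1 indicates that the clusters reproduce the anatomical labels exactly, values near 0 indicate chance-level agreement, and negative values indicate agreement below chance.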

      (2) Logistic classifiers:

      The results in this section are somewhat underwhelming. Accuracy is around 40% and yes above chance but I would be very surprised if someone is worried about separating visual structures from the thalamus. Such coarse brain targeting is not difficult. If the authors want to include this data, I recommend they show it as a control in the ISI distribution section. The entire argument here is that perhaps one should not use derived metrics and a nonlinear classifier on more data is better, which is essentially the thrust of the next section.

      As outlined above, our work systematically increases in model complexity. The logistic result is an intermediate model, and it returns intermediate results. This is an important stepping stone between the lack of a result based on unsupervised linear dimensionality reduction and the performance of supervised nonlinear models.

      From a purely utilitarian perspective, the argument could be framed as “one should not use derived metrics, and a nonlinear classifier on more data is better.” However, please see all of our notes above.

      (3) MLP classifiers:

      Even in this section, I was left somewhat underwhelmed that a nonlinear classifier with large amounts of data outperforms a linear classifier with small amounts of data. I found the analysis of the ISIs and which timescales are driving the classifier interesting but I think the classifier with smoothing is more interesting. So with a modest chance level decodability of different brain areas in the visual system, I found it somewhat grandiose to claim a "conserved" code for anatomy in the brain. If there is conservation, it seems to be at the level of the coarse brain organization, which in my opinion is not particularly compelling.

      The sample size used for both the linear and nonlinear classifiers is the same; however, the nonlinear classifier leverages the detailed spiking time information from ISIs. Our goal here was to systematically evaluate how classical spike metrics compare to more detailed temporal features in their ability to decode brain areas. We chose a linear classifier for spike metrics because, with fewer features, nonlinear methods like neural networks often offer very modest advantages over linear methods, less interpretability, and are prone to overfitting.

      Respectfully, we stand by our word choice. The term “conserved” is appropriate given that our results hold appreciably, i.e., statistically above chance, across animals.

      (4) Generalization section:

      The authors suggest that a classifier learned from one set of data could be used for new data. I was unsure if this was a scientific point or the fact that they could use it as a tool.

      It can be both. We are more driven by the scientific implications of a rejection of the null.

      Is the scientific argument that ISIs are similar across areas even in different tasks?

      It appears so - despite heterogeneity in the tuning of single neurons, their presynaptic inputs, and stimuli, there is identifiable information about anatomical location in the spike train.

      Why would one not learn a classifier from every piece of available data: like LFP bands, ISI distributions, and average firing rates, and use that to predict the brain area as a comparison?

      Because this would obfuscate the ability to conclude that spike trains embed information about anatomy.

      Considering all features simultaneously and adding additional data modalities—such as LFP bands and spike waveforms—has potential to improve classification accuracy at the cost of understanding the contribution of each feature. The spike train as a time series is the most fundamental component of neuronal communication. As a result, this is the only feature of neuronal activity of concern for the present investigation.

      Or is the argument that the ISIs are a conserved code for anatomy? Unfortunately, even in this section, the data are underwhelming.

      We appreciate the reviewer’s comments, but arrive at a very different conclusion. We were quite surprised to find any generalizability whatsoever.

      Moreover, for use as a tool, I think the authors need to seriously consider a control that is either waveforms from different brain areas or the local field potentials. Without that, I am struggling to understand how good this tool is. The authors said "because information transmission in the brain arises primarily from the timing of spiking and not waveforms (etc)., our studies involve only the timestamps of individual spikes from well-isolated units ". However, we are not talking about information transmission and actually trying to identify and assess brain areas from electrophysiological data.

      While we are not blind to the “tool” potential that is suggested by our work, this is not the primary motivation or content in any section of the paper. As stated clearly in the abstract, our motivation is to ask “whether individual neurons [...] embed information about their own anatomical location within their spike patterns”. We go on to say “This discovery provides new insights into the relationship between brain structure and function, with broad implications for neurodevelopment, multimodal integration, and the interpretation of large-scale neuronal recordings. Immediately, it has potential as a strategy for in-vivo electrode localization.” Crucially, the last point we make is a nod to application. Indeed, our results suggest that in-vivo electrode localization protocols may benefit from the incorporation of such a model.

      In light of the reviewer’s concerns, we have further dampened the weight of statements about our model as a consumer-ready tool.

      Example 1: The final sentence of the abstract now reads: “Computational approximations of anatomy have potential to support in-vivo electrode localization.”

      Example 2: The Results section now contains the following text: “While significantly above chance, the structure-level model still lacks the accuracy for immediate practical application. However, it is highly likely that the incorporation of datasets with diverse multi-modal features and alternative regions from other research groups will increase the accuracy of such a model. In addition, a computational approach can be combined with other methods of anatomical reconstruction.” (Line 355 - 359).

      Example 3: We replaced the phrase "because information transmission in the brain arises primarily from the timing of spiking and not waveforms (etc) " with the phrase “because information is primarily encoded by the firing rate or the timing of spiking and not waveforms (etc)” (Line 116 - 118).

      (5) Discussion section:

      In the discussion, beginning with "It is reasonable to consider . . ." all the way to the penultimate paragraph, I found the argumentation here extremely hard to follow. Furthermore, the parts of the discussion here I did feel I understood, I heavily disagreed with. They state that "recordings are random in their local sampling" which is almost certainly untrue when it comes to electrophysiology which tends to oversample task-modulated excitatory neurons (https://elifesciences.org/articles/69068). I also disagree that "each neuron's connectivity is unique, and vertebrate brains lack 'identified neurons' characteristic of simple organisms. While brains are only eutelic and "nameable" in only the simplest organisms (C. elegans), cell types are exceedingly stereotyped in their connectivity even in mammals and such connectivity defines their computational properties. Thus I don't find the premise the authors state in the next sentence to be undermined ("it seems unlikely that a single neuron's happenstance imprinting of its unique connectivity should generalize across stimuli and animals"). Overall, I found this subsection to rely on false premises and in my opinion it should be removed.

      At the suggestion of R2, we removed the paragraph in question. However, we would like to address some points of disagreement:

      We agree that electrophysiology, along with spike-sorting, quality metrics, and filtering of low-firing neurons, leads to oversampling of task-modulated neurons. However, when we stated that recordings are random in their local sampling, we were referring to structural (anatomical) randomness, not functional randomness. In other words, the recorded neurons were not specifically targeted (see below).

      Electrode arrays, such as Neuropixels, record from hundreds of neurons within a small volume relative to the total number of neurons and the volume of a given brain region. For instance, the paper R2 referenced includes a statement supporting this: “... assuming a 50-μm ‘listening radius’ for the probes (radius of half-cylinder around the probe where the neurons’ spike amplitude is sufficiently above noise to trigger detection) …, the average yield of 116 regular-spiking units/probe (prior to QC filtering) would imply a density of 42,000 neurons/mm³, much lower than the known density of ~90,000 neurons/mm³ for excitatory cells in mouse visual cortex….”

      If we take the estimated volume of V1 to be approximately 3 mm³, this region could theoretically be subdivided into multiple cylinders with a 100-μm diameter. While stereotaxic implantation of the probe mitigates some variability, the natural anatomical variability across individual animals introduces spatially random sampling. This was the randomness we were referring to, and thus, we disagree with the assertion that our claim is “almost certainly untrue.”
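
      The sampling argument above can be made concrete with the figures already quoted. The sketch below only rearranges those numbers (116 units per probe, an implied density of 42,000 neurons/mm³, a known density of ~90,000 neurons/mm³, and an approximate V1 volume of 3 mm³). The variable names are illustrative, and the V1 volume is the rough estimate used above rather than a measured value.

```python
# Figures quoted in the text; the 3 mm^3 V1 volume is an approximate assumption.
units_per_probe = 116        # regular-spiking units per probe, before QC
implied_density = 42_000     # neurons/mm^3 implied by that yield
known_density = 90_000       # neurons/mm^3, excitatory cells in mouse V1
v1_volume_mm3 = 3.0          # rough estimate of V1 volume

# Volume one probe "listens" to, implied by yield / density.
sampled_volume_mm3 = units_per_probe / implied_density

# Share of V1 covered by that volume.
fraction_of_v1 = sampled_volume_mm3 / v1_volume_mm3

# Neurons expected in the sampled volume at the known density,
# and the share of them that the probe actually detects.
expected_neurons = known_density * sampled_volume_mm3
detected_fraction = units_per_probe / expected_neurons

print(f"sampled volume: {sampled_volume_mm3:.5f} mm^3")
print(f"fraction of V1: {fraction_of_v1:.2%}")
print(f"detected fraction of local neurons: {detected_fraction:.0%}")
```

      Under these assumptions, a single probe samples on the order of 0.1% of V1 and detects fewer than half of the excitatory neurons within its listening volume, so small differences in probe placement across animals yield largely non-overlapping samples.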

      Additionally, each cortical pyramidal neuron is understood to have ~10,000 presynaptic partners. It is highly unlikely that these connections are entirely pre-specified, perfectly replicated within the same animal, and identical across all members of a species. Further, there is enormous diversity in the activity properties of even neighboring cells of the same type. Consider pyramidal neurons in V1. Single-neuron firing rates are log-normally distributed, there are many combinations of tuning properties (e.g., direction, orientation) that must occupy each point in retinotopic space, and there is powerful experience-dependent change in the connectivity of these cells. We suggest that it is inconceivable that any two neurons, even within a small region of V1, have identical connectivity.

      Minor Comments:

      (1) Although the description of confusion matrices is good from a didactic perspective, some of this could be moved to methods to simplify the paper.

      We thank the reviewer for the suggestion. However, given the broad readership of eLife, we gently suggest that confusion matrices are not a trivial and universally appreciated plotting format. For the purpose of accessibility, a brief and didactic 2-sentence description will make the paper far more comprehensible to many readers at little cost to experts.

      (2) Figure 3A: It is concluded in their subsequent figure that the longer the measured amount of time, the better the decoding performance. Thus it makes sense why the average PSTHs do not show significant decoding of areas or structures

      That is a good observation. However, all features were calculated from the same duration of data, except in Figure 3B, where we tested the effect of duration. The averaged PSTH was calculated from the same length of data as the ISI distribution and binned to have the same number of features as the ISI distribution (refer to Methods section). Therefore, we interpreted this as an indication of information degradation through averaging, rather than an effect of data length (Line 234 - 237).

      (3) Figure 3D: A Gaussian is used to fit the ISI distributions here but ISI distributions do not follow a normal distribution, they follow an inverse gamma distribution.

      We agree with the reviewer, and we are familiar with the literature showing that the ISI distribution is best fitted by a gamma-family distribution (as a recent, but not the earliest, example: Li et al. 2018). However, we did not fit a Gaussian (or any distribution) to the data; we simply calculated the sample mean and variance. Reporting the sample mean and variance (or standard deviation) is not restricted to Gaussian distributions. They are broadly used metrics that simply have additional intrinsic meaning for Gaussian distributions. We used the schematic illustration in Fig 3D because mean and variance are much more familiar in the context of a Gaussian distribution, but ultimately that does not affect our analyses in Fig 3 E-F. Alternatively, the alpha and beta intrinsic parameters of a gamma distribution could have been used, but they are known by a much smaller portion of neuroscientists.

      Li, M., Xie, K., Kuang, H., Liu, J., Wang, D., Fox, G. E., ... & Tsien, J. Z. (2018). Spike-timing pattern operates as gamma-distribution across cell types, regions and animal species and is essential for naturally-occurring cognitive states. bioRxiv, 145813. https://doi.org/10.1101/145813
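
      To illustrate why the choice between the two descriptions does not change the information carried, the sketch below converts a sample mean and variance into gamma parameters by the method of moments. It is an illustration only, not part of our analysis pipeline. The function name is hypothetical, and we assume the common convention in which alpha is the shape and beta is the rate, so that mean = alpha/beta and variance = alpha/beta².

```python
from statistics import fmean, variance


def gamma_from_moments(isis):
    """Method-of-moments gamma parameters (alpha = shape, beta = rate)."""
    mean = fmean(isis)
    var = variance(isis)  # unbiased sample variance (n - 1 denominator)
    if var == 0:
        raise ValueError("variance is zero; gamma parameters are undefined")
    alpha = mean * mean / var  # shape
    beta = mean / var          # rate
    return alpha, beta


# Toy inter-spike intervals in milliseconds (made-up values).
alpha, beta = gamma_from_moments([1.0, 2.0, 3.0, 4.0])
print(alpha, beta)
```

      Because the mapping between (mean, variance) and (alpha, beta) is one-to-one whenever the variance is positive, either pair could serve as the two summary features.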

      (4) Figure 3G: Something is wrong with this figure as each vertical bar is supposed to represent a drifting grating onset, yet they are all at 5 Hz despite the PSTH being purportedly shown at many different frequencies from 1 to 15 Hz.

      We appreciate your attention to detail, but we are not representing the onset of individual drifting gratings in this figure. We only meant to represent the overall start/end of the drifting grating session. We did not intend to signal the temporal frequency of the drifting gratings (or the spatial frequency, orientation, or contrast).

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      The mechanism, as I understand it, is different from what the authors described before in the RNN with tonic gain changes. As uncertainty increases, the network enters a regime in which the two excitatory populations start to oscillate. My intuition is that this oscillation arises from the feedback loop created by the new gain control mechanism. If my intuition is correct, I think it would be worth to explain this mechanism in the paper more explicitly.

      While interesting, this intuition is not correct. The oscillations are generated by the interaction between excitatory and inhibitory nodes in the network and occur in the model even with stationary gain. All of the plots in figure 3 exploring the dynamical regime of the network at different input x gain combinations (i.e., where the oscillatory regime is characterised) are simulations run with stationary gain.

      To ensure that this intuition is more clearly presented in the manuscript, we have edited the description in the text.

      P. 12: “Because of the large size of the network, we could not solve for the fixed points or study their stability analytically. Instead, we opted for a numerical approach and characterised the dynamical regime (i.e. the location and existence of approximate fixed-point attractors) across all combinations of (static) gain and  visited by the network.”

      Reviewer #2 (Public review):

      - The demonstration of the causal role of gain modulation in perceptual switches is partial. This causality is clearly demonstrated in the simulation work with the RNN. However, it is not fully demonstrated in the pupil analysis and the fMRI analysis. One reason is that this work is correlative (which is already very informative). An analysis of the timing of the effect might have overcome this limitation. For example, in a previous study, the same group showed that fMRI activity in the LC region precedes changes in the energy landscape of fMRI dynamics, which is a step towards investigating causal links between gain modulation, changes in the energy landscape and perceptual switches.

      Thank you for the suggestion, which we considered in detail. Unfortunately, the temporal and spatial resolution of the fMRI data collected for this study precluded the same analyses we’ve run in previous work; however, this is an important question for future work.

      - Some effects may reflect the expectation of a perceptual switch rather than the perceptual switch itself. To mitigate this risk, the design of the fMRI task included catch trials, in which no switch occurs, to reduce the expectation of a switch. The pupil study, however, did not include such catch trials.

      We agree that this is a limitation of the current study, which we previously highlighted in the methods section.

      - The paper uses RNN-based modelling to provide mechanistic insight into the role of gain modulation in perceptual switches. However, the RNN solves a task that differs markedly from that performed by human participants, which may limit the explanatory value of the model. The RNN is provided with two inputs characterising the sensory evidence supporting the first and last image category in the sequence (e.g. plane and shark). In contrast, observers in the task were naïve as to the identity of the last image at the beginning of the sequence. The brain first receives sensory evidence about the image category (e.g. plane) with which the sequence begins, which is very easy to recognise, then it sees a sequence of morphed images and has to discover what the final image category will be. To discover the final image category, the brain has to search a vast space of possible second images (is it a shark?, a frog?, a bird?, etc.), rather than comparing the likelihood of just two categories. This search process and the perceptual switch in the task appear to be mechanistically different from the competition between two inputs in the RNN.

      We appreciate the critical analysis of the experimental paradigm but disagree with the reviewer’s conclusions for two key reasons: 1) participants had prior exposure to the images, such that they could create an expectation about what stimulus category a particular image would transition into (i.e., the image could not switch into any possible category); and 2) even if the reviewer’s concern was founded, models of K winner-take-all decision making are structured identically irrespective of whether there are 2 or K options; all that changes is the simulated reaction times, which depend linearly on K (for an example model see Hugh Wilson’s textbook Spikes, Decisions, and Actions, 1999, p.89-91). For these reasons, we maintain that the RNN is a sensible representation of the behavioural task.

      - Another aspect of the motivation for the RNN model remains unclear. The authors introduce dynamic gain modulation in the RNN, but it is not clear what the added value of dynamic gain modulation is. Both static (Fig. S1) and dynamic (Fig. 2F) gain modulation lead to the predicted effect: faster switching when the gain is larger.

      While we agree that the effect is observable with both static and dynamic gain, the dynamic approach has stronger construct validity, including a stronger link with the observed pupil dynamics and with a rich literature on modelling the behavioural consequences of surprise/uncertainty. This led us to conclude that the dynamic approach was a better representation of our hypothesis.

      - Fig 1C: I don't see a "top grey bar" indicating significance.

      Thank you for catching this, the caption has been amended. The text was from an older version of the manuscript.

      - p. 10, reference to fig 3F seems incorrect: there is Fig 3F upper and Fig 3F lower, and nothing on Fig 3 and its legend mention the lesion of units

      This has been amended. We meant to refer to 2F.

      - In the response letter you mention a MATLAB tutorial, but I could not find it.

      This has been amended. The GitHub repository can be found at https://github.com/ShineLabUSYD/AmbiguousFigures

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript reports that expression of the E. coli operon topAI/yjhQ/yjhP is controlled by the translation status of a small open reading frame, that authors have discovered and named toiL, located in the leader region upstream of the operon. Authors propose the following model for topAI activation: Under normal conditions, toiL is translated but topAI is not expressed because of Rho-dependent transcription termination within the topAI ORF and because its ribosome binding site and start codon are trapped in an mRNA hairpin. Ribosome stalling at various codons of the toiL ORF, prompted in this work by some ribosome-targeting antibiotics, triggers an mRNA conformational switch which allows translation of topAI and, in addition, activation of the operon's transcription because presence of translating ribosomes at the topAI ORF blocks Rho from terminating transcription. The model is appealing and several of the experimental data mainly support it. However, it remains unanswered what is the true trigger of the translation arrest at toiL and what is the physiological role of the induced expression of the topAI/yjhQ/yjhP operon.

      Reviewer #2 (Public review):

      Summary:

      Baniulyte and Wade describe how translation of an 8-codon uORF denoted toiL upstream of the topAI-yjhQP operon is responsive to different ribosome-targeting antibiotics, consequently controlling translation of the TopAI toxin as well as Rho-dependent termination within the gene.

      Strengths:

      The authors used multiple different approaches such as a genetic screen to identify factors such as 23S rRNA mutations that affect topAI expression and ribosome profiling to examine the consequences of various antibiotics on toiL-mediated regulation.

      Weaknesses:

      Future experiments will be needed to better understand the physiological role of the toiL-mediated regulation and elucidate the mechanism of specific antibiotic sensing.

      The results are clearly described, and the revisions have helped to improve the presentation of the data.

      Reviewer #3 (Public review):

      In this revised manuscript, the authors provide convincing data to support an elegant model in which ribosome stalling by ToiL promotes downstream topAI translation and prevents premature Rho-dependent transcription termination. However, the physiological consequences of activating topAI-yjhQP expression upon exposure to various ribosome-targeting antibiotics remain unresolved. The authors have satisfactorily addressed all major concerns raised by the reviewers, particularly regarding the SHAPE-seq data. Overall, this study underscores the diversity of regulatory ribosome-stalling peptides in nature, highlighting ToiL's uniqueness in sensing multiple antibiotics and offering significant insights into bacterial gene regulation coordinated by transcription and translation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      - Showing the ribosome density profiles of topAI/yjhQP and toiL in control and tetracycline treated cells is necessary to support that ribosome arrest at toiL increases translation of topAI/yjhQP.

      Figure 7B shows ribosome density around the start of toiL. Ribosome density increases across topAI in the presence of tetracycline, but we have opted not to show this region because we cannot say whether the increase in ribosome occupancy (represented in Figure 7A) is due to an increase in translation efficiency, RNA level, or both.

      - The subinhibitory antibiotic concentrations used in the reporter assays were based on MICs reported in the literature. This is not appropriate since MICs can greatly vary between strains, antibiotic solution stocks, and experimental conditions.

      Reported MICs were used as an initial guide for selecting antibiotic concentrations to test in our reporter assays. We have added text to indicate this, and to highlight that MICs vary considerably between strains.

      - toiL sequence may have evolved to maintain base-pairing with the topAI upstream region rather than, as authors suggest in Discussion, to respond to antibiotic-mediated arrest in an amino acid sequence specific manner.

      We have chosen to frame this as speculation.

      - Authors may consider commenting on the possibility that chloramphenicol does not induce because ToiL lacks alanine residues, whose presence at specific places of a nascent protein have been shown to promote chloramphenicol action (2016 PNAS 113:12150; 2022 NSMB 29:152).

      This is a great point as none of our stalling reporters included an ORF with alanine. We now include a short paragraph in the Discussion section to raise this possibility.

      - Tetracycline was added at the "subinhibitory concentration" of 8 ug/mL for the reporter assays but at 1 ug/mL for the ribosome profiling experiments. Authors should explain what the rationale for this was.

      We think the reviewer is mixing up the epidemiological cut-off value of 8 ug/mL with the concentration used in experiments (0.5-1 ug/mL for reporter assays and ribosome profiling). The text was confusing, so we have added a sentence to the Methods section to indicate that epidemiological cut-off values and MICs were only a guide for selecting antibiotic concentrations to test.

      Reviewer #2 (Recommendations for the authors):

      I wish the authors had been slightly less dismissive of the reviewers' comments. At a minimum, it would be nice if the authors could be consistent about the ribosome representation throughout the manuscript;

      We apologize if our previous responses gave the impression of being dismissive. That was certainly not our intention. We greatly value the reviewers' feedback, and we appreciate the opportunity to clarify any misunderstandings. We believe the reviewer is referring to the different shape and color of the ribosome in Figures 8 and 9, and Figure 8 figure supplement 2, which we have now corrected.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public Review):

      Comments on revisions:

      Although the authors have revealed the sulfane sulfur content in native MT-3, my question, namely whether canonical MT-1 and MT-2 contained sulfane sulfur after induction, has been left unanswered.

      The authors argue that the biological significance of sulfane sulfur in MTs lies in its ability to contribute to metal binding affinity, provide a sensing mechanism against oxidative stress, and aid in the regulation of the protein. Due to their biological roles, induced MT-1 and MT-2 could contain sulfane sulfur in their molecules. Thus, I expect the authors to evaluate or explain the sulfane sulfur content in induced MT-1 and MT-2.

      Thank you for your valuable comments. In this study, we were not able to examine the role of sulfane sulfur in the induced forms of MT-1 and MT-2. However, this topic is undoubtedly important and intriguing; therefore, we will continue to explore it in future studies.

      Reviewer #3 (Public Review):

      Comments on revisions:

      The revised manuscript is only slightly changed from the original, with the inclusion of a supplementary figure (Fig. S2) and minor changes in the text. The authors did not choose to carry out the quantitative Zn binding experiment (which I really wanted to see), but given the complexities of the experiment, I'll let it go.

      Fig. 9: the authors imply in the mechanistic "redox-switch" figure that Trx/TR can not reduce persulfide linkages. A number of groups have shown this to be the case. I recommend modifying the figure legend or text to make this clear to the reader.

      Thank you for your understanding. Regarding the "redox-switch" figure, although some groups have demonstrated the ability of Trx to reduce persulfide moieties, as you pointed out, we have addressed this discrepancy in the Discussion section as follows (lines 357-361): “In contrast, Trx has been proposed to reduce the persulfide moiety of PTP1B (37) and albumin (38, 39). A possible explanation for this discrepancy is that apo-GIF/MT-3-persulfide is rapidly changed into a different conformation that is topologically resistant to Trx reduction. In other words, Trx may exhibit substrate specificity.” Additionally, we have inserted the following sentence just before the above discussion to further clarify this point: “This suggests that the persulfide moiety in GIF/MT-3 appears to be relatively stable against Trx reduction.”

    1. Author response:

      Reviewer 1:

      We thank the reviewer for his/her very positive comments.

      Reviewer 2:

      We thank the reviewer for his/her positive evaluation. We plan to add RNAseq data of yeast wild-type and JDP mutant strains as more direct readout for the role of Apj1 in controlling Hsf1 activity. We agree with the reviewer that our study includes one major finding: the central role of Apj1 in controlling the attenuation phase of the heat shock response. In accordance with the reviewer we consider this finding highly relevant and interesting for a broad readership. We agree that additional studies are now necessary to mechanistically dissect how the diverse JDPs support Hsp70 in controlling Hsf1 activity. We believe that such analysis should be part of an independent study but we will indicate this aspect as part of an outlook in the discussion section of a revised manuscript.

      Reviewer 3:

      We thank the reviewer for his/her suggestions. We agree that it is sometimes difficult to distinguish direct effects of JDP mutants on heat shock regulation from indirect ones, which can result from the accumulation of misfolded proteins that titrate Hsp70 capacity. We also agree that an in vitro reconstitution of Hsf1 displacement from DNA by Apj1/Hsp70 will be important, also to dissect Apj1 function mechanistically. We will add this point as outlook to the revised manuscript.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife Assessment

      This important and creative study finds that the uplift of the Qinghai-Tibet Plateau-via its resultant monsoon system rather than solely its high elevation-has shifted avian migratory directions from a latitudinal to a longitudinal orientation. However, the main claims are incomplete and only partially supported, as the reliance on eBird data-which lacks the resolution to capture population-specific teleconnections-combined with a limited tracking dataset covering only seven species leaves key aspects of the argument underdetermined, and the critical assumption of niche conservatism is not sufficiently foregrounded in the manuscript. More clearly communicating these limitations would significantly enhance the interpretability of the results, ensuring that the major conclusions are presented in the context of these essential caveats.

      We appreciate your positive comments and constructive suggestions. We fully acknowledge your concerns about clearly communicating the limitations associated with the data used and analytical assumptions. We will try to get more satellite tracking data of birds migrating across the plateau. We will carefully consider the insights that our paper can deliver and make sure the limitations of our datasets and the critical assumption of niche conservatism are clearly presented. By explicitly clarifying these caveats, we believe the transparency and interpretability of the findings will be much improved.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors have done a good job of responding to the reviewer's comments, and the paper is now much improved.

      Again, we thank the reviewer for constructive comments during review.

      Reviewer #2 (Public review):

      I would like to thank the authors for the revision and the input they invested in this study.

      We are grateful for your thoughtful feedback and enthusiasm, which will help us improve our manuscript.

      With the revised text of the study, my earlier criticism holds, and your arguments about the counterfactual approach are irrelevant to that. The recent rise of the counterfactual approach might likely mirror the fact that there are too many scientists behind their computers, and few go into the field to collect in situ data. Studies like the one presented here are a good intellectual exercise but the real impact is questionable.

      We understand your question about the relevance of the counterfactual approach used in our study. Our intent in using a counterfactual scenario (reconstructing migration patterns assuming pre-uplift conditions on the QTP) was to isolate the potential influence of the plateau’s geological history on current migration routes. We agree that such an approach must be used properly. In the revision, we will explicitly clarify why this counterfactual comparison is useful: it provides a theoretical baseline to test how much the QTP’s uplift (and the associated monsoon system) might have redirected migration paths. We acknowledge that the counterfactual results are theoretical and will explicitly emphasise the assumptions involved (e.g. that species–environment relationships hold between pre- and post-uplift environments) in the main text. Nonetheless, we defend the approach as a valuable study design: it helps generate testable hypotheses about migration (for instance, that the plateau’s monsoon-driven climate, rather than just its elevation, introduces an east–west shift en route). We will also tone down the language around this analysis to avoid overstating its real-world relevance. In summary, we will clarify that the counterfactual analysis is meant to complement, not replace, empirical observations, and we will discuss its limitations so that its role is appropriately bounded in the paper.

      All your main conclusions are inferred from published studies on 7! bird species. In addition, spatial sampling in those seven species was not ideal in relation to your target questions. Thus, no matter how fancy your findings look, the basic fact remains that your input data were for 7 bird species only! Your conclusion, “our study provides a novel understanding of how QTP shapes migration patterns of birds” is simply overstretching.

      Thank you for your comments. We apologise for any confusion regarding the scope of our dataset. Our main conclusions are not solely derived from seven bird species. Rather, we compiled a full list of 50 bird species that migrate across the QTP and analysed their migratory patterns with eBird data. We then studied the factors influencing their choice of migratory routes using seven species that were among the few with available tracking data across the QTP. In this revision, we will clarify the role of these seven species and the rationale for their selection. Additionally, we will attempt to include more satellite tracking data to improve spatial coverage, as recommended by the reviewer and editor. Based on discussions with potential collaborators, we hope to include at least 10 more species with available tracking data.

      The way you respond to my criticism on L 81-93 is something different than what you admit in the rebuttal letter. The text of the ms is silent about the drawbacks and instead highlights your perspective. I understand you; you are trying to sell the story in a nice wrapper. In the rebuttal you state: “we assume species' responses to environments are conservative and their evolution should not discount our findings.” But I do not see that clearly stated in the main text.

      Thanks, as suggested we will clearly state the assumptions of niche conservatism in the Introduction.

      In your rebuttal, you respond to my criticism of "No matter how good the data eBird provides is, you do not know population-specific connections between wintering and breeding sites" when you responded: ... "we can track the movement of species every week, and capture the breeding and wintering areas for specific populations" I am having a feeling that you either play with words with me or do not understand that from eBird data nobody will be ever able to estimate population-specific teleconnections between breeding and wintering areas. It is simply impossible as you do not track individuals. eBird gives you a global picture per species but not for particular populations. You cannot resolve this critical drawback of your study.

      We agree that inferring population-specific migratory connections (teleconnections) from eBird data is challenging and inherently limited. eBird provides occurrence records for species, but it generally cannot distinguish which breeding population an individual bird came from or exactly where it goes for winter. However, in this study we intend to infer broad-scale movement patterns (e.g. general directions and stopover regions) rather than precise one-to-one population linkages. In the revision, we will carefully rephrase those sections to make clear that our inferences are at the species level and at large spatial scales. We will also explicitly state in the Discussion that confirming population connectivity would require targeted tracking or genetic studies, and that our eBird-based analysis can only suggest plausible routes and region-to-region linkages. We will contrast migratory routes identified by using eBird data and satellite tracking for the same species to check their similarity. We argue that, even with its limits, the eBird dataset can still yield useful insights (such as identifying major flyway corridors over the QTP).

      I am sorry that you invested so much energy into this study, but I see it as a very limited contribution to understanding the role of a major barrier in shaping migration.

      Thank you for recognising our efforts in the study. By integrating both satellite tracking and community-contributed data, we explored how the uplift of the QTP could shape avian migration across the area. We believe our findings provide important insights into how birds balance their responses to large-scale climate change and geological barriers, yielding the most comprehensive picture to date of how the QTP uplift shapes the migratory patterns of birds. We will also acknowledge the study’s limitations to ensure that readers understand the context and constraints of our findings.

      My modest suggestion for you is: go into the field. Ideally use bird radars along the plateau to document whether the birds shift the directions when facing the barrier.

      We appreciate your suggestions to incorporate field tracking or radar studies to strengthen our results. All coauthors have years of field experience, including on the QTP and in the Arctic. For example, the tracking data of peregrine falcons (Falco peregrinus) that we will incorporate in the revision were collected during our own fieldwork in the Arctic over more than six years. We agree that more direct tracking (through GPS tagging or radar) would be an ideal way to validate migration pathways and population connectivity. In this revision, as stated above, we will try to include more species with satellite tracking data. We will also note that future studies should build on our findings by using dedicated tracking of more individual birds and radar monitoring of migration over the QTP. We will cite recent advances in these techniques and suggest that incorporating more tracking data could further test the hypotheses generated by our analyses.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      L55 "an important animal movement behaviour is.." Is there any unimportant animal movement? I mean this sentence is floppy, empty.

      We will rewrite this sentence to remove any ambiguous phrasing.

      L 152-154 This sentence is full of nonsense or your misinterpretation. First of all, the issue of inflexible initiation of migration was related to long-distance migrants only! The way you present it mixes apples and oranges (long- and short-distance migrants). It is not "owing to insufficient responses" but due to inherited patterns of when to take off, photoperiod and local conditions.

      We will remove the sentence to avoid misinterpretation.

      L 158 what is a migration circle? I do not know such a term.

      We will amend it as “annual migration cycle”, which is a more common way to describe the yearly round-trip journey between breeding and wintering grounds of birds.

      L 193 The way you present and mix capital and income breeding theory with your simulation study is quite tricky and super speculative.

      We will present this idea as an inference rather than a conclusion: “This pattern could be consistent with a ‘capital breeding’ strategy — where birds rely on energy reserves acquired before breeding — rather than an ‘income’ strategy that depends on food acquired during breeding. However, we note that this interpretation would require further study.” By adding this caution, we will make it clear that we are not asserting this link as proven fact, only suggesting it as one possible explanation. We will also double-check that the rest of the discussion around this point is framed appropriately.


      The following is the authors’ response to the previous reviews.

      eLife Assessment

      This study addresses a novel and interesting question about how the rise of the Qinghai-Tibet Plateau influenced patterns of bird migration, employing a multi-faceted approach that combines species distribution data with environmental modeling. The findings are valuable for understanding avian migration within a subfield, but the strength of evidence is incomplete due to critical methodological assumptions about historical species-environment correlations, limited tracking data, and insufficient clarity in species selection criteria. Addressing these weaknesses would significantly enhance the reliability and interpretability of the results.

      We would like to thank you and two anonymous reviewers for your careful, thoughtful, and constructive feedback on our manuscript. These reviews made us revisit a lot of our assumptions and we believe the paper is much improved as a result. In addition to minor points, we have made three main changes to our manuscript in response to the reviews. First, we addressed the concerns on the assumptions of historical species-environment correlations from perspectives of both theoretical and empirical evidence. Second, we discussed the benefits and limitations of using tracking data in our study and demonstrate how the findings of our study are consolidated with results of previous studies. Third, we clarified our criteria for selecting species in terms of both eBird and tracking data.

      Below, we respond to each comment in turn. Once again, we thank you all for your feedback.

      Public Reviews:

      Reviewer #1 (Public review):

      Strengths:

      This is an interesting topic and a novel theme. The visualisations and presentation are to a very high standard. The Introduction is very well-written and introduces the main concepts well, with a clear logical structure and good use of the literature. The methods are detailed and well described and written in such a fashion that they are transparent and repeatable.

      We are appreciative of the reviewer’s careful reading of our manuscript, encouraging comments and constructive suggestions.

      Weaknesses:

      I only have one major issue, which is possibly a product of the structure requirements of the paper/journal. This relates to the Results and Discussion, line 91 onwards. I understand the structure of the paper necessitates delving immediately into the results, but it is quite hard to follow due to a lack of background information. In comparison to the Methods, which are incredibly detailed, the Results in the main section reads as quite superficial. They provide broad overviews of broad findings but I found it very hard to actually get a picture of the main results in its current form. For example, how the different species factor in, etc.

      Yes, the journal requires this format (Methods following the Results and Discussion) for the short report article type. As suggested, in the revision we have elaborated on the details of our findings, in terms of (i) shifts in the distribution of avian breeding and wintering areas under the influence of the uplift of the Qinghai-Tibet Plateau (Lines 102-116), and (ii) major factors that shape current migration patterns of birds in the plateau (Lines 118-138). We have also better referenced the approaches we used in the study.

      Reviewer #2 (Public review):

      Summary:

      The study tries to assess how the rise of the Qinghai-Tibet Plateau affected patterns of bird migration between their breeding and wintering sites. They do so by correlating the present distribution of the species with a set of environmental variables. The data on species distributions come from eBird. The main issue lies in the problematic assumption that species correlations between their current distribution and environment were about the same before the rise of the Plateau. There is no ground truthing and the study relies on Movebank data of only 7 species which are not even listed in the study. Similarly, the study does not outline the boundaries of breeding sites NE of the Plateau. Thus it is absolutely unclear potentially which breeding populations it covers.

      We are very grateful for the careful review and helpful suggestions. We have revised the manuscript carefully in response to the reviewer’s comments and believe that it is much improved as a result. Below are our point-by-point replies to the comments.

      Strengths:

      I like the approach for how you combined various environmental datasets for the modelling part.

      We appreciate the reviewer’s encouragement.

      Weaknesses:

      The major weakness of the study lies in the assumption that species correlations between their current distribution and environments found today are back-projected to the far past before the rise of the Q-T Plateau. This would mean that species responses to the environmental cues do not evolve which is clearly not true. Thus, your study is a very nice intellectual exercise of too many ifs.

      This is a valid concern. We have addressed this from both the perspectives of the theoretical design of our study and empirical evidence.

      First, we agree with the reviewer that species responses to environmental cues might vary over time. Nonetheless, the simulated environments before the uplift of the plateau serve as a counterfactual state in our study. The counterfactual is an important concept for supporting causal claims by comparing what happened to what would have happened in a hypothetical situation: “If event X had not occurred, event Y would not have occurred” (Lewis 1973). Recent years have seen an increasing application of the counterfactual approach to detect biodiversity change, i.e., comparing diversity between the counterfactual state and real estimates to attribute the factors causing such changes (e.g., Gonzalez et al. 2023). Whilst we do not aim to provide causal inferences for avian distributional change, the counterfactual approach allows us to estimate the influence of the plateau uplift by detecting changes in avian distributions, i.e., by comparing where the birds would have been distributed without the plateau to where they are currently distributed. We regard the counterfactual environments as a powerful tool for eliminating vagueness, to the extent possible, as opposed to a simple description of the current distributions of birds. Therefore, we assume species’ responses to environments are conservative and their evolution should not discount our findings. We have clarified this in the Introduction (Lines 81-93).

      Second, we used species distribution modelling to contrast the distributions of birds before and after the uplift of the plateau under the assumption that species tend to keep their ancestral ecological traits over time (i.e., niche conservatism). This indicates a high probability for species to distribute in similar environments wherever suitable. Particularly, considering bird distributions are more likely to be influenced by food resources and vegetation distributions (Qu et al. 2010, Li et al. 2021, Martins et al. 2024), and the available food and vegetation before the uplift can provide suitable habitats for birds (Jia et al. 2020), we believe the findings can provide valuable insights into the influence of the plateau rise on avian migratory patterns. Having said that, we acknowledge other factors, e.g., carbon dioxide concentrations (Zhang et al. 2022), can influence the simulations of environments and our prediction of avian distribution. We have clarified the assumptions and evidence we have for the modelling in Methods (Lines 362-370).

      The second major drawback lies in the way you estimate the migratory routes of particular birds. No matter how good the data eBird provides is, you do not know population-specific connections between wintering and breeding sites. Some might overwinter in India, some populations in Africa and you will never know the teleconnections between breeding and wintering sites of particular species. The few available tracking studies (seven!) are too coarse and with limited aspects of migratory connectivity to give answer on the target questions of your study.

      We agree with the reviewer that establishing interconnections for birds is important for estimating the migration patterns of birds. We employed a dynamic model to assess their weekly distributions. Thus, we can track the movement of species every week, and capture the breeding and wintering areas for specific populations. That being said, we acknowledge that our approach can be subject to the patchy sampling of eBird data. In contrast, tracking data can provide detailed information on the movement patterns of species but are limited to small numbers of species due to the considerable costs and time needed. We aimed to use the tracking data to examine the influence of focal factors on avian migration patterns, but despite our best efforts we were able to acquire data for only seven species. Moreover, similar results were found in studies that used tracking data to estimate the distribution of breeding and wintering areas of birds in the plateau (e.g., Prosser et al. 2011, Zhang et al. 2011, Zhang et al. 2014, Liu et al. 2018, Kumar et al. 2020, Wang et al. 2020, Pu and Guo 2023, Yu et al. 2024, Zhao et al. 2024). We believe the conclusions based on seven species are rigorous, but their implications could be restricted by the number of tracked species we obtained. We have better demonstrated how our findings on breeding and wintering areas of birds are reinforced by other studies reporting the locations of those areas. We have also added a separate caveat section to discuss the limitations stated above (Lines 202-215).

      Your set of species is unclear, selection criteria for the 50 species are unknown and variability in their migratory strategies is likely to affect the direction of the effects.

      In this revision, we have clarified the selection criteria for the 50 species and outlined the boundaries of the breeding areas of all birds (Lines 243-249). Briefly, we first obtained a full list of birds in the plateau from Prins and Namgail (2017). We then extracted species identified as full migrants in Birdlife International (https://datazone.birdlife.org/species/spcdistPOS) from the full list. Migratory birds may follow a capital or income migratory strategy depending on how much they rely on endogenous energy reserves gained prior to reproduction. We have added discussions on how these migratory strategies might influence the effects of environment on migratory direction (Lines 183-200).

      In addition, the position of the breeding sites relative to the Q-T plate will affect the azimuths and resulting migratory flyways. So in fact, we have no idea what your estimates mean in Figure 2.

      We calculated the azimuths not only from the angles between breeding sites and wintering sites but also from the angles between the stopovers of birds. Therefore, the azimuths are influenced by the relative positions of breeding, wintering and stopover sites. This minimizes the errors that could arise from using breeding areas alone, such as the biases caused by the location of breeding areas relative to the QTP, as the reviewer pointed out. We have better explained this in the Introduction, Methods and legend of Figure 2.
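
      The leg-by-leg azimuth calculation described above can be sketched as follows. This is an illustration only, not the code used in the study: it assumes the standard initial great-circle bearing formula on a spherical Earth, and the function names and example coordinates are hypothetical.

```python
import math

def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2,
    in degrees clockwise from north, in the range [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360.0

def leg_azimuths(sites):
    """Azimuth of every leg along an ordered route of (lat, lon) sites,
    e.g. breeding area -> stopover(s) -> wintering area."""
    return [initial_bearing(a[0], a[1], b[0], b[1])
            for a, b in zip(sites, sites[1:])]

# Hypothetical route: breeding site, one stopover, wintering site.
route = [(10.0, 0.0), (0.0, 0.0), (0.0, 10.0)]
print(leg_azimuths(route))
```

      Because each leg contributes its own azimuth, a route that heads south and then turns east is recorded as two different directions rather than as a single breeding-to-wintering angle.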

      There is no way one can assess the performance of your statistical exercises, e.g. performances of the models.

      As suggested, we have reported the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) to assess the performance of the models (Table S1). AUC is a threshold-independent measure of the ability to discriminate between presence and random points (Phillips et al. 2006). A model was considered good when its AUC value was higher than 0.75 (Elith et al. 2006) (Lines 379-383).
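
      To make the AUC criterion concrete, the sketch below computes AUC from its rank-based definition: the probability that a randomly chosen presence point receives a higher predicted suitability than a randomly chosen random (background) point, with ties counted as one half. It is a minimal illustration with made-up scores and hypothetical function names, not the evaluation code used in the study.

```python
def auc(presence_scores, background_scores):
    """Rank-based AUC: share of (presence, background) pairs in which
    the presence point has the higher score; ties count as 0.5."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))

def is_good_model(auc_value, threshold=0.75):
    """Apply the 'higher than 0.75' rule quoted in the response."""
    return auc_value > threshold

# Made-up predicted suitabilities for illustration only.
presence = [0.9, 0.8, 0.6]
background = [0.7, 0.3, 0.2]
value = auc(presence, background)
print(value, is_good_model(value))
```

      Because the measure depends only on the ordering of scores, it does not require choosing a presence/absence threshold, which is why it is described as threshold-independent.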

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      This is an interesting topic and a novel theme. The visualisations and presentation are to a very high standard. The Introduction is very well-written and introduces the main concepts well, with a clear logical structure and good use of the literature. The Methods are detailed and well described and written in such a fashion that they are transparent and repeatable.

      I only have one major issue, which is possibly a product of the structure requirements of the paper/journal. With the Results and Discussion, line 91 onwards. I understand the structure of the paper necessitates delving immediately into the results, but it is quite hard to follow due to a lack of background information. In comparison to the Methods, which are incredibly detailed, the Results in the main section read quite superficial. They provide broad overviews of broad findings but I found it very hard to actually get a picture of the main results in its current form. For example, how the different species factor in, etc.

      Please see our responses above.

      Reviewer #2 (Recommendations for the authors):

      Methodological issues:

      Line 219 Why have you selected only 64 species and what were the selection criteria?

      We have clarified the selection criteria (Lines 243-248). Briefly, we first obtained a full list of birds in the plateau from Prins and Namgail (2017). We then extracted species identified as full migrants in Birdlife International (https://datazone.birdlife.org/species/spcdistPOS) from the full list.

      Minor:

      Line 219 eBird has very uneven distribution, especially in vast areas of Russia. How can your exercise on Lines 232-238 overcome this issue?

      Yes, eBird data can be biased due to patchy sampling and variation in observers’ skills in identifying species. To address this issue, we developed an adaptive spatio-temporal modelling package (stemflow; Chen et al. 2024) to correct the imbalanced distribution of data, and modelled observer experience to address the bias in species identification. stemflow is based on a machine learning modelling framework (AdaSTEM) that leverages the spatio-temporal adjacency information of sample points to model the occurrence or abundance of species at different scales. AdaSTEM has been frequently used in modelling eBird data (Fink et al. 2013, Johnston et al. 2015, Fink et al. 2020) and has proven efficient for multi-scale spatio-temporal data modelling. We have better explained this (Lines 251-270; Lines 307-321).

      Line 54 This sentence sounds very empty and in fact does not tell us much.

      We have adjusted this sentence to “Animal movement underpins species’ spatial distributions and ecosystem processes”.

      Line 55 Again a sentence that implies a causality of the annual cycle to make the species migrate. It does not make sense.

      We have revised this sentence as “An important animal movement behaviour is migrating between breeding and wintering grounds”.

      Line 58 How is our fascination with migratory journeys related to the present article? I think this line is empty.

      We have changed this sentence to “Those migratory journeys have intrigued a body of different approaches and indicators to describe and model migration, including migratory direction, speed, timing, distance, and staging periods”.

      Figure 1 - ABC insets are OK, but a combination of lati- and longitudinal patterns is possible, e.g. in species with conservative strategies or for whatever other reason.

      Thank you for the suggestion. We kept the ABC insets rather than combining them together as we believe this can deliver a clear structure of influence of QTP uplift under different scenarios.

      The legend to Figure 2 is not self-explanatory. Please make it clear what the response variable is and its units. The first line of the legend should read something like The influence of environmental factors on the direction of avian migration.

      Thank you. We have amended the legends of Figure 2 as suggested:

      “Figure 2. The influence of environmental factors on the direction of avian migration. Migratory directions are calculated based on the azimuths between adjacent stopover, breeding and wintering areas for each species. We employ multivariate linear regression models under the Bayesian framework to measure the correlation between environmental factors and avian migratory directions. Wind represents the wind cost calculated by wind connectivity. Vegetation is measured by the proportion of average vegetation cover in each pixel (~1.9° in latitude by 2.5° in longitude). Temperature is the average annual temperature. Precipitation is the average yearly precipitation. All environmental layers are obtained using the Community Earth System Model. West QTP, central QTP, and East QTP denote the western (longitude < 73°E), central (73°E ≤ longitude < 105°E), and eastern (longitude ≥ 105°E) areas relative to the Qinghai-Tibet Plateau, respectively.”
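
      The three longitude bands given in the legend can be expressed as a small lookup. The sketch below only restates the cut-offs quoted above (73°E and 105°E); the function name and the region labels are hypothetical and are not taken from the study's code.

```python
def qtp_region(longitude_east):
    """Assign a point to a Figure 2 region from its longitude in degrees East."""
    if longitude_east < 73.0:
        return "west"
    if longitude_east < 105.0:
        return "central"
    return "east"

# Illustrative longitudes only.
for lon in (60.0, 73.0, 90.0, 105.0, 120.0):
    print(lon, qtp_region(lon))
```

      The boundaries follow the legend's inequalities, so 73°E falls in the central band and 105°E in the eastern band.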

      References

      Chen, Y., Z. Gu, and X. Zhan. 2024. stemflow: A Python Package for Adaptive Spatio-Temporal Exploratory Model. Journal of Open Source Software 9:6158.

      Elith, J., C. H. Graham, R. P. Anderson, M. Dudík, S. Ferrier, A. Guisan, R. J. Hijmans, F. Huettmann, J. R. Leathwick, A. Lehmann, J. Li, L. G. Lohmann, B. A. Loiselle, G. Manion, C. Moritz, M. Nakamura, Y. Nakazawa, J. McC. M. Overton, A. Townsend Peterson, S. J. Phillips, K. Richardson, R. Scachetti-Pereira, R. E. Schapire, J. Soberón, S. Williams, M. S. Wisz, and N. E. Zimmermann. 2006. Novel methods improve prediction of species' distributions from occurrence data. Ecography 29:129-151.

      Fink, D., T. Auer, A. Johnston, V. Ruiz-Gutierrez, W. M. Hochachka, and S. Kelling. 2020. Modeling avian full annual cycle distribution and population trends with citizen science data. Ecological Applications 30:e02056.

      Fink, D., T. Damoulas, and J. Dave. 2013. Adaptive Spatio-Temporal Exploratory Models: Hemisphere-wide species distributions from massively crowdsourced eBird data. Pages 1284-1290 in Proceedings of the AAAI Conference on Artificial Intelligence.

      Gonzalez, A., J. M. Chase, and M. I. O'Connor. 2023. A framework for the detection and attribution of biodiversity change. Philosophical Transactions of the Royal Society B: Biological Sciences 378.

      Jia, Y., H. Wu, S. Zhu, Q. Li, C. Zhang, Y. Yu, and A. Sun. 2020. Cenozoic aridification in Northwest China evidenced by paleovegetation evolution. Palaeogeography, Palaeoclimatology, Palaeoecology 557:109907.

      Johnston, A., D. Fink, M. D. Reynolds, W. M. Hochachka, B. L. Sullivan, N. E. Bruns, E. Hallstein, M. S. Merrifield, S. Matsumoto, and S. Kelling. 2015. Abundance models improve spatial and temporal prioritization of conservation resources. Ecological Applications 25:1749-1756.

      Kumar, N., U. Gupta, Y. V. Jhala, Q. Qureshi, A. G. Gosler, and F. Sergio. 2020. GPS-telemetry unveils the regular high-elevation crossing of the Himalayas by a migratory raptor: implications for definition of a “Central Asian Flyway”. Scientific Reports 10:15988.

      Lewis, D. 1973. Counterfactuals. Oxford: Blackwell.

      Li, S.-F., P. J. Valdes, A. Farnsworth, T. Davies-Barnard, T. Su, D. J. Lunt, R. A. Spicer, J. Liu, W.-Y.-D. Deng, J. Huang, H. Tang, A. Ridgwell, L.-L. Chen, and Z.-K. Zhou. 2021. Orographic evolution of northern Tibet shaped vegetation and plant diversity in eastern Asia. Science Advances 7:eabc7741.

      Liu, D., G. Zhang, H. Jiang, and J. Lu. 2018. Detours in long-distance migration across the Qinghai-Tibetan Plateau: individual consistency and habitat associations. PeerJ 6:e4304.

      Martins, L. P., D. B. Stouffer, P. G. Blendinger, K. Böhning-Gaese, J. M. Costa, D. M. Dehling, C. I. Donatti, C. Emer, M. Galetti, R. Heleno, Í. Menezes, J. C. Morante-Filho, M. C. Muñoz, E. L. Neuschulz, M. A. Pizo, M. Quitián, R. A. Ruggera, F. Saavedra, V. Santillán, M. Schleuning, L. P. da Silva, F. Ribeiro da Silva, J. A. Tobias, A. Traveset, M. G. R. Vollstädt, and J. M. Tylianakis. 2024. Birds optimize fruit size consumed near their geographic range limits. Science 385:331-336.

      Phillips, S. J., R. P. Anderson, and R. E. Schapire. 2006. Maximum entropy modeling of species geographic distributions. Ecological Modelling 190:231-259.

      Prins, H. H. T., and T. Namgail. 2017. Bird migration across the Himalayas: wetland functioning amidst mountains and glaciers. Cambridge University Press, Cambridge.

      Prosser, D. J., P. Cui, J. Y. Takekawa, M. Tang, Y. Hou, B. M. Collins, B. Yan, N. J. Hill, T. Li, Y. Li, F. Lei, S. Guo, Z. Xing, Y. He, Y. Zhou, D. C. Douglas, W. M. Perry, and S. H. Newman. 2011. Wild Bird Migration across the Qinghai-Tibetan Plateau: A Transmission Route for Highly Pathogenic H5N1. Plos One 6:e17622.

      Pu, Z., and Y. Guo. 2023. Autumn migration of black-necked crane (Grus nigricollis) on the Qinghai-Tibetan and Yunnan-Guizhou plateaus. Ecology and Evolution 13:e10492.

      Qu, Y., F. Lei, R. Zhang, and X. Lu. 2010. Comparative phylogeography of five avian species: implications for Pleistocene evolutionary history in the Qinghai-Tibetan plateau. Molecular Ecology 19:338-351.

      Wang, Y., C. Mi, and Y. Guo. 2020. Satellite tracking reveals a new migration route of black-necked cranes (Grus nigricollis) in Qinghai-Tibet Plateau. PeerJ 8:e9715.

      Yu, X., G. Song, H. Wang, Q. Wei, C. Jia, and F. Lei. 2024. Migratory flyways and connectivity of Brown Headed Gulls (Chroicocephalus brunnicephalus) revealed by GPS tracking. Global Ecology and Conservation 56:e03340.

      Zhang, G.-G., D.-P. Liu, Y.-Q. Hou, H.-X. Jiang, M. Dai, F.-W. Qian, J. Lu, T. Ma, L.-X. Chen, and Z. Xing. 2014. Migration routes and stopover sites of Pallas’s Gulls Larus ichthyaetus breeding at Qinghai Lake, China, determined by satellite tracking. Forktail 30:104-108.

      Zhang, G.-G., D.-P. Liu, Y.-Q. Hou, H.-X. Jiang, M. Dai, F.-W. Qian, J. Lu, Z. Xing, and F.-S. Li. 2011. Migration Routes and Stop-Over Sites Determined with Satellite Tracking of Bar-Headed Geese (Anser indicus) Breeding at Qinghai Lake, China. Waterbirds 34:112-116.

      Zhang, R., D. Jiang, C. Zhang, and Z. Zhang. 2022. Distinct effects of Tibetan Plateau growth and global cooling on the eastern and central Asian climates during the Cenozoic. Global and Planetary Change 218:103969.

      Zhao, T., W. Heim, R. Nussbaumer, M. van Toor, G. Zhang, A. Andersson, J. Bäckman, Z. Liu, G. Song, M. Hellström, J. Roved, Y. Liu, S. Bensch, B. Wertheim, F. Lei, and B. Helm. 2024. Seasonal migration patterns of Siberian Rubythroat (Calliope calliope) facing the Qinghai–Tibet Plateau. Movement Ecology 12:54.

    1. Author response:

      Reviewer #1 (Public Review):

      Overall, I find only two minor weaknesses. First, the insights of this study are, first and foremost, of feed-forward nature, and a feed-forward network would have been enough (and the more parsimonious model) to illustrate the results. While using a recurrent neural network (RNN) shows that the results are, in general, compatible with recurrent dynamics, the specific limitations imposed by RNNs (e.g., dynamical stability, low-dimensional internal dynamics) are not the focus of this study. Indeed, the additional RNN models in the supplementary material show that under more constrained conditions for the RNN (low-dimensional dynamics), using the input control alone runs into difficulties.

      We thank the reviewer for raising this important point. While we agree that recurrent dynamics were not the focus of this study, we would like to point out that 1) dynamics, of some kind, are necessary to simulate the decoder fitting process and 2) recurrent neural networks (RNNs) are valuable for obtaining general insights on how biological constraints shape the reachable manifold:

      (1) To simulate the decoder fitting process, we had to simulate neural activity during the so-called “calibration task”. Some dynamics to these responses are necessary to produce a population response with dimensionality resembling what was found in experiments (10 dimensions). Moreover, dynamics are necessary to create a common direction of high variance across population responses to the calibration task stimuli (see Supplementary Figure 2a and surrounding discussion), which is necessary to reproduce the biases in readouts demonstrated in Figure 4 (as many within-manifold decoder perturbations are aligned with it; Supplementary Figure 2b).

      Because feed-forward networks lack dynamics, reproducing our results with a feed-forward network would require using an input with dynamics. Rather than making an arbitrary choice for these input dynamics, we chose to keep the input static and instead generate the dynamics with an RNN, which is in line with recent models of motor cortex.
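      The contrast drawn above can be illustrated with a toy sketch. This is not the model from the manuscript: the two-unit linear network, its weights, the time step, and the function names are all assumptions chosen for illustration. It only shows that a static input driving recurrent dynamics produces time-varying activity, whereas a memoryless feed-forward map of the same static input does not.

```python
def simulate_rnn(u, steps=1000, dt=0.01):
    """Euler-integrate dx/dt = A x + u for a static input u.

    Returns the activity of the first unit at every time step.
    """
    # Hypothetical 2-unit recurrent weights: weak decay plus rotation.
    a = [[-0.1, -1.0], [1.0, -0.1]]
    x = [0.0, 0.0]
    trace = []
    for _ in range(steps):
        dx0 = a[0][0] * x[0] + a[0][1] * x[1] + u[0]
        dx1 = a[1][0] * x[0] + a[1][1] * x[1] + u[1]
        x = [x[0] + dt * dx0, x[1] + dt * dx1]
        trace.append(x[0])
    return trace


def simulate_feedforward(u, steps=1000):
    """A memoryless map y = M u gives the same output at every time step."""
    m = [[0.5, -0.2], [0.3, 0.8]]  # hypothetical feed-forward weights
    y0 = m[0][0] * u[0] + m[0][1] * u[1]
    return [y0 for _ in range(steps)]


static_input = [1.0, 0.0]
rnn_trace = simulate_rnn(static_input)
ff_trace = simulate_feedforward(static_input)

# Range of activity over time: positive for the RNN, zero for the feed-forward map.
print(max(rnn_trace) - min(rnn_trace))
print(max(ff_trace) - min(ff_trace))
```

      In this sketch the feed-forward output could only vary in time if the input itself were made time-varying, which is the arbitrary modelling choice the response refers to.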

      We agree, however, that this is an important point worth clarifying in the manuscript. In our revision we will aim to add a demonstration of how to reproduce a subset of our results with a feed-forward network and a dynamic input.

      (2) While we agree that RNNs impose certain limitations over feed-forward networks, we see these limitations as an advantage because they provide a framework for understanding the structure of the reachable manifold in terms of biological constraints. For example, our simulations in Supplementary Figure 1 show that the dimensionality of the reachable manifold is highly dependent on recurrent connectivity: inhibition-stabilized connectivity makes it higher-dimensional whereas task-specific optimized connectivity makes it lower-dimensional. Such insights are valuable to understand the broader implications and experimental predictions of the re-aiming strategy.

      Because feed-forward networks are untied from the reality of recurrent cortical circuitry, they cannot be characterized in terms of such biological constraints. For instance, as the reviewer points out, dynamical stability is not a well-defined property of feed-forward networks. Such models therefore cannot provide any insight into how the biological constraint of dynamical stability could influence the reachable manifold (which we show it does in Figure 5b). Relatedly, feed-forward networks cannot be optimized to solve complex spatiotemporal tasks like the ballistic reaching task we used for our task-optimized RNN (Supplementary Figure 1, right column), so cannot be used to understand how such behavioral constraints would influence the reachable manifold.

      We agree that these reasons for using RNNs are subtle and left implicit in how they are currently exposed in the text. We will add a discussion point clarifying these in our revision.

      Second, explaining the quantitative differences between the model and data for shifts in tuning curves seems to take the model a bit too literally. The model serves greatly for qualitative observations. I assume, however, that many of the unconstrained aspects of the model would yield quantitatively different results.

      We completely agree: our model is best used to provide a qualitative description of the capabilities of the re-aiming strategy. We will be sure to revise our manuscript to keep such quantitative comparisons at a minimum.

      Reviewer #2 (Public Review):

      Although the authors mention alternative models (e.g., based on synaptic plasticity in the RNN and/or input weights) that can explain the same experimental data that theirs does, they do not provide any direct comparisons to those models. Thus, the main argument that the authors have in favor of their model is the fact that it is more plausible because it relies on performing the optimization in a low-dimensional space. It would be nice to see more quantitative arguments for why the re-aiming strategy may be more plausible than synaptic plasticity (either by showing that it explains data better, or explaining why it may be more optimal in the context of fast learning).

      We agree this remains a limitation of our study. To contrast our re-aiming model with models of synaptic plasticity (in the input and/or recurrent weights), we have included substantial discussion of these alternative models in two sections of the manuscript:

      • Introduction, where we elaborate on the argument that synaptic plasticity requires solving an exceptionally difficult optimization problem in high dimensions

      • Discussion section “The role of synaptic plasticity in BCI learning”, where we review a number of synaptic plasticity models and experimental results they can account for

      We fully agree that more quantitative comparisons remain an important follow-up to this line of research. However, it is worth noting that there are many such models out there. Moreover, as is the case with many computational models, the results one can achieve with any given model can be highly sensitive to a number of different hyperparameters (e.g. learning rates). We therefore feel that a more rigorous comparison requires deeper study and is out of scope of this manuscript.

      In particular, the authors model the adaptation to outside-manifold perturbations (OMPs) through a "generalized re-aiming strategy". This assumes the existence of additional command variables, which are not used in the original decoding task, but can then be exploited to adapt to these OMPs. While this model is meant to capture the fact that optimization is occurring in a low-dimensional subspace, the fact that animals take longer to adapt to OMPs suggests that WMPs and OMPs may rely on different learning mechanisms, and that synaptic plasticity may actually be a better model of adaptation to OMPs. 

      We thank the reviewer for raising this question. We agree that the fact that animals take longer to adapt to OMPs suggests that the underlying learning strategy is somehow different. But the argument we try to make in this section of the paper is that it in fact does not require an entirely different mechanism. Our simulations show that the same mechanism of re-aiming can suffice to learn OMPs, but it simply requires re-aiming in the larger space of all command variables available to the motor system (rather than just the two command variables evoked by the calibration task). Because this is a much higher-dimensional search space (10-20 vs. 2 dimensions, which is a substantial difference due to the curse of dimensionality), we argue that learning should be slower, even though the mechanism (i.e. re-aiming) is the same.

      This is an important and somewhat surprising takeaway from these simulations, which we will try to bring up more explicitly and clearly in the revision.

      It would be important to discuss how exactly generalized re-aiming would differ from allowing plasticity in the input weights, or in all weights in the network. Do those models make different predictions, and could they be differentiated in future experiments?

      They do in fact make different predictions, and we thank the reviewer for asking and pointing out the lack of discussion of this point. The key difference between these two learning mechanisms is demonstrated in Figure 5b: under generalized re-aiming, there is a fundamental limit to the set of activity patterns one can learn to produce in the brain-computer interface (BCI) learning task. This is quantified in that analysis by the asymptotic participation ratio of the reachable manifold as K increases, which indicates that there is a limited ~12-dimensional subspace that the reachable manifold can occupy. The specific orientation of this subspace is determined by the (recurrent and input) connectivity of the recurrent neural network. With synaptic plasticity in any of the weight matrices (W<sub>rec</sub>, W<sub>in</sub>, U), this subspace could be re-oriented in any arbitrary direction. Our theory of “generalized re-aiming” therefore predicts that the reachable manifold 1) is constrained to a low-dimensional subspace and 2) is not modified when learning BCIs with outside-manifold perturbations.
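      For readers unfamiliar with the participation ratio, a minimal sketch follows. It assumes the commonly used definition, PR = (sum of covariance eigenvalues)^2 / (sum of squared eigenvalues); whether the manuscript computes it in exactly this way is not stated here, and the toy data and function names are hypothetical. Because the covariance matrix C is symmetric, the same quantity equals trace(C)^2 divided by the sum of squared entries of C, so no eigendecomposition is needed.

```python
import random


def covariance(samples):
    """Sample covariance matrix (list of lists) of row-wise observations."""
    n = len(samples)
    d = len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in samples]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def participation_ratio(samples):
    """PR = trace(C)^2 / trace(C @ C) for the sample covariance C.

    For symmetric C, trace(C @ C) is the sum of its squared entries.
    """
    c = covariance(samples)
    trace = sum(c[i][i] for i in range(len(c)))
    trace_sq = sum(v * v for row in c for v in row)
    return trace * trace / trace_sq


# Toy "population activity": 20 units confined to a 3-dimensional subspace.
rng = random.Random(0)
basis = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(3)]
patterns = []
for _ in range(500):
    coeffs = [rng.gauss(0, 1) for _ in range(3)]
    patterns.append(
        [sum(c * b[j] for c, b in zip(coeffs, basis)) for j in range(20)]
    )

pr_low_rank = participation_ratio(patterns)
print(pr_low_rank)  # cannot exceed 3, the rank of the toy data
```

      The participation ratio is bounded above by the rank of the data, which is why a plateau in this quantity as more command variables are added is read as a limit on the dimensionality of the reachable manifold.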

      Experimentally testing this would require a within-/outside-manifold perturbation BCI learning task akin to that of Sadtler et al., but where the “intrinsic manifold” is measured from population responses evoked by every possible motor command so as to entirely contain the full reachable manifold at max K. This would require measuring motor cortical activity during naturalistic behavior under a wide range of conditions, rather than just in response to the 2D cursor movements on the screen used in the calibration task of the original study. In this case, learning outside-manifold perturbations would require re-orienting the reachable manifold, so a pure generalized re-aiming strategy would fail to learn them. Synaptic plasticity, on the other hand, would not.

      We will be sure to elaborate further on this claim in the revised manuscript.

    1. Author response:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The manuscript's logical flow is challenging and hard to follow, and key arguments could be more clearly structured, particularly in transitions between mechanistic components.

      We will revise our manuscript so as to make it easy to follow the logical flow in transitions between mechanistic components.

      (2) The causality between stress-induced α2A-AR internalization and the enhanced MAO-A remains unclear. Direct experimental evidence is needed to determine whether α2A-AR internalization itself or Ca<sup>2+</sup> drives MAO-A activation, and how they activate MAO-A should be considered.

      We believe that the causality between stress-induced α2A-AR internalization and the enhancement of MAO-A is clearly demonstrated by our current experiments, although our explanations could be made easier to understand, especially for readers who are not experts in electrophysiology.

      Firstly, it is well established that autoinhibition in LC neurons is mediated by α2A-AR coupled-GIRK (Arima et al., 1998, J Physiol; Williams et al., 1985, Neuroscience). We found that spike frequency adaptation in LC neurons was also mediated by α2A-AR coupled GIRK-I (Fig. 1A-I), and that α2A-AR coupled GIRK-I underwent [Ca<sup>2+</sup>]<sub>i</sub>-dependent rundown (Figs. 2, S1, S2), leading to an abolishment of spike-frequency adaptation (Fig. S4). The [Ca<sup>2+</sup>]<sub>i</sub>-dependent rundown of α2A-AR coupled GIRK-I was prevented by barbadin (Fig. 2G-J), which prevents the internalization of G-protein coupled receptor (GPCR) channels.

      Abolishment of spike frequency adaptation itself, i.e., “increased spike activity”, can increase [Ca<sup>2+</sup>]<sub>i</sub> because [Ca<sup>2+</sup>]<sub>i</sub> is entirely dependent on spike activity, as shown by Ca<sup>2+</sup> imaging in Figure S3.

      Thus, α2A-AR internalization can increase [Ca<sup>2+</sup>]<sub>i</sub> through the abolishment of autoinhibition or spike frequency adaptation, and a [Ca<sup>2+</sup>]<sub>i</sub> increase drives MAO-A activation as reported previously (Cao et al., 2007, BMC Neurosci). The mechanism by which Ca<sup>2+</sup> activates MAO-A is beyond the scope of the current study.

      Our study focused only on the mechanism by which chronic or severe stress can cause persistent overexcitation and how this results in LC degeneration.

      (3) The connection between α2A-AR internalization and increased cytosolic NA levels lacks direct quantification, which is necessary to validate the proposed mechanism.

      Direct quantification of the relationship between α2A-AR internalization and increased cytosolic NA levels may not be possible, and may not be necessary, as explained below.

      The internalization of α2A-AR can increase [Ca<sup>2+</sup>]<sub>i</sub> through the abolishment of autoinhibition or spike frequency adaptation, and [Ca<sup>2+</sup>]<sub>i</sub> increases can facilitate autocrine NA release (Huang et al., 2007), similar to transmitter release from nerve terminals (Kaeser & Regehr, 2014, Annu Rev Physiol).

      Autocrine-released NA must be taken back up by NAT (NA transporter), which is firmly established (Torres et al., 2003, Nat Rev Neurosci). Re-uptake of NA by NAT is the only source of intracellular NA, and NA re-uptake by NAT should increase as the internalization of the NA binding site (α2A-AR) progresses in association with [Ca<sup>2+</sup>]<sub>i</sub> increases (see page 11, lines 334-336).

      Thus, the connection between α2A-AR internalization and increased cytosolic NA levels is logically compelling. Quantifying this connection may not be possible at present (see our response to Reviewer #1's Recommendations for the authors, comment (2)) and is beyond the scope of our current study.

      (4) The chronic stress model needs further validation, including measurements of stress-induced physiological changes (e.g., corticosterone levels) to rule out systemic effects that may influence LC activity. Additional behavioral assays for spatial memory impairment should also be included, as a single behavioral test is insufficient to confirm memory dysfunction.

      It is well established that restraint stress (RS) increases corticosterone levels depending on the period of RS (García-Iglesias et al., 2014, Neuropharmacology), although we are willing to measure corticosterone levels. In addition, there are numerous reports showing increased activity of LC neurons in response to various stresses (Valentino et al., 1983; Valentino and Foote, 1988; Valentino et al., 2001; McCall et al., 2015), as described in the text (page 4, lines 96-98). Measurement of cortisol levels may not be able to rule out systemic effects of CRS on the whole brain.

      We have already performed another behavioral test, the elevated plus maze (EPM) test.

      By combining the two tests, it may be possible to evaluate the results of the Y-maze test more accurately by differentiating memory impairment from anxiety. However, the results of these behavioral tests, and the anxiety and memory impairment they indicate, are supplementary to our current aim of elucidating the cellular mechanisms for the accumulation of cytosolic free NA. We will soften the implication of anxiety and memory impairment.

      (5) Beyond β-arrestin binding, the role of alternative internalization pathways (e.g., phosphorylation, ubiquitination) in α2A-AR desensitization should be considered, as current evidence is insufficient to establish a purely Ca<sup>2+</sup>-dependent mechanism.

      We can hardly agree with this comment.

      It was clearly demonstrated that repeated application of NA itself did not cause desensitization of α2A-AR (Figure S1A-D), and that the blockade of β-arrestin binding by barbadin completely suppressed the Ca<sup>2+</sup>-dependent downregulation of GIRK (Fig. 2G-K). These observations clearly rule out the possible involvement of phosphorylation or ubiquitination in the desensitization.

      In addition to the barbadin experiment, immunohistochemistry and western blotting clearly demonstrated the decrease of α2A-AR expression on the cell membrane (Fig. 3).

      The Ca<sup>2+</sup>-dependent mechanism of the rundown of GIRK was convincingly demonstrated by a set of different voltage-clamp protocols in which Ca<sup>2+</sup> influx was differentially increased. The rundown of GIRK-I was potentiated or accelerated in an orderly manner by increasing the number of positive command pulses, each of which induces Ca<sup>2+</sup> influx (compare Figure S1E-J, Figure S2A-E and Figure S2F-K along with Fig. 2A-F). The presence or absence of Ca<sup>2+</sup> currents and the amount of Ca<sup>2+</sup> currents determined the trend of the rundown of GIRK-I (Figs. 2, S1 and S2). Because the same voltage protocol hardly caused the rundown when it did not induce Ca<sup>2+</sup> currents in the absence of TEA (Fig. S1F; compare with Fig. 2B), blockade of Ca<sup>2+</sup> currents by nifedipine would not be so beneficial.

      We believe the series of voltage-clamp protocols convincingly demonstrated the orderly involvement of [Ca<sup>2+</sup>]<sub>i</sub> in accelerating the rundown of GIRK-I.

      (6) NA leakage for free NA accumulation is also influenced by NAT or VMAT2. Please discuss the potential role of VMAT2 in NA accumulation within the LC in AD.

      We will discuss the role of VMAT2 in NA accumulation, especially when VMAT2 is impaired. Indeed, it has been demonstrated that reduced VMAT2 levels increase susceptibility to neuronal damage: VMAT2 heterozygote mice displayed increased vulnerability to MPTP, as evidenced by reductions in nigral dopamine cell counts (Takahashi et al., 1997, PNAS). Thus, if the activity of VMAT2 in LC neurons were impaired by chronic restraint stress, cytosolic NA levels in LC neurons would increase. We will add such discussion to the revised manuscript.

      (7) Since the LC is a small brain region, proper staining is required to differentiate it from surrounding areas. Please provide a detailed explanation of the methodology used to define LC regions and how LC neurons were selected among different cell types in brain slices for whole-cell recordings.

      LC neurons were identified immunohistochemically and electrophysiologically as we previously reported (see Fig. 2 in Front. Cell. Neurosci. 16:841239. doi: 10.3389/fncel.2022.841239). A delayed spiking pattern in response to depolarizing pulses (Figure S9) applied at a hyperpolarized membrane potential was commonly observed in LC neurons in many studies (Masuko et al., 1986; van den Pol et al., 2002; Wagner-Altendorf et al., 2019).

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The manuscript reports that chronic stress for 5 days increases MAO-A levels in LC neurons, leading to the production of DOPEGAL, activation of AEP, and subsequent tau cleavage into the tau N368 fragment, ultimately contributing to neuronal damage. However, the authors used wild-type C57BL/6 mice, and previous literature has indicated that AEP-mediated tau cleavage in wild-type mice is minimal and generally insufficient to cause significant behavioral alterations. Please clarify and discuss this apparent discrepancy.

      In our study, the normalized relative value of AEP-mediated tau cleavage (Tau N368) was much higher in CRS mice than in non-stress wild-type mice. It is not possible to compare AEP-mediated tau cleavage between our non-stress wild-type mice and those observed in a previous study (Zhang et al., 2014, Nat Med), because band intensity is largely dependent on the exposure time and its numerical value is a normalized relative value. In view of such differences, our apparent band expression might have been intensified to detect small changes.

      (2) It is recommended that the authors include additional experiments to examine the effects of different durations and intensities of stress on MAO-A expression and AEP activity. This would strengthen the understanding of stress-induced biochemical changes and their thresholds.

      GIRK rundown was almost saturated after 3-day RS and remained the same in 5-day RS mice (Fig. 4A-G), which is consistent with the downregulation of α2A-AR and GIRK1 expression by 3-day RS (Fig. 3C, F and G; Fig. 4J and K). However, we examined the protein levels of MAO-A, pro/active-AEP and Tau N368 only in 5-day RS mice, not in 3-day RS mice. This is because we considered the possibility that 3-day RS may be insufficient to induce changes in MAO-A, AEP and Tau N368, and that some period of high [Ca<sup>2+</sup>]<sub>i</sub> may be necessary to induce such changes. We will discuss this in the revised manuscript.

      (3) Please clarify the rationale for the inconsistent stress durations used across Figures 3, 4, and 5. In some cases, a 3-day stress protocol is used, while in others, a 5-day protocol is applied. This discrepancy should be addressed to ensure clarity and experimental consistency.

      Please see our response to the comment (2).

      (4) The abbreviation "vMAT2" is incorrectly formatted. It should be "VMAT2," and the full name (vesicular monoamine transporter 2) should be provided at first mention.

      Thank you for your suggestion. We will revise accordingly.

    1. Author response:

      We thank the referees for finding our work well written and systematic. We are planning a revision of the manuscript based on the public review and the confidential recommendations of the referees.

      The role of axons:

      Indeed, radial axon projections appear before mature epithelial stripes in the cornea (Iannaccone et al., 2012). Our claim is, however, not that guidance cues are absent, but that global cues are unnecessary. The alignment term in our model, together with evidence that corneal epithelial cells follow contact-mediated substrate cues (Walczysko et al., 2016), shows that corneal cell migration is responsive to external forces, and the underlying patterns of axonal projections could be one of those cues.

      Experiments (Collinson et al., 2002) and simulations in this work show that a rapid spiral epithelial flow forms first, with cells migrating radially for ~2 weeks before stripes become visible. Axons seeking the path of least resistance within this moving basal layer would therefore appear radial early on. By contrast, establishing visible stripes requires an entire cohort of epithelial cells to travel from the limbus to the central cornea (Fig. 7). Extensive in-vivo studies (Song et al., 2004; Leiper et al., 2009) find no evidence that axons direct epithelial migration; if anything, epithelial flow dictates axonal trajectories.

      Geometry and boundaries:

      The spiral also forms on a flat disc, but its exact shape changes with curvature and cap angle; this variation is seen across mammals, including humans (Dua et al., 1993) and in diseases such as keratoconus. On a spherical cap the boundary winding number fixes the interior index, so ongoing limbal influx keeps the total index = 1. 

      In the revised version, we will therefore simulate a range of curvatures, cap angles, a prolate ellipsoid, and cases without limbal division, then compare with published data and disease states.

      In-vitro data and parameter fits:

      Although our dataset is limited, the inferred parameters match three independent in vitro estimates (Kostanjevec et al. 2020; Saraswathibhatla et al. 2021; Kammeraat et al. in prep.). Spatial correlations exceed those expected from persistence alone, implying some polar alignment, consistent with Saraswathibhatla et al. 2021. Slide-scanner images that we will include in the revision show that cells are neither elongated nor nematically ordered. In the revision we will detail our parameter extraction, highlight evidence for alignment, stress the substrate-based activity mechanism, and draw attention to the supplementary videos.

      Topological clarification:

      Stagnation points can be seen as topological defects because classification depends only on vector directions. Boundary conditions can remove such defects in fluids, yet two sources/sinks still interact via the same logarithmic Green’s function that governs disclinations, despite different physics. The Euler characteristic is a property of the surface; while the boundary winding number fixes the field index, it does not alter the surface’s Euler characteristic.
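      As a small illustration of why the classification depends only on vector directions, the sketch below computes the index (winding number) of a planar vector field around the origin by accumulating the change in direction angle along a closed loop. The example fields, the loop radius, and the function names are assumptions chosen for illustration, not quantities from our simulations.

```python
import math


def field_index(field, radius=1.0, steps=720):
    """Index (winding number) of a 2D vector field around the origin.

    Walks once counter-clockwise around a circle and sums the change in
    the field's direction angle. Only directions enter, not magnitudes.
    """
    total = 0.0
    prev = None
    for k in range(steps + 1):
        t = 2 * math.pi * k / steps
        vx, vy = field(radius * math.cos(t), radius * math.sin(t))
        angle = math.atan2(vy, vx)
        if prev is not None:
            d = angle - prev
            # Wrap the increment into [-pi, pi) to undo the atan2 branch cut.
            d = (d + math.pi) % (2 * math.pi) - math.pi
            total += d
        prev = angle
    return round(total / (2 * math.pi))


def source(x, y):
    return (x, y)  # purely radial outflow


def spiral(x, y):
    return (-x - y, x - y)  # inward flow combined with rotation


def saddle(x, y):
    return (x, -y)


def uniform(x, y):
    return (1.0, 0.0)  # no stagnation point inside the loop


for name, f in [("source", source), ("spiral", spiral),
                ("saddle", saddle), ("uniform", uniform)]:
    print(name, field_index(f))
```

      In this sketch a source and an inward spiral share index +1 even though the flows differ, a saddle has index -1, and a loop enclosing no stagnation point gives 0.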

      In the revision, we will add a concise primer on the differential-geometric concepts to make these points explicit.

    1. Author response:

      We thank the reviewers for their thoughtful and generous assessment of our work. Overall, the reviewers found our work to be novel and relevant. In particular, Reviewer #1 found that “It is timely and highly valuable for the telomere field”; Reviewer #2 stated, “Overall, I find this manuscript worthy of publication, as the optimized END-seq methods described here will likely be widely utilized in the telomere field”; and Reviewer #3 stated that “The study is original, the experiments were well-controlled and excellently executed.”

      We are extremely grateful for these comments and want to thank all the reviewers and the editors for their time and effort in reviewing our work.

      The reviewers had a number of suggestions to improve our work. We have addressed all the points as highlighted in the point-by-point responses below.

      Reviewer 1:

      One minor question would be whether the authors could expand more on the application of END-Seq to examine the processive steps of the ALT mechanism? Can they speculate if the ssDNA detected in ALT cells might be an intermediate generated during BIR (i.e., is the ssDNA displaced strand during BIR) or a lesion? Furthermore, have the authors assessed whether ssDNA lesions are due to the loss of ATRX or DAXX, either of which can be mutated in the ALT setting?

      We appreciate the reviewer’s insightful questions regarding the application of our assays to investigate the nature of the ssDNA detected in ALT telomeres. Our primary aim in this study was to establish the utility of END-seq and S1-END-seq in telomere biology and to demonstrate their applicability across both ALT-positive and -negative contexts. We agree that exploring the mechanistic origins of ssDNA would be highly informative, and we anticipate that END-seq–based approaches will be well suited for such future studies. However, it remains unclear whether the resolution of S1-END-seq is sufficient to capture transient intermediates such as those generated during BIR. We have now included a brief speculative statement in the revised discussion addressing the potential nature of ssDNA at telomeres in ALT cells.

      Reviewer #2:

      How can we be sure that all telomeres are equally represented? The authors seem to assume that END-seq captures all chromosome ends equally, but can we be certain of this? While I do not see an obvious way to resolve this experimentally, I recommend discussing this potential bias more extensively in the manuscript.

      We thank the reviewer for raising this important point. END-seq and S1-END-seq are unbiased methods designed to capture either double-stranded or single-stranded DNA that can be converted into blunt-ended double-stranded DNA and ligated to a capture oligo. As such, if a subset of telomeres cannot be processed using this approach, it is possible that these telomeres may be underrepresented or lost. However, to our knowledge, there are no proposed telomeric structures that would prevent capture using this method. For example, even if a subset of telomeres possesses a 5′ overhang, it would still be captured by END-seq. Indeed, we observed the consistent presence of the 5′-ATC motif across multiple cell lines and species (human, mouse, and dog). More importantly, we detected predictable and significant changes in sequence composition when telomere ends were experimentally altered, either in vivo (via POT1 depletion) or in vitro (via T7 exonuclease treatment). Together, these findings support the robustness of the method in capturing a representative and dynamic view of telomeres across different systems.

      That said, we have now included a brief statement in the revised discussion acknowledging that we cannot fully exclude the possibility that a subset of telomeres may be missed due to unusual or uncharacterized structures.

      I believe Figures 1 and 2 should be merged.

      We appreciate the reviewer’s suggestion to merge Figures 1 and 2. However, we feel that keeping them as separate figures better preserves the logical flow of the manuscript and allows the validation of END-seq and its application to be presented with appropriate clarity and focus. We hope the reviewer agrees that this layout enhances the clarity and interpretability of the data.

      Scale bars should be added to all microscopy figures.

      We thank the reviewer for pointing this out. We have now added scale bars to all the microscopy panels in the figures and included the scale details in the figure legends.

      Reviewer #3:

      Overall, the discussion section is lacking depth and should be expanded and a few additional experiments should be performed to clarify the results.

      We thank the reviewer for the suggestions. Based on this reviewer’s comments and comments from the other reviewers, we incorporated several points into the discussion. We hope these additions give our conclusions more depth.

      (1) The finding that the abundance of variant telomeric repeats (VTRs) within the final 30 nucleotides of the telomeric 5' ends is similar in both telomerase-expressing and ALT cells is intriguing, but the authors do not address this result. Could the authors provide more insight into this observation and suggest potential explanations? As the frequency of VTRs does not seem to be upregulated in POT1-depleted cells, what then drives the appearance of VTRs on the C-strand at the very end of telomeres? Is the CST-Polα complex responsible?

      The reviewer raises a very interesting and relevant point. We are hesitant at this point to speculate on why we do not see a difference in variant repeats in ALT versus non-ALT cells, since additional data would be needed. One possibility is that variant repeats in ALT cells accumulate stochastically within telomeres but are selected against when they are present at the terminal portion of chromosome ends. However, to prove this hypothesis, we would need error-free long-read technology combined with END-seq. We feel that developing this approach would be beyond the scope of this manuscript.

      (2) The authors also note that, in ALT cells, the frequency of VTRs in the first 30 nucleotides of the S1-END-SEQ reads is higher compared to END-SEQ, but this finding is not discussed either. Do the authors think that the presence of ssDNA regions is associated with the VTRs? Along this line, what is the frequency of VTRs in the END-SEQ analysis of TRF1-FokI-expressing ALT cells? Is it also increased? Has TRF1-FokI been applied to telomerase-expressing cells to compare VTR frequencies at internal sites between ALT and telomerase-expressing cells?

      Similarly to what is discussed above, short reads have the advantage of being very accurate but do not provide sufficient length to establish the relative frequency of VTRs across the whole telomere sequence. The TRF1-FokI experiment is a good suggestion, but it would still be biased toward non-variant repeats due to the TRF1-binding properties. We plan to address these questions in a future study involving long-read sequencing and END-seq capture of telomeres.

      Finally, in these experiments (S1-END-SEQ or END-SEQ in TRF1-FokI), is the frequency of VTRs the same on both the C- and the G-rich strands? It is possible that the sequences are not fully complementary in regions where G4 structures form.

      We thank the reviewer for this observation. While we do observe a higher frequency of variant telomeric repeats (VTRs) in the first 30 nucleotides of S1-END-seq reads compared to END-seq in ALT cells, we are currently unable to determine whether this difference is significant, as an appropriate control or matched normalization strategy for this comparison is lacking. Therefore, we refrain from overinterpreting the biological relevance of this observation.

      (3) Based on the ratio of C-rich to G-rich reads in the S1-END-SEQ experiment, the authors estimate that ALT cells contain at least 3-5 ssDNA regions per chromosome end. While the calculation is understandable, this number could be discussed further to consider the possibility that the observed ratios (of roughly 0.5) might result from the presence of extrachromosomal DNA species, such as C-circles. The observed increase in the ratio of C-rich to G-rich reads in BLM-depleted cells supports this hypothesis, as BLM depletion suppresses C-circle formation in U2OS cells. To test this, the authors should examine the impact of POLD3 depletion on the C-rich/G-rich read ratio. Alternatively, they could separate high-molecular-weight (HMW) DNA from low-molecular-weight DNA in ALT cells and repeat the S1-END-SEQ in the HMW fraction.

      The reviewer is absolutely correct. Our calculation did not exclude the possibility of extrachromosomal DNA as a source of telomeric ssDNA. We have now addressed this point in our discussion.

      (4) What is the authors' perspective on the presence of ssDNA at ALT telomeres? Do they attribute this to replication stress? It would be helpful for the authors to repeat the S1-END-SEQ in telomerase-expressing cells with very long telomeres, such as HeLa1.3 cells, to determine if ssDNA is a specific feature of ALT cells or a result of replication stress. The increased abundance of G4 structures at telomeres in HeLa1.3 cells (as shown in J. Wong's lab) may indicate that replication stress is a factor. Similar to Wong's work, it would be valuable to compare the C-rich/G-rich read ratios in HeLa1.3 cells to those in ALT cells with similar telomeric DNA content.

      The reviewer is correct in pointing out that we still do not know what causes ssDNA at telomeres in ALT cells. Replication stress seems the most logical explanation based on the work of many labs in the field. However, our data did not reveal any significant difference in the levels of ssDNA at telomeres in non-ALT cells based on telomere length. We used the HeLa1.2.11 cell line (now clarified in the Materials section), which is the parental line of HeLa1.3 and has similarly long telomeres (~20 kb vs. ~23 kb). Despite their long telomeres and potential for replication-associated challenges such as G-quadruplex formation, HeLa1.2.11 cells did not exhibit the elevated levels of telomeric ssDNA that we observed in ALT cells (Figure 4B). Additional experiments are needed to map the occurrence of ssDNA at telomeres in relation to progression toward ALT.

      Finally, Reviewer #3 raises a list of minor points:

      (1) The Y-axes of Figure 4 have been relabeled to account for the G-strand reads.

      (2) Statistical analyses have been added to the figures where applicable.

      (3) The manuscript has been carefully proofread to improve clarity and consistency throughout the text and figure legends.

      (4) We have revised the text to address issues related to the lack of cross-referencing between the supplementary figures and their corresponding legends.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors have tried to repurpose cipargamin (CIP), a known drug against Plasmodium and Toxoplasma, against Babesia. They demonstrated the efficacy of CIP against Babesia in the nanomolar range. In silico analyses revealed the drug resistance mechanism through a single amino acid mutation at position 921 of the ATP4 gene of Babesia. Overall, the conclusions drawn by the authors are well justified by their data. I believe this study opens up a novel therapeutic strategy against babesiosis.

      Strengths:

      Authors have carried out a comprehensive study. All the experiments performed were carried out methodically and logically.

      We appreciate your positive feedback. Your acknowledgment reinforces our commitment to rigor and thoroughness in our research.

      Reviewer #3 (Public review):

      Summary:

      The authors aim to establish that cipargamin can be used for the treatment of infection caused by Babesia organisms.

      Strengths:

      The study provides strong evidence that cipargamin is effective against various Babesia species. In vitro growth assays were used to establish that cipargamin is effective against Babesia bovis and Babesia gibsoni. Infection of mice with Babesia microti demonstrated that cipargamin is as effective as the combination of atovaquone plus azithromycin. Cipargamin protected mice from lethal infection with Babesia rodhaini. Mutations that confer resistance to cipargamin were identified in the gene encoding ATP4, a P-type Na+ ATPase that is found in other apicomplexan parasites, thereby validating ATP4 as the target of cipargamin. A 7-day treatment of cipargamin, when combined with a single dose of tafenoquine, was sufficient to eradicate Babesia microti in a mouse model of severe babesiosis caused by lack of adaptive immunity.

      Thank you for the comments and for your time to review our manuscript.

      Weaknesses:

      Cipargamin was tested in vivo at a single dose administered daily for 7 days. Despite the prospect of using cipargamin for the treatment of human babesiosis, there was no attempt to identify the lowest dose of cipargamin that protects mice from Babesia microti infection. In the SCID mouse model, cipargamin was tested in combination with tafenoquine but not with atovaquone and/or azithromycin, although the latter combination is often used as first-line therapy for human babesiosis caused by Babesia microti.

      Thank you for your insightful comments. We agree that using a single daily dose over 7 days is one of the limitations of the in vivo trial. Our main goals were to demonstrate cipargamin's efficacy and to understand its mechanism of action as an antibabesial agent. For future work, we plan to conduct dose-optimization studies to determine the lowest effective dose in vivo. Regarding the drug combination in the SCID mouse model, although atovaquone and/or azithromycin are frequently used as first-line therapies for human babesiosis, resistance to these traditional drugs is emerging. Given this challenge, we opted to evaluate a combination with tafenoquine as a novel partner, aiming to overcome resistance issues and improve therapeutic outcomes.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      None other than some minor grammatical mistakes.

      We have corrected the grammatical mistakes.

      Reviewer #3 (Recommendations for the authors):

      The revised manuscript is much improved. I have the following comments.

      Comment 1: Atovaquone plus azithromycin is effective against Babesia microti (Figure 1C) but not against Babesia rodhaini (Figure 1E). It would be valuable to provide a possible explanation.

      Thank you for highlighting this issue. One potential explanation is that B. microti and B. rodhaini might have intrinsic differences in drug sensitivity and susceptibility. A previous study reported that both species possess a unique linear monomeric mitochondrial genome with a dual flip-flop inversion system, which generates four distinct genome structures (Hikosaka et al., 2012). In addition, previous studies have shown that mitochondria-associated energy production is greater in B. microti than in B. rodhaini (Shikano et al., 1998). This suggests that B. microti, whose metabolism is largely driven by mitochondrial function, may be more susceptible to drugs (like atovaquone) that induce parasite death by disrupting mitochondrial targets such as cytochrome b (Wormser et al., 2010). Moreover, B. rodhaini tends to proliferate more rapidly and causes acute infections, which may outpace any drug effects. Further, the rapid proliferation of apicomplexan parasites, as is the case in Plasmodium (Salcedo-Sora et al., 2014), Theileria (Metheni et al., 2015), and B. rodhaini (Rickard, 1970; Shikano et al., 1995), has been ascribed to glycolysis as the primary energy source. This may have contributed to the reduced efficacy of atovaquone and azithromycin in B. rodhaini-infected mice in the current study. Nonetheless, we plan to explore these interspecies differences in our future work.

      Comment 2: The relapse that follows a 7-day treatment with cipargamin is transient in BALB/c mice infected with Babesia rodhaini (Figure 1E) but persistent in SCID mice infected with Babesia microti (Figure 5C). It would be valuable to provide a possible explanation.

      Thank you for your insightful comment. One possible explanation is the difference in immune status between the two mouse models. BALB/c mice have a fully functional immune system that can likely clear residual parasites following a transient relapse after cipargamin treatment. In contrast, SCID mice lack an adaptive immune response, which might allow residual B. microti parasites to persist and cause a sustained relapse. Additionally, intrinsic differences between B. rodhaini and B. microti, such as growth rate or drug susceptibility, could also play a role. We plan to explore these factors in future studies.

      Comment 3: The effect of cipargamin on parasite pH is the greatest when assessed 4 to 8 min after exposure is initiated (Figure 3E). Yet, resistance of parasites that carry a mutation in ATP4, the target of cipargamin, was assessed 20 min after cipargamin addition. At this time point, cipargamin has very little effect (Figure 3E). Accordingly, data reported in Figure 3G are of limited value.

      Thank you for your comment. The initial pH increase we see around 4 to 8 minutes likely reflects the rapid inhibition of ATP4-mediated Na⁺/H⁺ exchange by cipargamin, which quickly alkalinizes the cell. After this initial increase, however, compensatory processes such as proton influx or metabolic acid production gradually restore the pH, resulting in a later decline. Although assessing the pH level at 20 minutes may have recorded less dramatic changes, it still allowed us to compare the sustained differences between wild-type and mutant strains. We agree that including earlier time points for the mutants might provide further insight, and we will consider this in our future work.

      Comment 4: In Figure 3H, please report the lack of statistical significance between wild-type parasites and parasites that carry the mutation L921V.

      In Figure 3H, the ATPase activity in erythrocytes infected with wild-type parasites (6.31 ± 1.20 nmol Pi/mg protein/min) is higher than that of the L921V mutation (5.11 ± 0.50 nmol Pi/mg protein/min), but the difference is not statistically significant (P = 0.095), so no asterisk was added.

      Comment 5: Tafenoquine was administered as a single 20 mg/kg dose. Please specify whether this dose is for tafenoquine succinate or tafenoquine base.

      Thank you for raising this point. In our study, the single 20 mg/kg dose refers to tafenoquine succinate. We have clarified this detail in the revised manuscript (Line 40).

      Comment 6: A single dose of 20 mg/kg tafenoquine succinate was first tested in the SCID mouse model of severe babesiosis by Mordue et al (JID 2019), not by Liu et al. (JID 2024). Please amend discussion accordingly (line 311). As correctly stated in the discussion, the single 20 mg/kg dose was not sufficient to prevent relapse of Babesia microti in the study by Mordue et al. Please provide a possible explanation for why no parasitemia was detected for 90 days in your SCID model (Figure 5C).

      Thank you for your comment. We have modified the suggested citation (Line 309). As noted by Mordue et al. (JID 2019), a single 20 mg/kg dose of tafenoquine succinate was insufficient to prevent relapse in their SCID mouse model using B. microti (ATCC 30221 Gray strain). In our study, however, no parasitemia was detected for 90 days (Figure 5C) using the B. microti Peabody mjr strain (ATCC PRA-99). Differences in the parasite strain and the timing of treatment relative to infection may have contributed to the extended suppression of parasitemia observed in our study. We plan to explore these aspects in future work.

      Comment 7: Real-time PCR was used to confirm eradication of Babesia microti infection (Figure 5D). Please specify the blood volume from which genomic DNA was extracted for each mouse. Please specify the amount of genomic DNA (i.e., not the volume) used in each reaction. Please explain how/why the cut-off was set at 35 cycles. What were the Ct values when blood was obtained from uninfected mice? For infected mice treated with cipargamin plus tafenoquine, there was no amplification. Was each reaction subjected to a maximum of 40 cycles (as suggested by Figure 5D)?

      In our qPCR assay, genomic DNA was extracted from 200 µL of blood per mouse (Line 458). In each reaction, we used 100 ng of genomic DNA (Line 464), and the thermocycling conditions were set at 40 cycles. We set the cut-off at 35 cycles based on our optimization experiments: samples with Ct values ≤ 35 consistently indicated the presence of parasite DNA, while samples without parasite DNA (distilled water and DNA from uninfected mice) had Ct values > 35 or no detectable amplification. Although each reaction was run for 40 cycles, for our analysis we defined samples as negative if no signal was observed by cycle 35. In mice treated with cipargamin plus tafenoquine, no signal was detected through all 40 cycles, indicating the absence of parasite DNA in the samples.
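
      The cut-off rule described above can be sketched in a few lines of Python. This is only an illustration of the stated logic (positive at Ct ≤ 35, negative above 35 or with no amplification within 40 cycles). The function name, the sample labels, and the Ct values are hypothetical and are not taken from the study.

```python
from typing import Optional

CT_CUTOFF = 35.0   # cut-off stated in the response
MAX_CYCLES = 40    # cycles run per reaction, as stated in the response


def classify_ct(ct: Optional[float]) -> str:
    """Call a sample 'positive' or 'negative' from its Ct value.

    `ct` is None when no amplification was detected within MAX_CYCLES.
    Hypothetical helper for illustration; not the authors' analysis code.
    """
    if ct is None:
        return "negative"
    if not 0 < ct <= MAX_CYCLES:
        raise ValueError(f"Ct must lie in (0, {MAX_CYCLES}], got {ct}")
    return "positive" if ct <= CT_CUTOFF else "negative"


# Made-up Ct values, used only to exercise the rule.
samples = {"infected_untreated": 24.8, "uninfected": 37.2, "water": None}
calls = {name: classify_ct(ct) for name, ct in samples.items()}
```

      Under this rule, a late signal between cycles 35 and 40 is treated the same way as no amplification at all.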

      Comment 8: Persistence of parasite DNA in blood of tafenoquine treated mice highlights the limitation of PCR to assess persistence of infection. That is, PCR cannot distinguish between viable parasites and non-viable (or dead) parasites. An adoptive transfer of blood to immunocompromised mice can help determine whether persistence of DNA is due to persistence of viable parasites. Because the experiment was carried out in SCID mice, no adoptive transfer is needed. Few parasites are required for a successful infection of immunocompromised mice (SCID mice included). Given that parasitemia never rose following treatment of SCID mice with a single dose of tafenoquine, it is highly likely that parasite DNA detected on day 90 post-infection in these tafenoquine treated mice came from persistent non-viable/dead parasites.

      We appreciate your comment and acknowledge that the use of PCR has limitations in differentiating between live and dead parasites. It is possible that the residual DNA may represent a small population of dormant parasites that are not actively replicating and thus remain below the detection threshold of parasitemia. Even in highly immunocompromised SCID mice, such dormant parasites might persist without causing overt infection under our experimental conditions. An adoptive transfer experiment in SCID mice, although not strictly necessary, could validate whether the detection of low levels of DNA comes from viable parasites capable of reactivating under different circumstances. Future studies using more sensitive viability assays or adoptive transfer approaches could provide further insights into this possibility.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study examined the interaction between two key cortical regions in the mouse brain involved in goal-directed movements, the rostral forelimb area (RFA) - considered a premotor region involved in movement planning, and the caudal forelimb area (CFA) - considered a primary motor region that more directly influences movement execution. The authors ask whether there exists a hierarchical interaction between these regions, as previously hypothesized, and focus on a specific definition of hierarchy - examining whether the neural activity in the premotor region exerts a larger functional influence on the activity in the primary motor area than vice versa. They examine this question using advanced experimental and analytical methods, including localized optogenetic manipulation of neural activity in either region while measuring both the neural activity in the other region and EMG signals from several muscles involved in the reaching movement, as well as simultaneous electrophysiology recordings from both regions in a separate cohort of animals.

      The findings presented show that localized optogenetic manipulation of neural activity in either RFA or CFA resulted in similarly short-latency changes in the muscle output and in firing rate changes in the other region. However, perturbation of RFA led to a larger absolute change in the neural activity of CFA neurons. The authors interpret these findings as evidence for reciprocal, but asymmetrical, influence between the regions, suggesting some degree of hierarchy in which RFA has a greater effect on the neural activity in CFA. They go on to examine whether this asymmetry can also be observed in simultaneously recorded neural activity patterns from both regions. They use multiple advanced analysis methods that either identify latent components at the population level or measure the predictability of firing rates of single neurons in one region using firing rates of single neurons in the other region. Interestingly, the main finding across these analyses seems to be that both regions share highly similar components that capture a high degree of variability of the neural activity patterns in each region. Single units' activity from either region could be predicted to a similar degree from the activity of single units in the other region, without a clear division into a leading area and a lagging area, as one might expect to find in a simple hierarchical interaction. However, the authors find some evidence showing a slight bias towards leading activity in RFA. Using a two-region neural network model that is fit to the summed neural activity recorded in the different experiments and to the summed muscle output, the authors show that a network with constrained (balanced) weights between the regions can still output the observed measured activities and the observed asymmetrical effects of the optogenetic manipulations, by having different within-region local weights. These results put into question whether previous and current findings that demonstrate asymmetry in the output of regions can be interpreted as evidence for asymmetrical (and thus hierarchical) inputs between regions, emphasizing the challenges in studying interactions between any brain regions.

      Strengths:

      The experiments and analyses performed in this study are comprehensive and provide a detailed examination and comparison of neural activity recorded simultaneously using dense electrophysiology probes from two main motor regions that have been the focus of studies examining goal-directed movements. The findings showing reciprocal effects from each region to the other, similar short-latency modulation of muscle output by both regions, and similarity of neural activity patterns without a clear lead/lag interaction, are convincing and add to the growing body of evidence that highlight the complexity of the interactions between multiple regions in the motor system and go against a simple feedforward-like network and dynamics. The neural network model complements these findings and adds an important demonstration that the observed asymmetry can, in theory, also arise from differences in local recurrent connections and not necessarily from different input projections from one region to the other. This sheds an important light on the multiple factors that should be considered when studying the interaction between any two brain regions, with a specific emphasis on the role of local recurrent connections, that should be of interest to the general neuroscience community.

      Weaknesses:

      While the similarity of the activity patterns across regions and lack of a clear leading/lagging interaction are interesting observations that are mostly supported by the findings presented (however, see comment below for lack of clarity in CCA/PLS analyses), the main question posed by the authors - whether there exists an endogenous hierarchical interaction between RFA and CFA - seems to be left largely open. 

      The authors note that there is currently no clear evidence of asymmetrical reciprocal influence between naturally occurring neural activity patterns of the two regions, as previous attempts have used non-natural electrical stimulation, lesions, or pharmacological inactivation. The use of acute optogenetic perturbations does not seem to be vastly different in that aspect, as it is a non-natural stimulation of inhibitory interneurons that abruptly perturbs the ongoing dynamics.

      We do believe that our optogenetic inactivation identifies a causal interaction between the endogenous activity patterns in the excitatory projection neurons, which we have largely silenced, and the downstream endogenous activity that is perturbed. The effect in the downstream region results directly from the silencing of activity in the excitatory projection neurons that mediate each region’s interaction with other regions. Here we have performed a causal intervention common in biology: a loss-of-function experiment. Such experiments generally reveal that a causal interaction of some sort is present, but often do not clarify much about the nature of the interaction, as is true in our case. By showing that a silencing of endogenous activity in one motor cortical region causes a significant change to the endogenous activity in another, we establish a causal relationship between these activity patterns. This is analogous to knocking out the gene for a transcription factor and observing causal effects on the expression of other genes that depend on it. 

      Moreover, our experiments are, to our knowledge, the first that localize a causal relationship to endogenous activity in motor cortex at a particular point during a motor behavior. Lesion and pharmacological or chemogenetic inactivation have long-lasting effects, and so their consequences on firing in other regions cannot be attributed to a short-latency influence of activity at a particular point during movement. Moreover, the involvement of motor cortex in motor learning and movement preparation/initiation complicates the interpretation of these consequences in relation to movement execution, as disturbance to processes on which execution depends can impede execution itself. Stimulation experiments generate spiking in excitatory projection neurons that is not endogenous.

      That said, we would agree that the form of the causal interaction between RFA and CFA remains unaddressed by our results. These results do not expose how the silenced activity patterns affect activity in the downstream region, just as knocking out a transcription factor gene does not expose how the transcription factor influences the expression of other genes. To show evidence for a specific type of interaction dynamics between RFA and CFA, a different sort of experiment would be necessary. See Jazayeri and Afraz, Neuron, 2017 for more on this issue.

      Furthermore, the main finding that supports a hierarchical interaction is a difference in the absolute change of firing rates as a result of the optogenetic perturbation, a finding that is based on a small number of animals (N = 3 in each experimental group), and one which may be difficult to interpret. 

      Though N = 3, we do show statistical significance. Moreover, using three replicates is not uncommon in biological experiments that require a large technical investment.

      As the authors nicely demonstrate in their neural network model, the two regions may differ in the strength of local within-region inhibitory connections. Could this theoretically also lead to a difference in the effect of the artificial light stimulation of the inhibitory interneurons on the local population of excitatory projection neurons, driving an asymmetrical effect on the downstream region? 

      We (Miri et al., Neuron, 2017) and others (Guo et al., Neuron, 2014) have shown that the effect of this inactivation on excitatory neurons in CFA is a near-complete silencing (90-95% within 20 ms). There is thus little room for the effects on projection neurons in RFA to be much larger. We have measured these local effects in RFA as part of other work (Kristl et al., bioRxiv, 2025), verifying that the effects on RFA projection neuron firing are not larger.

      Moreover, the manipulation was performed upon the beginning of the reaching movement, while the premotor region is often hypothesized to exert its main control during movement preparation, and thus possibly show greater modulation during that movement epoch. It is not clear if the observed difference in absolute change is dependent on the chosen time of optogenetic stimulation and if this effect is a general effect that will hold if the stimulation is delivered during different movement epochs, such as during movement preparation.

      We agree that the dependence of RFA-CFA interactions on movement phase would be interesting to address in subsequent experiments. While a strong interpretation of lesion results might lead to a hypothesis that premotor influence on primary motor cortex is local to, or stronger during, movement preparation as opposed to execution, at present there is to our knowledge no empirical support from interventional experiments for this hypothesis. Moreover, existing analyses of activity in these two regions have reached conflicting conclusions about the strength of interaction between them during preparation. Compare for example Bachschmid-Romano et al., eLife, 2023 to Kaufman et al., Nature Neuroscience, 2014.

      That said, this lesion interpretation would predict the same asymmetry we have observed from perturbations at the beginning of a reach - a larger effect of RFA on CFA than vice versa.

      Another finding that is not clearly interpretable is in the analysis of the population activity using CCA and PLS. The authors show that shifting the activity of one region compared to the other, in an attempt to find the optimal leading/lagging interaction, does not affect the results of these analyses. Assuming the activities of both regions are better aligned at some unknown ground-truth lead/lag time, I would expect to see a peak somewhere in the range examined, as is nicely shown when running the same analyses on a single region's activity. If the activities are indeed aligned at zero, without a clear leading/lagging interaction, but the results remain similar when shifting the activities of one region compared to the other, the interpretation of these analyses is not clear.

      Our results in this case were definitely surprising. Many share the intuition that there should be a lag at which the correlations in activity between regions may be strongest. The similarity in alignment across lags we observed might be expected if communication between regions occurs over a range of latencies as a result of dependence on a broad diversity of synaptic paths that connect neurons. In the Discussion, we offer an explanation of how to reconcile these findings with the seemingly different picture presented by DLAG.
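
      To make the reviewer's expectation concrete, the following is a minimal sketch of a lagged CCA analysis on synthetic data, assuming a single fixed latency between two populations. It uses plain, unregularized CCA computed with NumPy; the function names, dimensions, noise level, and the 3-sample delay are all illustrative assumptions and do not reproduce the analysis pipeline or the recordings in the manuscript.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between X and Y (rows are time points).

    Computed as the singular values of Qx^T Qy, where Qx and Qy are
    orthonormal bases for the column spaces of the centered data.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


def cca_across_lags(X, Y, lags):
    """Mean canonical correlation when X[t] is paired with Y[t + lag]."""
    T = X.shape[0]
    scores = {}
    for lag in lags:
        if lag >= 0:
            a, b = X[:T - lag], Y[lag:]
        else:
            a, b = X[-lag:], Y[:T + lag]
        scores[lag] = float(canonical_correlations(a, b).mean())
    return scores


# Synthetic example: a shared 2-D latent drives both "regions",
# and region Y sees it 3 samples later than region X.
rng = np.random.default_rng(0)
T, delay = 2000, 3
z = rng.standard_normal((T, 2))
X = z @ rng.standard_normal((2, 5)) + 0.1 * rng.standard_normal((T, 5))
Y = np.roll(z, delay, axis=0) @ rng.standard_normal((2, 5)) + 0.1 * rng.standard_normal((T, 5))

scores = cca_across_lags(X, Y, range(-6, 7))
best_lag = max(scores, key=scores.get)  # lag with the strongest alignment
```

      With a single fixed latency and a temporally uncorrelated latent signal, the alignment in this toy example is sharply peaked at the imposed delay. Temporally smooth activity, or coupling spread over a range of latencies, would be expected to broaden or flatten that peak.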

      Reviewer #2 (Public review):

      Summary:

      While technical advances have enabled large-scale, multi-site neural recordings, characterizing inter-regional communication and its behavioral relevance remains challenging due to intrinsic properties of the brain such as shared inputs, network complexity, and external noise. This work by Saiki-Ishikawa et al. examines the functional hierarchy between premotor (PM) and primary motor (M1) cortices in mice during a directional reaching task. The authors find some evidence consistent with an asymmetric reciprocal influence between the regions, but overall, activity patterns were highly similar and equally predictive of one another. These results suggest that motor cortical hierarchy, though present, is not fully reflected in firing patterns alone.

      Strengths:

      Inferring functional hierarchies between brain regions, given the complexity of reciprocal and local connectivity, dynamic interactions, and the influence of both shared and independent external inputs, is a challenging task. It requires careful analysis of simultaneous recording data, combined with cross-validation across multiple metrics, to accurately assess the functional relationships between regions. The authors have generated a valuable dataset simultaneously recording from both regions at scale from mice performing a cortex-dependent directional reaching task.

      Using electrophysiological and silencing data, the authors found evidence supporting the traditionally assumed asymmetric influence from PM to M1. While earlier studies inferred a functional hierarchy based on partial temporal relationships in firing patterns, the authors applied a series of complementary analyses to rigorously test this hierarchy at both individual neuron and population levels, with robust statistical validation of significance.

      In addition, recording combined with brief optogenetic silencing of the other region allowed authors to infer the asymmetric functional influence in a more causal manner. This experiment is well designed to focus on the effect of inactivation manifesting through oligosynaptic connections to support the existence of a premotor to primary motor functional hierarchy.

      Subsequent analyses revealed a more complex picture. CCA, PLS, and three measures of predictivity (Granger causality, transfer entropy, and convergent cross-mapping) emphasized similarities in firing patterns and cross-region predictability. However, DLAG suggested an imbalance, with RFA capturing CFA variance at a negative time lag, indicating that RFA 'leads' CFA. Taken together these results provide useful insights for current studies of functional hierarchy about potential limitations in inferring hierarchy solely based on firing rates.

      While I would detail some questions and issues on specifics of data analyses and modeling below, I appreciate the authors' effort in training RNNs that match some behavioral and recorded neural activity patterns including the inactivation result. The authors point out two components that can determine the across-region influence - 1) the amount of inputs received and 2) the dependence on across-region input, i.e., the relative importance of local dynamics, providing useful insights in inferring functional relationships across regions.

      Weaknesses:

      (1) Trial-averaging was applied in CCA and PLS analyses. While trial-averaging can be appropriate in certain cases, it leads to the loss of trial-to-trial variance, potentially inflating the perceived similarities between the activity in the two regions (Figure 4). Do authors observe comparable degrees of similarity, e.g., variance explained by canonical variables? Also, the authors report conflicting findings regarding the temporal relationship between RFA and CFA when using CCA/PLS versus DLAG. Could this discrepancy be due to the use of trial-averaging in former analyses but not in the latter?

      We certainly agree that the similarity in firing patterns is higher in trial averages than on single trials, given the variation in single-neuron firing patterns across trials. Here, we were trying to examine the similarity of activity variance that is clearly movement dependent, as trial averages are, and to use an approach aligned with those applied in the existing literature. We would also agree that there is more that can be learned about interactions from trial-by-trial analysis. It is possible that the activity components identified by DLAG as being asymmetric somehow are not reflected strongly in trial averages. In our Discussion we offer another potential explanation that is based on other differences in what is calculated by DLAG and CCA/PLS.

      We also note here that all of the firing pattern predictivity analysis we report (Figure 6) was done on single trial data, and in all cases the predictivity was symmetric. Thus, our results in aggregate are not consistent with symmetry purely being an artifact of trial averaging.

      (2) A key strength of the current study is the precise tracking of forelimb muscle activity during a complex motor task involving reaching for four different targets. This rich behavioral data is rarely collected in mice and offers a valuable opportunity to investigate the behavioral relevance of the PM-M1 functional interaction, yet little has been done to explore this aspect in depth. For example, single-trial time courses of inter-regional latent variables acquired from DLAG analysis can be correlated with single-trial muscle activity and/or reach trajectories to examine the behavioral relevance of inter-regional dynamics. Namely, can trial-by-trial change in inter-regional dynamics explain behavioral variability across trials and/or targets? Does the inter-areal interaction change in error trials? Furthermore, the authors could quantify the relative contribution of across-area versus within-area dynamics to behavioral variability. It would also be interesting to assess the degree to which across-area and within-area dynamics are correlated. Specifically, can across-area dynamics vary independently from within-area dynamics across trials, potentially operating through a distinct communication subspace?

      These are all very interesting questions. Our study does not attempt to parse activity into components predictive of muscle activity and others that may reflect other functions. Distinct components of RFA and CFA activity may very well rely on distinct interactions between them.

      (3) While network modeling of RFA and CFA activity captured some aspects of behavioral and neural data, I wonder if certain findings such as the connection weight distribution (Figure 7C), across-region input (Figure 7F), and the within-region weights (Figure 7G), primarily resulted from fitting the different overall firing rates between the two regions with CFA exhibiting higher average firing rates. Did the authors account for this firing rate disparity when training the RNNs?

      The key comparison in Figure 7 is shown in 7F, where the firing rates are accounted for in calculating the across-region input strength. Equalizing the firing rates in RFA and CFA would effectively increase RFA rates. If the mean firing rates in each region were appreciably dependent on across-region inputs, we would then expect an off-setting change in the RFA→CFA weights, such that the RFA→CFA distributions in 7F would stay the same. We would also expect the CFA→RFA weights would increase, since RFA neurons would need more input. This would shift the CFA→RFA (blue) distributions up. Thus, if anything, the key difference in this panel would only get larger. 

      We also generally feel that it is a better approach to fit the actual firing rates, rather than normalizing, since normalizing the firing rates would take us further from the actual biology, not closer.

      (4) Another way to assess the functional hierarchy is by comparing the time courses of movement representation between the two regions. For example, a linear decoder could be used to compare the amount of information about muscle activity and/or target location as well as time courses thereof between the two regions. This approach is advantageous because it incorporates behavior rather than focusing solely on neural activity. Since one of the main claims of this study is the limitation of inferring functional hierarchy from firing rate data alone, the authors should use the behavior as a lens for examining inter-areal interactions.

      As we state above, we agree that examining interactions specific to movement-related activity components could reveal interesting structure in interregional interactions. Since it remains a challenge to rigorously identify a subset of neural activity patterns specifically related to driving muscle activity, any such analysis would involve an additional assumption. It remains unclear how well the activity that decoders use for predicting muscle activity matches the activity that actually drives muscle activity in situ.

      To address this issue, which relates to one raised by Reviewer #3 below, we have added a paragraph to the Discussion (see “Manifestations of hierarchy in firing patterns”).

      Reviewer #3 (Public review):

      This study investigates how two cortical regions that are central to the study of rodent motor control (rostral forelimb area, RFA, and caudal forelimb area, CFA) interact during directional forelimb reaching in mice. The authors investigate this interaction using

      (1) optogenetic manipulations in one area while recording extracellularly from the other, (2) statistical analyses of simultaneous CFA/RFA extracellular recordings, and (3) network modeling.

      The authors provide solid evidence that asymmetry between RFA and CFA can be observed, although such asymmetry is only observed in certain experimental and analytical contexts.

      The authors find asymmetry when applying optogenetic perturbations, reporting a greater impact of RFA inactivation on CFA activity than vice-versa. The authors then investigate asymmetry in endogenous activity during forelimb movements and find asymmetry with some analytical methods but not others. Asymmetry was observed in the onset timing of movement-related deviations of local latent components with RFA leading CFA (computed with PCA) and in a relatively higher proportion and importance of cross-area latent components with RFA leading than CFA leading (computed with DLAG). However, no asymmetry was observed using several other methods that compute cross-area latent dynamics, nor with methods computed on individual neuron pairs across regions. The authors follow up this experimental work by developing a two-area model with asymmetric dependence on cross-area input. This model is used to show that differences in local connectivity can drive asymmetry between two areas with equal amounts of across-region input.

      Overall, this work provides a useful demonstration that different cross-area analysis methods result in different conclusions regarding asymmetric interactions between brain areas and suggests careful consideration of methods when analyzing such networks is critical. A deeper examination of why different analytical methods result in observed asymmetry or no asymmetry, analyses that specifically examine neural dynamics informative about details of the movement, or a biological investigation of the hypothesis provided by the model would provide greater clarity regarding the interaction between RFA and CFA.

      Strengths:

      The authors are rigorous in their experimental and analytical methods, carefully monitoring the impact of their perturbations with simultaneous recordings, and providing valid controls for their analytical methods. They cite relevant previous literature that largely agrees with the current work, highlighting the continued ambiguity regarding the extent to which there exists an asymmetry in endogenous activity between RFA and CFA.

      A strength of the paper is the evidence for asymmetry provided by optogenetic manipulation. They show that RFA inactivation causes a greater absolute difference in muscle activity than CFA inactivation (deviations begin 25-50 ms after laser onset, Figure 1) and that RFA inactivation causes a relatively larger decrease in CFA firing rate than CFA inactivation causes in RFA (deviations begin <25 ms after laser onset, Figure 3). The timescales of these changes provide solid evidence for an asymmetry in the impact of inactivating RFA/CFA on the other region that could not be driven by differences in feedback from disrupted movement (which would appear with a ~50 ms delay).

      The authors also utilize a range of different analytical methods, showing an interesting difference between some population-based methods (PCA, DLAG) that observe asymmetry, and single neuron pair methods (Granger causality, transfer entropy, and convergent cross mapping) that do not. Moreover, the modeling work presents an interesting potential cause of "hierarchy" or "asymmetry" between brain areas: local connectivity that impacts dependence on across-region input, rather than the amount of across-region input actually present.

      Weaknesses:

      There is no attempt to examine neural dynamics that are specifically relevant/informative about the details of the ongoing forelimb movement (e.g., kinematics, reach direction). Thus, it may be preemptive to claim that firing patterns alone do not reflect functional influence between RFA/CFA. For example, given evidence that the largest component of motor cortical activity doesn't reflect details of ongoing movement (reach direction or path; Kaufman, et al. PMID: 27761519) and that the analytical tools the authors use likely isolate this component (PCA, CCA), it may not be surprising that CFA and RFA do not show asymmetry if such asymmetry is related to the control of movement details. 

      An asymmetry may still exist in the components of neural activity that encode information about movement details, and thus it may be necessary to isolate and examine the interaction of behaviorally-relevant dynamics (e.g., Sani, et al. PMID: 33169030).

      To clarify, we are not claiming that firing patterns in no way reflect the asymmetric functional influence that we demonstrate with optogenetic inactivation. Instead, we show that certain types of analysis that we might expect to reflect such influence, in fact, do not. Indeed, DLAG did exhibit asymmetries that matched those seen in functional influence (at least qualitatively), though other methods we applied did not.

      As we state above, we do think that there is more that can be gleaned by looking at influence specifically in terms of activity related to movement. However, if we did find that movement-related activity exhibited an asymmetry following functional influence, our results imply that the remaining activity components would exhibit an opposite asymmetry, such that the overall balance is symmetric. This would itself be surprising. We also note that the components identified by CCA and PLS do show substantial variation across reach targets, indicating that they are not only reflecting condition-invariant components. These analyses were performed on components accounting for well over 90% of the total activity variance, suggesting that both condition-dependent and condition-invariant components should be included.

      To address the concern about condition-dependent and condition-invariant components, we have added a sentence to the Results section reporting our CCA and PLS results: “Because our results here involve the vast majority of trial-averaged activity variance, we expect that they encompass both components of activity that vary for different movement conditions (condition-dependent), and those that do not (condition-invariant).” To address the general concerns about potential differences in activity components specifically related to muscle activity, we have also added an additional paragraph to the Discussion (see “Manifestations of hierarchy in firing patterns”).

      The idea that local circuit dynamics play a central role in determining the asymmetry between RFA and CFA is not supported by experimental data in this paper. The plausibility of this hypothesis is supported by the model but is not explored in any analyses of the experimental data collected. Given the focus on this idea in the discussion, further experimental investigation is warranted.

      While we do not provide experimental support for this hypothesis, the data we present also do not contradict it. Here we used modeling as it is often used: to capture experimental results and generate hypotheses about potential explanations. We do feel that our Discussion makes clear where the hypothesis derives from and does not misrepresent the lack of experimental support. We expect readers will take our engagement with this hypothesis with the appropriate grain of salt. The imaginable experiments to support such a hypothesis would constitute another substantial study, requiring numerous controls, and would be a whole other paper in itself.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      (1) There are a few small text/figure caption modifications that can be made for clarity of reading:

      (2) Unclear sentence in the second paragraph of the introduction: "For example, stimulation applied in PM has been shown to alter the effects on muscles of stimulation in M1 under anesthesia, both in monkeys and rodents."

      This sentence has been rephrased for clarity: “For example, in anesthetized monkeys34 and rodents35, stimulation in PM alters the effects of stimulation in M1 on muscles.”

      (3) The first section of the results presents the optogenetic manipulation. However, the critical control that tests whether this was strictly a local manipulation that did not affect cells in the other region is introduced only much later. It may be helpful to add a comment in this section noting that such a control was performed, even if it is explained in detail later when introducing the recordings.

      We have added the following to the first Results section: “we show below that direct optogenetic effects were only seen in the targeted forelimb area and not the other.”

      (4) Figure 1D - I imagine these averages are from a single animal, but this is not stated in the figure caption.

      “For one example mouse,” has been added to the beginning of the Figure 1D legend.

      (5) Figure 2F - N=6 is not stated in the panel's caption (though it can make it clearer), while it is stated in the caption of 2H.

      “n = 6 mice” has been added to the Figure 2F legend.

      (6) There's some inconsistency with the order of RFA/CFA in the figures, sometimes RFA is presented first (e.g., Figure 1D and 1F), and sometimes CFA is presented first (e.g., panels of Figure 2).

      We do not foresee this leading to confusion.

      (7) "As expected, the majority of recorded neurons in each region exhibited an elevated average firing rate during movement as compared to periods when forelimb muscles were quiescent (Figure 2D,E; Figure S1A,B)" - Figure S1A,B show histograms of narrow vs. wide waveforms, is this the relevant figure here?

      We apologize for the cryptic reference. The waveform width histograms were referred to here because they enabled the separation of narrow- and wide-waveform cells shown in Figure 2D,E. We have added the following clause to the referenced sentence to make this explicit:  “, both for narrow-waveform, putative interneurons and wide-waveform putative pyramidal neurons.”

      (8) Figure 2I caption - "The fraction of activity variance from 150 ms before reach onset to 150 ms after it that occurs before reach onset" - this sentence is not clear.

      The Figure 2I legend has been updated to “The activity variance in the 150 ms before muscle activity onset, defined as a fraction of the total activity variance from 150 ms before to 150 ms after muscle activity onset, for each animal (circles) and the mean across animals (black bars, n = 6 mice).”

      (9) Figure 4B-G - is this showing results across the 6 animals? Not stated clearly.

      Yes - the 21 sessions we had referred to are drawn from all six mice. We have updated the legend here to make this explicit.

      (10) DLAG analysis - is there any particular reasoning behind choosing four across-region and four within-region components?

      In actuality, we completed this analysis for a broad range of component numbers and obtained similar results in all cases. Four fell in the center of our range, and so we focused the illustrations shown in the figure on this value. In general, the number of components is arbitrary. The original paper from Gokcen et al. describes a method for identifying a lower bound on the number of distinct components the method can identify. However, this method yields different results for each individual recording session. For the comparisons we performed, we needed to use the same range of values for each session.

      (11) Figure 5A seems to show 11 across-session components, it's unclear from the caption but I imagine this should show 12 (4 components times 3 sessions?)

      As we state in the Methods, any across-region latent variable with a lag that failed to converge between the boundary values of ±200 ms was removed from the analysis. In the case illustrated in this panel, the lag for one of the components failed to converge and is not shown. We have now clarified this both in the relevant Results paragraph and in the figure legend.

      (12) Figure 5B - is each marker here the average variance explained by all across/within components that were within the specified lag criteria across sessions per mouse? In other words, what does a single marker here stand for?

      We apologize for the lack of clarity here. These values reflect the average across sessions for each mouse. We have updated the legend to make this explicit.

      Reviewer #2 (Recommendations for the authors):

      As I have addressed most of my major recommendations in the public review, I will use this section to include relatively minor points for the authors to consider.

      (1) The EMG data in Figure 1C shows distinct patterns across spouts, both in the magnitude and complexity of muscle activations. It would be interesting to investigate whether these differences in muscle activity lead to behavioral variations (e.g., reaction time, reach duration) and how they relate to the relative involvement of the two areas.

      We agree that it would be interesting to examine how the interactions between areas vary as behavior varies. While the differences between reaches here are limited, we have addressed this question for two substantially different motor behaviors (reaching and climbing) in a follow-up study that was recently preprinted (Kristl et al., bioRxiv, 2025).

      (2) How do the authors account for the lingering impact of RFA inactivation on muscle activity, which persists for tens of milliseconds after laser offset? Could this effect be due to compensatory motor activity following the perturbation? A further illustration of how the raw limb trajectories and/or muscle activity are perturbed and recovered would help readers better understand the impact of motor cortical inactivation.

      To clarify the effects of inactivation on a longer timescale, we have added a new supplemental figure showing the plots from Figure 1D over a longer time window extending to 500 ms after trial onset (new Figure S1). Lingering effects do persist, at least in certain cases. In general, we find it hard to ascertain the source of optogenetic effects on longer timescales like this. On the shortest timescales, effects will be mediated by relatively direct connections between regions. However, on these longer timescales, effects could be due to broader changes in brain and behavioral state that can influence muscle activity. For example, attempts to compensate for the initial disturbance to muscle activity could cause divergence from controls on these longer timescales. Muscle tissue itself is also known to have long-timescale relaxation dynamics, and it would not be surprising if the relevant control circuits here also had long-timescale dynamics, such that we would not expect an immediate return to control when the light pulse ends. Because of this ambiguity, we generally avoid interpretation of optogenetic effects on these longer timescales.

      Reviewer #3 (Recommendations for the authors):

      (1) Page 9: ". We measured the time at which the activity state deviated from baseline preceding reach onset," - I cannot find how this deviation was defined (neither the baseline nor the threshold).

      We have added text to the Figure 2G legend that explicitly states how the baseline and activity onset time were defined.

      (2) Given the shape of the curves in Figure 2G, the significance of this result seems susceptible to slight modifications of what defines a baseline or a deviation threshold. For example, it looks like the circle for CFA has a higher y-axis value, suggesting the baseline deviance is higher, but it is unclear why that would be from the plot. If the threshold for deviation in neural activity state were held uniform between CFA and RFA is the difference still significant across animals?

      We have repeated the analysis using the same absolute threshold for each region. We used the higher of the two thresholds from each region. The difference remains significant. This is now described in the last paragraph of the Results section for Figure 2.
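      The shared-threshold check described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure given in the Figure 2G legend: the baseline window, the mean-plus-three-SD threshold, and the two traces (`region_a`, `region_b`) are hypothetical values chosen only to show how a region-specific threshold and a shared absolute threshold can yield different onset estimates.

```python
from statistics import mean, pstdev

def baseline_threshold(trace, n_baseline, k=3.0):
    """Threshold = baseline mean + k baseline SDs (an assumed definition)."""
    base = trace[:n_baseline]
    return mean(base) + k * pstdev(base)

def onset_index(trace, threshold):
    """Index of the first sample exceeding the threshold, or None if never crossed."""
    for i, value in enumerate(trace):
        if value > threshold:
            return i
    return None

# Hypothetical activity-state traces: 10 baseline samples, then a ramp.
region_a = [0.0, 0.2] * 5 + [0.5, 1.0, 1.5, 2.0]  # quieter baseline
region_b = [0.0, 0.4] * 5 + [0.3, 0.6, 0.9, 1.2]  # noisier baseline

thr_a = baseline_threshold(region_a, 10)
thr_b = baseline_threshold(region_b, 10)
shared = max(thr_a, thr_b)  # use the higher of the two thresholds for both regions

print("region-specific:", onset_index(region_a, thr_a), onset_index(region_b, thr_b))
print("shared:         ", onset_index(region_a, shared), onset_index(region_b, shared))
```

      With these made-up traces, raising the first region's threshold to the shared value delays its estimated onset by one sample, yet it still precedes the second region's onset. Whether an ordering survives such a change is exactly what the shared-threshold control is meant to test.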

      (3) Since summed deviation of the top 3 PCs is used to show a difference in activity onset between CFA/RFA, but only a small proportion of variance is explained pre-movement (<2% in most animals), it seems relevant to understand what percentage of CFA/RFA neuron activity actually is modulated and deviates from baseline prior to movement and to show the distribution of activity onsets at the single neuron level in CFA/RFA. Can an onset difference only be observed using PCA? 

      Because many neurons have low firing rates, estimating the time at which their firing rate begins to rise near reach onset is difficult to do reliably. It is also true that not all neurons show an increase around onset - some show a decrease and others show no discernible change. Using PCs to measure onset avoids both of these problems, since they capture both increases and decreases in individual neuron firing rates and are much less noisy than individual neuron firing rates. 

      However, based on this comment, we have repeated this analysis on a single-neuron level using only neurons with relatively high average firing rates. Specifically, we analyzed neurons with mean firing rates above the 90th percentile across all sessions within an animal. Neurons whose activity never crossed threshold were excluded. Results matched those using PCs, with RFA neurons showing an earlier average activity onset time. This is now described in the last paragraph of the Results section for Figure 2.

      (4) It is stated that to study the impact of inactivation on CFA/RFA activity, only the 50 highest average firing rate neurons were used (and maybe elsewhere too, e.g., convergent cross mapping). It is unclear why this subselection is necessary. It is justified by stating that higher firing rate neurons have better firing rate estimates. This may be supportable for very low firing rate units that spike sorting tools have a hard time tracking, but I don't think this is supported by data for most of the distribution of firing rates. It therefore seems like the results might be biased by a subselection of certain high firing rate neuron populations. It would be useful to also compute and mention if the results for all neurons/neuron pairs are the same. If there is worry about low-quality units being those with low firing rates, a threshold for firing rate as used elsewhere in the paper (at least 1 spike / 2 trials) seems justified.

      The issue here is that as firing rates decrease and firing rate estimates get noisier, estimates of the change in firing rate get more variable. Here we are trying to estimate the fraction of neurons for which firing rates decreased upon inactivation of the other region. Variability in estimates of the firing rate change will bias this estimate toward 50%, since in the limit when the change estimates are entirely based on noise, we expect 50% to be decreases. As expected, when we use increasingly liberal thresholds for this analysis, the fraction of decreases trends closer to 50%. 

      As a consequence of this, we cannot easily distinguish whether higher firing rate neurons might for some reason have a greater tendency to exhibit decreases in firing compared to lower firing rate neurons. However, we see no positive reason to expect such a difference. We have added a sentence noting this caveat in interpreting our findings to the relevant paragraph of the Results.
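      The bias toward 50% described above can be reproduced in a short simulation. This is a minimal sketch with hypothetical numbers, not an analysis of the recorded data: every simulated neuron is assigned the same true decrease in firing rate, and each estimate of that change is corrupted by zero-mean Gaussian noise whose standard deviation stands in for the noisier estimates obtained from low-firing-rate units.

```python
import random

def fraction_decreases(true_change, noise_sd, n_neurons=5000, seed=0):
    """Fraction of neurons whose *estimated* firing-rate change is negative.

    All neurons share the same true change; each estimate adds zero-mean
    Gaussian noise with standard deviation `noise_sd`.
    """
    rng = random.Random(seed)
    estimates = [true_change + rng.gauss(0.0, noise_sd) for _ in range(n_neurons)]
    return sum(e < 0 for e in estimates) / n_neurons

# Hypothetical scenario: every neuron truly decreases by 1 spike/s.
for noise_sd in (0.5, 2.0, 10.0, 100.0):
    print(noise_sd, round(fraction_decreases(-1.0, noise_sd), 3))
```

      Even though every simulated neuron truly decreases its rate, the measured fraction of decreases falls toward one half as the noise grows. This is the pattern reported when more liberal firing-rate thresholds are used.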

      The lack of min/max axis values in Figure 3B-F makes it hard to interpret - are these neurons almost silent when near the bottom of the plot or are they still firing a substantial # of spikes?

      To aid interpretation of the relative magnitude of firing rate changes, we have added minimum firing rates for the averages depicted in Figure 3B,C,E and F to the legend. Our original thinking was that the plots in Figure 3G and H would provide an indication of the relative changes in firing.

      It would be interesting to know if the impact of optogenetic stimulation changed with exposure to the manipulation. Are all results presented only from the first X number of sessions in each animal? Or is the effect robust over time and (within the same animal) you can get the same results of optogenetic inactivation over time? This information seems critical for reproducibility.

      We have now performed brief optogenetic inactivations in several brain areas in several different behavioral paradigms, and have found that inactivation effects are stable both within and across sessions, almost surprisingly so. This includes cases where the inactivations were more frequent (every ~1.25 s on average) and more numerous (>15,000 trials per animal) than in the present manuscript. Thus we did not restrict our analysis here to the first X sessions or trials within a session. We have added additional plots as Figure S3T-AA showing the stability of optogenetic effects both within and across sessions.

      Given that it can be difficult to record from interneurons (as the proportion of putative interneurons in Figure S1 attests), the SALT analyses would be more convincing if a few recordings had been performed in the same region as optogenetic stimulation to show a "positive control" of what direct interneuron stimulation looks like. Could also use this to validate the narrow/wide waveform classification.

      We have verified that using SALT as we have in the present manuscript does detect vGAT+ interneurons directly responding to light. This is included in a recent preprint from the lab (Kristl et al., bioRxiv, 2025). We (Warriner et al., Cell Reports, 2022) and others (Guo et al., Neuron, 2014) have previously used direct ChR2 activation to validate waveform-based classification.

      Simultaneous CFA/RFA recordings during optogenetic perturbation would also allow for time courses of inhibition to be compared in RFA/CFA. Does it take 25ms to inhibit locally, and the cross-area impact is fast, or does it inactivate very fast locally and takes ~25ms to impact the other region?

      Latencies of this sort are difficult to precisely measure given the statistical limits of this sort of data, but there does appear to be some degree of delay between local and downstream effects. We do not have a statistical foundation as of yet for concluding that this is the case. It will be interesting to examine this issue more rigorously in the future.

      Given the difference in the analytical methods, the authors should share data in a relatively unprocessed format (e.g., spike times from sorted units relative to video tracking + behavioral data), along with analysis code, to allow others to investigate these differences.

      We plan to post the data and code to our lab’s GitHub site once the Version of Record is online.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this paper, the authors reveal that the MK2 inhibitor CMPD1 can inhibit the growth, migration, and invasion of breast cancer cells both in vitro and in vivo by inducing microtubule depolymerization, preferentially at the microtubule plus-end, leading to cell division arrest, mitotic defects, and apoptotic cell death. They also showed that CMPD1 treatment upregulates genes associated with cell migration and cell death, and downregulates genes related to mitosis and chromosome segregation in breast cancer cells, suggesting a potential mechanism of CMPD1 inhibition in breast cancer. Besides, they used the combination of an MK2-specific inhibitor, MK2-IN-3, with the microtubule depolymerizer vinblastine to simultaneously disrupt both the MK2 signaling pathway and microtubule dynamics, and they claim that inhibiting the p38-MK2 pathway may help to enhance the efficacy of MTAs in the treatment of breast cancer. However, there are a few concerns, including:

      (1) What is the effect of CMPD1 on breast cancer metastasis?

      In this study, we hypothesized that the MK2 signaling pathway could synergize with microtubule-targeting agents (MTAs) to enhance anti-cancer efficacy. We utilized CMPD1 as a potent dual-function inhibitor, targeting both MK2 and microtubule dynamics. By simultaneously inhibiting these pathways, CMPD1 not only shows the therapeutic impact of MTAs, but also significantly suppresses breast cancer cell migration and invasion. Therefore, we propose that CMPD1, through its dual inhibition of MK2 activity and microtubule dynamics, may offer enhanced specificity and efficacy in preventing breast cancer metastasis and limiting tumor progression.

      (2) The mechanism is lacking as to how MK2 inhibitors enhance the efficacy of MTAs.

      Thank you for the valuable suggestion. We agree that our current findings do not fully elucidate the underlying mechanism by which MK2 inhibition synergistically enhances the efficacy of MTAs. We recognize this as an important area for further investigation and are committed to exploring the molecular interplay between MK2 signaling and microtubule dynamics in future studies. A deeper mechanistic understanding will be critical to establishing a strong rationale for the potential co-treatment of MK2 inhibitors and MTAs in clinical breast cancer therapy.

      Reviewer #2 (Public review):

      Summary:

      This study explores the potential of inhibiting the p38-MK2 signaling pathway to enhance the efficacy of microtubule-targeting agents (MTAs) in breast cancer treatment using a dual-target inhibitor.

      Strengths:

      The study identifies the p38-MK2 pathway as a promising target to enhance the efficacy of microtubule-targeting agents (MTAs), offering a novel therapeutic strategy for breast cancer treatment. In addition, the study employs a wide range of techniques, especially live-cell imaging, to assess the microtubule dynamics in TNBC cells.

      We sincerely appreciate your recognition of the significance and impact of our work.

      Weaknesses:

      The study primarily uses RPE1 cells as the control for normal cells, which may not fully capture the response of normal mammary epithelial cells. While CMPD1 is shown to be effective in suppressing tumor growth in MDA-MB-231 xenograft, the study lacks detailed toxicity data to confirm its safety profile in vivo.

      Thank you for your valuable suggestions. In the revised manuscript, we have included CMPD1 treatment in MCF10A cells, a more appropriate non-transformed control line commonly used in breast cancer research. Notably, MCF10A cells exhibited results similar to those observed in RPE1 cells, further reinforcing our conclusion that breast cancer cells display increased sensitivity to CMPD1 treatment. These new findings are presented in Figure 2-Supplement 1A-C. Additionally, we performed further xenograft experiments using CAL-51 and MDA-MB-231 cells. We collected data on tumor growth, mouse body weight, survival rates, and other relevant parameters to comprehensively assess toxicity. The newly obtained results are presented in Figure 3F-G and Figure 3-Supplement 1-3.

      Reviewer #3 (Public review):

      Summary:

      The authors demonstrated that MK2i could enhance the therapeutic efficacy of MTAs. Using tumor xenograft and migration assays, the authors suggested that the p38-MK2 pathway may serve as a promising therapeutic target in combination with MTAs in cancer treatment.

      Strengths:

      The authors provided a potential treatment for breast cancer.

      Thank you for recognizing the importance and significance of our work.

      Weaknesses:

      (1) In Figure 2, the authors used a human retinal pigment epithelial-1 (RPE1) cell line to show that breast cancer cells are more sensitive to CMPD1 treatment. MCF10A cells would be suggested here as a suitable control. Besides, to compare the sensitivity, IC50 in different cell lines should be measured.

      In the revised manuscript, we have addressed these points by determining the IC50 values for CMPD1 in MDA-MB-231, CAL-51, MCF10A, and CAL-51 p53 knockout cells. These new results are presented in Figure 2-Supplement Figure 3.
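      As an illustration of what an IC50 estimate involves, the following is a minimal sketch only. The response does not state how the IC50 values were fitted, so the method shown here (log-linear interpolation between the two tested doses that bracket 50% viability) is an assumption, and the function name and dose-response values are hypothetical, not data from the study.

```python
import math


def ic50_by_interpolation(doses_nm, viability_pct):
    """Estimate IC50 by interpolating on log10(dose) between the two
    adjacent points that bracket 50% viability.

    Assumes doses are positive and sorted in ascending order.
    Returns None when 50% viability is not bracketed by the tested doses.
    """
    points = list(zip(doses_nm, viability_pct))
    for (d_lo, v_lo), (d_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 50.0 >= v_hi and v_lo != v_hi:
            # Fraction of the way from the lower dose to the higher dose
            frac = (v_lo - 50.0) / (v_lo - v_hi)
            log_ic50 = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_ic50
    return None


# Hypothetical dose-response values for illustration only (not measured data).
doses = [10, 100, 1000]        # nM
viability = [90.0, 50.0, 10.0]  # percent of untreated control
estimate = ic50_by_interpolation(doses, viability)
```

      In practice a four-parameter logistic fit is a common alternative; interpolation is used here only because it needs no external libraries.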

      (2) The data of MDA-MB-231 in Figure 1D is not consistent with CAL-51 and T47D, also not consistent with the data in Figures 2B-C.

      In the revised manuscript, we have included all relevant statistical analyses in Figure 1D. In MDA-MB-231 cells, there are no statistically significant differences in mitotic duration between 1 µM and 5 µM, 5 µM and 10 µM, or 1 µM and 10 µM CMPD1 treatments. Similarly, no significant differences are observed between 1 µM and 5 µM or 5 µM and 10 µM CMPD1 treatments in CAL-51 cells, and between 5 µM and 10 µM in T-47D cells. These results suggest that mitotic duration does not exhibit a clear dose-dependent relationship within the 1–10 µM range, likely because mitotic arrest has reached a near-plateau effect at these concentrations.

      It is also important to note that the experimental conditions in Figures 1 and 2 are fundamentally different. Figure 1 investigates the effects of higher concentrations of CMPD1 (≥1 µM), which severely disrupt microtubule organization and result in robust mitotic arrest, with cells arrested in mitosis for over 8 hours. In contrast, the conditions in Figure 2 utilize much lower concentrations of CMPD1 (10–50 nM), which are insufficient to cause complete microtubule depolymerization, but are capable of inducing a subtle yet statistically significant mitotic delay, particularly in breast cancer cell lines. These lower concentrations were chosen to mimic clinically relevant intratumoral drug levels. Previous studies have reported that paclitaxel (PTX) concentrations in patient tumors approximate ~50 nM when modeled in vitro. At these physiologically relevant levels, PTX does not induce strong mitotic arrest but instead causes moderate delays that result in division errors and chromosomal instability, ultimately contributing to cancer cell death. In this study, the conditions used in Figure 2 emulate these clinically relevant concentrations for CMPD1. We found that, similar to PTX, low-dose CMPD1 induces a slight but significant mitotic delay without triggering a full mitotic arrest. Notably, unlike PTX, CMPD1 appears to exert this effect selectively in breast cancer cells, contributing to mitotic errors and potentially enhancing therapeutic efficacy through targeted chromosomal instability.

      (3) To support the authors' conclusion in Figure 5, an additional animal experiment performed by tail vein injection would be helpful.

      While current technical limitations have precluded us from conducting this suggested experiment in this study, we have performed complementary xenograft studies using CAL-51 cells treated with CMPD1. These experiments included a comprehensive toxicity analysis. Furthermore, we carried out an in vitro migration assay using CAL-51 cells under combined treatment with the MK2 inhibitor and vinblastine. These additional findings are presented in Figure 3–Supplement 1–3 and Figure 6–Supplement 3. We recognize the importance of the suggested tail vein injection approach and are actively pursuing further mechanistic studies, including this experiment, in our ongoing and future work.

      (4) Page 14, to evaluate the combination result of MK2i and vinblastine, an in vivo animal assay must be performed.

      We appreciate the reviewer’s valuable suggestion. We are actively investigating the synergistic mechanisms between the MK2 inhibitor and microtubule-targeting agents (MTAs). In future studies, we plan to extend our findings by conducting xenograft experiments to further evaluate their therapeutic potential in vivo.

      (5) The authors used RNA-seq to show some pathways affected by CMPD1. What are the key/top genes that were affected? How about the mechanism?

      In the revised manuscript, we have included the top 20 upregulated and downregulated genes identified from RNA-seq analysis using MDA-MB-231 cells. This new data is presented in Figure 6-Supplement Figure 4. Gene Ontology (GO) Biological Process (BP) pathway enrichment analysis revealed that the most significantly enriched pathways among upregulated genes are associated with cell migration, whereas the downregulated genes are primarily involved in mitosis and chromosome segregation. These transcriptional changes are consistent with the phenotypic outcomes observed in our experiments, supporting the functional relevance of CMPD1 treatment. However, further investigation will be necessary to elucidate the detailed molecular mechanisms underlying these effects.

      (6) Line 127, more experiments should be involved to support the conclusion.

      In the revised manuscript, we have addressed this point by performing additional experiments, including determination of the IC₅₀ values of CMPD1 in MDA-MB-231, CAL-51, MCF10A, and CAL-51 p53 knockout cells. We also conducted live-cell imaging analyses using MCF10A cells. These new results further reinforce our conclusion that breast cancer cells are more sensitive to CMPD1 treatment than normal breast epithelial cells, and that this sensitivity is independent of p53 status. The new data are presented in Figure 2-Supplement Figures 1 and 3.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1D: As the concentration of CMPD1 increased, the mitotic duration of MDA-MB-231 cells decreased, why was that?

      Although there appears to be a slight decrease in mitotic duration with increasing concentrations of CMPD1, our quantitative analysis reveals no statistically significant differences among the 1 to 10 µM treatment groups in MDA-MB-231 cells. In the revised manuscript, we have included all relevant statistical analyses in Figure 1D for clarity. Importantly, all CMPD1-treated groups exhibit a pronounced and statistically significant prolongation of mitosis compared to the DMSO-treated control. While the average mitotic duration in control cells is approximately 30 minutes, cells exposed to 1–10 µM CMPD1 consistently display mitotic durations exceeding 8 hours, indicating a strong and sustained mitotic arrest across this concentration range.

      Reviewer #2 (Recommendations for the authors):

      (1) The rationale for using RPE1 as normal cell control instead of normal mammary epithelial cells as control is unclear. Using normal mammary epithelial cells such as MCF10A for the study is recommended.

      Thank you for this valuable suggestion. In the revised manuscript, we have included additional experiments using non-transformed mammary epithelial MCF10A cells. The new data, presented in Figure 2-Supplement Figures 1 and 3, include both IC50 measurements and live-cell imaging analyses. These results further support our conclusion that breast cancer cells are significantly more sensitive to CMPD1 treatment compared to normal mammary epithelial cells.

      (2) It is intriguing that CAL-51 cells are more sensitive to CMPD1 than MDA-MB-231 cells; examining how p53 signaling changes in these cells would be worthwhile.

      We appreciate this insightful comment. In the revised manuscript, we have measured the IC₅₀ values for both CAL-51 and CAL-51 p53 knockout (p53KO) cells. The results show no significant difference in CMPD1 sensitivity between the two, suggesting that the enhanced sensitivity of CAL-51 cells is independent of p53 status. These new findings are presented in Figure 2-Supplement Figure 3.

      (3) Figures S1A and B are not described and cited in the main text.

      We apologize for this oversight. In the revised manuscript, we have correctly cited and described Figures S1A and B (Figure 2-Supplement Figure 2 A-B in revised manuscript) in the main text.

      (4) I'm not that convinced by the conclusion made from Lines 201-204. First, Figure S2C, which is the growth of tumor volume, does not reflect the toxicity of the drug treatment. No additional data evaluating the toxicity (such as body weight change) under the regimen was shown. Second, although the tumor weight by the endpoint indicated some anti-tumor effect in the MDA-MB-231 xenograft model, the tumor volume does not show the same pattern (the dot lines do not well distinguish which group from which). I would suggest repeating the in vivo experiment using CAL-51 cells since it is more sensitive to CMPD1 according to the previous data.

      Thank you for this thoughtful and constructive feedback. In the revised manuscript, we have addressed these concerns through several additional experiments. We performed new xenograft studies using CAL-51 TNBC cells, in parallel with further toxicity-focused analyses in the MDA-MB-231 model. Consistent with previous results, CMPD1 treatment significantly suppressed tumor growth in CAL-51 xenografts (Figure 3F-G), further supporting its efficacy in a more sensitive cell line. To evaluate drug-associated toxicity, we measured body weight changes throughout the course of treatment. CMPD1-treated mice maintained a comparable weight gain to the control group, whereas mice treated with paclitaxel (PTX) showed significantly reduced body weight (Figure 3-Supplement Figure 2A). Notably, animal deaths occurred only in the PTX-treated groups in both MDA-MB-231 and CAL-51 models (Figure 3-Supplement Figure 2B). We also assessed organ toxicity, including both anatomical and functional evaluations of the kidney and liver, and observed no significant damage in CMPD1-treated mice (Figure 3-Supplement Figures 3A-B and 3D). Furthermore, white blood cell (WBC) counts remained stable in the CMPD1 group, while PTX treatment led to a significant reduction (Figure 3-Supplement Figures 3C-D). These additional data provide strong evidence for the anti-tumor efficacy and lower toxicity of CMPD1 in vivo.

      (5) While I appreciate the combination effect of treating cells with the MK2 inhibitor and vinblastine, I would consider using genetic knockdown as a complementary approach to demonstrate that inhibiting the p38-MK2 pathway synergized with microtubule depolymerizing agents. In addition, could inhibition of the p38-MK2 pathway alone induce the cell growth inhibition observed with CMPD1 treatment?

      Thank you for these important suggestions. In the revised manuscript, we have incorporated siRNA-mediated knockdown of MK2 in combination with vinblastine treatment. This genetic approach revealed synergistic effects on mitotic index and mitotic errors, closely mirroring the phenotypes observed with pharmacological co-treatment using the MK2 inhibitor and vinblastine (Figure 6-Supplement Figure 2A-C). These results further validate the role of the p38-MK2 pathway in modulating mitotic progression in the presence of MTAs. To address whether MK2 inhibition alone is sufficient to impair cell growth, we performed validation experiments using the MK2 inhibitor at 10 µM. At this concentration, the inhibitor effectively blocked phosphorylation of Hsp27, a major downstream substrate of MK2, under H2O2-induced ROS stress conditions (Figure 6-Supplement Figure 1A-B), confirming MK2 signaling pathway inhibition. However, treatment with the MK2 inhibitor alone did not significantly affect cell proliferation, as shown by a 4-day growth curve analysis in CAL-51 cells (Figure 6-Supplement Figure 1C). These findings suggest that inhibition of the p38-MK2 pathway alone is not sufficient to suppress cancer cell growth, and that its synergistic interaction with MTAs, such as vinblastine, is essential for the observed anti-proliferative effects.

      (6) Phenotypic studies (such as anchorage-independent growth and cell migration and invasion assay) of combining MK2 inhibitor with vinblastine in TNBC cells are recommended.

      Thank you for this valuable suggestion. In the revised manuscript, we have conducted cancer cell migration assays using CAL-51 TNBC cells treated with control, MK2 inhibitor alone, vinblastine alone, or the combination of both. Our results demonstrate that the combination treatment significantly enhances the inhibition of cell migration compared to either agent alone (Figure 6-Supplement Figure 3A-C). These findings provide additional phenotypic evidence supporting the synergistic interaction between MK2 inhibition and microtubule-targeting agents in TNBC cells.

      Reviewer #3 (Recommendations for the authors):

      The authors can utilize diverse experiments to support their conclusions.

      Thank you for this important suggestion. In the revised manuscript, we have conducted a series of additional experiments to robustly support our conclusions.

      These include:

      (1) Xenograft studies using CAL-51 TNBC cells, along with comprehensive toxicity evaluations.

      (2) CMPD1 sensitivity analysis in non-transformed MCF10A mammary epithelial cells.

      (3) IC50 measurements in MDA-MB-231, CAL-51, CAL-51 p53 knockout, and MCF10A cells.

      (4) Cell migration assays assessing the combination effects of MK2 inhibitor and vinblastine.

      (5) siRNA-mediated genetic knockdown of MK2 to complement pharmacological findings.

      Collectively, these additional data sets substantially strengthen the evidence base for our conclusions and provide a more comprehensive mechanistic understanding.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use the teleost medaka as an animal model to study the effect of seasonal changes in day-length on feeding behaviour and oocyte production. They report a careful analysis of how day-length affects female medakas and a thorough molecular genetic analysis of genes potentially involved in this process. They show a detailed analysis of two genes and include a mutant analysis of one gene to support their conclusions.

      Strengths:

      The authors pick their animal model well and exploit the possibilities to examine in this laboratory model the effect of a key environmental influence, namely the seasonal changes of day-length. The phenotypic changes are carefully analysed and well-controlled. The mutational analysis of the agrp1 by a ko-mutant provides important evidence to support the conclusions. Thus this report exceeds previous findings on the function of agrp1 and npyb as regulators of food-intake and shows how in medaka these genes are involved in regulating the organismal response to an environmental change. It thus furthers our understanding of how animals react to key exogenous stimuli for adaptation.

      Weaknesses:

      The authors are too modest when it comes to underscoring the importance of their findings. Previous animal models used to study the effect of these neuropeptides on feeding behaviour have either lost or were most likely never sensitive to seasonal changes of day length. Considering the key importance of this parameter on many aspects of plant and animal life it could be better emphasised that a suitable animal model is at hand that permits this. The molecular characterization of the agrp1 ko-mutant that the authors have generated lacks some details that would help to appreciate the validity of the mutant phenotype. Additional data would help in this respect.

      We would like to thank Reviewer #1 for the really constructive advice. In the revised manuscript, we provide more information on the molecular characterization of the agrp1 KO mutant and emphasize the importance of our animal model, which permits the analysis of neuropeptide effects on feeding behavior in response to seasonal changes of day length.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated the mechanisms behind breeding season-dependent feeding behavior using medaka, a well-known photoperiodic species, as a model. Through a combination of molecular, cellular, and behavioral analyses, including tests with mutants, they concluded that AgRP1 plays a central role in feeding behavior, mediated by ovarian estrogenic signals.

      Strengths:

      This study offers valuable insights into the neuroendocrine mechanisms that govern breeding season-dependent feeding behavior in medaka. The multidisciplinary approach, which includes molecular and physiological analyses, enhances the scientific contribution of the research.

      Weaknesses:

      While medaka is an appropriate model for studying seasonal breeding, the results presented are insufficient to fully support the authors' conclusions.

      Specifically, methods and data analyses are incomplete in justifying the primary claims:

      - the procedure for the food intake assay is unclear;

      - the sample size is very small;

      - the statistical analysis is not always adequate.

      Additionally, the discussion fails to consider the possible role of other hormones that may be involved in the feeding mechanism.

      We would like to thank Reviewer #2 for the helpful comments. As the reviewer suggested, we revised the paragraph describing the procedure for the food intake assay to make it easier for readers to understand. In Figure 1-Supplementary figure 2, RNA-seq was performed to search for candidate neuropeptides, which is why the sample size was kept to the minimum. In the other experiments, each group consists of n ≥ 5 samples, which is generally accepted as an adequate sample size (cf. Kanda et al., Gen Comp Endocrinol., 2011, Spicer et al., Biol Reprod., 2017). As for the statistical analyses, we revised the manuscript so that readers can be convinced of their validity.

      Reviewer #3 (Public review):

      Summary:

      Understanding the mechanisms whereby animals restrict the timing of their reproduction according to day length is a critical challenge given that many of the most relevant species for agriculture are strongly photoperiodic. However, the principal animal models capable of detailed genetic analysis do not respond to photoperiod so this has inevitably limited progress in this field. The fish model medaka occupies a uniquely powerful position since its reproduction is strictly restricted to long days and it also offers a wide range of genetic tools for exploring, in depth, various molecular and cellular control mechanisms.

      For these reasons, this manuscript by Tagui and colleagues is particularly valuable. It uses the medaka to explore links bridging photoperiod, feeding behaviour, and reproduction. The authors demonstrate that in female, but not male medaka, photoperiod-induced reproduction is associated with an increase in feeding, presumably explained by the high metabolic cost of producing eggs on a daily basis during the reproductive period. Using RNAseq analysis of the brain, they reveal that the expression of the neuropeptides agrp and npy that have been previously implicated in the regulation of feeding behaviour in mice are upregulated in the medaka brain during exposure to long photoperiod conditions. Unlike the situation in mice, these two neuropeptides are not co-expressed in medaka neurons, and food deprivation in medaka led to increases in agrp but also a decrease in npy expression. Furthermore, the situation in fish may be more complicated than in mice due to the presence of multiple gene paralogs for each neuropeptide. Exposure to long-day conditions increases agrp1 expression in medaka as the result of increases in the number of neurons expressing this neuropeptide, while the increase in npyb levels results from increased levels of expression in the same population of cells. Using ovariectomized medaka and in situ hybridization assays, the authors reveal that the regulation of agrp1 involves estrogen acting via the estrogen receptor esr2a. Finally, a loss of agrp1 function mutant is generated where the female mutants fail to show the characteristic increase in feeding associated with long-day enhanced reproduction as well as yielding reduced numbers of eggs during spawning.

      Strengths:

      This manuscript provides important foundational work for future investigations aiming to elucidate the coordination of photoperiod sensing, feeding activity, and reproduction function. The authors have used a combination of approaches with a genetic model that is particularly well suited to studying photoperiodic-dependent physiology and behaviour. The data are clear and the results are convincing and support the main conclusions drawn. The findings are relevant not only for understanding photoperiodic responses but also provide more general insight into links between reproduction and feeding behaviour control.

      Weaknesses:

      Some experimental models used in this study, namely ovariectomized female fish and juvenile fish have not been analysed in terms of their feeding behaviour and so do not give a complete view of the position of this feeding regulatory mechanism in the context of reproduction status. Furthermore, the scope of the discussion section should be expanded to speculate on the functional significance of linking feeding behaviour control with reproductive function.

      We would like to thank Reviewer #3 for the insightful advice. We added several pertinent sentences describing the ovariectomized female fish and juvenile fish, so that the revised manuscript gives a more complete view of their feeding regulatory mechanism in the context of reproductive status. In addition, we revised the discussion section to incorporate Reviewer #3's valuable suggestion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      General: the text could profit from a careful editing of errors, including adjusting singular and plural status of nouns and verbs: examples are line 107 (noun) and line 96 (verb). Suitable text editing software is available to do this task.

      Thank you for your suggestion. We thoroughly read the entire manuscript and corrected such errors in the revised manuscript.

      As medaka is a unique genetic vertebrate model to study seasonal effects, it would be interesting to know whether the authors found novel or rather unexpected genes with a differential expression between LD and SD. It is understandable that the authors focused on agrp1 and npyb, as these have already been well studied in mammalian models although not in this context. Novel insights with genes previously not implicated in feeding regulation could underscore the unique nature of medaka as a model.

      We appreciate your kind comments, which we found really encouraging. Since we focused on feeding-related peptides, we did not find any novel genes that have not been reported.

      ISH is unreliable as a methodology to quantify expression levels. Yet the authors use this to compare fed and starved females to compare expression levels of agrp1. They use a temporal staining comparison and compare 90-minute and 300-minute staining reactions. However, they do not explain why they use the 90-minute staining time point and why 300 minutes of staining is the "saturation point of staining". They should provide compelling data for their claim and the selection of time points or else refrain from using these (at best) semi-quantitative ISH and provide more detailed (using serial sections) data to quantify the number of expressing cells.

      Anyhow, the quantification of mRNA expression levels may not be that significant when trying to compare different states of gene function, as translational and post-translational steps can have large effects on gene function. This should be discussed adequately.

      Thank you very much for your comments. We conducted ISH using medaka kept under LD or SD conditions, not under fed or starved conditions. In addition, our previous study demonstrated that the slope of the increase in the number of cells stained by ISH differs when expression levels differ (Mitani et al., 2010). Although we do not have quantitative data on cell numbers, we confirmed in our preliminary experiments that the number of cells expressing agrp1 saturated at around 300 min, and we therefore terminated the chromogenic reactions at 300 min. On this basis, we compared the ratio of stained cells at 90 min (beginning of coloring) to that at 300 min (saturation). However, since this analysis may not be worth discussing in detail, we moved this part to the supplementary figure as the reviewer suggested.
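      The 90 min / 300 min comparison described above amounts to a simple ratio, sketched here for illustration. The function name and the cell counts are hypothetical and are not values from the study; the sketch assumes only that stained cells are counted in the same region at both time points.

```python
def early_staining_ratio(cells_at_90_min: int, cells_at_300_min: int) -> float:
    """Fraction of the saturated (300 min) stained-cell count already visible at 90 min."""
    if cells_at_300_min <= 0:
        raise ValueError("saturation count must be positive")
    if not 0 <= cells_at_90_min <= cells_at_300_min:
        raise ValueError("90 min count must lie between 0 and the saturation count")
    return cells_at_90_min / cells_at_300_min


# Hypothetical counts for illustration only (not measured values).
ratio_ld = early_staining_ratio(30, 40)  # long-day example
ratio_sd = early_staining_ratio(10, 40)  # short-day example
```

      Under this reading, a higher ratio means more cells reach detectable staining early, which serves as a semi-quantitative indication of higher per-cell expression rather than a direct measurement of mRNA level.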

      The molecular characterization of the agrp1 ko mutant is a bit thin.

      Line 221: "We obtained agrp1<sup>−/−</sup> medaka, which has lots of amino acid changes in functional site for AgRP1" is a bit vague as a description for the ko-mutation. It would be really helpful if the authors could provide a scheme showing the wt protein with the relevant functional sites alongside the presumptive mutant protein.

      How did the authors verify the molecular nature of their mutation? They should use suitable antibodies and western-blot analysis (maybe reagents from Shainer et al., 2019 work in medaka); in case this is not possible they could isolate & clone the mutant transcript and use in-vitro translation systems to show that the presumptive mutant protein can actually be translated from this transcript. Another strategy could be to use a second non-allelic and (hopefully) non-complementing mutation (ko1/ko2 heterozygotes for example) to show that ko-mutation acts the way the authors presume. The authors mention agrp1 ko medaka lines (plural!) in line 520, thus they may have an additional ko allele at hand.

      Thank you very much for your comments. We explained the mutation site in Figure 6-Supplementary Figure 1 (A: DNA sequences and B: predicted amino acid sequences of WT and mutants). In addition, we added immunohistochemistry data for WT and mutant fish using an anti-AgRP antibody (Figure 6-Supplementary Figure 1C). AgRP-immunoreactive signals were observed in WT but not in agrp1<sup>−/−</sup>. This result suggests that AgRP1 is not functional in agrp1<sup>−/−</sup>.

      Presumably, the authors analysed heterozygous agrp1<sup>+/−</sup> females and found they are as wt. If so the authors should say so.

      Yes, we analyzed food intake of agrp1<sup>+/−</sup>. We added a supplementary figure (Figure 6-Supplementary Figure 2) and a sentence in L. 233-234.

      How about agrp1<sup>−/−</sup> medaka males: do they show a discernible phenotype?

      We analyzed the phenotypes of agrp1<sup>−/−</sup> males but did not describe the results, since the present paper only focused on female-specific feeding behavior.

      agrp1<sup>−/−</sup> females show no significant sensitivity of food intake to day length (Figure 6C). Does their (reduced) oocyte production react to day length? With other words: how much of the seasonal sensitivity is left in agrp1<sup>−/−</sup> females. The authors suggest that E2 acts upstream of agrp1 and therefore some seasonality may still be left in agrp1<sup>−/−</sup> females.

      Although agrp1<sup>−/−</sup> females appear to show abnormal seasonality of food intake, they spawn in LD but not in SD, indicating that seasonality of gonadal maturation is retained in agrp1<sup>−/−</sup> females.

      The authors show that fshb and lhb are downregulated in agrp1<sup>−/−</sup> females. Is this also the case in wt females at SD?

      Thank you very much for your comment. As described above, agrp1<sup>−/−</sup> females can spawn, which indicates that the mechanisms underlying the downregulation of gonadotropins in agrp1<sup>−/−</sup> females may differ from those in SD females.

      Figure 1_Supplementary Figure 2: the trends are visible in B and C, however, there is quite some variance between LD1, 2, and 3; the same for SD 1, 2, and 3. Can the authors give an explanation for this?

      Since the data for LD1, 2, and 3 (SD1, 2, and 3) were obtained from different individual fish, the variance may be reasonable. We conducted expression analyses by using RNA-seq to find candidate genes that show larger differences than individual ones.

      Figure 7E: the ovaries are difficult to see and the size bar in the wt picture is missing.

      Thank you very much for your comments. We added a scale bar in the wt picture.

      509 ff: the authors do not describe what exactly the "sham operation" encompasses: were the females just anesthetised or was there an actual operation without removing the ovaries?

      The sham operation group was anesthetized, received an abdominal incision without removing the ovaries, and received skin suture by using a silk thread. We added this explanation in the Method section.

      519 ff: was the agrp1<sup>−/−</sup> ko induced in the d-rR strain to have the same genetic background as the wt fish?

      Exactly. As the reviewer pointed out, the genetic background of agrp1<sup>−/−</sup> was the same as that of WT.

      Minor points (Text edits):

      Line 42: change "when" into "where".

      Line: 54 "under the fixed appropriate ambient temperature" change into "while keeping an appropriate temperature constant".

      Line 55: here it would be good to briefly explain what long-day and short-day is so that the reader has an idea about the changes required without having to scroll down to the M&M section. For example LD 14/10 light-dark cycle, SD 10/14 light-dark cycle.

      Line 88: change "measurement" into "measuring".

      Line 96 change eats -> eat.

      Line 107 change female -> females.

      We deeply appreciate the reviewer’s suggestions described above. We corrected them as the reviewer suggested (L. 42, L. 54, L. 55, L. 89, L. 96, L. 107).

      Line 144-145: the sentence "since hypothalamic npy control..." does not make sense. Please correct.

      Thank you very much for your suggestion. We corrected the sentence so that it makes sense (L. 145-146).

      Line 180 and 185: the term here should be "LD induced sexual activity" rather than maturity. Age is the main determinant of maturity whereas light (LD) determines activity, in other words SD females are sexually mature if they are post-puberty stage.

      Thank you very much for your suggestion. Since the phrase “LD-induced sexual maturity” was confusing, we changed it to “substance(s) from LD-induced mature ovary” or “ovarian maturity”. Even though SD females are at the post-puberty stage, their ovaries are immature and do not possess mature oocytes (L. 181).

      Line 222: the authors should include the relevant information about the females: presumably agrp1.

      In Line 226-228, we explained the phenotypes of agrp1 knockout and added information for AgRP1 protein in Figure 6-Supplementary figure 1C.

      Lines 449 ff: authors should state that the analysis was done in females, instead of just writing "medaka". This is also in line with the preceding paragraph of the M&M section.

      Thank you very much for your suggestions. We corrected the sentence as the reviewer suggested (L. 469).

      Line 305: change like other mammals -> like in mammals.

      Thank you very much for your suggestion. We corrected the sentence as the reviewer suggested (L. 320).

      Reviewer #2 (Recommendations for the authors):

      (1) The procedure of the food intake assay is not clear.

      - Habituation Period: Medaka were placed into a white cup containing 100 mL of water and allowed to habituate for 5 minutes. However, is 5 minutes sufficient to reduce stress in the fish? A stressed fish does not exhibit the same feeding behavior as an unstressed one.

      Thank you for your comment. We confirmed that 5 minutes is sufficient for habituation in medaka, since medaka swim freely within a few minutes of transfer from the tank and show normal feeding behavior.

      - Feeding Protocol: Medaka were fed with 200 μL aliquots of brine shrimp-containing water. This procedure was repeated multiple times. How many times was this feeding procedure repeated? Was it 3, 10, or 100 times?

      Although there was small variation between trials, we usually applied about five aliquots.

      - Brine Shrimp Counting: You collected 10 mL of the breeding water to count the number of uneaten brine shrimp. Can you confirm that sampling 10% of the total volume is representative? Were any tests conducted to validate this? Given that you developed an automated tool to count the brine shrimp, why didn't you count them in all 100 mL?

      We collected 10 mL so that the leftover shrimp could be sampled as quickly as possible. Ten minutes after the start of the experiment, we quickly placed a magnetic bar in the cup and stirred the breeding water so that the shrimp concentration was uniform. We then collected a 10 mL aliquot from the experimental cup with a micropipette. In preliminary trials, we added shrimp, in almost the same amount as that given to WT medaka in LD, to a white cup containing 100 mL of water, divided the water into 10 mL and 90 mL aliquots, and counted the shrimp in each aliquot separately. We confirmed that the difference between the number estimated from the 10 mL aliquot and the number in the total volume of 100 mL falls within the range of variance of the total number of shrimp applied. Thus, our counting method can be considered reasonable.

      - Brine Shrimp Aliquot Measurement: You mentioned counting the number of brine shrimp in the 200 μL solution three times before and after the experiments. What does this mean? Did you use this procedure to calculate the mean number of brine shrimp in each 200 μL aliquot?

      Thank you for your comment. As the reviewer commented, to calculate the mean number of brine shrimp in each 200 µL aliquot, we counted the number of brine shrimp in the 200 µL solution three times before and after the experiments.

      - How did you normalize the food intake data? This procedure is not detailed in the methods section.

      Thank you very much for pointing it out. We normalized food intake by subtracting the amount of shrimp by the average of those in LD or WT fish. This explanation was added in the Method section (L. 439).
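As a rough illustration of the counting procedure described in the responses above, the following sketch estimates food intake as the number of shrimp supplied minus the number left over. The function name, parameter names, and all numbers are hypothetical and are not taken from the manuscript. The sketch assumes that the leftover count in the 10 mL sample scales linearly to the 100 mL cup, and it ignores the small volume added by the aliquots. The normalization step mentioned in the last response is not modeled.

```python
from statistics import mean


def estimate_food_intake(aliquot_counts, n_aliquots_fed, leftover_in_sample,
                         cup_volume_ml=100.0, sample_volume_ml=10.0):
    """Estimate shrimp eaten as (shrimp supplied) - (shrimp left over).

    aliquot_counts: shrimp counted in individual 200 uL aliquots
                    (e.g. three counts before and three after the experiment).
    n_aliquots_fed: number of 200 uL aliquots given to the fish.
    leftover_in_sample: shrimp counted in the stirred sample taken from the cup.
    """
    # Mean number of shrimp delivered by one aliquot.
    shrimp_per_aliquot = mean(aliquot_counts)
    supplied = shrimp_per_aliquot * n_aliquots_fed
    # Scale the sample count up to the whole cup (assumes uniform mixing).
    leftover = leftover_in_sample * (cup_volume_ml / sample_volume_ml)
    return supplied - leftover


# Hypothetical numbers, for illustration only.
counts = [48, 52, 50, 49, 51, 50]
intake = estimate_food_intake(counts, n_aliquots_fed=5, leftover_in_sample=8)
print(intake)
```

Because the leftover count is multiplied by ten, a miscount of one shrimp in the sample shifts the estimate by ten shrimp, which is why the stirring step and the preliminary 10 mL versus 90 mL comparison matter.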

      (2) Sample Size. Various tests were conducted with a low number of medaka (e.g., 2 brains for RNA-seq, 8 females for ovariectomy). Are these sample sizes sufficient to draw reliable conclusions?

      In Figure 1-Supplementary figure 2, RNA-seq was performed to search for candidate neuropeptides, which is why the sample size was kept to the minimum; we pooled two brains as one sample and used three samples per group. On the other hand, each group in the other experiments consists of n ≥ 5 samples, which is usually accepted as an adequate sample size in various studies (cf. Kanda et al., Gen Comp Endocrinol., 2011, Spicer et al., Biol Reprod., 2017).

      (3) Statistical Analysis.

      - The authors used both parametric and non-parametric tests but did not specify how they assessed the normal distribution of the data. For example, if I understood correctly, a t-test was used to compare a small dataset (n=3). In such cases, a U-test would be more appropriate.

      Thank you for your comment. In Figure 1-Supplementary Figure 2C, the graphs are shown only to illustrate the candidate genes. To avoid misunderstanding, we deleted the statistical statements from that panel.

      - It is unclear why the Steel-Dwass test was used instead of the Kruskal-Wallis test for comparing agrp1 and npyb expressions in control, OVX, and E2-administered medaka.

      While the authors mentioned using non-parametric tests, they did not specify in which contexts or conditions they were applied.

      Thank you very much for your comment. Kruskal-Wallis test statistically shows whether or not there are differences among any of three groups. To perform multiple comparisons among the three groups, we used Steel-Dwass test.
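The distinction drawn in this response, between an omnibus test and pairwise comparisons, can be sketched in a few lines. The code below is an illustrative pure-Python sketch, not the authors' analysis. It computes the Kruskal-Wallis H statistic across three groups, and then the standardized rank-sum statistic for each pair after re-ranking only that pair, which is the kind of pairwise quantity that Steel-Dwass-type procedures build on. Tie corrections, p-values, and multiplicity-adjusted critical values are deliberately omitted. The group names and values are hypothetical.

```python
import math
from itertools import combinations


def midranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Omnibus H statistic over all groups (no tie correction)."""
    pooled = [x for g in groups for x in g]
    ranks = midranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3.0 * (n_total + 1)


def pairwise_rank_sum_z(a, b):
    """Standardized rank sum of `a`, ranking only a and b together (no tie correction)."""
    ranks = midranks(list(a) + list(b))
    n_a, n_b = len(a), len(b)
    rank_sum_a = sum(ranks[:n_a])
    expected = n_a * (n_a + n_b + 1) / 2.0
    variance = n_a * n_b * (n_a + n_b + 1) / 12.0
    return (rank_sum_a - expected) / math.sqrt(variance)


# Hypothetical expression values, for illustration only.
data = {"control": [1.0, 2.0, 3.0], "OVX": [4.0, 5.0, 6.0], "E2": [7.0, 8.0, 9.0]}
h_stat = kruskal_wallis_h(list(data.values()))
print("H =", h_stat)
for (name_a, a), (name_b, b) in combinations(data.items(), 2):
    print(name_a, "vs", name_b, "z =", pairwise_rank_sum_z(a, b))
```

The single H value only says whether at least one group differs; the three pairwise statistics indicate which pairs drive the difference, which is the information a multiple-comparison procedure is meant to provide.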

      - The results section lacks details on the statistical tests used, including the specific test (e.g., Z, U, or W values) and degrees of freedom.

      Thank you for your comment. As the reviewer pointed out, we added such statements in all the figure legends containing statistics.

      (4) Previous studies have shown that photoperiod treatments alter the production of various hormones in medaka (e.g., Lucon-Xiccato et al., 2022; Shimmura et al., 2017), some of which, like growth hormone (GH), have been shown to influence feeding behavior (Canosa et al., 2007).

      In your RNA-seq analysis, did you observe any changes in the expression of genes involved in other hormone synthesis pathways, such as pituitary hormones (GH and TSH), leptin, or ghrelin (e.g., see Volkoff, 2016; Blanco, 2020; Bertolucci et al., 2019)?

      Including such evidence in the discussion would provide a broader perspective on the hormonal regulation of food intake in medaka.

      We appreciate your constructive comments. Unfortunately, since we performed RNA-seq using the whole brain after removal of the pituitary, we could not check such changes in the expression of pituitary hormone-related genes. As additional information about the feeding-related hormones, leptin did not show significant difference in our RNA-seq analysis, and we could not analyze ghrelin because ghrelin has not been annotated in medaka (NCBI and ensembl).

      Reviewer #3 (Recommendations for the authors):

      There are some parts of the study that need to be developed further in order to provide a more comprehensive analysis.

      (1) In the juvenile as well as ovariectomized female fish, the authors should confirm experimentally whether day length influences feeding activity.

      Thank you very much for your suggestion. We analyzed feeding behavior of juvenile (Figure 4-Supplementary Figure 1) and OVX female (Figure 5-Supplementary Figure 1). As shown in these figures, food intake in juvenile and OVX were not significantly different between LD and SD.

      (2) More discussion as to the relevance of increasing feeding activity to support reproductive functions such as sustained egg production would be valuable. One assumes the metabolic costs of producing eggs on a daily basis in this species would inevitably require increased food intake. Is this a reasonable prediction?

      We deeply appreciate your suggestion. We strongly agree with this argument, and we added such discussion in “Discussion” section (L. 406-408).

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      We appreciate the editor’s suggestion. We added P-values in the main manuscript wherever statistical analyses were performed. In addition, we described the test statistics in the figure legends. The statistical tests used in the present analyses do not involve df values, and we therefore did not report them in the main text.

    1. Author response:

      We will revise the statements of novelty in the introduction by more clearly emphasizing how our model addresses gaps in the existing literature. In addition, we will clarify the description of the dispersal process. Briefly, we use the same dispersal gene β to represent the likelihood an individual will either leave or join a group, thereby quantifying both dispersal and immigration using the same parameter. Specifically, individuals with higher β are more likely to remain as floaters (i.e., disperse from their natal group to become a breeder elsewhere), whereas those with lower β are either more likely to remain in their natal group as subordinates (i.e., queue in a group for the breeding position) or join another group if they dispersed. Immigrants that join a group as a subordinate help and queue for a breeding position, as does any natal subordinate born into the group. To follow the suggestion of the referee and more fully explore the impact of competition between subordinates born in the group and subordinate immigrants, we will explore extending our model to allow dispersers to leave their natal group and join another as subordinates, by incorporating a reaction norm based on their age or rank (D = 1 / (1 + exp(β<sub>t</sub> * t – β<sub>0</sub>))). This approach will also allow individuals to adjust their dispersal strategy to their competitiveness and to avoid kin competition by remaining as a subordinate in another group.
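The reaction norm quoted above is a logistic function of age or rank. The following minimal sketch evaluates it for a few values. The function name is hypothetical, the parameter values are arbitrary, and reading D as a probability is an assumption made for illustration; the response does not state how D enters the model.

```python
import math


def reaction_norm(t, beta_t, beta_0):
    """Evaluate D = 1 / (1 + exp(beta_t * t - beta_0)).

    t is age (or rank); beta_t and beta_0 stand for the slope and
    intercept genes of the reaction norm.
    """
    z = beta_t * t - beta_0
    # Numerically stable form: avoids overflow in exp() for large |z|.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


# Arbitrary illustrative parameters: D falls as t increases when beta_t > 0.
for age in range(0, 6):
    print(age, round(reaction_norm(age, beta_t=1.0, beta_0=2.0), 3))
```

With a positive slope β<sub>t</sub>, D decreases with t, and the intercept β<sub>0</sub> shifts the value of t at which D crosses one half (t = β<sub>0</sub> / β<sub>t</sub>). A negative slope reverses the direction, so the same functional form can represent either young or old individuals having the higher D.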

      We apologize that there was some confusion with terminology. We use the term “disperser” to describe individuals that disperse from their natal group. Dispersers can assume one of three roles: (1) they can migrate to another group as "subordinates"; (2) they can join another group as "breeders" if they successfully outcompete other candidates; or (3) they can remain as "floaters" if they fail to join a group. "Floaters" are individuals who persist in a transient state without access to a breeding territory, waiting for opportunities to join a group in an established territory. Therefore, dispersers do not work when they are floaters, but they may later help if they immigrate to a group as a subordinate. Consequently, immigrant subordinates have no inherent competitive advantage over natal subordinates (as step 2.2. “Join a group” is followed by step 3. “Help”, which occurs before step 5. “Become a breeder”). Nevertheless, floaters can potentially outcompete subordinates of the same age if they attempt to breed without first queuing as a subordinate (step 5) when subordinates are engaged in work tasks. We believe that this assumption is realistic and constitutes part of the costs associated with work tasks. However, floaters are at a disadvantage for becoming a breeder because: (1) floaters incur higher mortality than individuals within groups (eq. 3); and (2) floaters may only attempt to become breeders in some breeding cycles (versus subordinate group members, who are automatically candidates for an open breeding position in the group in each cycle). Therefore, due to their higher mortality, floaters are rarely older than individuals within groups, which heavily influences dominance value and competitiveness. Additionally, any competitive advantage that floaters might have over other subordinate group members is unlikely to drive the kin selection-only results because subordinates would preferably choose defense tasks instead of work tasks so as not to be at a competitive disadvantage compared to floaters.

      We note that reviewers also mention that floaters are often not high resource holding potential (RHP) individuals and, therefore, our assumptions might be unrealistic. As we explain above, floaters are not inherently at a competitive advantage in our model. In any case, empirical work in a number of species has shown that dispersers are not necessarily those of lower RHP or of lower quality. In fact, according to the ecological constraints hypothesis, one might predict that high quality individuals are the ones that disperse because only individuals in good condition (e.g., larger body size, better energy reserves) can afford the costs associated with dispersal (Cote et al., 2022). By adding a reaction norm approach to explore the role of age or rank in the revised version, we can also determine whether higher or lower quality individuals are the ones dispersing. We will address the issues of terminology and clarity of the relative competitive advantage of floaters versus subordinates, and also include more information in the Supplementary Tables (e.g., the number of floaters). As a side note, the “scramble context” we mention was an additional implementation that we decided to remove from the final manuscript, but we forgot to remove it from Table 1 before submission.

      The reviewers also raised a question about asexual reproduction and relatedness more generally. As we showed in the Supplementary Tables and the section on relatedness in the SI (“Kin selection and the evolution of division of labor"), high relatedness does not appear to explain our results. In evolutionary biology generally and in game theory specifically (with the exception of models on sexual selection or sex-specific traits), asexual reproduction is often modelled because it reduces unnecessary complexity. To further study the effect of relatedness on kin structures more closely resembling those of vertebrates, however, we will create an additional “relatedness structure level”, where we will shuffle half of the philopatric offspring using the same method used to remove relatedness completely. This approach will effectively reduce relatedness structure by half and overcome the concerns with our decision to model asexual reproduction.
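One way to picture the proposed intermediate “relatedness structure level” is as a partial permutation of group assignments. The sketch below is only a guess at the general idea, not the authors' implementation: the response does not describe the shuffling method, so the function name, the data layout (one natal group label per philopatric offspring), and the choice to permute destinations among the selected offspring are all assumptions.

```python
import random


def shuffle_fraction(natal_groups, fraction=0.5, rng=None):
    """Reassign a random `fraction` of offspring among one another's groups.

    natal_groups: list with one group label per philopatric offspring.
    Returns a new list of group labels; group sizes are preserved because
    the selected offspring only swap destinations with each other.
    """
    if rng is None:
        rng = random.Random()
    n = len(natal_groups)
    n_shuffled = int(n * fraction)
    # Pick which offspring are moved, then permute their destinations.
    chosen = rng.sample(range(n), n_shuffled)
    destinations = [natal_groups[i] for i in chosen]
    rng.shuffle(destinations)
    assigned = list(natal_groups)
    for idx, group in zip(chosen, destinations):
        assigned[idx] = group
    return assigned


# Hypothetical population: 10 groups with 4 philopatric offspring each.
natal = [g for g in range(10) for _ in range(4)]
mixed = shuffle_fraction(natal, fraction=0.5, rng=random.Random(1))
stayed = sum(a == b for a, b in zip(natal, mixed))
print(stayed, "of", len(natal), "offspring remain in their natal group")
```

Setting `fraction=1.0` in this sketch corresponds to shuffling every offspring, and `fraction=0.0` leaves the natal structure intact, so the proposed level sits between those two extremes.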

      Briefly, we will elaborate on the concept of division of labor and the tasks that cooperative breeders perform. In nature, multiple tasks are often necessary to successfully rear offspring. For example, in many cooperatively breeding birds, the primary reasons that individuals fail to produce offspring are (1) starvation, which is mitigated by the feeding of offspring, and (2) nest depredation, which is countered by defensive behavior. Consequently, both types of tasks are necessary to successfully produce offspring, and focusing solely on one while neglecting the other is likely to result in lower reproductive success than if both tasks are performed by individuals within the group. We simplify this principle in the model by maximizing reproductive output when both tasks are carried out to a similar extent, allowing for some flexibility from the mean. In response to the reviewer's suggestion about making fecundity a function of work tasks and offspring survival a function of defensive tasks, these are actually equivalent in model terms, as it is the same whether breeders produce three offspring and two die, or if they only produce one. This represents, of course, a simplification of the natural context, where breeding unsuccessfully is more costly (in terms of time and energy investment) than not breeding at all, but this approach is typically used in models of this sort.

      The scope of this paper was to study division of labor in cooperatively breeding species with fertile workers, in which help is exclusively directed towards breeders to enhance offspring production (i.e., alloparental care). Our focus is in line with previous work in most other social animals, including eusocial insects and humans, which emphasizes how division of labor maximizes group productivity. Other forms of “general” help are not considered in the paper, and such forms of help are rarely considered in cooperatively breeding vertebrates or in the division of labor literature, as they do not result in task partitioning to enhance productivity.

      How do we model help? Help provided is an interaction between H (total effort) and T (proportion of total effort invested in each type of task). We will make this definition clearer in the revised manuscript. Thank you for pointing out an error in Eq. 1. This inequality was indeed written incorrectly in the paper (but is correct in the model code); it is dominance rank instead of age (see code in Individual.cpp lines 99-119). We will correct this mistake in the revision.

      There was also a question about bounded and unbounded helping costs. The difference in costs is inherent to the nature of the different tasks (work or defense): while survival is naturally bounded, with death as the lower bound, dominance costs are potentially unbounded, as they are influenced by dynamic social contexts and potential competitors. Therefore, we believe that the model’s cost structure is not too different from that in nature.

      Thank you for your comments about the parameter landscape. It is important to point out that variations in the mutation rate do not qualitatively affect our results, as this is something we explored in previous versions of the model (not shown). Briefly, we find that variations in the mutation rates only alter the time required to reach equilibrium. Increasing the step size of mutation diminishes the strength of selection by adding stochasticity and reducing the genetic correlation between offspring and their parents. Population size could, in theory, affect our results, as small populations are more prone to extinction. Since this was not something we planned to explore in the paper directly, we specifically chose a large population size, or better said, a large number of territories (i.e. 5000) that can potentially host a large population.

      During the exploratory phase of the model development, various parameters and values were also assessed. However, the manuscript only details the ranges of values and parameters where changes in the behaviors of interest were observed, enhancing clarity and conciseness. For instance, variation in y<sub>h</sub> (the cost of help on dominance when performing “work tasks”) led to behavioral changes similar to those caused by changes in x<sub>h</sub> (the cost of help in survival when performing “defensive tasks”), as both are proportional to each other. Specifically, since an increase in defense costs raises the proportion of work relative to defense tasks, while an increase in the costs of work task has the opposite effect, only results for the variation of x<sub>h</sub> were included in the manuscript to avoid redundancy. We will make this clearer in the revision.

      Finally, following the advice from the reviewers, we will add the symbols of the variables to the figure axes, and clarify whether the values shown represent a genetic or phenotypic trait. In Figure 2, the x-axis is H and the y-axis is T. In Figure 3A, the subindex t on the x-axis is incorrect; it should be subindex R (reaction norm to dominance rank instead of age), and the y-axis is T. In Figure 3B, the x-axis is R, and the y-axis is T. All values of T, H and R are phenotypically expressed values (see Table 1). For instance, T values are the values expressed by the individuals in the population according to their genetic gamma values and their current dominance rank at a given time point.

      References

      Cote, J., Dahirel, M., Schtickzelle, N., Altermatt, F., Ansart, A., Blanchet, S., Chaine, A. S., De Laender, F., De Raedt, J., & Haegeman, B. (2022). Dispersal syndromes in challenging environments: A cross‐species experiment. Ecology Letters, 25(12), 2675–2687.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a practical modification of the orthogonal hybridization chain reaction (HCR) technique, a promising yet underutilized method with broad potential for future applications across various fields. The authors advance this technique by integrating peptide ligation technology and nanobody-based antibody mimetics - cost-effective and scalable alternatives to conventional antibodies - into a DNA-immunoassay framework that merges oligonucleotide-based detection with immunoassay methodologies. Notably, they demonstrate that this approach facilitates a modified ELISA platform capable of simultaneously quantifying multiple target protein expression levels within a single protein mixture sample.

      Strengths:

      The hybridization chain reaction (HCR) technique was initially developed to enable the simultaneous detection of multiple mRNA expression levels within the same tissue. This method has since evolved into immuno-HCR, which extends its application to protein detection by utilizing antibodies. A key requirement of immuno-HCR is the coupling of oligonucleotides to antibodies, a process that can be challenging due to the inherent difficulties in expressing and purifying conventional antibodies.

      In this study, the authors present an innovative approach that circumvents these limitations by employing nanobody-based antibody mimetics, which recognize antibodies, instead of directly coupling oligonucleotides to conventional antibodies. This strategy facilitates oligonucleotide conjugation - designed to target the initiator hairpin oligonucleotide of HCR - through peptide ligation and click chemistry.

      Weaknesses:

      The sandwich-format technique presented in this study, which employs a nanobody that recognizes primary IgG antibodies, may have limited scalability compared to existing methods that directly couple oligonucleotides to primary antibodies. This limitation arises because the C-region types of primary antibodies are relatively restricted, meaning that the use of nanobody-based detection may constrain the number of target proteins that can be analyzed simultaneously. In contrast, the conventional approach of directly conjugating oligonucleotides to primary antibodies allows for a broader range of protein targets to be analyzed in parallel.

      We would like to clarify that MaMBA was specifically designed to address and overcome the limitations imposed by relying on primary antibodies’ Fc types for multiplexing. MaMBA utilizes DNA oligo-conjugated nanobodies that selectively and monovalently bind to the Fc region of IgG. This key feature allows us to barcode primary IgGs targeting different antigens independently. These barcoded IgGs can then be pooled together after barcoding, effectively minimizing the potential for cross-reactivity or crossover. Therefore, IgGs barcoded using MaMBA are functionally equivalent to those barcoded via conventional direct conjugation approaches with respect to multiplexing capability.

      Additionally, in the context of HCR-based protein detection, the number of proteins that can be analyzed simultaneously is inherently constrained by fluorescence wavelength overlap in microscopy, which limits its multiplexing capability. By comparison, direct coupling of oligonucleotides to primary antibodies can facilitate the simultaneous measurement of a significantly greater number of protein targets than the sandwich-based nanobody approach in the barcode-ELISA/NGS-based technique.

      As we have responded above, MaMBA barcoding of primary IgGs that target various antigens can be conducted separately. Once barcoded, these IgGs can then be combined into a single pool. Therefore, for BLISA (i.e., the barcode-ELISA/NGS-based technique), IgGs barcoded through MaMBA offer the same multiplexing capability as those barcoded using traditional direct conjugation methods.

      In in situ protein imaging, spectral overlap can indeed limit the throughput of multiplexed HCR fluorescent imaging. There are two strategies to address this challenge. As demonstrated in this work with misHCR and misHCRn, removing the HCR amplifiers allows for multiplexed detection using a limited number of fluorescence wavelengths. This is achieved through sequential rounds of HCR amplification and imaging. Alternatively, recent computational approaches offer promising solutions for “one-shot” multiplexed imaging. These include combinatorial multiplexing (PMID: 40133518) and spectral unmixing (PMID: 35513404), which can be applied to misHCR to deconvolute overlapping spectra and increase multiplexing capacity in a single imaging acquisition.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Nuclear depletion and cytoplasmic mislocalization/aggregation of the DNA and RNA binding protein TDP-43 are pathological hallmarks of multiple neurodegenerative diseases. Prior work has demonstrated that depletion of TDP-43 from the nucleus leads to alterations in transcription and splicing. Conversely, cytoplasmic mislocalization/aggregation can contribute to toxicity by impairing mRNA transport and translation as well as miRNA dysregulation. However, to date, models of TDP-43 proteinopathy rely on artificial knockdown- or overexpression-based systems to evaluate either nuclear loss or cytoplasmic gain of function events independently. Few model systems authentically reproduce both nuclear depletion and cytoplasmic mislocalization/aggregation events. In this manuscript, the authors generate novel iPSC-based reagents to manipulate the localization of endogenous TDP-43. This is a valuable resource for the field to study pathological consequences of TDP-43 proteinopathy in a more endogenous and authentic setting. However, in the current manuscript, there are a number of weaknesses that should be addressed to further validate the ability of this model to replicate human disease pathology and demonstrate utility for future studies.

      Strengths:

      The primary strength of this paper is the development of a novel in vitro tool.

      Weaknesses:

      There are a number of weaknesses detailed below that should be addressed to thoroughly validate these new reagents as more authentic models of TDP-43 proteinopathy and demonstrate their utility for future investigations.

      (1) The authors should include images of their engineered TDP-43-GFP iPSC line to demonstrate TDP-43 localization without the addition of any nanobodies (perhaps immediately prior to addition of nanobodies). Additionally, it is unclear whether simply adding a GFP tag to endogenous TDP-43 impacts its normal function (nuclear-cytoplasmic shuttling, regulation of transcription and splicing, mRNA transport etc).

      We have included images of the untransduced day 20 MNs derived from the engineered TDP43-GFP iPSC lines and the unedited line (Supplementary Fig. 1B).

      We acknowledge the reviewer’s concern about the potential impact of the GFP tag on TDP43's normal function. To address this, we have validated the functionality of TDP43 by assessing the inclusion of cryptic exons in highly sensitive targets such as UNC13A and STMN2, both of which are known to be directly regulated by TDP43.

      We compared MNs derived from the unedited parent line with the TDP43-GFP MNs prior to nanobody addition. As measured by qPCR, cryptic exon inclusion in UNC13A and STMN2 was not observed in the unedited or edited TDP43-GFP MNs (Supplementary Fig. 1C), confirming that the tagging does not induce splicing defects by itself. Cryptic exon inclusion in UNC13A and STMN2 was observed only in TDP43-GFP MNs expressing the NES nanobody (Supplementary Fig. 2D). These findings were further supported by our next-generation sequencing data, which also showed that cryptic exon inclusion was specific to the TDP43 mislocalization condition (Supplementary Fig. 3 and 4).

      Thus, we have strong evidence that the GFP-tagged TDP43 behaves similarly to the wild-type protein and does not interfere with its function in our model.

      (2) Can the authors explain why there is a significant discrepancy in time points selected for nanobody transduction and immunostaining or cell lysis throughout Figure 1 and 2? This makes interpretation and overall assessment of the model challenging.

      For the phenotypic data shown in Fig. 1, we added the AAVs at day 18 or 20 and analyzed the cells at day 40. For the phosphorylated TDP43 western blot (revised Fig. 3D), cells were treated with doxycycline at day 20 to induce nanobody expression and samples were harvested at day 40. Thus, cells were harvested 20 or 22 days after adding the nanobodies. The onset of transgene expression when using AAVs in neurons typically displays slow kinetics. We observed TDP43 mislocalization in less than 50% of the neurons 7 days post-transduction; mislocalization peaked 10-12 days after addition of the nanobodies, when more than 80% of the cells displayed it. Hence, we do not believe that a two-day difference significantly alters the interpretation of the data.

      The decision to harvest neurons at day 30 for the qPCR data was taken to investigate whether the splicing changes seen at day 40 from the transcriptomics analysis can be detected well before the phenotypes observed at day 40.

      (3) The authors should further characterize their TDP-43 puncta. TDP-43 immunostaining is typically punctate so it is unclear if the puncta observed are physiologic or pathologic based on the analyses carried out in the current version of this manuscript. Additionally, do these puncta co-localize with stress granule markers or RNA transport granule markers? Are these puncta phosphorylated (which may be more reminiscent of end-stage pathologic observations in humans)?

      We have tried immunostaining neurons for phosphorylated TDP43. However, our immunostaining attempts were unsuccessful. Depending on the antibody, we either saw no signal (antibody from Cosmo Bio, TIP-PTD-M01A) or even the control neurons displayed detectable phosphorylation within the nucleus (antibody from Proteintech, 22309-1-AP). Consequently, we performed western blot analysis using an antibody from Cosmo Bio (TIP-PTD-M01A) that clearly shows hyperphosphorylation of TDP43 in whole cell lysates (Fig. 3D, E). Hence, we have referred to these structures as puncta and not aggregates (Page 4).

      To assess co-localization of the puncta with stress granules, we immunostained for the stress granule marker G3BP1. This was done in MNs that were treated with sodium arsenite (SA) or PBS as a control. In the PBS treated control MN cultures, TDP43 mislocalization alone did not induce stress granule formation. G3BP1+ stress granules were only observed following SA stress (0.5 mM, 60 minutes). Further, only a subset of TDP43 puncta overlapped with these stress granules (Supplementary Fig. 7) (Page 6).

      (4) The authors should include multiple time points in their evaluation of TDP-43 loss of function events and aggregation. Does loss of function get worse over time? Is there a time course by which RNA misprocessing events emerge or does everything happen all at once? Does aggregation get worse over time? Do these neurons die at any point as a result of TDP-43 proteinopathy?

      We agree that a time course to analyze TDP43 mislocalization and its consequences would be ideal. However, the mislocalization of TDP43 across neurons is not a coordinated process. At any given time point, neurons display varying levels of TDP43 mislocalization. Answering the questions raised by the reviewer would require tracking individual neurons in real time in a controlled environment over weeks. Unfortunately, we currently do not have the hardware to run these experiments. However, we do observe increased levels of cleaved caspase 3 in MNs expressing the NES nanobody, indicating that these neurons indeed undergo apoptosis by day 40 (Fig. 1).

      We have, however, analyzed changes in splicing using qPCR for 12 genes over a time course starting as early as 4 hours after inducing mislocalization. We detect time-dependent cryptic splicing events in all genes as early as 8 hours after doxycycline addition, coinciding with the appearance of TDP43 mislocalization (Fig. 4A, B).

      (5) Can the authors please comment on whether or not their model is "tunable"? In real human disease, not every neuron displays complete nuclear depletion of TDP-43. Instead there is often a gradient of neurons with differing magnitudes of nuclear TDP-43 loss. Additionally, very few neurons (5-10%) harbor cytoplasmic TDP-43 aggregates at end-stage disease. These are all important considerations when developing a novel authentic and endogenous model of TDP-43 proteinopathy which the current manuscript fails to address.

      As shown in Fig. 1, the neurons expressing the NES-nanobody display a wide range of mislocalization as assessed by the % of nuclear TDP43 present. By titrating the amount of AAVs added to the culture, the model can be tuned to achieve a wide gradient of TDP43 mislocalization.

      We calculated the size and percentage of neurons displaying TDP43 puncta. The size and the number of aggregates vary across the neurons that display TDP43 mislocalization. Around 50% of the neurons displayed small (1 µm²) puncta while large puncta (> 5 µm²) were observed in <10% of the cells, similar to observations in patient tissue (Fig. 1F).

      Reviewer #2 (Public Review):

      Summary:

      TDP-43 mislocalization occurs in nearly all of ALS, roughly half of FTD, and as a co-pathology in roughly half of AD cases. Both gain-of-function and loss-of-function mechanisms associated with this mislocalization likely contribute to disease pathogenesis.

      Here, the authors describe a new method to induce TDP-43 mislocalization in cellular models. They endogenously tagged TDP-43 with a C-terminal GFP tag in human iPSCs. They then expressed an intrabody - fused with a nuclear export signal (NES) - that targeted GFP to the cytosol. Expression of this intrabody-NES in human iPSC-derived neurons induced nuclear depletion of homozygous TDP-43-GFP, caused its mislocalization to the cytosol, and at least in some cells appeared to cause cytosolic aggregates. This mislocalization was accompanied by induction of cryptic exons in well characterized transcripts known to be regulated by TDP-43, a hallmark of functional TDP-43 loss and consistent with pathological nuclear TDP-43 depletion. Interestingly, in heterozygous TDP-43-GFP neurons, expression of intrabody-NES appeared to also induce the mislocalization of untagged TDP-43 in roughly half of the neurons, suggesting that this system can also be used to study effects on untagged endogenous TDP-43 as well as TDP-43-GFP fusion protein.

      Strengths:

      A clearer understanding of how TDP-43 mislocalization alters cellular function, as well as pathways that mitigate clearance of TDP-43 aggregates, is critical. But modeling TDP-43 mislocalization in disease-relevant cellular systems has proven to be challenging. High levels of overexpression of TDP-43 lacking an NES can drive endogenous TDP-43 mislocalization, but such overexpression has direct and artificial consequences on certain cellular features (e.g. altered exon skipping) not seen in diseased patients. Toxic small molecules such as MG132 and arsenite can induce TDP-43 mislocalization, but co-induce myriad additional cellular dysfunctions unrelated to TDP-43 or ALS. TDP-43 binding oligonucleotides can cause cytosolic mislocalization as well. Each system has pros and cons, and additional ways to induce TDP-43 mislocalization would be useful for the field. The method described in this manuscript could provide researchers with a powerful way to study the combined biology of cytosolic TDP-43 mislocalization and nuclear TDP-43 depletion, with additional temporal control that is lacking in current method. Indeed, the authors see some evidence of differences in RNA splicing caused by pure TDP-43 depletion versus their induced mislocalization model. Finally, their method may be especially useful in determining how TDP-43 aggregates are cleared by cells, potentially revealing new biological pathways that could be therapeutically targeted.

      Weaknesses:

      The method and supporting data have limitations in its current form, outlined below, and in its current form the findings are rather preliminary.

      (1) Tagging of TDP-43 with a bulky GFP tag may alter its normal physiological functions, for example phase separation properties and functions within complex ribonucleoprotein complexes. In addition, alternative isoforms of TDP-43 (e.g. "short" TDP-43) would not be GFP tagged and therefore these species would not be directly manipulatable or visualizable with the tools currently employed in the manuscript.

      With reference to our answer above, we have confirmed using qPCR and RNA-seq analysis that adding a GFP tag to the C-terminus of TDP43 does not result in an appreciable loss of functionality. We do not observe any cryptic exon inclusion in STMN2 and UNC13A. Cryptic exon inclusion in these genes, especially STMN2, has been recognized as a very sensitive indicator of TDP43 loss of function (Supplementary Fig. 1C, Supplementary Fig. 2D, Fig. 3, Fig. 4).

      We acknowledge that truncated, alternatively spliced versions of TDP43 will lose the GFP tag: because the tag is positioned on the C-terminus, our system cannot manipulate these isoforms. If present, however, they should be detectable using the Proteintech antibody against total TDP43, which recognizes N-terminal TDP43 epitopes. Western blot analysis, even 20 days after inducing TDP43 mislocalization, showed no truncated fragments. This suggests that TDP43 mislocalization alone is insufficient to generate significant levels of truncated isoforms. We have added this section to the Limitations paragraph (page 9).

      (2) The data regarding potential mislocalization of endogenous TDP-43 in the heterozygous TDP-43-GFP lines is especially intriguing and important, yet very little characterization was done. Does untagged TDP-43 co-aggregate with the tagged TDP-43? Is localization of TDP-43 immunostaining the same as the GFP signal in these cells?

      The purpose of the heterozygous experiments was to see whether mislocalized TDP43 could potentially trap the untagged TDP43. If this were not the case, we would have seen a maximum of 50% of the TDP43 signal mislocalized to the cytoplasm. The fact that a sizeable proportion of cells had significantly higher levels of TDP43 loss from the nucleus indicates that mislocalized TDP43 can indeed trap the untagged protein fraction. We used GFP immunostaining to identify the tagged TDP43 while an antibody against the endogenous TDP43 protein was used to detect total TDP43 levels. In the cells that show near complete loss of nuclear TDP43, the total TDP43 signal coincides with the GFP (tagged TDP43) signal. We are unable to distinguish the untagged fraction selectively as we do not have an antibody that can detect this directly.

      But we agree with the reviewer that these observations need further detailed follow-up that we are unable to provide currently. Hence, we have removed this figure from the manuscript.

      (3) The experiments in which dox was used to induce the nanobody-NES, then dox withdrawn to study potential longer-lasting or self-perpetuating inductions of aggregation is potentially interesting. However, the nanobody was only measured at the RNA level. We know that protein half lives can be very long in neurons, and therefore residual nanobody could be present at these delayed time points. The key measurement to make would be at the protein level of the nanobody if any conclusions are to be made from this experiment.

      The reviewer has highlighted an important point. To address this issue, we tagged the nanobodies with a V5 tag that allowed us to directly measure nanobody levels within cells. We indeed observed significant expression of the nanobody within cells even two weeks after Dox withdrawal. Extending the time point to three weeks allowed complete loss of the nanobody in most neurons. In contrast to our observations at two weeks, this was accompanied by a reversal of TDP43 mislocalization in these neurons at three weeks (Fig. 5).

      Surprisingly, in less than 10% of the neurons, we observed >80% of the total TDP43 still mislocalized to the cytoplasm, despite nearly undetectable levels of the nanobody. Super-resolution microscopy further revealed persistent cytoplasmic TDP43 in these neurons that did not overlap with residual nanobody signal. This suggests that in these neurons, the nanobody was no longer required to maintain TDP43 mislocalization (Fig. 5, page 7).

      (4) Potential differences in splicing and microRNAs between TDP-43 knockdown and TDP-43 mislocalization are potentially interesting. However, different patterns of dysregulated RNA splicing can occur at different levels of TDP-knockdown, thus it is difficult to assess whether the changes observed in this paper are due to mislocalization per se, or rather just reflect differences in nuclear TDP-43 abundance.

      This is a fair point. It is possible that microRNA dysregulation might require a greater loss of nuclear TDP43 and may be more resilient to TDP43 loss as compared to splicing. We have acknowledged this in the discussion section (page 9).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) It would be helpful to include nuclear vs cytoplasmic ratios of TDP-43 instead of simply "% nuclear TDP-43"

      We have used % nuclear TDP43 as these values have biologically meaningful upper and lower bounds, which makes it easier to compare across experiments. We found that using a ratio of nuclear vs cytoplasmic TDP43 intensities displayed higher variability and a wider range.

      We have re-labelled the y-axis as “% Nuclear TDP43 / soma TDP43” to make our quantification clearer. The conversion from % nuclear TDP43 to N/C is straightforward. If the % nuclear TDP43 is X, then the N/C ratio can be calculated as X / (100-X). For example, a % nuclear TDP43 of 80% would amount to an N/C ratio of 80/20 = 4.
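
      As a minimal sketch of this conversion (the function names are hypothetical and are not taken from the authors' analysis code), the bounded percentage and the unbounded N/C ratio can be interconverted as follows. The unbounded upper end of the ratio is one reason it may span a wider range than the percentage.

```python
def percent_nuclear_to_nc_ratio(percent_nuclear: float) -> float:
    """Convert % nuclear TDP43 (nuclear signal / soma signal x 100) to an N/C ratio."""
    if not 0.0 <= percent_nuclear <= 100.0:
        raise ValueError("percent_nuclear must lie between 0 and 100")
    if percent_nuclear == 100.0:
        # No cytoplasmic signal at all: the ratio is unbounded.
        return float("inf")
    return percent_nuclear / (100.0 - percent_nuclear)


def nc_ratio_to_percent_nuclear(nc_ratio: float) -> float:
    """Inverse conversion: N/C ratio back to % nuclear TDP43."""
    if nc_ratio < 0.0:
        raise ValueError("nc_ratio must be non-negative")
    if nc_ratio == float("inf"):
        return 100.0
    return 100.0 * nc_ratio / (1.0 + nc_ratio)


# Worked example from the response: 80% nuclear corresponds to N/C = 80/20.
print(percent_nuclear_to_nc_ratio(80.0))
```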

      (2) The axis descriptions in Figure 1D are very unclear. While this is described better in the figure legend, it would be beneficial to have a more descriptive y-axis title in the figure (which may mean increasing the number of graphs).

      Axis descriptions and figures changed as recommended.

      (3) In Figure 1, the time points at which iPSNs were transduced with nanobody and/or fixed for immunostaining is somewhat inconsistent across all panels. This hinders interpretation of the figure as a whole. The authors should use same transduction and immunostaining time points for consistency or demonstrate that the same phenotype is observed regardless of transduction and immunostaining day as long as the time in between (time of nano body expression) is consistent. Subsequently, in Figure 2, a different set of time points is used.

      Please see our response in the public comments above.

      (4) In Figure 1, please show individual data points for each independent differentiation to demonstrate the level of reproducibility from batch to batch.

      Data points have been shown per replicate (Supplementary Fig. 2)

      We have refined our approach for phenotypic analysis to improve consistency across different clones. Previously, we set thresholds on % nuclear TDP43 to distinguish MNs with nuclear versus mislocalized TDP43. This was done by ranking all cells based on % nuclear TDP43 and applying quantile-based thresholds—designating the top 25% as control and the bottom 25% as mislocalized, ensuring equal number of cells per category. However, we observed significant variability in thresholds across clones. For instance, the E8 clone had thresholds of 96% and 29%, while the E5 clone had 93% and 40%.

      To address this, we reanalysed the data using a standardized three-bin approach:

      (1) Control: MNs expressing the control nanobody.

      (2) Low-Moderate Mislocalization: MNs expressing the NES nanobody with > 40% nuclear TDP43.

      (3) Severe Mislocalization: MNs expressing the NES nanobody with < 40% nuclear TDP43.

      This approach ensured a more reliable comparison of TDP43 mislocalization effects across experiments. The conclusions remain the same.
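
      The three-bin rule above can be sketched as a small classifier. This is an illustration only: the labels and function name are hypothetical, and because the response does not say how a value of exactly 40% is handled, the sketch assumes it falls in the low-moderate bin.

```python
SEVERE_THRESHOLD = 40.0  # % nuclear TDP43, as stated in the response


def classify_neuron(nanobody: str, percent_nuclear: float) -> str:
    """Assign a motor neuron to one of the three bins described above.

    nanobody        -- "control" or "NES" (hypothetical labels)
    percent_nuclear -- nuclear TDP43 signal as a percentage of soma signal
    """
    if nanobody == "control":
        return "control"
    if nanobody != "NES":
        raise ValueError(f"unknown nanobody label: {nanobody!r}")
    if percent_nuclear < SEVERE_THRESHOLD:
        return "severe"
    # Assumption: exactly 40% is treated as low-moderate.
    return "low-moderate"


cells = [("control", 95.0), ("NES", 72.0), ("NES", 29.0)]
print([classify_neuron(nb, pct) for nb, pct in cells])
```

      Unlike the earlier quantile-based thresholds, a fixed cutoff does not shift with the distribution of each clone, which is the consistency gain described above.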

      (5) In Figure 2, please show individual data points.

      Data points for all the qPCR analyses in the paper have been included as a supplementary text file.

      (6) In Figure 3, please show individual data points.

      Data points for the western blot data have been included as a supplementary data file.

      All other comments are within the public review.

      Reviewer #2 (Recommendations For The Authors):

      (1) In general more robust quantification of many of the described phenotypes are necessary. In particular, no apparent quantification of cytosolic mislocalization was performed in Figure 1, or quantification of mislocalization of Figure 3F. It is unclear in the western blot in Fig 1G if TDP-43 signal were normalized to total protein, and of note it seems that expression of the intrabody-NES reduced total proteins in the western blots that were shown. No quantification or measurement of the insoluble material was done or shown.

      We have quantified cytosolic mislocalization of TDP43 (Fig. 1C). The y-axis indicates the total TDP43 signal observed in the nucleus as a percentage of the total signal observed in the soma (including the nucleus). This value has the advantage of ranging from 100% (perfectly nuclear) to 0% (complete nuclear loss). The boxplots indicate that expression of the NES-nanobody results in a range of cytosolic mislocalization with a median value around 40% of the TDP43 remaining in the nucleus.

      Western blot data in previous Fig. 1G was normalized to alpha-tubulin. We were unable to get a good signal for the insoluble fraction. From the alpha-tubulin alone, it cannot be concluded that NES-nanobody results in a decrease in total protein levels. In the revised western blot for phosphorylated TDP43 (Fig. 3D, E), we have quantified total and phosphorylated TDP43. Here, we observe a six-fold increase in the levels of phosphorylated TDP43 without a significant change in total TDP43 protein levels.

      To avoid potential mis-interpretation of our results, we have now removed the previous Fig. 1G.

      (2) Additional images of nearly all microscopy data at higher magnifications would be required to better evaluate TDP-43 localization. Ideally including images for each channel in addition to merged images, and especially for key figures such as Figure 1B, 3B, 3F.

      Better images have been provided.

      (3) No control images were shown for Figure 1F and 3F. It is unclear what the bright punctate spots of cytoplasmic TDP-43 GFP signal represent. Are these true aggregates? If so, additional characterization would be required before such conclusions can be made, beyond the relatively superficial western blot analysis that was done in Figure 1.

      Control images have now been provided (Figure 1E). As we mentioned above, immunostaining analysis to characterize whether the aggregates are phosphorylated failed to provide a clear signal. However, we have now confirmed that the mislocalized TDP43 is indeed hyper-phosphorylated (Figure 3D, E). We have acknowledged this in the main text, and have referred to these as puncta reminiscent of aggregates (Page 4, Page 6).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:  

      Reviewer #1 (Public Review):

      Summary:

      This paper reports an intracranial SEEG study of speech coordination, where participants synchronize their speech output with a virtual partner that is designed to vary its synchronization behavior. This allows the authors to identify electrodes throughout the left hemisphere of the brain that have activity (both power and phase) that correlates with the degree of synchronization behavior. They find that high-frequency activity in the secondary auditory cortex (superior temporal gyrus) is correlated to synchronization, in contrast to primary auditory regions. Furthermore, activity in the inferior frontal gyrus shows a significant phase-amplitude coupling relationship that is interpreted as compensation for deviation from synchronized behavior with the virtual partner.

      Strengths:

      (1) The development of a virtual partner model trained for each individual participant, which can dynamically vary its synchronization to the participant's behavior in real-time, is novel and exciting.

      (2) Understanding real-time temporal coordination for behaviors like speech is a critical and understudied area.

      (3) The use of SEEG provides the spatial and temporal resolution necessary to address the complex dynamics associated with the behavior.

      (4) The paper provides some results that suggest a role for regions like IFG and STG in the dynamic temporal coordination of behavior both within an individual speaker and across speakers performing a coordination task.

      We thank the Reviewer for their positive comments on our manuscript.

      Weaknesses:

      (1) The main weakness of the paper is that the results are presented in a largely descriptive and vague manner. For instance, while the interpretation of predictive coding and error correction is interesting, it is not clear how the experimental design or analyses specifically support such a model, or how they differentiate that model from the alternatives. It's possible that some greater specificity could be achieved by a more detailed examination of this rich dataset, for example by characterizing the specific phase relationships (e.g., positive vs negative lags) in areas that show correlations with synchronization behavior. However, as written, it is difficult to understand what these results tell us about how coordination behavior arises.

      We understand the reviewer’s comment. It is true that this work, being the first in the field using real-time adapting synchronous speech and intracerebral neural data, is a descriptive work that hopefully will pave the way for further studies. We have now added more statistical analyses (see point 2) to go beyond a descriptive approach and we have also rewritten the discussion to clarify how this work can help disentangle different models of language interaction. Most importantly, we have also run new analyses taking into account the specific phase relationship, as suggested.

      We already had an analysis using instantaneous phase difference in the phase-amplitude coupling approach, that bridges phase of behaviour to neural responses (amplitude in the high-frequency range). However, this analysis, as the reviewer noted, does not distinguish between positive and negative lags, but rather uses the continuous fluctuations of coordinative behaviour. Following the reviewer’s suggestion, we have now run a new analysis estimating the average delay (between virtual partner speech and patient speech) in each trial, using a cross-correlation approach. This gives a distribution of delays across trials that can then be “binned” as positive or negative. We have thus rerun the phase-amplitude coupling analyses on positive and negative trials separately, to assess whether the phase amplitude relationship depends upon the anticipatory (negative lags) or compensatory (positive lags) behaviour. Our new analysis (now in the supplementary, see figure below) does not reveal significant differences between positive and negative lags. This lack of difference, although not easy to interpret, is nonetheless interesting because it seems to show that the IFG does not have a stronger coupling for anticipatory trials. Rather the IFG seems to be strongly involved in adjusting behaviour, minimizing the error, independently of whether this is early or late.

      We have updated the “Coupling behavioural and neurophysiological data” section in Materials and methods as follows:  

      “In the third approach, we assessed whether the phase-amplitude relationship (or coupling) depends upon the anticipatory (negative delays) or compensatory (positive delays) behaviour between the VP and the patients’ speech. We computed the average delay in each trial using a cross-correlation approach on speech signals (between patient and VP) with the MATLAB function xcorr. A median split (patient-specific; average median split = 0 ms, average sd = 24 ms) was applied to conserve a sufficient amount of data, classifying trials below the median as “anticipatory behaviour” and trials above the median as “compensatory behaviour”. Then we conducted the phase-amplitude coupling analyses on positive and negative trials separately.”
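
      The delay estimation and median split described in this passage can be illustrated with a short sketch. It is not the authors' pipeline, which used the MATLAB function xcorr: the pure-Python cross-correlation, the function names, the sign convention (positive lag means the patient trails the VP) and the handling of trials that fall exactly on the median are all assumptions made for illustration.

```python
from statistics import median


def estimate_delay(vp_signal, patient_signal, max_lag):
    """Lag (in samples) that maximises the cross-correlation.

    Positive lag: the patient signal trails the VP signal (assumed convention).
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, vp_value in enumerate(vp_signal):
            j = i + lag
            if 0 <= j < len(patient_signal):
                score += vp_value * patient_signal[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def median_split(delays):
    """Label trials below the median as anticipatory, above as compensatory."""
    cutoff = median(delays)
    labels = []
    for delay in delays:
        if delay < cutoff:
            labels.append("anticipatory")
        elif delay > cutoff:
            labels.append("compensatory")
        else:
            labels.append("unassigned")  # assumption: ties are left out
    return cutoff, labels


# Toy trial: the patient envelope peaks two samples after the VP envelope.
vp = [0, 0, 1, 0, 0, 0, 0]
patient = [0, 0, 0, 0, 1, 0, 0]
print(estimate_delay(vp, patient, max_lag=3))
print(median_split([-30, -10, 5, 20]))
```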

      We also added a paragraph on this finding in the Discussion:

      “Our results highlight the involvement of the inferior frontal gyrus (IFG) bilaterally, in particular the BA44 region, in speech coordination. First, trials with a weak verbal coordination (VCI) are accompanied by more prominent high frequency activity (HFa, Fig.4; Fig.S4). Second, when considering the within-trial time-resolved dynamics, the phase-amplitude coupling (PAC) reveals a tight relation between the low frequency behavioural dynamics (phase) and the modulation of high-frequency neural activity (amplitude, Fig.5B; Fig.S5). This relation is strongest when considering the phase adjustments rather than the phase of speech of the VP per se: larger deviations in verbal coordination are accompanied by an increase in HFa. We also tested for potential effects of different asynchronies (i.e., temporal delay) between the participant's speech and that of the virtual partner but found no significant differences (Fig.S6). While the lack of a delay effect does not allow conclusions about the sensitivity of BA44 to the absolute timing of the partner’s speech, its neural dynamics are linked to the ongoing process of resolving phase deviations and maintaining synchrony.”

      (2) In the results section, there's a general lack of quantification. While some of the statistics reported in the figures are helpful, there are also claims that are stated without any statistical test. For example, in the paragraph starting on line 342, it is claimed that there is an inverse relationship between rho-value and frequency band, "possibly due to the reversed desynchronization/synchronization process in low and high frequency bands". Based on Figure 3, the first part of this statement appears to be true qualitatively, but is not quantified, and is therefore impossible to assess in relation to the second part of the claim. Similarly, the next paragraph on line 348 describes optimal clustering, but statistics of the clustering algorithm and silhouette metric are not provided. More importantly, it's not entirely clear what is being clustered - is the point to identify activity patterns that are similar within/across brain regions? Or to interpret the meaning of the specific patterns? If the latter, this is not explained or explored in the paper.

      The reviewer is right. We have now added statistical analyses showing that:

      (1) the ratio between synchronization and desynchronization evolves across frequencies (as often reported in the literature).

      (2) the sign of rho values also evolves across frequencies.

      (3) the clustering does indeed differ when taking into account behaviour. We have also clarified the use of clustering and the reasoning behind it.

      We have updated the Materials and methods section as follows:

      “The statistical difference between spatial clustering in global effect and brain-behaviour correlation was estimated with a linear model using the R function lm (stats package); post-hoc comparisons were corrected for multiple comparisons using the Tukey test (lsmeans R package; Lenth, 2016). The statistical difference between clustering in global effect and behaviour correlation across the number of clusters was estimated using permutation tests (N=1000) by computing the silhouette score difference between the two conditions.”

      We have updated the Results section as follows:

      (1) “This modulation between synchronization and desynchronization across frequencies was significant (F(5) = 6.42, p < .001; estimated with a linear model using the R function lm).”

      (2) “The first observation is a gradual transition in the direction of correlations as we move up frequency bands, from positive correlations at low frequencies to negative ones at high frequencies (F(5) = 2.68, p = .02). This effect, present in both hemispheres, mimics the reversed desynchronization/synchronization process in low and high frequency bands reported above.”

      (3) “Importantly, compared to the global activity (task vs rest, Fig 3A), the neural spatial profile of the behaviour-related activity (Fig 3B) is more clustered, in the left hemisphere. Indeed, silhouette scores are systematically higher for behaviour-related activity compared to global activity, indicating greater clustering consistency across frequency bands (t(106) = 7.79, p < .001, see Figure S3). Moreover, silhouette scores are maximal, in particular for HFa, for five clusters (p < .001), located in the IFG BA44, the IPL BA 40 and the STG BA 41/42 and BA22 (see Figure S3).”
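
      The permutation test on silhouette score differences mentioned in the Materials and methods update can be sketched as follows. This is a generic label-shuffling test under stated assumptions, not the authors' implementation: the silhouette scores are taken as precomputed inputs, the test statistic is assumed to be the difference in mean score between conditions, and the function name and toy values are hypothetical.

```python
import random
from statistics import mean


def permutation_test(scores_a, scores_b, n_permutations=1000, seed=0):
    """Two-sided permutation test on the difference of mean scores.

    Returns the observed difference and a p-value estimated by shuffling
    condition labels n_permutations times.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(scores_a) - mean(scores_b)
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Toy silhouette scores (made up for illustration only).
behaviour_related = [0.60, 0.62, 0.58, 0.65, 0.61, 0.63]
global_activity = [0.30, 0.32, 0.28, 0.35, 0.31, 0.29]
print(permutation_test(behaviour_related, global_activity))
```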

      (3) Given the design of the stimuli, it would be useful to know more about how coordination relates to specific speech units. The authors focus on the syllabic level, which is understandable. But as far as the results relate to speech planning (an explicit point in the paper), the claims could be strengthened by determining whether the coordination signal (whether error correction or otherwise) is specifically timed to e.g., the consonant vs the vowel. If the mechanism is a phase reset, does it tend to occur on one part of the syllable?

      Thank you for this thoughtful feedback. We agree that the relationship between speech coordination and specific speech units, such as consonants versus vowels, is an intriguing question. However, in our study, both interlocutors (the participant and the virtual partner) are adapting their speech production in real-time. This interactive coordination makes it difficult to isolate neural signatures corresponding to precise segments like consonants or vowels, as the adjustments occur in a continuous and dynamic context.

      The VP's ability to adapt depends on its sensitivity to spectral cues, such as the transition from one phonetic element to another. This is likely influenced by the type of articulation, with certain transitions being more salient (e.g., between a stop consonant like "p" and a vowel like "a") and others being less distinct (e.g., between nasal consonants like "m" and a vowel). Thus, the VP’s spectral adaptation tends to occur at these transitions, which are more prominent in some cases than in others.

      For the participants, previous studies have shown a greater sensitivity during the production of stressed vowels (Oschkinat & Hoole, 2022; Li & Lancia, 2024), which may reflect a heightened attentional or motor adjustment to stressed syllables.

      Here, we did not specifically address the question of coordination at the level of individual linguistic units. Moreover, even if we attempted to focus on this level, it would be challenging to relate neural dynamics directly to specific speech segments. The question of how synchronization at the level of individual linguistic units might relate to neural data is complex. The lack of clear, unit-specific predictions makes it difficult to parse out distinct neural signatures tied to individual segments, particularly when both interlocutors are continuously adjusting their speech in relation to one another.

      Therefore, while we recognize the potential importance of examining synchronization at the level of individual phonetic elements, the design of our task and the nature of the coordination in this interactive context (real-time bidirectional adaptation) led us to focus more broadly on the overall dynamics of speech synchronization at the syllabic level, rather than on specific linguistic units.

      We now state at the end of the Discussion section:

      “It is worth noting that the influence of specific speech units, such as consonants versus vowels, on speech coordination remains to be explored. In non-interactive contexts, participants show greater sensitivity during the production of stressed vowels, possibly reflecting heightened attentional or motor adjustments (Oschkinat & Hoole, 2022; Li & Lancia, 2024). In this study, the VP’s adaptation relies on sensitivity to spectral cues, particularly phonetic transitions, with some (e.g., formant transitions) being more salient than others. However, how these effects manifest in an interactive setting remains an open question, as both interlocutors continuously adjust their speech in real time. Future studies could investigate whether coordination signals, such as phase resets, preferentially align with specific parts of the syllable.”

      References cited:

      – Oschkinat, M., & Hoole, P. (2022). Reactive feedback control and adaptation to perturbed speech timing in stressed and unstressed syllables. Journal of Phonetics, 91, 101133.

      – Li, J., & Lancia, L. (2024). A multimodal approach to study the nature of coordinative patterns underlying speech rhythm. In Proc. Interspeech, 397-401.

      (4) In the discussion the results are related to a previously-described speech-induced suppression effect. However, it's not clear what the current results have to do with SIS, since the speaker's own voice is present and predictable from the forward model on every trial. Statements such as "Moreover, when the two speech signals come close enough in time, the patient possibly perceives them as its own voice" are highly speculative and apparently not supported by the data.

      We thank the reviewer for raising thoughtful concerns about our interpretation of the observed neural suppression as related to speaker-induced suppression (SIS). We agree that our study lacks a passive listening condition, which limits direct comparisons to the original SIS effect, traditionally defined as the suppression of neural responses to self-produced speech compared to externally-generated speech (Meekings & Scott, 2021).

      In response, we have reconsidered our terminology and interpretation. In the revised Discussion section, we refer to our findings as a "SIS-related phenomenon specific to the synchronous speech context". Unlike classic SIS paradigms, our interactive task involves simultaneous monitoring of self- and externally-generated speech, introducing additional attentional and coordinative demands.

      The revised Discussion also incorporates findings by Ozker et al. (2022, 2024), which link SIS and speech monitoring, suggesting that suppressing responses to self-generated speech facilitates error detection. We propose that the decrease in high-frequency activity (HFa) as verbal coordination increases reflects reduced error signals due to closer alignment between perceived and produced speech. Conversely, HFa increases with reduced coordination may signify greater prediction error.

      Additionally, we relate our findings to the "rubber voice" effect (Zheng et al., 2011; Lind et al., 2014; Franken et al., 2021), where temporally and phonetically congruent external speech can be perceived as self-generated. We speculate that this may occur in synchronous speech tasks when the participant's and VP's speech signals closely align. However, this interpretation remains speculative, as no subjective reports were collected to confirm this perception. Future studies could include participant questionnaires to validate this effect and relate subjective experience to neural measures of synchronization.

      Overall, our findings extend the study of SIS to dynamic, interactive contexts and contribute to understanding internal forward models of speech production in more naturalistic scenarios.

      We have now added these points to the discussion as follows:

      “The observed negative correlation between verbal coordination and high-frequency activity (HFa) in STG BA22 suggests a suppression of neural responses as the degree of behavioural synchrony increases. This result is reminiscent of findings on speaker-induced suppression (SIS), where neural activity in auditory cortex decreases during self-generated speech compared to externally-generated speech (Meekings & Scott, 2021; Niziolek et al., 2013). However, our paradigm differs from traditional SIS studies in two critical ways: (1) the speaker's own voice is always present and predictable from the forward model, and (2) no passive listening condition was included. Therefore, our findings cannot be directly equated with the original SIS effect.

      Instead, we propose that the suppression observed here reflects a SIS-related phenomenon specific to the synchronous speech context. Synchronous speech requires simultaneous monitoring of self- and externally-generated speech, a task that is both attentionally demanding and coordinative. This aligns with evidence from Ozker et al. (2024, 2022), showing that the same neural populations in STG exhibit SIS and heightened responses to feedback perturbations. These findings suggest that SIS and speech monitoring are related processes, where suppressing responses to self-generated speech facilitates error detection. In our study, suppression of HFa as coordination increases may reflect reduced prediction errors due to closer alignment between perceived and produced speech signals. Conversely, increased HFa during poor coordination may signify greater mismatch, consistent with prediction error theories (Houde & Nagarajan, 2011; Friston et al., 2020). Furthermore, when self- and externally-generated speech signals are temporally and phonetically congruent, participants may perceive external speech as their own. This echoes the "rubber voice" effect, where external speech resembling self-produced feedback is perceived as self-generated (Zheng et al., 2011; Lind et al., 2014; Franken et al., 2021). While this interpretation remains speculative, future studies could incorporate subjective reports to investigate this phenomenon in more detail.”

      References cited:

      – Franken, M. K., Hartsuiker, R. J., Johansson, P., Hall, L., & Lind, A. (2021). Speaking With an Alien Voice: Flexible Sense of Agency During Vocal Production. Journal of Experimental Psychology-Human perception and performance, 47(4), 479-494. https://doi.org/10.1037/xhp0000799

      – Houde, J. F., & Nagarajan, S. S. (2011). Speech production as state feedback control. Frontiers in human neuroscience, 5, 82.

      – Lind, A., Hall, L., Breidegard, B., Balkenius, C., & Johansson, P. (2014). Speakers' acceptance of real-time speech exchange indicates that we use auditory feedback to specify the meaning of what we say. Psychological Science, 25(6), 1198-1205. https://doi.org/10.1177/0956797614529797

      – Meekings, S., & Scott, S. K. (2021). Error in the Superior Temporal Gyrus? A Systematic Review and Activation Likelihood Estimation Meta-Analysis of Speech Production Studies. Journal of Cognitive Neuroscience, 33(3), 422-444. https://doi.org/10.1162/jocn_a_01661

      – Niziolek, C. A., Nagarajan, S. S., & Houde, J. F. (2013). What does motor efference copy represent? Evidence from speech production. Journal of Neuroscience, 33, 16110–16116.

      – Ozker, M., Doyle, W., Devinsky, O., & Flinker, A. (2022). A cortical network processes auditory error signals during human speech production to maintain fluency. PLoS Biology, 20.

      – Ozker, M., Yu, L., Dugan, P., Doyle, W., Friedman, D., Devinsky, O., & Flinker, A. (2024). Speech-induced suppression and vocal feedback sensitivity in human cortex. eLife, 13, RP94198. https://doi.org/10.7554/eLife.94198

      – Zheng, Z. Z., MacDonald, E. N., Munhall, K. G., & Johnsrude, I. S. (2011). Perceiving a Stranger's Voice as Being One's Own: A 'Rubber Voice' Illusion? PLOS ONE, 6(4), e18655.

      (5) There are some seemingly arbitrary decisions made in the design and analysis that, while likely justified, need to be explained. For example, how were the cutoffs for moderate coupling vs phase-shifted coupling (k ~0.09) determined? This is noted as "rather weak" (line 212), but it's not clear where this comes from. Similarly, the ROI-based analyses are only done on regions "recorded in at least 7 patients" - how was this number chosen? How many electrodes total does this correspond to? Is there heterogeneity within each ROI?

      The reviewer is correct, and we apologize for this missing information. We now specify that the coupling values were empirically determined on the basis of a pilot experiment in order to induce more or less synchronization, while keeping the phase-shifted coupling at a rather implicit level.

      Concerning the definition of coupling as weak, one should consider that, in the Kuramoto model, the strength of coupling (k) is relative to the spread of the natural frequencies (Δω) in the system. In our study, the natural frequencies of syllables range approximately from 2 Hz to 10 Hz, resulting in a frequency spread of Δω = 8 Hz. For coupling to strongly synchronize oscillators across such a wide range, k must be comparable to or exceed Δω. Since k ≈ 0.1 is much smaller than Δω, it is classified as weak coupling.
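      The argument above can be illustrated with a minimal sketch, assuming the textbook case of two mutually coupled Kuramoto oscillators, whose phase difference φ obeys dφ/dt = Δω − 2k·sin(φ) and locks only when |Δω| ≤ 2k. This is not the VP implementation: the function names, the conversion of the 8 Hz spread to rad/s, and the strong-coupling value of 60 are assumptions chosen for illustration.

```python
import math


def locks(delta_omega: float, k: float) -> bool:
    """Two mutually coupled Kuramoto oscillators phase-lock iff |delta_omega| <= 2k."""
    return abs(delta_omega) <= 2.0 * k


def mean_drift(delta_omega: float, k: float, dt: float = 1e-3, steps: int = 20000) -> float:
    """Euler-integrate d(phi)/dt = delta_omega - 2k*sin(phi) from phi = 0.

    Returns the average drift rate of the phase difference (rad/s):
    close to zero when the oscillators lock, close to delta_omega when they do not.
    """
    phi = 0.0
    for _ in range(steps):
        phi += dt * (delta_omega - 2.0 * k * math.sin(phi))
    return phi / (steps * dt)


# Assumed conversion: a spread of 8 Hz expressed in rad/s.
spread = 2.0 * math.pi * 8.0

weak = mean_drift(spread, 0.1)     # k far below the spread: phases keep drifting
strong = mean_drift(spread, 60.0)  # hypothetical k above spread / 2: phases lock
print(locks(spread, 0.1), locks(spread, 60.0))
print(weak, strong)
```

      With k far below the frequency spread, the phase difference keeps drifting at nearly the full Δω, which is the sense in which the coupling is weak.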

      We have now modified the Materials and methods section as follows:

      “More precisely, for a third of the trials the VP had a neutral behaviour (close to zero coupling: k = +/- 0.01). For a third it had a moderate coupling, meaning that the VP synchronised more to the participant speech (k = -0.09). And for the last third of the trials the VP had a moderate coupling but with a phase shift of pi/2, meaning that it moderately aimed to speak in between the participant syllables (k = + 0.09). The coupling values were empirically determined on the basis of a pilot experiment in order to induce more or less synchronization but keeping the phase-shifted coupling at a rather implicit level. In other terms, while participants knew that the VP would adapt, they did not necessarily know in which direction the coupling went.”

      Regarding the criterion of including regions recorded in at least 7 patients, our goal was to balance data completeness with statistical power. Given our total sample of 16 patients, this threshold ensures that each included region is represented in at least ~44% of the cohort, reducing the likelihood of spurious findings due to extremely small sample sizes. This choice also aligns with common neurophysiological analysis practices, where a minimum number of subjects (at least 2 in extreme cases) is required to achieve meaningful interindividual comparisons while avoiding excessive data exclusion. Additionally, this threshold maintains a reasonable tradeoff between maximizing patient inclusion and ensuring that statistical tests remain robust.

      We have now added more information in the Results section “Spectral profiles in the language network are nuanced by behaviour” on this point as follows:

      “To balance data completeness and statistical power, we included only brain regions recorded in at least 7 patients (~44% of the cohort) for the left hemisphere and at least 5 patients for the right hemisphere (~31% of the cohort), ensuring sufficient representation while minimizing biases due to sparse data.”
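      As a minimal sketch of this inclusion rule, the snippet below computes the cohort share implied by each threshold and filters regions by patient count. The region counts are hypothetical placeholders, not values from the study, and the helper names are assumptions.

```python
def cohort_share(n_patients: int, cohort_size: int = 16) -> float:
    """Percentage of the cohort represented by n_patients."""
    return 100.0 * n_patients / cohort_size


def regions_meeting_threshold(patients_per_region: dict, min_patients: int) -> list:
    """Names of regions recorded in at least min_patients patients, sorted alphabetically."""
    return sorted(
        region for region, n in patients_per_region.items() if n >= min_patients
    )


# Hypothetical per-region patient counts, for illustration only.
left_counts = {"STG BA22": 9, "IFG BA44": 7, "IPL": 6, "temporal pole": 3}

print(round(cohort_share(7)), round(cohort_share(5)))  # thresholds as % of 16 patients
print(regions_meeting_threshold(left_counts, min_patients=7))
```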

      Reviewer #2 (Public Review):

      Summary:

      This paper investigates the neural underpinnings of an interactive speech task requiring verbal coordination with another speaker. To achieve this, the authors recorded intracranial brain activity from the left hemisphere in a group of drug-resistant epilepsy patients while they synchronised their speech with a 'virtual partner'. Crucially, the authors were able to manipulate the degree of success of this synchronisation by programming the virtual partner to either actively synchronise or desynchronise their speech with the participant, or else to not vary its speech in response to the participant (making the synchronisation task purely one-way). Using such a paradigm, the authors identified different brain regions that were either more sensitive to the speech of the virtual partner (primary auditory cortex), or more sensitive to the degree of verbal coordination (i.e. synchronisation success) with the virtual partner (secondary auditory cortex and IFG). Such sensitivity was measured by (1) calculating the correlation between the index of verbal coordination and mean power within a range of frequency bands across trials, and (2) calculating the phase-amplitude coupling between the behavioural and brain signals within single trials (using the power of high-frequency neural activity only). Overall, the findings help to elucidate some of the left hemisphere brain areas involved in interactive speaking behaviours, particularly highlighting the high-frequency activity of the IFG as a potential candidate supporting verbal coordination.

      Strengths:

      This study provides the field with a convincing demonstration of how to investigate speaking behaviours in more complex situations that share many features with real-world speaking contexts e.g. simultaneous engagement of speech perception and production processes, the presence of an interlocutor, and the need for inter-speaker coordination. The findings thus go beyond previous work that has typically studied solo speech production in isolation, and represent a significant advance in our understanding of speech as a social and communicative behaviour. It is further an impressive feat to develop a paradigm in which the degree of cooperativity of the synchronisation partner can be so tightly controlled; in this way, this study combines the benefits of using prerecorded stimuli (namely, the high degree of experimental control) with the benefits of using a live synchronisation partner (allowing the task to be truly two-way interactive, an important criticism of other work using pre-recorded stimuli). A further key strength of the study lies in its employment of stereotactic EEG to measure brain responses with both high temporal and spatial resolution, an ideal method for studying the unfolding relationship between neural processing and this dynamic coordination behaviour.

      We sincerely appreciate the Reviewer's thoughtful and positive feedback on our manuscript.

      Weaknesses:

      One major limitation of the current study is the lack of coverage of the right hemisphere by the implanted electrodes. Of course, electrode location is solely clinically motivated, and so the authors did not have control over this. However, this means that the current study neglects the potentially important role of the right hemisphere in this task. The right hemisphere has previously been proposed to support feedback control for speech (likely a core process engaged by synchronous speech), as opposed to the left hemisphere which has been argued to underlie feedforward control (Tourville & Guenther, 2011). Indeed, a previous fMRI study of synchronous speech reported the engagement of a network of right hemisphere regions, including STG, IPL, IFG, and the temporal pole (Jasmin et al., 2016). Further, the release from speech-induced suppression during a synchronous speech reported by Jasmin et al. was found in the right temporal pole, which may explain the discrepancy with the current finding of reduced leftward high-frequency activity with increasing verbal coordination (suggesting instead increased speech-induced suppression for successful synchronisation). The findings should therefore be interpreted with the caveat that they are limited to the left hemisphere, and are thus likely missing an important aspect of the neural processing underpinning verbal coordination behaviour.

      We have now included, in the supplementary materials, data from the right hemisphere, although the coverage is a bit sparse (Figures S2, S4, S5, see our responses in the ‘Recommendation for the authors’ section, below). We have also revised the Discussion section to add the putative role of right temporal regions (see below as well).

      A further limitation of this study is that its findings are purely correlational in nature; that is, the results tell us how neural activity correlates with behaviour, but not whether it is instrumental in that behaviour. Elucidating the latter would require some form of intervention such as electrode stimulation, to disrupt activity in a brain area and measure the resulting effect on behaviour. Any claims therefore as to the specific role of brain areas in verbal coordination (e.g. the role of the IFG in supporting online coordinative adjustments to achieve synchronisation) are therefore speculative.

      We appreciate the reviewer’s observation regarding the correlational nature of our findings and agree that this is a common limitation of neuroimaging studies. While elucidating causal relationships would indeed require intervention techniques such as electrical stimulation, our study leverages the unique advantages of intracerebral recordings, offering the best available spatial and temporal resolution alongside a high signal-to-noise ratio. These attributes ensure that our data accurately reflect neural activity and its temporal dynamics, providing a robust foundation for understanding the relationship between neural processes and behaviour. Therefore, while causal claims are beyond the scope of this study, the precision of our methodology allows us to make well-supported observations about the neural correlates of synchronous speech tasks.

      Recommendations for the authors:

      Reviewing Editor Comment:

      After joint consultation, we are seeing the potential for the report to be strengthened and the evidence here to be deemed ultimately at least 'solid': to us (editors and reviewers) it seems that this would require both (1) clarifying/acknowledging the limitations of not having right hemisphere data, and (2) running some of the additional analyses the reviewers suggest, which should allow for richer examination of the data e.g. phase relationships in areas that correlate with synchronisation.

      We have now added data on the right hemisphere (RH) that we did not previously report due to a rather sparse sampling of the RH. These results are now reported in the Results section as well as in the Supplementary section, where we put all right hemisphere figures for all analyses (Figure S2, S4, S5). We have also run additional analyses digging into the phase relationship in areas that correlate with synchronisation (Figure S6). These additional analyses allowed us to improve the Discussion section as well.

      Reviewer #1 (Recommendations For The Authors):

      In some sections, the writing is a bit unclear, with both typos and vague statements that could be fixed with careful proofreading.

      We thank the reviewer for pointing out areas where the writing could be improved. We carefully proofread the manuscript to address typos and clarify any vague statements. Specific sections identified as unclear have been rephrased for better precision and readability.

      In Figure 1, the colors repeat, making it impossible to tell patients apart.

      We have now updated Figure 1 colormap to avoid redundancy and added the right hemisphere.

      Line 132: "16 unilateral implantations (9 left, 7 bilateral implantations)". Should this say 7 right hemisphere? If so, the following sentence stating that there was "insufficient cover [sic] of the right hemisphere" is unclear, since the number of patients between LH and RH is similar.

      The confusion was due to the fact that the lateralization refers to the presence/absence of electrodes in the Heschl’s gyrus (left : H’ ; right : H) exclusively.

      We have thus changed this section as follows:

      “16 patients (7 women, mean age 29.8 y, range 17 - 50 y) with pharmacoresistant epilepsy took part in the study. They were included if their implantation map covered at least partially the Heschl's gyrus and had sufficiently intact diction to support relatively sustained language production.”

      The relevant part (previously line 132) now states:

      “Sixteen patients with a total of 236 electrodes (145 in the left hemisphere) and 2395 contacts (1459 in the left hemisphere, see Figure 1). While this gives a rather sparse coverage of the right hemisphere, we decided, due to the rarity of this type of data, to report results for both hemispheres, with figures for the left hemisphere in the main text and figures for the right hemisphere in the supplementary section.”

      Reviewer #2 (Recommendations For The Authors):

      (1) To address the concern regarding the absence of data from the right hemisphere, I would advise the authors to directly acknowledge this limitation in their Discussion section, citing relevant work suggesting that the right hemisphere has an important role to play in this task (e.g. Jasmin et al., 2016). You should also make this clear in your abstract e.g. you could rewrite the sentence in line 40 to be: "Then, we recorded the intracranial brain activity of the left hemisphere in 16 patients with drug-resistant epilepsy...".

      We are grateful to the reviewer for this comment that incited us to look into the right hemisphere data. We have now included results in the right hemisphere, although the coverage is a bit sparse. We have also revised the Discussion section to add the putative role of right temporal regions. Interestingly, our results show, as suggested by the reviewer, a clear involvement of the RH in this task.

      First, the full brain analyses show a very similar implication of the RH as compared to the LH (see Figure below). We have now added in the Results section:

      “As expected, the whole language network is strongly involved, including both dorsal and ventral pathways (Fig 3A). More precisely, in the left temporal lobe the superior, middle and inferior temporal gyri, in the left parietal lobe the inferior parietal lobule (IPL) and in the left frontal lobe the inferior frontal gyrus (IFG) and the middle frontal gyrus (MFG). Similar results are observed in the right hemisphere, neural responses being present across all six frequency bands with medium to large modulation in activity compared to baseline (Figure S2A) in the same regions. Desynchronizations are present in the theta, alpha and beta bands while the low gamma and HFa bands show power increases.”

      Compared to the left hemisphere, assessing brain-behaviour correlations in the right hemisphere does not provide the same statistical power, because some anatomical regions have very few electrodes. Nonetheless, we observe a strong correlation in the right IFG, similar to the one we previously reported in the left hemisphere, and we now report in the Results section:

      “The decrease in HFa along the dorsal pathway is replicated in the right hemisphere (Figure S4). However, while both the right STG BA41/42 and STG BA22 present a power increase compared to baseline (stronger for the STG BA41/42), neither shows a significant correlation with verbal coordination (t(45)=-1.65, p=.1 ; t(8)=-0.67, p=.5 ; Student’s T test, FDR correction). By contrast, results in the right IFG BA44 are similar to those observed in the left hemisphere, with a significant power increase associated with a negative brain-behaviour correlation (t(17) = -3.11, p = .01 ; Student’s T test, FDR correction).”

      Interestingly, the phase-amplitude coupling analysis yields very similar results in both hemispheres (exception made for BA22). We have thus updated the Results section as follows:

      “Notably, when comparing – within the regions of interest previously described – the PAC with the virtual partner speech and the PAC with the phase difference, the coupling relationship changes when moving along the dorsal pathway: a stronger coupling in the auditory regions with the speech input, no difference between speech and coordination dynamics in the IPL and a stronger coupling for the coordinative dynamics compared to speech signal in the IFG (Figure 5B). When looking at the right hemisphere, we observe the same changes in the coupling relationship when moving along the dorsal pathway, except that no difference between speech and coordination dynamics is present in the right secondary auditory regions (STG BA22; Figure S5).”
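      As a minimal sketch of what a phase-amplitude coupling measure captures, the snippet below computes a mean vector length between a phase series (standing in for the behavioural signal) and an amplitude series (standing in for HFa power). The estimator actually used in the manuscript is not specified in this section, so the choice of the mean vector length, the function name, and the synthetic signals are all assumptions.

```python
import cmath
import math


def mean_vector_length(phases, amplitudes):
    """Unnormalised PAC estimate: |mean(amplitude * exp(i * phase))|."""
    n = len(phases)
    total = sum(a * cmath.exp(1j * p) for p, a in zip(phases, amplitudes))
    return abs(total) / n


# Synthetic data: phases sampled uniformly over one full cycle.
n = 1000
phases = [2.0 * math.pi * i / n for i in range(n)]

coupled = [1.0 + math.cos(p) for p in phases]  # amplitude peaks at phase 0
uncoupled = [1.0] * n                          # amplitude independent of phase

print(mean_vector_length(phases, coupled))
print(mean_vector_length(phases, uncoupled))
```

      Comparing this quantity for two different phase signals (partner speech versus phase difference) against the same amplitude series is the kind of contrast described above. In practice the raw value depends on amplitude scale, so it is usually normalised or tested against surrogates.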

      We also included the right hemisphere results in the Discussion section, mentioning the previous work of Guenther and of Jasmin. In the section “Left secondary auditory regions are more sensitive to coordinative behaviour”, one can read:

      “Furthermore, the absence of correlation in the right STG BA22 (Figure S4) seems at first glance to challenge influential speech production models (e.g. Guenther & Hickok, 2016) that propose that the right hemisphere is involved in feedback control. However, one needs to consider that the task at stake relied heavily upon temporal mismatches and adjustments. In this context, the left-lateralized sensitivity to verbal coordination is reminiscent of the work of Floegel and colleagues (2020, 2023) suggesting that both hemispheres are involved depending on the type of error: the right auditory association cortex monitoring preferentially spectral speech features and the left auditory association cortex monitoring preferentially temporal speech features. Nonetheless, the right temporal pole seems to be sensitive to speech coordinative behaviour, confirming previous findings using fMRI (Jasmin et al., 2016) and thus showing that the right hemisphere has an important role to play in this type of task (e.g. Jasmin et al., 2016).”

      References cited:

      – Floegel, M., Fuchs, S., & Kell, C. A. (2020). Differential contributions of the two cerebral hemispheres to temporal and spectral speech feedback control. Nature Communications, 11(1), 2839.

      – Floegel, M., Kasper, J., Perrier, P., & Kell, C. A. (2023). How the conception of control influences our understanding of actions. Nature Reviews Neuroscience, 24(5), 313-329.

      – Guenther, F. H., & Hickok, G. (2016). Neural models of motor speech control. In Neurobiology of language (pp. 725-740). Academic Press.

      (2) When discussing previous work on alignment during synchronous speech, you may wish to include a recently published paper by Bradshaw et al (2024); this manipulated the acoustics of the accompanist's voice during a synchronous speech task to show interactions between speech motor adaptation and phonetic convergence/alignment.

      We thank the reviewer for pointing to this recent and interesting paper. We added the article as a reference, as follows:

      “Furthermore, synchronous speech favors the emergence of alignment phenomena, for instance of the fundamental frequency or the syllable onset (Assaneo et al., 2019 ; Bradshaw & McGettigan, 2021 ; Bradshaw et al., 2023; Bradshaw et al., 2024).”

      (3) Line 80: "Synchronous speech resembles to a certain extent to delayed auditory feedback tasks"- I think you mean "altered auditory feedback tasks" here.

      In the case of synchronous speech it is more about timing than altered speech signals, that is why the comparison is done with delayed and not altered auditory feedback. Nonetheless, we understand the Reviewer’s point and we have now changed the sentence as follows:

      “Synchronous speech resembles to a certain extent to delayed/altered auditory feedback tasks”

      (4) When discussing superior temporal responses during such altered feedback tasks, you may also want to cite a review paper by Meekings and Scott (2021).

      We thank the reviewer for this suggestion, indeed this was a big oversight!

      The paper is now quoted in the introduction as follows:

      “Previous studies have revealed increased responses in the superior temporal regions compared to normal feedback conditions (Hirano et al., 1997 ; Hashimoto & Sakai, 2003 ; Takaso et al., 2010 ; Ozker et al., 2022 ; Floegel et al., 2020 ; see Meekings & Scott, 2021 for a review of error-monitoring and feedback control in the STG during speech production).”

      Furthermore, we updated the discussion part concerning the speaker-induced suppression phenomenon (see below our response to the point 10).

      (5) Line 125: "The parameters and sound adjustment were set using an external low-latency sound card (RME Babyface Pro Fs)". Can you please report the total feedback loop latency in your set-up? Or at the least cite the following paper which reports low latencies with this audio device.

      Kim, K. S., Wang, H., & Max, L. (2020). It's About Time: Minimizing Hardware and Software Latencies in Speech Research With Real-Time Auditory Feedback. Journal of Speech, Language, and Hearing Research, 63(8), 2522-2534. https://doi.org/10.1044/2020_JSLHR-19-00419

      We now report the total feedback loop latency (~5ms) and also cite the relevant paper (Kim et al., 2020).

      (6) Line 127 "A calibration was made to find a comfortable volume and an optimal balance for both the sound of the participant's own voice, which was fed back through the headphones, and the sound of the stimuli." What do you mean here by an 'optimal balance'? Was the participant's own voice always louder than the VP stimuli? Can you report roughly what you consider to be a comfortable volume in dB?

      This point was indeed unclear. We have now changed the text as follows:

      “A calibration was made to find a comfortable volume and an optimal balance for both the sound of the participant's own voice, which was fed back through the headphones, and the sound of the stimuli. The aim of this procedure was that the patient would subjectively perceive their voice and the VP-voice in equal measure. VP voice was delivered at approximately 70dB.”

      (7) Relatedly, did you use any noise masking to mask the air-conducted feedback from their own voice (which would have been slightly out of phase with the feedback through the headphones, depending on your latency)?

      Considering the low-latency condition allowed with the sound card (RME Babyface Pro Fs), we did not use noise masking to mask the air-conducted feedback from the self-voice of the patients.

      (8) Line 141: "four short sentences were pre-recorded by a woman and a man." Did all participants synchronise with both the man and woman or was the VP gender matched to that of the participant/patient?

      We thank the reviewer for pointing out this important missing detail. We have now changed the text as follows:

      “Four stimuli corresponding to four short sentences were pre-recorded by both a female and a male speaker. This allowed to adapt to the natural gender differences in fundamental frequency (i.e. so that the VP gender matched that of the patients). All stimuli were normalised in amplitude.”

      (9) Can you clarify what instructions participants were given regarding the VP? That is, were they told that this was a recording or a real live speaker? Were they naïve to the manipulation of the VP's coupling to the participant?

      We have now added this information to the task description as follows:

      “Participants, comfortably seated in a medical chair, were instructed that they would perform a real-time interactive synchronous speech task with an artificial agent (Virtual Partner, henceforth VP, see next section) that can modulate and adapt to the participant’s speech in real time.”

      “The third step was the actual experiment. This was identical to the training but consisted of 24 trials (14s long, speech rate ~3Hz, yielding ~1000 syllables). Importantly, the VP varied its coupling behaviour to the participant. More precisely, for a third of the sequences the VP had a neutral behaviour (close to zero coupling : k = +/- 0.01). For a third it had a moderate coupling, meaning that the VP synchronised more to the participant speech (k = - 0.09). And for the last third of the sequences the VP had a moderate coupling but with a phase shift of pi/2, meaning that it moderately aimed to speak in between the participant syllables (k = + 0.09). The coupling values were empirically determined on the basis of a pilot experiment in order to induce more or less synchronization, but keeping the phase-shifted coupling at a rather implicit level. In other terms, while participants knew that the VP would adapt, they did not necessarily know in which direction the coupling went.”  
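
      The figures in the quoted task description can be checked with simple arithmetic. The following is a minimal sketch, not part of the authors' analysis: the function names are illustrative, and the even split into three coupling conditions is an assumption based on the wording "a third of the sequences".

```python
# Values taken from the task description quoted above.
N_TRIALS = 24          # number of trials in the actual experiment
TRIAL_DURATION_S = 14  # duration of each trial in seconds
SPEECH_RATE_HZ = 3     # approximate syllable rate
N_CONDITIONS = 3       # neutral, moderate coupling, phase-shifted coupling


def total_syllables(n_trials, duration_s, rate_hz):
    """Approximate number of syllables produced over the whole experiment."""
    return n_trials * duration_s * rate_hz


def trials_per_condition(n_trials, n_conditions):
    """Trials per coupling condition, assuming an even split (an assumption)."""
    if n_trials % n_conditions != 0:
        raise ValueError("trials cannot be split evenly across conditions")
    return n_trials // n_conditions


if __name__ == "__main__":
    print(total_syllables(N_TRIALS, TRIAL_DURATION_S, SPEECH_RATE_HZ))
    print(trials_per_condition(N_TRIALS, N_CONDITIONS))
```

      Under these assumptions the product is 1008 syllables, in line with the "~1000 syllables" stated in the text, with eight trials per coupling condition.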

      (10) The paragraph from line 438 entitled "Secondary auditory regions are more sensitive to coordinative behaviour" includes an interesting discussion of the relation of the current findings to the phenomenon of speech-induced suppression (SIS). However, the authors appear to equate the observed decrease in high-frequency activity as speech coordination increases with the phenomenon of SIS (in lines 456-457), which is quite a speculative leap. I would encourage the authors to temper this discussion by referring to SIS as a potentially related phenomenon, with a need for more experimental work to determine if this is indeed the same phenomenon as the decreases in high-frequency power observed here. I believe that the authors are arguing here for an interpretation of SIS as reflecting internal modelling of sensory input regardless of whether this is self-generated or other-generated; if this is indeed the case, I would ask the authors to be more explicit here that these ideas are not a standard part of the traditional account of SIS, which only includes internal modelling of self-produced sensory feedback.

      As stated in the public review, we thank both reviewers for raising thoughtful concerns about our interpretation of the observed neural suppression as related to speaker-induced suppression (SIS). We agree that our study lacks a passive listening condition, which limits direct comparisons to the original SIS effect, traditionally defined as the suppression of neural responses to self-produced speech compared to externally-generated speech (Meekings & Scott, 2021).

      In response, we have reconsidered our terminology and interpretation. In the revised discussion, we refer to our findings as a "SIS-related phenomenon specific to the synchronous speech context." Unlike classic SIS paradigms, our interactive task involves simultaneous monitoring of self- and externally-generated speech, introducing additional attentional and coordinative demands.

      The revised discussion also incorporates findings by Ozker et al. (2024, 2022), which link SIS and speech monitoring, suggesting that suppressing responses to self-generated speech facilitates error detection. We propose that the decrease in high-frequency activity (HFa) as verbal coordination increases reflects reduced error signals due to closer alignment between perceived and produced speech. Conversely, HFa increases with reduced coordination may signify greater prediction error.

      Additionally, we relate our findings to the "rubber voice" effect (Zheng et al., 2011; Lind et al., 2014; Franken et al., 2021), where temporally and phonetically congruent external speech can be perceived as self-generated. We speculate that this may occur in synchronous speech tasks when the participant's and VP's speech signals closely align. However, this interpretation remains speculative, as no subjective reports were collected to confirm this perception. Future studies could include participant questionnaires to validate this effect and relate subjective experience to neural measures of synchronization.

      Overall, our findings extend the study of SIS to dynamic, interactive contexts and contribute to understanding internal forward models of speech production in more naturalistic scenarios.

      We have now added these points to the discussion as follows:

      “The observed negative correlation between verbal coordination and high-frequency activity (HFa) in STG BA22 suggests a suppression of neural responses as the degree of synchrony increases. This result aligns with findings on speaker-induced suppression (SIS), where neural activity in auditory cortex decreases during self-generated speech compared to externally-generated speech (Meekings & Scott, 2021; Niziolek et al., 2013). However, our paradigm differs from traditional SIS studies in two critical ways: (1) the speaker's own voice is always present and predictable from the forward model, and (2) no passive listening condition was included. Therefore, our findings cannot be directly equated with the original SIS effect.

      Instead, we propose that the suppression observed here reflects a SIS-related phenomenon specific to the synchronous speech context. Synchronous speech requires simultaneous monitoring of self- and externally generated speech, a task that is both attentionally demanding and coordinative. This aligns with evidence from Ozker et al. (2024, 2022), showing that the same neural populations in STG exhibit SIS and heightened responses to feedback perturbations. These findings suggest that SIS and speech monitoring are related processes, where suppressing responses to self-generated speech facilitates error detection.

      In our study, suppression of HFa as coordination increases may reflect reduced prediction errors due to closer alignment between perceived and produced speech signals. Conversely, increased HFa during poor coordination may signify greater mismatch, consistent with prediction error theories (Houde & Nagarajan, 2011; Friston et al., 2020).”

      (11) Within this section, you also speculate in line 460 that "Moreover, when the two speech signals come close enough in time, the patient possibly perceives them as its own voice." I would recommend citing studies on the 'rubber voice' effect to back up this claim (e.g. Franken et al., 2021; Lind et al., 2014; Zheng et al., 2011).

      We are grateful to the Reviewer for this interesting suggestion. Directly following the previous comment, the section now states:

      “Furthermore, when self- and externally-generated speech signals are temporally and phonetically congruent, participants may perceive external speech as their own. This echoes the "rubber voice" effect, where external speech resembling self-produced feedback is perceived as self-generated (Zheng et al., 2011; Lind et al., 2014; Franken et al., 2021). While this interpretation remains speculative, future studies could incorporate subjective reports to investigate this phenomenon in more detail.”

      (12) As noted in my public review, since your methods are correlational, you need to be careful about inferring the causal role of any brain areas in supporting a specific aspect of functioning e.g. line 501-504: "By contrast, in the inferior frontal gyrus, the coupling in the high-frequency activity is strongest with the input-output phase difference (input of the VP - output of the speaker), a metric that reflects the amount of error in the internal computation to reach optimal coordination, which indicates that this region optimises the predictive and coordinative behaviour required by the task." I would argue that the latter part of this sentence is a conclusion that, although consistent with, goes beyond the current data in this study, and thus needs tempering.

      We agree with the Reviewer and changed the sentence as follows:

      “By contrast, in the inferior frontal gyrus, the coupling in the high-frequency activity is strongest with the input-output phase difference (input of the VP - output of the speaker), a metric that could possibly reflect the amount of error in the internal computation to reach optimal coordination. This indicates that this region could have an implication in the optimisation of the predictive and coordinative behaviour required by the task.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors): 

      Recommendations  Analysis: 

      (1) Given that a MER21B/C LTR was not immediately identified at the start site of the Liz lncRNA in the mouse, and its match is only 46%, this raises the question of whether an analogous LTR would be identified at the homologous location in other species on deeper analysis. The authors need to argue that what has been conserved in the LTR alone in mouse is the essential element conferring the ability to initiate transcription of Liz. A transient reporter assay might be sufficient to do this. 

      We believe that the 46% identity between the first exon of mouse Liz and the consensus sequence of MER21C is so weak that its traces as MER21C are too attenuated to be detected by standard in silico analyses, such as homology searches. For instance, when pairwise alignments are performed between the first exon of mouse Liz and the consensus sequences of solo-LTRs other than MER21C, MER21C does not emerge as the most similar sequence (Figure 5 – figure supplement 1). This is in stark contrast to similar analyses involving the first exon of human and rabbit GPR1AS (which overlaps with MER21C), where MER21C is identified as the most similar sequence. [pages: 26, 31-32]
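
      To make the notion of percent identity from a pairwise alignment concrete, here is a minimal sketch. It assumes identity is defined as the number of matching columns divided by the number of alignment columns (gaps count as mismatches); the response does not state which definition or alignment tool was used, so the function name and this convention are illustrative assumptions only.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two already-aligned sequences.

    Assumed convention: matches / alignment columns, where a column
    containing a gap in either sequence counts as a mismatch and columns
    that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column carries no information
        columns += 1
        if a == b:
            matches += 1  # neither is a gap here, so this is a true match
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Toy alignment, not real Liz or MER21C sequence.
    print(percent_identity("ACGT-ACGTA", "ACCTTAC-TA"))
```

      Other conventions (for example, dividing by the length of the shorter sequence, or excluding gapped columns) give different percentages for the same alignment, which is worth bearing in mind when comparing identity values across tools.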

      The positions of these LTRs were initially annotated using RepeatMasker. To ensure robust analysis, we performed additional searches with RepeatMasker under more sensitive conditions, adjusting search engines (e.g., RMblast to HMMER or Cross-match) and sensitivity settings. Nevertheless, MER21C or closely related LTRs were still undetectable in mouse, rat, and hamster (Figure 4 – figure supplement 1). However, a multiple genome alignment generated by Cactus/UCSC revealed a syntenic region corresponding to the first exon of human GPR1-AS, overlapping with LTR21C, in the genomes of mice, as well as rats and hamsters (Figure 4 – figure supplement 2). Although RepeatMasker did not annotate MER21C at the GPR1 locus in these species, homologous regions were observed across all selected Euarchontoglires. Due to the limitations of the Cactus alignment track in delineating precise homologous boundaries across species, extracting sequences for evolutionary tree construction was not feasible. Nevertheless, these findings support the hypothesis that the first exon of GPR1-AS (Liz in mice) originated from a MER21C insertion in the common ancestor of Euarchontoglires. [pages: 21, 24-25]

      A combination of traditional annotation of repetitive elements using RepeatMasker and the reconstruction of ancestral genomes through multiple genome alignment can reveal highly degenerated LTR relics. This approach is likely to point to significant future directions for research. This point is further elaborated in the discussion section. [page 42]

      Furthermore, in response to the reviewer's suggestion, we investigated the promoter activity of the GPR1-AS and Liz first exons, which are hypothesized to have originated from the same MER21C insertion. Using a dual reporter assay, we demonstrated that the first exon of mouse Liz exhibits promoter activity in a human cell line comparable to that of the human GPR1-AS promoter. Thus, despite the relatively low sequence similarity between the Liz first exon and the MER21C consensus sequence (46% as determined by pairwise alignment, Figure 5 – figure supplement 2), the promoter activity remains functionally conserved. We further discuss the potential functional motifs within the putative MER21C LTR-derived sequences in Figure 4B-D. Taken together, these findings suggest that despite a high level of degeneracy of the promoter region in rodents, including mice, the most parsimonious explanation for the origin of this regulatory element in rodents is the presence of the same LTR relic detectable in humans/primates, which is essential for robust transcription initiation of Liz and GPR1-AS, respectively. [pages: 27, 32]

      (2) Imprinting will depend on an initiating mechanism in the germline, in addition to events in the embryo that induce the secondary DMR at ZDBF2. The authors should therefore examine as far as possible the presence of a gDMR in the species with/without GPR1-AS1 and ZDBF2 imprinting. Whole-genome bisulphite sequencing data from oocytes and sperm should exist for some of the relevant species (e.g., pig, cow: Ivanova et al. 2020 PMID: 32393379; Lu et al. 2012 PMID: 34818044). 

      As the reviewer noted, the presence of a gDMR is essential for the establishment of imprinting. Following another reviewer's suggestion, we have now demonstrated that the ZDBF2 gene in rhesus monkeys is also subject to imprinting (see Figure 3C-D). We also acquired whole genome bisulfite sequencing data for rhesus monkey sperm and oocytes, identified DMRs between them, and discovered an oocyte-specifically methylated gDMR in the first exon of GPR1-AS (which overlaps with MER21C) (Figure 3 – figure supplement 1A). This finding is consistent with observations in humans and mice. Conversely, we obtained similar sequencing data for porcine and bovine sperm and oocytes and conducted the same analysis (Figure 3 – figure supplement 1A,B). However, we did not detect any oocyte-specific methylated gDMRs in the GPR1 intragenic region (where GPR1-AS is transcribed from an intron of GPR1) in these species of the Laurasiatheria superorder. These results support the hypothesis that ZDBF2 is not imprinted in lineages outside the Euarchontoglires, the superorder which includes both rodents and primates. We have included these important DMR results as a supplement to Figure 3. [pages 16-21]

      Presentation: 

      (1) The first section of the Introduction would benefit from the inclusion of some additional general references on genomic imprinting. 

      We have added two review articles, Tucci et al. (2019) and Kobayashi (2021), as references in the first section of the Introduction. [page 5]

      (2) Introduction statement: "....nearly 200 imprinted genes have been identified in mice and humans. However, less than half of these genes overlapped in both species." This was the conclusion of one study (Tucci et al. 2016), so it would be better to provide a caveat to the statement "However, one comparative analysis suggested that fewer than half of these genes overlapped in both species". 

      The point being that the actual number of imprinted genes is still a matter of debate (see Edwards et al. 2023 PMID: 36916665), and the extent of overlap will depend on the strength of the evidence for each gene in the human and mouse imprinted gene lists. So, it is very difficult to put an accurate figure on the extent of overlap - but the authors' point is valid that there are species- or lineage-specific imprinted genes. 

      We have revised this sentence following reviewer #1's suggestion. [page 5]

      (3) Introduction statement: "The establishment of species-specific imprinting.....can be driven by various evolutionary events, including.....differences in the function of DNA methyltransferases". I am not aware that this has been described as an evolutionary event causing species-specific imprinting - without supporting evidence, I recommend to remove this suggestion. 

      We thank the reviewer for this comment and realize that we should have been more explicit here. We were referring to DNMT3C, a rodent-specific member of the DNMT3 family, which is responsible for the paternal methylation imprinting of Rasgrf1 in mice (Barau et al., Science, 2016), in association with the piRNA pathway and targeting of a specific retrotransposon within the DMR (Watanabe et al. Science, 2011). The Rasgrf1 gene is imprinted in mice but not considered imprinted in humans (though some conflicting data exist). While it is likely that the emergence of DNMT3C was a pre-requisite to the establishment of Rasgrf1 imprinting in evolutionary terms, clear evidence is lacking. Following the reviewer’s suggestion, we have removed the phrase "differences in the function of DNA methyltransferases" from the text. However, we have reintroduced this point in the Introduction section as a potential mechanism that may contribute to the establishment of species-specific imprinted genes, alongside the roles of ZNF445 and ZFP57, which regulate the maintenance of imprinting with partially divided roles between humans and mice. [page 6]

      (4) It would be very useful for readers to have a schema of the Gpr1/Zdbf2 locus that indicates the locations of the germline and somatic DMRs and their relationship to the Liz transcript. 

      (5) There is a summary figure amongst the Supplementary Figures (Suppl. Fig. 7) - it would be beneficial to readers to have this summary figure in the main text rather than the supplement. 

      Following reviewer #1’s suggestion, we have moved the regulatory system schema at the Gpr1/Zdbf2 locus, originally shown in Supplementary Figure 7, to the main text as Figure 7. In addition, in response to comment 4, we have revised the figure to explicitly depict the relationship between the Liz transcript and the establishment of the somatic DMR (sDMR), enhancing the clarity of the regulatory interactions at this locus. [page 38]

      (6) With a focus of the study on LTRs as cis-regulatory elements having been co-opted in genomic imprinting mechanisms - whether in the female germline (as in Bogutz et al. 2019) or in the current study as an activating element post-fertilisation - it is a real omission that the authors do not refer to the role of tissue-specific LTRs as the candidate regulatory elements in non-canonical imprinting (see Hanna et al. 2019 PMID: 31665063). Please include in Introduction and/or Discussion. 

      We added a sentence explaining canonical and non-canonical imprinting and the cases where LTRs act as regulatory elements in non-canonical imprinting, with reference to the study of Hanna et al., as suggested. [page 6]

      (7) Discussion statement: "Two paternally expressed imprinted genes, PEG10/SIRH1 and PEG11/RTL1/SIRH2 have been identified in mammals. They encode GAG-POL proteins of sushi-ichi LTR retrotransposons and are essential for mammalian placenta formation and maintenance." 

      These sentences should be combined: "Two paternally expressed imprinted genes, PEG10/SIRH1, and PEG11/RTL1/SIRH2, that encode GAG-POL proteins of sushi-ichi LTR retrotransposons have been identified in mammals and are essential for mammalian placenta formation and maintenance." 

      We have revised this sentence according to reviewer #1's suggestion. [page 41]

      Reviewer #2 (Recommendations For The Authors): 

      When showing assembled GPR1-AS transcripts via genome browser tracks, it would be valuable to add normalized counts of reads mapping to each strand, in order to more convincingly demonstrate the existence of these transcripts. I ask for this because in my experience Stringtie will assemble transcripts that are only marginally supported by reads. 

      In response to Reviewer #2's suggestion, FPKM and TPM values for all StringTie-predicted GPR1-AS-like transcripts are now included in Figure 6. Each of these transcripts has a TPM value greater than 1, supporting their validity. [pages: 35]
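
      For readers unfamiliar with the two normalised units mentioned here, the standard definitions can be sketched as follows. This is an illustration only: it works from raw fragment counts and transcript lengths, whereas StringTie derives its values from assembled coverage, and the function names are hypothetical.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths must have the same length")
    total = sum(counts)
    if total == 0:
        raise ValueError("no mapped fragments")
    # count / (length in kb) / (total in millions) = count * 1e9 / (length * total)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalised rates rescaled to sum to 1e6."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths must have the same length")
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    if total_rate == 0:
        raise ValueError("no mapped fragments")
    return [r / total_rate * 1e6 for r in rates]


if __name__ == "__main__":
    # Toy numbers, not data from the manuscript.
    toy_counts = [10, 20, 70]
    toy_lengths = [1000, 2000, 3500]
    print(fpkm(toy_counts, toy_lengths))
    print(tpm(toy_counts, toy_lengths))
```

      Because TPM values always sum to one million within a sample, a fixed cut-off such as TPM > 1 corresponds to a fixed share of the transcript pool, which is one reason it is commonly used as a minimal expression threshold.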

      Reviewer #3 (Recommendations For The Authors): 

      (1) The tree in Figure 5A is one of the main arguments supporting the divergence of the mouse Liz promoter from a common MER21C element, but this contains only a handful of species, making it difficult to appreciate the full extent of its evolution. Presumably its faster mutation rate in mouse would also be supported by other closely related rodents, which would solidify the conclusion that the Liz promoter is derived from an ancient MER21C insertion. So my suggestion is to expand this tree substantially to other species, comparing sequences syntenic to the GPR1-AS/Liz promoter. 

      (2) It may also be worth trying different TE/LTR annotation tools and/or running Repeatmasker with different parameters, to see if an MER21C element is detected in mouse using a more sensitive approach. 

      In response to this suggestion, we performed computational analyses with RepeatMasker under various settings (e.g., switching search engines from RMblast to HMMER or Crossmatch, adjusting speed/sensitivity settings from default to slow). Despite these modifications, a MER21C element was not detected near the mouse Liz promoter. However, a multiple genome alignment track generated by Cactus/UCSC revealed a syntenic region, corresponding to the first exon of human GPR1-AS, which overlaps with LTR21C, also present in the genomes of mouse, rat, and hamster (Figure 4 – figure supplement 1). While RepeatMasker did not identify MER21C at the GPR1 locus in these species, homologous regions were observed across all selected Euarchontoglires. Although the Cactus alignment track does not delineate the exact boundaries of homologous regions across species (relative to humans) and thus precludes extracting each homologous sequence to construct an evolutionary tree, these findings support the hypothesis that the first exon of GPR1-AS (referred to as Liz in mice) originated from an ancient MER21C insertion in the common ancestor of Euarchontoglires. [pages: 21, 24-25]

      (3) According to Dfam, MER21C is not common to all eutherians, but specific to Boroeutheria, whilst MER21B is presumably specific to Euarchontoglires. To clarify MER21C/B evolution, it would be useful to show the number of elements present in select species from each group (including an outgroup). 

      (7) In Figure 4 it is hard to distinguish between red and purple. 

      Initially, we referenced Repbase (e.g., MER21C: Origin/Eutheria), but, as Reviewer #3 noted, Dfam should be the primary reference. We have now included the copy numbers of MER21C and MER21B for each genome in Figure 4, providing a clearer understanding of their evolutionary appearance (MER21C appears specific to Boroeutheria, while MER21B is specific to Euarchontoglires). Additionally, we adjusted the MER21B position color from purple to dark purple to improve visibility. Furthermore, we have also underlined the copy number of MER21C or MER21B located within the GPR1 region in each species. For example, in the Treeshrew genome, the LTR overlapping with GPR1-AS is annotated as MER21B, so we underlined the copy number of MER21B (2,305). These changes now clearly indicate whether homologous sequences to the first exon of GPR1-AS are annotated as MER21C or MER21B in each genome. [page 22]

      (4) Could the imprinting status of ZDBF2 not be determined in chimpanzees and rabbits? Or is it already known? Either way, a clarification would be useful to further support the concordance between GPR1-AS-like transcripts and ZDBF2 imprinting.

      The imprinting status of ZDBF2 had not previously been reported in chimpanzees, rhesus macaques, or rabbits, where GPR1-AS-like transcripts were identified. Therefore, we conducted allele-specific expression analysis of ZDBF2 using blood samples from rhesus macaques and rabbits. As expected, paternal-allele-specific expression of ZDBF2 was observed in both species, consistent with findings in humans and mice. These results have been added to Figure 3. Although we did not analyze the imprinting status in chimpanzees, we believe the existing data sufficiently support our hypothesis. [pages: 16, 19-20]
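
      One common way to quantify allele-specific expression is to compare read counts for the two alleles at a heterozygous SNP. The sketch below assumes such counts are available and that parental origin of each allele is known; the response does not state how the analysis was actually performed, so the function names and the choice of a one-sided exact binomial test are illustrative assumptions.

```python
from math import comb


def paternal_fraction(paternal_reads, maternal_reads):
    """Fraction of reads at a heterozygous SNP carrying the paternal allele."""
    total = paternal_reads + maternal_reads
    if total == 0:
        raise ValueError("no reads cover the SNP")
    return paternal_reads / total


def paternal_bias_p_value(paternal_reads, maternal_reads):
    """One-sided exact binomial p-value against biallelic (50:50) expression.

    Returns P(X >= paternal_reads) for X ~ Binomial(n, 0.5).
    """
    n = paternal_reads + maternal_reads
    if n == 0:
        raise ValueError("no reads cover the SNP")
    tail = sum(comb(n, i) for i in range(paternal_reads, n + 1))
    return tail / 2 ** n


if __name__ == "__main__":
    # Toy counts, not data from the manuscript.
    print(paternal_fraction(48, 2))
    print(paternal_bias_p_value(48, 2))
```

      A paternal fraction close to 1 together with a small p-value would be consistent with paternal-allele-specific expression, although technical biases such as reference-allele mapping bias would need to be ruled out in a real analysis.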

      (5) The authors briefly discuss the role of KRAB-ZFPs in controlling TE expression. An interesting addition would be to analyse the expression of the main KRAB-ZFP that binds to MER21C (ZFP789, according to data from PMID 28273063). This could be linked to the temporal control of MER21C expression. 

      In response to Reviewer #3's suggestion, we focused on the expression pattern of ZNF789 (noted by the reviewer as ZFP789), the primary KRAB-ZFP known to bind MER21C, as identified by Didier Trono’s group (PMID 28273063). Strikingly, our analysis reveals that ZNF789 is specifically downregulated at the 4-cell stage, which aligns with the timing of MER21C reactivation. While it remains to be determined whether this downregulation directly influences MER21C reactivation or the initiation of GPR1-AS expression, this finding is significant and consistent with our model. We have incorporated this information in Figure 5 – figure supplement 3. [pages: 33]

      (6) The sentence "Liz directs DNA methylation at the somatic DMR, which competes with ZDBF2 to repress the paternal allele" (introduction) was confusing to me. 

      This sentence has been revised to be more accurate, as follows: Liz transcription counteracts the H3K27me3-mediated repression of Zdbf2 by promoting the deposition of antagonistic DNA methylation at the secondary DMR. [page 7]

      (8) In Figure 5 I take it that 'consensus motif' refers to ELF1/2? Maybe change the legend. 

      To clarify potential confusion around the term 'consensus motif,' which may have been mistaken for 'consensus MER21C' (the consensus sequence of MER21C-LTR from the Dfam database), we have revised the figure legend. We now refer to the motif as the "common motif," indicating the sequence common to all MER21C-derived sequences and overlapping with the first exon of GPR1-AS. [page 29]

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Glaser et al present ExA-SPIM, a light-sheet microscope platform with large volumetric coverage (Field of view 85mm^2, working distance 35mm), designed to image expanded mouse brains in their entirety. The authors also present an expansion method optimized for whole mouse brains and an acquisition software suite. The microscope is employed in imaging an expanded mouse brain, the macaque motor cortex, and human brain slices of white matter. 

      This is impressive work and represents a leap over existing light-sheet microscopes. As an example, it offers a fivefold higher resolution than mesoSPIM (https://mesospim.org/), a popular platform for imaging large cleared samples. Thus while this work is rooted in optical engineering, it manifests a huge step forward and has the potential to become an important tool in the neurosciences. 

      Strengths: 

      - ExA-SPIM features an exceptional combination of field of view, working distance, resolution, and throughput. 

      - An expanded mouse brain can be acquired with only 15 tiles, lowering the burden on computational stitching. That the brain does not need to be mechanically sectioned is also seen as an important capability. 

      - The image data is compelling, and tracing of neurons has been performed. This demonstrates the potential of the microscope platform. 

      Weaknesses: 

      - There is a general question about the scaling laws of lenses, and expansion microscopy, which in my opinion remained unanswered: In the context of whole brain imaging, a larger expansion factor requires a microscope system with larger volumetric coverage, which in turn will have lower resolution (Figure 1B). So what is optimal? Could one alternatively image a cleared (non-expanded) brain with a high-resolution ASLM system (Chakraborty, Tonmoy, Nature Methods 2019, potentially upgraded with custom objectives) and get a similar effective resolution as the authors get with expansion? This is not meant to diminish the achievement, but it was unclear if the gains in resolution from the expansion factor are traded off by the scaling laws of current optical systems. 

      Paraphrasing the reviewer: Expanding the tissue requires imaging larger volumes and allows lower optical resolution. What has been gained?

      The answer to the reviewer’s question is nuanced and contains four parts. 

      First, optical engineering requirements are more forgiving for lenses with lower resolution. Lower resolution lenses can have much larger fields of view (in real terms: the number of resolvable elements, proportional to ‘etendue’) and much longer working distances. In other words, it is currently more feasible to engineer lower resolution lenses with larger volumetric coverage, even when accounting for the expansion factor. 

      Second, these lenses are also much better corrected compared to higher resolution (NA) lenses. They have a flat field of view, negligible pincushion distortions, and constant resolution across the field of view. We are not aware of comparable performance for high NA objectives, even when correcting for expansion.

      Third, although clearing and expansion render tissues ‘transparent’, there still exist refractive index inhomogeneities which deteriorate image quality, especially at larger imaging depths. These effects are more severe for higher optical resolutions (NA), because the rays entering the objective at higher angles have longer paths in the tissue and will see more aberrations. For lower NA systems, such as ExaSPIM, the differences in paths between the extreme and axial rays are relatively small and image formation is less sensitive to aberrations. 

      Fourth, aberrations are proportional to the index of refraction inhomogeneities (dn/dx). Since the index of refraction is roughly proportional to density, scattering and aberration of light decreases as M^3, where M is the expansion factor. In contrast, the imaging path length through the tissue only increases as M. This produces a huge win for imaging larger samples with lower resolutions. 
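
      The scaling argument in the fourth point can be written out explicitly. This is a rough sketch under the simplifying assumption that the accumulated aberration is the per-unit-length contribution multiplied by the imaging path length; the symbols $a$ (per-unit-length contribution), $L$ (path length), and $A$ (accumulated aberration) are introduced here for illustration and do not appear in the manuscript.

```latex
\[
a(M) \propto M^{-3}, \qquad L(M) \propto M
\quad\Longrightarrow\quad
A(M) \propto a(M)\,L(M) \propto M^{-2}
\]
```

      Under this assumption the net effect still falls as the square of the expansion factor, even though the light must travel through a proportionally thicker sample.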

      To our knowledge there are no convincing demonstrations in the literature of diffraction-limited ASLM imaging at a depth of 1 cm in cleared mouse brain tissue, which would be equivalent to the ExA-SPIM imaging results presented in this manuscript.  

      In the discussion of the revised manuscript we discuss these factors in more depth. 

      - It was unclear if 300 nm lateral and 800 nm axial resolution is enough for many questions in neuroscience. Segmenting spines, distinguishing pre- and postsynaptic densities, or tracing densely labeled neurons might be challenging. A discussion about the necessary resolution levels in neuroscience would be appreciated. 

      We have previously shown good results in tracing the thinnest (100 nm thick) axons over cm scales with 1.5 um axial resolution. It is the contrast (SNR) that matters, and the ExaSPIM contrast exceeds the block-face 2-photon contrast, not to mention imaging speed (> 10x).  

      Indeed, for some questions, like distinguishing fluorescence in pre- and postsynaptic structures, higher resolutions will be required (0.2 um isotropic; Rah et al Frontiers Neurosci, 2013). This could be achieved with higher expansion factors.

      This is not within the intended scope of the current manuscript. As mentioned in the discussion section, we are working towards ExA-SPIM-based concepts to achieve better resolution through the design and fabrication of a customized imaging lens that maintains a high volumetric coverage with increased numerical aperture.  

      - Would it be possible to characterize the aberrations that might be still present after whole brain expansion? One approach could be to image small fluorescent nanospheres behind the expanded brain and recover the pupil function via phase retrieval. But even full width half maximum (FWHM) measurements of the nanospheres' images would give some idea of the magnitude of the aberrations. 

      We have now included a supplementary figure highlighting images of small axon segments within distal regions of the brain.

      Reviewer #2 (Public Review): 

      Summary: 

      In this manuscript, Glaser et al. describe a new selective plane illumination microscope designed to image a large field of view that is optimized for expanded and cleared tissue samples. For the most part, the microscope design follows a standard formula that is common among many systems (e.g. Keller PJ et al Science 2008, Pitrone PG et al. Nature Methods 2013, Dean KM et al. Biophys J 2015, and Voigt FF et al. Nature Methods 2019). The primary conceptual and technical novelty is to use a detection objective from the metrology industry that has a large field of view and a large area camera. The authors characterize the system resolution, field curvature, and chromatic focal shift by measuring fluorescent beads in a hydrogel and then show example images of expanded samples from mouse, macaque, and human brain tissue. 

      Strengths: 

      I commend the authors for making all of the documentation, models, and acquisition software openly accessible and believe that this will help assist others who would like to replicate the instrument. I anticipate that the protocols for imaging large expanded tissues (such as an entire mouse brain) will also be useful to the community. 

      Weaknesses: 

      The characterization of the instrument needs to be improved to validate the claims. If the manuscript claims that the instrument allows for robust automated neuronal tracing, then this should be included in the data. 

      The reviewer raises a valid concern. Our assertion that the resolution and contrast are sufficient for robust automated neuronal tracing is overstated based on the data in the paper. We are hard at work on automated tracing of datasets from the ExA-SPIM microscope. We have demonstrated full reconstruction of axonal arbors encompassing >20 cm of axonal length. However, including these methods and results is outside the scope of the current manuscript.

      The claims of robust automated neuronal tracing have been appropriately modified.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Smaller questions to the authors: 

      - Would a multi-directional illumination and detection architecture help? Was there a particular reason the authors did not go that route?

      Despite the clarity of the expanded tissue, and the lower numerical aperture of the ExA-SPIM microscope, image quality still degrades slightly towards the distal regions of the brain relative to both the excitation and detection objective. Therefore, multi-directional illumination and detection would be advantageous. Since the initial submission of the manuscript, we have undertaken re-designing the optics and mechanics of the system. This includes provisions for multi-directional illumination and detection. However, this new design is beyond the scope of this manuscript. We now mention this in L254-255 of the Discussion section.

      - Why did the authors not use the same objective for illumination and detection, which would allow isotropic resolution in ASLM? 

      The current implementation of ASLM requires an infinity corrected objective (i.e. conjugating the axial sweeping mechanism to the back focal plane). This is not possible due to the finite conjugate design of the ExA-SPIM detection lens.

      More fundamentally, pushing the excitation NA higher would result in a shorter light sheet Rayleigh length, which would require a smaller detection slit (shorter exposure time, lower signal to noise ratio). For our purposes an excitation NA of 0.1 is an excellent compromise between axial resolution, signal to noise ratio, and imaging speed. 

      For other potentially brighter biological structures, it may be possible to design a custom infinity corrected objective that enables ASLM with NA > 0.1.

      - Have the authors made any attempt to characterize distortions of the brain tissue that can occur due to expansion? 

      We have not systematically characterized the distortions of the brain tissue pre and post expansion. Imaged mouse brain volumes are registered to the Allen CCF regardless of whether or not the tissue was expanded. It is beyond the scope of this manuscript to include these results and processing methods, but we have confirmed that the ExA-SPIM mouse brain volumes contain only modest deformation that is easily accounted for during registration to the Allen CCF. 

      - The authors state that a custom lens with NA 0.5-0.6 can be designed, featuring similar specifications. Is there a practical design? Wouldn't such a lens be more prone to field curvature?

      This custom lens has already been designed and is currently being fabricated. The lens maintains a similar space bandwidth product as the current lens (increased numerical aperture but over a proportionally smaller field of view). Over the designed field of view, field curvature is <1 µm. However, including additional discussion or results of this customized lens is beyond the scope of this manuscript.

      Reviewer #2 (Recommendations For The Authors): 

      System characterization: 

      - Please state what wavelength was used for the resolution measurements in Figure 2.

      An excitation wavelength of 561 nm was used. This has been added to the manuscript text.

      - The manuscript highlights that a key advance for the microscope is the ability to image over a very large 13 mm diameter field of view. Can the authors clarify why they chose to characterize resolution over an 8 mm diameter field rather than the full area?

      The 13 mm diameter field of view refers to the diagonal of the 10.6 x 8.0 mm field of view. The results presented in Figure 1c are with respect to the horizontal x direction and vertical y direction. A note indicating that the 13 mm is with respect to the diagonal of the rectangular imaging field has been added to the manuscript text. The results were presented in this way to present the axial and lateral resolution as a function of y (the axial sweeping direction).
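
      The relation between the rectangular field and the quoted diameter can be checked with a short calculation. This is a minimal sketch using only the 10.6 x 8.0 mm dimensions stated above; the function name is hypothetical.

```python
import math


def field_diagonal_mm(width_mm: float, height_mm: float) -> float:
    """Diagonal of a rectangular field of view, in mm."""
    return math.hypot(width_mm, height_mm)


diagonal = field_diagonal_mm(10.6, 8.0)
print(f"Diagonal of the imaging field: {diagonal:.2f} mm")
```

      The result is slightly above 13 mm, consistent with the 13 mm figure being a rounded diagonal rather than the extent along x or y.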

      - The resolution estimates seem lower than I would expect for a 0.30 NA lens (which should be closer to ~850 nm for 515 nm emission). Could the authors clarify the discrepancy? Is this predicted by the Zemax model and due to using the lens in immersion media, related to sampling size on the camera, or something else? It would be helpful if the authors could overlay the expected diffraction-limited performance together with the plots in Figure 2C. 

      As mentioned previously, the resolution measurements were performed with 561 nm excitation and an emission bandpass of ~573 – 616 nm (595 nm average). Based on this, we would expect the full width at half maximum resolution to be ~975 nm. The resolution is in fact limited by sampling on the camera. The 3.76 µm pixel size, combined with the 5.0X magnification, results in a sampling of 752 nm. Based on the Nyquist criterion, the resolution is limited to ~1.5 µm. We have added clarifying statements to the text.
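
      The sampling argument can be reproduced with a few lines of arithmetic. The sketch below uses only the pixel size and magnification quoted above and assumes the usual two-samples-per-resolvable-period reading of the Nyquist criterion; the function names are hypothetical.

```python
def sample_plane_sampling_nm(pixel_size_um: float, magnification: float) -> float:
    """Camera pixel size projected back into the sample plane, in nm."""
    return pixel_size_um * 1000.0 / magnification


def nyquist_limited_resolution_nm(sampling_nm: float) -> float:
    """Smallest resolvable period, assuming two samples per period."""
    return 2.0 * sampling_nm


sampling = sample_plane_sampling_nm(pixel_size_um=3.76, magnification=5.0)
limit = nyquist_limited_resolution_nm(sampling)
print(f"Sampling: {sampling:.0f} nm, Nyquist-limited resolution: {limit:.0f} nm")
```

      Because this sampling limit is coarser than the ~975 nm expected from the optics, the camera sampling rather than diffraction sets the measured resolution.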

      - I'm confused about the characterization of light sheet thickness and how it relates to the measured detection field curvature. The authors state that they "deliver a light sheet with NA = 0.10 which has a width of 12.5 mm (FWHM)." If we estimate that light fills the 0.10 NA, it should have a beam waist (2wo) of ~3 microns (assuming Gaussian beam approximations). Although field curvature is described as "minimal" in the text, it is still ~10-15 microns at the edge of the field for the emission bands for GFP and RFP proteins. Given that this is 5X larger than the light sheet thickness, how do the authors deal with this? 

      The generated light sheet is flat, with a thickness of ~ 3 µm. This flat light sheet will be captured in focus over the depth of focus of the detection objective. The stated field curvature is within 2.5X the depth of focus of the detection lens, which is equivalent to the “Plan” specification of standard microscope objectives.

      - In Figure 2E, it would be helpful if the authors could list the exposure times as well as the total voxels/second for the two-camera comparison. It's also worth noting that the Sony chip used in the VP151MX camera was released last year whereas the Orca Flash V3 chosen for comparison is over a decade old now. I'm confused as to why the authors chose this camera for comparison when they appear to have a more recent Orca BT-Fusion that they show in a picture in the supplement (indicated as Figure S2 in the text, but I believe this is a typo and should be Figure S3). 

      This is a useful addition, and we have added exposure times to the plot. We have also added a note that the Orca Flash V3 is an older generation sCMOS camera and that newer variants exist, including the Orca BT-Fusion. The BT-Fusion has a read noise of 1.0 e- rms versus 1.6 e- rms, and a peak quantum efficiency of ~95% vs. 85%. Based on the discussion in Supplementary Note S1, we do not expect that these differences in specifications would dramatically change the data presented in the plot. In addition, the typo in Figure S2 has been corrected to Figure S3.

      - In Table S1, the authors note that they only compare their work to prior modalities that are capable of providing <= 1 micron resolution. I'm a bit confused by this choice given that Figure 2 seems to show the resolution of ExA-SPIM as ~1.5 microns at 4 mm off center (1/2 their stated radial field of view). It also excludes a comparison with the mesoSPIM project which at least to me seems to be the most relevant prior to this manuscript. This system is designed for imaging large cleared tissues like the ones shown here. While the original publication in 2019 had a substantially lower lateral resolution, a newer variant, Nikita et al bioRxiv (which is cited in general terms in this manuscript, but not explicitly discussed) also provides 1.5-micron lateral resolution over a comparable field of view. 

      We have updated the table to include the benchtop mesoSPIM from Nikita et al., Nature Communications, 2024. Based on this published version of the manuscript, the lateral resolution is 1.5 µm and axial resolution is 3.3 µm. Assuming the Iris 15 camera sensor, with the stated 2.5 fps, the volumetric rate (megavoxels/sec) is 37.41.
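
      The quoted volumetric rate can be reproduced as pixels per frame multiplied by frames per second. The sketch below assumes a sensor format of 5056 x 2960 pixels for the Iris 15, a value taken from the camera's published specification and not stated in the response, and it assumes that every frame contributes one full plane of voxels. The constant and function names are hypothetical.

```python
# Assumed sensor format for the Iris 15 camera (not stated in the response).
IRIS15_WIDTH_PX = 5056
IRIS15_HEIGHT_PX = 2960


def megavoxels_per_second(width_px: int, height_px: int, frames_per_second: float) -> float:
    """Volumetric rate, assuming each camera frame yields one plane of voxels."""
    return width_px * height_px * frames_per_second / 1e6


rate = megavoxels_per_second(IRIS15_WIDTH_PX, IRIS15_HEIGHT_PX, frames_per_second=2.5)
print(f"Volumetric rate: {rate:.2f} megavoxels/sec")
```

      With these assumed inputs the calculation agrees with the 37.41 megavoxels/sec entered in the table.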

      - The authors state that, "We systematically evaluated dehydration agents, including methanol, ethanol, and tetrahydrofuran (THF), followed by delipidation with commonly used protocols on 1 mm thick brain slices. Slices were expanded and examined for clarity under a macroscope." It would be useful to include some data from this evaluation in the manuscript to make it clear how the authors arrived at their final protocol. 

      Additional details on the expansion protocol may be included in another manuscript.

      General comments: 

      There is a tendency in the manuscript to use negative qualitative terms when describing prior work and positive qualitative terms when describing the work here. Examples include: 

      - "Throughput is limited in part by cumbersome and error-prone microscopy methods". While I agree that performing single neuron reconstructions at a large scale is a difficult challenge, the terms cumbersome and error-prone are qualitative and lacking objective metrics.

      We have revised this statement to be more precise, stating that throughput is limited in part by the speed and image quality of existing microscopy methods.

      - The resolution of the system is described in several places as "near-isotropic" whereas prior methods were described as "highly anisotropic". I agree that the ~1:3 lateral to axial ratio here is more isotropic than the 1:6 ratio of the other cited publications. However, I'm not sure I'd consider 3-fold worse axial resolution than lateral to be considered "near" isotropic.

      We agree that the term near-isotropic is ambiguous. We have modified the text accordingly, removing the term near-isotropic and where appropriate stating that the resolution is more isotropic than that of other cited publications.

      - In the manuscript, the authors describe the photobleaching in their imaging conditions as "negligible". Figure S5 seems to show a loss of 60% fluorescence after 2000 exposures (which in the caption is described as "modest"). I'd suggest removing these qualitative terms and just stating the values.

      We agree and have changed the text accordingly.

      - The results section for Figure 5 is titled "Tracing axons in human neocortex and white matter". Although this section states "larger axons (>1 um) are well separated... allowing for robust automated and manual tracing" there is no data for any tracing in the manuscript. Although I agree that the images are visually impressive, I'm not sure that this claim is backed by data.

      We have now removed the text in this section referring to automated and manual tracing.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper Weber et al. investigate the role of 4 dopaminergic neurons of the Drosophila larva in mediating the association between an aversive high-salt stimulus and a neutral odor. The 4 DANs belong to the DL1 cluster and innervate non-overlapping compartments of the mushroom body, distinct from those involved in appetitive associative learning. Using specific driver lines for individual neurons, the authors show that activation of the DAN-g1 is sufficient to mimic an aversive memory and it is also necessary to form a high-salt memory of full strength, although optogenetic silencing of this neuron has only a partial phenotype. The authors use calcium imaging to show that the DAN-g1 is not the only DAN responding to salt. DAN-c1 and d1 also respond to salt, but they seem to play no role for the associative memory. DAN-f1, which does not respond to salt, is able to lead to the formation of a memory (if optogenetically activated), but it is not necessary for the salt-odor memory formation in normal conditions. However, when silenced together with DAN-g1, it enhances the memory deficit of DAN-g1. Overall, this work brings evidence of a complex interaction between DL1 DANs in both the encoding of salt signals and their teaching role in associative learning, with none of them being individually necessary and sufficient for both functions.

      Strengths:

      Overall, the manuscript contributes interesting results that are useful to understand the organization and function of the dopaminergic system. The behavioral role of the specific DANs is assessed using specific driver lines which allow their function to be tested individually and in pairs. Moreover, the authors perform calcium imaging to test whether DANs are activated by salt, a prerequisite for inducing a negative association to it. Proper genetic controls are carried across the manuscript.

      Weaknesses:

      The authors use two different approaches to silence dopaminergic neurons: optogenetics and induction of apoptosis. The results are not always consistent, but the authors discuss these differences appropriately. In general, the optogenetic approach is more appropriate as developmental compensations are not of major interest for the question investigated.

      The physiological data would suggest the role of a certain subset of DANs in salt-odor association, but a different partially overlapping set is necessary in behavioral assays (with a partial phenotype). No manipulation completely abolishes the salt-odor association, leaving important open questions on the identity of the neural circuits involved in this behavior.

      The EM data analysis reveals a non-trivial organization of sensory inputs into DANs, but it is difficult to extrapolate a link to the functional data presented in the paper.

      We would like to once again thank Reviewer 1 for the positive assessment of our work and for the valuable suggestions provided on the first revision of the manuscript. In this second revision, we have addressed the linguistic issues and most of the minor comments as recommended. We now hope that the current version of our manuscript meets the reviewer’s expectations both in terms of language and content.

      Reviewer #2 (Public review):

      Summary:

      In this work the authors show that dopaminergic neurons (DANs) from the DL1 cluster in Drosophila larvae are required for the formation of aversive memories. DL1 DANs complement pPAM cluster neurons which are required for the formation of attractive memories. This shows the compartmentalized network organization of how an insect learning center (the mushroom body) encodes memory by integrating olfactory stimuli with aversive or attractive teaching signals. Interestingly, the authors found that the 4 main dopaminergic DL1 neurons act partially redundant, and that single cell ablation did not result in aversive memory defects. However, ablation or silencing of a specific DL1 subset (DAN-f1,g1) resulted in reduced salt aversion learning, which was specific to salt but no other aversive teaching stimuli tested. Importantly, activation of these DANs using an optogenetic approach was also sufficient to induce aversive learning in the presence of high salt. Together with the functional imaging of salt and fructose responses of the individual DANs and the implemented connectome analysis of sensory (and other) inputs to DL1/pPAM DANs this represents a very comprehensive study linking the structural, functional and behavioral role of DL1 DANs. This provides fundamental insight into the function of a simple yet efficiently organized learning center which displays highly conserved features of integrating teaching signals with other sensory cues via dopaminergic signaling.

      Strengths:

      This is a very careful, precise and meticulous study identifying the main larval DANs involved in aversive learning using high salt as a teaching signal. This is highly interesting because it allows to define the cellular substrates and pathways of aversive learning down to the single cell level in a system without much redundancy. It therefore sets the basis to conduct even more sophisticated experiments and together with the neat connectome analysis opens the possibility to unravel different sensory processing pathways within the DL1 cluster and integration with the higher order circuit elements (Kenyon cells and MBONs). The authors' claims are well substantiated by the data and balanced, putting their data in the appropriate context. The authors also implemented neat pathway analyses using the larval connectome data to its full advantage, thus providing network pathways that contribute towards explaining the obtained results.

      Weaknesses:

      Previous comments were fully addressed by the authors.

      We sincerely thank Reviewer 2 for the positive evaluation of our work. We are glad that our responses in the first revision addressed the previous concerns and appreciate the reviewer’s constructive feedback once again.

      Reviewer #3 (Public review):

      Summary:

      The study of Weber et al. provides a thorough investigation of the roles of four individual dopamine neurons for aversive associative learning in the Drosophila larva. They focus on the neurons of the DL-1 cluster which already have been shown to signal aversive teaching signals. But the authors go beyond the previous publications and test whether each of these dopamine neurons responds to salt or sugar, is necessary for learning about salt, bitter, or sugar, and is sufficient to induce a memory when optogenetically activated. In addition, previously published connectomic data is used to analyze the synaptic input to each of these dopamine neurons. The authors conclude that the aversive teaching signal induced by salt is distributed across the four DL-1 dopamine neurons, with two of them, DAN-f1 and DAN-g1, being particularly important. Overall, the experiments are well designed and performed, support the authors' conclusions, and deepen our understanding of the dopaminergic punishment system.

      Strengths:

      (1) This study provides, at least to my knowledge, the first in vivo imaging of larval dopamine neurons in response to tastants. Although the selection of tastants is limited, the results close an important gap in our understanding of the function of these neurons.

      (2) The authors performed a large number of experiments to probe for the necessity of each individual dopamine neuron, as well as combinations of neurons, for associative learning. This includes two different training regimens (1 or 3 trials), three different tastants (salt, quinine and fructose) and two different effectors, one ablating the neuron, the other one acutely silencing it. This thorough work is highly commendable, and the results prove that it was worth it. The authors find that only one neuron, DAN-g1, is partially necessary for salt learning when acutely silenced, whereas a combination of two neurons, DAN-f1 and DAN-g1, is necessary for salt learning when either ablated or silenced.

      (3) In addition, the authors probe whether any of the DL-1 neurons is sufficient for inducing an aversive memory. They found this to be the case for two of the neurons, largely confirming previous results obtained by a different learning paradigm, parameters and effector.

      (4) This study also takes into account connectomic data to analyze the sensory input that each of the dopamine neurons receives. This analysis provides a welcome addition to previous studies and helps to gain a more complete understanding. The authors find large differences in inputs that each neuron receives, and little overlap in input that the dopamine neurons of the "aversive" DL-1 cluster and the "appetitive" pPAM cluster seem to receive.

      (5) Finally, the authors try to link all the gathered information in order to describe an updated working model of how aversive teaching signals are carried by dopamine neurons to the larva's memory center. This includes important comparisons both between two different aversive stimuli (salt and nociception) and between the larval and adult stages.

      We would also like to thank Reviewer 3 for the positive assessment of our work. Many of the constructive comments provided were incorporated into the first revision, contributing significantly to the improved clarity and overall quality of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Here are some minor comments (and some semantics that could be addressed to improve the manuscript)

      Title: is the title correct given that c1 and d1 do not really signal punishment?

      We think the title is correct and would like to keep it as it is.

      L72 striatum misspelled

      We have corrected the error.

      L74 constitute instead of provide?

      We made the suggested modification in the text.

      L129: "But can these four individual DANs also process other sensory modalities?" other than what? What was used before?

      We have made the required change, which now allows us to contrast somatosensory and chemosensory information.

      L172: (Please refer to the discussion regarding the partial reduction of the memory); would be more natural to explain shortly here, or add a sentence before this parenthesis that point to the effect

      We made the requested change in the manuscript and added a short sentence before the parenthesis.

      L182: "DL1 neurons convey a dopaminergic aversive teaching signal" you cannot make this statement from just TH-GAL4!

      We agree. We have therefore completely revised the sentence, restricted its scope further, and now also refer to additional published data from larvae and adults.

      L264: "possible redundancy among" I don't think you are testing a redundancy here, it is more likely a developmental compensation.

      We made the requested change in the sentence and added a potential developmental compensation as an interpretation of our results.

      L296: "to determine if the activation of individual DL1 DANs signals aspects of the natural high salt punishment," - how can the optogenetic activation tell something about aspects of the natural salt punishment? I understand the fact that salt is present, but still I find it inaccurate

      Our approach is based on the framework established by Bertram Gerber and colleagues over the past two decades in larval Drosophila research. According to this logic, memory recall is dependent on the specific properties of the test context, particularly the type and concentration of the stimulus presented on the test plate. Aversive memory retrieval occurs only when the test conditions closely match those of the training stimulus. Consequently, the larva's behavior on the test plate serves as an indicator of the memory content being recalled. We therefore adhere to this established methodology (Gerber & Hendel, 2006; Schleyer et al., 2011; Schleyer et al., 2015).

      L307 "DAN-f1 and DAN-g1 encode aspects of the natural aversive high salt teaching" you cannot conclude that given that f1 does not even respond to salt. I understand the logic of the salt during test, but I think it is still a stretched interpretation

      We agree and thus have deleted the sentence.

      L310 "Individual DL1 DANs are acutely necessary" this is too general, it seems that only one is

      We have changed the title and now clearly state that this is only one DAN of the DL1 cluster.

      Reviewer #2 (Recommendations for the authors):

      In Fig.6 the text flow could be optimized as the authors first mention Fig. 6E,F before they follow up with Fig. 6A-D.

      Thanks for bringing this up – we changed it in the revised version of the manuscript. Now 6A-D is mentioned first.

      In Fig.6 the finding that optogenetic inactivation but not ablation of DAN-g1 slightly but significantly reduces aversive salt learning suggests that there is an individual contribution of this DAN in this paradigm. The authors emphasize redundancy of DL1 DANs although the effect size seems comparable between DAN-g1 and DAN-f1,g1 silencing.

      In response to this concern and the one of reviewer 2, we have revised the section title and removed the final sentence of the section before to avoid placing emphasis on the potential redundancy of DL1 DANs within this results section.

      Reviewer #3 (Recommendations for the authors):

      The authors replied to each issue I raised, and revised their manuscript accordingly. In particular, regarding my major concern (the sufficiency of the neurons for salt-"specific" memories), I think the authors found a good solution.

      I have no further comments.

      We sincerely thank the reviewer for the positive feedback on our revision. We are pleased that the revised manuscript meets the expectations and appreciate the time and effort invested in the review process.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In Causal associations between plasma proteins and prostate cancer: a Proteome-Wide Mendelian Randomization, the authors present a manuscript which seeks to identify novel markers for prostate cancer through analysis of large biobank-based datasets and to extend this analysis to potential therapeutic targets for drugs. This is an area that is already extensively researched, but remains important, due to the high burden and mortality of prostate cancer globally.

      Strengths:

      The main strengths of the manuscript are the identification and use of large biobank data assets, which provide large numbers of cases and controls, essential for achieving statistical power. The databases used (deCODE, FinnGen, and the UK Biobank) allow for robust numbers of cases and controls. The analytical method chosen, Mendelian Randomization, is appropriate to the problem. Another strength is the integration of multi-omic datasets, here using protein data as well as GWAS sources to integrate genomic and proteomic data.

      Thank you for your positive feedback regarding the overall quality of our work. We greatly appreciate the time and effort you invested in reviewing our manuscript.

      Weaknesses:

      The main weaknesses of the manuscript relate to the following areas:

      (1) The failure of the study to analyse the data in the context of other closely related conditions such as benign prostatic hyperplasia (BPH) or lower urinary tract symptoms (LUTS), which have some pathways and biomarkers in common, such as inflammatory pathways (including complement) and specific markers such as KLK3. As a consequence, it is not possible for readers to know whether the findings are specific to prostate cancer or whether they are generic to prostate dysfunction. Given the prevalence of prostate dysfunction (half of men reaching their sixth decade), the potential for false positives and overtreatment from non-specific biomarkers is a major problem, resulting in the evidence presented in this manuscript being weak. Other researchers have addressed this issue using the same data sources as presented here, for example, in this paper, looking at BPH in the UK Biobank population. https://www.nature.com/articles/s41467-018-06920-9

      Thank you for your valuable comment. We fully agree that biomarker development must prioritize specificity to avoid overtreatment. Our study is a foundational step toward identifying potential therapeutic targets or complementary biomarkers for prostate cancer (PCa), not a direct endorsement of these proteins for standalone clinical diagnosis. Mendelian randomization (MR) analysis strengthens causal inference by design, and we further ensured robustness through sensitivity analyses (e.g. MR-Egger regression for pleiotropy, Bonferroni correction for multiple testing). These methods distinguish true causal effects from nonspecific associations. Importantly, while PSA’s lack of specificity is widely recognized, its role in reducing PCa mortality underscores the value of biomarker-driven screening. Our findings align with the need to integrate multiple markers (e.g. combining a novel protein with PSA) to improve diagnostic precision. Translating these causal insights into clinical tools remains challenging but represents a necessary next step, and we emphasize that this work provides a rigorous starting point for future validation studies.

      (2) There is no discussion of Gleason scores with regard to either biomarkers or therapies, and a general lack of discussion around indolent disease as compared with more aggressive variants. These are crucial issues with regard to the triage and identification of genomically aggressive localized prostate cancers. See, for example, the work set out in: https://doi.org/10.1038/nature20788

      Thank you for pointing this out. We acknowledge that our original analysis did not directly address this critical issue due to a key data limitation: the publicly available GWAS summary statistics for PCa (from openGWAS and FinnGen) do not provide genetic associations stratified by phenotypic severity or molecular subtypes. This limitation precluded MR analysis of proteins specifically linked to aggressive disease. To partially bridge this gap, we have integrated evidence from recent studies into the revised Discussion section to explore the relevance of potential biomarkers to aggressive PCa.

      (3) An additional issue is that the field of PCa research is fast-moving. The manuscript cites ~80 references, but too few of these are from recent studies, and many important and relevant papers are not included. The manuscript would be much stronger if it compared and contrasted its findings with more recent studies of PCa biomarkers and targets, especially those concerned with multi-omics and those including BPH.

      Thank you for your professional comments. We have rigorously updated the manuscript to include more recent publications and we systematically compare and contrast our findings with these recent studies in the revised Discussion section.

      (4) The Methods section provides no information on how the Controls were selected. There is no Table providing cohort data to allow the reader to know whether there were differences in age, BMI, ethnic grouping, social status or deprivation, or smoking status, between the Cases and Controls. These types of data are generally recorded in Biobank data, so this sort of analysis should be possible, or if not, the authors' inability to construct an appropriately matched set of Controls should be discussed as a Limitation.

      We thank the reviewer for raising this important methodological concern. We have expanded the Limitations section to acknowledge this limitation.

      Reviewer #2 (Public review):

      This is potentially interesting work, but the analyses are attempted in a rather scattergun way, with little evident critical thought. The structure of the work (Results before Methods) can work in some manuscripts, but it is not ideal here. The authors discuss results before we know anything about the underlying data that the results come from. It gives the impression that the authors regard data as a resource to be exploited, without really caring where the data comes from. The methods can provide meaningful insights if correctly used, but while I don't have reasons to doubt that the analyses were conducted correctly, findings are presented with little discussion or interpretation. No follow-up analyses are performed.

      In summary, there are likely some gems here, but the whole manuscript is essentially the output from an analytic pipeline.

      We thank the reviewer for the thoughtful evaluation of our work.

      Taking the researchers' aims in turn:

      (1) Meta-GWAS - while combining two datasets together can provide additional insights, the contribution of this analysis above existing GWAS is not clear. The PRACTICAL consortium has already reported the GWAS of 70% of these data. What additional value does this analysis provide? (Likely some, but it's not clear from the text.) Also, the presentation of results is unclear - authors state that only 5 gene regions contained variants at p < 5×10<sup>-8</sup>, but Figure 1 shows dozens of hits above 5×10<sup>-8</sup>. Also, the red line in Figure 1 (supposedly at 5×10<sup>-8</sup>) is misplaced.

      Thank you very much for your feedback. Although the PRACTICAL consortium contributed the majority of the PCa GWAS data, our meta-analysis integrating FinnGen data enhanced statistical power, enabling more robust detection of low-frequency variants. Moreover, FinnGen's Finnish ancestry (a genetic isolate) helps distinguish population-specific effects. The results as presented listed only the top 5 gene regions containing variants at p < 5×10<sup>-8</sup>. We apologize for not noticing that the red line was displayed incorrectly in the original figure. We have corrected it in the revised manuscript.

      (2) Cross-phenotype analysis. It is not really clear what this analysis is, or why it is done. What is the iCPAGdb? A database? A statistical method? Why would we want to know cross-phenotype associations? What even are these? It seems that the authors have taken data from an online resource and have written a paragraph based on this existing data with little added value.

      We thank you for raising this issue. The iCPAGdb (interactive Cross-Phenotype Analysis of GWAS database) is an integrative platform that systematically identifies cross-phenotype associations and evaluates genetic pleiotropy by leveraging LD-proxy associations from the NHGRI-EBI GWAS Catalog. The pathogenesis and progression of prostate cancer constitute a complex pathophysiological continuum characterized by dynamic multisystem interactions, extending beyond singular molecular pathway dysregulation to encompass coordinated disruptions across endocrine regulation, immune microenvironment remodeling, and metabolic reprogramming. Cross-phenotype analysis is therefore indispensable for discriminating primary pathogenic drivers from secondary compensatory responses, ultimately informing the development of precision therapeutic strategies.

      (3) PW-MR. I can see the value of this work, but many details are unclear. Was this a two-sample MR using PRACTICAL + FinnGen data for the outcome? How many variants were used in key analyses? Again, the description of results is sparse and gives little added value.

      We thank you for raising this issue. Two-sample MR refers to an analytical design where genetic instruments for the exposure (plasma proteins) and genetic associations with the outcome (PCa) are derived from non-overlapping populations. This ensures complete sample independence between exposure and outcome datasets to avoid confounding biases, regardless of whether the outcome data originate from single or multiple cohorts. The meta-analysis of PRACTICAL and FinnGen GWAS generates 27,210 quality-controlled variants (p < 5×10<sup>-8</sup>, MAF ≥ 1%, LD-clumped r<sup>2</sup> < 0.1) used in key analyses.
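
      To make the two-sample design concrete, the following is a minimal sketch of a fixed-effect inverse-variance-weighted (IVW) estimate, one common way of combining per-variant effects in two-sample MR. It is an illustration only: the response does not state which estimator was used, and the function name and the numbers in the usage note are hypothetical, not taken from the study.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from summary statistics.

    beta_exposure: per-variant effects on the exposure (e.g. protein level),
                   taken from the exposure GWAS / pQTL sample.
    beta_outcome:  per-variant effects on the outcome (e.g. PCa log-odds),
                   taken from a non-overlapping outcome sample.
    se_outcome:    standard errors of the outcome effects.
    Returns (estimate, standard_error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if not beta_exposure:
        raise ValueError("at least one genetic instrument is required")

    # Weighted regression of outcome effects on exposure effects through the
    # origin, with weights 1 / se_outcome^2.
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, math.sqrt(1.0 / denominator)
```

      For example, with hypothetical inputs `ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.01])`, both variants have a per-variant ratio of 0.5, so the pooled estimate is also 0.5. Sensitivity analyses such as MR-Egger relax the assumption, built into this estimator, that the regression passes through the origin.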

      (4) Colocalization - seems clear to me.

      (5) Additional post-GWAS analyses (pathway + druggability) - again, the analyses seem to be performed appropriately, although little additional insight other than the reporting of output from the methods.

      The post-MR druggability and pathway analyses serve two primary scientific purposes: (1) therapeutic prioritization - systematically evaluating which MR-identified proteins represent tractable drug targets (either through existing FDA-approved agents or compounds in clinical development) with direct relevance to cancer or PCa management, and (2) mechanistic hypothesis generation - mapping these candidate proteins to coherent biological pathways to guide future functional validation studies investigating their causal roles in prostate carcinogenesis.

      Minor points:

      (6) The stated motivation for this work is "early detection". But causality isn't necessary for early detection. If the authors are interested in early detection, other analysis approaches are more appropriate.

      We appreciate your insightful feedback. While early detection is one motivation for this work, our primary goal extends to identifying causally implicated proteins that may serve as intervention targets for PCa prevention or therapy. Establishing causality is critical for distinguishing biomarkers that drive disease pathogenesis from those that are secondary to disease progression, as the former holds greater specificity for early detection and prioritization of therapeutic targets. While we acknowledge that validation for early detection may require additional methodologies, MR analysis provides a foundational step by prioritizing candidate proteins with causal links to disease. This approach ensures that downstream efforts focus on biomarkers and targets with the greatest potential to alter disease trajectories, rather than merely correlative markers.

      (7) The authors state "193 proteins were associated with PCa risk", but they are looking at MR results - these analyses test for disease associations of genetically-predicted levels of proteins, not proteins themselves.

      In MR, the exposure of interest is the lifelong effect of genetically predicted protein levels. This approach is designed to infer causality while avoiding confounding and reverse causation, as genetic variants are fixed at conception and unaffected by disease processes. When we state “193 proteins were associated with PCa risk,” we specifically refer to proteins whose genetically predicted levels (based on instrument SNPs from protein QTLs) show causal links to PCa. Importantly, MR does not measure the direct association between observed protein concentrations and disease. Instead, it estimates the lifelong causal effect of protein levels predicted by genetics. This distinction is critical for disentangling cause from consequence. For example, a protein elevated due to tumor progression would not be identified as causal in MR if its genetic predictors are unrelated to PCa risk.

      We acknowledge that clinical translation requires further validation of these proteins in observational studies measuring actual protein levels. However, MR provides a robust first step by prioritizing candidates with causal roles, thereby reducing the risk of investing in biomarkers confounded by disease processes.

    1. Author response:

      We thank the reviewing editors, senior editors, and reviewers for their time, efforts, and constructive feedback. We believe the points raised are addressable and we would like to proceed with a revised submission for further review. Specifically, we plan the following revisions:

      Editor’s Comments

      We will clarify study definitions to ensure the meaning of "5-year crude overall survival time" is explicit for readers.

      Reviewer 1 Comments

      - Clarify and supplement the work with detailed sources of study origin (cancer registries or single-center cohorts).

      - Conduct a multi-level hierarchical meta-analysis to address concerns of ecological fallacy in interpreting results.

      - Perform an ecological sensitivity analysis and clarify findings regarding small study effects.

      - Expand the search base significantly by including African local databases; preliminary searches have identified over 50 potentially eligible doctoral theses, dissertations, local journal articles, and gray literature, potentially adding data from five or more additional countries.

      Reviewer 2 Comments

      - Conduct subgroup analyses by sex and assess the influence of the percentage of males in mixed cohorts.

      - Enhance the limited meta-analysis and provide supplementary full forest plots for all analyses.

      - Clarify phrasing in sections identified by the reviewer.

      Additional Planned Clarifications and Analyses

      - Elucidate the role of cumulative meta-analysis in mitigating lead-time bias.

      - Include supplementary cumulative meta-analysis based on the year of investigation (instead of publication year).

      - Perform subgroup analyses by clinical staging, TNM grading, and treatment modalities where data from ≥10 studies is available.

      - Expand discussion on the merits of quality assessment versus risk of bias evaluation in large scale epidemiological and observational studies, in line with other studies of this scale.

      - Condense the comparison with 2018 estimates, as per reviewer suggestions.

      Clarification Regarding SSA vs. AU Classification

      We do not intend to compare survival between "Sub-Saharan Africa" (SSA) and North Africa, as this binary classification is historically rooted and does not reflect current African Union (AU) administrative or policy groupings. Our regional analyses will adhere to the AU’s contemporary regional framework to better reflect political, cultural, and healthcare system realities.

      On Registry Data

      We will clarify that we will not extract raw registry data, as such data is typically unprocessed and does not provide 5-year overall survival metrics. As such, extracting raw, individual-level data from registries or vital statistics systems falls outside the methodological scope of a meta-analysis. Meta-analyses are designed to synthesize published survival estimates or those available from reports where survival analyses have already been conducted. Utilizing raw surveillance data would require primary data processing and survival analysis, effectively creating new data rather than synthesizing existing results. This would represent a distinct study design, such as a pooled analysis or original cohort study, rather than a meta-analysis. Where registry reports present summary survival estimates (e.g., 5-year overall survival) in a format compatible with meta-analysis, we will certainly include them.

      All planned additional analyses will depend on data quality, consistency, and feasibility for pooling using state-of-the-art statistical techniques. Where pooling is not possible, we will transparently report limitations.

    1. Author response:

      We thank all the reviewers for their thoughtful comments on our submitted manuscript.

      The main points made by all three reviewers were: to discuss the synaptic components omitted from the model and explore parameter sensitivity and broader physiological variability; to provide deeper physical insights into phase separation; and to clarify terminology and provide better presentation and context in relation to previous studies.

      We fully agree with the first point, that parameter sensitivity and broader physiological variability should be explored. Our model omits scaffold proteins such as GKAP, Shank and Homer, which are present at the bottom of the PSD hierarchy. In addition, there are many other interactions in PSDs whose affinity is altered by phosphorylation, and the phase separation state of the condensate is likely to be affected by ionic concentration and other environmental factors. We will include a more detailed discussion of these environmental factors and of the limitations of our study in the Discussion section. Furthermore, regarding the sensitivity of the parameters, the reviewer is right that the membrane potential parameter is important, since it directly regulates the difference between 3D and 2D systems. We plan to verify this by varying the strength of the membrane potential and rerunning the simulations to see how much it affects the morphology of the condensates.

      The second point is that we should provide deeper physical insight into phase separation in different dimensions. Directly estimating the entropy of the system would not be straightforward due to the nature of the model. However, as pointed out, the difference in phase behavior can be elucidated through simplified theories such as the lattice model. In this context, the reduced coordination number in 2D systems compared to 3D systems, and the decreased pseudo-attractive force due to the depletion effect, can offer rationalizations. We would like to add some theoretical discussion of these aspects with equations.
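
      As a hedged illustration of the coordination-number argument (a textbook mean-field sketch, not the authors' planned derivation), consider a lattice gas with occupancy fraction $\phi$, nearest-neighbour attraction $\epsilon > 0$ and coordination number $z$. These symbols, and the choice of a square lattice for 2D and a simple cubic lattice for 3D, are assumptions introduced here.

```latex
\begin{align}
  \frac{f(\phi)}{k_B T} &= \phi \ln \phi + (1-\phi)\ln(1-\phi)
      - \frac{z\epsilon}{2 k_B T}\,\phi^{2}, \\
  \frac{\partial^{2} f}{\partial \phi^{2}} = 0
      \;&\Rightarrow\; k_B T_{\mathrm{sp}}(\phi) = z\epsilon\,\phi(1-\phi), \\
  k_B T_c &= \frac{z\epsilon}{4} \qquad \left(\phi_c = \tfrac{1}{2}\right), \\
  \frac{T_c^{\mathrm{2D}}}{T_c^{\mathrm{3D}}} &= \frac{z_{\mathrm{2D}}}{z_{\mathrm{3D}}}
      = \frac{4}{6} = \frac{2}{3}.
\end{align}
```

      Under these assumptions, the same interaction strength yields a lower critical temperature in 2D, so a mixture that phase-separates in 3D may sit closer to, or beyond, the mixing boundary in 2D. Mean-field theory is only qualitative here, and the sketch does not capture the depletion effect mentioned above.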

      Third, we will clarify terminology and provide better explanation in relation to previous studies. In some parts of the manuscript, such as the descriptions of complexes containing receptors, the terminology was inconsistent and annotations were missing from figures. We will improve the wording and visualization in the text for further clarity and add relevant references, as suggested by the reviewers.

      As additionally suggested, the simulation and analysis scripts, together with the initial structure obtained, will be deposited in Zenodo or GitHub.

    1. Author response:

      eLife Assessment

      This work presents an important technical advancement with the release of MorphoNet 2.0, a user-friendly, standalone platform for 3D+T segmentation and analysis in biological imaging. The authors provide convincing evidence of the tool's capabilities through illustrative use cases, though broader validation against current state-of-the-art tools would strengthen its position. The software's accessibility and versatility make it a resource that will be of value for the bioimaging community, particularly in specialized subfields.

      We would like to thank the editors and reviewers for their careful and constructive evaluation of our manuscript “MorphoNet 2.0: An innovative approach for qualitative assessment and segmentation curation of large-scale 3D time-lapse imaging datasets”. We are grateful for the positive assessment of MorphoNet 2.0 as a valuable and accessible tool for the bioimaging community, and for the recognition of its technical advancements, particularly in the context of complex 3D+t segmentation tasks.

      The reviewers have highlighted several important points that we will address in the revised manuscript. These include:

      - The need for a clearer demonstration that improvements in unsupervised quality metrics correspond to actual improvements in segmentation quality. In response, we will provide comparisons with gold standard annotations where available and clarify how to interpret metric distributions.

      - The potential risk of circular logic when using unsupervised metrics to guide model training. We now explicitly discuss this limitation and emphasize the importance of external validation and expert input.

      - The value of comparing MorphoNet 2.0 to other tools such as FIJI and napari. We will include a comparative table to help readers understand MorphoNet’s positioning and complementarity.

      - The importance of clearer documentation and terminology. We will overhaul the help pages, standardize plugin naming, and add a glossary-style table to the manuscript.

      - Suggestions for future developments, such as mesh export and interoperability with napari, which we will explore for the revision.

      We appreciate the detailed feedback on both scientific and editorial aspects, including corrections to figures and text, and we will integrate all suggested revisions to improve the manuscript’s clarity and impact. We are confident that these changes will strengthen the manuscript and enhance the utility of MorphoNet 2.0 for the community.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors present a substantial improvement to their existing tool, MorphoNet, intended to facilitate assessment of 3D+t cell segmentation and tracking results, and curation of high-quality analysis for scientific discovery and data sharing. These tools are provided through a user-friendly GUI, making them accessible to biologists who are not experienced coders. Further, the authors have re-developed this tool to be a locally installed piece of software instead of a web interface, making the analysis and rendering of large 3D+t datasets more computationally efficient. The authors evidence the value of this tool with a series of use cases, in which they apply different features of the software to existing datasets and show the improvement to the segmentation and tracking achieved.

      While the computational tools packaged in this software are familiar to readers (e.g., cellpose), the novel contribution of this work is the focus on error correction. The MorphoNet 2.0 software helps users identify where their candidate segmentation and/or tracking may be incorrect. The authors then provide existing tools in a single user-friendly package, lowering the threshold of skill required for users to get maximal value from these existing tools. To help users apply these tools effectively, the authors introduce a number of unsupervised quality metrics that can be applied to a segmentation candidate to identify masks and regions where the segmentation results are noticeably different from the majority of the image.
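
      The idea of flagging masks that differ from the majority can be sketched as a simple outlier test on a per-object metric. The snippet below is a hypothetical illustration using a median-based modified z-score; it is not MorphoNet's API or its actual metric definitions, and the function name, threshold and example values are assumptions.

```python
from statistics import median


def flag_outliers(metric_by_object, threshold=3.5):
    """Return ids of objects whose metric deviates strongly from the majority.

    metric_by_object: dict mapping an object id to a scalar quality metric
                      (for example a segmented volume).
    threshold:        cut-off on the modified z-score; 3.5 is a common
                      rule of thumb, chosen here only for illustration.
    """
    values = list(metric_by_object.values())
    if not values:
        return []
    med = median(values)
    # Median absolute deviation: a spread estimate that the outliers
    # themselves barely influence.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread, so nothing can be ranked as unusual
    return sorted(
        obj
        for obj, v in metric_by_object.items()
        if 0.6745 * abs(v - med) / mad > threshold
    )
```

      With hypothetical volumes `{"c1": 100, "c2": 102, "c3": 98, "c4": 101, "c5": 300}`, only `"c5"` is flagged. A flagged object is a candidate for inspection, not proof of a segmentation error, since rare but genuine phenotypes would be flagged in the same way.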

      This work is valuable to researchers who are working with cell microscopy data that requires high-quality segmentation and tracking, particularly if their data are 3D time-lapse and thus challenging to segment and assess. The MorphoNet 2.0 tool that the authors present is intended to make the iterative process of segmentation, quality assessment, and re-processing easier and more streamlined, combining commonly used tools into a single user interface.

      We sincerely thank the reviewer for their thorough and encouraging evaluation of our work. We are grateful that they highlighted both the technical improvements of MorphoNet 2.0 and its potential impact for the broader community working with complex 3D+t microscopy datasets. We particularly appreciate the recognition of our efforts to make advanced segmentation and tracking tools accessible to non-expert users through a user-friendly and locally installable interface, and for pointing out the importance of error detection and correction in the iterative analysis workflow. The reviewer’s appreciation of the value of integrating unsupervised quality metrics to support this process is especially meaningful to us, as this was a central motivation behind the development of MorphoNet 2.0. We hope the tool will indeed facilitate more rigorous and reproducible analyses, and we are encouraged by the reviewer’s positive assessment of its utility for the community.

      One of the key contributions of the work is the unsupervised metrics that MorphoNet 2.0 offers for segmentation quality assessment. These metrics are used in the use cases to identify low-quality instances of segmentation in the provided datasets, so that they can be improved with plugins directly in MorphoNet 2.0. However, not enough consideration is given to demonstrating that optimizing these metrics leads to an improvement in segmentation quality. For example, in Use Case 1, the authors report their metrics of interest (Intensity offset, Intensity border variation, and Nuclei volume) for the uncurated silver truth, the partially curated and fully curated datasets, but this does not evidence an improvement in the results. Additional plotting of the distribution of these metrics on the Gold Truth data could help confirm that the distribution of these metrics now better matches the expected distribution.

      Similarly, in Use Case 2, visual inspection leads us to believe that the segmentation generated by the Cellpose + Deli pipeline (shown in Figure 4d) is an improvement, but a direct comparison of agreement between segmented masks and masks in the published data (where the segmentations overlap) would further evidence this.

      We agree that demonstrating the correlation between metric optimization and real segmentation improvement is essential. We will add new analysis comparing the distributions of the unsupervised metrics with the gold truth data before and after curation. Additionally, we will provide overlap scores where ground truth annotations are available, confirming the improvement. We will also explicitly discuss the limitation of relying solely on unsupervised metrics without complementary validation.
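
      As a sketch of what such overlap scores could look like, the functions below compute intersection-over-union (IoU) and the Dice coefficient for two masks. The response does not say which score will be reported, so both are shown as common options; representing each mask as a set of voxel coordinates is an assumption made to keep the example self-contained.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two masks given as sets of voxel coordinates."""
    union = mask_a | mask_b
    if not union:
        return 1.0  # two empty masks are treated as identical by convention
    return len(mask_a & mask_b) / len(union)


def dice(mask_a, mask_b):
    """Dice coefficient of two masks given as sets of voxel coordinates."""
    total = len(mask_a) + len(mask_b)
    if total == 0:
        return 1.0  # same convention as above for two empty masks
    return 2 * len(mask_a & mask_b) / total


# Hypothetical three-voxel masks sharing two voxels.
curated = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
reference = {(0, 0, 1), (0, 1, 0), (0, 1, 1)}
print(iou(curated, reference), dice(curated, reference))
```

      Comparing these scores before and after curation against an independent reference gives an evaluation that does not rely on the metrics used to guide the correction.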

      We would appreciate the authors addressing the risk of decreasing the quality of the segmentations by applying circular logic with their tool; MorphoNet 2.0 uses unsupervised metrics to identify masks that do not fit the typical distribution. A model such as StarDist can be trained on the "good" masks to generate more masks that match the most common type. This leads to a more homogeneous segmentation quality, without consideration for whether these metrics actually optimize the segmentation.

      We thank the reviewer for this important and insightful comment. It raises a crucial point regarding the risk of circular logic in our segmentation pipeline. Indeed, relying on unsupervised metrics to select “good” masks and using them to train a model like StarDist could lead to reinforcing a particular distribution of shapes or sizes, potentially filtering out biologically relevant variability. This homogenization may improve consistency with the chosen metrics, but not necessarily with the true underlying structures.

      We fully agree that this is a key limitation to be aware of. We will revise the manuscript to explicitly discuss this risk, emphasizing that while our approach may help improve segmentation quality according to specific criteria, it should be complemented with biological validation and, when possible, expert input to ensure that important but rare phenotypes are not excluded.

      In Use case 5, the authors include details that the errors were corrected by "264 MorphoNet plugin actions ... in 8 hours actions [sic]". The work would benefit from explaining whether this is 8 hours of human work, trying plugins and iteratively improving, or 8 hours of compute time to apply the selected plugins.

      We will clarify that the “8 hours” refer to human interaction time, including exploration, testing, and iterative correction using plugins.

      Reviewer #2 (Public review):

      Summary:

      This article presents MorphoNet 2.0, a software designed to visualise and curate segmentations of 3D and 3D+t data. The authors demonstrate its capabilities on five published datasets, showcasing how even small segmentation errors can be automatically detected, easily assessed, and corrected by the user. This allows for more reliable ground truths, which will in turn be very valuable for analysis and for training deep learning models. MorphoNet 2.0 offers intuitive 3D inspection and functionalities accessible to a non-coding audience, thereby broadening its impact.

      Strengths:

      The work proposed in this article is expected to be of great interest to the community by enabling easy visualisation and correction of complex 3D(+t) datasets. Moreover, the article is clear and well written, making MorphoNet more likely to be used. The goals are clearly defined, addressing an undeniable need in the bioimage analysis community. The authors use a diverse range of datasets, successfully demonstrating the versatility of the software.

      We would also like to highlight the great effort that was made to clearly explain which type of computer configurations are necessary to run the different datasets and how to find the appropriate documentation according to your needs. The authors clearly carefully thought about these two important problems and came up with very satisfactory solutions.

      We would like to sincerely thank the reviewer for their positive and thoughtful feedback. We are especially grateful that they acknowledged the clarity of the manuscript and the potential value of MorphoNet 2.0 for the community, particularly in facilitating the visualization and correction of complex 3D(+t) datasets. We also appreciate the reviewer’s recognition of our efforts to provide detailed guidance on hardware requirements and access to documentation—two aspects we consider crucial to ensuring the tool is both usable and widely adopted. Their comments are very encouraging and reinforce our commitment to making MorphoNet 2.0 as accessible and practical as possible for a broad range of users in the bioimage analysis community.

      Weaknesses:

      There is still one concern: the quantification of the improvement of the segmentations in the use cases and, therefore, the quantification of the potential impact of the software. While it appears hard to quantify the quality of the correction, the proposed work would be significantly improved if such metrics could be provided.

      The authors show some distributions of metrics before and after segmentations to highlight the changes. This is a great start, but there seem to be two shortcomings: first, the comparison and interpretation of the different distributions does not appear to be trivial. It is therefore difficult to judge the quality of the improvement from these. Maybe an explanation in the text of how to interpret the differences between the distributions could help. A second shortcoming is that the before/after metrics displayed are the metrics used to guide the correction, so, by design, the scores will improve, but does that accurately represent the improvement of the segmentation? It seems to be the case, but it would be nice to maybe have a better assessment of the improvement of the quality.

      We thank the reviewer for this constructive and important comment. We fully agree that assessing the true quality improvement of segmentation after correction is a central and challenging issue. While we initially focused on changes in the unsupervised quality metrics to illustrate the effect of the correction, we acknowledge that interpreting these distributions may not be straightforward, and that relying solely on the metrics used to guide the correction introduces an inherent bias in the evaluation.

      To address the first point, we will revise the manuscript to provide clearer guidance on how to interpret the changes in metric distributions before and after correction, with additional examples to make this interpretation more intuitive.

      Regarding the second point, we agree that using independent, external validation is necessary to confirm that the segmentation has genuinely improved. To this end, we will include additional assessments using complementary evaluation strategies on selected datasets where ground truth is accessible, to compare pre- and post-correction segmentations with an independent reference. These results reinforce the idea that the corrections guided by unsupervised metrics generally lead to more accurate segmentations, but we also emphasize their limitations and the need for biological validation in real-world cases.

      Reviewer #3 (Public review):

      Summary:

      A very thorough technical report of a new standalone, open-source software for microscopy image processing and analysis (MorphoNet 2.0), with a particular emphasis on automated segmentation and its curation to obtain accurate results even with very complex 3D stacks, including timelapse experiments.

      Strengths:

      The authors did a good job of explaining the advantages of MorphoNet 2.0, as compared to its previous web-based version and to other software with similar capabilities. What I particularly found more useful to actually envisage these claimed advantages is the five examples used to illustrate the power of the software (based on a combination of Python scripting and the 3D game engine Unity). These examples, from published research, are very varied in both types of information and image quality, and all have their complexities, making them inherently difficult to segment. I strongly recommend the readers to carefully watch the accompanying videos, which show (although not thoroughly) how the software is actually used in these examples.

      We sincerely thank the reviewer for their thoughtful and encouraging feedback. We are particularly pleased that the reviewer appreciated the comparative analysis of MorphoNet 2.0 with both its earlier version and existing tools, as well as the relevance of the five diverse and complex use cases we selected. Demonstrating the software’s versatility and robustness across a variety of challenging datasets was a key goal of this work, and we are glad that this aspect came through clearly. We also appreciate the reviewer’s recommendation to watch the accompanying videos, which we designed to provide a practical sense of how the tool is used in real-world scenarios. Their positive assessment is highly motivating and reinforces the value of combining scripting flexibility with an interactive 3D interface.

      Weaknesses:

      Being a technical article, the only possible comments are on how methods are presented, which is generally adequate, as mentioned above. In this regard, and in spite of the presented examples (chosen by the authors, who clearly gave them a deep thought before showing them), the only way in which the presented software will prove valuable is through its use by as many researchers as possible. This is not a weakness per se, of course, but just what is usual in this sort of report. Hence, I encourage readers to download the software and give it time to test it on their own data (which I will also do myself).

      We fully agree that the true value of MorphoNet 2.0 will be demonstrated through its practical use by a wide range of researchers working with complex 3D and 3D+t datasets. In this regard, we will improve the user documentation and provide a set of example datasets to help new users quickly familiarize themselves with the platform. We are also committed to maintaining and updating MorphoNet 2.0 based on user feedback to further support its usability and impact.

      In conclusion, I believe that this report is fundamental because it will be the major way of initially promoting the use of MorphoNet 2.0 among its target audience. The software itself holds the promise of being very impactful for the microscopists' community.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary: 

      The manuscript by Nicoletti et al. presents a minimal model of habituation, a basic form of non-associative learning, addressing both from dynamical and information theory aspects of how habituation can be realized. The authors identify that negative feedback provided with a slow storage mechanism is sufficient to explain habituation.

      Strengths: 

      The authors combine the identification of the dynamical mechanism with information-theoretic measures to determine the onset of habituation and provide a description of how the system can gain maximum information about the environment.

      We thank the reviewer for highlighting the strength of our work and for their comments, which we believe have been instrumental in significantly improving our work and its scope. Below, we address all their concerns.

      Weaknesses: 

      I have several main concerns/questions about the proposed model for habituation and its plausibility. In general, habituation does not only refer to a decrease in the responsiveness upon repeated stimulation but as Thompson and Spencer discussed in Psych. Rev. 73, 16-43 (1966), there are 10 main characteristics of habituation, including (i) spontaneous recovery when the stimulus is withheld after response decrement; dependence on the frequency of stimulation such that (ii) more frequent stimulation results in more rapid and/or more pronounced response decrement and more rapid spontaneous recovery; (iii) within a stimulus modality, the less intense the stimulus, the more rapid and/or more pronounced the behavioral response decrement; (iv) the effects of repeated stimulation may continue to accumulate even after the response has reached an asymptotic level (which may or may not be zero, or no response). This effect of stimulation beyond asymptotic levels can alter subsequent behavior, for example, by delaying the onset of spontaneous recovery. 

      These are only a subset of the conditions that have been experimentally observed and therefore a mechanistic model of habituation, in my understanding, should capture the majority of these features and/or discuss the absence of such features from the proposed model. 

      We are really grateful to the reviewer for pointing out these aspects of habituation that we overlooked in the previous version of our manuscript. Indeed, our model is able to capture most of these 10 observed behaviors, specifically: 1) habituation; 2) spontaneous recovery; 3) potentiation of habituation; 4) frequency sensitivity; 5) intensity sensitivity; 6) subliminal accumulation. Here, we are following the same terminology employed in Eckert et al., Current Biology 34, 5646–5658 (2024), the paper highlighted by the reviewer. We have dedicated a section of the revised version of the manuscript to these hallmarks, substantiating the validity of our framework as a minimal model of habituation. We remark that these are the sole hallmarks that can be discussed by considering one single external stimulus and that can be identified without ambiguity in a biochemical context. This observation is again in line with Eckert et al., Current Biology 34, 5646–5658 (2024).

      In the revised version, we employ the same strategy of the aforementioned work to determine when the system can be considered “habituated”. Indeed, we introduce a response threshold that is now discussed in the manuscript. We also included a note in the discussions stating that, since any biochemical model will eventually reach a steady state, subliminal accumulation, for example, can only be seen with the use of a threshold. The introduction of different storage mechanisms, ideally more detailed at a molecular level, can shed light on this conceptual gap. This is an interesting direction of research.

      Furthermore, the habituated response in steady-state is approximately 20% less than the initial response, which seems to be achieved already after 3-4 pulses. The subsequent change in response amplitude seems to be negligible, although the authors state "after a large number of inputs, the system reaches a time-periodic steady-state". How do the authors justify these minimal decreases in the response amplitude? Does this come from the model parametrization and is there a parameter range where more pronounced habituation responses can be observed? 

      The reviewer is correct, but this is solely a consequence of the specific set of parameters we selected. We made this choice only for visualization purposes in the previous version. In the revised version, in the section discussing the hallmarks of habituation, we also show other parameter choices for which the response decrement is more pronounced. Moreover, we remark that the contour plot of \Delta⟨U⟩ clearly shows that the decrement can largely exceed the 20% threshold presented in the previous version.

      In the revised version, also in light of the works highlighted by the reviewer, we decided to move the focus of the manuscript to the information-theoretic advantage of habituation. As such, we modified several parts of the main text. Also, in the region of optimal information gain, habituation is at an intermediate level. For this reason, we decided to keep the same parameter choice as the previous version in Figure 2.

      We stated that the time-periodic steady-state is reached “after a large number of stimuli” from a mathematical perspective. However, by using a habituation threshold, as done in Eckert et al., Current Biology 34, 5646–5658 (2024), we can state that the system is habituated after a few stimuli for each set of parameters. This aspect is highlighted in the revised version of the manuscript (see also the point above).
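
      The distinction between the mathematical steady state and a threshold-based notion of "habituated" can be illustrated with a small numerical sketch. The code below is not the model of the manuscript: it is a deterministic toy system, assumed here only for illustration, in which a saturating readout u is inhibited by a slowly degrading storage variable s. All names, parameter values, and the 5% tolerance in habituation_onset are hypothetical choices.

```python
import numpy as np

def simulate_pulses(n_pulses=12, t_on=5.0, t_off=5.0, dt=0.01,
                    h_on=10.0, h_off=0.1, kappa=2.0, tau_s=20.0):
    """Mean readout during each pulse of a toy negative-feedback model.

    Assumed dynamics (illustrative only, not the manuscript's model):
      readout  u = h / (1 + h + kappa * s)   (fast, quasi-static)
      storage  ds/dt = u - s / tau_s         (slow memory)
    """
    s = 0.0
    responses = []
    steps_on = int(round(t_on / dt))
    steps_off = int(round(t_off / dt))
    for _ in range(n_pulses):
        acc = 0.0
        for _ in range(steps_on):          # strong stimulus is on
            u = h_on / (1.0 + h_on + kappa * s)
            s += dt * (u - s / tau_s)      # forward Euler step
            acc += u
        responses.append(acc / steps_on)
        for _ in range(steps_off):         # background signal only
            u = h_off / (1.0 + h_off + kappa * s)
            s += dt * (u - s / tau_s)
    return np.array(responses)

def habituation_onset(responses, tol=0.05):
    """First pulse whose response is within a relative tolerance `tol`
    of the final response (a hypothetical threshold criterion)."""
    final = responses[-1]
    within = np.abs(responses - final) <= tol * final
    return int(np.argmax(within))          # index of the first True

responses = simulate_pulses()
onset = habituation_onset(responses)
```

      In this sketch the per-pulse response keeps decreasing at every pulse, so the periodic steady state is only approached asymptotically, whereas the threshold criterion is met after a finite, small number of pulses.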

      The same is true for the information content (Figure 2f) - already at the first pulse, I_{U,H} ~ 0.7 and only negligibly increases afterwards. In my understanding, during learning, the mutual information between the input and the internal state increases over time and the system extracts from these predictions about its responses. In the model presented by the authors, it seems the system already carries information about the environment which hardly changes with repeated stimulus presentation. The complexity of the signal is also limited, and it is very hard to clarify from the presented results, whether the proposed model can actually explain basic features of habituation, as mentioned above. 

      As for the response decrement of the readout, we can certainly choose a set of parameters for which the information gain is higher. In the revised version, we also report the information at the first stimulation and when the system is habituated to give a better idea of the range of these quantities. At any rate, as the referee correctly points out, it is difficult to give an intuitive interpretation of the information in our minimal model.

      It is also important to remark that, since the readout population and the receptor both undergo fast dynamics (with appropriate timescales as discussed in the text), we are not observing the transient gain of information associated with the first stimulus. As such, the mutual information presents a discontinuous behavior that resembles the dynamics of the readout, thereby starting at a non-zero value already at the first stimulus.

      Additionally, there have been two recent models on habituation and I strongly suggest that the authors discuss their work in relation to recent works (bioRxiv 2024.08.04.606534; arXiv:2407.18204).

      We thank the reviewer for pointing out these relevant references. In the revised version, we highlighted that we discuss the information-theoretic aspects of habituation, while the aforementioned references focus on the dynamics of this phenomenon.

      Reviewer #1 (Recommendations for the authors):

      I would also like to note here the simplification of the proposed biological model - in particular, that the receptor can be in an active/passive state, as well as proposing the NF-kB signaling module as a possible molecular realization. Generally, a large number of cell surface receptors including RTKs or GPCRs have much more complex dynamics including autocatalytic activation that generally leads to bistability, and NF-kB has been demonstrated to have oscillatory and even chaotic dynamics (works of Savas Tay, Mogens Jensen and others). Considering this, the authors should at least discuss under which conditions TNF-alpha signaling could potentially serve as a molecular realisation for habituation. 

      We thank the reviewer for bringing this to our attention. In the previous version, we reported the TNF signaling network only to show a similar coarse-grained modular structure. However, following a suggestion of reviewer #2, we decided to change Figure 1 to include a simplified molecular scheme of chemotaxis rather than TNF signaling, to avoid any source of confusion about this issue.

      Also, a minor point: Figures 2d-e are cited before 2a-c. 

      We apologize for the oversight. The structure of the Figures and their order is now significantly different, and they are now cited in the correct order. 

      Reviewer #2 (Public review):

      In this study, the authors aim to investigate habituation, the phenomenon of increasing reduction in activity following repeated stimuli, in the context of its information-theoretic advantage. To this end, they consider a highly simplified three-species reaction network where habituation is encoded by a slow memory variable that suppresses the receptor and therefore the readout activity. Using analytical and numerical methods, they show that in their model the information gain, the difference between the mutual information between the signal and readout after and before habituation, is maximal for intermediate habituation strength. Furthermore, they demonstrate that the Pareto front corresponds to an optimization strategy that maximizes the mutual information between signal and readout in the steady state, minimizes some form of dissipation, and also exhibits similar intermediate habituation strength. Finally, they briefly compare predictions of their model to whole-brain recordings of zebrafish larvae under visual stimulation. 

      The authors' simplified model might serve as a solid starting point for understanding habituation in different biological contexts as the model is simple enough to allow for some analytic understanding but at the same time exhibits all basic properties of habituation in sensory systems. Furthermore, the authors' finding of maximal information gain for intermediate habituation strength via an optimization principle is, in general, interesting. However, the following points remain unclear or are weakly explained: 

      We thank the reviewer for deeming our work interesting and for considering it a solid starting point for understanding habituation in biological systems.

      (1) It is unclear what the finding of maximal information gain for intermediate habituation strength means for biological systems. Why is information gain as defined in the paper a relevant quantity for an organism/cell? For instance, why is a system with low mutual information after the first stimulus and intermediate mutual information after habituation better than one with consistently intermediate mutual information? Or, in other words, couldn't the system try to maximize the mutual information acquired over the whole time series, e.g., the time series mutual information between the stimulus and readout?

      This is a delicate aspect to discuss and we thank the referee for the comment. In the revised version, we report information gain, initial and final information, highlighting that both gain and final information are higher in regions where habituation is present. They have qualitatively similar behavior and highlight a clear information-theoretic advantage of this dynamical phenomenon. An important point is that, to determine the optimal Pareto front, we consider a prolonged stimulus and its associated steady-state information. Therefore, from the optimization point of view, there is no notion of “information gain” or “final information”, which are intrinsically dynamical quantities. As a result, the fact that the optimal curve lies in the region of optimal information gain is not expected a priori and hints at the potentially crucial role of this feature. In the revised version, we elucidate this aspect with several additional analyses.

      We would like to add that, from a naive perspective, while the first stimulation will necessarily trigger a certain (non-zero) mutual information, multiple observations of the same stimulus have to reflect into accumulated information that consequently drives the onset of observed dynamical behaviors, such as habituation.

      (2) The model is very similar to (or a simplification of) previous models for adaptation in living systems, e.g., for adaptation in chemotaxis via activity-dependent methylation and demethylation. This should be made clearer.

      We apologize for having missed this point. Our choice has been motivated by the fact that we wanted to avoid confusion between the usual definition of (perfect) adaptation and habituation. However, we now believe that this is not the case for the revised manuscript, and we now include chemotaxis as an example in Figure 1.

      (3) It remains unclear why this optimization principle is the most relevant one. While it makes sense to maximize the mutual information between stimulus and readout, there are various choices for what kind of dissipation is minimized. Why was \delta Q_R chosen and not, for instance, \dot{\Sigma}_int or the sum of both? How would the results change in that case? And how different are the results if the mutual information is not calculated for the strong stimulation input statistics but for the background one?

      We thank the reviewer for the suggestion. We agree that a priori, there is no reason to choose \delta Q_R or a function of the internal energy flux J_int (that, in the revised version, we are using in place of \dot\Sigma_int following the suggestion of reviewer #3). The rationale was to minimize \delta Q_R since this dissipation is unavoidable and stems from the presence of the storage inhibiting the receptor through the internal pathway. Indeed, considering the existence of two different pathways implementing sensing and feedback, the presence of any input will result in a dissipation produced by the receptor. This energy consumption is reflected in \delta Q_R.

      In the revised version, we now include in the optimization principle two energy contributions (see Eq. (14) of the revised manuscript): \delta Q_R and E_int, which is the energy consumption associated with the driven storage production per unit energy. All Figures have been updated accordingly. The results remain similar, as \delta Q_R still represents the main contribution, especially at high \beta.

      Furthermore, in the revised version, we include examples of the Pareto optimization for different values of input strength. As detailed both in the main text and the Supplementary Information, changing the value of ⟨H⟩ moves the Pareto frontier in the (\beta, \sigma) space, since the signal needs to be strong enough for the system to distinguish it from the intrinsic thermal noise (controlled by \beta). We also show that if the system is able to tune the inhibition strength \kappa, the Pareto frontiers at different ⟨H⟩ collapse into a single curve. This shows that, although the values of, e.g., the mutual information, depend on ⟨H⟩, the qualitative behavior of the system in this regime is effectively independent of it. We also added more details about this in the Supplementary Information.
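
      For readers less familiar with Pareto optimization, the notion of a Pareto frontier for two competing objectives can be sketched generically as follows. This is not the optimization code of the manuscript; the function name pareto_front and the toy numbers are assumptions made for illustration, with "info" standing in for a quantity to be maximized (such as a mutual information) and "cost" for a quantity to be minimized (such as a dissipation).

```python
import numpy as np

def pareto_front(info, cost):
    """Indices of the non-dominated points.

    A point is dominated if some other point has info at least as high
    and cost at least as low, with at least one strict inequality.
    """
    info = np.asarray(info, dtype=float)
    cost = np.asarray(cost, dtype=float)
    keep = []
    for i in range(len(info)):
        dominated = np.any(
            (info >= info[i]) & (cost <= cost[i])
            & ((info > info[i]) | (cost < cost[i]))
        )
        if not dominated:
            keep.append(i)
    return keep

# Toy values: the last point has the same info as the second one
# but a higher cost, so it is dominated and excluded from the front.
front = pareto_front([1.0, 2.0, 3.0, 2.0], [1.0, 2.0, 3.0, 3.0])
```

      In a parameter scan, each point would correspond to one choice of model parameters, and the frontier is the set of parameter choices for which neither objective can be improved without worsening the other.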

      (4) The comparison to the experimental data is not too strong of an argument in favor of the model. Is the agreement between the model and the experimental data surprising? What other behavior in the PCA space could one have expected in the data? Shouldn't the 1st PC mostly reflect the "features", by construction, and other variability should be due to progressively reduced activity levels? 

      The agreement between data and model is not surprising - we agree on this - since the data exhibit habituation. However, we believe that the fact that our minimal model is able to capture the features of a complex neural system just by looking at the PCs, without any explicit biological details, is non-trivial. We also stress that the 1st PC only reflects the feature that captures most of the variance of the data and, as such, it is difficult to have a-priori expectations on what it should represent. In the case of the data generated from the model, most of the variance of the activity comes from the switching signal, and similar considerations can be made for the looming stimulations in the data. We updated the manuscript to clarify this point.

      Reviewer #2 (Recommendations for the authors):

      (1) The abstract makes it sound like a new finding is that habituation is due to a slow, negative feedback mechanism. But, as mentioned in the introduction, this is a well-known fact. 

      We agree with the reviewer. We have revised the abstract.

      (2) Figure 2c Why does the range of Delta Delta I_f include negative values if the corresponding region is shaded (right-tilted stripes)? 

      The negative values in the range are those attained in the shaded region with right-tilted stripes. We decided to include them in the colorbar for clarity, since Delta Delta I_f is also plotted in the region where it attains negative values.

      (3) What does the Pareto front look like if the optimization is done for input statistics given by ⟨H⟩_min? 

      In the revised version, we include examples of the Pareto optimization for different values of input strength. As detailed both in the main text and the Supplementary Information, changing the value of ⟨H⟩ moves the Pareto frontier in the (\beta, \sigma) space, since the strength of the signal is crucial for the system to discriminate input and thermal noise (see also the answers above).

      In particular, in Figure 4 we explicitly compare the results of the Pareto optimization (which is done with a static input of a given statistics) with the dynamics of the model for different values of ⟨H⟩ in two scenarios, i.e., adaptive and non-adaptive inhibition strength (see answers above for details).

      We also remark that ⟨H⟩_min represents the background signal that the system is not trying to capture, which is why we never used it for optimization.

      (4) From the main text, it is rather difficult to understand how the comparison to the experimental data was performed. How was the PCA done exactly? What are the "features" of the evoked neural response? 

      The PCA on data is performed starting from the single-neuron calcium dynamics. To make a fair comparison, we reconstruct a similar but extremely simplified dynamics using our model, as explained in Methods, and perform the PCA on the analogous simulated data. We added a comment on this in the revised version. While these components capture most of the variance in the data, their specific interpretation is usually out of reach and we believe that it lies beyond the scope of this theoretical work. We also remark that the model does not contain all these biological details - a strong aspect in our opinion - and, as such, it cannot capture specific biological features.
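
      The kind of analysis described here can be sketched on synthetic data. The snippet below is not the pipeline used in the manuscript: the array shapes, the on/off stimulus, the decaying envelope, and the noise level are all assumptions chosen only to show how a PCA of population activity can be computed with a singular value decomposition.

```python
import numpy as np

def pca(activity, n_components=2):
    """PCA via SVD. `activity` has shape (time points, units)."""
    centered = activity - activity.mean(axis=0, keepdims=True)
    u, sing, vt = np.linalg.svd(centered, full_matrices=False)
    explained = sing**2 / np.sum(sing**2)            # variance fractions
    scores = u[:, :n_components] * sing[:n_components]
    return scores, vt[:n_components], explained[:n_components]

# Synthetic "population": every unit follows the same switching
# stimulus with a slowly decaying amplitude, plus independent noise.
rng = np.random.default_rng(0)
t = np.arange(400)
stimulus = ((t // 20) % 2 == 1).astype(float)        # on/off blocks
envelope = 0.5 + 0.5 * np.exp(-t / 150.0)            # decaying response
loadings = rng.uniform(0.5, 1.5, size=30)            # per-unit gain
activity = np.outer(stimulus * envelope, loadings)
activity += 0.1 * rng.standard_normal(activity.shape)

scores, components, explained = pca(activity)
```

      In such synthetic data most of the variance comes from the switching stimulus, so the first component tracks it closely, while the decaying envelope appears as a progressive reduction of the score amplitude.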

      Reviewer #3 (Public review):

      The authors use a generic model framework to study the emergence of habituation and its functional role from information-theoretic and energetic perspectives. Their model features a receptor, readout molecules, and a storage unit, and as such, can be applied to a wide range of biological systems. Through theoretical studies, the authors find that habituation (reduction in average activity) upon exposure to repeated stimuli should occur at intermediate degrees to achieve maximal information gain. Parameter regimes that enable these properties also result in low dissipation, suggesting that intermediate habituation is advantageous both energetically and for the purpose of retaining information about the environment. 

      A major strength of the work is the generality of the studied model. The presence of three units (receptor, readout, storage) operating at different time scales and executing negative feedback can be found in many domains of biology, with representative examples well discussed by the authors (e.g. Figure 1b). A key takeaway demonstrated by the authors that has wide relevance is that large information gain and large habituation cannot be attained simultaneously. When energetic considerations are accounted for, large information gain and intermediate habituation appear to be a favorable combination. 

      We thank the reviewer for this positive assessment of our work and its generality.

      While the generic approach of coarse-graining most biological detail is appealing and the results are of broad relevance, some aspects of the conducted studies, the problem setup, and the writing lack clarity and should be addressed: 

      (1) The abstract can be further sharpened. Specifically, the "functional role" mentioned at the end can be made more explicit, as it was done in the second-to-last paragraph of the Introduction section ("its functional advantages in terms of information gain and energy dissipation"). In addition, the abstract mentions the testing against experimental measurements of neural responses but does not specify the main takeaways. I suggest the authors briefly describe the main conclusions of their experimental study in the abstract.

      We thank the reviewer for raising this point. In the revised version, we have changed the abstract to reflect the reviewer’s points and the new structure and results of the manuscript.

      (2) Several clarifications are needed on the treatment of energy dissipation. 

      -   When substituting the rates in Eq. (1) into the definition of δQ_R above Eq. (10), "σ" does not appear on the right-hand side. Does this mean that one of the rates in the lower pathway must include σ in its definition? Please clarify.

      We apologize to the reviewer for this typo. Indeed, \sigma sets the energy scale of feedback and, as such, it appears in the energetic driving given by the feedback on the receptor, i.e., in Eq. (1) together with \kappa. This typo has been corrected in the revised manuscript, and all subsequent equations are consistent.

      -   I understand that the production of storage molecules has an associated cost σ and hence contributes to dissipation. The dependence of receptor dissipation on ⟨H⟩, however, is not fully clear. If the environment were static and the memory block was absent, the term with ⟨H⟩ would still contribute to dissipation. What would be the nature of this dissipation?

      In the spirit of building a paradigmatic minimal model with a thermodynamic meaning, we considered H to act as an external thermodynamic driving. Since this driving acts on a different pathway with respect to the one affected by the storage, the receptor is driven out of equilibrium by its presence.

      By eliminating the memory block, we would also be necessarily eliminating the presence of the pathway associated with the storage effect (“internal pathway” in the manuscript), since its presence is solely due to the existence of a storage population. Therefore, in this case, the receptor would be a 2-state, 1-pathway system and, as such, it would always satisfy an effective detailed balance. As a consequence, the definition of \delta Q_R reported in the manuscript would not hold anymore and the receptor would not exhibit any dissipation. Thus, in a static environment and without a memory block, no receptor dissipation would be present. We would also like to stress that our choice to model two different pathways has been motivated by the observation that the negative feedback acts along a different pathway in several biochemical and biological examples. We made some changes to the model description in the revised version and we hope that this aspect has been clarified.
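
      The statement that a 2-state, 1-pathway system cannot dissipate at steady state, whereas two pathways with different drivings can, follows from a standard calculation that can be checked numerically. The sketch below is generic and does not use the rates of the manuscript; the function name and the numerical rates are hypothetical. It computes the steady-state entropy production rate, in units of k_B per unit time, of a two-state system whose states are connected by one or more pathways.

```python
import math

def two_state_dissipation(pathways):
    """Steady-state entropy production rate of a two-state system.

    `pathways` is a list of (forward_rate, backward_rate) tuples, one
    per pathway connecting the passive and the active state.
    """
    k_fwd = sum(f for f, _ in pathways)
    k_bwd = sum(b for _, b in pathways)
    p_off = k_bwd / (k_fwd + k_bwd)        # steady-state occupations
    p_on = 1.0 - p_off
    total = 0.0
    for f, b in pathways:
        flux = p_off * f - p_on * b        # net flux along this pathway
        total += flux * math.log((p_off * f) / (p_on * b))
    return total

# One pathway: the net flux vanishes, so there is no dissipation.
single = two_state_dissipation([(2.0, 0.5)])
# Two pathways with different rate ratios: a net cycle current flows.
driven = two_state_dissipation([(1.0, 1.0), (5.0, 0.5)])
# Two pathways with the same rate ratio: detailed balance is restored.
balanced = two_state_dissipation([(1.0, 2.0), (3.0, 6.0)])
```

      With a single pathway the steady-state flux is identically zero, which is the effective detailed balance mentioned above. A nonzero dissipation requires at least two pathways whose forward-to-backward rate ratios differ.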

      -   Similarly, in Eq. (9) the authors use the ratio of the rates Γ_{s → s+1} and Γ_{s+1 → s} in their expression for internal dissipation. The first rate corresponds to the synthesis reaction of memory molecules, while the second corresponds to a degradation reaction. Since the second reaction is not the microscopic reverse of the first, what would be the physical interpretation of the log of their ratio? Since the authors already use σ as the energy cost per storage unit, why not use σ times the rate of producing S as a metric for the dissipation rate? 

      We agree with the referee that the reverse reaction we considered is not the microscopic reverse of the storage production. In the case of a fast readout population, we employed a coarse-grained view to compute this entropy production. To be more precise, we gladly welcomed the referee’s suggestion in the revised version and modified the manuscript accordingly. As suggested, we now employ the energy flux associated with the storage production to estimate the internal dissipation (see new Fig. 3). 

      In the revised version, we also use this quantity in the optimization procedure in combination with \delta Q_R (see new Fig. 4) to have a complete characterization of the system’s energy consumption. The conclusions are qualitatively identical to before, but we believe that now they are more solid from a theoretical perspective. For this important advance in the robustness and quality of our work, we are profoundly grateful to the referee.

      (3) Impact of the pre-stimulus state. The plots in Figure 2 suggest that the environment was static before the application of repeated stimuli. Can the authors comment on the impact of the pre-stimulus state on the degree of habituation and its optimality properties? Specifically, would the conclusions stay the same if the prior environment had stochastic but aperiodic dynamics? 

      The initial stimulus is indeed stochastic with an average constant in time and mimics the background (small) signal. We apply the (strong) stimulation once the system has already reached a stationary state with respect to the background. As can be seen in Fig. 2 of the revised version, the model response depends on the pre-stimulus level, since it sets the storage concentration before the stimulation arrives and, as such, the subsequent habituation dynamics. This dependence is important from a dynamical perspective. The information-theoretic picture has been developed, as said above, by letting the system relax before the first stimulus. This eliminates this arbitrary dependence and provides a clearer idea of the functional advantages of habituation. Moreover, the optimization procedure is performed in a completely different setting, with no pre-stimulus at all, since we only have one prolonged stimulation. We hope that the revised version is clearer on all these points.

      (4) Clarification about the memory requirement for habituation. Figure 4 and the associated section argue for the essential role that the storage mechanism plays in habituation. Indeed, Figure 4a shows that the degree of habituation decreases with decreasing memory. The graph also shows that in the limit of vanishingly small Δ⟨S⟩, the system can still exhibit a finite degree of habituation. Can the authors explain this limiting behavior; specifically, why does habituation not vanish in the limit Δ⟨S⟩ -> 0?

      We apologize for the lack of clarity and we thank the reviewer for spotting this issue. In Figure 4 (now Figure 5 in the revised manuscript) Δ⟨S⟩ is not exactly zero, but equal to 0.15% at the final point. It appeared as 0% in the plot due to an unwanted rounding in the plotting function that we missed. This has been fixed in the revised version, thank you.

      Reviewer #3 (Recommendations for the authors):

      (1) Page 2 | "Figure 1b-e" should be "Figure 1b-d" since there is no panel (e) in Figure 1. 

      (2) Figure 1a | In the top schematic, the symbol "k" is used, while in the rest of the text, the proportionality constant is denoted by κ. 

      We thank the reviewer for pointing this out. Figure 1 has been revised and the panels are now consistent. The proportionality constant (the inhibition strength) has also been fixed.

      (3) Figure 1a | I find the upper part of the schematic for Storage hard to perceive. I understand the lower part stands for the degradation reaction for storage molecules. The upper part stands for the synthesis reaction catalyzed by the readout population. I think the bolded upper arrow would explain it sufficiently well; the left/right arrows, together with the crossed green circle make that part of the figure confusing. Consider simplifying. 

      We decided to remove the left/right arrows, as suggested by the reviewer, as we agree that they were unnecessarily complicating the schematic. We hope that the revised version will be easier to understand.

      (4) Page 3 | It would be helpful to tell what the temporal statistics of the input signal $p_H(h,t)$ is, i.e. <h(t) h(t')>. Looking at the example trajectory in Figure 1a, consecutive signal values do not seem correlated. 

      We agree with the reviewer that this is an important detail and worth mentioning. We now explicitly state that consecutive values are not correlated, for simplicity. 

      (5) Figure 2 | I believe the label "EXTERNAL INPUT" refers to the *average* external input, not one specific realization (similar to panels (d) and (e) that report on average metrics). I suggest you indicate this in the label, or, what may be even better, add one particular realization of the stochastic input to the same graph.

      We thank the reviewer for spotting this. We now write that what we show is the average external signal. We prefer this solution rather than showing a realization of the stochastic input, since it is more consistent with the rest of the plots, where we always show average quantities. We also note that Figure 2 is now Figure 3 in the revised manuscript.

      (6) Figure 2d | The expression of Δ⟨U⟩ is the negative of the definition in Eq. (5). It should be corrected. 

      In the revised version, both the definitions in Figure 2 (now Figure 3) and in the text (now Eq. (11)) are consistent.

      (7) Figure 3(d-e) caption | "where ⟨U⟩ starts to be significantly smaller than zero." There, it should be Δ⟨U⟩ instead of ⟨U⟩. 

      Thanks again, we corrected this typo.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      (1) Wnt3 cue and global PCP. PCP has been described in detail in a previous paper on Clytia (Momose et al, 2012): its orientation along the oral-aboral body axis (ciliary basal body positioning studies), and its function in directional polarity during gastrulation (Stbm-, Fz1-, and Dsh-MO experiments). I wonder if this part could be shortened. What is new, however, are the knockdown and Wnt3-mRNA rescue experiments, which provide a deeper insight into the link between Wnt3 function in the blastopore organiser as a source or cue for axis formation. These experiments demonstrate that the Wnt3 knockdown induces defects equivalent to PCP factor knockdown, but can be rescued by Wnt3-mRNA injection, even at a distance of 200 µm away from the Wnt-positive area. The experimental set-up of these new molecular experiments follows in important aspects those of Freeman's experiments of 1981 (who in turn was motivated to re-examine Teissier's work of 1931/1933 ...). Freeman did not use the term "global polarity" but the concept of an axis-inducing source and a long-range tissue polarity can be traced back to both researchers.

      We appreciate the reviewer’s insightful comments on evolutionary biology and cnidarian developmental biology.

      Concerning the presentation of the basic PCP structure of Clytia embryo epidermal cells, we prefer to retain this section unless there is a strict limit on manuscript length. These experiments provide background information necessary to establish the biological system for the readers. The structures of cells, notably cell adhesion, cilia, and the cytoskeleton, are essential components of this system.

      We have restored sentences concerning the historical contributions of Freeman and Teissier from a previous version of the manuscript.

      Freeman’s work offered two key insights. The first is the concept that cell polarity spreads and self-organizes over distances, as revealed by the tissue orientation of aggregated embryonic cells (Freeman, 1981 https://doi.org/10.1007/BF00867804), which was termed “global polarity” in a review by Primus and Freeman (2004 https://doi.org/10.1002/bies.20031). This concept closely resembles the modern understanding of PCP coordination mechanisms mediated by core PCP interactions. Remarkably, Freeman proposed this idea in the early 1980s, at the same time as the first characterization of PCP mutants in Drosophila (Gubb and Garcia-Bellido 1982). The second is the role of egg polarity in defining the axis. Through a series of sophisticated experiments, Freeman demonstrated that the position of the first cleavage furrow predicts the oral-aboral axis. This was a milestone in the study of cnidarian body axis development.

      However, some of Freeman’s interpretations were misleading. In the 1981 paper, he stated:

      "Polarity

      Other work that I have done has established that the anterior-posterior axis of the planula is set up at the time of the first cleavage; the site where cleavage is initiated specifies the posterior pole of this axis (Freeman 1980). The experiment reported here in which embryos were cut into halves and each half regulated to form a normal planula with the same polarity properties as the embryo it is from provides evidence that these polarity properties are remarkably stable at all developmental stages tested ranging from 4 cell to postgastrula embryos. "

      Freeman hypothesised that cell polarity at the 2- or 4-cell stage, referred to as the “polarity of first cell cleavage,” is directly inherited as the global polarity observed in later developmental stages.

      In the review by Primus and Freeman (2004), two hypotheses were introduced: (1) maternally localised factors, such as mRNA, determine the axis, and (2) the cell polarity of cleavage furrow formation is inherited by later stages and determines the axis. Freeman described these two hypotheses as mutually exclusive. However, we now know that cell polarity at early cleavage stages does not directly contribute to global polarity/PCP. Instead, Wnt/β-catenin signaling is regionally activated by maternally localised mRNAs distributed along the egg polarity (Momose, 2007; Momose, 2008), which maintain Wnt3 localisation and direct morphological axis patterning. The study presented in this article unifies these hypotheses.

      On the second point, as the reviewer noted, Freeman indeed revisited the work of Georges Teissier (Teissier, 1931), who conducted similar experiments on Amphisbetia embryos. It was Teissier who first described how the egg polarity is preserved in later stages and defines the axis. Teissier, however, carefully avoided asserting continuity between egg and blastula polarities, allowing for the possibility of “rétablissement” (re-establishment). As Teissier stated:

      "…On constate, en second lieu, que la polarité de l’œuf se conserve dans chacun de se fragment et que le maintien ou le rétablissement de cette polarité sont indispensables à un développement normal. Un fragment d’œuf ou de morula n’a aucune partie ni aucun blastomère qui soit rigoureusement déterminé comme endoderme, mais possède, par contre, un pôle antérieur et un pôle postérieur bien définis.…

      Mais cette proposition, qui ne semble pourtant guère dépasser la simple constatation des faits, soulève de grave difficulté. Elle donne en effet à la polarité, propriété encore bien mystérieuse, un rôle morphogénétique de premier ordre et implique des conséquences trop importantes pour qu’on puisse l’accepter sans un très sérieux examen.

      Comme je ne pense pas que les questions relatives à la nature des localisation germinales, à l’existence et au fonctionnement des organisateurs de l’œuf des Cœlentérés, puissant, dans l’état actuel de nos connaissances, être discutées utilement, je ne veux voir dans la proposition précédente qu’une façons commode et tout provisoire de systématiser les faits."

      English translation:

      “We note also that the polarity of the egg is preserved in each fragment and that the maintenance or re-establishment of this polarity is essential for normal development. A fragment of egg or morula has no part or blastomere that is rigorously determined as endoderm, but has, on the other hand, a well-defined anterior and posterior pole....

      But this proposition, which hardly seems to go beyond the simple observation of facts, raises serious difficulties. It gives polarity, still a mysterious property, a morphogenetic role of the first order, and implies consequences too important to be accepted without very serious examination.

      As I do not believe that questions concerning the nature of germinal localisation, or the existence and functioning of the egg organisers in Coelenterates, can, in the present state of our knowledge, be usefully discussed, I prefer only to see in the foregoing proposition a convenient and very provisional way of systematising the facts.”

      Teissier, G. (1931). Étude Expérimentale du Développement de Quelques Hydraires. Ann. Sc. Nat. Zool 14, 5–59.

      Teissier's interpretation and caution were reasonable.

      Our work connects recent molecular research on axis specification mechanisms in cnidarians with the classic experimental studies of Freeman and Teissier. We believe it is essential to present and acknowledge their conceptual contributions.  We have updated the Discussion to include these points.

      (2) PCP propagation and β-catenin. The central but unanswered question in this study focuses on the interaction between Wnt3 and PCP and the propagation of PCP. Wnt3 has been described in cnidarians but also in vertebrates and insects as a canonical Wnt interacting with β-catenin in an autocatalytic loop. The surprising result of this study is that the action of Wnt3 on PCP orientation is not inhibited in the presence of a dominant-negative form of CheTCF (dnTCF) ruling out a potential function of β-catenin in PCP. This was supported by studies with constitutively active β-catenin (CA-β-cat) mRNA which was unable to restore PCP coordination nor elongation of Wnt3-depleted embryos but did restore β-catenin-dependent gastrulation. Based on these data, the authors conclude that Wnt3 has two independent roles: Wnt/β-catenin activation and initial PCP orientation (two-step model for PCP formation). However, the molecular basis for the interaction of Wnt3 with the PCP machinery and how the specificity of Wnt3 for both pathways is regulated at the level of Wnt-receiving cells (Fz-Dsh) remain unresolved. Also, with respect to PCP propagation, there is no answer with respect to the underlying mechanisms. The authors found that PCP components are expressed in the mid-blastula stage, but without any further indication of how the signal might be propagated, e.g., by a wavefront of local cell alignment. Here, it is necessary to address the underlying possible cellular interactions more explicitly.

      The question of how Wnt3 interacts with the core PCP complex remains open for future investigation. An obvious hypothesis is that one of the Frizzled receptors binds Wnt3 ligands. For additional details, please refer to the response to Reviewer 2’s comment. Regarding other non-classic Wnt receptors, studies in the developing mouse limb have demonstrated that a Wnt5a gradient controls PCP polarisation via ROR receptors and graded Strabismus phosphorylation (Gao et al., 2011, https://doi.org/10.1016/j.devcel.2011.01.001). However, in this context, the Wnt5a gradient influences the frequency of polarised cells rather than PCP orientation. In Clytia, we performed gene knockdown experiments targeting ROR and RYK receptors using Morpholinos but did not observe any effect on axial patterning, suggesting that these receptors are unlikely to be involved in Wnt3 interaction.

      Concerning PCP propagation mechanisms, these are well-characterized in both Drosophila and vertebrates and conserved across taxa. The localised Fz-Fmi complex at the apical cortex of a cell interacts with the oppositely localised Stbm-Fmi complex in neighbouring cells, enabling coordination of PCP between directly adjacent cells. This interaction provides a comprehensive explanation for PCP propagation mechanisms. In Drosophila, the “domineering non-autonomy” effect is a well-documented phenomenon where PCP orientation autonomously propagates from core PCP mutant mosaic patches. Overall, PCP propagation is a conserved and robust mechanism across metazoans.

      (3) The proposed two-step model for PCP formation has important evolutionary implications in that it excludes the current alternate model according to which a long-range Wnt3-gradient orients PCP ("Wnt/β-catenin-first"). Nevertheless, the initial PCP orientation by Wnt3 - as proposed in the two-step model - is not explained at all on the molecular level. Another possible, but less well-discussed and studied option for linking Wnt3 with PCP action could be the role of other Wnt pathways. The authors convincingly show that Wnt3 is the most highly expressed Wnt in Clytia at all stages of development (Figure S1). However, Wnt7 is also more highly expressed, which makes it a candidate for signal transduction from canonical Wnts to PCP Wnts. An involvement of Wnt7 in PCP regulation has been described in vertebrates (http://dx.doi.org/10.1016/j.celrep.2013.12.026). This would challenge the entire discussion and speculation on the evolutionary implications according to which PCP Wnt signaling comes first ("PCP-first scenario") and canonical Wnt signaling later in metazoan evolution.

      First of all, we apologise that the expression profile of Wnt7 originally provided in Figure S1 was incorrect; Wnt7 is not expressed in the embryonic stage. The error arose because the accession number XLOC_034538 is assigned to two transcripts, Wnt7 and Ataxin10, in the published genome assembly. Once the expression profile is revised in this light, the data are consistent with the in situ hybridisation data published in Momose et al. (2012, https://doi.org/10.1242/dev.084251). Wnt3 is the only Wnt ligand detectable between egg and gastrula stages. We appreciate the reviewer highlighting this issue and have corrected Figure S1.

      If we understand correctly, the reviewer raises the possibility that Wnt3's downstream canonical Wnt/β-catenin pathway activates the expression of other Wnt genes, which in turn orient the PCP. Indeed, we showed that the expression of Wnt1 (previously called WntX2), Wnt2 (WntX1A), Wnt5 and Wnt6 (Wnt9) all becomes undetectable at the planula stage following Wnt3-MO injection (Momose et al., 2012). So, it is a reasonable concern.

      This possibility can be excluded because the canonical pathway activation by CA-β-cat does not restore PCP in Wnt3-MO-injected embryos and Wnt3 can orient PCP without Wnt/β-catenin pathway activity in the presence of dominant negative TCF (dnTCF). Concerning Wnt1b and Wnt11b, these transcripts are maternally stored and even more abundant than Wnt3. However, we can conclude that these do not have any role in axis patterning based on the complete axis loss in Wnt3-MO morphants.

      Lastly, it should of course be remembered that the chronological order of characters appearing in a developmental process does not necessarily reflect their appearance in evolution from ancestral to modern.

      (4) The discussion, including Figure 6, is strongly biased towards the traditional evolutionary scenario postulating a choanozoan-sponge ancestry of metazoans. Chromosome-linkage data of pre-metazoans and metazoans (Schulz et al., 2023; https://doi.org/10.1038/s41586-023-05936-6) now indicate a radically different scenario according to which ctenophores represent the ancestral form and are sister to sponges, cnidarians and bilaterians (the Ctenophora-sister hypothesis). This has also implications for the evolution of Wnt signalling, as discussed in the recent Nature Genetics Review by Holzem et al. (2024) (https://doi.org/10.1038/s41576-024-00699-w). Furthermore, it calls into question the hypothesis of a filter-feeding multicellular gastrula-like ancestor as proposed by Haeckel (Maegele et al., 2023). These papers have not yet been referenced, but they would provide a more robust discussion.

      I overlooked the excellent work of Holzem and colleagues. I appreciate this suggestion. The work, unfortunately, focusses mainly on the Wnt/β-catenin pathway. The PCP pathway consists of not only core PCP (Fmi, Stbm, Pk, Dgo, Fz and Dsh) but many other components, such as Rho GTPase, which are all dealt with as "PCP” in this review. While the full set of core PCP components is present only in the phylum Cnidaria and bilaterians, Pk and Dgo are present in choanoflagellates, and Rho GTPase or ROCK are present even in Fungi (Lapébie et al., 2011 DOI 10.1002/bies.201100023). Holzem et al. described PCP as absent in ctenophores, likely based on the lack of Fmi/Stbm, while claiming its presence in fungi based on Rho GTPase and ROCK. This led to their argument that the Wnt/β-catenin pathway is more ancestral, supported by the absence of PCP components in ctenophores alongside the ctenophore-sister hypothesis.

      This likely reflects the limited attention given to PCP in the metazoan evolutionary biology community. Our work sheds light on the importance of PCP regulation in metazoan evolution. In the revised Discussion, we emphasise this point together with the importance of cell biology studies in basal metazoans and compare them based on functional studies.

      The observation of Aiptasia’s predatory “gastrula-like” larvae is indeed fascinating. Understanding how early metazoan ancestors obtained nutrients is key to uncovering the origins of metazoans. However, the relevance of this work to metazoan evolution remains unclear. Predatory nutrient uptake is common among cnidarians, and the findings of Maegele et al. could suggest that, within Cnidaria, the predatory gastrula-like state is ancestral and the symbiotic state derived, but they do not notably support this for Metazoa as a whole. It also has to be clarified how predation is defined. Fundamentally, there is little distinction between filter-feeding and predatory feeding regarding heterotrophy; both feeding types require digestive machinery. If active feeding behaviour is the essence of predation, this would be better addressed as an evolutionary neurobiology or neuroscience question. Another mystery is what the metazoan ancestors took as food if they were predatory; a non-predatory metazoan would already have had to be present as food before them.

      Overall, Maegele’s work seems premature to be incorporated into the metazoan evo-devo discussion. In either case, the standard approach would involve comparative studies across taxa. It will be interesting to see follow-up work on comparative and functional genomics of predatory/digestive machinery within the phylum Cnidaria and across metazoans, including sponges and ctenophores.

      Reviewer #2 (Recommendations for the authors):

      We appreciate the reviewer’s expertise and recommendations regarding Wnt and PCP signalling. It would be our great pleasure if our work is seen and referenced by the cell biology community using model animals.

      (1) According to the 2-step model, one would expect that there is a temporal gradient in the spreading of the PCP from oral to aboral. Is there any indication for this?

      The best indication of a spatial and temporal gradient of PCP establishment observed so far is at the blastula stage (Fig.2B). PCP gradually becomes coordinated starting at 9 hpf, when PCP is slightly better organised close to the Wnt3-positive area (oral) compared to distal (aboral) areas. We did live imaging with tagged Poc1 to track the positions of centrioles in each cell (Fig. 2E), but this did not provide any further information about the spreading of the PCP. We hypothesise that there is a delay between PCP polarisation—established through the subcellular localisation of core PCP components—and its structural manifestation as ciliary positioning and orientation. This delay likely varies between cells, preventing the formation of a precise spatial PCP wave. We hope in the future to address this temporal aspect by live-imaging of core PCP proteins labelled with fluorescent proteins.

      (2) PCP is likely to be an all-or-nothing effect, while axial patterning is dose-dependent. Is there a critical dose of Wnt3 level required to kick off the PCP pathway?

      We agree that the PCP phenotype is all-or-nothing. Although we did not perform a quantitative test, we have not seen any intermediate phenotypes in Wnt3-rescue experiments. Under our experimental conditions (100 ng/µl mRNA), Wnt3 mRNA injection into a blastomere consistently restores the body axis (via PCP) of Wnt3-MO injected embryos. No axis restoration was observed at 1 ng/µl. At 10 ng/µl, some embryos showed a restored elongated axis, while others showed no axis. The volume of injection is not precisely controllable and can easily vary two-fold, so we assume the threshold is somewhere around 10 ng/µl. This contrasts with endoderm rescue via Wnt/β-catenin activation by GSK-β-inhibitors (alsterpaullone) or the constitutively active β-catenin (CA-β-cat), which occurs in a dose-dependent manner (e.g., Supplementary Figure S2).

      (3) The key question left unaddressed is whether Wnt3 signals through one or two different Frizzled receptors? Which Frizzled receptors are candidates for this? Could they be knocked down to see which pathway (or both) is affected?

      How Wnt3 orientates the PCP system is an extremely interesting question that needs to be answered, and we plan to address this in the future. In Clytia, four Frizzled genes have been identified in the genome: CheFz1 (vertebrate counterpart of Fz1, 2, 3, 6 and 7), CheFz2 (Fz5 and 8), CheFz3 (Fz9/10) and CheFz4 (Fz4). Knockdown of CheFz1, hereafter called Fz1, by Morpholino showed a PCP phenotype (Momose 2012, supplementary data). For a long time, we have suspected that the most likely candidate for PCP mediation is CheFz1. The Wnt3-rescue experiment in a CheFz1-blocked background (similar experiment to Figure 3E, F) could potentially have answered this question. No PCP orientation would be expected even near the Wnt3-mRNA injected area if CheFz1 was the Wnt3 receptor for PCP orientation. Unfortunately, no reliable PCP phenotype was observed in this experiment, so this experiment was not included in the manuscript. We initially thought this was due to incomplete suppression of CheFz1 mRNA translation by the Morpholino when used at sub-toxic doses. But we now favour the alternative explanation that Fz1 does not mediate the Wnt3 signal responsible for initiating PCP orientation. We have previously shown that Fz1 is required for the Wnt/β-catenin pathway (indicated by nuclear β-catenin localisation; Momose 2007), which is then required to maintain Wnt3 expression. We cannot rule out that the PCP phenotype obtained previously following Fz1 knockdown (supplementary data in Momose 2012) is an indirect effect of Wnt3 downregulation.

      In future work, we plan to test the PCP involvement of the other Clytia Frizzleds, notably CheFz2 and CheFz4, which are not present as maternal mRNAs but are zygotically expressed in the early gastrula stage. CheFz3 is unlikely to be a candidate because it is aborally localised and acts as a negative receptor for the Wnt/β-catenin pathway (Momose 2007). Lastly, in unpublished experiments, no axial phenotype was obtained with ROR and RYK knockdown by Morpholino (T. Momose unpublished). 

      Based on these considerations, our current working hypothesis is that Wnt3 somehow stabilises or activates one of the Frizzled receptors acting as a core PCP protein in a polarised manner, likely at the oral side of each cell (Stbm is localised at the aboral side), which breaks the PCP symmetry and is propagated across the body axis.

      A few lines have been added to the discussion regarding this point.

      (4) Is there also PCP within the Wnt3 expressing domain? In other words, (and linked to question 2), does PCP require a certain concentration of Wnt3 or a gradient of Wnt3 in order to provide an orientation?

      In the context of a simple Wnt3-MO rescue experiment, PCP is coordinated within the Wnt3-positive area. But this could be because PCP can propagate in both orientations, so it does not answer the question. In the Wnt3-rescue experiments in Fmi-MO and Stbm-MO embryos, PCP seemed better oriented close to the boundary between Wnt3-positive and -negative areas, in particular outside the Wnt3-positive area, and rather uncoordinated deep in the middle of the Wnt3-mRNA-positive area.

      If Wnt3 expression is uniform across an embryo, as achieved by Wnt3-mRNA injection into the egg, the axis will be lost entirely (Momose 2008). We interpret these observations as indicating that Wnt3 expression "contrasts" (or steep gradients) act as the PCP orientation cue, rather than Wnt3 acting in a permissive manner.

      In normal development, mRNA expression detected by in situ hybridisation has a slight gradient, but we do not have any information about the endogenous protein distribution.

      We greatly appreciate the reviewer’s insightful comments. A few sentences addressing points (2) and (4) have been added. The graphical models in Figures 4 and 6A have been updated. While these are relatively minor changes to the manuscript, they significantly impact future perspectives.

      Minor comments:

      (1) Labeling in some of the figures is too small and not legible, e.g. Figures 4E-H. Please check and improve.

      Agreed. Some labelling was way too small (2.5 points). This has been corrected. The minimum font size is now 6-point for most labelling in the revised Figures. 

      (2) Page 13: ...and allow us to novel scenarios for PCP-driven axis symmetry breaking... seems to lack the verb "propose"

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      Compelling and clearly described work that combines two elegant cell fate reporter strains with mathematical modelling to describe the kinetics of CD4+ TRM in mice. The aim is to investigate the cell dynamics underlying the maintenance of CD4+TRM.

      The main conclusions are that:

      (1) CD4+ TRM are not intrinsically long-lived.

      (2) Even clonal half-lives are short: 1 month for TRM in skin, and even shorter (12 days) for TRM in lamina propria.

      (3) TRM are maintained by self-renewal and circulating precursors.

      Strengths:

      (1) Very clearly and succinctly written. Though in some places too succinctly! See suggestions below for areas I think could benefit from more detail.

      (2) Powerful combination of mouse strains and modelling to address questions that are hard to answer with other approaches.

      (3) The modelling of different modes of recruitment (quiescent, neutral, division linked) is extremely interesting and often neglected (for simpler neutral recruitment).

      Weaknesses/scope for improvement:

      (1) The authors use the same data set that they later fit for generating their priors. This double use of the same dataset always makes me a bit squeamish as I worry it could lead to an underestimate of errors on the parameters. Could the authors show plots of their priors and posteriors to check that the priors are not overly-influential? Also, how do differences in priors ultimately influence the degree of support a model gets (if at all)? Could differences in priors lead to one model gaining more support than another?

      We now show the priors and posteriors overlaid in Figure S2. The posteriors lie well within the priors, giving us confidence that the priors are not overly influential.

      (2) The authors state (line 81) that cells were "identified as tissue-localised by virtue of their protection from short-term in vivo labelling (Methods; Fig. S1B)". I would like to see more information on this. How short is short term? How long after labelling do cells need to remain unlabelled in order to be designated tissue-localised (presumably label will get to tissue pretty quickly -within hours?). Can the authors provide citations to defend the assumption that all label-negative cells are tissue-localised (no false negatives)?

      And conversely that no label-positive cells can be found in the tissue (no false positives)? I couldn't actually find the relevant section in the methods and Figure S1B didn't contain this information.

      We did describe the in vivo labeling in the first section of Methods (it was for 3 mins before sacrifice). The two aims of Fig S1B were to show the gating strategy (label-positive and negatives from tissue samples were clearly separated) and to address the false-positive issue. Less than 3% of cells in our tissue samples were positive; therefore, at most 3% of truly tissue-resident cells acquired the i.v. label, and likely less. Excluding those (as we did) therefore makes little difference to our analyses in terms of cell numbers. False negative rates are expected to be extremely low; labeling within circulating cells is typically >99% (see refs in Methods).

      (3) Are the target and precursor populations from the same mice? If so is there any way to reflect the between-individual variation in the precursor population (not captured by the simple empirical fit)? I am thinking particularly of the skin and LP CD4+CD69- populations where the fraction of cells that are mTOM+ (and to a lesser extent YFP+) spans virtually the whole range. Would it be nice to capture this information in downstream predictions if possible?

      This is a great point. We do indeed isolate all populations from each mouse. We are very aware of the advantages of using this grouping of information to reduce within-mouse uncertainty – we employ this as often as we can. The issue here was that the label content within the tissue (target) at any time depends on the entire trajectory of the label frequency in the precursor, in that mouse, up to that point. We can’t identify this curve for each animal individually – so we are obliged to use a population average.

      To mitigate this lack of pairing we do take a very conservative approach and fit this empirical function describing the trajectories of YFP and mTom in precursors at the same time as the label kinetics in the target; that is, we account for uncertainty in label influx in our fits and parameter estimates.

      Another issue is that to be sure that we are performing model selection appropriately, we only use the distribution of the likelihood on the target observations when comparing support for different precursors with LOO-IC. If we had been able to pair the precursor and target data in some way, the two would then be entangled and model comparison across precursors would not be possible.

      We’ve added some of this to the discussion.

      (4) In Figure 3, estimates of kinetics for cells in LP appear to be more dependent on the input model (quiescent/neutral/division-linked) than the same parameters in the skin. Can the authors explain intuitively why this is the case?

      This is a nice observation and it has a fairly straightforward explanation. As we pointed out in the paper, estimated rates of self renewal become more sensitive to the mode of recruitment the greater the rate of influx. If immigrants are quiescent, all Ki67 in the tissue has to be explained by self renewal. If all new immigrants are Ki67 high, the estimate of the rate of self renewal within the tissue will be lower. Across the board, the estimated rates of influx into gut were consistently higher than those in skin, and so the sensitivity of parameters to the mode of recruitment was much more obvious at that site.

      The importance of this trade-off for the division linked model can also be seen when you look at the neutral and quiescent models; they give similar parameter estimates because the Ki67 levels within all precursor populations were all less than 25% and so those two modes of recruitment are difficult to distinguish.
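
The trade-off described above can be illustrated with a toy steady-state calculation. This is a minimal sketch and not the model fitted in the paper: it assumes a tissue population of constant size, a per-capita influx rate `influx` (per day), a fraction `p_immigrant_ki67` of immigrants that arrive Ki67-high, and a mean Ki67-high duration of `1/beta` days after division (the 3.5-day default is purely illustrative). All function and parameter names are hypothetical.

```python
def self_renewal_rate(f_ki67, influx, p_immigrant_ki67, beta=1 / 3.5):
    """Division rate (per day) needed to explain a Ki67-high fraction f_ki67.

    Toy model (assumption): constant population size, so loss = influx + division.
    Steady-state balance for Ki67-high cells, per resident cell:
        influx * p + rho * (2 - f) = (beta + influx + rho) * f
    which rearranges to the expression returned below.
    """
    return (f_ki67 * (beta + influx) - influx * p_immigrant_ki67) / (2 * (1 - f_ki67))


def recruitment_spread(f_ki67, influx):
    """Gap between estimates with quiescent (p=0) and Ki67-high (p=1) immigrants."""
    quiescent = self_renewal_rate(f_ki67, influx, p_immigrant_ki67=0.0)
    division_linked = self_renewal_rate(f_ki67, influx, p_immigrant_ki67=1.0)
    return quiescent - division_linked


if __name__ == "__main__":
    # Hypothetical influx rates standing in for a low-influx and a high-influx site.
    for label, influx in [("low influx", 0.02), ("high influx", 0.05)]:
        print(label, round(recruitment_spread(0.2, influx), 4))
```

In this sketch the gap between the two estimates equals `influx / (2 * (1 - f_ki67))`, so it grows in proportion to the influx rate. Setting `p_immigrant_ki67` to a small value, as in the neutral case with precursor Ki67 levels below 25%, gives estimates close to the quiescent case.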

      (5) Can the authors include plots of the model fits to data associated with the different strengths of support shown in Figure 4? That is, I would like to know what a difference in the strength of say 0.43 compared with 0.3 looks like in "real terms". I feel strongly that this is important. Are all the fits fantastic, and some marginally better than others? Are they all dreadful and some are just less dreadful? Or are there meaningful differences?

      This is another good point (and from the author recommendations list, is your most important concern).

      We find that a fairly common issue is that models that are clearly distinguished by information criteria or LRTs can often give visually quite similar fits. Our experience is that this is partly due to the fact that models are usually fit on transformed scales (e.g. log for cell counts, logit for fractions) to normalise residuals, and this uncertainty is compressed when one looks at fits on the observed scale (e.g. linear). Another issue in our case is that for each model (precursor, target, and mode of recruitment) we fit 6 time courses simultaneously. Visual comparisons of fits of different models can then be a little difficult or misleading; apparently small differences in each fitted timecourse can add up to quite significant changes in the combined likelihood. We added this to the Discussion.
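
Both effects mentioned above can be seen in a small numerical sketch. It assumes Gaussian residuals on the logit scale with a known standard deviation, and uses made-up residuals for two models fitted to six time courses; none of these numbers come from the study.

```python
import math


def logit(p):
    """Log-odds transform commonly applied to fractions before fitting."""
    return math.log(p / (1 - p))


def gaussian_loglik(residuals, sigma):
    """Sum of independent Gaussian log-densities for residuals on the fitted scale."""
    const = -0.5 * math.log(2 * math.pi * sigma**2)
    return sum(const - 0.5 * (r / sigma) ** 2 for r in residuals)


# The same 0.01 gap in a labelled fraction is a much larger gap on the logit
# scale when the fraction is near 0 than when it is near 0.5.
gap_near_zero = logit(0.03) - logit(0.02)
gap_near_half = logit(0.51) - logit(0.50)

# Hypothetical residuals (logit scale) for two models across six time courses,
# five observations each. Model B is only slightly worse in every time course.
n_courses, n_obs, sigma = 6, 5, 0.3
resid_a = [[0.20] * n_obs for _ in range(n_courses)]
resid_b = [[0.30] * n_obs for _ in range(n_courses)]

# Log-likelihood advantage of model A in each time course, then combined.
per_course_diff = [
    gaussian_loglik(a, sigma) - gaussian_loglik(b, sigma)
    for a, b in zip(resid_a, resid_b)
]
total_diff = sum(per_course_diff)

if __name__ == "__main__":
    print("logit gap near 0.02:", gap_near_zero)
    print("logit gap near 0.50:", gap_near_half)
    print("per-course log-likelihood difference:", per_course_diff[0])
    print("combined log-likelihood difference:", total_diff)
```

Because the time courses are fitted jointly, the combined difference is the sum of the per-course differences, which is why visually similar fits can still be separated clearly by the likelihood.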

      The number of models is combinatorial (Fig. 4) so showing them all seems a bit cumbersome. But now in the supporting information (Fig. S3), for each target we show the best, second best, and the worst model fits overlaid, to give a sense of the dynamic range of the models we considered. As you will now see, visual differences among the most strongly supported models were not huge (but refer to our point just above). Measures of out-of-sample prediction error (LOO-IC) discriminated between these models reasonably well, though (weights shown in Fig. 4).
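
To give a sense of how LOO-IC differences map onto model weights, the sketch below converts a set of LOO-IC values into Akaike-style (pseudo-BMA) weights. The weighting scheme is an assumption, since this response does not state how the weights in Fig. 4 were computed, and the LOO-IC values and model names are hypothetical.

```python
import math


def looic_weights(looic):
    """Akaike-style weights from LOO-IC values (lower LOO-IC means better).

    Each weight is proportional to exp(-0.5 * (LOO-IC - best LOO-IC)).
    Subtracting the best value first keeps the exponentials well scaled.
    """
    best = min(looic.values())
    raw = {name: math.exp(-0.5 * (value - best)) for name, value in looic.items()}
    total = sum(raw.values())
    return {name: r / total for name, r in raw.items()}


# Hypothetical LOO-IC values for three candidate models of one target population.
example = {"model_A": 412.0, "model_B": 412.7, "model_C": 420.0}
weights = looic_weights(example)

if __name__ == "__main__":
    for name, w in weights.items():
        print(name, round(w, 3))
```

Under this assumed scheme, two models whose weights are in a ratio of 0.43 to 0.30 would differ in LOO-IC by roughly 0.7, a small gap that is consistent with their fits looking similar by eye.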

      It’s also worth mentioning here that we have substantially greater confidence in the identity of the precursors than in the precise modes of recruitment - you can see this clearly in the groupings of weights in Figure 4A. We did comment on this in the text but now emphasise it more.

      (6) Figure 4 left me unclear about exactly which combinations of precursors and targets were considered. Figure 3 implies there are 5 precursors but in Figure 4A at most 4 are considered. Also, Figure 4B suggests skin CD69- were considered a target. This doesn't seem to be specified anywhere.

      Thanks for pointing this out. When we were considering CD4+ EM in bulk as target, this population includes CD69- cells; in those fits, therefore, we couldn't use CD69- as a precursor. We now clarify this in the caption. Thanks also for the observation about Figure 4B; we didn’t consider CD69- cells as a target, so we’ve also made that clearer.

      Reviewer #2 (Public review):

      This manuscript addresses a fundamental problem of immunology - the persistence mechanisms of tissue-resident memory T cells (TRMs). It introduces a novel quantitative methodology, combining the in vivo tracing of T-cell cohorts with rigorous mathematical modeling and inference. Interestingly, the authors show that immigration plays a key role in maintaining CD4+ TRM populations in both skin and lamina propria (LP), with LP TRMs being more dependent on immigration than skin TRMs. This is an original and potentially impactful manuscript. However, several aspects were not clear and would benefit from being explained better or worked out in more detail.

      (1) The key observations are as follows:

      a) When heritably labeling cells due to CD4 expression, CD4+ TRM labeling frequency declines with time. This implies that CD4+ TRMs are ultimately replenished from a source not labeled, hence not expressing CD4. Most likely, this would be DN thymocytes.

      That’s correct.

      b) After labeling by Ki67 expression, labeled CD4+ TRMs also decline - This is what Figure 1B suggests. Hence they would be replaced by a source that was not in the cell cycle at the time of labeling. However, is this really borne out by the experimental data (Figure 2C, middle row)? Please clarify.

      (2) For potential source populations (Figure 2D): Please discuss these data critically. For example, CD4+ CD69- cells in skin and LP start with a much lower initial labeling frequency than the respective TRM populations. Could the former then be precursors of the latter?

      A similar question applies to LN YFP+ cells. Moreover, is the increase in YFP labeling in naïve T cells a result of their production from proliferative thymocytes? How well does the quantitative interpretation of YFP labeling kinetics in a target population work when populations upstream show opposite trends (e.g., naïve T cells increasing in YFP+ frequency but memory cells in effect decreasing, as, at the time of labeling, non-activated = non-proliferative T cells (and hence YFP-) might later become activated and contribute to memory)?

      These are good (and related) points. We've added some text to the discussion, paragraphs 2 and 3; we reproduce it here, slightly expanded.

      Fig 1B was a schematic but did faithfully reflect the impact of any waning of YFP in a precursor on its kinetics in the targets. However, in our experiments, as you noted, the kinetics of YFP in most of the precursor populations were quite flat. This was due in part to memory subsets being sustained by the increasing levels of YFP within naïve cells from the cohort of thymocytes labeled during treatment. There is also likely some residual permanent labeling of lymphocyte progenitor populations. We discussed this in Lukas Front Imm 2023. (The latter is not a problem; all that matters for our analysis is that we generate a reasonable empirical description of the label kinetics in naive cells, however it arises). YFP is therefore not cleanly washed out in the periphery; and so for models with circulating memory as the tissue precursor, the flatness of their YFP curves leads to rather flat curves in the tissues.

      The mTom labelling was more informative as it was clearly diluted out of all peripheral populations by mTom-negative descendants of thymically-derived cells, as you point out in (a).

      Regarding (2), re: interpreting the initial levels of labels in precursors and targets. The important point here is that YFP and mTom were induced quickly in all populations we studied; therefore our inferences regarding precursors and targets aren’t informed by the initial levels of label in each. (Imagine a slow precursor feeding a rapidly dividing target; YFP levels in the former would start lower than those in the latter). The causal issue that we think you’re referring to would matter if one expects the targets to begin with no label at all; for instance, in our busulfan chimeric mouse model (e.g. Hogan PNAS 2015) new, thymically derived ‘labelled’ (donor) cells progressively infiltrate replete ‘unlabelled’ (host) populations. In that case, one can immediately reject certain differentiation pathways by looking at the sequence of accrual of donor cells in different subsets.

      The trends in YFP and mTom frequencies after treatment do matter for pathway inference, though, because precursor kinetics must leave an imprint on the target. For the case you mentioned, with opposite trends in label kinetics, such models would be unlikely to be supported strongly; indeed, we never saw strong support for naïve cells (strongly increasing YFP) as a direct precursor of TRM (fairly flat).

      We’ve added a condensed version of this to the Discussion.

      (3) Please add a measure of variation (e.g., suitable credible intervals) to the "best fits" (solid lines in Figure 2).

      Added.

      (4) Could the authors better explain the motivation for basing their model comparisons on the Leave-One-Out (LOO) cross-validation method? Why not use Bayesian evidence instead?

      Bayes factors are very sensitive to priors and are either computationally unstable if calculated with importance sampling methods, or very expensive to calculate if one uses the more stable bridge sampling method. (We also note that fitting just a single model here takes a substantial amount of time). Further, using BF can be unreliable unless one of the models is close to the 'true' data generating model; though they seem to work well, we can be sure that none of our models are! For us, a more tractable and real-world selection criterion is based on the usefulness of a model, for which predictive performance is a reasonable proxy. In this case the mean out-of-sample prediction error (which LOO-IC reflects) is a well-established and valid means of ascribing support to different models.
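      As a rough illustration of how LOO-IC values can be turned into relative support, the sketch below computes Akaike-style (pseudo-BMA-like) weights. This is an assumption made for illustration: the response does not state which weighting scheme underlies the weights in Fig. 4 (stacking is another common choice), the function name `looic_weights` is hypothetical, and the LOO-IC values are invented.

```python
import numpy as np

def looic_weights(looic):
    """Convert LOO-IC values (lower is better) into normalised model weights."""
    looic = np.asarray(looic, dtype=float)
    delta = looic - looic.min()      # difference from the best-supported model
    raw = np.exp(-0.5 * delta)       # LOO-IC is on the deviance (-2 * elpd) scale
    return raw / raw.sum()           # normalise so the weights sum to one

# Hypothetical LOO-IC values for four candidate models (not from the manuscript).
example_looic = [412.3, 414.1, 420.8, 431.5]
weights = looic_weights(example_looic)

for value, weight in zip(example_looic, weights):
    print(f"LOO-IC = {value:6.1f}  weight = {weight:.3f}")
```

      Because the weights depend only on differences in LOO-IC, a gap of a few units between two models already shifts most of the weight to the better one, without any dependence on prior model probabilities.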

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Summary: 

      Wang et al. identify Hamlet, a PR-containing transcription factor, as a master regulator of reproductive development in Drosophila. Specifically, the fusion between the gonad and genital disc is necessary for the development of continuous testes and seminal vesicle tissue essential for fertility. To do this, the authors generate novel Hamlet null mutants by CRISPR/Cas9 gene editing and characterize the morphological, physiological, and gene expression changes of the mutants using immunofluorescence, RNA-seq, cut-tag, and in-situ analysis. Thus, Hamlet is discovered to regulate a unique expression program, which includes Wnt2 and Tl, that is necessary for testis development and fertility. 

      Strengths: 

      This is a rigorous and comprehensive study that identifies the Hamlet-dependent gene expression program mediating reproductive development in Drosophila. The Hamlet transcription targets are further characterized by Gal4/UAS-RNAi confirming their role in reproductive development. Finally, the study points to a role for Wnt2 and Tl as well as other Hamlet transcriptionally regulated genes in epithelial tissue fusion. 

      We appreciate that the reviewer thinks our study is rigorous.

      Weaknesses: 

      The image resolution and presentation of figures is a major issue in this study. As a nonexpert, it is nearly impossible to see the morphological changes as described in the results. Quantification of all cell biological phenotypes is also lacking therefore reducing the impact of this study to those familiar with tissue fusion events in Drosophila development. 

      In the revised version, we have improved the image presentation and resolution. For all the images with more than two channels, we included single-channel images, changed the green color to lime and the red to magenta, and highlighted the testis (TE) and seminal vesicles to make morphological changes more visible.

      We had quantification for marker gene expression in the original version, and have now also included quantification for cell biological phenotypes, which generally show 100% penetrance.

      Reviewer #2 (Public review): 

      Strengths: 

      Wang and colleagues successfully uncovered an important function of the Drosophila PRDM16/PRDM3 homolog Hamlet (Ham) - a PR domain-containing transcription factor with known roles in the nervous system in Drosophila. To do so, they generated and analyzed new mutants lacking the PR domain, and also employed diverse preexisting tools. In doing so, they made a fascinating discovery: They found that PR-domain containing isoforms of ham are crucial in the intriguing development of the fly genital tract. Wang and colleagues found three distinct roles of Ham: (1) specifying the position of the testis terminal epithelium within the testis, (2) allowing normal shaping and growth of the anlagen of the seminal vesicles and paragonia and (3) enabling the crucial epithelial fusion between the seminal vesicle and the testis terminal epithelium. The mutant blocks fusion even if the parts are positioned correctly. The last finding is especially important, as there are few models allowing one to dissect the molecular underpinnings of heterotypic epithelial fusion in development. Their data suggest that they found a master regulator of this collective cell behavior. Further, they identified some of the cell biological players downstream of Ham, like for example E-Cadherin and Crumbs. In a holistic approach, they performed RNAseq and intersected them with the CUT&TAG-method, to find a comprehensive list of downstream factors directly regulated by Ham. Their function in the fusion process was validated by a tissue-specific RNAi screen. Meticulously, Wang and colleagues performed multiplexed in situ hybridization and analyzed different mutants, to gain a first understanding of the most important downstream pathways they characterized, which are Wnt2 and Toll. 

      This study pioneers a completely new system. It is a model for exploring a process crucial in morphogenesis across animal species, yet not well understood. Wang and colleagues not only identified a crucial regulator of heterotypic epithelial fusion but took on the considerable effort of meticulously pinning down functionally important downstream effectors by using many state-of-the-art methods. This is especially impressive, as the dissection of pupal genital discs before epithelial fusion is a time-consuming and difficult task. This promising work will be the foundation future studies build on, to further elucidate how this epithelial fusion works, for example on a cell biological and biomechanical level. 

      We appreciate that the reviewer thinks our study is original and important.

      Weaknesses: 

      The developing testis-genital disc system has many moving parts. Myotube migration was previously shown to be crucial for testis shape. This means, that there is the potential of non-tissue autonomous defects upon knockdown of genes in the genital disc or the terminal epithelium, affecting myotube behavior which in turn affects fusion, as myotubes might create the first "bridge" bringing the epithelia together. The authors clearly showed that their driver tools do not cause expression in myoblasts/myotubes, but this does not exclude non-tissue autonomous defects in their RNAi screen. Nevertheless, this is outside the scope of this work. 

      We thank the reviewer for considering non-tissue autonomous defects upon gene knockdown. The driver, hamRSGal4, drives reporter gene expression mainly in the RS epithelia, but we did observe weak expression of the reporter in the myoblasts before they differentiate into myotubes. Thus, we could not rule out a non-tissue autonomous effect in the RNAi screen. We have therefore included a statement in the Results: “Given that the hamRSGal4 driver is highly expressed in the TE and SV epithelia, we expect highly effective knockdown occurs only in these epithelial cells. However, hamRSGal4 also drives weak expression in the myoblasts before they differentiated into myotubes (Supplementary Fig. 5B), which may result in a non-tissue autonomous effect when knocking down the candidate genes expressed in myoblasts.”

      However, one point that could be addressed in this study: the RNAseq and CUT&TAG experiments would profit from adding principal component analyses, elucidating similarities and differences of the diverse biological and technical replicates. 

      Thanks for the suggestion. We have now included the PCA analyses in supplementary figure 6A-B and the corresponding description in the text. The PCA graphs validated the consistency between biological replicates of the RNA-seq samples. The Cut&Tag graphs confirm the consistency between the two biological replicates from the GFP samples, but show a higher variability between the w1118 replicates. Importantly, we only considered the overlapping peaks pulled down by the GFP antibody from the ham_GFP genotype and the Ham antibody from the wildtype (w1118) sample as true Ham binding sites.
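      For readers unfamiliar with this kind of replicate check, the following is a minimal sketch of a PCA on a samples-by-genes matrix. Everything in it is an assumption for illustration: the data are synthetic log-expression values for two hypothetical conditions with three replicates each, and the helper `pca` is not the pipeline used for Supplementary Fig. 6.

```python
import numpy as np

def pca(matrix, n_components=2):
    """PCA via SVD. Rows are samples, columns are features (e.g. genes)."""
    centered = matrix - matrix.mean(axis=0)            # centre each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]    # sample coordinates on the PCs
    explained = (s ** 2) / np.sum(s ** 2)              # fraction of variance per PC
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
n_genes = 500

# Synthetic log-expression values: two hypothetical conditions, three replicates each.
baseline = rng.normal(5.0, 1.0, n_genes)
shift = np.zeros(n_genes)
shift[:50] = 3.0                                       # 50 genes differ between conditions
control = baseline + rng.normal(0, 0.2, (3, n_genes))
mutant = baseline + shift + rng.normal(0, 0.2, (3, n_genes))
samples = np.vstack([control, mutant])

scores, explained = pca(samples)
print("PC1 scores:", np.round(scores[:, 0], 2))
print("variance explained:", np.round(explained, 3))
```

      In this toy setting, replicates of the same condition fall close together on PC1 while the two conditions separate, which is the pattern one looks for when judging replicate consistency.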

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors): 

      Major Concern: 

      (1) The image resolution and presentation of figures (Figures 2, 5, 6, and 7) is a major issue in this study. As a non-expert, it is nearly impossible to see the morphological changes as described in the results. Images need to be captured at higher resolution and zoomed in with arrows denoting changes as described. Individual channels, particularly for intensity measurement, need to be shown in black and white in addition to merged images. Images also need to be pseudo-colored for color-blind individuals (i.e. no red-green staining).

      The images were captured at a high resolution, but somehow the resolution was dramatically reduced in the BioRxiv PDF. We tried to overcome this by directly submitting the PDF in the eLife submission system. In the revised version, we have included single-channel images and changed the green and red colors to lime and magenta for color blindness. We also highlighted the testis (TE) and seminal vesicle structures in the images to make morphological changes more visible.

      (2) The penetrance of morphological changes observed in RT development is also unclear and needs to be rigorously quantified for data in Figures 2, 5, and 7. 

      We have now included quantification for cell biological phenotypes, which generally show 100% penetrance. The percentage penetrance and the number of animals used are indicated in each corresponding image.

      Reviewer #2 (Recommendations for the authors): 

      Major Points 

      (1) Lines 193- 220 I would strongly suggest pointing out the obvious shape defects of the testes visible in Figure 2A ("Spheres" instead of "Spirals"). These are probably a direct consequence of a lack in the epithelial connection that myotubes require to migrate onto the testis (in a normal way) as depicted in the cartoons, allowing the testis to adopt a spiral shape through myotube-sculpting (Bischoff et al., 2021), further confirming the authors' findings! 

      Good point. In the revised text, we have added more description of the testis shape defects and pointed out a potential contribution from compromised myotube migration.   

      (2) Line 216: "Often separated from each other". Here it would be important to mention how often. If the authors cannot quantify that from existing data, I suggest carrying it out in adult/pharate adult genital tracts (if there is no strong survivor bias due to the lethality of stronger affected animals), as this is much easier than timing prepupae. This should be a quick and easy experiment. 

      Because it is hard to tell whether the separation of the SV and TE was caused by developmental defects or by technical issues (poor dissection), we have changed the description to, “control animals always showed connected TE and SV, whereas ham mutant TE and SV tissues were either separated from each other, or appeared contacted but with the epithelial tubes being discontinuous (Fig. 2B).” Additionally, we quantified the disconnection phenotype, which shows 100% penetrance in 18 mutant animals. This quantification is now included in the figure.

      (3) Lines 289-305, Figure 3. I could only find how many replicates were analyzed in the RNAseq/CUT&Tag experiments in the Material & Methods section. I would add that at least in the figure legends, and perhaps even in the main text. Most importantly, I would add a Principal Component Analysis (one for RNAseq and one for the CUT&TAG experiment), to demonstrate the similarity of biological replicates (3x RNaseq, 4x Cut&Tag) but also of the technical replicates (RNAseq: wt & wt/dg, ham/ham & ham/df, GD & TE; CUT&TAG: Antibody & GFP-Antibody, TG&TE...). This should be very easy with the existing data, and clearly demonstrate similarities & differences in the different types of replicates and conditions. 

      Principal component analysis and its description are now added to Supplementary Fig 6 and the main text respectively.

      (4) Line 321; Supplementary Table 1: In the table, I cannot find which genes are down- or upregulated - something that I think is very important. I would add that, and remove the "color" column, which does not add any useful information. 

      In Supplementary table 1, the first sheet includes upregulated genes while the second sheet includes downregulated genes. We removed the column “color” as suggested.  

      (5) Line 409: SCRINSHOT was carried out with candidate genes from the screen. One gene I could not find in that list was the potential microtubule-actin crosslinker shot. If shot knockdown caused a phenotype, then I would clearly mention and show it. If not, I would mention why a shot is important, nonetheless. 

      shot is one of the candidate target genes selected from our RNA-seq and Cut&Tag data. However, in the RNAi screen, knocking down shot with the available RNAi lines did not cause any obvious phenotype. This could be due to inefficient RNAi knockdown or redundancy with other factors. We nevertheless wanted to examine the shot expression pattern in the developing RS, given the important role of shot in epithelial fusion (Lee S., 2002). Using SCRINSHOT, we could detect epithelial-specific expression of shot, implying its potential function in this context. We have now revised the text to clarify this point.

      Minor points 

      (1) Cartoons in Figure 1: The cartoons look like they were inspired by the cartoon from Kozopas et al., 1998 Fig. 10 or Rothenbusch-Fender et al., 2016 Fig 1. I think the manuscript would greatly profit from better cartoons, that are closer to what the tissue really looks like (see Figure 1H, 2G), to allow people to understand the somewhat complicated architecture. The anlagen of the seminal vesicles/paragonia looks like a butterfly with a high columnar epithelium with a visible separation between paragonia/seminal vesicles (upper/lower "wing" of the "butterfly"). Descriptions like "unseparated" paragonia/seminal vesicle anlagen, would be much easier to understand if the cartoons would for example reflect this separation. It would even be better to add cartoons of the phenotypic classes too, and to put them right next to the micrographs. (Another nitpick with the cartoons: pigment cells are drastically larger and fewer in number (See: Bischoff et al., 2021 Figure 1E & MovieM1).) 

      Thanks for the suggestion. We have updated Figure 1 by adding additional illustrations showing the accessory gland and seminal vesicle structures in the pupal stage and changing the size of pigment cells.

      (2) Line 95-121 I would also briefly introduce PR domains, here. 

      We have added a brief description of the PR domains.

      (3) Line 152, 158, 160, 162. When first reading it, I was a bit confused by the usage of the word sensory organ. I would at least introduce that bristles are also known as external mechanosensory organs. 

      We have now revised the description to “mechano-sensory organ”.

      eg. Line 184, 194, and many more. Most times, the authors call testis muscle precursors "myoblasts". This is correct sometimes, but only when referring to the stage before myoblast-fusion, which takes place directly before epithelial fusion (28 h APF). Post-myoblast-fusion (eg. during migration onto the testis), these cells should be called myotubes or nascent myotubes, as the fly muscle community defined the term myoblast as the single-nuclei precursors to myotubes.

      We have now revised the description accordingly.  

      (4) Line 217/Figure 2B. It looks like there is a myotube bridge between the testis and the genital disc. I would point that out if it's true. If the authors have a larger z-stack of this connection, I suggest creating an MIP, and checking if there are little clusters of two/three/four nuclei packed together. This would clearly show that the cells in between are indeed myotubes (granted that loss of ham does not introduce myoblast-fusion-defects). 

      We do not have a Z-stack of this connection, and thus cannot confirm whether the cells in this image are myotubes. However, we found that myotubes can migrate onto the testis and form the muscular sheet in the ham mutant despite reduced myotube density. At the junction there are myotubes, suggesting that loss of ham does not introduce myoblast-fusion defects. These results are now included in the revised manuscript, supplementary Fig. 5 C-D.

      (5) Line 231/Supplementary Fig. 3C-G: I would add to the cartoons, where the different markers are expressed. 

      We have added marker gene expression in the cartoons.

      (6) Line 239. I don't see what Figure 1A/1H refers to, here. I would perhaps just remove it. 

      Yes, we have removed it.

      (7) Line 232. I would rephrase the beginning of the sentence to: Our data suggest Ham to be... 

      Yes, we have revised it.

      (8) Line 248-250/Figure 2F. Clonal analyses are great, but I think single channels should be shown in black and white. Also, a version without the white dashed line should be shown, to clearly see the differences between wt and ham-mutant cells. 

      Now single channel images from the green and red images are presented in Supplementary Figures. This particular one is in Supplementary Figure 3B. 

      (9) Line 490. The Toll-9 phenotype was identified on the sterility effect/lack-of-sperm phenotype alone, and it was deduced that this suggests connection defects. By showing the right focus plane in Fig S8B (lower right), it should be easy to directly show whether there is a connection defect or not. Also, one would expect clearer testis-shaping defects, like in ham-mutants, as a loss of connection should also affect myotube migration to shape the testis. This is just a minor point, as it only affects supplementary data with no larger impact on the overall findings, even if Toll-9 is shown not to have a defect, after all.

      We find that scoring defects at the junction site at the adult stage is difficult and may not always be accurate. Instead, we score the presence of sperm in the SV, which indirectly but firmly suggests successful connection between the TE and SV. We have now included a quantification graph, showing the penetrance of the phenotype in the new Supplementary Fig.14C. There were indeed morphological defects of the TE in Toll-9 RNAi animals. We have now included the image and quantification in the new Supplementary Fig.14B.

    1. Author response:

      The following is the authors’ response to the original reviews

      Response to the public reviews:

      We are very pleased to see these positive reviews of our preprint.

      Reviewers 1 and 3 raise issues around PIP-PP1 interactions.

      (1) Role of the “RVxF-ΦΦ-R-W string”

      Most PIPs interact with the globular PP1 catalytic core through short linear interaction motifs (SLiMs) and Choy et al (PNAS 2014) previously showed that many PIPs interact with PP1 through a conserved trio of SLiMs, RVxF-ΦΦ-R, which is also present in the Phactrs.

      Previous structural analysis showed that the trajectory of the PPP1R15A/B, Neurabin/Spinophilin (PPP1R9A/B), and PNUTS (PPP1R10) PIPs across the PP1 surface encompasses not only the RVxF-ΦΦ-R trio, but also additional sequences C-terminal to it (Chen et al, eLife, 2015). This extended trajectory is maintained in the Phactr1-PP1 complex (Fedoryshchak et al, eLife, 2020). Based on structural alignment we proposed the existence of an additional hydrophobic “W” SLiM that interacts with the PP1 residues I133 and Y134.

      The extended “RVxF-ΦΦ-R-W” interaction brings sequences C-terminal to the “W” SLiM into the vicinity of the hydrophobic groove that adjoins the PP1 catalytic centre. In the Phactr1/PP1 complex, these sequences remodel the groove, generating a novel pocket that facilitates sequence-specific substrate recognition.

      This raises the possibility that sequences C-terminal to the extended “RVxF-ΦΦ-R-W string” in the other complexes also confer sequence-specific substrate recognition, and our study aims to test this hypothesis. Indeed, the hydrophobic groove structures of the Neurabin/Spinophilin/PP1 and Phactr1/PP1 complexes differ significantly (Ragusa et al, 2010; see Fedoryshchak et al 2020, Fig2 FigSupp1).

      (2) Orientation of the W side chain

      Reviewer 1 points out that in the substrate-bound PP1/PPP1R15A/Actin/eIF2 pre-dephosphorylation complex the W sidechain is inverted with respect to its orientation in the PP1-PPP1R15B complex (Yan et al, NSMB 2021). The authors proposed that this may reflect the role of actin in assembly of the quaternary complex. This does not necessarily invalidate the notion that sequences C-terminal to the “W” motif might play a role in actin-independent substrate recognition, and we therefore consider our inclusion of the R15A/B fusions in our analysis to be reasonable.

      (3) Conservation of W

      The motif ‘W’ does not mandate tryptophan - Phactrs and PPP1R15A/B indeed have W at this position but Neurabin/spinophilin contain VDP, which makes similar interactions. Similarly the “RVxF” motifs in Phactr1, Neurabin/Spinophilin, PPP1R15A/B and PNUTS are LIRF, KIKF, KV(R/T)F and TVTW respectively.

      In our revision, we will present comparisons of the differentially remodelled/modified PP1 hydrophobic groove in the various complexes, and discuss the different orientations of the tryptophan in the previously published PPP1R15A/PP1 and PPP1R15B/PP1 structures. We will also address the other issues raised by the referees.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comments and suggestions for revisions

      (1) The authors do not provide strong evidence that the interaction of the 'W' of the RVxF-øø-R-W string with the hydrophobic groove of PP1 is conserved in PIPs. Whereas the RVxF motif is well conserved and validated since its discovery in 1997, as are the øø (an extension of the RVxF motif) and the 'R', the Trp residue in the RVxF-øø-R-W string is not conserved.

      We did not mean to imply that the W motif is conserved amongst all PIPs.

      Most PIPs interact with the globular PP1 catalytic core through short linear interaction motifs (SLiMs). Choy et al (PNAS 2014) previously showed that many PIPs interact with PP1 through a conserved trio of SLiMs, RVxF-ΦΦ-R, which is also present in the Phactrs.

      Previous structural analysis showed that the PPP1R15A/B, Neurabin/Spinophilin (PPP1R9A/B), and PNUTS (PPP1R10) PIPs share a trajectory across the PP1 surface that encompasses not only the RVxF-ΦΦ-R SLiMs, but also additional sequences C-terminal to the R SLiM (Chen et al, eLife, 2015). This trajectory is also shared by the Phactr1-PP1 complex (Fedoryshchak et al, eLife, 2020). Based on this structural alignment we proposed the existence of an additional hydrophobic “W” SLiM that interacts with the PP1 residues I133 and Y134 (See Fedoryshchak et al, 2020, Figure 1 figure supplement 2).

      Introduction, paragraph 2 is rewritten to make this clearer.

      The sequence and positions of W differ in amino acid type and position relative to the RVxF-øø-R string.

      The motif ‘W’ does not mandate tryptophan; it is our name for a common structurally aligned motif: although the Phactrs and PPP1R15A/B indeed have W at this position, Neurabin and spinophilin contain VDP, which nevertheless makes similar interactions. Similarly the “RVxF” motifs in Phactr1, Neurabin/Spinophilin, PPP1R15A/B and PNUTS are LIRF, KIKF, KV(R/T)F and TVTW respectively.

      In the Discussion the authors state that the hydrophobic groove of PP1 is remodelled by Neurabin. However, details of this are not described or shown in the manuscript.

      The shared trajectory determined by the RVxF-øø-R-W string brings the sequences C-terminal to the W SLiM into the vicinity of the PP1 hydrophobic groove. In the Phactr1/PP1 holoenzyme this generates a novel pocket required for substrate recognition (Fedoryshchak et al, 2020). These observations raised the possibility that sequences C-terminal to the “W” motif in the other RVxF-øø-R-W PIPs also play a role in substrate recognition.

      Introduction paragraph 3 now cites a new Figure 1-S2, which shows how the hydrophobic groove is remodelled in the various different PIP/PP1 complexes. A revised Figure 1A now indicates the hydrophobic residues defining the hydrophobic groove by grey shading.

      (2) To add to the confidence of the structure, the authors should include a 2Fo-Fc simulated annealing omit map, perhaps showing the R and W interactions of the RVxF-øø-R-W string.

      This is now included as new Figure 6 Figure supplement 1. Note that in Neurabin, the W motif is VDP, where the valine and proline sidechains interact similarly to the tryptophan (see also new Figure 1-S2G,H).

      We also add a new supplementary Figure 6-S1 comparing our PBM-liganded Neurabin PDZ domain with the previously published unliganded structure (Ragusa et al 2010).

      (3) Page 16. The authors state that spinophilin remodels the PP1 hydrophobic groove differently from Phactrs. Arguably spinophilin does not remodel the PP1 hydrophobic groove at all. There are no contacts between spinophilin and the PP1 hydrophobic groove in the spinophilin-PP1 structure, correlating with the absence of 'W' in the RVxF-øø-R-W string in spinophilin.

      The VDP sequence corresponding to the W motif in spinophilin and neurabin makes analogous contacts to those made by the W in Phactr1 (see Fedoryshchak et al 2020).

      Remodelling is meant in the sense of altering the structure of the major groove by bringing new sequences into its vicinity rather than necessarily directly interacting with it. The spinophilin/PP1 and Phactr/PP1 hydrophobic grooves are compared in new Figure 1-S2 (see also Fedoryshchak et al 2020, Figure 2 figure supplement 1)

      (4) Page 8. For the cell-based/proteomics-dephosphorylation assay in Figure 2, it isn't clear why there were no dephosphorylation sites detected for the PPP1R15A/B-PP1 fusion (except PPP6R1 S531 for PPP1R15B). One might have expected a correlation with PP1 alone. Does this imply that PPP1R15A/B are inhibiting PP1 catalytic activity? Was the activity tested in vitro?

      The R15A/B data are compared to average abundance of all the phosphosites in the dataset, including those of PP1.

      We have not tested for a general inhibitory effect of R15A/B on PP1 activity. Many PIPs including R15A/B do occlude one or more of the PP1 substrate grooves and therefore generally act as inhibitors of PP1 activity against some potential substrates, while enhancing activities against others.

      Other points 

      (4) Figure S1: Colour sequence similarities/identities.

      Done

      (6) Figures: Structure figures lacked labels:

      Figure 1A, label PP1, Phactrs etc.

      Done

      Figure 6, label PP1, Neurabin, previous Neurabin structure (Fig. 6C), hydrophobic groove, PDZ domain, etc.

      Done

      (7) Statistical analysis. p values should be shown for data in:

      Figure 5.

      To avoid cluttering the Figure, a new sheet, “statistical significance” has been added to Supplementary Table 3, summarizing the analysis.

      Figure 1.

      Figure amended (now figure 1-S1).

      (8) Some inconsistency with labels, eg '34-WT' used in Fig. 5C, whereas '34A-WT' (better) in Methods.

      Now changed to 34A etc where used.

      (9) Page 6. PPP1R9A/B is not shown in Figure 1A and Figure S1A.

      PPP1R9A/B are Neurabin and spinophilin - now clarified in Introduction paragraph 2, Results paragraph 1, Discussion paragraph 1.

      (10) Page 7: lines 4, 'site' not 'side'.

      Done

      (11) Page 9: DTL and CAMSAP3 were found to be dephosphorylated in the PP1-Neurabin/spinophilin screen. Are these PDZ-binding proteins?

      Neither DTL nor CAMSAP3 contains C-terminal hydrophobic residues characteristic of classical PBMs. Sentence added in Discussion, paragraph 5.

      (12) Page 12 and Figure 5 and S5: The synthetic p4E-BP1 and IRSp53WT peptides with PBM should be given more specific names to indicate the presence of the PBM.

      We have renamed 4E-BP1<sup>WT</sup> and IRSp53<sup>WT</sup> to 4E-BP1<sup>PBM</sup> and IRSp53<sup>PBM</sup> respectively, emphasising the inclusion of the wildtype or mutated PBM from 4E-BP1 on these peptides.

      Text, Figure 5, and Figure S5 all revised accordingly.

      (13) Give PDB code for spinophilin-PP1 complex coordinates shown in Figure 6C.

      PDB codes for the various PIP/PP1 complexes now given in new Figure 1-S2 and revised Figure 6C.

      Reviewer #2 (Recommendations for the authors):

      The work undertaken by the authors is extensive and robust, however, I believe that some improvement in the writing and some detailed explanation of certain results sections would help with the presentation of the work and clarity for the readers.

      (1) The introduction should contain more information about the interaction between PP1 and Neurabin, given that this is the focus of the paper. This would give the reader the necessary background required to follow the paper.

      Introduction paragraph 2 revised to describe the different SLIMs in more detail. New Figure 1-S2 shows detail of the different remodelled hydrophobic grooves in the various PIP/PP1 complexes.

      (2) More information on PP1-IRSp53L460A has to be added before discussing results in S1B.

      Sentence explaining that IRSp53 L460 docks with the remodelled PP1 hydrophobic groove in the Phactr1/PP1 holoenzyme added in Results paragraph 2.

      (3) Page 6: "as expected, the +5 residue L460A mutation, which impairs dephosphorylation by the intact Phactr1/PP1 holoenzyme, impaired sensitivity to all the fusions, indicating that they recognise phosphorylated IRSp53 in a similar way (Figure S1B)". Statistics between IRSp53 and IRSp53L460A across PP1-PIPs need to be conducted before concluding the above. From the graph and the images, the impairment to dephosphorylation is not convincing.

      For each of the four PP1-Phactr fusions, the IRSp53 L460A peptide shows significantly less reactivity than the IRSp53WT peptide (p<0.05 for each fusion).

      Since the proteomics studies in Figure 2 show that the substrate specificity of the four PP1-Phactr1 fusions is virtually identical, we combined the data for the four different fusions. The IRSp53 L460A peptide shows significantly less reactivity than the IRSp53WT peptide in this analysis (p<0.0001). This result is shown in revised Figure S1B and legend.

      (4) mCherry-4E-BP1(118+A), in which an additional C-terminal alanine should still allow TOS-mediated phosphorylation, but prevent PDZ interaction. Does 4EBP1 (118+A) actually prevent interaction between PP1-Neurabin? This interaction needs to be validated, especially since spinophilin was shown to bind to multiple regions of PP1.

      It is not clear what the referee is asking for here. The biochemical analysis in Figure 4C shows that the C-terminus of 4E-BP1 constitutes a classical PBM. The X-ray crystallography in Figure 6 confirms this, demonstrating H-bond interactions between the 4E-BP1 C-terminal carboxylate and main chain amides of L514, G515 and I516.

      We consider the possibility that the 4E-BP1(118+A) mutant inhibits the activity of PP1-Neurabin via a mechanism other than directly blocking the 4E-BP1/PDZ interaction to be unlikely for the following reasons:

      (1) Addition of a C-terminal alanine will disrupt the PBM interaction because the extra residue sterically blocks access to the PBM-binding groove. This is the most parsimonious explanation, and is based on our solid structural and biochemical evidence that the 4E-BP1 C-terminus is a classical PBM.

      (2) AlphaFold3 modelling predicts the Neurabin PDZ / 4E-BP1 PBM interaction with high confidence (shown in Figure 6-S2E), but it does not predict any PDZ interaction with 4E-BP1(118+A). Note added in Figure 6-S2 legend.

      (3) Recognition of the 4E-BP1(118+A) mutation without loss of binding affinity would require that the mutant be capable of binding in a manner formally equivalent to recognition of an “internal” PDZ-binding peptide. Recognition of such “internal peptides” is dependent on their adopting a specifically constrained conformation, which typically requires reorganisation of the PDZ carboxylate-binding GLGF loop. Such “internal site” recognition typically involves more than one residue C-terminal to the conventional PDZ “0” position (see Penkert et al NSMB 2004, doi:10.1038/nsmb839; Gee et al JBC 1998, DOI: 10.1074/jbc.273.34.21980; Hillier et al 1999, Science PMID: 10221915).

      (5) It is nice to see that the various PP1-Phactr fusions have around 60% substrate overlap between them. Would it be possible to compare these results with previously published mass spec data of Phactr1XXX from the group? There is mention of some substrates being picked up, but a comparison much like in Figure 2E would be more informative about the extent to which the described method captures relevant information.

      This is difficult to do directly as the PP1-Phactr fusion data are from human cells while that in Fedoryshchak et al 2020 is from mouse.

      However, manual curation shows that of the 28 top hits seen in our previous analysis of Phactr1XXX in NIH3T3 cells, 18 were also detectable in the HEK293 system; of these, 13 were also detected as PP1-Phactr fusion hits. Data summarised in new Figure 2-S1C. Text amended in Results, “Proteomic analysis...”, paragraph 2.

      (6) Figure 3D Why are the levels of pT70, pT37/46 and total protein in vector controls much lower as compared to 0nM Tet in PP1-Neurabin conditions? It is also weird that given total protein is so low, why are the pS65/101 levels high compared to the rest?

      We think it likely these phenomena reflect low-level expression of PP1-Neurabin in uninduced cells. Now noted in Figure 3D legend, basal PP1-Neurabin expression shown in new Figure 3-S1C. This alters the relative levels of the different species detected by the total 4E-BP1 antibody in favour of the faster migrating forms, which are less phosphorylated than the slower ones, and the total amount increases about 2-fold (Figure 3D, compare 0nM Tet lanes).

      The altered pS65/101-pT70 ratio is also likely to reflect the leaky PP1-Neurabin expression, since the relative intensities of the various phosphorylated species are dependent on both the relative rates of phosphorylation and dephosphorylation. Expression of a phosphatase would therefore be expected to differentially affect the phosphorylation levels of different sites according to their reactivity.

      (7) Figure 3E: Does inhibiting mTORC further reduce translation when PP1-Neurabin is expressed? If this is the case, this might suggest that they might not necessarily be mTORC inhibitors?

      We have not done this experiment. Since Rapamycin cannot be guaranteed to completely block 4E-BP1 phosphorylation, and PP1-Neurabin cannot be guaranteed to completely dephosphorylate 4E-BP1, any further reduction upon their combination would be hard to interpret.

      (8) Substrate interactions with the remodelled PP1 hydrophobic groove do not affect PP1-Neurabin specificity. Is there evidence that PP1-Neurabin remodels the hydrophobic groove? Is it not possible that Neurabin does not remodel the PP1 groove to begin with and hence there is no effect observed with the various mutants? If this is not the case, it should be explained in a bit more detail.

      Comparison of the Neurabin/PP1 and Phactr1/PP1 structures shows that the hydrophobic groove is remodelled differently in the two complexes. Now shown in new Figure 1-S2B,C,G.

      (9) Figure 5B has a lot of interesting information, which I believe has not been discussed at all in the results section.

      To help interpretation of the enzymology in Figure 5 we have renamed 4E-BP1WT and IRSp53WT to 4E-BP1PBM and IRSp53PBM respectively, emphasising the inclusion of the wildtype or mutated PBM from 4E-BP1 on these peptides. Text in Results, “PDZ domain interaction…”, paragraph 1, and Figures 5 and S5 revised accordingly.

      Why does the 4E-BP1Mut affect catalytic efficiency of PP1 alone when compared with WT, while no difference is observed with IRSp53WT and mutant?

      We do not understand the basis for the differential reactivity of 4E-BP1PBM and 4E-BP1MUT with PP1 alone; we suspect that it reflects the hydrophobicity change resulting from the MDI -> SGS substitution. However, this is unlikely to be biologically significant as PP1 is sequestered in PIP-PP1 complexes.

      Importantly, the two PP1 fusion proteins behave consistently in this assay – the presence of the intact PBM increases reactivity with PP1-Neurabin, but has no effect on dephosphorylation by PP1-Phactr1.

      Why does PP1 alone not have a difference between IRSp53WT and mutant, while PP1-Neurabin does have a difference?

      This is due to the presence of the PBM in IRSp53WT (now renamed IRSp53PBM), which increases affinity for PP1-Neurabin, but not PP1 alone. Likewise, PP1-Phactr1, which does not possess a PDZ domain, is also unaffected by the integrity of the PBM.

      (7) “Strikingly, alanine substitutions at +1 and +2 in 4E-BP1WT increased catalytic efficiency by both fusions, perhaps reflecting changes at the catalytic site itself (Figure 5E, Figure S5E)”. This could be expanded upon, because this suggests a mechanism that makes the substrate refractory to PDZ/hydrophobic groove remodelling?

      We favour the idea that this reflects a requirement to balance dephosphorylation rates between the multiple 4E-BP1 phosphorylation sites, especially if multiple rounds of dephosphorylation occur for each PBM—PDZ interaction. Additional sentences added in Discussion paragraph 7.

      (8) Typographical errors and minor comments:

      a) PIPs can target PP1 to specific subcellular locations, and control substrate specificity through autonomous substrate-binding domains, occupation or extension of the substrate grooves, or modification of PP1 surface electrostatics.

      b) Phosphophorylation side site abundances within triplicate samples from the same cell line were comparable between replicates (Figure 2B).

      c) While the alanine substitutions had little effect, conversion of +4 to +6 to the IRSp534E-BP1 sequence LLD increased catalytic efficiency some 20-fold (Figure 5C, Figure S5C). 

      d) Figure 3E labels are not clear. The graph can be widened to make the labels of the conditions clearer.

      All corrected

      Reviewer #3 (Recommendations for the authors):

      This was a very well-written manuscript.

      However, I was looking for a summary mechanistic figure or cartoon to help me navigate the results.

      I noted a few typos in the text.

      New summary Figure 5-S2 added, cited in results, and discussed in Discussion paragraph 6,7.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This article presents a meta-analysis that challenges established abundance-occupancy relationships (AORs) by utilizing the largest known bird observation database. The analysis yields contentious outcomes, raising the question of whether these findings could potentially refute AORs.

      We thank the Reviewer for their positive comments.

      Strengths:

      The study employed an extensive aggregation of datasets to date to scrutinize the abundance-occupancy relationships (AORs).

      We thank the Reviewer for their positive comments.

      Weaknesses:

      While the dataset employed in this research holds promise, a rigorous justification of the core assumptions underpinning the analytical framework is inadequate. The authors should thoroughly address the correlation between checklist data and global range data, ensuring that the foundational assumptions and potential confounding factors are explicitly examined and articulated within the study's context.

      We thank the Reviewer for these comments. We agree that more justification and transparency is needed of the core assumptions that form the foundation of our methods. In our revised version, we have taken the following steps to achieve this:

      - Altered the title to be more explicit about the core assumptions, which now reads: “Local-scale relative abundance is decoupled from global range size”

      - We have added more details on why and how we treat global range size as a measure of ‘occupancy.’

      - We have added a section that discusses the limitations of using eBird relative abundance

      Reviewer #2 (Public Review):

      Summary:

      The goal is to ask if common species when studied across their range tend to have larger ranges in total. To do this the authors examined a very large citizen science database which gives estimates of numbers, and correlated that with the total range size, available from Birdlife. The average correlation is positive but close to zero, and the distribution around zero is also narrow, leading to the conclusion that, even if applicable in some cases, there is no evidence for consistent trends in one or other direction.

      We thank the Reviewer for these comments.

      Strengths:

      The study raises a dormant question, with a large dataset.

      We thank the Reviewer for these comments. We intended to take a longstanding question and attempt to apply novel datasets that were not available mere decades ago. While we do not imply that we have ‘solved’ the question, we hope this work highlights the potential for further interrogation using these large datasets.

      Weaknesses:

      This study combines information from across the whole world, with many different habitats, taxa, and observations, which surely leads to a quite heterogeneous collection.

      We agree that there is a heterogeneous collection of data across many habitats, taxa, and observations. However, rather than as a weakness, we see this as a significant strength. Our work assumes we are averaging over this variability to assess for a large-scale pattern in the relationship - something that was potentially a limitation of previous work, as these large datasets were often focused on particular contexts (e.g., much work focused solely on the UK), which we believe could limit some of the generalizability of the previous work. However, the reviewer makes a fair point in regard to the heterogeneity of data collection. We have now added some text in the discussion which is explicit about this - see the new section named “Potential limitations of current work and future work”, which states: “Although our findings challenge some long-held assumptions about the consistency of the abundance-occupancy relationship, our work only deals with interspecific AORs among birds, synthesizing observations of potentially heterogeneous locations, context and quality”.

      First, scale. Many of the earlier analyses were within smaller areas, and for example, ranges are not obviously bounded by a physical barrier. I assume this study is only looking at breeding ranges; that should be stated, as 40% of all bird species migrate, and winter limitation of populations is important. Also are abundances only breeding abundances or are they measured through the year? Are alien distributions removed?

      Second, consider various reasons why abundance and range size may be correlated (sometimes positively and sometimes negatively) at large scales. Combining studies across such a large diversity of ecological situations seems to create many possibilities to miss interesting patterns. For example:

      (1) Islands are small and often show density release.

      See comment below.

      (2) North temperate regions have large ranges (Rapoport's rule) and higher population sizes than the tropics.

      See comment below.

      (3) Body size correlates with global range size (I am unsure if this has recently been tested but is present in older papers) and with density. For example, cosmopolitan species (barn owl, osprey, peregrine) are relatively large and relatively rare.

      See comment below.

      (4) In the consideration of alien species, it certainly looks to me as if the law is followed, with pigeon, starling, and sparrow both common and widely distributed. I guess one needs to make some sort of statement about anthropogenic influences, given the dramatic changes in both populations and environments over the past 50 years.

      See comment below. We also added a sentence in the methods that highlights that we did not remove alien ranges and provides reasons why. Still, we do acknowledge the dramatic changes in populations and environments over the past 50 years (see the new section “Potential limitations of current work and future work”).

      (5) Wing shape correlates with ecological niche and range size (e.g. White, American Naturalist). Aerial foraging species with pointed wings are likely to be easily detected, and several have large ranges reflecting dispersal (e.g. barn swallow).

      We agree that all of the points above are interesting data explorations. As said above, our main purpose was to highlight the potential for further interrogation using these large datasets. However, we have added some additional text in the discussion that explicitly mentions/encourages these additional data explorations. We hope people will pick up on the potential for these data and explore them further.

      Third, biases. I am not conversant with ebird methodology, but the number appearing on checklists seems a very poor estimate of local abundance. As noted in the paper, common species may be underestimated in their abundance. Flocking species must generate large numbers, skulking species few. The survey is often likely to be in areas favorable to some species and not others. The alternative approach in the paper comes from an earlier study, based on ebird but then creating densities within grids and surely comes with similar issues.

      We agree that if we were interested in the absolute abundance of a given species, the local number on an eBird checklist would be a poor representation. However, our study aims not to estimate absolute abundance but to examine relative abundance among species on each checklist. By focusing on relative abundance, we leverage eBird data's strengths in detecting the presence and frequency of species across diverse locations and times, thereby capturing community composition trends that can provide meaningful insights despite individual checklist biases. This approach allows us to assess the comparative prominence of species in the community as reported by the observer, providing a consistent metric of relative abundance. Despite detectability biases, the structure of eBird checklists reflects the observer’s encounter rates with each species under similar conditions, offering a valuable snapshot of relative species composition across sites and times. The key to our assumption is that these biases discussed are not directional and, therefore, random throughout the sampling process, which would translate to no ‘real’ bias in our effect size of interest.

      Range biases are also present. Notably, tropical mountain-occupying species have range sizes overestimated because holes in the range are not generally accounted for (Ocampo-Peñuela et al., Nature Communications). These species are often quite rare, too.

      We thank the reviewer for pointing to this issue and reference. We included a discussion on these biases in our limitations section and reference Ocampo-Peñuela et al. to emphasize the need for improved spatial resolution in range data for more accurate AOR assessments: “More precise range-size estimates would also improve the accuracy of AOR assessments, since species range data are often overestimated due to the failure to capture gaps in actual distributions.”

      Fourth, random error. Random error in ebird assessments is likely to be large, with differences among observers, seasons, days, and weather (e.g. Callaghan et al. 2021, PNAS). Range sizes also come with many errors, which is why occupancy is usually seen as the more appropriate measure.

      If we consider both range and abundance measurements to be subject to random error in any one species list, then the removal of all these errors will surely increase the correlation for that list (the covariance shouldn't change but the variances will decrease). I think (but am not sure) that this will affect the mean correlation because more of the positive correlations appear 'real' given the overall mean is positive. It will definitely affect the variance of the correlations; the low variance is one of the main points in the paper. A high variance would point to the operation of multiple mechanisms, some perhaps producing negative correlations (Blackburn et al. 2006).
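
      The reviewer's argument about random error can be illustrated with a small simulation. The sketch below is a toy example under stated assumptions, not the analysis used in the study: the true correlation (0.3), the noise level, the sample size, and the function names are all hypothetical. It shows that adding independent noise to both variables leaves the covariance roughly unchanged while inflating the variances, so the observed correlation shrinks toward zero.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate(true_r=0.3, noise_sd=1.0, n=20000, seed=1):
    """Return (r without error, r with error) for hypothetical data."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        # Construct y with unit variance and correlation true_r with x.
        y = true_r * x + (1 - true_r ** 2) ** 0.5 * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    # Independent measurement error added to both variables.
    noisy_x = [x + rng.gauss(0, noise_sd) for x in xs]
    noisy_y = [y + rng.gauss(0, noise_sd) for y in ys]
    return pearson(xs, ys), pearson(noisy_x, noisy_y)

r_clean, r_noisy = simulate()
print(f"without measurement error: r = {r_clean:.2f}")
print(f"with measurement error:    r = {r_noisy:.2f}")
```

      With unit-variance noise on both unit-variance variables, classical attenuation theory predicts an observed correlation of about half the true value, which is what the simulation should approximately reproduce.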

      We agree random errors can affect estimates, but as we wrote above, random errors, regardless of magnitudes, would not bias estimates. After accounting for sampling error (a part of random errors), little variance is left to be explained as we have shown in the MS. This suggests that many of the random errors were part of the sampling errors. And this is where meta-analysis really shines.
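
      As a rough illustration of how a meta-analysis separates sampling error from real heterogeneity, the sketch below pools correlations using Fisher's z transform with inverse-variance weights and computes the I² statistic. It is a simplified fixed-effect calculation on invented numbers, not the mixed-effects model fitted in the manuscript; the correlations, sample sizes, and function name are hypothetical.

```python
import math

def meta_summary(correlations, sample_sizes):
    """Inverse-variance pooled correlation and I^2 heterogeneity (fixed-effect sketch)."""
    zs = [math.atanh(r) for r in correlations]      # Fisher's z transform
    ws = [n - 3 for n in sample_sizes]              # weight = 1 / sampling variance, var(z) = 1 / (n - 3)
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))   # Cochran's Q
    df = len(zs) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0   # share of variation beyond sampling error
    return math.tanh(z_bar), i2

# Hypothetical checklists: two tiny ones with extreme r, three large ones near zero.
rs = [0.6, -0.5, 0.05, 0.02, -0.01]
ns = [6, 5, 150, 300, 400]
pooled_r, i2 = meta_summary(rs, ns)
print(f"pooled r = {pooled_r:.3f}, I^2 = {i2:.2f}")
```

      In this invented example the extreme correlations come from very small checklists, so they receive little weight and the spread among estimates is no larger than sampling error alone would produce.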

      On P.80 it is stated: "Specifically, we can quantify how AOR will change in relation to increases in species richness and sampling duration, both of which are predicted to reduce the magnitude of AORs" I haven't checked the references that make this statement, but intuitively the opposite is expected? More species and longer durations should both increase the accuracy of the estimate, so removing them introduces more error? Perhaps dividing by an uncertain estimate introduces more error anyway. At any rate, the authors should explain the quoted statement in this paper.

      It would be of considerable interest to look at the extreme negative and extreme positive correlations: do they make any biological sense?

      Extremely high correlations would not make any biological sense if these observations were based on large sample sizes. However, as shown in Figure 2, all extreme correlations come from small sample sizes (i.e., low precision), as sampling theory predicts (indeed, our Fig 2 is a textbook example of the funnel shape). Therefore, we do not need to invoke any biological explanations here.
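
      The funnel shape follows from sampling theory alone. As a hedged illustration (simulated data with a true correlation of exactly zero; the checklist sizes and function names are hypothetical and unrelated to the actual eBird data), the sketch below draws many small and many large samples and compares the range of the resulting correlations.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def correlation_range(n_species, n_checklists=500, seed=0):
    """Min and max sample correlation across simulated checklists with true r = 0."""
    rng = random.Random(seed)
    rs = []
    for _ in range(n_checklists):
        xs = [rng.gauss(0, 1) for _ in range(n_species)]
        ys = [rng.gauss(0, 1) for _ in range(n_species)]
        rs.append(pearson(xs, ys))
    return min(rs), max(rs)

small = correlation_range(n_species=5)
large = correlation_range(n_species=200)
print("5 species per checklist:   r ranges from %.2f to %.2f" % small)
print("200 species per checklist: r ranges from %.2f to %.2f" % large)
```

      Even though no relationship exists in the simulated data, the small samples produce correlations close to -1 and +1, whereas the large samples stay in a narrow band around zero.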

      Discussion:

      I can see how publication bias can affect meta-analyses (addressed in the Gaston et al. 2006 paper) but less easily see how confirmation bias can. It seems to me that some of the points made above must explain the difference between this study and Blackburn et al. 2006's strong result.

      We agree. We have now extended our explanation of why confirmation bias could result in positive AORs. We also point out that confirmation bias is a very common phenomenon, for which we cited relevant literature in the original MS. The only way to avoid confirmation bias is to conduct a study blind, but this is often not possible in ecological work.

      “Meta-research on behavioural ecology identified 79 studies on nestmate recognition, 23 of which were conducted blind. Non-blind studies confirmed a hypothesis of no aggression towards nestmates nearly three times more often. It is possible that confirmation bias was at play in earlier AOR studies.”

      Certainly, AOR really does seem to be present in at least some cases (e.g. British breeding birds) and a discussion of individual cases would be valuable. Previous studies have also noted that there are at least some negative and some non-significant associations, and understanding the underlying causes is of great interest (e.g. Kotiaho et al. Biology Letters).

      We agree. And yes, we pointed these out in our introduction.

      Reviewer #3 (Public Review):

      Summary:

      This paper claims to overturn the longstanding abundance occupancy relationship.

      Strengths:

      (1) The above would be important if true.

      (2) The dataset is large.

      We have clarified this point by changing the title to emphasize that we do not suggest overturning AORs entirely but instead provide a refined view of the relationship at a global scale. Our results suggest a weaker and more context-dependent AOR than previously documented. We hope our revised title and additional clarifications in the text convey our intent to contribute to a more nuanced understanding rather than a wholesale overturning of the AOR framework.

      Weaknesses:

      (1) The authors are not really measuring the abundance-occupancy relationship (AOR). They are measuring abundance-range size. The AOR typically measures patches in a metapopulation, i.e. at a local scale. Range size is not an interchangeable notion with local occupancy.

      We have refined this in our revision to be more explicitly focused on global range size. However, we note that the classic paper by Bock and Ricklefs (1983, Am Nat) also refers to global (species entire) range size in the context of the AOR. Importantly, Bock and Ricklefs pointed out the importance of using species’ entire ranges; without doing so, sampling artifacts can create positive AORs when arbitrary geographical ranges are used, as in some studies of AORs. So we highlight that our work is well in line with the previous work, allowing us to question the longstanding macroecological work. One of the issues of AOR has been how to define occupancy; global range size provides a relatively unambiguous measure, which is why we used it.

      (2) Ebird is a poor dataset for this. The sampling unit is non-standard. So abundance can at best be estimated by controlling for sampling effort. Comparisons across space are also likely to be highly heterogenous. They also threw out checklists in which abundances were too high to be estimated (reported as "X"). As evidence of the biases in using eBird for this pattern, the North American Breeding Bird Survey, a very similar taxonomic and geographic scope but with a consistent sampling protocol across space does show clear support for the AOR.

      Yes, we agree the sampling unit is non-standard. However, this is a significant strength in that it samples across much heterogeneity (as discussed in response to Reviewer 2, above). We were interested in relative abundance and not direct absolute abundance per se, which is accurate, especially since we did control for sampling effort.

      We appreciate the reviewer’s attention to our data selection criteria. We excluded checklists containing ‘X’ entries to minimize biases in our abundance estimates. The 'X' notation is often used for the most common species, reflecting the observer's identification of presence without specifying a count. This approach was chosen to avoid disproportionately inflating presence data for these abundant species, which could distort the relative abundance calculations in our analysis. By excluding such checklists, we aimed to retain consistency and ensure that local abundance estimates were representative across all species on each checklist. We have revised our manuscript to clarify this methodological choice and hope this explanation addresses the reviewer’s concern. We modified our text in the methods to make the entries ‘X’ clearer (see the Method section).

      (3) In general, I wonder if a pattern demonstrated in thousands of data sets can be overturned by findings in one data set. It may be a big dataset but any biases in the dataset are repeated across all of those observations.

      Overturning a major conclusion requires careful work. This paper did not rise to this level.

      We appreciate the reviewer’s caution regarding broad conclusions based on a single dataset, even one as large as eBird. Our intention was not to definitively overturn the abundance-occupancy relationship (AOR) but to re-evaluate it with the most extensive and globally representative dataset currently available. We recognise that potential biases in citizen science data, such as observer variation, may influence our findings, and we have taken steps to address these in our methodology and limitations sections. We see this work as a contribution to an ongoing discourse, suggesting that AOR may be less universally consistent than previously believed, mainly when tested with large-scale citizen science data. We hope this study will encourage additional research that tests AORs using other expansive datasets and approaches, further refining our understanding of this classic macroecological relationship. However, we have left our broad message about instigating credible revolution and also re-examining ecological laws.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The investigation focuses solely on interspecific relationships among birds; thus, the extrapolation of these conclusions to broader ecological contexts requires further validation.

      We have now added this point to our new section: “Although our findings challenge some long-held assumptions about the consistency of the abundance-occupancy relationship, our work only deals with interspecific AORs among birds, so we hope this work serves as a foundation for further investigations that utilize such comprehensive datasets.”

      (2) The rationale for combining data from eBird - a platform predominantly representing individual observations from urban North America - with the more globally comprehensive BirdLife International database needs to be substantiated. The potential underrepresentation of global abundance in the eBird checklist data could introduce a sampling bias, undermining the foundational premises of AORs.

      We agree with the limitation of eBird sampling coverage, but it should not bias our results. In statistical definitions, bias is directional, and if not directional, it will become statistical noise, making it difficult to detect the signal. In fact, our meta-analyses adjust for what statisticians call sampling bias, and this is a strength of meta-analysis.

      (3) In the full mixed-effect model, checklist duration and sampling variance (inversely proportional to sample size N) are treated as fixed effects. However, these variables are likely to be negatively correlated, which could introduce multicollinearity, inflating standard errors and diminishing the statistical significance of other factors, such as the intercept. This calls into question the interpretation of insignificance in the results.

      Multicollinearity is mainly an issue with small sample sizes. For example, with small datasets, correlations of 0.5 could be an issue, and such an issue would usually show up as a large SE. We do not have such an issue with ~17 million data points. Please refer to this paper:

      Freckleton, Robert P. "Dealing with collinearity in behavioural and ecological data: model averaging and the problems of measurement error." Behavioral Ecology and Sociobiology 65 (2011): 91-101.
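
      The argument above can be illustrated with the textbook formula for the standard error of a regression slope, in which collinearity enters through the variance inflation factor 1 / (1 - r^2). The sketch below is a minimal illustration and not the authors' analysis: it assumes only two predictors with correlation r, unit residual and predictor standard deviations, and hypothetical function names.

```python
import math


def variance_inflation_factor(r):
    """VIF for one of two predictors whose correlation is r."""
    return 1.0 / (1.0 - r ** 2)


def slope_standard_error(n, r, residual_sd=1.0, predictor_sd=1.0):
    """Textbook SE of an OLS slope: sigma * sqrt(VIF / ((n - 1) * s_x^2)).

    All default values are illustrative assumptions, not values from the study.
    """
    vif = variance_inflation_factor(r)
    return residual_sd * math.sqrt(vif / ((n - 1) * predictor_sd ** 2))


# A correlation of 0.5 inflates the SE by sqrt(4/3), about 15%, at any sample size,
# but the SE itself shrinks with the square root of n.
for n in (30, 17_000_000):
    print(n, slope_standard_error(n, r=0.5))
```

      Under these assumptions the inflation is a constant factor, while the baseline standard error becomes very small at n of roughly 17 million, which is the sense in which moderate collinearity matters less in very large datasets.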

      (4) The observed low heterogeneity may stem from discrepancies in sampling for abundance versus occupancy, compounded by uncertainties in reporting behavior.

      If we assume everybody underreports common species or overreports rare species, this could happen. However, such an assumption is unlikely. If some people report accurately (but not others), we should see high heterogeneity, which we do not observe. We touched upon this point in our original MS.

      (5) The contribution and implementation of phylogenetic comparative analysis remain ambiguous and were not sufficiently clarified within the study.

      We need to add more explanation for the global abundance analysis

      “To statistically test whether there was an effect of abundance and occupancy at the macro-scale, we used phylogenetic comparative analysis. This analysis also addresses the issue of positive interspecific AORs potentially arising from not accounting for phylogenetic relatedness among species examined.”

      (6) The use of large N checklists could skew the perceived rarity or commonality of species, potentially diminishing the positive correlation observed in AORs. A consistent observer effect could lead to a near-zero effect with high precision.

      Regardless of the number of species (N) in checklists (see Fig 2), correlations are distributed around zero. This means there is nothing special about large-N checklists.

      (7) The study should acknowledge and discuss any discrepancies or deviations from previous literature or expected outcomes.

      We felt we had already done this as we discussed the previous meta-analysis and what we expected from this meta-analysis.  Nevertheless, we have added some relevant sentences in the new version of MS.

      In addition to these major points, there are several minor concerns:

      (1) Figure 2B lacks discussion, and the metric for the number of observations is not clarified. Furthermore, the labeling of the y-axis appears to be incorrect.

      Thank you very much for pointing out this shortcoming. Now, the y-axis label has been fixed and we mention 2B in the main text.

      (2) The study should provide a clear, mathematical expression of the multilevel random effect models for greater transparency.

      Many thanks for this point, and now we have added relevant mathematical expressions in Table S6.

      (3) On Line 260, the term "number of species" should be refined to "number of species in a checklist," ideally represented by a formula for precision.

      This ambiguity has been mended as suggested.

      Please provide the data and R code linked to the outputs.

      The referee must have missed the link (https://github.com/itchyshin/AORs) in our original MS. In addition to our GitHub repository link, we now have added a link to our Zenodo repository (https://doi.org/10.5281/zenodo.14019900).

      Reviewer #3 (Recommendations For The Authors):

      The authors cite Rabinowitz's 7 forms of rarity paper as a suggestion that previous findings also break the AOR. In fact empirical studies of the 7 forms of rarity typically find that all three forms of rareness vs commonness are heavily correlated (e.g. Yu & Dobson 2000).

      We thank the reviewer for drawing attention to Yu & Dobson (2000) and similar studies that find positive correlations among the axes of rarity. Reviewer 3 is correct in that Rabinowitz’s (1981) framework does not require that local abundance and geographic range size be uncorrelated for every species; instead, it highlights conceptual scenarios where a species may be common locally yet have a restricted distribution (or vice versa).

      Empirical analyses such as Yu & Dobson (2000) show that, on average, these axes can be correlated, which may align with conventional AOR findings in some taxonomic groups. However, Rabinowitz’s key insight was that exceptions do occur, and these exceptions demonstrate that strong positive AORs may not be universally applicable. Our results do not claim that Rabinowitz’s framework “breaks” the AOR outright; instead, we use it to underscore that local abundance can, in principle, be “decoupled” from global occupancy. Whether the correlation found by Yu & Dobson (2000) implies a positive AOR requires a detailed simulation study, which is an interesting avenue for future research.

      Thus, citing Rabinowitz serves to highlight the potential heterogeneity and complexity of abundance–occupancy relationships rather than to refute every positive correlation reported in the literature. Our findings suggest that when examined at large spatiotemporal scales (with unbiased sampling), the overall AOR signal may be less robust than traditionally believed. This is consistent with Rabinowitz’s view that local abundance and global range can vary along independent axes. We have now added:

      “Although studies using her framework found positive correlations between species range and local abundance.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Summary:

      This manuscript uses a well-validated behavioral estimation task to investigate how optimistic belief updating was attenuated during the 2020 global pandemic. Online participants recruited during and outside of the pandemic estimated how likely different negative life events were to happen to them in the future and were given statistics about these events happening. Belief updating (measured as the degree to which estimations changed after viewing the statistics) was less optimistically biased during the pandemic (compared to outside of it). This resulted from reduced updating from "good news" (better than expected information). Computational models were used to try to unpack how statistics were integrated and used to revise beliefs. Two families of models were compared - an RL set of models where "estimation errors" (analogous to prediction errors in classic RL models) predict belief change and a Bayesian set of models where an implied likelihood ratio was calculated (derived from participants estimations of their own risk and estimation of the base rate risk) and used to predict belief change. The authors found evidence that the former set of models accounted for updating better outside of the pandemic, but the latter accounted for updating during the pandemic. In addition, the RL model provides evidence that learning was asymmetrically positively biased outside of the pandemic but symmetric during it (as a result of reduced learning rates from good news estimation errors).

      Strengths:

      Understanding whether biases in learning are fixed modes of information processing or flexible and adapt in response to environmental shocks (like a global pandemic or economic recession) is an important area of research relevant to a wide range of fields, including cognitive psychology, behavioral economics, and computational psychiatry. The study uses a well-validated task, and the authors conduct a power analysis to show that the sample sizes are appropriate. Furthermore, the authors test that their results hold in both a between-group analysis (the focus of the main paper) and a within-group analysis (mainly in the supplemental).

      The finding that optimistic biases are reduced in response to acute stress, perceived threat, and depression has been shown before using this task both in the lab (social stress manipulation), in the real world (firefighters on duty), and clinical groups (patients with depression). However, the work does extend these findings here in important ways:

      (1) Examining the effect of a new real-world adverse event (the pandemic).

      (2) The reduction in optimistic updating here arises due to reduced updating from positive information (previously, in the case of environmental threat, this reduction mainly arose from increased sensitivity to negative information).

      (3) Leveraging new RL-inspired computational approaches, demonstrating that the bias - and its attenuation - can be captured using trial-by-trial computational modeling with separate learning rates for positive and negative estimation errors.

      Weaknesses:

      Some interpretation and analysis (the computational modeling in particular) could be improved.

      On the interpretation side, while the pandemic was an adverse experience and stressful for many people (including myself), the absence of any measures of stress/threat levels limits the conclusions one can draw. Past work that has used this task to examine belief updating in response to adverse environmental events took physiological (e.g., SCR, cortisol) and/or self-report (questionnaires) measures of mood. In SI Table 1, the authors possibly had some questionnaire measures along these lines, but this might be for the participants tested during the pandemic.

      Thank you for this review.

      We agree that the lack of physiological and self-report measures of stress, threat, and perceived uncertainty limits the interpretation of findings regarding potential psychological factors. Some self-reported anxiety and perceived risk measures experienced during the lockdowns were collected in a subset of participants (n=40, counting n=21 tested before and during the 1st strict lockdown, and n=19 tested solely during the 1st lockdown). These reports were given retrospectively at the time of release of the 1st lockdown in summer 2020 when the pandemic was still unfolding (SI Table 1).

      Exploratory correlations revealed some noteworthy trends. Participants who reported perceiving a greater risk of death due to contagion were also less optimistically biased when updating their beliefs about adverse future life risks during the first strict COVID-19-related lockdown (Pearson’s r = -0.36, p = 0.02).

      Moreover, parameter estimates from the computational models of belief updating showed associations with specific survey responses: The rational Bayesian model’s scaling parameter correlated positively with adherence to distancing measures (r = 0.41, p = 0.01) and negatively with the need for social contact (r = -0.37, p = 0.02). This result indicated that participants who were updating their beliefs faster were more likely to follow preventive guidelines and felt less social craving. Meanwhile, the asymmetry parameter correlated negatively with mask wearing (r = -0.41, p = 0.01), positively with physical contact with close others (r = 0.32, p = 0.04) and satisfaction with social interactions (r = 0.33, p = 0.04). This suggests that participants who displayed some asymmetry in belief updating during the COVID-19 pandemic were less likely to comply with mask-wearing rules and more likely to engage in social interactions.

      However, these results did not survive correction for multiple comparisons, and the sample size for correlational analyses is small. The subjective measures of anxiety and fear of contagion did not significantly correlate with the updating bias, or with any other variable measured by the belief updating task (e.g. estimation error, updating magnitude).

      We now further discuss this limitation on page 12, which reads:

      “We did not collect physiological measures of stress or information about the COVID-19 infection status of participants, which precludes a direct exploration of the immediate effects of experiencing the infection on belief-updating behavior and the potential interaction with anxiety and stress levels. Although subjective ratings of the perceived risk of death from COVID-19 correlated negatively to the beliefs updating bias measured during the pandemic, this result was obtained retrospectively in a subset of participants (SI section 4). We thus cannot directly attribute the observed lack of optimistically biased belief updating during the lockdown to psychological causes such as heightened anxiety and stress. This limitation is noteworthy, as the impact of experiencing the pandemic on belief updating about the future could differ between those who directly experienced infection and those who remained uninfected. It is also important to acknowledge that our study was timely and geographically limited to the context of the COVID-19 outbreak in France. Cultural variations and differences in governmental responses to contain the spread of SARS-CoV-2 may have impacted the optimism biases in belief updating differently.”

      On the analysis side, it was unclear what the motivation was for the different sets of models tested. Both families of models test asymmetric vs symmetric learning (which is the main question here) and have similar parameters (scaling and asymmetry parameters) to quantify these different aspects of the learning process. Conceptually, the different behavioral patterns one could expect from the two families of models needed to be clarified.

      Thank you for raising this point. We agree that a clearer conceptual distinction between the two model families can help strengthen the interpretation of our findings. We have added the following considerations to the introduction on pages 2–3, which now reads:

      “The underlying mechanism of optimistically biased belief updating involves an asymmetry in learning from positive and negative belief-disconfirming information[2,3,4], which can unfold in two ways following Reinforcement learning (RL) or Bayes rule[5].

      Conceptually, Reinforcement learning (RL) and Bayesian models of belief updating are complementary but make different assumptions about the hidden process humans may use to adjust their beliefs when faced with information that contradicts them. The RL models assume belief updating is proportional to the estimation error. The key idea of the estimation error expresses the difference between how much someone believes they will experience a future life event and the actual prevalence of the event in the general population. This difference can be positive or negative. A scaling and an asymmetry parameter quantify the propensity to consider the estimation error magnitude and its valence, respectively. These two free parameters form the learning rate, which indicates how fast and biased participants update their beliefs.

      In contrast, Bayesian models assume that following Bayes’ rule the posterior, updated belief is a new hypothesis, formed by pondering prior knowledge with new evidence. The prior knowledge consists in information about the prevalence of life events in the general population. The new evidence comprises various alternative hypotheses. It examines how likely a specific event is to occur or not occur for oneself, compared to the likelihood that it will happen or not happen to others. This probabilistic adjustment of beliefs about future life events can be considered as an approximation of a participant’s confidence in the future. The two free parameters of the Bayesian belief updating model scale how much the initial belief deviates from the updated, posterior belief (i.e., scaling parameter) and the propensity to consider the valence of this deviance (i.e., asymmetry parameter).

      Although RL-like and Bayesian updating models make different assumptions about the updating strategy, they are complementary and powerful formalizations of human reasoning. Both models provide insight into hidden, latent variables of the updating process. Most notably, the learning rate and its components, the scaling and asymmetry parameters, which can vary between individuals and contexts and, through this variance, offer possible explanations for the idiosyncrasy in belief-updating behavior and its cognitive biases.”
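
      The conceptual contrast described in the quoted passage can be sketched in a few lines. This is a hedged illustration, not the authors' model code: the form of the learning rate (scaling plus or minus asymmetry), the odds-based form of the Bayesian posterior, and all function and argument names are assumptions made here for clarity.

```python
def rl_update(estimation_error, scaling, asymmetry, good_news):
    """RL-like update: proportional to the size of the estimation error.

    Assumed learning rate: scaling + asymmetry for good news and
    scaling - asymmetry for bad news (an illustrative parameterisation).
    """
    learning_rate = scaling + asymmetry if good_news else scaling - asymmetry
    return learning_rate * abs(estimation_error)


def odds(probability):
    """Convert a probability strictly between 0 and 1 into odds."""
    return probability / (1.0 - probability)


def bayesian_posterior(own_risk, others_risk, base_rate):
    """Rational Bayesian posterior belief about one's own risk.

    Assumed form: the base rate acts as the prior, and the evidence is a
    likelihood ratio comparing one's own odds with the odds for others.
    """
    likelihood_ratio = odds(own_risk) / odds(others_risk)
    posterior_odds = odds(base_rate) * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# An optimistic RL learner (asymmetry > 0) updates more after good news.
print(rl_update(10, scaling=0.5, asymmetry=0.1, good_news=True))
print(rl_update(10, scaling=0.5, asymmetry=0.1, good_news=False))
# A Bayesian learner who sees no difference between self and others adopts the base rate.
print(bayesian_posterior(own_risk=0.3, others_risk=0.3, base_rate=0.2))
```

      In this sketch an asymmetry of zero gives a symmetric RL learner, and a participant who rates their own risk below that of others ends up with a posterior below the base rate.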

      Do the "winning" models produce the main behavioral patterns in Figure 1, and are they in some way uniquely able to do so, for instance? How would updating look different for an optimistic RL learner versus an optimistic Bayesian RL learner?

      We now show that the winning models can reproduce the main behavioral patterns (revised Figure 1b).

      Moreover, we plotted estimated and observed average belief updating for each participant (n=123) using the overall best-fitting asymmetrical RL-like updating model shown in SI Figure 6.

      Would the asymmetry parameter in the former be correlated with the asymmetry parameter in the latter? Moreover, crucially, would one be able to reliably distinguish the models from one another under the model estimation and selection criteria that the authors have used here (presenting robust model recovery could help to show this)?

      The asymmetry parameter estimated with the optimistically biased RL- and Bayesian models did correlate (r = 0.735; p < 0.001).

      However, we argue that while the observed updating behavior and estimated free parameters are similar for RL-like and Bayesian learners, the underlying assumed cognitive processes differed and are identifiable. To test this assumption, we have added a model recovery analysis now reported in the supplement section 2c and main manuscript’s methods section pages 24–25.

      As shown in the confusion matrix in SI Figure 5, there is evidence for strong recovery of nearly all models, and importantly for the two winning models: the optimistically biased RL-like model and the rational Bayesian model of belief updating. This analysis thus rules out that the two model families were confused and mitigates concerns about the validity of the model selection.

      Note, one exception was observed. The RL-like and Bayesian updating models that assumed no scaling and asymmetry were best recovered by their respective models that estimated the asymmetry parameter. Many factors could explain this. For example, it could be that the models, which assumed asymmetry, but no scaling, may have captured some bias in updating due to noise generated by the zero parameter models.

      A justification is also needed to focus on the "RL-like updating model with an asymmetry and scaling learning rate component" in Figure 3. As I understand it, this model fits best outside of the pandemic, but another model - the Rational Bayesian Model - does worse (and does the best during the pandemic). What model best combines the groups (outside and inside the pandemic)?

      We thank the reviewer for highlighting the need to justify our focus on the biased RL-like updating model in Figure 3. The model chosen for parameter comparison was selected based on a model comparison procedure conducted across all 12 models, including data from all participants (both those tested outside and during the pandemic, n=123). This model comparison revealed that Model 1 — the RL model with both asymmetry and scaling learning rate parameters estimated — provided the best fit across the entire dataset (Ef = 0.40, pxp = 0.99). As such, we focused on this model for parameter comparisons in Figure 3 to ensure consistency with the model comparison results and to interpret the parameters in the context of the overall best-fitting model. We added this information on top of the model parameter comparison results on page 8. Moreover, SI Figure 6 in the supplements shows how this model reproduces the observed belief updating in each of the 123 participants.

      Why do the authors use absolute belief updating (|UPD|) in the first linear mixed effects model (equation iv)? Since an update is calculated differently depending on whether information calls for an update in an upward or downward direction, I do not understand the need to do this (and it means that updates that go in the wrong direction - away from the information - are counted as positive)

      Thank you for drawing our attention to this point. The ‘absolute belief updating’ note was incorrect, and we apologize for the confusion. To be precise, we did not use absolute updating values in our analyses. Belief updating was assumed on each trial to go either toward the base rate (e.g., Update = E2 – E1) for negative estimation errors or away from it for positive estimation errors (e.g., Update = E1 – E2). Updates that went in the wrong direction, further away from the base rate, were thus counted and included in the analysis with their negative sign. We have corrected this important point in equation iv of the methods section on page 19.
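
      The sign convention described in this response can be written out as a short sketch. It assumes the estimation error is the first estimate minus the base rate, which is not stated explicitly above, and the function names are hypothetical.

```python
def estimation_error(first_estimate, base_rate):
    """Assumed definition: positive when the base rate is lower than feared (good news)."""
    return first_estimate - base_rate


def signed_update(first_estimate, second_estimate, base_rate):
    """Signed belief update following the two formulas quoted in the response."""
    if estimation_error(first_estimate, base_rate) >= 0:
        # Positive estimation error: Update = E1 - E2.
        # A zero error is grouped here arbitrarily for this illustration.
        return first_estimate - second_estimate
    # Negative estimation error: Update = E2 - E1.
    return second_estimate - first_estimate


print(signed_update(30, 25, 20))  # good news, estimate lowered
print(signed_update(30, 40, 50))  # bad news, estimate raised
print(signed_update(30, 35, 20))  # good news, but the estimate moved the wrong way
```

      The third call returns a negative value, which is how an update in the wrong direction keeps its sign instead of being counted as positive.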

      Figure 4: The task schema does not show a confidence rating for base rates.

      Thank you for catching this. We have now added the confidence ratings for base rates to the task in Figure 4b in the revised version of the manuscript. We have furthermore corrected a typo in Figure 4a: The sample size for group 3, tested in May 2021, now indicates 31.

      The authors report that base rates are uniformly distributed - this is quite different to other instances of the task where base rates are normally distributed (ideally around the midpoint of the scale). Why this deviation in the design?

      We used life events and base rates like those used in past studies of belief updating (Garrett and Sharot 2017, Sharot et al. 2011, Garrett et al. 2017, Korn et al. 2017), which were normal to uniformly distributed (W = 0.952, p = 0.088, Shapiro-Wilk test). The base rates ranged between 10% and 70%, with a mean of 40%. Participants rated their estimates between 3% and 77%, which ensured that for most likely (base rate = 70%) and most unlikely events (base rate = 10%) there was the same space (7%) to update beliefs toward the base rates. Moreover, all statistical models included the absolute estimation errors as a control for variance potentially explained by different estimation error magnitude[42,43]. We added this extra base rate information to the methods section’s task description on page 16.

      The task comprises only negative life events, which arguably hinders the generalizability of the results. The authors could mention this as a limitation (there has been a significant quantity of debate about this point in relation to this task: see the work from Ulrike Hahn's lab).

      We have added a paragraph to the discussion on page 13 to provide a rationale for using only adverse events. This paragraph now reads:

      “In this study we tested how actual adverse experiences affect the updating of negative future outlooks in healthy participants and in analogy to studies conducted in depressed patients[19,20,24] following the cognitive model of depression[37]. One open question is whether findings were specific to the adverse event framing[38,39,40]. We argue that under normal, non-adverse contexts belief updating should also be optimistically biased for positive life events, as shown by previous research[41,42]. However, how context such as experiencing a challenging or favorable situation influence the updating of beliefs about positive and negative outlooks remains an open question.”

      It would be useful to show the parameter recovery for all parameters (not just the learning rates) and the correlation between parameters (both in simulations and in the fitted parameters).

      We apologize for being unclear on this part. The models included two free parameters that were the components of the learning rates: The scaling and the asymmetry parameter. We now have added parameter recovery analyses for the scaling and asymmetry components of the learning rates for (1) the Bayesian model of belief updating during the pandemic, and (2) the RL-like model of belief updating outside the pandemic to the supplement (SI section 2b, SI Figure 4).

      Reviewer #2:

      The authors investigated how experiencing the COVID-19 pandemic affected optimism bias in updating beliefs about the future. They ran a between-subjects design, testing participants on cognitive tasks before, during, and after the lifting of the sanitary state of emergency during the pandemic. The authors show that optimism bias varied depending on the context in which it was tested. Namely, it disappeared during COVID-19 and re-emerged when the sanitary emergency measures were lifted. Through advanced computational modeling, they are able to thoroughly characterize the nature of such alterations, pinpointing specific mechanisms underlying the lack of optimistic bias during the pandemic.

      Strengths pertain to the comprehensive assessment of the results via computational modeling and from a theoretical point of view to the notion that environmental factors can affect cognition. However, the relatively small sample size for each group is a limitation.

      Thank you for this review.

      We acknowledge that sample sizes in each group are small, especially when breaking down the participant sample into four sub-samples tested in the different contexts. To mitigate this concern, we checked the power to detect the observed context-by-valence interaction on belief updating. To this aim, we simulated new belief updates using the parameters from the best-fitting optimistic RL-like model of observed belief updating outside the pandemic, and the rational Bayesian model of observed belief updating during the pandemic. At each iteration we performed a linear mixed effects model analysis of the simulated belief updates[44] analogous to equation iv in the main text. The frequency across 1000 iterations with which the LMEs detected a significant interaction of valence by context on simulated belief updating was 75%. This frequency indicates the power to detect the valence-by-context interaction on observed belief updating. In other words, the probability of a false negative (a type II error, failing to reject the null hypothesis when the effect is present) was 25%. We have added these extra analyses to the main manuscript’s results section (page 4) and methods section (page 20).
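
      The logic of a simulation-based power check can be sketched as follows. This is a deliberately simplified stand-in and not the analysis reported above: it draws independent observations from a hypothetical normal generative model and tests the interaction with a z-test on cell means rather than a linear mixed effects model, and every parameter value is an assumption.

```python
import math
import random
import statistics


def interaction_p_value(cells):
    """Two-sided z-test on the difference of differences across four cells."""
    means = {key: statistics.fmean(values) for key, values in cells.items()}
    # Squared standard error of each cell mean.
    sq_errors = [statistics.variance(values) / len(values) for values in cells.values()]
    contrast = ((means[("outside", "good")] - means[("outside", "bad")])
                - (means[("during", "good")] - means[("during", "bad")]))
    z = contrast / math.sqrt(sum(sq_errors))
    # Two-sided p-value under the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


def simulated_power(interaction, n_per_cell=50, sd=10.0, n_iter=500, alpha=0.05, seed=1):
    """Fraction of simulated datasets in which the interaction is significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        cells = {}
        for context in ("outside", "during"):
            for valence in ("good", "bad"):
                # Hypothetical effect: only good news outside the pandemic is shifted.
                shift = interaction if (context, valence) == ("outside", "good") else 0.0
                cells[(context, valence)] = [rng.gauss(5.0 + shift, sd) for _ in range(n_per_cell)]
        if interaction_p_value(cells) < alpha:
            hits += 1
    return hits / n_iter


power = simulated_power(interaction=5.0)
print("power:", power, "type II error rate:", 1.0 - power)
```

      The type II error rate is one minus the estimated power, which is the relationship behind the 75% and 25% figures quoted above.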

      A major impediment to interpreting the findings is the need for additional measures. While information on, for example, risk perception or the need for social interaction was collected from participants during the pandemic, the fact that these could not be included in the analysis hinders the interpretation of findings, which is now generally based on data collected during the pandemic, for example, reporting increased stress. While the authors suggest an interpretation in terms of uncertainty of real-life conditions, it is currently difficult to know if that factor drove the effect. Many concurrent elements might have accounted for the findings. This limits understanding of the underlying mechanisms related to changes in optimism bias.

      We agree with the reviewer on the limitation arising from the lack of physiological and self-report measures of stress, threat, and perceived uncertainty. To address this point and a similar point raised by reviewer 1 we have added a section to the supplement (SI section 4) that now reports explorative correlations between questionnaire responses of subjective perceptions of risk and anxiety, behavior (e.g. mask wearing, social distancing) and belief updating measured during the 1st strict lockdown.

      We now also further discuss this limitation on page 12 of the main text’s discussion.

      I recommend that the authors spend more time on explaining the belief-updating task in the presentation of the experiment.

      Thank you for this advice. We now provide a clearer and more detailed description of the belief-updating task in the main manuscript’s methods section and have updated Figure 4b to display the confidence rating event in the task schema.

      The task description now reads:

      “As illustrated in Figure 4b, each of the 40 trials began with presenting an adverse life event. Participants estimated their own risk and the risk of someone else their age and gender. Then the base rate of the event occurring in the general population was displayed on the computer screen. Participants rated their confidence in the accuracy of the presented base rate. Finally, they re-estimated their risk for experiencing the event now informed by the base rate.”

      The experimental task seems to include a self-other dimension, which is completely disregarded in the analysis. It would be interesting to explore whether the effect of diluted optimism bias during the pandemic is specific to information about self vs. other.

      We appreciate the reviewer's observation regarding the self-versus-other dimension in the belief updating task design. As now shown in SI Figure 2, the participants indeed displayed an optimism bias: They estimated that adverse events are more likely to happen to others than to themselves (ß = 3.02, SE = 0.86, t (232) = 3.53, p = 5.09e-04, 95% CI [1.33 – 4.71]; SI Figure 2; SI Table 18). This effect was observed across all participants. The pandemic context had no significant effect (ß = -1.91, SE = 3.00, t (232) = -0.64, p = 0.52, 95% CI [-7.82 – 4.00]; SI Table 18). Moreover, following previous studies of optimistically biased belief updating, we tested the effect of estimation errors (EE) calculated on the difference between the estimate for someone else (eBR) and the base rate (BR), following: EE = eBR – BR[4,5,25,26]. When categorizing trials as good news or bad news based on this alternative EE calculation, the context-by-EE valence interaction remained significant (SI Table 6).

      We conclude from these additional analyses that experiencing the pandemic specifically influenced belief updating but did not affect optimism biases in initial beliefs about the future.

      Please provide an English translation of the instructions for the task.

      We now provide an English translation of the task instructions in the Supplement section 5.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The production of ROS has been measured in a very superficial way.

      The term "ROS" covers a plethora of chemical species that exert different physiological effects in different cells and situations.

      Mitochondria are one source, but not the only source, of ROS production. Measuring ROS only with MitoSOX does not reflect the cellular ROS status in a specific condition. I would suggest that the authors consider performing IF for oxidative-stress-specific markers (e.g., carbonyl groups) and perhaps also an Amplex Red assay to determine average oxidative stress and ROS production in the cells.

      We agree with the reviewer that a detailed analysis of ROS production and its markers would strengthen the manuscript. Accordingly, we will perform the Amplex Red assay for Figure 1.

      (2) The 8-OHG signal seems very confusing in Figure 7E. 8-OHG is supposed to be mainly in the nucleus and to some extent in mitochondria. The signal is very diffuse in the images. I would suggest higher-magnification and better-resolution images for 8-OHG. Also, the VWF signal is pretty weak, whereas it should be strong given that the staining is in the aorta. The authors should redo the experiments.

      The reviewer’s comment is correct regarding the expected signal. We will repeat the assays. However, we would like to note that the flat morphology of the endothelial cell monolayer on the aortic surface may limit the visualization of subcellular signal differentiation when transversely sectioned.

      (3) The PCA analysis is not quite clear. Why is there a convergence among the plots? The authors should explain. Also, I would suggest that the authors redo the analysis in Figure 8B with R-based packages. IPA, though user-friendly, mostly does not yield meaningful results, and the statistics it carries out are not accurate. The authors should redo the analysis in R or Python, whichever is suitable for them.

      Thank you for your valuable feedback. We acknowledge the concern regarding the PCA analysis and the convergence observed in the plots. In the revised manuscript, we will revise our interpretation to clarify this observation.

      Additionally, we appreciate your suggestion to use R-based packages for pathway analysis. We will make efforts to regenerate the analysis presented in Figure 8B using R to enhance the statistical robustness and reproducibility of our results.

      (4) The MS analysis part seems pretty vague in methods. Please rewrite.

      We will revise the methods section to improve its clarity.

      Reviewer #2 (Public review):

      All the experiments performed here are in an overexpression background; therefore, it would be crucial to show that p66Shc is SUMO2ylated at physiological levels.

      To address this concern, we will attempt to assess p66Shc-SUMO2 levels under physiological conditions. However, we would like to highlight a technical limitation: the currently available antibodies do not distinguish p66Shc from other isoforms, nor SUMO2 from SUMO3. Therefore, enriching for the endogenous p66Shc-SUMO2 adduct will require novel tools and techniques, which we are actively exploring.

      Reviewer #3 (Public review):

      One notable weakness is that the link between the observed cellular changes and the ultimate in vivo phenotype remains only partially explored. While the authors successfully show that p66ShcK81R knockin mice are protected from endothelial dysfunction in a hyperlipidemic context, additional experiments characterizing the broader tissue-specific roles, or examining further endothelial assays in vivo, would strengthen the mechanistic conclusions. It would also be beneficial to see more direct evaluations of p66Shc subcellular localization in the protective knockin mice to complement the proteomic findings.

      That is an excellent suggestion. We will determine the tissue specific distribution of endogenous p66ShcK81R.

      Despite these gaps, the data broadly support the authors' main conclusions. The authors lay out a plausible mechanistic pathway for how hyperlipidemia and increased global SUMOylation can converge on the oxidative stress pathway to provoke vascular dysfunction.

      The likely impact of this work on the field is noteworthy. Beyond clarifying how a single post-translational modification event can influence the pathophysiology of endothelial cells, the study provides a model for investigating broader roles of SUMO2 in other cardiovascular conditions and highlights the importance of identifying additional SUMOylation sites and their downstream impact.

      In conclusion, by demonstrating the direct SUMOylation of p66Shc at lysine-81 and linking that modification to endothelial dysfunction in a hyperlipidemic mouse model, this paper offers valuable insights into how broadly acting post-translational modifiers can evoke specific pathological effects.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript assesses the utility of spatial image correlation spectroscopy (ICS) for measuring physiological responses to DNA damage. ICS is a long-established (~1993) method similar to fluorescence correlation spectroscopy, for deriving information about the fluorophore density that underlies the intensity distributions of images. The authors first provide a technical but fairly accessible background to the theory of ICS, then compare it with traditional spot-counting methods for its ability to analyze the characteristics of γH2AX staining. Based on the degree of aggregation (DA) value, the authors then survey other markers of DNA damage and uncover some novel findings, such as that RPA aggregation inversely tracks the sensitivity to PARP inhibitors of different cell lines.

      The need for a more objective and standardized tool for analyzing DNA damage has long been felt in the field and the authors argue convincingly for this. The data in the manuscript are in general well-supported and of high quality, and show promise of being a robust alternative to traditional focus counting. However, there are a number of areas where I would suggest further controls and explanations to strengthen the authors' case for the robustness of their ICS method.

      Strengths:

      The spatial ICS method the authors describe and demonstrate is easy to perform and applicable to a wide variety of images. The DDR was well-chosen as an arena to showcase its utility due to its well-characterized dose-responsiveness and known variability between cell types. Their method should be readily useable by any cell biologist wanting to assess the degree of aggregation of fluorescent tags of interest.

      Weaknesses:

      The spatial ICS method, though of longstanding history, is not as intuitive or well-known as spot-based quantitation. While the Theory section gives a standard mathematical introduction, it is not as accessible as it could be. Additionally, the values of TNoP and DA shown in the Results are not discussed sufficiently with regard to their physical and physiological interpretation.

      We agree that a major limitation to the adoption of this approach is the lack of a deeper understanding of the theory and results. We have updated the theory section to include further discussion (page 4, line 132).

      The correlation of TNoP with γH2AX foci is high (Figure 2) and suggestive that the ICS method is suitable for measuring the strength of the DDR. The authors correctly mention that the number of spots found using traditional means can vary based on the parameters used for spot detection. They contrast this with their ICS detection method; however, the actual robustness of spatial ICS is not given equal consideration.

      We found it difficult to give equal consideration of robustness to ICS. The major limitation of traditional approaches is proper selection of an intensity threshold that is necessary to define and separate foci from background intensity. However, ICS does not employ a threshold, therefore we could not test different thresholding applications in ICS as we did with traditional methods. In our view the absence of the need for a threshold is profoundly advantageous. The only inputs we employ in the ICS analysis are used to segment cell nuclei, yet these have no impact on the ICS calculation and are necessary for any analysis of the DDR.

      Reviewer #2 (Public review):

      Summary:

      Immunostaining of chromatin-associated proteins and visualization of these factors through fluorescence microscopy is a powerful technique to study molecular processes such as DNA damage and repair, their timing, and their genetic dependencies. Nonetheless, it is well-established that this methodology (sometimes called "foci-ology") is subject to biases introduced during sample preparation, immunostaining, foci visualization, and scoring. This manuscript addresses several of the shortcomings associated with immunostaining by using image correlation spectroscopy (ICS) to quantify the recruitment of several DNA damage response-associated proteins following various types of DNA damage.

      The study compares automated foci counting and fluorescence intensity to the image correlation spectroscopy degree of aggregation to study the recruitment of DNA repair proteins to chromatin following DNA damage. After validating image correlation spectroscopy as a reliable method to visualize the recruitment of γH2AX to chromatin following DNA damage in two separate cell lines, the study demonstrates that this new method can also be used to quantify RPA1 and Rad51 recruitment to chromatin following DNA damage. The study further shows that RPA1 signal as measured by this method correlates with cell sensitivity to Olaparib, a widely used PARP inhibitor.

      Strengths:

      Multiple proof-of-concept experiments demonstrate that using image correlation spectroscopy degree of aggregation is typically more sensitive than foci counting or foci intensity as a measure of recruitment of a protein of interest to a site of DNA damage. The sensitivity of the SKOV3 and OVCA429 cell lines to MMS and the PARP inhibitors Olaparib and Veliparib as measured by cell viability in response to increasing amounts of each compound is a valuable correlate to the image correlation spectroscopy degree of aggregation measurements.

      Weaknesses:

      The subjectivity of foci counting has been well-recognized in the DNA repair field, and thus foci counts are usually interpreted relative to a set of technical and biological controls and across a meaningful time period. As such:

      (1) A more detailed description of the numerous prior studies examining the immunostaining of proteins such as γH2AX, RAD51, and RPA is needed to give context to the findings presented herein.

      We apologize for not providing enough detail. We have added further references and discussion. γH2AX foci counting, in particular, has been used in thousands of previous studies (page 18, lines 513 and 517).

      (2) The benefits of adopting image correlation spectroscopy should be discussed in comparison to other methods, such as super-resolution microscopy, which may also offer enhanced sensitivity over traditional microscopy.

      Thank you for raising this point. We have added this discussion (page 19, line 553). The limiting factor that ICS addresses is the partition coefficient of signal in a focus or cluster versus outside the cluster. Super-resolution will not necessarily improve this unless it is resolved down to single-molecule counting. However, one would still need to evaluate how to define a cluster or focus against the background of non-clustered distribution.

      (3) Additional controls demonstrating the specificity of their antibodies to detection of the proteins of interest should be added, or the appropriate citations validating these antibodies included.

      We have added text stating that we only use validated antibodies (page 6, line 193). One thing to note is that we are measuring differences between treatment conditions; thus, if an antibody has non-specific labeling of proteins or cellular structures that do not change upon treatment, our approach would overcome this limitation.

      Reviewer #3 (Public review):

      Summary:

      This paper described a new tool called "Image Correlation Spectroscopy" (ICS) to detect clustering fluorescence signals such as foci in the nucleus (or any other cellular structures). The authors compared ICS DA (degree of aggregation) data with Imaris Spots data (and ImageJ Find Maxima data) and found a comparable result between the two analyses and that the ICS sometimes produced a better quantification than the Imaris. Moreover, the authors extended the application of ICS to detect cell-cycle stages by analyzing the DAPI image of cells. This is a useful tool without the subjective bias of researchers and provides novel quantitative values in cell biology.

      Strengths:

      The authors developed a new tool to detect and quantify the aggregates of immunofluorescent signals, which is a center of modern cell biology, such as the fields of DNA damage responses (DDR), including DNA repair. This new method could detect the "invisible" signal in cells without pre-extraction, which could prevent the effect of extracted materials on the pre-assembled ensembles, a target for the detection. This would be an alternative method for the quantification of fluorescent signals relative to conventional methods.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) The ICS theory section is essential and based on an excellent review from one of the authors. It would benefit greatly from a diagram showing where the quantities g(0,0), ω₀, and g_inf come from in the 2D Gaussian fit, ideally for two cases where these quantities differ (i.e., how they correspond to different DA or TNoP values). In my opinion, this addition would greatly increase the manuscript's accessibility for DDR researchers. The citation of the review at the beginning would also be a plus.

      We have added the review citation at the front of the theory section (page 3, line 87). We have highlighted where g(0,0), the most critical measurement for determination of TNoP and DA, derives from in Figure 2D. However, it is difficult to describe all the curve fit parameters in an image as they have some interdependency on each other and thus labeling one in a single image would not independently capture how they might be observed in a different curve fit.

      (2) The TNoP measured in Figure 2 is a quantity about 2000-3000 times greater than the number of "traditionally detected" foci by both methods and the linear relations have very low Y intercepts. Can the authors comment explicitly on the physical interpretation of this number - are 2 to 3 thousand independent particles present within each "focus" detected by traditional means? If so, then what might one "particle" correspond to? (a single secondary antibody or fluorophore? a nucleosome?). In a similar vein, the X intercepts lie at around 25 foci, meaning that in images with fewer than that number of foci detected by ImageJ or Imaris, the ICS method should detect zero TNoP - is this in line with the authors' predictions? Is it possible that a first-order line fit is not the most appropriate relation between the two methods?

      We apologize for our brevity here. Since DA proved to be a more useful metric, we did not spend much effort discussing TNoP. TNoP correlates to the number of clustered particles, or non-diffuse fluorophores. TNoP is the inverse of the number of individual particles per nucleus, but the value is not a direct measure of foci. If a sample had no clustering at all, the number of individual particles would be at a maximum and the TNoP would be at a minimum. However, as fluorophores cluster, the number of individual particles (i.e., non-clustered fluorophores) decreases, which increases the TNoP value. Therefore, TNoP has a correlation to the number of foci detected through traditional measurements, as we found here. Yet, TNoP is a relative measurement and cannot be compared across different conditions. Similar to foci counting, TNoP is unable to factor in the size or intensity of each cluster; thus, DA is a more appropriate quantification of the DNA damage response.

      The value of TNoP is dependent on the fitted point spread function and the area of the nucleus. The y=0 intercept of TNoP is defined by the optical setup and is not expected to necessarily go through x=0. Intriguingly, other groups have found that some foci identified through traditional measurements are actually clusters of multiple smaller foci; thus, the concept of what a focus represents is difficult to interpret. Here, we therefore aimed to show a general correlation of TNoP with foci count through traditional methods to reflect how ICS is similar to foci counting, then employed DA to overcome the limitations of defining a focus.

      We have tried to clarify this in the text (page 8, line 266).
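
      The threshold-free nature of the ICS amplitude discussed above can be illustrated with a toy calculation. The sketch below is not the authors' analysis code; it assumes the textbook ICS relations in which the raw zero-lag amplitude g(0,0) equals the variance of the pixel intensities divided by the squared mean intensity, and in which DA is taken as g(0,0) multiplied by the mean intensity. In practice g(0,0) is usually read from a 2D Gaussian fit of the autocorrelation rather than from the raw zero-lag value, which also contains noise. The pixel values and function names are hypothetical.

```python
def zero_lag_amplitude(pixels):
    """Raw g(0,0): population variance of intensity over squared mean intensity."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    return variance / mean ** 2


def degree_of_aggregation(pixels):
    """DA taken as g(0,0) times mean intensity (textbook relation, assumed here)."""
    mean = sum(pixels) / len(pixels)
    return zero_lag_amplitude(pixels) * mean


# Two hypothetical 16-pixel nuclei with the same mean intensity (10 a.u.).
diffuse = [9.0, 11.0] * 8              # signal spread almost evenly
clustered = [2.0] * 14 + [66.0, 66.0]  # most signal packed into two pixels

g_diffuse = zero_lag_amplitude(diffuse)
g_clustered = zero_lag_amplitude(clustered)
da_diffuse = degree_of_aggregation(diffuse)
da_clustered = degree_of_aggregation(clustered)
```

      Both toy nuclei have identical mean intensity, yet the clustered one yields a much larger amplitude and DA, and no intensity threshold is chosen at any point.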

      (3) Some suggestions to address the robustness of ICS:

      For a given sample (i.e. one segmented nucleus), the calculation of DA and TNoP should be similar between different images of that same nucleus taken at different times, similar to how the number of traditionally detected foci would be fairly invariant. In particular, it should be shown that these values are not just scaling with the higher normalized intensity seen in stronger DDR responses. In the same vein, the linear relationship between TNoP and "foci" should not change even if the confocal settings are slightly different (i.e., higher/lower illumination intensity) as long as the condition stipulated by the authors in the Discussion holds ("ICS can be implemented on any fluorescence image as long as the square relative fluorescence intensity fluctuations are detectable above noise fluctuations."). To show, as the title states, that spatial ICS is a robust tool, it would be desirable to demonstrate this with a series of images of the same cell at the same or varying excitation intensities.

      Thank you for your suggestions. Indeed, the calculation will be the same over sequential images of the same cell. The observation that DA for RPA1 and RAD51 is dose dependent while intensity is not (Fig. S5) directly demonstrates that DA does not simply scale with intensity.

      We would not expect the TNoP to change with confocal settings; however, we show in Figure 1 that the number of foci does indeed change with intensity settings as captured by thresholds. Therefore, any interpretation of TNoP vs. foci count would be very difficult to make at different microscope settings. To ensure we are fairly comparing ICS to existing analyses, we keep the settings the same and measure changes between conditions.

      (4) More information is needed on how intensity normalization was performed. The Methods states "Measurements across experiments were normalized by the control in each dataset." The DMSO (0mM drug) plots all appear to have a mean of 1.0, so it appears the values for each set of control nuclei were divided by their own mean, and then the values for each set of experimental nuclei were divided by the mean value of all 3 controls as an aggregate; is this correct?

      We apologize for not being clearer. Thank you for raising this point. We normalized data to a control from each experimental group. Thus, in Figures 3, 4, and 5, data were collected over multiple experiments with one control per experiment and each treatment condition included in each experiment. Therefore, we normalized each result to the corresponding control from that imaging session. However, in Figure 8 we ran experiments at much higher throughput with multiple controls per experiment; thus, the data were normalized to the overall average of the controls, which is why the control averages are not all at a value of 1. We have clarified this in the text (page 7, line 218).
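
      The two normalization schemes described in this response can be sketched as follows. This is an illustrative reading, not the authors' code: the function names and data layout are hypothetical, and "overall average of the controls" is assumed here to mean the mean of all control values pooled together.

```python
from statistics import mean


def normalize_per_session(sessions):
    """Divide every value in a session by that session's own control mean."""
    normalized = {}
    for name, conditions in sessions.items():
        control_mean = mean(conditions["control"])
        normalized[name] = {
            condition: [value / control_mean for value in values]
            for condition, values in conditions.items()
        }
    return normalized


def normalize_to_pooled_controls(control_groups, treated):
    """Divide controls and treated values by the mean of all pooled control values."""
    pooled_mean = mean(v for group in control_groups for v in group)
    controls = [[v / pooled_mean for v in group] for group in control_groups]
    return controls, [v / pooled_mean for v in treated]


# Hypothetical values: one control per session (as described for Figures 3-5).
per_session = normalize_per_session(
    {"day1": {"control": [2.0, 4.0], "drug": [6.0]}}
)

# Hypothetical values: several control groups in one experiment (as for Figure 8).
pooled_controls, pooled_treated = normalize_to_pooled_controls(
    [[1.0, 1.0], [3.0, 3.0]], [4.0]
)
```

      Under the first scheme each session's control averages exactly 1; under the second, individual control groups can average above or below 1, which matches the behavior the response describes for Figure 8.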

      (5) Some more information about the ICS analysis should be given if the full code is not provided - in particular, how the nucleus mask was implemented on the "signal" channel (were the edges abruptly set to zero or was a window function introduced to avoid edge effects in the discrete FFT?).

      Thank you for raising this point. We have added the code to GitHub - github.com/dubachLab/ics. The signal region was established by simply applying the nuclear mask from the DAPI channel to the IF channel. Each region is padded with the average intensity value at the edges for 2x the dimensions of the ROI to remove edge effects in the FFT.
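
      A minimal sketch of this kind of mean padding is given below. It is an assumption-laden illustration rather than the code in the authors' repository: the function name is hypothetical, "2x the dimensions" is interpreted as a canvas twice the ROI height and width, the ROI is centered on that canvas, and pixels outside the nuclear mask are replaced by the in-mask mean.

```python
def pad_roi_with_mean(roi, mask):
    """Embed a masked ROI in a canvas twice its size, filled with the in-mask mean."""
    rows, cols = len(roi), len(roi[0])
    inside = [roi[r][c] for r in range(rows) for c in range(cols) if mask[r][c]]
    fill = sum(inside) / len(inside)

    # Canvas of 2x the ROI dimensions, pre-filled with the mean intensity.
    canvas = [[fill] * (2 * cols) for _ in range(2 * rows)]

    # Copy only in-mask pixels into the center; everything else stays at the mean.
    row_offset, col_offset = rows // 2, cols // 2
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                canvas[row_offset + r][col_offset + c] = float(roi[r][c])
    return canvas, fill


# Hypothetical 2x2 ROI; the bright corner pixel lies outside the nuclear mask.
roi = [[1, 5], [9, 100]]
mask = [[1, 1], [1, 0]]
padded, fill_value = pad_roi_with_mean(roi, mask)
padded_mean = sum(sum(row) for row in padded) / (len(padded) * len(padded[0]))
```

      Because the padding equals the in-mask mean, the padded pixels have zero intensity fluctuation and the canvas mean is unchanged, so the padding adds no sharp intensity step at the ROI boundary before the transform.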

      Minor comments:

      (1) Figure 3, 4, 5: I think it would aid figure readability if channels were labeled in the images themselves, not just in the legend.

      Thank you for the suggestion. We tried doing this and struggled to fit a label within the layout of the images. We were also concerned about interpretation of data in each column and the potential to assign data to each figure if they were so prominently labeled.

      (2) Supplemental Figures are mislabeled; the order given in the legends is S1, S2, S3, S2, S3. S4 is called out in the main text where it should be S5.

      Thank you for catching this error. We have made the necessary corrections. S4 contains data on cellular response to the drugs, while S5 contains intensity data in response to MMS.

      (3) It should be stated for each Figure what kind of microscopy was performed - I assume that it is confocal for everything except when widefield is explicitly stated, but for clarity please add this information.

      Indeed, this is correct, we have indicated which microscopy was used for each figure.

      (4) The MATLAB code and full (uncropped) Western blots should be provided as supplemental data if possible.

      We have included a GitHub link for the code and un-cropped western blots.

      (5) The p values from significance tests should indicate whether multiple comparisons correction was necessary (if suggested by Prism) and performed.

      Apologies for the lack of clarity, but this was not necessary; significance was calculated vs. the next lower dose (e.g., 10 micromolar vs. 1 micromolar). We have clarified this in the methods (page 7, line 221).

      Reviewer #2 (Recommendations for the authors):

      Major points:

      In addition to the weaknesses noted above, to encourage widespread adoption of this method, the authors should make the tools that they used for their analysis publicly available. In a few instances (e.g., compare Figures 3J and 3L), other methods outperform DA. It would be meaningful to discuss when especially DA may be a better measure than others (such as intensity or number of foci).

      We have made the code available on GitHub. We expect that results such as those in Figures 3J and 3L, where intensity is significantly higher at the highest concentration but DA is not, reflect the underlying biology and may be interpreted differently under different experimental conditions. Imaris Spots (Fig. 3K) also does not capture a significant increase at the highest dose of olaparib, suggesting that intensity may rise without generating more foci. These results are likely highly dependent on the mechanism of olaparib at such a high concentration and on the DDR. We are hesitant to draw biological conclusions from these results and would instead like to highlight the capacity of ICS to evaluate the DDR; therefore, we do not want to make any broad comments about different applications.

      Minor points:

      (1) Pg. 12: "We used MMS to induce DNA damage in SKOV3 and OVCA429 cells. As expected, normalized intensity for RPA1 and RAD51 values (Figure S5) did not display a dose dependence on MMS concentration."

      Please provide a citation for the claim that RPA1 and RAD51 normalized intensities do not display a dose dependence on MMS concentration.

      These were data that we generated. We were not expecting an intensity change as that would presumably require increased protein generation in response to MMS, compared to gH2AX where the phospho-specific H2AX is generated in the DDR.

      (2) Pg. 12: "Similar to RPA1, RAD51 does not form distinguishable foci in the nuclei in cells without preextraction (Fig. 5)." Please provide a citation for this claim.

      We did not do pre-extraction, and our results do not show changes in distinguishable foci. We provided citations discussing how, without pre-extraction, foci formation for these proteins is not obvious (refs. 38 and 39).

      (3) I noted that the authors cite one paper [38] apparently showing that RPA and Rad51 do not always form foci, however, this is in the C. elegans germline in response to micro irradiation, therefore I am not sure that it is applicable to human cells.

      We apologize for referencing a paper on C. elegans. Most papers looking at RPA and RAD51 in the DDR use pre-extraction, as it seems necessary to observe foci. Therefore, we could not find as many papers that do not use pre-extraction. Reference 39 is in HeLa cells.

      Reviewer #3 (Recommendations for the authors):

      Major points:

      (1) Page 8, the second paragraph: In the Result section, it is better to describe how the authors carried out immuno-staining (without pre-extract subtraction) and ICS briefly, although the method is described in detail in the Method section.

      Thank you for the suggestion; we have added this description (page 8, line 259).

      (2) In Figure 5K-P: The authors analyzed "invisible" RAD51 foci on the image (Fig. 5L, M, O, and P) without pre-extraction. As a control experiment, it is useful to check whether pre-extraction would provide "visible" RAD51 foci and to examine the similar MMS concentration dependency shown in Figure 5R (or 5T). This would strengthen the power of the ICS analysis.

      Thank you for the suggestion. In our hands, pre-extraction is extremely subjective. We have tried performing pre-extraction but find highly variable results depending on conditions. Therefore, we did not include any pre-extraction here. We expect that performing these experiments may or may not agree with results in Figure 5 largely because we are unable to achieve repeatable pre-extraction foci counting.

      (3) Figure 6D (and 6C) looks very interesting. It would be important to show the interpretation of this correlation shown in the graph. Although the authors argued that ICS analysis results shown in the graph could provide new insight into the DDR (page 14, last line 5), as shown in another part, it is important to carry out the same analysis by using Imaris Spots. Moreover, it is interesting to apply the analysis to RAD51 foci (shown in Figure 5), given that the PARPi effect is enhanced in the absence of RAD51-mediated recombination.

      We completely agree that this analysis may generate interesting results to help interpret the DDR to PARP inhibition. These experiments are part of an ongoing follow-up study where we extend the use of ICS to other parts of the DDR and investigate protein clustering across several proteins with impact on PARPi response. Therefore, since the focus of this manuscript is introducing ICS as a tool to study the DDR, we believe that omitting those data here does not detract from the central points of the manuscript. We included the results in Figure 6 because we wanted to show how ICS could impact DDR research. Furthermore, combined with our advances shown in Figures 7 and 8, we are currently working on adapting ICS to be high-throughput and much simpler than Imaris Spots for handling the large datasets needed to generate results like those in Figure 6.

      Minor points:

      (1) Figure 1I, blue arrows: These showed an area with a higher background. Because of a low magnification, it is very hard to see the difference from the other areas of the background. It is better to show a magnified image of the representative region with a higher background.

      We hope that readers can see the higher intensity in the diffuse area. We attempted to construct a zoomed-in area, but that either blocked a significant portion of the non-zoomed image or added complexity to the figure. We have noted that images in Figure S1 are larger and more obviously capture an increase in background intensity.

      (2) Figure 2 legend, line 5, the same as "A)": This should be "B".

      Here, the number of independent particle clusters is intended to be the same as A, the difference is that the independent particles are clusters in C and individual fluorophores in A.

      (3) Page 9, the first paragraph, last line, foci formation, and foci composition: These should be "focus formation and focus composition".

      We have changed this.

      (4) Page 15, the first paragraph, line 5, palbociclib, camptothecin, or etoposide: please explain what kinds of drugs these are.

      We have added that these drugs cause cells to stall at different cell cycle stages. Explaining the drugs would take considerable room in the text.

      (5) Page 16, the first paragraph, line 1, bleomycin: Please explain what this drug is.

      Similar to above, we have stated that this drug causes DNA damage, going into detail would take several sentences.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Triple-negative breast cancer (TNBC) accounts for approximately 15-20% of all breast cancers. Compared to other types of breast cancer, TNBC exhibits highly aggressive clinical characteristics, a greater likelihood of metastasis, poorer clinical outcomes, and lower survival rates. Immunotherapy is an important treatment option for TNBC, but there is significant heterogeneity in treatment response. Therefore, it is crucial to accurately identify immunosuppressive patients before treatment and actively seek more effective therapeutic approaches for TNBC patients.

      Strengths:

      In this work, the authors collected and integrated single-cell and bulk RNA sequencing data to analyze the TME landscape mediated by ferroptosis-related genes. On this basis, a prediction model of prognosis and treatment response for 131 patients was constructed using a machine learning algorithm, which helps provide individualized and precise treatment guidance for breast cancer patients.

      Thank you for your appreciation of our work. We are encouraged by your positive feedback and will continue to explore new avenues in personalized medicine for breast cancer.

      Weaknesses:

      However, there are still some issues that need to be clarified:

      (1) The description of the research background is too brief and concise, and it is necessary to add some information about the limitations of existing methods and the differences and advantages of this study compared with other published relevant studies, so as to better highlight the necessity and research value of this study.

      Thank you for your suggestions. We have supplemented the research background and compared the differences between this study and other studies, further highlighting the research value of our study.

      (2) This study is a retrospective analysis of a public data set and lacks experimental validation and prospective experiments to support the results of bioinformatics analysis. This should be added to the acknowledgment of limitations in the study.

      Thank you for the constructive feedback. We also acknowledge that the lack of experimental evidence is one of the limitations of this study. Therefore, we plan to conduct in vivo and in vitro experiments in our future research to support the findings of our bioinformatics analysis, and have already added the relevant content to the limitations section of the Discussion.

      Reviewer #2 (Public review):

      Summary:

      This study aims to explore the ferroptosis-related immune landscape of TNBC through the integration of single-cell and bulk RNA sequencing data, followed by the development of a risk prediction model for prognosis and drug response. The authors identified key subpopulations of immune cells within the TME, particularly focusing on T cells and macrophages. Using machine learning algorithms, the authors constructed a ferroptosis-related gene risk score that accurately predicts survival and the potential response to specific drugs in TNBC patients.

      Strengths:

      The study identifies distinct subpopulations of T cells and macrophages with differential expression of ferroptosis-related genes. The clustering of these subpopulations and their correlation with patient prognosis is highly insightful, especially the identification of the TREM2+ and FOLR2+ macrophage subtypes, which are linked to either favorable or poor prognoses. The risk model thus holds potential not only for prognosis but also for guiding treatment selection in personalized oncology.

      Thank you for your thorough review and insightful comments.

      Weaknesses:

      The study has a relatively small sample size, with only 9 samples analyzed by scRNA-seq. Given the typically high heterogeneity of the tumor microenvironment (TME) in cancer patients, this may affect the accuracy of the conclusions. The scRNA-seq analysis focuses on the expression of ferroptosis-related genes in various cells within the TME. In contrast, bulk RNA sequencing uses data from tumor samples, and the results between the two analyses are not consistent. The bulk RNA sequencing results may not accurately capture the changes happening in the microenvironment.

      Thank you for your constructive feedback. Because few scRNA-seq datasets for untreated TNBC are available in public databases, we chose the dataset that contains a relatively large number of untreated TNBC samples, even though it provides only 9 samples. We are fully aware of the complexity and high heterogeneity of the TME. Despite the limited sample size, we first conducted rigorous quality control on the data and, on this basis, provided a preliminary view of the TME landscape mediated by ferroptosis-related genes. These findings provide a new perspective for understanding the biological mechanisms underlying the onset and progression of breast cancer. To enhance the reliability and generalizability of our results, we plan to expand the sample size in future work and to consider integrating other omics technologies, such as proteomics and metabolomics, with scRNA-seq data for a more in-depth exploration of the complex interactions within the TME.

      We also agree with your viewpoint that scRNA-seq data reveals gene expression within individual cells, while bulk RNA-seq data reveals the average gene expression in tumor tissues, and there are differences in data acquisition and processing methods between the two. However, we believe that there are also some close connections between them in terms of gene expression levels. By comparing the expression specificity of marker genes for specific cell types in breast cancer tissues, we found that they are correlated with patient prognosis, and the results have been validated in both internal and external validation sets. Thank you once again for your valuable suggestions, which will play an important guiding role in our subsequent research.

      Reviewer #1 (Recommendations for the authors):

      (1) The breast cancer scRNA-seq dataset files of GSE176078 include 10 TNBC primary tumors (DOI:10.1016/j.compbiomed.2023.107066). However, in this study, only 9 cases were listed, please explain the reason for the data exclusion.

      Thank you for your questions. Although it was clearly stated in the original paper that "To elucidate the cellular architecture of breast cancers, we analyzed 26 primary pre-treatment tumors, including 11 ER+, 5 HER2+ and 10 TNBCs, by scRNA-Seq (Supplementary Table 1)," upon downloading and carefully examining the patient information in Supplementary Table 1, we only included 9 patients explicitly labeled as TNBC in our study (https://pmc.ncbi.nlm.nih.gov/articles/PMC9044823/#SD1).

      (2) The description of the technique in the methods section should be more detailed, such as parameter settings, quality control standards, etc.

      Thank you for your valuable suggestions. We have added the relevant details to the Methods section.

      (3) Please check and correct formatting errors to improve readability, such as lines 176 and 177.

      We apologize for our careless mistakes and thank you for the reminder. We have changed “Pseudotime analysis with scRNA-seq data helps to obtain an approximate landscape of gene expression dynamics” to “Pseudotime analysis of scRNA-seq snapshot data helps to provide an approximate landscape of gene expression dynamics”. We have also checked the rest of the manuscript and corrected the remaining formatting errors.

      Reviewer #2 (Recommendations for the authors):

      (1) In multiple sections of the paper, abbreviations are used without being defined when first mentioned.

      We apologize for our careless mistakes and thank you for the reminder. We have added definitions for the abbreviations in both the abstract and the main text.

      (2) The authors should analyze whether the transcription factors in Figure 2 are correlated with the expression of ferroptosis-related genes.

      Thank you for your valuable feedback. Some transcription factors in Figure 2 correlate with the expression of ferroptosis-related genes, which we have supplemented in the Discussion.

      (3) Figures 3d and 4e lack explanations for the axis values, and for Figure 4e, is the unit of the y-axis labeled "survival" in days?

      Thank you for your valuable feedback. We apologize for the lack of explanations for the axis values in Figures 3d and 4e, and we have revised both figures accordingly. The unit of "survival" on the y-axis of Figure 4e is years, and we have added this information to the figure.

      (4) The authors conducted their analysis using public databases but did not cite the original literature, nor did they discuss the similarities and differences between their findings and those in the original studies.

      Thank you for your valuable suggestions, and we deeply apologize for our carelessness. We have supplemented the original literature in the references and discussed the differences between this study and the original literature in the Discussion.

      (5) Some figures, particularly those involving heatmaps and t-SNE plots (e.g., Figures 1 and 3), present dense and complex data that may be challenging for readers to interpret. The heatmaps (Figure 1e-f and 3d) include many genes, but it is unclear how these genes were selected, and the scale of gene expression differences is difficult to interpret. Simplifying these figures by focusing on the most differentially expressed and clinically relevant genes (e.g., those with prognostic value) would improve readability.

      Thank you for your valuable suggestions. The t-SNE plots in Figures 1 and 3 primarily serve as a dimensionality reduction technique to visually present the clustering of multiple cells or samples based on gene expression, aiding readers in quickly identifying cell subpopulations. The heatmaps, on the other hand, are mainly used to showcase the differential expression of ferroptosis-related genes across different clinicopathological classifications and cell subpopulations, with varying shades of color helping readers quickly recognize gene expression differences among different cell subpopulations. The genes included in the heatmaps (Figures 1e-f and 3d) are sourced from the FerrDb website. We have uploaded the list of ferroptosis-related genes used in this study as Supplementary Table 1 and added the relevant steps in Method 2.3.

      (6) The study analyzes the expression of ferroptosis-related genes in different immune cells within the TME. The authors should discuss how these changes in gene expression may impact the function and behavior of immune cells.

      Thank you for your valuable feedback. We have supplemented the discussion with detailed effects of the main differential genes (FOLR2 and TREM2) on the tumor immune response.

      (7) The authors analyzed the expression of ferroptosis-related genes in immune cells using single-cell sequencing data. However, they subsequently applied the selected genes to perform a risk factor analysis in tumor cells. Is the expression and function of these genes the same in immune cells and tumor cells? This seems questionable.

      Thank you very much for your suggestion. We also believe that there may be differences in the expression and function of genes between immune cells and tumor cells. However, some genes may exhibit similarities in their expression and function in immune cells and tumor cells, especially within the tumor immune microenvironment, due to the complex and tight interactions between immune cells and tumor cells (as shown in Figures 1d and 2h), and their expression levels can be related to the onset, progression, and prognosis of tumors.

      (8) While the risk score model based on ferroptosis-related genes is promising, it lacks experimental validation, which weakens the strength of the conclusions. The authors should consider conducting in vitro or in vivo experiments. These functional studies would provide essential evidence to support the model's predictive capability.

      Thank you for the constructive feedback. We fully recognize the importance of conducting functional studies to substantiate the predictive capability of the model. Therefore, we plan to conduct in vitro and in vivo experiments in our future research to provide the necessary evidence and further validate the model's effectiveness.

      (9) The manuscript predicts sensitivity to 27 drugs based on the risk score, but it lacks mechanistic insight into why patients in the high-risk group might be more responsive to certain drugs. Including a more detailed discussion of the molecular mechanisms underlying this drug sensitivity, particularly linking ferroptosis-related genes to drug metabolism or efficacy, would provide a stronger rationale for the clinical application of these findings.

      Thank you very much for your valuable suggestions. In the discussion, we thoroughly analyzed the mechanism of action of the drugs (ABT-263 and erlotinib) with the greatest difference in sensitivity between high-risk and low-risk groups, as well as their correlation with ferroptosis.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this revised report, Yamanaka and colleagues investigate a proposed mechanism by which testosterone modulates seminal plasma metabolites in mice. The authors identify oleic acid as a particularly important metabolite, derived from seminal vesicle epithelium, that stimulates linear progressive motility in isolated cauda epididymal sperm in vitro. The authors provide additional experimental evidence of a testosterone-dependent mechanism of oleic acid production by the seminal vesicle epithelium.

      Strengths:

      Often, reported epididymal sperm from mice have lower percent progressive motility compared with sperm retrieved from the uterus or by comparison with human ejaculated sperm. The findings in this report may improve in vitro conditions to overcome this problem, as well as add important physiological context to the role of reproductive tract glandular secretions in modulating sperm behaviors. The strongest observations are related to the sensitivity of seminal vesicle epithelial cells to testosterone. The revisions include addition of methodological detail, modified language to reflect the nuance of some of the measurements, as well as re-performed experiments with more appropriate control groups. The findings are likely to be of general interest to the field by providing context for follow-on studies regarding the relationship between fatty acid beta oxidation and sperm motility pattern.

      Thank you for summarizing and your positive evaluation of our study.

      Weaknesses:

      Support for the proposed mechanism is stronger in this revised report than in the previous report, but there are many challenges in measuring sperm metabolism and its direct relationship with motility patterns. This study is no exception and largely relies on correlations between various experiments in lieu of direct testing. Additionally, the discussion is framed from a human pre-clinical perspective, and it should be noted that the reproductive physiology between mice and humans is very different.

      Thank you for pointing out the challenges in our paper. We appreciate your comment on the limited evidence supporting the direct relationship between sperm metabolism and motility patterns under current experimental conditions. Based on your and Reviewer 2's suggestions, we have decided to remove the experiments on the “effects of OA on sperm metabolism, motility and fertility” (Fig. 7, Supplemental Figure 5A and C-F) and the corresponding parts of the Discussion section from the paper (see also Reviewer 2's main comment). These data mainly show correlations and do not provide direct evidence of causality. Instead, we added a new experiment to the manuscript, in which a lipid mixture that mimics the fatty acid profile secreted in a testosterone-dependent manner from seminal vesicle epithelial cells was added to the sperm culture medium (New Supplemental Figure 5, Lines 259-268). In this experiment, motility parameters were measured using CASA. This experiment evaluates the direct effects of lipid exposure on sperm motility. With these revisions, we are able to focus on the metabolic changes caused by testosterone in seminal vesicle epithelial cells, which are the central focus of our research. We have added a short statement acknowledging the potential importance of OA and our intention to more rigorously investigate the role of OA in sperm function in subsequent studies (Lines 402-407).

      Furthermore, we have revised the text to clearly state the limitations arising from species differences and to clarify that the translational aspects to humans are speculative (Lines 383-384, 395-397, 408-410).

      We appreciate your guidance. We believe that these changes will strengthen our research.

      Reviewer #2 (Public review):

      Using a combination of in vivo studies with testosterone-inhibited and aged mice with lower testosterone levels as well as isolated mouse and human seminal vesicle epithelial cells the authors show that testosterone induces an increase in glucose uptake. They find that testosterone induces a difference in gene expression with a focus on metabolic enzymes. Specifically, they identify increased expression of enzymes regulating cholesterol and fatty acid synthesis, leading to increased production of 18:1 oleic acid. The revised version strengthens the role of ACLY as the main regulator of seminal vesicle epithelial cell metabolic programming. 18:1 oleic acid is secreted by seminal vesicle epithelial cells and taken up by sperm, inducing an increase in mitochondrial respiration. The difference in sperm motility and in vivo fertilization in the presence of 18:1 oleic acid and the absence of testosterone, however, is small. Additional experiments should be included to further support that oleic acid positively affects sperm function.

      Thank you very much for carefully reading the manuscript and for your comments. We appreciate your assessment that the role of ACLY in metabolic programming of seminal vesicle epithelial cells has been strengthened in the revised version. At the same time, we agree with your view that the increase in sperm motility and fertilization rate by oleic acid is minimal under the current experimental conditions, and that further evidence is needed to support our conclusion regarding the positive effects of oleic acid on sperm function. Based on your comments and our re-evaluation of the data, we have decided to remove the experiments and discussion on “OA and sperm motility” from the current paper (Fig. 7, Supplemental Figure 5A and C-F). In the revised paper, we have significantly toned down our previous claims on the role of oleic acid and instead focused on the metabolic regulatory mechanisms of seminal vesicle epithelial cells.

      We hope that these revisions address your concerns and improve the overall clarity of the manuscript.

      Recommendations for the authors:

      Note from the reviewing editor: The reviewers agree that the revised manuscript is significantly improved and view the work as important. Both reviewers agree that the evidence for testosterone effects on seminal vesicle epithelial cells to support fatty acid synthesis is strong and suggest that the authors tone down their conclusion about the effect of oleic acid on sperm motility, as the effect is very small. With these minor changes, the evidence to support the conclusion of the study is viewed as solid.

      Thank you for recognizing the improvements that we have made to our manuscript and for appreciating the importance of our research. We also appreciate your assessment that the evidence for the effect of testosterone on seminal vesicle epithelial cells that support fatty acid synthesis is solid.

      On the other hand, we agree with the two reviewers that the effect of oleic acid on sperm motility is limited and that the relevant data do not measure a direct relationship. Therefore, we have decided to withdraw the data set on the effect of oleic acid on sperm (Fig. 7, Supplemental Figure 5A and C-F) and focus this paper on seminal vesicle epithelial cells (in response to Reviewer 2's suggestion). Given that testosterone-induced lipid (fatty acid) synthesis in seminal vesicle epithelial cells is a key aspect of our study, we have included additional experiments in the revised manuscript to show how lipids affect sperm (New Supplemental Figure 5, Lines 259-263).

      With these revisions, the manuscript emphasizes the importance of testosterone-dependent fatty acid synthesis in seminal vesicle epithelial cells and the fact that this includes oleic acid. The title has also been partially revised in line with these revisions.

      Reviewer #1 (Recommendations for the authors):

      Minor Comments:

      (1) The authors indicate in the methods that extracellular flux analysis was normalized to cell count. However, the y-axis units in Figs 4, 8, 9 and SFig 9 are not normalized.

      (2) The OA label appears to be missing from Fig 7A. Additionally, the scale bar is offset in one of the images and the length of the scale bar does not appear to be mentioned in the figure legend.

      Thank you for raising these points. We have corrected them.

      Fig. 7 has been withdrawn in response to Reviewer 2's suggestion.

      Reviewer #2 (Recommendations for the authors):

      With the experiments included in their revised version the authors strengthen their conclusions about testosterone-induced metabolic reprogramming in seminal vesicle cells resulting in reduced proliferation. The experiments surrounding ACLY are well-designed and give insights into the underlying molecular mechanisms. For other parts, the manuscript became less clear and it is often hard to follow the authors' line of thought for their conclusions.

      Based on the experiments shown in the manuscript this reviewer is still not convinced that OA positively affects sperm function. The changes in linear motility are minor, blastocyst levels are lower and the authors do not show that OA alone positively affects cleavage rate during AI. Without additional experiments that show a stronger effect on sperm function, the authors should consider focusing the manuscript exclusively on seminal vesicle epithelial cells.

      Thank you for your constructive comments on our paper, and for pointing out that the effect of oleic acid (OA) on sperm function is limited in our current experiments. As Reviewer 1 also pointed out, we agree that further experiments and improved methodology are needed to reliably demonstrate the functional effects of OA on sperm. Because the data on the direct relationship between fatty acids in seminal fluid and improved sperm function are currently insufficient, we have removed the data set for oleic acid and sperm motility (Fig. 7, Supplemental Figure 5A and C-F) and narrowed the focus of the paper to “how testosterone changes energy metabolism in seminal vesicle epithelial cells”. In accordance with this change, the structure of the paper has also been partially revised (red text in the manuscript). With these revisions, the main point of the paper is the mechanism by which testosterone regulates metabolic pathways in seminal vesicle epithelial cells.

      For more detailed revisions, please see the responses to your comments below.

      (1) 45-55 still need major revision. It will not become clear to the reader what the authors mean by epididymal maturation. 'Ability to fertilize in vitro?' Epididymal sperm are moving linearly in the absence of seminal vesicle fluid. Increased progressive motility, hyperactivation, and the ability to undergo the acrosome reaction are induced upon exposure to seminal vesicle fluid. The authors should introduce the concept of capacitation and that capacitation can be induced in vitro by exposure to bicarbonate and a cholesterol acceptor.

      Thank you for pointing out the ambiguity of epididymal maturation, the need to clarify the concept of capacitation, and the role of seminal plasma in this context. The revised text explains that epididymal maturation only gives sperm their potential ability to fertilize. It also explains that it is the subsequent capacitation process (inducible in vitro by incubation with bicarbonate and cholesterol acceptors) that gives full fertilization potential. In addition, we emphasize that in vivo, seminal plasma, which contains both capacitation-promoting and decapacitation factors, plays a key role in fine-tuning the timing of capacitation, ensuring that sperm acquire fertilization competence at the appropriate moment. We hope that these revisions clarify our intended meaning and strengthen the overall message of the paragraph. (lines 42-54)

      “Sperm that have completed spermatogenesis in the testis acquire their potential to fertilize while maturing in the epididymis (5–7). The physiological change of sperm during fertilization process are collectively referred to as “capacitation”. This change includes a large amplitude of flagella (called hyperactivation) and developing the capacity to undergo the acrosome reaction, and can be induced by culturing sperm collected from the epididymis in a medium containing bicarbonate and cholesterol acceptors (8, 9). However, once capacitation is complete, sperm cannot maintain that state for a long time. Therefore, even if epididymal sperm that have not been exposed to seminal plasma are artificially inseminated into the cervix or uterus, the fertilization rate remains low (10–12). That is because, in vivo, during ejaculation, exposure of epididymal sperm to seminal plasma masks the unintended capacitation as they pass through the female reproductive tract and ensures fertilization of sperm that reach the oviduct (13). In other words, seminal plasma plays an important role in fine-tuning the timing of sperm capacitation and in maintaining the sustained sperm motility needed to reach the oviduct.”

      (2) 81: As in their rebuttal, the authors should elaborate further on the connection between fructose, citrate, and testosterone. That still does not become clear. Based on the authors' explanation in the rebuttal, why are citrate and fructose levels higher when the animals are castrated?

      We thank you for the opportunity to clarify our statement regarding the relationship between fructose, citrate, and testosterone. Our original explanation was intended to reflect the fact that testosterone from the testes has a stimulating effect on the accessory reproductive glands, and to report that the concentrations of fructose and citric acid were higher in the non-castrated (control) animals than in the castrated animals. In castrated animals, the absence of testosterone leads to decreased activity of these glands and, consequently, lower levels of these metabolites. To make this clear, we have revised the manuscript as follows. (lines 76-82)

      “Several specific factors produced by the male accessory glands that contribute to seminal plasma and impact male fertility have been elucidated. For example, surgical removal of seminal vesicles in male mice and rats was associated with infertility (17, 22, 23). The observations that fructose (24) and citric acid (25) concentrations in seminal plasma of control mice and rats are higher than in castrated animals suggest that the specific metabolism of the accessory glands might be affected by testosterone derived from the testes, which activate intracellular androgen receptors (AR; NR3C4) required for gene regulation of transcription.”

      (3) 111: This reviewer does not understand the author's obsession with reporting linear motility. Sperm are moving linearly when isolated from the epididymis. Again, increase of progressive motility is a well-defined hallmark of capacitation and primarily used in the field when discussing changes in sperm motility during capacitation. This reviewer is assuming that the changes in progressive vs linear motility in Fig. 7 are not significant because the data is more scattered. The % increase seems to be approximately the same. The same is true for Fig. 8. The increase in LIN is so small and not dose-dependent that this reviewer is not comfortable making that one of the main conclusions of the manuscript.

      Our claim is based on the observation that seminal vesicle secretions significantly improve the linear motility (VSL and LIN) of sperm even in an environment that does not contain capacitation-inducing factors such as BSA. We interpret this as a survival strategy for sperm to pass through the female reproductive tract efficiently. Therefore, we believe that “progressive motility” in the context of conventional capacitation does not have the same meaning as the progressive motility observed in seminal plasma.

      However, we agree with the reviewer that the current data set does not sufficiently support an interpretation of the minor increase in linear motility caused by oleic acid. Therefore, we have decided to withdraw the dataset on the effect of oleic acid on sperm motility (Fig. 7, Supplemental Figure 5A and C-F) and have revised the conclusion. (Lines 406-410)
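
      As a point of reference for the VSL and LIN parameters discussed in this response, the sketch below computes them from a single tracked sperm-head path using the standard CASA definitions (VCL: curvilinear path length per unit time; VSL: straight-line distance from first to last position per unit time; LIN: VSL/VCL × 100). The function name, track format, and frame interval are illustrative assumptions and are not taken from the authors' analysis pipeline.

```python
import math


def casa_linearity(track, frame_interval_s):
    """Return (VCL, VSL, LIN) for one sperm-head track.

    track: list of (x, y) head positions in micrometres, one per frame
           (hypothetical input format, assumed for illustration).
    frame_interval_s: time between consecutive frames, in seconds.
    """
    if len(track) < 2:
        raise ValueError("a track needs at least two positions")

    duration = frame_interval_s * (len(track) - 1)

    # Curvilinear path: sum of frame-to-frame displacements.
    path_length = sum(math.dist(a, b) for a, b in zip(track, track[1:]))
    # Straight-line path: first position to last position.
    straight_length = math.dist(track[0], track[-1])

    vcl = path_length / duration       # curvilinear velocity, um/s
    vsl = straight_length / duration   # straight-line velocity, um/s
    lin = 100.0 * vsl / vcl if vcl > 0 else 0.0  # linearity, percent
    return vcl, vsl, lin
```

      A perfectly straight track gives LIN = 100%, while any lateral deviation lengthens the curvilinear path and lowers LIN, which is why small LIN differences between groups are sensitive to tracking noise.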

      (4) 128: For the mitochondrial membrane potential measurements the authors should mention that they included antimycin as a control. The manuscript would benefit from including scatter plots with unloaded controls to support their gating strategy. In its current stage, the gating between low and high membrane potential seems arbitrary.

      Thank you for pointing this out. We have included an explanation of antimycin as a control in the main text (Lines 920-921). In addition, we have added some reference scatter plots and also added an explanation of the gating strategy between low and high membrane potentials (Supplemental Figure 1C and D, Lines 1101-1104). We hope this change will make the manuscript clearer.

      (5) 190: What do the authors mean by: 'However, there was no difference in the Oligomycin-sensitive ECAR, indicating that testosterone may increase glucose metabolism but does not enhance the expression of a group of enzymes involved in the glycolytic pathway.'

      Our original intention was to state that testosterone probably increases basal glycolytic flux via increased glucose uptake (as supported by the GLUT4 translocation data), but does not increase maximal glycolytic capacity, as indicated by the lack of difference in oligomycin-sensitive ECAR.

      However, as Reviewer 1 previously pointed out, we agree that the assay conditions themselves, such as the use of oligomycin to inhibit mitochondrial oxidative phosphorylation, may create non-physiological conditions and not fully reflect the energy distribution in vivo. Under these conditions, glycolytic flux may increase artificially as a compensatory reaction, and parameters such as “maximum glycolytic capacity” should have been interpreted with caution.

      Therefore, we have revised the manuscript to clarify that our data represent a single time point under defined experimental conditions and do not necessarily provide direct insight into changes in expression or activity of individual glycolytic enzymes.

      “These data indicate that testosterone enhances glucose utilization. This leads to the interpretation that testosterone increases the flow of glycolysis by increasing glucose uptake and alters metabolic flux distribution.” (Lines 186-188)

      (6) 205: Could the authors elaborate further on how they came to this conclusion: 'These results suggest that testosterone does not reduce transient enzyme activity in mitochondria but rather weakens the metabolic pathway of the mitochondrial TCA cycle and/or the electron transport chain due to the changes in gene expression patterns in seminal vesicle epithelial cells.' Based on their results at this point the authors have no insights about changes in enzyme activity or gene expression that might explain the phenotype.

      Our statement is based on the following observations. In testosterone-treated cells, the addition of glucose increased ECAR, suggesting an increase in glycolytic flux due to an increase in glucose uptake. On the other hand, mitochondrial respiratory parameters (basal respiration, oligomycin-sensitive respiration, FCCP-uncoupled respiration, and reserve respiratory capacity) were significantly decreased under testosterone treatment.

      From these results, it was speculated that testosterone promotes the redistribution of metabolic flux, directing it away from mitochondrial oxidative phosphorylation and towards the glycolytic pathway and, possibly, lipid synthesis. However, as the reviewers correctly point out, at this point, we have not directly measured changes in the activity or expression of individual enzymes in the TCA cycle or ETC. Therefore, in the next experiment, we extracted mRNA from the cells and performed gene expression analysis using real-time PCR. To make this clear, we have revised the manuscript as follows.

      “Overall, these data indicate that testosterone promotes the redistribution of metabolic flux. In other words, testosterone increased glycolysis in seminal vesicle epithelial cells while decreasing mitochondrial respiration. To determine whether these changes were accompanied by changes in gene expression of specific metabolic-related enzymes, we analyzed gene expression levels.” (Lines 201-205)
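
      For readers unfamiliar with the respiratory parameters named in this response, the following sketch shows how they are conventionally derived from oxygen consumption rate (OCR) readings in a mitochondrial stress test. It assumes one representative OCR value per phase and a final rotenone/antimycin A phase to estimate non-mitochondrial respiration; the function and argument names are hypothetical, and the authors' exact calculation is not stated in the source.

```python
def respiration_parameters(baseline, after_oligomycin, after_fccp, after_inhibitors):
    """Derive standard respiratory parameters from per-phase OCR values.

    All arguments are OCR readings in the same units (for example
    pmol O2/min, ideally already normalized to cell count).
    after_inhibitors is the reading after rotenone/antimycin A and is
    treated as non-mitochondrial respiration (an assumption here).
    """
    non_mitochondrial = after_inhibitors
    basal = baseline - non_mitochondrial
    # Oligomycin-sensitive (ATP-linked) respiration.
    atp_linked = baseline - after_oligomycin
    # FCCP-uncoupled (maximal) respiration.
    maximal = after_fccp - non_mitochondrial
    # Reserve (spare) respiratory capacity.
    reserve = maximal - basal
    return {
        "basal": basal,
        "atp_linked": atp_linked,
        "maximal": maximal,
        "reserve": reserve,
    }
```

      Because every parameter is a difference between phases, a uniform drop in all four values, as described for testosterone-treated cells, points to lower overall mitochondrial respiration rather than a change in a single step.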

      (7) 219: Characterizing ACLY as an enzyme of the ETC is misleading. ACLY is a cytosolic enzyme that connects the TCA cycle with fatty acid synthesis.

      We would like to thank you for pointing out that the description of the function of ACLY could be misunderstood. We agree that characterizing ACLY as an enzyme of the ETC could be misleading. Therefore, we have revised the sentence to clearly indicate that ACLY is a cytosolic enzyme that links the TCA cycle with fatty acid synthesis. The revised text is as follows:

      "Interestingly, testosterone significantly increased the expression of Acly, which encodes a cytoplasmic enzyme that converts citrate transported from the TCA cycle into acetyl-CoA, a substrate that is essential for fatty acid synthesis." (lines216-218)

      (8) 228: Which results support that ETC proteins were upregulated by flutamide?

      We thank the reviewer for raising this point. In preliminary experiments, we analyzed the expression of ETC genes using real-time qPCR. Our data show that treatment with flutamide significantly upregulates the expression of genes involved in the mitochondrial ETC, such as mtND6, while decreasing the expression of the lipogenic genes Acly and Acc. These additional data are now presented in Supplementary Figure S3B. (lines 223-226)

      (9) 245: Aren't the authors showing in Fig. 5 that glut4 expression is reduced in seminal vesicle epithelial cells upon testosterone treatment? How does that fit into the author's hypothesis?

      Thank you for pointing this out. We have already responded to a similar comment from Reviewer 3 in a previous revision. Please refer to our response to Reviewer 3 in a previous version.

      (10) 285: Based on the author's results OA increases the oocyte cleavage rate but then reduces the rate of blastocyst to cleaved oocyte. Doesn't that mean OA affects negatively early development?

      We thank the reviewer for the insightful comment. The one-hour pre-treatment is designed to reflect the transient exposure of sperm to the seminal plasma during ejaculation. In this context, it is unlikely that such a short exposure would impair the overall developmental potential of the embryo. However, although pre-conditioning with oleic acid does not ultimately affect the development of the offspring, it may lead to a decrease in the blastocyst rate at a certain point (approximately 96-120 hours after fertilization). We agree that additional research is needed to demonstrate this.

      Therefore, because the experiments related to the effects of oleic acid on sperm and fertilization are currently incomplete, we have decided to withdraw them for future research.

      (11) 305: What happens to pyruvate and lactate levels when ACLY expression is reduced?

      We appreciate the reviewer’s question regarding the fate of pyruvate and lactate when ACLY expression is reduced. In the absence of testosterone (Ctrl), the expression level of ACLY decreases. Under this condition, the concentration of pyruvate in the culture medium was higher than in the testosterone-treated group (Testo; Fig. 4D,E). This probably reflects the fact that when the expression of ACLY is suppressed, the rate at which the products of the glycolytic pathway are diverted into the lipogenic pathway (i.e., the conversion of citrate to acetyl-CoA) decreases.

      On the other hand, lactate levels did not change significantly. This suggests that the flow of lactate production via lactate dehydrogenase is relatively constant, independent of metabolic reprogramming by ACLY.

      Therefore, our data suggest that a decrease in ACLY expression leads to a decrease in pyruvate demand, while lactate production is maintained. We interpret these findings as supporting the idea that ACLY is important for directing the carbon produced by the glycolytic pathway to lipid synthesis (by converting citrate exported from the mitochondria into acetyl-CoA).

      We hope that this explanation clarifies the interpretation of the data.

      Minor revision:

      189: ECAR: extracellular acidification rate. Please correct.

      We have corrected this. (Lines 184-185)

      199: Pyruvate is not synthesized, it is metabolized from PEP. Please correct.

      The following correction has been made: “pyruvate is metabolized from phosphoenolpyruvic acid through glycolysis”. (Lines 194-195)

      In addition, minor revisions were made to improve the clarity of the overall text.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1:

      (1) This manuscript introduces a useful curation pipeline of antibody-antigen structures downloaded from the PDB database. The antibody-antigen structures are presented in a new database called AACDB, alongside annotations that were either corrected from those present in the PDB database or added de-novo with a solid methodology. Sequences, structures, and annotations can be very easily downloaded from the AACDB website, speeding up the development of structure-based algorithms and analysis pipelines to characterize antibody-antigen interactions. However, AACDB is missing some key annotations that would greatly enhance its usefulness.

      Here are detailed comments regarding the three strengths above:

      I think potentially the most significant contribution of this database is the manual data curation to fix errors present in the PDB entries, by cross-referencing with the literature. However, as a reviewer, validating the extent and the impact of these corrections is hard, since the authors only provided a few anecdotal examples in their manuscript.

      I have personally verified some of the examples presented by the authors and found that SAbDab appears to fix the mistakes related to the misidentification of antibody chains, but not other annotations.

      (a) "the species of the antibody in 7WRL was incorrectly labeled as "SARS coronavirus B012" in both PDB and SabDab" → I have verified the mistake and fix, and that SAbDab does not fix it, just uses the PDB annotation.

      (b) "1NSN, the resolution should be 2.9 Å, but it was incorrectly labeled as 2.8" → I have verified the mistake and fix, and that SAbDab does not fix it, just uses the PDB annotation.

      (c) "mislabeling of antibody chains as other proteins (e.g. in 3KS0, the light chain of B2B4 antibody was misnamed as heme domain of flavocytochrome b2)" → SAbDab fixes this as well in this case.

      (d) "misidentification of heavy chains as light chains (e.g. both two chains of antibody were labeled as light chain in 5EBW)" → SAbDab fixes this as well in this case.

      I personally believe the authors should make public the corrections made, and describe the procedures - if systematic - to identify and correct the mistakes. For example, what was the exact procedure (e.g. where were sequences found, how were the sequences aligned, etc.) to find mutations? Was the procedure run on every entry?

      We appreciate the reviewer’s valuable feedback. Our correction procedures combined manual curation with systematic sequence analysis. While most metadata discrepancies were resolved by cross-referencing the original literature, we implemented a structured approach for identifying mutations in specific cases. For PDB entries labeled as variants (e.g., "Bevacizumab mutant" or "Ipilimumab variant Ipi.106") where the "Mutation(s)" field was annotated as "NO," we retrieved the canonical therapeutic antibody sequence from Thera-SAbDab, then performed pairwise sequence alignment against the PDB entry using BLAST to identify mutated residues.
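
      The alignment step described in this response can be illustrated with a short sketch. It is not the authors' pipeline: it swaps BLAST for a plain Needleman-Wunsch global alignment written in pure Python, and the function names, scoring values, and toy sequences are all assumptions made for illustration.

```python
def global_align(ref, query, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings.
    Scoring values are illustrative assumptions, not BLAST defaults."""
    n, m = len(ref), len(query)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if ref[i - 1] == query[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if ref[i - 1] == query[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                a.append(ref[i - 1]); b.append(query[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(ref[i - 1]); b.append("-")
            i -= 1
        else:
            a.append("-"); b.append(query[j - 1])
            j -= 1
    return "".join(reversed(a)), "".join(reversed(b))


def list_substitutions(ref, query):
    """Report substitutions as <ref residue><1-based ref position><query residue>."""
    aligned_ref, aligned_query = global_align(ref, query)
    subs = []
    pos = 0  # position in the ungapped reference sequence
    for r, q in zip(aligned_ref, aligned_query):
        if r != "-":
            pos += 1
        if r != "-" and q != "-" and r != q:
            subs.append(f"{r}{pos}{q}")
    return subs


# Toy sequences (made up, not real antibody sequences).
reference = "EVQLVESGGG"
pdb_entry = "EVQLVASGGG"
print(list_substitutions(reference, pdb_entry))
```

      In practice the reference would be the canonical therapeutic sequence and the query the chain sequence from the PDB entry; insertions and deletions appear as gaps in the alignment and are ignored by this substitution listing.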

      This procedure was not applied to all entries, as mutations are context-dependent. Therapeutic antibodies have well-defined reference sequences, enabling systematic alignment. For antibodies lacking unambiguous wild-type references (e.g., research-grade or non-therapeutic antibodies), mutation annotations were directly inherited from the PDB or literature.

      All corrections have been publicly archived in AACDB. We have added a detailed discussion of this issue in the section “2.3 Metadata” of revised manuscript.

      (2) I believe the splitting of the pdb files is a valuable contribution as it standardizes the distribution of antibody-antigen complexes. Indeed, there is great heterogeneity in how many copies of the same structure are present in the structure uploaded to the PDB, generating potential artifacts for machine learning applications to pick up on. That being said, I have two thoughts both for the authors and the broader community. First, in the case of multiple antibodies binding to different epitopes on the same antigen, one should not ignore the potentially stabilizing effect that the binding of one antibody has on the complex, thereby enabling the binding of the second antibody. In general, I urge the community to think about what is the most appropriate spatial context to consider when modeling the stability of interactions from crystal structure data. Second, and in a similar vein, some antigens occur naturally as homomultimers - e.g. influenza hemagglutinin is a homotrimer. Therefore, to analyze the stability of a full-antigen-antibody structure, I believe it would be necessary to consider the full homo-trimer, whereas, in the current curation of AACDB with the proposed data splitting, only the monomers are present.

      We sincerely appreciate the reviewer’s insightful comments regarding the splitting of PDB files and the opportunity to address these thoughtful concerns.

      Firstly, the scenario in which two antibodies bind to distinct epitopes on the same antigen can be divided into two cases based on the experimental context. Case 1: The two antibodies bind to distinct epitopes on the same antigen, and their complexes are determined in separate structures. For example, SAR650984 (PDB: 4CMH) and daratumumab (PDB: 7DHA) target CD38 at non-overlapping epitopes. These two antibody-antigen complexes were determined independently, and their structures do not influence each other. Case 2: The crystal structure contains a ternary complex with two antibodies and an antigen, as in the example of 6OGE discussed in Section 2.2 of our manuscript. According to the original literature, the experiments confirmed that the order of Fab binding does not affect the formation of the ternary complex, and the binding of one antibody does not enhance the binding of the other. This supports the rationale for splitting 6OGE into two separate structures. However, we acknowledge that not all ternary complexes in the PDB provide such detailed experimental descriptions in their original literature. We agree with the reviewer that in some cases, one antibody may stabilize the structure to facilitate the binding of a second antibody. For instance, in 3QUM, the 5D5A5 antibody stabilizes the structure, enabling the binding of the 5D3D11 antibody to human prostate-specific antigen. Such sandwich complexes are indeed valuable for identifying true epitopes and paratopes. Importantly, splitting the structure does not alter the interaction sites.

      Secondly, we fully agree with the reviewer that for antigens that naturally exist as homomultimers (e.g., influenza hemagglutinin as a homotrimer), the full multimeric structure should be considered when analyzing stability. In such cases, users can directly utilize the original PDB structures provided in their multimeric form. Our splitting approach is intended to provide an additional option for cases where monomeric analysis is sufficient or preferred, but it does not preclude the use of the original multimeric structures when necessary.

      (3) I think the manuscript is lacking in justification about the numbers used as cutoffs (1 Å² for change in SASA and 5 Å for maximum distance for contact). The authors just cite other papers applying these two types of cutoffs, but the underlying physico-chemical reasons are not explicit even in these papers. I think that, if the authors want AACDB to be used globally for benchmarks, they should provide direct sources of explanations of the cutoffs used, or provide multiple cutoffs. Indeed, different cutoffs are often used (e.g. ATOM3D uses 6 Å instead of 5 Å to determine contact between a protein and a small molecule https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/c45147dee729311ef5b5c3003946c48f-Abstract-round1.html). I think the authors should provide a figure with statistics pertaining to the interface atoms. I think showing any distribution differences between interface atoms determined according to either strategy (number of atoms, correlation between change in SASA and distance...) would be fundamental to understanding the two strategies. I think other statistics would constitute an enhancement as well (e.g. proportion of heavy vs. light chain residues).

      Some obvious limitations of AACDB in its current form include:

      AACDB only contains entries with protein-based antigens of at least 50 amino acids in length. This excludes non-protein-based antigens, such as carbohydrate- and nucleotide-based, as well as short peptide antigens.

      AACDB does not include annotations of binding affinity, which are present in SAbDab and have been proven useful both for characterizing drivers of antibody-antigen interactions (cite https://www.sciencedirect.com/science/article/pii/S0969212624004362?via%3Dihub) and for benchmarking antigen-specific antibody-design algorithms (cite https://www.biorxiv.org/content/10.1101/2023.12.10.570461v1).

      We thank the reviewer for raising this critical point about the cutoff values used in AACDB. The thresholds in the manuscript were chosen on the basis of existing literature, and we have added further supporting references to the manuscript. Established tools for defining interacting amino acids typically do not set the ΔSASA threshold above 1 Å² or the distance threshold above 6 Å. While our manuscript emphasizes widely accepted thresholds for consistency with prior benchmarks, AACDB explicitly provides raw ΔSASA and distance values for all annotated residues. Users can filter the downloaded files by excluding entries that exceed their preferred thresholds (e.g., selecting 5 Å instead of 6 Å). This ensures adaptability to diverse research needs. In the revised version, we reset the distance threshold to 6 Å and recalculated the interacting amino acids to give users a wider range of choices. In the section “3.2 Database browse and search” of the revised manuscript, we describe this flexible choice of thresholds for practical use.
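
      The user-side filtering mentioned in this response could look roughly like the sketch below. The column names (`delta_sasa`, `min_distance`), the single combined table, and the example rows are hypothetical; the actual AACDB download files may be laid out differently, so the parsing would need to be adapted.

```python
import csv
import io

# Hypothetical table: one row per interface residue with both raw metrics.
EXAMPLE = """chain,residue,delta_sasa,min_distance
A,TYR33,12.4,3.1
A,SER52,0.4,5.6
A,ASP101,2.0,4.8
"""


def filter_residues(rows, min_delta_sasa=None, max_distance=None):
    """Keep rows with delta_sasa > min_delta_sasa and min_distance < max_distance.
    A threshold left as None is not applied."""
    kept = []
    for row in rows:
        if min_delta_sasa is not None and float(row["delta_sasa"]) <= min_delta_sasa:
            continue
        if max_distance is not None and float(row["min_distance"]) >= max_distance:
            continue
        kept.append(row)
    return kept


rows = list(csv.DictReader(io.StringIO(EXAMPLE)))

# Tighten the distance cutoff from 6 Å to 5 Å.
within_5 = filter_residues(rows, max_distance=5.0)
# Apply the 1 Å² ΔSASA cutoff instead.
buried = filter_residues(rows, min_delta_sasa=1.0)

print([r["residue"] for r in within_5])
print([r["residue"] for r in buried])
```

      Because the raw values are kept, a stricter or looser cutoff is a one-line change rather than a recomputation from the structure.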

      Furthermore, distance and ΔSASA are two distinct metrics for evaluating interactions. Distance directly quantifies spatial proximity between atoms, reflecting physical contacts such as van der Waals interactions or hydrogen bonds, and is ideal for identifying direct spatial adjacency. ΔSASA, on the other hand, measures changes in solvent accessibility of residues during binding, capturing the contribution of buried surfaces to binding free energy. Even for residues not in direct contact, reduced SASA due to conformational changes may indicate indirect functional roles.
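
      The distance criterion described above can be sketched as an any-atom proximity test. This is a minimal illustration under stated assumptions: atoms are given as hypothetical `(residue_label, (x, y, z))` tuples with made-up coordinates, and the ΔSASA criterion is not computed here because it requires a surface-area calculation on the bound and unbound structures.

```python
import math


def contact_residues(antigen_atoms, antibody_atoms, cutoff=6.0):
    """Return (epitope, paratope) residue labels for which at least one
    antigen atom lies closer than `cutoff` Å to at least one antibody atom."""
    epitope, paratope = set(), set()
    for ag_res, ag_xyz in antigen_atoms:
        for ab_res, ab_xyz in antibody_atoms:
            if math.dist(ag_xyz, ab_xyz) < cutoff:
                epitope.add(ag_res)
                paratope.add(ab_res)
    return sorted(epitope), sorted(paratope)


# Toy coordinates (in Å), chosen so that the 5 Å and 6 Å cutoffs differ.
antigen = [("A:LYS10", (0.0, 0.0, 0.0)), ("A:GLU11", (9.5, 0.0, 0.0))]
antibody = [("H:TYR33", (4.0, 0.0, 0.0)), ("H:SER52", (20.0, 0.0, 0.0))]

print(contact_residues(antigen, antibody, cutoff=6.0))
print(contact_residues(antigen, antibody, cutoff=5.0))
```

      The brute-force double loop is adequate for a single complex; a spatial index would be the usual choice for database-scale runs.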

      As demonstrated through comparisons on the detailed information pages, the sets of interacting amino acids defined by these two methods differ by only a few residues, with no significant variation in their overall distributions. However, since interaction patterns vary significantly across different complexes, analyzing residue distributions across all structures using both criteria is not feasible.

      We thank the reviewer for highlighting these limitations. AACDB currently focuses on protein-based antigens ≥50 amino acids to prioritize structural consistency, which excludes non-protein antigens and shorter peptides. While affinity annotations are critical for benchmarking antibody design tools, these data were not integrated in this release because team constraints prevented us from verifying them sufficiently. We acknowledge these gaps and plan to expand antigen diversity and incorporate affinity metrics in future updates.

      Reviewer #2:

      Summary:

      Antibodies, thanks to their high binding affinity and specificity to cognate protein targets, are increasingly used as research and therapeutic tools. In this work, Zhou et al. have created, curated, and made publicly available a new database of antibody-antigen complexes to support research in the field of antibody modelling, development, and engineering.

      Strengths:

      The authors have performed a manual curation of antibody-antigen complexes from the Protein Data Bank, rectifying annotation errors; they have added two methods to estimate paratope-epitope interfaces; they have produced a web interface that is capable of both effective visualisation and of summarising the key useful information in one page. The database is also cross-linked to other databases that contain information relevant to antibody developability and therapeutic applications.

      Weaknesses:

      The database does not import all the experimental information from PDB and contains only complexes with large protein targets.

      Thank you for the valuable feedback. As noted in our response to Reviewer 1, due to limitations within our team, comprehensive data integration from PDB has not been achieved in the current version. We acknowledge the significance of expanding the database to encompass a broader range of experimental information and complexes with diverse target sizes. Regrettably, immediate updates to address these limitations are not feasible at this time. Nevertheless, we are committed to enhancing the database in upcoming upgrades to provide users with a more comprehensive and inclusive resource.

      Recommendations for the authors:

      Reviewer #1:

      (1) Line 194: "produce" → "produced"

      We thank the reviewer for the feedback. We have checked the grammar and spelling carefully in the revised manuscript.

      (2) As mentioned in the public review, I think adding binding affinity annotations would greatly enhance the use cases for the database.

      We thank the reviewer for the suggestion. As noted in our response to the public review, owing to team constraints, these data are not integrated into this release but are being collated. We recognize these gaps and plan to expand antigen diversity and incorporate affinity metrics in future updates.

      (3) I think adding a visualization of interface atoms and contacts on an entry's webpage would be useful for someone exploring specific entries. It also would be useful if the authors provided a pymol command to select interface residues since that's a procedure any structural biologist is likely to do.

      We sincerely appreciate the reviewer’s constructive suggestions. In response to the request for enhanced visualization and accessibility of interface residue information, we have implemented the following improvements: (1) Web Interface Visualization. On the entry-specific webpage, we have added an interactive visualization window that highlights the antigen-antibody interaction interface using distinct colors. The interaction interface visualization has been incorporated into Figure 5 of the revised manuscript, with a detailed description. (2) PyMOL Command Accessibility. The “Help” page now provides step-by-step PyMOL commands to select and visualize interface residues.
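
      For readers without access to the Help page, the kind of selection involved can be sketched as follows. This is not a copy of the commands on the AACDB Help page; the helper name, chain identifiers, and default cutoff are assumptions. The function only builds PyMOL selection expressions as strings, using PyMOL's `byres`, `within ... of`, and `chain` selection operators.

```python
def interface_selections(antigen_chains, antibody_chains, cutoff=6.0):
    """Build PyMOL selection expressions for residues of each partner that
    have any atom within `cutoff` Å of the other partner."""
    ag = "chain " + "+".join(antigen_chains)
    ab = "chain " + "+".join(antibody_chains)
    return {
        "epitope": f"byres (({ag}) within {cutoff} of ({ab}))",
        "paratope": f"byres (({ab}) within {cutoff} of ({ag}))",
    }


# Hypothetical chain assignment: antigen chain A, antibody chains H and L.
selections = interface_selections(["A"], ["H", "L"], cutoff=6.0)
for name, expression in selections.items():
    print(f"select {name}, {expression}")
```

      The printed lines can be pasted into the PyMOL command line, or each expression can be passed to `cmd.select(name, expression)` from a PyMOL Python session.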

      (4) I think the authors should provide headers to the files containing interface residues according to the change-in-SASA criterion, as they do for those computed according to contact. This would avoid unnecessary confusion - however slight - and make parsing easier. I was initially confused by the meaning of the last column, though after a minute I understood it to be the change in SASA.

      We thank the reviewer for the detailed feedback and the suggestion. We have provided headers for the files of the interacting residues defined by ΔSASA.

      (5) Line 233: "AACDB's data processing pipeline supports mmCIF files" → The meaning and implications of this statement are not obvious to me, and are mentioned nowhere else in the paper. Do you mean that in AACDB there are structure entries that the RCSB PDB database only has in mmCIF file format, and not .pdb format? So, effectively, there are some entries in AACDB that are not in any other antibody-specific database? I checked and, as of Dec 3rd, 2024, there are 41 structures in AACDB that are NOT in SAbDab. Manually checking 5 of those 41 structures, none are mmCIF-only structures.

      We thank the reviewer for the valuable comment. The structures in certain entries contain too many atoms and polymer chains to be represented in a single PDB-format file. As a result, the PDB provides these structures only as mmCIF files. In AACDB, 47 entries, such as 7SOF, 7NKT, 7B27, and 6T9D, are available only in mmCIF format from the PDB. The “.pdb” and “.cif” files contain atomic coordinates in distinct text formats, and the splitting of these structure files is performed automatically based on the manually annotated antibody-antigen chains. We have therefore built support for both formats into our file processing pipeline, enabling a fully automated splitting process. Additionally, we employed Naccess to calculate solvent-accessible surface areas. However, since this software only accepts .pdb format files as input, we also converted all split .cif files into .pdb format within our fully automated pipeline. We apologize for the lack of clarity in the original manuscript and have included a more detailed explanation in the "2.2 PDB Splitting" section of the revised manuscript.
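
      Chain-based splitting of a coordinate file can be sketched in a few lines for the PDB-format case. This is an assumption-laden illustration, not the AACDB pipeline: it handles only fixed-column PDB text (chain identifier in column 22), the helper names are hypothetical, and the toy records omit coordinates. Parsing mmCIF input would require a dedicated parser library and is not shown.

```python
def extract_chains(pdb_text, chain_ids):
    """Keep only ATOM/HETATM records whose chain identifier (column 22,
    i.e. string index 21) is in `chain_ids`, and terminate with END."""
    keep = set(chain_ids)
    out = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21 and line[21] in keep:
            out.append(line)
    out.append("END")
    return "\n".join(out) + "\n"


def make_atom_line(serial, name, resname, chain, resseq):
    """Toy ATOM record with correct column positions up to the residue
    number; coordinates and later fields are omitted for brevity."""
    return f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}"


# Toy complex: antigen chain A, antibody chains H and L.
toy = "\n".join([
    make_atom_line(1, "CA", "ALA", "A", 1),
    make_atom_line(2, "CA", "GLY", "H", 1),
    make_atom_line(3, "CA", "SER", "L", 1),
])

antibody_only = extract_chains(toy, ["H", "L"])
print(antibody_only)
```

      A split sub-complex typically contains far fewer atoms and chains than the full entry, which is what makes writing it back out in PDB format practical for tools that accept only that format.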

      Reviewer #2:

      (1) In SabDab and PDB, experimental binding affinities are also reported: could the authors comment on whether they also imported this information and double-checked it against the original paper? If it wasn't imported, that might discourage some users and should be considered as an extension for the future.

      We thank the reviewer for the comment and the suggestion. As noted in our response to the public review, owing to current resource constraints, quantitative affinity data have not been incorporated into this release but are undergoing systematic curation. We explicitly recognize these limitations and propose a two-pronged strategy for future iterations: (1) broadening antigen diversity coverage through expanded structural sampling, and (2) integrating quantitative binding affinity measurements. In the Discussion section, we have included a description outlining the planned enhancements.

      (2) Line 49-50: the references mentioned in connection to deep learning methods for antibody-antigen predictions seem a bit limited given the amount of articles in this field, with 3 of 4 references on one method only (SEPPA), could the authors expand this list to reflect a bit more the state of the art?

      We thank the reviewer for the suggestion. We agree that more relevant studies should be listed and therefore more references are provided in the revised manuscript.

      When mentioning the limitations of the existing databases, it feels a bit that the criticism is not fully justified. For instance:

      Line 52-53: could the authors elaborate on the reasons why such an identification is challenging? (Isn't it possible to make an efficient database-filtered search? Or rather, should one highlight that a more focussed resource is convenient and why?)

      Thank you for the feedback. In this study, the keywords "antibody complex," "antigen complex," and "immunoglobulin complex" were employed during data collection. PDB returned over 30,000 results, of which only one-tenth met our criteria after rigorous filtering. This demonstrates that keyword searches, while useful, inherently limit result precision and introduce substantial redundancy, likely due to the PDB's search mechanism. This is why we highlighted the significant challenges in identifying antibody-antigen complexes among general protein structures in the PDB.

      Line 55: reading the website http://www.abybank.org/abdb/, it would be fairer to say that the web interface lacks updates, as the database and the code have gone through some updates. Could the authors provide a concrete example of the reason why: 'The AbDb database currently lacks proper organization and management of this valuable data.'?

      We thank the reviewer for highlighting this issue. In our original manuscript, the statement that the AbDb database "lacks proper organization and management" was based on the absence of an explicit statement regarding data updates on its official website at the time of submission, even though internal updates to its content may have occurred. We fully respect the long-standing contributions of AbDb to antibody structural research, and our comments were solely directed at the specific state of the database at that time. As the reviewer noted, following the release of our preprint, we have also taken note of AbDb's recent updates. To reflect the latest developments and avoid potential misinterpretation, we have revised the original statement in the revised manuscript.

      Also 'this rapid updating process may inadvertently overlook a significant amount of information that requires thorough verification,': it's difficult for me to understand what this means in practice. Could the authors clarify if they simply mean that SabDab collects information from PDB and therefore tends to propagate annotation errors from there? If yes, I think it's enough to state it in these terms, and for sure I agree that the reason is that correcting these annotation errors requires a substantial amount of work.

      We thank the reviewer for providing such detailed feedback on the manuscript. We acknowledge that SAbDab represents a highly valuable contribution to the field, and its rapid update mechanism has significantly advanced related research areas. However, as stated by the reviewer, we aim to clarify that SAbDab primarily relies on automated metadata extraction from the PDB for annotation, and its rapid update process inherits raw data from upstream sources. According to their paper, manual curation is only applied when the automated pipeline fails to resolve structural ambiguities. This workflow, which depends on PDB annotations with limited manual verification, may propagate errors present in the PDB. Examples include species misannotation and mutation status misinterpretation. We fully agree with the reviewer's observation that correcting errors in such cases necessitates labor-intensive manual curation, which is a core motivation for our study.

      Line 86: why 'Structures that consisted solely of one type of antibody were excluded'? Why exclude complexes with antigens shorter than 50 amino acids? These complexes are genuine antibody-antigen complexes.

      We thank the reviewer for the valuable question. The AACDB database is dedicated to curating structural data of antigen-antibody complexes. Structures featuring only a single antibody type are classified as free antibodies and systematically excluded from the database due to the absence of protein-bound partners. During data screening, we retained sequences shorter than 50 amino acids by categorizing them as peptides rather than eliminating them outright. The current release exclusively encompasses complexes with protein-based antigens. Meanwhile, complexes involving peptide, hapten, and nucleic acid antigens are undergoing systematic curation, with planned inclusion in future updates to broaden antigen category representation.

      Line 96 needs a capital letter at the beginning.

      Line 107: 'this would generate' → 'this generates' (given it is something that has been implemented, correct?).

      Line 124: missing an 'of'.

      Line 163: inspiring by -> inspired by.

      Thank you for the feedback. All of the above grammatical or spelling errors have been corrected in the manuscript.

      Line 109-111: apart from the example, it would be good to spell out the general rule applied to anti-idiotypic antibodies.

      We thank the reviewer for the valuable feedback. For anti-idiotypic antibody complexes, the partner antibody is treated as a dual-chain antigen, necessitating individual evaluation of heavy chain and light chain interactions with the anti-idiotypic component. We have given a general rule for anti-idiotypic antibodies in section “2.2 PDB splitting” of the revised manuscript.

      Line 155-159: could the authors provide references for the two choices (based on sasa and any-atom distance) that they adopted to define interacting residues?

      We thank the reviewer for the comment and the suggestion. As in our response to Reviewer #1 in the public review, the definition of interacting residues and the thresholds chosen in the manuscript are based on existing literature. We have added additional supporting references in section “1. Introduction”. Our resource does not provide a fixed amino acid list. Instead, all interacting residues are explicitly documented alongside their corresponding ΔSASA (solvent-accessible surface area changes) and intermolecular distances, allowing researchers to flexibly select residue pairs based on customized thresholds from downloadable datasets. Furthermore, to align with widely adopted criteria in current literature, where interactions are defined by ΔSASA >1 Ų and atomic distances <6 Å, we have recalibrated our analysis in the revised version. Specifically, we replaced the previous 5 Å distance threshold with a 6 Å cutoff to recalculate interacting residues.

      Line 176-178: could the authors re-phrase this sentence to clarify what they mean by 'change in the distribution'?

      We thank the reviewer for the suggestion. Our search was conducted with an end date of November 2023. However, Figure 3B includes an entry dated 2024. Upon reviewing this record, we identified that the discrepancy arises from the supersession of the 7SIX database entry (originally released in December 2022) by the 8TM1 version in January 2024. This version update explains the apparent chronological inconsistency. We regret any lack of clarity in our original description and have revised the corresponding section in the manuscript to explicitly clarify this change of database.

      Caption Figure 3: please spell out all the acronyms in the figure. Provide the date when the last search was performed (i.e., the date of the last update of these statistics).

      We thank the reviewer for the comment. We have systematically expanded all acronyms and included update dates for statistics in the legend of Figure 3. Corresponding changes have also been made to the statistical pages on the website.

      Finally, it would be advisable to do a general check on the use of the English language (e.g. I noted a few missing articles). In Figure 5 DrugBank contains typos.

      We sincerely appreciate the reviewer's meticulous attention to linguistic precision. We have corrected the typographical error in Figure 5 and conducted a comprehensive review of the entire manuscript to ensure accuracy and clarity.

    1. Author response:

      We are highly appreciative of your constructive criticism and that you found our findings of interest and significance. Based on your helpful suggestions, we plan to revise the paper as follows:

      (1) Although ETFDH is reduced but not mutated across neoplasia, we appreciate your point regarding the catalytic activity of ETFDH. To this end, in the revision we are planning to compare the effects of rescues using wild type ETFDH or one of the MADD-associated mutants with compromised catalytic activity.

      (2) We intend to measure steady-state nucleotide levels as a function of ETFDH status in the cell. If time and/or funding allow, we will also perform appropriate labelling experiments.

      (3) We will revise the text of the manuscript to address the minor points raised by the reviewers.

      Again, we would like to thank you for your helpful comments, which we aim to address as outlined above and which we hope will further improve our report.

    1. Author response:

      We sincerely thank all reviewers for their thoughtful, detailed, and supportive evaluations of our manuscript. We are very pleased that the reviewers appreciated the integrative approach of our study, the quality of the imaging and analyses, and the insights provided into the parallel evolution of biomineralization mechanisms in sponges and corals.

      We are carefully considering all the suggestions made, including those regarding the improvement of figure clarity and the clarification of certain image interpretations. These comments are extremely valuable, and we are preparing a detailed point-by-point reply to accompany our revised manuscript.

      It was also brought to our attention that the links to the Zenodo repository were incorrect. We apologize for this oversight and any inconvenience it may have caused and will update the links in our revised manuscript. In the meantime, the correct Zenodo repositories can be accessed using the following links:

      https://zenodo.org/records/14755899

      https://zenodo.org/records/13847772

      We again thank the reviewers for their constructive feedback, which will help us to further strengthen the manuscript.

    1. Author response:

      We thank the editors and reviewers for their thoughtful and constructive evaluation of our manuscript, “Krüppel Regulates Cell Cycle Exit and Limits Adult Neurogenesis of Mushroom Body Neural Progenitors in Drosophila.” We are pleased that all reviewers recognised the novelty and significance of identifying Krüppel (Kr) as a key transcription factor promoting timely termination of mushroom body neuroblast (MBNB) proliferation, and the potential antagonistic function of Kr-h1.

      We appreciate the helpful suggestions aimed at improving the mechanistic clarity and presentation of our findings. Below, we outline how we plan to address the major points raised in the full revision.

      (1) Characterisation of the KrIf-1 allele and Kr expression

      We agree that clarifying the nature of the KrIf-1 allele is important. In response to this concern, we will examine Kr expression in KrIf-1 mutant larval, pupal, and adult brains using immunostaining and available reporter lines. These experiments will help determine whether the observed neuroblast retention phenotype correlates with altered Kr expression in MBNBs.

      (2) Regulatory relationships between Kr, Kr-h1, Imp, Syp, Chinmo, and E93

      We are currently performing additional experiments to clarify the interactions among these temporal factors. For instance, we are testing whether Kr-h1 overexpression alters the expression of Imp, Syp, and E93. We have obtained a published E93 antibody from Dr Chris Doe (Syed et al., 2017) and will include E93 expression analysis in our revised manuscript.

      While Chinmo is of interest, its expression is well established to be regulated downstream of Imp/Syp via mRNA stability (Liu et al., 2015; Ren et al., 2017). Given that we currently lack reliable tools to assess Chinmo levels, we will focus primarily on Imp, Syp, and E93 as readouts for Kr/Kr-h1 function. If we succeed in obtaining Chinmo antibodies or reporter lines in time, we will include corresponding data.

      (3) Expression of Kr-h1 in MBNBs

      We fully agree that direct evidence for Kr-h1 expression in MBNBs is important. To address this, we have obtained the Kr-h1::GFP BAC transgenic line (BDSC #96786) and are currently using it to assess Kr-h1 expression in MBNBs. We also tested an anti–Kr-h1 antibody previously reported by Kang et al. (2017), developed in the context of fat body studies, but it did not yield clear signals in larval MBNBs. However, previous work by Shi et al. (2007) clearly demonstrated Kr-h1 expression in the developing MB, including MBNBs, using a custom antibody developed by their lab. We also contacted the Lee lab to request this antibody, but unfortunately, it is no longer available. We will include the results obtained using the GFP BAC line in the revised manuscript and, if needed, pursue RNA in situ hybridisation to further validate Kr-h1 expression in MBNBs.

      (4) Temporal Kr knockdown and MARCM analysis

      We appreciate the suggestion to validate our RNAi-based temporal knockdown results using MARCM. We plan to perform MBNB-specific MARCM analysis following the strategy described by Rossi et al. (2020). However, this approach requires additional time due to the logistics of acquiring the necessary fly stocks, generating appropriate genetic combinations, and conducting clonal analyses. While we will make every effort to include these data, we note that RNAi-based knockdown offers the advantage of temporal reversibility and has been essential for assessing stage-specific requirements in our current study.

      (5) Details of the targeted genetic screen

      Kr was initially identified as part of a broader, ongoing effort to screen for candidate transcription factors and cell cycle regulators involved in neuroblast cell cycle exit and/or quiescence. As this screen is still preliminary and incomplete, we prefer not to include the full dataset at this stage. Instead, we will revise the manuscript to clarify that Kr was prioritised for further investigation based on the striking MBNB-specific phenotype observed upon RNAi-mediated knockdown and in the KrIf-1 mutant, rather than through a completed screening process.

      (6) Clarifying the model (Figure 6D) and interactions

      We will revise the proposed model to distinguish between experimentally supported interactions and speculative ones. As noted above, we will primarily focus on the Imp/Syp and E93 axis in relation to Kr and Kr-h1 activity. Chinmo will be omitted from the model unless further data become available to support its inclusion.

      (7) Clarifications on figures and data presentation

      We appreciate the feedback on figure clarity. We will revise figures such as 1B, 2C, and 3A to improve legibility and presentation. We will also correct typographical errors and figure references, and clarify the activity patterns of the GAL4 drivers. Specifically, while UASmCD8::GFP expression driven by OK107-GAL4 is markedly weaker in MBNBs than in their neuronal progeny (as seen, for example, in Figure S3C), the driver remains active and functionally relevant in MBNBs. We believe the weak expression in MBNBs likely explains the absence of a NB retention phenotype in OK107>KrIR adult brains (see main text, Lines 374–376). As suggested by the reviewer, we will clarify this point earlier in the manuscript and can include additional data showing OK107>GFP expression patterns in pupal MB lineages as supplementary material.

      (8) Analysis of public datasets

      We will include results from our analysis of publicly available datasets such as FlyAtlas2, modENCODE, and a time-course RNA-seq dataset specific to MBNBs (Liu et al., 2015). While the spatial resolution of FlyAtlas2 and modENCODE is limited, the MBNB dataset provides valuable temporal information up to 36 h after puparium formation (APF). From this dataset, we observe that Kr expression remains consistently low throughout development, with only a modest increase at 84 h ALH (mean TPM ~11) and 36 h APF (~7), suggesting it does not undergo strong transcriptional regulation in MBNBs. In contrast, Kr-h1 is highly expressed during early larval stages (24–84 h ALH; mean TPM ~55–60) and shows a marked suppression by 36 h APF (mean TPM ~2), consistent with its proposed role in promoting MBNB proliferation. Importantly, Eip93F (E93) exhibits a reciprocal pattern to Kr-h1—with minimal expression until 84 h ALH (mean TPM ~24), followed by a substantial induction at 36 h APF (mean TPM ~104), aligning with its known role in triggering neuroblast termination. These temporal expression dynamics support our model that Kr-h1 and E93 function in opposition during the transition from proliferative to terminating neuroblast states. We will summarise these findings in the revised manuscript, along with appropriate discussion of dataset limitations.

      We hope this provisional response conveys our strong commitment to thoroughly addressing the reviewers’ concerns and improving the manuscript. We are currently carrying out additional experiments and will submit a revised version with new data and enhanced clarity in due course.

      References:

      Kang et al., 2017. Sci Rep. 7(1):16369. doi: 10.1038/s41598-017-16638-1.

      Shi et al., 2007. Dev Neurobiol. 67(11):1614–1626. doi: 10.1002/dneu.20537.

      Rossi et al., 2020. eLife. 9:e58880. doi: 10.7554/eLife.58880.

      Liu et al., 2015. Science. 350(6258):317–320. doi: 10.1126/science.aad1886.

      Ren et al., 2017. Curr Biol. 27(9):1303–1313. doi: 10.1016/j.cub.2017.03.018.

      Syed et al., 2017. eLife. 6:e26287. doi: 10.7554/eLife.26287.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Garcia et al. describes how the expression of a respiratory chain alternative oxidase (AOX) from the tunicate Ciona intestinalis, capable of transferring electrons directly from reduced coenzyme Q (CoQ) to oxygen, is able to induce an increase in the mass of Drosophila melanogaster larvae and an accelerated development, especially when the larvae are kept at low temperatures. In order to explain this phenomenon, the paper addresses the modifications in the activity and levels of the 'canonical' electron transfer system (ETS), i.e., complexes I-IV and of the ATP synthase. In addition, the abundance of different metabolites as well as the NAD+/NADH ratios are measured, finding significant differences between the larvae.

      Strengths:

      The observations of differences in growth, body mass and food intake in the wt D. melanogaster larvae vs. those expressing the AOX transgene are solid. The evidence that mild uncoupling of the ETS might accelerate development of the fly larvae is convincing.

      We appreciate the reviewer’s attention to our results and hope we can improve the manuscript to address all criticism appropriately.

      Weaknesses:

      Some of the observations, especially those concerning the origin of the metabolic remodelling in AOX-expressing larvae, are left unexplained, and the argumentation is somewhat speculative. What the authors mean by "reconfiguration" of the mitochondrial electron transfer system is not clear. If this implies that there is an actual change in ETS function and/or structural organisation in the presence of AOX, this conclusion is not supported by the experimental data. In addition, the influence of AOX activity in the mitochondrial ETS system is tested in vitro in the presence of saturating concentrations of substrates. The real degree to which AOX activity is actually influencing ETS activity in vivo remains unknown.

      Indeed, the term “reconfiguration” may seem a little too strong. However, we do have preliminary structural data on larval mitochondria indicating that the term is adequate in this context. We plan to work on obtaining concrete data to sustain our claims that AOX imparts significant functional and structural remodeling of the organelle, which would be consistent with our respirometry and BN-PAGE data. If the data turns out not to be robust enough, we will consider replacing the term with one that better reflects our findings.

      We also realize that the in vivo data we are presenting (body mass, mobility, food intake) are indirect measurements of metabolism and that a more direct approach is necessary to assess the real degree to which AOX influences ETS activity in vivo. To address this issue, we plan to expand our pharmacological treatments of the larval development and to measure whole larval oxygen consumption.

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents intriguing findings about the role of alternative oxidase (AOX) from the tunicate Ciona intestinalis in accelerating growth and development when expressed in Drosophila melanogaster.

      Strengths:

      The study is overall well-constructed, including appropriate analysis. Likewise, the manuscript is written clearly and supported by high-quality figures. The present study provides valuable insights into AOX's role in Drosophila development. The paper attempts to explore a unique mechanism by which AOX influences Drosophila development, providing insights into mitochondrial respiration and its physiological effects. This is relevant for understanding mitochondrial dysfunction and potential therapeutic applications. The study employs a variety of approaches, including calorimetry, infrared thermography, and genetic analyses, to investigate AOX's impact on metabolism and development.

      We sincerely thank the reviewer for recognizing the strengths and acknowledging the novelty of our study.

      Weaknesses:

      There are a number of methodological limitations and substantial gaps in the interpretation of the data presented, which reduces the strength of its conclusions. For instance, there is a misunderstanding of the non-proton motive nature of the AOX - it does not uncouple respiration, merely decouple it as it neither contributes to nor dissipates the proton motive force, in contrast to chemical uncouplers or proton uncouplers such as UCPs. The authors need to reassess their data in light of the above.

      The reviewer is absolutely right about the non-proton motive nature of AOX. We will reassess our data considering that AOX decouples respiration and, if necessary and possible, we will add new experiments to address the methodological limitations raised by the reviewer.

    1. Author response:

      We appreciate the reviewers' positive feedback on our paper. We especially thank them for their evaluation of the genetic analysis, which required a significant amount of time. We acknowledge that several aspects of our interpretation and description of the results need correction, as noted by both reviewers. Additionally, we recognize the importance of providing a more comprehensive overview of previous findings, including those conducted in mice, in the manuscript. In the revised version, we will thoroughly address the reviewers' concerns.

      Both reviewers emphasized the need for further validation to ascertain whether the specific requirement of Hox genes in the Hoxba and Hoxbb clusters for pectoral fin bud formation is due to their expression patterns or the functional roles of Hox proteins. This consideration has been on our agenda for some time; however, our submitted paper does not sufficiently address this aspect. In the revised manuscript, we will conduct a comprehensive analysis of the expression patterns of Hox genes in zebrafish to draw informed conclusions on this matter.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors investigate how the viscoelasticity of the fingertip skin can affect the firing of mechanoreceptive afferents and they find a clear effect of recent physical skin state (memory), which is different between afferents. The manuscript is extremely well-written and well-presented. It uses a large dataset of low threshold mechanoreceptive afferents in the fingertip, where it is particularly noteworthy that the SA-2s have been thoroughly analyzed and play an important role here. They point out in the introduction the importance of the non-linear dynamics of the event when an external stimulus contacts the skin, to the point at which this information is picked up by receptors. Although clearly correlated, these are different processes, and it has been very well-explained throughout. I have some comments and ideas that the authors could think about that could further improve their already very interesting paper. Overall, the authors have more than achieved their aims, where their results very much support the conclusions and provoke many further questions. This impact of the previous dynamics of the skin affecting the current state can be explored further in so many ways and may help us to better understand skin aging and the effects of anatomical changes of the skin.

      At the beginning of the Results, it states that FA-2s were not considered as stimuli did not contain mechanical events with frequency components high enough to reliably excite them. Was this really the case, did the authors test any of the FA-2s from the larger dataset? If FA-2s were not at all activated, this is also relevant information for the brain to signal that it is not a relevant Pacinian stimulus (as they respond to everything). Further, afferent receptive fields that were more distant to the stimulus were included, which likely fired very little, like the FA-2s, so why not consider them even if their contribution was low?

      Thank you for bringing this up. We have now clarified in the text that while FA-2s did respond at a low rate during the experiment, their responses were not reliably driven by the force stimuli. In the Methods section we have included the following text:

      “Initially, 10 FA-2 neurons were also included in the analysis. But their responsiveness during the experiment was remarkably low, and unlike the other neuron types, their responses were rarely affected by force stimuli. Specifically, only one of the observed FA-2 neurons responded during the force protraction phases. Due to the lack of clear stimulus-driven responses, FA-2 neurons were subsequently excluded from further analysis.”

      One question that I wondered throughout was whether you have looked at further past history in stimulation, i.e. not just the preceding stimulus, but 2 or 3 stimuli back? It would be interesting to know if there is any ongoing change that can be related back further. I do not think you would see anything as such here, but it would be interesting to test and/or explore in future work (e.g. especially with sticky, forceful, or sharp indentation touch). However, even here, it could be that certain directions gave more effects.

      This is a very interesting question! A discernible effect from the previous stimulus could persist at the end of the current stimulation (see Figure 4C), potentially influencing the next one—a 2-stimuli-back effect. Unfortunately, our experimental design did not allow for rigorous testing of this effect. While all possible pairs of stimulus directions were included in immediately consecutive trials, this was not the case for pairs separated by additional trials. Hence, the combination of a likely weak effect and limited variation in history precluded a thorough analysis of a 2-stimuli-back effect. Future work should delve into the time course of the viscoelastic effect in greater detail.

      Did the authors analyze or take into account the difference between receptive field locations? For example, did afferents more on the sides have lower responses and a lesser effect of history?

      An investigation into the potential impact of the relationship between the receptive field location on the fingertip skin and the primary contact site of the stimulus surface revealed no discernible influence for SA-1 and SA-2 neurons. In contrast, FA-1 neurons, particularly those predominantly sensitive to the previous stimulation or displaying mixed sensitivity, exhibited a tendency to terminate near the primary stimulation site. We have added these observations to the text:

      “We found no straightforward relationship between a neuron's sensitivity to current and previous stimulation and its termination site in fingertip skin. Specifically, there was no statistically significant effect of the distance between a neuron's receptive field center and the primary contact site of the stimulus surface on whether neurons signaled current, prior, or mixed information for SA-1 (Kruskal-Wallis test H(2)=3.86, p= 0.15) or SA-2 neurons (H(2)=0.75, p=0.69). However, a significant difference emerged for FA-1 neurons (H(2)=8.66, p=0.01), indicating that neurons terminating closer to the stimulation site on the flat part of the fingertip were more likely to signal past or mixed information.”
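
      The statistics quoted above can be checked for internal consistency with a minimal sketch, assuming the p-values come from the usual chi-squared approximation to the Kruskal-Wallis H statistic. With three groups (current, prior, mixed) there are 2 degrees of freedom, and for 2 degrees of freedom the chi-squared survival function reduces to exp(-H/2). The function name below is hypothetical and is not taken from the analysis itself.

```python
import math


def kruskal_p_value_df2(h_statistic: float) -> float:
    """Approximate p-value for a Kruskal-Wallis H statistic with 2 degrees of freedom.

    Assumes the chi-squared approximation; for df = 2 the chi-squared
    survival function is exactly exp(-x / 2).
    """
    if h_statistic < 0:
        raise ValueError("H statistic must be non-negative")
    return math.exp(-h_statistic / 2.0)


# H values as reported in the quoted text, keyed by neuron type.
reported_h = {"SA-1": 3.86, "SA-2": 0.75, "FA-1": 8.66}

for neuron_type, h in reported_h.items():
    print(f"{neuron_type}: H(2) = {h:.2f}, p = {kruskal_p_value_df2(h):.2f}")
```

      Under that assumption, the three H values give p-values that round to the reported 0.15, 0.69, and 0.01.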

      Was there anything different in the firing patterns between the spontaneous and non-spontaneously active SA-2s? For example, did the non-spontaneous show more dynamic responses?

      The firing patterns of both spontaneously and non-spontaneously active SA-2 neurons shared similarities in terms of adaptation and range of firing rate modulation in response to force stimuli, i.e., ‘dynamic response’. The distinction lay in the pattern of modulation of the firing rate associated with stimulus presentations. For spontaneously active SA-2 neurons, this modulation occurred around a significant background discharge, implying that a force stimulus could either decrease or increase the firing rate, depending on how it deformed the fingertip. This characteristic is well illustrated by the firing pattern of the neuron depicted in the lower panels of Figure 3D. Conversely, in non-spontaneously active SA-2 neurons, a force stimulus could only induce an increase in the firing rate or no change. Although the neuron depicted in the upper panels of Figure 3D exhibited some background activity, it serves to exemplify this characteristic. In the text, we have elucidated the dynamics of the SA-2 neuron response by highlighting that force stimulation can either decrease or increase the firing rate in neurons with spontaneous activity through the following addition/change:

      “This increased variability was most evident during the force protraction phase where most neurons exhibited the most intense responses. Increased variability was also observed in instances where the dynamic response to force stimulation involved a decrease in the firing rate (lower panels of Figure 3D). This phenomenon was observed in SA-2 neurons that maintained an ongoing discharge during intertrial periods (cf. Fig. 2A). In these cases, the response to a force stimulus constituted a modulation of the firing rate around the background discharge, signifying that a force stimulus could either decrease or increase the firing rate depending on the prevailing stimulus direction.”

      Were the spontaneously active SA-2 afferents firing all the time or did they have periods of rest - and did this relate to recent stimulation? Were the spontaneously active SA-2s located in a certain part of the finger (e.g. nail) or were they randomly spread throughout the fingertip? Any distribution differences could indicate a more complicated role in skin sensing.

      SA-2 neurons, in general, are well-known for undergoing significant post-stimulation depression (e.g., Knibestöl and Vallbo, 1970; Chambers et al., 1972; Burgess and Perl, 1973). In our force stimulations, this post-excitatory depression manifested as a reduced or absent response during the latter part of the stimulus retraction period for stimuli in directions that markedly excited the neuron. The excitability recovered when the fingertip relaxed during the subsequent intertrial period, and for "spontaneously active" neurons, the firing resumed (see examples in Figure 7A). Furthermore, some “spontaneously active” neurons could be silenced or exhibit a near-silent period during force stimulation for certain force directions, while the spontaneous firing returned during the upcoming intertrial period when the fingertip shape recovered (for example, see responses to stimulation in the proximal and especially ulnar directions in the top panel in Figure 7A).

      Regarding the location of the receptive field centres of spontaneously active and non-spontaneously active SA-2 neurons on the fingertip, we did not observe any obvious spatial segregation. To illustrate this, we have revised Figure 1A by color-marking SA-2 neurons that exhibited ongoing activity in intertrial periods, and the figure caption has been modified accordingly:

      “Figure 1. Experimental setup. A. Receptive field center locations shown on a standardized fingertip for all first-order tactile neurons included in the study, categorized by neuron type. Purple symbols denote spontaneously active SA-2 neurons exhibiting ongoing activity without external stimulation.”

      Did the authors look to see if the spontaneous firing in SA-2s between trials could predict the extent to which the type 1 afferents encode the proceeding stimulus? Basically, does the SA-2 state relate to how the type 1 units fire?

      We found no clear indications that the responses of FA-1 and SA-1 could be readily anticipated based on the firing patterns of SA-2 neurons.

      In the discussion, it is stated that "the viscoelastic memory of the preceding loading would have modulated the pattern of strain changes in the fingertip differently depending on where their receptor organs are situated in the fingertip". Can the authors expand on this or make any predictions about the size of the memory effect and the distance from the point of stimulation?

      We have explored this topic further in the text, referring to recent studies modeling essential aspects of fingertip mechanics. However, in our view, current models lack the capability to predict the specific nature sought by the reviewer. These models should include a detailed understanding of the intricate networks of collagen fibers anchoring the pulp tissue at the distal phalangeal bone and the nail. They should also consider potential inherent directional preferences of the receptor organs, attributed to their microanatomy. The text modifications are as follows:

      “In addition to the receptor organ locations, the variation in sensitivity among neurons to fingertip deformations in response to both previous and current loadings would stem from the fingertip’s geometry and its complex composite material properties. Possible inherent directional preferences of the receptor organs, attributed to their microanatomy, could also be significant. However, mechanical anisotropy, particularly within the viscoelastic subcutaneous tissue of the fingertip induced by intricately oriented collagen fiber strands forming fat columns in the pulp (Hauck et al., 2004), are likely to play a crucial role. This anisotropy would shape the dynamic pattern of strain changes at neurons' receptor sites, intricately influencing a neuron's sensitivity not only to current but also to preceding loadings. Indeed, recent modeling efforts suggest that such mechanical anisotropy strongly influences the spatiotemporal distribution of stresses and strains across the fingertip (Duprez et al., 2024).”

      Relatedly, we have included additional text to provide a more comprehensive explanation of the “bulk deformation” of the fingertip that occurs during the loadings:

      “As pressure increases in the pulp, the pulp tissue bulges at the end and sides of the fingertip. Simultaneously, the tangential force component amplifies the bulging in the direction of the force while stretching the skin on the opposite side.”

      In the discussion, it would be good if the authors could briefly comment more on the diversity of the mechanoreceptive afferent firing and why this may be useful to the system.

      The diversity in responses among neurons is instrumental in enhancing the information transmitted to the brain by averting redundancy in information acquisition. This diversity thereby contributes to an overall increase in information. We've included a brief statement, along with several references, underscoring this concept:

      "The resulting diversity in the sensitivities of neurons might enhance the overall information collected and relayed to the brain by the neuronal population, facilitating the discrimination between tactile stimuli or mechanical states of the fingertip (see Rongala et al., 2024; Corniani et al., 2022; Tummala et al., 2023, for more extensive explorations of this idea)."

      Also, the authors could briefly discuss why this memory (or recency) effect occurs - is it useful, does it serve a purpose, or it is just a by-product of our skin structure? There are examples of memory in the other senses where comparisons could be drawn. Is it like stimulus adaptation effects in the other senses (e.g. aftereffects of visual motion)?

      We have expanded the concluding paragraph of the discussion, specifically delving into the question of whether the mechanical memory effect serves a deliberate purpose or is simply an incidental byproduct of our skin structure:

      “In any case, the viscoelastic deformability of the fingertips plays a pivotal role in supporting the diverse functions of the fingers. For example, it allows for cushioned contact with objects featuring hard surfaces and allows the skin to conform to object shapes, enabling the extraction of tactile information about objects' 3D shapes and fine surface properties. Moreover, deformability is essential for the effective grasping and manipulation of objects. This is achieved, among other benefits, by expanding the contact surface, thereby reducing local pressure on the skin under stronger forces and enabling tactile signaling of friction conditions within the contact surface for control of grasp stability. Throughout, continuous acquisition of information about various aspects of the current state of the fingertip and its skin by tactile neurons is essential for the functional interaction between the brain and the fingers. In light of this, the viscoelastic memory effect on tactile signaling of fingertip forces can be perceived as a by-product of an overall optimization process within prevailing biological constraints.”

      One point that would be nice to add to the discussion is the implications of the work for skin sensing. What would you predict for the time constant of relaxation of fingertip skin, how long could these skin memory effects last? Two main points to address here may be how the hydration of the skin and anatomical skin changes related to aging affect the results. If the skin is less viscoelastic, what would be the implications for the firing of mechanoreceptors?

      It is likely that the time constant depends to some extent on mechanical factors of the skin, which will likely change due to age or environmental factors. However, while these questions are intriguing, they fall outside the scope of the current study and we are not aware of studies that have addressed these issues directly in experiments either.

      How long does it take for the effect to end? Again, this will likely depend on the skin's viscoelasticity. However, could the authors use it in a psychophysical paradigm to predict whether participants would be more or less sensitive to future stimuli? In this way, it would be possible to test whether the direction modifies touch perception.

      Time constants for tissue viscoelasticity have been estimated to extend up to several seconds (see citations in the introduction). While direct perceptual effects could indeed be explored through psychophysical experimental paradigms, we are currently unaware of any studies specifically addressing the type of effect described in this study. In addition to the statement that, concerning manipulation and haptic tasks, "to our knowledge, a possible influence of fingertip viscoelasticity on task performance has not been systematically investigated," we have now also addressed tactile psychophysical tasks conducted during passive touch with the following sentence in the text:

      “Similarly, there is a lack of systematic investigation of potential effects of fingertip viscoelasticity on performance in tactile psychophysical tasks conducted during passive touch.”

      Reviewer #2 (Public Review):

      Summary:

      The authors sought to identify the impact skin viscoelasticity has on neural signalling of contact forces that are representative of those experienced during normal tactile behaviour. The evidence presented in the analyses indicates there is a clear effect of viscoelasticity on the imposed skin movements from a force-controlled stimulus. Both skin mechanics and evoked afferent firing were affected based on prior stimulation, which has not previously been thoroughly explored. This study outlines that viscoelastic effects have an important impact on encoding in the tactile system, which should be considered in the design and interpretation of future studies. Viscoelasticity was shown to affect the mechanical skin deflections and stresses/strains imposed by previous and current interaction force, and also the resultant neuronal signalling. The result of this was an impaired coding of contact forces based on previous stimulation. The authors may be able to strengthen their findings, by using the existing data to further explore the link between skin mechanics and neural signalling, giving a clearer picture than demonstrating shared variability. This is not a critical addition, but I believe would strengthen the work and make it more generally applicable.

      Strengths:

      - Elegant design of the study. Direct measurements have been made from the tactile sensory neurons to give detailed information on touch encoding. Experiments have been well designed and the forces/displacements have been thoroughly controlled and measured to give accurate measurements of global skin mechanics during a set of controlled mechanical stimuli.

      - Analytical techniques used. Analysis of fundamental information coding and information representation in the sensory afferents reveals dynamic coding properties to develop putative models of the neural representation of force. This advanced analysis method has been applied to a large dataset to study neural encoding of force, the temporal dynamics of this, and the variability in this.

      Weaknesses:

      - Lack of exploration of the variation in neural responses. Although there is a viscoelastic effect that produces variability in the stimulus effects based on prior stimulation, it is a shame that the variability in neural firing and force-induced skin displacements have been presented, and are similarly variable, but there has been no investigation of a link between the two. I believe with these data the authors can go beyond demonstrating shared variability. The force per se is clearly not faithfully represented in the neural signal, being masked by stimulation history, and it is of interest if the underlying resultant contact mechanics are.

      Thank you for this suggestion. We have added a new section investigating the link between skin deformation and neural firing in more depth via a simple neural model. Please see our answer below in the ‘Recommendations’ section for further details.

      Validity of conclusions:

      The authors have succeeded in demonstrating skin viscoelasticity has an impact on skin contact mechanics with a given force and that this impacts the resultant neural coding of force. Their study has been well-designed and the results support their conclusions. The importance and scope of the work is adequately outlined for readers to interpret the results and significance.

      Impact:

      This study will have important implications for future studies performing tactile stimulation and evaluating tactile feedback during motor control tasks. In detailed studies of tactile function, it illustrates the necessity to measure skin contact dynamics to properly understand the effects of a force stimulus on the skin and mechanoreceptors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (Very) minor comments

      - The authors say at the beginning of the Results that, "The fourth type of tactile neurons in the human glabrous skin, fast adapting type II neurons...". Although generally written that there are four types of afferent in the glabrous skin, it would be better to state that these are low-threshold A-beta myelinated mechanoreceptive afferents, at least one time, as there are other types of afferent in the glabrous skin that respond to mechanical stimulation (e.g. low and high threshold C-fibers).

      This is now clarified at the start of the Results section:

      “We recorded action potentials in the median nerve of individual low-threshold A-beta myelinated first-order human tactile neurons innervating the glabrous skin of the fingertip…”

      - Fig. 3: Could you add '(N)' as the unit of force for Fig. 3A for Fx, Fy, and Fz? Also, please change 'Data was recorded' to 'Data were recorded' in the legend.

      Fixed.

      - At the beginning of the Methods, you say that your study conforms to the Declaration of Helsinki, which actually requires pre-registration in a database. If you did not pre-register your study, please can you add '... in accordance with the Declaration of Helsinki, apart from pre-registration in a database'.

      Thanks for making us aware of this. We have added the suggested qualifier to the ethics statement.

      Reviewer #2 (Recommendations For The Authors):

      The neural representation/encoding of the actual displacement vectors would be a useful addition to the analyses. These vectors have been demonstrated to systematically change with the condition in the irregular series (Figure 2E) and will thus significantly act on the dynamics of induced mechanical changes in the skin with a given interaction force. Thus, it could be examined how the neurons code the magnitude of displacements as well as their direction. An evaluation of the extent to which the imposed displacement magnitudes are encoded in the neural responses would be a useful addition in explaining the signalling of the force events and how the central nervous system decodes these. Evaluating an alternative displacement encoding for comparison to pure force encoding may reveal more about how contact events are represented in the tactile system, which must decode these variable afferent signals to reconstruct a percept of the interaction. It could then be explored how the central nervous system may then scale the dynamic afferent responses based on the background viscoelastic state likely to be present in the SA-II afferent signals (Figure 7) for a context in which to evaluate the dynamic contact forces. This may of course be a complex relationship for the type-I afferents, where the underlying mechanical events evoking the firing (microslips not represented in global forces) have not been measured here. Such a model could be more widely applicable, as the skin viscoelasticity and displacement magnitudes are a straightforward measurement metric and could perhaps be used as a better proxy for neural signalling. This would allow the investigation of a wider variety of forces, and the study of the timing of the viscoelastic effect, both of which have been fixed here. 
This would give the work a broader impact, rather than just highlighting that this effect produces variability, it could reveal if this mechanical feature is structured in the neural representation. The categorical encoding/decoding tested here is specific to the stimuli used (magnitudes, intervals), but there is the possibility that this may be more generally applicable (within the bounds of forces/speeds) if the underlying basis of the variability in the signalling produced by the viscoelasticity is identified. Since the time course of the viscoelasticity has not been measured here (fixed forces and intervals), further study is required to fully understand the implications this has for a wider variety of situations.

      We agree that a better understanding of how the mechanical deformations are reflected in the resulting spike trains would be valuable. While ultimately a full understanding will need precise measurements of skin deformation across the whole fingertip to account for mechanical propagation to mechanoreceptor locations, relating the deformations at the contact location with neural firing patterns directly can provide useful hints into which aspects of deformation are encoded and how. To this end, we ran a new analysis that aimed to predict the time-varying neural responses directly from the recorded mechanical movements of the contactor.

      Below we have reproduced the new results and methods text along with the additional figures for this analysis. Note that we have also added text in the Discussion to interpret these findings in the context of our other results.

      New section in Results titled Predicting neural responses from contactor movements: “The similarity in the history-dependent variation in neural firing and fingertip deformation at a given force stimulus suggests that neuronal firing is determined by how the fingertip deforms rather than the applied force itself. However, this similarity does not clarify the relationship between fingertip deformation dynamics and neural signaling. To investigate further, we fit cross-validated multiple linear regression models to evaluate how well distinct aspects of contactor movement could predict the time-varying firing rates of individual neurons during the protraction phases of the irregular sequence. The models used predictors based on (1) the three-dimensional position of the contactor, (2) its three-dimensional velocity, (3) a combination of position and velocity signals, and, finally, (4) position and velocity signals along with all possible two-way interactions between them, capturing potentially complex relationships between fingertip deformations and neural signaling.

      Comparing the variance explained (R<sup>2</sup>) by each regression model for each neuron type revealed clear differences between the models (Figure 5A). A two-way mixed design ANOVA, with regression model as within-group effects and neuron type as a between-group effect revealed a main effect of model on variance explained (F(3,462) = 815.5, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.84). Model prediction accuracy overall increased with the number of predictors, with the two-way interaction model outperforming all others (p < 0.001 for all comparisons, Tukey’s HSD). Additionally, a significant main effect of neuron type (F(2,154) = 29.8, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.28) and a significant interaction between regression model and neuron type were observed (F(6,462) = 50.8, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.40).

      For neuron type, model predictions were most accurate for SA-2 neurons, followed by SA-1 neurons, with FA-1 neurons showing the lowest accuracy (p < 0.003 for all comparisons, Tukey’s HSD). The interaction between model and neuron type revealed distinct patterns. For SA-1 and SA-2 neurons, position-only and velocity-only models had similar prediction accuracy (p ≥ 0.996, Tukey’s HSD) with no significant differences between these neuron types (p ≥ 0.552, Tukey’s HSD). FA-1 neurons performed poorly with the position-only model but showed higher accuracy with the velocity-only model (p < 0.001, Tukey’s HSD) and better than SA-1 neurons (p = 0.006, Tukey’s HSD). Models combining position and velocity predictors (without interactions) surpassed both position-only and velocity-only models for SA-1 and SA-2 neurons (p < 0.001, Tukey’s HSD). Overall, the differences between neuron types broadly match their tuning to static and dynamic stimulus properties.

      The two-way interaction model, accounting for most variance in neural responses, produced mean R<sup>2</sup> values of 0.75 for FA-1, 0.88 for SA-1, and 0.91 for SA-2 neurons (Figure 5A). To evaluate the contribution of the different predictors, we ranked them using the permutation feature importance method, focusing on the six most important ones. Regression analyses using only these variables explained almost all of the variance explained by the full model, with a median R<sup>2</sup> reduction of just 0.055 across all neurons. Across all neuron types, at least half included all three velocity components (dPx, dPy, dPz) among the top six, with FA-1 neurons showing the highest prevalence (Figure 5B). Interactions between normal position (Pz) and each velocity component were also frequently observed, while interactions involving tangential position and velocity components were less common. Interactions among velocity components were relatively well represented, followed by interactions limited to position components. Position signals were generally less represented, except for normal position (Pz) in slowly adapting neurons, where it appeared in 50% of SA-1 and 68% of SA-2 neurons. Despite these broad trends, important predictors varied widely across ranks even within a given neuron class (see Figure 5-figure supplement 1), and even the most frequent variables appeared in only a subset of cases, suggesting broad variability in sensitivity across neurons.”
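
      The degrees of freedom and effect sizes quoted in the new Results text can be cross-checked against the neuron counts given in the new Methods paragraph (68 SA-1, 38 SA-2, and 51 FA-1 neurons). The sketch below is an illustrative check only, not part of the authors' analysis. It assumes the standard uncorrected degrees of freedom for a two-way mixed-design ANOVA and the usual conversion from an F statistic to partial eta squared; the function names are hypothetical.

```python
def mixed_anova_dfs(n_subjects, n_groups, n_levels):
    """Uncorrected degrees of freedom for a two-way mixed-design ANOVA.

    n_subjects: total number of units (here, neurons)
    n_groups:   levels of the between-group factor (neuron type)
    n_levels:   levels of the within-group factor (regression model)
    """
    df_between = n_groups - 1
    df_between_error = n_subjects - n_groups
    df_within = n_levels - 1
    return {
        "within": (df_within, df_within * df_between_error),
        "between": (df_between, df_between_error),
        "interaction": (df_within * df_between, df_within * df_between_error),
    }


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return f_value * df_effect / (f_value * df_effect + df_error)


n_neurons = 68 + 38 + 51  # SA-1 + SA-2 + FA-1 neurons included in the regressions
dfs = mixed_anova_dfs(n_neurons, n_groups=3, n_levels=4)

# F values as reported in the text, paired with the dfs derived above.
reported_f = {"within": 815.5, "between": 29.8, "interaction": 50.8}
effect_sizes = {
    name: round(partial_eta_squared(f, *dfs[name]), 2)
    for name, f in reported_f.items()
}
```

      Under these assumptions, the derived degrees of freedom and effect sizes agree with the reported F(3,462), F(2,154), and F(6,462) and with the reported partial eta squared values.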

      New methods paragraph titled Predicting time-varying firing rates from skin deformations:

      “This analysis was conducted in Python (v3.13) with pandas for data handling, numpy for numerical operations, and scikit-learn for model fitting and evaluation.

      To assess how well individual neurons' time-varying firing rates could be predicted from simultaneous contactor movements, we fitted multiple linear regression models (see Khamis et al., 2015, for a similar approach). This analysis focused on the force protraction phase of the irregular sequence, where neurons were most responsive and sensitive to stimulation history. Data from 100 ms before to 100 ms after the protraction phase (between -0.100 s and 0.225 s relative to protraction onset) were included for each trial. Neurons were included if they fired at least two action potentials during the force protraction phase and the following 100 ms in at least five of the 25 trials. This ensured sufficient variability in firing rates for meaningful regression analysis, resulting in 68 SA-1, 38 SA-2, and 51 FA-1 neurons being included.

      Contactor position signals digitized at 400 Hz were linearly interpolated to 1000 Hz. Instantaneous firing rates, derived from action potentials sampled at 12.8 kHz, were resampled at 1000 Hz to align with position signals. A Gaussian filter (σ = 10 ms, cutoff ~16 Hz) was applied to the firing rate as well as to the position signals before differentiation. To account for axonal conduction (8–15 ms) and sensory transduction delays (1–5 ms), firing rates were advanced by 15 ms to align approximately with independent variables.

      Regressions were performed using scikit-learn's Ridge and RidgeCV regressors, which apply L2 regularization to mitigate overfitting. Hyperparameter tuning for the regularization parameter (alpha) was performed using GridSearchCV with a predefined range (0.001–1000.0), incorporating five-fold cross-validation to select the best value. To minimize overfitting risks, model performance was further validated with independent five-fold cross-validation (KFold), and R<sup>2</sup> scores were computed using cross_val_score.

      We constructed four linear regression models with increasing complexity: (1) Position-only, using three-dimensional contactor positions (Px, Py, Pz); (2) Velocity-only, using three-dimensional velocities (dPx, dPy, dPz); (3) Combined, including all position and velocity signals (6 predictors); and (4) Interaction, including all signals and their two-way interactions (21 predictors). All features were standardized using StandardScaler to improve regularization and model convergence. PolynomialFeatures generated second-order interaction terms for the interaction model. Feature importance was evaluated with permutation_importance, and simpler models were built using the most important features. These models were validated through cross-validation to assess retained explanatory power.”
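
      The predictor counts given for the four models follow directly from the six contactor signals. The sketch below is a standard-library illustration, not the authors' code, and the function name is hypothetical. It assumes that "two-way interactions" means products of two distinct signals, with no squared terms and no constant column. To our understanding this is the term set produced by scikit-learn's PolynomialFeatures with degree=2, interaction_only=True, and include_bias=False, although the response does not state which settings were used.

```python
from itertools import combinations

POSITION = ["Px", "Py", "Pz"]     # three-dimensional contactor position
VELOCITY = ["dPx", "dPy", "dPz"]  # three-dimensional contactor velocity


def predictors(model):
    """Return the predictor names for one of the four regression models."""
    if model == "position":
        return list(POSITION)
    if model == "velocity":
        return list(VELOCITY)
    base = POSITION + VELOCITY
    if model == "combined":
        return base
    if model == "interaction":
        # All products of two distinct signals: C(6, 2) = 15 extra terms.
        return base + [f"{a}*{b}" for a, b in combinations(base, 2)]
    raise ValueError(f"unknown model: {model}")


counts = {
    name: len(predictors(name))
    for name in ("position", "velocity", "combined", "interaction")
}
```

      Under this assumption the counts are 3, 3, 6, and 21, matching the 6 and 21 predictors reported for the combined and interaction models. Including squared terms would instead give 27 predictors.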

      Minor:

      - It would be useful to add a brief description of the material aspects of the contactor tip to the methods (as per Birznieks 2001).

      We have added the following statement:

      “To ensure that friction between the contactor and the skin was sufficiently high to prevent slips, the surface was coated with silicon carbide grains (50–100 μm), approximating the finish of smooth sandpaper.”

      - The axes labelling on Figure 3A and legend description is ambiguous, probably placing the Px, Py, and Pz labels on the far left axes and the Fx, Fy, and Fz on the right side of the far right axes would make this clearer.

      Label placement has been improved along with some other minor fixes.

      - For the quasi-static phase analysis, the phrase "absence of loading" used in reference to the interstimulus period and SA-II afferents does not seem to be a correct description. The finger is still loaded (at least in the normal direction), with a magnitude of imposed displacement that counteracts the viscoelastic force exerted by the skin mechanics of the fingertip. Although there is a zero net-force load, a mechanical stimulus is still being actively applied to the skin.

      We have changed the wording throughout the text and now consistently refer either to the “interstimulus period” directly or to an “absence of externally applied stimulation” to avoid confusion.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public review):

      Summary:

      The revised paper by Kim et al. reports that two disease mutations in proBMP4, S91C and E93G, disrupt the FAM20C phosphorylation site at Ser91, blocking the activation of proBMP4 homodimers, while still allowing BMP4/7 heterodimers to function. Analysis of DMZ explants from Xenopus embryos expressing the proBMP4 S91C or E93G mutants showed reduced expression of pSmad1 and tbxt1. The expert amphibian tissue transplant studies were expanded to in vivo studies in Bmp4S91C/+ and Bmp4E93G/+ mice, highlighting the impact of these mutations on embryonic development, particularly in female mice, consistent with patient studies. Additionally, studies in mouse embryonic fibroblasts (MEFs) demonstrated that the mutations did not affect proBMP4 glycosylation or ER-to-Golgi transport but appeared to inhibit the furin-dependent cleavage of proBMP4 to BMP4. Based on these findings and AI modeling using AlphaFold of proBMP4, the authors speculate that pSer91 influences access of furin to its cleavage site at Arg289AlaLysArg292 in a new "Ideas and Speculation" section. Overall, the authors addressed the reviewers' comments, improving the presentation.

      Strengths:

      The strengths of this work continue to lie in the elegant Xenopus and mouse studies that elucidate the impact of the S91C and E93G disease mutations on BMP signaling and embryonic development. Including an "Ideas and Speculation" subsection for mechanistic ideas reduces some shortcomings regarding the analysis of the underlying mechanisms.

      Weaknesses:

      (1)  (Minor) In Figure S1 and lines 165-174 and 179-180, the authors should consider that, unlike the wild-type protein (Ser), which can be reversibly phosphorylated or dephosphorylated, phosphomimic mutations are locked into mimicking either the phosphorylated state (Asp) or the non-phosphorylated state (Ala). Consequently, if the S91D mutant exhibits lower activity than WT, it could imply that S91D interferes with other regulatory constraints, as the authors suggest. However, it may also be inhibiting activation. Therefore, caution is warranted when comparing S91D with S91C to conclude that Ser91 phosphorylation increases BMP4 activity. While additional experiments are not necessary, further consideration is essential.

      (Minor) In lines 394-399, the authors cleverly speculate that pS91 interacts with Arg289-the essential P4 arginine for furin processing. If so, this interaction could hinder the cleavage of proBMP4, as indicated by the results in Figure S1. The discussion would benefit from considering that, contrary to their favored model, dephosphorylation at Ser91 might actually facilitate cleavage.

      We have added a paragraph raising this possibility but explaining why it is unlikely and inconsistent with our in vivo data. The S91D construct was a simple control that was tested in ectopic expression assays and not in vivo.  We can make no conclusions about whether this construct resembles the phosphorylated state or whether it hinders or facilitates cleavage in vivo. The conclusion that dephosphorylation promotes BMP4 cleavage or activity is not compatible with the finding that two mutations associated with birth defects in humans (p.S91C or p.E93G) that are predicted to prevent FAM20C-mediated phosphorylation of the BMP4 prodomain lead to impaired proteolytic maturation of endogenous BMP4 and reduced BMP activity in vivo. 

      (2)  In Figure 4, panels A, E, and I, the proBMP bands in the mouse embryonic lysates and MEFs expressing the mutations show a clear size shift. Are these shifts a cause or a consequence of the lack of cleavage? Regardless, the size shifts should be explicitly noted.

      These intriguing shifts were observed in some but not all biological replicates.  When present, the shifts were not reversed by treatment with phosphatases or deglycosylases, and the shifts were never observed in epitope tagged wild type controls.  We have added a paragraph noting the shifts and our tests of whether they might be due to glycosylation, phosphorylation or epitope tags. 

      (3)  (Minor) In line 314, the authors should consider modifying the wording to: "is required for modulating proprotein convertase..."

      The original wording (“Collectively, our findings are consistent with a model in which FAM20C-mediated phosphorylation of the BMP4 prodomain is not required for folding or exit of the precursor protein from the ER, but is required for proprotein convertase recognition and/or for trafficking to post-TGN compartment(s) where BMP4 is cleaved”) more accurately reflects the model that is supported by our findings. Stating that “phosphorylation ……is required to modulate proprotein convertase recognition and/or trafficking” is vague and leaves open the possibility that it modulates in either direction, which our data do not support as described in point 1 above.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public Review): 

      This study investigates the role of microtubules in regulating insulin secretion from pancreatic islet beta cells. This is of great importance considering that controlled secretion of insulin is essential to prevent diabetes. Previously, it has been shown that KIF5B plays an essential role in insulin secretion by transporting insulin granules to the plasma membrane. High glucose activates KIF5B to increase insulin secretion resulting in the cellular uptake of glucose. In order to prevent hypoglycemia, insulin secretion needs to be tightly controlled. Notably, it is known that KIF5B plays a role in microtubule sliding. This is important, as the authors described previously that beta cells establish a peripheral sub-membrane microtubule array, which is critical for the withdrawal of excessive insulin granules from the secretion sites. At high glucose, the sub-membrane microtubule array is destabilized to allow for robust insulin secretion. Here the authors aim to answer the question of how the peripheral array is formed. Based on the previously published data the authors hypothesize that KIF5B organizes the sub-membrane microtubule array via microtubule sliding. 

      General comment: 

      This manuscript provides data that indicate that KIF5B, like in many other cells, mediates microtubule sliding in beta cells. This study is limited to in vitro assays and one cell line. Furthermore, the authors provide no link to insulin secretion and glucose uptake and the overall effects described are moderate. Finally, the overall effect of microtubule sliding upon glucose stimulation is surprisingly low considering the tight regulation of insulin secretion. Moreover, the authors state "the amount of MT polymer on every glucose stimulation changes only slightly, often undetectable…. In fact, we observe a prominent effect of peripheral MT loss only after a long-term kinesin depletion (three-four days)". This challenges the view that a KIF5B-dependent mechanism regulating microtubule sliding plays a major role in controlling insulin secretion. 

      (1) Our initial study was indeed done in a cell line, which is a normal approach to addressing molecular mechanisms of a phenomenon in a challenging cell model: primary pancreatic beta cells are prone to rapidly dedifferentiate outside of the organism and are hard to genetically modify. To address this reviewer’s comment, in the revised manuscript we now confirm the phenotype in beta cells within intact pancreatic islets from a KIF5B KO mouse model (New Figure 2 – Supplemental Figure 1).

      (2) We agree that testing the effect of microtubule sliding on insulin secretion is an important question. Unfortunately, the experimental design needed to accomplish this task is not straightforward. Importantly, besides microtubule sliding, KIF5B is heavily engaged in insulin granule transport, and GSIS deficiency upon KIF5B inactivation is well documented (e.g. Varadi et al 2002). In this study, we chose not to repeat this GSIS assay because of ample existing data. However, this reported GSIS deficiency could result from a combination of a lack of insulin granule delivery to the periphery (previous data) and the depletion of insulin granules from the periphery due to the loss of the submembrane MT bundle (this study and Bracey et al 2020). In order to exclusively test the role of MT sliding in secretion, a significant investment in mutant tool development would be needed. Ideally, a new mutant mouse model where insulin granule transport is allowed but MT sliding is blocked must be developed to specifically address this question. To conclude, answering this question will be the subject of another, follow-up study. 

      (3) We respectfully disagree with the reviewer’s opinion that the effect of MT sliding in beta cells is moderate. As MT networks go, even a slight change in MT configuration often has dramatic consequences. For example, in mitotic spindles, a tiny overgrowth of microtubule ends during metaphase, which causes them to attach to both kinetochores rather than just one, is very significant for the efficiency of chromosome segregation, causing aneuploidy and cancer. The changes in beta-cell MT networks that we are reporting are much stronger: the effect on the peripheral MT network accumulated over three days of KIF5B depletion is dramatic (Fig 2 B, C). Short-term gross MT network configurations after a single glucose stimulation are harder to detect, but MTs at the cell periphery are, in fact, destabilized and fragmented, as we and others have previously reported (Ho et al 2020, Mueller et al 2021). Preventing this MT rearrangement completely blocks GSIS (Zhu et al 2015, Ho et al 2020). 

      One of the most fascinating features of insulin secretion regulation is that the amount of generated insulin granules significantly exceeds the normal physiological needs for insulin secretion (~100 times more than needed). At the same time, even slightly facilitated glucose depletion can be devastating. Accordingly, the excessive insulin content of a beta cell resulted in the development of multiple levels of control, preventing excessive secretion. Our previous data suggest that the peripheral MT array provides one of those mechanisms. This study indicates that microtubule sliding is necessary to form the proper peripheral network in the long term. Short-term glucose-induced changes in the peripheral MT array likely need to be subtle to prevent over-secretion. Thus, we are not surprised that a dramatic effect of sliding inhibition is only detectable by our approaches after the changes in the MT network accumulate over time. In the revised paper, we now discuss the potential impact of peripheral MT sliding on positive and negative regulation of secretion and add a schematic model illustrating these processes.

      Specific comments: 

      (1) Notably, the authors have previously reported that high glucose-induced remodeling of microtubule networks facilitates robust glucose-stimulated insulin secretion. This remodeling involves the disassembly of old microtubules and the nucleation of new microtubules. Using real-time imaging of photoconverted microtubules, they report that high levels of glucose induce rapid microtubule disassembly preferentially in the periphery of individual β-cells, and this process is mediated by the phosphorylation of microtubule-associated protein tau. Here, they state that the sub-membrane microtubule array is destabilized via microtubule sliding. What is the relevance of the different processes? 

      In this comment, the summary of our previous conclusions is correct, but the conclusion of this current study is re-stated incorrectly. Indeed, we have previously shown that in high glucose, MTs are destabilized at the cell periphery and nucleated in the cell interior. However, this current paper does not state that “the sub-membrane microtubule array is destabilized via microtubule sliding”. To answer this reviewer’s question, our data support a model where, during glucose stimulation, MT sliding within the peripheral bundle might move fragments of MTs severed by other mechanisms. Importantly, we propose that MT sliding restores the partially destabilized peripheral bundle by delivery of MTs that are nucleated at the cell interior and incorporating them into that bundle. In our overall model, three processes (destabilization, nucleation, and sliding to restore the bundle) are coordinated to maintain beta cell fitness on each GSIS cycle.

      (2) On one hand the authors describe how KIF5B depletion prevents sliding and the transport of microtubules to the plasma membrane to form the sub-membrane microtubule array. This indicates KIF5B is required to form this structure. On the other hand, they describe that at high glucose concentration, KIF5B promotes microtubule sliding to destabilize the sub-membrane microtubule array to allow robust insulin secretion. This appears contradictory. 

      We never intended to give the impression that MT sliding destabilizes the sub-membrane bundle, and we apologize if our wording caused this misunderstanding of our model. We propose that while the bundle is destabilized downstream of glucose signaling (e.g. due to tau phosphorylation, please see Ho et al Diabetes 2020), MT sliding remodels the bundle and thereafter rebuilds it to prevent over-secretion. In the revised manuscript, we have double-checked the whole text to make sure that such misunderstanding is avoided. 

      (3) Previously, it has been shown that KIF5B induces tubulin incorporation along the microtubule shaft in a concentration-dependent manner. Moreover, running KIF5B increases microtubule rescue frequency and unlimited growth of microtubules. Notably, KIF5B regulates microtubule network mass and organization in cells (PMID: 34883065). Consequently, it appears possible that the here observed phenomena of changes in the microtubule network might be due to alterations in these processes. 

      We thank the reviewer for proposing this alternative explanation for the observed change in microtubule networks after KIF5B depletion. We have now directly tested this possibility. Namely, we have re-expressed the kinesin-1 motor domain in MIN6 cells depleted of KIF5B. This motor domain construct by itself is not capable of driving microtubule sliding because it lacks the tail domain. At the same time, it is known to move very efficiently along microtubules and should provide the effects reported in the article cited by the reviewer. We found that the re-expression of the kinesin motor domain does not rescue microtubule network defects in beta cells (see new Figure 2 – Supplemental Figure 2). Thus, we conclude that the effects of kinesin depletion on the microtubule network in beta cells are due to the lack of microtubule sliding, as reported here.

      (4) The authors provide data that indicate that microtubule sliding is enhanced upon glucose stimulation. They conclude that these data indicate that microtubule sliding is an integral part of glucose-triggered microtubule remodeling. Yet, the authors fail to provide any evidence that this process plays a role in insulin secretion or glucose uptake. 

      We would like to point out that we do not “fail” but rather choose not to overload our study by repeating insulin secretion assays in KIF5B-inactivated cells because this would not have been very informative. It has been found previously that kinesin-1 inactivation or knockout significantly attenuates insulin secretion because kinesin-1 is actively transporting insulin granules and kinesin-1 activity is enhanced under high glucose conditions (e.g. Varadi et al 2002, Cui et al., 2011, Donelan et al, 2002). That said, our current finding is very much in line with these previous data. When kinesin is depleted, two things would be happening at the same time: in the absence of sub-membrane microtubule bundle pre-existing insulin granules would be over-secreted, and new insulin would not be delivered to the periphery, both decreasing GSIS. Unfortunately, we do not have tools yet that would allow us to dissect which part of the insulin secretion defect is due to prior over-secretion (the consequence of deficient MT sliding) and which part is due to the lack of new granule delivery. We plan to develop such tools in the future and elaborate on them in a follow-up study. Here, our goal is to understand microtubule organization principles in beta cells, and we choose not to extend the scope of the current study to metabolic assays.  

      (5) The authors speculate that the sub-membrane microtubule array prevents the over-secretion of insulin. Would one not expect in this case a change in the distribution of insulin granules at the plasma membrane when this array is affected? Or after glucose stimulation? Notably, it has been reported that "the defects of β-cell function in KIF5B mutant mice were not coupled with observable changes in islet morphology, islet cell composition, or β-cell size" and "the subcellular localization of insulin vesicles was found to not be affected significantly by the decreased Kif5b level. The cytoplasm of both wild-type and mutant β-cells was filled with insulin vesicles. Insulin vesicle numbers per square μm were determined by counting all insulin vesicles in randomly photographed β-cells. More insulin granules were found in Kif5b knockout β-cells compared with control cells. This phenomenon is consistent with the observation that insulin secretion by β-cells is affected" whereby "Insulin vesicles (arrowheads) were distributed evenly in both mutant and control cells" (PMID: 20870970).  

      Quantitative analyses in the study cited by the reviewer do not include assays that would be relevant to our study. Particularly, in that study neither the amount of insulin granules at the cell periphery nor the ratio between the number of granules at the periphery and the beta cell interior has been analyzed. In addition, in our preliminary observations not shown here, insulin content in beta cells in KIF5B KO mice is highly heterogeneous, with a subpopulation of cells severely depleted of insulin. This opens a new avenue of investigation into beta cell heterogeneity, which is out of the scope of this current study. Thus, we chose to restrict this current study to microtubule organization data.   

      (6) Does the sub-membrane microtubule array exist in primary beta cells (in vitro and/or in vivo) and how it is affected in KIF5B knockout mice?  

      Yes, it does exist. In fact, we first reported it in mouse islets (Bracey et al., 2020; Ho et al., 2020). We now report that the sub-membrane bundle is defective and microtubules are misaligned in KIF5B KO mice (new Figure 2 – Supplemental Figure 1).

      Reviewer #2 (Public Review): 

      In this article, Bracey et al. provide insights into the factors contributing to the distinct arrangement observed in sub-membrane microtubules (MTs) within mouse β-cells of the pancreas. Specifically, they propose that in clonal mouse pancreatic β-cells (MIN6), the motor protein KIF5B plays a role in sliding existing MTs towards the cell periphery and aligning them with each other along the plasma membrane. Furthermore, similar to other physiological features of β-cells, this process of MTs sliding is enhanced by a high glucose stimulus. Because a precise alignment of MTs beneath the cell membrane in β-cells is crucial for the regulated secretion of pancreatic enzymes and hormones, KIF5B assumes a significant role in pancreatic activity, both in healthy conditions and during diseases. 

      The authors provide evidence in support of their model by demonstrating that the levels of KIF5B mRNA in MIN6 cells are higher compared to other known KIFs. They further show that when KIF5B is genetically silenced using two different shRNAs, the MT sliding becomes less efficient. Additionally, silencing of KIF5A in the same cells leads to a general reorganization of MTs throughout the cell. Specifically, while control cells exhibit a convoluted and non-radial arrangement of MTs near the cell membrane, KIF5B-depleted cells display a sparse and less dense sub-membrane array of MTs. Based on these findings, the Authors conclude that the loss of KIF5B strongly affects the localization of MTs to the periphery of the cell. Using a dominant-negative approach, the authors also demonstrate that KIF5B facilitates the sliding of MTs by binding to cargo MTs through the kinesin-1 tail binding domain. Additionally, they present evidence suggesting that KIF5B-mediated MT sliding is dependent on glucose, similar to the activity levels of kinesin-1, which increase in the presence of glucose. Notably, when the glucose concentrations in the culturing media of MIN6 cells are reduced from 20 mM to 5 mM, a significant decrease in MT sliding is observed. 

      Strengths:

      This study unveils a previously unexplained mechanism that regulates the specific rearrangement of MTs beneath the cell membrane in pancreatic β-cells. The findings of this research have implications and are of significant interest because the precise regulation of the MT array at the secretion zone plays a critical role in controlling pancreatic function in both healthy and diseased states. In general, the author's conclusions are substantiated by the provided data, and the study demonstrates the utilization of state-of-the-art methodologies including quantification techniques, and elegant dominant-negative experiments. 

      Weaknesses:

      A few relatively minor issues are present and related to data interpretation and the conclusions drawn in the study. Namely, some inconsistencies between what appears to be the overall and sub-membrane MT array in scramble vs. KIF5B-depleted cells, the lack of details about the sub-cellular localization of KIF5B in these cells and the physiological significance of the effect of glucose levels in beta-cells of the pancreas. 

      We thank the reviewer for this insightful review. In the revised version, we have provided re-worded and extended interpretations and conclusions to prevent any issues or misunderstandings. We trust that while some of the noted apparent inconsistencies may reflect the intrinsic heterogeneity of the beta cell population, all data presented here indicate the same trend in phenotypes. In the revised version, we have provided additional cell views and, in places, alternative representative images and videos, to clear up any apparent inconsistencies. We would also like to point out that we did in fact report KIF5B localization: not surprisingly, KIF5B predominantly localizes to insulin granules, and the punctate staining fills the whole cytoplasm (Figure 2A, bottom panel). However, as explained in detail in our response to Reviewer 1, we choose to leave an extensive study of the physiological and metabolic consequences of the reported microtubule network dynamics for a follow-up study.

      Reviewer #3 (Public Review): 

      Prior work from the Kaverina lab and others had determined that beta-cells build a microtubule network that differs from the canonical radial organization typical in most mammalian cell types and that this organization facilitates the regulated secretion of insulin-containing secretory granules (IGs). In this manuscript, the authors tested the hypothesis that kinesin-driven microtubule sliding is an underlying mechanism that establishes a sub-membranous microtubule array that regulates IG secretion. They employed knock-down and dominant-negative strategies to convincingly show microtubule sliding does, in fact, drive the assembly of the sub-membranous microtubule band. They also used live cell imaging assays to demonstrate that kinesin-mediated microtubule sliding in beta-cells is triggered by extracellular high glucose. Overall, this is an interesting and important study that relates microtubule dynamics to an important physiological process. The experiments were rigorous and well-controlled. 

      We truly appreciate this reviewer’s opinion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Figures: 

      (1) Figure 1: 

      a) Why can one not see here, and in most following images, the peripheral sub-membrane microtubule array? One can also not see an accumulation of microtubules in the cell interior. 

      The microtubule pattern in beta cells is variable, and the sub-membrane array is seen across the whole population to a variable extent (see the directionality histogram in Figure 2E for statistics). In fact, an array of peripheral MTs parallel to the cell border is present in the example shown in Figure 1 and in all following control images. To make this clearer, we now show the pre-bleach images in Figure 1 D-F at a lower magnification, so that the differences in MT density at the cell periphery and cell center are more clearly seen: MTs are lacking at the periphery in KIF5B-depleted cells but not in control cells.

      b) 5 min appears to be a long time and enough time to polymerize a significant number of new microtubules. 

      We interpret this comment as the reviewer’s concern that in FRAP assays, fluorescently labeled MTs moving into the bleached area might be newly polymerizing MTs rather than pre-existing MTs relocated into that area. However, this is not the case because newly polymerized MTs contain predominantly quenched “dark” tubulin molecules and only a small percentage of fluorescent tubulin. These dim MTs are not included in the MT sliding assay analysis, where a threshold for bright MTs is applied. We have now added more details on the quantification of these data to the Materials and Methods section.

      c) The overall effects appear minor. It is unclear how Fig. 1-Suppl-Fig.1, where no significant difference is shown, is translated into Figure 1 J and K showing a significant difference. 

      With all due respect, we do not agree that the effect is minor. Please see our response to the Public Review where we discuss the major consequences of MT defects in detail. 

      To answer this specific comment, we show that there are significant differences in the number of rapidly moving MTs (5-sec displacement over 0.3 µm) and in the number of stationary MTs (5-sec displacement below 0.15 µm). There is no significant difference in the number of slightly displaced MTs (displacements between 0.15 and 0.3 µm; the central part of the histogram). This might indicate that these slight displacements do not depend on the kinesin-1 motor but rather are caused by experimental noise, pushing by moving organelles, and/or myosin-dependent forces in the cell. In the revised manuscript, this quantification is described in more detail in Methods and included in the Figure legends.
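
      The binning described above can be sketched as follows. This is only an illustrative sketch, not the analysis code used in the study: the function names and the input format (a plain list of 5-sec displacements in µm) are assumptions, and only the 0.15 µm and 0.3 µm cut-offs come from the text.

```python
from collections import Counter

# Cut-offs quoted in the response (5-sec displacement, in µm).
STATIONARY_MAX_UM = 0.15  # below this: stationary
MOVING_MIN_UM = 0.3       # above this: rapidly moving


def classify_displacement(d_um: float) -> str:
    """Assign one 5-sec displacement to a bin (hypothetical helper)."""
    if d_um < STATIONARY_MAX_UM:
        return "stationary"
    if d_um > MOVING_MIN_UM:
        return "moving"
    # Values from 0.15 to 0.3 µm: the central part of the histogram.
    return "slight"


def class_fractions(displacements_um):
    """Return the fraction of displacements falling in each bin."""
    if not displacements_um:
        raise ValueError("no displacements given")
    counts = Counter(classify_displacement(d) for d in displacements_um)
    total = len(displacements_um)
    return {
        label: counts.get(label, 0) / total
        for label in ("stationary", "slight", "moving")
    }
```

      Under these assumptions, the fractions per cell for each condition (control versus KIF5B-depleted) could then be compared bin by bin.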

      d) The authors utilize single molecule tracking to further strengthen their conclusion that KIF5B promotes microtubule sliding. The observed effects are weaker than the data obtained from photobleaching experiments. The videos clearly show that there is still significant movement also in KIF5B-depleted cells. If K560RigorE236A binds irreversibly to a microtubule and this microtubule is growing (not only by the addition of tubulin dimers to the plus end; see PMID: 34883065) wouldn't that also result in movement of the tagged K560RigorE236A? As KIF5B is also required in the transport of insulin granules, it should also label "interior microtubules". And in Video 2 it appears that pretty much all "labeled" microtubules are moving. 

      K560RigorE236A forms fiducial marks along the whole MT lattice, as previously shown in (Tanenbaum et al., 2014). When it is bound to the MT lattice, K560RigorE236A moves with the whole MT if the MT is being relocated. The mechanism described in (PMID: 34883065) appears to be absent or minor in beta cells (see Figure 2 - Supplemental Figure 2); thus, although this mechanism could in principle displace already polymerized MTs, it does not appear to do so in this cell type.

      The reviewer is correct: K560RigorE236A does mark all MTs throughout a beta cell. All MTs move slightly in a living cell because they are pushed around by moving organelles, actin contractility, etc. MTs may also be slid by other MT-dependent motors (for example, dynein acting against the membrane). So, it is not surprising that the MT network is “breezing,” and kinesin-dependent sliding is only a part of MT movement. What we show here is that KIF5B-dependent MT sliding is responsible for a relatively “long-distance” relocation of MTs, manifested in long, directional displacement of fiducial marks. This does not exclude other movements. It does make the extraction of kinesin-dependent MT movements somewhat challenging, which is why we needed to do those extensive analyses.

      e) Figure 1 G to K is misleading, at least in the context of the provided videos. There are several microtubules that move extensively in shRNA#2-treated cells and overall there appears more movement in this cell as in the control cell. Figure 1I is clearly not representative of the movement shown in Video 2. 

      We apologize if our selection of representative movies/figures for this experiment was imperfect. Indeed, in all depleted cells, SunTag puncta still move to a certain extent, either due to incomplete depletion or to alternative intracellular forces dislocating microtubules. However, there is a clear difference in the fraction of persistently moving puncta (please see Figure 1K and the histogram in Figure 1 - Supplemental Figure 1B). Unfortunately, when the number of SunTag puncta per cell is variable, it sometimes prevents a good visual perception of the actual distribution of moving versus stationary microtubules. We now show an alternative representative movie for Figure 1I and the corresponding Video 2, with the goal of comparing cells with more consistent numbers of SunTag puncta.

      (2) Figure 2A. 

      a) This is the only image that clearly shows the existence of a sub-membrane microtubule array and the concentration of microtubules in the cell interior. The differences are unclear between the experimental setups including the length of cultivation and knockdown of KIF5B or expression of mutants. 

      We now provide a more detailed description of each image acquisition and processing in Materials and Methods. In brief, while the morphology of MT patterns is intrinsically variable in beta cells, all control cells have populated peripheral MTs that exhibit a more parallel configuration as compared to depletions and mutants.

      b) The authors state "While control cells had convoluted non-radial MTs with a prominent sub-membrane array, typical for beta cells (Fig. 2A), KIF5B-depleted cells featured extra-dense MTs in the cell center and sparse reseeding MTs at the periphery (Fig. 2B, C)". Could that not be explained with the observation that "Kinesin-1 controls microtubule length" (PMID: 34883065)? 

      Thank you for this interesting alternative idea. It does not appear to be the case for beta cells.

      Please see Figure 2-Supplemental Figure 2  and our response to Public Review Comment #3.

      Also, our apologies for the typo in the original manuscript: the word should be “receding”, not “reseeding”.

      (3) Figure 3: 

      a) This is an elegant way to determine whether KIF5B is involved in microtubule sliding independent of the fact that the effect appears very small. 

      Thank you!            

      b) The assay depends on ectopic expression of a dominant negative mutant. It appears important to show that KIFDNwt is high enough expressed to indeed block the binding of endogenous KIF5B. The authors need to provide a control for this. Furthermore, authors need to provide evidence that other functions of KIF5B are not impaired such as transport of insulin granules and tubulin incorporation or microtubule stability and length.

      Expression of cargo-binding motor domains routinely causes a dominant-negative effect on their cargo transport. This exact construct has been used for the purpose of dominant-negative action previously (Ravindran et al., 2017). It does prevent the membrane cargo binding of KIF5B (Ravindran et al., 2017); thus, the transport of insulin granules is also impaired in overexpressing cells. Confirming this fact would not influence our study conclusions, so we chose not to repeat these assays for the sake of time.

      c) N-numbers should be similar. The data for KIFDNmut are difficult to interpret with possibly 2 experiments showing little to no displacement and 3 showing displacement. 

      In the revised manuscript, additional data have been added to increase N-numbers.

      (4) Figure 4 and supplements: The morphology of the KIFDNwt cells is greatly affected and this makes it difficult to say whether the effect on microtubules at the cell periphery is a direct or indirect effect. 

      Yes, these cells often have a less spread appearance, obscuring visual perception of the MT distribution. We have now replaced the image of the KIFDNwt cell (Figure 4, Supplemental Figure 1 A) with a more visually representative example.

      Things to do: 

      (1) Notably, the authors have previously reported that high glucose-induced remodeling of microtubule networks facilitates robust glucose-stimulated insulin secretion. This remodeling involves the disassembly of old microtubules and the nucleation of new microtubules. Here, they state that the sub-membrane microtubule array is destabilized via microtubule sliding. What is the relevance of the different processes? Please discuss these in the manuscript. 

      Thank you, we have now extended our discussion of these points and our prior findings. We have also added a schematic model figure for clarity (Figure 7).  

      (2) 5 min appears to be a long time and enough time to polymerize a significant number of new microtubules. Do the authors have any information about the speed of MT formation in MIN6 cells? Can the authors repeat this experiment by preventing MT polymerization? Or repeat the experiment with EB1/EB3 reporter to visualize microtubule growth in the same experimental setting? 

      While some MT polymerization will happen in this timeframe, newly polymerized MTs contain predominantly quenched “dark” tubulin molecules and only a small percent of fluorescent tubulin. These dim MTs are not included in MT sliding assay analysis, where a threshold for bright MTs is introduced. We apologize for initially omitting certain details from the FRAP assay analysis. Now these details have been added.   

      Are the microtubules shown on the cell surface (TIRF microscopy) or do we see here all microtubules? 

      Please see Materials and Methods for microscopy methods and image processing for each figure. Specifically, FRAP assays show a maximum intensity projection of spinning disk confocal stacks over 2.4µm in height (approximately the ventral half of a cell).

      (3) Previously, it has been shown that KIF5B induces tubulin incorporation along the microtubule shaft in a concentration-dependent manner. Moreover, running KIF5B increases microtubule rescue frequency and unlimited growth of microtubules. Notably, KIF5B regulates microtubule network mass and organization in cells (PMID: 34883065). Consequently, it appears possible that the here observed phenomena of changes in the microtubule network might be due to alterations in these processes. Authors need to exclude these possibilities and discuss them. 

      Thank you for this interesting alternative idea. It does not appear to be the case for beta cells. Please see Figure 2-Supplemental Figure 2  and our response to Public Review Comment #3.

      (4) It is important that the authors describe in the text and possibly in the figure legends the differences between the experimental set-ups including the length of cultivation and knock down of KIF5B or expression of mutants. 

      Thank you, please see these details in the text (Materials and Methods section).

      (5) Figure 5: Does KIF5B depletion rescue the kinesore-induced defects 

      Thank you for suggesting this control. We have now conducted the corresponding experiments. The answer is yes, it does. Kinesore does not induce detectable changes in MT patterns in KIF5B-depleted cells (new Figure 5-Supplemental Figure 2).

      (6) Can the authors block kinesin-1 resulting in microtubule accumulation in the cell center and then release the block, and best inhibiting microtubule formation, to see whether the microtubules accumulated in the cell center will be transported to the periphery? 

      This proposed experiment would have been a nice illustration for the study; however, it has proven to be too challenging. Unfortunately, we have to leave it for future studies. Nevertheless, the experiments already included in the paper are sufficient to support our conclusions.

      Minor comments: 

      (1) The English needs to be improved. Often it is unclear what the authors try to convey. The manuscript is difficult to read and contains several overstatements.

      The revised manuscript has been through several rounds of proof-reading for clarity.

      (2) It is important to describe in more detail in the introduction what is known about KIF5B in beta cells. Previously, it has been demonstrated that silencing, or inactivation by a dominant negative form of KIF5B, blocks the sustained phase of glucose-stimulated insulin secretion (PMID: 9112396, PMID: 12356920, PMID: 20870970). 

      Yes, these findings are of course very important and were cited in the original manuscript. We have now expanded the discussion on the matter.

      (3) Figure 1B and Fig. 1 Suppl Fig.1: Please provide band sizes and provide information on the size of KIF5B. 

      We have replaced Fig. 1B and Suppl Fig. 1A with quantitative analysis of KIF5B depletion, now found in new Fig. 1B and Suppl Fig. 1A-C.

      (4) It is important to state the used glucose concentrations in Figure 1D (based on the methods section it is probably 25 mM glucose) and all subsequent experiments. Is this correct and comparable to Figure 6A or B? For the non-specialized reader, more information should be provided on why initial glucose starvation is performed.  

      Cell culture models of pancreatic beta cells are routinely maintained at glucose levels that are considered “high”, or stimulatory for secretion. This is needed to prevent the loss of the cells’ capacity to respond to glucose stimulation over generations. In order to test GSIS, cells need to be equilibrated at low (fasting, standardly 2.8 mM) glucose levels for several hours, so that they are capable of secreting insulin upon glucose addition. 25 mM glucose is normally used to stimulate GSIS in cell culture models of beta cells, like MIN6. This is a higher concentration than what is needed to stimulate primary beta cells in islets.

      Reviewer #2 (Recommendations For The Authors): 

      I have the following specific questions that pertain to data interpretation and the conclusions drawn.

      (1) The morphology of the overall MT array before the bleach treatment in both control cells and KIF5B-KD cells depicted in Figure 1D-F and Figure 2A-C appears to be distinct. In Figure 1, it seems that the absence of KIF5B results in a general augmentation of MT mass, whereas the arrangement presented in Figure 2 indicates the contrary. Even in the sub-membrane areas, this phenomenon appears to hold true. However, the images used in this study, which depict entire cells or a significant portion of cells, may not be ideal for visualizing the sub-membrane regions.

      It would be beneficial if the author could offer some explanations for this apparent inconsistency. 

      While the beta cell population is intrinsically heterogeneous, all data presented here indicate the same trend in phenotypes. Possibly, some apparent inconsistency between Figures 1 and 2 arose because in the original manuscript we did not show the pre-bleach whole-cell overview in Figure 1. In the revised version, we now show the whole cells for pre-bleach so that MT organization at the cell periphery can be assessed. Please note that in the control cell, MTs are more or less equally distributed over the cell, while in KIF5B depletions the cell periphery is significantly less populated than the cell center. Furthermore, we did not detect MT mass augmentation or increase in KIF5B depletions. One possible explanation for the reviewer’s impression from Figure 2 is that Figure 2 F-H shows thresholded images where the threshold was adjusted to highlight peripheral MTs in each cell. Please note that this is not the same threshold for each cell (see Figure 2 - Supplemental Figure 2 and 3). Thus, KIF5B-depleted cells that have fewer MTs at the periphery appear brighter in these thresholded images. For a true comparison of MT intensity, please see Figure 2 A-C (grayscale images, not thresholded).

      (2) It would be helpful if the author could provide a visual representation or comment on the sub-cellular localization of KIF5B in MIN6 cells. Is it predominantly localized in the submembrane region, or is it more evenly distributed throughout the cytoplasm? 

      Please see Fig. 2A, lower panel. KIF5B is seen across the cell as a punctate staining, in agreement with previous findings that it mostly localizes to IGs.

      (3) The alteration in microtubule (MT) organization and sliding in the absence of KIF5B seems to initiate in proximity to the apparent microtubule organizing center (MTOC) depicted in Figure 2A, and then "simply" extends towards the sub-membrane region. Although the authors acknowledge it, it would be advantageous for the readers to have a clearer indication that the sub-membrane microtubule (MT) reorganization in the absence of KIF5B is a result of a broader MT reorganization rather than a specific occurrence restricted to the sub-membrane regions. 

      Thank you for this comment. We have now extended our discussion to state our conclusions and interpretations of this point more clearly. We have also added a schematic Figure 7 as an illustration.

      (4) Regarding the "glucose experiments," it is common to add 20-25 mM glucose to culture media, but physiological concentrations of glucose typically hover around 5 mM. Therefore, it is somewhat unclear what the implications are when investigating the impact of KIF5B depletion on MT sliding at 2.8 mM of glucose. It would be helpful if the authors could provide some commentary on this matter, particularly in relation to physiological and pathological conditions. 

      2.8 mM glucose is a standard low glucose condition used to model glucose deprivation/fasting. For functional primary beta cells within pancreatic islets, GSIS can be triggered by glucose stimulation as low as 8-12 mM glucose. However, for glucose stimulation of cultured beta cells such as MIN6 used in this paper, 20-25 mM glucose is standardly used because these cell lines have a higher threshold of stimulation compared to primary beta cells and whole islets.

      (5) In supplementary Figure 1A, it would be helpful if the lanes in the WB were marked indicating what is what. In my observation, it appears that Supplementary Figure 1A, particularly lanes #2, 3, and 4, display the GAPDH protein (MW 36 kDa) (or is it alpha-tubulin, as mentioned in the Material and Methods section and indicated in lane #409?) relative to Figure 1A. I am curious about KIF5B (MW 108 kDa). Is it represented by the upper band? Did the author probe the same membrane simultaneously with two different primary antibodies? This should be clarified, and the author should indicate the molecular weight of the ladder. 

      Indeed, in the original WB two antibodies were used together, due to a challenge in collecting a sufficient number of shRNA-expressing beta cells. This caused confusion and improper interpretation of the loading control. We thank the reviewer for catching this. We have now replaced old Fig. 1B and Suppl. Fig. 1A with quantitative analysis of KIF5B depletion based on single-cell immunofluorescent staining. It is now found in new Fig. 1B and Suppl. Fig. 1A-C.

      Reviewer #3 (Recommendations For The Authors): 

      In all of the figures that present microtubule orientations (e.g. Figure 2E) the error bars obscure the vertical bins making them difficult to read or interpret. If they were rendered at a larger scale, it would be easier to read and interpret these results. 

      Thank you for pointing this out. We now show these histograms with a different format of error bars and without the outliers that obscure the view. A variant with outliers is now shown in the supplement.

      Some of the callouts to the videos in the paper are inaccurate. Perhaps the authors reordered sections of the paper but failed to correctly renumber the video citations? 

      Thank you for this comment, we have corrected all callouts now.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This short report shows that the transcription factor gene mirror is specifically expressed in the posterior region of the butterfly wing imaginal disk, and uses CRISPR mosaic knock-outs to show it is necessary to specify the morphological features (scales, veins, and surface) of this area.

      Strengths:

      The data and figures support the conclusions. The article is swiftly written and makes an interesting evolutionary comparison to the function of this gene in Drosophila. Based on the data presented, it can now be established that mirror likely has a similar selector function for posterior-wing identity in a plethora of insects.

      We thank the reviewer for their feedback.

      Weaknesses:

      This first version has minor terminological issues regarding the use of the terms "domains" and "compartment".

      We acknowledge that the terminologies “domains” and “compartments” might lead to confusion. To avoid confusion we have removed the term “compartment” from the manuscript.

      Reviewer #2 (Public Review):

      This is a short and unpretentious paper. It is an interesting area and therefore, although much of this area of research was pioneered in flies, extending basic findings to butterflies would be worthwhile. Indeed, there is an intriguing observation but it is technically flawed and these flaws are serious.

      The authors show that mirror is expressed at the back of the wing in butterflies (as in flies). They present some evidence that it is required for the proper development of the back of the wing in butterflies (a region dubbed the vannus by the ancient guru Snodgrass). But there are problems with that evidence. First, concerning the method, using CRISPR they treat embryos and the expectation is that the mirror gene will be damaged in groups of cell lineages, giving a mosaic animal in which some lines of cells are normal for mirror and others are not. We do not know where the clones or patches of cells that are defective for mirror are because they are not marked. Also, we do not know what part of the wing is wild type and what part is mutant for mirror. When the mirror mutant cells colonise the back of the wing and that butterfly survives (many butterflies fail to develop), the back of the wing is altered in some selected butterflies. This raises a second problem: we do not know whether the rear of the wing is missing or transformed. From the images, the appearance of the back of the wing is clearly different from the wild type, but is that due to transformation or not? And then I believe we need to know specifically what the difference is between the rear of the wing and the main part. What we see is a silvery look at the back that is not present in the main part, is it the structure of the scales? We are not told.

      Thank you for this feedback. We appreciate that many readers may not accustomed to looking at mosaic knockouts. As discussed in a previous review article (Zhang & Reed 2017), we rely on a combination of contralateral asymmetry and replicates to infer mutant phenotypes. For many genes (e.g. pigmentation enzymes) mutant clones are obvious, but for other types of genes (e.g. ligands) clone boundaries are sometimes not directly diagnosable. It is simply a limitation of our study system. Nonetheless, you see for yourself that “the back of the wing is altered in some butterflies” – the effects of deleting mirror are clear and repeatable.

      In terms of interpreting mutant phenotypes, we agree that the paper would benefit from a better description of the specific effects. Therefore, we have included an improved, more systematic description of phenotypes, along with better-annotated figures showing changes in wing shape and venation, scale coloration, and color pattern transformation (e.g. posterior elongation of the orange marginal stripes).

      There are other problems. Mirror is only part of a group of genes in flies and in flies both iroquois and mirror are needed to make the back of the wing, the alula (Kehl et al). What is known about iro expression in butterflies?

      In Drosophila, mirror, araucan, and caupolican comprise the so-called Iroquois Complex of genes. As denoted in Figure S4 and in Kerner et al (doi: https://doi.org/10.1186/1471-2148-9-74) the divergence of araucan and caupolican into two separate paralogs is restricted to Drosophila. As in most insects, butterflies have only two Iroquois Complex genes: araucan and mirror. We tested the role of araucan in Junonia coenia as shown in our pre-print: https://doi.org/10.1101/2023.11.21.568172. Its expression appears to be restricted to early pupal wings where it is transcribed in all scale-forming cells. Mosaic araucan KOs resulted in a change in scale iridescent coloration associated with changes in the laminar thickness of scale cells.

      In flies, mirror regulates a late and local expression of dpp that seems to be responsible for making the alula. What happens in butterflies? Would a study of the expression of Dpp in wildtype and mirror compromised wings be useful?

      We thank the reviewer for the proposal and agree that a future study comparing Dpp in wild-type versus mirror KO butterflies would be useful to clarify the mechanism of Dpp signalling in wing development. It is not clear, however, that the results of a Dpp experiment would change the conclusions of our current study; therefore, we decided not to undertake these additional experiments for our revision.

      Thus, I find the paper to be disappointing for a general journal as it does little more than claim what was discovered in Drosophila is at least partly true in butterflies. 

      We respect that the reviewer does not have a strong interest in the comparative aspects of this study. Fair enough. This report is primarily aimed at biologists interested in the evolutionary history of insect wings.

      Also, it fails to explain what the authors mean by "wing domains" and "domain specification". They are not alone, butterfly workers, in general, appear vague about these concepts, their vagueness allowing too much loose thinking.

      A domain is “a region distinctively marked by some physical feature”. This term is used extensively in the developmental biology literature (e.g. “expression domain”, “embryonic domain”, “tissue domain”, “domain specification”) and is found throughout popular textbooks (e.g. Alberts et al. “The Cell”, Gilbert “Developmental Biology”). We prefer the term “domain” because of its association in the Drosophila literature with transcription factors that define fields of cells. We specifically avoided using the term “compartment” because of its association with cell lineage, which we have not tested. 

      Since these matters are at the heart of the purpose and meaning of the work reported here, we readers need a paper containing more critical thought and information. I would like to have a better and more logical introduction and discussion.

      We would like the very same thing, of course, and we hope the reviewer finds our revised manuscript to be more satisfying to read.

      The authors do define what they mean by the vannus of the wing. In flies the definition of compartments is clear and abundantly demonstrated, with gene expression and requirement being limited precisely to sets of cells that display lineage boundaries. It is true that domains of gene expression in flies, for example of the iroquois complex, which includes mirror, can only be related to patterns with difficulty. Some recap of what is known plus the opinion of the authors on how they interpret papers on possible lineage domains in butterflies might also be useful, as the reader is no wiser about what the authors might mean at the end of it!

      We thank the reviewer for this suggestion. However, our experiments have little to contribute to the topic of cell lineage compartmentalization. We have therefore opted to avoid speculating on this topic to prevent confusion and to keep the manuscript focused on our experimental results.

      The references are sometimes inappropriate. The discovery of the AP compartments should not be referred to Guillen et al 1995, but to Morata and Lawrence 1975. Proofreading is required.

      We thank the reviewer for suggesting this important reference. We have included it in our revision.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Chatterjee et al. examines the role of the mirror locus in patterning butterfly wings. The authors examine the pattern of mirror expression in the common buckeye butterfly, Junonia coenia, and then employ CRISPR mutagenesis to generate mosaic butterflies carrying clones of mirror mutant cells. They find that mirror is expressed in a well-defined posterior sector of final-instar wing discs from both hindwings and forewings and that CRISPR-injected larvae display a loss of adult wing structures presumably derived from the mirror expressing region of hindwing primordium (the case for forewings is a bit less clear since the mirror domain is narrower than in the hindwing, but there also do seem to be some anomalies in posterior regions of forewings in adults derived from CRISPR injected larvae). The authors conclude that the wings of these butterflies have at least three different fundamental wing compartments, the mirror domain, a posterior domain defined by engrailed expression, and an anterior domain expressing neither mirror nor engrailed. They speculate that this most posterior compartment has been reduced to a rudiment in Drosophila and thus has not been adequately recognized as such a primary regional specialization.

      Critique:

      This is a very straightforward study and the experimental results presented support the key claims that mirror is expressed in a restricted posterior section of the wing primordium and that mosaic wings from CRISPR-injected larvae display loss of adult wing structures presumably derived from cells expressing mirror (or at least nearby). The major issue I have with this paper is the strong interpretation of these findings that lead the authors to conclude that mirror is acting as a high-level gene akin to engrailed in defining a separate extreme posterior wing compartment. To place this claim in context, it is important in my view to consider what is known about engrailed, for which there is ample evidence to support the claim that this gene does play a very ancestral and conserved function in defining posterior compartments of all body segments (including the wing) across arthropods.

      (1) Engrailed is expressed in a broad posterior domain with a sharp anterior border in all segments of virtually all arthropods examined (broad use of a very good panspecies anti-En antibody makes this case very strong).

      (2) In Drosophila, marked clones of wing cells (generated during larval stages) strictly obey a straight anterior-posterior border indicating that cells in these two domains do not normally intermix, thus, supporting the claim that a clear A/P lineage compartment exists.

      In my opinion, mirror does not seem to be in the same category of regulator as engrailed for the following reasons:

      (1) There is no evidence that I am aware of, either from the current experiments, or others that the mirror expression domain corresponds to a clonal lineage compartment. It is also unclear from the data shown in this study whether engrailed is co-expressed with mirror in the posterior-most cells of J. coenia wing discs. If so, it does not seem justified to infer that mirror acts as an independent determinant of the region of the wing where it is expressed.

      (2) Mirror is not only expressed in a posterior region of the wing in flies but also in the ventral region of the eye. In Drosophila, mirror mutants not only lack the alula (derived approximately from cells where mirror is expressed), but also lack tissue derived from the ventral region of the eye disc (although this ventral tissue loss phenotype may extend beyond the cells expressing mirror).

      In summary, it seems most reasonable to me to think of mirror as a transcription factor that provides important development information for a diverse set of cells in which it can be expressed (posterior wing cells and ventral eye cells) but not that it acts as a high-level regulator as engrailed.

      Recommendation:

      While the data provided in this succinct study are solid and interesting, it is not clear to me that these findings support the major claim that mirror defines an extreme posterior compartment akin to that specified by engrailed. Minimally, the authors should address the points outlined above in their discussion section and greatly tone down their conclusion regarding mirror being a conserved selector-like gene dedicated to establishing posterior-most fates of the wing. They also should cite and discuss the original study in Drosophila describing the mirror expression pattern in the embryo and eye and the corresponding eye phenotype of mirror mutants: McNeill et al., Genes & Dev. 1997. 11: 1073-1082; doi:10.1101/gad.11.8.1073.

      We thank the reviewer for their summary, critique, and recommendations. We agree with everything the reviewer says. Honestly, however, we were surprised by these comments because we took great care in the paper to never refer to mirror as a compartmentalization gene or claim it has a function in cell lineage compartmentalization like engrailed. As pointed out, we lack clonal analyses to test for compartmentalization. This is why we used the term “domain” instead of “compartment” in the title and throughout the manuscript. Nevertheless, we have recrafted the discussion in the manuscript, including completely removing the term “compartment”, to better avoid implications that mirror plays a role in cell lineage compartmentalization. 

      We also thank the reviewer for recommending the paper about the role of mirror in eye development. For the sake of keeping the paper focused, however, we decided not to broach the topic of mirror functions outside the context of wing development.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have minor comments for improvement.

      The abstract and introductions are terminologically problematic when they refer to the concept of compartment and compartment boundaries. Allegedly this confusion has previously propagated in several articles related to butterfly wing development, which keeps alienating this literature from being taken seriously by fly specialists, for example. So it is important to use the right terms. I will try to explain point by point here, but I would appreciate it if the authors could undertake a significant rewrite taking these comments into account. The authors use the terms compartment and compartment boundary. This has a very specific use in developmental genetics: mitotic clones never cross a boundary (or compartment). I think the authors can keep referring to the equivalent of the A-P boundary, which is situated somewhere between M1-M2 based on unpublished data from the Patel Lab, and is not very well defined (Engrailed expression moves a little bit during development in this area). Domain is a looser term and can be used more liberally to describe genetically defined regions.

      - "Classical morphological work subdivides insect wings into several distinct domains along the antero-posterior (AP) axis, each of which can evolve relatively independently." Yes. This concept of domain and individuation seems important. You could make a proposed link to selector genes here.

      - "There has been little molecular evidence, however, for AP subdivision beyond a single compartment boundary described from Drosophila melanogaster." Incorrect, and this conflates "domain" and "compartment".

      Flies have wing AP domains too, that pattern their veins (see the cited Banerjee et al). 

      - "Our results confirm that insect wings can have more than one posterior developmental domain, and support models of how selector genes may facilitate evolutionarily individuation of distinct AP domains in insect wings". Yes, and I like the second part of the sentence. Still, I would recommend simply deleting "confirm that insect wings can have more than one posterior developmental domain, and" because this is neglecting previous work on AP genetic regionalization in both flies (vein literature) and butterflies (e.g. McKenna and Nijhout, Banerjee et al).

      - "Analyses of wing pattern diversity across butterflies, considering both natural variation and genetic mutants, suggest that wings can be subdivided into at least five AP domains, bounded by the M1, M3, Cu2, and 2A veins respectively, within each of which there are strong correlations in color pattern variation and wing morphology (Figure 1A)". Yes, and I would recommend emphasizing they correspond to well-defined gene expression domains as mentioned in Banerjee et al, or McKenna and Nijhout.

      - "The anterior-most of these domains, bordered by the M1 vein, appears to correspond to an AP compartment boundary originally described by cell lineage tracing in Drosophila melanogaster, and later supported in butterfly wings by expression of the Engrailed transcription factor. Interestingly, however, D. melanogaster work has yet to reveal clear evidence for additional AP domain boundaries in the wing." Confusingly, because the first sentence is about compartments while the second is about AP domains. I also think the claim that Dmel has no other known AP domains is dubious because Spalt is highly regionalized in flies.

      - "Previous authors have proposed the existence of such individuated domains, and speculated that they may be specified by selector genes.5,10 Our data provide experimental support for this model, and now motivate us to identify factors that specify other domain boundaries between the M1 and A2 veins." Yes, I completely agree with this way to emphasize the selector effect, and to link it to the concept of "individuated domain"

      We cannot thank the reviewer enough for the time and thought they devoted to giving helpful suggestions to improve our manuscript. We have applied all of the above recommendations to the revision.

      Fig. S1: the field needs to move away from Red/Green microscopy images, for accessibility reasons.

      The easiest fix here would be to change the red channels to magenta.

      Green/Magenta provides excellent contrast and accessibility in general in 2-channel images.

      We thank the reviewer for this suggestion. We have improved the color accessibility of Fig. S1.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Kv2 subfamily potassium channels contribute to delayed rectifier currents in virtually all mammalian neurons and are encoded by two distinct types of subunits: Kv2 alpha subunits that have the capacity to form homomeric channels (Kv2.1 and Kv2.2), and KvS or silent subunits (Kv5, 6, 8, 9) that can assemble with Kv2.1 or Kv2.2 to form heteromeric channels with novel biophysical properties. Many neurons express both types of subunits and therefore have the capacity to make both homomeric Kv2 channels and heteromeric Kv2/KvS channels. Determining the contributions of each of these channel types to native potassium currents has been very difficult because the differences in biophysical properties are modest and there are no Kv2/KvS-specific pharmacological tools. The authors set out to design a strategy to separate Kv2 and Kv2/KvS currents in native neurons based on their observation that Kv2/KvS channels have little sensitivity to the Kv2 pore blocker RY785 but are blocked by the Kv2 VSD blocker GxTx. They clearly demonstrate that Kv2/KvS currents can be differentiated from Kv2 currents in native neurons using a two-step strategy to first selectively block Kv2 with RY785, and then block both with GxTx. The manuscript is beautifully written; takes a very complex problem and strategy and breaks it down so both channel experts and the broad neuroscience community can understand it.

      Strengths:

      The compounds the authors use are highly selective and unlikely to have significant confounding cross-reactivity to other channel types. The authors provide strong evidence that all Kv2/KvS channels are resistant to RY785. This is a strength of the strategy - it can likely identify Kv2/KvS channels containing any of the 10 mammalian KvS subunits and thus be used as a general reagent on all types of neurons. The limitation then of course is that it can't differentiate the subtypes, but at this stage, the field really just needs to know how much Kv2/KvS channels contribute to native currents and this strategy provides a sound way to do so.

      Weaknesses:

      The authors are very clear about the limitations of their strategy, the most important of which is that they can't differentiate different subunit combinations of Kv2/KvS heteromers. This study is meant to be a start to understanding the roles of Kv2/KvS channels in vivo. As such, this is a minor weakness, far outweighed by the potential of the strategy to move the field through a roadblock that has existed since its inception.

      The study accomplishes exactly what it set out to do: provide a means to determine the relative contributions of homomeric Kv2 and heteromeric Kv2/KvS channels to native delayed rectifier K+ currents in neurons. It also does a fabulous job laying out the case for why this is important to do.

      Reviewer #2 (Public Review):

      Summary:

      Silent Kv subunits and the channels containing these Kv subunits (Kv2/KvS heteromers) are in the process of discovery. It is believed that these channels fine-tune the voltage-activated K+ currents that repolarize the membrane potential during action potentials, with a direct effect on cell excitability, mostly by determining action potential firing frequency.

      Strengths:

      What makes silent Kv subunits even more important is that, by being expressed in specific tissues and cell types, different silent Kv subunits may have the ability to fine-tune the delayed rectifying voltage-activated K+ currents that are one of the currents that crucially determine cell excitability in these cells. The present manuscript introduces a pharmacological method to dissect the voltage-activated K+ currents mediated by Kv2/KvS heteromers as a means of starting to unveil their importance, together with Kv2-only channels, to the cells where they are expressed.

      Weaknesses:

      While the method is effective in quantifying these currents in any isolated cell under an electric voltage clamp, it is ineffective as a modulating maneuver to perhaps address these currents in an in vivo experimental setting. This is an important point but is not a claim made by the authors.

      We agree. We have now stated in the introduction that this study does not address the roles of Kv2/KvS currents in an in vivo setting.

      Manuscript revisions:

      While this study does not address the impact of GxTX or RY785 on action potentials or in vivo, the distinct pharmacology of Kv2/KvS heteromers presented here suggests that KvS conductances could be targeted to selectively modulate discrete subsets of cell types.  

      There are other caveats with the methods and data:

      (i) The need for a 'cocktail' of blockers to supposedly isolate Kv2 homomers and Kv2/KvS heteromers' currents from others may introduce errors in the quantification of Kv2/KvS heteromer-mediated K+ currents, owing to possible off-target effects of the blockers.

      We now point out that it is possible that off-target effects of blockers may introduce errors, include references that identify the selectivity of the blockers used in the cocktail, and specifically note that 4-aminopyridine in the cocktail is expected to block 2% of Kv2 homomers yet have a lesser impact on Kv2/KvS heteromers. Additionally, to test whether the KvS isolation strategy requires the cocktail in neurons, we performed new experiments on a different subclass of nociceptors without the blocker cocktail and identified a substantial KvS-like component (new Fig 7 Supplement 3).

      Manuscript revisions:

      “After whole-cell voltage clamp was established, non-Kv2/KvS conductances were suppressed by changing to an external solution containing a cocktail of inhibitors: 100 nM alpha-dendrotoxin (Alomone) to block Kv1 (Harvey and Robertson, 2004), 3 μM AmmTX3 (Alomone) to block Kv4 (Maffie et al., 2013; Pathak et al., 2016), 100 μM 4-aminopyridine to block Kv3 (Coetzee et al., 1999; Gutman et al., 2005), 1 μM TTX to block TTX sensitive Nav channels, and 10 μM A803467 (Tocris) to block Nav1.8 (Jarvis et al., 2007). It is possible that off-target effects of blockers may introduce errors in the quantification of Kv2/KvS heteromer-mediated K+ currents. For example, 4-aminopyridine is expected to block a small fraction, 2%, of Kv2 homomers and have a lesser impact on Kv2/KvS heteromers (Post et al., 1996; Thorneloe and Nelson, 2003; Stas et al., 2015) which could result in a slight overestimation of the ratio of Kv2/KvS heteromers to Kv2 homomers.”
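
      The size of the bias described in the passage above can be illustrated with a short calculation. This is only a sketch under stated assumptions: the function name and its parameters are hypothetical, the 2% block of Kv2 homomers is the figure quoted above, and the block of Kv2/KvS heteromers is set to zero as a worst case, since the text says only that it is smaller.

```python
def observed_kvs_to_kv2_ratio(kvs_current, kv2_current,
                              kv2_block_fraction=0.02,
                              kvs_block_fraction=0.0):
    """Ratio of heteromer to homomer current measured in the presence of 4-AP.

    Hypothetical helper: each current is scaled by the fraction that the
    blocker leaves unblocked before the ratio is taken.
    """
    remaining_kvs = kvs_current * (1.0 - kvs_block_fraction)
    remaining_kv2 = kv2_current * (1.0 - kv2_block_fraction)
    return remaining_kvs / remaining_kv2


# Illustrative values only (arbitrary units), not measured data.
true_ratio = 1.0 / 1.0
measured = observed_kvs_to_kv2_ratio(1.0, 1.0)
relative_overestimate = measured / true_ratio - 1.0  # roughly 0.02
```

      Under these assumptions the ratio is inflated by about 2%, in line with the "slight overestimation" described in the revised text.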

      “We also tested the other major mouse C-fiber nociceptor population, peptidergic nociceptors, to determine if this subpopulation also has conductances resistant to RY785 yet sensitive to GxTX. We voltage clamped DRG neurons from a CGRP<sup>GFP</sup> mouse line that expresses GFP in peptidergic nociceptors (Gong et al., 2003). Deep sequencing has identified mRNA transcripts for Kv6.2, Kv6.3, Kv8.1 and Kv9.3 present in GFP+ neurons, an overlapping but distinct set of KvS subunits from the Mrgprd<sup>GFP</sup> non-peptidergic population (Zheng et al., 2019). In GFP+ neurons from CGRP<sup>GFP</sup> mice, we found that a fraction of outward current was inhibited by 1 µM RY785 and additional current inhibited by 100 nM GxTX (Fig 7 Supplement 3 A-C). In these experiments, 58 ± 2% (mean ± SEM) was KvS-like (Fig 7 Supplement 3 D) identifying that KvS-like conductances are present in these peptidergic nociceptors. For CGRP<sup>GFP</sup> neurons we did not include the Kv1, Kv3, Kv4, Nav and Cav channel inhibitor cocktail used for other neuron experiments, indicating that the cocktail of inhibitors is not required to identify KvS-like conductances.”
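
      The two-step subtraction logic behind a "KvS-like" percentage can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: the function name, its arguments, and the example amplitudes are all hypothetical, and it assumes that the RY785-sensitive current is Kv2-like while the RY785-resistant, GxTX-sensitive current is KvS-like.

```python
def kvs_like_fraction(i_control, i_after_ry785, i_after_gxtx):
    """Fraction of the GxTX-sensitive current that survives RY785.

    Hypothetical helper. All three amplitudes are measured at the same
    voltage in the same cell: before drugs, after RY785, and after
    RY785 plus GxTX.
    """
    kv2_like = i_control - i_after_ry785      # removed by RY785
    kvs_like = i_after_ry785 - i_after_gxtx   # resistant to RY785, removed by GxTX
    total = kv2_like + kvs_like
    if total <= 0:
        raise ValueError("no GxTX-sensitive current to partition")
    return kvs_like / total


# Made-up amplitudes in nA, for illustration only.
fraction = kvs_like_fraction(i_control=5.0, i_after_ry785=3.0, i_after_gxtx=1.0)
```

      Current that remains after both drugs is excluded from the denominator, so the fraction describes only the GxTX-sensitive component.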

      (ii) During the electrophysiology experiments, the authors use a holding potential that is not as negative as needed for recording the full population of Kv2/KvS channels. Depolarized holding potentials lead to a certain level of inactivation of the channels, which varies according to the KvS involved/present in that specific population of channels. As a reminder, some KvS promote inactivation and others prevent inactivation. Therefore, the data must be interpreted as such.

      We agree. We now point out that the physiological holding potentials used are insufficiently negative to relieve inactivation from all Kv2/KvS heteromeric channels. We also note that the ratio of Kv2-like to KvS-like conductance is expected to vary with voltage protocols.

      Manuscript revisions:

      “Neurons were held at a membrane potential of –74 mV to mimic a physiological resting potential. KvS subunits can profoundly shift the voltage-inactivation relation (Salinas et al., 1997a; Kramer et al., 1998; Kerschensteiner and Stocker, 1999) and this potential is likely insufficiently negative to relieve inactivation from all Kv2/KvS heteromeric channels. Also, the activation membrane potential is close to the half-maximal point of Kv2/KvS conductances. Thus the ratio of Kv2-like to KvS-like conductance is expected to vary with voltage protocols.”

      (iii) The analysis of conductance activation by using tail currents is only accurate when dealing with non-inactivating conductances. Also, in dealing with a heterogenous population of Kv2/KvS heteromers, heterogenous K+ conductance deactivation kinetics is a must. Indeed, different KvS may significantly relate to different deactivation kinetics as well.

      We now discuss that the bi-exponential fit of tail currents is likely inadequate to capture the deactivation kinetics of all underlying components of a heterogenous population of Kv2/KvS heteromers.

      Manuscript revisions:

      “We note that the analysis of conductance activation by using tail currents is only accurate when dealing with non-inactivating conductances. We expect that inactivation of Kv2/KvS conductances during the 200 ms pre-pulse is minimal (Salinas et al., 1997a; Kramer et al., 1998; Kerschensteiner and Stocker, 1999) and did not notice inactivation during the activation pulse. Also, deactivation kinetics can vary in a heterogenous population of Kv2/KvS heteromers. While analysis of tail currents could skew the quantification of total Kv2-like and KvS-like conductances, our data support that mouse nociceptors and human neurons have tail currents that are resistant to RY785 and sensitive to GxTX consistent with the presence of Kv2/KvS heteromers.”

      (iv) Silent Kv subunits may be retained in the ER, in heterologous systems like CHO cells. This may lead to underestimation of their expression in these systems. Nevertheless, the authors show similar data in CHO cells and in primary neurons.

      We agree. We now note that in heterologous systems, including CHO cells, transfection of KvS subunits can result in KvS subunits that are retained intracellularly.

      Manuscript revisions:

      “While a fraction of KvS subunits appear to be retained intracellularly, immunofluorescence for Kv5.1, Kv9.3 and Kv2.1 also appeared localized to the perimeter of transfected Kv2.1-CHO cells (Figure 1 Supplement).”

      (v) The hallmark of silent Kv subunits is their effect on the time course of inactivation of K+ currents. As such, data should be shown throughout, preferably, from this perspective, but it was only done so in Figure 4G.

      Indeed, effects on inactivation are a hallmark of KvS subunits. However, quantifying inactivation of Kv2/KvS channels requires steps to positive voltages for approximately 10 seconds. In neurons steps this long usually resulted in irreversible changes in leak currents/input resistance that degraded the accuracy of RY785/GxTX subtraction currents. Consequently, we did not acquire inactivation data in neurons, and we now explain in the manuscript why such data was not obtained.

      Manuscript revisions:

      “While changes in inactivation are prominent with KvS subunits, we did not investigate inactivation in neurons because the lengthy depolarizations required often resulted in irreversible leak current increases that degraded the accuracy of RY785/GxTX subtraction current quantification.”

      (vi) Functional characterization of currents only, as suggested by the authors as a bona fide of Kv2 and Kv2/KvS currents, should not be solely trusted to classify the currents and their channel mediators.

      We agree, and now state explicitly that functional characterization cannot be trusted to classify the channel mediators of conductances, and we try to be clear about this throughout the manuscript by using soft terms such as "KvS-like" when identity is uncertain.

      Manuscript revisions:

      “As functional characterization alone cannot be trusted to classify the channel mediators of conductances, we define conductances consistent with Kv2/KvS heteromers as 'KvS-like' and conductances consistent with Kv2 homomers as 'Kv2-like'.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There is not a lot to do here - this was a real pleasure to read and very easy to understand, as written. Here are a few minor things to consider:

      (1) The naming of the KvS subunits has always been confusing - it is not clear that Kv5,6,8,9 are members of the Kv2 subfamily from the names. KvS does a good job of differentiating them by assembly phenotype and has been used a lot in the literature, but it doesn't solve the misconception of what subfamily they belong to. This might not matter so much for mammals, where all KvS channels are in the Kv2 subfamily, but it makes it impossible to extend the naming system to other animals where subunits requiring heteromeric assembly are common in most subfamilies. How about trying the name Kv2S? It would have continuity with KvS in the reader's mind, make it clear that they are Kv2 subfamily, and make a naming system that could be extended beyond vertebrates. This is not a problem the authors created - just a completely optional suggestion on how to solve it if so inclined.

      We agree that naming conventions for these subunits are problematic, and agonized quite a bit about nomenclature. In the end we chose to stick with the precedent of KvS.

      (2) Another naming issue they should definitely change is the use of "subfamily" for the different KvS subtypes (Kv5, Kv6, Kv8, and Kv9). This really creates confusion with the higher-order subfamilies that have a very clear functional definition: a subfamily of Kv genes is a group of related genes that have assembly compatibility. Those are Kv1, Kv2, Kv3 and Kv4. KvS genes are assembly compatible with Kv2, evolutionarily derived from the Kv2 lineage, and thus clearly a part of the Kv2 subfamily. Using a subfamily for the next lower level of the naming hierarchy confuses this. The authors should use different terms like sub-type or class or subgroups for the divisions within KvS.

      Thank you. We have standardized to Kv2/KvS as a subfamily; Kv5, Kv6, Kv8, and Kv9 as subtypes; and individual proteins, e.g. Kv8.1, as subunits.

      (3) When you discuss whether the KvS subunit directly disrupts RY785 binding in the pore or works allosterically and you said you know which KvS residues point into the pore from models, I thought that maybe you could tell from a sequence alignment whether the KvS channels you didn't test look the same in the conduction pathway as the ones you did test. If so, you could mention that if the binding site is the pore, they should all be resistant. Alternatively, if one you didn't test looks fundamentally more similar to the Kv2s in this region, then maybe it could be fingered as a possible exception that needs to be tested later.

      Great ideas. We now assess KvS sequence variability near the proposed RY785 binding site in all KvS subunits. We generated structural models of RY785 docking to Kv2.1 and Kv2.1/Kv8.1 and found that residues near RY785 are different in all KvS subunits.

      Manuscript revisions:

      “We analyzed computational structural models of RY785 docked to a Kv2.1 homomer and a 3:1 Kv2.1:Kv8.1 heteromer (Fig 9) to gain structural insight into how KvS subunits might interfere with RY785 binding. We used Rosetta to dock RY785 to a cryo-EM structure of a Kv2.1 homomer in an apparently open state (Fernández-Mariño et al., 2023). The top-scoring docking pose has RY785 positioned below the selectivity filter and off-axis of the pore (Fig 9 A), similar to a stable pose observed in molecular dynamics simulations (Zhang et al., 2024). In this pose, RY785 contacts a collection of Kv2.1 residues that vary in every KvS subtype (Fig 9 B,D,E). Notably, RY785 bound similarly to a 3:1 model of Kv2.1/Kv8.1, in contact with the three Kv2.1 subunits, yet avoided the Kv8.1 subunit (Fig 9 C). This is consistent with RY785 binding less well to Kv2.1/Kv8.1 heteromers, and also suggests that a 3:1 Kv2:KvS channel could retain an RY785 binding site when open.”

      (4) Future suggestion or tip - not for this paper. Your data shows your isolation strategy works really well on Kv6 channels, and these are also the Kv2/KvS channels that have the most pronounced biophysical changes. Working on neurons that have a prominent Kv2/Kv6 component would really show how well the strategy outlined here works to describe the physiology of native neurons. The highest KvS expression I have seen in public data in a well-studied cell type is Kv6.4 in spinal motor neurons.

      Wonderful tip, thank you. We are indeed very interested in Kv6.4 in spinal motor neurons.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript makes a good contribution to the identification of Kv2/KvS channels in primary cells. The pharmacological method proposed by the authors to dissect the currents in an experimental setting seems proper. Although meritorious in themselves, the findings are heavily phenomenological in the opinion of this reviewer. The manuscript should be improved with some level of mechanistic data and/or the demonstration of different levels of expression in different cell types.

      Thank you for the suggestions. This manuscript now demonstrates strikingly higher levels of the KvS-like component of Kv2 currents in somatosensory (DRG nonpeptidergic and peptidergic nociceptor) versus autonomic (SCG) neuron types. The mechanistic question of what electrophysiological properties the KvS subunits are providing to the neuronal circuit is an exciting one that we are pursuing separately.

      Manuscript revisions:

      “While we found only RY785-sensitive Kv2-like conductances in SCG neurons, Kv2/KvS heteromer-like conductances were dominant in DRG neurons.”

      At present, the manuscript says that the combination of RY785 and guangxitoxin-1E can be used to define Kv2/KvS-mediated K+ currents. Importantly, this method cannot be used to determine the function of Kv2/KvS channels, since it depends on first blocking Kv2-mediated K+ currents. In the opinion of this reviewer, this limitation reduces the interest of a potential reader.

      Indeed, our study is focused on revealing KvS heteromers by voltage clamp, and we now clarify in the introduction that we do not determine the function of Kv2/KvS channels in this study, so as not to lead the reader to expect studies of neuronal signaling.

      However, the selective pharmacology we identify suggests RY785 application could reveal the function of Kv2 homomers, and for RY785-insensitive signaling, GxTX application could reveal the function of Kv2/KvS heteromers. We now mention these possible applications in the Discussion.

      Manuscript revisions:

      “While this study does not address the impact of GxTX or RY785 on action potentials or in vivo, the distinct pharmacology of Kv2/KvS heteromers presented here suggests that KvS conductances could be targeted to selectively modulate discrete subsets of cell types.”

      Please find below suggestions for improving the manuscript:

      (1) The term "Kv2/KvS heteromers" should be used throughout instead of variations such as "Kv2/KvS channels", "Kv2/KvS" and others. Standardization of the term to refer to heteromers would make the manuscript easier to read.

      Thank you. We have standardized terms to consistently refer to Kv2/KvS heteromers.

      (2) Confusing terms like KvS conductances, KvS-like conductances, KvS-like (RY785-resistant, GxTX-sensitive) currents, and KvS channels should be avoided because they disregard the current belief that KvS cannot form functional homomeric channels. The term KvS-containing channels, and Kv2/KvS channels, seem more accurate. Uniformization in this regard will also make the manuscript more easily readable.

      Thank you. We have standardized terms to Kv2/KvS heteromers and KvS-containing channels when channel subunits are known, and use the terms KvS-like and Kv2-like for functionally identified endogenous conductances with unknown channel subunits.

      (3) Referring to KvS as a regulatory subunit is inaccurate. It is clear that KvS is part of, and it makes up the alpha pore. KvS therefore is a part of the conductive pathway and not a regulatory (suggesting accessory) subunit. KvS take part in selectivity filter (fully conserved), but they also make up an important part of the conducting pathway with non-conserved amino acid residues.

      We felt it important to include the descriptor “regulatory” to connect our nomenclature with prior use of the descriptor in the literature, and now only use the term at the start of the introduction.

      Manuscript revisions:

      “A potential source of molecular diversity for Kv2 channels are a group of Kv2-related proteins which have been referred to as regulatory, silent, or KvS subunits.”

      (4) The use of a cocktail of channel inhibitors may affect the quantification of Kv2/KvS heteromers-mediated K+ currents because they may interact with RY785 and/or GxTx or they may even interact with the sites for these two drugs on Kv2-containing channels.

      This is an interesting point worth considering, thank you. We now alert readers to this possibility in the discussion when considering the limitations of our approach.

      Manuscript revisions:

      “Also, the cocktail of inhibitors used in most neuron experiments here could potentially alter RY785 or GxTX action against KvS/Kv2 channels.”

      (5) The graphical representation of fractional blocking and other parameters (e.g., Fig 1D), is hard to read in these slim plots. In my opinion, tall bars would be more meaningfully visualized.

      Thank you for pointing out that the graphs were hard to read. We have made them easier to read and added tall bars.

      (6) Vehicle control for IHC and electrophysiology. Please state what is the vehicle used in the electrophysiology experiments.

      Thank you. The composition of vehicle has now been stated in the methods.

      Manuscript revisions:

      “All RY785 solutions contained 0.1% DMSO. Vehicle control solutions also contained 0.1% DMSO but lacked RY785.”

      “Sections were incubated in vehicle solution (4% milk, 0.2% triton diluted in PB) for 1 hr at RT.”

      (7) The reference Trapani & Korn, 2003 (?) is not included in the list. This reference is important since it sets what are the Kv2.1-CHO cells. In this regard it is also important to mention, even better to address, the expressing qualities of this system in the face of a co-expression with a plasmid-based expression of silent Kv subunits. Are these two ways of expressing Kv subunits, meant to come together (or not) in heteromers, balanced? This question is critical here. Still, in regard to Kv2.1-CHO cells, it was not clear in the manuscript if the term "transfection" refers only to the plasmids used to temporarily induce the expression of silent Kv subunits and potentially Kv channels accessory subunits.

      We now include the Trapani & Korn, 2003 reference (thank you for pointing out this accidental omission), and better explain expression methods. The benefit of the inducible Kv2.1 expression is control of Kv conductance densities which can otherwise become so large as to be refractory to voltage clamp. The beauty of the expression system is that cells recently transfected with KvS subunits can be induced to express just enough Kv2.1 to get a substantial but not clamp-overwhelming RY785-resistant Kv2/KvS conductance. We also discuss that our expression methods are distinct from past studies. We stop short of comparing the expression systems, as this is beyond the scope of what we set out to study.

      Manuscript revisions: See next response

      (8) Kv2.1-CHO cells transfection procedures, induction, and validation are unclear. This validation is important here.

      We have clarified transfection procedures, induction, and validation in the methods section.

      Manuscript revisions:

      “The CHO-K1 cell line transfected with a tetracycline-inducible rat Kv2.1 construct (Kv2.1-CHO) (Trapani and Korn, 2003) was cultured as described previously (Tilley et al., 2014).”

      “Transfections were achieved with Lipofectamine 3000 (Life Technologies, L3000001). 1 μl Lipofectamine was diluted, mixed, and incubated in 25 μl of Opti-MEM (Gibco, 31985062).”

      “Concurrently, 0.5 μg of KvS or AMIGO1 or Navβ2, 0.5 μg of pEGFP, 2 μl of P3000 reagent and 25 μl of Opti-MEM were mixed. DNA and Lipofectamine 3000 mixtures were mixed and incubated at room temperature for 15 min. This transfection cocktail was added to 1 ml of culture media in a 24 well cell culture dish containing Kv2.1-CHO cells and incubated at 37 °C in 5% CO2 for 6 h before the media was replaced. Immediately after media was replaced, Kv2.1 expression was induced in Kv2.1-CHO cells with 1 μg/ml minocycline (Enzo Life Sciences, ALX380-109-M050), prepared in 70% ethanol at 2 mg/ml. Voltage clamp recordings were performed 12-24 hours later. We note that the expression method of Kv2/KvS heteromers used here is distinct from previous studies which show that the KvS:Kv2 mRNA ratio can affect the expression of functional Kv2/KvS heteromers (Salinas et al., 1997b; Pisupati et al., 2018). We validated the functional Kv2/KvS heteromer expression using voltage clamp to establish distinct channel kinetics and the presence of RY785-resistant conductance in KvS-transfected cells and using immunohistochemistry to label apparent surface localization of KvS subunits (Figure 4, Figure 1 Supplement, Figure 1 and Figure 5).”

      (9) It is important for readers to add some context to Kv2.1/Kv8.1 channels (and other Kv2/KvS heteromers) used to test the combination of RY785 and GxTx. In my opinion, this enriches the discussion.

      Good idea. We have added context about each of the KvS subunits we test.

      Manuscript revisions:

      “To test the pharmacological response of KvS we began with Kv8.1, a subunit that creates heteromers with biophysical properties distinct from Kv2 homomers (Salinas et al., 1997a), and modulates motor neuron vulnerability to cell death (Huang et al., 2024).

      Each of these KvS subunits create Kv2/KvS heteromers that have distinct biophysical properties (Kramer et al., 1998; Kerschensteiner and Stocker, 1999; Bocksteins et al., 2012). Kv5.1/Kv2.1 heteromers play an important role in controlling the excitability of mouse urinary bladder smooth muscle (Malysz and Petkov, 2020), mutations in Kv6.4 have been shown to influence human labor pain (Lee et al., 2020b), and deficiency of Kv9.3 disrupts parvalbumin interneuron physiology in mouse prefrontal cortex (Miyamae et al., 2021).”

      (10) In general, the membrane potential used to activate Kv2 only channels and Kv2/KvS channels is too close to the activation V1/2. In case the comparing curves are displaced in their relative voltage dependence and voltage sensitivity, using that range of membrane potential may introduce a crucial error in the estimation of the conductance's relative amplitudes.

      We now note that the relative conductances of Kv2-only vs Kv2/KvS channels are expected to vary with voltage protocol, as KvS inclusion results in channels with altered voltage responses.

      Manuscript revisions:

      “…the activation membrane potential is close to the half-maximal point of Kv2/KvS conductances. Thus the ratio of Kv2-like to KvS-like conductance is expected to vary with voltage protocols.”
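
      The voltage dependence of this ratio can be illustrated with a minimal sketch, assuming each conductance follows a simple Boltzmann activation curve. The midpoints, slope factors, and function names below are hypothetical values chosen for illustration, not measurements or code from this study.

```python
import math

def boltzmann(v_mv, v_half_mv, slope_mv, g_max=1.0):
    """Steady-state activation of a conductance at test voltage v_mv."""
    return g_max / (1.0 + math.exp(-(v_mv - v_half_mv) / slope_mv))

# Hypothetical activation parameters (illustrative only, not fitted values).
KV2_LIKE = {"v_half_mv": -10.0, "slope_mv": 8.0}
KVS_LIKE = {"v_half_mv": 0.0, "slope_mv": 12.0}

def kv2_to_kvs_ratio(v_mv):
    """Ratio of Kv2-like to KvS-like conductance at a given test voltage."""
    return boltzmann(v_mv, **KV2_LIKE) / boltzmann(v_mv, **KVS_LIKE)

if __name__ == "__main__":
    # Near the half-maximal voltages the ratio changes steeply with voltage;
    # at strongly depolarized voltages both curves saturate and it approaches 1.
    for v in (-20, 0, 20, 60):
        print(f"{v:+4d} mV  Kv2-like/KvS-like = {kv2_to_kvs_ratio(v):.2f}")
```

      With these assumed parameters, a test pulse near either midpoint gives a ratio that depends strongly on the exact voltage chosen, which is the concern raised in comment (10).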

      (11) The use of tail currents to estimate conductance is problematic if i) lack of current inactivation is not assured, and ii) if the different currents, with possible different deactivation kinetics at the used membrane potential (e.g., mV), are not assured. Why was the activation peak used at times, and at different elapsed times the tail currents were used instead? These aspects of conductance's amplitude estimation methods should be well defined.

      In CHO cells peak currents were analyzed because outward currents seem to offer the best signal/noise. In neurons, we restricted analysis to tail currents at elapsed times to minimize complications from non-Kv2 endogenous voltage-gated channels which deactivate more quickly. We have clarified this analysis in the methods section.

      Manuscript revisions:

      “In CHO cells peak currents were analyzed because outward currents seem to offer the best signal/noise. In neurons, we restricted analysis to tail currents at elapsed times to minimize complications from non-Kv2 endogenous voltage-gated channels which deactivate more quickly. In neurons, voltage-gated currents remained in the toxin cocktail + RY785 and GxTX, that were sometimes unstable. To minimize complications from these currents, we restricted analysis of RY785 and GxTX subtraction experiments to tail currents at elapsed times to minimize complications from non-Kv2 endogenous voltage-gated channels which deactivate more quickly. We note that the analysis of conductance activation by using tail currents is only accurate when dealing with non-inactivating conductances. We expect that inactivation of Kv2/KvS conductances during the 200 ms pre-pulse is minimal (Salinas et al., 1997a; Kramer et al., 1998; Kerschensteiner and Stocker, 1999) and did not notice inactivation during the activation pulse. Also, deactivation kinetics can vary in a heterogeneous population of Kv2/KvS heteromers. While analysis of tail currents could skew the quantification of total Kv2-like and KvS-like conductances, our data supports that mouse nociceptors and human neurons have tail currents that are resistant to RY785 and sensitive to GxTX consistent with the presence of Kv2/KvS heteromers.”

      (12) Were the experiments including different conditions such as control, RY, and RY+GxTx done pairwise? This could potentially better the statistics and strengthen the data and the conclusions drawn from them.

      The control, RY, and RY+GxTX in neurons were done pairwise and the statistical tests performed for these experiments were pairwise tests. We have clarified this in the figure legends.

      Manuscript revisions:

      “Wilcoxon rank tests were paired, except the comparison of RY785 to vehicle which was unpaired.”
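
      As an illustration of what a paired comparison uses, the following is a minimal pure-Python sketch of an exact two-sided Wilcoxon signed-rank test for small samples. The function name and the example amplitudes are hypothetical; this is not the authors' analysis code, and a statistics library would normally be used instead. An unpaired comparison, such as RY785 versus vehicle, would need a rank-sum test, which is not shown here.

```python
from itertools import product

def wilcoxon_signed_rank_exact(before, after):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W_plus, p_value). Zero differences are dropped and tied
    absolute differences share their average rank. All sign patterns are
    enumerated, so this is only practical for small n (roughly n <= 15).
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)

    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((False, True), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            extreme += 1
    return w_plus, extreme / 2 ** n

# Invented tail-current amplitudes from the same six cells (arbitrary units).
control = [10.0, 12.0, 9.0, 11.0, 13.0, 8.0]
plus_ry = [6.0, 7.0, 5.0, 8.0, 9.0, 5.0]
w_plus, p_value = wilcoxon_signed_rank_exact(control, plus_ry)
```

      Because every cell serves as its own control, the test ranks within-cell differences, which removes cell-to-cell variation in current amplitude from the comparison.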

      (13) The holding potential of the experiments, mostly -89 mV, may be biasing the estimation of Kv2 only channels vs. Kv2/KvS channels conductances. Figure 4I exemplifies this concern.

      We agree. Figure 4I reveals that a holding potential of -89 mV vs -129 mV reduces conductance of Kv2.1/Kv8.1 heteromers vs Kv2.1 homomers in CHO cells by ~20%. We have now alerted readers that the ratio of Kv2 only channels vs. Kv2/KvS conductances can vary with holding voltage.

      Manuscript revisions:

      “Under these conditions, 58 ± 3 % (mean ± SEM) of the delayed rectifier conductance was resistant to RY785 yet sensitive to GxTX (KvS-like) (Fig 7 F). We note that the ratio of KvS- to Kv2-like conductances is expected to vary with holding potential, as KvS subunits can change the degree and voltage-dependence of steady state inactivation (e.g. Fig 4I).”
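
      One way such a percentage could be computed is sketched below, assuming that the Kv2-like component is the tail current removed by RY785, the KvS-like component is the further current removed by GxTX, and the delayed rectifier total is their sum. That definition, the function names, and the amplitudes are illustrative assumptions, not the authors' analysis code or data.

```python
import math
import statistics

def kvs_like_fraction(i_control, i_ry, i_ry_gxtx):
    """Fraction of the GxTX-sensitive tail current that resists RY785.

    i_control : tail current before drugs
    i_ry      : tail current remaining in RY785
    i_ry_gxtx : tail current remaining in RY785 + GxTX
    """
    kv2_like = i_control - i_ry      # removed by RY785
    kvs_like = i_ry - i_ry_gxtx      # resistant to RY785, removed by GxTX
    return kvs_like / (kv2_like + kvs_like)

def mean_sem(values):
    """Mean and standard error of the mean (needs at least two values)."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

# Invented amplitudes for three cells: (control, +RY785, +RY785 +GxTX).
cells = [(10.0, 6.0, 1.0), (8.0, 5.0, 0.5), (12.0, 7.5, 1.5)]
fractions = [kvs_like_fraction(*cell) for cell in cells]
mean_fraction, sem_fraction = mean_sem(fractions)
```

      Any current that remains in both drugs is excluded from the denominator under this assumed definition, so the fraction describes only the RY785- or GxTX-sensitive conductance.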

      (14) It is possible that Figure 6A (control trace) and Figure 6C ("Kv2-like" trace) are the same, by mistake, since their noise pattern looks too similar.

      Indeed, the noise patterns of the Figure 6A (control trace) and Figure 6C ("Kv2-like" trace) are related, as they have inputs from the same trace, with Figure 6C ("Kv2-like" trace) being a subtraction of Figure 6A (+RY trace) from Figure 6A (control trace).
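
      Why a subtraction trace inherits the noise of its inputs can be shown with a small simulation. This is a sketch with made-up traces: the time course, the noise level, and the assumed 60% block are all hypothetical.

```python
import math
import random

def subtract(trace_a, trace_b):
    """Point-by-point difference of two equal-length current traces."""
    if len(trace_a) != len(trace_b):
        raise ValueError("traces must have the same number of samples")
    return [a - b for a, b in zip(trace_a, trace_b)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 2000
signal = [1.0 - math.exp(-t / 200.0) for t in range(n)]  # made-up time course
noise_control = [rng.gauss(0.0, 0.05) for _ in range(n)]
noise_ry = [rng.gauss(0.0, 0.05) for _ in range(n)]

control = [s + e for s, e in zip(signal, noise_control)]
plus_ry = [0.4 * s + e for s, e in zip(signal, noise_ry)]  # assumed 60% block

kv2_like = subtract(control, plus_ry)
# Remove the noiseless component to isolate the noise in the subtraction trace.
kv2_like_noise = [k - 0.6 * s for k, s in zip(kv2_like, signal)]
shared = pearson(noise_control, kv2_like_noise)
```

      For independent noise of equal variance in the two input traces, the correlation between the control noise and the subtraction noise is expected to be near 1/√2 ≈ 0.71, so visibly similar noise in the two panels does not by itself indicate a duplicated trace.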

      (15) For example, in Figure 7A, what is the identity of the current remaining after the RY+GxTx application? In Figure 7B, a supposed outlier in the group of data referring to "veh" in the right panel is what possibly is making this group different from +RY in the left panel (p=0.02, Wilcoxon rank test). I would recommend parametric tests only since the data is essentially quantitative.

      In Figure 7A, we do not know the identity of the current remaining after the RY+GxTX application. The kinetics of the residual current appeared distinct from the Kv2/KvS-like currents blocked by RY or GxTX, but we did not analyze these.

      The data shown in Figure 7B were indeed from the positive outlier in the group of data referring to "veh" in the right panel and contribute to the p-value, but we saw no reason to exclude them. We have now replaced the representative trace in 7B with a non-outlier trace. We respectfully disagree with the suggestion to use parametric statistical tests as we do not know the distribution underlying the variance of our data.

      Manuscript revisions:

      “Subsequent application of 100 nM GxTX decreased tail currents by 68 ± 5% (mean ± SEM) of their original amplitude before RY785. We do not know the identity of the outward current that remains in the cocktail of inhibitors + RY785 + GxTX.”

      (16) Please state the importance of using nonpeptidergic neurons to study silent Kv5.1 and Kv9.1 subunits. RNA data may not necessarily work to probe function or protein abundance, which is crucial in heteromeric complexes.

      We have now more thoroughly explained our rationale for choosing the nonpeptidergic neurons.

      RNA is not predictive of protein abundance, and we have not yet been successful in measuring KvS protein abundance in these neurons, so we've probed KvS abundance by assessing RY785 resistance.

      Manuscript revisions:

      “Mouse dorsal root ganglion (DRG) somatosensory neurons express Kv2 proteins (Stewart et al., 2024), have GxTX-sensitive conductances (Zheng et al., 2019), and express a variety of KvS transcripts (Bocksteins et al., 2009; Zheng et al., 2019), yet transcript abundance does not necessarily correlate with functional protein abundance. To record from a consistent subpopulation of mouse somatosensory neurons which has been shown to contain GxTX-sensitive currents and have abundant expression of KvS mRNA transcripts (Zheng et al., 2019), we used a Mrgprd<sup>GFP</sup> transgenic mouse line which expresses GFP in nonpeptidergic nociceptors (Zylka et al., 2005; Zheng et al., 2019). Deep sequencing identified that mRNA transcripts for Kv5.1, Kv6.2, Kv6.3, and Kv9.1 are present in GFP+ neurons of this mouse line (Zheng et al., 2019) and we confirmed the presence of Kv5.1 and Kv9.1 transcripts in GFP+ neurons from Mrgprd<sup>GFP</sup> mice using RNAscope (Fig 7 Supplement 1).”

      (17) In Figure 8B, were +RY data different from veh data? The figure shows no Wilcoxon (nonparametric) comparison and this is important to be stated. What conductance(s) is the vehicle solution blocking or promoting? What is RY dissolved in, DMSO? What is the DMSO final concentration?

      We now state that in Figure 8B, +RY amplitudes were not statistically different from veh data in this limited data set. However, the RY-subtraction currents always had Kv2-like biophysical properties, whereas vehicle-subtraction currents had variable properties precluding biophysical analysis for Fig 8D.

      In Figure 8B, we do not know what conductance(s) the vehicle solution is affecting; we think the changes observed are likely merely time dependent or due to the solution exchange itself. RY stock is in DMSO. All recording solutions have a 0.1% DMSO final concentration; this is now noted in the methods.

      Manuscript revisions:

      “Unlike mouse neurons, we did not detect a significant difference in tail currents of RY785 versus vehicle controls. However, RY785-subtracted currents always had Kv2-like biophysical properties whereas vehicle-subtraction currents had variable properties that precluded the same biophysical analysis. Overall, these results show that human DRG neurons can produce endogenous voltage-gated currents with pharmacology and gating consistent with Kv2/KvS heteromeric channels.”

      “All RY785 solutions contained 0.1% DMSO. Vehicle control solutions also contained 0.1% DMSO but lacked RY785.”

      (18) METHODS. The electrophysiology approach should be unified in all aspects as applicable and possible.

      We have unified the mouse dorsal root ganglion and mouse superior cervical ganglion methods sections. We have kept the CHO cell and mouse/human neuron sections separate because the methods were substantially different.

      (19) DISCUSSION. The discussion section spends half of its space trying to elaborate on possible blocking/inhibiting/modulating mechanisms for RY785. The present manuscript shows no data, at least not that I have noticed, that would evoke such discussion.

      We have shortened this section and enhanced the discussion with structural models (new Fig 9) and our functional data indicating perturbed RY785 interaction with Kv2.1/8.1.

      Manuscript revisions:

      “In this pose, RY785 contacts a collection of Kv2.1 residues that vary in every KvS subtype (Fig 9 B,D,E). Notably, RY785 bound similarly to a 3:1 model of Kv2.1/Kv8.1, in contact with the three Kv2.1 subunits, yet avoided the Kv8.1 subunit (Fig 9C). This is consistent with RY785 binding less well to Kv2.1/Kv8.1 heteromers, and also suggests that a 3:1 Kv2:KvS channel could retain a RY785 binding site when open. However, the RY785 resistance of Kv2/KvS heteromers may primarily arise from perturbed interactions with the constricted central cavity of closed channels. In homomeric Kv2.1, RY785 becomes trapped in closed channels and prevents their voltage sensors from fully activating, indicating that RY785 must interact differently with closed channels (Marquis and Sack, 2022). Here we found that Kv2.1/Kv8.1 current rapidly recovers following washout of RY785, suggesting that Kv2.1/Kv8.1 heteromers do not readily trap RY785 (Figure 2 Supplement). Overall, the structural modeling suggests that KvS subunits sterically interfere with RY785 binding to the central cavity, while functional data suggest KvS subunits disrupt RY785 trapping in closed states.”

      (20) DISCUSSION. Topics like ER retention and release upon certain conditions would be a better enrichment for the manuscript in my opinion.

      ER retention of KvS subunits is indeed an important topic! However, we have opted not to delve into it here.

      (21) DISCUSSION. Speculation about the binding site for RY on Kv2/KvS channels is also not touched by the data shown in the manuscript.

      We have shortened this section of discussion, and now present this with structural models of RY785 docked to a Kv2.1 homomer and 3:1 Kv2.1:Kv8.1 heteromer (new Fig 9) to ground speculations. See manuscript changes noted in response to comment (19) above.

      (22) DISCUSSION. An important reference is missing in regard to stoichiometry: Bocksteins et al., 2017. This work is the only one using a non-optical technique to add knowledge to that question.

      Good point, and an excellent study we didn’t realize we’d not included before. We now include Bocksteins et al. 2017 as a reference in the Introduction.

      (23) In my opinion, allosterism and orthosterism are concepts not yet useful for the discussion of RY binding sites without even a general piece of data.

      We now include structural models of RY785 docked to a Kv2.1 homomer and 3:1 Kv2.1:Kv8.1 heteromer (new Fig 9) to ground blocking speculations. See manuscript changes noted in response to comment (19).

      (24) The term "homogeneously susceptible" associated with a Hill slope close to 1 needs to be more elaborated.

      Thank you, we have elaborated.

      Manuscript revisions:

      “Also, the degree of resistance to RY785 may vary if Kv2:KvS subunit stoichiometry varies. With high doses of RY785, we found that the concentration-response characteristics of Kv2.1/Kv8.1 in CHO cells revealed hallmarks of a homogeneous channel population with a Hill slope close to 1 (Fig 2B). However, other KvS subunits might assemble in multiple stoichiometries and result in pharmacologically-distinct heteromer populations.”
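
      The link between a Hill slope near 1 and a homogeneous population can be sketched numerically. The example below assumes one-to-one inhibitor binding for each channel type and compares a single population with an equal mix of two populations whose IC50 values differ 1000-fold. All IC50 values, concentrations (arbitrary units), and function names are hypothetical.

```python
import math

def hill_fraction_remaining(conc, ic50, hill=1.0):
    """Fraction of current remaining at inhibitor concentration conc."""
    return 1.0 / (1.0 + (conc / ic50) ** hill)

def apparent_hill_slope(remaining_fn, c_low, c_high):
    """Slope of log10(blocked/remaining) versus log10(concentration)."""
    def logit(c):
        f = remaining_fn(c)
        return math.log10((1.0 - f) / f)
    return (logit(c_high) - logit(c_low)) / (math.log10(c_high) - math.log10(c_low))

def homogeneous(conc):
    """One channel population with a single hypothetical IC50."""
    return hill_fraction_remaining(conc, ic50=1.0)

def mixed(conc):
    """Equal mix of two hypothetical populations with different IC50s."""
    return (0.5 * hill_fraction_remaining(conc, ic50=0.01)
            + 0.5 * hill_fraction_remaining(conc, ic50=10.0))

slope_homogeneous = apparent_hill_slope(homogeneous, 0.1, 10.0)
slope_mixed = apparent_hill_slope(mixed, 0.1, 1.0)
```

      In this sketch the single population yields an apparent slope of 1, while the mixture yields a much shallower slope between its two IC50 values. A slope close to 1 is therefore consistent with, though not proof of, a pharmacologically uniform population.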

      (25) Stating the KvS are resistant to RY785 is not proper in my opinion. This opinion relates to the fact that the RY binding site in the channels is certainly not restricted to a binding site residing only on the Kv subunit.

      Good point. We have now changed phrasing to convey that KvS subunits are a component of a heteromer that imbues RY785 resistance.

      Manuscript revisions:

      “These results show that voltage-gated outward currents in cells transfected with members from each KvS subtype have decreased sensitivity to RY785 but remain sensitive to GxTX. While we did not test every KvS subunit, the ubiquitous resistance suggests that all KvS subunits may provide resistance to 1 μM RY785 yet remain sensitive to GxTX, and that RY785 resistance is a hallmark of KvS-containing channels.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate the role of the melanocortin system in puberty onset. They conclude that POMC neurons within the arcuate nucleus of the hypothalamus provide important but differing input to kisspeptin neurons in the arcuate or rostral hypothalamus.

      Strengths:

      Innovative and novel

      Technically sound

      Well-designed

      Thorough

      Weaknesses:

      There were no major weaknesses identified.

      Reviewer #2 (Public review):

      Summary:

      This interesting manuscript describes a study investigating the role of MC4R signalling on kisspeptin neurons. The initial question is a good one. Infertility associated with MC4 mutations in humans has typically been ascribed to the consequent obesity and impaired metabolic regulation. Whether there is a direct role for MC4 in regulating the HPG axis has not been thoroughly examined. Here, the researchers have assembled an elegant combination of targeted loss of function and gain of function in vivo experiments, specifically targeting MC4 expression in kisspeptin neurons. This excellent experimental design should provide compelling evidence for whether melanocortin signalling directly affects arcuate kisspeptin neurons to support normal reproductive function. There were definite effects on reproductive function (irregular estrous cycle, reduced magnitude of LH surge induced by exogenous estradiol). However, the magnitude of these responses and the overall effect on fertility were relatively minor. The mice lacking MC4R in kisspeptin neurons remained fertile despite these irregularities. The second part of the manuscript describes a series of electrophysiological studies evaluating the pharmacological effects of melanocortin signalling in kisspeptin cells in ex vivo brain slices. These studies characterised interesting differential actions of melanocortins in two different populations of kisspeptin neurons. Collectively, the study provides some novel insights into how direct actions of melanocortin signalling via the MC4 receptor in kisspeptin neurons contribute to the metabolic regulation of the reproductive system. Importantly, however, it is clear that other mechanisms are also at play.

      Strengths:

      The loss of function/gain of function experiments provides a conceptually simple but hugely informative experimental design. This is the key strength of the current paper - especially the knock-in study that showed improved reproductive function even in the presence of ongoing obesity. This is a very convincing result that documents that reproductive deficits in MC4R knockout animals (and humans with deleterious MC4R gene variants) can be ascribed to impaired signalling in the hypothalamic kisspeptin neurons and not necessarily caused as a consequence of obesity. As concluded by the authors: "reproductive impairments observed in MC4R deficient mice, which replicate many of the conditions described in humans, are largely mediated by the direct action of melanocortins via MC4R on Kiss1 neurons and not to their obese phenotype." This is important, as it might change how such fertility problems are treated.

      I would like to see the validation experiments for the genetic manipulation studies given greater prominence in the manuscript because they are critical to interpretation. Presently, only single unquantified images are shown, and a much more comprehensive analysis should be provided.

      Weaknesses:

      (1) Given that mice lacking MC4R in kisspeptin neurons remained fertile despite some reproductive irregularities, this can be described as a contributing pathway, but other mechanisms must also be involved in conveying metabolic information to the reproductive system. This is now appropriately covered in the discussion.

      (2) The mechanistic studies evaluating melanocortin signalling in kisspeptin neurons were all completed in ovariectomised animals (with and without exogenous hormones) that do not experience cyclical hormone changes. Such cyclical changes are fundamental to how these neurons function in vivo and may dynamically alter how they respond to hormones and neuropeptides. Eliminating this variable makes interpretation difficult, but the authors have justified this as a reductionist approach to evaluate estradiol actions specifically. However, this does not reflect the actual complexity of reproductive function.

      For example, the authors focus on a reduced LH response to exogenous estradiol in ovariectomised mice as evidence that there might be a sub-optimal preovulatory LH surge. However, the preovulatory LH surge (in intact animals) was not measured.

      They have not assessed why some follicles ovulated, but most did not. They have focused on the possibility that the ovulation signal (LH surge) was insufficient rather than asking why some follicles responded and others did not. This suggests some issue with follicular development, likely due to changes in gonadotropin secretion during the cycle and not simply due to an insufficient LH surge.

      Reviewer #3 (Public review):

      The manuscript by Talbi R et al. generated transgenic mice to assess the reproductive function of MC4R in Kiss1 neurons in vivo and used electrophysiology to test how MC4R activation regulated Kiss1 neuronal firing in ARH and AVPV/PeN. This timely study is highly significant in neuroendocrinology research for the following reasons.

      (1) The authors' findings are significant in the field of reproductive research. Despite the known presence of MC4R signaling in Kiss1 neurons, the exact mechanisms of how MC4R signaling regulates different Kiss1 neuronal populations in the context of sex hormone fluctuations are not entirely understood. The authors reported that knocking out Mc4r from Kiss1 neurons replicates the reproductive impairment of MC4RKO mice, and Mc4r expression in Kiss1 neurons in the MC4R null background partially restored the reproductive impairment. MC4R activation excites Kiss1 ARH neurons and inhibits Kiss1 AVPV/PeN neurons (except for elevated estradiol).

      (2) Reproductive dysfunction is one of the comorbidities of obesity. MC4R loss-of-function mutations cause an obesity phenotype and impaired reproduction. However, it is hard to determine the causality. The authors carefully measured the body weight of the different mouse models (Figure 1C, Figure 2A, Figure 3B). For example, the Kiss1-MC4RKO females showed no body weight difference at puberty onset. This clearly demonstrated a direct function of MC4R signaling in reproduction that was not a consequence of excessive adiposity.

      (3) Gene expression findings in the "KNDy" system align with the reproduction phenotype.

      (4) The electrophysiology results reported in this manuscript are innovative and provide more details of MC4R activation and Kiss1 neuronal activation.

      Overall, the authors have presented sufficient background in a clear, logical, and organized structure, clearly stated the key question to be addressed, used the appropriate methodology, produced significant and innovative main findings, and made a justified conclusion.

      Comments on revisions:

      The authors have addressed my comments.

      Recommendations for the authors:

      The reviewers noted that they received comments in response to their concerns, and some improvements have been made to the manuscript. However, as described below, in some cases, a rebuttal was provided, but changes were not made to the manuscript. It is suggested that these issues be addressed to improve the quality of the manuscript.

      We thank the reviewers and editor for the assessment of the manuscript and recommendations for its improvement. We have addressed the remaining comments from reviewer #2 below, and hope that they find our revisions satisfactory.

      Reviewer #2 (Recommendations for the authors):

      The manuscript convincingly shows that MC4R in kisspeptin-producing cells can influence reproductive function. This suggests that fertility problems associated with melanocortin mutations are likely due to direct effects on the reproductive systems rather than simply being side effects of the resultant obesity.

      We are pleased that this reviewer finds the data convincing and thank them for the careful review of the manuscript, which has helped to improve its published version.

      The authors have responded to the reviewer's comments and made several improvements to the manuscript.

      The authors are correct in pointing out that the POMC-Cre animals should be fine for studies involving the administration of AAVs to adult animals. I have misinterpreted how these mice were being used, and this concern is fully addressed.

      Unfortunately, in some cases, the authors rebutted the reviewer's comments but did not change the manuscript. I suggest addressing several issues in the manuscript (after all, it is not the reviewer's opinion that counts; this process is about improving the manuscript).

      (1) Validation of the KO is insufficiently reported. From the methods, it appears that this was done thoroughly, but currently, only a single image of the arcuate nucleus is shown, and no image of the AVPV is shown. There is no quantitative information provided. The authors can keep these data as supplementary material, but they should be comprehensive and convincing, as so much depends on the degree of knockout in this model. One cannot assume complete KO based simply on the relevant genetics, as there are examples in this system where different Cre lines produce different outcomes with various floxed genes in the two major populations of kisspeptin neurons. This figure should show the quantitation of the RNAscope analysis from each of the two regions regarding the percentage of kisspeptin cells showing expression of MC4R mRNA. In addition, the lack of MC4 labelling in the arcuate nucleus, outside of kisspeptin neurons, is a concern. One would expect to see AgRP or POMC cells at this level, but are they still showing expression of MC4? A single image is insufficient to be convinced of the model's efficacy.

      We appreciate the reviewer’s concerns regarding the validation of the MC4RKO model. Below, we provide clarification and additional justification for our approach.

      (1) Quantification of MC4R in the Arcuate Nucleus (ARC): As noted by the reviewer, we were unable to detect sufficient MC4R signal in the ARC of KO mice to perform meaningful quantification. This is consistent with the expected outcome of a successful MC4R deletion. Given the low endogenous expression levels of MC4R in this region, even in control animals, and the technical limitations of RNAscope in detecting very low-abundance transcripts, especially for receptors, the absence of MC4R signal in the ARC of KO mice strongly supports effective deletion. Moreover, the MC4R loxP mouse has been published and validated by many labs, including Brad Lowell’s lab, which has done extensive work using these mice for selective deletion of Mc4r from various neuronal populations such as Sim1 and Vglut2 neurons (Shah et al., 2014, de Souza Cordeiro et al., 2020). To further strengthen our validation, we provide additional images from another animal (Fig_S1) to illustrate the consistency of the MC4R KO in the ARC. These will be included as supplementary material, as suggested.

      Regarding AgRP and POMC neurons, MC4R is not highly expressed in these neurons (as per previous literature, e.g., Garfield et al., Nat Neurosci. 2015; Padilla SL et al, Endocrinology 2012; Henry et al, Nature, 2015). Instead, MC4R is predominantly found in downstream neurons in the paraventricular nucleus (PVN) and other hypothalamic regions (where expression is intact in our KO mice, as shown in our validation figure). Thus, the absence of MC4R labeling in AgRP or POMC cells in our images aligns with known expression patterns and does not contradict the validity of our model.

      (2) MC4R Expression in the AVPV and OVX Effect on Kiss1 Expression: We acknowledge the reviewer’s request for MC4R expression analysis in the anteroventral periventricular nucleus (AVPV). However, due to the timing of tissue collection after ovariectomy (OVX), Kiss1 expression in the AVPV is significantly suppressed, making it technically unfeasible to perform co-staining of MC4R with Kiss1 in this region. This is a well-documented effect of estrogen depletion following OVX (Smith et al., 2005; Lehman et al., 2010). While we acknowledge that an ideal validation would include AVPV co-labeling, the experimental constraints related to OVX preclude this analysis in our dataset.

      Given these considerations and validations, we are confident that the KO is effective and specific.

      (2) Line 88: "... however, conflicting reports exist". Expand on this sentence to describe what these conflicting reports show. The authors responded to my comment but made no changes to the introduction. As a reader, I dislike being told there are conflicting reports, but then I have to go and look up the reference to see what that actual point of conflict is.

      By conflicting reports, we meant that other studies have shown no association between MC4R and reproductive disorders. This has now been included in the revised manuscript (Line 89).

      (3) Could the authors explain how a decrease in AgRP would be interpreted as a "decrease in hypothalamic melanocortin tone" in line 142 and line 364? These overly simplistic interpretations of qPCR data detract from the overall quality of the paper.

      The reference to a decrease in melanocortin tone referred to the decrease in the expression of melanocortin receptor signaling. This has been clarified in the revised manuscript (lines 142 and 360).

      (4) Please show the individual cycle patterns for all animals, as in Figure 2B. This can be a supplemental figure, but the current bar charts are not informative.

      We respectfully disagree that the bar charts are not informative, as they include the critical statistical analysis. We have now included all individual estrous cycle data in a new, separate supplemental figure (Sup. Figure 3). We have therefore removed the representative cycles from the main figures, as they now appear in the new supplemental figure, and have changed the order of the figures in the text accordingly.

      (5) In their rebuttal, the authors state: "Mice lack true follicular and luteal phases, and therefore, it is impossible to separate estrogen-mediated changes from progesterone-mediated changes (e.g., in a proestrous female). Therefore, we use an ovariectomized female model in which we can generate an LH surge with an E2-replacement regimen [1]. This model enables us to focus on estrogen effects, exclude progesterone effects, and minimize variability. Inclusion of cycling females would make interpretation much more difficult." I disagree, but the authors can take this position if they wish. However, they should not report the responses to exogenous estradiol in an ovariectomised mouse as a "preovulatory LH surge" (line 380). An ovariectomised mouse cannot ovulate, and the estrogen-induced LH surge is significantly different in magnitude and timing from the endogenous preovulatory LH surge (likely due to the actions of progesterone). One goal of these studies is to understand why the ovulation rate appears to be low in the MC4-KO animals. Hence, evaluating whether the preovulatory LH surge is typical is important. This has not been done. The authors have shown that the response to exogenous estradiol is sub-normal. Such an effect might lead to a reduced preovulatory LH surge, but this has not been measured.

      We appreciate this reviewer’s concern about the nature of the preovulatory LH surge. We have clarified this in the revised manuscript and described it as “an induced LH surge” throughout the text (Lines 163, 533, 6560).

      (6) I believe that the ovulation process should be considered "all or none," and I do not quite understand the rebuttal discussion. The authors describe that "numerous follicles mature at the same time....". That is not disputed. My point was that each mature follicle will receive the identical endocrine ovulatory signal (correct? Or do the authors believe something different?). If it were sufficient for one follicle to ovulate, then all of those mature follicles (the number of which will be variable between animals and between cycles) would be expected to undergo ovulation. The fact that they do not raises several possibilities. One that the authors favor is that an insufficient ovulatory signal might approach a threshold where some follicles ovulate and others do not. This possibility is supported by the apparent increase in cystic follicles, which might be preovulatory follicles that did not complete the ovulation process. Such variation might be stochastic, within normal variation for sensitivity to LH. However, it is also possible that the follicles have not matured at the same rate, perhaps influenced by abnormal secretion of LH or FSH during earlier phases of the cycle, and hence are not in the appropriate condition to respond to the ovulation signal when it arrives. Some may even have matured prematurely due to the elevated gonadotropins reported in this study. Given the data and the partial fertility, the most likely explanation is that the genetic manipulation has resulted in fewer follicles being available for ovulation due to changes in follicular development rather than a deficit of the ovulation signal, although the latter mechanism might also contribute. A third possibility is that genetic manipulation has directly affected the ovary. The authors did not answer whether Kiss1 and MC4 are co-expressed in the ovary. I think the authors might want to rule this out by showing no change in MC4R expression in the ovary.

      We thank the reviewer for this thoughtful comment and agree that these are possible outcomes. We have now acknowledged them in the Discussion.

      To answer the reviewer’s question, we have not investigated the co-expression of Kiss1 and Mc4r in the ovary. While MC4R has indeed been documented in the ovary (Chen et al. Reproduction, 2017), the changes in gonadotropin release and supporting in vitro data included in this manuscript clearly document a central effect. However, an additional effect at the level of the ovary cannot be completely ruled out. This has now been added to the Discussion (Lines 378-387).

      (7) Lines 390, 454 " impaired LH pulse" What was the evidence for impaired LH pulse (see figure 2D)?

      Thank you for pointing this out. This comment referred to augmented LH release. This has been corrected in the revised manuscript (Line 394).

      The paper's strengths remain, as outlined in my original review. The authors have addressed what I perceived to be weaknesses, predominantly by changing the tone of discussion and interpretation of the data. This is appropriate. I consider the focus on the LH surge as the primary mechanism too narrow, and the authors should be considering how other changes during the cycle might influence ovarian function.

      We sincerely appreciate the reviewer’s thoughtful evaluation of our manuscript and their constructive feedback. We are pleased that our revisions have addressed the perceived weaknesses and that the adjustments to the discussion and interpretation were deemed appropriate.

      We acknowledge the reviewer’s perspective on broadening the discussion beyond the LH surge to consider additional cycle-dependent influences on ovarian function. While our current study focuses on this specific mechanism, we recognize that ovarian function is influenced by multiple physiological changes throughout the cycle. We have refined our discussion to reflect this broader context and appreciate the suggestion to consider these additional factors in future studies.

      We have addressed all of the reviewer’s comments to the best of our ability and hope they find the revised manuscript satisfactory.

    1. Author response:

      The following is the authors’ response to the original reviews.

      ANALYTICAL

      (1) A key claim made here is that the same relationship (including the same parameter) describes data from pigeons by Gibbon and Balsam (1981; Figure 1) and the rats in this study (Figure 3). The evidence for this claim, as presented here, is not as strong as it could be. This is because the measure used for identifying trials to criterion in Figure 1 appears to differ from any of the criteria used in Figure 3, and the exact measure used for identifying trials to criterion influences the interpretation of Figure 3***. To make the claim that the quantitative relationship is one and the same in the Gibbon-Balsam and present datasets, one would need to use the same measure of learning on both datasets and show that the resultant plots are statistically indistinguishable, rather than simply plotting the dots from both data sets and spotlighting their visual similarity. In terms of their visual characteristics, it is worth noting that the plots are in log-log axis and, as such, slight visual changes can mean a big difference in actual numbers. For instance, between Figure 3B and 3C, the highest information group moves up only "slightly" on the y-axis but the difference is a factor of 5 in the real numbers. Thus, in order to support the strong claim that the quantitative relationships obtained in the Gibbon-Balsam and present datasets are identical, a more rigorous approach is needed for the comparisons.

      ***The measure of acquisition in Figure 3A is based on a previously established metric, whereas the measure in Figure 3B employs the relatively novel nDKL measure that is argued to be a better and theoretically based metric. Surprisingly, when r and r2 values are converted to the same metric across analyses, it appears that this new metric (Figure 3B) does well but not as well as the approach in Figure 3A. This raises questions about why a theoretically derived measure might not be performing as well on this analysis, and whether the more effective measure is either more reliable or tapping into some aspect of the processes that underlie acquisition that is not accounted for by the nDKL metric.

      Figure 3 shows that the relationship between learning rate and informativeness for our rats was very similar to that shown with pigeons by Gibbon and Balsam (1981). We have used multiple criteria to establish the number of trials to learn in our data, with the goal of demonstrating that the correspondence between the data sets was robust. In the revised Figure 3, specifically 3C and 3D, we have plotted trials to acquisition using decision criteria equivalent to those used by Gibbon and Balsam. The criterion they used (at least one peck at the response key on at least 3 out of 4 consecutive trials) cannot be directly applied to our magazine entry data because rats make magazine entries during the inter-trial interval (whereas pigeons do not peck at the response key in the inter-trial interval). Therefore, evidence for conditioning in our paradigm must involve comparison between the response rate during the CS and the baseline response rate, rather than just counting responses during the CS. We have used two approaches to adapt the Gibbon and Balsam criterion to our data. One approach, plotted in Figure 3C, uses a non-parametric signed rank test for evidence that the CS response rate exceeds the pre-CS response rate, and adopts a statistical criterion equivalent to Gibbon and Balsam’s 3-out-of-4 consecutive trials (p<.3125). The second method (Figure 3D) estimates the nDkl for the criterion used by Gibbon and Balsam and then applies this criterion to the nDkl for our data. To estimate the nDkl of Gibbon and Balsam’s data, we have assumed there are no responses in the inter-trial interval and the response probability during the CS must be at least 0.75 (their criterion of at least 3 responses out of 4 trials). The nDkl for this difference is 2.2 (odds ratio 27:1). We have then applied this criterion to the nDkl obtained from our data to identify when the distribution of CS response rates has diverged by an equivalent amount from the distribution of pre-CS response rates. These two analyses have been added to the manuscript to replace those previously shown in Figures 3B and 3C.
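
      One possible reading of the p<.3125 threshold, offered here as an illustrative assumption and not as the authors' stated derivation, is that it equals the probability of observing at least 3 "successes" in 4 trials when each trial is a fair coin flip. The minimal sketch below computes that binomial tail; the function name `binomial_tail` is hypothetical.

```python
from math import comb


def binomial_tail(k_min: int, n: int, p: float = 0.5) -> float:
    """Return P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


# Chance probability of 3 or 4 "successes" in 4 trials (assumed p = 0.5 per trial).
alpha = binomial_tail(3, 4)
print(alpha)  # 0.3125, i.e. 5/16
```

      Under this reading, a signed rank test that rejects at p < .3125 is as lenient as the 3-out-of-4 rule would be under chance responding.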

      (2) Another interesting claim here is that the rates of responding during ITI and the cue are proportional to the corresponding reward rates with the same proportionality constant. This too requires more quantification and conceptual explanation. For quantification, it would be more convincing to calculate the regression slope for the ITI data and the cue data separately and then show that the corresponding slopes are not statistically distinguishable from each other. Conceptually, it is not clear why the data used to test the ITI proportionality came from the last 5 conditioning sessions. What were the decision criteria used to decide on averaging the final 5 sessions as terminal responses for the analyses in Figure 5? Was this based on consistency with previous work, or based on the greatest number of sessions where stable data for all animals could be extracted?

      If the model is that animals produce response rates during the ITI (a period with no possible rewards) based on the overall rate of rewards in the context, wouldn't it be better to test this before the cue learning has occurred? Before cue learning, the animals would presumably only have attributed rewards in the context to the context and thus, produce overall response rates in proportion to the contextual reward rate. After cue learning, the animals could technically know that the rate of rewards during ITI is zero. Why wouldn't it be better to test the plotted relationship for ITI before cue learning has occurred? Further, based on Figure 1, it seems that the overall ITI response rate reduces considerably with cue learning. What is the expected ITI response rate prior to learning based on the authors' conceptual model? Why does this rate differ from pre and post-cue learning? Finally, if the authors' conceptual framework predicts that ITI response rate after cue learning should be proportional to contextual reward rate, why should the cue response rate be proportional to the cue reward rate instead of the cue reward rate plus the contextual reward rate?

      A single regression line, as shown in Figure 5, is the simplest possible model of the relationship between response rate and reinforcement rate and it explains approximately 80% of the variance in response rate. Fixing the log-log slope at 1 yields the maximally simple model. (This regression is done in the logarithmic domain to satisfy the homoscedasticity assumption.) When transformed into the linear domain, this model assumes a truly scalar relation (linear, intercept at the origin) and assumes the same scale factor and the same scalar variability in response rates for both sets of data (ITI and CS). Our plot supports such a model. Its simplicity is its own motivation (Occam’s razor).
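
      The one-parameter model described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the data values are invented, and the helper name `fit_scalar_model` is hypothetical. With the log-log slope fixed at 1, the only free parameter is the intercept, whose least-squares estimate is the mean of log(response rate) minus log(reinforcement rate).

```python
import math


def fit_scalar_model(reinforcement_rates, response_rates):
    """Fit log(response) = log(k) + 1 * log(reinforcement).

    The slope is fixed at 1, so the scale factor k is the only free parameter.
    Returns (k, r_squared), with r_squared computed in the log domain.
    """
    log_x = [math.log(x) for x in reinforcement_rates]
    log_y = [math.log(y) for y in response_rates]
    n = len(log_x)
    # Least-squares intercept when the slope is constrained to 1.
    log_k = sum(ly - lx for lx, ly in zip(log_x, log_y)) / n
    mean_y = sum(log_y) / n
    ss_res = sum((ly - (log_k + lx)) ** 2 for lx, ly in zip(log_x, log_y))
    ss_tot = sum((ly - mean_y) ** 2 for ly in log_y)
    return math.exp(log_k), 1.0 - ss_res / ss_tot


# Hypothetical data: response rate exactly 3 times the reinforcement rate.
rates_in = [0.01, 0.02, 0.05, 0.1, 0.2]
rates_out = [3.0 * r for r in rates_in]
k, r_squared = fit_scalar_model(rates_in, rates_out)
```

      In the linear domain this corresponds to response rate = k × reinforcement rate, a line through the origin.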

      If separate regression lines are fitted to the CS and ITI data, there is a small increase in explained variance (R² = 0.82). These regression lines have been added to the plot in the revised manuscript (Figure 5). We leave it to further research to determine whether such a complex model, with 4 parameters, is required. However, we do not think the present data warrant comparing the simplest possible model, with one parameter, to any more complex model for the following reasons:

      · When a brain—or any other machine—maps an observed (input) rate to a rate it produces (output rate), there is always an implicit scalar. In the special case where the produced rate equals the observed rate, the implicit scalar has value 1. Thus, there cannot be a simpler model than the one we propose, which is, in and of itself, interesting.

      · The present case is an intuitively accessible example of why the MDL (Minimum Description Length) approach to model complexity (Barron, Rissanen, & Yu, 1998; Grünwald, Myung, & Pitt, 2005; Rissanen, 1999) can yield a very different conclusion from the conclusion reached using the Bayesian Information Criterion (BIC) approach. The MDL approach measures the complexity of a model when given N data specified with precision of B bits per datum by computing (or approximating) the sum of the maximum likelihoods of the model’s fits to all possible sets of N data with B precision per datum. The greater the sum over the maximum likelihoods, the more complex the model, that is, the greater its measured wiggle room, its capacity to fit data. Recall that von Neumann remarked to Fermi that with 4 parameters he could fit an elephant. His deeper point was that multi-parameter models bring neither insight nor predictive power; they explain only post-hoc, after one has adjusted their parameters in the light of the data. For realistic data sets like ours, the sums of maximum likelihoods are finite but astronomical. However, just as the Stirling approximation allows one to work with astronomical factorials, it has proved possible to develop readily computable approximations to these sums, which can be used to take model complexity into account when comparing models. Proponents of the MDL approach point out that the BIC is inadequate because models with the same number of parameters can have very different amounts of wiggle room. A standard illustration of this point is the contrast between a logarithmic model and a power-function model. Log regressions must be concave, whereas power function regressions can be concave, linear, or convex, yet they have the same number of parameters (one or two, depending on whether one counts the scale parameter that is always implicit). The MDL approach captures this difference in complexity because it measures wiggle room; the BIC approach does not, because it only counts parameters.

      · In the present case, one is comparing a model with no pivot and no vertical displacement at the boundary between the black dots and the red dots (the 1-parameter unilinear model) to a bilinear model that allows both a change in slope and a vertical displacement for both lines. The 4-parameter model is superior if we use the BIC to take model complexity into account. However, the 4-parameter model has ludicrously more wiggle room. It will provide excellent fits (high maximum likelihood) to data sets in which the red points have slope > 1, slope 0, or slope < 0 and in which it is also true that the intercept for the red points lies well below or well above the black points (non-overlap in the marginal distribution of the red and black data). The 1-parameter model, on the other hand, will provide terrible fits to all such data (very low maximum likelihoods). Thus, we believe the BIC does not properly capture the immense actual difference in complexity between the 1-parameter model (unilinear with slope 1) and the 4-parameter model (bilinear with neither the slope nor the intercept fixed in the linear domain).

      · In any event, because the pivot (change in slope between black and red data sets), if any, is small and likewise for the displacement (vertical change), it suffices for now to know that the variance captured by the 1-parameter model is only marginally improved by adding three more parameters. Researchers using the properly corrected measured rate of head poking to measure the rate of reinforcement a subject expects can therefore assume that they have an approximately scalar measure of the subject’s expectation. Given our data, they won’t be far wrong even near the extremes of the values commonly used for rates of reinforcement. That is a major advance in current thinking, with strong implications for formal models of associative learning. It implies that the performance function that maps from the neurobiological realization of the subject’s expectation is not an unknown function. On the contrary, it’s the simplest possible function, the scalar function. That is a powerful constraint on brain-behavior linkage hypotheses, such as the many hypothesized relations between mesolimbic dopamine activity and the expectation that drives responding in Pavlovian conditioning (Berridge, 2012; Jeong et al., 2022; Y.  Niv, Daw, Joel, & Dayan, 2007; Y. Niv & Schoenbaum, 2008).
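
      To make concrete the point that the BIC penalizes only the number of parameters, the sketch below uses the standard least-squares form of the BIC under Gaussian errors (up to an additive constant), n·ln(RSS/n) + k·ln(n). The residual sums of squares and the sample size are invented for illustration, and the function name `bic` is hypothetical; this is not the authors' computation.

```python
import math


def bic(rss: float, n: int, k: int) -> float:
    """BIC for a least-squares fit with Gaussian errors, up to a constant.

    rss: residual sum of squares, n: number of data points,
    k: number of free parameters.
    """
    return n * math.log(rss / n) + k * math.log(n)


n = 100  # hypothetical number of data points
bic_one_param = bic(rss=20.0, n=n, k=1)   # e.g. unilinear model, slope fixed at 1
bic_four_param = bic(rss=18.0, n=n, k=4)  # e.g. bilinear model, two slopes, two intercepts

# The complexity penalty depends only on k and n, never on the functional form:
penalty_gap = bic(20.0, n, 4) - bic(20.0, n, 1)  # equals 3 * ln(n)
```

      Two models with equal k and equal RSS receive the same BIC whatever their shapes, which is the limitation the MDL argument above targets.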

      The data in Figures 4 and 5 are taken from the last 5 sessions of training. The exact number of sessions was somewhat arbitrary but was chosen to meet two goals: (1) to capture asymptotic responding, which is why we restricted this to the end of the training, and (2) to obtain a sufficiently large sample of data to estimate reliably each rat’s response rate. We have checked what the data look like using the last 10 sessions, and can confirm it makes very little difference to the results. We now note this in the revised manuscript. The data for terminal responding by all rats, averaged over both the last 5 sessions and last 10 sessions, can be downloaded from https://osf.io/vmwzr/

      Finally, as noted by the reviews, the relationship between the contextual rate of reinforcement and ITI responding should also be evident if we had measured context responding prior to introducing the CS. However, there was no period in our experiment when rats were given unsignalled reinforcement (such as is done during “magazine training” in some experiments). Therefore, we could not measure responding based on contextual conditioning prior to the introduction of the CS. This is a question for future experiments that use an extended period of magazine training or “poor positive” protocols in which there are reinforcements during the ITIs as well as during the CSs. The learning rate equation has been shown to predict reinforcements to acquisition in the poor-positive case (Balsam, Fairhurst, & Gallistel, 2006).

      (3) There is a disconnect between the gradual nature of learning shown in Figures 7 and 8 and the information-theoretic model proposed by the authors. To the extent that we understand the model, the animals should simply learn the association once the evidence crosses a threshold (nDKL > threshold) and then produce behavior in proportion to the expected reward rate. If so, why should there be a gradual component of learning as shown in these figures? In terms of the proportional response rule to the rate of rewards, why is it changing as animals go from 10% to 90% of peak response? The manuscript would be greatly strengthened if these results were explained within the authors' conceptual framework. If these results are not anticipated by the authors' conceptual framework, this should be explicitly stated in the manuscript.

      One of us (CRG) has earlier suggested that responding appears abruptly when the accumulated evidence that the CS reinforcement rate is greater than the contextual rate exceeds a decision threshold (C.R. Gallistel, Balsam, & Fairhurst, 2004). The new, more extensive data require a more nuanced view. Evidence about the manner in which responding changes over the course of training is to some extent dependent on the analytic method used to track those changes. We presented two different approaches. The approach shown in Figures 7 and 8 (now 6 and 7), extending on that developed by Harris (2022), assumes a monotonic increase in response rate and uses the slope of the cumulative response rate to identify when responding exceeds particular milestones (percentiles of the asymptotic response rate). This analysis suggests a steady rise in responding over trials. Within our theoretical model, this might reflect an increase in the animal’s certainty about the CS reinforcement rate with accumulated evidence from each trial. While this method should be able to distinguish between a gradual change and a single abrupt change in responding (Harris, 2022), it may not distinguish between a gradual change and multiple step-like changes in responding and cannot account for decreases in response rate.

      The other analytic method we used relies on the information theoretic measure of divergence, the nDkl (Gallistel & Latham, 2023), to identify each point of change (up or down) in the response record. With that method, we discern three trends. First, the onset tends to be abrupt in that the initial step up is often large (an increase in response rate by 50% or more of the difference between its initial value and its terminal value is common and there are instances where the initial step is to the terminal rate or higher). Second, there is marked within-subject variability in the response rate, characterized by large steps up and down in the parsed response rates following the initial step up, but this variability tends to decrease with further training (there tend to be fewer and smaller steps in both the ITI response rates and the CS response rate as training progresses). Third, the overall trend, seen most clearly when one averages across subjects within groups, is to a moderately higher rate of responding later in training than after the initial rise. We think that the first tendency reflects an underlying decision process whose latency is controlled by diminishing uncertainty about the two reinforcement rates and hence about their ratio. We think that decreasing uncertainty about the true values of the estimated rates of reinforcement is also likely to be an important part of the explanation for the second tendency (decreasing within-subject variation in response rates). It is less clear whether diminishing uncertainty can explain the trend toward a somewhat greater difference in the two response rates as conditioning progresses. It is perhaps worth noting that the distribution of the estimates of the informativeness ratio is likely to be heavy tailed and have peculiar properties (as witness, for example, the distribution of the ratio of two gamma distributions with arbitrary shape and scale parameters) but we are unable at this time to propound an explanation of the third trend.
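
      The heavy tail of a ratio of two gamma variates can be seen in a small simulation. The sketch below is illustrative only: the shape parameters (2 and 1), unit scales, sample size, and seed are arbitrary assumptions and are not estimates from the data, and the helper names are hypothetical.

```python
import random


def gamma_ratio_samples(shape_num, shape_den, n, seed=0):
    """Draw n samples of Gamma(shape_num, 1) / Gamma(shape_den, 1)."""
    rng = random.Random(seed)
    return [
        rng.gammavariate(shape_num, 1.0) / rng.gammavariate(shape_den, 1.0)
        for _ in range(n)
    ]


def quantile(sorted_values, q):
    """Simple empirical quantile from an already sorted list."""
    index = min(int(q * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]


samples = sorted(gamma_ratio_samples(2.0, 1.0, 20000))
median = quantile(samples, 0.50)
p99 = quantile(samples, 0.99)
# A large ratio of the 99th percentile to the median signals a heavy right tail.
tail_ratio = p99 / median
```

      With a denominator shape of 1, the ratio has no finite mean, so sample averages of such a ratio do not settle down as more data accumulate.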

      (4) Page 27, Procedure, final sentence: The magazine responding during the ITI is defined as the 20 s period immediately before CS onset. The range of ITI values (Table 1) always starts as low as 15 s in all 14 groups. Even in the case of an ITI on a trial that was exactly 20 s, this would also mean that the start of this period overlaps with the termination of the CS from the previous trial and delivery (and presumably consumption) of a pellet. It should be indicated whether the definition of the ITI period was modified on trials where the preceding ITI was < 20 s, and if any other criteria were used to define the ITI. Were the rats exposed to the reinforcers/pellets in their home cage prior to acquisition?

      There was an error in the description provided in the original text. The pre-CS period used to measure the ITI responding was 10 s rather than 20 s. There was always at least a 5-s gap between the end of the previous trial and the start of the pre-CS period. The statement about the pre-CS measure has been corrected in the revised manuscript.

      (5) For all the analyses, the exact models that were fit and the software used should be provided. For example, it is not necessarily clear to the reader (particularly in the absence of degrees of freedom) that the model discussed in Figure 3 fits on the individual subject data points or the group medians. Similarly, in Figure 6 there is no indication of whether a single regression model was fit to all the plotted data or whether tests of different slopes for each of the conditions were compared. With regards to the statistics in Figure 6, depending on how this was run, it is also a potential problem that the analyses do not correct for the potentially highly correlated multiple measurements from the same subjects, i.e. each rat provides 4 data points which are very unlikely to be independent observations.

      Details about model fitting have been added to the revision. The question about fitting a single model or multiple models to the data in Figure 6 (now 5) is addressed in response 2 above. In Figure 5, each rat provides 2 behavioural data points (ITI response rate and CS response rate) and 2 values for reinforcement rate (1/C and 1/T). There is a weak but significant correlation between the ITI and CS response rates (r = 0.28, p < 0.01; log transformed to correct for heteroscedasticity). By design, there is no correlation between the log reinforcement rates (r = 0.06, p = .404).

      CONCEPTUAL

      (1) We take the point that where traditional theories (e.g., Rescorla-Wagner) and rate estimation theory (RET) both explain some phenomenon, the explanation in terms of RET may be preferred as it will be grounded in aspects of an animal's experience rather than a hypothetical construct. However, like traditional theories, RET does not explain a range of phenomena - notably, those that require some sort of expectancy/representation as part of their explanation. This being said, traditional theories have been incorporated within models that have the representational power to explain a broader array of phenomena, which makes me wonder: Can rate estimation be incorporated in models that have representational power; and, if so, what might this look like? Alternatively, do the authors intend to claim that expectancy and/or representation - which follow from probabilistic theories in the RW mould - are unnecessary for explanations of animal behaviour?

      It is important for the field to realize that the RW model cannot be used to explain the results of Rescorla’s (Rescorla, 1966; Rescorla, 1968, 1969) contingency-not-pairing experiments, despite what was claimed by Rescorla and Wagner (Rescorla & Wagner, 1972; Wagner & Rescorla, 1972) and has subsequently been claimed in many modelling papers and in most textbooks and reviews (Dayan & Niv, 2008; Y. Niv & Montague, 2008). Rescorla programmed reinforcements with a Poisson process. The defining property of a Poisson process is its flat hazard function; the reinforcements were equally likely at every moment in time when the process was running. This makes it impossible to say when non-reinforcements occurred and, a fortiori, to count them. The non-reinforcements are causal events in the RW algorithm and subsequent versions of it. Their effects on associative strength are essential to the explanations proffered by these models. Non-reinforcements (failures to occur, updates when reinforcement is set to 0, hence also the lambda parameter) can have causal efficacy only when the successes may be predicted to occur at specified times (during “trials”). When reinforcements are programmed by a Poisson process, there are no such times. Attempts to apply the RW formula to reinforcement learning soon foundered on this problem (Gibbon, 1981; Gibbon, Berryman, & Thompson, 1974; Hallam, Grahame, & Miller, 1992; L.J. Hammond, 1980; L. J. Hammond & Paynter, 1983; Scott & Platt, 1985). The enduring popularity of the delta-rule updating equation in reinforcement learning depends on “big-concept” papers that don’t fit models to real data and discretize time into states while claiming to be real-time models (Y. Niv, 2009; Y. Niv, Daw, & Dayan, 2005).
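
      The flat-hazard property invoked above can be written out in a few lines. This is the standard derivation for a homogeneous Poisson process, added here as an illustration; the rate symbol \lambda in this block denotes the Poisson rate (not the RW asymptote parameter) and is our notation.

```latex
% Waiting time T to the next event of a Poisson process with rate \lambda
% is exponentially distributed:
f(t) = \lambda e^{-\lambda t}, \qquad
S(t) = \Pr(T > t) = e^{-\lambda t}, \qquad t \ge 0 .

% Hazard function: instantaneous event rate given no event so far.
h(t) = \frac{f(t)}{S(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda .
```

      Because h(t) does not depend on t, no moment is more "due" for a reinforcement than any other, which is why there is no principled instant at which a non-reinforcement can be said to have occurred.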

      The information-theoretic approach to associative learning, which sometimes historically travels as RET (rate estimation theory), is unabashedly and inescapably representational. It assumes a temporal map and arithmetic machinery capable in principle of implementing any implementable computation. In short, it assumes a Turing-complete brain. It assumes that whatever the material basis of memory may be, it must make sense to ask of it how many bits can be stored in a given volume of material. This question is seldom posed in associative models of learning, nor by neurobiologists committed to the hypothesis that the Hebbian synapse is the material basis of memory. Many, including the new Nobelist Geoffrey Hinton, would agree that the question makes no sense. When you assume that brains learn by rewiring themselves rather than by acquiring and storing information, it makes no sense.

      When a subject learns a rate of reinforcement, it bases its behavior on that expectation, and it alters its behavior when that expectation is disappointed. Subjects also learn probabilities when they are defined. They base some aspects of their behavior on those expectations, making computationally sophisticated use of their representation of the uncertainties (Balci, Freestone, & Gallistel, 2009; Chan & Harris, 2019; J. A. Harris, 2019; J.A. Harris & Andrew, 2017; J. A. Harris & Bouton, 2020; J. A. Harris, Kwok, & Gottlieb, 2019; Kheifets, Freestone, & Gallistel, 2017; Kheifets & Gallistel, 2012; Mallea, Schulhof, Gallistel, & Balsam, 2024 in press).

      (2) The discussion of Rescorla's (1967) and Kamin's (1968) findings needs some elaboration. These findings are already taken to mean that the target CS in each design is not informative about the occurrence of the US - hence, learning about this CS fails. In the case of blocking, we also know that changes in the rate of reinforcement across the shift from stage 1 to stage 2 of the protocol can produce unblocking. Perhaps more interesting from a rate estimation perspective, unblocking can also be achieved in a protocol that maintains the rate of reinforcement while varying the sensory properties of the US (Wagner). How does rate estimation theory account for these findings and/or the demonstrations of trans-reinforcer blocking (Pearce-Ganesan)? Are there other ways that the rate estimation account can be distinguished from traditional explanations of blocking and contingency effects? If so, these would be worth citing in the discussion. More generally, if one is going to highlight seminal findings (such as those by Rescorla and Kamin) that can be explained by rate estimation, it would be appropriate to acknowledge findings that challenge the theory - even if only to note that the theory, in its present form, is not all-encompassing. For example, it appears to me that the theory should not predict one-trial overshadowing or the overtraining reversal effect - both of which are amenable to discussion in terms of rates.

      I assume that the signature characteristics of latent inhibition and extinction would also pose a challenge to rate estimation theory, just as they pose a challenge to Rescorla-Wagner and other probability-based theories. Is this correct?

      The seemingly contradictory evidence of unblocking and trans-reinforcer blocking by Wagner and by Pearce and Ganesan cited above will be hard for any theory to accommodate. It will likely depend on what features of the US are represented in the conditioned response.

      RET predicts one-trial overshadowing, as anyone may verify in a scientific programming language because it has no free parameters; hence, no wiggle room. Overtraining reversal effects appear to depend on aspects of the subjects’ experience other than the rate of reinforcement. It seems unlikely that RET can proffer an explanation of them.

      Various information-theoretic calculations give pretty good quantitative fits to the relatively few parametric studies of extinction and the partial-reinforcement extinction effect (see Gallistel (2012, Figs 3 & 4); Wilkes & Gallistel (2016, Fig 6); and Gallistel (2025, under review, Fig 6)). The approach has not been applied to latent inhibition, in part for want of parametric data. However, clearly one should not attribute a negative rate to a context in which the subject had never been reinforced. An explanation, if it exists, would have to turn on the effect of that long period on initial rate estimates AND on evidence of a change in rate, as of the first reinforcement.

      Recommendations for authors:

      MINOR POINTS

      (1) It is not clear why Figure 3C is presented but not analyzed, and why the data presented in Figure 4 to clarify the spread of the distribution of the data observed across the plots in Figure 3 uses the data from Figure 3C. This would seem like the least representative data to illustrate the point of Figure 4. It also appears that the data plotted in Figure 4 corresponds to Figure 3A and 3B rather than the odds 10:1 data indicated in the text.

      Figure 3 has changed as already described. The data previously plotted in Figure 4 are now shown in Figure 3B and correspond to those plotted in Figure 3A.

      (2) Log(T) was not correlated with trials to criterion. If trials to criterion is inversely proportional to log(C/T) and C is uncorrelated with T, shouldn't trials to criterion be correlated with log(T)? Is this merely a matter of low statistical power?

      Yes. There is a small, but statistically non-significant, correlation between log(T) and trials to criterion, r = 0.35, p = .22. That correlation drops to .08 (p = .8) after factoring out log(C/T), which demonstrates that the weak correlation between log(T) and trials to criterion is based on the correlation between log(T) and log(C/T).
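
      One common way to "factor out" a third variable is a first-order partial correlation. The sketch below shows that calculation on synthetic data; it is an assumption that the analysis took this form, and none of the numbers are the study's data. The helper names (`pearson`, `partial_corr`) and the simulated dependence of log trials on log(C/T) are hypothetical; only the group count of 14 comes from the text.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic, purely illustrative data (NOT the data from the experiment).
rng = random.Random(0)
n_groups = 14                                          # fourteen groups, as in the study
log_c = [rng.gauss(0, 1) for _ in range(n_groups)]     # hypothetical log(C)
log_t = [rng.gauss(0, 1) for _ in range(n_groups)]     # hypothetical log(T), drawn independently
log_ratio = [c - t for c, t in zip(log_c, log_t)]      # log(C/T)
# Assumed for illustration: log trials depends on log(C/T) only, plus noise.
log_trials = [-1.0 * r + rng.gauss(0, 0.5) for r in log_ratio]

r_raw = pearson(log_t, log_trials)
r_partial = partial_corr(log_t, log_trials, log_ratio)
print(f"raw r = {r_raw:.2f}, partial r (controlling log(C/T)) = {r_partial:.2f}")
```

      In this simulated setup any raw correlation between log(T) and log trials arises only because log(T) is a component of log(C/T), so the partial correlation isolates whatever association remains once log(C/T) is held fixed.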

      (3) The rationale for the removal of the high information condition samples in the Fig 8 "Slope" plot appears to be weak. Can the authors justify this choice better? If all data are included, the relationship is clearly different from that shown in the plot.

      We have now reported correlations that include those 3 groups but noted that the correlations are largely driven by the much lower slope values of those 3 groups which is likely an artefact of their smaller number of trials. We use this to justify a second set of correlations that excludes those 3 groups.

      (4) The discussion states that there is at most one free parameter constrained by the data - the constant of proportionality for response rate. However, there is also another free parameter constrained by data: the informativeness at which expected trials to acquisition is 1.

      I think this comment is referring to two different sets of data. The constant of proportionality of the response rate refers to the scalar relationship between reinforcement rate and terminal response rate shown in Figure 5. The other parameter, the informativeness when trials to acquisition equals 1, describes the intercept of the regression line in Figure 1 (and 3).

      (5) The authors state that the measurement of available information is not often clear. Given this, how is contingency measurable based on the authors' framework?

      (6) Based on the variables provided in Supplementary File 3, containing the acquisition data, we were unable to reproduce the values reported in the analysis of Figure 3.

      Figure 3 has changed, using new criteria for trials to acquisition that attempt to match the criterion used by Gibbon and Balsam. The data on which these figures are based has been uploaded into OSF.

      GRAPHICAL AND TYPOGRAPHICAL

      (1) Y-axis labels in Figure 1 are not appropriately placed. 0 is sitting next to 0.1. 0 should sit at the bottom of the y-axis.

      If this comment refers to the 0 sitting above an arrow in the top right corner of the plot, this is not misaligned. The arrow pointing to zero is used to indicate that this axis approaches zero in the upward direction. 0 should not be aligned to a value on the axis since a learning rate of zero would indicate an infinite number of learning trials. The caption has been edited to explain this more clearly.

      (2) Typo, Page 6, Final Paragraph, line 4. "Fourteen groups of rats were trained with for 42 session"

      Corrected. Thank you.

      (3) Figure 3 caption: Typo, should probably be "Number of trials to acquisition"?

      This change has now been made. The axis shows reinforcements to acquisition to be consistent with Gibbon and Balsam, but trials and number of reinforcements are identical in our 100% reinforcement schedule.

      (4) Typo Page 17 Line 1: "Important pieces evidence about".

      Corrected. Thank you.

      (5) Consider consistent usage of symbols/terms throughout the manuscript (e.g. Page 22, final paragraph: "iota = 2" is used instead of the corresponding symbol that has been used throughout).

      Changed.

      (6) Typo Page 28, Paragraph 1, Line 9: "We used a one-sample t-test using to identify when this".

      This section of text has been changed to reflect the new analysis used for the data in Figure 3.

      (7) Typo Page 29, Paragraph 1, Line 2: "problematic in cases where one of both rates are undefined" either typo or unclear phrasing.

      “of” has been corrected to “or”

      (8) Typo Page 30: Equation 3 appears to have an error and is not consistent with the initial printing of Equation 3 in the manuscript.

      The typo in the initial expression of Eq 3 (page 23) has been corrected.

      (9) Typo Page 33, Line 5: "Figures 12".

      Corrected.

      (10) Typo Page 34, Line 10: "and the 5 the increasingly"? Should this be "the 5 points that"?

      Corrected.

      (11) Typo Page 35, Paragraph 2: "estimate of the onset of conditioned is the trial after which".

      Corrected.

      (12) Clarify: Page 35, final paragraph: it is stated that four-panel figures are included for each subject in the Supplementary files, but each subject has a six-panel figure in the Supplementary file.

      The text now clarifies that the 4-panel figures are included within the 6-panel figures in the Supplementary materials.

      (13) It is hard to identify the different groups in Figure 2 (Plot 15).

      The figure is simply intended to show that responding across seconds within the trial is relatively flat for each group. Individuation of specific groups is not particularly important.

      (14) It appears that the numbering on the y-axis is misaligned in Figure 2 relative to the corresponding points on the scale (unless I have misunderstood these values and the response rate measure to the ITI can drop below 0?).

      The numbers on the Y axes had become misaligned. That has now been corrected.

      (15) Please include the data from Figure 3A in the spreadsheet supplementary file 3. If it has already been included as one of the columns of data, please consider a clearer/consistent description of the relevant column variable in Supplementary File 1.

      The data from Figure 3 are now available from the linked OSF site, referenced in the manuscript.

      (16) Errors in supplementary data spreadsheets such that the C/T values are not consistent with those provided in Table 1 (C/T values of 4.5, 54, 180, and 300 are slightly different values in these spreadsheets). A similar error/mismatch appears to have occurred in the C/T labels for Figures (e.g. Figure 10) and the individual supplementary figures.

      The C/T values on the figures in the supplementary materials have been corrected and are now consistent with those in Table 1.

      (17) Currently the analysis and code provided at https://osf.io/vmwzr/ are not accessible without requesting access from the author. Please consider making these openly available without requiring a request for authorization. As such, a number of recommendations made here may already have been addressed by the data and code deposited on OSF. Apologies for any redundant recommendations.

      Data and code are now available at the OSF site, which has been made public and no longer requires an access request.

      (18) Please consider a clearer and more specific reference to supplementary materials. Currently, the reader is required to search through 4 separate supplementary files to identify what is being discussed/referenced in the text (e.g. Page 18, final line: "see Supplementary Materials" could simply be "see Figure S1").

      We have added specific page numbers in references to the Supplementary Materials.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript describes a novel magnetic steering technique to target human adipose derived mesenchymal stem cells (hAMSC) or induced pluripotent stem cell derivatives (iPSC-TM) to the TM. The authors show that delivery of the stem cells lowered IOP, increased outflow facility, and increased TM cellularity.

      Strengths:

      The technique is novel and shows promise as a novel therapeutic to lower IOP in glaucoma. hAMSC are able to lower IOP below the baseline as well as increase outflow facility above baseline with no tumorigenicity. These data will have a positive impact on the field and will guide further research using hAMSC in glaucoma models.

      Weaknesses:

      The transgenic mouse model of glaucoma the authors used did not show ocular hypertensive phenotypes at 6-7 months of age as previously reported. Therefore, if there is no pathology in these animals the authors did not show a restoration of function, but rather a decrease in pressure below normal IOP.

      We appreciate the reviewer’s feedback and agree with the statement of weakness. Accordingly, we have revised the language to improve clarity. Specifically, all references to "restoration of IOP" or "restoration of conventional outflow function" have been replaced with more precise phrases, in the following locations: 

      • lines 2-3 (title): Magnetically steered cell therapy for reduction of intraocular pressure as a treatment strategy for open-angle glaucoma

      • lines 36-8 (abstract): We observed a 4.5 [3.1, 6.0] mmHg or 27% reduction in intraocular pressure (IOP) for nine months after a single dose of only 1500 magnetically-steered hAMSCs, explained by increased conventional outflow facility and associated with higher TM cellularity.

      • lines 45-6 (one-sentence summary): A novel magnetic cell therapy provided effective intraocular pressure reduction in mice, motivating future translational studies.

      • lines 123-4 (introduction): Despite the absence of ocular hypertension in our MYOC<sup>Y437H</sup> mice, our data demonstrate sustained IOP lowering and a significant benefit of magnetic cell steering in the eye, particularly for hAMSCs, strongly indicating further translational potential.

      • line 207 (results): The observed reductions in IOP and increases in outflow facility after delivery of both cell types suggested functional changes in the conventional outflow pathway.

      • lines 509-10 (discussion): In summary, this work shows the effectiveness of our novel magnetic TM cell therapy approach for long-term IOP reduction through functional changes in the conventional outflow pathway.

      It is very important to note that at the 23rd annual Trabecular Meshwork Study Club meeting (San Diego, December 2024), Dr. Zode, the lead author of reference 26 originally describing the transgenic myocilin mouse model, announced during his talk that this model no longer demonstrates the glaucomatous phenotype in his hands, which incidentally has motivated him to create a new, CRISPR MYOC mouse model. Dr. Zode also stated that he was uncertain of the reason for this loss of phenotype. His observation is consistent with our report. However, other investigators continue to observe the desired phenotype in their colonies of this mouse (Dr. Wei Zhu, personal communication). Continued use of this mouse model should therefore be approached with caution. 

      Reviewer #2 (Public review):

      Summary:

      This observational study investigates the efficacy of intracameral injected human stem cells as a means to re-functionalize the trabecular meshwork for the restoration of intraocular pressure homeostasis. Using a murine model of glaucoma, human adipose-derived mesenchymal stem cells are shown to be biologically safer and functionally superior at eliciting a sustained reduction in intraocular pressure (IOP). The authors conclude that the use of human adipose-derived mesenchymal stem cells has the potential for long-term treatment of ocular hypertension in glaucoma.

      Strengths:

      A noted strength is the use of a magnetic steering technique to direct injected stem cells to the iridocorneal angle. An additional strength is the comparison of efficacy between two distinct sources of stem cells: human adipose-derived mesenchymal vs. induced pluripotent cell derivatives. Utilizing both in vivo and ex vivo methodology coupled with histological evidence of introduced stem cell localization provides a consistent and compelling argument for a sustainable impact exogenous stem cells may have on the refunctionalization of a pathologically compromised TM.

      Weaknesses:

      A noted weakness of the study, as pointed out by the authors, includes the unanticipated failure of the genetic model to develop glaucoma-related pathology (elevated IOP, TM cell changes). While this is most unfortunate, it does temper the conclusion that exogenous human adipose derived mesenchymal stem cells may restore TM cell function. Given that TM cell function was not altered in their genetic model, it is difficult to say with any certainty that the introduced stem cells would be capable of restoring pathologically altered TM function. A restoration effect remains to be seen. 

      We acknowledge that the phrase “restoration of TM function” is not fully supported by our results, given the absence of ocular hypertension in our animal model. Accordingly, we have revised the language to more precisely describe our findings. For specific details regarding these changes, please refer to our response to Reviewer 1’s public comments above.

      Another noted complication to these findings is the observation that sham intracameral-injected saline control animals all showed elevated IOP and reduced outflow facility, compared to WT or Tg untreated animals, which allowed for more robust statistically significant outcomes. Additional comments/concerns that the authors may wish to address are elaborated in the Private Review section.

      We agree that sham-injected animals tended to have higher average IOPs than transgenic animals in our study. However, these differences did not reach statistical significance and therefore remain inconclusive. Further, an increase in IOP following placebo injection has been previously reported (Zhu et al., 2016). 

      Prompted by the Referee’s comments and also a private comment from Referee 1, we further investigated this effect by analyzing IOP in uninjected contralateral eyes at the mid-term time point and comparing the IOPs in these eyes to other cohorts, as now presented as additional data in Supplementary Tables 1 and 2 and Supplementary Figure 4 (see below). In brief, the uninjected contralateral transgenic eyes (10 months old) showed an IOP of 16.5 [15.9, 17.1] mmHg, which was intermediate between the IOP levels of the 6–7-month-old Tg group (15.4 [14.7, 16.1] mmHg) and the sham group (16.9 [15.5, 18.2] mmHg). However, none of these differences reached statistical significance. Additionally, we cannot rule out potential contralateral effects induced by the injections.

      Regarding the best way to assess the effect of cell treatment, we feel very strongly that the most relevant IOP comparison is between cell-injected eyes and control (vehicle)-injected eyes, since this provides the most direct accounting for the effects of injection itself on IOP. Other comparisons, such as WT or untreated Tg eyes vs. cell-treated eyes, are interesting but harder to interpret. However, in response to the referee’s comment, we have added comparisons between cell-treated groups and untreated Tg eyes to Table 2, adjusting the post-hoc corrections accordingly. All hAMSC-treated groups show a statistically significant decrease in IOP even compared to Tg untreated eyes, while iPSC-TMs fail to reach such significance.

      The following changes were made to the manuscript:

      Lines 326 et seq.: Eyes subjected to saline injection exhibited marginally higher IOPs and lower outflow facilities on average, in comparison to the transgenic animals at baseline. However, due to the lack of statistical significance in these differences and the inherent age difference between the saline-injected animals and the non-injected controls at baseline, no conclusive inference can be drawn regarding the effect of saline injection. To investigate this phenomenon further, we also analyzed IOPs in uninjected contralateral eyes at the midterm time point (Supplementary Tables 1 and 2, Supplementary Figure 4). The uninjected contralateral transgenic eyes (10 months old) showed an IOP of 16.5 [15.9, 17.1] mmHg, which was intermediate between the IOP levels of the 6–7-month-old Tg group (15.4 [14.7, 16.1] mmHg) and the sham-injected group (16.9 [15.5, 18.2] mmHg). However, none of these differences reached statistical significance. Of note, contralateral hypertension has been previously reported after subconjunctival and periocular injection of dexamethasone-loaded nanoparticles (34), and we similarly cannot definitively rule out potential contralateral effects induced by our stem cell injections. Thus, we cannot draw any definite conclusions from these additional IOP comparisons at this time.

      Reviewer #3 (Public review):

      Summary:

      The purpose of the current manuscript was to investigate a magnetic cell steering technique for efficiency and tissue-specific targeting, using two types of stem cells, in a mouse model of glaucoma. As the authors point out, trabecular meshwork (TM) cell therapy is an active area of research for treating elevated intraocular pressure as observed in glaucoma. Thus, further studies determining the ideal cell choice for TM cell therapy are warranted. The experimental protocol of the manuscript involved the injection of either human adipose derived mesenchymal stem cells (hAMSCs) or induced pluripotent cell derivatives (iPSC-TM cells) into a previously reported mouse glaucoma model, the transgenic MYOC<sup>Y437H</sup> mice and wild-type littermates, followed by the magnetic cell steering. Numerous outcome measures were assessed and quantified including IOP, outflow facility, TM cellularity, retention of stem cells, and the inner wall BM of Schlemm's canal.

      Strengths:

      All of these analyses were carefully carried out and appropriate statistical methods were employed. The study has clearly shown that the hAMSCs are the cells of choice over the iPSC-TM cells, the latter of which caused tumors in the anterior chamber. The hAMSCs were shown to be retained in the anterior segment over time and this resulted in increased cellular density in the TM region and a reduction in IOP and outflow facility. These are all interesting findings and there is substantial data to support it.

      Weaknesses:

      However, where the study falls short is in the MYOC<sup>Y437H</sup> mouse model of glaucoma that was employed. The authors clearly state that a major limitation of the study is that this model, in their hands, did not exhibit glaucomatous features as previously reported, such as a significant increase in IOP, which was part of the overall purpose of the study. The authors state that it is possible that "the transgene was silenced in the original breeders". The authors did not show PCR, western blot, or immuno of angle tissue of the tg to determine transgenic expression (increased expression of MYOC was shown in the angle tissue of the transgenics in the original paper by Zode et al, 2011). This should be investigated given that these mice were rederived. Thus, it is clearly possible that these are not transgenic mice.

      All MYOC mice that were used in this study were genotyped and confirmed to carry the transgene, as noted in the original version of the paper (see lines 590-2). However, the transgene seems not to have been active, based on the lack of ocular hypertension as well as the lack of differences in supporting endpoints such as outflow facility and TM cellularity. While it would have been possible to carry out the recommended assays to investigate the root cause of this loss of phenotype, this was not an objective of our study. Thus, we instead focus here simply on communicating the observed loss of phenotype to readers. We also refer the referee to the final paragraph of our response to Referee 1.

      If indeed they are transgenics, the authors may want to consider the fact that in the Zode paper, the most significant IOP elevation in the mutant mice was observed at night and thus this could be examined by the authors. 

      This is a good point. However, while the dark-phase IOP does exhibit a distinctly larger elevation (as previously observed in hypertonic saline sclerosis), Zode et al. also reported a notable 3 mmHg IOP increase during the light phase. The complete absence of such daytime (light phase) IOP elevation in our animals diminished our enthusiasm for pursuing dark-phase IOP measurements.

      Other glaucomatous features of these mice could also have been investigated such as loss of RGCs, to further determine their transgenic phenotype. 

      We agree that these other phenotypes could be studied, but in the absence of any detectable IOP elevation (and thus lack of mechanical insult on RGC axons), loss of RGC is extremely unlikely. We also note that the loss of retinal ganglion cells (RGCs) in the Myocilin model remains a subject of controversy. For example, despite a significant increase in IOP (>10 mmHg) in this model across four mouse strains, three, including C57BL6/J, did not exhibit any signs of optic nerve damage (McDowell et al., 2012). In contrast, Zhu et al. observed considerable nerve damage in this model, which was reversed following iPSC-TM cell transplantation (Zhu et al., 2016). Given these conflicting findings, we directed our efforts toward outcome measures directly related to aqueous humor dynamics.

      Finally, while increased cellular density in the TM region was observed, proliferative markers could be employed to determine if the transplanted cells are proliferating.

      We agree that identifying the source of the increased trabecular meshwork (TM) cellularity we observed is interesting and we plan to pursue that in future studies. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The sham-injected transgenic animals showed elevated IOP 3-4 weeks after the baseline measurements in the transgenic mice. The authors justify this may be due to the increase in age in these animals. However, this seems unlikely due to the short duration of time between measurement of the baseline IOP and the Short time point (3-4 weeks). The authors do not provide IOP data for any WT sham injected eyes or naïve Tg eyes at these time points. These data are essential to determine if the elevation is due to the sham injection, age, or the transgene. Could it be that the IOP in this cohort of Tg mice didn't increase until 7-8 months of age instead of 6-7 months of age? The methods state only unilateral injections of the stem cells were done so it is assumed the contralateral eye was uninjected. What was the IOP in these eyes? These data would clarify the confusion in the data from sham-injected animals compared to baseline (naive) measurements.

      We agree that the average IOP in saline-injected groups is higher than in WT or non-treated Tg mice, although the difference is inconclusive due to a lack of statistical significance. It is important to note, however, that this difference is subtle and not comparable to the 3 mmHg light-phase IOP elevation previously observed in this model (Zode et al., 2011). 

      We appreciate the reviewer’s suggestion to include IOP data from the contralateral uninjected eyes, and we have now provided this information along with the comparative statistics in the supplementary materials. Additional details can be found in our response to a similar comment from Reviewer 2’s public review. In summary, the IOP difference in contralateral non-injected ten-month-old transgenic eyes was even smaller than in the original Tg group. IOP elevation following saline injection in mice has been reported previously (Zhu et al., 2016). As a potential confounding factor, we highlight possible contralateral effects of the injection itself (which is why we initially did not analyze IOP in the contralateral eyes).

      The hAMSC-treated eyes appear to lower IOP even from baseline (although stats were only provided compared to the sham-injected eyes, which as stated above appear to have increased).

      However, the iPSC-TM-treated eyes had IOPs equal to that of the baseline measurements taken 3 weeks prior. The significance is coming from the "sham-treated" eyes which had elevated IOPs. The controls listed above should be included to make these conclusions.

      The reviewer makes an astute observation. Please refer to our response to a similar observation by Reviewer 2 under public reviews, where we provide and discuss the comparative statistics noted by the reviewer. However, we feel very strongly that the most relevant IOP comparison is between cell-injected eyes and control-injected eyes. 

      If the transgenic mouse model truly did not have a phenotype, then the authors are testing the ability of the stem cells to lower IOP from baseline normal pressures. Therefore, the authors are not "restoring function of the conventional outflow pathway" as there is no damage to begin with. The language in the manuscript should be corrected to reflect this if the transgenics have no phenotype.

      We agree and have adjusted the language accordingly. For further details, please refer to our response to your public review.

      The authors noted in the iPSC-TM-treated eyes there was a high rate of tumorigenicity. If the magnetic steering of these cells is specific and targeted to the TM, why do the tumors form near the central iris?

      While magnetic steering is more specific to the trabecular meshwork (TM) than previously used approaches (Bahrani Fard et al., 2023), it is not perfect, and a modest amount of off-target delivery to the iris, including its central portion, still occurs. Apparently, it took only a few mis-directed iPSC-TM cells to lead to tumors in this work, which is a serious concern for future translational approaches.

      Reviewer #2 (Recommendations for the authors):

      (1) It appears that mice were injected unilaterally (Line 590). I may have missed this, but was the companion un-injected eye analyzed in this study? If not analyzed, was there a confounding concern or limitation that necessitated omitting this possible control option?

      Contralateral effects, such as hypertension in the untreated eye after subconjunctival and periocular injection of dexamethasone-loaded nanoparticles, have previously been reported in the literature (Li et al., 2019) and also reported anecdotally by other leaders in the field to the senior authors, which is why we did not initially analyze contralateral eyes in this study. However, prompted by this comment and others, we have now included the IOP measurements for contralateral uninjected ten-month-old transgenic eyes in the supplementary materials. For further details, please refer to our response to your public review.

      (2) Were all these mice the same gender? Would gender be expected to alter the findings of this study?

      Animals of both sexes were randomly chosen and included in the study. We added the following statement to the Materials and Methods section (line 530): After breeding and genotyping, mice, regardless of sex, were maintained to age 6-7 months, when transgenic animals were expected to have developed a POAG phenotype.

      (3) As noted in the public review, the use of PBS for a control seems to have resulted in a slight elevation in IOP (Figure 2) as well as a reduction in outflow facility (Figure 3B) when compared to WT or Tg mice. Was this difference statistically significant? 

      The differences between the sham (saline)-injected groups at any time point and untreated Tg mice did not reach statistical significance for IOP, facility, or TM cellularity and, for facility, did not even show clear trends. For example, WT mice had, on average, 0.2 mmHg higher IOP and 0.6 nl/min/mmHg greater facility than the Tg group. Meanwhile, on a similar scale, the long-term sham group exhibited 0.4 nl/min/mmHg higher facility compared to the Tg group. As the statistical tests indicate, these differences should be interpreted more as noise than as meaningful signal.

      If so, then it should be noted as to whether the observed decrease in IOP following stem cell injection remained statistically significant when compared to these un-injected control animals. If significance was lost, then this should be appropriately noted and discussed. It is not apparently obvious why sham controls should have elevated IOP. This is a design and statistical concern.

      Please refer to our response to a similar observation by Reviewer 1. We believe that comparing the treatment (cell suspension in saline) with its age-matched vehicle (saline) is the appropriate approach which maintains rigor by most directly accounting for the effects of injection. 

      (4) The tonicity of the PBS used as a vehicle control was not stated and I did not see within the methods whether the stem cells were suspended using this same PBS vehicle. I assume isotonic phosphate buffered saline was used and that the stem cells were resuspended using the same sterile PBS. 

      Thanks for catching this. We added “sterile PBS (1X, Thermo Fisher Scientific, Waltham, MA)” to the Methods section of the manuscript (line 567). 

      With regards to using PBS as an injection control, I wonder if a better comparable control might have been to use mesenchymal stem cells that were rendered incapable of proliferating prior to intracameral injection. This, of course, addresses the unexplained mechanism(s) by which mesenchymal stem cells elicit a decrease in IOP.

      This is an interesting idea, and represents another level of control. However, we explicitly chose not to use non-proliferating hAMSCs as a control, for several reasons. Firstly, a saline injection is the simplest control and in this initial study with multiple groups, we did not feel another experimental group should be added. Second, this control would not rule out paracrine effects from injected cells, which our data suggested are an important effect. Third, rendering injected cells truly non-proliferative could introduce unwanted/unknown phenotypes in these cells that would need to be carefully characterized. That being said, if an efficient method could be developed to render an entire population of these cells irreversibly non-proliferating, the reviewer’s suggestion would be worth pursuing to better understand the mechanism of TM cell therapies. 

      (5) As noted in Figure 4C, TM cellular density as quantified was not altered in the sham control, so a loss of cellular density cannot explain the elevated IOP with this group. Injecting viable (not determined?) mesenchymal stem cells did show, over the short term, a noted increase in TM cellular density.

      Thank you for noting this. We agree that changes in cell density do not explain the mild IOP elevation in the sham group. As the referee certainly is aware, there are multiple reasons that IOP can be elevated (changes in trabecular meshwork extracellular matrix, changes in trabecular meshwork stiffness) that are not necessarily related to cell density.  Since we do not know definitively the cause of this mild elevation, we would prefer to not speculate about it in the manuscript. 

      Thanks for pointing out our omission of a statement about injected cell viability. We have now included the following statement in the Materials and Methods section (lines 564-566): “For all the experiments where animals received hAMSC, cell count and >90% viability was verified using a Countess II Automated Cell Counter (Thermo Fisher Scientific, Waltham, MA).”

      I'm confused, as clearly stated (Lines 431-432), mesenchymal stem cells accumulated close to, but not within, the TM. How is it that TM cellular density increased if these stem cells did not enter the TM? The authors may wish to clarify this distinction. Given that mesenchymal stem cells did not increase the risk of tumorigenicity, do the authors have any evidence that these cells actually proliferated post-injection, or did they undergo senescence, thereby displaying a senescence-associated secretory phenotype as a source of paracrine support?

      As the reviewer correctly noted, our observations show that hAMSCs primarily accumulated close to, but outside, the TM (likely caught up in the pectinate ligaments). Based on observations of increased TM cellularity, we think that the most likely explanation of these findings is paracrine signaling, as the reviewer suggests and which was discussed at length in the original version of the manuscript (lines 453-477). 

      We agree that, despite observing little signal from hAMSCs within the TM, labeling with proliferation markers (e.g., Ki-67) and searching for co-localization with exogenous cells, and/or labeling for senescence markers would have provided more mechanistic information. This is an excellent topic for future study, which we plan to pursue, but was outside the scope of this study. 

      (6) As noted in the public review, I think it is a bit of a stretch to even suggest that the findings of this study support stem cell restoration of TM function given that the model apparently did not produce TM cell dysfunction as anticipated. A restoration effect remains to be seen.

      We agree and have adjusted the language accordingly. For further details, please refer to our response to Reviewer 1’s public comment.

      Reviewer #3 (Recommendations for the authors):

      (1) Show PCR, western blot, or immuno of angle tissue of the MYOC tg to confirm transgenic expression.

      (2) Examine the IOP of mice at night.

      (3) Investigate other glaucomatous features in the mice to determine if they have any of the transgenic phenotypes previously reported.

      (4) Examine proliferative markers in the TM region of angles injected with stem cells.

      Please see our responses to all four of these comments in the public section.

      Bibliography (for this response letter only)

      Bahrani Fard, M.R., Chan, J., Sanchez Rodriguez, G., Yonk, M., Kuturu, S.R., Read, A.T., Emelianov, S.Y., Kuehn, M.H., Ethier, C.R., 2023. Improved magnetic delivery of cells to the trabecular meshwork in mice. Exp. Eye Res. 234, 109602. https://doi.org/10.1016/j.exer.2023.109602

      Li, G., Lee, C., Agrahari, V., Wang, K., Navarro, I., Sherwood, J.M., Crews, K., Farsiu, S., Gonzalez, P., Lin, C.-W., Mitra, A.K., Ethier, C.R., Stamer, W.D., 2019. In vivo measurement of trabecular meshwork stiffness in a corticosteroid-induced ocular hypertensive mouse model. Proc. Natl. Acad. Sci. U. S. A. 116, 1714–1722. https://doi.org/10.1073/pnas.1814889116

      Zhu, W., Gramlich, O.W., Laboissonniere, L., Jain, A., Sheffield, V.C., Trimarchi, J.M., Tucker, B.A., Kuehn, M.H., 2016. Transplantation of iPSC-derived TM cells rescues glaucoma phenotypes in vivo. Proc. Natl. Acad. Sci. U. S. A. 113, E3492–E3500.

      Zode, G.S., Kuehn, M.H., Nishimura, D.Y., Searby, C.C., Mohan, K., Grozdanic, S.D., Bugge, K., Anderson, M.G., Clark, A.F., Stone, E.M., Sheffield, V.C., 2011. Reduction of ER stress via a chemical chaperone prevents disease phenotypes in a mouse model of primary open angle glaucoma. J. Clin. Invest. 121, 3542–3553. https://doi.org/10.1172/JCI58183

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1:

      The authors attempted to replicate previous work showing that counterconditioning leads to more persistent reduction of threat responses, relative to extinction. They also aimed to examine the neural mechanisms underlying counterconditioning and extinction. They achieved both of these aims and were able to provide some additional information, such as how counterconditioning impacts memory consolidation. Having a better understanding of which neural networks are engaged during counterconditioning may provide novel pharmacological targets to aid in therapies for traumatic memories. It will be interesting to follow up by examining the impact of varying amounts of time between acquisition and counterconditioning phases, to enhance replicability to real-world therapeutic settings.

      Major strengths

      · This paper is very well written and attempts to comprehensively assess multiple aspects of counterconditioning and extinction processes. For instance, the addition of memory retrieval tests is not core to the primary hypotheses but provides additional mechanistic information on how episodic memory is impacted by counterconditioning. This methodical approach is commonly seen in animal literature, but less so in human studies.

      · The Group x Cs-type x Phase repeated measure statistical tests with 'differentials' as outcome variables are quite complex, however, the authors have generally done a good job of teasing out significant F test findings with post hoc tests and presenting the data well visually. It is reassuring that there is a convergence between self-report data on arousal and valence and the pupil dilation response. Skin conductance is a notoriously challenging modality, so it is not too concerning that this was placed in the supplementary materials. Neural responses also occurred in logical regions with regard to reward learning.

      · Strong methodology with regards to neuroimaging analysis, and physiological measures.

      · The authors are very clear on documenting where there were discrepancies from their pre-registration and providing valid rationales for why.

      We thank reviewer 1 for the positive feedback and for pointing out the strengths of our work. We agree that future research should investigate varying times between acquisition and counterconditioning to assess its success in real-life applications.

      Major Weaknesses

      (1) The statistics showing that counterconditioning prevents differential spontaneous recovery are the weakest p values of the paper (and using one-tailed tests, although this is valid due to directions being pre-hypothesized). This may be due to a relatively small number of participants and some variability in responses. It is difficult to see how many people were included in the final PDR and neuroimaging analyses, with exclusions not clearly documented. Based on Figure 3, there are relatively small numbers in the PDR analyses (n=14 and n=12 in counterconditioning and extinction, respectively). Of these, each group had 4 people with differential PDR results in the opposing direction to the group mean. This perhaps warrants mention as the reported effects may not hold in a subgroup of individuals, which could have clinical implications.

      General exclusion criteria are described on page 17. We have added more detailed information on the reasons for exclusion (see page 17). All exclusions were in line with pre-registered criteria. For the analysis the reviewer is referring to (the PDR analysis that investigated whether CC can prevent the spontaneous recovery of differential conditioned threat responses), 18 participants were excluded: 2 participants did not show evidence for successful threat acquisition, as was already indicated on page 17, and 16 participants were excluded due to (partially) missing data. We now explicitly mention the exclusion of the additional 16 participants on page 7 and have updated Figure 3 to improve visibility of the individual data points. Therefore, for this analysis both experimental groups consisted of 15 participants (total N=30).

      It is true that in both groups a few participants show the opposite pattern. Although this may also be due to measurement error, we agree that it is relevant to further investigate this in future studies with larger sample sizes. It will be crucial to identify who will respond to treatments based on the principles of standard extinction or counterconditioning. We have added this point in the discussion on page 14.

      Reviewer #2:

      Summary:

      The present study sets out to examine the impact of counterconditioning (CC) and extinction on conditioned threat responses in humans, particularly looking at neural mechanisms involved in threat memory suppression. By combining behavioral, physiological, and neuroimaging (fMRI) data, the authors aim to provide a clear picture of how CC might engage unique neural circuits and coding dynamics, potentially offering a more robust reduction in threat responses compared to traditional extinction.

      Strengths:

      One major strength of this work lies in its thoughtful and unique design - integrating subjective, physiological, and neuroimaging measures to capture the various aspects of counterconditioning (CC) in humans. Additionally, the study is centered on a well-motivated hypothesis and the findings have the potential to improve the current understanding of pathways associated with emotional and cognitive control. The data presentation is systematic, and the results on behavioral and physiological measures fit well with the hypothesized outcomes. The neuroimaging results also provide strong support for distinct neural mechanisms underlying CC versus extinction.

      We thank reviewer 2 for the feedback and for valuing the thoughtfulness that went into designing the study.

      Weaknesses:

      (1) Overall, this study is a well-conducted and thought-provoking investigation into counterconditioning, with strong potential to advance our understanding of threat modulation mechanisms. Two main weaknesses concern the scope and decisions regarding analysis choices. First, while the findings are solid, the topic of counterconditioning is relatively niche and may have limited appeal to a broader audience. Expanding the discussion to connect counterconditioning more explicitly to widely studied frameworks in emotional regulation or cognitive control would enhance the paper's accessibility and relevance to a wider range of readers. This broader framing could also underscore the generalizability and broader significance of the results. In addition, detailed steps in the statistical procedures and analysis parameters seem to be missing. This makes it challenging for readers to interpret the results in light of potential limitations given the data modality and/or analysis choices.

      In this updated version of the manuscript, we included the notion that extinction has been interpreted as a form of implicit emotion regulation. In addition to our discussion on active coping (avoidance), we believe that our discussion has an important link to the more general framework of emotion regulation, while remaining within the scope of relevance. Please see pages 14 and 15 for the changes. In addition to being informative to theories of emotion regulation, our findings are also highly relevant for forms of psychotherapy that build on principles of counterconditioning (e.g. the use of positive reinforcement in cognitive behavioral therapy), as we point out in the introduction. We believe this relevance shows that counterconditioning is more than a niche topic. In line with the recommendation from reviewer 2, we added more details and explanations to the statistical procedures and analyses where needed (see responses to recommendations).

      Reviewer #3:

      Summary:

      In this manuscript, Wirz et al use neuroimaging (fMRI) to show that counterconditioning produces a longer lasting reduction in fear conditioning relative to extinction and appears to rely on the nucleus accumbens rather than the ventromedial prefrontal cortex. These important findings are supported by convincing evidence and will be of interest to researchers across multiple subfields, including neuroscientists, cognitive theory researchers, and clinicians.

      In large part, the authors achieved their aims of giving a qualitative assessment of the behavioural mechanisms of counterconditioning versus extinction, as well as investigating the brain mechanisms. The results support their conclusions and give interesting insights into the psychological and neurobiological mechanisms of the processes that underlie the unlearning, or counteracting, of threat conditioning.

      Strengths:

      · Mostly clearly written with interesting psychological insights

      · Excellent behavioural design, well-controlled and tests for a number of different psychological phenomena (e.g. extinction, recovery, reinstatement, etc).

      · Very interesting results regarding the neural mechanisms of each process.

      · Good acknowledgement of the limitations of the study.

      We thank reviewer 3 for the detailed feedback and suggestions.

      Weaknesses:

      (1) I think the acquisition data belongs in the main figure, so the reader can discern whether or not there are directional differences prior to CC and extinction training that could account for the differences observed. This is particularly important for the valence data which appears to differ at baseline (supplemental figure 2C).

      Since our design is quite complex with a lot of results, we left the fear acquisition results as a successful manipulation check in the Supplementary Information to not overload the reader with information that is not the main focus of this manuscript. If the editor would like us to add the figure to the main text, we are happy to do so. During fear acquisition, both experimental groups showed comparable differential conditioned threat responses as measured by PDRs and SCRs. Subjective valence ratings indeed differed depending on CS category. Importantly, however, the groups only differed with respect to their rating to the CS- category, but not the CS+ category, which suggests that the strength of the acquired fear is similar between the groups. To make sure that these baseline differences cannot account for the differences in valence after CC/Ext, we ran an additional group comparison with differential valence ratings after fear acquisition added as a covariate. Results show that despite the baseline difference, the group difference in valence after CC/Ext is still significant (main effect Group: F<sub>(1,43)</sub>=7.364, p=0.010, η<sup>2</sup>=0.146). We have added this analysis to the manuscript (see page 7).
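
      As a quick consistency check on the covariate analysis reported above, the effect size can be recovered from the F statistic and its degrees of freedom. The sketch below assumes that the reported η<sup>2</sup> is a partial eta squared, which the response does not state explicitly; the function name is illustrative only.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom.

    Assumption: the effect size reported in the text is partial eta squared,
    for which eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values reported in the response for the main effect of Group: F(1,43) = 7.364
eta_sq = partial_eta_squared(7.364, 1, 43)
print(round(eta_sq, 3))  # 0.146
```

      Under that assumption, the recomputed value agrees with the reported η<sup>2</sup>=0.146.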

      (2) I was confused in several sections about the chronology of what was done and when. For instance, it appears that individuals went through re-extinction, but this is just called extinction in places.

      We understand that the complexity of the design may require a clearer description. We therefore made some changes throughout the manuscript to improve understanding. Figure 1 is very helpful in understanding the design and we therefore refer to that figure more regularly (see pages 6-7). We also added the time between tasks where appropriate (e.g. see page 7). Re-extinction after reinstatement was indeed mentioned once in the manuscript. Given that the reinstatement procedure was not successful (see page 9), we could not investigate re-extinction and it is therefore indeed not relevant to explicitly mention and may cause confusion. We therefore removed it (see page 12).

      (3) I was also confused about the data in Figure 3. It appears that the CC group maintained differential pupil dilation during CC, whereas extinction participants didn't, and the authors suggest that this is indicative of the anticipation of reward. Do reward-associated cues typically cause pupil dilation? Is this a general arousal response? If so, does this mean that the CSs become equally arousing over time for the CC group whereas the opposite occurs for the extinction group (i.e. Figure 3, bottom graphs)? It is then further confusing as to why the CC group lose differential responding on the spontaneous recovery test. I'm not sure this was adequately addressed.

      Indeed, reward and reward anticipation also evoke an increase in pupil dilation. This was an important reason for including a separate valence-specific response characterization task. Independently from the conditioning task, this task revealed that both threat and reward anticipation induced strong arousal-related PDRs and SCRs. This was also reflected in the explicit arousal ratings, which were stronger for both the shock-reinforced (negative valence) and reward-reinforced (positive valence) stimuli. Therefore, it is not surprising that reward anticipation leads to stronger PDRs for CS+ (which predict reward) compared to CS- stimuli (which do not predict reward) during CC, whereas the differential PDR is reduced during extinction due to a decrease in shock anticipation. During the spontaneous recovery test, a return of stronger PDRs for CS+ compared to CS- stimuli in the standard extinction group can only reflect a return of shock anticipation. Importantly, the CC group received no rewards during the spontaneous recovery task and was aware of this, so it is to be expected that the effect is weakened in the CC group. However, CS+ and CS- items were still rated as having similar valence and PDRs did not differ between CS+ and CS- items in the CC group, whereas the Ext group rated the CS+ significantly more negatively and threat responses to the CS+ did return. It is therefore reasonable to conclude that associating the CS+ with reward helps to prevent a return of threat responses. We have added some clarifications and conclusions to this section on page 8.

      (4) I am not sure that the memories tested were truly episodic

      In line with previous publications from Dunsmoor et al.[1-4], our task allows for the investigation of memory for elements of a specific episode. In the example of our task, retrieval of a picture probes retrieval of the specific episode in which the picture was presented. In contrast, fear retrieval relies on the retrieval of the category-threat association, which does not rely on retrieval of these specific episodic elements, but could be semantic in nature, as retrieval takes place at a conceptual level. We have added a small note on what we mean by episodic in this context on page 4. We do agree that we cannot investigate other aspects of episodic memories here, such as context, as this was not manipulated in this experiment.

      (5) Twice as many female participants as males

      It is indeed unfortunate that there is no equal distribution between female and male participants. Investigating sex differences was not the goal of this study, but we do hope that future studies with the appropriate sample sizes are able to investigate this specifically. We have added this to the limitations of this study on page 17.

      (6) No explanation as to why shocks were varied in intensity and how (pseudo-randomly?)

      The shock determination procedure is explained on pages 18-19 (Peripheral stimulation). As is common in fear conditioning studies in humans (see references), an ascending staircase procedure was used. The goal of this procedure is to try and equalize the subjective experience of the electrical shocks to be “maximally uncomfortable but not painful”.
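
      For readers unfamiliar with this kind of calibration, the general logic of an ascending staircase can be sketched as follows. This is an illustration only, not the authors' protocol: the rating scale (1 to 5, where 4 stands for “maximally uncomfortable but not painful” and 5 for painful), the starting level, step size, and ceiling are all hypothetical values chosen for the example.

```python
from typing import Callable, Optional


def ascending_staircase(
    rate: Callable[[float], int],
    start: float = 1.0,      # hypothetical starting intensity (arbitrary units)
    step: float = 0.5,       # hypothetical increment per step
    ceiling: float = 10.0,   # hypothetical safety ceiling
    target: int = 4,         # hypothetical rating: "maximally uncomfortable but not painful"
) -> Optional[float]:
    """Raise intensity stepwise until the participant's rating reaches the target.

    Returns the highest intensity that was rated at or below the target,
    or None if even the starting intensity was rated above the target.
    """
    accepted: Optional[float] = None
    intensity = start
    while intensity <= ceiling:
        rating = rate(intensity)
        if rating > target:
            # Rated as painful: fall back to the last tolerated intensity.
            return accepted
        accepted = intensity
        if rating == target:
            return accepted
        intensity += step
    return accepted


def simulated_participant(intensity: float) -> int:
    """Toy participant whose rating rises by one point per unit of intensity."""
    return min(5, 1 + int(intensity // 1))


calibrated = ascending_staircase(simulated_participant)
print(calibrated)
```

      Because each participant stops at their own subjective target, the resulting intensities differ across individuals while the subjective experience is intended to be comparable.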

      Recommendations for the authors:

      Reviewer #1:

      Very well written. No additional comments

      We thank reviewer 1 for valuing our original manuscript version. To further improve the manuscript, we adapted the current version based on the reviewer’s public review (see response to reviewer #1 public review comment 1).

      Reviewer #2:

      (1) I feel that more justification/explanation is needed on why other regions highly relevant to different aspects of counterconditioning (e.g., threat, memory, reward processing) were not included in the analyses.

      We first performed whole-brain analyses to get a general idea of the different neural mechanisms of CC compared to Ext. Clusters revealing significant group differences were then further investigated by means of preregistered ROI analyses. We included regions that have previously been shown to be most relevant for affective processing/threat responding (amygdala), memory (hippocampus), reward processing (NAcc) and regular extinction (vmPFC). We restricted our analyses to these most relevant ROIs as preregistered to prevent inflated or false-positive findings[5]. Beyond these preregistered ROIs, we applied appropriate whole-brain FWE corrections. The activated regions are listed in Supplementary Table 1 and include additional regions that were expected, such as the ACC and insula.

      (2) Were there observed differences across participants in the experiment? Any information on variance in the data such as how individual differences might influence these findings would provide a richer understanding of counterconditioning and increase the depth of interpretation for a broad readership.

      We agree that investigating individual differences is crucial to gain a better understanding of treatment efficacy in the framework of personalized medicine. Specifically, future research should aim to identify factors that help predict which treatment will be most effective for a particular patient. The results of this study provide a good basis for this, as we could show that, in contrast to regular extinction, CC does not require the vmPFC to improve the retention of safety memory. CC may therefore provide a viable option for patients who are not responding to treatments that rely on the vmPFC. In addition, as noted by Reviewer 1, in both groups a few participants show the opposite pattern (see Figure 3). It will be crucial to identify who will respond to treatments based on the principles of standard extinction or counterconditioning. We have added this point in the discussion on page 14.

      (3) While most figures are informative and clear, Figure 3 would benefit from detailed axis labels and a more descriptive caption. Currently, it is challenging to navigate the results presented to support the findings related to differential PDRs. A supplementary figure consolidating key patterns across conditions might also further facilitate understanding of this rather complicated result.

      We have made some changes to the figure to improve readability and understanding. Specifically, we changed the figure caption to “Change from last 2 trials CC/Ext to first 2 trials Spontaneous recovery test”, to give more details on what exactly is shown here. We also simplified the x-axis labels to “counterconditioning”, “recovery test” and “extinction”. With the addition of a clearer figure description, we hope to have improved understanding and do not think that another supplemental figure is needed.
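
      To make the quantity described in the revised caption concrete, a minimal sketch of the change score for one participant is given below. It assumes that the differential response is the mean CS+ response minus the mean CS- response over the relevant trials; the data layout, function names, and numbers are hypothetical and serve only as illustration.

```python
from statistics import mean
from typing import Dict, List

Trials = Dict[str, List[float]]  # keys "cs_plus" and "cs_minus", one value per trial


def differential(cs_plus: List[float], cs_minus: List[float]) -> float:
    """Differential response: mean CS+ response minus mean CS- response."""
    return mean(cs_plus) - mean(cs_minus)


def recovery_change(phase: Trials, test: Trials, n_trials: int = 2) -> float:
    """Change in differential response from the last trials of CC/Ext
    to the first trials of the spontaneous recovery test."""
    late = differential(phase["cs_plus"][-n_trials:], phase["cs_minus"][-n_trials:])
    early = differential(test["cs_plus"][:n_trials], test["cs_minus"][:n_trials])
    return early - late


# Hypothetical pupil dilation responses (arbitrary units) for one participant
phase_data = {"cs_plus": [0.5, 0.4, 0.3, 0.2], "cs_minus": [0.2, 0.2, 0.2, 0.2]}
test_data = {"cs_plus": [0.6, 0.4, 0.3], "cs_minus": [0.2, 0.2, 0.1]}

change = recovery_change(phase_data, test_data)
print(round(change, 2))
```

      A positive change score indicates that the differential response was larger at the start of the recovery test than at the end of CC/Ext, and a value near zero indicates no return of differential responding.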

      (4) Additional details on the statistical tests are needed. For example, please clarify whether p-values reported were corrected across all experimental conditions. Also, it would be helpful for the authors to discuss why for example repeated measures ANOVA or mixed-effects conditions were not used in this study. Might those tests not capture variance across participants' PDRs and SCRs over time better?

      We added that significant interactions were followed by Bonferroni-adjusted post-hoc tests where applicable (see page 21). We have used repeated measures ANOVAs to capture early versus late phases of acquisition and CC/extinction, as well as to compare late CC/extinction (last 2 trials) with early spontaneous recovery (first 2 trials), as is often done in the literature. A trial-level factor in a small sample would cost too many degrees of freedom and is not expected to provide more information. We have added this information and our reasoning to the methods section on page 21.
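
      As a minimal sketch of what a Bonferroni adjustment does, assuming the simplest form of the correction (each raw p-value multiplied by the number of comparisons and capped at 1), the following may be helpful. The function name and the p-values are hypothetical and are not taken from the study.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p-values.

    Each raw p-value is multiplied by the number of comparisons
    and capped at 1.0. Input values here are purely illustrative.
    """
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values from three post-hoc comparisons.
raw_p = [0.004, 0.020, 0.300]
adjusted_p = bonferroni_adjust(raw_p)

for raw, adj in zip(raw_p, adjusted_p):
    print(f"raw p = {raw:.3f} -> adjusted p = {adj:.3f}")
```

      An adjusted value below the chosen alpha level (for example 0.05) would then be treated as significant after correction.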

      Reviewer #3:

      (1) Suggest putting acquisition data into the main figures. In fact many of the supplemental figures could be integrated into the main figures in my opinion.

      See response to reviewer #3 public review comment 1.

      (2) Include explanations for why shock intensity was varied

      See response to reviewer #3 public review comment 6.

      (3) Include a better explanation for the change in differential responding from training to spontaneous recovery in the CC group (I think the loss of such responding in extinction makes more sense and is supported by the notion of spontaneous recovery, but I'm not sure about the loss in the CC group. There is some evidence from the rodent literature - which I am most familiar with - regarding a loss in contextual gradient across time which could account for some loss in specificity, could it be something like this?).

      See response to reviewer #3 public review comment 3.

      If we understand the reviewer correctly, a loss of differential responding due to generalization to the CS- would imply an increase in responding to the CS-, which is not what we see. Our data should therefore be interpreted as a loss of the specific response to the CS+ from the CC phase to the recovery test. Accordingly, there is no spontaneous recovery in the CC group, and also no non-specific recovery. To clarify this, we relabeled Figure 3 by indicating “recovery test” instead of “spontaneous recovery”.

      (4) Is there a possibility that baseline differences, particularly that in Supplemental Figure 2C, could account for later differences? If differences persist after some transformation (e.g. percentage of baseline responding) this would be convincing to suggest that it doesn't.

      See response to reviewer #3 public review comment 1.

      (5) As I mentioned, I got confused by the chronology as I read through. Maybe mention early on when reporting the spontaneous recovery results that testing occurred the next day and that participants were undergoing re-extinction when talking about it for the second time.

      See response to reviewer #3 public review comment 2.

      (6) Page 8 - I was confused as to why it is surprising that the CC group were more aroused than the extinction group, the latter have not had CSs paired with anything with any valence, so doesn't this make sense? Or perhaps I am misunderstanding the results - here in text the authors refer back to Figure 2B, but I'm not sure if this is showing data from the spontaneous recovery test or from CC/extinction. If it is the latter, as the caption suggests, why are the authors referring to it here?

      Participants in the CC group showed increased differential self-reported arousal after CC, whereas arousal ratings did not differ between CS+ and CS- items after extinction. We interpret this in line with the valence and PDR results as an indication of reward-induced arousal. At the start of the next day, however, participants from the CC and extinction groups gave comparable ratings. It may therefore be surprising that participants in the CC group no longer show stronger ratings, since nothing happened between these two ratings besides a night’s sleep (see design overview in Figure 1A). We removed the word “surprisingly” to prevent any confusion.

      (7) I suggest that the authors comment on whether there were any gender differences in their results.

      See response to reviewer #3 public review comment 5.

      (8) The study makes several claims about episodic memory, but how can the authors be sure that the memories they are tapping into are episodic? Episodic has a very specific meaning - a biographical, contextually-based memory, whereas the information being encoded here could be semantic. Perhaps a bit of clarification around this issue could be helpful.

      See response to reviewer #3 public review comment 4.

      References

      (1) Dunsmoor, J. E. & Kroes, M. C. W. Episodic memory and Pavlovian conditioning: ships passing in the night. Curr Opin Behav Sci 26, 32-39 (2019). https://doi.org/10.1016/j.cobeha.2018.09.019

      (2) Dunsmoor, J. E. et al. Event segmentation protects emotional memories from competing experiences encoded close in time. Nature Human Behaviour 2, 291-299 (2018). https://doi.org/10.1038/s41562-018-0317-4

      (3) Dunsmoor, J. E., Murty, V. P., Clewett, D., Phelps, E. A. & Davachi, L. Tag and capture: how salient experiences target and rescue nearby events in memory. Trends Cogn Sci 26, 782-795 (2022). https://doi.org/10.1016/j.tics.2022.06.009

      (4) Dunsmoor, J. E., Murty, V. P., Davachi, L. & Phelps, E. A. Emotional learning selectively and retroactively strengthens memories for related events. Nature 520, 345-348 (2015). https://doi.org/10.1038/nature14106

      (5) Gentili, C., Cecchetti, L., Handjaras, G., Lettieri, G. & Cristea, I. A. The case for preregistering all region of interest (ROI) analyses in neuroimaging research. Eur J Neurosci 53, 357-361 (2021). https://doi.org/10.1111/ejn.14954

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This study provides useful findings about the effects of heterozygosity for Trio variants linked to neurodevelopmental and psychiatric disorders in mice. However, the strength of the evidence is limited and incomplete mainly because the experimental flow is difficult to follow, raising concerns about the conclusions' robustness. Clearer connections between variables, such as sex, age, behavior, brain regions, and synaptic measures, and more methodological detail on breeding strategies, test timelines, electrophysiology, and analysis, are needed to support their claims.

      We appreciate the opportunity to address the constructive feedback provided by eLife and the reviewers. Below, we respond to the overall assessment and individual reviewers' comments, clarifying our experimental approach, addressing concerns, and providing additional details where necessary.

      We thank the editors for highlighting the significance of our findings regarding the effects of Trio variant heterozygosity in mice. We acknowledge the feedback concerning the experimental flow and agree that clarity is paramount. To address these concerns:

      (1) Connections between variables: The word limit of the initial submission constrained our ability to provide adequate details and connections between variables. We have revised the manuscript to explicitly outline and extend explanations and the relationships between sex, age, behavior, brain regions, and synaptic measures, ensuring that the rationale for each experiment and its relevance to the overall conclusions are improved.

      (2) Methodological details: The Methods section of our initial submission was condensed, with key details provided in the Supplemental Methods section. We have merged all into an extended section to improve clarity. We have expanded our description of breeding strategies, test timelines, electrophysiological protocols, and data analysis methods in the revised Methods section. We believe the additions have enhanced the transparency and reproducibility of our study and ensured full support of our conclusions.

      (3) Experimental flow: We have revised and extended our results, methods, and discussion sections to clarify the experimental design and to guide readers through the experimental sequence and rationale.

      We are confident these revisions address the concerns raised and enhance the robustness and coherence of our findings.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study explores how heterozygosity for specific neurodevelopmental disorder-associated Trio variants affects mouse behavior, brain structure, and synaptic function, revealing distinct impacts on motor, social, and cognitive behaviors linked to clinical phenotypes. Findings demonstrate that Trio variants yield unique changes in synaptic plasticity and glutamate release, highlighting Trio's critical role in presynaptic function and the importance of examining variant heterozygosity in vivo.

      Strengths:

      This study generated multiple mouse lines to model each Trio variant, reflecting point mutations observed in human patients with developmental disorders. The authors employed various approaches to evaluate the resulting behavioral, neuronal morphology, synaptic function, and proteomic phenotypes.

      Weaknesses:

      While the authors present extensive results, the flow of experiments is challenging to follow, raising concerns about the strength of the experimental conclusions. Additionally, the connection between sex, age, behavioral data, brain regions, synaptic transmission, and plasticity lacks clarity, making it difficult to understand the rationale behind each experiment. Clearer explanations of the purpose and connections between experiments are recommended. Furthermore, the methodology requires more detail, particularly regarding mouse breeding strategies, timelines for behavioral tests, electrophysiology conditions, and data analysis procedures.

      We appreciate the reviewer’s recognition of the novelty and comprehensiveness of our approach, particularly the generation of multiple mouse lines and our efforts to model Trio variant effects in vivo.

      Weaknesses

      (1) Experimental flow and rationale and connection between variables: We have expanded on the connections between behavioral data, neuronal morphology, synaptic function, and proteomics in the Results and Discussion sections to clarify how each experiment informs the reasoning and the conclusions and to highlight the relationships between sex, age, behavior, and synaptic measures.

      (2) Methodological details: Our initial Methods section was formatted to be short to fulfill word limits on the submitted version, with additional details provided in the Supplemental Methods section. We have merged our Methods and Supplemental Methods sections and expanded on our breeding strategies, test timelines, electrophysiological protocols, and data analysis. We believe these additions enhance the transparency and reproducibility of our study.

      (3) Recommendations for the authors: We thank Reviewer #1 for providing several recommendations to improve our manuscript. We have addressed their comments in the revision, as detailed below, adding key experiments that bolster our findings.

      Reviewer #2 (Public review):

      Summary:

      The authors generated three mouse lines harboring ASD, Schizophrenia, and Bipolar-associated variants in the TRIO gene. Anatomical, behavioral, physiological, and biochemical assays were deployed to compare and contrast the impact of these mutations in these animals. In this undertaking, the authors sought to identify and characterize the cellular and molecular mechanisms responsible for ASD, Schizophrenia, and Bipolar disorder development.

      Strengths:

      The establishment of TRIO dysfunction in the development of ASD, Schizophrenia, and Bipolar disorder is very recent and of great interest. Disorder-specific variants have been identified in the TRIO gene, and this study is the first to compare and contrast the impact of these variants in vivo in preclinical models. The impact of these mutations was carefully examined using an impressive host of methods. The authors achieved their goal of identifying behavioral, physiological, and molecular alterations that are disorder/variant specific. The impact of this work is extremely high given the growing appreciation of TRIO dysfunction in a large number of brain-related disorders. This work is very interesting in that it begins to identify the unique and subtle ways brain function is altered in ASD, Schizophrenia, and Bipolar disorder.

      Weaknesses:

      (1) Most assays were performed in older animals and perhaps only capture alterations that result from homeostatic changes resulting from prodromal pathology that may look very different.

      (2) Identification of upregulated (potentially compensating) genes in response to these disorder-specific Trio variants is extremely interesting. However, a functional demonstration of compensation is not provided.

      (3) There are instances where data is not shown in the manuscript. See "data not shown". All data collected should be provided even if significant differences are not observed.

      I consider weaknesses 1 and 2 minor. While they would be very interesting to explore, these experiments might be more appropriate for a follow-up study. I would recommend that the missing data in 3 should be provided in the supplemental material.

      We are grateful for the reviewer’s recognition of our study’s significance and methodological rigor. The acknowledgment of Trio dysfunction as a novel and impactful area of research is deeply appreciated.

      Weaknesses:

      We agree that focusing on older animals limits insights into early-stage pathophysiology. However, our goal in this study was to examine the functional impacts of Trio heterozygosity at an adolescent stage and to reveal the ultimate impact of these alleles on synaptic function. Our choice of age aligns with our objectives. Future studies of earlier developmental stages will be beneficial and complement these findings.

      Functional compensation:

      We tested functional compensation through rescue experiments in +/K1431M brain slices using a Rac1-specific inhibitor, NSC23766, which prevents Rac1 activation by Trio or Tiam1. Our finding that direct Rac1 inhibition normalizes deficient neurotransmitter release in +/K1431M mice strongly suggests that increased Rac1 activity drives this phenotype.

      Data not shown:

      We will incorporate all data previously marked as "data not shown" into the Supplemental Materials, even when results are nonsignificant. We agree that this ensures full transparency and facilitates a more comprehensive evaluation of our findings.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 1K-N, the lack of observed differences in +/M2145T mice across all tests raises questions about its validity as a BPD model. Furthermore, the differences in female behavior data compared to males, as shown in the Supplemental section, lack clarification-specifically, whether these variations are due to sex differences or sample size disparities, which is not discussed. Additionally, it's unclear if the same mice were used in tests K through L-N, as the reported numbers differ without explanation; if relevant, any mortality should be reported. Given the observed body weight differences, it is important to display locomotor data, despite the mention of no change in open field results. Lastly, a detailed breeding strategy and timeline for behavioral testing would enhance clarity.

      We thank Reviewer 1 for identifying these confusing points in our behavioral data. We have clarified them in our revision as follows:

      (a) We have revised the text to emphasize our goal to evaluate the impact of NDD-related Trio alleles that have discrete and measurable effects on brain development and function, and not to model specific NDDs (e.g. ASD, SCZ, or BPD). The three specific Trio mutations were chosen based on strong evidence of these mutations impairing the biochemical functions of Trio. We reasoned our approach would reveal how impairing Trio in different ways – i.e. altering protein level or GEF1/GEF2 function – and under genetic conditions (heterozygosity) that mimic those found in individuals with Trio-related disorders impacts brain development and function. The lack of behavioral phenotypes in +/M2145T mice is indeed intriguing, especially given the alterations in electrophysiology and biochemistry experiments. It remains possible that further behavioral analyses of these mice will reveal behavioral phenotypes.

      (b) Given that the prevalence and clinical presentation of individuals with various NDDs are influenced by sex, it is possible that the behavioral differences we see in male versus female Trio variant mice reflect human sex difference phenotypes. We have reorganized the Figure panels to clarify these sex differences in behaviors (new Fig. 2, Supp. Fig. 2). We focused on the most significant behavioral phenotypes shared by both sexes in the main text, or in males alone, as our anatomical and electrophysiological experiments were restricted to males to reduce variation due to estrus. The observed behavioral sex differences are not likely due to sample size disparities as power analyses were performed for all experimental results to ensure adequate sample size. A comprehensive study of the mechanisms underlying these behavioral findings merits examination but is outside the scope of this study.

      (c) All mice were subjected to all behavioral tests described. No sudden mortality was observed during the behavioral experiments. Outliers in post-hoc statistical analyses were removed, which explains the apparent sample size differences between behavioral tests. We have revised the Data analysis section in our Methods to include these details (Lines 216-289, 450-457).

      (d) Results of the open field test have been added to the Supplemental Data (new Supp. Fig. 2) and Results (Lines 532-537).

      (e) The Methods section was expanded to include more detail on the breeding strategy (Lines 98-106). A timeline for behavioral testing has also been included in the Figures to enhance clarity (new Fig. 2A).

      (2) In Figure 2A-E, head width and brain weight showed significant differences, but not body weight, how come the ratio does not change? Comparing with female results in Supplementary Figure 2A-E, it does show a difference between males and females. It is essential to clarify which sex authors use in all follow-up experiments, including synapse, transmission, and plasticity. Since the males and females have different phenotypes, why do the authors focus on males only? The E plot has no data points on the bar graph. In Figure 2I, it lacks example images for all four conditions.

      We greatly appreciate this Reviewer’s attention to details in our brain and body weight data and revised the manuscript to address these concerns.

      (a) The ratios of head width/body weight were calculated for each individual mouse. Hence the distribution of the ratio data (old Fig. 2D; new Fig. 3D) differs from the distribution of head width or body weight data alone (old Fig. 2A, 2C, resp.; now Fig. 3A, 3C), which can affect the p-value for statistical significance. The body weight of +/M2145T males is 21.217 ± 0.327 g, while that of WT males is 21.745 ± 0.224 g, a non-significant decrease of 0.528 g (adjusted p=0.3806). These values have been added to the Fig. 3 legend (Lines 1020-1034) for clarity.

      (b) Similar to the behavioral experiments in comment (1), we observed sex differences in head width, brain weight, and body weight in Trio heterozygous variant mice compared to WT counterparts. The differences in the ratios of head width/body weight or brain weight/body weight were the same for both males and females (i.e. head width/body weight ratio is decreased in +/K1431M mice compared to WT regardless of sex, and brain weight/body weight ratio is decreased in both +/K1431M and +/K1918X mice compared to WT regardless of sex). These findings affirm the impact of Trio mutations on these phenotypes across both sexes. We have modified the text to draw more attention to this key point (Lines 554-566 and 777-801).

      (c) All experiments (excluding behavior and weight data) were performed in males only to minimize the variation in spine and synapse morphology and physiological activity that can occur due to estrus. We have clarified this in the ‘Animal Work’ section of the Methods (Lines 103-106) as well as in the Figure Legends.

      (d) We thank the Reviewer for pointing out Fig. 3E lacks individual data points on the bar graph. Fig. 3E has been modified to now include the brain weight/body weight ratio for each individual mouse rather than across the population, to be consistent with the calculation of head width/body weight ratio (see point 2a).

      On original submission, only a representative WT image was selected due to space constraints. The figure (new Fig. 3H and 3K) and figure legend have been revised to include representative traces for all genotypes examined.
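
      The point in (a), that per-mouse ratios form their own distribution, can be illustrated with a minimal sketch. All measurements below are hypothetical values chosen for illustration, not data from the study, and the helper name is our own.

```python
from statistics import mean


def per_mouse_ratios(numerators, denominators):
    """Compute one ratio per animal from paired measurements."""
    if len(numerators) != len(denominators):
        raise ValueError("measurements must be paired per mouse")
    return [n / d for n, d in zip(numerators, denominators)]


# Hypothetical paired measurements for four mice.
head_width_mm = [10.0, 10.4, 9.8, 10.2]
body_weight_g = [20.0, 23.0, 19.0, 22.0]

ratios = per_mouse_ratios(head_width_mm, body_weight_g)

# The mean of per-mouse ratios generally differs from the ratio of
# the group means, and the spread of the ratios depends on how the
# two measurements co-vary within each animal.
mean_of_ratios = mean(ratios)
ratio_of_means = mean(head_width_mm) / mean(body_weight_g)

print(f"mean of per-mouse ratios: {mean_of_ratios:.4f}")
print(f"ratio of group means:     {ratio_of_means:.4f}")
```

      Because each ratio combines two measurements from the same animal, a group difference in the ratio need not mirror the group differences in either measurement alone.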

      (3) In lines 315-320, "None of the Trio variant heterozygotes exhibited altered dendritic spine density on M1 L5 pyramidal neurons compared to WT mice on either apical or basal arbors (Supplementary Figure 3L, M). Electron microscopy of cortical area M1 L5 revealed that synapse density was significantly increased in +/K1918X mice compared to WT (Figure 3A, B), possibly due to a net reduction in neuropil resulting from smaller dendritic arbors." The proposed explanation does not adequately address the observed discrepancy between spine density and synapse density reported in these two experiments. A more thorough analysis is needed to reconcile these conflicting findings and clarify how these distinct measurements may relate to each other in the context of the study's conclusions.

      We acknowledge the apparent discrepancy between our dendritic spine density data, which is unchanged from WT for all three Trio variant heterozygotes, and our synapse density data, which showed an increase in +/K1918X M1 L5 compared to WT. We have expanded the explanation for this discrepancy below and added this to the Discussion (Lines 802-811):

      a) Because spine density can vary by dendritic branch order and distance from the soma, only protrusions from secondary dendritic arbors of M1 L5 pyramidal neurons were quantified for consistency in analyses. However, all synapses meeting criteria were quantified in EM images, regardless of where they were located along an individual neuron’s arbors. It is possible that the density and distribution of spines along other arbors differ between genotypes but were not captured in our current data.

      b) +/K1918X L5 pyramidal neurons are smaller and less complex than WT neurons, especially in the basal compartment corresponding to L5 where EM images were obtained, consistent with the smaller brain size and reduced cortical thickness of +/K1918X mice. We posit that, due to their smaller dendritic field size, L5 neurons pack more densely, contributing to the increased synapse density observed in +/K1918X M1 L5 cortex. Consistent with this hypothesis, we observed a trend toward increased DAPI+ cell density in M1 L5 of +/K1918X mice (Supp. Fig. 3N).

      (4) In Figure 4, one potential rationale for measuring AMPAR mEPSC frequency is to infer synapse density changes. However, the findings show no frequency change in +/K1431M and +/K1918X, with an increase only in +/M2145T, which contradicts Figure 3 results indicating a trend toward increased density across variants.

      This inconsistency is confusing, especially since the authors claim to follow the methodology from the study "Trio Haploinsufficiency Causes Neurodevelopmental Disease-Associated Deficits"; yet, the observed mEPSC amplitude differs significantly from that study, while the frequency remains unaffected. Additionally, the NMDAR mEPSCs reflect combined AMPAR and NMDAR responses at positive holding potentials, with peak amplitude dominated by AMPAR. This inconsistency between holding potential results is unclear, as frequency should theoretically align across negative and positive potentials. For accurate NMDAR mEPSC measurement, it would be optimal to assess amplitude 50 ms post-initial peak and, if possible, increase the holding potential to enhance the driving force given the typically low signal of NMDAR response.

      We thank the Reviewer for highlighting these important points.

      a) Previous work from our lab and others demonstrates that Trio regulates synaptic AMPA receptor levels, which is why we chose to focus on AMPAR-mediated evoked and miniature EPSC frequencies and amplitudes in the current study. We acknowledge Reviewer 1’s comment on seemingly contradictory results regarding AMPAR mEPSC frequency and synapse density; however, the unchanged AMPAR mEPSC frequency in +/K1431M and +/K1918X mice is consistent with our finding of unaltered dendritic spine density in these mice compared to WT (Supp. Fig. 4L,M). The difference between dendritic spine counts and synapse density is addressed in Response (3) above.

      b) While synapse density changes can be inferred from AMPAR mEPSC frequency, mEPSC frequency also reflects changes in spontaneous neurotransmitter release, especially in the absence of changes in synapse number. Notably, the increased mEPSC frequency in the +/M2145T variant is linked to enhanced spontaneous release, not to spine or synapse density changes. These findings are reinforced by increased synaptic vesicle counts, changes in the calculated PPR, and estimates of the Pr and RRP from HFS train analysis. We have included these points in the Discussion (Lines 861-863).

      c) While it is tempting to compare the current study to our previously published conditional Trio haploinsufficiency model, we highlight key distinctions that may underlie phenotypic differences between these two mouse models. First, our prior model used a NEX-Cre transgene to ablate one Trio allele from excitatory neurons only beginning at embryonic day 11. In contrast, our Trio variants are expressed in all cell types throughout development, akin to the genetic variants found in individuals with TRIO-related disorders. Second, the Trio variant mice in this study are on a C57BL/6 background, while the Trio haploinsufficient mice were on a mixed 129Sv/J X C57BL/6 background. These differences in the current study may explain why some measures, such as mEPSC amplitude, may not align with those from the Trio conditional haploinsufficiency model.

      d) Recordings were performed using specific inhibitors to isolate AMPA and NMDA mEPSCs; these missing methodological details have now been clarified in the updated Methods section (Lines 353-360).

      (5) In Supplementary Figure 4, the sample traces indicate a higher NMDA/AMPA ratio, raising the question of whether the AMPA EPSC amplitude changes, as this could reflect PSD length. In Figure 4B, the increased AMPAR mEPSC amplitude in the +/K1918X condition compared to WT suggests an enhanced postsynaptic response, yet the PSD length is reduced in Figure 3C. Can the authors provide a potential hypothesis to explain this?

      We appreciate the Reviewer’s feedback. Yes, both evoked and miniature recordings indicate increased AMPAR amplitudes in the +/K1918X variants compared to WT. While PSD length is often linked to synaptic strength, the reduction in PSD length observed by EM in +/K1918X synapses is small (~6% of WT) and clearly does not correlate with significant changes in synaptic strength. We also note that the whole-cell recordings of mEPSCs represent input from all active synapses on the neuron, while PSD length is measured only in synapses in L5.

      (6) In Figure 4, synaptic plasticity appears to decrease to around 50% of baseline; could this reduction be attributed to LTD, or might it result from changes in pipette resistance? Additionally, is the observed potentiation due to changes in presynaptic release probability? Measuring paired-pulse ratio (PPR) before and after induction would clarify this aspect.

      We thank the Reviewer for highlighting these important points.

      a) We used a well-established theta burst stimulation method for LTP induction in M1 L5 pyramidal neurons. This protocol reliably evokes LTP in WT neurons, as shown in Fig. 5J and K. Both +/K1431M and +/K1918X variants exhibit a slight but discernible increase in evoked excitatory postsynaptic currents (eEPSCs), indicative of the initiation of LTP. Although this increase is smaller compared to WT, the presence of potentiation indicates that long-term depression (LTD) is an unlikely explanation for the observed reduction.

      b) To rule out the influence of technical artifacts, pipette resistance was carefully monitored before and after LTP induction. Any cells exhibiting resistance changes exceeding 20% during electrophysiological recordings were excluded from the analysis, ensuring that fluctuations in pipette resistance did not confound LTP measurements. These technical details are denoted in the Methods (Lines 344-346 and 364-366).

      c) The potentiation in the +/M2145T variant may stem from increased release probability (Pr) and greater synaptic vesicle availability, but testing this is beyond the scope of this work. We agree this is an intriguing question, not only for +/M2145T but also for +/K1431M mice. Future studies should address this, ideally using models where the Trio variant is selectively introduced into the presynaptic neuron.
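
      The exclusion rule described in (b) can be sketched as a simple filter. This is an illustrative sketch only: the function name, the cell identifiers, and the resistance values are hypothetical, and we assume the change is expressed relative to the pre-induction value.

```python
def passes_resistance_criterion(r_before, r_after, max_fraction=0.20):
    """Return True if the fractional resistance change is within the limit.

    The change is computed relative to the pre-induction value
    (an assumption of this sketch). Cells whose change exceeds
    `max_fraction` are excluded from analysis.
    """
    if r_before <= 0:
        raise ValueError("baseline resistance must be positive")
    change = abs(r_after - r_before) / r_before
    return change <= max_fraction


# Hypothetical resistances (in MOhm) before and after LTP induction.
recordings = {
    "cell_1": (10.0, 11.0),  # 10% change, kept
    "cell_2": (10.0, 13.0),  # 30% change, excluded
    "cell_3": (10.0, 7.5),   # 25% change, excluded
}

included = [
    cell for cell, (before, after) in recordings.items()
    if passes_resistance_criterion(before, after)
]
print(included)
```

      Using the absolute change means that both increases and decreases in resistance beyond the threshold lead to exclusion.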

      (7) In lines 377-380, "The +/M2145T PPR curve was unusual, with significantly reduced PPF at short ISIs, yet clearly increased PPF at longer ISI (Figure 5A, B) compared to WT." The unusual PPR observed at the 100 ms ISI appears unexpected. Can the authors provide an explanation for this anomaly? This finding could suggest atypical presynaptic dynamics or modulation at this specific interval, which may differ from typical synaptic behavior. Further insights into possible mechanisms or experimental conditions affecting this result would be valuable.

      "The decreased PPF at initial ISI in +/M2145T mice correlated with increased mEPSC frequency (Fig. 4A-C), suggestive of a possible increase in spontaneous glutamate Pr." If this is the case, it raises the question of why the increased PPR at the initial ISI in +/K1431M does not correspond to the result shown in Figure 4C. This discrepancy suggests that factors beyond initial presynaptic release probability might be influencing the observed synaptic response, or that compensatory mechanisms could be affecting PPR and mEPSC frequency differently in this variant. Further clarification on the interplay between these measurements would help resolve this inconsistency.

      We appreciate the Reviewer’s critical reading and genuine interest in this phenotype in +/M2145T mice.

      a) The unusual shift of the PPR in +/M2145T at ISI 100 ms is fascinating, and addressing it will require significant additional experimentation that lies beyond the scope of this report. We propose it results from altered presynaptic regulators, including increased Syt3 and reduced RhoA activity. Notably, Syt3 influences calcium-dependent SV replenishment, which can cause similar PPR defects (Weingarten DJ et al., 2022); this is now included in the Discussion (Lines 915-918).

      Weingarten DJ, Shrestha A, Juda-Nelson K, Kissiwaa SA, Spruston E, Jackman SL. Fast resupply of synaptic vesicles requires synaptotagmin-3. Nature. 2022 Nov;611(7935):320-325. doi: 10.1038/s41586-022-05337-1. Epub 2022 Oct 19. PMID: 36261524.

      b) Thank you for raising the concern about the clarity of this statement: "The decreased PPF at initial ISI in +/M2145T mice correlated with increased mEPSC frequency (Fig. 4A-C), suggestive of a possible increase in spontaneous glutamate Pr." We have edited the sentence to be clearer (Lines 701-703). First, the K1431M and M2145T variants impact different TRIO catalytic activities, disrupting distinct GTPase pathways and differentially affecting presynaptic regulators, which can lead to non-overlapping phenotypes. Second, we have expanded our discussion to note that the +/K1431M variant data suggest increased AMPAR numbers and fewer silent synapses (Lines 850-855), potentially increasing AMPAR mEPSC frequency and masking the expected decrease in spontaneous release (Lines 905-910). Further experiments are needed, ideally using mixed cultures in which presynaptic neurons carrying TRIO variants synapse onto WT neurons, as minimal stimulation variance analysis in slices would be inconclusive because, like mEPSC frequency, it reflects changes in both Pr and silent synapses.

      (8) In Figure 5, there is no evidence demonstrating that the NSC inhibitor functions specifically in the +/K1431M condition without affecting other conditions. To verify its specificity, the authors should test the NSC inhibitor's effects across other conditions in parallel, including a control group. Additionally, cumulative RRP measurements should be provided for a more comprehensive assessment of the inhibitor's impact on synaptic function.

      We appreciate the Reviewer’s feedback.

      a) Previous studies have shown that Rac1 activity can bidirectionally regulate synchronous release probability (Pr). We used the Rac1-specific inhibitor NSC23766 (NSC) to test how Rac1 inhibition impacted the neurotransmitter release deficits observed in +/K1431M mice. We also added control experiments testing the impact of NSC on WT slices. These new experiments are now presented in new Fig. 8 of the revised manuscript, with expanded details in the Results (Lines 737-750) and Discussion (Lines 892-900).

      b) To estimate Pr and the RRP, we employed the Decay method as described by Ruiz et al. (2011), which does not rely on cumulative EPSC plots for RRP estimation. This approach was chosen to account for the initial facilitation in these synapses; fits are performed on EPSCs plotted against stimulus number. Additional details have been provided in the Methods section (Lines 367-373).

      Ruiz R, Cano R, Casañas JJ, Gaffield MA, Betz WJ, Tabares L. Active zones and the readily releasable pool of synaptic vesicles at the neuromuscular junction of the mouse. J Neurosci. 2011 Feb 9;31(6):2000-8. doi: 10.1523/JNEUROSCI.4663-10.2011. PMID: 21307238; PMCID: PMC6633039.

      (9) Given the relevance to NDD, specifying the age window of the mice used is crucial. It is confusing that the synaptic function studies were conducted at P42, while the proteomic analysis was performed at P21. Could the authors clarify the rationale behind using different age points for these analyses? Consistency in age selection, or an explanation for this variation, would help in interpreting the developmental relevance of the findings.

      P42 was chosen as the age as it represents young adulthood, by which time clinical features will have already presented in individuals with neurodevelopmental disorders. Our prior studies of NEX-Cre Trio<sup>-/-</sup> mice found significant measurable differences from WT at this age, after neuronal migration, differentiation, synaptogenesis and pruning have occurred. An earlier developmental timepoint, P21, which coincides with juvenile age in mice, was chosen for proteomics studies to identify earlier changes and potentially targetable and modifiable mechanisms that could influence the phenotypes we observed in older mice. The experiments in P42 versus P21 mice were originally two independent lines of investigation that converged in the current study.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Recommendations for the authors:

      Reviewer #1:

      First, I thank the authors for clarifying some of the confusion I had in the previous comment and I appreciate the efforts the authors put into improving the quality of the manuscript. However, my concerns about the lack of novelty of the key findings are not perfectly addressed and there is no additional analysis done in this revision. Currently in this version of the manuscript, asserting that a p-value of 10^-6 is close to genome-wide significance may be considered an overstatement. Further analysis focusing on finding novel and additional discovery is very necessary.

      We thank the reviewer for their comments. Reviewer #2 also made a comment regarding the genome-wide threshold, “However, it remains unclear why the authors found it appropriate to apply STEAM to the LAAA model, a joint test for both allele and ancestry effects, which does not benefit from the same reduction in testing burden.” The reviewers have correctly identified our oversight; we have amended the manuscript as follows:

      (1) The abstract, “We identified a suggestive association peak (rs3117230, p-value = 5.292 x 10^-6, OR = 0.437, SE = 0.182) in the HLA-DPB1 gene originating from KhoeSan ancestry.”

      (2) From line 233 to 239: “The R package STEAM (Significance Threshold Estimation for Admixture Mapping) (Grinde et al., 2019) was used to determine the admixture mapping significance threshold given the global ancestral proportions of each individual and the number of generations since admixture (g = 15). For the LA model, a genome-wide significance threshold of p-value < 2.5 x 10^-6 was deemed significant by STEAM. The traditional genome-wide significance threshold of 5 x 10^-8 was used for the GA, APA and LAAA models, as recommended by the authors of the LAAA model (Duan et al., 2018).”

      (3) We excluded the results for the signal on chromosome 20, since this also did not reach the LAAA model genome-wide significance threshold.  

      (4) From line 296 to 308: “LAAA models were successfully applied for all five contributing ancestries (KhoeSan, Bantu-speaking African, European, East Asian and Southeast Asian). However, no variants passed the threshold for statistical significance. Although no variants reached genome-wide significance, a suggestive peak was identified in the HLA-II region of chromosome 6 when using the LAAA model and adjusting for KhoeSan ancestry (Figure 3). The QQ-plot suggested minimal genomic inflation, which was verified by calculating the genomic inflation factor (λ = 1.05289) (Supplementary Figure 1). The lead variants identified using the LAAA model whilst adjusting for KhoeSan ancestry in this region on chromosome 6 are summarised in Table 3. The suggestive peak encompasses the HLA-DPA1/B1 (major histocompatibility complex, class II, DP alpha 1/beta 1) genes (Figure 4). It is noteworthy that without the LAAA model, this suggestive peak would not have been observed for this cohort. This highlights the importance of utilising the LAAA model in future association studies when investigating disease susceptibility loci in admixed individuals, such as the SAC population.”
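The thresholds and the genomic inflation factor quoted in the amended passages above can be illustrated with a minimal sketch. It is not the authors' pipeline (they used the R package STEAM and the LAAA model); the function names `passes` and `genomic_inflation` are hypothetical, and the inflation estimate assumes two-sided p-values from 1-degree-of-freedom tests, which may not match the joint LAAA test exactly.

```python
from statistics import NormalDist, median

# Thresholds quoted in the revised Methods text above.
LA_THRESHOLD = 2.5e-6    # STEAM-derived threshold, applied to the LA model only
GWAS_THRESHOLD = 5e-8    # conventional threshold used for GA, APA and LAAA


def passes(p_value: float, threshold: float) -> bool:
    """Return True when a p-value falls below the given threshold."""
    return p_value < threshold


def genomic_inflation(p_values: list[float]) -> float:
    """Median observed chi-square (1 df) divided by its median under the null.

    Assumption: each p-value is two-sided and comes from a 1-df test.
    """
    nd = NormalDist()
    # A two-sided p-value corresponds to the chi-square statistic z**2.
    chi2 = [nd.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    # Median of a 1-df chi-square distribution (about 0.455).
    null_median = nd.inv_cdf(0.75) ** 2
    return median(chi2) / null_median


lead_p = 5.292e-6  # rs3117230, as reported in the amended abstract
print(passes(lead_p, GWAS_THRESHOLD))  # False: suggestive, not genome-wide significant
```

Under this sketch, evenly spread p-values give a ratio close to 1, which is the sense in which a value such as 1.05 indicates minimal inflation.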

      We acknowledge that our results are not statistically significant. However, our study advances this area of research by identifying suggestive African-specific ancestry associations with TB in the HLA-II region. These findings build upon the work of the ITHGC, which did not identify any significant associations or suggestive peaks in their African-specific analyses. We have included this argument in our manuscript (from lines 425 to 432):

      “The ITHGC did not identify any significant associations or suggestive peaks in their African ancestry-specific analyses. Notably, the suggestive peak in the HLA-DPB1 region was only captured in our cohort using the LAAA model whilst adjusting for KhoeSan local ancestry. This underscores the importance of incorporating global and local ancestry in association studies investigating complex multi-way admixed individuals, as the genetic heterogeneity present in admixed individuals (produced as a result of admixture-induced and ancestral LD patterns) may cause association signals to be missed when using traditional association models (Duan et al., 2018; Swart, van Eeden, et al., 2022).”

      We appreciate the comment regarding additional analyses. We acknowledge that we did not validate our SNP peak in the HLA-II region through fine-mapping due to the lack of a suitable reference panel (see lines 490 to 500). Our long-term goal is to develop a HLA-imputation reference panel incorporating KhoeSan ancestry; however, this is beyond the scope and funding allowances of this study.

      Reviewer #2 (Recommendations for the authors):

      We think the authors have done an excellent job with their responses and the manuscript has been substantially improved.

      Thank you for taking the time to help us improve our manuscript.

    1. Author response:

      We are grateful to the reviewers for their extensive and constructive feedback. By and large, the three reviewers noted the following main points:

      (1) The overall evidence for any rhythmicity in this data is not ‘very strong’.

      We do agree and will tone down the conclusions accordingly. However, as one of the reviewers noted, a qualitative interpretation of the specific statistical results remains somewhat vague and speculative by necessity.

      (2) The differences between the results for the individual experiments are generally small. Yet, the same reviewer also asks for speculations as to how differences between experiments can be interpreted.

      We will consider these, but also note that a clear demonstration of the robustness of specific effects requires the replication of individual experiments in a separate experiment.

      (3) A clear-cut interpretation of the current experimental design in the context of continuous listening and true vigilance tasks remains difficult. This makes the interpretation and generalization of the results difficult.

      We do agree in principle, but also note that task designs vary widely in previous work, which may be one reason why there is no clear consensus on the existence or absence of a rhythmic mode of listening. We will consider specific suggestions for future work to be included in the revision.

      (4) The adjustment of task difficulty in the present task design may pose a challenge. Reviewers also suggest analyzing potential rhythmicity in this task difficulty parameter.

      We will consider this for the revision.

      (5) A more clear-cut interpretation of what potential differences in the rhythmicity of sensitivity and bias would mean should be included.

      We will provide this in the revision.

      (6) The study should provide a stronger conceptual framework both for the source of "rhythmic modes" and why one may expect differences between ears.

      By and large, this has been put forward by many previous studies testing and reporting rhythmicity in auditory tasks. Rhythmicity is pervasive in neural activity, but whether and how this relates to behavioral data remains less clear. These points will be clarified in a revision.

      (7) Parallels to work in the visual domain by Fiebelkorn, Landau & Fries should be included.

      We will discuss similarities and differences between studies on perceptual rhythmicity in the visual and auditory domains.

    1. Author Response:

      eLife assessment

      This is a valuable initial study of cell type and spatially resolved gene expression in and around the locus coeruleus, the primary source of the neuromodulator norepinephrine in the human brain. The data are generated with cutting-edge techniques, and the work lays the foundation for future descriptive and experimental approaches to understand the contribution of the locus coeruleus to healthy brain function and disease. However, due to small sample size and the need for additional confirmatory data, the data only incompletely support the main conclusions presented here. With the strengthening of the analyses, this paper, and the associated web application, will be of great interest to neuroscientists working on arousal-based behaviors and neurological and neuropsychiatric phenotypes.

      Thank you for the assessment and comments. Overall, the majority of the issues raised by the reviewers relate either directly or indirectly to limitations of the sample size that precluded further optimization of protocols and expansion of the dataset. We fully acknowledge the limited sample size in this dataset and aim to be transparent about the limitations of the study. This is the first report of snRNA-seq and spatially-resolved transcriptomics in the human locus coeruleus (LC). The LC is a very small nucleus, located deep within the brainstem, which is extremely challenging to study due to its small size, difficult to access location, and the very small number of norepinephrine (NE) neurons located within the nucleus, which were of prime interest for this study. We note that this study represents our initial attempt to molecularly and spatially characterize cell types within the human LC. We note that we did not have significant, established funding from extramural sources dedicated to this study, and tissue resources for the LC are difficult to ascertain, contributing to the small sample size in this initial study. We acknowledge that there are limitations in sample size as well as data quality. Findings from this study will be used to inform, improve, and optimize future and ongoing experimental design, as well as technical and analytical workflows for larger-scale studies. As brought up by one of the reviewers, this field is still in its infancy -- pilot experimentation in new brain regions is labor-intensive and these sequencing approaches remain costly. Moreover, due to the small size and difficulties in dissecting, tissue resources from the human brain in this area are a highly limited resource. Hence, notwithstanding limitations, in our view it is important to release the data for community access at this time. Specific responses to the reviewers’ comments are provided point-by-point in the following sections.

      Reviewer #1 (Public Review):

      Weber et al. collect locus coeruleus (LC) tissue blocks from 5 neurotypical European men, dissect the dorsal pons around the LC and prepare 2-3 tissue sections from each donor on a slide for 10X spatial transcriptomics. […] The authors transparently present limitations of their work in the discussion, but some points discussed below warrant further attention.

      Specific comments:

      1) snRNAseq:

      a. Major concerns with the snRNAseq dataset are A) the low recovery rate of putative LC-neurons in the snRNAseq dataset, B) the fact that the LC neuron cluster is contaminated with mitochondrial RNA, and C) that a large fraction of the nuclei cannot be assigned to a clear cell type (presumably due to contamination or damaged nuclei). The authors chose to enrich for neurons using NeuN antibody staining and FACS. But it is difficult to assess the efficacy of this enrichment without images of the nuclear suspension obtained before FACS, and of the FACS results. As this field is in its infancy, more detail on preliminary experiments would help the reader to understand why the authors processed the tissue the way they did. It would be nice to know whether omitting the FACS procedure might in fact result in higher relative recovery of LC-neurons, or if the authors tried this and discovered other technical issues that prompted them to use FACS.

      Thank you for these comments. We agree these are valid concerns in assessing the data quality and validity of the findings from the snRNA-seq dataset. We will respond to these concerns here to the best of our ability, but in some cases, we do not have definitive answers since comparison data are not yet available for this region. In particular, we were limited in resources for this initial study -- some of the results of the study and issues that we identified in attempting to molecularly profile cells in the human LC were surprising to us, and we intend to generate additional samples and troubleshoot these issues to improve data quality and increase recovery in future work. However, these experiments are (i) expensive, (ii) time- and labor-intensive, and (iii) the tissue for this region is limited and difficult to ascertain. Given the extremely small size of the LC, the tissue resource is quickly depleted. For this study, we had fixed resources and made best-guess decisions on how to proceed with the experimental design, based on our experience with snRNA-seq in other human brain regions (Tran and Maynard et al. 2021). However, the LC is a unique region, and our experiences with this dataset will guide us to make technical adjustments in future studies. Due to the limitations in the tissue resources and the lack of data currently available to the community, we wanted to share these results immediately while acknowledging the limitations of the study as we work to increase our resource availability to expand molecular and spatial profiling studies in this region of the human brain.

      Regarding the reviewer’s concern that our choice to use FANS to enrich for neurons could have potentially led to more damage and contributed to the low recovery rate of LC-NE neurons and the mitochondrial contamination -- we do not have a definitive answer to this question, since we did not perform a direct comparison with non-sorted data. As noted above, our limited tissue resource dictated that we could not do both. We made the decision to enrich for neurons based on our previous experience with identifying relatively rare populations in other brain regions (e.g. nucleus accumbens and amygdala; Tran and Maynard et al. 2021). Based on this previous work, our rationale was that without neuronal enrichment, we could potentially miss the LC-NE population, given the relative scarcity of this neuronal population. The low recovery rate and relatively lower quality / contamination issues may be due to technical issues that lead to LC-NE neurons being more susceptible to damage during nuclear preparation and sorting. We agree that directly comparing to data prepared without NeuN labeling and sorting is reasonable, as the additional perturbations may indeed contribute to cell damage. As mentioned in the discussion, we do not have a definitive answer to the reasons for increased mitochondrial contamination and we suspect that multiple technical factors may contribute -- including the relatively large size and increased fragility of LC-NE neurons. We agree that systematically optimizing the preparation to attempt to increase recovery rate and decrease mitochondrial contamination are important avenues for future work.

      b. It is unclear what percentage of cells make up each cluster.

      We will add this information in the clustering heatmaps or as a supplementary plot in a revised version of the manuscript.
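The per-cluster percentages requested here amount to a simple tabulation over cluster labels. The sketch below is only an illustration of that calculation; the function name `cluster_percentages` and the toy labels are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def cluster_percentages(labels):
    """Map each cluster label to the percentage of nuclei assigned to it."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cluster: 100.0 * n / total for cluster, n in counts.items()}


# Toy labels for illustration only (not values from the study).
toy_labels = ["neuron", "neuron", "astrocyte", "oligodendrocyte"]
print(cluster_percentages(toy_labels))
```

In practice the labels would come from the clustering output, one label per nucleus, and the resulting percentages could be shown alongside the clustering heatmaps.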

      c. The number of subjects used in each analysis was not always clear. Only 3 subjects were used for snRNAseq, and one of them only yielded 4 LC-nuclei. This means the results are essentially based on n=2. The authors report these numbers in the corresponding section, but the first sentence of the results section (and Figure 1C specifically!) create the impression that n=5 for all analyses. Even for spatial transcriptomics, if I understood it correctly, 1 sample had to be excluded (n=4).

      This is correct. We will update the figures and text in a revised version of the manuscript to make this limitation (small sample size) more clear, and to further emphasize that the intention of this study is to provide initial data to help determine next steps and best practices for a larger scale and more comprehensive study on this region, especially given the limited availability of tissue resources and currently limited data resources available for this region.

      2) Spatial transcriptomics:

      a. It is not clear to me what the spatial transcriptomics provides beyond what can be shown with snRNAseq, nor how these two sets of results compare to each other. It would be more intuitive to start the story with snRNAseq and then try to provide spatial detail using spatial transcriptomics. The LC is not a homogeneous structure but can be divided into ensembles based on projection specificity. Spatial transcriptomics could - in theory - offer much-needed insights into the spatial variation of mRNA profiles across different ensembles, or as a first step across the spatial (rostral/caudal, ventral/dorsal) extent of the LC. The current analyses, however, cannot address this issue, as the orientation of the LC cannot be deduced from the slices analyzed.

      We understand the point of the reviewer. However, we structured the manuscript in this format due to our aims of creating a data resource for the community as well as being transparent about the limitations of our study. Our experiments began with the spatial experiments on the tissue blocks because this (i) helped orient ourselves to the region, and (ii) provided guidance for how best to score the tissue blocks for the snRNA-seq experiments to maximize recovery of LC-NE neurons. Therefore, we also decided to present the results in this sequence.

      The spatial data also provides more information in that the measurements are from nuclei, cytoplasm, and cell processes (instead of nuclei only). This is one of the main differences / advantages between the platforms at this level of spatial resolution. As noted above, we were also working with a finite tissue resource -- if we ran snRNA-seq first and captured no neurons, the tissue block would be depleted. Due to the logistics / thickness of the required tissue sections for Visium and snRNA-seq respectively, running Visium first allowed us to ensure that we could collect data from both assays.

      Regarding a point raised below on why we only ran snRNA-seq on a subset of the donors -- this was due to resource depletion and not enough available tissue remaining on the tissue blocks to run the assay. We have conducted extensive piloting in other brain regions on the amount (mg) of tissue that is needed from various sized cryosections, and the LC is particularly difficult since these are small tissue blocks and the extent of the structure is small. Hence, in some of the subjects, we did not have sufficient tissue available for the snRNA-seq assay.

      We agree with the reviewer that spatial studies could, in future work, offer needed and important information about expression profiles across the spatial axes (rostral/caudal, ventral/dorsal) of the LC. Our study provides us with insight about optimizing the dissections for spatial assays, as well as bringing to light a number of technical and logistical issues that we had not initially foreseen. For example, during the course of this study and parallel, ongoing work in other small, challenging brain regions, we have now developed a number of specialized technical and logistical strategies for keeping track of orientation and mounting serial sections from the same tissue block onto a single spatial array, which is extremely technically challenging. We are now well-prepared for addressing these issues in future studies with larger numbers of donors and samples, e.g. spaced serial sections across the extent of the LC to make these types of insights. Due to the rarity of the tissue, limited availability of information in this region, and high expense of conducting these studies, we want to share this initial data with the community immediately. We also note that in addition to the 10x Genomics Visium platform, which lacks cellular and sub-cellular resolution, many new and exciting spatial platforms are entering the market, which may be able to address questions in very small regions such as the LC at higher spatial resolution.

      b. Unfortunately, spatial transcriptomics itself is plagued by sampling variability to a point where the RNAscope analyses the authors performed prove more powerful in addressing direct questions about gene expression patterns. Given that the authors compare their results to published datasets from rodent studies, it is surprising that a direct comparison of genes identified with spatial transcriptomics vs snRNAseq is lacking (unless this reviewer missed this comparison). Supplementary Figure 17 seems to be a first step in that direction, but this is not a gene-by-gene comparison of which analysis identifies which LC-enriched genes. Such an analysis should not compare numbers of enriched genes using artificial cutoffs for significance/fold-change, but rather use correlations to get a feeling for which genes appear to be enriched in the LC using both methods. This would result in one list of genes that can serve as a reference point for future work.

      We agree this is a good suggestion, and will add additional computational analyses to address this point in a revised version of the manuscript.

      c. Maybe the spatial transcriptomics could be useful to look at the peri-LC region, which has generated some excitement in rodent work recently, but remains largely unexplored in humans.

      We agree this is an excellent suggestion -- assessing cross-species comparisons related to convergence, especially, of GABAergic cell populations in the human LC is of high interest. We note that these types of extensions are exactly the reason why we have provided the publicly accessible web app (R/Shiny app, which includes the ability to annotate regions). We hope that others will use these apps for specialized topics they are interested in. As discussed above, we note that our initial dissections precluded the ability to keep track of the exact orientation of our tissue sections on the Visium arrays with respect to their location within the brainstem, so definitive localization of this region across subjects is difficult in our current study. However, it is possible, for example, to investigate whether there is a putative peri-LC region that is densely GABAergic that is homologous with the GABAergic peri-LC region in rodents. We also raise attention to a recent preprint by Luskin and Li et al. (2022), who apply snRNA-seq and spatially-resolved transcriptomics to molecularly define both LC and peri-LC cell types in mice -- in a revised version of our manuscript, we will extend our computational analyses of inhibitory neuronal subtypes in our data (Supplementary Figures 13, 16) to directly compare with those identified in this study in more detail. As noted above, we have now developed a number of specialized technical and logistical strategies for keeping track of orientation of sections from the tissue block onto a single spatial array, and we feel that, combined with optimized dissection strategies for this region and the guide of RNAscope for GABAergic markers on serial sections, annotating the peri-LC region on spatial arrays in future studies will be possible.

      3) The comparison of snRNAseq data to published literature is laudable. Although the authors mention considerable methodological differences between the chosen rodent work and their own analyses, this needs to be further explained. The mouse dataset uses TRAPseq, which looks at translating mRNAs associated with ribosomes, very different from the nuclear RNA pool analyzed in the current work. The rat dataset used single-cell LC laser microdissection followed by microarray analyses, leading to major technical differences in terms of tissue processing and downstream analyses. The authors mention and reference a recent 10x mouse LC dataset (Luskin et al, 2022), however they only pick some neuropeptides from this study for their analysis of interneuron subtypes (Figure S13). Although this is a very interesting part of the manuscript, a more in-depth analysis of these two datasets would be very useful. It would likely allow for a better comparison between mouse and human, given that the technical approach is more similar (albeit without FACS), and Luskin et al have indicated that they are willing to share their data.

      As noted above, we plan to extend our comparisons with the dataset from Luskin and Li et al. (2022) in a revised version of the manuscript, which will provide a more in-depth cross-species comparison. In addition, we also note that there are some additional recent studies using TRAPseq of LC-NE neurons in a functional context, i.e. treatment vs. control experiments or in model systems (e.g. Iannitelli et al. 2023), which provide new opportunities for understanding disease context using in-depth cross-species comparisons. By providing our dataset and reproducible code, we will enable others to adapt and extend these types of comparisons (i.e. TRAPseq of LC-NE neurons or LC snRNA-seq following functional manipulations or in the context of disease or behavioral models) in the future.

      4) Statements in the manuscript about the unexpected identification of a 5-HT (serotonin) cell-cluster seem somewhat contradictory. Figure S14 suggests that 5-HT markers are expressed in the LC-regions just as much as anywhere else, but the RNAscope image in Figure S15 suggests spatial separation between these two populations. And Figure S17 again suggests almost perfect overlap between the LC and 5HT clusters. Maybe I misunderstood, in which case the authors should better clarify/explain these results.

      In our view, the most likely scenario is that the 5-HT neurons come from contamination from the dorsal raphe nucleus based on spatial separation from the RNAscope images, which we agree are more definitive. As mentioned above, since we do not have definitive documentation for the tissue sections in terms of orientation, it is difficult to say with certainty whether the regions are the dorsal raphe and, if so, which sub-portion of the dorsal raphe they are. This initial study has now allowed us to optimize and improve our dissection strategy and approaches for retaining documentation of the orientation of the tissue sections from their intact position within the brainstem as they move from cryosection to placement on the array, which will enable us to better annotate regions with definitive anatomical information with respect to the rostral/caudal and dorsal/ventral axes in future experiments. Given that there are reports in the rodent that 5-HT markers have been identified in LC-NE neurons (Iijima 1993; Iijima 1989), and taking into account the technical limitations in our study, we felt that it was premature to definitively conclude in the manuscript that we were sure these signals arose from the dorsal raphe. We will update this language in a revised version of the manuscript to ensure that these limitations are clear (referring to Supplementary Figures S14-15, S17).

      Reviewer #2 (Public Review):

      The data generated for this paper provides an important resource for the neuroscience community. The locus coeruleus (LC) is the known seed of noradrenergic cells in the brain. Due to its location and size, it remains scarcely profiled in humans. Despite the physically minute structure containing these cells, its impact is wide-reaching due to the known neuromodulatory function of norepinephrine (NE) in processes like attention and mood. As such, profiling NE cells has important implications for most neurological and neuropsychiatric disorders. This paper generates transcriptomic profiles that are not only cell-specific but which also maintain their spatial context, providing the field with a map for the cells within the region.

      Strengths:

      Using spatial transcriptomics in a morphologically distinct region is a very attractive way to generate a map. Overlaying macroscopic information, i.e. a region with greater pigmentation, with its corresponding molecular profile in an unbiased manner is an extremely powerful way to understand the specific cellular and molecular composition of that brain structure.

      The technologies were used with an astute awareness of their limitations, as such, multiple technologies were leveraged to paint a more complete and resolved picture of the cellular composition of the region. For example, the lack of resolution in the spatial transcriptomic platform was compensated by complementary snRNA-seq and single molecule FISH.

      This work has been made publicly available and accessible through a user-friendly application such that any interested researcher can investigate the level of expression of their gene of interest within this region.

      Two important implications from this work are 1) the potential that the gene regulatory profiles of these cells are only partially conserved across species, humans, and rodents, and 2) that there may be other neuromodulatory cell types within the region that were not previously localized to the LC.

      Weaknesses:

      Given that the markers used to identify cells are not as specific as they need to be to definitively qualify the desired cell type, the results may be over-interpreted. Specifically, TH is the primary marker used to qualify cells as noradrenergic; however, TH catalyzes the synthesis of L-DOPA, a precursor to dopamine, which in turn is a precursor for epinephrine and norepinephrine, suggesting some of the cells in the region may be dopaminergic and not NE cells. Indeed, there are publications to support the presence of dopaminergic cells in the LC (see Kempadoo et al. 2016, Takeuchi et al., 2016, Devoto et al. 2005). This discrepancy is further highlighted by the apparent lack of overlap within given Visium spots between TH, SLC6A2, and DBH. While the single-molecule FISH confirms that some of the cells in the region are noradrenergic, others very possibly represent a different catecholamine. As such it is suggested that the nomenclature for the cells be reconsidered.

      We appreciate the reviewer’s comment, and are aware of the reports suggesting the potential presence of dopaminergic cells in the LC. We initially had the same thought as the reviewer when we observed Visium spots in the spatial data with lack of overlap between TH, SLC6A2, and DBH as well as single nuclei in the snRNA-seq data with lack of overlap between TH, SLC6A2, and DBH. This surprising result was exactly why we performed the smFISH/RNAscope experiment with these three marker genes. Given known issues with read depth and coverage in the 10x Genomics assays, we wanted to better understand if this was a technical limitation in the sequencing coverage, or rather a true biological finding. The RNAscope data showed very clearly that nearly every cell body we looked at had co-localization of these three marker genes. We included an image from a single capture array of one tissue section in Supplementary Figure 11, but could, in a revised version of the manuscript, provide additional examples to illustrate how conclusive the images were by visualization. As such, we were quite convinced that the lack of overlap on Visium spots and in single nuclei in the snRNA-seq data was more likely related to technical issues with sequencing coverage, rather than a biological finding. We also note that we checked for the presence of the dopamine transporter, SLC6A3, and as can be appreciated in the iSEE web app for the snRNA-seq data or the R/Shiny web app for the Visium data, there is virtually no expression of SLC6A3 in the dataset, which in our view provides additional evidence against the possibility that there are substantial quantities of dopaminergic cells in this human LC dataset. We will include supplementary plots showing the lack of SLC6A3 expression in a revised version of the manuscript.

      The authors are unable to successfully implement unsupervised clustering with the spatial data, this greatly reduces the impact of the spatial technology as it implies that the transcriptomic data generated in the study did not have enough resolution to identify individual cell types.

      The reviewer is correct -- this is a fundamental limitation of the 10x Genomics Visium platform, i.e. the spatial resolution captures multiple cells per spot (e.g. around 1-10 cells per spot in human brain tissue). We note that new spatial platforms now provide cellular resolution (e.g. Vizgen MERSCOPE, 10x Genomics Xenium, 10x Genomics Visium HD), which will help address this in future work. However, many of these cellular-resolution in situ sequencing platforms have the limitation that they do not quantify genome-wide expression, and instead require users to select a priori gene panels to investigate. This is a problem if no genome-wide reference datasets are available. Hence, despite the limited spatial resolution of the Visium platform, this dataset is useful precisely for helping investigators choose gene panels for higher-resolution platforms or higher-order smFISH multiplexing.

      We also applied spatial clustering (using BayesSpace; Zhao et al. 2021) to attempt to segment the LC regions within the Visium samples in a data-driven manner as an alternative to the manual annotations, which was unsuccessful (and hence we relied on the manually annotated regions for downstream analyses) (Supplementary Figure S5). However, this is a different application of unsupervised clustering, which is separate from the task of identifying cell types.

      The sample contribution to the results is highly unbalanced, which consequently, may result in ungeneralizable findings in terms of regional cellular composition, limiting the usefulness of the publicly available data.

      We acknowledge the limitations of the work due to the small/unbalanced sample sizes. As mentioned above for Reviewer 1, this was an initial study in this region -- results of which will inform our (and hopefully others’) experimental design and approach to molecular profiling in this difficult to access brain region. Overall, this study was executed with finite tissue and financial resources and was intended to uncover limitations and help develop best practices and design workflows for future studies with larger numbers of donors and samples. Given the limited data availability for this brain region, we wanted to make this dataset available for the research community immediately. In addition, we note that making this genome-wide dataset available will help inform targeted gene panel design for higher-resolution platforms (e.g. 10x Genomics Xenium).

      This study aimed to deeply profile the LC in humans and provide a resource to the community. The combination of data types (snRNA-seq, SRT, smFISH) does in fact represent this resource for the community. However, due to the limitations, of which, some were described in the manuscript, we should be cautious in the use of the data for secondary analysis. For example, some of the cellular annotations may lack precision, the cellular composition also may not reflect the general population, and the presence of unexpected cell types may represent the accidental inclusion of adjacent regions, in this case, serotonergic cells from the Raphe nucleus.

      We agree, and have attempted to explain these limitations in the manuscript. We will clarify the language regarding the interpretation of the annotated cell populations and unexpected cell types, and the limited sample sizes, in a revised version of the manuscript.

      Nonetheless having a well-developed app to query and visualize these data will be an enormous asset to the community especially given the lack of information regarding the region in general.

      Reviewer #3 (Public Review):

      […] This study has many strengths. It is the first reported comprehensive map of the human LC transcriptome, and uses two independent but complementary approaches (spatial transcriptomics and snRNA-seq). Some of the key findings confirmed what has been described in the rodent LC, as well as some intriguing potential genes and modules identified that may be unique to humans and have the potential to explain LC-related disease states. The main limitations of the study were acknowledged by the authors and include the spatial resolution probably not being at the single cell level and the relatively small number of samples (and questionable quality) for the snRNA-seq data. Overall, the strengths greatly outweigh the limitations. This dataset will be a valuable resource for the neuroscience community, both in terms of methodology development and results that will no doubt enable important comparisons and follow-up studies.

      Major comments:

      Overall, the discovery of some cells in the LC region that express serotonergic markers is intriguing. However, no evidence is presented that these neurons actually produce 5-HT.

      The reviewer is correct that we did not provide any additional evidence to show that these neurons actually produce 5-HT. As noted above in the response to Reviewer 1, in our view, the most likely explanation is that these neurons are from dorsal raphe contamination on the tissue section. However, due to technical and logistical limitations in this study, we could not definitively say this because we did not clearly track the orientation of the tissue sections, and we did not have remaining tissue sections from all donor tissue blocks to repeat RNAscope experiments. For some of the donors, where we had remaining tissue sections to go back to repeat RNAscope experiments after completion of the snRNA-seq and Visium assays, we could see clear separation of the LC region / LC-NE neuron core from where putative 5-HT neurons were located (Supplementary Figure 15). However, we did not have sufficient tissue resources to map this definitively in all donors, and the orientation and anatomy of each tissue block were not fully annotated.

      Due to the lack of clarity, and the fact that there have been reports that LC-NE neurons express serotonergic markers (Iijima 1993; Iijima 1989), we felt that it was premature to declare that the putative 5-HT neurons we identified were definitively from the raphe. We will clarify the language around this discrepancy in a revised version of the manuscript to ensure that these limitations are clearly described.

      Concerning the snRNA-seq experiments, it is unclear why only 3 of the 5 donors were used, particularly given the low number of LC-NE nuclear transcriptomes obtained, why those 3 were chosen, and how many 100 um sections were used from each donor. It is also unclear if the 295 nuclei obtained are truly representative of the LC population or whether they are just the most "resilient" LC nuclei that survive the process.

      As discussed above for Reviewer 1, the reason we included only 3 of the 5 donors for the snRNA-seq assays was due to the tissue availability on the tissue blocks. We will clarify the language in a revised version of the manuscript to make this limitation more clear. We will also include additional details in the Methods section on the number of 100 μm sections used for each donor (which varied between 10-15, approximating 60-80 mg of tissue).

      The LC displays rostral/caudal and dorsal/ventral differences, including where they project, which functions they regulate, and which parts are vulnerable in neurodegenerative disease (e.g. Loughlin et al., Neuroscience 18:291-306, 1986; Dahl et al., Nat Hum Behav 3:1203-14, 2019; Beardmore et al., J Alzheimer's Dis 83:5-22, 2021; Gilvesy et al., Acta Neuropathol 144:651-76, 2022; Madelung et al., Mov Disord 37:479-89, 2022). It was not clear which part(s) of the LC was captured for the SRT and snRNAseq experiments.

      As discussed above for Reviewer 1, a limitation of this study was that we did not record the orientation of the anatomy of the tissue sections, precluding our ability to annotate the tissue sections with the rostral/caudal and dorsal/ventral axis labels. We agree with the reviewer that additional spatial studies, in future work, could offer needed and important information about expression profiles across the spatial axes (rostral/caudal, ventral/dorsal) of the LC. Our study provides us with insight about optimizing the dissections for spatial assays, as well as bringing to light a number of technical and logistical issues that we had not initially foreseen. For example, during the course of this study and parallel, ongoing work in other, small, challenging regions, we have now developed a number of specialized technical and logistical strategies for keeping track of orientation and mounting serial sections from the same tissue block onto a single spatial array, which is extremely technically challenging. We are now well-prepared for addressing these issues in future studies with larger numbers of donors and samples in order to make these types of insights.

      The authors mention that in other human SRT studies, there are typically between 1-10 cells per expression spot. I imagine that this depends heavily on the part of the brain being studied and neuronal density, but it was unclear how many LC cells were contained in each expression spot.

      The reviewer is correct that we did not include this information in the manuscript. We attempted to apply a computational method to count nuclei contained in each gene expression spot based on analyzing the histological H&E images (VistoSeg; Tippani et al. 2022), which we have developed and previously applied in data from the dorsolateral prefrontal cortex (DLPFC) (Maynard and Collado-Torres et al. 2021). Based on the segmentation using this workflow we observe that the counts in this region are similar to what we observed in the DLPFC, i.e., typically between 1-10 LC cells per expression spot, with approximately 1-2 LC-NE neurons (which are characterized by their large size) per expression spot. However, these analyses had several technical issues related to the images themselves, the relatively large size and pigmentation of LC-NE neurons, and parameter settings that had been optimized for different brain regions. We are currently optimizing this analysis workflow for these images to provide more accurate estimates of cell counts per spot to give readers additional context on the number of nuclei per spot in the annotated LC regions and outside the LC regions in a revised version of the manuscript.

      Regarding comparison of human LC-associated genes with rat or mouse LC-associated genes (Fig. 2D-F), the authors speculate that the modest degree of overlap may be due to species differences between rodents and human and/or methodological differences (SRT vs microarray vs TRAP). Was there greater overlap between mouse and rat than between mouse/rat and human? If so, that is evidence for the former. If not, that is evidence for the latter. Also would be useful for more in-depth comparison with snRNA-seq data from mouse LC: https://www.biorxiv.org/content/10.1101/2022.06.30.498327v1.

      We will investigate this question and discuss this in updated results in a revised version of the manuscript.
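      The comparison the reviewer proposes can be expressed as pairwise overlaps between the three LC-associated gene lists. The following is a minimal sketch only, not the analysis used in the manuscript: it assumes the mouse and rat lists have already been mapped to human ortholog symbols, uses the Jaccard index as one possible overlap measure, and uses placeholder gene identifiers (`G1`, `G2`, ...) rather than real results. The function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: shared genes divided by all genes in either list."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(gene_sets):
    """Return the Jaccard index for every pair of named gene sets."""
    return {
        (x, y): jaccard(gene_sets[x], gene_sets[y])
        for x, y in combinations(sorted(gene_sets), 2)
    }


# Placeholder gene lists (assumed to be mapped to shared ortholog symbols).
example_sets = {
    "human": {"G1", "G2", "G3"},
    "mouse": {"G2", "G3", "G4"},
    "rat": {"G3", "G4", "G5"},
}
overlaps = pairwise_overlap(example_sets)
```

      Under the reviewer's reasoning, a mouse/rat overlap that clearly exceeds both rodent/human overlaps would point toward species differences, whereas similarly modest overlaps across all three pairs would point toward methodological differences.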

      The finding of ACHE expression in LC neurons is intriguing, especially in light of work from Susan Greenfield suggesting that ACHE has functions independent of ACH metabolism that contributes to cellular vulnerability in neurodegenerative disease.

      We thank the reviewer for pointing this out. We were very surprised too by the observed expression of SLC5A7 and ACHE in the LC regions (Visium data) and within the LC-NE neuron cluster (snRNA-seq data), coupled with absence of other typical cholinergic marker genes (e.g. CHAT, SLC18A3), and we do not have a compelling explanation or theory for this. Hence, the work of Susan Greenfield and colleagues suggesting non-cholinergic actions of ACHE, particularly in other catecholaminergic neurons (e.g. dopaminergic neurons in the substantia nigra) is very interesting. We will include references to this work and how it could inform interpretation of this expression in a revised version of the manuscript (Greenfield 1991; Halliday and Greenfield 2012).

      High mitochondrial reads from snRNA-seq can indicate lower quality. It was not clear why, given the mitochondrial read count, the authors are confident in the snRNA-seq data from presumptive LC-NE neurons.

      We will include additional analyses to further investigate and/or confirm this finding (e.g. comparing sum of UMI counts / number of detected genes and mitochondrial percentage per nucleus for this population to confirm data quality) in additional supplementary figures in a revised version of the manuscript.
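      The per-nucleus quality metrics named above (sum of UMI counts, number of detected genes, and mitochondrial percentage) can be computed directly from a count matrix. The sketch below is illustrative only and is not the authors' pipeline: it assumes a dense genes-by-nuclei matrix, assumes mitochondrial genes are identified by the `MT-` symbol prefix, and uses a hypothetical function name and toy counts.

```python
import numpy as np


def per_nucleus_qc(counts, gene_names, mito_prefix="MT-"):
    """Compute basic QC metrics for each nucleus (column) of a genes x nuclei matrix."""
    counts = np.asarray(counts)
    # Flag mitochondrial genes by symbol prefix (an assumption of this sketch).
    is_mito = np.array([g.upper().startswith(mito_prefix) for g in gene_names])

    total_umis = counts.sum(axis=0)            # sum of UMI counts per nucleus
    detected_genes = (counts > 0).sum(axis=0)  # genes with at least one UMI
    mito_umis = counts[is_mito].sum(axis=0)

    # Mitochondrial percentage, left at 0 for nuclei with no counts.
    mito_pct = np.divide(
        100.0 * mito_umis,
        total_umis,
        out=np.zeros(total_umis.shape, dtype=float),
        where=total_umis > 0,
    )
    return {
        "total_umis": total_umis,
        "detected_genes": detected_genes,
        "mito_pct": mito_pct,
    }


# Toy example: 3 genes x 2 nuclei.
toy_counts = [[5, 0], [3, 2], [2, 0]]
toy_genes = ["MT-CO1", "TH", "DBH"]
qc = per_nucleus_qc(toy_counts, toy_genes)
```

      Comparing these three metrics for the presumptive LC-NE cluster against the other clusters is one way to judge whether a high mitochondrial percentage coincides with low UMI and gene counts, which would suggest poor quality, or with high counts, which would not.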

      References

      • Greenfield (1991), A noncholinergic action of acetylcholinesterase (AChE) in the brain: from neuronal secretion to the generation of movement, Cellular and Molecular Neurobiology, 11, 1, 55-77.

      • Halliday and Greenfield (2012), From protein to peptides: a spectrum of non-hydrolytic functions of acetylcholinesterase, Protein & Peptide Letters, 19, 2, 165-172.

      • Iannitelli et al. (2023), The neurotoxin DSP-4 dysregulates the locus coeruleus-norepinephrine system and recapitulates molecular and behavioral aspects of prodromal neurodegenerative disease, eNeuro, 10, 1, ENEURO.0483-22.2022.

      • Iijima K. (1989), An immunocytochemical study on the GABA-ergic and serotonin-ergic neurons in rat locus ceruleus with special reference to possible existence of the masked indoleamine cells, Acta Histochemica, 87, 1, 43-57.

      • Iijima K. (1993), Chemocytoarchitecture of the rat locus ceruleus, Histology and Histopathology, 8, 3, 581-591.

      • Luskin A.T., Li L. et al. (2022), A diverse network of pericoerulear neurons control arousal states, bioRxiv (preprint).

      • Maynard and Collado-Torres et al. (2021), Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex, Nature Neuroscience, 24, 425-436.

      • Tippani et al. (2022), VistoSeg: processing utilities for high-resolution Visium/Visium-IF images for spatial transcriptomics data, bioRxiv (preprint).

      • Tran M.N., Maynard K.R. et al. (2021), Single-nucleus transcriptome analysis reveals cell-type-specific molecular signatures across reward circuitry in the human brain, Neuron, 109, 3088-3103.

      • Zhao E. et al. (2021), Spatial transcriptomics at subspot resolution with BayesSpace, Nature Biotechnology, 39, 1375-1384.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the public reviewers and editors for their insightful comments on the manuscript. We have made the following changes to address their concerns and think the manuscript is stronger as a result. Specifically, we have 1) added RNA FISH data of specific STB-2 and STB-3 RNA markers to confirm their distribution changes between STB<sup>in</sup> and STB<sup>out</sup> TOs, 2) removed language throughout the text that refers to STB-3 as a terminally differentiated nuclear subtype, and 3) generated CRISPR-mediated knock-outs of two genes identified by network analysis and validated their roles in mediating STB nuclear subtype gene expression.

      Reviewer #1 (Public review): 

      Strengths: 

      The study offers a comprehensive SC- and SN-based characterization of trophoblast organoid models, providing a thorough validation of these models against human placental tissues. By comparing the older STB<sup>in</sup> and newer STB<sup>out</sup> models, the authors effectively demonstrate the improvements in the latter, particularly in the differentiation and gene expression profiles of STBs. This work serves as a critical resource for researchers, offering a clear delineation of the similarities and differences between TO-derived and primary STBs. The use of multiple advanced techniques, such as high-resolution sequencing and trajectory analysis, further enhances the study's contribution to the field. 

      Thank you for your thoughtful review—we appreciate your recognition of our efforts to comprehensively validate trophoblast organoid models and highlight key advancements in STB differentiation and gene expression.

      Weaknesses: 

      While the study is robust, some areas could benefit from further clarification. 

      (1) The importance of the TO model's orientation and its impact on outcomes could be emphasized more in the introduction. 

      We agree that TO orientation may significantly influence STB nuclear subtype differentiation. As the STB is critical for both barrier formation and molecular transport in vivo, lack of exposure to the surrounding media in STB<sup>in</sup> TOs in vitro could compromise these functions and the associated environmental cues that influence STB nuclear differentiation. We have added text to the introduction to highlight this point (lines 117-120).

      (2) The differences in cluster numbers/names between primary tissue and TO data need a clearer explanation, and consistent annotation could aid in comparison. 

      Thank you for highlighting that the comparisons and cluster annotations need clarification. In Figure 1, we did not aim to directly compare CTB and STB nuclear subtypes between TOs and tissue. Each dataset was analyzed independently, with clusters determined separately and with different resolutions decided via a clustering algorithm (Zappia and Oshlack, 2018). For example, for the STB, this approach identified seven subtypes in tissue but only two in TOs, making direct comparison challenging. To address this challenge, we integrated the SN datasets from TOs and tissue in Figure 6. This integration allowed us to directly compare gene expression between the sample types and examine the proportions within each STB subtype. Similarly, in Figure 2, direct comparison of individual CTB or STB clusters across the separate datasets is challenging (Figures 2A-C) due to differences in clustering. To overcome this, we integrated the datasets to compare cluster gene expression and relative proportions (Figures 2D-E). Nonetheless, to address the reviewer's concern we have added text to the results section to clarify that subclusters of CTB and STB between datasets should not be directly compared until the datasets are integrated in Figure 2D-E and Figure 6 (lines 166-167).

      (3) The rationale for using SN sequencing over SC sequencing for TO evaluations should be clarified, especially regarding the potential underrepresentation of certain trophoblast subsets. 

      This is an important point, as the challenges of studying a giant syncytial cell are often underappreciated by researchers who study mononucleated cells. We have added text to the introduction to clarify why traditional single cell RNA sequencing techniques were inadequate to collect and characterize the STB (lines 91-93).

      (4) Additionally, more evidence could be provided to support the claims about STB differentiation in the STB<sup>out</sup> model and to determine whether its differentiation trajectory is unique or simply more advanced than in STB<sup>in</sup>. 

      Our original conclusion that STB<sup>out</sup> nuclei are more terminally differentiated than STB<sup>in</sup> was based on two observations: (1) STB<sup>out</sup> TOs exhibit increased expression of STB-specific pregnancy hormones and many classic STB marker genes and (2) STB<sup>out</sup> nuclei show an enrichment of the STB-3 nuclear subtype, which appears at the end of the slingshot pseudotime trajectory. However, upon consideration of the reviewer comments, we agree that this evidence is not sufficient to definitively distinguish if STB<sup>out</sup> nuclei are more advanced or follow a unique differentiation trajectory dependent on new environmental cues. Pseudotime analyses provided only a predictive framework for lineage tracing, and these predictions must be experimentally validated. Real-time tracking of STB nuclear subtypes in TOs would require a suite of genetic tools beyond the scope of this study. Therefore, to address the reviewers' concerns we have removed language suggesting that STB-3 is a terminally differentiated subtype or that STB<sup>out</sup> nuclei are more differentiated than STB<sup>in</sup> nuclei throughout the text until the discussion. Therein we present both our original hypothesis (that STB nuclei are further differentiated in STB<sup>out</sup>) and alternative explanations like changing trajectories due to local environmental cues (lines 619-625).

      Reviewer #2 (Public review): 

      Strengths: 

      (1) The use of SN and SC RNA sequencing provides a detailed analysis of STB formation and differentiation. 

      (2) The identification of distinct STB subtypes and novel gene markers such as RYBP offers new insights into STB development. 

      Thank you for highlighting these strengths—we appreciate your recognition of our use of SN and SC RNA sequencing to analyze STB differentiation and the discovery of distinct STB subtypes and novel gene markers like RYBP.

      Weaknesses: 

      (1) Inconsistencies in data presentation. 

      We address the individual comments of reviewer 2 later in this response.

      (2) Questionable interpretation of lncRNA signals: The use of long non-coding RNA (lncRNA) signals as cell type-specific markers may represent sequencing noise rather than true markers. 

      We appreciate the reviewer’s attention to detail in noticing the lncRNA signature seen in many STB nuclear subtypes. However, we disagree that these molecules simply represent sequencing noise. In fact, many studies have rigorously demonstrated that lncRNAs have both cell- and tissue-specific gene expression (e.g., Zhao et al 2022, Isakova et al 2021, Zheng et al 2020). Further, they have been shown to be useful markers of unique cell types during development (e.g., Morales-Vicente et al 2022, Zhou et al 2019, Kim et al 2015) and can enhance clustering interpretability in breast cancer (Malagoli et al 2024). Many lncRNAs have also been demonstrated to play a functional role in the human placenta, including H19, MEG3, and MEG8 (Adu-Gyamfi et al 2023), and differences are even seen in nuclear subtypes in trophoblast stem cells (Khan et al 2021). Therefore, we prefer to keep these lncRNA signatures included and let future researchers test their functional role.

      To improve the study's validity and significance, it is crucial to address the inconsistencies and to provide additional evidence for the claims. Supplementing with immunofluorescence staining for validating the distribution of STB_in, STB_out, and EVT_enrich in the organoid models is recommended to strengthen the results and conclusions. 

      Each general trophoblast cell type (CTB, STB, EVT) has been visualized by immunofluorescence by the Coyne laboratory in their initial papers characterizing the STB<sup>in</sup>, STB<sup>out</sup>, and EVT<sup>enrich</sup> models (Yang et al, 2022 and 2023). We agree that it is important to validate the STB nuclear subtypes found in our genomic study. However, one challenge in studying a syncytium is that immunofluorescence may not be a definitive method when the nuclei share a common cytoplasm. This is because protein products from mRNAs transcribed in one nucleus are translated in the cytoplasm and could diffuse beyond sites of transcription. Therefore, RNA fluorescence in situ hybridization (RNA-FISH) is needed instead. While a systematic characterization of the spatial distribution of the many marker genes found in each subtype is outside the scope of this study, we include RNA-FISH of one STB-2 marker (PAPPA2) and one STB-3 marker (ADAMTS6) in Figure 3F-G and Supplemental Figure 3.3. This demonstrates that there is an increase in STB-2 marker gene expression in STB<sup>in</sup> TOs and an increase in STB-3 marker gene expression in STB<sup>out</sup> TOs.

      Reviewer #3 (Public review):  

      The authors present outstanding progress toward their aim of identifying, "the underlying control of the syncytiotrophoblast". They identify the chromatin remodeler, RYBP, as well as other regulatory networks that they propose are critical to syncytiotrophoblast development. This study is limited in fully addressing the aim, however, as functional evidence for the contributions of the factors/pathways to syncytiotrophoblast cell development is needed. Future experimentation testing the hypotheses generated by this work will define the essentiality of the identified factors to syncytiotrophoblast development and function. 

      We thank the reviewer for their thoughtful assessment, constructive feedback, and encouraging comments. We acknowledge that the initial manuscript primarily presented analyses suggesting correlations between RYBP and other factors identified in the gene network analysis and STB function. Understanding how gene networks in the STB are formed and regulated is a long-term goal that will require many experiments with collaborative efforts across multiple research groups.

      Nonetheless, to address this concern we have knocked out two key genes, RYBP and AFF1, in TOs using CRISPR-Cas9-mediated gene targeting. Bulk RNA sequencing of STB<sup>in</sup> TOs from both wild-type (WT) and knockout strains revealed that deletion of either gene caused a statistically significant decrease in the expression of the pregnancy hormone human placental lactogen and an increase in the expression of several genes characteristic of the oxygen-sensing STB-2 subtype, including FLT-1, PAPPA2, SPON2, and SFXN3. These findings demonstrate that knocking out RYBP or AFF1 increases STB-2 marker gene expression, indicating that both genes play a role in inhibiting the expression of these markers in WT TOs (Figure 5D-E and supplemental Figure 5.2). We also note that this is the first application of CRISPR-mediated gene silencing in a TO model.

      Future work will visualize the distribution of STB nuclear subtypes in these mutants and explore the mechanistic role of RYBP and AFF1 in STB nuclear subtype formation and maintenance. However, these investigations fall outside the scope of the current study.

      Localization and validation of the identified factors within tissue and at the protein level will also provide further contextual evidence to address the hypotheses generated. 

      We agree that visualizing STB nuclear subtype distribution is essential for testing the many hypotheses generated by our analysis. To address this, we have included RNA-FISH experiments for two STB subtype markers (PAPPA2 for STB-2 and ADAMTS6 for STB-3) in TOs. These experiments reveal an increase in PAPPA2 expression in STB<sup>in</sup> TOs and an increase in ADAMTS6 expression in STB<sup>out</sup> TOs (Figure 3F-G and Supplemental Figure 3.3). Genomic studies serve as powerful hypothesis generators, and we look forward to future work—both our own and that of other researchers—to validate the markers and hypotheses presented from our analysis.

      Recommendations for the authors: 

      Reviewing Editor Comments: 

      We strongly encourage the authors to further strengthen the study by addressing all reviewers' comments and recommendations, with particular attention to the following key aspects:

      (1) Clarifying the uniqueness of the STB differentiation trajectory between STB<sup>in</sup> and STB<sup>out</sup>, and determining whether STB<sup>out</sup> represents a more advanced stage of differentiation compared to STB<sup>in</sup>. It is also important to specify which developmental stage of placental villi the STB<sup>out</sup> and STB<sup>in</sup> are simulating. 

      We have revised the manuscript to remove definitive language claiming that STB-3 represents a terminally differentiated subtype or that STB<sup>out</sup> nuclei are more differentiated than STB<sup>in</sup> nuclei. Instead, we now present our hypothesis and alternative explanations in the discussion (lines 619-625), and emphasize the need for experimental validation of pseudotime predictions to test these hypotheses.

      (2) Utilizing immunofluorescence to validate the distribution of cell types in the organoid models. 

      The Coyne lab has previously performed immunofluorescence of CTB and STB markers in STB<sup>in</sup> and STB<sup>out</sup> TOs (Yang et al 2023). The syncytial nature of STBs complicates immunofluorescence-based validation of the STB nuclear subtypes because translated proteins share a single common cytoplasm and can therefore diffuse and mix. Instead, we performed RNA-FISH for two STB subtype markers (PAPPA2 for STB-2 and ADAMTS6 for STB-3), which showed subtype-specific nuclear enrichment in STB<sup>in</sup> and STB<sup>out</sup> TOs, respectively (Figure 3F-G and Supplemental Figure 3.3).

      (3) Addressing concerns regarding the use of lncRNA as cell marker genes. Employing canonical markers alongside critical TFs involved in differentiation pathways to perform a more robust cell-type analysis and validation is recommended.  

      As discussed in detail above, we maintain that lncRNAs are valuable markers, supported by their demonstrated roles in cell and tissue specificity and placental function. These signatures provide important insights and hypotheses for future research, and we have clarified this rationale in the revised manuscript.

      Reviewer #1 (Recommendations for the authors): 

      (1) The authors have presented an extensive SC- and SN-based characterization of their improved trophoblast TO model, including a comparison to human placental tissues and the previous TO iteration. In this way, the authors' work represents an invaluable resource for investigators by providing thorough validation of the TO model and a clear description of the similarities and differences between primary and TO-derived STBs. I would suggest that the authors reshape the study to further highlight and emphasize this aspect of the study. 

      We thank the reviewer for their thoughtful recommendation and agree that our datasets will serve as an invaluable resource for comparing in vitro models to in vivo gene expression. However, extensive validation is required to make definitive conclusions about the extent to which these systems mirror one another and where they diverge. For this reason, in this manuscript, we have focused on characterizing STB subtypes to provide a foundational understanding of the model and this poorly characterized subtype.

      (2) Introduction, Paragraph 3: What is the importance of orientation for the trophoblast TO model? The authors may consider removing some of the less important methodologic details from this paragraph and including more emphasis on why their TO model is an improvement. 

      Text has been added to this paragraph to highlight the importance of the outward-facing STB orientation, which is essential to mirror the STB’s transport function in vitro (lines 118-120).

      (3) Results, Figure 1: In addition to the primary placental tissue plots showing all cell populations, it may be useful to have side-by-side versions of similar plots showing only the trophoblast subsets, so that the primary and TO data could be more easily compared visually. 

      This has been implemented and added to the Supplemental Figure 1.4.

      (4) Results, Figure 1: In simple terms, what is the reason for ending up with different cluster numbers/names from the primary tissue and TO? Would it be possible to apply the same annotation to each (at least for trophoblast types) and thus allow direct comparison between the two? 

      As described above, each dataset was analyzed separately, and an algorithm was used to determine the optimal clustering resolution for each. Therefore, the clusters of the different datasets cannot be directly compared until the SN TO and tissue datasets are integrated in Figure 6. We have added text to the manuscript to make it clear that, until this point, they should be compared only in terms of bulk cluster number (lines 230-232).

      (5) Results, Figure 2: For subsequent evaluation of different in vitro TO conditions, did the authors use only SN sequencing because they wanted to focus on STB? Based on Figure 1, it seems some CTB subsets would be underrepresented if using only SN. Given that the authors look at both STB and CTB in their different TOs, is this an issue? 

      The CTB clusters that showed the greatest divergence between SC and SN datasets were those associated with mitosis and the cell cycle, likely due to nuclear envelope breakdown interfering with capture by the 10x microfluidics pipeline. While cytoplasmic gene expression provides valuable insights into CTB function, our manuscript focuses on the STB starting from Figure 2. Since the STB is captured exclusively by the SN dataset, we concentrated on this approach to streamline our analysis.

      (6) Results, Figure 3: What do the authors consider to be the primary contributing factors for why the STB subsets display differential gene expression between STB<sup>in</sup> and STB<sup>out</sup>? Is this due primarily to the cultural conditions and/or a result of the differing spatial arrangement with CTBs? 

      This is an intriguing question that is challenging to disentangle because the culture conditions are integral to flipping the orientation. The two primary factors that differ between STB<sup>in</sup> and STB<sup>out</sup> TOs are the presence of extracellular matrix in STB<sup>in</sup> and direct exposure to the surrounding media in STB<sup>out</sup>. We believe these environmental cues play a significant role in shaping the gene expression of STB subsets. Fully disentangling this relationship would require a method to alter the TO orientation without changing the culture conditions. While this is an exciting direction for future research, it falls outside the scope of the present study.

      (7) Results, Figure 4: The authors' analysis indicates that the STB nuclei from the STB<sup>out</sup> TO are likely "more differentiated" than those in STB<sup>in</sup> TO. Could the authors provide some qualitative or quantitative support for this? Is the STB<sup>out</sup> differentiated phenotype closer to what would be observed in a fully formed placenta? 

      As discussed earlier, we agree with the reviewers that this claim should be removed from the text outside of the discussion.

      (8) Results, Figure 5: Based on the trajectory analysis, do the authors consider that the STB from STB<sup>out</sup> TO are simply further along the differentiation pathway compared to those from STB<sup>in</sup> TO, or do the STB from STB<sup>out</sup> TO follow a differentiation pathway that is intrinsically distinct from STB<sup>in</sup> TO? 

      We think the idea of an intrinsically distinct pathway is a fascinating alternative hypothesis and have added it into the discussion. We do not find the pseudotime currently allows us to answer this question without additional experiments, so we have removed claims that the STB<sup>out</sup> STB nuclei are further along the differentiation pathway.

      (9) Results, Figure 6: A notable difference between the STB<sup>out</sup> TO and the term tissue is that the CTB subsets are much more prevalent. Is this simply a scale difference, i.e. due to the size of the human placenta compared to the limited STB nuclei available in the STB<sup>out</sup> TO? Or are there other contributing factors? 

      The proportion of CTB to STB nuclei in our term tissue (9:1) aligns with expectations based on stereological estimates. We believe the relatively low number of CTB nuclei in our dataset is due to the need for a larger sample size to capture more of this less abundant cell type. Since the primary focus of this paper is on STB, and we analyzed over 4,000 STB nuclei, we do not view this as a limitation. However, future studies utilizing SN to investigate term tissue should account for the abundance of STB nuclei and plan their sampling carefully to ensure sufficient representation of CTB nuclei if this is a desired focus.

      Reviewer #2 (Recommendations for the authors): 

      (1) The color annotations for cell types in Figure 2 are inconsistent between the different panels, and the term "Prolif" in Figure 2E is not explained by the authors. 

      We chose colors to enhance visibility on the UMAP. We do not wish readers to make direct comparisons between the different CTB or STB subtypes of the sample types until the datasets are integrated in Figure 2D. This is because the clustering resolution was chosen by an algorithm independently for each dataset. Cluster proportions are better compared in the integrated datasets in Figure 2D. We have added text to the results section to make this clear to the reader (lines 166-167).

      (2) In Figure 3 and Supplementary Figures 1.3, the authors frequently present long non-coding RNA (lncRNA) signals as cell type-specific markers in the bubble plots. These signals are likely sequencing noise and may not accurately represent true markers for those cell types. It is recommended to revise this interpretation. 

      As referenced above, there are many examples of lncRNAs that have biological and pathological significance in the placenta (H19, Meg3, Meg8) and lncRNAs often have cell type specific expression that can enhance clustering. We prefer to keep these signatures included and let future researchers determine their biological significance.

      (3) In Figure 3C, the authors performed pathway enrichment analysis on the STB subtypes after integrating STB_in and STB_out organoids. The enrichment of the "transport across the blood-brain barrier" pathway in the STB-3 subtype does not align with the current understanding of STB cell function. Please provide corresponding supporting evidence. Additionally, please verify whether the other functional pathways represent functions specific to the STB subtypes. 

      Interestingly, many of the genes categorized under “transport across the blood-brain barrier” are transporters shared with “vascular transport.” These include genes involved in the transport of amino acids (SLC7A1, SLC38A1, SLC38A3, SLC7A8), molecules essential for lipid metabolism (SLC27A4, SLC44A1), and small molecule exchange (SLC4A4, SLC5A6). Given that the vasculature, the STB, and the blood-brain barrier all perform critical barrier functions, it is unsurprising that molecules associated with these GO terms are enriched in the STB-3 subtype, which expresses numerous transporter proteins. Since the transport of materials across the STB is a well-established function, we have not included additional supporting evidence but have clarified the genes associated with this GO term in the text (lines 392-394 and supplemental Table 9).

      (4) The pseudotime heatmap in Figure 4B is not properly arranged and is inconsistent with the differentiation relationships shown in Figure 4A. It is recommended to revise this. 

      We are uncertain which aspect of the heatmap in Figure 4A is perceived as inconsistent with Figure 4B. One distinction is that pseudotime in Figure 4A is normalized from 0 to 100 to fit the blue-to-yellow-to-red color scale, whereas in Figure 4B the color scale is not normalized and the color bar ranges from white to red. This difference reflects our intent to simplify Figure 4B-C, as the abundance of color between cell types and gene expression changes required a streamlined representation to ensure the figure remained clear and easy to interpret. This is standard practice in the field and consistent with the default code in the slingshot package.
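
      As an illustration of the 0-to-100 normalization mentioned above, the following is a minimal sketch of min-max rescaling. It assumes that the normalization is a simple linear rescaling of raw pseudotime values; the function name `rescale_to_percent` and the example values are hypothetical and are not taken from the manuscript or from the slingshot package.

```python
def rescale_to_percent(values):
    """Linearly rescale a list of numbers so the minimum maps to 0 and the maximum to 100.

    Assumption: this mirrors a generic min-max normalization, not necessarily
    the exact procedure used to produce Figure 4A.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to rescale, so map everything to 0.
        return [0.0 for _ in values]
    return [100.0 * (v - lo) / (hi - lo) for v in values]


# Hypothetical raw pseudotime values for three nuclei.
raw_pseudotime = [2.0, 4.0, 6.0]
scaled = rescale_to_percent(raw_pseudotime)
print(scaled)
```

      Rescaling of this kind changes only the units of the color scale, not the ordering of nuclei along the trajectory, so a normalized panel and a non-normalized panel can show the same underlying ordering.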

      (5) In Figures 4C and 4D, although RYBP is highly expressed in STB, it is difficult to support the conclusion that RYBP shows the most significant expression changes. It is recommended to provide additional evidence. 

      The claim that RYBP exhibits the most significant expression changes was based on p-value ordering of genes associated with pseudotime via the associationTest function in slingshot and not with immunofluorescence data. The text has been revised to make this distinction clear (lines 390-393).

      (6) In Figure 4E, staining for CTB marker genes is missing, and in Figure 4F, CYTO is difficult to use as a classical STB marker. It is recommended to use the CGBs antibody from Figure 4E as a STB marker for staining to provide evidence.  

      We have revised Figure 5B-C to use e-Cadherin as a CTB marker in TOs and the CGB antibody as an STB marker.

      In tissue, however, obtaining a good STB marker that does not overlap with the RYBP antibody (rabbit) in term tissue is difficult as the STB downregulates hCG expression closer to term to initiate contractions. SDC1 is often used but only labels the plasma membrane so does not help in distinguishing the STB cytoplasm. We have added an image of cytokeratin, e-Cadherin, and the STB marker ENDOU to validate that our current approach with e-Cadherin and cytokeratin allows us to accurately distinguish between CTB and STB cells.

      (7) The velocity results in Figure 5A do not align with the differentiation relationships between cells and contradict the pseudotime results presented in Figure 4 by the authors. 

      The reviewer raises an interesting observation regarding the velocity map in Figure 5A, which appears to show a bifurcation into two STB subtypes. This observation aligns with similar findings reported in tissue by our colleagues (Wang et al., 2024). However, given the low number of CTB cells in our tissue dataset, we were cautious about making definitive conclusions about pseudotime without a larger sample size. Notably, the RNA velocity map closely resembles the pseudotime trajectory in TOs, with CTB transitioning into the CTB-pf subtype and subsequently into the STB. One potential explanation for discrepancies between tissue and TOs is the difference in nuclear age: nuclei in tissue can be up to nine months old, whereas those in TOs are only hours or days old. It is possible that the lineage in TOs could bifurcate if cultured for longer than 48 hours, but our current dataset captures only the early stages of the STB differentiation process. While exploring these hypotheses is fascinating, they are beyond the scope of this current study.  

      Reviewer #3 (Recommendations for the authors): 

      Amazing work - I greatly enjoyed reading the manuscript. Here are a few questions and suggestions for consideration: 

      Evidence presented throughout the results sections hints that the organoids may represent an earlier stage of placental development compared to the term. Increased hCG gene expression is observed, but as noted expression is decreased in term STB. STB:CTB ratios are also higher at term compared to the first trimester, etc. It was difficult to conclude definitively based on how data is presented in Fig 6 and discussed. Maybe there is no clear answer. Perhaps the altered cell type ratios in the organoid models (e.g., few STB in EVT enrich conditions) impact recapitulation of the in vivo local microenvironment signaling. As such, can the authors speculate on whether cell ratios could be strategically leveraged to model different gestational time points? 

      Along these same lines, syncytiotrophoblast in early implantation (before proper villi development) is often described as invasive and later at the tertiary villi stage defined by hormone production, barrier function, and nutrient/gas exchange. Do the authors think the different STB subtypes captured in the organoid models represent different stages/functions of syncytiotrophoblast in placental development? 

      Minor Comments 

      (1) Please clarify what the third number represents in the STB:CTB ratio (e.g., 1:3:1 and 2:5:1). EVT? 

      The first separator is a decimal point, not a colon (i.e., 1.3 and 2.5). These numbers should therefore be read as an STB:CTB ratio of 1.3 to 1 or 2.5 to 1.

      (2) Could consider co-localizing RYBP in term tissue with a syncytio-specific marker like CGB used for organoids (Fig 4F). 

      We addressed this concern in comment 6 to reviewer 2.

      (3) Recommend defining colors-which colors represent which module in Figure 5C in the legend and main body text. I see the labels surrounding the heatmap in 5B, but defining colors in text (e.g. cyan, magenta, etc.) would be helpful. Do the gray circles represent targets that don't belong to a specific module? Are the bolded factor names based on a certain statistical cutoff/defining criteria or were they manually selected? 

      The text of both the results and figure legends has been revised to clarify these points.

      (4) Data Availability: It would be helpful to provide supplemental table files for analyses (e.g., 5C to list the overlapping relationships in TGs for each TF/CR (5C) and 3E/6F to list DEG genes in comparisons). 

      Supplemental files for each analysis have been added (Supplemental Tables 8-14). In addition, the raw and processed data are available on GEO, and we have created an interactive Shiny App so that people without coding experience can interact with each dataset (lines 917-919).

      (5) “...and found that each sample expressed these markers (Figure 6D), suggesting..." Consider clarifying "these". 

      Text has been added to refer to a few of these marker genes within the text (line 540).

      Citations

      (1) Zappia L, Oshlack A. Clustering trees: a visualization for evaluating clusterings at multiple resolutions. GigaScience. 2018;7(7):giy083. PMCID: PMC6057528

      (2) Zhou J, Xu J, Zhang L, Liu S, Ma Y, Wen X, Hao J, Li Z, Ni Y, Li X, Zhou F, Li Q, Wang F, Wang X, Si Y, Zhang P, Liu C, Bartolomei M, Tang F, Liu B, Yu J, Lan Y. Combined Single-Cell Profiling of lncRNAs and Functional Screening Reveals that H19 Is Pivotal for Embryonic Hematopoietic Stem Cell Development. Cell Stem Cell. 2019;24(2):285-298.e5. PMID: 30639035

      (3) Malagoli G, Valle F, Barillot E, Caselle M, Martignetti L. Identification of Interpretable Clusters and Associated Signatures in Breast Cancer Single-Cell Data: A Topic Modeling Approach. Cancers. 2024;16(7):1350. PMCID: PMC11011054

      (4) Adu-Gyamfi EA, Cheeran EA, Salamah J, Enabulele DB, Tahir A, Lee BK. Long non-coding RNAs: a summary of their roles in placenta development and pathology. Biol Reprod. 2023;110(3):431–449. PMID: 38134961

      (5) Zheng M, Hu Y, Gou R, Nie X, Li X, Liu J, Lin B. Identification three LncRNA prognostic signature of ovarian cancer based on genome-wide copy number variation. Biomed Pharmacother. 2020;124:109810. PMID: 32000042

      (6) Khan T, Seetharam AS, Zhou J, Bivens NJ, Schust DJ, Ezashi T, Tuteja G, Roberts RM. Single Nucleus RNA Sequence (snRNAseq) Analysis of the Spectrum of Trophoblast Lineages Generated From Human Pluripotent Stem Cells in vitro. Front Cell Dev Biol. 2021;9:695248. PMCID: PMC8334858

      (7) Isakova A, Neff N, Quake SR. Single-cell quantification of a broad RNA spectrum reveals unique noncoding patterns associated with cell types and states. Proc Natl Acad Sci U S A. 2021;118(51):e2113568118. PMCID: PMC8713755

      (8) Morales-Vicente DA, Zhao L, Silveira GO, Tahira AC, Amaral MS, Collins JJ, Verjovski-Almeida S. Single-cell RNA-seq analyses show that long non-coding RNAs are conspicuously expressed in Schistosoma mansoni gamete and tegument progenitor cell populations. Front Genet. 2022;13:924877. PMCID: PMC9531161

      (9) Kim DH, Marinov GK, Pepke S, Singer ZS, He P, Williams B, Schroth GP, Elowitz MB, Wold BJ. Single-Cell Transcriptome Analysis Reveals Dynamic Changes in lncRNA Expression during Reprogramming. Cell Stem Cell. 2015;16(1):88–101. PMCID: PMC4291542

      (10) Yang L, Liang P, Yang H, Coyne CB. Trophoblast organoids with physiological polarity model placental structure and function. bioRxiv. 2023;2023.01.12.523752. PMCID: PMC9882188

    1. Author response:

      General Statements

      In our manuscript, we demonstrate for the first time that RNA Polymerase I (Pol I) can prematurely release nascent transcripts at the 5' end of ribosomal DNA transcription units in vivo. This achievement was made possible by comparing wild-type Pol I with a mutant form of Pol I, hereafter called SuperPol, previously isolated in our lab (Darrière et al., 2019). By combining in vivo analysis of rRNA synthesis (using pulse-labelling of nascent transcripts and cross-linking of nascent transcripts, CRAC) with in vitro analysis, we could show that SuperPol reduces premature transcript release through altered elongation dynamics and reduced RNA cleavage activity. Such premature release could reflect regulatory mechanisms controlling rRNA synthesis. Importantly, the increased processivity of SuperPol is correlated with resistance to BMH-21, a novel anticancer drug that inhibits Pol I, showing the relevance of targeting Pol I during transcriptional pauses to kill cancer cells. This work offers critical insights into Pol I dynamics, rRNA transcription regulation, and implications for cancer therapeutics.

      We sincerely thank the three reviewers for their insightful comments and recognition of the strengths and weaknesses of our study. Their acknowledgment of our rigorous methodology, the relevance of our findings on rRNA transcription regulation, and the significant enzymatic properties of the SuperPol mutant is highly appreciated. We are particularly grateful for their appreciation of the potential scientific impact of this work. Additionally, we value the reviewer’s suggestion that this article could address a broad scientific community, including in transcription biology and cancer therapy research. These encouraging remarks motivate us to refine and expand upon our findings further.

      All three reviewers acknowledged the increased processivity of SuperPol compared to its wild-type counterpart. However, two of the three question our claim that premature termination of transcription can regulate ribosomal RNA transcription. This conclusion is based on the observation that the SuperPol mutant increases rRNA production. Proving that modulation of early transcription termination is used to regulate rRNA production under physiological conditions is beyond the scope of this study. Therefore, we propose to change the title of this manuscript to focus on what we have unambiguously demonstrated:

      “Ribosomal RNA synthesis by RNA polymerase I is subjected to premature termination of transcription”.

      Reviewer 1's main criticism centers on the use of the CRAC technique in our study. While we address this point in detail below, we would like to emphasize that, although we agree with the reviewer’s comments regarding its application to Pol II studies, CRAC remains the only suitable method for studying Pol I elongation over the entire transcription unit because it limits contamination with mature rRNA. All other methods are massively contaminated with fragments of mature RNA, which prevents any quantitative analysis of read distribution within the rDNA. This perspective is widely accepted within the Pol I research community, as CRAC provides a robust approach to capturing transcriptional dynamics specific to Pol I activity.

      We hope that these findings will resonate with the readership of your journal and contribute significantly to advancing discussions in transcription biology and related fields.

      (1) Description of the planned revisions

      Despite numerous text modifications (see below), we agree that one major point of discussion is the consequence of the increased processivity of the SuperPol mutant for the “quality” of the rRNA produced. Reviewer 3 suggested comparisons with other processive alleles, such as the rpb1-E1103G mutant of the RNAPII subunit (Malagon et al., 2006). This comparison has already been addressed by the Schneider lab (Viktorovskaya OV, Cell Rep., 2013 - PMID: 23994471), which explored Pol II (rpb1-E1103G) and Pol I (rpa190-E1224G). The rpa190-E1224G mutant revealed enhanced pausing in vitro, highlighting key differences between the catalytic rate-limiting steps of Pol I and Pol II (see David Schneider's review on this topic for further details).

      Reviewers 2 and 3 suggested that a decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. Pol I mutants with decreased rRNA cleavage have been characterized previously and showed an increased error rate. We have already started to address this point. Preliminary results from in vitro experiments suggest that SuperPol exhibits an elevated error rate during transcription. However, these findings require further experimental validation to confirm their reproducibility and robustness. We propose to consolidate these data and incorporate them into the manuscript to address this question comprehensively. This could provide valuable insights into the mechanistic differences between SuperPol and the wild-type enzyme. SuperPol is the first Pol I mutant described with increased processivity in vitro and in vivo, and we agree that this might come at the cost of decreased fidelity.

      Regulatory aspect of the process:

      To address the reviewer’s remarks, we propose to test our model by performing experiments that would evaluate PTT levels in Pol I mutants or under different growth conditions. These experiments would provide crucial data to support our model, which suggests that PTT is a regulatory element of Pol I transcription. By demonstrating how PTT varies with environmental factors, we aim to strengthen the hypothesis that premature termination plays an important role in regulating Pol I activity.

      We propose revising the title and conclusions of the manuscript. The updated version will better reflect the study's focus and temper claims regarding the regulatory aspects of termination events, while maintaining the value of our proposed model.

      (2) Description of the revisions that have already been incorporated in the transferred manuscript

      Some very important modifications have now been incorporated:

      Statistical Analyses and CRAC Replicates:

      Unlike reviewers 2 and 3, reviewer 1 suggests that we did not analyze the results statistically. In fact, the CRAC analyses were conducted in biological triplicate, ensuring robustness and reproducibility. The statistical analyses are presented in Figure 2C, which highlights significant findings supporting the conclusion that the WT Pol I and SuperPol distribution profiles are different. The CRAC replicates exhibit a high correlation, and we confirmed a significant effect in each region of interest (5’ETS, 18S.2, 25S.1 and 3’ ETS, Figure 1) to verify consistency across experiments. Finally, we took care not to overinterpret the results, maintaining a rigorous and cautious approach in our analysis to ensure accurate conclusions.

      CRAC vs. Net-seq:

      Reviewer 1 asks us to comment on the differences between CRAC and Net-seq. The two methods complement each other but serve different purposes depending on the biological question and the context of the transcription analysis. Net-seq was originally designed for Pol II analysis. It captures nascent RNAs but does not eliminate mature ribosomal RNAs (rRNAs), leading to high levels of contamination. While this is manageable for Pol II analysis (in silico elimination of reads corresponding to rRNAs), it poses a significant problem for Pol I due to the dominance of rRNAs (60% of total RNAs in yeast), which share sequences with nascent Pol I transcripts. As a result, large Net-seq peaks are observed at mature rRNA extremities (Clarke 2018, Jacobs 2022). This limits the interpretation of the results to the short-lived pre-rRNA species. In contrast, CRAC has been specifically adapted by the laboratory of David Tollervey to map Pol I distribution while minimizing contamination from mature rRNAs (the CRAC protocol used exclusively recovers RNAs with 3′ hydroxyl groups, which represent endogenous 3′ ends of nascent transcripts, thus removing RNAs with a 3′ phosphate, found in mature rRNAs). This makes CRAC more suitable for studying Pol I transcription, including polymerase pausing and distribution along the rDNA, and provides a quantitative dataset for the entire rDNA gene.

      CRAC vs. Other Methods:

      Reviewer 1 suggests using GRO-seq or TT-seq, but the experiments in Figure 2 aim to assess the distribution profile of Pol I along the rDNA, which requires a method optimized for this specific purpose. While GRO-seq and TT-seq are excellent for measuring RNA synthesis and cotranscriptional processing, they rely on Sarkosyl treatment to permeabilize cellular and nuclear membranes. Sarkosyl is known to artificially induce polymerase pausing and to inhibit RNase activities that are involved in the process. CRAC avoids these artifacts because it is a direct and fully in vivo approach. In a CRAC experiment, cells are grown exponentially in rich media and arrested via rapid cross-linking, providing precise data on Pol I activity and pausing that are free of these artifacts.

      Pol I ChIP Signal Comparison:

      The ChIP experiments previously published in Darrière et al. lack the statistical depth and resolution offered by our CRAC analyses. The detailed results obtained through CRAC would have been impossible to detect using classical ChIP. The current study provides a more refined and precise understanding of Pol I distribution and dynamics, highlighting the advantages of CRAC over traditional methods in addressing these complex transcriptional processes.

      BMH-21 Effects:

      As highlighted by Reviewer 1, the effects of BMH-21 observed in our study differ slightly from those reported in earlier work (Ref Schneider 2022), likely due to variations in experimental conditions, such as methodologies (CRAC vs. Net-seq), as discussed earlier. We also identified variations in the response to BMH-21 treatment associated with differences in cell growth phases and/or cell density. These factors likely contribute to the observed discrepancies, offering a potential explanation for the variations between our findings and those reported in previous studies. In our approach, we prioritized reproducibility by carefully controlling BMH-21 experimental conditions to mitigate these factors. These variables can significantly influence results, potentially leading to subtle discrepancies. Nevertheless, the overall conclusions regarding BMH-21's effects on WT Pol I are largely consistent across studies, with differences primarily observed at the nucleotide resolution. This is a strength of our CRAC-based analysis, which provides precise insights into Pol I activity.

      We will address these nuances in the revised manuscript to clarify how such differences may impact results and provide context for interpreting our findings in light of previous studies.

      Minor points:

      Reviewer #1:

      •  In general, the writing style is not clear, and there are some word mistakes or poor descriptions of the results, for example: 

      •  On page 14: "SuperPol accumulation is decreased (compared to Pol I)". 

      •  On page 16: "Compared to WT Pol I, the cumulative distribution of SuperPol is indeed shifted on the right of the graph." 

      We clarified and improved the overall writing style in response to the reviewer's comment.

      •  There are also issues with the literature, for example: Turowski et al, 2020a and Turowski et al, 2020b are the same article (preprint and peer-reviewed). Is there any reason to include both references? Please, double-check the references.  

      This was corrected in this version of the manuscript.

      •  In the manuscript, 5S rRNA is mentioned as an internal control for TMA normalisation. Why are Figure 1C data normalised to 18S rRNA instead of 5S rRNA? 

      Data are effectively normalized relative to the 5S rRNA, but the value for the 18S rRNA is arbitrarily set to 100%.
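
      One possible reading of this two-step normalization can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis actually used for Figure 1C: it assumes that each rRNA signal is first divided by the 5S signal of the same sample and that the result is then expressed as a percentage of the 5S-normalized 18S value of a reference sample. The function names and all numerical values are hypothetical.

```python
def normalize_to_5s(sample):
    """Divide every rRNA signal in a sample by that sample's 5S signal (internal control)."""
    return {name: value / sample["5S"] for name, value in sample.items() if name != "5S"}


def as_percent_of_reference(sample, reference, ref_species="18S"):
    """Express 5S-normalized signals as a percentage of the reference sample's
    5S-normalized signal for `ref_species`, which is thereby set to 100%.

    Assumption: the reference is a single sample (e.g. wild type); the source
    does not state which sample defines the 100% value.
    """
    ref_value = normalize_to_5s(reference)[ref_species]
    return {name: 100.0 * value / ref_value for name, value in normalize_to_5s(sample).items()}


# Hypothetical raw signals (arbitrary units) for a reference and a mutant sample.
wt = {"5S": 200.0, "18S": 400.0, "25S": 400.0}
mutant = {"5S": 100.0, "18S": 300.0, "25S": 300.0}

wt_percent = as_percent_of_reference(wt, wt)
mutant_percent = as_percent_of_reference(mutant, wt)
print(wt_percent, mutant_percent)
```

      Under this reading, the 5S signal corrects for differences in loading or labelling between samples, while the choice of 18S as the 100% value only fixes the units in which the plot is displayed.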

      •  Figure 4 should be a supplementary figure, and Figure 7D doesn't have a y-axis labelling. 

      The presence of all Pol I-specific subunits (Rpa12, Rpa34 and Rpa49) is crucial for the enzymatic activity assays we performed. In the absence of these subunits (whose levels can vary depending on the purification batch), Pol I pausing, cleavage and elongation are known to be affected. To strengthen our conclusion, we wanted to show the subunit composition of the purified enzyme. This important control should be shown, but it can indeed be moved to a supplementary figure if desired.

      The y-axis in Figure 7D is now correctly labelled.

      •  In Figure 7C, BMH-21 treatment causes the accumulation of ~140bp rRNA transcripts only in SuperPol-expressing cells that are Rrp6-sensitive (line 6 vs line 8), suggesting that BHM-21 treatment does affect SuperPol. Could the author comment on the interpretation of this result? 

      The 140 nt product is a degradation fragment resulting from trimming, which explains its lower accumulation in the absence of Rrp6. BMH-21 significantly affects WT Pol I transcription but also has a mild effect on SuperPol transcription. As a result, the 140 nt product accumulates under these conditions.

      Reviewer #2:

      •  pp. 14-15: The authors note local differences in peak detection in the 5'-ETS among replicates, preventing a nucleotide-resolution analysis of pausing sites. Still, they report consistent global differences between wild-type and SuperPol CRAC signals in the 5'ETS (and other regions of the rDNA). These global differences are clear in the quantification shown in Figures 2B-C. A simpler statement might be less confusing, avoiding references to a "first and second set of replicates" 

      As suggested by the reviewer, the statement has been simplified in this version of the manuscript.

      •  Figures 2A and 2C: Based on these data and quantification, it appears that SuperPol signals in the body and 3' end of the rDNA unit are higher than those in the wild type. This finding supports the conclusion that reduced pausing (and termination) in the 5'ETS leads to an increased Pol I signal downstream. Since the average increase in the SuperPol signal is distributed over a larger region, this might also explain why even a relatively modest decrease in 5'ETS pausing results in higher rRNA production. This point merits discussion by the authors. 

      We agree that this is a very important point for the discussion of our results. Transcription is a very dynamic process in which paused polymerases are easily detected using the CRAC assay. Elongating polymerases are distributed over a much larger gene body, and even a small amount of polymerase detected in the gene body can represent a very large amount of rRNA synthesis. This point is of paramount importance and, as suggested by the reviewer, is now discussed in detail.

      •  A decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. Have the authors observed any evidence supporting this possibility? 

      The reviewer suggested that a decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. We have already started to address this point. Preliminary results from in vitro experiments suggest that SuperPol mutants exhibit an elevated error rate during transcription. However, these findings require further experimental validation to confirm their reproducibility and robustness. We propose to consolidate these data and incorporate them into the manuscript to address this question comprehensively.

      •  pp. 15 and 22: Premature transcription termination as a regulator of gene expression is well documented in yeast, with significant contributions from the Corden, Brow, Libri, and Tollervey labs. These studies should be referenced along with relevant bacterial and mammalian research.

      As suggested by the reviewer, we now reference these studies.

      •  p. 23: "SuperPol and Rpa190-KR have a synergistic effect on BMH-21 resistance." A citation should be added for this statement. 

      This statement is based on unpublished data from our lab. Rpa190-KR and SuperPol are the only two known mutants resistant to BMH-21. We observed that the resistance conferred by the two alleles is synergistic, with much higher resistance to BMH-21 in the double mutant than in either single mutant (data not shown). Comparing their resistance mechanisms is a very important point, and the data could be provided upon request. This was added to the statement.

      •  p. 23: "The released of the premature transcript" - this phrase contains a typo 

      This is now corrected.

      Reviewer #3:

      •  Figure 1B: it would be opportune to separate the technique's schematic representation from the actual data. Concerning the data, would the authors consider adding an experiment with rrp6Δ cells? Some RNAs could be degraded even in such a short period of time, as even stated by the authors, so maybe an exosome-depleted background could provide a more complete picture. Could the authors also explain why the increase is only observed at the level of 18S and 25S? To further prove the robustness of the Pol I TMA method, it could be good to add already characterized mutations or other drugs to show that the technique can also readily detect well-known and expected changes.

      The precise objective of this experiment is to avoid the use of the Rrp6 mutant. Under these conditions, we prevent the accumulation of transcripts that would result from a maturation defect. While it is possible to conduct the experiment with the Rrp6 mutant, it would be impossible to draw reliable conclusions due to this artificial accumulation of transcripts.

      •  Figure 1C: the NTS1 probe signal is missing (it is referenced in Figure 1A but not listed in the Methods section or the oligo table). If this probe was unused, please correct Figure 1A accordingly. 

      We corrected Figure 1A.  

      •  Figure 2A: the RNAPI occupancy map by CRAC is hard to interpret. The red color (SuperPol) is stacked on top of the blue line, and we are not able to observe the signal of the WT for most positions along the rDNA unit. It would be preferable to use some kind of opacity that allows both curves to be visualized. Moreover, the analysis of the behavior of the polymerase is always restricted to the 5'ETS region in the rest of the manuscript. We are thus not able to observe whether termination events also occur in other regions of the rDNA unit. A Northern blot analysis displaying higher sizes would provide a more complete picture.

      We addressed this point to make the figure more visually informative. In Northern blot analysis, we use a TSS (Transcription Start Site) probe, which detects only transcripts containing the 5' extremity. Due to co-transcriptional processing, most of the rRNA undergoing transcription lacks its 5' extremity and is not detectable using this technique. We have the data, but they do not show any difference between Pol I and SuperPol. This information could be included in the supplementary data if requested.

      •  "Importantly, despite some local variations, we could reproducibly observe an increased occupancy of WT Pol I in 5'-ETS compared to SuperPol (Figure 1C)." should be Figure 2C. 

      Thanks for pointing out this mistake. It has been corrected.

      •  Figure 3D: most of the difference in the cumulative proportion of CRAC reads is observed in the region ~750 to 3000. In line with my previous point, I think it would be worth exploring also termination events beyond the 5'-ETS region. 

      We agree that such an analysis would have been interesting. However, with the exception of the pre-rRNA starting at the transcription start site (TSS) studied here, any cleaved rRNA at its 5' end could result from premature termination and/or abnormal processing events. Exploring the production of other abnormal rRNAs produced by premature termination is a project in itself, beyond this initial work aimed at demonstrating the existence of premature termination events in ribosomal RNA production.

      •  Figure 4: should probably be provided as supplementary material. 

      As mentioned earlier (see comments above), the presence of all Pol I-specific subunits (Rpa12, Rpa34 and Rpa49) is crucial for the enzymatic activity assays we performed. This important control should be shown, but can indeed be moved to a supplementary figure if desired.

      •  "While the growth of cells expressing SuperPol appeared unaffected, the fitness of WT cells was severely reduced under the same conditions." I think the growth of cells expressing SuperPol is slightly affected. 

      We agree with this comment and we modified the text accordingly.

      •  Figure 7D: the legend of the y-axis is missing as well as the title of the plot. 

      The y-axis legend and the plot title are now present.

      •  The statements concerning BMH-21, SuperPol and Rpa190-KR in the Discussion section should be removed, or data should be provided.

      This was discussed previously. See comment above.

      •  Some references are missing from the Bibliography, for example Merkl et al., 2020; Pilsl et al., 2016a, 2016b. 

      The bibliography is now fixed.

      (3) Description of analyses that authors prefer not to carry out

      Does the SuperPol mutant produce more functional rRNAs?

      As Reviewer 1 requested, we agree that this point requires clarification. In cells expressing SuperPol, a higher steady-state level of (pre-)rRNAs is only observed in the absence of the degradation machinery, suggesting that overproduced rRNAs are rapidly eliminated. We know that (pre-)rRNAs are unable to accumulate in the absence of ribosomal proteins and/or Assembly Factors (AF). Consequently, overproducing rRNAs would not be sufficient to increase ribosome content. This specific point is being further addressed in our lab but is beyond the scope of this article.

      Is premature termination coupled with rRNA processing?

      We appreciate the reviewer’s insightful comments. The suggested experiments regarding the UTP-A complex's regulatory potential are valuable and ongoing in our lab, but they extend beyond the scope of this study and are not suitable for inclusion in the current manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):  

      Summary:  

      This study provides new insights into the role of miR-19b, an oncogenic microRNA, in the developing chicken pallium. Dynamic expression pattern of miR-19b is associated with its role in regulating cell cycle progression in neural progenitor cells. Furthermore, miR-19b is involved in determining neuronal subtypes by regulating Fezf2 expression during pallial development. These findings suggest an important role for miR-19b in the coordinated spatio-temporal regulation of neural progenitor cell dynamics and its evolutionary conservation across vertebrate species.  

      Strengths:  

      The authors identified conserved roles of miR-19 in the regulation of neural progenitor maintenance between mouse and chick, and the latter is mediated by the repression of E2f8 and NeuroD1. Furthermore, the authors found that miR-19b-dependent cell cycle regulation is tightly associated with specification of Fezf1 or Mef2c-positive neurons, in spatio-temporal manners during chicken pallial development. These findings uncovered molecular mechanisms underlying microRNA-mediated neurogenic controls.  

      Weaknesses:  

      Although the authors in this study claimed striking similarities of miR-19a/b in neurogenesis between mouse and chick pallium, a previous study by Bian et al. revealed that miR-19a contributes to the expansion of radial glial cells by suppressing PTEN expression in the developing mouse neocortex, while miR-19b maintains apical progenitors via inhibiting E2f2 and NeuroD1 in chicken pallium. Thus, it is still unclear whether the orthologous microRNAs regulate common or species-specific target genes.

      In this study, we have proposed that miR-19b regulates similar phenomena in both species using different targets, such as regulation of proliferation through PTEN in mouse and through E2f8 in the chicken.

      The spatiotemporal expression patterns of miR-19b and several genes are not convincing. For example, the authors claim that NeuroD1 is initially expressed uniformly in the subventricular zone (SVZ) but disappears in the DVR region by HH29 and becomes detectable by HH35 (Figure 1). However, the in situ hybridization data revealed that NeuroD1 is highly expressed in the SVZ of the DVR at HH29 (Figure 4F). Thus, perhaps due to the problem of immunohistochemistry, the authors have not been able to detect NeuroD1 expression in Figure 1D, and the interpretation of the data may require significant modification.  

      While Fig. 1B may suggest that NeuroD1 expression has disappeared from the DVR region by HH29, this is not true in general because we have observed NeuroD1 to be expressed in the DVR at HH29 in images of other sections. In the revised version, we will include improved images for panels of Fig. 1B which accurately show the expression pattern of NeuroD1 and miR19b at stages HH29 and HH35.  

      It seems that miR-19b is also expressed in neurons (Figure 1), suggesting the role of miR-19b must be different in progenitors and differentiated neurons. The data on the gain- and loss-of-function analysis of miR-19b on the expression of Mef2c should be carefully considered, as it is possible that these experiments disturb the neuronal functions of miR-19b rather than its functions in the progenitors.

      As pointed out by the reviewer, it is quite possible that manipulating miR-19b perturbs its neuronal functions in addition to its function in progenitor cells. After introducing the gain-of-function construct in progenitor cells, we have observed changes in the morphology of these cells. These data will be included in the revised version.

      The regions of chicken pallium were not consistent among figures: in Figure 1, they showed caudal parts of the pallium (HH29 and 35), while the data in Figure 4 corresponded to the rostral part of the pallium (Figure 4B).  

      We will address this by providing images from a similar region of the pallium showing Fezf2 and Mef2c expression patterns.

      The neurons expressing Fezf2 and Mef2c in the chicken pallium are not homologous neuronal subtypes to mammalian deep and superficial cortical neurons. The authors must understand that chicken pallial development proceeds in an outside-in manner. Thus, Mef2c-positive neurons in a superficial part are early-born neurons, while Fezf2-positive neurons residing in deep areas are later-born neurons. It should be noted that the expression of a single marker gene does not support cell type homology, and the authors' description "the possibility of primitive pallial lamina formation in common ancestors of birds and mammals" is misleading.

      We appreciate this clarification and will modify or remove this statement regarding the “primitive pallial lamina formation” to avoid any confusion and misinterpretation. 

      Overexpression of CDKN1A or Sponge-19b induced ectopic expression of Fezf2 in the ventricular zone (Figure 3C, E). Do these cells maintain a progenitor state or prematurely differentiate into neurons? In addition, the authors must explain why the induction of Fezf2 is also detected in GFP-negative cells.

      We propose to follow up on the fate of these cells by extending the observation period after overexpression of CDKN1A or Sponge-19b to assess whether they retain progenitor characteristics or differentiate. The presence of Fezf2 in GFP-negative cells could be due to non-cell-autonomous effects, and we will discuss this possibility in the revised manuscript.

      Reviewer #2 (Public review):  

      Summary:  

      This paper investigates the general concept that avian and mammalian pallium specifications share similar mechanisms. To explore that idea, the authors focus their attention on the role of miR-19b as a key controlling factor in the neuronal proliferation/differentiation balance. To do so, the authors checked the expression and protein level of several genes involved in neuronal differentiation, such as NeuroD1 or E2f8, genes also expressed in mammals after conducting their functional gene manipulation experiments. The work also shows a dysregulation in the number of neurons from lower and upper layers when miR-19b expression is altered.  

      To test it, the authors conducted a series of functional experiments of gain and loss of function (G&LoF) and enhancer-reporter assays. The enhancer-reporter assays demonstrate a direct relationship between miR-19b and NeuroD1 and E2f8, which is also validated by the G&LoF experiments. It is also noteworthy that miR-19b acts by maintaining the progenitor cells of the ventricular zone in an undifferentiated stage, thus promoting them into a stage of cellular division.

      Overall, the paper argues that the expression of miR-19b in the ventricular zone promotes the cells in a proliferative phase and inhibits the expression of differentiation genes such as E2f8 and NeuroD1. The authors claim that a decrease in the progenitor cell pool leads to an increase and decrease in neurons in the lower and upper layers, respectively.

      Strengths:  

      (1) Novelty Contribution  

      The paper offers strong arguments to prove that the neurodevelopmental basis between mammals and birds is quite the same. Moreover, this work contributes to a better understanding of brain evolution along the animal evolutionary tree and will give us a clearer idea about the roots of how our brain has been developed. This stands in contrast to the conventional framing of mammal brain development as an independent subject unlinked to the "less evolved species". The authors also nicely show a concept that was previously restricted to mammals - the role of microRNAs in development.  

      (2) Right experimental approach  

      The authors perform a set of functional experiments correctly adjusted to answer the role of miR-19b in the control of neuronal stem cell proliferation and differentiation. Their histological, functional, and genetic approach gives us a clear idea about the relations between several genes involved in the differentiation of the neurons in the avian pallium. In this idea, they maintain the role of miR-19b as a hub controller, keeping the ventricular zone cells in an undifferentiated stage to perpetuate the cellular pool.  

      (3) Future directions  

      The findings open a door to future experiments, particularly to a better comprehension of the role of microRNAs and pallidal genetic connections. Furthermore, this work also proves the use of avians as a model to study cortical development due to the similarities with mammals.  

      Weaknesses:  

      While there are questions answered, there are still several that remain unsolved. The experiments analyzed here lead us to speculate that the early differentiation of the progenitor cells from the ventricular zone entails a reduction in the cellular pool, affecting thereafter the number of later-born neurons (upper layers). The authors should explore that option by testing progenitor cell markers in the ventricular zone, such as Pax6. Even so, it remains possible that miR-19b is also changing the expression pattern of neurons that are going to populate the different layers, instead of their numbers, so the authors cannot rule that out or verify it. Since the paper focuses on the role of miR-19b in patterning, I think the authors should check the relationship and expression between progenitors (Pax6) and intermediate (Tbr2) cells when miR-19b is affected. Since neuronal expression markers change so fast within a few days (HH24-HH35), I don't understand why the authors stop the functional experiments at different time points.

      To address this, we will examine the expression of Pax6 and Tbr2 following both gain-of-function and loss-of-function manipulations of miR-19b. We agree with the reviewer that miR-19b may influence not only the number of neurons but also the expression pattern of neuronal markers.  Due to the limitations of our experimental design, we acknowledge that this possibility cannot be ruled out. 

      Regarding time points chosen for the functional experiments: We selected different stages based on the expression dynamics of specific markers. To detect possible ectopic induction, we analyzed developmental stages where the expression of a given marker is normally absent. Conversely, to detect loss of expression we examined stages in which the marker is typically expressed robustly. This approach allowed us to better interpret the functional consequences of miR-19b manipulation within relevant developmental windows. 

      Reviewer #3 (Public review):  

      Summary:  

      This is a timely article that focuses on the molecular machinery in charge of the proliferation of pallial neural stem cells in chicks, and aims to compare them to what is known in mammals. miR19b is related to controlling the expression of E2f8 and NeuroD1, and this leads to a proper balance of division/differentiation, required for the generation of the right number of neurons and their subtype proportions. In my opinion, many experiments do reflect an interaction between all these genes and transcription factors, which likely supports the role of miR19b in participating in the proliferation/differentiation balance.  

      Strengths:  

      Most of the methodologies employed are suitable for the research question, and present data to support their conclusions.  

      The authors were creative in their experimental design, in order to assess several aspects of pallial development.  

      Weaknesses:  

      However, there are several important issues that I think need to be addressed or clarified in order to provide a clearer main message for the article, as well as to clarify the tools employed. I consider it utterly important to review and reinterpret most of the anatomical concepts presented here. The way they are currently used is confusing and may mislead readers towards an understanding of the bird pallium that is no longer accepted by the community.

      Major Concerns:  

      (1) Inaccurate use of neuroanatomy throughout the entire article. There are several aspects to it, that I will try to explain in the following paragraphs:  

      a) Figure 1 shows a dynamic and variable expression pattern of miR19b and its relation to NeuroD1. Regardless of the terms used in this figure, it shows that miR19b may be acting differently in various parts of the pallium and developmental stages. However, all the rest of the experiments in the article (except a few cases) abolish these anatomical differences. It is not clear, but it is very important, where in the pallium the experiments are performed. I refer here, at least, to Figures 2C, E, F, H, I; 3D, E; 4C, D, G, I. Regarding time, all experiments were done at HH22, and the article does not show the native expression at this stage. The sacrifice timing is variable, and this variability is not always justified. But more importantly, we don't know where those images were taken, or what part of the pallium is represented in the images. Is it always the same? Do results reflect differences between DVR and Wulst gene expression modifications? The authors should include low magnification images of the regions where experiments were performed. And they should consider the variable expression of all genes when interpreting results.

      We agree that precise anatomical context is essential. In the revised version, we propose to: 

      a) Include schematics of the regions of interest where experimental manipulations were performed.

      b) Provide low-magnification panoramic images where appropriate, for anatomical reference.

      c) Show the expression patterns of relevant marker genes to better justify stages and region selection. 

      d) Provide the expression pattern of markers in panoramic view to show differential expression in the DVR and Wulst region and interpret our results accordingly.

      b) SVZ is not a postmitotic zone (as stated in line 123, and wrongly assigned throughout the text and figures). On the contrary, the SVZ is a secondary proliferative zone, organized in a layer, located in a basal position to the VZ. Both (VZ and SVZ) are germinative zones, containing mostly progenitors. The only postmitotic neurons in VZ and SVZ occupy them transiently when moving to the mantle zone, which is closer to the meninges and is the postmitotic territory. Please refer to the original Boulder committee articles to revise the SVZ definition. The authors, however, misinterpret this concept, and label the whole mantle zone as if this were the SVZ. Indeed, the term "mantle zone" does not appear in the article. Please, revise and change the whole text and figures, as SVZ statements and photographs are nearly always misinterpreted. Indeed, SVZ is only labelled well in Figure 4F.

      The two articles mentioning the expression of NeuroD1 in the SVZ (line 118) are research in Xenopus. Is there a proliferative SVZ in Xenopus?  

      For the actual existence of the SVZ in the chick pallium, please refer to the recent Rueda-Alaña et al., 2025 article that presents PH3 stainings at different timepoints and pallial areas.  

      We appreciate the correction suggested by the reviewer. In the revised manuscript: a) the SVZ will be labeled correctly in all figures and descriptions; b) the mantle zone terminology will be incorporated appropriately; c) the two Xenopus-based references in line 118 will be removed as they are not directly relevant; and d) we will refer to Rueda-Alaña et al. (2025) to guide accurate anatomical labeling and interpretation of proliferative zones.

      We also acknowledge that while some proliferative cells exist in the SVZ of the chicken, they are relatively few and do not express typical basal progenitor markers such as Tbr2 (Nomura et al., 2016, Development). We will ensure that this nuance is clearly reflected in the text. 

      c) What is the Wulst, according to the authors of the article? In many figures, the Wulst includes the medial pallium and hippocampus, whereas sometimes it is used as a synonym of the hyperpallium (which excludes the medial pallium and hippocampus). Please make it clear, as the addition or not of the hippocampus definitely changes some interpretations.

      We propose to modify the text and figures to accurately represent the correct location of the Wulst in the chick pallium.

      d) The authors compare the entirety of the chick pallium - including the hippocampus (see above), hyperpallium, mesopallium, nidopallium - to only the neocortex of mammals. This view - as shown in Suzuki et al., 2012 - forgets the specificity of pallial areas of the pallium and compares it to cortical cells. This is conceptually wrong, and leads to incorrect interpretations (please refer to Luis Puelles' commentaries on Suzuki et al. results); there are incorrect conclusions about the existence of upper-layer-like and deep-layer-like neurons in the pallium of birds. The view is not only wrong according to the misinterpreted anatomical comparisons, but also according to novel scRNAseq data (Rueda-Alaña et al., 2025; Zaremba et al., 2025; Hecker et al., 2025). These articles show that many avian glutamatergic neurons of the pallium are highly diversified, and are not comparable to mammalian cortical cells. The authors should therefore avoid this incorrect use of terminology. There are no such upper-layer-like and deep-layer-like neurons in the pallium of birds.

      We acknowledge this conceptual oversight. In the manuscript: a) we will avoid direct comparisons between the entire chick pallium and the mammalian neocortex; b) terms like “upper-layer-like” and “deep-layer-like” neurons will be removed or modified; c) we will cite and integrate recent findings from Rueda-Alaña et al. (2025), Zaremba et al. (2025), and Hecker et al. (2025), which provide updated insights from scRNAseq analyses into the complexity of avian pallial neurons. Cell types will be described based on marker gene expression only, without unsupported evolutionary or homology claims.

      (2) From introduction to discussion, the article uses misleading terms and outdated concepts of cell type homology and similarity between chick and pallial territories and cells. The authors must avoid this confusing terminology, as non-expert readers will come to evolutionary conclusions which are not supported by the data in this article; indeed, the article does not deal with those concepts.  

      We agree with the reviewer. In the revised version, we will remove the misleading terms and outdated concepts and avoid speculative evolutionary conclusions.  

      a) Recent articles published in Science (Rueda-Alaña et al., 2025; Zaremba et al., 2025; Hecker et al., 2025) directly contradict some views presented in this article. These articles should be presented in the introduction as they are utterly important for the subject of this article and their results should be discussed in the light of the new findings of this article. Accordingly, the authors should avoid claiming any homology that is not currently supported. The expression of a single gene is not enough anymore to claim the homology of neuronal populations.  

      In the revised version, these above-mentioned articles (Rueda-Alaña et al., 2025; Zaremba et al., 2025; Hecker et al., 2025) will be included in the introduction and discussion.  Our interpretations will be updated to reflect these new insights into neuronal diversity and regionalization in the chick pallium. 

      b) Auditory cortex is not an appropriate term, as there is no cortex in the pallium of birds. Cortical areas require the existence of neuronal arrangements in laminae that appear parallel to the ventricular surface. It is not the case of either hyperpallium or auditory DVR. The accepted term, according to the Avian Nomenclature forum, is Field L.

      We will replace all instances of “auditory cortex” with “Field L”, as per the accepted terminology in the Avian Nomenclature Forum.

      c) Forebrain, a term overused in the article, is very unspecific. It includes vast areas of the brain, from the pretectum and thalamus to the olfactory bulb. However, the authors are not researching most of the forebrain here. They should be more specific throughout the text and title.  

      In the revised version, we will replace “forebrain” with “Pallium” throughout the manuscript to more accurately reflect the regions studied.

      (3) In the last part of the results, the authors claim miR19b has a role in patterning the avian pallium. What they see is that modifying its expression induces changes in gene expression in certain neurons. Accordingly, the altered neurons would differentiate into other subtypes, not similar to the wild type example. In this sense, miR19b may have a role in cell specification or neuronal differentiation. However, patterning is a different developmental event, which refers to the determination of broad genetic areas and territories. I don't think miR19b has a role in patterning.  

      We agree with the reviewer that an alteration in one marker for a particular cell type may not indicate a change in patterning. However, including the effect of miR-19b gain- and loss-of-function on Pax6 and Tbr2 may strengthen the idea that it affects patterning, as suggested by reviewer #2.

      (4) Please add a scheme of the molecules described in this article and the suggested interaction between them.  

      In the revised version, we propose to include a diagram to visually summarize the proposed interactions between miR-19b, E2f8, NeuroD1, and other key regulators.  

      (5) The methods section is way too brief to allow for repeatability of the procedures. This may be due to an editorial policy but if possible, please extend the details of the experimental procedures.  

      We will expand the Methods section to provide more detailed protocols and justifications for experimental design, in alignment with journal policy.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors aim to understand the neural basis of implicit causal inference, specifically how people infer causes of illness. They use fMRI to explore whether these inferences rely on content-specific semantic networks or broader, domain-general neurocognitive mechanisms. The study explores two key hypotheses: first, that causal inferences about illness rely on semantic networks specific to living things, such as the 'animacy network,' given that illnesses affect only animate beings; and second, that there might be a common brain network supporting causal inferences across various domains, including illness, mental states, and mechanical failures. By examining these hypotheses, the authors aim to determine whether causal inferences are supported by specialized or generalized neural systems.

      The authors observed that inferring illness causes selectively engaged a portion of the precuneus (PC) associated with the semantic representation of animate entities, such as people and animals. They found no cortical areas that responded to causal inferences across different domains, including illness and mechanical failures. Based on these findings, the authors concluded that implicit causal inferences are supported by content-specific semantic networks, rather than a domain-general neural system, indicating that the neural basis of causal inference is closely tied to the semantic representation of the specific content involved.

      Strengths:

      (1) The inclusion of the four conditions in the design is well thought out, allowing for the examination of the unique contribution of causal inference of illness compared to either a different type of causal inference (mechanical) or non-causal conditions. This design also has the potential to identify regions involved in a shared representation of inference across general domains.

      (2) The presence of the three localizers for language, logic, and mentalizing, along with the selection of specific regions of interest (ROIs), such as the precuneus and anterior ventral occipitotemporal cortex (antVOTC), is a strong feature that supports a hypothesis-driven approach (although see below for a critical point related to the ROI selection).

      (3) The univariate analysis pipeline is solid and well-developed.

      (4) The statistical analyses are a particularly strong aspect of the paper.

      Weaknesses:

      Based on the current analyses, it is not yet possible to rule out the hypothesis that inferring illness causes relies on neurocognitive mechanisms that support causal inferences irrespective of their content, neither in the precuneus nor in other parts of the brain.

      (1) The authors, particularly in the multivariate analyses, do not thoroughly examine the similarity between the two conditions (illness-causal and mechanical-causal), as they are more focused on highlighting the differences between them. For instance, in the searchlight MVPA analysis, an interesting decoding analysis is conducted to identify brain regions that represent illness-causal and mechanical-causal conditions differently, yielding results consistent with the univariate analyses. However, to test for the presence of a shared network, the authors only perform the Causal vs. Non-causal analysis. This analysis is not very informative because it includes all conditions mixed together and does not clarify whether both the illness-causal and mechanical-causal conditions contribute to these results.

      (2) To address this limitation, a useful additional step would be to use as ROIs the different regions that emerged in the Causal vs. Non-causal decoding analysis and to conduct four separate decoding analyses within these specific clusters:

      (a) Illness-Causal vs. Non-causal - Illness First;

      (b) Illness-Causal vs. Non-causal - Mechanical First;

      (c) Mechanical-Causal vs. Non-causal - Illness First;

      (d) Mechanical-Causal vs. Non-causal - Mechanical First.

      This approach would allow the authors to determine whether any of these ROIs can decode both the illness-causal and mechanical-causal conditions against at least one non-causal condition.

      (3) Another possible analysis to investigate the existence of a shared network would be to run the searchlight analysis for the mechanical-causal condition versus the two non-causal conditions, as was done for the illness-causal versus non-causal conditions, and then examine the conjunction between the two. Specifically, the goal would be to identify ROIs that show significant decoding accuracy in both analyses.

      The hypothesis that a neural mechanism supports causal inference across domains predicts higher univariate responses when causal inferences occur than when they do not. This prediction was not generated by us ad hoc but rather has been made by almost all previous cognitive neuroscience papers on this topic (Ferstl & von Cramon, 2001; Satpute et al., 2005; Fugelsang & Dunbar, 2005; Kuperberg et al., 2006; Fenker et al., 2010; Kranjec et al., 2012; Pramod, Chomik-Morales, et al., 2023; Chow et al., 2008; Mason & Just, 2011; Prat et al., 2011). Contrary to this hypothesis, we find that the precuneus (PC) is most activated for illness inferences and most deactivated for mechanical inferences relative to rest, suggesting that the PC does not support domain-general causal inference. To further probe the selectivity of the PC for illness inferences, we created group overlap maps that compare PC responses to illness inferences and mechanical inferences across participants. The PC shows a strong preference for illness inferences and is therefore unlikely to support causal inferences irrespective of their content (Supplementary Figures 6 and 7). We also note that, in whole-cortex analysis, no shared regions responded more to causal inference than noncausal vignettes across domains. Therefore, the prediction made by the ‘domain-general causal engine’ proposal as it has been articulated in the literature is not supported in our data.

      Taking a multivariate approach, the hypothesis that a neural mechanism supports causal inference across domains also predicts that relevant regions can decode between all possible pairs of causal vs. noncausal conditions (e.g., Illness-Causal vs. Noncausal-Illness First, Mechanical-Causal vs. Noncausal-Illness First, etc.). The analysis described by the reviewer in (2), in which the regions that distinguish between causal vs. noncausal conditions in searchlight MVPA are used as ROIs to test various causal vs. noncausal contrasts, is non-independent, because the ROIs would be defined from the same data used to test them. Therefore, we cannot perform this analysis. In accordance with the reviewer’s suggestions in (3), we now include searchlight MVPA results for the mechanical inference condition compared to the two noncausal conditions (Supplementary Figure 9). No regions are shared across the searchlight analyses comparing all possible pairs of causal and noncausal conditions, providing further evidence that there are no shared neural responses to causal inference in our dataset.
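
      As an illustration only (not the authors' pipeline), the "shared region" test described above amounts to a conjunction: a location counts as shared only if it is significant in every causal vs. noncausal searchlight map. The sketch below assumes each map has already been thresholded to a binary vector; the map names and the toy values are hypothetical.

```python
import numpy as np

def conjunction(maps):
    """Return a boolean map that is True only where every input map is True."""
    stacked = np.stack([np.asarray(m, dtype=bool) for m in maps])
    return stacked.all(axis=0)

# Hypothetical thresholded searchlight maps over 8 locations (1 = significant decoding).
pair_maps = {
    "illness_vs_noncausal_illness_first":    [1, 1, 0, 0, 1, 0, 0, 0],
    "illness_vs_noncausal_mech_first":       [0, 1, 1, 0, 0, 0, 0, 0],
    "mechanical_vs_noncausal_illness_first": [0, 0, 1, 1, 0, 0, 1, 0],
    "mechanical_vs_noncausal_mech_first":    [0, 0, 0, 1, 0, 1, 0, 0],
}

shared = conjunction(pair_maps.values())
n_shared = int(shared.sum())  # number of locations significant in all four maps
print(n_shared)
```

      In this toy example every location is missing from at least one map, so the conjunction is empty even though each individual map contains significant locations.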

      (4) Along the same lines, for the ROI MVPA analysis, it would be useful not only to include the illness-causal vs. mechanical-causal decoding but also to examine the illness-causal vs. non-causal conditions and the mechanical-causal vs. non-causal conditions. Additionally, it would be beneficial to report these data not just in a table (where only the mean accuracy is shown) but also using dot plots, allowing the readers to see not only the mean values but also the accuracy for each individual subject.

      We have performed these analyses and now include a table of the results as well as figures displaying the dispersion across participants (Supplementary Tables 2 and 3, Supplementary Figures 10 and 11). In the left PC, the illness inference condition was decoded from one of the noncausal conditions, and the mechanical inference condition was decoded from the same noncausal condition. The language network did not decode between any causal/noncausal pairs. In the logic network, the illness inference condition was decoded from one of the noncausal conditions, and the mechanical inference condition was decoded from the other noncausal condition. Thus, no regions showed the predicted ‘domain-general’ pattern, i.e., significant decoding between all causal/noncausal pairs. 

      Importantly, the decoding results must be interpreted in light of significant univariate differences across conditions (e.g., greater responses to illness inferences compared to noncausal vignettes in the PC). Linear classifiers are highly sensitive to univariate differences (Coutanche, 2013; Kragel et al., 2012; Hebart & Baker, 2018; Woolgar et al., 2014; Davis et al., 2014; Pakravan et al., 2022).
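
      The sensitivity of linear classifiers to univariate differences can be illustrated with a small simulation. This is a sketch under stated assumptions, not the analysis reported in the paper: the two simulated conditions share the same noise model and differ only by a constant offset added to every voxel, and a nearest-centroid classifier (a simple linear classifier chosen here for brevity) is evaluated with leave-one-run-out cross-validation. All sizes and the offset are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_runs, n_voxels = 6, 100
offset = 1.0  # assumed uniform univariate difference between the two conditions

# One pattern per condition per run; condition A is condition B's noise model plus a constant.
cond_a = rng.normal(0.0, 1.0, size=(n_runs, n_voxels)) + offset
cond_b = rng.normal(0.0, 1.0, size=(n_runs, n_voxels))

def loro_accuracy(a, b):
    """Leave-one-run-out accuracy of a nearest-centroid (linear) classifier."""
    correct = 0
    for test_run in range(len(a)):
        train = np.arange(len(a)) != test_run
        centroid_a = a[train].mean(axis=0)
        centroid_b = b[train].mean(axis=0)
        for pattern, label in ((a[test_run], "a"), (b[test_run], "b")):
            dist_a = np.linalg.norm(pattern - centroid_a)
            dist_b = np.linalg.norm(pattern - centroid_b)
            predicted = "a" if dist_a < dist_b else "b"
            correct += int(predicted == label)
    return correct / (2 * len(a))

accuracy = loro_accuracy(cond_a, cond_b)
print(accuracy)
```

      Decoding succeeds here although the conditions differ only in overall response magnitude, which is why significant decoding alone does not establish a fine-grained representational difference.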

      (5) The selection of Regions of Interest (ROIs) is not entirely straightforward:

      In the introduction, the authors mention that recent literature identifies the precuneus (PC) as a region that responds preferentially to images and words related to living things across various tasks. While this may be accurate, we can all agree that other regions within the ventral occipital-temporal cortex also exhibit such preferences, particularly areas like the fusiform face area, the occipital face area, and the extrastriate body area. I believe that at least some parts of this network (e.g., the fusiform gyrus) should be included as ROIs in this study. This inclusion would make sense, especially because a complementary portion of the ventral stream known to prefer non-living items (i.e., anterior medial VOTC) has been selected as a control ROI to process information about the mechanical-causal condition. Given the main hypothesis of the study - that causal inferences about illness might depend on content-specific semantic representations in the 'animacy network' - it would be worthwhile to investigate these ROIs alongside the precuneus, as they may also yield interesting results.

      We thank the reviewer for their suggestion to test the FFA region. We think this provides an interesting comparison to the PC and hypothesized that, in contrast to the PC, the FFA does not encode abstract causal information about animacy-specific processes (i.e., illness). As we mention in the Introduction, although the fusiform face area (FFA) also exhibits a preference for animates, it does so primarily for images in sighted people (Kanwisher et al., 1997; Grill-Spector et al., 2004; Noppeney et al., 2006; Konkle & Caramazza, 2013; Connolly et al., 2016; Bi et al., 2016).

      We did not select the FFA as a region of interest when preregistering the current study because we did not predict it would show sensitivity to causal knowledge. In accordance with the reviewer’s suggestions, we now include the FFA as an ROI in individual-subject univariate analysis (Supplementary Figure 8, Appendix 4). Because we did not run a separate FFA localizer task when collecting the data, we used FFA search spaces from a previous study investigating responses to face images (Julian et al., 2012). We followed the same analysis procedure that was used to investigate responses to illness inferences in the PC. Neither left nor right FFA exhibited a preference for illness inferences compared to mechanical inferences or to the noncausal conditions. This result is interesting and is now briefly discussed in the Discussion section.

      (6) Visual representation of results:

      In all the figures related to ROI analyses, only mean group values are reported (e.g., Figure 1A, Figure 3, Figure 4A, Supplementary Figure 6, Figure 7, Figure 8). To better capture the complexity of fMRI data and provide readers with a more comprehensive view of the results, it would be beneficial to include a dot plot for a specific time point in each graph. This could be a fixed time point (e.g., a certain number of seconds after stimulus presentation) or the time point showing the maximum difference between the conditions of interest. Adding this would allow for a clearer understanding of how the effect is distributed across the full sample, such as whether it is consistently present in every subject or if there is greater variability across individuals.

      We thank the reviewer for this suggestion. We now include scattered box plots displaying the dispersion in average percent signal change across participants in Figures 1, 3, and 4, and Supplementary Figures 8, 12, and 14.

      (7) Task selection:

      (a) To improve the clarity of the paper, it would be helpful to explain the rationale behind the choice of the selected task, specifically addressing: (i) why an implicit inference task was chosen instead of an explicit inference task, and (ii) why the "magic detection" task was used, as it might shift participants' attention more towards coherence, surprise, or unexpected elements rather than the inference process itself.

      (b) Additionally, the choice to include a large number of catch trials is unusual, especially since they are modeled as regressors of non-interest in the GLM. It would be beneficial to provide an explanation for this decision.

      We chose an orthogonal foil detection task, rather than an explicit causal judgment task, to investigate automatic causal inferences during reading and to unconfound such processing as much as possible from explicit decision-making processes (see Kuperberg et al., 2006 for discussion). Analogous foil detection paradigms have been used to study sentence processing and word recognition (Pallier et al., 2011; Dehaene-Lambertz et al., 2018). We now clarify this in the Introduction. The “magical” element occurred both within and across sentences so that participants could not use coherence as a cue to complete the task. Approximately 1/5 (19%) of the trials were magical catch trials to ensure that participants remained attentive throughout the experiment.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors hypothesize that "causal inferences about illness depend on content-specific semantic representations in the animacy network". They test this hypothesis in an fMRI task, by comparing brain activity elicited by participants' exposure to written situations suggesting a plausible cause of illness with brain activity in linguistically equivalent situations suggesting a plausible cause of mechanical failure or damage and non-causal situations. These contrasts identify PC as the main "culprit" in a whole-brain univariate analysis. Then the question arises of whether the content-specificity has to do with inferences about animates in general, or if there are some distinctions between reasoning about people's bodies versus mental states. To answer this question, the authors localize the mentalizing network and study the relation between brain activity elicited by Illness-Causal > Mech-Causal and Mentalizing > Physical stories. They conclude that inferring about the causes of illness partially differentiates from reasoning about people's states of mind. The authors finally test the alternative yet non-mutually exclusive hypothesis that both types of causal inferences (illness and mechanical) depend on shared neural machinery. Good candidates are language and logic, which justifies the use of a language/logic localizer. No evidence of commonalities across causal inferences versus non-causal situations is found.

      Strengths:

      (1) This study introduces a useful paradigm and well-designed set of stimuli to test for implicit causal inferences.

      (2) Another important methodological advance is the addition of physical stories to the original mentalizing protocol.

      (3) With these tools, or a variant of these tools, this study has the potential to pave the way for further investigation of naïve biology and causal inference.

      Weaknesses:

      (1) This study is missing a big-picture question. It is not clear whether the authors investigate the neural correlates of causal reasoning or of naïve biology. If the former, the choice of an orthogonal task, making causal reasoning implicit, is questionable. If the latter, the choice of mechanical and physical controls can be seen as reductive and problematic.

      We have modified the Introduction to clarify that the primary goal of the current study is to test the claim that semantic networks encode causal knowledge – in this case, causal intuitive theories of biology. Most conceptions of intuitive biology, intuitive psychology, and intuitive physics describe them as causal frameworks (e.g., Wellman & Gelman, 1992; Simons & Keil, 1995; Keil et al., 1999; Tenenbaum, Griffiths, & Niyogi, 2007; Gopnik & Wellman, 2012; Gerstenberg & Tenenbaum, 2017). As noted above, we chose an implicit task to investigate automatic causal inferences during reading and to unconfound such processing as much as possible from explicit decision-making processes. We are not sure what the reviewer means when they say that mechanical and physical controls are reductive. This is the standard control condition in neural and behavioral paradigms that investigate intuitive psychology and intuitive biology (e.g., Saxe & Kanwisher, 2003; Gelman & Wellman, 1991).

      (2) The rationale for focusing mostly on the precuneus is not clear and this choice could almost be seen as a post-hoc hypothesis.

      This study is preregistered (https://osf.io/6pnqg). The preregistration states that the precuneus is a hypothesized area of interest, so this is not a post-hoc hypothesis. Our hypothesis was informed by multiple prior studies implicating the precuneus in the semantic representation of animates (e.g., people, animals) (Fairhall & Caramazza, 2013a, 2013b; Fairhall et al., 2014; Peer et al., 2015; Wang et al., 2016; Silson et al., 2019; Rabini, Ubaldi, & Fairhall, 2021; Deen & Freiwald, 2022; Aglinskas & Fairhall, 2023; Hauptman, Elli, et al., 2025). We also conducted a pilot experiment with separate participants prior to pre-registering the study. We now clarify our rationale for focusing on the precuneus in the Introduction:

      “Illness affects living things (e.g., people and animals) rather than inanimate objects (e.g., rocks, machines, houses). Thinking about living things (animates) as opposed to non-living things (inanimate objects/places) recruits partially distinct neural systems (e.g., Warrington & Shallice, 1984; Hillis & Caramazza, 1991; Caramazza & Shelton, 1998; Farah & Rabinowitz, 2003). The precuneus (PC) is part of the ‘animacy’ semantic network and responds preferentially to living things (i.e., people and animals), whether presented as images or words (Devlin et al., 2002; Fairhall & Caramazza, 2013a, 2013b; Fairhall et al., 2014; Peer et al., 2015; Wang et al., 2016; Silson et al., 2019; Rabini, Ubaldi, & Fairhall, 2021; Deen & Freiwald, 2022; Aglinskas & Fairhall, 2023; Hauptman, Elli, et al., 2025). By contrast, parts of the visual system (e.g., fusiform face area) that respond preferentially to animates do so primarily for images (Kanwisher et al., 1997; Grill-Spector et al., 2004; Noppeney et al., 2006; Mahon et al., 2009; Konkle & Caramazza, 2013; Connolly et al., 2016; see Bi et al., 2016 for a review). We hypothesized that the PC represents causal knowledge relevant to animates and tested the prediction that it would be activated during implicit causal inferences about illness, which rely on such knowledge (preregistration: https://osf.io/6pnqg).”

      (3) The choice of an orthogonal 'magic detection' task has three problematic consequences in this study:

      (a) It differs in nature from the 'mentalizing' task that consists of evaluating a character's beliefs explicitly from the corresponding story, which complicates the study of the relation between both tasks. While the authors do not compare both tasks directly, it is unclear to what extent this intrinsic difference between implicit versus explicit judgments of people's body versus mental states could influence the results.

      (b) The extent to which the failure to find shared neural machinery between both types of inferences (illness and mechanical) can be attributed to the implicit character of the task is not clear.

      (c) The introduction of a category of non-interest that contains only 36 trials compared to 38 trials for all four categories of interest creates a design imbalance.

      We disagree with the reviewer’s argument that our use of an implicit “magic detection” task is problematic. Indeed, we think it is one of the advances of the current study over prior work.

      a) Prior work has shown that implicit mentalizing tasks (e.g., naturalistic movie watching) engage the theory of mind network, suggesting that the implicit/explicit nature of the task does not drive the activation of this network (Jacoby et al., 2016; Richardson et al., 2018). With these data in mind, it is unlikely that the implicit/explicit nature of the causal inference and theory of mind tasks in the present experiment can explain observed differences between them.

      b) Explicit causal inferences introduce a collection of executive processes that potentially confound the results and make it difficult to know whether neural signatures are related to causal inference per se. The current study focuses on the neural basis of implicit causal inference, a type of inference that is made routinely during language comprehension. We do not claim to find neural signatures of all causal inferences, and we do not think any study could claim to do so, because causal inferences are a highly varied class.

      c) Our findings do not exclude the possibility that content-invariant responses are elicited during explicit causality judgments. We clarify this point in the Results (e.g., “These results leave open the possibility that domain-general systems support the explicit search for causal connections”) and Discussion (e.g., “The discovery of novel causal relationships (e.g., ‘blicket detectors’; Gopnik et al., 2001) and the identification of complex causes, even in the case of illness, may depend in part on domain-general neural mechanisms”).

      d) Because the magic trials are excluded from our analyses, it is unclear how the imbalance in the number of magic trials could influence the results and our interpretation of them. We note that the number of catch trials in standard target detection paradigms is sometimes much lower than the number of target trials in each condition (e.g., Pallier et al., 2011).

      (4) Another imbalance is present in the design of this study: the number of trials per category is not the same in each run of the main task. This imbalance does not seem to be accounted for in the 1st-level GLM and renders a bit problematic the subsequent use of MVPA.

      Each condition is shown either 6 or 7 times per run (maximum difference of 1 trial between conditions), and the number of trials per condition is equal across the whole experiment: each condition is shown 7 times in two of the runs and 6 times in four of the runs. This minor design imbalance is typical of fMRI experiments and should not impact our interpretations of the data, particularly because we average responses from each condition within a run before submitting them to MVPA.
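
      The arithmetic behind these trial counts, and the effect of averaging within a run, can be checked with a short sketch. The counts (38 trials per condition, 36 catch trials) come from the review and response above; which runs contain 7 trials, the voxel count, and the random single-trial patterns are assumptions made purely for illustration.

```python
import numpy as np

# One condition: 7 trials in two runs and 6 trials in four runs (run order assumed).
trials_per_run = [7, 7, 6, 6, 6, 6]
total_trials = sum(trials_per_run)  # per-condition total across the experiment

# Share of catch ("magic") trials among all trials, with four conditions of interest.
catch_trials = 36
catch_fraction = catch_trials / (catch_trials + 4 * total_trials)

rng = np.random.default_rng(1)
n_voxels = 10  # hypothetical number of voxels in a region
# Averaging single-trial patterns within each run gives exactly one pattern per run,
# regardless of whether that run contained 6 or 7 trials of the condition.
run_means = np.stack([
    rng.normal(size=(n_trials, n_voxels)).mean(axis=0)
    for n_trials in trials_per_run
])

print(total_trials, round(catch_fraction, 2), run_means.shape)
```

      Because the classifier receives one averaged pattern per condition per run, each condition contributes the same number of samples to MVPA even though the underlying trial counts differ by one.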

      (5) The main claim of the authors, encapsulated by the title of the present manuscript, is not tested directly. While the authors included in their protocol independent localizers for mentalizing, language, and logic, they did not include an independent localizer for "animacy". As such, they cannot provide a within-subject evaluation of their claim, which is entirely based on the presence of a partial overlap in PC (which is also involved in a wide range of tasks) with previous results on animacy.

      We respectfully disagree with this assertion. Our primary analysis uses a within-subject leave-one-run-out approach. This approach allows us to use part of the data itself to localize animacy-relevant causal responses in the PC without engaging in ‘double-dipping’ or statistical non-independence (Vul & Kanwisher, 2011). We also use the mentalizing network localizer as a partial localizer for animacy. This is because the control condition (physical reasoning) does not include references to people or any animate agents (Supplementary Figures 1 and 15). We now clarify this point in the Methods section of the paper (see below).

      From the Methods: “To test the relationship between neural responses to inferences about the body and the mind, and to localize animacy regions, we used a localizer task to identify the mentalizing network in each participant (Saxe & Kanwisher, 2003; Dodell-Feder et al., 2011; http://saxelab.mit.edu/use-our-efficient-false-belief-localizer)...Our physical stories incorporated more vivid descriptions of physical interactions and did not make any references to human agents, enabling us to use the mentalizing localizer as a localizer for animacy.”
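
      The general logic of a leave-one-run-out functional ROI analysis, as mentioned in the response above, can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the input layout (one contrast value and one response value per run and voxel within a search space), and the fraction of voxels selected are all assumptions.

```python
import numpy as np

def loro_froi_response(contrast, response, top_fraction=0.05):
    """
    contrast: (n_runs, n_voxels) localizing contrast estimate per run.
    response: (n_runs, n_voxels) response of interest per run.
    Returns the mean held-out response in voxels selected from the other runs.
    """
    contrast = np.asarray(contrast, dtype=float)
    response = np.asarray(response, dtype=float)
    n_runs, n_voxels = contrast.shape
    n_select = max(1, int(round(top_fraction * n_voxels)))
    held_out = []
    for test_run in range(n_runs):
        train = np.arange(n_runs) != test_run
        # Voxel selection uses only the training runs, so it is independent of the test run.
        mean_contrast = contrast[train].mean(axis=0)
        selected = np.argsort(mean_contrast)[-n_select:]
        held_out.append(response[test_run, selected].mean())
    return float(np.mean(held_out))

# Toy data: 3 runs, 4 voxels, identical values in every run.
toy_contrast = np.tile([0.0, 1.0, 2.0, 3.0], (3, 1))
toy_response = np.tile([10.0, 20.0, 30.0, 40.0], (3, 1))
result = loro_froi_response(toy_contrast, toy_response, top_fraction=0.5)
print(result)
```

      The key design point is that the data used to choose voxels never overlap with the data used to measure the response, which is what avoids the non-independence discussed by Vul & Kanwisher (2011).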

      Reviewer #3 (Public review):

      Summary:

      This study employed an implicit task, showing vignettes to participants while a BOLD signal was acquired. The aim was to capture automatic causal inferences that emerge during language processing and comprehension. In particular, the authors compared causal inferences about illness with two control conditions, causal inferences about mechanical failures and non-causal phrases related to illnesses. All phrases that were employed described contexts with people, to avoid animacy/inanimate confound in the results. The authors had a specific hypothesis concerning the role of the precuneus (PC) in being sensitive to causal inferences about illnesses.

      These findings indicate that implicit causal inferences are facilitated by semantic networks specialized for encoding causal knowledge.

      Strengths:

      The major strength of the study is the clever design of the stimuli (which are nicely matched for a number of features) which can tease apart the role of the type of causal inference (illness-causal or mechanical-causal) and the use of two localizers (logic/language and mentalizing) to investigate the hypothesis that the language and/or logical reasoning networks preferentially respond to causal inference regardless of the content domain being tested (illnesses or mechanical).

      Weaknesses:

      I have identified the following main weaknesses:

      (1) Precuneus (PC) and Temporo-Parietal junction (TPJ) show very similar patterns of results, and the manuscript is mostly focused on PC (also the abstract). To what extent does the fact that PC and TPJ show similar trends affect the inferences we can derive from the results of the paper? I wonder whether additional analyses (connectivity?) would help provide information about this network.

      We thank the reviewer for this suggestion. While the PC shows the most robust univariate preference for illness inferences compared to both mechanical inferences and noncausal vignettes, the TPJ also shows a preference for illness inferences compared to mechanical inferences in individual-subject fROI analysis. However, as we mention in the Results section, the TPJ does not show a preference for illness inferences compared to noncausal vignettes, suggesting that the TPJ is selective for animacy but may not be as sensitive to causal knowledge about animacy-specific processes. When describing our results, we refer to the ‘animacy network’ (i.e., PC and TPJ) but also highlight that the PC exhibited the most robust responses to illness inferences (from the Results: “Inferring illness causes preferentially recruited the animacy semantic network, particularly the PC”; from the Discussion: “We find that a semantic network previously implicated in thinking about animates, particularly the precuneus (PC), is preferentially engaged when people infer causes of illness…”). We did not collect resting state data that would enable a connectivity analysis, as the reviewer suggests. This is an interesting direction for future work.

      (2) Results are mainly supported by a univariate ROI approach, and the MVPA ROI approach is performed on a subregion of one of the ROI regions (left precuneus). Results could then have a limited impact on our understanding of brain functioning.

      The original and current versions of the paper include results from multiple multivariate analyses, including whole-cortex searchlight MVPA and individual-subject fROI MVPA performed in multiple search spaces (see Supplementary Figures 10 and 11, Supplementary Tables 2 and 3).

      We note that our preregistered predictions focused primarily on univariate differences. This is because the current study investigates neural responses to inferences, and univariate increases in activity are thought to reflect the processing of such inferences. We use multivariate analyses to complement our primary univariate analyses. However, given that we observe significant univariate effects and that multivariate analyses are heavily influenced by significant univariate effects (Coutanche, 2013; Kragel et al., 2012; Hebart & Baker, 2018; Woolgar et al., 2014; Davis et al., 2014; Pakravan et al., 2022), our univariate results constitute the main findings of the paper.

      (3) In all figures: there are no measures of dispersion of the data across participants. The reader can only see aggregated (mean) data. E.g., percentage signal changes (PSC) do not report measures of dispersion of the data, nor do we have bold maps showing the overlap of the response across participants. Only in Figure 2, we see the data of 6 selected participants out of 20.

      We thank the reviewer for this suggestion. We now include graphs depicting the dispersion of the data across participants in the following figures: Figures 1, 3, and 4, and Supplementary Figures 8, 12, and 14. We have also created 2 figures that display the overlap of univariate responses across participants (Supplementary Figures 6 and 7). These figures show that there is high overlap across participants in PC responses to illness inferences but not mechanical inferences. In addition, all participants’ results from the analysis depicted in Figure 2 are included in Supplementary Figure 3. 

      (4) Sometimes acronyms are defined in the text after they appear for the first time.

      We thank the reviewer for pointing this out. We now define all acronyms before using them.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I was unable to access the pre-registration on OSF because special permission is required.

      We apologize for this technical error. The preregistration is now publicly available: https://osf.io/6pnqg.

      (2) The length of the MRI session is quite long (around 2 hours). It is generally discouraged to have such extended data acquisition periods, as this can affect the stability and cleanliness of the data. Did you observe any effects of fatigue or attention decline in your data?

      The session was 2 hours long including 1-2 10-minute breaks. Without breaks, the scan would be approximately 1.5 hours. This is a standard length for MRI experiments. The main experiment (causal inference task) was always conducted first and lasted approximately 1 hour. Accuracy did not decrease across the 6 runs of this experiment (repeated measures ANOVA, F(5,114) = 1.35, p = .25).

      (3) The last sentence of the results states: "Although MVPA searchlight analysis identified several areas where patterns of activity distinguished between causal and non-causal vignettes, all of these regions showed a preference for non-causal vignettes in univariate analysis (Supplementary Figure 5)." This statement is not entirely accurate. As I previously pointed out, the MVPA searchlight analysis is not very informative and is difficult to interpret. However, as previously suggested, there are additional steps that could be taken to better understand and interpret these results. It is incorrect to conclude that because the brain regions identified in the MVPA analyses show a preference for non-causal vignettes in univariate analyses, the multivariate results lack value. While univariate analyses may show a preference for a specific condition, multivariate analyses can reveal more fine-grained representations of multiple conditions. For a notable example, consider the fusiform face area (FFA) that shows a clear preference for faces at the univariate level but can significantly decode other categories at the multivariate level, even when faces are not included in the analysis.

      The decoding analysis that the reviewer is suggesting for the current study would be analogous to identifying univariate differences between faces and places in the FFA and then decoding between faces and places and claiming that the FFA represents places because the decoding is significant. The decoding analyses enabled by our design are not equivalent to decoding within a condition (e.g., among face identities, among types of illness inferences), as the reviewer suggests above. It is not that such multivariate analyses “lack value” but that they recapitulate established univariate differences. Multivariate analyses are useful for revealing more fine-grained representations when i) significant univariate differences are not observed, or ii) when it is possible to decode among categories within a condition (e.g., among face identities, among types of illness inferences). We are currently collecting data that will enable us to perform within-condition decoding analyses in future work, but the design of the current study does not allow for such a comparison.

      We note that the original quotation from the manuscript has been removed because it is no longer accurate. When including participant response time as a covariate of no interest in the GLM, no regions are shared across the 4 searchlight analyses comparing causal and noncausal conditions, suggesting that there are no shared neural responses to causal inference in our dataset.

      Reviewer #2 (Recommendations for the authors):

      (1) Moderating the strength of some claims made to justify the main hypothesis (e.g., "people but not machines transmit diseases to each other through physical contact").

      We changed this wording so that it now reads: “Illness affects living things (e.g., people and animals) rather than inanimate objects (e.g., rocks, machines, houses).” (Introduction)

      (2) Expanding the paragraph introducing the sub-question about inferring people's "body states" vs "mental states". In addition, given the order in which the hypotheses are introduced, and the results are presented, I would suggest switching the order of presentation of both localizers in the methods section and adding a quick reminder of the hypotheses that justify using these localizers.

      We thank the reviewer for these suggestions. Accordingly, we have expanded the paragraph in the Introduction that introduces the “body states” vs. “mental states” question (see below). We have also switched the order of the localizer descriptions in the Methods section and added a sentence at the start of each section describing the relevant hypotheses (see below).

      From the Introduction: “We also compared neural responses to causal inferences about the body (i.e., illness) and inferences about the mind (i.e., mental states). Both types of inferences are about animate entities, and some developmental work suggests that children use the same set of causal principles to think about bodies and minds (Carey, 1985, 1988). Other evidence suggests that by early childhood, young children have distinct causal knowledge about the body and the mind (Springer & Keil, 1991; Callanan & Oakes, 1992; Wellman & Gelman, 1992; Inagaki & Hatano, 1993; 2004; Keil, 1994; Hickling & Wellman, 2001; Medin et al., 2010). For instance, preschoolers are more likely to view illness as a consequence of biological causes, such as contagion, rather than psychological causes, such as malicious intent (Springer & Ruckel, 1992; Raman & Winer, 2004; see also Legare & Gelman, 2008). The neural relationship between inferences about bodies and minds has not been fully described. The ‘mentalizing network’, including the PC, is engaged when people reason about agents’ beliefs (Saxe & Kanwisher, 2003; Saxe et al., 2006; Saxe & Powell, 2006; Dodell-Feder et al., 2011; Dufour et al., 2013). We localized this network in individual participants and measured its neuroanatomical relationship to the network activated by illness inferences.”

      From the Methods, localizer descriptions: “To test the relationship between neural responses to inferences about the body and the mind, and to localize animacy regions, we used a localizer task to identify the mentalizing network in each participant… To test for the presence of domain-general responses to causal inference in the language and logic networks (e.g., Kuperberg et al., 2006; Operskalski & Barbey, 2017), we used an additional localizer task to identify both networks in each participant.”

      (3) Adding a quick analysis of lateralization to support the corresponding claim of left lateralization of responses to causal inferences.

      In accordance with the reviewer’s suggestion, we now include hemisphere as a factor in all ANOVAs comparing univariate responses across conditions.

      From the Results: “In individual-subject fROI analysis (leave-one-run-out), we similarly found that inferring illness causes activated the PC more than inferring causes of mechanical breakdown (repeated measures ANOVA, condition (Illness-Causal, Mechanical-Causal) x hemisphere (left, right): main effect of condition, F<sub>(1,19)</sub> = 19.18, p < .001, main effect of hemisphere, F<sub>(1,19)</sub> = 0.3, p = .59, condition x hemisphere interaction, F<sub>(1,19)</sub> = 27.48, p < .001; Figure 1A). This effect was larger in the left than in the right PC (paired samples t-tests; left PC: t<sub>(19)</sub> = 5.36, p < .001, right PC: t<sub>(19)</sub> = 2.27, p = .04)…In contrast to the animacy-responsive PC, the anterior PPA showed the opposite pattern, responding more to mechanical inferences than illness inferences (leave-one-run-out individual-subject fROI analysis; repeated measures ANOVA, condition (Mechanical-Causal, Illness-Causal) x hemisphere (left, right): main effect of condition, F<sub>(1,19)</sub> = 17.93, p < .001, main effect of hemisphere, F<sub>(1,19)</sub> = 1.33, p = .26, condition x hemisphere interaction, F<sub>(1,19)</sub> = 7.8, p = .01; Figure 4A). This effect was significant only in the left anterior PPA (paired samples t-tests; left anterior PPA: t<sub>(19)</sub> = 4, p < .001, right anterior PPA: t<sub>(19)</sub> = 1.88, p = .08).”

      (4) Making public and accessible the pre-registration OSF link.

      We apologize for this technical error. The preregistration is now publicly available: https://osf.io/6pnqg.

      Reviewer #3 (Recommendations for the authors):

      In all figures: there are no measures of dispersion of the data across participants. The reader can only see aggregated (mean) data. E.g., percentage signal changes (PSC) do not report measures of dispersion of the data, nor do we have BOLD maps showing the overlap of the response across participants. Only in Figure 2 do we see the data of 6 selected participants out of 20.

      We thank the reviewer for this suggestion. We now include graphs depicting the dispersion of the data across participants in the following figures: Figures 1, 3, and 4, and Supplementary Figures 8, 12, and 14. We have also created 2 figures that display the overlap of univariate responses across participants (Supplementary Figures 6 and 7). In addition, all participants’ results from the analysis depicted in Figure 2 are included in Supplementary Figure 3.

      Minor

      (1) Figure 2: Spatial dissociation between responses to illness inferences and mental state inferences in the precuneus (PC). If the analysis is the result of the MVPA, the figure should report the fact that only the left precuneus was analyzed.

      Figure 2 depicts the spatial dissociation in univariate responses to illness inferences and mental state inferences. We now clarify this in the figure legend.

      (2) VOTC and PSC acronyms are defined in the text after they appear for the first time. TPJ is never defined.

      We thank the reviewer for pointing this out. We now define all acronyms before using them.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The paper addresses the knowledge gap between the representation of goal direction in the central complex and how motor systems stabilize movement toward that goal. The authors focused on two descending neurons, DNa01 and 02, and showed that they play different roles in steering the fly toward a goal. They also explored the connectome data to propose a model to explain how these DNs could mediate response to lateralized sensory inputs. They finally used lateralized optogenetic activation/inactivation experiments to test the roles of these neurons in mediating turnings in freely walking flies.

      Strengths:

      The experiments are well-designed and controlled. The experiment in Figure 4 is elegant, and the authors put a lot of effort into ensuring that ATP puffs do not accidentally activate the DNs. They also have explained complex experiments well. I only have minor comments for the authors.

      We are grateful for this positive feedback.

      Weaknesses:

      (1) I do not fully understand how the authors extracted the correlation functions from the population data in Figure 1. Since the ipsilateral DNs are anti-correlated with the contralateral ones, I expected that the average will drop to zero when they are pooled together (e.g., 1E-G). Of course, this will not be the case if all the data in Figure 1 are collected from the same brain hemisphere. It would be helpful if the authors could explain this.

      We regret that this information was not easy to find in our initial submission. As noted in the Figure 1D legend, “Here and elsewhere, ipsi and contra are defined relative to the recorded DN(s).” We have now added a sentence to the Results (right after we introduce Figure 1D) that also makes this point.

      (2) What constitutes the goal directions in Figures 1-3 and 8, as the authors could not use EPG activity as a proxy for goal directions? If these experiments were done in the dark, without landmarks, one would expect the fly's heading to drift randomly at times, and they would not engage the DNa01/02 for turning. Do the walking trajectories in these experiments qualify as menotactic bouts?

      Published work (Green et al., 2019) has shown that, even in the dark, flies will often walk for extended periods while holding the bump of EPG activity at a fixed location. During these epochs, the brain is essentially estimating that the fly is walking in a straight line in a fixed direction. (The fact that the fly is actually rotating a bit on the spherical treadmill is not something the fly can know, in the dark.) Thus, epochs where the EPG bump is held fixed are treated as menotactic bouts, even in darkness.

      Our results provide additional support for this interpretation. We find that, when flies are walking in darkness and holding the bump of EPG activity at a fixed location, they will make a corrective behavioral turning maneuver in response to an imposed bump-jump. This result argues that the flies are actually engaging in goal-directed straight-line walking, i.e. menotaxis, and it reproduces the findings of Green et al. (2019).

      To clarify this point, we have adjusted the wording of the Results pertaining to Figure 4.

      (3) In Figure 2B, the authors mentioned that DNa02 overpredicts and 01 underpredicts rapid turning and provided single examples. It would be nice to see more population-level quantification to support this claim.

      In this revision, we have reorganized Figures 1 and 2 (and associated text) to improve clarity. As part of this reorganization, we have removed this passage from the text, as it was a minor point in any event.

      Reviewer #2 (Public review):

      The data is largely electrophysiological recordings coupled with behavioral measurements (technically impressive) and some gain-of-function experiments in freely walking flies. Loss-of-function was tested but had minimal effect, which is not surprising in a system with partially redundant control mechanisms. The data is also consistent with/complementary to subsequent manuscripts (Yang 2023, Feng 2024, and Ros 2024) showing additional descending neurons with contributions to steering in walking and flying.

      The experiments are well executed, the results interesting, and the description clear. Some hypotheses based on connectome anatomy are tested: the insights on the pre-synaptic side (how sensory and central complex heading circuits converge onto these DNs) are stronger than the suggestions about biomechanical mechanisms for how turning happens on the motor side.

      Of particular interest is the idea that different sensory cues can converge on a common motor program. The turn-toward or turn-away mechanism is initiated by valence rather than whether the stimulus was odor or temperature or memory of heading. The idea that animals choose a direction based on external sensory information and then maintain that direction as a heading through a more internal, goal-based memory mechanism, is interesting but it is hard to separate conclusively.

      To clarify, we mention the role of memory in connection with two places in the manuscript. First, we note that the EPG/head direction system relies on learning and memory to construct a map of directional cues in the environment. These cues are, in principle, inherently neutral, i.e. without valence. Second, we note that specific mushroom body output neurons rely on learning and memory to store the valence associated with an odor. This information is not necessarily associated with an allocentric direction: it is simply the association of odor with value. Both of these ideas are well-attested by previous work.

      The reviewer may be suggesting a sequential scheme whereby the brain initializes an allocentric goal direction based on valence, and then maintains that goal direction in memory, based on that initialization. In other words, memory is used to associate valence with some allocentric direction. This seems plausible, but it is not a claim we make in our manuscript.

      The "see-saw", where left-right symmetry is broken to allow a turn, presumably by excitation on one side and inhibition of the other leg motor modules, is interesting but not well explained here. How hyperpolarization affects motor outputs is not clear.

      We have added several sentences to the Discussion to clarify this point. According to this see-saw model, steering can emerge from right/left asymmetries in excitation, or inhibition, or both. It may be nonintuitive to think that inhibitory input to a DN can produce an action. However, this becomes more plausible given our finding that DNa02 has a relatively high basal firing rate (Fig. 1D), and DNa02 hyperpolarization is associated with contraversive turning (Fig. 5A). It is also relevant to note that there are many inhibitory cell types that form strong unilateral connections onto DNa02 (e.g., AOTU019).

      The statement near Figure 5B that "DNa02 activity was higher on the side ipsilateral to the attractive stimulus, but contralateral to the aversive stimulus" is really important - and only possible to see because of the dual recordings.

      We thank the reviewer for this positive feedback.

      Reviewer #3 (Public review):

      Summary:

      Rayshubskiy et al. performed whole-cell recordings from descending neurons (DNs) of fruit flies to characterize their role in steering. Two DNs implicated in "walking control" and "steering control" by previous studies (Namiki et al., 2018, Cande et al., 2018, Chen et al., 2018) were chosen by the authors for further characterization. In-vivo whole-cell recordings from DNa01 and DNa02 showed that their activity predicts spontaneous ipsilateral turning events. The recordings also showed that while DNa02 predicts transient turns, DNa01 predicts slow sustained turns. However, optogenetic activation or inactivation showed relatively subtle phenotypes for both neurons (consistent with data in other recent preprints, Yang et al 2023 and Feng et al 2024). The authors also further characterized DNa02 with respect to its inputs and showed a functional connection with olfactory and thermosensory inputs as well as with the head-direction system. DNa01 is not characterized to this extent.

      Strengths:

      (1) In-vivo recordings and especially dual recordings are extremely challenging in Drosophila and provide a much higher resolution DN characterization than other recent studies that have relied on behavior or calcium imaging. Especially impressive are the simultaneous recordings from bilateral DNs (Figure 3). These bilateral recordings show clearly that DNa02 cells not only fire more during ipsilateral turning events but that they get inhibited during contralateral turns. In line with this observation, the difference between left and right DNa02 neuronal activity is a much better predictor of turning events compared to individual DNa02 activity.

      (2) Another technical feat in this work is driving local excitation in the head-direction neuronal ensemble (PEN-1 neurons), while simultaneously imaging its activity and performing whole-cell recordings from DNa02 (Figure 4). This impressive approach provided a way to causally relate changes in the head-direction system to DNa02 activity. Indeed, DNa02 activity could predict the rate at which an artificially triggered bump in the PEN-1 ring attractor returns to its previous stable point.

      (3) The authors also support the above observations with connectomics analysis and provide circuit motifs that can explain how the head direction system (as well as external olfactory/thermal stimuli) communicates with DNa02. All these results unequivocally put DNa02 as an essential DN in steering control, both during exploratory navigation as well as stimulus-directed turns.

      We are grateful for this detailed positive feedback.

      Weaknesses:

      (1) I understand that the first version of this preprint was already on bioRxiv in 2020, and some of the "weaknesses" I list are likely a reflection of the fact that I'm tasked to review this manuscript in late 2024 (more than 4 years later). But given this is a 2024 updated version, it suffers from not laying out the results in contemporary terms. For instance, the manuscript lacks any reference to the DNp09 circuit implicated in object-directed turning and upstream to DNa02 even though the authors cite one of the papers where this was analyzed (Braun et al, 2024). More importantly, these studies (both Braun et al 2024 and Sapkal et al 2024) along with recent work from the authors' lab (Yang et al 2023) and other labs (Feng et al 2024) provide a view that the entire suite of leg kinematics changes required for turning are orchestrated by populations of heterogeneous interconnected DNs. Moreover, these studies also show that this DN-DN network has some degree of hierarchy with some DNs being upstream to other DNs. In this contemporary view of steering control, DNa02 (like DNg13 from Yang et al 2023) is a downstream DN that is recruited by hierarchically upstream DNs like DNa03, DNp09, etc. In this view, DNa02 is likely to be involved in most turning events, but by itself unable to drive all the motor outputs required for the said events. This reasoning could be used while discussing the lack of major phenotypes with DNa02 activation or inactivation observed in the current study, which is in stark contrast to strong phenotypes observed in the case of hierarchically upstream DNs like DNp09 or DNa03. In the section, "Contributions of single descending neuron types to steering behavior": the authors start off by asking if individual DNs can make measurable contributions to steering behavior. Once more, any citations to DNp09 or DNa03 - two DNs that are clearly shown to drive strong turning upon activation (Bidaye et al, 2020, Feng et al 2024) - are lacking. Besides misleading the reader, such statements also digress the results away from contemporary knowledge in the field. I appreciate that the brief discussion in the section titled "Ensemble codes for steering" tries to cover these recent updates. However, I think this would serve a better purpose in the introduction and help guide the results.

      We apologize for these omissions of relevant citations, which we have now fixed. Specifically, in our revised Discussion, we now point out that:

      - Braun et al. (2024) reported that bilateral optogenetic activation of either DNa02 or DNa01 can drive turning (in either direction). 

      - Braun et al. (2024) also identified DNb02 as a steering-related DN.

      - Bidaye et al. (2020), Sapkal et al. (2024), and Braun et al. (2024) all contributed to the identification of DNp09 as a broadcaster DN with the capacity to promote ipsiversive turning.

      We have also revised the beginning of the Results section titled “Contributions of single descending neuron types to steering behavior”, as suggested by the Reviewer.

      Finally, we agree with the Reviewer’s overall point that steering is influenced by multiple DNs. We have not claimed that any DN is solely responsible for steering. As we note in the Discussion: “We found that optogenetically inhibiting DNa01 produced only small defects in steering, and inhibiting DNa02 did not produce statistically significant effects on steering; these results make sense if DNa02 is just one of many steering DNs.”

      (2) The second major weakness is the lack of any immunohistochemistry (IHC) images quantifying the expression of the genetic tools used in these studies. Even though the main split-Gal4 tools for DNa01 and DNa02 were previously reported by Namiki et al, 2018, it is important to document the expression with the effectors used in this work and explicitly mention the expression in any ectopic neurons. Similarly, for any experiments where drivers were combined together (double recordings, functional connectivity) or modified for stochastic expression (Figure 8), IHC images are absolutely necessary. Without this evidence, it is difficult to trust many of the results (especially in the case of behavioral experiments in Figure 8). For example, the DNa01 genetic driver used by the authors is also expressed in some neurons in the nerve cord (as shown on the Flylight webpage of Janelia Research Campus). One wonders if all or part of the results described in Figure 8 are due to DNa01 manipulation or manipulation of the nerve cord neurons. The same applies for optic lobe neurons in the DNa02 driver.

      This is a reasonable request. We used DN split-Gal4 lines to express three types of UAS-linked transgenes:

      (1) GFP

      In these flies, we know that expression in DNs is restricted to the DN types in question, based on published work (Namiki et al., 2018), as well as the fact that we see one labeled DN soma per hemisphere. When we label both cells with GFP, we use the spike waveform to identify DNa02 and DNa01, as described in Figure S1.

      (2) ReaChR

      In these flies, expression patterns were different in different flies because ReaChR expression was stochastically sparsened using hs-FLP. Expression was validated in each fly after the experiment, as described in the Methods (“Stochastic ReaChR expression”). hs-FLP-mediated sparsening will necessarily produce stochastic patterns of expression in both DNa02 and off-target cells, and this is true of all the flies in this experiment. What makes the “unilateral” flies distinct from the “bilateral” flies is that unilateral flies express ReaChR in one copy of DNa02, whereas bilateral flies express ReaChR in both copies of DNa02. On average, off-target expression will be the same in both groups.

      (3) GtACR1

      In these flies, we initially assumed that GtACR1 expression was the same as GFP expression under control of the same driver. However, we agree with the reviewer’s point that these two expression patterns are not necessarily identical. Therefore, to address the reviewer’s question, we performed immunofluorescence microscopy to characterize GtACR1 patterns in the brain and VNC of both genotypes. These expression patterns are now shown in a new supplemental figure (Figure S8). This figure shows that, as it happens, expression of GtACR1 is indeed indistinguishable from the GFP expression patterns for the same lines (archived on the FlyLight website). Both DN split-Gal4 lines are largely selective for the DNs in question, with limited off-target labeling. We have now drawn attention to this off-target labeling in the last paragraph of the Results, where the GtACR1 results are discussed.

      (3) The paper starts off with a comparative analysis of the roles of DNa01 and DNa02 during steering. Unfortunately, after this initial analysis, DNa01 is largely ignored for further characterization (e.g. with respect to inputs, connectomics, etc.), only to return in the final figure for behavioral characterization where DNa01 seems to have a stronger silencing phenotype compared to DNa02. I couldn't find an explanation for this imbalance in the characterization of DNa01 versus DNa02. Is this due to technical reasons? Or was it an informed decision due to some results? In addition to being a biased characterization, this also results in the manuscript lacking a coherent thread, which in turn makes it a bit inaccessible to the non-specialist.

      Yes, the first portion of the manuscript focuses on DNa01 and DNa02. The latter part of the manuscript transitions to focus mainly on DNa02. 

      Our rationale is noted at the point in the manuscript where we make this transition, with the section titled “Steering toward internal goals”: “Having identified steering-related DNs, we proceeded to investigate the brain circuits that provide input to these DNs. Here we decided to focus on DNa02, as this cell’s activity is predictive of larger steering maneuvers.” When we say that DNa02 is predictive of larger steering maneuvers, we are referring to several specific results:

      - We obtain larger filter amplitudes for DNa02 versus DNa01 (Fig. 2A-C). This means that, just after a unit change in DN firing rate, we see on average a larger change in steering velocity for DNa02 versus DNa01.

      - The linear filter for DNa02 has a higher variance explained, as compared to DNa01 (Fig. 2D). This means that DNa02 is more predictive of steering.

      - The relationship between firing rate and rotational velocity (150 ms later) is steeper for DNa02 than for DNa01 (Fig. 2G). This means that, if we ignore dynamics and we just regress firing rate against subsequent rotational velocity, we see a higher-gain relationship for DNa02.
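      The third comparison above (regressing firing rate against rotational velocity 150 ms later) can be illustrated with a short sketch. This is only a minimal illustration under stated assumptions, not the analysis code used in the manuscript: the helper name `lagged_gain`, the 10 ms sampling interval, and the synthetic traces are all hypothetical and chosen for the example.

```python
from statistics import mean


def lagged_gain(firing_rate, rot_velocity, lag_samples):
    """Least-squares slope and R^2 of rot_velocity[t + lag] on firing_rate[t].

    Hypothetical helper for illustration. Both inputs are equal-length
    sequences sampled at the same rate; lag_samples is the delay (in
    samples) between neural activity and the behavior it is compared to.
    """
    if len(firing_rate) != len(rot_velocity):
        raise ValueError("inputs must have the same length")
    if lag_samples < 0 or lag_samples >= len(firing_rate) - 1:
        raise ValueError("lag_samples leaves too few paired samples")

    # Pair each firing-rate sample with the velocity lag_samples later.
    x = firing_rate[:len(firing_rate) - lag_samples]
    y = rot_velocity[lag_samples:]

    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("firing_rate is constant; slope is undefined")

    slope = sxy / sxx  # gain: change in velocity per unit firing rate
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, r_squared


# Synthetic example (assumed values, not data from the study).
SAMPLE_INTERVAL_MS = 10   # assumed sampling interval
LAG_MS = 150              # lag named in the text
lag = LAG_MS // SAMPLE_INTERVAL_MS

rate = [float((i * 7) % 11) for i in range(200)]
# Velocity is an exact, delayed copy of the rate scaled by 2.
velocity = [0.0] * lag + [2.0 * r for r in rate[:-lag]]

slope, r2 = lagged_gain(rate, velocity, lag)
print(f"gain = {slope:.3f}, variance explained = {r2:.3f}")
```

      In this framing, a steeper slope corresponds to the "higher-gain relationship" described for DNa02, while the R^2 value plays the role of variance explained. The full linear-filter analysis in Fig. 2A-D additionally accounts for dynamics, which this single-lag sketch ignores.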

      Our focus on DNa02 was also driven by connectivity considerations. In the same paragraph (the first paragraph in the section titled “Steering toward internal goals”), we note that “there are strong anatomical pathways from the central complex to DNa02”; the same is not true of DNa01. This point has also been noted by other investigators (Hulse et al. 2021).

      We don’t think this focus on DNa02 makes our work biased or inaccessible. Any study must balance breadth with depth. A useful general way to balance these constraints is to begin a study with a somewhat broader scope, and then narrow the study’s focus to obtain more in-depth information. Here, we began with a comparative study of two cell types, and we progressed to the cell type that we found more compelling.

      (4) There seems to be a discrepancy with regard to what is emphasized in the main text and what is shown in Figures S3/S4 in relation to the role of these DNs in backward walking. There are only two sentences in the main text where these figures are cited.

      a) "DNa01 and DNa02 firing rate increases were not consistently followed by large changes in forward velocity (Figs. 1G and S3)."

      b) "We found that rotational velocity was consistently related to the difference in right-left firing rates (Fig. 3B). This relationship was essentially linear through its entire dynamic range, and was consistent across paired recordings (Fig. 3C). It was also consistent during backward walking, as well as forward walking (Fig. S4)." These main text sentences imply the role of the difference between left and right DNa02 in turning. However, the actual plots in the Figures S3 and S4 and their respective legends seem to imply a role in "backward walking". For instance, see this sentence from the legend of Figure S3 "When (ΔvoltageDNa02>>ΔvoltageDNa01), the fly is typically moving backward. When (firing rateDNa02>>firing rateDNa01), the fly is also often moving backward, but forward movement is still more common overall, and so the net effect is that forward velocity is small but still positive when (firing rateDNa02>>firing rateDNa01). Note that when we condition our analysis on behavior rather than neural activity, we do see that backward walking is associated with a large firing rate differential (Fig. S4)." This sort of discrepancy in what is emphasized in the text, versus what is emphasized in the figures, ends up confusing the reader. More importantly, I do not agree with any of these conclusions regarding the implication of backward walking. Both Figures S3 and S4 are riddled with caveats, misinterpretations, and small sample sizes. As a result, I actually support the authors' decision to not infer too much from these figures in the "main text". In fact, I would recommend going one step further and removing/modifying these figures to focus on the role of "rotational velocity". Please find my concerns about these two figures below:

      a) In Figures S3 and S4, every heat map has a different scale for the same parameter: forward velocity. S3A is -10 to +10 mm/s, S3B is -6 to +6, S4B (left) is -12 to +12, and S4B (right) is -4 to +4. Since the authors are trying to depict results based on the color-coding, this is highly problematic.

      b) Figure S3A legend "When (ΔvoltageDNa02>>ΔvoltageDNa01), the fly is typically moving backward." There are also several instances when ΔvoltageDNa02= ΔvoltageDNa01 and both are low (lower left quadrant) when the fly is typically moving backwards. So in my opinion, this figure in fact suggests DNa02 has no role in backward velocity control.

      c) Based on the example traces in S4A, every time the fly walks backwards it is also turning. Based on this it is important to show absolute rotational velocity in Figure S4C. It could be that the fly is turning around the backward peak which would change the interpretation from Figure S4C. Also, it is important to note that the backward velocities in S4A are unprecedentedly high. No previous reports show flies walking backwards at such high velocities (for example see Chen et al 2018, Nat Comm. for backward walking velocities on a similar setup).

      d) In my opinion, Figure S4D showing that right-left DNa02 correlates with rotational velocity, regardless of whether the fly is in a forward or backward walking state, is the only important and conclusive result in Figures S3/S4. These figures should be rearranged to only emphasize this panel.

      We agree that it is difficult to interpret some of the correlations between DN activity and forward velocity, given that forward velocity and rotational velocity are themselves correlated to some degree. This is why we did not make claims based on these results in the main text. In response to these comments, we have taken the Reviewer’s suggestion to preserve Figure S4D (now Figure S3). The other components of these supplemental figures have been removed.

      (5) Figure 3 shows a really nice analysis of the bilateral DNa02 recordings data. While Figure S5 [now Figure S4] shows that authors have a similar dataset for DNa01, a similar level analysis (Figures 3D, E) is not done for DNa01 data. Is there a reason why this is not done?

      The reason we did not do the same analysis for DNa01 is that we only have two paired DNa01-DNa01 recordings. It turned out to be substantially more difficult to perform DNa01-DNa01 recordings, as compared to DNa02-DNa02 recordings. For this reason, we were not able to get more than two of these recordings.

      (6) In Figure 4 since the authors have trials where bump-jump led to turning in the opposite direction to the DNa02 being recorded, I wonder if the authors could quantify hyperpolarization in DNa02 as is predicted from connectomics data in Figure 7.

      We agree this is an interesting question. However, DNa02 firing rate and membrane potential are variable, and stimulus-evoked hyperpolarizations in these DNs tend to be relatively small (on the order of 1 mV, in the case of a contralateral fictive olfactory stimulus, Figure 5A). In the case of our fictive olfactory stimuli, we could look carefully for these hyperpolarizations because we had a very large number of trials, and we could align these trials precisely to stimulus onset. By contrast, for the bump-jump experiments, we have a more limited number of trials, and turning onset is not so tightly time-locked to the chemogenetic stimuli; for these reasons, we are hesitant to make claims about any bump-jump-related hyperpolarization in these trials.

      (7) Figure 6 suggests that DNa02 contains information about latent steering drives. This is really interesting. However, in order to unequivocally claim this, a higher-resolution postural analysis might be needed. Especially given that DNa02 activation does not reliably evoke ipsilateral turning, these "latent" steering events could actually contain significant postural changes driven by DNa02 (making them "not latent"). Without this information, at least the authors need to explicitly mention this caveat.

      This is a good point. We cannot exclude the possibility that DNa02 is driving postural changes when the fly is stopped, and these postural changes are so small we cannot detect them. In this case, however, there would still be an interesting mismatch between the stimulus-evoked change in DNa02 firing rate (which is large) and the stimulus-evoked postural response (which would be very small). We have added language to the relevant Results section in order to make this explicit.

      (8) Figure 7 would really benefit from connectome data with synapse numbers (or weighted arrows) and a corresponding analysis of DNa01.

      In response to this comment, we have added synapse number information (represented by weighted arrows) to Figures 7C, E, and F. We also added information to the Methods to explain how cells were chosen for inclusion in this diagram. (In brief: we thresholded these connections so as to discard connections with small numbers of synapses.)

      We did perform an analogous connectome circuit analysis for DNa01, but if we use the same thresholds as we do for DNa02, we obtain a much sparser connectivity graph. We now show this in a new supplemental figure (Figure S9). MBON32 makes no monosynaptic connections onto DNa01, and it only forms one disynaptic connection, via LAL018, which is relatively weak. PFL3 and PFL2 make no mono- or disynaptic connections onto DNa01 comparable in strength to what we find for DNa02. 

      The sparser connectivity graph for DNa01 is partly due to the fact that fewer cell types converge onto DNa01 as compared to DNa02 (110 cell types, versus 287 cell types). Also, it seems that DNa01 is simply less closely connected to the central complex and mushroom body, as compared to DNa02.

      (9) In Figure 8E, the most obvious neuronal silencing phenotype is decreased sideways velocity in the case of DNa01 optogenetic silencing. In Figure S2, the inverse filter for sideways velocity for DNa01 had a higher amplitude than the rotational velocity filter. Taken together, does this point at some role for DNa01 in sideways velocity specifically?

      No. The forward filters describe the average velocity impulse response, given a brief step change in firing rate.

      Figure 1 and Figure S2 show that the sideways velocity forward filter is actually smaller for DNa01 than for DNa02. This means that a brief step change in DNa01 firing rate is followed by only a very small sideways velocity response. Conversely, the reverse filters describe the average firing rate impulse response, given a brief step change in sideways velocity. Figure S2 shows that the sideways velocity reverse filter is larger for DNa01 than for DNa02, but this means that the relationship between DNa01 activity and sideways velocity is so weak that we would need to see a very large neural response in order to get a brief step change in sideways velocity. In other words, the reverse filter says that DNa01 likely has very little role in determining sideways velocity.

      (10) In Figure 8G, the effect on inner hind leg stance prolongation is very weak, and given the huge sample size, hard to interpret. Also, it is not clear how this fits with the role of DNa01 in slow sustained turning based on recordings.

      Yes, this effect is small in magnitude, which is not too surprising, given that many DNs seem to be involved in the control of steering in walking. To clarify the interpretation of these phenotypes, we have added a paragraph to the end of the Results:

      “All these effects are weak, and so they should be interpreted with caution. Also, both DN split-Gal4 lines drive expression in a few off-target cell types, which is another reason for caution (Fig. S8). However, they suggest that both DNs can lengthen the stance phase of the ipsilateral back leg, which would cause ipsiversive turning. These results are also compatible with a scenario where both DNs decrease the step length in the ipsilateral legs, which would also cause ipsiversive turning. Step frequency does not normally change asymmetrically during turning, so the observed decrease in step frequency during optogenetic inhibition may just be a by-product of increasing step length when these DNs are inhibited.”

      We have also added caveats and clarifications in a new Discussion paragraph:

      “Our study does not fully answer the question of how these DNs affect leg kinematics, because we were not able to simultaneously measure DN activity and leg movement. However, our optogenetic experiments suggest that both DNs can lengthen the stance phase of the ipsilateral back leg (Fig. 8G), and/or decrease the step length in the ipsilateral legs (Fig. 8H), either of which would cause ipsiversive turning. If these DNs have similar qualitative effects on leg kinematics, then why does DNa02 precede larger and more rapid steering events? This may be due to the fact that DNa02 receives stronger and more direct input from key steering circuits in the brain (Fig. S9). It may also relate to the fact that DNa02 has more direct connections onto motor neurons (Fig. 1B).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I found the sign conventions for rotational velocity particularly confusing. Figure 3 represents clockwise rotations as +ve values, but Figure 4H represents anticlockwise rotations as positive values. But for EPG bumps, anticlockwise rotations are given negative values. Please make them consistent unless I am missing something obvious.

      Different fields use different conventions for yaw velocity. In aeronautics, a clockwise turn is generally positive. In robotics and engineering of terrestrial vehicles, a counterclockwise turn is generally positive. Historically, most Drosophila studies that quantified rotational (yaw) velocity were focused on the behavior of flying flies, and these studies generally used the convention from aeronautics, where a clockwise turn is defined as a positive turn. When we began working in the field, we adopted this convention, in order to conform to previous literature. It might be argued that walking flies are more like robots than airplanes, but it seemed to us that it was confusing to have different conventions for different behaviors of the same animal. Thus, all of the published studies from our lab define clockwise rotation as having positive rotational velocity.

      Figure 4 focuses on the role of the central complex in steering. As the fly turns clockwise (rightward), the bump of activity in EPG neurons normally moves counterclockwise around the ellipsoid body, as viewed from the posterior side (Turner-Evans et al., 2017). The posterior view is the conventional way to represent these dynamics, because (1) we and others typically image the brain from the posterior side, not the anterior side, and (2) in a posterior view, the animal’s left is on the left side of the image, and vice versa. We have added a sentence to the Figure 4A legend to clarify these points.

      Previous work has shown that, when an experimenter artificially “jumps” the EPG bump, this causes the fly to make a compensatory turn that returns the bump to (approximately) its original location (Green et al., 2019). Our work supports this observation. Specifically, we find that clockwise bump jumps are generally followed by rightward turns (which drive the bump to return to its approximate original location via a counterclockwise path), and vice versa. This is noted in the Figure 4D legend. Note that Figure 4D plots the fly’s rotational velocity during the bump return against the initial bump jump.

      Figure 4H shows that clockwise (blue) bump returns were typically preceded by leftward turning, and counterclockwise (green) bump returns were typically preceded by rightward turning, as expected. This is detailed in the Figure 4H legend, and it is consistent with the coordinate frame described above.

      (2) It would be helpful to have images of the DNa01 and DNa02 split lines used in this paper, considering this paper would most likely be used widely to describe the functions of these neurons. Similarly, images of their reconstructions would be a useful addition.

      High-quality three-dimensional confocal stacks of all the driver lines used in our study are publicly available. We have added this information to the Methods (under “Fly husbandry and genotypes”). Confocal images of the full morphologies of DNa01 and DNa02 have been previously published (Namiki et al., 2018). Figure 1A is a schematic that is intended to provide a quick visual summary of this information.

      EM reconstructions of DNa01 and DNa02 are publicly accessible in a whole-brain dataset (https://codex.flywire.ai/) and a whole-VNC dataset (https://neuprint.janelia.org/). Both datasets are referenced in our study. As these datasets are easy to search and browse via user-friendly web-based tools, we expect that interested readers will have no difficulty accessing the underlying datasets directly.

      Reviewer #2 (Recommendations for the authors):

      (1) The description of the activity of the DNs that they "PREDICT steering during walking". This is an interesting word choice. Not causes, not correlates with, not encodes... does that mean the activity always precedes the action? Does that mean when you see activity, you will get behavior? This is important for assessing whether the DN activity is a cause or an effect. It is good to be cautious but it might be worth expanding on exactly what kind of connection is implied to justify the use of the word 'predict'.

      Conventionally, “predict” means “to indicate in advance”. We write that DNs “predict” certain features of behavior. We use this term because (1) these DNs correlate with certain features of behavior, and (2) changes in DN activity precede changes in behavior.

      The notion that neurons can “predict” behavior is not original to our study. Whenever neuroscientists summarize the relationship between neural activity and behavior by fitting a mathematical model (which may be as simple as a linear regression), the fitted model can be said to represent a “prediction” of behavior. These models are evaluated by comparing their predictions with measured behaviors. A good model is predictive, and this in turn implies that the underlying neural signal is predictive (Levenstein et al., 2023 Journal of Neuroscience 43: 1074-1088; DOI: 10.1523/JNEUROSCI.1179-22.2022). Here, prediction simply means correlation, without necessarily implying causation. We likewise use “prediction” to imply correlation.

      We do not think the term “prediction” implies determinism. Meteorologists are said to predict the weather, but it is understood that their predictions are probabilistic, not deterministic. Certainly, we would not claim that there is a deterministic relationship between DN activity and behavior. Figure 2D shows that neither DN type can explain all the variance in the fly’s rotational or sideways velocity. At the same time, both DNs have significant predictive power.

      We might equally say that these DNs “encode” behavior. We have chosen to use the word “predict” rather than “encode” because we do not think it is necessary to use the framework of symbolic communication in connection with these DNs.

      We agree with the Reviewer that it is helpful to test whether any neuron that “predicts” a behavior might also “cause” this behavior. In Figure 8, we show that directly perturbing these DNs can indeed alter locomotor behavior, which suggests a causal role. Connectome analyses also suggest a causal role for these DNs in locomotor behavior (Figure 1B; see especially Cheong et al., 2024).

      At the same time, it is clear from our results that these DNs are not “command neurons” for turning: they do not deterministically cause turning. Therefore, to avoid misunderstanding, we have generally been careful to summarize the results of our perturbation experiments by avoiding the statement that “this DN causes this behavior”. Rather, we have generally tried to say that “this DN influences this behavior”, or “this DN promotes this behavior”.

      (2) There is some concern about how the linear filter models were developed and then used to predict the relationship between firing rate and steering behavior: how exactly were the build and test data separated to avoid re-extracting the input? It reads like a self-fulfilling prophecy/tautology.

      We used conventional cross-validation for model fitting and evaluation. We apologize that this was not made explicit in our original submission; this was due to an oversight on our part. To be clear: linear filters were computed using the data from the first 20% of a given experiment. We then convolved each cell’s firing rate estimate with the computed Neuron→Behavior filter (the “forward filter”) using the data from the final 80% of the experiment, in order to generate behavioral predictions. Thus, when a model has high variance explained, this is not attributable to overfitting: rather, it quantifies the bona fide predictive power of the model. We have added this information to the Methods (under “Data analysis - Linear filter analysis”).
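
      The train/test split described above can be illustrated with a minimal sketch. It assumes a causal linear filter fitted by ordinary least squares on a lagged design matrix; the fitting method, the function names (`lagged_design`, `fit_and_evaluate`), and the `n_lags` value are illustrative assumptions, not the procedure or parameters used in the manuscript. Only the split (filter fitted on the first 20% of the data, predictions evaluated on the final 80%) follows the description above.

```python
import numpy as np

def lagged_design(x, n_lags):
    """Build a matrix whose row t is [x[t], x[t-1], ..., x[t-n_lags+1]].

    Samples before the start of the recording are treated as zero.
    """
    X = np.zeros((len(x), n_lags))
    for k in range(n_lags):
        X[k:, k] = x[:len(x) - k]
    return X

def fit_and_evaluate(rate, velocity, n_lags=50, train_frac=0.2):
    """Fit a forward (neuron -> behavior) filter on the first part of the
    experiment and score its predictions on the held-out remainder.

    ASSUMPTION: least-squares fitting is used here for illustration only.
    """
    split = int(train_frac * len(rate))

    # Fit the filter on the first 20% of the data only.
    X_train = lagged_design(rate[:split], n_lags)
    filt, *_ = np.linalg.lstsq(X_train, velocity[:split], rcond=None)

    # Convolve held-out firing rate with the filter to predict behavior.
    n_test = len(rate) - split
    pred = np.convolve(rate[split:], filt)[:n_test]

    # Fraction of held-out variance explained by the prediction.
    resid = velocity[split:] - pred
    var_explained = 1.0 - resid.var() / velocity[split:].var()
    return filt, var_explained
```

      Because the filter never sees the final 80% of the data during fitting, a high value of `var_explained` in this sketch cannot arise from re-extracting the input.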

      (3) Type-O right above Figure 2 [now Figure 1E]: I assume spike rate fluctuations in DNa02 precede DNa01?

      Fixed. Thank you for reading the manuscript carefully.

      (4) The description of the other manuscripts about neural control of the steering as "follow-up" papers is a bit diminishing. They were likely independent works on a similar theme that happened afterwards, rather than deliberate extensions of this paper, so "subsequent" might be a more accurate description.

      We apologize, as we did not intend this to be diminishing. Given this request, we have revised “follow-up” to “subsequent”.

      (5) The idea that DNa02 is high-gain because it is more directly connected to motor neurons is a hypothesis and this should be made clear. We really don't know the functional consequences of the directness of a path or the number of synapses, and which circuits you compare to would change this. DNa02 may be a higher gain than DNa01, but what about relative to the other DNs that enter pre-motor regions? How do you handle a few synapses and several neurons in a common class? All of these connectivity-based deductions await functional tests - like yours! I think it is better to make this clear so readers don't assume a higher level of certainty than we have.

      The Reviewer asks how we handled few-synapse connections, and how we combined neurons in the same class. We apologize for not making this explicit in our original submission. We have now added this information to the Methods. Briefly, to select cell types for inclusion in Figure 7C, we identified all individual cells postsynaptic to PFL3 and presynaptic to DNa02, discarding any unitary connections with <5 synapses. We then grouped unitary connections by cell type, and then summed all synapse numbers within each connection group (e.g., summing all synapses in all PFL3→LAL126 connections). We then discarded connection groups having <200 synapses or <1% of a cell type’s pre- or postsynaptic total. Reported connection weights are per hemisphere, i.e., half of the total within each connection group. For Figure 7F we did the same, but now discarding connection groups having <70 synapses or <0.4% of a cell type’s pre- or postsynaptic total. In Figure S9, we used the same procedures for analyzing connections onto DNa01.
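
      The thresholding steps above can be sketched as follows. This is a hedged illustration, not the analysis code used in the study: the function name `threshold_connections`, the input format, and the `out_total`/`in_total` dictionaries are hypothetical. It also assumes one particular reading of the fractional criterion, namely that a connection group is kept only if it reaches the minimum fraction of both the presynaptic type’s total output and the postsynaptic type’s total input.

```python
from collections import defaultdict

def threshold_connections(unitary, out_total, in_total,
                          min_unitary=5, min_group=200, min_frac=0.01):
    """Group unitary connections by cell type and apply thresholds.

    unitary   : iterable of (pre_type, post_type, n_synapses), one entry
                per pair of individual cells (hypothetical input format).
    out_total : dict mapping cell type -> total output synapses (assumed).
    in_total  : dict mapping cell type -> total input synapses (assumed).
    Returns a dict mapping (pre_type, post_type) -> per-hemisphere weight.
    """
    # Step 1: discard weak unitary connections, then sum within each
    # cell-type-to-cell-type connection group.
    groups = defaultdict(int)
    for pre, post, n in unitary:
        if n >= min_unitary:
            groups[(pre, post)] += n

    kept = {}
    for (pre, post), total in groups.items():
        # Step 2: discard groups below the absolute synapse threshold.
        if total < min_group:
            continue
        # Step 3: discard groups below the fractional threshold
        # (ASSUMPTION: both pre- and postsynaptic fractions must pass).
        if total < min_frac * out_total[pre] or total < min_frac * in_total[post]:
            continue
        # Step 4: report weights per hemisphere (half of the group total).
        kept[(pre, post)] = total / 2
    return kept
```

      Under these assumptions, the Figure 7F criteria would correspond to calling the same function with `min_group=70` and `min_frac=0.004`.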

      We agree that it is tricky to infer function from connectome data, and this applies to motor neuron connectivity. We bring up DN connectivity onto motor neurons in two places. First, in the Results, we note that “steering filters (i.e., rotational and sideways velocity filters) were larger for DNa02 (Fig. 2A,B). This means that an impulse change in firing rate predicts a larger change in steering for this neuron. In other words, this result suggests that DNa02 operates with higher gain. This may be related to the fact that DNa02 makes more direct output synapses onto motor neurons (Fig. 1B) [emphasis added].” We feel this is a relatively conservative statement.

      Subsequently, in the Discussion, we ask, “why does DNa02 precede larger and more rapid steering events? This may be due to the fact that DNa02 receives stronger and more direct input from key steering circuits in the brain (Fig. S9). It may also relate to the fact that DNa02 has more direct connections onto motor neurons (Fig. 1B) [emphasis added].” Again, we feel this is a relatively conservative statement.

      To be sure, none of the motor neurons postsynaptic to DNa02 actually receive most of their synaptic input from DNa02 (or indeed any DN), and this is typical of motor neurons controlling leg muscles. Instead, leg motor neurons tend to get most of their input from interneurons rather than from DNs (Cheong et al. 2024). Available data suggest that the walking rhythm originates with intrinsic VNC central pattern generators, and the DNs that influence walking do so, in large part, by acting on VNC interneurons. These points have been detailed in recent connectome analyses (see especially Cheong et al. 2024).

      We are reluctant to broaden the scope of our connectome analyses to include other DNs for comparison, because we think these analyses are most appropriate to full-central-nervous-system-(CNS)-connectomes (brain and VNC together), which are currently under construction. Without a full-CNS-connectome, many of the DN axons in the VNC cannot be identified. In the future, we expect that full-CNS-connectomes will allow a systematic comparison of the input and output connectivity of all DN types, and probably also the tentative identification of new steering DNs. Those future analyses should generate new hypotheses about the specializations of DNa02, DNa01, and other DNs. Our study aims to help lay a conceptual foundation for that future work.

      (6) Given the emphasis on the DNa02 to Motor Neuron connectivity shown (Figure 1B) and multiple text mentions, could you include more analyses of which motor neurons are downstream and how these might be expected to affect leg movements? I would like to see the synapse numbers (Figure 1B) as well as the fraction of total output synapses. These additions would help understand the evidence for the "see-saw" model.

      We agree this is interesting. In follow-up work from our lab (Yang et al., 2023), we describe the detailed VNC connectivity linking DNa02 to motor neurons. We refer the Reviewer specifically to Figure 7 of that study (https://www.cell.com/cell/fulltext/S0092-8674(24)00962-0).

      We regret that the see-saw model was perhaps not clear in our original submission. Briefly, this model proposes that an increase in excitatory synaptic input to one DN (and/or a disinhibition of that DN) is often accompanied by an increase in inhibitory synaptic input to the contralateral DN. This model is motivated by connectome data on the brain inputs to DNa02 (Figure 7), along with our observation that excitation of one DN is often accompanied by inhibition of the contralateral DN (Figure 5). We have now added text to the Results in several places in order to clarify these points. 

      Because this model specifically pertains to the brain inputs to DNs, comparing the downstream targets of these DNs in the VNC would not be a test of this hypothesis. The Reviewer may be asking to see whether there is any connectivity in the brain from one DN to its contralateral partner. We do not find connections of this sort, aside from multisynaptic connections that rely on very weak links (~10 synapses per connection). Figure 7 depicts a much stronger basis for this hypothesis, involving feedforward see-saw connections from PFL3 and MBON32.

      (7) The conclusions from the data in Figure 8 could be explained more clearly. These seem like small effect sizes on subtle differences in leg movements - maybe like what was seen in granular control by Moonwalker's circuits? Measuring joint angles or step parameters might help clarify, but a summary description would help the reader.

      We agree that these results were not explained very well in our original submission. 

      In our revised manuscript, we have added a new paragraph to the end of this Results section providing some summary and interpretation:

      “All these effects are weak, and so they should be interpreted with caution. However, they suggest that both DNs can lengthen the stance phase of the ipsilateral back leg, which would promote ipsiversive turning. These results are also compatible with a scenario where both DNs decrease the step length in the ipsilateral legs, which would also promote ipsiversive turning. Step frequency does not normally change asymmetrically during turning, so the observed decrease in step frequency during optogenetic inhibition may just be a by-product of increasing step length when these DNs are inhibited.”

      Moreover, in the Discussion, we have also added a new paragraph that synthesizes these results with other results in our study, while also noting the limitations of our study:

      “Our study does not fully answer the question of how these DNs affect leg kinematics, because we were not able to simultaneously measure DN activity and leg movement. However, our optogenetic experiments suggest that both DNs can lengthen the stance phase of the ipsilateral back leg (Fig. 8G), and/or decrease the step length in the ipsilateral legs (Fig. 8H), either of which would promote ipsiversive turning. If these DNs have similar qualitative effects on leg kinematics, then why does DNa02 precede larger and more rapid steering events? This may be due to the fact that DNa02 receives stronger and more direct input from key steering circuits in the brain (Fig. S9). It may also relate to the fact that DNa02 has more direct connections onto motor neurons (Fig. 1B).”

      In Figure 8D-H, we measure step parameters in freely walking flies during acute optogenetic inhibition of DNa01 and DNa02. In experiments measuring neural activity in flies walking on a spherical treadmill, we did not have a way to measure step parameters. Subsequently, this methodology was developed by Yang et al. (2023) and results for DNa02 are described in that study. 

      Reviewer #3 (Recommendations for the authors):

      Minor Points:

      (1) If space allows, actual membrane potential should be mentioned when raw recordings are shown (for example Figure 1D).

      We have now added absolute membrane potential information to Figure 1D.

      (2) Typo in the sentence "To address this issue directly, we looked closely at the timing of each cell's recruitment in our dual recordings, and found that spike rate fluctuations in DNa02 typically preceded the spike rate fluctuations in DNa02 (Fig. 2A)." The final word should be "DNa01".

      Fixed. Thank you for reading the manuscript carefully.

      (3) Figure 2A - although there aren't direct connections between a01 and a02 in the connectome, the authors never rule out functional connectivity between these two. Given a02 precedes a01, shouldn't this be addressed?

      In the full brain FAFB data set, there are two disynaptic connections from DNa02 onto the ipsilateral copy of DNa01. One connection is via CB0556 (which is GABAergic), and the other is via LAL018 (which is cholinergic). The relevant DNa02 output connections are very weak: each DNa02→CB0556 connection consists of 11 synapses, whereas each DNa02→LAL018 connection consists of 10 synapses (on average). Conversely, each CB0556→DNa01 connection consists of 29 synapses, whereas  each LAL018→DNa01 connection consists of 64 synapses. In short, LAL018 is a nontrivial source of excitatory input to DNa01, but DNa02 is not positioned to exert much influence over LAL018, and the two disynaptic connections from DNa02 onto DNa01 also have the opposite sign. Thus, it seems unlikely that DNa02 is a major driver of DNa01 activity. At the same time, it is difficult to completely exclude this possibility, because we do not understand the logic of the very complicated premotor inputs to these DNs in the brain. Thus, we are hesitant to make a strong statement on this point.